Jan Beulich [Tue, 29 Apr 2014 13:33:06 +0000 (15:33 +0200)]
x86/HVM: restrict HVMOP_set_mem_type
Permitting arbitrary type changes here has the potential of creating
present P2M (and hence EPT/NPT/IOMMU) entries pointing to an invalid
MFN (INVALID_MFN truncated to the respective hardware structure field's
width). This would become a problem the latest when something real sat
at the end of the physical address space; I'm suspecting though that
other things might break with such bogus entries.
Along with that drop a bogus (and otherwise becoming stale) log
message.
Afaict the similar operation in p2m_set_mem_access() is safe.
This is XSA-92.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 83bb5eb4d340acebf27b34108fb1dae062146a68
master date: 2014-04-29 15:11:31 +0200
Joby Poriyath [Tue, 4 Feb 2014 18:10:35 +0000 (18:10 +0000)]
xen/pygrub: grub2/grub.cfg from RHEL 7 has new commands in menuentry
menuentry in grub2/grub.cfg uses linux16 and initrd16 commands
instead of linux and initrd. Due to this RHEL 7 (beta) guest failed to
boot after the installation.
In addition to this, RHEL 7 menu entries have two different single-quote
delimited strings on the same line, and the greedy grouping for menuentry
parsing gets both strings, and the options inbetween.
Andrew Cooper [Wed, 22 Jan 2014 17:47:21 +0000 (17:47 +0000)]
libxc: Fix out-of-memory error handling in xc_cpupool_getinfo()
Avoid freeing info then returning it to the caller.
This is XSA-88.
Coverity-ID: 1056192 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit d883c179a74111a6804baf8cb8224235242a88fc)
Jan Beulich [Thu, 6 Feb 2014 16:41:28 +0000 (17:41 +0100)]
flask: restrict allocations done by hypercall interface
Other than in 4.2 and newer, we're not having an overflow issue here,
but uncontrolled exposure of the operations opens the host to be driven
out of memory by an arbitrary guest. Since all operations other than
FLASK_LOAD simply deal with ASCII strings, limiting the allocations
(and incoming buffer sizes) to a page worth of memory seems like the
best thing we can do.
Consequently, in order to not expose the larger allocation to arbitrary
guests, the permission check for FLASK_LOAD needs to be pulled ahead of
the allocation (and it's perhaps worth noting that - afaict - it was
pointlessly done with the sel_sem spin lock held).
Note that this breaks FLASK_AVC_CACHESTATS on systems with sufficiently
many CPUs (as requiring a buffer bigger than PAGE_SIZE there). No
attempt is made to address this here, as it would needlessly complicate
this fix with rather little gain.
This is XSA-84.
Reported-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
The index of boolean variables in FLASK_{GET,SET}BOOL was not always
checked against the bounds of the array.
Reported-by: John McDermott <john.mcdermott@nrl.navy.mil> Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Liu Jinsong [Mon, 9 Dec 2013 13:55:40 +0000 (14:55 +0100)]
VMX: fix cr0.cd handling
This patch solves XSA-60 security hole:
1. For guest w/o VT-d, and for guest with VT-d but snooped, Xen need
do nothing, since hardware snoop mechanism has ensured cache coherency.
2. For guest with VT-d but non-snooped, cache coherency can not be
guaranteed by h/w snoop, therefore it need emulate UC type to guest:
2.1). if it works w/ Intel EPT, set guest IA32_PAT fields as UC so that
guest memory type are all UC.
2.2). if it works w/ shadow, drop all shadows so that any new ones would
be created on demand w/ UC.
This patch also fix a bug of shadow cr0.cd setting. Current shadow has a
small window between cache flush and TLB invalidation, resulting in possilbe
cache pollution. This patch pause vcpus so that no vcpus context involved
into the window.
This is CVE-2013-2212 / XSA-60.
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jun Nakajima <jun.nakajima@intel.com> Acked-by: Keir Fraser <keir@xen.org>
master commit: 62652c00efa55fb45374bcc92f7d96fc411aebb2
master date: 2013-11-06 10:12:36 +0100
Liu Jinsong [Mon, 9 Dec 2013 13:51:46 +0000 (14:51 +0100)]
VMX: remove the problematic set_uc_mode logic
XSA-60 security hole comes from the problematic vmx_set_uc_mode.
This patch remove vmx_set_uc_mode logic, which will be replaced by
PAT approach at later patch.
This is CVE-2013-2212 / XSA-60.
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
master commit: 1c84d046735102e02d2df454ab07f14ac51f235d
master date: 2013-11-06 10:12:00 +0100
Liu Jinsong [Mon, 9 Dec 2013 13:50:55 +0000 (14:50 +0100)]
VMX: disable EPT when !cpu_has_vmx_pat
Recently Oracle developers found a Xen security issue as DOS affecting,
named as XSA-60. Please refer http://xenbits.xen.org/xsa/advisory-60.html
Basically it involves how to handle guest cr0.cd setting, which under
some environment it consumes much time resulting in DOS-like behavior.
This is a preparing patch for fixing XSA-60. Later patch will fix XSA-60
via PAT under Intel EPT case, which depends on cpu_has_vmx_pat.
This is CVE-2013-2212 / XSA-60.
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Jun Nakajima <jun.nakajima@intel.com>
master commit: c13b0d65ddedd74508edef5cd66defffe30468fc
master date: 2013-11-06 10:11:18 +0100
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 19f027cc5daff4a37fd0a28bca2514c721852dd0
master date: 2013-11-27 09:00:41 +0100
Matthew Daley [Thu, 10 Oct 2013 13:25:58 +0000 (15:25 +0200)]
x86: check segment descriptor read result in 64-bit OUTS emulation
When emulating such an operation from a 64-bit context (CS has long
mode set), and the data segment is overridden to FS/GS, the result of
reading the overridden segment's descriptor (read_descriptor) is not
checked. If it fails, data_base is left uninitialized.
This can lead to 8 bytes of Xen's stack being leaked to the guest
(implicitly, i.e. via the address given in a #PF).
Ignoring them generally implies using uninitialized data and, in all
but two of the cases dealt with here, potentially leaking hypervisor
stack contents to guests.
This is CVE-2013-4355 / XSA-63.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6bb838e7375f5b031e9ac346b353775c90de45dc
master date: 2013-09-30 14:17:46 +0200
Jan Beulich [Wed, 25 Sep 2013 10:11:52 +0000 (12:11 +0200)]
x86/xsave: initialize unused register state when restoring for guest
In order to avoid leaking register contents from the prior use of the
registers restored through xrstor due to a guest enabling certain xcr0
bits late (particularly after the context restor in question), force
restoring of all known registers (the ones that never got saved would
be forced to their init state).
This is CVE-2013-1442 / XSA-62.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 63a75ba0de817d6f384f96d25427a05c313e2179
master date: 2013-09-25 10:41:25 +0200
Ian Jackson [Mon, 9 Sep 2013 14:30:42 +0000 (15:30 +0100)]
oxenstored: Fix process.ml build after 070ab4c50593
This change: 070ab4c505934951f86f42dd8403cf62bc5822f0
"oxenstored: Protect oxenstored from malicious domains"
broke the build because it had an unresolved semantic (but not
textual) conflict with c69fddbd5dfa3004aaf2d0f2dde00c9ec3dd6d5d
"tools/ocaml: Remove log library from tools/ocaml/libs"
(which is in 4.2 but not 4.1)
Fix this by using the 4.1.x idiom in the new error handling introduced
in 070ab4c50593.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: David Scott <dave.scott@eu.citrix.com>
Signed-off-by: Ian Campbell <ijc@hellion.org.uk> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 258d27a1d9fb33a490bef1381f52d522225c3dca)
Conflicts:
tools/pygrub/src/pygrub Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Tue, 3 Sep 2013 10:55:48 +0000 (11:55 +0100)]
oxenstored: Protect oxenstored from malicious domains.
add check logic when read from IO ring, and if error happens,
then mark the reading connection as "bad", Unless vm reboot,
oxenstored will not handle message from this connection any more.
xs_ring_stubs.c: add a more strict check on ring reading
connection.ml, domain.ml: add getter and setter for bad flag
process.ml: if exception raised when reading from domain's ring,
mark this domain as "bad"
xenstored.ml: if a domain is marked as "bad", do not handle it.
Juergen Gross [Thu, 22 Aug 2013 09:32:00 +0000 (11:32 +0200)]
Correct X2-APIC HVM emulation
commit 6859874b61d5ddaf5289e72ed2b2157739b72ca5 ("x86/HVM: fix x2APIC
APIC_ID read emulation") introduced an error for the hvm emulation of
x2apic. Any try to write to APIC_ICR MSR will result in a GP fault.
Tim Deegan [Tue, 20 Aug 2013 14:13:20 +0000 (16:13 +0200)]
xen: Add stdbool.h workaround for BSD.
On *BSD, stdbool.h lives in /usr/include, but we don't want to have
that on the search path in case we pick up any headers from the build
host's C libraries.
Copy the equivalent hack already in place for stdarg.h: on all
supported compilers the contents of stdbool.h are trivial, so just
supply the things we need in a xen/stdbool.h header.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Tested-by: Patrick Welche <prlw1@cam.ac.uk>
master commit: 7b9685ca4ed2fd723600ce66eb20a6d0c115b6cb
master date: 2013-08-15 22:00:45 +0100
Jan Beulich [Tue, 20 Aug 2013 14:10:44 +0000 (16:10 +0200)]
VT-d: protect against bogus information coming from BIOS
Add checks similar to those done by Linux: The DRHD address must not
be all zeros or all ones (Linux only checks for zero), and capabilities
as well as extended capabilities must not be all ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Ben Guthro <benjamin.guthro@citrix.com>
Acked by: Yang Zhang <yang.z.zhang@intel.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: e8e8b030ecf916fea19639f0b6a446c1c9dbe174
master date: 2013-08-14 11:18:24 +0200
Patrick Welche [Tue, 20 Aug 2013 14:09:48 +0000 (16:09 +0200)]
libelf: Fix typo in header guard macro
s/__LIBELF_PRIVATE_H_/__LIBELF_PRIVATE_H__/
Signed-off-by: Patrick Welche <prlw1@cam.ac.uk> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 0aec8823501f8ee058c1ba673d2ac3e0f3f2e8db
master date: 2013-08-08 12:47:38 +0100
Tim Deegan [Fri, 16 Aug 2013 10:11:21 +0000 (12:11 +0200)]
x86: explicit suffix in inline assembler (for clang).
This fixes the clang build, and has no effect on gcc's output.
Signed-off-by: Tim Deegan <tim@xen.org> Committed-by: Jan Beulich <jbeulich@suse.com>
master commit: 59a28b5f045331641cbf0c1fc8d5d67afe328939
master date: 2013-02-14 14:20:06 +0100
Note that this isn't just a build fix - if the "delta" input ends up
in memory, gas would default to 32-bit operand size (and should really
warn about the ambiguity).
Andrew Cooper [Thu, 8 Aug 2013 08:41:20 +0000 (10:41 +0200)]
x86/time: Update wallclock in shared info when altering domain time offset
domain_set_time_offset() udpates d->time_offset_seconds, but does not correct
the wallclock in the shared info, meaning that it is incorrect until the next
XENPF_settime hypercall from dom0 which resynchronises the wallclock for all
domains.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
master commit: 915a59f25c5eddd86bc2cae6389d0ed2ab87e69e
master date: 2013-07-18 09:16:15 +0200
Andrew Cooper [Thu, 8 Aug 2013 08:40:41 +0000 (10:40 +0200)]
x86/cpuidle: Change logging for unknown APIC IDs
Dom0 uses this hypercall to pass ACPI information to Xen. It is not very
uncommon for more cpus to be listed in the ACPI tables than are present on the
system, particularly on systems with a common BIOS for a 2 and 4 socket server
varients.
As Dom0 does not control the number of entries in the ACPI tables, and is
required to pass everything it finds to Xen, change the logging.
There is now an single unconditional warning for the first unknown ID, and
further warnings if "cpuinfo" is requested by the user on the command line.
Andrew Cooper [Thu, 8 Aug 2013 08:39:40 +0000 (10:39 +0200)]
x86/mm: Ensure useful progress in alloc_l2_table()
While debugging the issue which turned out to be XSA-58, a printk in this loop
showed that it was quite easy to never make useful progress, because of
consistently failing the preemption check.
One single l2 entry is a reasonable amount of work to do, even if an action is
pending, and also assures forwards progress across repeat continuations.
Tweak the continuation criteria to fail on the first iteration of the loop.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
master commit: d3a55d7d9bb518efe08143d050deff9f4ee80ec1
master date: 2013-07-04 10:33:18 +0200
Daniel Kiper [Tue, 7 May 2013 11:51:38 +0000 (13:51 +0200)]
tools/debugger/kdd: Remove dependencies files during make clean
Remove dependencies files during make clean.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 38bdfb9197b93262248ff489eed336d80db52b54)
Daniel Kiper [Fri, 10 May 2013 15:33:54 +0000 (17:33 +0200)]
tools/libfsimage: Fix clean and distclean make targets
If there is a single colon for a given target and the target
is redefined in another place (e.g. in included file) then
make executes only new target and displays following warning:
Makefile:35: warning: overriding commands for target `clean'
tools/libfsimage/common/../../../tools/libfsimage/Rules.mk:25:
warning: ignoring old commands for target `clean'
To cope with that issue define all required targets as double-colon
rules. Additionally, remove some redundant stuff.
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 667d8a84b244d02e9c6a2d02d6a02fc90c2efb4e)
pygrub/GrubConf: fix boot problem for fedora 19 grub.cfg (2nd attempt)
Booting a fedora 19 domU failed because a it could not properly
parse the grub.cfg file. This was cased by
set default="${next_entry}"
This statement actually is within an 'if' statement, so maybe it would
be better to skip code within if/fi blocks...
But this patch seems to work fine.
Signed-off-by: Marcel Mol <marcel@mesa.nl> Acked-by: Ian Campbell <ian.campbell@citix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit d513814db6af2b298b8776d7ffc5fb1261e176f4)
The IOMMU interrupt handling in bottom half must clear the PPR log interrupt
and event log interrupt bits to re-enable the interrupt. This is done by
writing 1 to the memory mapped register to clear the bit. Due to hardware bug,
if the driver tries to clear this bit while the IOMMU hardware also setting
this bit, the conflict will result with the bit being set. If the interrupt
handling code does not make sure to clear this bit, subsequent changes in the
event/PPR logs will no longer generating interrupts, and would result if
buffer overflow. After clearing the bits, the driver must read back
the register to verify.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Adjust to apply on top of heavily modified patch 1. Adjust flow to get away
with a single readl() in each instance of the status register checks.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 9eabb0735400e2b6059dfa3f0b47a426f61f570a
master date: 2013-07-02 08:50:41 +0200
iommu/amd: Fix logic for clearing the IOMMU interrupt bits
The IOMMU interrupt bits in the IOMMU status registers are
"read-only, and write-1-to-clear (RW1C). Therefore, the existing
logic which reads the register, set the bit, and then writing back
the values could accidentally clear certain bits if it has been set.
The correct logic would just be writing only the value which only
set the interrupt bits, and leave the rest to zeros.
This patch also, clean up #define masks as Jan has suggested.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
With iommu_interrupt_handler() properly having got switched its readl()
from status to control register, the subsequent writel() needed to be
switched too (and the RW1C comment there was bogus).
Some of the cleanup went too far - undone.
Further, with iommu_interrupt_handler() now actually disabling the
interrupt sources, they also need to get re-enabled by the tasklet once
it finished processing the respective log. This also implies re-running
the tasklet so that log entries added between reading the log and re-
enabling the interrupt will get handled in a timely manner.
Finally, guest write emulation to the status register needs to be done
with the RW1C (and RO for all other bits) semantics in mind too.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 2823a0c7dfc979db316787e1dd42a8845e5825c0
master date: 2013-07-02 08:49:43 +0200
Jan Beulich [Thu, 11 Jul 2013 13:09:07 +0000 (15:09 +0200)]
x86: don't pass negative time to gtime_to_gtsc() (try 2)
This mostly reverts commit eb60be3d ("x86: don't pass negative time to
gtime_to_gtsc()") and instead corrects __update_vcpu_system_time()'s
handling of this_cpu(cpu_time).stime_local_stamp dating back before the
start of a HVM guest (which would otherwise lead to a negative value
getting passed to gtime_to_gtsc(), causing scale_delta() to produce
meaningless output).
Flushing the value to zero was wrong, and printing a message for
something that can validly happen wasn't very useful either.
Andrew Cooper [Thu, 11 Jul 2013 13:08:14 +0000 (15:08 +0200)]
AMD/intremap: Prevent use of per-device vector maps until irq logic is fixed
XSA-36 changed the default vector map mode from global to per-device. This is
because a global vector map does not prevent one PCI device from impersonating
another and launching a DoS on the system.
However, the per-device vector map logic is broken for devices with multiple
MSI-X vectors, which can either result in a failed ASSERT() or misprogramming
of a guests interrupt remapping tables. The core problem is not trivial to
fix.
In an effort to get AMD systems back to a non-regressed state, introduce a new
type of vector map called per-device-global. This uses per-device vector maps
in the IOMMU, but uses a single used_vector map for the core IRQ logic.
This patch is intended to be removed as soon as the per-device logic is fixed
correctly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: f0fe8227624d5c02715ed086867d12cd24f6ff47
master date: 2013-06-27 14:01:18 +0200
x86/HVM: fix initialization of wallclock time for PVHVM on migration
Call update_domain_wallclock_time on hvm_latch_shinfo_size even if
the bitness of the guest has already been set, this fixes the problem
with the wallclock not being set for PVHVM guests on resume from
migration.
Zhenguo Wang [Thu, 11 Jul 2013 13:04:50 +0000 (15:04 +0200)]
x86/HVM: fix x2APIC APIC_ID read emulation
APIC and x2APIC have different format for APIC_ID register. Need
translation.
Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com> Signed-off-by: Xiaowei Yang <xiaowei.yang@huawei.com>
Convert code to use switch(), fixing coding style issue at once, and
use GET_xAPIC_ID() on the value read instead of VLAPIC_ID() (reading
the field again).
In the course of this also properly reject both read and writes on the
non-existing MSR corresponding to APIC_ICR2.
Jan Beulich [Tue, 9 Jul 2013 08:07:06 +0000 (10:07 +0200)]
libxl: suppress device assignment to HVM guest when there is no IOMMU
This in effect copies similar logic from xend: While there's no way to
check whether a device is assigned to a particular guest,
XEN_DOMCTL_test_assign_device at least allows checking whether an
IOMMU is there and whether a device has been assign to _some_
guest.
For the time being, this should be enough to cover for the missing
error checking/recovery in other parts of libxl's device assignment
paths.
There remains a (functionality-, but not security-related) race in
that the iommu should be set up earlier, but this is too risky a
change for this stage of the 4.3 release.
This is a security issue, XSA-61.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
master commit: 826eb17271d3c647516d9944c47b0779afedea25
master date: 2013-07-01 15:20:28 +0100
master commit: 826eb17271d3c647516d9944c47b0779afedea25
master date: 2013-07-01 15:20:28 +0100
xsm/flask: Fix XSM support for HVMOP_track_dirty_vram
The XSM check for HVMOP_track_dirty_vram is done with a call to xsm_hvm_param,
therefore the switch handling that case should be located in flask_hvm_param
and not in flask_hvmcontext.
Ian Jackson [Thu, 27 Jun 2013 16:27:04 +0000 (17:27 +0100)]
libxl: Restrict permissions on PV console device xenstore nodes
Matthew Daley has observed that the PV console protocol places sensitive host
state into a guest writeable xenstore locations, this includes:
- The pty used to communicate between the console backend daemon and its
client, allowing the guest administrator to read and write arbitrary host
files.
- The output file, allowing the guest administrator to write arbitrary host
files or to target arbitrary qemu chardevs which include sockets, udp, ptr,
pipes etc (see -chardev in qemu(1) for a more complete list).
- The maximum buffer size, allowing the guest administrator to consume more
resources than the host administrator has configured.
- The backend to use (qemu vs xenconsoled), potentially allowing the guest
administrator to confuse host software.
So we arrange to make the sensitive keys in the xenstore frontend directory
read only for the guest. This is safe since the xenstore permissions model,
unlike POSIX directory permissions, does not allow the guest to remove and
recreate a node if it has write access to the containing directory.
There are a few associated wrinkles:
- The primary PV console is "special". It's xenstore node is not under the
usual /devices/ subtree and it does not use the customary xenstore state
machine protocol. Unfortunately its directory is used for other things,
including the vnc-port node, which we do not want the guest to be able to
write to. Rather than trying to track down all the possible secondary uses
of this directory just make it r/o to the guest. All newly created
subdirectories inherit these permissions and so are now safe by default.
- The other serial consoles do use the customary xenstore state machine and
therefore need write access to at least the "protocol" and "state" nodes,
however they may also want to use arbitrary "feature-foo" nodes (although
I'm not aware of any) and therefore we cannot simply lock down the entire
frontend directory. Instead we add support to libxl__device_generic_add for
frontend keys which are explicitly read only and use that to lock down the
sensitive keys.
- Minios' console frontend wants to write the "type" node, which it has no
business doing since this is a host/toolstack level decision. This fails
now that the node has become read only to the PV guest. Since the toolstack
already writes this node just remove the attempt to set it.
This is CVE-2013-2211 / XSA-57
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Conflicts (4.2 backport):
tools/libxl/libxl.c (no vtpm, free front_ro on error in
libxl__device_console_add)
In the original patch 7 of the series addressing XSA-45 I mistakenly
took the addition of the call to get_page_light() in alloc_page_type()
to cover two decrements that would happen: One for the PGT_partial bit
that is getting set along with the call, and the other for the page
reference the caller hold (and would be dropping on its error path).
But of course the additional page reference is tied to the PGT_partial
bit, and hence any caller of a function that may leave
->arch.old_guest_table non-NULL for error cleanup purposes has to make
sure a respective page reference gets retained.
Similar issues were then also spotted elsewhere: In effect all callers
of get_page_type_preemptible() need to deal with errors in similar
ways. To make sure error handling can work this way without leaking
page references, a respective assertion gets added to that function.
This is CVE-2013-1432 / XSA-58.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 9b167bd2f394f821ae3252d74a15704a4bf91f6d
master date: 2013-06-26 15:32:58 +0200
When using a vtsc, hvm_set_guest_time changes hvm_vcpu.stime_offset,
which is used in the vcpu time structure to calculate the
tsc_timestamp, so after updating stime_offset we need to propagate the
change to vcpu_time in order for the guest to get the right time if
using the PV clock.
When booting Xen on VMware ESX 5.1 and Workstation 9, you hit a GPF
during MCE initialization. The culprit is line 631 in
set_poll_bankmask():
bitmap_copy(mb->bank_map, mca_allbanks->bank_map, nr_mce_banks);
What is happening is that in mca_cap_init(), nr_mce_banks is being set
to 0. This causes the allocation of bank_map to be set to
ZERO_BLOCK_PTR which is the return value for zero-size allocation by
xzalloc_array()/_xmalloc(). This results in the bitmap_copy() to fail
disastrously. The following patch fixes this issue.
Signed-off-by: Aravindh Puthiyaparambil <aravindp@cisco.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christoph Egger <chegger@amazon.de>
master commit: 5cffb77c4072fa5b46700a2dbb3e46c5a54eba6d
master date: 2013-06-03 15:42:46 +0200
Jan Beulich [Mon, 17 Jun 2013 09:09:09 +0000 (11:09 +0200)]
x86: fix ordering of operations in destroy_irq()
The fix for XSA-36, switching the default of vector map management to
be per-device, exposed more readily a problem with the cleanup of these
vector maps: dynamic_irq_cleanup() clearing desc->arch.used_vectors
keeps the subsequently invoked clear_irq_vector() from clearing the
bits for both the in-use and a possibly still outstanding old vector.
Fix this by folding dynamic_irq_cleanup() into destroy_irq(), which was
its only caller, deferring the clearing of the vector map pointer until
after clear_irq_vector().
Once at it, also defer resetting of desc->handler until after the loop
around smp_mb() checking for IRQ_INPROGRESS to be clear, fixing a
(mostly theoretical) issue with the intercation with do_IRQ(): If we
don't defer the pointer reset, do_IRQ() could, for non-guest IRQs, call
->ack() and ->end() with different ->handler pointers, potentially
leading to an IRQ remaining un-acked. The issue is mostly theoretical
because non-guest IRQs are subject to destroy_irq() only on (boot time)
error paths.
As to the changed locking: Invoking clear_irq_vector() with desc->lock
held is okay because vector_lock already nests inside desc->lock (proven
by set_desc_affinity(), which takes vector_lock and gets called from
various desc->handler->ack implementations, getting invoked with
desc->lock held).
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: 43427e65ccfea0c6dc0232f358287e6cc616b507
master date: 2013-05-31 08:35:22 +0200
Ian Jackson [Fri, 14 Jun 2013 15:45:41 +0000 (16:45 +0100)]
libxc: range checks in xc_dom_p2m_host and _guest
These functions take guest pfns and look them up in the p2m. They did
no range checking.
However, some callers, notably xc_dom_boot.c:setup_hypercall_page want
to pass untrusted guest-supplied value(s). It is most convenient to
detect this here and return INVALID_MFN.
This is part of the fix to a security issue, XSA-55.
Changes from Xen 4.2 version of this patch:
* 4.2 lacks dom->rambase_pfn, so don't add/subtract/check it.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:41 +0000 (16:45 +0100)]
libxc: check return values from malloc
A sufficiently malformed input to libxc (such as a malformed input ELF
or other guest-controlled data) might cause one of libxc's malloc() to
fail. In this case we need to make sure we don't dereference or do
pointer arithmetic on the result.
Search for all occurrences of \b(m|c|re)alloc in libxc, and all
functions which call them, and add appropriate error checking where
missing.
This includes the functions xc_dom_malloc*, which now print a message
when they fail so that callers don't have to do so.
The function xc_cpuid_to_str wasn't provided with a sane return value
and has a pretty strange API, which now becomes a little stranger.
There are no in-tree callers.
Changes in the Xen 4.2 version of this series:
* No need to fix code relating to ARM.
* No need to fix code relating to superpage support.
* Additionally fix `dom->p2m_host = xc_dom_malloc...' in xc_dom_ia64.c.
Changes in the Xen 4.1 version of this series:
* An additional check is needed in xc_flask.c:xc_flask_access.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:40 +0000 (16:45 +0100)]
libxc: check failure of xc_dom_*_to_ptr, xc_map_foreign_range
The return values from xc_dom_*_to_ptr and xc_map_foreign_range are
sometimes dereferenced, or subjected to pointer arithmetic, without
checking whether the relevant function failed and returned NULL.
Add an appropriate error check at every call site.
Changes in the 4.2 backport of this series:
* Fix tools/libxc/xc_dom_x86.c:setup_pgtables_x86_32.
* Fix tools/libxc/xc_dom_ia64.c:start_info_ia64.
* Fix tools/libxc/ia64/xc_ia64_dom_fwloader.c:xc_dom_load_fw_kernel.
Conflicts in the 4.1 backport of this series:
* xc_dom_load_elf_kernel has less error handling in 4.1.
* the VM generation ID code is not in 4.1.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:40 +0000 (16:45 +0100)]
libxc: Add range checking to xc_dom_binloader
This is a simple binary image loader with its own metadata format.
However, it is too careless with image-supplied values.
Add the following checks:
* That the image is bigger than the metadata table; otherwise the
pointer arithmetic to calculate the metadata table location may
yield undefined and dangerous values.
* When clamping the end of the region to search, that we do not
calculate pointers beyond the end of the image. The C
specification does not permit this and compilers are becoming ever
more determined to miscompile code when they can "prove" various
falsehoods based on assertions from the C spec.
* That the supplied image is big enough for the text we are allegedly
copying from it. Otherwise we might have a read overrun and copy
the results (perhaps a lot of secret data) into the guest.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:40 +0000 (16:45 +0100)]
libelf: check loops for running away
Ensure that libelf does not have any loops which can run away
indefinitely even if the input is bogus. (Grepped for \bfor, \bwhile
and \bgoto in libelf and xc_dom_*loader*.c.)
Changes needed:
* elf_note_next uses the note's unchecked alleged length, which might
wrap round. If it does, return ELF_MAX_PTRVAL (0xfff..fff) instead,
which will be beyond the end of the section and so terminate the
caller's loop.
* In various loops over section and program headers, check that the
calculated header pointer is still within the image, and quit the
loop if it isn't.
We have not changed loops which might, in principle, iterate over the
whole image - even if they might do so one byte at a time with a
nontrivial access check function in the middle.
This is part of the fix to a security issue, XSA-55.
Conflicts in Xen 4.1 version of the series:
* Trivial conflict due to elf_note_numeric_array not existing.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:40 +0000 (16:45 +0100)]
libxc: Introduce xc_bitops.h
Copy the one file tools/libxc/xc_bitops.h from xen.git#aa1355f9.
We will need this for the next patch, which calls for a bitmap in
libxc.
xc_bitops.h was introduced to unify various existing sets of bitmap
operations. In this patch we backport only the introduction, not the
replacement of the other instances. So we introduce another instance
Sorry :-/.
This is part of the fix to a security issue, XSA-55.
This patch is unique to the Xen 4.1 version of the XSA-55 series.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:39 +0000 (16:45 +0100)]
libelf: use only unsigned integers
Signed integers have undesirable undefined behaviours on overflow.
Malicious compilers can turn apparently-correct code into code with
security vulnerabilities etc.
So use only unsigned integers. Exceptions are booleans (which we have
already changed) and error codes.
We _do_ change all the chars which aren't fixed constants from our own
text segment, but not the char*s. This is because it is safe to
access an arbitrary byte through a char*, but not necessarily safe to
convert an arbitrary value to a char.
As a consequence we need to compile libelf with -Wno-pointer-sign.
It is OK to change all the signed integers to unsigned because all the
inequalities in libelf are in contexts where we don't "expect"
negative numbers.
In libelf-dominfo.c:elf_xen_parse we rename a variable "rc" to
"more_notes" as it actually contains a note count derived from the
input image. The "error" return value from elf_xen_parse_notes is
changed from -1 to ~0U.
grepping shows only one occurrence of "PRId" or "%d" or "%ld" in
libelf and xc_dom_elfloader.c (a "%d" which becomes "%u").
This is part of the fix to a security issue, XSA-55.
Conflicts in 4.1 series:
* xc_dom_load_elf_kernel has no rc variable to change.
* elf_load_image doesn't exist.
For those concerned about unintentional functional changes, the
following rune produces a version of the patch which is much smaller
and eliminates only non-functional changes:
Ian Jackson [Fri, 14 Jun 2013 15:45:39 +0000 (16:45 +0100)]
libelf: use C99 bool for booleans
We want to remove uses of "int" because signed integers have
undesirable undefined behaviours on overflow. Malicious compilers can
turn apparently-correct code into code with security vulnerabilities
etc.
In this patch we change all the booleans in libelf to C99 bool,
from <stdbool.h>.
For the one visible libelf boolean in libxc's public interface we
retain the use of int to avoid changing the ABI; libxc converts it to
a bool for consumption by libelf.
It is OK to change all values only ever used as booleans to _Bool
(bool) because conversion from any scalar type to a _Bool works the
same as the boolean test in if() or ?: and is always defined (C99
6.3.1.2). But we do need to check that all these variables really are
only ever used that way. (It is theoretically possible that the old
code truncated some 64-bit values to 32-bit ints which might become
zero depending on the value, which would mean a behavioural change in
this patch, but it seems implausible that treating 0x????????00000000
as false could have been intended.)
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:39 +0000 (16:45 +0100)]
libelf: Make all callers call elf_check_broken
This arranges that if the new pointer reference error checking
tripped, we actually get a message about it. In this patch these
messages do not change the actual return values from the various
functions: so pointer reference errors do not prevent loading. This
is for fear that some existing kernels might cause the code to make
these wild references, which would then break, which is not a good
thing in a security patch.
In xen/arch/x86/domain_build.c we have to introduce an "out" label and
change all of the "return rc" beyond the relevant point into "goto
out".
This is part of the fix to a security issue, XSA-55.
Differences in 4.1 backport:
* No xen/arch/arm.
* There was less error handling in xen/arch/x86/domain_build.c
so less need to change it.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:38 +0000 (16:45 +0100)]
libelf: Check pointer references in elf_is_elfbinary
elf_is_elfbinary didn't take a length parameter and could potentially
access out of range when provided with a very short image.
We only need to check the size is enough for the actual dereference in
elf_is_elfbinary; callers are just using it to check the magic number
and do their own checks (usually via the new elf_ptrval system) before
dereferencing other parts of the header.
This is part of the fix to a security issue, XSA-55.
Conflicts in 4.1 backport:
* xen/arch/x86/bzimage.c in 4.1 doesn't use elf_is_elfbinary.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:38 +0000 (16:45 +0100)]
libelf: check all pointer accesses
We change the ELF_PTRVAL and ELF_HANDLE types and associated macros:
* PTRVAL becomes a uintptr_t, for which we provide a typedef
elf_ptrval. This means no arithmetic done on it can overflow so
the compiler cannot do any malicious invalid pointer arithmetic
"optimisations". It also means that any places where we
dereference one of these pointers without using the appropriate
macros or functions become a compilation error.
So we can be sure that we won't miss any memory accesses.
All the PTRVAL variables were previously void* or char*, so
the actual address calculations are unchanged.
* ELF_HANDLE becomes a union, one half of which keeps the pointer
value and the other half of which is just there to record the
type.
The new type is not a pointer type so there can be no address
calculations on it whose meaning would change. Every assignment or
access has to go through one of our macros.
* The distinction between const and non-const pointers and char*s
and void*s in libelf goes away. This was not important (and
anyway libelf tended to cast away const in various places).
* The fields elf->image and elf->dest are renamed. That proves
that we haven't missed any unchecked uses of these actual
pointer values.
* The caller may fill in elf->caller_xdest_base and _size to
specify another range of memory which is safe for libelf to
access, besides the input and output images.
* When accesses fail due to being out of range, we mark the elf
"broken". This will be checked and used for diagnostics in
a following patch.
We do not check for write accesses to the input image. This is
because libelf actually does this in a number of places. So we
simply permit that.
* Each caller of libelf which used to set dest now sets
dest_base and dest_size.
* In xc_dom_load_elf_symtab we provide a new actual-pointer
value hdr_ptr which we get from mapping the guest's kernel
area and use (checking carefully) as the caller_xdest area.
* The STAR(h) macro in libelf-dominfo.c now uses elf_access_unsigned.
* elf-init uses the new elf_uval_3264 accessor to access the 32-bit
fields, rather than an unchecked field access (ie, unchecked
pointer access).
* elf_uval has been reworked to use elf_uval_3264. Both of these
macros are essentially new in this patch (although they are derived
from the old elf_uval) and need careful review.
* ELF_ADVANCE_DEST is now safe in the sense that you can use it to
chop parts off the front of the dest area but if you chop more than
is available, the dest area is simply set to be empty, preventing
future accesses.
* We introduce some #defines for memcpy, memset, memmove and strcpy:
- We provide elf_memcpy_safe and elf_memset_safe which take
PTRVALs and do checking on the supplied pointers.
- Users inside libelf must all be changed to either
elf_mem*_unchecked (which are just like mem*), or
elf_mem*_safe (which take PTRVALs) and are checked. Any
unchanged call sites become compilation errors.
* We do _not_ at this time fix elf_access_unsigned so that it doesn't
make unaligned accesses. We hope that unaligned accesses are OK on
every supported architecture. But it does check the supplied
pointer for validity.
This is part of the fix to a security issue, XSA-55.
Additional change in 4.1 backport:
* ELF_PRPTRVAL needs to be defined oddly on 4.1 and earlier because
Xen's headers provide no definitions of uintptr_t or PRIuPTR.
Conflicts:
* Callers of elf_load_binary don't check its return value in 4.1.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:38 +0000 (16:45 +0100)]
libelf: check nul-terminated strings properly
It is not safe to simply take pointers into the ELF and use them as C
pointers. They might not be properly nul-terminated (and the pointers
might be wild).
So we are going to introduce a new function elf_strval for safely
getting strings. This will check that the addresses are in range and
that there is a proper nul-terminated string. Of course it might
discover that there isn't. In that case, it will be made to fail.
This means that elf_note_name might fail, too.
For the benefit of call sites which are just going to pass the value
to a printf-like function, we provide elf_strfmt which returns
"(invalid)" on failure rather than NULL.
In this patch we introduce dummy definitions of these functions. We
introduce calls to elf_strval and elf_strfmt everywhere, and update
all the call sites with appropriate error checking.
There is not yet any semantic change, since before this patch all the
places where we introduce elf_strval dereferenced the value anyway, so
it mustn't have been NULL.
In future patches, when elf_strval is made able return NULL, when it
does so it will mark the elf "broken" so that an appropriate
diagnostic can be printed.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:37 +0000 (16:45 +0100)]
libelf: introduce macros for memory access and pointer handling
We introduce a collection of macros which abstract away all the
pointer arithmetic and dereferences used for accessing the input ELF
and the output area(s). We use the new macros everywhere.
For now, these macros are semantically identical to the code they
replace, so this patch has no functional change.
elf_is_elfbinary is an exception: since it doesn't take an elf*, we
need to handle it differently. In a future patch we will change it to
take, and check, a length parameter. For now we just mark it with a
fixme.
Nontrivial differences in the 4.1 backport:
* We need to provide our own elf_uintptr_t since Xen doesn't.
* We see some additional differences in our verification diff.
* The "function-filter" needs to massage additional symbol names.
Conflicts:
* In xc_dom_load_elf_symtab the old code used
*(Elf64_Word*)(&shdr->e64.sh_name) and the new Elf32_Word
but in fact the type in the struct has changed too so the
new code using elf_store_field is still correct.
* loadelfimage, elf_load_image etc. don't exist and are done
directly with memcpy/memset; patch adjusted appropriately.
* elf_note_numeric_array doesn't exist in 4.1.
That this patch has no functional change can be verified as follows:
0. Copy the scripts "comparison-generate" and "function-filter"
out of this commit message.
1. Check out the tree before this patch.
2. Run the script ../comparison-generate .... ../before
3. Check out the tree after this patch.
4. Run the script ../comparison-generate .... ../after
5. diff --exclude=\*.[soi] -ruN before/ after/ |less
Expect these differences:
* stubdom/zlib-x86_64/ztest*.s2
The filename of this test file apparently contains the pid.
* stubdom/grub/kexec.s2:
Large differences following ".section .debug_info" (which
the 4.1 build system erroneously fails to suppress).
* tools/libxc/xc_domain_restore.s2 (64-bit build):
One trivial code gen difference with no semantic import.
* xen/common/version.s2
The xen build timestamp appears in two diff hunks.
Verification that this is all that's needed:
In a completely built xen.git,
find * -name .*.d -type f | xargs grep -l libelf\.h
Expect results in:
xen/arch/x86: Checked above.
tools/libxc: Checked above.
tools/xcutils/readnotes: Checked above.
tools/xenstore: Checked above.
xen/common/libelf:
This is the build for the hypervisor; checked in B above.
stubdom:
We have one stubdom which reads ELFs using our libelf,
pvgrub, which is checked above.
I have not done this verification for ARM.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-8<- comparison-generate -8<-
#!/bin/bash
# usage:
# cd xen.git
# .../comparison-generate OUR-CONFIG BUILD-RUNE-PREFIX ../before|../after
# eg:
# .../comparison-generate ~/work/.config 'schroot -pc64 --' ../before
set -ex
test $# = 3 || need-exactly-three-arguments
our_config=$1
build_rune_prefix=$2
result_dir=$3
git clean -x -d -f
cp "$our_config" .
cat <<END >>.config
debug_symbols=n
CFLAGS += -save-temps
END
if [ -f ./configure ]; then
$build_rune_prefix ./configure
fi
$build_rune_prefix make -C xen
$build_rune_prefix make -C tools/include
$build_rune_prefix make -C stubdom grub
$build_rune_prefix make -C tools/libxc
$build_rune_prefix make -C tools/xenstore
$build_rune_prefix make -C tools/xcutils
rm -rf "$result_dir"
mkdir "$result_dir"
set +x
for f in `find xen tools stubdom -name \*.[soi]`; do
mkdir -p "$result_dir"/`dirname $f`
cp $f "$result_dir"/${f}
case $f in
*.s)
../function-filter <$f >"$result_dir"/${f}2
;;
esac
done
echo ok.
-8<-
-8<- function-filter -8<-
#!/usr/bin/perl -w
# function-filter
# script for massaging gcc-generated labels to be consistent
use strict;
our @lines;
my $sedderybody = "sub seddery () {\n";
while (<>) {
push @lines, $_;
if (m/^(__FUNCTION__|__func__|_ctx|note_desc|types|last_order|memflags|mutex|d\d_cpu_last|write_count|wall_last|__PRETTY_FUNCTION__)\.(\d+)\:/ ||
m/^\s+\.local\s+(_ctx|write_count|d\d_cpu_last|wall_last|mutex)\.(\d+)\s*$/) {
$sedderybody .= " s/\\b$1\\.$2\\b/__XSA55MANGLED__$1.$./g;\n";
}
}
$sedderybody .= "}\n1;\n";
eval $sedderybody or die $@;
foreach (@lines) {
seddery();
print or die $!;
}
-8<-
Ian Jackson [Fri, 14 Jun 2013 15:45:37 +0000 (16:45 +0100)]
libelf/xc_dom_load_elf_symtab: Do not use "syms" uninitialised
xc_dom_load_elf_symtab (with load==0) calls elf_round_up, but it
mistakenly used the uninitialised variable "syms" when calculating
dom->bsd_symtab_start. This should be a reference to "elf".
This change might have the effect of rounding the value differently.
Previously if the uninitialised value (a single byte on the stack) was
ELFCLASS64 (ie, 2), the alignment would be to 8 bytes, otherwise to 4.
However, the value is calculated from dom->kernel_seg.vend so this
could only make a difference if that value wasn't already aligned to 8
bytes.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:36 +0000 (16:45 +0100)]
libxc: Fix range checking in xc_dom_pfn_to_ptr etc.
* Ensure that xc_dom_pfn_to_ptr (when called with count==0) does not
return a previously-allocated block which is entirely before the
requested pfn (!)
* Provide a version of xc_dom_pfn_to_ptr, xc_dom_pfn_to_ptr_retcount,
which provides the length of the mapped region via an out parameter.
* Change xc_dom_vaddr_to_ptr to always provide the length of the
mapped region and change the call site in xc_dom_binloader.c to
check it. The call site in xc_dom_load_elf_symtab will be corrected
in a forthcoming patch, and for now ignores the returned length.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
v5: This patch is new in v5 of the series.
Ian Jackson [Fri, 14 Jun 2013 15:45:36 +0000 (16:45 +0100)]
libxc: introduce xc_dom_seg_to_ptr_pages
Provide a version of xc_dom_seg_to_ptr which returns the number of
guest pages it has actually mapped. This is useful for callers who
want to do range checking; we will use this later in this series.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Fri, 14 Jun 2013 15:45:36 +0000 (16:45 +0100)]
libelf: abolish libelf-relocate.c
This file is not actually used. It's not built in Xen's instance of
libelf; in libxc's it's built but nothing in it is called. Do not
compile it in libxc, and delete it.
This reduces the amount of work we need to do in forthcoming patches
to libelf (particularly since as libelf-relocate.c is not used it is
probably full of bugs).
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 4 Jun 2013 07:41:41 +0000 (09:41 +0200)]
x86/xsave: properly check guest input to XSETBV
Other than the HVM emulation path, the PV case so far failed to check
that YMM state requires SSE state to be enabled, allowing for a #GP to
occur upon passing the inputs to XSETBV inside the hypervisor.
Jan Beulich [Tue, 4 Jun 2013 07:40:12 +0000 (09:40 +0200)]
x86/xsave: recover from faults on XRSTOR
Just like FXRSTOR, XRSTOR can raise #GP if bad content is being passed
to it in the memory block (i.e. aspects not under the control of the
hypervisor, other than e.g. proper alignment of the block).
Also correct the comment explaining why FXRSTOR needs exception
recovery code to not wrongly state that this can only be a result of
the control tools passing a bad image.
Jan Beulich [Tue, 4 Jun 2013 07:38:49 +0000 (09:38 +0200)]
x86/xsave: fix information leak on AMD CPUs
Just like for FXSAVE/FXRSTOR, XSAVE/XRSTOR also don't save/restore the
last instruction and operand pointers as well as the last opcode if
there's no pending unmasked exception (see CVE-2006-1056 and commit
9747:4d667a139318).
While the FXSR solution sits in the save path, I prefer to have this in
the restore path because there the handling is simpler (namely in the
context of the pending changes to properly save the selector values for
32-bit guest code).
Also this is using FFREE instead of EMMS, as it doesn't seem unlikely
that in the future we may see CPUs with x87 and SSE/AVX but no MMX
support. The goal here anyway is just to avoid an FPU stack overflow.
I would have preferred to use FFREEP instead of FFREE (freeing two
stack slots at once), but AMD doesn't document that instruction.
Petr Matousek [Fri, 31 May 2013 10:28:18 +0000 (12:28 +0200)]
libxc: limit cpu values when setting vcpu affinity
When support for pinning more than 64 cpus was added, check for cpu
out-of-range values was removed. This can lead to subsequent
out-of-bounds cpumap array accesses in case the cpu number is higher
than the actual count.
Jan Beulich [Fri, 31 May 2013 10:27:52 +0000 (12:27 +0200)]
x86: fix boot time APIC mode detection
current_cpu_data becomes valid only relatively late in the boot
process, so looking there for a particular feature early in the game
would generally give the appearance of the feature being unavailable.
Getting this wrong means that at kexec time the system would get
returned to xAPIC mode, causing disconnect_bsp_APIC() to try to access
the APIC page, which on systems with x2APIC pre-enabled will never get
set up.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 234c4dde2fd4f1182fe1a6bea6bced83fe363007
master date: 2013-05-23 13:08:32 +0200
Jan Beulich [Thu, 23 May 2013 13:19:38 +0000 (15:19 +0200)]
tools: fix dependency file generation
There is a small set of places where files in subdirectories get
compiled from the parent directory. Dependency file wise this is no
problem as long as the files use names distinct without regard to the
directories they sit in, and tools/console/ violates this (in having
two main.c files). Hence we need to avoid losing the directory name,
both to ensure the two compiler instances don't simultaneously write
to the same file (happening of which is what triggered me looking
into this) and to guarantee dependencies for all files will be seen
by make on an incremental rebuild.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 4d788e164d6556d931bc3e0a69e36b8cf7280794
master date: 2013-05-21 10:16:30 +0200
Andrew Cooper [Thu, 23 May 2013 13:17:04 +0000 (15:17 +0200)]
AMD/iommu: SR56x0 Erratum 64 - Reset all head & tail pointers
Reference at time of patch:
http://support.amd.com/us/ChipsetMotherboard_TechDocs/46303.pdf
Erratum 64 states that the head and tail pointers for the Command buffer and
Event log are only reset on a cold boot, not a warm boot.
While the erratum is limited to systems using SR56xx chipsets (such as Family
10h CPUs), resetting the pointers is a sensible action in all cases, including
the PPR log for consistency.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 6d243308e1d75f866679db226159c797d6c83aad
master date: 2013-05-22 15:26:52 +0200
Jan Beulich [Thu, 23 May 2013 13:16:44 +0000 (15:16 +0200)]
fix XSA-46 regression with xend/xm
The hypervisor side changes for XSA-46 require the tool stack to now
always map the guest pIRQ before granting access permission to the
underlying host IRQ (GSI). This in particular requires that pciif.py
no longer can skip this step (assuming qemu would do it) for HVM
guests.
This in turn exposes, however, an inconsistency between xend and qemu:
The former wants to always establish 1:1 mappings between pIRQ and host
IRQ (for non-MSI only of course), while the latter always wants to
allocate an arbitrary mapping. Since the whole tool stack obviously
should always agree on the mapping model, make libxc enforce the 1:1
mapping as the more natural one (as well as being the one that allows
for easier debugging, since there no need to find out the extra
mapping). Users of libxc that want to establish a particular (rather
than an allocated) mapping are still free to do so, as well as tool
stacks not based on libxc wanting to implement an allocation based
model (which is why it's not the hypervisor that's being changed to
enforce either model).
Since libxl, like xend, already uses a 1:1 model, it's unaffected by
the libxc change (and it being unaffected by the original hypervisor
side changes is - afaict - simply due to qemu getting spawned at a
later point in time compared to the xend event flow).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andreas Falck <falck.andreas.lists@gmail.com> (on 4.1) Tested-by: Gordan Bobic <gordan@bobich.net> (on 4.2) Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 934a5253d932b6f67fe40fc48975a2b0117e4cce
master date: 2013-05-21 11:32:34 +0200
Jan Beulich [Thu, 23 May 2013 13:14:29 +0000 (15:14 +0200)]
x86/IO-APIC: fix guest RTE write corner cases
This fixes two regressions from c/s 20143:a7de5bd776ca ("x86: Make the
hypercall PHYSDEVOP_alloc_irq_vector hypercall dummy"):
For one, IRQs that had their vector set up by Xen internally without a
handler ever having got set (e.g. via "com<n>=..." without a matching
consumer option like "console=com<n>") would wrongly call
add_pin_to_irq() here, triggering the BUG_ON() in that function.
Second, when assign_irq_vector() fails this addition to irq_2_pin[]
needs to be undone.
In the context of this I'm also surprised that the irq_2_pin[]
manipulations here occur without any lock, i.e. rely on Dom0 to do
some sort of serialization.
Jan Beulich [Fri, 3 May 2013 07:46:54 +0000 (09:46 +0200)]
x86: cleanup after making various page table manipulation operations preemptible
This drops the "preemptible" parameters from various functions where
now they can't (or shouldn't, validated by assertions) be run in non-
preemptible mode anymore, to prove that manipulations of at least L3
and L4 page tables and page table entries are now always preemptible,
i.e. the earlier patches actually fulfill their purpose of fixing the
resulting security issue.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
master commit: b965b31a6bce8c37e67e525fae6da0e2f26d6b2e
master date: 2013-05-02 17:04:14 +0200
Jan Beulich [Thu, 2 May 2013 15:27:13 +0000 (17:27 +0200)]
VT-d: don't permit SVT_NO_VERIFY entries for known device types
Only in cases where we don't know what to do we should leave the IRTE
blank (suppressing all validation), but we should always log a warning
in those cases (as being insecure).
Jan Beulich [Thu, 2 May 2013 15:26:43 +0000 (17:26 +0200)]
x86: make page table handling error paths preemptible
... as they may take significant amounts of time.
This requires cloning the tweaked continuation logic from
do_mmuext_op() to do_mmu_update().
Note that in mod_l[34]_entry() a negative "preemptible" value gets
passed to put_page_from_l[34]e() now, telling the callee to store the
respective page in current->arch.old_guest_table (for a hypercall
continuation to pick up), rather than carrying out the put right away.
This is going to be made a little more explicit by a subsequent cleanup
patch.
This is part of CVE-2013-1918 / XSA-45.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
master commit: b8efae696c9a2d46e91fa0eda739427efc16c250
master date: 2013-05-02 16:39:37 +0200
Jan Beulich [Thu, 2 May 2013 15:26:13 +0000 (17:26 +0200)]
x86: make page table unpinning preemptible
... as it may take significant amounts of time.
Since we can't re-invoke the operation in a second attempt, the
continuation logic must be slightly tweaked so that we make sure
do_mmuext_op() gets run one more time even when the preempted unpin
operation was the last one in a batch.
This is part of CVE-2013-1918 / XSA-45.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
master commit: a3e049f8e86fe18e3b87f18dc0c7be665026fd9f
master date: 2013-05-02 16:39:06 +0200