Andrew Cooper [Wed, 7 Aug 2013 14:53:32 +0000 (16:53 +0200)]
x86/time: Update wallclock in shared info when altering domain time offset
domain_set_time_offset() udpates d->time_offset_seconds, but does not correct
the wallclock in the shared info, meaning that it is incorrect until the next
XENPF_settime hypercall from dom0 which resynchronises the wallclock for all
domains.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
master commit: 915a59f25c5eddd86bc2cae6389d0ed2ab87e69e
master date: 2013-07-18 09:16:15 +0200
Jan Beulich [Wed, 7 Aug 2013 14:52:34 +0000 (16:52 +0200)]
x86: don't use destroy_xen_mappings() for vunmap()
Its attempt to tear down intermediate page table levels may race with
map_pages_to_xen() establishing them, and now that
map_domain_page_global() is backed by vmap() this teardown is also
wasteful (as it's very likely to need the same address space populated
again within foreseeable time).
Andrew Cooper [Wed, 7 Aug 2013 14:51:56 +0000 (16:51 +0200)]
x86/cpuidle: Change logging for unknown APIC IDs
Dom0 uses this hypercall to pass ACPI information to Xen. It is not very
uncommon for more cpus to be listed in the ACPI tables than are present on the
system, particularly on systems with a common BIOS for a 2 and 4 socket server
varients.
As Dom0 does not control the number of entries in the ACPI tables, and is
required to pass everything it finds to Xen, change the logging.
There is now an single unconditional warning for the first unknown ID, and
further warnings if "cpuinfo" is requested by the user on the command line.
Jan Beulich [Wed, 7 Aug 2013 14:49:39 +0000 (16:49 +0200)]
adjust x86 EFI build
While the rule to generate .init.o files from .o ones already correctly
included $(extra-y), the setting of the necessary compiler flag didn't
have the same. With some yet to be posted patch this resulted in build
breakage because of the compiler deciding not to inline a few functions
(which then results in .text not being empty as required for these
object files).
Andrew Cooper [Wed, 7 Aug 2013 14:48:56 +0000 (16:48 +0200)]
x86/mm: Ensure useful progress in alloc_l2_table()
While debugging the issue which turned out to be XSA-58, a printk in this loop
showed that it was quite easy to never make useful progress, because of
consistently failing the preemption check.
One single l2 entry is a reasonable amount of work to do, even if an action is
pending, and also assures forwards progress across repeat continuations.
Tweak the continuation criteria to fail on the first iteration of the loop.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
master commit: d3a55d7d9bb518efe08143d050deff9f4ee80ec1
master date: 2013-07-04 10:33:18 +0200
The IOMMU interrupt handling in bottom half must clear the PPR log interrupt
and event log interrupt bits to re-enable the interrupt. This is done by
writing 1 to the memory mapped register to clear the bit. Due to hardware bug,
if the driver tries to clear this bit while the IOMMU hardware also setting
this bit, the conflict will result with the bit being set. If the interrupt
handling code does not make sure to clear this bit, subsequent changes in the
event/PPR logs will no longer generating interrupts, and would result if
buffer overflow. After clearing the bits, the driver must read back
the register to verify.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Adjust to apply on top of heavily modified patch 1. Adjust flow to get away
with a single readl() in each instance of the status register checks.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 9eabb0735400e2b6059dfa3f0b47a426f61f570a
master date: 2013-07-02 08:50:41 +0200
iommu/amd: Fix logic for clearing the IOMMU interrupt bits
The IOMMU interrupt bits in the IOMMU status registers are
"read-only, and write-1-to-clear (RW1C). Therefore, the existing
logic which reads the register, set the bit, and then writing back
the values could accidentally clear certain bits if it has been set.
The correct logic would just be writing only the value which only
set the interrupt bits, and leave the rest to zeros.
This patch also, clean up #define masks as Jan has suggested.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
With iommu_interrupt_handler() properly having got switched its readl()
from status to control register, the subsequent writel() needed to be
switched too (and the RW1C comment there was bogus).
Some of the cleanup went too far - undone.
Further, with iommu_interrupt_handler() now actually disabling the
interrupt sources, they also need to get re-enabled by the tasklet once
it finished processing the respective log. This also implies re-running
the tasklet so that log entries added between reading the log and re-
enabling the interrupt will get handled in a timely manner.
Finally, guest write emulation to the status register needs to be done
with the RW1C (and RO for all other bits) semantics in mind too.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 2823a0c7dfc979db316787e1dd42a8845e5825c0
master date: 2013-07-02 08:49:43 +0200
Jan Beulich [Mon, 15 Jul 2013 11:07:08 +0000 (13:07 +0200)]
x86: don't pass negative time to gtime_to_gtsc() (try 2)
This mostly reverts commit eb60be3d ("x86: don't pass negative time to
gtime_to_gtsc()") and instead corrects __update_vcpu_system_time()'s
handling of this_cpu(cpu_time).stime_local_stamp dating back before the
start of a HVM guest (which would otherwise lead to a negative value
getting passed to gtime_to_gtsc(), causing scale_delta() to produce
meaningless output).
Flushing the value to zero was wrong, and printing a message for
something that can validly happen wasn't very useful either.
Andrew Cooper [Tue, 2 Jul 2013 20:02:33 +0000 (21:02 +0100)]
docs: Pull Xen version from canonical location
rather than hard coding it and being wrong every time we branch for a release.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit f487767ad0e58acb6c1ed3cc56daa0fb71b1f23a)
Ian Jackson [Mon, 1 Jul 2013 14:20:28 +0000 (15:20 +0100)]
libxl: suppress device assignment to HVM guest when there is no IOMMU
This in effect copies similar logic from xend: While there's no way to
check whether a device is assigned to a particular guest,
XEN_DOMCTL_test_assign_device at least allows checking whether an
IOMMU is there and whether a device has been assign to _some_
guest.
For the time being, this should be enough to cover for the missing
error checking/recovery in other parts of libxl's device assignment
paths.
There remains a (functionality-, but not security-related) race in
that the iommu should be set up earlier, but this is too risky a
change for this stage of the 4.3 release.
This is a security issue, XSA-61.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Thu, 27 Jun 2013 17:13:30 +0000 (18:13 +0100)]
xen/arm: Rework the way to compute dom0 DTB base address
If the DTB is loading right after the kernel, on some setup, Linux will
overwrite the DTB during the decompression step.
To be sure the DTB won't be overwritten by the decompression stage, load
the DTB near the end of the first memory bank and below 4Gib (if memory range is
greater).
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Fri, 28 Jun 2013 11:25:57 +0000 (12:25 +0100)]
xen/arm: gic_shutdown_irq must only disable the right IRQ
When GICD_ICENABLERn is read, all the 1s bit represent enabled IRQs.
Currently gic_shutdown_irq:
- read GICD_ICENABLER
- set the corresping bit to 1
- write back the new value
That means, Xen will disable more IRQs than necessary.
Dongxiao Xu [Thu, 27 Jun 2013 15:01:26 +0000 (17:01 +0200)]
nested vmx: Fix the booting of L2 PAE guest
When doing virtual VM entry and virtual VM exit, we need to
sychronize the PAE PDPTR related VMCS registers. With this fix,
we can boot 32bit PAE L2 guest (Win7 & RHEL6.4) on "Xen on Xen"
environment.
Andrew Cooper [Thu, 27 Jun 2013 12:01:18 +0000 (14:01 +0200)]
AMD/intremap: Prevent use of per-device vector maps until irq logic is fixed
XSA-36 changed the default vector map mode from global to per-device. This is
because a global vector map does not prevent one PCI device from impersonating
another and launching a DoS on the system.
However, the per-device vector map logic is broken for devices with multiple
MSI-X vectors, which can either result in a failed ASSERT() or misprogramming
of a guests interrupt remapping tables. The core problem is not trivial to
fix.
In an effort to get AMD systems back to a non-regressed state, introduce a new
type of vector map called per-device-global. This uses per-device vector maps
in the IOMMU, but uses a single used_vector map for the core IRQ logic.
This patch is intended to be removed as soon as the per-device logic is fixed
correctly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
This grub.cfg from a default fedora 19 Beta install
caused pygrub failures.The previous pygrub commit
fixed taht. So this example file added for reference.
Signed-off-by: Marcel Mol <marcel@mesa.nl> Acked-by: Ian Campbell <ian.campbell@citrix.com>
pygrub/GrubConf: fix boot problem for fedora 19 grub.cfg (2nd attempt)
Booting a fedora 19 domU failed because a it could not properly
parse the grub.cfg file. This was cased by
set default="${next_entry}"
This statement actually is within an 'if' statement, so maybe it would
be better to skip code within if/fi blocks...
But this patch seems to work fine.
Signed-off-by: Marcel Mol <marcel@mesa.nl> Acked-by: Ian Campbell <ian.campbell@citix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Murray [Sat, 22 Jun 2013 12:38:11 +0000 (13:38 +0100)]
Xendomains was not correctly suspending domains when a STOP was issued.
The regex was not selecting the { when parsing JSON output of xl list -l.
It was also not selecting (domain when parsing xl list -l when SXP selected.
Pefixed { with 4 spaces, and removed an extra ( before domain in the regex
string
Added quotes around the grep strings so the spaces inserted into the string
didn't not break the grepping.
This has now been tested against 4.3RC5
Signed-off-by: Ian Murray <murrayie@yahoo.co.uk> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Anthony PERARD [Wed, 26 Jun 2013 15:54:31 +0000 (16:54 +0100)]
libxl: Use QMP cpu-add to hotplug CPU with qemu-xen.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Anthony PERARD [Wed, 26 Jun 2013 15:54:30 +0000 (16:54 +0100)]
libxl: Add "cpu-add" QMP command.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- rename index parameter to avoid Wshadow due to index(3) in strings.h ]
Andrew Cooper [Mon, 24 Jun 2013 15:47:05 +0000 (16:47 +0100)]
tools/libxc: Fix memory leaks in xc_domain_save()
Introduces outbuf_free() to mirror the currently existing outbuf_init().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Wed, 26 Jun 2013 13:23:35 +0000 (14:23 +0100)]
libxc: Fix guest boot on ARM after XSA-55
XSA-55 has exposed errors for guest creation on ARM:
- domain virt_base was not defined;
- xc_dom_alloc_segment allocates pfn from 0 instead of the RAM base address.
Jim Fehlig [Tue, 25 Jun 2013 22:02:15 +0000 (16:02 -0600)]
libxl: Fix assignment of devid value returned from libxl__device_nextid
Commit 5420f265 has some misplaced parenthesis that caused devid
to be assigned 1 or 0 based on checking return value of
libxl__device_nextid < 0, e.g.
devid = libxl__device_nextid(...) < 0
This works when only one instance of a given device type exists, but
subsequent devices of the same type will also have a devid = 1 if
libxl__device_nextid succeeds. Fix by checking the value assigned to
devid, e.g.
(devid = libxl__device_nextid(...)) < 0
Signed-off-by: Jim Fehlig <jfehlig@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
In the original patch 7 of the series addressing XSA-45 I mistakenly
took the addition of the call to get_page_light() in alloc_page_type()
to cover two decrements that would happen: One for the PGT_partial bit
that is getting set along with the call, and the other for the page
reference the caller hold (and would be dropping on its error path).
But of course the additional page reference is tied to the PGT_partial
bit, and hence any caller of a function that may leave
->arch.old_guest_table non-NULL for error cleanup purposes has to make
sure a respective page reference gets retained.
Similar issues were then also spotted elsewhere: In effect all callers
of get_page_type_preemptible() need to deal with errors in similar
ways. To make sure error handling can work this way without leaking
page references, a respective assertion gets added to that function.
This is CVE-2013-1432 / XSA-58.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 25 Jun 2013 13:57:44 +0000 (15:57 +0200)]
VMX/Viridian: suppress MSR-based APIC suggestion when having APIC-V
When the CPU has the necessary capabilities, having Windows use
synthetic MSR reads/writes is bogus, as this still requires emulation
(which is pretty much guaranteed to be slower than having the hardware
carry out the operation).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Ian Jackson [Tue, 25 Jun 2013 10:24:22 +0000 (11:24 +0100)]
libxl: Restrict permissions on PV console device xenstore nodes
Matthew Daley has observed that the PV console protocol places sensitive host
state into a guest writeable xenstore locations, this includes:
- The pty used to communicate between the console backend daemon and its
client, allowing the guest administrator to read and write arbitrary host
files.
- The output file, allowing the guest administrator to write arbitrary host
files or to target arbitrary qemu chardevs which include sockets, udp, ptr,
pipes etc (see -chardev in qemu(1) for a more complete list).
- The maximum buffer size, allowing the guest administrator to consume more
resources than the host administrator has configured.
- The backend to use (qemu vs xenconsoled), potentially allowing the guest
administrator to confuse host software.
So we arrange to make the sensitive keys in the xenstore frontend directory
read only for the guest. This is safe since the xenstore permissions model,
unlike POSIX directory permissions, does not allow the guest to remove and
recreate a node if it has write access to the containing directory.
There are a few associated wrinkles:
- The primary PV console is "special". It's xenstore node is not under the
usual /devices/ subtree and it does not use the customary xenstore state
machine protocol. Unfortunately its directory is used for other things,
including the vnc-port node, which we do not want the guest to be able to
write to. Rather than trying to track down all the possible secondary uses
of this directory just make it r/o to the guest. All newly created
subdirectories inherit these permissions and so are now safe by default.
- The other serial consoles do use the customary xenstore state machine and
therefore need write access to at least the "protocol" and "state" nodes,
however they may also want to use arbitrary "feature-foo" nodes (although
I'm not aware of any) and therefore we cannot simply lock down the entire
frontend directory. Instead we add support to libxl__device_generic_add for
frontend keys which are explicitly read only and use that to lock down the
sensitive keys.
- Minios' console frontend wants to write the "type" node, which it has no
business doing since this is a host/toolstack level decision. This fails
now that the node has become read only to the PV guest. Since the toolstack
already writes this node just remove the attempt to set it.
This is a security issue, XSA-57.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Fri, 21 Jun 2013 16:36:26 +0000 (17:36 +0100)]
tools/libxc: Fix memory leaks in xc_domain_restore()
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> (re 4.3 release) Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Tue, 18 Jun 2013 09:08:01 +0000 (10:08 +0100)]
libxl,hvmloader: Don't relocate memory for MMIO hole
At the moment, qemu-xen can't handle memory being relocated by
hvmloader. This may happen if a device with a large enough memory
region is passed through to the guest. At the moment, if this
happens, then at some point in the future qemu will crash and the
domain will hang. (qemu-traditional is fine.)
It's too late in the release to do a proper fix, so we try to do
damage control.
hvmloader already has mechanisms to relocate memory to 64-bit space if
it can't make a big enough MMIO hole. By default this is 2GiB; if we
just refuse to make the hole bigger if it will overlap with guest
memory, then the relocation will happen by default.
v5:
- Update comment to not refer to "this series".
v4:
- Wrap long line in libxl_dm.c
- Fix comment
v3:
- Fix polarity of comparison
- Move diagnostic messages to another patch
- Tested with xen platform pci device hacked to have different BAR sizes
{256MiB, 1GiB} x {qemu-xen, qemu-traditional} x various memory
configurations
- Add comment explaining why we default to "allow"
- Remove cast to bool
v2:
- style fixes
- fix and expand comment on the MMIO hole loop
- use "%d" rather than "%s" -> (...)?"1":"0"
- use bool instead of uint8_t
- Move 64-bit bar relocate detection to another patch
- Add more diagnostic messages
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org> CC: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 18 Jun 2013 13:56:29 +0000 (14:56 +0100)]
hvmloader: Remove minimum size for BARs to relocate to 64-bit space
Allow devices with BARs less than 512MiB to be relocated to high
memory.
This will only be invoked if there is not enough low MMIO space to map
the device, and will be done preferentially to large devices first; so
in all likelihood only large devices will be remapped anyway.
This is needed to work-around the issue of qemu-xen not being able to
handle moving guest memory around to resize the MMIO hole. The
default MMIO hole size is less than 256MiB.
v3:
- Fixed minor style issue
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 18 Jun 2013 14:32:35 +0000 (15:32 +0100)]
hvmloader: Load large devices into high MMIO space as needed
Keep track of how much mmio space is left total, as well as the amount
of "low" MMIO space (<4GiB), and only load devices into high memory if
there is not enough low memory for the rest of the devices to fit.
Because devices are processed by size in order from large to small,
this should preferentially relocate devices with large BARs to 64-bit
space.
v3:
- Just use mmio_total rather than introducing a new variable.
- Port to using mem_resource directly rather than low_mmio_left
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 18 Jun 2013 14:11:03 +0000 (15:11 +0100)]
hvmloader: Correct bug in low mmio region accounting
When deciding whether to map a device in low MMIO space (<4GiB),
hvmloader compares it with "mmio_left", which is set to the size of
the low MMIO range (pci_mem_end - pci_mem_start). However, even if it
does map a device in high MMIO space, it still removes the size of its
BAR from mmio_left.
In reality we don't need to do a separate accounting of the low memory
available -- this can be calculated from mem_resource. Just get rid
of the variable and the duplicate accounting entirely. This will make
the code more robust.
Note also that the calculation of whether to move a device to 64-bit
is fragile at the moment, depending on some unstated assumptions.
State those assumptions in a comment for future reference.
v5:
- Add comment documenting fragility of the move-to-highmem check
v3:
- Use mem_resource values directly instead of doing duplicate
accounting
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Tue, 18 Jun 2013 11:48:36 +0000 (12:48 +0100)]
hvmloader: Fix check for needing a 64-bit bar
After attempting to resize the MMIO hole, the check to determine
whether there is a need to relocate BARs into 64-bit space checks the
specific thing that caused the loop to exit (MMIO hole == 2GiB) rather
than checking whether the required MMIO will fit in the hole.
But even then it does it wrong: the polarity of the check is
backwards.
Check for the actual condition we care about (the sizeof the MMIO
hole) rather than checking for the loop exit condition.
v3:
- Move earlier in the series, before other functional changes
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
George Dunlap [Wed, 19 Jun 2013 16:30:13 +0000 (17:30 +0100)]
hvmloader: Set up highmem resouce appropriately if there is no RAM above 4G
hvmloader will read hvm_info->high_mem_pgend to calculate where to
start the highmem PCI region. However, if the guest does not have any
memory in the high region, this is set to zero, which will cause
hvmloader to use the "0" for the base of the highmem region, rather
than 1 << 32.
Check to see whether hvm_info->high_mem_pgend is set; if so, do the
normal calculation; otherwise, use 1<<32.
v4:
- Handle case where hfm_info->high_mem_pgend is non-zero but doesn't
point into high memory, throwing a warning.
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Stefano Stabellini <stefano.stabellini@citrix.com> CC: Hanweidong <hanweidong@huawei.com> CC: Keir Fraser <keir@xen.org>
Jan Beulich [Fri, 21 Jun 2013 06:39:43 +0000 (08:39 +0200)]
tpmif: fix identifier prefixes
The definitions here shouldn't use vtpm_ or VPTM_ as their prefixes,
the interface should instead make use of tpmif_ and TPMIF_. This
fixes a build failure after syncing the public headers to
linux-2.6.18-xen.hg (where a struct vtpm_state already exists).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Mon, 17 Jun 2013 08:30:57 +0000 (10:30 +0200)]
AMD IOMMU: make interrupt work again
Commit 899110e3 ("AMD IOMMU: include IOMMU interrupt information in 'M'
debug key output") made the AMD IOMMU MSI setup code use more of the
generic MSI setup code (as other than for VT-d this is an ordinary MSI-
capable PCI device), but failed to notice that till now interrupt setup
there _required_ the subsequent affinity setup to be done, as that was
the only point where the MSI message would get written. The generic MSI
affinity setting routine, however, does only an incremental change,
i.e. relies on this setup to have been done before.
In order to not make the code even more clumsy, introduce a new low
level helper routine __setup_msi_irq(), thus eliminating the need for
the AMD IOMMU code to directly fiddle with the IRQ descriptor.
Ian Campbell [Sat, 15 Jun 2013 08:30:47 +0000 (09:30 +0100)]
xen: arm: fix build after libelf changes.
ed65808a8ed4 "libelf: check all pointer accesses" caused:
kernel.c: In function 'kernel_elf_load':
kernel.c:162:18: error: 'struct elf_binary' has no member named 'dest'
make[4]: *** [kernel.o] Error 1
The field is now called dest_base. We also need to populate dest_size.
This fixes the build for me although have not tested it. I have a feeling that
loading the kernel from an ELF file on ARM doesn't currently work anyway
(everyone uses the zImage loader as far as I am aware).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Matthew Daley [Fri, 14 Jun 2013 15:39:38 +0000 (16:39 +0100)]
libxc: check blob size before proceeding in xc_dom_check_gzip
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Matthew Daley <mattjd@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v8: Add a comment explaining where the number 6 comes from.
Ian Jackson [Fri, 14 Jun 2013 15:39:38 +0000 (16:39 +0100)]
libxc: range checks in xc_dom_p2m_host and _guest
These functions take guest pfns and look them up in the p2m. They did
no range checking.
However, some callers, notably xc_dom_boot.c:setup_hypercall_page want
to pass untrusted guest-supplied value(s). It is most convenient to
detect this here and return INVALID_MFN.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Tim Deegan <tim@xen.org> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
v6: Check for underflow too (thanks to Andrew Cooper).
Ian Jackson [Fri, 14 Jun 2013 15:39:38 +0000 (16:39 +0100)]
libxc: check return values from malloc
A sufficiently malformed input to libxc (such as a malformed input ELF
or other guest-controlled data) might cause one of libxc's malloc() to
fail. In this case we need to make sure we don't dereference or do
pointer arithmetic on the result.
Search for all occurrences of \b(m|c|re)alloc in libxc, and all
functions which call them, and add appropriate error checking where
missing.
This includes the functions xc_dom_malloc*, which now print a message
when they fail so that callers don't have to do so.
The function xc_cpuid_to_str wasn't provided with a sane return value
and has a pretty strange API, which now becomes a little stranger.
There are no in-tree callers.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v8: Move a check in xc_exchange_page to the previous patch
(ie, remove it from this patch).
v7: Add a missing check for a call to alloc_str.
Add arithmetic overflow check in xc_dom_malloc.
Coding style fix.
v6: Fix a missed call `pfn_err = calloc...' in xc_domain_restore.c.
Fix a missed call `new_pfn = xc_map_foreign_range...' in
xc_offline_page.c
v5: This patch is new in this version of the series.
Ian Jackson [Fri, 14 Jun 2013 15:39:38 +0000 (16:39 +0100)]
libxc: check failure of xc_dom_*_to_ptr, xc_map_foreign_range
The return values from xc_dom_*_to_ptr and xc_map_foreign_range are
sometimes dereferenced, or subjected to pointer arithmetic, without
checking whether the relevant function failed and returned NULL.
Add an appropriate error check at every call site.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v8: Add a missing check in xc_offline_page.c:xc_exchange_page,
which was in the next patch in v7 of the series.
Also improve the message.
I think in this particular error case it may be that the results
are a broken guest, but turning this from a possible host tools
crash into a guest problem seems to solve the potential security
problem.
v7: Simplify an error DOMPRINTF to not use "load ? : ".
Make DOMPRINTF allocation error messages consistent.
Do not set elf->dest_pages in xc_dom_load_elf_kernel
if xc_dom_seg_to_ptr_pages fails.
v5: This patch is new in this version of the series.
Ian Jackson [Fri, 14 Jun 2013 15:39:37 +0000 (16:39 +0100)]
libxc: Add range checking to xc_dom_binloader
This is a simple binary image loader with its own metadata format.
However, it is too careless with image-supplied values.
Add the following checks:
* That the image is bigger than the metadata table; otherwise the
pointer arithmetic to calculate the metadata table location may
yield undefined and dangerous values.
* When clamping the end of the region to search, that we do not
calculate pointers beyond the end of the image. The C
specification does not permit this and compilers are becoming ever
more determined to miscompile code when they can "prove" various
falsehoods based on assertions from the C spec.
* That the supplied image is big enough for the text we are allegedly
copying from it. Otherwise we might have a read overrun and copy
the results (perhaps a lot of secret data) into the guest.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
v9: Use clearer code for calculating probe_end in find_table.
v6: Add a missing `return -EINVAL' (Matthew Daley).
Fix an error in the commit message (Matthew Daley).
v5: This patch is new in this version of the series.
Ian Jackson [Fri, 14 Jun 2013 15:39:36 +0000 (16:39 +0100)]
libelf: abolish obsolete macros
Abolish ELF_PTRVAL_[CONST_]{CHAR,VOID}; change uses to elf_ptrval.
Abolish ELF_HANDLE_DECL_NONCONST; change uses to ELF_HANDLE_DECL.
Abolish ELF_OBSOLETE_VOIDP_CAST; simply remove all uses.
No functional change. (Verified by diffing assembler output.)
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v2: New patch.
Ian Jackson [Fri, 14 Jun 2013 15:39:36 +0000 (16:39 +0100)]
libelf: check loops for running away
Ensure that libelf does not have any loops which can run away
indefinitely even if the input is bogus. (Grepped for \bfor, \bwhile
and \bgoto in libelf and xc_dom_*loader*.c.)
Changes needed:
* elf_note_next uses the note's unchecked alleged length, which might
wrap round. If it does, return ELF_MAX_PTRVAL (0xfff..fff) instead,
which will be beyond the end of the section and so terminate the
caller's loop. Also check that the returned psuedopointer is sane.
* In various loops over section and program headers, check that the
calculated header pointer is still within the image, and quit the
loop if it isn't.
* Some fixed limits to avoid potentially O(image_size^2) loops:
- maximum length of strings: 4K (longer ones ignored totally)
- maximum total number of ELF notes: 65536 (any more are ignored)
* Check that the total program contents (text, data) we copy or
initialise doesn't exceed twice the output image area size.
* Remove an entirely useless loop from elf_xen_parse (!)
* Replace a nested search loop in in xc_dom_load_elf_symtab in
xc_dom_elfloader.c by a precomputation of a bitmap of referenced
symtabs.
We have not changed loops which might, in principle, iterate over the
whole image - even if they might do so one byte at a time with a
nontrivial access check function in the middle.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v8: Fix the two loops in libelf-dominfo.c; the comment about
PT_NOTE and SHT_NOTE wasn't true because the checks did
"continue", not "break".
Add a comment about elf_note_next's expectations of the caller's
loop conditions (which most plausible callers will follow anyway).
v5: Fix regression due to wrong image size loop limit calculation.
Check return value from xc_dom_malloc.
v4: Fix regression due to misplacement of test in elf_shdr_by_name
(uninitialised variable).
Introduce fixed limits.
Avoid O(size^2) loops.
Check returned psuedopointer from elf_note_next is correct.
A few style fixes.
v3: Fix a whitespace error.
v2: BUGFIX: elf_shdr_by_name, elf_note_next: Reject new <= old, not just <.
elf_shdr_by_name: Change order of checks to be a bit clearer.
elf_load_bsdsyms: shdr loop check, improve chance of brokenness detection.
Style fixes.
Ian Jackson [Fri, 14 Jun 2013 15:39:36 +0000 (16:39 +0100)]
libelf: use only unsigned integers
Signed integers have undesirable undefined behaviours on overflow.
Malicious compilers can turn apparently-correct code into code with
security vulnerabilities etc.
So use only unsigned integers. Exceptions are booleans (which we have
already changed) and error codes.
We _do_ change all the chars which aren't fixed constants from our own
text segment, but not the char*s. This is because it is safe to
access an arbitrary byte through a char*, but not necessarily safe to
convert an arbitrary value to a char.
As a consequence we need to compile libelf with -Wno-pointer-sign.
It is OK to change all the signed integers to unsigned because all the
inequalities in libelf are in contexts where we don't "expect"
negative numbers.
In libelf-dominfo.c:elf_xen_parse we rename a variable "rc" to
"more_notes" as it actually contains a note count derived from the
input image. The "error" return value from elf_xen_parse_notes is
changed from -1 to ~0U.
grepping shows only one occurrence of "PRId" or "%d" or "%ld" in
libelf and xc_dom_elfloader.c (a "%d" which becomes "%u").
This is part of the fix to a security issue, XSA-55.
For those concerned about unintentional functional changes, the
following rune produces a version of the patch which is much smaller
and eliminates only non-functional changes:
set +e
diff -pu --label "$path~" <(seddery <"$in") --label "$path" <(seddery <"$out")
rc=$?
set -e
if [ $rc = 1 ]; then rc=0; fi
exit $rc
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v8: Use "?!?!" to express consternation instead of a ruder phrase.
v5: Introduce ELF_NOTE_INVALID, instead of using a literal ~0U.
v4: Fix regression in elf_round_up; use uint64_t here.
v3: Changes to booleans split off into separate patch.
v2: BUGFIX: Eliminate conversion to int of return from elf_xen_parse_notes.
BUGFIX: Fix the one printf format thing which needs changing.
Remove irrelevant change to constify note_desc.name in libelf-dominfo.c.
In xc_dom_load_elf_symtab change one sizeof(int) to sizeof(unsigned).
Do not change type of 2nd argument to memset.
Provide seddery for easier review.
Style fix.
Ian Jackson [Fri, 14 Jun 2013 15:39:36 +0000 (16:39 +0100)]
libelf: use C99 bool for booleans
We want to remove uses of "int" because signed integers have
undesirable undefined behaviours on overflow. Malicious compilers can
turn apparently-correct code into code with security vulnerabilities
etc.
In this patch we change all the booleans in libelf to C99 bool,
from <stdbool.h>.
For the one visible libelf boolean in libxc's public interface we
retain the use of int to avoid changing the ABI; libxc converts it to
a bool for consumption by libelf.
It is OK to change all values only ever used as booleans to _Bool
(bool) because conversion from any scalar type to a _Bool works the
same as the boolean test in if() or ?: and is always defined (C99
6.3.1.2). But we do need to check that all these variables really are
only ever used that way. (It is theoretically possible that the old
code truncated some 64-bit values to 32-bit ints which might become
zero depending on the value, which would mean a behavioural change in
this patch, but it seems implausible that treating 0x????????00000000
as false could have been intended.)
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v3: Use <stdbool.h>'s bool (or _Bool) instead of defining elf_bool.
Split this into a separate patch.
Ian Jackson [Fri, 14 Jun 2013 15:39:36 +0000 (16:39 +0100)]
libelf: Make all callers call elf_check_broken
This arranges that if the new pointer reference error checking
tripped, we actually get a message about it. In this patch these
messages do not change the actual return values from the various
functions: so pointer reference errors do not prevent loading. This
is for fear that some existing kernels might cause the code to make
these wild references, which would then break, which is not a good
thing in a security patch.
In xen/arch/x86/domain_build.c we have to introduce an "out" label and
change all of the "return rc" beyond the relevant point into "goto
out".
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v5: Fix two whitespace errors.
v3.1:
Add error check to xc_dom_parse_elf_kernel.
Move check in xc_hvm_build_x86.c:setup_guest to right place.
v2 was Acked-by: Ian Campbell <ian.campbell@citrix.com>
v2 was Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Ian Jackson [Fri, 14 Jun 2013 15:39:36 +0000 (16:39 +0100)]
libelf: Check pointer references in elf_is_elfbinary
elf_is_elfbinary didn't take a length parameter and could potentially
access out of range when provided with a very short image.
We only need to check the size is enough for the actual dereference in
elf_is_elfbinary; callers are just using it to check the magic number
and do their own checks (usually via the new elf_ptrval system) before
dereferencing other parts of the header.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v7: Add a comment about the limited function of elf_is_elfbinary.
Ian Jackson [Fri, 14 Jun 2013 15:39:36 +0000 (16:39 +0100)]
libelf: check all pointer accesses
We change the ELF_PTRVAL and ELF_HANDLE types and associated macros:
* PTRVAL becomes a uintptr_t, for which we provide a typedef
elf_ptrval. This means no arithmetic done on it can overflow so
the compiler cannot do any malicious invalid pointer arithmetic
"optimisations". It also means that any places where we
dereference one of these pointers without using the appropriate
macros or functions become a compilation error.
So we can be sure that we won't miss any memory accesses.
All the PTRVAL variables were previously void* or char*, so
the actual address calculations are unchanged.
* ELF_HANDLE becomes a union, one half of which keeps the pointer
value and the other half of which is just there to record the
type.
The new type is not a pointer type so there can be no address
calculations on it whose meaning would change. Every assignment or
access has to go through one of our macros.
* The distinction between const and non-const pointers and char*s
and void*s in libelf goes away. This was not important (and
anyway libelf tended to cast away const in various places).
* The fields elf->image and elf->dest are renamed. That proves
that we haven't missed any unchecked uses of these actual
pointer values.
* The caller may fill in elf->caller_xdest_base and _size to
specify another range of memory which is safe for libelf to
access, besides the input and output images.
* When accesses fail due to being out of range, we mark the elf
"broken". This will be checked and used for diagnostics in
a following patch.
We do not check for write accesses to the input image. This is
because libelf actually does this in a number of places. So we
simply permit that.
* Each caller of libelf which used to set dest now sets
dest_base and dest_size.
* In xc_dom_load_elf_symtab we provide a new actual-pointer
value hdr_ptr which we get from mapping the guest's kernel
area and use (checking carefully) as the caller_xdest area.
* The STAR(h) macro in libelf-dominfo.c now uses elf_access_unsigned.
* elf-init uses the new elf_uval_3264 accessor to access the 32-bit
fields, rather than an unchecked field access (ie, unchecked
pointer access).
* elf_uval has been reworked to use elf_uval_3264. Both of these
macros are essentially new in this patch (although they are derived
from the old elf_uval) and need careful review.
* ELF_ADVANCE_DEST is now safe in the sense that you can use it to
chop parts off the front of the dest area but if you chop more than
is available, the dest area is simply set to be empty, preventing
future accesses.
* We introduce some #defines for memcpy, memset, memmove and strcpy:
- We provide elf_memcpy_safe and elf_memset_safe which take
PTRVALs and do checking on the supplied pointers.
- Users inside libelf must all be changed to either
elf_mem*_unchecked (which are just like mem*), or
elf_mem*_safe (which take PTRVALs) and are checked. Any
unchanged call sites become compilation errors.
* We do _not_ at this time fix elf_access_unsigned so that it doesn't
make unaligned accesses. We hope that unaligned accesses are OK on
every supported architecture. But it does check the supplied
pointer for validity.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v7: Remove a spurious whitespace change.
v5: Use allow_size value from xc_dom_vaddr_to_ptr to set xdest_size
correctly.
If ELF_ADVANCE_DEST advances past the end, mark the elf broken.
Always regard NULL allowable region pointers (e.g. dest_base)
as invalid (since NULL pointers don't point anywhere).
v4: Fix ELF_UNSAFE_PTR to work on 32-bit even when provided 64-bit
values.
Fix xc_dom_load_elf_symtab not to call XC_DOM_PAGE_SIZE
unnecessarily if load is false. This was a regression.
v3.1:
Introduce a change to elf_store_field to undo the effects of
the v3.1 change to the previous patch (the definition there
is not compatible with the new types).
v3: Fix a whitespace error.
v2 was Acked-by: Ian Campbell <ian.campbell@citrix.com>
v2: BUGFIX: elf_strval: Fix loop termination condition to actually work.
BUGFIX: elf_strval: Fix return value to not always be totally wild.
BUGFIX: xc_dom_load_elf_symtab: do proper check for small header size.
xc_dom_load_elf_symtab: narrow scope of `hdr_ptr'.
xc_dom_load_elf_symtab: split out uninit'd symtab.class ref fix.
More comments on the lifetime/validity of elf-> dest ptrs etc.
libelf.h: write "obsolete" out in full
libelf.h: rename "dontuse" to "typeonly" and add doc comment
elf_ptrval_in_range: Document trustedness of arguments.
Style and commit message fixes.
Ian Jackson [Fri, 14 Jun 2013 15:39:36 +0000 (16:39 +0100)]
libelf: check nul-terminated strings properly
It is not safe to simply take pointers into the ELF and use them as C
pointers. They might not be properly nul-terminated (and the pointers
might be wild).
So we are going to introduce a new function elf_strval for safely
getting strings. This will check that the addresses are in range and
that there is a proper nul-terminated string. Of course it might
discover that there isn't. In that case, it will be made to fail.
This means that elf_note_name might fail, too.
For the benefit of call sites which are just going to pass the value
to a printf-like function, we provide elf_strfmt which returns
"(invalid)" on failure rather than NULL.
In this patch we introduce dummy definitions of these functions. We
introduce calls to elf_strval and elf_strfmt everywhere, and update
all the call sites with appropriate error checking.
There is not yet any semantic change, since before this patch all the
places where we introduce elf_strval dereferenced the value anyway, so
it mustn't have been NULL.
In future patches, when elf_strval is made able return NULL, when it
does so it will mark the elf "broken" so that an appropriate
diagnostic can be printed.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v7: Change readnotes.c check to use two if statements rather than ||.
Use the new PTRVAL macros and elf_access_unsigned in
print_l1_mfn_valid_note.
No functional change unless the input is wrong, or we are reading a
file for a different endianness.
Separated out from the previous patch because this change does produce
a difference in the generated code.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
v2: Split out into its own patch.
Ian Jackson [Fri, 14 Jun 2013 15:39:35 +0000 (16:39 +0100)]
libelf: introduce macros for memory access and pointer handling
We introduce a collection of macros which abstract away all the
pointer arithmetic and dereferences used for accessing the input ELF
and the output area(s). We use the new macros everywhere.
For now, these macros are semantically identical to the code they
replace, so this patch has no functional change.
elf_is_elfbinary is an exception: since it doesn't take an elf*, we
need to handle it differently. In a future patch we will change it to
take, and check, a length parameter. For now we just mark it with a
fixme.
That this patch has no functional change can be verified as follows:
0. Copy the scripts "comparison-generate" and "function-filter"
out of this commit message.
1. Check out the tree before this patch.
2. Run the script ../comparison-generate .... ../before
3. Check out the tree after this patch.
4. Run the script ../comparison-generate .... ../after
5. diff --exclude=\*.[soi] -ruN before/ after/ |less
Expect these differences:
* stubdom/zlib-x86_64/ztest*.s2
The filename of this test file apparently contains the pid.
* xen/common/version.s2
The xen build timestamp appears in two diff hunks.
Verification that this is all that's needed:
In a completely built xen.git,
find * -name .*.d -type f | xargs grep -l libelf\.h
Expect results in:
xen/arch/x86: Checked above.
tools/libxc: Checked above.
tools/xcutils/readnotes: Checked above.
tools/xenstore: Checked above.
xen/common/libelf:
This is the build for the hypervisor; checked in B above.
stubdom:
We have one stubdom which reads ELFs using our libelf,
pvgrub, which is checked above.
I have not done this verification for ARM.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v7: Add uintptr_t cast to ELF_UNSAFE_PTR. Still verifies.
Use git foo not git-foo in commit message verification script.
v4: Fix elf_load_binary's phdr message to be correct on 32-bit.
Fix ELF_OBSOLETE_VOIDP_CAST to work on 32-bit.
Indent scripts in commit message.
v3.1:
Change elf_store_field to verify correctly on 32-bit.
comparison-generate copes with Xen 4.1's lack of ./configure.
v2: Use Xen style for multi-line comments.
Postpone changes to readnotes.c:print_l1_mfn_valid_note.
Much improved verification instructions with new script.
Fixed commit message subject.
if [ -f ./configure ]; then
$build_rune_prefix ./configure
fi
$build_rune_prefix make -C xen
$build_rune_prefix make -C tools/include
$build_rune_prefix make -C stubdom grub
$build_rune_prefix make -C tools/libxc
$build_rune_prefix make -C tools/xenstore
$build_rune_prefix make -C tools/xcutils
rm -rf "$result_dir"
mkdir "$result_dir"
set +x
for f in `find xen tools stubdom -name \*.[soi]`; do
mkdir -p "$result_dir"/`dirname $f`
cp $f "$result_dir"/${f}
case $f in
*.s)
../function-filter <$f >"$result_dir"/${f}2
;;
esac
done
echo ok.
-8<-
-8<- function-filter -8<-
#!/usr/bin/perl -w
# function-filter
# script for massaging gcc-generated labels to be consistent
use strict;
our @lines;
my $sedderybody = "sub seddery () {\n";
while (<>) {
push @lines, $_;
if (m/^(__FUNCTION__|__func__)\.(\d+)\:/) {
$sedderybody .= " s/\\b$1\\.$2\\b/__XSA55MANGLED__$1.$./g;\n";
}
}
$sedderybody .= "}\n1;\n";
eval $sedderybody or die $@;
foreach (@lines) {
seddery();
print or die $!;
}
-8<-
Ian Jackson [Fri, 14 Jun 2013 15:39:35 +0000 (16:39 +0100)]
libelf/xc_dom_load_elf_symtab: Do not use "syms" uninitialised
xc_dom_load_elf_symtab (with load==0) calls elf_round_up, but it
mistakenly used the uninitialised variable "syms" when calculating
dom->bsd_symtab_start. This should be a reference to "elf".
This change might have the effect of rounding the value differently.
Previously if the uninitialised value (a single byte on the stack) was
ELFCLASS64 (ie, 2), the alignment would be to 8 bytes, otherwise to 4.
However, the value is calculated from dom->kernel_seg.vend so this
could only make a difference if that value wasn't already aligned to 8
bytes.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
v2: Split this change into its own patch for proper review.
Ian Jackson [Fri, 14 Jun 2013 15:39:35 +0000 (16:39 +0100)]
libelf: move include of <asm/guest_access.h> to top of file
libelf-loader.c #includes <asm/guest_access.h>, when being compiled
for Xen. Currently it does this in the middle of the file.
Move this #include to the top of the file, before libelf-private.h.
This is necessary because in forthcoming patches we will introduce
private #defines of memcpy etc. which would interfere with definitions
in headers #included from guest_access.h.
No semantic or functional change in this patch.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Ian Jackson [Fri, 14 Jun 2013 15:39:34 +0000 (16:39 +0100)]
libelf: abolish elf_sval and elf_access_signed
These are not used anywhere.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Ian Jackson [Fri, 14 Jun 2013 15:39:34 +0000 (16:39 +0100)]
libelf: add `struct elf_binary*' parameter to elf_load_image
The meat of this function is going to need a copy of the elf pointer,
in forthcoming patches.
No functional change in this patch.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Ian Jackson [Fri, 14 Jun 2013 15:39:34 +0000 (16:39 +0100)]
libxc: Fix range checking in xc_dom_pfn_to_ptr etc.
* Ensure that xc_dom_pfn_to_ptr (when called with count==0) does not
return a previously-allocated block which is entirely before the
requested pfn (!)
* Provide a version of xc_dom_pfn_to_ptr, xc_dom_pfn_to_ptr_retcount,
which provides the length of the mapped region via an out parameter.
* Change xc_dom_vaddr_to_ptr to always provide the length of the
mapped region and change the call site in xc_dom_binloader.c to
check it. The call site in xc_dom_load_elf_symtab will be corrected
in a forthcoming patch, and for now ignores the returned length.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
v5: This patch is new in v5 of the series.
Ian Jackson [Fri, 14 Jun 2013 15:39:34 +0000 (16:39 +0100)]
libxc: introduce xc_dom_seg_to_ptr_pages
Provide a version of xc_dom_seg_to_ptr which returns the number of
guest pages it has actually mapped. This is useful for callers who
want to do range checking; we will use this later in this series.
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
v7: xc_dom_seg_to_ptr_pages now always expects pages_out!=NULL.
(It seems silly to have it tolerate NULL when all the real callers
pass non-NULL and there's a version which doesn't need pages_out
anyway. Fix the call in xc_dom_seg_to_ptr to have a dummy pages
for pages_out.)
v5: xc_dom_seg_to_ptr_pages sets *pages_out=0 if it returns NULL.
v4 was:
Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Jackson [Fri, 14 Jun 2013 15:39:34 +0000 (16:39 +0100)]
libelf: abolish libelf-relocate.c
This file is not actually used. It's not built in Xen's instance of
libelf; in libxc's it's built but nothing in it is called. Do not
compile it in libxc, and delete it.
This reduces the amount of work we need to do in forthcoming patches
to libelf (particularly since as libelf-relocate.c is not used it is
probably full of bugs).
This is part of the fix to a security issue, XSA-55.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Chuck Anderson <chuck.anderson@oracle.com>
Ian Campbell [Tue, 4 Jun 2013 10:54:10 +0000 (11:54 +0100)]
xen/arm64: fix stack dump in show_trace
On aarch64 the frame pointer points to the next frame pointer and the return
address is the previous stack slot (so below on the downward growing stack,
therefore above in memory):
|<RETURN ADDR> ^addresses grow up
FP -> |<NEXT FP> |
| |
v |
stack grows down.
This is contrary to aarch32 where the frame pointer points to the return
address and the next frame pointer is the next stack slot (so above on the
downward growing stack, below in memory):
FP -> |<RETURN ADDR> ^addresses grow up
|<NEXT FP> |
| |
v |
stack grows down.
In addition print out LR as part of the trace, since it may contain the
penultimate return address e.g. if the ultimate function is a leaf function.
Lastly nuke some unnecessary braces.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
A bit of debugging revealed that the map_domain_page and unmap_domain_page
are meant for short life-time mappings. And that those mappings are finite.
In the 2 VCPU guest we only have 32 entries and once we have exhausted those
we trigger the BUG_ON condition.
The two functions - tmh_persistent_pool_page_[get,put] are used by the xmem_pool
when xmem_pool_[alloc,free] are called. These xmem_pool_* function are wrapped
in macro and functions - the entry points are via: tmem_malloc
and tmem_page_alloc. In both cases the users are in the hypervisor and they
do not seem to suffer from using the hypervisor virtual addresses.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Thu, 13 Jun 2013 08:49:01 +0000 (10:49 +0200)]
x86: fix map_domain_page() last resort fallback
Guests with vCPU count not divisible by 4 have unused bits in the last
word of their inuse bitmap, and the garbage collection code therefore
would get mislead believing that some entries were actually recoverable
for use.
Also use an earlier established local variable in mapcache_vcpu_init()
instead of re-calculating the value (noticed while investigating the
generally better option of setting those overhanging bits once during
setup - this didn't work out in a simple enough fashion because the
mapping getting established there isn't in the current address space,
and hence the bitmap isn't directly accessible there).
Reported-by: Konrad Wilk <konrad.wilk@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 12 Jun 2013 15:27:08 +0000 (17:27 +0200)]
x86: drop setup_idle_pagetable()
With vcpu->domain->arch.perdomain_l3_pg no longer getting set up for
the idle domain, this creates an invalid L4 entry (due to translating
a NULL struct page_info pointer to a physical address).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Dario Faggioli [Fri, 7 Jun 2013 16:14:35 +0000 (18:14 +0200)]
libxl: add LIBXL_HAVE_<foo> for outstanding_pages and outstanding_memkb
Commits d0782481 ("xl: export 'outstanding_pages' value from xcinfo")
and bec8f17e ("xen: Remove the XENMEM_get_oustanding_pages and provide
the data via xc_phys_info") added these two fields in libxl_physinfo
and in libxl_dominfo, respectively, but did not include the needed
LIBXL_HAVE_<foo> runes. Adding them.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Daniel De Graaf [Thu, 30 May 2013 12:57:22 +0000 (08:57 -0400)]
flask/policy: device model stubdom fixes
This fixes framebuffer support for device model stubdoms after 3f28d007
which added the target_hack permission but did not allow the permission
to the stubdom it was created for.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Wed, 12 Jun 2013 08:07:48 +0000 (10:07 +0200)]
io/ring.h: drop unused and broken *_RING_ATTACH() macros
Initializing r*_prod_pvt and r*_cons from independent shared ring
fields is broken, as other macros in this header rely on them being
coupled. Furthermore using the backend variant would also imply a
security vulnerability.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Wed, 12 Jun 2013 08:07:20 +0000 (10:07 +0200)]
io/ring.h: new macro to detect whether there are too many requests on the ring
Backends may need to protect themselves against an insane number of
produced requests stored by a frontend, in case they iterate over
requests until reaching the req_prod value. There can't be more
requests on the ring than the difference between produced requests
and produced (but possibly not yet published) responses.
This is a more strict alternative to a patch previously posted by
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Roger Pau Monné [Wed, 12 Jun 2013 08:05:39 +0000 (10:05 +0200)]
x86/HVM: fix initialization of wallclock time for PVHVM on migration
Call update_domain_wallclock_time on hvm_latch_shinfo_size even if
the bitness of the guest has already been set, this fixes the problem
with the wallclock not being set for PVHVM guests on resume from
migration.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Clean up the resulting code and retain the (slightly adjusted) original
comment.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Zhenguo Wang [Tue, 11 Jun 2013 07:45:55 +0000 (09:45 +0200)]
x86/HVM: fix x2APIC APIC_ID read emulation
APIC and x2APIC have different format for APIC_ID register. Need
translation.
Signed-off-by: Zhenguo Wang <wangzhenguo@huawei.com> Signed-off-by: Xiaowei Yang <xiaowei.yang@huawei.com>
Convert code to use switch(), fixing coding style issue at once, and
use GET_xAPIC_ID() on the value read instead of VLAPIC_ID() (reading
the field again).
In the course of this also properly reject both read and writes on the
non-existing MSR corresponding to APIC_ICR2.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
This was never reported to be hit, and we assume to have taken care of
the problem by excluding legacy IRQs from the IRQ move cleanup logic.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Wed, 5 Jun 2013 08:05:49 +0000 (10:05 +0200)]
AMD/IOMMU: revert "SR56x0 Erratum 64 - Reset all head & tail pointers"
The code this patch added is redundant with already present code in
set_iommu_{command_buffer,{event,ppr}_log}_control(). Just switch those
ones from using writel() to writeq() for consistency.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Roger Pau Monné [Wed, 5 Jun 2013 08:03:08 +0000 (10:03 +0200)]
x86/vtsc: update vcpu_time in hvm_set_guest_time
When using a vtsc, hvm_set_guest_time changes hvm_vcpu.stime_offset,
which is used in the vcpu time structure to calculate the
tsc_timestamp, so after updating stime_offset we need to propagate the
change to vcpu_time in order for the guest to get the right time if
using the PV clock.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Tue, 4 Jun 2013 15:25:41 +0000 (17:25 +0200)]
x86: fix XCR0 handling
- both VMX and SVM ignored the ECX input to XSETBV
- both SVM and VMX used the full 64-bit RAX when calculating the input
mask to XSETBV
- faults on XSETBV did not get recovered from
Also consolidate the handling for PV and HVM into a single function,
and make the per-CPU variable "xcr0" static to xstate.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Tue, 4 Jun 2013 15:23:11 +0000 (17:23 +0200)]
x86: preserve FPU selectors for 32-bit guest code
Doing {,F}X{SAVE,RSTOR} unconditionally with 64-bit operand size leads
to the selector values associated with the last instruction/data
pointers getting lost. This, besides being inconsistent and not
compatible with native hardware behavior especially for 32-bit guests,
leads to bug checks in 32-bit Windows when running with "Driver
verifier" (see e.g. http://support.microsoft.com/kb/244617).
In a first try I made the code figure out the current guest mode, but
that has the disadvantage of only taking care of the issue when the
guest executes in the mode for which the state currently is (i.e.
namely not in the case of a 64-bit guest running a 32-bit application,
but being in kernle [64-bit] mode).
The solution presented here is to conditionally execute an auxiliary
FNSTENV and use the selectors from there.
In either case the determined word size gets stored in the last byte
of the FPU/SSE save area, which is available for software use (and I
verified is being cleared to zero by all versions of Xen, i.e. will not
present a problem when migrating guests from older to newer hosts), and
evaluated for determining the operand size {,F}XRSTOR is to be issued
with.
Note that I did check whether using a second FXSAVE or a partial second
XSAVE would be faster than FNSTENV - neither on my Westmere (FXSAVE)
nor on my Romley (XSAVE) they are.
I was really tempted to use branches into the middle of instructions
(i.e. past the REX64 prefixes) here, as that would have allowed to
collapse the otherwise identical fault recovery blocks. I stayed away
from doing so just because I expect others to dislike such slightly
subtle/tricky code.
Reported-by: Ben Guthro <Benjamin.Guthro@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>