Jan Beulich [Tue, 12 Aug 2014 13:44:26 +0000 (15:44 +0200)]
x86/paging: make log-dirty operations preemptible
Both the freeing and the inspection of the bitmap get done in (nested)
loops which - besides having a rather high iteration count in general,
albeit that would be covered by XSA-77 - have the number of non-trivial
iterations they need to perform (indirectly) controllable by both the
guest they are for and any domain controlling the guest (including the
one running qemu for it).
This is CVE-2014-5146 / XSA-97.
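The general shape of a preemptible loop can be sketched as follows. This is a minimal illustration, not the actual Xen change: the names scan_state, scan_bitmap_preemptible and BUDGET are hypothetical, and -EAGAIN stands in for whatever continuation mechanism the hypervisor really uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical resumable state: where the scan stopped and what it found. */
struct scan_state {
    size_t next;    /* index of the first entry not yet processed */
    size_t dirty;   /* running count of non-zero entries seen so far */
};

#define BUDGET 4    /* illustrative: entries handled before yielding */

/* Returns 0 when the scan is complete, -EAGAIN when it should be re-invoked. */
static int scan_bitmap_preemptible(const unsigned long *map, size_t n,
                                   struct scan_state *st)
{
    size_t done = 0;

    assert(st->next <= n);
    while (st->next < n) {
        if (done == BUDGET)
            return -EAGAIN;     /* progress is preserved in *st */
        if (map[st->next] != 0)
            st->dirty++;
        st->next++;
        done++;
    }
    return 0;
}
```

The caller simply repeats the call until it returns 0, so no single invocation runs for more than BUDGET iterations.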
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 95e6d82224689fdfd967a093a4d69efc24c17e91
master date: 2014-08-12 15:30:11 +0200
This is a patch repairing a regression in code previously functional in 4.1.x.
It appears that, during some refactoring work, the call to hvm_memory_event_cr0 was lost.
This function was originally called in mov_to_cr() of vmx.c, but the commit
http://xenbits.xen.org/hg/xen-unstable.hg/rev/1276926e3795 abstracted the
original code into generic functions up a level in hvm.c, dropping the call
in the process.
The same issue affected the CR3 and CR4 events, which were fixed in patch
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/7ab899e46347.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 5d570c1d0274cac3b333ef378af3325b3b69905e
master date: 2014-07-23 18:05:11 +0200
avoid crash when doing shutdown with active cpupools
When shutting down the machine while there are cpus in a cpupool other than
Pool-0 a crash is triggered due to cpupool handling rejecting offlining the
non-boot cpus in other cpupools.
It is easy to detect this case and allow offlining those cpus.
Reported-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Stefan Bader <stefan.bader@canonical.com>
master commit: 05377dede434c746e6708f055858378d20f619db
master date: 2014-07-23 18:03:19 +0200
For safety reasons, c/s 6ae2df93c27 "mem_access: Add helper API to setup
ring and enable mem_access" has to pause the domain while it performs a set of
operations.
However without properly reference counted hypercalls, xc_mem_event_enable()
now unconditionally unpauses a previously paused domain.
To prevent toolstack software running wild, there is an arbitrary limit of 255
on the toolstack pause count. This is high enough for several components of
the toolstack to safely use, but prevents over/underflow of d->pause_count.
The previous domain_{,un}pause_by_systemcontroller() functions are updated to
return an error code. domain_pause_by_systemcontroller() is modified to have
a common stub and take a pause_fn pointer, allowing for both sync and nosync
domain pauses. domain_pause_for_debugger() has a hand-rolled nosync pause
replaced with the new domain_pause_by_systemcontroller_nosync(), and has its
variables shuffled slightly to avoid rereading current multiple times.
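A reference-counted pause with the 255 limit described above might look roughly like this. It is only a sketch: struct dom and the two function names are hypothetical stand-ins, and the choice of -EINVAL for both failure cases is an assumption rather than what the patch returns.

```c
#include <assert.h>
#include <errno.h>

#define MAX_CONTROLLER_PAUSES 255   /* limit quoted in the text above */

/* Hypothetical stand-in for the relevant field of struct domain. */
struct dom {
    int controller_pause_count;
};

/* Take one pause reference; fail instead of overflowing the count. */
static int pause_by_controller(struct dom *d)
{
    assert(d->controller_pause_count >= 0);
    if (d->controller_pause_count >= MAX_CONTROLLER_PAUSES)
        return -EINVAL;
    d->controller_pause_count++;
    return 0;
}

/* Drop one pause reference; fail instead of underflowing the count. */
static int unpause_by_controller(struct dom *d)
{
    if (d->controller_pause_count <= 0)
        return -EINVAL;
    d->controller_pause_count--;
    return 0;
}
```

With this shape, an unpause that was never matched by a pause is reported to the caller rather than silently resuming a domain someone else paused.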
Suggested-by: Don Slutz <dslutz@verizon.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
With a couple of formatting adjustments: Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/gdbsx: invert preconditions for XEN_DOMCTL_gdbsx_{,un}pausevcpu hypercalls
c/s 3eb1c708ab "properly reference count DOMCTL_{,un}pausedomain hypercalls"
accidentally inverted the use of d->controller_pause_count.
Revert back to how it was originally, i.e. the XEN_DOMCTL_gdbsx_{,un}pausevcpu
hypercalls are only valid for a domain already paused by the system controller.
Jan Beulich [Mon, 28 Jul 2014 12:50:45 +0000 (14:50 +0200)]
VT-d/ATS: correct and clean up dev_invalidate_iotlb()
While this was intended to only do cleanup (replace the two bogus
"ret |= " constructs, and a simple formatting correction), this now
also
- fixes the bit manipulations for size_order > 0
a) correct an off-by-one in the use of size_order for shifting (till
now double the requested size got invalidated)
b) in fact setting bit 12 and up if necessary (without which too
small a region might have got invalidated)
c) making them capable of dealing with regions of 4Gb size and up
- corrects the return value handling, such that a later iteration's
success won't clear an earlier iteration's error indication
- uses PCI_BDF2() instead of open coding it
- bails immediately on a bad passed-in invalidation type, rather than
repeatedly printing the same message for each ATS-capable device, at
once also no longer hiding that failure from the caller
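The size_order handling in points a) to c) can be illustrated with the following sketch. It assumes the usual PCIe ATS range encoding (S flag set, lowest clear address bit marking the range size); the function name ats_encode_range is hypothetical and this is not a copy of the Xen code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12

/*
 * Encode a range of 2^size_order 4K pages for an ATS invalidation request.
 * Returns the address field; *sbit receives the S ("size") flag.
 */
static uint64_t ats_encode_range(uint64_t addr, unsigned int size_order,
                                 int *sbit)
{
    assert(size_order < 64 - PAGE_SHIFT_4K);

    addr &= ~UINT64_C(0) << PAGE_SHIFT_4K;      /* drop the page offset */
    *sbit = size_order != 0;

    if (*sbit) {
        /* Clear the bit marking the range size: shift by size_order - 1. */
        addr &= ~(UINT64_C(1) << (PAGE_SHIFT_4K + size_order - 1));
        /* Set every address bit below it, starting at bit 12. */
        addr |= ((UINT64_C(1) << (size_order - 1)) - 1) << PAGE_SHIFT_4K;
    }
    return addr;
}
```

Using 64-bit arithmetic throughout is what lets the same expression cover regions of 4Gb and larger.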
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: fd33987ba27607c3cc7da258cf1d86d21beeb735
master date: 2014-06-30 15:57:40 +0200
This causes Xen to accept the more generic names specified in
http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions/Multiboot as of
2014-06-06.
These names are more generic than those proposed by Andre in
http://thread.gmane.org/gmane.linux.linaro.announce.boot/326 and those
used in earlier drafts of the /Multiboot wiki page.
This will allow bootloaders to not special case Xen (or at least to reduce
the amount which is required).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit a860dfeec090fe46d856b5d3fc6da28ccf7d1ba5)
Ian Campbell [Thu, 22 May 2014 09:46:37 +0000 (10:46 +0100)]
tools: arm: report an error if the guest RAM is too large
Due to the layout of the guest physical address space we cannot support more
than 768M of RAM before overrunning the area set aside for the grant table. Due
to the presence of the magic pages at the end of the RAM region guests are
actually limited to 767M.
Catch this case during domain build and fail gracefully instead of obscurely
later on.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 5a959f44ed03398870b6ec0dfebb59dcd5981f94)
Andrew Cooper [Wed, 18 Jun 2014 18:04:14 +0000 (19:04 +0100)]
tools/libxl: Fix free() of wild pointer in libxl__initiate_device_remove()
libxl__initiate_device_remove() had a preexisting error path issue where
libxl_dominfo_dispose() could be called on a libxl_dominfo object before it
had been initialised with libxl_dominfo_init().
This was safe until c/s ab44401 added the pointer ssid_label, which
libxl_dominfo_dispose() free()s.
Unconditionally initialise info in libxl__initiate_device_remove() before
taking an error path which will free it.
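The init-before-dispose rule can be sketched as below. struct info, info_init(), info_dispose() and remove_device() are hypothetical stand-ins for the libxl types and functions named above, kept just large enough to show the error path.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for libxl_dominfo: one owned pointer is enough. */
struct info {
    char *label;
};

static void info_init(struct info *i)
{
    assert(i != NULL);
    memset(i, 0, sizeof(*i));       /* label starts out as NULL */
}

static void info_dispose(struct info *i)
{
    free(i->label);                 /* free(NULL) is a no-op */
    i->label = NULL;
}

/* Returns 0 on success, -1 when an error path is taken. */
static int remove_device(int fail_early)
{
    struct info info;
    int rc = -1;

    info_init(&info);               /* unconditional, before any "goto out" */

    if (fail_early)
        goto out;

    info.label = malloc(8);
    if (!info.label)
        goto out;
    strcpy(info.label, "label");
    rc = 0;

out:
    info_dispose(&info);            /* label is NULL or a malloc()ed block */
    return rc;
}
```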
Coverity-ID: 1223212 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit ddb4aa5dfa13781e8f31ba20923c14c1a083ce83)
Dario Faggioli [Fri, 20 Jun 2014 14:09:00 +0000 (16:09 +0200)]
blktap2: Fix two 'maybe uninitialized' variables
about which gcc 4.9.0 complains, like this:
block-qcow.c: In function `get_cluster_offset':
block-qcow.c:431:3: error: `tmp_ptr' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
memcpy(tmp_ptr, l1_ptr, 4096);
^
block-qcow.c:606:7: error: `tmp_ptr2' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
if (write(s->fd, tmp_ptr2, 4096) != 4096) {
^
cc1: all warnings being treated as errors
/home/dario/Sources/xen/xen/xen.git/tools/blktap2/drivers/../../../tools/Rules.mk:89:
recipe for target 'block-qcow.o' failed
make[5]: *** [block-qcow.o] Error 1
The proper behavior is to return upon allocation failure.
About what to return, 0 seems the best option, looking
at both the function and the call sites.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 345e44a85d71a1a910385f33c7f1ba3683026d18)
Ian Campbell [Thu, 26 Jun 2014 08:53:42 +0000 (09:53 +0100)]
xen: arm: take FIQ exceptions to Xen not guest by setting HCR_EL2.FMO
As with HCR_EL2.{IMO,AMO} we want to route FIQs to Xen not the guest. See ARM
ARM DDI 0406C.b B1.8.4.
So far none of the platforms which we support use FIQ for anything, but when we
end up supporting one it would be far better to surprise Xen with them than
whatever guest happens to be running...
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 4bb74e39987b428429c2aacad7f59356d4942e39)
xen/arm: Implement a dummy debug monitor for ARM32
XSA-93 (commit 0b18220 "xen/arm: Don't let guess access to Debug and Performance
Monitors registers") disabled access to the Debug registers.
When CONFIG_PERF_EVENTS is enabled in the Linux kernel, it will try to
initialize the debug monitors. If an error occurs, Linux won't use this
feature.
This implementation makes Xen expose a minimal set of registers which leads
the guest to think that HW debug won't work.
Signed-off-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- s/DBGCR/DBGBCR/ to use correct register name ] Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 68c69978352adb5ab7c06598056f9eb88d7d6031)
[ ijc -- s/is_32bit_domain/is_pv32_domain/ ]
xen/arm: Implement a dummy Performance Monitor for ARM32
XSA-93 (commit 0b18220 "xen/arm: Don't let guess access to Debug and Performance
Monitor registers") disabled the Performance Monitor.
When CONFIG_PERF_EVENTS is enabled in the Linux kernel, the kernel will try
to access PMCR regardless of ID_DFR0 (which tells whether the Performance
Monitors Extension is implemented).
Therefore we tell the guest we have 0 counters. Unfortunately we must always
support PMCCNTR (the cycle counter): we just RAZ/WI for all PM registers,
which doesn't crash the kernel at least.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit aa0d443718372b46c432af7cb6274050cda32fc6)
Julien Grall [Tue, 17 Jun 2014 20:44:28 +0000 (21:44 +0100)]
xen/arm: Panic when we receive an unexpected trap
The current implementation of do_unexpected_trap makes Xen spin forever
on the current physical CPU. This may stall guest VCPUs and lead to
unhelpful messages (RCU stall...).
Usually when Xen receives an unexpected trap, it means that something has gone
wrong either in the hypervisor or in the CPU. In this case we should
panic directly, which also stops the other CPUs.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 4f5ab681d208993f94553203f4be323b3c929070)
David Vrabel [Tue, 24 Jun 2014 12:22:54 +0000 (14:22 +0200)]
x86/nmi: be less verbose when testing the NMI watchdog
There's no need to print all the CPUs that are ok, only the ones that
got stuck.
The resulting output is either:
Testing NMI watchdog on all CPUs: 1 4 6 stuck
or
Testing NMI watchdog on all CPUs: ok
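A sketch of how such a summary line could be built, listing only the stuck CPUs. The helper name format_nmi_test and its buffer-based interface are assumptions made for illustration; the hypervisor prints directly rather than formatting into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format the summary line into buf; stuck[i] != 0 means CPU i is stuck. */
static void format_nmi_test(char *buf, size_t len, const int *stuck, int ncpus)
{
    int any = 0;
    size_t off;

    assert(buf != NULL && len > 0);
    off = (size_t)snprintf(buf, len, "Testing NMI watchdog on all CPUs:");
    for (int cpu = 0; cpu < ncpus && off < len; cpu++) {
        if (!stuck[cpu])
            continue;               /* healthy CPUs are not listed */
        off += (size_t)snprintf(buf + off, len - off, " %d", cpu);
        any = 1;
    }
    if (off < len)
        snprintf(buf + off, len - off, "%s", any ? " stuck" : " ok");
}
```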
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: f64b1901564b6206dbbe946699619fcd22446de8
master date: 2014-05-15 15:32:36 +0200
Jan Beulich [Tue, 24 Jun 2014 07:43:31 +0000 (09:43 +0200)]
VT-d/qinval: make local variable used for communication with IOMMU "volatile"
Without that there is - afaict - nothing preventing the compiler from
putting the variable into a register for the duration of the wait loop.
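The effect of the qualifier can be seen in a small polling loop. This is a generic sketch, not the qinval code: wait_for_status, STATUS_DONE and the spin limit are hypothetical, and in the real case the status word is written by the IOMMU rather than by the program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_DONE 1u   /* illustrative completion value */

/*
 * Spin until *status becomes STATUS_DONE or max_spins is exhausted.
 * The volatile qualifier forces a fresh load on every iteration; without it
 * the compiler may read the variable once and loop on a register copy.
 */
static int wait_for_status(volatile uint32_t *status, unsigned long max_spins)
{
    assert(status != NULL);
    while (max_spins--) {
        if (*status == STATUS_DONE)
            return 0;
    }
    return -1;   /* timed out */
}
```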
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: ceec46c02074e1b2ade0b13c3c4a2f3942ae698c
master date: 2014-06-20 10:25:33 +0200
Jan Beulich [Tue, 24 Jun 2014 07:42:49 +0000 (09:42 +0200)]
x86/EFI: allow FPU/XMM use in runtime service functions
UEFI spec update 2.4B developed a requirement to enter runtime service
functions with CR0.TS (and CR0.EM) clear, thus making feasible the
already previously stated permission for these functions to use some of
the XMM registers. Enforce this requirement (along with the connected
ones on FPU control word and MXCSR) by going through a full FPU save
cycle (if the FPU was dirty) in efi_rs_enter() (along with loading the
specified values into the other two registers).
Note that the UEFI spec mandates that extension registers other than
XMM ones (for our purposes all that get restored eagerly) are preserved
across runtime function calls, hence there's nothing we need to restore
in efi_rs_leave() (they do get saved, but just for simplicity's sake).
Malcolm Crossley [Tue, 24 Jun 2014 07:41:45 +0000 (09:41 +0200)]
IOMMU: prevent VT-d device IOTLB operations on wrong IOMMU
PCIe ATS allows for devices to contain IOTLBs, the VT-d code was iterating
around all ATS capable devices and issuing IOTLB operations for all IOMMUs,
even though each ATS device is only accessible via one particular IOMMU.
Issuing an IOMMU operation to a device not accessible via that IOMMU results
in an IOMMU timeout because the device does not reply. VT-d IOMMU timeouts
result in a Xen panic.
Therefore this bug prevents any Intel system with 2 or more ATS enabled IOMMUs,
each with an ATS device connected to them, from booting Xen.
The patch adds a IOMMU pointer to the ATS device struct so the VT-d code can
ensure it does not issue IOMMU ATS operations on the wrong IOMMU. A void
pointer has to be used because AMD and Intel IOMMU implementations do not have
a common IOMMU structure or indexing mechanism.
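The filtering this enables can be sketched as follows. struct ats_dev and flush_devices are hypothetical simplifications; the counter merely stands in for issuing the device IOTLB operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified ATS device record. */
struct ats_dev {
    int bdf;
    const void *iommu;   /* opaque: vendors share no common IOMMU type */
};

/* Visit only the devices reachable via the given IOMMU; return how many. */
static size_t flush_devices(const struct ats_dev *devs, size_t n,
                            const void *iommu)
{
    size_t flushed = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].iommu != iommu)
            continue;    /* device sits behind a different IOMMU */
        flushed++;       /* stand-in for the actual IOTLB operation */
    }
    return flushed;
}
```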
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 84c340ba4c3eb99278b6ba885616bb183b88ad67
master date: 2014-06-18 15:50:02 +0200
x86/mce: don't spam the console with "CPUx: Temperature z"
If the machine has been quite busy it ends up with these messages
printed on the hypervisor console:
(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature above threshold
(XEN) CPU0: Running in modulated clock mode
(XEN) CPU1: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
While the state changes are important, the non-altered state
information is not needed. As such add a latch mechanism to only print
the information if it has changed since the last update (and the
hardware doesn't properly suppress redundant notifications).
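Such a latch could be sketched like this. The enum, the per-CPU array and thermal_should_log are hypothetical names, and NR_CPUS is set to 4 purely for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4   /* illustrative */

enum thermal_state { THERMAL_UNKNOWN, THERMAL_NORMAL, THERMAL_HOT };

/* Last state reported per CPU; zero-initialised, i.e. THERMAL_UNKNOWN. */
static enum thermal_state last_state[NR_CPUS];

/* Returns true only when the state differs from the last one reported. */
static bool thermal_should_log(unsigned int cpu, enum thermal_state now)
{
    assert(cpu < NR_CPUS);
    if (last_state[cpu] == now)
        return false;   /* redundant notification: stay quiet */
    last_state[cpu] = now;
    return true;
}
```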
This was observed on Intel DQ67SW,
BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christoph Egger <chegger@amazon.de>
master commit: 323338f86fb6cd6f6dba4f59a84eed71b3552d21
master date: 2014-06-16 11:59:32 +0200
Jan Beulich [Tue, 24 Jun 2014 07:40:01 +0000 (09:40 +0200)]
x86/HVM: refine SMEP test in HVM_CR4_GUEST_RESERVED_BITS()
Andrew validly points out that the use of the macro on the restore path
can't rely on the CPUID bits for the guest already being in place (as
their setting by the tool stack in turn requires the other restore
operations already having taken place). And even worse, using
hvm_cpuid() is invalid here because that function assumes to be used in
the context of the vCPU in question.
Reverting to the behavior prior to the change from checking
cpu_has_sm?p to hvm_vcpu_has_sm?p() would break the other (non-restore)
use of the macro. So let's revert to the prior behavior only for the
restore path, by adding a respective second parameter to the macro.
Obviously the two cpu_has_* uses in the macro should really also be
converted to hvm_cpuid() based checks at least for the non-restore
path.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: David Vrabel <david.vrabel@citrix.com>
master commit: 584287380baf81e5acdd9dc7dfc7ffccd1e9a856
master date: 2014-06-10 13:12:05 +0200
Juergen Gross [Tue, 24 Jun 2014 07:38:48 +0000 (09:38 +0200)]
avoid crash on HVM domain destroy with PCI passthrough
c/s bac6334b5 "move domain to cpupool0 before destroying it" introduced a
problem when destroying a HVM domain with PCI passthrough enabled. The
moving of the domain to cpupool0 includes moving the pirqs to the cpupool0
cpus, but the event channel infrastructure already is unusable for the
domain. So just avoid moving pirqs for dying domains.
Roger Pau Monné [Tue, 24 Jun 2014 07:37:37 +0000 (09:37 +0200)]
x86: fix reboot/shutdown with running HVM guests
If there's a guest using VMX/SVM when the hypervisor shuts down, it
can lead to the following crash due to VMX/SVM functions being called
after hvm_cpu_down has been called. In order to prevent that, check in
{svm/vmx}_ctxt_switch_from that the cpu virtualization extensions are
still enabled.
Andrew Cooper [Tue, 24 Jun 2014 07:36:49 +0000 (09:36 +0200)]
x86/domctl: two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate
Interacting with the vcpu itself should be protected by vcpu_pause().
Buggy/naive toolstacks might encounter adverse interaction with a vcpu context
switch, or increase of xcr0_accum. There are no such problems with current
in-tree code.
Explicitly permit a NULL guest handle as being a request for size. It is the
prevailing Xen style, and without it, valgrind's ioctl handler is unable to
determine whether evc->buffer actually got written to.
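The "NULL buffer means size query" convention can be sketched as below. get_ext_state and the fixed 24-byte state are hypothetical, and the size check shown is an assumption, not the check the domctl performs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

static const unsigned char state[24] = { 1, 2, 3 };   /* fake saved state */

/*
 * buf == NULL: report the required size in *size and write nothing.
 * Otherwise: copy the state out if the caller's buffer is large enough.
 */
static int get_ext_state(void *buf, size_t *size)
{
    assert(size != NULL);
    if (buf == NULL) {
        *size = sizeof(state);
        return 0;
    }
    if (*size < sizeof(state))
        return -EINVAL;
    memcpy(buf, state, sizeof(state));
    *size = sizeof(state);
    return 0;
}
```

A caller would typically query the size first, allocate, and then call again with the buffer.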
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
# Commit 895661ae98f0249f50280b4acfb9dda70b76d7e9
# Date 2014-06-10 12:03:16 +0200
# Author Andrew Cooper <andrew.cooper3@citrix.com>
# Committer Jan Beulich <jbeulich@suse.com>
x86/domctl: further fix to XEN_DOMCTL_[gs]etvcpuextstate
Do not clobber errors from certain codepaths. Clobbering of -EINVAL from
failing "evc->size <= PV_XSAVE_SIZE(_xcr0_accum)" was a pre-existing bug.
However, clobbering -EINVAL/-EFAULT from the get codepath was a bug
unintentionally introduced by 090ca8c1 "x86/domctl: two functional fixes to
XEN_DOMCTL_[gs]etvcpuextstate".
Jan Beulich [Tue, 24 Jun 2014 07:34:57 +0000 (09:34 +0200)]
VT-d: honor APEI firmware-first mode in XSA-59 workaround code
When firmware-first mode is being indicated by firmware, we shouldn't
be modifying AER registers - these are considered to be owned by
firmware in that case. Violating this is being reported to result in
SMI storms. While circumventing the workaround means re-exposing
affected hosts to the XSA-59 issues, this in any event seems better
than not booting at all. Respective messages are being issued to the
log, so the situation can be diagnosed.
The basic building blocks were taken from Linux 3.15-rc. Note that
this includes a block of code enclosed in #ifdef CONFIG_X86_MCE - we
don't define that symbol, and that code also wouldn't build without
suitable machine check side code added; that should happen eventually,
but isn't subject of this change.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 1cc37ba8dbd89fb86dad3f6c78c3fba06019fe21
master date: 2014-06-05 17:49:14 +0200
Mukesh Rathor [Tue, 24 Jun 2014 07:33:18 +0000 (09:33 +0200)]
x86/PVH: avoid call to handle_mmio
handle_mmio() is currently unsafe for PVH guests. A call to it would
result in a call to vioapic_range that will crash Xen, since the vioapic
pointer in struct hvm_domain is not initialized for PVH guests.
However, one path exists for such a call. If a PVH guest, dom0 or domU,
unintentionally touches non-existent memory, an EPT violation occurs.
This results in an unconditional call to hvm_hap_nested_page_fault. In
that function, because get_gfn_type_access returns p2m_mmio_dm for
non-existent mfns by default, handle_mmio() gets called. This results
in a Xen crash instead of a guest crash. This patch addresses that.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Use < instead of <= (which I wrongly suggested), return -ENODATA
instead of -EINVAL, and make description match code.
A failure would result in log message like so-
(XEN) microcode: CPU0 update from revision 0x6000637 to 0x6000626 failed
^^^^^^^^^^^^^^^^^^^^^^
The above message has the revision numbers inverted. Fix this.
Julien Grall [Wed, 19 Mar 2014 15:43:38 +0000 (15:43 +0000)]
xen/arm: Use p2m_restore_state in construct_dom0
The address translation functions used while building dom0 rely on certain EL1
state being configured. In particular they are subject to the behaviour of
SCTLR_EL1.M (stage 1 MMU enabled).
The Xen (and Linux) boot protocols require that the kernel be entered with the
MMU disabled but they don't say anything explicitly about exception levels
other than the one which is active when entering the kernels. Arguably the
protocol could be said to apply to all exception levels but in any case we
should cope with this and setup the EL1 state as necessary.
Fu Wei discovered this when booting Xen from grub.efi over UEFI, it's not
clear whether grub or UEFI is responsible for leaving stage 1 MMU enabled.
Use directly the newly created function p2m_restore_state to retrieve a
correct EL1 state to translate an address.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Reported-by: Fu Wei <fu.wei@linaro.org> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit d6dd3a9ae7adead322e8ce96f83db96dce64c982)
[ ijc -- adjusted because this and 278283cd0b81 were backported in the opposite
order from their application to staging. The result is as if they had
been backported in the correct order. ]
Julien Grall [Wed, 19 Mar 2014 15:43:37 +0000 (15:43 +0000)]
xen/arm: Move p2m context save/restore in a separate function
Introduce p2m_{save,restore}_state to save/restore p2m context.
Both functions will take care of:
- VTTBR: contains the pointer to the domain P2M
- Update HCR_RW if the domain is 64 bit
- SCTLR: contains bit to know if the MMU is enabled or not
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 84ca4629d0aa71dc45c969f625d069373fb88828)
[ ijc -- s/is_32bit_domain/is_pv32_domain ]
Ian Campbell [Wed, 4 Jun 2014 13:58:58 +0000 (14:58 +0100)]
xen: arm: ensure we hold a reference to guest pages while we copy to/from them
This at once:
- prevents the page from being reassigned under our feet
- ensures that the domain owns the page, which stops a domain from giving a
grant mapping, MMIO region, other non-RAM as a hypercall input/output.
We need to hold the p2m lock while doing the lookup until we have the
reference.
This also requires that during domain 0 building current is set to an actual
dom0 vcpu, so take care of this at the same time as the p2m is temporarily
loaded.
Lastly when dumping the guest stack we need to make sure that the guest hasn't
pointed its sp off into the weeds and/or misaligned it, which could lead to
hypervisor traps. Solve this by using the new function and checking alignment
first.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- backported to 4.4, using p2m_load_VTTBR ]
Ian Campbell [Wed, 4 Jun 2014 13:58:56 +0000 (14:58 +0100)]
xen: arm: check permissions when copying to/from guest virtual addresses
In particular we need to make sure the guest has write permissions to buffers
which it passes as output buffers for hypercalls, otherwise the guest can
overwrite memory which it shouldn't be able to write (like r/o grant table
mappings).
This is XSA-98.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org>
Jan Beulich [Tue, 3 Jun 2014 14:09:55 +0000 (16:09 +0200)]
x86/HVM: eliminate vulnerabilities from hvm_inject_msi()
- pirq_info() returns NULL for a non-allocated pIRQ, and hence we
mustn't unconditionally de-reference it, and we need to invoke it
another time after having called map_domain_emuirq_pirq()
- don't use printk(), namely without XENLOG_GUEST, for error reporting
Ross Lagerwall [Tue, 3 Jun 2014 10:12:43 +0000 (12:12 +0200)]
timers: set the deadline more accurately
Program the timer to the deadline of the closest timer if it is further
than 50us ahead, otherwise set it 50us ahead. This way a single event
fires on time rather than 50us late (as it would have previously) while
still preventing too many timer wakeups in the case of having many
timers scheduled close together.
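The rule amounts to taking the later of the closest deadline and "now plus 50us". A minimal sketch, assuming nanosecond timestamps; next_deadline and TIMER_SLOP_NS are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define TIMER_SLOP_NS 50000   /* 50us, the figure quoted above */

/* Pick the value to program: never sooner than now + 50us. */
static int64_t next_deadline(int64_t now, int64_t closest)
{
    int64_t earliest;

    assert(now <= INT64_MAX - TIMER_SLOP_NS);   /* guard against overflow */
    earliest = now + TIMER_SLOP_NS;
    return closest > earliest ? closest : earliest;
}
```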
Jan Beulich [Tue, 3 Jun 2014 10:12:08 +0000 (12:12 +0200)]
x86: don't use VA for cache flush when also flushing TLB
Doing both flushes at once is a strong indication for the address
mapping either having got dropped (in which case the cache flush,
when done via INVLPG, would fault) or its physical address having
changed (in which case the cache flush would end up being done on the
wrong address range). There is no adverse effect (other than the
obvious performance one) using WBINVD in this case regardless of the
range's size; only map_pages_to_xen() uses combined flushes at present.
This problem was observed with the 2nd try backport of d6cb14b3 ("VT-d:
suppress UR signaling for desktop chipsets") to 4.2 (where ioremap()
needs to be replaced with set_fixmap_nocache(); the now commented out
__set_fixmap(, 0, 0) there to undo the mapping resulted in the first of
the above two scenarios).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 50df6f7429f73364bbddb0970a3a34faa01a7790
master date: 2014-05-28 09:51:07 +0200
Jan Beulich [Tue, 3 Jun 2014 10:11:29 +0000 (12:11 +0200)]
AMD IOMMU: don't free page table prematurely
iommu_merge_pages() still wants to look at the next level page table,
the TLB flush necessary before freeing too happens in that function,
and if it fails no free should happen at all. Hence the freeing must
be done after that function returned successfully, not before it's
being called.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 6b4d71d028f445cba7426a144751fddc8bfdd67b
master date: 2014-05-28 09:50:33 +0200
Jan Beulich [Tue, 3 Jun 2014 10:10:36 +0000 (12:10 +0200)]
VT-d: fix mask applied to DMIBAR in desktop chipset XSA-59 workaround
In commit ("VT-d: suppress UR signaling for desktop chipsets")
the mask applied to the value read from DMIBAR is too narrow, only the
comment accompanying it was correct. Fix that and tag the literal
number as "long" at once to avoid eventual compiler warnings.
The widest possible value so far is 39 bits; all chipsets covered here
but having less than this number of bits have the remaining bits marked
reserved (zero), and hence there's no need for making the mask chipset
specific.
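For illustration, a 39-bit-wide mask can be derived as follows. The assumption that the low 12 bits are excluded (a 4K-aligned base) is mine and is not stated above; bar_mask is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mask selecting address bits 12 .. addr_bits-1 of a BAR-style register.
 * Assumes a 4K-aligned base, so bits 0..11 are not part of the address.
 */
static uint64_t bar_mask(unsigned int addr_bits)
{
    assert(addr_bits > 12 && addr_bits < 64);
    return (UINT64_C(1) << addr_bits) - (UINT64_C(1) << 12);
}
```

For 39 bits this yields 0x7ffffff000, a constant too large for a 32-bit int, which is why a plain unsuffixed literal can draw compiler warnings.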
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: f8ecf31c31906552522c2a1b0d1cada07d78876e
master date: 2014-05-26 12:28:46 +0200
Jan Beulich [Tue, 3 Jun 2014 10:09:27 +0000 (12:09 +0200)]
ACPI/ERST: fix table mapping
acpi_get_table(), when executed before reaching SYS_STATE_active, will
return a mapping valid only until the next invocation of that function.
Consequently storing the returned pointer for later use is incorrect.
Copy the logic used in VT-d's DMAR handling.
Juergen Gross [Fri, 23 May 2014 13:20:02 +0000 (15:20 +0200)]
move domain to cpupool0 before destroying it
Currently when a domain is destroyed it is removed from the domain_list
before all of its resources, including the cpupool membership, are freed.
This can lead to a situation where the domain is still member of a cpupool
without for_each_domain_in_cpupool() (or even for_each_domain()) being
able to find it any more. This in turn can result in rejection of removing
the last cpu from a cpupool, because there seems to be still a domain in
the cpupool, even if it can't be found by scanning through all domains.
This situation can be avoided by moving the domain to be destroyed to
cpupool0 first and then remove it from this cpupool BEFORE deleting it from
the domain_list. As cpupool0 is always active and a domain without any cpupool
membership is implicitly regarded as belonging to cpupool0, this poses no
problem.
Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: bac6334b51d9bcfe57ecf4a4cb5288348fcf044a
master date: 2014-05-20 15:55:42 +0200
Jan Beulich [Fri, 23 May 2014 13:19:19 +0000 (15:19 +0200)]
VT-d: extend error report masking workaround to newer chipsets
Add two more PCI IDs to the set that has been taken care of with a
different workaround long before XSA-59, and (for consistency with the
newer workarounds) log a message here too.
Also move the function wide comment to the cases it applies to; this
should really have been done by d061d200 ("VT-d: suppress UR signaling
for server chipsets").
This is CVE-2013-3495 / XSA-59.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 04734664eb20c3bf239e473af182bb7ab901d779
master date: 2014-05-20 15:54:01 +0200
Jan Beulich [Fri, 23 May 2014 13:18:44 +0000 (15:18 +0200)]
VT-d: apply quirks at device setup time rather than only at boot
Accessing extended config space may not be possible at boot time, e.g.
when the memory space used by MMCFG is reserved only via ACPI tables,
but not in the E820/UEFI memory maps (which we need Dom0 to tell us
about). Consequently the change here still leaves the issue unaddressed
for systems where the extended config space remains inaccessible (due
to firmware bugs, i.e. not properly reserving the address space of
those regions).
With the respective messages now potentially getting logged more than
once, we ought to consider whether we should issue them only if we in
fact were required to do any masking (i.e. if the relevant mask bits
weren't already set).
This is CVE-2013-3495 / XSA-59.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com> Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 5786718fbaafbe47d72cc1512cd93de79b8fc2fa
master date: 2014-05-20 15:53:20 +0200
Kai Huang [Fri, 23 May 2014 13:17:56 +0000 (15:17 +0200)]
x86/MCE: bypass uninitialized vcpu in vMCE injection
Dom0 may bring up fewer vCPUs than the Xen hypervisor actually created for
it, and in this case, on Intel platforms, vMCE injection to dom0 will fail due
to injecting a vMCE into an uninitialized vcpu, and cause a dom0 crash.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Acked-by: Christoph Egger <chegger@amazon.de>
master commit: a07084525c126c596326dc1442dd218f522f51b4
master date: 2014-05-14 10:54:39 +0200
Edmund H White [Fri, 23 May 2014 13:17:21 +0000 (15:17 +0200)]
Nested VMX: load current_vmcs only when it exists
There may not be a valid VMCS on the current CPU, so only load it when it exists.
The original fix is from Edmund <edmund.h.white@intel.com>.
Signed-off-by: Edmund H White <edmund.h.white@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 99c03bc6a1f8c6722926d2db781ece045f9d09ae
master date: 2014-05-12 11:59:19 +0200
Boris Ostrovsky [Fri, 23 May 2014 13:14:23 +0000 (15:14 +0200)]
x86: use native RDTSC(P) execution when guest and host frequencies are the same
We should be able to continue using native RDTSC(P) execution on
HVM/PVH guests after migration if host and guest frequencies are
equal (this includes the case when the frequencies are made equal
by TSC scaling feature).
This also allows us to revert main part of commit 4aab59a3 (svm: Do not
intercept RDTSC(P) when TSC scaling is supported by hardware) which
was wrong: while RDTSC intercepts were disabled domain's vtsc could
still be set, leading to inconsistent view of guest's TSC.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 82713ec8d2b65d17f13e46a131e38bfe5baf8bd6
master date: 2014-04-22 12:07:37 +0200
Tim Deegan [Fri, 23 May 2014 13:11:51 +0000 (15:11 +0200)]
x86/hvm/rtc: always deassert the IRQ line when clearing REG_C.IRQF
Even in no-ack mode, there's no reason to leave the line asserted
after an explicit ack of the interrupt.
Furthermore, rtc_update_irq() is an unconditional noop having just cleared
REG_C.
Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6d27a537727ca933bfef8ba01bc65847dc97cee1
master date: 2014-02-25 09:30:21 +0100
Tim Deegan [Fri, 23 May 2014 13:10:42 +0000 (15:10 +0200)]
x86/hvm/rtc: inject RTC periodic interrupts from the vpt code
Let the vpt code drive the RTC's timer interrupts directly, as it does
for other periodic time sources, and fix up the register state in a
vpt callback when the interrupt is injected.
This fixes a hang seen on Windows 2003 in no-missed-ticks mode, where
when a tick was pending, the early callback from the VPT code would
always set REG_C.PF on every VMENTER; meanwhile the guest was in its
interrupt handler reading REG_C in a loop and waiting to see it clear.
One drawback is that a guest that attempts to suppress RTC periodic
interrupts by failing to read REG_C will receive up to 10 spurious
interrupts, even in 'strict' mode. However:
- since all previous RTC models have had this property (including
the current one, since 'no-ack' mode is hard-coded on) we're
pretty sure that all guests can handle this; and
- we're already playing some other interesting games with this
interrupt in the vpt code.
One other corner case: a guest that enables the PF timer interrupt,
masks the interrupt in the APIC and then polls REG_C looking for PF
will not see PF getting set. The more likely case of enabling the
timers and masking the interrupt with REG_B.PIE is already handled
correctly.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c7e35c6ec705d777c0a11124ec28876f1468f2c5
master date: 2014-02-25 09:29:26 +0100
Tim Deegan [Fri, 23 May 2014 13:08:48 +0000 (15:08 +0200)]
x86/hvm/rtc: don't run the vpt timer when !REG_B.PIE
If the guest has not asked for interrupts, don't run the vpt timer
to generate them. This is a prerequisite for a patch to simplify how
the vpt interacts with the RTC, and also gets rid of a timer series in
Xen in a case where it's unlikely to be needed.
Instead, calculate the correct value for REG_C.PF whenever REG_C is
read or PIE is enabled. This allows a guest to poll for the PF bit
while not asking for actual timer interrupts. Such a guest would no
longer get the benefit of the vpt's timer modes.
Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 4c15a82f034c9c2213a18b6320834f3906d00ba9
master date: 2014-02-25 09:26:45 +0100
Ian Campbell [Thu, 8 May 2014 15:13:55 +0000 (16:13 +0100)]
xen: arm: bitops take unsigned int
Xen bitmaps can be 4 rather than 8 byte aligned, so use the appropriate type.
Otherwise the compiler can generate unaligned 8 byte accesses and cause traps.
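A minimal sketch of the idea, not the actual Xen bitops: helpers that index the
bitmap in 32-bit words, so a bitmap that is only 4-byte aligned is never
touched with an 8-byte access. The names bitmap_set32 and bitmap_test32 are
hypothetical, and unlike the real ARM bitops this sketch is not atomic.

```c
#include <limits.h>
#include <stdint.h>

/* Number of bits in one bitmap word (32). */
#define BITS_PER_WORD32 (sizeof(uint32_t) * CHAR_BIT)

/* Set bit nr, touching only the 4-byte word that contains it. */
static inline void bitmap_set32(unsigned int nr, uint32_t *bitmap)
{
    bitmap[nr / BITS_PER_WORD32] |= UINT32_C(1) << (nr % BITS_PER_WORD32);
}

/* Return 1 if bit nr is set, 0 otherwise. */
static inline int bitmap_test32(unsigned int nr, const uint32_t *bitmap)
{
    return (bitmap[nr / BITS_PER_WORD32] >> (nr % BITS_PER_WORD32)) & 1u;
}
```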
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
(cherry picked from commit cd338e967c598bf747b03dcfd9d8d45dc40bac1a)
Ian Campbell [Thu, 17 Apr 2014 12:57:24 +0000 (13:57 +0100)]
xen: arm: fully implement multicall interface.
I'm not sure what I was smoking at the time of 5d74ad1a082e "xen: arm:
implement do_multicall_call for both 32 and 64-bit" but it is obviously
insufficient since it doesn't actually wire up the hypercall.
Before doing so we need to make the usual adjustments for ARM and turn the
unsigned longs into xen_ulong_t. There is no difference in the resulting
structure for x86.
There are knock on changes to the trace interface, but again they are nops on
x86.
For 32-bit ARM guests we require that the arguments which they pass to a
hypercall via a multicall do not use the upper bits of xen_ulong_t and kill
them if they violate this. This should ensure that no ABI surprises can be
silently lurking when running on a 32-bit hypervisor waiting to pounce when the
same kernel is run on a 64-bit hypervisor. Killing the guest is harsh but it
will be far easier to relax the restriction if it turns out to cause problems
than to tighten it up if we were lax to begin with.
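The restriction on 32-bit guests can be sketched as follows. This is an
illustration only: the function name multicall_args_fit_32bit is hypothetical,
and the real check lives in the hypervisor and crashes the offending domain
instead of returning a flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* On ARM, xen_ulong_t is 64 bits wide regardless of the guest's word size. */
typedef uint64_t xen_ulong_t;

/* Return true if no argument uses the upper 32 bits, i.e. the call is
 * acceptable from a 32-bit guest; false means the guest would be killed. */
static bool multicall_args_fit_32bit(const xen_ulong_t *args, size_t nr_args)
{
    for (size_t i = 0; i < nr_args; i++) {
        if (args[i] >> 32)
            return false;
    }
    return true;
}
```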
In the interests of clarity and always using explicitly sized types change the
unsigned int in the hypercall arguments to a uint32_t. There is no actual
change here on any platform.
We should consider backporting this to 4.4.1 in case a guest decides they want
to use a multicall in common code e.g. I suggested such a thing while
reviewing a netback change recently.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: keir@xen.org Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit f0dbdc628a0ecdc44d6afab28a9d5a52c996eec5)
[ ijc -- s/is_32bit_domain/is_pv32_domain ]
Ian Campbell [Wed, 9 Apr 2014 11:51:14 +0000 (12:51 +0100)]
tools: arm: improve placement of initial modules.
314c9815e2f5 "tools: implement initial ramdisk support for ARM." broke starting
guests with <= 128 MB ram by placing the boot modules (dtb and initrd)
immediately after the kernel in this case, running the risk of them being
overwritten. Instead place the modules at the end of RAM, as the hypervisor
does for dom0.
The hypervisor also falls back to placing things before the kernel as a last
resort before failing, so add that here too.
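A rough sketch of that placement order, under simplifying assumptions: the
function name place_modules and the PLACE_FAILED marker are hypothetical, the
kernel is assumed to lie entirely within RAM, and alignment is ignored. It is
not the libxc code.

```c
#include <stdint.h>

#define PLACE_FAILED UINT64_MAX  /* hypothetical "no room" marker */

/* Pick a load address for a blob of mod_size bytes in RAM
 * [ram_base, ram_end), avoiding the kernel at [kernel_base, kernel_end).
 * Preference: the end of RAM, then the start of RAM before the kernel. */
static uint64_t place_modules(uint64_t ram_base, uint64_t ram_end,
                              uint64_t kernel_base, uint64_t kernel_end,
                              uint64_t mod_size)
{
    /* First choice: the top of RAM, if that does not overlap the kernel. */
    if (mod_size <= ram_end - kernel_end)
        return ram_end - mod_size;

    /* Last resort: the gap between the start of RAM and the kernel. */
    if (mod_size <= kernel_base - ram_base)
        return ram_base;

    return PLACE_FAILED;
}
```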
Tested with the Debian installer initrd and guests of 96MB, 128MB, 256MB and
1GB. All work, also tested with 64MB but the installer doesn't run with so
little RAM (but our placement of the initrd is correct).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 6f4ff742a5caa411397fc38233f818e64a0c541c)
Ian Campbell [Fri, 4 Apr 2014 13:28:45 +0000 (14:28 +0100)]
tools: implement initial ramdisk support for ARM.
The ramdisk is passed to the kernel as a property in the chosen node of the
device tree. This is somewhat tricky since in order to place the ramdisk and
dtb in ram we first need to know the size of the dtb. So we initially create a
DTB with placeholders for the ramdisk and finalise the value (which doesn't
change the size) once we know where everything is.
Rename libxl__arch_domain_configure to libxl__arch_domain_init_hw_description to
better reflect its use and to be consistent with the new
libxl__arch_domain_finalise_hw_description.
The common xc_dom_build_image() function did not support explicit placement of
the ramdisk, instead passing 0 to xc_dom_alloc_segment, meaning "pick
somewhere". This change instead passes ramdisk_seg.vstart. If nothing has set
vstart then it will be zero because the entire dom struct is zeroed on
allocation in xc_dom_allocate(). Therefore there is no change to the behaviour
on x86. This is also consistent with how other segments (kernel, dtb) are
handled.
Furthermore if the ramdisk has been explicitly placed then xc_dom_build_image()
assumes that it is not to be decompressed (since that would muck up the sizings
used on placement).
With all that I'm able to boot a domain using the current Debian Jessie armhf
installer initrd and have it complete successfully.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- s/itherwise/otherwise and dropped bogus emacs magic change ]
(cherry picked from commit 314c9815e2f5dc8a9fec11e0cf9b49b16ed0e96b)
Jason Andryuk [Fri, 16 May 2014 20:41:17 +0000 (16:41 -0400)]
libxc: Free logger after printing error message
On error, PERROR calls the already destroyed logger, which can segfault.
Re-order the calls, so the logger is still available.
Signed-off-by: Jason Andryuk <andryuk@aero.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 86216963fd1d89883bb8120535704fdc79fdad50)
Ian Jackson [Wed, 19 Feb 2014 14:03:29 +0000 (14:03 +0000)]
libxl: Fix error path in libxl_device_events_handler
libxl_device_events_handler would fail to call AO_ABORT if it failed;
instead it would simply return rc. (This leaves the egc etc. from the
now-abolished stack frame potentially live, and leaves the ctx
locked.)
In xl, this is of no consequence, because xl will immediately exit in
this situation. This is very likely to be true in any other callers
(of which we don't know of any, anyway).
Coverity-ID: 1181840 Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> CC: coverity@xenproject.org
(cherry picked from commit c566ab68af7da089ae2b0ff664d02a93a0647584)
Andrew Cooper [Tue, 18 Feb 2014 15:59:05 +0000 (15:59 +0000)]
tools/libxl: Don't read off the end of tinfo[]
It is very common for BIOSes to advertise more cpus than are actually present
on the system, and mark some of them as offline. This is what Xen does to
allow for later CPU hotplug, and what BIOSes common to multiple different
systems do to save fully rewriting the MADT in memory.
An excerpt from `xl info` might look like:
...
nr_cpus : 2
max_cpu_id : 3
...
Which shows 4 CPUs in the MADT, but only 2 online (as this particular box is
the dual-core rather than the quad-core SKU of its particular brand).
Because of the way Xen exposes this information, a libxl_cputopology array is
bounded by 'nr_cpus', while cpu bitmaps are bounded by 'max_cpu_id + 1'.
The current libxl code has two places which erroneously assume that a
libxl_cputopology array is as long as the number of bits found in a cpu
bitmap, and valgrind complains:
==14961== Invalid read of size 4
==14961== at 0x407AB7F: libxl__get_numa_candidate (libxl_numa.c:230)
==14961== by 0x407030B: libxl__build_pre (libxl_dom.c:167)
==14961== by 0x406246F: libxl__domain_build (libxl_create.c:371)
...
==14961== Address 0x4324788 is 8 bytes after a block of size 24 alloc'd
==14961== at 0x402669D: calloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==14961== by 0x4075BB9: libxl__zalloc (libxl_internal.c:83)
==14961== by 0x4052F87: libxl_get_cpu_topology (libxl.c:4408)
==14961== by 0x407A899: libxl__get_numa_candidate (libxl_numa.c:342)
...
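The mismatch between the two bounds can be illustrated with a small sketch.
The struct and function names here are hypothetical stand-ins, not libxl
types: the loop walks a cpu bitmap of max_cpu_id + 1 bits but stops indexing
the topology array once it reaches nr_cpus.

```c
#include <stddef.h>

/* Stand-in for a topology entry; only the field used here. */
struct cpu_topo { int node; };

/* Count CPUs that are set in cpumap and sit on the given node.
 * cpumap has max_cpu_id + 1 bits; tinfo has only nr_cpus entries. */
static int count_cpus_on_node(const unsigned char *cpumap, int max_cpu_id,
                              const struct cpu_topo *tinfo, int nr_cpus,
                              int node)
{
    int count = 0;

    for (int cpu = 0; cpu <= max_cpu_id; cpu++) {
        if (cpu >= nr_cpus)
            break;              /* never read past the end of tinfo[] */
        if (!(cpumap[cpu / 8] & (1u << (cpu % 8))))
            continue;           /* cpu not present in the bitmap */
        if (tinfo[cpu].node == node)
            count++;
    }
    return count;
}
```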
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 81b03050485708698ce2245d9abefce07aafb704)
Andrew Cooper [Sat, 10 May 2014 01:18:33 +0000 (02:18 +0100)]
tools/pygrub: Fix error handling if no valid partitions are found
If no partitions at all are found, pygrub never creates the name 'fs',
resulting in a NameError indicating the lack of fs, rather than a
RuntimeError explaining that no partitions were found.
Set fs to None right at the start, and use the pythonic idiom "if fs is None:"
to protect against otherwise valid values for fs which compare equal to
0/False.
Reported-by: Sven Köhler <sven.koehler@gmail.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit d75215805ce6ed20b3807955fab6a7f7a3368bee)
Wei Liu [Wed, 9 Apr 2014 13:29:13 +0000 (14:29 +0100)]
libxl_json: remove extra "break"
... otherwise JSON array elements are not freed and memory is leaked.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 3eb54a2fdbc216b39dc2c0a86f11a32d4c838269)
do_tmem_destroy_pool is checking if pools == NULL. But, pools is a fixed
array.
Clang 3.5 will fail to compile xen/common/tmem.c with the following error:
tmem.c:1848:18: error: comparison of array 'client->pools' equal to a null
pointer is always false [-Werror,-Wtautological-pointer-compare]
if ( client->pools == NULL )
Roger Pau Monne [Tue, 11 Feb 2014 10:38:24 +0000 (11:38 +0100)]
tools: require OCaml version 3.09.3 or greater
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Don Slutz <dslutz@verizon.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@citrix.com>
(cherry picked from commit a37c389930936c3a9b1215c385fdd22854836871)
Ian Campbell [Wed, 14 May 2014 14:19:13 +0000 (15:19 +0100)]
tools: arm: remove code to check for a DTB appended to the kernel
The code to check for an appended DTB was confusing and unnecessary. Since we
know the size of the kernel binary passed to us we should just load the entire
thing into guest RAM (subject to the limits checks). Removing this code avoids
a whole raft of overflow and alignment issues.
We also need to validate the limits of the segment where we intend to load the
kernel to avoid overflow issues.
For ARM32 we control the load address, but we need to validate the size. The
entry point is only relevant within the guest so we don't need to worry about
that.
For ARM64 we need to validate both the load address (which is the same as the
entry point) and the size.
This is XSA-95.
Reported-by: Thomas Leonard <talex5@gmail.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Mon, 12 May 2014 15:19:01 +0000 (17:19 +0200)]
x86: fix guest CPUID handling
The way XEN_DOMCTL_set_cpuid got handled so far allowed for surprises
to the caller. With this set of operations
- set leaf A (using array index 0)
- set leaf B (using array index 1)
- clear leaf A (clearing array index 0)
- set leaf B (using array index 0)
- clear leaf B (clearing array index 0)
the entry for leaf B at array index 1 would still be in place, while
the caller would expect it to be cleared.
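One way to avoid that surprise, shown on a simplified model and not on the
actual d->arch.cpuid[] code: when setting a leaf, look for a slot that already
holds it before falling back to the first free slot. All names below
(cpuid_slot, slots_set, slots_clear, LEAF_UNUSED) are hypothetical.

```c
#include <stddef.h>

#define NR_SLOTS    4            /* hypothetical table size */
#define LEAF_UNUSED 0xffffffffu  /* hypothetical "slot is free" marker */

struct cpuid_slot {
    unsigned int leaf;
    unsigned int eax;
};

static struct cpuid_slot slots[NR_SLOTS];

/* Mark every slot as free. */
static void slots_init(void)
{
    for (size_t i = 0; i < NR_SLOTS; i++)
        slots[i].leaf = LEAF_UNUSED;
}

/* Set a leaf: prefer the slot already holding it, and only fall back to
 * the first free slot when no slot matches. Returns the index, or -1. */
static int slots_set(unsigned int leaf, unsigned int eax)
{
    int free_idx = -1;

    for (int i = 0; i < NR_SLOTS; i++) {
        if (slots[i].leaf == leaf) {
            slots[i].eax = eax;
            return i;
        }
        if (slots[i].leaf == LEAF_UNUSED && free_idx < 0)
            free_idx = i;
    }
    if (free_idx < 0)
        return -1;
    slots[free_idx].leaf = leaf;
    slots[free_idx].eax = eax;
    return free_idx;
}

/* Clear a leaf. Returns the index that was freed, or -1 if not present. */
static int slots_clear(unsigned int leaf)
{
    for (int i = 0; i < NR_SLOTS; i++) {
        if (slots[i].leaf == leaf) {
            slots[i].leaf = LEAF_UNUSED;
            return i;
        }
    }
    return -1;
}
```

Replaying the five operations listed above against this model, the second
"set leaf B" lands on index 1 again, so the final clear leaves no stale entry.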
While looking at the use sites of d->arch.cpuid[] I also noticed that
the allocation of the array needlessly uses the zeroing form - the
relevant fields of the array elements get set in a loop immediately
following the allocation.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 4c0ff6bd54b5a67f8f820f9ed0a89a79f1a26a1c
master date: 2014-05-02 12:09:03 +0200
Paul Durrant [Mon, 12 May 2014 15:18:08 +0000 (17:18 +0200)]
hvm_set_ioreq_page() releases wrong page in error path
The function calls prepare_ring_for_helper() to acquire a mapping for the
given gmfn, then checks (under lock) to see if the ioreq page is already
set up but, if it is, the function then releases the in-use ioreq page
mapping on the error path rather than the one it just acquired. This patch
fixes this bug.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 16e2a7596e9fc86881c73cef57602b2c88155528
master date: 2014-05-02 11:46:32 +0200
Jan Beulich [Mon, 12 May 2014 15:13:32 +0000 (17:13 +0200)]
VT-d: suppress UR signaling for desktop chipsets
Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the desktop chipsets dealt with here.
This is CVE-2013-3495 / XSA-59.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Don Dugger <donald.d.dugger@intel.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: d6cb14b34ffc2a830022d059f1aa22bf19dcf55f
master date: 2014-04-25 12:12:38 +0200
Jan Beulich [Mon, 12 May 2014 15:11:12 +0000 (17:11 +0200)]
VT-d: suppress UR signaling for server chipsets
Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the server chipsets dealt with here.
IDs 0xe00, 0xe01, and 0xe04 ... 0xe0b (Ivytown) aren't needed here -
Intel confirmed the issue to be fixed in hardware there.
This is CVE-2013-3495 / XSA-59.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Don Dugger <donald.d.dugger@intel.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: d061d200eb92bcb1d86f9b55c6de73e35ce63fdf
master date: 2014-04-25 12:11:55 +0200
Jan Beulich [Thu, 8 May 2014 08:02:24 +0000 (10:02 +0200)]
x86/nested HAP: don't BUG() on legitimate error
p2m_set_entry() can fail without there being a bug in the code - crash
the domain rather than the host in that case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
master commit: 1ca73aaf51eba14256794bf045c2eb01e88e1324
master date: 2014-04-14 12:50:56 +0200
Julien Grall [Thu, 1 May 2014 10:55:14 +0000 (11:55 +0100)]
xen/arm: Correctly save/restore CNTKCTL_EL1
CNTKCTL_EL1 is used by the guest to control access to the timer from
userspace. It therefore needs to be save/restored by Xen as part of
the VCPU state.
By default Linux on ARM64 exposes the timer to userspace. Furthermore on
ARM64, Linux provides helpers in a VDSO (gettimeofday/__do_get_tspec)
that use the timer counter. Conversely, during CPU bring up, Xen will
set CNTKCTL_EL1 to 0 (i.e. disallow timer access from userspace). As
a result, currently, if dom0 has 1 VCPU which is migrated to another
PCPU, init might crash.
Alternatively, a guest (malicious or not) might decide to disable
access to the timer from userspace. If the register is not
save/restored, when a DOM0 VCPU runs again, a similar crash would
result.
Also, drop CNTKCTL_EL1 initialization in init_timer_interrupt. Xen
should let the guest deal with this register.
This is XSA-91 / CVE-2014-3125.
Reported-by: Chen Baozi <baozich@gmail.com> Signed-off-by: Julien Grall <julien.grall@linaro.org> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 29 Apr 2014 13:27:22 +0000 (15:27 +0200)]
x86/HVM: restrict HVMOP_set_mem_type
Permitting arbitrary type changes here has the potential of creating
present P2M (and hence EPT/NPT/IOMMU) entries pointing to an invalid
MFN (INVALID_MFN truncated to the respective hardware structure field's
width). This would become a problem at the latest when something real sat
at the end of the physical address space; I'm suspecting though that
other things might break with such bogus entries.
Along with that drop a bogus (and otherwise becoming stale) log
message.
Afaict the similar operation in p2m_set_mem_access() is safe.
This is XSA-92.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 83bb5eb4d340acebf27b34108fb1dae062146a68
master date: 2014-04-29 15:11:31 +0200
On arm64, VFP instructions require vfpregs to be 128-bit aligned.
By chance, the field is already correctly aligned. If someone later adds a
new field before it, Xen will receive a data abort as soon as it
saves/restores VFP.
We are safe on arm32 as the only constraint there is 32-bit alignment.
Reported-by: Chen Baozi <baozich@gmail.com> Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 9b4e96724eeb916f2cd311d9133f00c216caa321)
Ian Campbell [Wed, 23 Apr 2014 15:32:45 +0000 (16:32 +0100)]
xen/arm: vgic: Check rank in GICD_ICFGR* emulation before locking
The function vgic_irq_rank may return NULL if the IRQ is not in the range handled
by the guest. This results in dereferencing a NULL pointer, which crashes
Xen.
I've checked the rest of the emulation and this is the only place where the lock
is taken before the rank is checked.
This is CVE-2014-2986 / XSA-94.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Reported-by: Thomas Leonard <talex5@gmail.com> Reviewed-by: Jan Beulich <JBeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 23 Apr 2014 14:25:21 +0000 (16:25 +0200)]
xen: x86 & generic: change to __builtin_prefetch()
Quoting Andi Kleen in Linux b483570a13be from 2007:
gcc 3.2+ supports __builtin_prefetch, so it's possible to use it on all
architectures. Change the generic fallback in linux/prefetch.h to use it
instead of noping it out. gcc should do the right thing when the
architecture doesn't support prefetching
Undefine the x86-64 inline assembler version and use the fallback.
ARM wants to use the builtins.
Fix a pair of spelling errors, one of which was from Lucas De Marchi in the
Linux tree.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: Keir Fraser <keir@xen.org> Acked-by: Tim Deegan <tim@xen.org>
master commit: 630017f420f111e0c0332dbd99df30ebb8fed207
master date: 2014-04-03 17:15:41 +0100
A guest is allowed to use the invalidate cache by set/way instruction (i.e. DCISW)
without any restriction. As the cache is shared with Xen, the guest can invalidate
an address that is in use by Xen. This may lead to a Xen crash because the memory
state is invalid.
Set the bit HCR.SWIO to upgrade the invalidate cache by set/way instruction to a
clean and invalidate.
This is CVE-2014-2915 / XSA-93.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Reported-by: Thomas Leonard <tal36@cam.ac.uk> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: Don't let the guest access the coprocessors registers
In Xen we only handle save/restore for coprocessor 10 and 11 (NEON). Other
coprocessors (0-9, 12-13) are currently exposed to the guest and may lead
to data being shared between guests.
Disable access to all coprocessors except 10 and 11 by setting HCPTR
correctly.
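As a worked example of the resulting trap mask, assuming the architectural
layout of the Hyp Coprocessor Trap Register (HCPTR) in which bit n is the trap
bit for coprocessor n, for n from 0 to 13. The macro and function names are
hypothetical, and other HCPTR bits that Xen may also set are left out.

```c
#include <stdint.h>

/* HCPTR.TCPn is bit n for coprocessors 0..13; a set bit traps guest access. */
#define HCPTR_TCP(n)   (UINT32_C(1) << (n))
#define HCPTR_TCP_ALL  UINT32_C(0x3fff)

/* Trap every coprocessor except 10 and 11, which stay accessible. */
static uint32_t hcptr_trap_all_but_cp10_cp11(void)
{
    return HCPTR_TCP_ALL & ~(HCPTR_TCP(10) | HCPTR_TCP(11));
}
```

The result is 0x33ff: bits 0-9 and 12-13 set, bits 10 and 11 clear.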
This is CVE-2014-2915 / XSA-93.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: Inject an undefined instruction when the coproc/sysreg is not handled
Currently Xen panics if it's unable to handle a coprocessor/sysreg instruction.
Replace this behavior by injecting an undefined instruction exception into the
faulting guest and logging the event if Xen is in debug mode.
This is CVE-2014-2915 / XSA-93.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Samuel Thibault [Fri, 21 Mar 2014 01:56:56 +0000 (02:56 +0100)]
PV-GRUB: fix blk access at end of disk
GRUB usually loads a whole disk track, even if that means going
beyond the end of the disk. We thus have to gracefully return an error,
instead of letting blkfront panic.
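A minimal sketch of such a range check, with a hypothetical function name and
signature. It is not the PV-GRUB code: it only shows a request being rejected
before it reaches the block frontend, written so that the arithmetic cannot
overflow.

```c
#include <stdint.h>

/* Return 0 if [sector, sector + count) lies within a disk of
 * total_sectors sectors, or -1 if any part of it is past the end. */
static int check_disk_range(uint64_t sector, uint64_t count,
                            uint64_t total_sectors)
{
    if (sector > total_sectors || count > total_sectors - sector)
        return -1;
    return 0;
}
```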
Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 51e18e41e39a682de5a2e60ad86048dc6344efec)