xenbits.xensource.com Git - xen.git/log
Jan Beulich [Tue, 12 Aug 2014 13:44:26 +0000 (15:44 +0200)]
x86/paging: make log-dirty operations preemptible

Both the freeing and the inspection of the bitmap are done in (nested)
loops. Besides having a rather high iteration count in general (which
would be covered by XSA-77), the number of non-trivial iterations these
loops need to perform is (indirectly) controllable by both the guest
they are for and any domain controlling the guest (including the one
running qemu for it).

This is CVE-2014-5146 / XSA-97.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 95e6d82224689fdfd967a093a4d69efc24c17e91
master date: 2014-08-12 15:30:11 +0200
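
A minimal sketch of the general preemption pattern the commit above describes, not the actual Xen code: a nested walk keeps its position in a caller-owned state structure and returns early when a preemption check fires, so the caller can call again to resume. All names here (`struct walk_state`, `walk_bitmap`, and the `budget` counter standing in for a real "preemption pending?" check) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resume state: where the nested walk stopped last time. */
struct walk_state {
    size_t outer;   /* next outer index to visit */
    size_t inner;   /* next inner index to visit */
    size_t visited; /* total leaf entries processed so far */
};

/*
 * Walk an n_outer x n_inner grid, doing at most `budget` units of work per
 * call. The budget stands in for a real preemption check.
 * Returns 1 if the walk has to be continued by calling again, 0 when done.
 */
int walk_bitmap(struct walk_state *st, size_t n_outer, size_t n_inner,
                size_t budget)
{
    assert(st != NULL);

    while (st->outer < n_outer) {
        while (st->inner < n_inner) {
            if (budget == 0)
                return 1;       /* preempted; position is already saved */
            st->visited++;      /* the non-trivial per-entry work goes here */
            st->inner++;
            budget--;
        }
        st->inner = 0;
        st->outer++;
    }
    return 0;
}
```

A caller would loop on the return value, doing other work (or returning to the scheduler) between calls, and start from a zero-initialised `struct walk_state`.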

Jan Beulich [Tue, 5 Aug 2014 11:41:22 +0000 (13:41 +0200)]
update Xen version to 4.4.1-rc2

Tamas K Lengyel [Mon, 28 Jul 2014 12:59:00 +0000 (14:59 +0200)]
x86/mem_event: fix regression affecting CR0 memory events

This is a patch repairing a regression in code previously functional in 4.1.x.
It appears that, during some refactoring work, the call to hvm_memory_event_cr0() was lost.

This function was originally called in mov_to_cr() of vmx.c, but the commit
http://xenbits.xen.org/hg/xen-unstable.hg/rev/1276926e3795 abstracted the
original code into generic functions up a level in hvm.c, dropping the call
in the process.

The same issue affected the CR3 and CR4 events, which were fixed in patch
http://xenbits.xensource.com/hg/xen-unstable.hg/rev/7ab899e46347.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 5d570c1d0274cac3b333ef378af3325b3b69905e
master date: 2014-07-23 18:05:11 +0200

Andrew Cooper [Mon, 28 Jul 2014 12:57:47 +0000 (14:57 +0200)]
x86/mem_event: prevent underflow of vcpu pause counts

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Tested-by: Aravindh Puthiyaparambil <aravindp@cisco.com>
master commit: 868d9b99b39c53dc1f6ae9bfd7b148c206fd7240
master date: 2014-07-23 18:08:04 +0200

Jan Beulich [Mon, 28 Jul 2014 12:56:33 +0000 (14:56 +0200)]
evtchn: eliminate 64k ports limitation

The introduction of FIFO event channels claimed to support over 100k
ports, but failed to widen a number of 16-bit variables/operations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 8f7f6ab879a9ad9d2bf66b8c6b46a0653086b79f
master date: 2014-04-11 11:25:56 +0200
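
To illustrate the kind of truncation this commit refers to, here is a small sketch (not taken from the Xen sources). The function name and the `MAX_PORTS` value are assumptions chosen only so that the limit lies above 65536.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical port limit for the sketch; only needs to exceed 65536. */
#define MAX_PORTS (1u << 17)

/* Storing a port number in a 16-bit variable silently drops the high bits. */
uint16_t port_as_u16(uint32_t port)
{
    assert(port < MAX_PORTS);
    return (uint16_t)port;
}
```

With this, port 100000 comes back as 34464 (100000 - 65536), which is why every variable on the path has to be wide enough for the advertised port range.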

Andrew Cooper [Mon, 28 Jul 2014 12:54:15 +0000 (14:54 +0200)]
x86/mem_event: validate the response vcpu_id before acting on it

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
master commit: ee75480b3c8856db9ef1aa45418f35ec0d78989d
master date: 2014-07-23 18:07:11 +0200

Juergen Gross [Mon, 28 Jul 2014 12:53:22 +0000 (14:53 +0200)]
avoid crash when doing shutdown with active cpupools

When shutting down the machine while there are cpus in a cpupool other than
Pool-0 a crash is triggered due to cpupool handling rejecting offlining the
non-boot cpus in other cpupools.

It is easy to detect this case and allow offlining those cpus.

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
master commit: 05377dede434c746e6708f055858378d20f619db
master date: 2014-07-23 18:03:19 +0200

Andrew Cooper [Mon, 28 Jul 2014 12:52:10 +0000 (14:52 +0200)]
properly reference count DOMCTL_{,un}pausedomain hypercalls

For safety reasons, c/s 6ae2df93c27 "mem_access: Add helper API to setup
ring and enable mem_access" has to pause the domain while it performs a set of
operations.

However without properly reference counted hypercalls, xc_mem_event_enable()
now unconditionally unpauses a previously paused domain.

To prevent toolstack software running wild, there is an arbitrary limit of 255
on the toolstack pause count.  This is high enough for several components of
the toolstack to safely use, but prevents over/underflow of d->pause_count.

The previous domain_{,un}pause_by_systemcontroller() functions are updated to
return an error code.  domain_pause_by_systemcontroller() is modified to have
a common stub and take a pause_fn pointer, allowing for both sync and nosync
domain pauses.  domain_pause_for_debugger() has a hand-rolled nosync pause
replaced with the new domain_pause_by_systemcontroller_nosync(), and has its
variables shuffled slightly to avoid rereading current multiple times.

Suggested-by: Don Slutz <dslutz@verizon.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
With a couple of formatting adjustments:
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/gdbsx: invert preconditions for XEN_DOMCTL_gdbsx_{,un}pausevcpu hypercalls

c/s 3eb1c708ab "properly reference count DOMCTL_{,un}pausedomain hypercalls"
accidentally inverted the use of d->controller_pause_count.

Revert back to how it was originally, i.e. the XEN_DOMCTL_gdbsx_{,un}pausevcpu
hypercalls are only valid for a domain already paused by the system controller.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3eb1c708ab0fe1067a436498a684907afa14dacf
master date: 2014-07-03 16:51:13 +0200
master commit: 680d79f10bb70691a9ae3b4a6a8b669e0f2837f6
master date: 2014-07-25 11:53:31 +0200
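
The reference counting with an upper bound described above can be sketched as follows. This is an illustration under assumptions, not the Xen implementation: `struct dom`, the function names and the plain -1 error value are hypothetical; only the limit of 255 comes from the commit text.

```c
#include <assert.h>
#include <stddef.h>

#define PAUSE_LIMIT 255 /* the arbitrary toolstack limit from the commit text */

/* Hypothetical stand-in for the per-domain toolstack pause counter. */
struct dom {
    unsigned int controller_pause_count;
};

/* Returns 0 on success, -1 if the limit would be exceeded. */
int pause_by_controller(struct dom *d)
{
    assert(d != NULL);
    if (d->controller_pause_count >= PAUSE_LIMIT)
        return -1;
    d->controller_pause_count++;
    return 0;
}

/* Returns 0 on success, -1 if there is no outstanding pause to undo. */
int unpause_by_controller(struct dom *d)
{
    assert(d != NULL);
    if (d->controller_pause_count == 0)
        return -1;
    d->controller_pause_count--;
    return 0;
}

/* The domain stays paused until every pause has been matched by an unpause. */
int is_paused_by_controller(const struct dom *d)
{
    return d->controller_pause_count > 0;
}
```

Two components that each pause the domain now both have to unpause it before it runs again, and neither overflow nor underflow of the counter is possible.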

Jan Beulich [Mon, 28 Jul 2014 12:50:45 +0000 (14:50 +0200)]
VT-d/ATS: correct and clean up dev_invalidate_iotlb()

While this was intended to only do cleanup (replace the two bogus
"ret |= " constructs, and a simple formatting correction), this now
also
- fixes the bit manipulations for size_order > 0
  a) correct an off-by-one in the use of size_order for shifting (till
     now double the requested size got invalidated)
  b) in fact setting bit 12 and up if necessary (without which too
     small a region might have got invalidated)
  c) making them capable of dealing with regions of 4Gb size and up
- corrects the return value handling, such that a later iteration's
  success won't clear an earlier iteration's error indication
- uses PCI_BDF2() instead of open coding it
- bails immediately on a bad passed-in invalidation type, rather than
  repeatedly printing the same message for each ATS-capable device, at
  once also no longer hiding that failure from the caller

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: fd33987ba27607c3cc7da258cf1d86d21beeb735
master date: 2014-06-30 15:57:40 +0200
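
The return value handling mentioned in the list above (a later success must not clear an earlier error) can be sketched generically like this. The helper names are hypothetical and the code is not the VT-d function itself.

```c
#include <assert.h>
#include <stddef.h>

/* Example operation for the sketch: fails (returns -1) on negative items. */
int fail_on_negative(int item)
{
    return item < 0 ? -1 : 0;
}

/*
 * Run `op` on each of `n` items. Keep going after a failure, but report the
 * first non-zero result so that a later success cannot overwrite it.
 */
int for_each_keep_error(int (*op)(int item), const int *items, size_t n)
{
    int rc = 0;

    assert(op != NULL);
    for (size_t i = 0; i < n; i++) {
        int ret = op(items[i]);

        if (ret != 0 && rc == 0)
            rc = ret;
    }
    return rc;
}
```

Assigning `rc = op(...)` on every iteration would have reported only the last item's result, which is the kind of bug the commit describes.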

Ian Campbell [Fri, 18 Jul 2014 13:08:11 +0000 (14:08 +0100)]
xen: arm: implement generic multiboot compatibility strings

This causes Xen to accept the more generic names specified in
http://wiki.xen.org/wiki/Xen_ARM_with_Virtualization_Extensions/Multiboot as of
2014-06-06.

These names are more generic than those proposed by Andre in
http://thread.gmane.org/gmane.linux.linaro.announce.boot/326 and those
used in earlier drafts of the /Multiboot wiki page.

This will allow bootloaders to not special case Xen (or at least to reduce
the amount which is required).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit a860dfeec090fe46d856b5d3fc6da28ccf7d1ba5)

Ian Campbell [Mon, 14 Jul 2014 16:39:10 +0000 (17:39 +0100)]
xen: arm: flush TLB after overwriting 1:1 mapping in boot page tables

Otherwise a stale TLB entry can shadow the fixmap/UART or DTB mapping

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit f1870804e58565399cd770e93f62e7ce57cd5231)

Ian Campbell [Mon, 14 Jul 2014 16:21:47 +0000 (17:21 +0100)]
xen: arm: use physical processor ID (MPIDR) when calling psci CPU_ON

Xen's logical CPU map can differ from the underlying layout.

Also add an emacs magic block to this file.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit d99504872178523b024bfb36736b158a42c5060e)

Julien Grall [Tue, 8 Jul 2014 17:04:48 +0000 (18:04 +0100)]
xen: Install arch-arm directory headers

Some headers for ARM are not installed on the host. This may cause external
software relying on Xen headers to fail to compile on ARM.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit f224b60791b539df66c1fe89d7866170653428b6)

Ian Jackson [Thu, 3 Jul 2014 12:58:23 +0000 (13:58 +0100)]
QEMU_TAG update - FIX!

My qemu push and tag update script had transposed a revision number
from qemu-xen-4.3-testing into xen-4.4-testing's Config.mk!

The script is now fixed, but we also need to fix the tree.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Ian Jackson [Wed, 2 Jul 2014 15:06:31 +0000 (16:06 +0100)]
QEMU_TAG update

Ian Campbell [Thu, 22 May 2014 09:46:37 +0000 (10:46 +0100)]
tools: arm: report an error if the guest RAM is too large

Due to the layout of the guest physical address space we cannot support more
than 768M of RAM before overrunning the area set aside for the grant table. Due
to the presence of the magic pages at the end of the RAM region guests are
actually limited to 767M.

Catch this case during domain build and fail gracefully instead of obscurely
later on.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 5a959f44ed03398870b6ec0dfebb59dcd5981f94)

Andrew Cooper [Wed, 18 Jun 2014 18:04:14 +0000 (19:04 +0100)]
tools/libxl: Fix free() of wild pointer in libxl__initiate_device_remove()

libxl__initiate_device_remove() had a preexisting error path issue where
libxl_dominfo_dispose() could be called on a libxl_dominfo object before it
had been initialised with libxl_dominfo_init().

This was safe until c/s ab44401 added the pointer ssid_label, which
libxl_dominfo_dispose() free()s.

Unconditionally initialise info in libxl__initiate_device_remove() before
taking an error path which will free it.

Coverity-ID: 1223212
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit ddb4aa5dfa13781e8f31ba20923c14c1a083ce83)
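
The init-before-dispose rule behind this fix can be sketched as below. The types and functions are hypothetical stand-ins, not the libxl ones: the point is only that the object is initialised before the first jump to the common exit path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an info struct that owns a heap pointer. */
struct info {
    char *label;
};

void info_init(struct info *p)
{
    assert(p != NULL);
    p->label = NULL;
}

void info_dispose(struct info *p)
{
    assert(p != NULL);
    free(p->label);     /* free(NULL) is a no-op, so an empty object is fine */
    p->label = NULL;
}

/* Initialise unconditionally, so every exit path may dispose safely. */
int remove_device(int fail_early)
{
    struct info inf;
    int rc = 0;

    info_init(&inf);
    if (fail_early) {
        rc = -1;
        goto out;
    }
    inf.label = malloc(8);
    if (inf.label == NULL)
        rc = -1;
 out:
    info_dispose(&inf);
    return rc;
}
```

Had `info_init()` been called only after the early error check, the `out` path would have passed an indeterminate pointer to `free()`.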

Dario Faggioli [Fri, 20 Jun 2014 14:09:00 +0000 (16:09 +0200)]
blktap2: Fix two 'maybe uninitialized' variables

which gcc 4.9.0 complains about, like this:

block-qcow.c: In function `get_cluster_offset':
block-qcow.c:431:3: error: `tmp_ptr' may be used uninitialized in this function
[-Werror=maybe-uninitialized]
   memcpy(tmp_ptr, l1_ptr, 4096);
   ^
block-qcow.c:606:7: error: `tmp_ptr2' may be used uninitialized in this
function [-Werror=maybe-uninitialized]
   if (write(s->fd, tmp_ptr2, 4096) != 4096) {
       ^
cc1: all warnings being treated as errors
/home/dario/Sources/xen/xen/xen.git/tools/blktap2/drivers/../../../tools/Rules.mk:89:
 recipe for target 'block-qcow.o' failed
make[5]: *** [block-qcow.o] Error 1

The proper behavior is to return upon allocation failure.
As for what to return, 0 seems the best option, judging by
both the function and its call sites.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 345e44a85d71a1a910385f33c7f1ba3683026d18)

Ian Campbell [Thu, 26 Jun 2014 16:30:14 +0000 (17:30 +0100)]
xen: arm: make sure gcc doesn't use floating-point registers on arm64

By using -mgeneral-regs-only, which is the AArch64 equivalent of
-msoft-float.

Otherwise gcc will corrupt the d* registers, which we don't save/restore when
trapping to/from the hypervisor.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit c0726c18e8135f87a5a5793d993d6bea1e3fa925)

Ian Campbell [Fri, 13 Jun 2014 12:15:04 +0000 (13:15 +0100)]
xen: arm: Implement OSDLR_EL1 trap as RAZ/WO.

I'm not sure why this wasn't added at the same time as the other
debug registers.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 92b0b80f0d2d29d0e80bf35ea839ed6058b7f0fa)

Ian Campbell [Thu, 26 Jun 2014 08:53:42 +0000 (09:53 +0100)]
xen: arm: take FIQ exceptions to Xen not guest by setting HCR_EL2.FMO

As with HCR_EL2.{IMO,AMO} we want to route FIQs to Xen not the guest. See ARM
ARM DDI 0406C.b B1.8.4.

So far none of the platforms which we support use FIQ for anything, but when we
end up supporting one it would be far better to surprise Xen with them than
whatever guest happens to be running...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 4bb74e39987b428429c2aacad7f59356d4942e39)

Conflicts:
xen/arch/arm/traps.c

Julien Grall [Thu, 24 Apr 2014 22:45:55 +0000 (23:45 +0100)]
xen/arm: Implement a dummy debug monitor for ARM32

XSA-93 (commit 0b18220 "xen/arm: Don't let guess access to Debug and Performance
Monitors registers") disabled access to the debug registers.

When CONFIG_PERF_EVENTS is enabled in the Linux kernel, it will try to
initialize the debug monitors. If an error occurs, Linux won't use this
feature.

This implementation makes Xen expose a minimal set of registers, which leads
the guest to conclude that HW debug won't work.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- s/DBGCR/DBGBCR/ to use correct register name ]
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 68c69978352adb5ab7c06598056f9eb88d7d6031)
[ ijc -- s/is_32bit_domain/is_pv32_domain/ ]

Julien Grall [Thu, 24 Apr 2014 22:45:54 +0000 (23:45 +0100)]
xen/arm: Implement a dummy Performance Monitor for ARM32

XSA-93 (commit 0b18220 "xen/arm: Don't let guess access to Debug and Performance
Monitor registers") disabled the Performance Monitor.

When CONFIG_PERF_EVENTS is enabled in the Linux kernel, the kernel will try
to access PMCR regardless of ID_DFR0 (which tells whether the Performance
Monitors Extension is implemented).

Therefore we tell the guest we have 0 counters. Unfortunately we must always
support PMCCNTR (the cycle counter): we just implement all PM registers as
RAZ/WI, which at least doesn't crash the kernel.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit aa0d443718372b46c432af7cb6274050cda32fc6)
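
For readers unfamiliar with the term, RAZ/WI (read-as-zero, write-ignored) emulation can be sketched as follows. This is a generic illustration with a hypothetical function name and calling convention, not the trap handler from the Xen tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Emulate one trapped register access as RAZ/WI. `guest_reg` points at the
 * guest register involved in the access: the destination for a read, the
 * source for a write. Returns 1 to signal that the access was handled.
 */
int emulate_raz_wi(int is_read, uint32_t *guest_reg)
{
    assert(guest_reg != NULL);
    if (is_read)
        *guest_reg = 0;     /* RAZ: the guest always reads zero */
    /* WI: on a write the guest's value is simply discarded */
    return 1;
}
```

The guest never faults on the access, but also never observes any state, which is enough for a kernel that merely probes the registers.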

Julien Grall [Tue, 17 Jun 2014 20:44:28 +0000 (21:44 +0100)]
xen/arm: Panic when we receive an unexpected trap

The current implementation of do_unexpected_trap() makes Xen spin forever
on the current physical CPU. This may stall guest VCPUs and produce
unhelpful messages (RCU stall...).

Usually, when Xen receives an unexpected trap, it means that something has
gone wrong either in the hypervisor or in the CPU. In this case we should
panic directly, which also stops the other CPUs.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 4f5ab681d208993f94553203f4be323b3c929070)

Ian Campbell [Wed, 25 Jun 2014 12:58:59 +0000 (13:58 +0100)]
xen: arm: initialise the grant_table_gpfn array on allocation

Avoids leaking uninitialised memory via the grant table setup hypercall.

This is XSA-101.

Reported-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

David Vrabel [Tue, 24 Jun 2014 12:22:54 +0000 (14:22 +0200)]
x86/nmi: be less verbose when testing the NMI watchdog

There's no need to print all the CPUs that are ok, only the ones that
got stuck.

The resulting output is either:

  Testing NMI watchdog on all CPUs: 1 4 6 stuck

or

  Testing NMI watchdog on all CPUs: ok

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: f64b1901564b6206dbbe946699619fcd22446de8
master date: 2014-05-15 15:32:36 +0200
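
The two output forms quoted above can be produced by a formatter along these lines. This is a sketch with hypothetical names that writes into a buffer instead of calling printk, not the code from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the watchdog self-test summary into `buf`: list only the CPUs whose
 * entry in `stuck[]` is non-zero, or append "ok" if none is stuck.
 * Returns the number of stuck CPUs.
 */
int format_nmi_test(char *buf, size_t len, const int *stuck,
                    unsigned int ncpus)
{
    int n_stuck = 0;
    size_t off;

    assert(buf != NULL && len > 0);
    snprintf(buf, len, "Testing NMI watchdog on all CPUs:");
    for (unsigned int cpu = 0; cpu < ncpus; cpu++) {
        if (!stuck[cpu])
            continue;
        n_stuck++;
        off = strlen(buf);  /* snprintf always NUL-terminates when len > 0 */
        snprintf(buf + off, len - off, " %u", cpu);
    }
    off = strlen(buf);
    snprintf(buf + off, len - off, "%s", n_stuck ? " stuck" : " ok");
    return n_stuck;
}
```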

Jan Beulich [Tue, 24 Jun 2014 12:22:09 +0000 (14:22 +0200)]
x86: Intel CPU family update

... according to revision 49 of the Intel SDM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 3385bf3aad25082e3bc6ab0e1cbd639512983e4d
master date: 2014-03-18 11:51:43 +0100

Ross Lagerwall [Tue, 24 Jun 2014 07:44:23 +0000 (09:44 +0200)]
x86/mwait_idle: fix trace output

Use the C-state's type when tracing, not its index since the index is
not set by the mwait_idle driver.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: d17ac1d5433ba2c25d7fab11baba59173e339896
master date: 2014-06-20 10:37:21 +0200

Jan Beulich [Tue, 24 Jun 2014 07:43:31 +0000 (09:43 +0200)]
VT-d/qinval: make local variable used for communication with IOMMU "volatile"

Without that there is - afaict - nothing preventing the compiler from
putting the variable into a register for the duration of the wait loop.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: ceec46c02074e1b2ade0b13c3c4a2f3942ae698c
master date: 2014-06-20 10:25:33 +0200
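
A sketch of the concern, under assumptions: the status word is written by an agent the compiler cannot see (here it would be the IOMMU), so the polling loop has to reload it on every iteration. `STATUS_DONE`, the function name and the spin limit are hypothetical, and the commit itself qualifies a local variable rather than a pointer parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATUS_DONE 1u  /* hypothetical completion value written by the device */

/*
 * Poll *status until it becomes STATUS_DONE or `max_spins` polls have been
 * made. The volatile qualifier forces a fresh load from memory on every
 * iteration; without it the compiler may keep the value in a register.
 * Returns 0 on completion, -1 on timeout.
 */
int wait_for_status(volatile uint32_t *status, unsigned long max_spins)
{
    assert(status != NULL);
    while (max_spins--) {
        if (*status == STATUS_DONE)
            return 0;
    }
    return -1;
}
```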

Jan Beulich [Tue, 24 Jun 2014 07:42:49 +0000 (09:42 +0200)]
x86/EFI: allow FPU/XMM use in runtime service functions

UEFI spec update 2.4B developed a requirement to enter runtime service
functions with CR0.TS (and CR0.EM) clear, thus making feasible the
already previously stated permission for these functions to use some of
the XMM registers. Enforce this requirement (along with the connected
ones on FPU control word and MXCSR) by going through a full FPU save
cycle (if the FPU was dirty) in efi_rs_enter() (along with loading the
specified values into the other two registers).

Note that the UEFI spec mandates that extension registers other than
XMM ones (for our purposes all that get restored eagerly) are preserved
across runtime function calls, hence there's nothing we need to restore
in efi_rs_leave() (they do get saved, but just for simplicity's sake).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: e0fe297dabc96d8161d568f19a99722c4739b9f9
master date: 2014-06-18 15:53:27 +0200

Malcolm Crossley [Tue, 24 Jun 2014 07:41:45 +0000 (09:41 +0200)]
IOMMU: prevent VT-d device IOTLB operations on wrong IOMMU

PCIe ATS allows devices to contain IOTLBs. The VT-d code was iterating
over all ATS capable devices and issuing IOTLB operations for all IOMMUs,
even though each ATS device is only accessible via one particular IOMMU.

Issuing an IOMMU operation to a device not accessible via that IOMMU results
in an IOMMU timeout because the device does not reply. VT-d IOMMU timeouts
result in a Xen panic.

Therefore this bug prevents any Intel system with 2 or more ATS enabled IOMMUs,
each with an ATS device connected to them, from booting Xen.

The patch adds an IOMMU pointer to the ATS device struct so the VT-d code can
ensure it does not issue IOMMU ATS operations on the wrong IOMMU. A void
pointer has to be used because AMD and Intel IOMMU implementations do not have
a common IOMMU structure or indexing mechanism.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 84c340ba4c3eb99278b6ba885616bb183b88ad67
master date: 2014-06-18 15:50:02 +0200
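
The filtering this patch introduces can be sketched as follows. The structure and function are hypothetical simplifications; only the idea of an opaque per-device IOMMU pointer is taken from the commit text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ATS device record; `iommu` is opaque, as in the commit text. */
struct ats_dev {
    const void *iommu;  /* the one IOMMU this device sits behind */
    int flushes;        /* how many IOTLB flushes were sent to the device */
};

/* Flush only the devices reachable through `iommu`; returns how many. */
int flush_dev_iotlbs(const void *iommu, struct ats_dev *devs, size_t n)
{
    int issued = 0;

    assert(iommu != NULL);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].iommu != iommu)
            continue;   /* a device behind another IOMMU would never reply */
        devs[i].flushes++;
        issued++;
    }
    return issued;
}
```

Comparing opaque pointers for identity is all the sketch needs, which mirrors why a void pointer suffices even though the vendor-specific IOMMU structures differ.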

Konrad Rzeszutek Wilk [Tue, 24 Jun 2014 07:40:56 +0000 (09:40 +0200)]
x86/mce: don't spam the console with "CPUx: Temperature z"

If the machine has been quite busy it ends up with these messages
printed on the hypervisor console:

(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU0: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal
(XEN) CPU1: Temperature/speed normal
(XEN) CPU0: Temperature above threshold
(XEN) CPU0: Running in modulated clock mode
(XEN) CPU1: Temperature/speed normal
(XEN) CPU2: Temperature/speed normal
(XEN) CPU3: Temperature/speed normal

While the state changes are important, the non-altered state
information is not needed. As such add a latch mechanism to only print
the information if it has changed since the last update (and the
hardware doesn't properly suppress redundant notifications).

This was observed on Intel DQ67SW,
BIOS SWQ6710H.86A.0066.2012.1105.1504 11/05/2012

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
master commit: 323338f86fb6cd6f6dba4f59a84eed71b3552d21
master date: 2014-06-16 11:59:32 +0200
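
A latch of the kind described above might look like the following sketch. The state enum, the fixed `NR_CPUS_SKETCH` and the function name are assumptions made for the illustration, not identifiers from the Xen tree.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4   /* hypothetical CPU count for the sketch */

enum thermal_state { THERMAL_UNKNOWN, THERMAL_NORMAL, THERMAL_HOT };

/* Last state reported per CPU; zero-initialised, i.e. THERMAL_UNKNOWN. */
static enum thermal_state last_state[NR_CPUS_SKETCH];

/*
 * Latch the new state for `cpu`. Returns 1 if it differs from the previously
 * latched state (so a message should be printed), 0 if nothing changed.
 */
int thermal_should_log(unsigned int cpu, enum thermal_state now)
{
    assert(cpu < NR_CPUS_SKETCH);
    if (last_state[cpu] == now)
        return 0;
    last_state[cpu] = now;
    return 1;
}
```

With this in place, a run of identical "Temperature/speed normal" notifications for one CPU yields a single message, while each real transition is still logged.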

Jan Beulich [Tue, 24 Jun 2014 07:40:01 +0000 (09:40 +0200)]
x86/HVM: refine SMEP test in HVM_CR4_GUEST_RESERVED_BITS()

Andrew validly points out that the use of the macro on the restore path
can't rely on the CPUID bits for the guest already being in place (as
their setting by the tool stack in turn requires the other restore
operations already having taken place). And even worse, using
hvm_cpuid() is invalid here because that function assumes to be used in
the context of the vCPU in question.

Reverting to the behavior prior to the change from checking
cpu_has_sm?p to hvm_vcpu_has_sm?p() would break the other (non-restore)
use of the macro. So let's revert to the prior behavior only for the
restore path, by adding a respective second parameter to the macro.

Obviously the two cpu_has_* uses in the macro should really also be
converted to hvm_cpuid() based checks at least for the non-restore
path.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: David Vrabel <david.vrabel@citrix.com>
master commit: 584287380baf81e5acdd9dc7dfc7ffccd1e9a856
master date: 2014-06-10 13:12:05 +0200

Juergen Gross [Tue, 24 Jun 2014 07:38:48 +0000 (09:38 +0200)]
avoid crash on HVM domain destroy with PCI passthrough

c/s bac6334b5 "move domain to cpupool0 before destroying it" introduced a
problem when destroying a HVM domain with PCI passthrough enabled. The
moving of the domain to cpupool0 includes moving the pirqs to the cpupool0
cpus, but the event channel infrastructure already is unusable for the
domain. So just avoid moving pirqs for dying domains.

Signed-off-by: Juergen Gross <jgross@suse.com>
master commit: b9ae60907e6dbc686403e52a7e61a6f856401a1b
master date: 2014-06-10 12:04:08 +0200

Roger Pau Monné [Tue, 24 Jun 2014 07:37:37 +0000 (09:37 +0200)]
x86: fix reboot/shutdown with running HVM guests

If there's a guest using VMX/SVM when the hypervisor shuts down, it
can lead to the following crash due to VMX/SVM functions being called
after hvm_cpu_down has been called. In order to prevent that, check in
{svm/vmx}_ctxt_switch_from that the cpu virtualization extensions are
still enabled.

(XEN) Domain 0 shutdown: rebooting machine.
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ----[ Xen-4.5-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    0
(XEN) RIP:    e008:[<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
...
(XEN) Xen call trace:
(XEN)    [<ffff82d0801d90ce>] vmx_ctxt_switch_from+0x1e/0x14c
(XEN)    [<ffff82d08015d129>] __context_switch+0x127/0x462
(XEN)    [<ffff82d080160acf>] __sync_local_execstate+0x6a/0x8b
(XEN)    [<ffff82d080160af9>] sync_local_execstate+0x9/0xb
(XEN)    [<ffff82d080161728>] map_domain_page+0x88/0x4de
(XEN)    [<ffff82d08014e721>] map_vtd_domain_page+0xd/0xf
(XEN)    [<ffff82d08014cda2>] io_apic_read_remap_rte+0x158/0x29f
(XEN)    [<ffff82d0801448a8>] iommu_read_apic_from_ire+0x27/0x29
(XEN)    [<ffff82d080165625>] io_apic_read+0x17/0x65
(XEN)    [<ffff82d080166143>] __ioapic_read_entry+0x38/0x61
(XEN)    [<ffff82d080166aa8>] clear_IO_APIC_pin+0x1a/0xf3
(XEN)    [<ffff82d080166bae>] clear_IO_APIC+0x2d/0x60
(XEN)    [<ffff82d080166f63>] disable_IO_APIC+0xd/0x81
(XEN)    [<ffff82d08018228b>] smp_send_stop+0x58/0x68
(XEN)    [<ffff82d080181aa7>] machine_restart+0x80/0x20a
(XEN)    [<ffff82d080181c3c>] __machine_restart+0xb/0xf
(XEN)    [<ffff82d080128fb9>] smp_call_function_interrupt+0x99/0xc0
(XEN)    [<ffff82d080182330>] call_function_interrupt+0x33/0x43
(XEN)    [<ffff82d08016bd89>] do_IRQ+0x9e/0x63a
(XEN)    [<ffff82d08016406f>] common_interrupt+0x5f/0x70
(XEN)    [<ffff82d0801a8600>] mwait_idle+0x29c/0x2f7
(XEN)    [<ffff82d08015cf67>] idle_loop+0x58/0x76
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'read_cr0() & X86_CR0_TS' failed at vmx.c:644
(XEN) ****************************************

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 39ede234d1fd683430ffb1784d6d35b096f16457
master date: 2014-06-05 17:53:35 +0200

Andrew Cooper [Tue, 24 Jun 2014 07:36:49 +0000 (09:36 +0200)]
x86/domctl: two functional fixes to XEN_DOMCTL_[gs]etvcpuextstate

Interacting with the vcpu itself should be protected by vcpu_pause().
Buggy/naive toolstacks might encounter adverse interaction with a vcpu context
switch, or increase of xcr0_accum.  There are no such problems with current
in-tree code.

Explicitly permit a NULL guest handle as being a request for size.  It is the
prevailing Xen style, and without it, valgrind's ioctl handler is unable to
determine whether evc->buffer actually got written to.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
# Commit 895661ae98f0249f50280b4acfb9dda70b76d7e9
# Date 2014-06-10 12:03:16 +0200
# Author Andrew Cooper <andrew.cooper3@citrix.com>
# Committer Jan Beulich <jbeulich@suse.com>
x86/domctl: further fix to XEN_DOMCTL_[gs]etvcpuextstate

Do not clobber errors from certain codepaths.  Clobbering of -EINVAL from
failing "evc->size <= PV_XSAVE_SIZE(_xcr0_accum)" was a pre-existing bug.

However, clobbering -EINVAL/-EFAULT from the get codepath was a bug
unintentionally introduced by 090ca8c1 "x86/domctl: two functional fixes to
XEN_DOMCTL_[gs]etvcpuextstate".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 090ca8c155b7321404ea7713a28aaedb7ac4fffd
master date: 2014-06-05 17:52:57 +0200
master commit: 895661ae98f0249f50280b4acfb9dda70b76d7e9
master date: 2014-06-10 12:03:16 +0200
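
The "NULL handle means size request" convention mentioned above can be sketched in plain C like this. The function and its parameters are hypothetical; a real hypercall would use guest handles and copy helpers rather than raw pointers and memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `state_len` bytes of state into `buf`. A NULL `buf` is a size query:
 * only *size is written. Returns 0 on success, -1 if the buffer described by
 * *size is too small.
 */
int get_ext_state(void *buf, size_t *size, const void *state,
                  size_t state_len)
{
    assert(size != NULL && state != NULL);
    if (buf == NULL) {
        *size = state_len;      /* size request only, nothing is copied */
        return 0;
    }
    if (*size < state_len)
        return -1;
    memcpy(buf, state, state_len);
    *size = state_len;
    return 0;
}
```

A caller first passes NULL to learn the size, allocates, then calls again with the buffer. Because the query path provably writes nothing to `buf`, a tool inspecting the call can tell the two cases apart.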

Jan Beulich [Tue, 24 Jun 2014 07:34:57 +0000 (09:34 +0200)]
VT-d: honor APEI firmware-first mode in XSA-59 workaround code

When firmware-first mode is being indicated by firmware, we shouldn't
be modifying AER registers - these are considered to be owned by
firmware in that case. Violating this is being reported to result in
SMI storms. While circumventing the workaround means re-exposing
affected hosts to the XSA-59 issues, this in any event seems better
than not booting at all. Respective messages are being issued to the
log, so the situation can be diagnosed.

The basic building blocks were taken from Linux 3.15-rc. Note that
this includes a block of code enclosed in #ifdef CONFIG_X86_MCE - we
don't define that symbol, and that code also wouldn't build without
suitable machine check side code added; that should happen eventually,
but isn't the subject of this change.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 1cc37ba8dbd89fb86dad3f6c78c3fba06019fe21
master date: 2014-06-05 17:49:14 +0200

Mukesh Rathor [Tue, 24 Jun 2014 07:33:18 +0000 (09:33 +0200)]
x86/PVH: avoid call to handle_mmio

handle_mmio() is currently unsafe for PVH guests. A call to it would
result in a call to vioapic_range() that will crash Xen, since the vioapic
pointer in struct hvm_domain is not initialized for PVH guests.

However, one path exists for such a call. If a PVH guest, dom0 or domU,
unintentionally touches non-existent memory, an EPT violation occurs.
This results in an unconditional call to hvm_hap_nested_page_fault(). In
that function, because get_gfn_type_access() returns p2m_mmio_dm for
non-existent mfns by default, handle_mmio() gets called. This would crash
Xen instead of the guest. This patch addresses that.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
master commit: 7c4870915c2d50acbc66347a532e33b452f64f17
master date: 2014-06-04 11:27:50 +0200

Malcolm Crossley [Tue, 24 Jun 2014 07:31:57 +0000 (09:31 +0200)]
ACPI: Prevent acpi_table_entries from falling into a infinite loop

If a buggy BIOS programs an ACPI table with too small an entry length
then acpi_table_entries gets stuck in an infinite loop.

To aid debugging, report the error and exit the loop.
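
As a rough illustration of the guard described above, here is a minimal sketch of a table walker that refuses to continue when an entry's length is smaller than its own header. The `entry_hdr` layout and the `count_entries` name are assumptions made for this example, not the actual Xen or Linux code; only the `<` comparison and the -ENODATA return value come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subtable header: a type byte followed by a length byte. */
struct entry_hdr {
    uint8_t type;
    uint8_t length;   /* length of the whole entry, header included */
};

/*
 * Count the entries in buf[0..size). Returns the count, or -ENODATA if an
 * entry claims a length shorter than its own header. A zero length in
 * particular would keep the cursor from ever advancing.
 */
static int count_entries(const uint8_t *buf, size_t size)
{
    size_t off = 0;
    int count = 0;

    assert(buf != NULL);

    while (off + sizeof(struct entry_hdr) <= size) {
        const struct entry_hdr *e = (const struct entry_hdr *)(buf + off);

        if (e->length < sizeof(*e)) {
            /* Report the problem, then leave the loop. */
            fprintf(stderr, "entry %d at offset %zu: invalid length %u\n",
                    count, off, (unsigned int)e->length);
            return -ENODATA;
        }
        off += e->length;
        count++;
    }
    return count;
}
```

Without the length check, a zero-length entry would leave `off` unchanged and the loop would never terminate.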

Based on Linux kernel commit 369d913b242cae2205471b11b6e33ac368ed33ec

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Use < instead of <= (which I wrongly suggested), return -ENODATA
instead of -EINVAL, and make description match code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 9c1e8cae657bc13e8b1ddeede17603d77f3ad341
master date: 2014-06-04 11:26:15 +0200

10 years agox86, amd_ucode: flip revision numbers in printk
Aravind Gopalakrishnan [Tue, 24 Jun 2014 07:30:28 +0000 (09:30 +0200)]
x86, amd_ucode: flip revision numbers in printk

A failure would result in log message like so-
(XEN) microcode: CPU0 update from revision 0x6000637 to 0x6000626 failed
                                           ^^^^^^^^^^^^^^^^^^^^^^
The above message has the revision numbers inverted. Fix this.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
master commit: 071a4c70a634f7d4f74cde4086ff3202968538c9
master date: 2014-06-02 10:19:27 +0200

10 years agopage-alloc: scrub pages used by hypervisor upon freeing
Jan Beulich [Tue, 17 Jun 2014 14:01:35 +0000 (16:01 +0200)]
page-alloc: scrub pages used by hypervisor upon freeing

... unless they're part of a fully separate pool (and hence can't ever
be used for guest allocations).

This is CVE-2014-4021 / XSA-100.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 4bd78937ec324bcef4e29ef951e0ff9815770de1
master date: 2014-06-17 15:21:10 +0200

10 years agoupdate Xen version to 4.4.1-rc1 4.4.1-rc1
Jan Beulich [Tue, 17 Jun 2014 14:00:42 +0000 (16:00 +0200)]
update Xen version to 4.4.1-rc1

10 years agoxen: arm: correct backport of 84ca4629d0aa
Ian Campbell [Thu, 5 Jun 2014 13:02:42 +0000 (14:02 +0100)]
xen: arm: correct backport of 84ca4629d0aa

This hunk from "Move p2m context save/restore in a separate
function" was accidentally dropped in the backport done to 4.4 as
commit 9ca83a4bd3bf.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

10 years agoxen/arm: Use p2m_restore_state in construct_dom0
Julien Grall [Wed, 19 Mar 2014 15:43:38 +0000 (15:43 +0000)]
xen/arm: Use p2m_restore_state in construct_dom0

The address translation functions used while building dom0 rely on certain EL1
state being configured. In particular they are subject to the behaviour of
SCTLR_EL1.M (stage 1 MMU enabled).

The Xen (and Linux) boot protocols require that the kernel be entered with the
MMU disabled but they don't say anything explicitly about exception levels
other than the one which is active when entering the kernels. Arguably the
protocol could be said to apply to all exception levels but in any case we
should cope with this and setup the EL1 state as necessary.

Fu Wei discovered this when booting Xen from grub.efi over UEFI, it's not
clear whether grub or UEFI is responsible for leaving stage 1 MMU enabled.

Use directly the newly created function p2m_restore_state to retrieve a
correct EL1 state to translate an address.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reported-by: Fu Wei <fu.wei@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit d6dd3a9ae7adead322e8ce96f83db96dce64c982)
[ ijc -- adjusted because this and 278283cd0b81 were backported in the opposite
  order from their application to staging. The result is as if they had
  been backported in the correct order. ]

10 years agoxen/arm: Move p2m context save/restore in a separate function
Julien Grall [Wed, 19 Mar 2014 15:43:37 +0000 (15:43 +0000)]
xen/arm: Move p2m context save/restore in a separate function

Introduce p2m_{save,restore}_state to save/restore p2m context.

Both functions will take care of:
    - VTTBR: contains the pointer to the domain P2M
    - Update HCR_RW if the domain is 64 bit
    - SCTLR: contains bit to know if the MMU is enabled or not

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 84ca4629d0aa71dc45c969f625d069373fb88828)
[ ijc -- s/is_32bit_domain/is_pv32_domain ]

10 years agoxen: arm: ensure we hold a reference to guest pages while we copy to/from them
Ian Campbell [Wed, 4 Jun 2014 13:58:58 +0000 (14:58 +0100)]
xen: arm: ensure we hold a reference to guest pages while we copy to/from them

This at once:
 - prevents the page from being reassigned under our feet
 - ensures that the domain owns the page, which stops a domain from giving a
   grant mapping, MMIO region, other non-RAM as a hypercall input/output.

We need to hold the p2m lock while doing the lookup until we have the
reference.

This also requires that during domain 0 building current is set to an actual
dom0 vcpu, so take care of this at the same time as the p2m is temporarily
loaded.

Lastly when dumping the guest stack we need to make sure that the guest hasn't
pointed its sp off into the weeds and/or misaligned it, which could lead to
hypervisor traps. Solve this by using the new function and checking alignment
first.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- backported to 4.4, using p2m_load_VTTBR ]

10 years agoxen: arm: check permissions when copying to/from guest virtual addresses
Ian Campbell [Wed, 4 Jun 2014 13:58:56 +0000 (14:58 +0100)]
xen: arm: check permissions when copying to/from guest virtual addresses

In particular we need to make sure the guest has write permissions to buffers
which it passes as output buffers for hypercalls, otherwise the guest can
overwrite memory which it shouldn't be able to write (like r/o grant table
mappings).

This is XSA-98.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>

10 years agox86/HVM: eliminate vulnerabilities from hvm_inject_msi()
Jan Beulich [Tue, 3 Jun 2014 14:09:55 +0000 (16:09 +0200)]
x86/HVM: eliminate vulnerabilities from hvm_inject_msi()

- pirq_info() returns NULL for a non-allocated pIRQ, and hence we
  mustn't unconditionally de-reference it, and we need to invoke it
  another time after having called map_domain_emuirq_pirq()
- don't use printk(), namely without XENLOG_GUEST, for error reporting

This is XSA-96.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 6f4cc0ac41625a054861b417ea1fc3ab88e2e40a
master date: 2014-06-03 15:17:14 +0200

10 years agotimers: set the deadline more accurately
Ross Lagerwall [Tue, 3 Jun 2014 10:12:43 +0000 (12:12 +0200)]
timers: set the deadline more accurately

Program the timer to the deadline of the closest timer if it is further
than 50us ahead, otherwise set it 50us ahead.  This way a single event
fires on time rather than 50us late (as it would have previously) while
still preventing too many timer wakeups in the case of having many
timers scheduled close together.

(where 50us is the timer_slop)
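
The rule above amounts to taking the later of "closest timer deadline" and "now plus slop". A minimal sketch, assuming nanosecond timestamps; `pick_deadline` and the `MICROSECS` helper as written here are illustrative and not taken from the Xen timer code:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t s_time_t;   /* nanoseconds; type name used for illustration */

#define MICROSECS(us) ((s_time_t)(us) * 1000)

/*
 * Choose the value to program into the hardware timer: the closest
 * pending deadline if it lies more than 'slop' ahead of 'now',
 * otherwise 'now + slop'.
 */
static s_time_t pick_deadline(s_time_t now, s_time_t closest, s_time_t slop)
{
    s_time_t earliest = now + slop;

    assert(slop >= 0);
    return closest > earliest ? closest : earliest;
}
```

With a 50us slop, a lone timer due in 200us is programmed for exactly 200us, while a burst of timers due within the next 50us is coalesced into a single wakeup at 50us.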

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 054b6dfb61eab00d86ddd5d0ac508f5302da0d52
master date: 2014-05-28 10:07:50 +0200

10 years agox86: don't use VA for cache flush when also flushing TLB
Jan Beulich [Tue, 3 Jun 2014 10:12:08 +0000 (12:12 +0200)]
x86: don't use VA for cache flush when also flushing TLB

Doing both flushes at once is a strong indication that the address
mapping has either been dropped (in which case the cache flush,
when done via INVLPG, would fault) or its physical address having
changed (in which case the cache flush would end up being done on the
wrong address range). There is no adverse effect (other than the
obvious performance one) using WBINVD in this case regardless of the
range's size; only map_pages_to_xen() uses combined flushes at present.

This problem was observed with the 2nd try backport of d6cb14b3 ("VT-d:
suppress UR signaling for desktop chipsets") to 4.2 (where ioremap()
needs to be replaced with set_fixmap_nocache(); the now commented out
__set_fixmap(, 0, 0) there to undo the mapping resulted in the first of
the above two scenarios).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 50df6f7429f73364bbddb0970a3a34faa01a7790
master date: 2014-05-28 09:51:07 +0200

10 years agoAMD IOMMU: don't free page table prematurely
Jan Beulich [Tue, 3 Jun 2014 10:11:29 +0000 (12:11 +0200)]
AMD IOMMU: don't free page table prematurely

iommu_merge_pages() still wants to look at the next level page table,
the TLB flush necessary before freeing too happens in that function,
and if it fails no free should happen at all. Hence the freeing must
be done after that function returned successfully, not before it's
being called.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 6b4d71d028f445cba7426a144751fddc8bfdd67b
master date: 2014-05-28 09:50:33 +0200

10 years agoVT-d: fix mask applied to DMIBAR in desktop chipset XSA-59 workaround
Jan Beulich [Tue, 3 Jun 2014 10:10:36 +0000 (12:10 +0200)]
VT-d: fix mask applied to DMIBAR in desktop chipset XSA-59 workaround

In commit  ("VT-d: suppress UR signaling for desktop chipsets")
the mask applied to the value read from DMIBAR is too narrow, only the
comment accompanying it was correct. Fix that and tag the literal
number as "long" at once to avoid eventual compiler warnings.

The widest possible value so far is 39 bits; all chipsets covered here
but having less than this number of bits have the remaining bits marked
reserved (zero), and hence there's no need for making the mask chipset
specific.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: f8ecf31c31906552522c2a1b0d1cada07d78876e
master date: 2014-05-26 12:28:46 +0200

10 years agoACPI/ERST: fix table mapping
Jan Beulich [Tue, 3 Jun 2014 10:09:27 +0000 (12:09 +0200)]
ACPI/ERST: fix table mapping

acpi_get_table(), when executed before reaching SYS_STATE_active, will
return a mapping valid only until the next invocation of that function.
Consequently storing the returned pointer for later use is incorrect.
Copy the logic used in VT-d's DMAR handling.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: fca69b1fc606ece62430076ca4a157e4bed749a8
master date: 2014-05-26 12:25:01 +0200

10 years agomove domain to cpupool0 before destroying it
Juergen Gross [Fri, 23 May 2014 13:20:02 +0000 (15:20 +0200)]
move domain to cpupool0 before destroying it

Currently when a domain is destroyed it is removed from the domain_list
before all of its resources, including the cpupool membership, are freed.
This can lead to a situation where the domain is still member of a cpupool
without for_each_domain_in_cpupool() (or even for_each_domain()) being
able to find it any more. This in turn can result in rejection of removing
the last cpu from a cpupool, because there seems to be still a domain in
the cpupool, even if it can't be found by scanning through all domains.

This situation can be avoided by moving the domain to be destroyed to
cpupool0 first and then remove it from this cpupool BEFORE deleting it from
the domain_list. As cpupool0 is always active and a domain without any cpupool
membership is implicitly regarded as belonging to cpupool0, this poses no
problem.

Signed-off-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
master commit: bac6334b51d9bcfe57ecf4a4cb5288348fcf044a
master date: 2014-05-20 15:55:42 +0200

10 years agoVT-d: extend error report masking workaround to newer chipsets
Jan Beulich [Fri, 23 May 2014 13:19:19 +0000 (15:19 +0200)]
VT-d: extend error report masking workaround to newer chipsets

Add two more PCI IDs to the set that has been taken care of with a
different workaround long before XSA-59, and (for consistency with the
newer workarounds) log a message here too.

Also move the function wide comment to the cases it applies to; this
should really have been done by d061d200 ("VT-d: suppress UR signaling
for server chipsets").

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 04734664eb20c3bf239e473af182bb7ab901d779
master date: 2014-05-20 15:54:01 +0200

10 years agoVT-d: apply quirks at device setup time rather than only at boot
Jan Beulich [Fri, 23 May 2014 13:18:44 +0000 (15:18 +0200)]
VT-d: apply quirks at device setup time rather than only at boot

Accessing extended config space may not be possible at boot time, e.g.
when the memory space used by MMCFG is reserved only via ACPI tables,
but not in the E820/UEFI memory maps (which we need Dom0 to tell us
about). Consequently the change here still leaves the issue unaddressed
for systems where the extended config space remains inaccessible (due
to firmware bugs, i.e. not properly reserving the address space of
those regions).

With the respective messages now potentially getting logged more than
once, we ought to consider whether we should issue them only if we in
fact were required to do any masking (i.e. if the relevant mask bits
weren't already set).

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
master commit: 5786718fbaafbe47d72cc1512cd93de79b8fc2fa
master date: 2014-05-20 15:53:20 +0200

10 years agox86/MCE: bypass uninitialized vcpu in vMCE injection
Kai Huang [Fri, 23 May 2014 13:17:56 +0000 (15:17 +0200)]
x86/MCE: bypass uninitialized vcpu in vMCE injection

Dom0 may bring up fewer vCPUs than the Xen hypervisor actually created for
it. In this case, on Intel platforms, vMCE injection to dom0 fails because
the vMCE gets injected into an uninitialized vCPU, which crashes dom0.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Acked-by: Christoph Egger <chegger@amazon.de>
master commit: a07084525c126c596326dc1442dd218f522f51b4
master date: 2014-05-14 10:54:39 +0200

10 years agoNested VMX: load current_vmcs only when it exists
Edmund H White [Fri, 23 May 2014 13:17:21 +0000 (15:17 +0200)]
Nested VMX: load current_vmcs only when it exists

There may be no valid VMCS on the current CPU, so only load it when it exists.

The original fix is from Edmund <edmund.h.white@intel.com>.

Signed-off-by: Edmund H White <edmund.h.white@intel.com>
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 99c03bc6a1f8c6722926d2db781ece045f9d09ae
master date: 2014-05-12 11:59:19 +0200

10 years agox86: use native RDTSC(P) execution when guest and host frequencies are the same
Boris Ostrovsky [Fri, 23 May 2014 13:14:23 +0000 (15:14 +0200)]
x86: use native RDTSC(P) execution when guest and host frequencies are the same

We should be able to continue using native RDTSC(P) execution on
HVM/PVH guests after migration if host and guest frequencies are
equal (this includes the case when the frequencies are made equal
by TSC scaling feature).

This also allows us to revert main part of commit 4aab59a3 (svm: Do not
intercept RDTSC(P) when TSC scaling is supported by hardware) which
was wrong: while RDTSC intercepts were disabled domain's vtsc could
still be set, leading to inconsistent view of guest's TSC.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 82713ec8d2b65d17f13e46a131e38bfe5baf8bd6
master date: 2014-04-22 12:07:37 +0200

10 years agox86/hvm/rtc: always deassert the IRQ line when clearing REG_C.IRQF
Tim Deegan [Fri, 23 May 2014 13:11:51 +0000 (15:11 +0200)]
x86/hvm/rtc: always deassert the IRQ line when clearing REG_C.IRQF

Even in no-ack mode, there's no reason to leave the line asserted
after an explicit ack of the interrupt.

Furthermore, rtc_update_irq() is an unconditional noop having just cleared
REG_C.

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6d27a537727ca933bfef8ba01bc65847dc97cee1
master date: 2014-02-25 09:30:21 +0100

10 years agox86/hvm/rtc: inject RTC periodic interrupts from the vpt code
Tim Deegan [Fri, 23 May 2014 13:10:42 +0000 (15:10 +0200)]
x86/hvm/rtc: inject RTC periodic interrupts from the vpt code

Let the vpt code drive the RTC's timer interrupts directly, as it does
for other periodic time sources, and fix up the register state in a
vpt callback when the interrupt is injected.

This fixes a hang seen on Windows 2003 in no-missed-ticks mode, where
when a tick was pending, the early callback from the VPT code would
always set REG_C.PF on every VMENTER; meanwhile the guest was in its
interrupt handler reading REG_C in a loop and waiting to see it clear.

One drawback is that a guest that attempts to suppress RTC periodic
interrupts by failing to read REG_C will receive up to 10 spurious
interrupts, even in 'strict' mode.  However:
 - since all previous RTC models have had this property (including
   the current one, since 'no-ack' mode is hard-coded on) we're
   pretty sure that all guests can handle this; and
 - we're already playing some other interesting games with this
   interrupt in the vpt code.

One other corner case: a guest that enables the PF timer interrupt,
masks the interrupt in the APIC and then polls REG_C looking for PF
will not see PF getting set.  The more likely case of enabling the
timers and masking the interrupt with REG_B.PIE is already handled
correctly.

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c7e35c6ec705d777c0a11124ec28876f1468f2c5
master date: 2014-02-25 09:29:26 +0100

10 years agox86/hvm/rtc: don't run the vpt timer when !REG_B.PIE
Tim Deegan [Fri, 23 May 2014 13:08:48 +0000 (15:08 +0200)]
x86/hvm/rtc: don't run the vpt timer when !REG_B.PIE

If the guest has not asked for interrupts, don't run the vpt timer
to generate them.  This is a prerequisite for a patch to simplify how
the vpt interacts with the RTC, and also gets rid of a timer series in
Xen in a case where it's unlikely to be needed.

Instead, calculate the correct value for REG_C.PF whenever REG_C is
read or PIE is enabled.  This allows a guest to poll for the PF bit
while not asking for actual timer interrupts.  Such a guest would no
longer get the benefit of the vpt's timer modes.

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 4c15a82f034c9c2213a18b6320834f3906d00ba9
master date: 2014-02-25 09:26:45 +0100

10 years agoxen: arm: bitops take unsigned int
Ian Campbell [Thu, 8 May 2014 15:13:55 +0000 (16:13 +0100)]
xen: arm: bitops take unsigned int

Xen bitmaps can be 4 rather than 8 byte aligned, so use the appropriate type.
Otherwise the compiler can generate unaligned 8 byte accesses and cause traps.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
(cherry picked from commit cd338e967c598bf747b03dcfd9d8d45dc40bac1a)

10 years agoxen/arm: Add missing newline after commit 60f7376
Julien Grall [Thu, 24 Apr 2014 22:45:53 +0000 (23:45 +0100)]
xen/arm: Add missing newline after commit 60f7376

Commit 60f7376 "xen/arm: Inject an undefined instruction when the coproc/sysreg
is not handled" replaced panic by gdprintk.

Unfortunately, a panic message string doesn't need a newline, whereas
gdprintk requires one.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 237f260efa3e69ca330e8218293fa2d79c5dabe1)

10 years agoxen: arm: fully implement multicall interface.
Ian Campbell [Thu, 17 Apr 2014 12:57:24 +0000 (13:57 +0100)]
xen: arm: fully implement multicall interface.

I'm not sure what I was smoking at the time of 5d74ad1a082e "xen: arm:
implement do_multicall_call for both 32 and 64-bit" but it is obviously
insufficient since it doesn't actually wire up the hypercall.

Before doing so we need to make the usual adjustments for ARM and turn the
unsigned longs into xen_ulong_t. There is no difference in the resulting
structure for x86.

There are knock on changes to the trace interface, but again they are nops on
x86.

For 32-bit ARM guests we require that the arguments which they pass to a
hypercall via a multicall do not use the upper bits of xen_ulong_t and kill
them if they violate this. This should ensure that no ABI surprises can be
silently lurking when running on a 32-bit hypervisor waiting to pounce when the
same kernel is run on a 64-bit hypervisor. Killing the guest is harsh but it
will be far easier to relax the restriction if it turns out to cause problems
than to tighten it up if we were lax to begin with.
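
The check on 32-bit guests can be pictured roughly as follows. This is a sketch under the assumption that `xen_ulong_t` is a 64-bit type on ARM, as the paragraph above implies; the function name `args_fit_32bit` is invented here, and what the hypervisor does on failure (killing the guest) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t xen_ulong_t;   /* assumed 64-bit for this sketch */

/*
 * Return true if none of the multicall arguments uses the upper 32 bits,
 * i.e. every value would survive truncation to a 32-bit register.
 */
static bool args_fit_32bit(const xen_ulong_t *args, size_t nr)
{
    assert(args != NULL || nr == 0);

    for (size_t i = 0; i < nr; i++)
        if (args[i] >> 32)
            return false;
    return true;
}
```

Rejecting such arguments up front means a 32-bit kernel cannot come to depend on the upper bits being ignored, which is the ABI surprise the message is worried about.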

In the interests of clarity and always using explicitly sized types change the
unsigned int in the hypercall arguments to a uint32_t. There is no actual
change here on any platform.

We should consider backporting this to 4.4.1 in case a guest decides they want
to use a multicall in common code e.g. I suggested such a thing while
reviewing a netback change recently.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: keir@xen.org
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit f0dbdc628a0ecdc44d6afab28a9d5a52c996eec5)
[ ijc -- s/is_32bit_domain/is_pv32_domain ]

10 years agotools: arm: improve placement of initial modules.
Ian Campbell [Wed, 9 Apr 2014 11:51:14 +0000 (12:51 +0100)]
tools: arm: improve placement of initial modules.

314c9815e2f5 "tools: implement initial ramdisk support for ARM." broke starting
guests with <= 128 MB ram by placing the boot modules (dtb and initrd)
immediately after the kernel in this case, running the risk of them being
overwritten. Instead place the modules at the end of RAM, as the hypervisor
does for dom0.

The hypervisor also falls back to placing things before the kernel as a last
resort before failing, so add that here too.

Tested with the Debian installer initrd and guests of 96MB, 128MB, 256MB and
1GB. All work, also tested with 64MB but the installer doesn't run with so
little RAM (but our placement of the initrd is correct).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 6f4ff742a5caa411397fc38233f818e64a0c541c)

10 years agotools: implement initial ramdisk support for ARM.
Ian Campbell [Fri, 4 Apr 2014 13:28:45 +0000 (14:28 +0100)]
tools: implement initial ramdisk support for ARM.

The ramdisk is passed to the kernel as a property in the chosen node of the
device tree. This is somewhat tricky since in order to place the ramdisk and
dtb in ram we first need to know the size of the dtb. So we initially create a
DTB with placeholders for the ramdisk and finalise the value (which doesn't
change the size) once we know where everything is.

Rename libxl__arch_domain_configure to libxl__arch_domain_init_hw_description to
better reflect its use and to be consistent with the new
libxl__arch_domain_finalise_hw_description.

The common xc_dom_build_image() function did not support explicit placement of
the ramdisk, instead passing 0 to xc_dom_alloc_segment, meaning "pick
somewhere". This change instead passes ramdisk_seg.vstart. If nothing has set
vstart then it will be zero because the entire dom struct is zeroed on
allocation in xc_dom_allocate(). Therefore there is no change to the behaviour
on x86. This is also consistent with how other segments (kernel, dtb) are
handled.

Furthermore if the ramdisk has been explicitly placed then xc_dom_build_image()
assumes that it is not to be decompressed (since that would muck up the sizings
used on placement).

With all that I'm able to boot a domain using the current Debian Jessie armhf
installer initrd and have it complete successfully.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- s/itherwise/otherwise and dropped bogus emacs magic change ]
(cherry picked from commit 314c9815e2f5dc8a9fec11e0cf9b49b16ed0e96b)

10 years agolibxc: Free logger after printing error message
Jason Andryuk [Fri, 16 May 2014 20:41:17 +0000 (16:41 -0400)]
libxc: Free logger after printing error message

On error, PERROR calls the already destroyed logger, which can segfault.
Re-order the calls, so the logger is still available.

Signed-off-by: Jason Andryuk <andryuk@aero.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 86216963fd1d89883bb8120535704fdc79fdad50)

10 years agolibxl: Fix error path in libxl_device_events_handler
Ian Jackson [Wed, 19 Feb 2014 14:03:29 +0000 (14:03 +0000)]
libxl: Fix error path in libxl_device_events_handler

libxl_device_events_handler would fail to call AO_ABORT if it failed;
instead it would simply return rc.  (This leaves the egc etc. from the
now-abolished stack frame potentially live, and leaves the ctx
locked.)

In xl, this is of no consequence, because xl will immediately exit in
this situation.  This is very likely to be true in any other callers
(of which we don't know of any, anyway).

Coverity-ID: 1181840
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
CC: coverity@xenproject.org
(cherry picked from commit c566ab68af7da089ae2b0ff664d02a93a0647584)

10 years agotools/libxl: Don't read off the end of tinfo[]
Andrew Cooper [Tue, 18 Feb 2014 15:59:05 +0000 (15:59 +0000)]
tools/libxl: Don't read off the end of tinfo[]

It is very common for BIOSes to advertise more cpus than are actually present
on the system, and mark some of them as offline.  This is what Xen does to
allow for later CPU hotplug, and what BIOSes common to multiple different
systems do to save fully rewriting the MADT in memory.

An excerpt from `xl info` might look like:

...
nr_cpus                : 2
max_cpu_id             : 3
...

Which shows 4 CPUs in the MADT, but only 2 online (as this particular box is
the dual-core rather than the quad-core SKU of its particular brand)

Because of the way Xen exposes this information, a libxl_cputopology array is
bounded by 'nr_cpus', while cpu bitmaps are bounded by 'max_cpu_id + 1'.

The current libxl code has two places which erroneously assume that a
libxl_cputopology array is as long as the number of bits found in a cpu
bitmap, and valgrind complains:

==14961== Invalid read of size 4
==14961==    at 0x407AB7F: libxl__get_numa_candidate (libxl_numa.c:230)
==14961==    by 0x407030B: libxl__build_pre (libxl_dom.c:167)
==14961==    by 0x406246F: libxl__domain_build (libxl_create.c:371)
...
==14961==  Address 0x4324788 is 8 bytes after a block of size 24 alloc'd
==14961==    at 0x402669D: calloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==14961==    by 0x4075BB9: libxl__zalloc (libxl_internal.c:83)
==14961==    by 0x4052F87: libxl_get_cpu_topology (libxl.c:4408)
==14961==    by 0x407A899: libxl__get_numa_candidate (libxl_numa.c:342)
...
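
The mismatch between the two bounds can be sketched as below: the bitmap has 'max_cpu_id + 1' bits, the topology array only 'nr_cpus' entries, so any loop indexing both has to stop at the smaller of the two. The `struct topo` type and the `cpus_on_node` helper are hypothetical stand-ins, not the libxl code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one topology record. */
struct topo {
    uint32_t core, socket, node;
};

static bool bit_is_set(const uint8_t *map, size_t bit)
{
    return (map[bit / 8] >> (bit % 8)) & 1;
}

/*
 * Count the CPUs set in the bitmap that sit on the given node.
 * nr_bits bounds the bitmap (max_cpu_id + 1); nr_topo bounds tinfo[]
 * (nr_cpus). The loop never indexes tinfo[] past nr_topo.
 */
static size_t cpus_on_node(const uint8_t *map, size_t nr_bits,
                           const struct topo *tinfo, size_t nr_topo,
                           uint32_t node)
{
    size_t limit = nr_bits < nr_topo ? nr_bits : nr_topo;
    size_t n = 0;

    assert(map != NULL && tinfo != NULL);

    for (size_t i = 0; i < limit; i++)
        if (bit_is_set(map, i) && tinfo[i].node == node)
            n++;
    return n;
}
```

On the box from the `xl info` excerpt, a loop running to 4 over a 2-entry array would read tinfo[2] and tinfo[3], which matches the invalid read valgrind reports.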

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 81b03050485708698ce2245d9abefce07aafb704)

10 years agotools/pygrub: Fix error handling if no valid partitions are found
Andrew Cooper [Sat, 10 May 2014 01:18:33 +0000 (02:18 +0100)]
tools/pygrub: Fix error handling if no valid partitions are found

If no partitions at all are found, pygrub never creates the name 'fs',
resulting in a NameError indicating the lack of fs, rather than a
RuntimeError explaining that no partitions were found.

Set fs to None right at the start, and use the pythonic idiom "if fs is None:"
to protect against otherwise valid values for fs which compare equal to
0/False.

Reported-by: Sven Köhler <sven.koehler@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit d75215805ce6ed20b3807955fab6a7f7a3368bee)

10 years agolibxl_json: remove extra "break"
Wei Liu [Wed, 9 Apr 2014 13:29:13 +0000 (14:29 +0100)]
libxl_json: remove extra "break"

... otherwise JSON array elements are not freed and memory is leaked.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 3eb54a2fdbc216b39dc2c0a86f11a32d4c838269)

10 years agotmem: remove dumb check in do_tmem_destroy_pool
Julien Grall [Fri, 4 Apr 2014 09:13:32 +0000 (11:13 +0200)]
tmem: remove dumb check in do_tmem_destroy_pool

do_tmem_destroy_pool is checking if pools == NULL. But, pools is a fixed
array.

Clang 3.5 will fail to compile xen/common/tmem.c with the following error:
tmem.c:1848:18: error: comparison of array 'client->pools' equal to a null
pointer is always false [-Werror,-Wtautological-pointer-compare]
    if ( client->pools == NULL )

Coverity-ID:1055632

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ac0f56a2fa407e0704fade12630a5a960dedce87)

10 years agotools: require OCaml version 3.09.3 or greater
Roger Pau Monne [Tue, 11 Feb 2014 10:38:24 +0000 (11:38 +0100)]
tools: require OCaml version 3.09.3 or greater

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Don Slutz <dslutz@verizon.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@citrix.com>
(cherry picked from commit a37c389930936c3a9b1215c385fdd22854836871)

11 years agotools: arm: remove code to check for a DTB appended to the kernel
Ian Campbell [Wed, 14 May 2014 14:19:13 +0000 (15:19 +0100)]
tools: arm: remove code to check for a DTB appended to the kernel

The code to check for an appended DTB was confusing and unnecessary. Since we
know the size of the kernel binary passed to us we should just load the entire
thing into guest RAM (subject to the limits checks). Removing this code avoids
a whole raft of overflow and alignment issues.

We also need to validate the limits of the segment where we intend to load the
kernel to avoid overflow issues.

For ARM32 we control the load address, but we need to validate the size. The
entry point is only relevant within the guest so we don't need to worry about
that.

For ARM64 we need to validate both the load address (which is the same as the
entry point) and the size.

This is XSA-95.

Reported-by: Thomas Leonard <talex5@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>

11 years agox86: fix guest CPUID handling
Jan Beulich [Mon, 12 May 2014 15:19:01 +0000 (17:19 +0200)]
x86: fix guest CPUID handling

The way XEN_DOMCTL_set_cpuid got handled so far allowed for surprises
to the caller. With this set of operations
- set leaf A (using array index 0)
- set leaf B (using array index 1)
- clear leaf A (clearing array index 0)
- set leaf B (using array index 0)
- clear leaf B (clearing array index 0)
the entry for leaf B at array index 1 would still be in place, while
the caller would expect it to be cleared.
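
The sequence above can be replayed on a toy table. This is a sketch under assumptions, not the hypervisor's code: the table layout, `NSLOTS`, `LEAF_UNUSED` and both lookup functions are hypothetical. It shows one way the surprise can arise (a lookup that stops at the first slot that either matches or is free) and one way to avoid it (prefer an existing entry anywhere in the array, and fall back to a free slot only if there is none).

```c
#include <stdint.h>

enum { NSLOTS = 4 };            /* hypothetical table size */
#define LEAF_UNUSED UINT32_MAX  /* hypothetical "free slot" marker */

struct leaf_table { uint32_t leaf[NSLOTS]; };

typedef int (*slot_finder)(const struct leaf_table *, uint32_t);

/* Problematic lookup: stop at the first slot that matches OR is free. */
static int find_first_fit(const struct leaf_table *t, uint32_t leaf)
{
    for (int i = 0; i < NSLOTS; i++)
        if (t->leaf[i] == leaf || t->leaf[i] == LEAF_UNUSED)
            return i;
    return -1;
}

/* Safer lookup: an existing entry anywhere wins, else the first free slot. */
static int find_match_first(const struct leaf_table *t, uint32_t leaf)
{
    int unused = -1;

    for (int i = 0; i < NSLOTS; i++) {
        if (t->leaf[i] == leaf)
            return i;
        if (unused < 0 && t->leaf[i] == LEAF_UNUSED)
            unused = i;
    }
    return unused;
}

static void set_leaf(struct leaf_table *t, slot_finder find, uint32_t leaf)
{
    int i = find(t, leaf);

    if (i >= 0)
        t->leaf[i] = leaf;
}

static void clear_leaf(struct leaf_table *t, slot_finder find, uint32_t leaf)
{
    int i = find(t, leaf);

    if (i >= 0 && t->leaf[i] == leaf)
        t->leaf[i] = LEAF_UNUSED;
}

/* Replays the five operations and counts leftover entries for leaf B. */
static int stale_entries_after_sequence(slot_finder find)
{
    const uint32_t A = 0x1, B = 0x7;
    struct leaf_table t;
    int stale = 0;

    for (int i = 0; i < NSLOTS; i++)
        t.leaf[i] = LEAF_UNUSED;

    set_leaf(&t, find, A);    /* lands in index 0 */
    set_leaf(&t, find, B);    /* lands in index 1 */
    clear_leaf(&t, find, A);  /* frees index 0 */
    set_leaf(&t, find, B);    /* first-fit: index 0; match-first: index 1 */
    clear_leaf(&t, find, B);  /* first-fit only clears index 0 */

    for (int i = 0; i < NSLOTS; i++)
        stale += (t.leaf[i] == B);
    return stale;
}
```

With `find_first_fit` one stale entry for leaf B survives the sequence; with `find_match_first` none does.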

While looking at the use sites of d->arch.cpuid[] I also noticed that
the allocation of the array needlessly uses the zeroing form - the
relevant fields of the array elements get set in a loop immediately
following the allocation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 4c0ff6bd54b5a67f8f820f9ed0a89a79f1a26a1c
master date: 2014-05-02 12:09:03 +0200

11 years agohvm_set_ioreq_page() releases wrong page in error path
Paul Durrant [Mon, 12 May 2014 15:18:08 +0000 (17:18 +0200)]
hvm_set_ioreq_page() releases wrong page in error path

The function calls prepare_ring_for_helper() to acquire a mapping for the
given gmfn, then checks (under lock) to see if the ioreq page is already
set up but, if it is, the function then releases the in-use ioreq page
mapping on the error path rather than the one it just acquired. This patch
fixes this bug.
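
The corrected error path can be illustrated with a toy reference count. Everything here is hypothetical (`struct mapping`, `struct ioreq_page_sketch`, `set_page_sketch`); the increment merely stands in for acquiring a mapping. What matters is that the failure branch drops the reference just taken and leaves the in-use one alone.

```c
#include <stddef.h>

struct mapping { int refs; };                 /* toy stand-in for a mapped page */
struct ioreq_page_sketch { struct mapping *map; };

/* Returns 0 on success, -1 if a page is already set up. */
static int set_page_sketch(struct ioreq_page_sketch *iorp, struct mapping *fresh)
{
    fresh->refs++;              /* acquire the new mapping */

    if (iorp->map != NULL) {
        fresh->refs--;          /* release what we just acquired ... */
        return -1;              /* ... and leave iorp->map untouched */
    }

    iorp->map = fresh;
    return 0;
}
```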

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 16e2a7596e9fc86881c73cef57602b2c88155528
master date: 2014-05-02 11:46:32 +0200

11 years agox86/HVM: correct the SMEP logic for HVM_CR4_GUEST_RESERVED_BITS
Feng Wu [Mon, 12 May 2014 15:15:50 +0000 (17:15 +0200)]
x86/HVM: correct the SMEP logic for HVM_CR4_GUEST_RESERVED_BITS

When checking the SMEP feature for HVM guests, we should check the
VCPU instead of the host CPU.
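
As a rough sketch of the idea: SMEP is enabled through bit 20 of CR4, so whether that bit counts as reserved for a guest write should follow from what the VCPU was told it supports, not from the host. The function below is hypothetical and models only this single bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMEP_SKETCH (UINT64_C(1) << 20)   /* CR4.SMEP */

/*
 * Hypothetical check: a guest CR4 write is acceptable (as far as SMEP is
 * concerned) only if it does not set SMEP on a VCPU that lacks the feature.
 */
static bool cr4_write_allowed(uint64_t new_cr4, bool vcpu_has_smep)
{
    uint64_t reserved = vcpu_has_smep ? 0 : CR4_SMEP_SKETCH;

    return (new_cr4 & reserved) == 0;
}
```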

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 31ee951a3bee6e7cc21f94f900fe989e3701a79a
master date: 2014-04-28 12:47:24 +0200

11 years agopassthrough: allow to suppress SERR and PERR signaling altogether
Jan Beulich [Mon, 12 May 2014 15:14:46 +0000 (17:14 +0200)]
passthrough: allow to suppress SERR and PERR signaling altogether

This is just to have a workaround at hand in case other chipsets (not
covered by the previous two patches) also have similar issues.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: 1a2a390a560e8319a6be98c7ab6cfaebd230f67e
master date: 2014-04-25 12:13:31 +0200

11 years agoVT-d: suppress UR signaling for desktop chipsets
Jan Beulich [Mon, 12 May 2014 15:13:32 +0000 (17:13 +0200)]
VT-d: suppress UR signaling for desktop chipsets

Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the desktop chipsets dealt with here.

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: d6cb14b34ffc2a830022d059f1aa22bf19dcf55f
master date: 2014-04-25 12:12:38 +0200

11 years agoVT-d: suppress UR signaling for server chipsets
Jan Beulich [Mon, 12 May 2014 15:11:12 +0000 (17:11 +0200)]
VT-d: suppress UR signaling for server chipsets

Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the server chipsets dealt with here.

IDs 0xe00, 0xe01, and 0xe04 ... 0xe0b (Ivytown) aren't needed here -
Intel confirmed the issue to be fixed in hardware there.

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: d061d200eb92bcb1d86f9b55c6de73e35ce63fdf
master date: 2014-04-25 12:11:55 +0200

11 years agox86: add missing break in dom0_pit_access()
Jan Beulich [Thu, 8 May 2014 08:03:38 +0000 (10:03 +0200)]
x86: add missing break in dom0_pit_access()

Coverity ID 1203045

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 815dc9f1dba5782dcef77d8a002a11f5b1e5cc37
master date: 2014-04-23 15:07:11 +0200

11 years agox86/HAP: also flush TLB when altering a present 1G or intermediate entry
Jan Beulich [Thu, 8 May 2014 08:03:01 +0000 (10:03 +0200)]
x86/HAP: also flush TLB when altering a present 1G or intermediate entry

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
master commit: c82fbfe6ec8be597218eb943641d1f7a81c4c01e
master date: 2014-04-14 15:14:47 +0200

11 years agox86/nested HAP: don't BUG() on legitimate error
Jan Beulich [Thu, 8 May 2014 08:02:24 +0000 (10:02 +0200)]
x86/nested HAP: don't BUG() on legitimate error

p2m_set_entry() can fail without there being a bug in the code - crash
the domain rather than the host in that case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
master commit: 1ca73aaf51eba14256794bf045c2eb01e88e1324
master date: 2014-04-14 12:50:56 +0200

11 years agox86/AMD: feature masking is unavailable on Fam11
Jan Beulich [Thu, 8 May 2014 08:01:03 +0000 (10:01 +0200)]
x86/AMD: feature masking is unavailable on Fam11

Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 70e79fad6dc6f533ff83ee23b8d13de5a696d896
master date: 2014-04-09 16:13:25 +0200

11 years agoxen/arm: Correctly save/restore CNTKCTL_EL1
Julien Grall [Thu, 1 May 2014 10:55:14 +0000 (11:55 +0100)]
xen/arm: Correctly save/restore CNTKCTL_EL1

CNTKCTL_EL1 is used by the guest to control access to the timer from
userspace.  It therefore needs to be saved/restored by Xen as part of
the VCPU state.

By default Linux on ARM64 exposes the timer to userspace.  Furthermore on
ARM64, Linux provides helpers in a VDSO (gettimeofday/__do_get_tspec)
that use the timer counter.  Conversely, during CPU bring up, Xen will
set CNTKCTL_EL1 to 0 (i.e. disallow timer access from userspace).  As
a result, currently, if dom0 has 1 VCPU which is migrated to another
PCPU, init might crash.

Alternatively, a guest (malicious or not) might decide to disable
access to the timer from userspace.  If the register is not
saved/restored, when a DOM0 VCPU runs again, a similar crash would
result.

Also, drop CNTKCTL_EL1 initialization in init_timer_interrupt.  Xen
should let the guest deal with this register.

This is XSA-91 / CVE-2014-3125.

Reported-by: Chen Baozi <baozich@gmail.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

11 years agox86/HVM: restrict HVMOP_set_mem_type
Jan Beulich [Tue, 29 Apr 2014 13:27:22 +0000 (15:27 +0200)]
x86/HVM: restrict HVMOP_set_mem_type

Permitting arbitrary type changes here has the potential of creating
present P2M (and hence EPT/NPT/IOMMU) entries pointing to an invalid
MFN (INVALID_MFN truncated to the respective hardware structure field's
width). This would become a problem at the latest when something real sat
at the end of the physical address space; I suspect, though, that
other things might break with such bogus entries.

Along with that drop a bogus (and otherwise becoming stale) log
message.

Afaict the similar operation in p2m_set_mem_access() is safe.

This is XSA-92.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 83bb5eb4d340acebf27b34108fb1dae062146a68
master date: 2014-04-29 15:11:31 +0200

11 years agoxen/arm64: Correctly align VFP regs
Julien Grall [Thu, 10 Apr 2014 11:43:57 +0000 (12:43 +0100)]
xen/arm64: Correctly align VFP regs

On arm64, VFP instructions require vfpregs to be 128-bit aligned.

By chance, the field is already correctly aligned. If someone decides to
add a new field before it, Xen will receive a data abort as soon as
it saves/restores VFP.

We are safe on arm32 as the only constraint is to be 32-bit aligned.
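
A minimal sketch of pinning the alignment explicitly, assuming the requirement is 16 bytes (one 128-bit register) and using a hypothetical structure rather than Xen's real one. With the alignment stated on the field, adding another member in front of it can no longer misalign it, and a compile-time check documents the constraint.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the field names are illustrative only. */
struct vfp_state_sketch {
    uint32_t fpsr;                     /* small field placed first on purpose */
    _Alignas(16) uint64_t fpregs[64];  /* 32 registers of 128 bits each */
    uint32_t fpcr;
};

/* Fails the build if someone breaks the alignment later. */
_Static_assert(offsetof(struct vfp_state_sketch, fpregs) % 16 == 0,
               "fpregs must be 16-byte aligned");
```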

Reported-by: Chen Baozi <baozich@gmail.com>
Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 9b4e96724eeb916f2cd311d9133f00c216caa321)

11 years agoxen: arm: prevent building with CONFIG_EARLY_PRINTK if not a debug build
Ian Campbell [Wed, 5 Mar 2014 01:02:29 +0000 (01:02 +0000)]
xen: arm: prevent building with CONFIG_EARLY_PRINTK if not a debug build

early printk on ARM is tied to debug being enabled, so error out instead of silently and unexpectedly building without early printk when asked.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 5940d3d095661f541a843e5d4c5f9363c18cd63c)

11 years agoxen/arm: domain_vgic_init: Check xzalloc_* return
Julien Grall [Thu, 20 Mar 2014 13:51:26 +0000 (13:51 +0000)]
xen/arm: domain_vgic_init: Check xzalloc_* return

The allocations for shared_irqs and pending_irqs are not checked before
being used later. This may lead to a Xen crash if the hypervisor runs out
of memory.
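
The pattern being added can be sketched in plain C, with `calloc` standing in for the hypervisor's zeroing allocator. The structure and function names are hypothetical, and `-ENOMEM` is assumed as the error code.

```c
#include <errno.h>
#include <stdlib.h>

struct vgic_sketch {
    unsigned int *shared_irqs;
    unsigned int *pending_irqs;
};

/* Returns 0 on success or -ENOMEM, leaving no half-initialised state. */
static int vgic_sketch_init(struct vgic_sketch *v,
                            size_t nr_ranks, size_t nr_lines)
{
    v->pending_irqs = NULL;

    v->shared_irqs = calloc(nr_ranks, sizeof(*v->shared_irqs));
    if (v->shared_irqs == NULL)
        return -ENOMEM;

    v->pending_irqs = calloc(nr_lines, sizeof(*v->pending_irqs));
    if (v->pending_irqs == NULL) {
        free(v->shared_irqs);        /* undo the first allocation */
        v->shared_irqs = NULL;
        return -ENOMEM;
    }

    return 0;
}

static void vgic_sketch_free(struct vgic_sketch *v)
{
    free(v->pending_irqs);
    free(v->shared_irqs);
    v->pending_irqs = NULL;
    v->shared_irqs = NULL;
}
```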

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit a34f6affe799cf493640b58a794132d213288ba3)

11 years agoxen/arm: vgic: Check rank in GICD_ICFGR* emulation before locking
Ian Campbell [Wed, 23 Apr 2014 15:32:45 +0000 (16:32 +0100)]
xen/arm: vgic: Check rank in GICD_ICFGR* emulation before locking

The function vgic_irq_rank may return NULL if the IRQ is not in the range
handled by the guest. This will result in dereferencing a NULL pointer,
which will crash Xen.

I've checked the rest of the emulation and this is only place where the lock
is taken before the rank is checked.
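
The ordering issue can be shown with a toy model in which the lock lives inside the looked-up object, so the lookup result has to be validated before the lock is touched. All names below are hypothetical and the "lock" is just a counter.

```c
#include <stddef.h>

struct rank_sketch {
    int lock_depth;        /* toy lock: 0 = free, 1 = held */
    unsigned int icfg;
};

/* Returns NULL when the index is outside the range handled for the guest. */
static struct rank_sketch *rank_lookup(struct rank_sketch *ranks,
                                       size_t nr_ranks, size_t idx)
{
    return idx < nr_ranks ? &ranks[idx] : NULL;
}

/* Returns 1 and fills *out on success, 0 if the rank does not exist. */
static int read_icfg_sketch(struct rank_sketch *ranks, size_t nr_ranks,
                            size_t idx, unsigned int *out)
{
    struct rank_sketch *r = rank_lookup(ranks, nr_ranks, idx);

    if (r == NULL)         /* check first: the lock is a member of *r */
        return 0;

    r->lock_depth++;       /* lock */
    *out = r->icfg;
    r->lock_depth--;       /* unlock */
    return 1;
}
```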

This is CVE-2014-2986 / XSA-94.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reported-by: Thomas Leonard <talex5@gmail.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

11 years agoxen: x86 & generic: change to __builtin_prefetch()
Ian Campbell [Wed, 23 Apr 2014 14:25:21 +0000 (16:25 +0200)]
xen: x86 & generic: change to __builtin_prefetch()

Quoting Andi Kleen in Linux b483570a13be from 2007:
    gcc 3.2+ supports __builtin_prefetch, so it's possible to use it on all
    architectures. Change the generic fallback in linux/prefetch.h to use it
    instead of noping it out. gcc should do the right thing when the
    architecture doesn't support prefetching

    Undefine the x86-64 inline assembler version and use the fallback.

ARM wants to use the builtins.
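
For illustration, a generic prefetch wrapper built on the compiler builtin might look like the following. `__builtin_prefetch` is provided by GCC and Clang; the wrapper name and the list traversal are hypothetical and only show a typical use.

```c
#include <stddef.h>

struct node {
    long value;
    struct node *next;
};

/* Thin wrapper: the compiler emits a prefetch instruction where one exists. */
static inline void prefetch_sketch(const void *addr)
{
    __builtin_prefetch(addr);
}

/* Walk a list, hinting the next element while the current one is summed. */
static long sum_list(const struct node *n)
{
    long sum = 0;

    while (n != NULL) {
        prefetch_sketch(n->next);   /* only a hint; a NULL address is harmless */
        sum += n->value;
        n = n->next;
    }
    return sum;
}
```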

Fix a pair of spelling errors, one of which was from Lucas De Marchi in the
Linux tree.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Acked-by: Tim Deegan <tim@xen.org>
master commit: 630017f420f111e0c0332dbd99df30ebb8fed207
master date: 2014-04-03 17:15:41 +0100

11 years agox86/mm: fix checks against max_mapped_pfn
Jan Beulich [Wed, 23 Apr 2014 14:24:02 +0000 (16:24 +0200)]
x86/mm: fix checks against max_mapped_pfn

This value is an inclusive one, i.e. this fixes an off-by-one in memory
sharing and an off-by-two in shadow code.
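
Because the bound is inclusive, comparisons need `<=` and counts need `+ 1`. A tiny sketch with hypothetical helper names:

```c
#include <stdbool.h>

/* max_mapped_pfn is the highest PFN that is mapped (inclusive bound). */
static bool pfn_is_covered(unsigned long pfn, unsigned long max_mapped_pfn)
{
    return pfn <= max_mapped_pfn;    /* "<" would wrongly reject the last PFN */
}

/* Number of PFNs in the inclusive range 0 .. max_mapped_pfn. */
static unsigned long nr_covered_pfns(unsigned long max_mapped_pfn)
{
    return max_mapped_pfn + 1;
}
```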

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 088ee1d47b65d6bb92de61b404805f4ca92e3240
master date: 2014-04-03 12:08:43 +0100

11 years agoxen/arm: Don't let guest access Debug and Performance Monitor registers
Julien Grall [Tue, 15 Apr 2014 13:06:42 +0000 (14:06 +0100)]
xen/arm: Don't let guest access Debug and Performance Monitor registers

Debug and performance registers are not properly switched by Xen.

Trap them and inject an undefined instruction, except for those registers
which might be unconditionally accessed which we implement as RAZ/WI.

This is CVE-2014-2915 / XSA-93.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>

11 years agoxen/arm: Don't expose implementation defined registers (Cp15 c15) to the guest
Julien Grall [Tue, 15 Apr 2014 11:45:28 +0000 (12:45 +0100)]
xen/arm: Don't expose implementation defined registers (Cp15 c15) to the guest

On Cortex-A15, CP15 c15 contains registers to retrieve data from L1/L2 RAM.

Exposing these registers to the guest may leak data from Xen and/or
another guest.

By default, trap every register and inject an undefined instruction.

This is CVE-2014-2915 / XSA-93.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

11 years agoxen/arm: Trap cache and TCM lockdown registers
Julien Grall [Mon, 14 Apr 2014 19:00:14 +0000 (20:00 +0100)]
xen/arm: Trap cache and TCM lockdown registers

Some cp15 c9/c10/c11 encodings are used for:
     - cache control
     - TCM control
     - branch predictor control

All of them are implementation defined. For now, inject an undefined exception
if the guest tries to access them.

This is CVE-2014-2915 / XSA-93.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

11 years agoxen/arm: Upgrade DCISW into DCCISW
Julien Grall [Mon, 14 Apr 2014 19:46:43 +0000 (20:46 +0100)]
xen/arm: Upgrade DCISW into DCCISW

A guest is allowed to use the invalidate cache by set/way instruction (i.e.
DCISW) without any restriction. As the cache is shared with Xen, the guest
may invalidate an address in use by Xen. This may lead to a Xen crash
because the memory state is invalid.
Set the bit HCR.SWIO to upgrade the invalidate cache by set/way instruction
to a clean and invalidate.

This is CVE-2014-2915 / XSA-93.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reported-by: Thomas Leonard <tal36@cam.ac.uk>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

11 years agoxen/arm: Don't let the guest access the coprocessor registers
Julien Grall [Mon, 14 Apr 2014 19:37:16 +0000 (20:37 +0100)]
xen/arm: Don't let the guest access the coprocessor registers

In Xen we only handle save/restore for coprocessors 10 and 11 (NEON). Other
coprocessors (0-9, 12-13) are currently exposed to the guest and may lead
to data being shared between guests.

Disable access to all coprocessors except 10 and 11 by setting HCPTR
correctly.

This is CVE-2014-2915 / XSA-93.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

11 years agoxen/arm: Inject an undefined instruction when the coproc/sysreg is not handled
Julien Grall [Mon, 14 Apr 2014 18:01:20 +0000 (19:01 +0100)]
xen/arm: Inject an undefined instruction when the coproc/sysreg is not handled

Currently Xen panics if it's unable to handle a coprocessor/sysreg instruction.
Replace this behavior by injecting an undefined instruction into the faulting
guest, and log the event if Xen is in debug mode.

This is CVE-2014-2915 / XSA-93.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

11 years agoPV-GRUB: fix blk access at end of disk
Samuel Thibault [Fri, 21 Mar 2014 01:56:56 +0000 (02:56 +0100)]
PV-GRUB: fix blk access at end of disk

GRUB usually loads a whole disk track, even if that means going
beyond the end of the disk.  We thus have to gracefully return an error,
instead of letting blkfront panic.
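
A sketch of the kind of check this implies, assuming the caller knows the disk size in sectors. The function name and the -1 error value are hypothetical; the request is rejected up front instead of being passed down to the block frontend.

```c
#include <stdint.h>

/*
 * Hypothetical pre-check for a read of `count` sectors starting at `sector`
 * on a disk of `nr_sectors` sectors. Returns 0 if the whole request is on
 * the disk, -1 otherwise.
 */
static int check_read_range(uint64_t sector, uint64_t count,
                            uint64_t nr_sectors)
{
    if (sector >= nr_sectors)
        return -1;                       /* starts at or past the end */
    if (count > nr_sectors - sector)
        return -1;                       /* runs past the end */
    return 0;
}
```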

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 51e18e41e39a682de5a2e60ad86048dc6344efec)