xenbits.xensource.com Git - xen.git/log
Andrew Cooper [Sat, 10 May 2014 01:18:33 +0000 (02:18 +0100)]
tools/pygrub: Fix error handling if no valid partitions are found

If no partitions at all are found, pygrub never creates the name 'fs',
resulting in a NameError indicating the lack of fs, rather than a
RuntimeError explaining that no partitions were found.

Set fs to None right at the start, and use the pythonic idiom "if fs is None:"
to protect against otherwise valid values for fs which compare equal to
0/False.

Reported-by: Sven Köhler <sven.koehler@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit d75215805ce6ed20b3807955fab6a7f7a3368bee)
(cherry picked from commit 5ee75ef147f83457fa28d4d4374efcf066581e26)
(cherry picked from commit 11b2541f458a3d09c63980e669c166cf6e96980a)

Wei Liu [Wed, 9 Apr 2014 13:29:13 +0000 (14:29 +0100)]
libxl_json: remove extra "break"

... otherwise JSON array elements are not freed and memory is leaked.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 3eb54a2fdbc216b39dc2c0a86f11a32d4c838269)
(cherry picked from commit d6eff6fcc05f7167e5b2232d3bc60047fffb8fc4)
(cherry picked from commit a14bb4db517ca076ad7d785be52d4bd7a6df6de9)
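The leak comes from leaving a cleanup loop early. A minimal sketch of the corrected shape, using a hypothetical `json_array` type rather than libxl's real JSON object structures: every element is freed before the container, and nothing inside the loop exits it.

```c
#include <stdlib.h>

/* Hypothetical JSON array node; not libxl's real layout. */
struct json_array {
    size_t count;
    void **elems;
};

/* Free every element, then the array. Returns how many elements were
 * freed. A stray `break` inside the loop would stop after the first
 * element and leak the rest. */
static size_t json_array_free(struct json_array *arr)
{
    size_t freed = 0;

    if (!arr)
        return 0;
    for (size_t i = 0; i < arr->count; i++) {
        free(arr->elems[i]);
        freed++;
    }
    free(arr->elems);
    free(arr);
    return freed;
}
```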

Julien Grall [Fri, 4 Apr 2014 09:13:32 +0000 (11:13 +0200)]
tmem: remove dumb check in do_tmem_destroy_pool

do_tmem_destroy_pool checks whether pools == NULL, but pools is a fixed
array.

Clang 3.5 will fail to compile xen/common/tmem.c with the following error:
tmem.c:1848:18: error: comparison of array 'client->pools' equal to a null
pointer is always false [-Werror,-Wtautological-pointer-compare]
    if ( client->pools == NULL )

Coverity-ID:1055632

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
(cherry picked from commit ac0f56a2fa407e0704fade12630a5a960dedce87)
(cherry picked from commit 6ce0c3fca9bd1c0d45908452d6e5e9f7bf22f7b7)
(cherry picked from commit 804d9af208c5c95156140b1c62cf8857ba250b03)
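Why the comparison is always false: an array member decays to the address of its first element, which cannot be NULL for a valid object. A sketch with a hypothetical `struct client` and `MAX_POOLS_PER_DOMAIN` (not tmem's real definitions) shows the check that does carry meaning, namely testing an individual slot.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_POOLS_PER_DOMAIN 16  /* hypothetical size */

/* Hypothetical client structure holding a fixed array of pool pointers. */
struct client {
    void *pools[MAX_POOLS_PER_DOMAIN];
};

/* `client->pools == NULL` compares the array's address with NULL and is
 * always false (hence clang's -Wtautological-pointer-compare). What can
 * legitimately be NULL is an individual slot. */
static bool pool_exists(const struct client *client, unsigned int pool_id)
{
    if (pool_id >= MAX_POOLS_PER_DOMAIN)
        return false;
    return client->pools[pool_id] != NULL;
}
```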

Jan Beulich [Wed, 21 May 2014 14:33:57 +0000 (16:33 +0200)]
Revert "VT-d: suppress UR signaling for desktop chipsets"

This reverts commit a910070d4289fdf71c3ca35886192a602a3724d5 -
the use of ioremap()/iounmap() is only valid from 4.3 onwards.

Jan Beulich [Tue, 13 May 2014 06:30:56 +0000 (08:30 +0200)]
add xen/include/xen/pci_ids.h forgotten in 64c4c7c8

Jan Beulich [Mon, 12 May 2014 15:43:00 +0000 (17:43 +0200)]
x86: fix guest CPUID handling

The way XEN_DOMCTL_set_cpuid has been handled so far could surprise
the caller. With this set of operations
- set leaf A (using array index 0)
- set leaf B (using array index 1)
- clear leaf A (clearing array index 0)
- set leaf B (using array index 0)
- clear leaf B (clearing array index 0)
the entry for leaf B at array index 1 would still be in place, while
the caller would expect it to be cleared.

While looking at the use sites of d->arch.cpuid[] I also noticed that
the allocation of the array needlessly uses the zeroing form - the
relevant fields of the array elements get set in a loop immediately
following the allocation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 4c0ff6bd54b5a67f8f820f9ed0a89a79f1a26a1c
master date: 2014-05-02 12:09:03 +0200

Paul Durrant [Mon, 12 May 2014 15:42:33 +0000 (17:42 +0200)]
hvm_set_ioreq_page() releases wrong page in error path

The function calls prepare_ring_for_helper() to acquire a mapping for the
given gmfn, then checks (under lock) to see if the ioreq page is already
set up but, if it is, the function then releases the in-use ioreq page
mapping on the error path rather than the one it just acquired. This patch
fixes this bug.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 16e2a7596e9fc86881c73cef57602b2c88155528
master date: 2014-05-02 11:46:32 +0200
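The corrected error path, sketched with hypothetical `struct mapping` and `set_ioreq_page()` stand-ins for the real mapping helpers (locking omitted): when the page is already set up, the mapping that was just acquired is released, not the one in use.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a guest page mapping and its release. */
struct mapping {
    int refs;
};

struct ioreq_page {
    struct mapping *map;  /* mapping currently in use, or NULL */
};

static void release_mapping(struct mapping *m)
{
    m->refs--;
}

/* `fresh` was just acquired by the caller. If a page is already set up,
 * fail and drop `fresh`; the in-use mapping must stay untouched. */
static int set_ioreq_page(struct ioreq_page *iorp, struct mapping *fresh)
{
    if (iorp->map != NULL) {
        release_mapping(fresh);  /* the bug released iorp->map here */
        return -1;
    }
    iorp->map = fresh;
    return 0;
}
```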

Feng Wu [Mon, 12 May 2014 15:41:42 +0000 (17:41 +0200)]
x86/HVM: correct the SMEP logic for HVM_CR0_GUEST_RESERVED_BITS

When checking the SMEP feature for HVM guests, we should check the
VCPU instead of the host CPU.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 31ee951a3bee6e7cc21f94f900fe989e3701a79a
master date: 2014-04-28 12:47:24 +0200

Jan Beulich [Mon, 12 May 2014 15:39:59 +0000 (17:39 +0200)]
passthrough: allow to suppress SERR and PERR signaling altogether

This is just to have a workaround at hand in case other chipsets (not
covered by the previous two patches) also have similar issues.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: 1a2a390a560e8319a6be98c7ab6cfaebd230f67e
master date: 2014-04-25 12:13:31 +0200

Jan Beulich [Mon, 12 May 2014 15:35:18 +0000 (17:35 +0200)]
VT-d: suppress UR signaling for desktop chipsets

Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the desktop chipsets dealt with here.

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: d6cb14b34ffc2a830022d059f1aa22bf19dcf55f
master date: 2014-04-25 12:12:38 +0200

Jan Beulich [Mon, 12 May 2014 15:33:41 +0000 (17:33 +0200)]
VT-d: suppress UR signaling for server chipsets

Unsupported Requests can be signaled for malformed writes to the MSI
address region, e.g. due to buggy or malicious DMA set up to that
region. These should normally result in IOMMU faults, but don't on
the server chipsets dealt with here.

IDs 0xe00, 0xe01, and 0xe04 ... 0xe0b (Ivytown) aren't needed here -
Intel confirmed the issue to be fixed in hardware there.

This is CVE-2013-3495 / XSA-59.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Don Dugger <donald.d.dugger@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: d061d200eb92bcb1d86f9b55c6de73e35ce63fdf
master date: 2014-04-25 12:11:55 +0200

Jan Beulich [Mon, 12 May 2014 15:32:15 +0000 (17:32 +0200)]
x86: add missing break in dom0_pit_access()

Coverity ID 1203045

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 815dc9f1dba5782dcef77d8a002a11f5b1e5cc37
master date: 2014-04-23 15:07:11 +0200

Jan Beulich [Mon, 12 May 2014 15:31:37 +0000 (17:31 +0200)]
x86/HAP: also flush TLB when altering a present 1G or intermediate entry

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
master commit: c82fbfe6ec8be597218eb943641d1f7a81c4c01e
master date: 2014-04-14 15:14:47 +0200

Jan Beulich [Mon, 12 May 2014 15:30:59 +0000 (17:30 +0200)]
x86/AMD: feature masking is unavailable on Fam11

Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 70e79fad6dc6f533ff83ee23b8d13de5a696d896
master date: 2014-04-09 16:13:25 +0200

Jan Beulich [Mon, 12 May 2014 15:30:04 +0000 (17:30 +0200)]
x86/mm: fix checks against max_mapped_pfn

This value is an inclusive one, i.e. this fixes an off-by-one in memory
sharing and an off-by-two in shadow code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 088ee1d47b65d6bb92de61b404805f4ca92e3240
master date: 2014-04-03 12:08:43 +0100
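The difference between an inclusive and an exclusive bound, as a minimal sketch with hypothetical helper names: only the `<=` form accepts `max_mapped_pfn` itself.

```c
#include <stdbool.h>

/* max_mapped_pfn is inclusive: it names the highest PFN that may be
 * mapped, so that PFN itself is in range. */
static bool pfn_valid_fixed(unsigned long pfn, unsigned long max_mapped_pfn)
{
    return pfn <= max_mapped_pfn;
}

/* Treating the bound as exclusive wrongly rejects the last valid PFN. */
static bool pfn_valid_off_by_one(unsigned long pfn, unsigned long max_mapped_pfn)
{
    return pfn < max_mapped_pfn;
}
```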

Jan Beulich [Tue, 29 Apr 2014 13:31:28 +0000 (15:31 +0200)]
x86/HVM: restrict HVMOP_set_mem_type

Permitting arbitrary type changes here has the potential of creating
present P2M (and hence EPT/NPT/IOMMU) entries pointing to an invalid
MFN (INVALID_MFN truncated to the respective hardware structure field's
width). This would become a problem at the latest when something real sat
at the end of the physical address space; I suspect, though, that
other things might break with such bogus entries.

Along with that drop a bogus (and otherwise becoming stale) log
message.

Afaict the similar operation in p2m_set_mem_access() is safe.

This is XSA-92.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 83bb5eb4d340acebf27b34108fb1dae062146a68
master date: 2014-04-29 15:11:31 +0200

Jan Beulich [Wed, 9 Apr 2014 09:52:21 +0000 (11:52 +0200)]
VMX: fix PAT value seen by guest

The XSA-60 fixes introduced a window during which the guest PAT gets
forced to all zeros. This shouldn't be visible to the guest. Therefore
we need to intercept PAT MSR accesses during that time period.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
master commit: fce79f8ce91dc45f3a4d699ee67c49e6cbeb1197
master date: 2014-04-01 16:49:18 +0200

Jan Beulich [Wed, 9 Apr 2014 09:40:50 +0000 (11:40 +0200)]
x86/EPT: relax treatment of APIC MFN

There's no point in this being mapped UC by the guest due to using a
respective PAT index - set the ignore-PAT flag to true.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 1f8b57779785bf9f55c16312bb1ec679929c314b
master date: 2014-03-28 13:43:25 +0100

Jan Beulich [Wed, 9 Apr 2014 09:40:09 +0000 (11:40 +0200)]
x86/HVM: correct CPUID leaf 80000008 handling

CPUID[80000008].EAX[23:16] have been given the meaning of the guest
physical address restriction (in case it needs to be smaller than the
host's), hence we need to mirror that into vCPUID[80000008].EAX[7:0].

Enforce a lower limit at the same time, as well as a fixed value for
the virtual address bits, and zero for the guest physical address ones.

In order for the vMTRR code to see these overrides we need to make it
call hvm_cpuid() instead of domain_cpuid(), which in turn requires
special casing (and relaxing) the controlling domain.

This additionally should hide an ordering problem in the tools: Both
xend and xl appear to be restoring a guest from its image before
setting up the CPUID policy in the hypervisor, resulting in
domain_cpuid() returning all zeros and hence the check in
mtrr_var_range_msr_set() failing if the guest previously had more than
the minimum 36 physical address bits.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: ef437690af8b75e6758dce77af75a22b63982883
master date: 2014-03-28 13:33:34 +0100
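A sketch of the EAX adjustment described above, assuming the documented layout of leaf 0x80000008 (bits 7:0 physical, 15:8 virtual, 23:16 guest physical address width). The lower limit and the virtual-address value are parameters because the commit does not spell out the values used; this is an illustration, not hvm_cpuid() itself.

```c
#include <stdint.h>

/* Guest view of CPUID leaf 0x80000008 EAX:
 *   [7:0]   physical address bits
 *   [15:8]  virtual (linear) address bits
 *   [23:16] guest physical address bits, 0 meaning "same as [7:0]"
 * min_phys_bits and virt_bits are assumed parameters. */
static uint32_t adjust_leaf_80000008(uint32_t host_eax, uint32_t min_phys_bits,
                                     uint32_t virt_bits)
{
    /* Prefer the guest physical address restriction when one is given. */
    uint32_t phys = (host_eax >> 16) & 0xff;

    if (phys == 0)
        phys = host_eax & 0xff;
    if (phys < min_phys_bits)
        phys = min_phys_bits;  /* enforce the lower limit */

    /* Mirror the width into [7:0], use a fixed [15:8], and zero [23:16];
     * bits [31:24] are passed through unchanged. */
    return (host_eax & 0xff000000u) | ((virt_bits & 0xffu) << 8) | (phys & 0xffu);
}
```

For example, a host reporting EAX = 0x00283030 (a 40-bit guest restriction) yields 0x3028 with `min_phys_bits = 36` (the minimum mentioned in the message) and an assumed `virt_bits = 48`.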

Jan Beulich [Wed, 9 Apr 2014 09:39:08 +0000 (11:39 +0200)]
x86: fix determination of bit count for struct domain allocations

We can't just add in the hole shift value, as the hole may be at or
above the 44-bit boundary. Instead we need to determine the total bit
count until reaching 32 significant (not squashed out) bits in PFN
representations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: b3d2f8b2cba9fce5bc8995612d0d13fcefec7769
master date: 2014-03-24 10:48:03 +0100

Jan Beulich [Wed, 9 Apr 2014 09:38:20 +0000 (11:38 +0200)]
x86/Intel: work around Xeon 7400 series erratum AAI65

Linux commit 40e2d7f9b5dae048789c64672bf3027fbb663ffa ("x86 idle:
Repair large-server 50-watt idle-power regression") tells us that this
applies not just to the named Xeon 7400 series, but also NHM-EX and
WSM-EX; sadly Intel's documentation is so badly searchable that I
wasn't able to locate the respective errata (and hence can't quote
their numbers here).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 96d1b237ae9b2f2718bb1c59820701f17d3d86e0
master date: 2014-03-17 16:47:22 +0100

Jan Beulich [Wed, 9 Apr 2014 09:37:15 +0000 (11:37 +0200)]
VT-d: fix RMRR handling

Removing mapped RMRR tracking structures in dma_pte_clear_one() is
wrong for two reasons: First, these regions may cover more than a
single page. And second, multiple devices (and hence multiple devices
assigned to any particular guest) may share a single RMRR (whether
assigning such devices to distinct guests is a safe thing to do is
another question).

Therefore move the removal of the tracking structures into the
counterpart function to the one doing the insertion -
intel_iommu_remove_device(), and add a reference count to the tracking
structure.

Further, for the handling of the mappings of the respective memory
regions to be correct, RMRRs must not overlap. Add a respective check
to acpi_parse_one_rmrr().

And finally, with all of this being VT-d specific, move the cleanup
of the list as well as the structure type definition where it belongs -
in VT-d specific rather than IOMMU generic code.

Note that this doesn't address yet another issue associated with RMRR
handling: The purpose of the RMRRs as well as the way the respective
IOMMU page table mappings get inserted both suggest that these regions
would need to be marked E820_RESERVED in all (HVM?) guests' memory
maps, yet nothing like this is being done in hvmloader. (For PV guests
this would also seem to be necessary, but may conflict with PV guests
possibly assuming there to be just a single E820 entry representing all
of their RAM.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: dd527061770789d8152b1dea68056987b202d87a
master date: 2014-03-17 16:45:04 +0100
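The new overlap check in acpi_parse_one_rmrr() amounts to an interval test. A minimal sketch, assuming a hypothetical `struct rmrr` that stores inclusive base and end addresses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RMRR descriptor: inclusive [base, end] physical range. */
struct rmrr {
    uint64_t base;
    uint64_t end;
};

/* Two inclusive ranges overlap unless one ends before the other starts. */
static bool rmrr_overlaps(const struct rmrr *a, const struct rmrr *b)
{
    return a->base <= b->end && b->base <= a->end;
}

/* Reject a candidate RMRR that overlaps any already-parsed one. */
static bool rmrr_accept(const struct rmrr *list, int n, const struct rmrr *cand)
{
    for (int i = 0; i < n; i++)
        if (rmrr_overlaps(&list[i], cand))
            return false;
    return true;
}
```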

Jan Beulich [Wed, 9 Apr 2014 09:30:12 +0000 (11:30 +0200)]
x86: make hypercall preemption checks consistent

- never preempt on the first iteration (ensure forward progress)
- never preempt on the last iteration (pointless/wasteful)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
master commit: fd7bfce0395ace266159760e35dc49f7af3b90ce
master date: 2014-03-13 14:27:51 +0100

Jan Beulich [Wed, 9 Apr 2014 09:28:46 +0000 (11:28 +0200)]
common: make hypercall preemption checks consistent

- never preempt on the first iteration (ensure forward progress)
- do cheap checks first

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 8c0eed2cc8d8a2ccccdffe4c386b625b672dc12a
master date: 2014-03-13 14:26:35 +0100
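The preemption rules from the two commits above can be sketched as a loop that checks only after making progress and only while items remain, with the cheap comparison evaluated first. `preempt_check()` is a hypothetical stand-in for Xen's `hypercall_preempt_check()`, and the loop is an illustration rather than the hypervisor's code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for hypercall_preempt_check(); counts calls so
 * the behaviour can be observed. */
static unsigned int preempt_checks;
static bool pending_work = true;

static bool preempt_check(void)
{
    preempt_checks++;
    return pending_work;
}

/* Process items [start, end) and return where to resume (end when done).
 * The check runs only after an item has been processed, so every call
 * makes progress, and never after the last item, where a continuation
 * would be pointless. The cheap `i + 1 < end` test short-circuits the
 * potentially costlier check. */
static unsigned int process_range(unsigned int start, unsigned int end,
                                  unsigned int *done)
{
    for (unsigned int i = start; i < end; i++) {
        (*done)++;  /* process item i */
        if (i + 1 < end && preempt_check())
            return i + 1;  /* caller would create a continuation here */
    }
    return end;
}
```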

Samuel Thibault [Fri, 21 Mar 2014 01:56:56 +0000 (02:56 +0100)]
PV-GRUB: fix blk access at end of disk

GRUB usually loads a whole disk track, even if that means going
beyond the end of the disk.  We thus have to gracefully return an error
instead of letting blkfront panic.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 51e18e41e39a682de5a2e60ad86048dc6344efec)
(cherry picked from commit 03eb5134056d61167e6781eecf7e570b491bda73)
(cherry picked from commit e3f630b73c159078a6991161c5255048b16d366f)
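A sketch of the graceful failure, assuming a hypothetical `check_blk_request()` that the block layer would call before issuing a read: requests that run past the last sector are reported as errors instead of reaching blkfront.

```c
#include <stdint.h>

/* Returns 0 if `count` sectors starting at `sector` lie within a disk of
 * `disk_sectors` sectors, -1 otherwise. Comparing against the remaining
 * space avoids overflow in `sector + count`. */
static int check_blk_request(uint64_t sector, uint64_t count, uint64_t disk_sectors)
{
    if (sector >= disk_sectors || count > disk_sectors - sector)
        return -1;
    return 0;
}
```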

Joby Poriyath [Tue, 4 Feb 2014 18:10:35 +0000 (18:10 +0000)]
xen/pygrub: grub2/grub.cfg from RHEL 7 has new commands in menuentry

menuentry in grub2/grub.cfg uses the linux16 and initrd16 commands
instead of linux and initrd. Because of this, a RHEL 7 (beta) guest
failed to boot after installation.

In addition, RHEL 7 menu entries have two different single-quote
delimited strings on the same line, and the greedy grouping used for
menuentry parsing captures both strings and the options in between.

Signed-off-by: Joby Poriyath <joby.poriyath@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: george.dunlap@citrix.com
(cherry picked from commit dd03048708af072374963d6d0721cc6d4c5f52cf)
(cherry picked from commit 607d9c98e8161d93fc93dd0e2c3a5b5be57f0d2a)
(cherry picked from commit 4481b30d5ea980fe469c8dfa1580ba2d107fa12f)

Ian Jackson [Mon, 24 Feb 2014 14:19:15 +0000 (14:19 +0000)]
libxl: Fix carefd lock leak in save callout

If libxl_pipe fails we leave the carefd locked, which translates to
the atfork lock remaining held.  This would probably cause the process
to deadlock shortly afterwards.

Of course libxl_pipe is very unlikely to fail unless things are
already going very badly.  This bug has not been observed anywhere as
far as we are aware.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
(cherry picked from commit 7eb73add5de5839f160b902dd894d3aecc10ba0c)
(cherry picked from commit 4bb3a17449a4472930030a627631f788bb678123)

Ian Jackson [Mon, 24 Feb 2014 14:19:14 +0000 (14:19 +0000)]
libxl: Hold the atfork lock while closing carefd

This avoids the process being forked while a carefd is recorded in the
list but the actual fd has been closed.  If that happened, a
subsequent libxl_postfork_child_noexec would attempt to close the fd
again.  If we are lucky that results in a harmless warning; but if we
are unlucky the fd number has been reused and we close an unrelated
fd.

This race has not been observed anywhere as far as we are aware.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
(cherry picked from commit 2a0c3a62ea4ad6c6bcbf80122b070f3ff3fe7dae)
(cherry picked from commit 86c00cb6e2d78d5be861656a1e83956c9de96003)

Jan Beulich [Tue, 25 Mar 2014 16:25:14 +0000 (17:25 +0100)]
x86: enforce preemption in HVM_set_mem_access / p2m_set_mem_access()

Processing up to 4G PFNs may take almost arbitrarily long, so
preemption is needed here.

This is CVE-2014-2599 / XSA-89.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
master commit: 0fe53c4f279e1a8ef913e71ed000236d21ce96de
master date: 2014-03-25 15:23:57 +0100

Ian Campbell [Fri, 14 Sep 2012 09:02:50 +0000 (10:02 +0100)]
xl: do not leak cpupool names.

Valgrind reports:
==3076== 7 bytes in 1 blocks are definitely lost in loss record 1 of 1
==3076==    at 0x402458C: malloc (vg_replace_malloc.c:270)
==3076==    by 0x406F86D: libxl_cpupoolid_to_name (libxl_utils.c:102)
==3076==    by 0x8058742: parse_config_data (xl_cmdimpl.c:639)
==3076==    by 0x805BD56: create_domain (xl_cmdimpl.c:1838)
==3076==    by 0x805DAED: main_create (xl_cmdimpl.c:3903)
==3076==    by 0x804D39D: main (xl.c:285)

And indeed there are several places where xl uses
libxl_cpupoolid_to_name as a boolean to test if the pool name is
valid and leaks the name if it is. Introduce an is_valid helper and
use that instead.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>
(cherry picked from commit 10a194b1c57de7ddc9d4fce07e01f2cd7d0ca26a)
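The non-leaking validity test, sketched with a hypothetical `pool_id_to_name()` standing in for `libxl_cpupoolid_to_name()`, which returns an allocated string the caller must free. The helper name `pool_id_is_valid()` is likewise illustrative.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lookup: returns a freshly allocated name, or NULL if the
 * pool ID is unknown. The caller owns the returned string. */
static char *pool_id_to_name(unsigned int poolid)
{
    if (poolid > 1)
        return NULL;
    const char *name = poolid == 0 ? "Pool-0" : "Pool-1";
    char *copy = malloc(strlen(name) + 1);
    if (copy)
        strcpy(copy, name);
    return copy;
}

/* Validity test that does not leak: look the name up, remember whether
 * it existed, and free it straight away. */
static bool pool_id_is_valid(unsigned int poolid)
{
    char *name = pool_id_to_name(poolid);
    bool valid = name != NULL;

    free(name);
    return valid;
}
```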

Jan Beulich [Fri, 14 Mar 2014 16:52:27 +0000 (17:52 +0100)]
x86/HVM: consolidate passthrough handling in epte_get_entry_emt()

It is inconsistent to depend on iommu_enabled alone: For a guest
without devices passed through to it, it is of no concern whether the
IOMMU is enabled.

There's one rather special case to take care of: VMX code marks the
LAPIC access page as MMIO. The added assertion needs to take this into
consideration, and the subsequent handling of the direct MMIO case was
inconsistent too: That page would have been WB in the absence of an
IOMMU, but UC in the presence of it, while in fact the cacheability of
this page is entirely unrelated to an IOMMU being in use.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: "Xu, Dongxiao" <dongxiao.xu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 3089a6d82bdf3112ccb1dd074ce34a8cbdc4ccd8
master date: 2014-03-10 11:04:36 +0100

Jan Beulich [Fri, 14 Mar 2014 16:51:39 +0000 (17:51 +0100)]
x86/HVM: fix memory type merging in epte_get_entry_emt()

Using the minimum numeric value of guest and host specified memory
types is too simplistic: it only works correctly for a subset of
types. In particular, the WT/WP combination needs conversion
to UC if the two types conflict.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: "Xu, Dongxiao" <dongxiao.xu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: b99113b9d5fac5149de8496f55afa00e285b1ff3
master date: 2014-03-10 11:03:53 +0100
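A sketch of a merge that special-cases the conflicting WT/WP pair, using the x86 memory type encodings (UC=0, WC=1, WT=4, WP=5, WB=6). This is not the hypervisor's full merging logic: other unequal pairs fall back to the numeric minimum purely for illustration.

```c
/* x86 memory type encodings as used by MTRRs and PAT. */
enum mem_type { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6 };

/* The numeric minimum gives a sensible result for many pairs (UC wins
 * over anything, WT over WB), but WT vs WP would yield WT although the
 * two conflict; that pair becomes UC instead. */
static enum mem_type merge_mem_type(enum mem_type guest, enum mem_type host)
{
    if (guest == host)
        return guest;
    if ((guest == MT_WT && host == MT_WP) || (guest == MT_WP && host == MT_WT))
        return MT_UC;
    return guest < host ? guest : host;
}
```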

Dongxiao Xu [Fri, 14 Mar 2014 16:51:07 +0000 (17:51 +0100)]
x86/hvm: refine the judgment on IDENT_PT for EMT

When trying to get the EPT EMT type, the judgment on
HVM_PARAM_IDENT_PT is incorrect: it always returns the WB type if
the parameter is not set. Remove the related code.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
We can't fully drop the dependency yet, but we should certainly avoid
overriding cases already properly handled. The reason for this is that
the guest sets up its MTRRs _after_ the EPT tables have already been
constructed, and no code is in place to propagate this to the
EPT code. Without this check we're forcing the guest to run with all of
its memory uncacheable until something happens to re-write every single
EPT entry. But of course this has to be just a temporary solution.

In the same spirit we should defer the "very early" (when the guest is
still being constructed and has no vCPU yet) override to the last
possible point.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: "Xu, Dongxiao" <dongxiao.xu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: cadfd7bca999c0a795dc27be72d43c92e8943a0b
master date: 2014-03-10 11:02:25 +0100

Jan Beulich [Fri, 14 Mar 2014 16:50:03 +0000 (17:50 +0100)]
IOMMU: generalize and correct softirq processing during Dom0 device setup

c/s 21039:95f5a4ce8f24 ("VT-d: reduce default verbosity") having put a
call to process_pending_softirqs() in VT-d's domain_context_mapping()
was wrong in two ways: For one we shouldn't be doing this when setting
up a device during DomU assignment. And then - I didn't check whether
that was the case already back then - we shouldn't call that function
with the pcidevs_lock (or in fact any spin lock) held.

Move the "preemption" into generic code, at the same time dealing with
further issues, both actual (too much output elsewhere, particularly on
systems with very many host-bridge-like devices, has been observed to
still cause the watchdog to trigger when enabled) and potential (other
IOMMU code may also end up being too verbose).

Do the "preemption" once per device actually being set up when in
verbose mode, and once per bus otherwise.

Note that dropping pcidevs_lock around the process_pending_softirqs()
invocation is specifically not a problem here: We're in an __init
function and aren't racing with potential additions/removals of PCI
devices. Not acquiring the lock in setup_dom0_pci_devices(), on the other
hand, is not an option, as there are too many places that assert that the
lock is held.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
master commit: 9ef5aa944a6a0df7f5938983043c7e46f158bbc6
master date: 2014-03-04 10:52:20 +0100

Andrew Cooper [Fri, 14 Mar 2014 16:45:52 +0000 (17:45 +0100)]
x86/mce: Reduce boot-time logspam

When booting with "no-mce", the user does not need to be told that "MCE
support [was] disabled by bootparam" for each cpu.  Furthermore, a file:line
reference is unnecessary.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: a5ab9c9fa29cda7e1b18dbcaa69a5dbded96de32
master date: 2014-02-25 09:30:59 +0100

Jan Beulich [Fri, 14 Mar 2014 16:45:14 +0000 (17:45 +0100)]
x86/MSI: don't risk division by zero

The check in question is redundant with the one in the immediately
following if(), where dividing by zero gets carefully avoided.

Spotted-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 5d160d913e03b581bdddde73535c18ac670cf0a9
master date: 2014-02-24 12:11:01 +0100

Frediano Ziglio [Fri, 14 Mar 2014 16:44:19 +0000 (17:44 +0100)]
x86/MCE: Fix race condition in mctelem_reserve

These lines (in mctelem_reserve)

        newhead = oldhead->mcte_next;
        if (cmpxchgptr(freelp, oldhead, newhead) == oldhead) {

are racy. After you read the newhead pointer, another flow (thread or
recursive invocation) may change the whole list but leave the head with
the same value. So oldhead is the same as *freelp, but you are setting
a new head that could point to any element (even one already in use).

This patch instead uses a bit array and atomic bit operations.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
master commit: 60ea3a3ac3d2bcd8e85b250fdbfc46b3b9dc7085
master date: 2014-02-24 12:07:41 +0100
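The bit-array idea can be sketched with C11 atomics instead of Xen's own bitops, using a hypothetical pool size: clearing a bit with `atomic_fetch_and` and inspecting the old value lets exactly one caller claim each slot, and there is no `next` pointer that can go stale between the read and the update.

```c
#include <limits.h>
#include <stdatomic.h>

#define MC_NENT 64  /* hypothetical pool size */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS ((MC_NENT + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per telemetry slot: 1 = free. */
static _Atomic unsigned long free_bits[NWORDS];

static void pool_init(void)
{
    for (unsigned int i = 0; i < MC_NENT; i++)
        atomic_fetch_or(&free_bits[i / BITS_PER_WORD], 1UL << (i % BITS_PER_WORD));
}

/* Claim the lowest free slot, or return -1 if none is left. Only the
 * caller that sees the bit set in the returned old word owns the slot. */
static int slot_reserve(void)
{
    for (unsigned int i = 0; i < MC_NENT; i++) {
        unsigned long mask = 1UL << (i % BITS_PER_WORD);
        unsigned long old = atomic_fetch_and(&free_bits[i / BITS_PER_WORD], ~mask);

        if (old & mask)
            return (int)i;
    }
    return -1;
}

static void slot_release(int i)
{
    atomic_fetch_or(&free_bits[i / BITS_PER_WORD], 1UL << (i % BITS_PER_WORD));
}
```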

Boris Ostrovsky [Fri, 14 Mar 2014 16:43:15 +0000 (17:43 +0100)]
x86/pci: Store VF's memory space displacement in a 64-bit value

VF's memory space offset can be greater than 4GB and therefore needs
to be stored in a 64-bit variable.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
master commit: 001bdcee7bc19be3e047d227b4d940c04972eb02
master date: 2014-02-13 10:49:55 +0100
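The truncation is easy to reproduce: with 32-bit arithmetic, a VF index times a per-VF BAR size wraps at 4GB. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Offset of virtual function `vf` in the VF BAR region, widened to 64 bits
 * before multiplying so offsets of 4GB and above survive. */
static uint64_t vf_offset(uint32_t vf, uint32_t bar_size)
{
    return (uint64_t)vf * bar_size;
}

/* The same computation in 32 bits wraps modulo 2^32. */
static uint32_t vf_offset_truncated(uint32_t vf, uint32_t bar_size)
{
    return vf * bar_size;
}
```

For instance, VF 16 with a 256MB per-VF BAR lies at offset 0x100000000, which the 32-bit version reports as 0.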

Andrew Cooper [Tue, 7 Jan 2014 10:04:23 +0000 (10:04 +0000)]
tools/libxc: Correct read_exact() error messages

The errors have been incorrectly identifying their function since c/s
861aef6e1558bebad8fc60c1c723f0706fd3ed87 which did a lot of error handling
cleanup.

Use __func__ to ensure the name remains correct in the future.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 1671cdeac7da663fb2963f3e587fa279dcd0238b)
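A sketch of the `__func__` approach in a read-exactly-N-bytes helper. This is an illustration, not libxc's actual `read_exact()`: it retries short reads and EINTR, and every message names the enclosing function automatically.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read exactly `size` bytes from `fd`. Returns 0 on success, -1 on error
 * or premature end of file. */
static int read_exact(int fd, void *data, size_t size)
{
    size_t offset = 0;

    while (offset < size) {
        ssize_t len = read(fd, (char *)data + offset, size - offset);

        if (len == -1 && errno == EINTR)
            continue;
        if (len == 0) {
            /* __func__ always names the enclosing function. */
            fprintf(stderr, "%s: unexpected end of file\n", __func__);
            return -1;
        }
        if (len < 0) {
            fprintf(stderr, "%s: read failed: %s\n", __func__, strerror(errno));
            return -1;
        }
        offset += (size_t)len;
    }
    return 0;
}
```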

Jan Beulich [Thu, 20 Feb 2014 07:41:22 +0000 (08:41 +0100)]
x86: don't drop guest visible state updates when 64-bit PV guest is in user mode

Since 64-bit PV uses separate kernel and user mode page tables, kernel
addresses (as usually provided via VCPUOP_register_runstate_memory_area)
aren't necessarily accessible when the respective updating occurs. Add
logic for toggle_guest_mode() to take care of this (if necessary) the
next time the vCPU switches to kernel mode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 231d7f4098c8ac9cdb78f18fcb820d8618c8b0c2
master date: 2014-01-23 10:30:08 +0100

Andrew Cooper [Thu, 20 Feb 2014 07:40:24 +0000 (08:40 +0100)]
x86-64/percpu: Force INVALID_PERCPU_AREA into the non-canonical address region

This causes accidental uses of per_cpu() on a pcpu with an INVALID_PERCPU_AREA
to result in a #GP fault for attempting to access the middle of the non-canonical
virtual address region.

This is preferable to the current behaviour, where incorrect use of per_cpu()
will result in an effective NULL structure dereference, which has security
implications in the context of PV guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 7cfb0053629c4dd1a6f01dc43cca7c0c25b8b7bf
master date: 2013-10-04 12:24:34 +0200
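What makes the non-canonical region a good place for a poison value: with 48-bit virtual addresses, bits 63..47 of a canonical address are all zeros or all ones, and any access outside that set raises #GP rather than being translated. A minimal check, assuming 48-bit addressing (the exact value Xen assigns to INVALID_PERCPU_AREA is not reproduced here):

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical iff bits 63..47 are all 0 or all 1 (48-bit addressing). */
static bool is_canonical_48(uint64_t va)
{
    uint64_t top = va >> 47;  /* the 17 bits 63..47 */

    return top == 0 || top == 0x1ffff;
}
```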

Jan Beulich [Thu, 20 Feb 2014 07:38:43 +0000 (08:38 +0100)]
x86/AMD: work around erratum 793 for 32-bit

The original change went into a 64-bit only code section, thus leaving
the issue unfixed on 32-bit. Re-order code to address this.

This is part of CVE-2013-6885 / XSA-82.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>

Jan Beulich [Thu, 20 Feb 2014 07:37:47 +0000 (08:37 +0100)]
update Xen version to 4.2.5-pre

Jan Beulich [Fri, 14 Feb 2014 15:24:39 +0000 (16:24 +0100)]
update Xen version to 4.2.4 (tag: RELEASE-4.2.4)

Jan Beulich [Thu, 13 Feb 2014 09:21:42 +0000 (10:21 +0100)]
flask: check permissions first thing in flask_security_set_bool()

Nothing else should be done if the caller isn't permitted to set
boolean values.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: ebe867052e0f782139147015c4e91b37aa5e68f1
master date: 2014-02-11 11:14:10 +0100

Jan Beulich [Thu, 13 Feb 2014 09:21:02 +0000 (10:21 +0100)]
flask: fix error propagation from flask_security_set_bool()

The function should return an error when flask_security_make_bools()
fails as well as when the input ID is out of range.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 31f3620be0e3158c205a3669135f9c4bfa40b1c7
master date: 2014-02-11 11:13:22 +0100

11 years ago flask: fix memory leaks
Jan Beulich [Thu, 13 Feb 2014 09:19:20 +0000 (10:19 +0100)]
flask: fix memory leaks

Plus, in security_preserve_bools(), prevent a double free when
security_get_bools() fails.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 57c9f2caf05de41913b3e4eb48c0c3ad6c18dd3f
master date: 2014-02-11 11:11:48 +0100

11 years ago AMD IOMMU: fail if there is no southbridge IO-APIC
Jan Beulich [Thu, 13 Feb 2014 09:18:13 +0000 (10:18 +0100)]
AMD IOMMU: fail if there is no southbridge IO-APIC

... but interrupt remapping is requested (with per-device remapping
tables). Without it, the timer interrupt usually does not work.

Inspired by Linux's "iommu/amd: Work around wrong IOAPIC device-id in
IVRS table" (commit c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059) by Joerg
Roedel <joerg.roedel@amd.com>.

Reported-by: Eric Houby <ehouby@yahoo.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Eric Houby <ehouby@yahoo.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 06bbcaf48d09c18a41c482866941ddd5d2846b44
master date: 2014-02-10 10:05:24 +0100

11 years ago x86/AMD: apply workaround for AMD F16h erratum 792
Aravind Gopalakrishnan [Thu, 13 Feb 2014 09:17:40 +0000 (10:17 +0100)]
x86/AMD: apply workaround for AMD F16h erratum 792

The workaround for this erratum will only be in BIOSes built from
January 2014 onwards, but initial production parts shipped in 2013.
Since there is a coverage hole, we should carry this fix in software
in case the BIOS does not do the right thing or someone is using an
old BIOS.

Description:
 The processor does not ensure that the DRAM scrub read/write sequence
 is atomic with respect to accesses to the CC6 save state area. Therefore,
 if a concurrent scrub read/write access is to the same address, the
 entry may appear as if it was not written. This quirk applies to Fam16h
 models 00h-0Fh.

See "Revision Guide" for AMD F16h models 00h-0fh, document 51810 rev.
3.04, Nov 2013.

Equivalent Linux patch link:
 http://marc.info/?l=linux-kernel&m=139066012217149&w=2

Tested the patch on Fam16h server platform and it works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Corrected checking for boot CPU. Made warning message conditional.
Compacted warning message text. Moved comment to commit message.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 4d3ebb84df43d90db4cc25a48f4658709bd11678
master date: 2014-02-07 11:12:22 +0100

11 years ago x86/domctl: don't ignore errors from vmce_restore_vcpu()
Jan Beulich [Thu, 13 Feb 2014 09:16:13 +0000 (10:16 +0100)]
x86/domctl: don't ignore errors from vmce_restore_vcpu()

What started out as a simple cleanup patch (eliminating the redundant
check of domctl->cmd before copying back the output data) revealed a
bug in the handling of XEN_DOMCTL_set_ext_vcpucontext.

Fix this, retaining the cleanup.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: af172d655c3900822d1f710ac13ee38ee9d482d2
master date: 2014-02-04 09:22:12 +0100

11 years ago libxc: Fix out-of-memory error handling in xc_cpupool_getinfo()
Andrew Cooper [Wed, 22 Jan 2014 17:47:21 +0000 (17:47 +0000)]
libxc: Fix out-of-memory error handling in xc_cpupool_getinfo()

Avoid freeing info then returning it to the caller.
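
Below is a minimal sketch of the corrected error path. `struct pool_info`, `fill_info()` and `get_pool_info()` are hypothetical stand-ins for the libxc types and calls. The point is the shape: free the buffer, then return NULL.

```c
#include <stdlib.h>

/* Hypothetical result type standing in for the libxc cpupool info. */
struct pool_info {
    unsigned int pool_id;
    unsigned int n_cpus;
};

/* Hypothetical query that may fail after the buffer was allocated. */
static int fill_info(struct pool_info *info, unsigned int id, int fail)
{
    if ( fail )
        return -1;
    info->pool_id = id;
    info->n_cpus = 4;
    return 0;
}

/* Returns a heap-allocated result, or NULL on any failure. */
static struct pool_info *get_pool_info(unsigned int id, int fail)
{
    struct pool_info *info = malloc(sizeof(*info));

    if ( info == NULL )
        return NULL;

    if ( fill_info(info, id, fail) != 0 )
    {
        free(info);
        /* Returning 'info' here would hand back a dangling pointer. */
        return NULL;
    }

    return info;
}
```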

This is XSA-88.

Coverity-ID: 1056192
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit d883c179a74111a6804baf8cb8224235242a88fc)

11 years ago libvchan: Fix handling of invalid ring buffer indices
Marek Marczykowski-Górecki [Thu, 6 Feb 2014 16:39:17 +0000 (17:39 +0100)]
libvchan: Fix handling of invalid ring buffer indices

The remote (hostile) process can set ring buffer indices to any value
at any time. If that happens, the computed "buffer space" (either free
space for writing data, or data ready for reading) can be negative or
greater than the buffer size.  This ends up as a buffer overflow in the
second memcpy inside of do_send/do_recv.

Fix this by introducing new available bytes accessor functions
raw_get_data_ready and raw_get_buffer_space which are robust against
mad ring states, and only return sanitised values.

Proof sketch of correctness:

Now {rd,wr}_{cons,prod} are only ever used in the raw available bytes
functions, and in do_send and do_recv.

The raw available bytes functions do unsigned arithmetic on the
returned values.  If the result is "negative" or too big it will be
>ring_size (since we used unsigned arithmetic).  Otherwise the result
is a positive in-range value representing a reasonable ring state, in
which case we can safely convert it to int (as the rest of the code
expects).

do_send and do_recv immediately mask the ring index value with the
ring size.  The result is always going to be plausible.  If the ring
state has become mad, the worst case is that our behaviour is
inconsistent with the peer's ring pointer.  I.e. we read or write to
arguably-incorrect parts of the ring - but always parts of the ring.
And of course if a peer misoperates the ring they can achieve this
effect anyway.

So the security problem is fixed.
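
The unsigned-arithmetic argument above can be sketched as follows. The struct and the helper names are assumptions for illustration. So is the choice to report an inconsistent ring as having zero bytes available: the commit does not say what raw_get_data_ready() and raw_get_buffer_space() return in that case.

```c
#include <stdint.h>

/* Free-running 32-bit indices that a hostile peer may set to anything. */
struct ring_state {
    uint32_t cons;       /* consumer index */
    uint32_t prod;       /* producer index */
    uint32_t size;       /* ring size in bytes, assumed a power of two */
};

/* Bytes ready to read; an inconsistent ring reports 0 (sketch's choice). */
static int data_ready(const struct ring_state *r)
{
    uint32_t ready = r->prod - r->cons;   /* unsigned: wraps, never negative */

    if ( ready > r->size )
        return 0;                         /* "negative" or too big: mad ring */
    return (int)ready;                    /* in range, safe to convert */
}

/* Free space to write; an inconsistent ring reports 0 (sketch's choice). */
static int buffer_space(const struct ring_state *r)
{
    uint32_t used = r->prod - r->cons;

    if ( used > r->size )
        return 0;
    return (int)(r->size - used);
}

/* Masking any index yields an offset that is always inside the ring. */
static uint32_t ring_offset(const struct ring_state *r, uint32_t idx)
{
    return idx & (r->size - 1);
}
```

Because prod - cons is computed in uint32_t, a peer that moves prod behind cons produces a huge value rather than a negative one. A single comparison against the ring size therefore rejects both cases.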

This is XSA-86.

(The patch is essentially Ian Jackson's work, although parts of the
commit message are by Marek.)

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
master commit: 2efcb0193bf3916c8ce34882e845f5ceb1e511f7
master date: 2014-02-06 16:44:41 +0100

11 years ago xsm/flask: correct off-by-one in flask_security_avc_cachestats cpu id check
Matthew Daley [Thu, 6 Feb 2014 16:38:22 +0000 (17:38 +0100)]
xsm/flask: correct off-by-one in flask_security_avc_cachestats cpu id check

This is XSA-85.
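
The commit message only names the bug. Below is a sketch of this class of off-by-one, using a hypothetical table size `NUM_CPUS` and a hypothetical helper `read_cachestats()`. The bound the real check compares against is not shown here.

```c
#include <errno.h>

#define NUM_CPUS 8   /* hypothetical; valid ids are 0 .. NUM_CPUS - 1 */

static unsigned int cachestats[NUM_CPUS];

/* Returns the per-CPU statistic, or -EINVAL for an out-of-range id. */
static long read_cachestats(unsigned int cpu)
{
    /* ">=" rather than ">": cpu == NUM_CPUS would read past the end. */
    if ( cpu >= NUM_CPUS )
        return -EINVAL;
    return cachestats[cpu];
}
```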

Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 2e1cba2da4631c5cd7218a8f30d521dce0f41370
master date: 2014-02-06 16:42:36 +0100

11 years ago flask: fix reading strings from guest memory
Jan Beulich [Thu, 6 Feb 2014 16:36:40 +0000 (17:36 +0100)]
flask: fix reading strings from guest memory

Since the string size is being specified by the guest, we must range
check it properly before doing allocations based on it. While for the
two cases that are exposed only to trusted guests (via policy
restriction) this just uses an arbitrary upper limit (PAGE_SIZE), for
the FLASK_[GS]ETBOOL case (which any guest can use) the upper limit
gets enforced based on the longest name across all boolean settings.
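
Below is a sketch of validating a guest-controlled length before allocating. `copy_guest_string()` and `longest_name()` are hypothetical. `memcpy()` stands in for a copy from guest memory, and the errno value is an assumption.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Copy a string whose length is guest-supplied, after range checking it. */
static int copy_guest_string(const char *guest_buf, size_t size,
                             size_t max_size, char **out)
{
    char *tmp;

    if ( size > max_size )
        return -EINVAL;          /* reject before sizing any allocation */

    tmp = malloc(size + 1);
    if ( tmp == NULL )
        return -ENOMEM;

    memcpy(tmp, guest_buf, size);   /* stands in for copy_from_guest() */
    tmp[size] = '\0';
    *out = tmp;
    return 0;
}

/* Limit for the unprivileged [GS]ETBOOL path: the longest boolean name. */
static size_t longest_name(const char *const *names, size_t n)
{
    size_t i, max = 0;

    for ( i = 0; i < n; i++ )
        if ( strlen(names[i]) > max )
            max = strlen(names[i]);
    return max;
}
```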

This is XSA-84.

Reported-by: Matthew Daley <mattd@bugfuzz.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
master commit: 6c79e0ab9ac6042e60434c02e1d99b0cf0cc3470
master date: 2014-02-06 16:33:50 +0100

11 years ago common/sysctl: Don't leak status in SYSCTL_page_offline_op
Andrew Cooper [Thu, 30 Jan 2014 08:01:28 +0000 (09:01 +0100)]
common/sysctl: Don't leak status in SYSCTL_page_offline_op

Also fix the indentation of the arguments to copy_to_guest() to help clarify
that the 'ret = -EFAULT' is not part of the condition.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: efd8ff0a04740a698b2b8b2b9adccd639e0fa6c9
master date: 2014-01-20 09:48:11 +0100

11 years ago mce: fix race condition in mctelem_xchg_head
Frediano Ziglio [Thu, 30 Jan 2014 08:01:01 +0000 (09:01 +0100)]
mce: fix race condition in mctelem_xchg_head

The function (mctelem_xchg_head()) used to exchange mce telemetry
list heads is racy.  It may write to the head twice, with the second
write linking to an element in the wrong state.

Consider two threads: T1 inserting onto the committed list, and T2
trying to consume it.

1. T1 starts inserting an element (A), sets prev pointer (mcte_prev).
2. T1 is interrupted after the cmpxchg succeeded.
3. T2 gets the list and changes element A and updates the commit list
   head.
4. T1 resumes, re-reads the prev pointer and compares it with the result
   from the cmpxchg, which succeeded, but in the meantime prev changed
   in memory.
5. T1 thinks the cmpxchg failed and goes around the loop again,
   linking head to A again.

To solve the race, use a temporary variable for the prev pointer.

*linkp (which points to a field in the element) must be updated before
the cmpxchg() as after a successful cmpxchg the element might be
immediately removed and reinitialized.

The wmb() prior to the cmpxchgptr() call is not necessary since it is
already a full memory barrier.  This wmb() is thus removed.
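
Below is a sketch of the fixed loop, using C11 atomics in place of Xen's cmpxchgptr(). It assumes insertion only, with linkp being the new element's own `prev` field. The key point is that the comparison uses the local snapshot `old`, never a re-read of memory that another thread may already have changed.

```c
#include <stdatomic.h>
#include <stddef.h>

struct ent {
    struct ent *prev;      /* link to the previous list head */
    int payload;
};

/* Push 'e' onto a lock-free list headed by *headp. */
static void push_head(_Atomic(struct ent *) *headp, struct ent *e)
{
    struct ent *old = atomic_load(headp);

    do {
        e->prev = old;     /* link first: once published, e may be taken */
        /* On failure, 'old' is refreshed with the current head. */
    } while ( !atomic_compare_exchange_weak(headp, &old, e) );
}
```

The default sequentially consistent ordering of atomic_compare_exchange_weak() makes a separate write barrier before it unnecessary. This matches the reasoning for dropping the wmb().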

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
master commit: e9af61b969906976188609379183cb304935f448
master date: 2014-01-17 15:58:27 +0100

11 years ago dbg_rw_guest_mem: need to call put_gfn in error path
Andrew Cooper [Thu, 30 Jan 2014 08:00:09 +0000 (09:00 +0100)]
dbg_rw_guest_mem: need to call put_gfn in error path

Using a 1G hvm domU (in grub) and gdbsx:

(gdb) set arch i8086
warning: A handler for the OS ABI "GNU/Linux" is not built into this configuration
of GDB.  Attempting to continue with the default i8086 settings.

The target architecture is assumed to be i8086
(gdb) target remote localhost:9999
Remote debugging using localhost:9999
Remote debugging from host 127.0.0.1
0x0000d475 in ?? ()
(gdb) x/1xh 0x6ae9168b

The above will reproduce this bug.

With a debug=y build you will get:

Assertion '!preempt_count()' failed at preempt.c:37

For a debug=n build you will get a dom0 VCPU hung (at some point) in:

         [ffff82c4c0126eec] _write_lock+0x3c/0x50
          ffff82c4c01e43a0  __get_gfn_type_access+0x150/0x230
          ffff82c4c0158885  dbg_rw_mem+0x115/0x360
          ffff82c4c0158fc8  arch_do_domctl+0x4b8/0x22f0
          ffff82c4c01709ed  get_page+0x2d/0x100
          ffff82c4c01031aa  do_domctl+0x2ba/0x11e0
          ffff82c4c0179662  do_mmuext_op+0x8d2/0x1b20
          ffff82c4c0183598  __update_vcpu_system_time+0x288/0x340
          ffff82c4c015c719  continue_nonidle_domain+0x9/0x30
          ffff82c4c012938b  add_entry+0x4b/0xb0
          ffff82c4c02223f9  syscall_enter+0xa9/0xae

And gdb output:

(gdb) x/1xh 0x6ae9168b
0x6ae9168b:     0x3024
(gdb) x/1xh 0x6ae9168b
0x6ae9168b:     Ignoring packet error, continuing...
Reply contains invalid hex digit 116

The 1st one worked because the p2m.lock is recursive and the PCPU
had not yet changed.

crash reports (for example):

crash> mm_rwlock_t 0xffff83083f913010
struct mm_rwlock_t {
  lock = {
    raw = {
      lock = 2147483647
    },
    debug = {<No data fields>}
  },
  unlock_level = 0,
  recurse_count = 1,
  locker = 1,
  locker_function = 0xffff82c4c022c640 <__func__.13514> "__get_gfn_type_access"
}

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: Mukesh Rathor <mukesh.rathor@oracle.com>
master commit: 3dbab7a8bf4bef1bb2967cb3a8c7ed2146482ab3
master date: 2014-01-10 17:45:01 +0100

11 years ago update Xen version to 4.2.4-rc1 4.2.4-rc1
Jan Beulich [Tue, 28 Jan 2014 14:52:53 +0000 (15:52 +0100)]
update Xen version to 4.2.4-rc1

11 years ago x86: PHYSDEVOP_{prepare,release}_msix are privileged
Jan Beulich [Fri, 24 Jan 2014 12:46:43 +0000 (13:46 +0100)]
x86: PHYSDEVOP_{prepare,release}_msix are privileged

Yet this wasn't being enforced.

This is XSA-87.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 9c7e789a1b60b6114e0b1ef16dff95f03f532fb5
master date: 2014-01-24 13:41:36 +0100

11 years ago x86/irq: avoid use-after-free on error path in pirq_guest_bind()
Andrew Cooper [Thu, 23 Jan 2014 12:59:35 +0000 (13:59 +0100)]
x86/irq: avoid use-after-free on error path in pirq_guest_bind()

This is XSA-83.

Coverity-ID: 1146952

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 650fc2f76d0a156e23703683d0c18fa262ecea36
master date: 2014-01-23 13:55:42 +0100

11 years ago kexec: prevent deadlock on reentry to the crash path
Andrew Cooper [Fri, 17 Jan 2014 15:42:37 +0000 (16:42 +0100)]
kexec: prevent deadlock on reentry to the crash path

In some cases, such as suffering a queued-invalidation timeout while
performing an iommu_crash_shutdown(), Xen can end up reentering the crash
path. Previously, this would result in a deadlock in one_cpu_only(), as the
test_and_set_bit() would fail.

The crash path is not reentrant, and even if it could be made to be so, it is
almost certain that we would fall over the same reentry condition again.

The new code can distinguish a reentry case from multiple cpus racing down the
crash path.  In the case that a reentry is detected, return back out to the
nested panic() call, which will maybe_reboot() on our behalf.  This requires a
bit of return plumbing back up to kexec_crash().
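
The commit does not spell out how a reentry is told apart from racing CPUs. One plausible approach, assumed here, is to record the owning CPU id instead of a single bit. `enter_crash_path()` and the enum are hypothetical.

```c
#include <stdatomic.h>

#define NO_CPU (-1)

/* Which CPU owns the crash path; NO_CPU while nobody has entered it. */
static atomic_int crash_owner = NO_CPU;

enum crash_entry { CRASH_FIRST, CRASH_REENTRY, CRASH_RACED };

/*
 * Same CPU again: reentry, so return to the caller instead of spinning.
 * Different CPU:  lost the race, so that CPU should stop (not shown).
 */
static enum crash_entry enter_crash_path(int cpu)
{
    int expected = NO_CPU;

    if ( atomic_compare_exchange_strong(&crash_owner, &expected, cpu) )
        return CRASH_FIRST;
    /* On failure, 'expected' now holds the current owner. */
    return expected == cpu ? CRASH_REENTRY : CRASH_RACED;
}
```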

While fixing this deadlock, also fix up a minor niggle seen recently from a
XenServer crash report.  The report was from a Bank 8 MCE, which had managed
to crash on all cpus at once.  The result was a lot of stack traces with cpus
in kexec_common_shutdown(), which was in fact the inlined version of
one_cpu_only().  The kexec crash path is not a hotpath, so we can easily
afford to prevent inlining for the sake of clarity in the stack traces.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
master commit: 470f58c159410b280627c2ea7798ea12ad93bd7c
master date: 2013-11-27 15:13:48 +0100

11 years ago x86/VT-x: Disable MSR intercept for SHADOW_GS_BASE
Paul Durrant [Fri, 17 Jan 2014 15:41:38 +0000 (16:41 +0100)]
x86/VT-x: Disable MSR intercept for SHADOW_GS_BASE

Intercepting this MSR is pointless: the swapgs instruction does not cause a
vmexit, so the cached result of this is potentially stale after the next guest
instruction.  It is correctly saved and restored on vcpu context switch.

Furthermore, 64-bit Windows writes to this MSR on every thread context switch,
so interception causes a substantial performance hit.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
master commit: a82e98d473fd212316ea5aa078a7588324b020e5
master date: 2013-11-15 11:02:17 +0100

11 years ago x86/ats: Fix parsing of 'ats' command line option
Andrew Cooper [Fri, 17 Jan 2014 15:39:08 +0000 (16:39 +0100)]
x86/ats: Fix parsing of 'ats' command line option

This is really a boolean_param() hidden inside a hand-coded attempt to
replicate boolean_param(), which misses the 'no-' prefix semantics
expected with Xen boolean parameters.
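
Below is a sketch of the semantics a boolean option is expected to have, including the `no-` prefix. `apply_bool_option()` is hypothetical. The accepted value spellings are an assumption modeled on common Xen usage, not copied from boolean_param().

```c
#include <stdbool.h>
#include <string.h>

/* Returns 1/0 for a recognised boolean word, -1 otherwise (assumed set). */
static int parse_bool_word(const char *s)
{
    static const char *const yes[] = { "1", "on", "yes", "true", "enable" };
    static const char *const no[]  = { "0", "off", "no", "false", "disable" };
    size_t i;

    for ( i = 0; i < sizeof(yes) / sizeof(yes[0]); i++ )
    {
        if ( !strcmp(s, yes[i]) )
            return 1;
        if ( !strcmp(s, no[i]) )
            return 0;
    }
    return -1;
}

/* Accepts "name", "no-name" and "name=<bool>"; false if not applicable. */
static bool apply_bool_option(const char *tok, const char *name, bool *opt)
{
    size_t len = strlen(name);
    bool negate = false;

    if ( !strncmp(tok, "no-", 3) )
    {
        negate = true;
        tok += 3;
    }
    if ( strncmp(tok, name, len) )
        return false;               /* not this option */

    tok += len;
    if ( *tok == '\0' )             /* "name" or "no-name" */
    {
        *opt = !negate;
        return true;
    }
    if ( *tok == '=' && !negate )   /* "name=<value>" */
    {
        int v = parse_bool_word(tok + 1);

        if ( v < 0 )
            return false;
        *opt = v;
        return true;
    }
    return false;                   /* e.g. "atsx" or "no-ats=1" */
}
```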

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 7b5af1df122092243a3697409d5a5ad3b9944da4
master date: 2013-11-04 14:45:17 +0100

11 years ago hvm_save_one: return correct data
Don Slutz [Mon, 13 Jan 2014 15:04:23 +0000 (16:04 +0100)]
hvm_save_one: return correct data

It is possible that hvm_sr_handlers[typecode].save does not use all
the provided room.  Also it can use variable sized records.  In both
cases, using:

   instance * hvm_sr_handlers[typecode].size

does not select the correct instance.  Add code to search for the
correct instance.
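
Below is a sketch of searching variable-sized records instead of computing `instance * size`. The header layout (typecode, instance, length) is an assumption loosely modeled on HVM save-record descriptors. `find_instance()` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header; 'length' bytes of payload follow it. */
struct rec_desc {
    uint16_t typecode;
    uint16_t instance;
    uint32_t length;
};

/* Return the payload of 'instance' within buf[0..used), or NULL. */
static const uint8_t *find_instance(const uint8_t *buf, size_t used,
                                    uint16_t instance, uint32_t *len)
{
    size_t off = 0;

    while ( used - off >= sizeof(struct rec_desc) )
    {
        struct rec_desc d;

        memcpy(&d, buf + off, sizeof(d));   /* avoid unaligned access */
        off += sizeof(d);
        if ( d.length > used - off )
            return NULL;                    /* truncated record */
        if ( d.instance == instance )
        {
            *len = d.length;
            return buf + off;
        }
        off += d.length;                    /* skip this record's payload */
    }
    return NULL;
}
```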

Signed-off-by: Don Slutz <dslutz@verizon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: e019c606f598eb76585cc5d26a242a40dfc4d580
master date: 2014-01-08 09:15:03 +0100

11 years ago AMD/IOMMU: fix infinite loop due to ivrs_bdf_entries larger than 16-bit value
Suravee Suthikulpanit [Mon, 13 Jan 2014 15:03:28 +0000 (16:03 +0100)]
AMD/IOMMU: fix infinite loop due to ivrs_bdf_entries larger than 16-bit value

Certain AMD systems could have up to 0x10000 ivrs_bdf_entries.
However, the loop variable (bdf) is declared as u16, which causes an
infinite loop when parsing the IOMMU event log with an IO_PAGE_FAULT event.
This patch changes the variable to u32 instead.
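
Below is a small demonstration of why a 16-bit index cannot terminate a loop over 0x10000 entries. Both helpers are hypothetical. The real loop walks IVRS entries while handling an event.

```c
#include <stdint.h>

/* A 32-bit counter can represent 0x10000, so this loop terminates. */
static uint32_t count_entries(uint32_t nr_entries)
{
    uint32_t bdf, visited = 0;

    for ( bdf = 0; bdf < nr_entries; bdf++ )
        visited++;
    return visited;
}

/* A uint16_t wraps from 0xffff to 0, so "bdf < 0x10000" never fails. */
static uint32_t u16_successor(uint16_t bdf)
{
    bdf++;
    return bdf;
}
```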

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 81b1c7de2339d2788352b162057e70130803f3cf
master date: 2014-01-07 15:09:42 +0100

11 years ago VTD/DMAR: free() correct pointer on error from acpi_parse_one_atsr()
Andrew Cooper [Mon, 13 Jan 2014 15:00:28 +0000 (16:00 +0100)]
VTD/DMAR: free() correct pointer on error from acpi_parse_one_atsr()

Free the allocated structure rather than the ACPI table ATS entry.

On further analysis, there is another memory leak.  acpi_parse_dev_scope()
could allocate scope->devices, and return with -ENOMEM.  All callers of
acpi_parse_dev_scope() would then free the underlying structure, losing the
pointer.

These errors can only actually be reached through acpi_parse_dev_scope()
(which passes type = DMAR_TYPE), but I am quite surprised Coverity didn't spot
it.

Coverity-ID: 1146949
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 62d33ca1048f4e08eaeb026c7b79239b4605b636
master date: 2014-01-07 14:59:31 +0100

11 years ago x86/mm: Prevent leaking domain mappings in paging_log_dirty_op()
Andrew Cooper [Mon, 13 Jan 2014 14:59:28 +0000 (15:59 +0100)]
x86/mm: Prevent leaking domain mappings in paging_log_dirty_op()

Coverity ID: 1135374 1135375 1135376 1135377

If {copy_to,clear}_guest_offset() fails, we would leak the domain mappings for
l4 thru l1.

Fixing this requires having conditional unmaps on the faulting path, which in
turn requires explicitly initialising the pointers to NULL because of the
early ENOMEM exit.
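
Below is a sketch of the pattern of NULL-initialised pointers with conditional unmaps. `map_level()` and `unmap_level()` are hypothetical stand-ins for domain page mapping. `fail_at` simulates a failing copy step.

```c
#include <errno.h>
#include <stdlib.h>

static int live_mappings;   /* outstanding hypothetical mappings */

/* Hypothetical stand-in for mapping one page-table level. */
static void *map_level(void)
{
    void *p = malloc(1);

    if ( p )
        live_mappings++;
    return p;
}

/* Hypothetical stand-in for unmapping it again. */
static void unmap_level(void *p)
{
    live_mappings--;
    free(p);
}

/* 'fail_at' names the level whose copy step fails, or 0 for none. */
static int walk_levels(int fail_at)
{
    void *l4 = NULL, *l3 = NULL, *l2 = NULL;
    int rc = -ENOMEM;

    if ( (l4 = map_level()) == NULL )
        goto out;
    if ( fail_at == 4 ) { rc = -EFAULT; goto out; }

    if ( (l3 = map_level()) == NULL )
        goto out;
    if ( fail_at == 3 ) { rc = -EFAULT; goto out; }

    if ( (l2 = map_level()) == NULL )
        goto out;
    if ( fail_at == 2 ) { rc = -EFAULT; goto out; }

    rc = 0;

 out:
    /* NULL initialisation makes these checks valid on every path. */
    if ( l2 )
        unmap_level(l2);
    if ( l3 )
        unmap_level(l3);
    if ( l4 )
        unmap_level(l4);
    return rc;
}
```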

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
master commit: 0725f326358cbb2ba7f9626976e346b963d74c37
master date: 2013-12-17 16:38:07 +0100

11 years ago ix86: fix linear page table construction in alloc_l2_table()
Jan Beulich [Mon, 13 Jan 2014 14:57:07 +0000 (15:57 +0100)]
ix86: fix linear page table construction in alloc_l2_table()

Slot 0 got updated when slot 3 was meant. The mistake was hidden by
create_pae_xen_mappings() correcting things immediately afterwards
(i.e. before the new entries could get used the first time).

Reported-by: CHENG Yueqiang <yqcheng.2008@phdis.smu.edu.sg>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Tim Deegan <tim@xen.org>

11 years ago x86/p2m: restrict auditing to debug builds
Jan Beulich [Fri, 10 Jan 2014 10:44:10 +0000 (11:44 +0100)]
x86/p2m: restrict auditing to debug builds

... since iterating through all of a guest's pages may take unduly
long.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
master commit: 4476d05cf5e8d3880f88ce16649766df67e0791e
master date: 2013-12-13 15:06:11 +0100

11 years ago kexec/x86: do not map crash kernel area
Daniel Kiper [Fri, 10 Jan 2014 10:43:20 +0000 (11:43 +0100)]
kexec/x86: do not map crash kernel area

This mapping was apparently never used.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
master commit: 7113a45451a9f656deeff070e47672043ed83664
master date: 2013-12-11 10:37:25 +0100

11 years ago x86/PV: don't commit debug register values early in arch_set_info_guest()
Jan Beulich [Fri, 10 Jan 2014 10:42:35 +0000 (11:42 +0100)]
x86/PV: don't commit debug register values early in arch_set_info_guest()

They're being taken care of later (via set_debugreg()), and temporarily
copying them into struct vcpu means that bad values may end up getting
loaded during context switch if the vCPU is already running and the
function errors out between the premature and real commit step, leading
to the same issue that XSA-12 dealt with.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 398c39b6c18d0b55acfc88f5ee75b3a793e6eeec
master date: 2013-12-11 10:33:19 +0100

11 years ago x86/cpuidle: publish new states only after fully initializing them
Jan Beulich [Fri, 10 Jan 2014 10:41:53 +0000 (11:41 +0100)]
x86/cpuidle: publish new states only after fully initializing them

Since state information coming from Dom0 can arrive at any time, on
any CPU, we ought to make sure that a new state is fully initialized
before the target CPU might be using it.
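
Below is a sketch of "initialise fully, then publish", with C11 release/acquire ordering swapped in for Xen's own barriers. It assumes a single writer and a hypothetical fixed-size table. Readers only ever see entries below the published count.

```c
#include <stdatomic.h>
#include <stddef.h>

struct cx_state {
    unsigned int type;
    unsigned int latency;
};

#define MAX_CX 8                        /* hypothetical table size */

static struct cx_state states[MAX_CX];
static atomic_uint published_count;     /* entries [0, count) are complete */

/* Writer: fill the slot completely, then publish with a release store. */
static int publish_state(unsigned int type, unsigned int latency)
{
    unsigned int n = atomic_load_explicit(&published_count,
                                          memory_order_relaxed);

    if ( n >= MAX_CX )
        return -1;
    states[n].type = type;
    states[n].latency = latency;
    atomic_store_explicit(&published_count, n + 1, memory_order_release);
    return 0;
}

/* Reader on any CPU: the acquire load pairs with the release store. */
static const struct cx_state *latest_state(void)
{
    unsigned int n = atomic_load_explicit(&published_count,
                                          memory_order_acquire);

    return n ? &states[n - 1] : NULL;
}
```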

While touching that code, also do some minor cleanup: a missing (but
benign) "break" and some whitespace adjustments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
master commit: 4ca6f9f0377a30755a299cc60a6d44ab6c3b34d0
master date: 2013-12-11 10:30:02 +0100

11 years ago amd/passthrough: Do not leak domain mappings from do_invalidate_dte()
Andrew Cooper [Fri, 10 Jan 2014 10:41:01 +0000 (11:41 +0100)]
amd/passthrough: Do not leak domain mappings from do_invalidate_dte()

Coverity ID: 1135379

As the code stands, the domain mapping will be leaked on each error path.

The mapping can be held for a much shorter period of time, and all the
relevant information can be pulled out at once.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
master commit: 80dbe90a4e6b31f8cb859f7450fa3eed8695fd1d
master date: 2013-12-10 16:16:49 +0100

11 years ago defer the domain mapping in scrub_one_page()
Andrew Cooper [Fri, 10 Jan 2014 10:39:21 +0000 (11:39 +0100)]
defer the domain mapping in scrub_one_page()

This avoids a resource leak and needless playing with the pagetables in the
case that the page is broken.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Keir Fraser <keir@xen.org>
master commit: 7dd4f9da063cb2cd43426c785535534c9d958ce5
master date: 2013-12-09 14:13:23 +0100

11 years ago QEMU_TAG update
Ian Jackson [Thu, 9 Jan 2014 12:56:55 +0000 (12:56 +0000)]
QEMU_TAG update

11 years ago xenstore: sanity check incoming message body lengths
Matthew Daley [Sat, 30 Nov 2013 00:20:04 +0000 (13:20 +1300)]
xenstore: sanity check incoming message body lengths

This is for the client-side receiving messages from xenstored, so there
is no security impact, unlike XSA-72.
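
Below is a sketch of the client-side check. The header layout and the 4096-byte limit are assumptions modeled on the xenstore wire format, and `alloc_body()` is hypothetical. Without the check, a length of 0xffffffff would make `len + 1` wrap to 0 in 32-bit unsigned arithmetic.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed header layout, modeled on the xenstore wire format. */
struct msg_hdr {
    uint32_t type;
    uint32_t req_id;
    uint32_t tx_id;
    uint32_t len;       /* length of the body that follows */
};

#define PAYLOAD_MAX 4096u   /* assumed upper bound on a message body */

/* Allocate a body buffer only if the advertised length is plausible. */
static char *alloc_body(const struct msg_hdr *hdr)
{
    char *body;

    if ( hdr->len > PAYLOAD_MAX )
        return NULL;                /* reject instead of trusting the peer */

    body = malloc(hdr->len + 1);    /* +1 for a terminating NUL */
    if ( body )
        body[hdr->len] = '\0';
    return body;
}
```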

Coverity-ID: 1055449
Coverity-ID: 1056028
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 8da1ed9031341381c218b7e6eaab5b4f239a327b)
(cherry picked from commit 014f9219f1dca3ee92948f0cfcda8d1befa6cbcd)

11 years ago libxl: don't leak pcidevs in libxl_pcidev_assignable
Matthew Daley [Sun, 1 Dec 2013 10:15:03 +0000 (23:15 +1300)]
libxl: don't leak pcidevs in libxl_pcidev_assignable

Coverity-ID: 1055896
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 26b35b9ace97f433fcf4c5dfbdfb573d1075255f)
(cherry picked from commit cfa252b05855a712eda0da80cd638c7093ddf89f)

11 years ago libxl: don't leak output vcpu info on error in libxl_list_vcpu
Matthew Daley [Sun, 1 Dec 2013 10:15:01 +0000 (23:15 +1300)]
libxl: don't leak output vcpu info on error in libxl_list_vcpu

Coverity-ID: 1055887
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 3c113a57f55dc4e36e3552342721db01efa832c6)
(cherry picked from commit d41c205e0173ee923e791c2fd320c7eb25f2e9cb)

11 years ago libxl: actually abort if initializing a ctx's lock fails
Matthew Daley [Sun, 1 Dec 2013 10:15:00 +0000 (23:15 +1300)]
libxl: actually abort if initializing a ctx's lock fails

If initializing the ctx's lock fails, don't keep going, but instead
error out.

Coverity-ID: 1055289
Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit b1cb2bdde1f2393d75a925e6c15862b93d3e7abd)
(cherry picked from commit 62f88c08b31259032c81163f4133d6f25f033c1e)

11 years ago xl: fixes for do_daemonize
Roger Pau Monne [Fri, 22 Nov 2013 11:54:09 +0000 (12:54 +0100)]
xl: fixes for do_daemonize

Fix usage of CHK_ERRNO in do_daemonize and also remove the usage of a
bogus for(;;).

Coverity-ID: 1130516 and 1130520
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit ed8c9047f6fc6d28fc27d37576ec8c8c1be68efe)

Conflicts:
tools/libxl/xl_cmdimpl.c
(cherry picked from commit c393ff09ade45d1a2a8f1c12eac5eab4d38947a3)

11 years ago libxl: fix fd check in libxl__spawn_local_dm
Roger Pau Monne [Fri, 22 Nov 2013 11:54:08 +0000 (12:54 +0100)]
libxl: fix fd check in libxl__spawn_local_dm

Checking the logfile_w fd for -1 to detect failure is no longer correct,
because libxl__create_qemu_logfile will now return ERROR_FAIL on failure,
which is -3.

While there, also add an error check for opening /dev/null.
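
Below is a sketch of why `== -1` is no longer sufficient. `open_logfile()` and `logfile_failed()` are hypothetical. Only the ERROR_FAIL value (-3) comes from the commit message.

```c
#include <fcntl.h>
#include <unistd.h>

#define ERROR_FAIL (-3)   /* value quoted in the commit message */

/* Hypothetical helper: a valid fd on success, ERROR_FAIL on failure. */
static int open_logfile(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);

    return fd < 0 ? ERROR_FAIL : fd;
}

/* "fd == -1" misses ERROR_FAIL (-3); "fd < 0" catches every failure. */
static int logfile_failed(int fd)
{
    return fd < 0;
}
```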

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 3b88d95e9c0a5ff91d5b60e94d81f1982af57e7f)

Conflicts:
tools/libxl/libxl_dm.c
(cherry picked from commit 8f1bd27fcd7f8be1353e7309f450283f3e5f7cd0)

Conflicts:
tools/libxl/libxl_dm.c

11 years ago tools/libxl: Avoid deliberate NULL pointer dereference
Andrew Cooper [Mon, 25 Nov 2013 11:12:50 +0000 (11:12 +0000)]
tools/libxl: Avoid deliberate NULL pointer dereference

Coverity ID: 1055290

Calling LIBXL__LOG_ERRNO(ctx,) with a ctx pointer we have just failed to
allocate is going to end badly.  Opencode a suitable use of xtl_log() instead.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 1677af03c14f2d8d88d2ed9ed8ce6d4906d19fb4)
(cherry picked from commit 4cbbbdfb775d387dc1e0931b44e14d3205c92265)

11 years ago tools/libxc: Improve xc_dom_malloc_filemap() error handling
Andrew Cooper [Mon, 25 Nov 2013 11:05:49 +0000 (11:05 +0000)]
tools/libxc: Improve xc_dom_malloc_filemap() error handling

Coverity ID 1055563

In the original function, mmap() could be called with a length of -1 if the
second lseek failed and the caller had not provided max_size.
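
Below is a simplified sketch of checking the lseek() result before it becomes an mmap() length. `map_whole_file()` is hypothetical. It omits the max_size handling and the second lseek() of the real function.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map a whole file read-only; NULL on error, *len set on success. */
static void *map_whole_file(int fd, size_t *len)
{
    off_t end = lseek(fd, 0, SEEK_END);
    void *p;

    /* -1 must never reach mmap() as (size_t)-1; an empty file is rejected too. */
    if ( end <= 0 )
        return NULL;

    p = mmap(NULL, (size_t)end, PROT_READ, MAP_SHARED, fd, 0);
    if ( p == MAP_FAILED )
        return NULL;

    *len = (size_t)end;
    return p;
}
```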

While fixing up this error, improve the logging of other error paths.  I know
from personal experience that debugging failures of this function is rather
difficult given only "xc_dom_malloc_filemap: failed (on file <somefile>)" in
the logs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit c635c1ef7833e7505423f6567bf99bd355101587)
(cherry picked from commit a5febe4aeff4ab80ce0411f63f336c25951098cf)

11 years ago tools/xc_restore: Initialise console and store mfns
Andrew Cooper [Mon, 25 Nov 2013 11:05:47 +0000 (11:05 +0000)]
tools/xc_restore: Initialise console and store mfns

If the console or store mfn chunks are not present in the migration stream,
stack junk gets reported for the mfns.

XenServer had a very hard-to-track-down VM corruption issue caused by exactly
this.  Xenconsoled would connect to a junk mfn and increment the ring pointer
if the junk happened to look like a valid gfn.

Coverity ID: 1056093 1056094

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 592b614f3469bb83d1158c3dc8c15b67aacfbf4f)
(cherry picked from commit 6d7b67c67039ceac36a780b59c2b890739094b95)

Conflicts:
tools/xcutils/xc_restore.c

11 years ago tools/xenconsoled: Fix file handle leaks
Andrew Cooper [Mon, 25 Nov 2013 11:06:39 +0000 (11:06 +0000)]
tools/xenconsoled: Fix file handle leaks

Coverity ID: 715218 1055876 1055877

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit 9ab1792e1ce9e77afe2cd230d69e56a0737a735f)
(cherry picked from commit 6f6d936af8acb7d9e36b70e5e70953f695ca3b36)

11 years ago tools/xenconsole: Use xc_domain_getinfo() correctly
Andrew Cooper [Mon, 25 Nov 2013 11:06:38 +0000 (11:06 +0000)]
tools/xenconsole: Use xc_domain_getinfo() correctly

Coverity ID: 1055018

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
(cherry picked from commit aa344500a3bfceb3ef01931609ac1cfaf6dcf52d)
(cherry picked from commit 74cd17f84649012bec7ce484bf7b9c3f3a9e79ae)

11 years ago tools/libxl: Fix integer overflows in sched_sedf_domain_set()
Andrew Cooper [Mon, 25 Nov 2013 11:12:51 +0000 (11:12 +0000)]
tools/libxl: Fix integer overflows in sched_sedf_domain_set()

Coverity ID: 1055662 1055663 1055664

Widen from int to uint64_t before multiplication, rather than afterwards.
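
Below is a sketch of the widening issue. For illustration it assumes a millisecond-to-nanosecond scaling of the kind a scheduler parameter conversion might need. `ms_to_ns()` is hypothetical, and it is intended for non-negative inputs.

```c
#include <stdint.h>

/* Cast first, so the multiplication itself happens in 64 bits. */
static uint64_t ms_to_ns(int ms)
{
    return (uint64_t)ms * 1000000;
}
```

If `ms * 1000000` were evaluated in int and only widened afterwards, any value above about 2147 ms would overflow before the cast could help.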

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 9c01516fee7d548af58fd310d3c93dd71ea9ea28)
(cherry picked from commit 2de748569f827b037ec10104f7c12f44d01d0ffa)

11 years ago tools/libxl: Fix memory leak in sched_domain_output()
Andrew Cooper [Mon, 25 Nov 2013 11:16:48 +0000 (11:16 +0000)]
tools/libxl: Fix memory leak in sched_domain_output()

Coverity ID: 1055904

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
(cherry picked from commit 0792426b798fd3b39909d618cf8fe8bac30594f4)

Conflicts:
tools/libxl/xl_cmdimpl.c
(cherry picked from commit 338a8b13757d6ef36ff4e321cb4ef4190ba6ec02)

11 years ago IOMMU: clear "don't flush" override on error paths
Jan Beulich [Tue, 10 Dec 2013 15:21:57 +0000 (16:21 +0100)]
IOMMU: clear "don't flush" override on error paths

Both xenmem_add_to_physmap() and iommu_populate_page_table() have
an error path that fails to clear that flag, thus suppressing further
flushes on the respective pCPU.

In iommu_populate_page_table() also slightly re-arrange code to avoid
the false impression of the flag in question being guarded by a
domain's page_alloc_lock.

This is CVE-2013-6400 / XSA-80.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 552b7fcb9a70f1d4dd0e0cd5fb4d3d9da410104a
master date: 2013-12-10 16:10:37 +0100

11 years ago x86/boot: fix BIOS memory corruption on certain IBM systems
Andrew Cooper [Mon, 9 Dec 2013 13:47:43 +0000 (14:47 +0100)]
x86/boot: fix BIOS memory corruption on certain IBM systems

IBM System x3530 M4 BIOSes (including the latest available at the time of this
patch) will corrupt a byte at physical address 0x105ff1 to the value of 0x86
if %esp has the value 0x00080000 when issuing an `int $0x15 (ax=0xec00)` to
inform the system about our intended operating mode.

Xen gets unhappy when the bootloader has placed its .text section over
this specific region of RAM.

After dropping into 16-bit mode, clear all 32 bits of %esp and, for the BIOS
call already documented to be affected by BIOS bugs, clear all GPRs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 1ed76797439e384de18fcd6810bd4743d4f38b1e
master date: 2013-12-06 11:28:00 +0100

11 years agox86: fix early boot command line parsing
Daniel Kiper [Mon, 9 Dec 2013 13:47:07 +0000 (14:47 +0100)]
x86: fix early boot command line parsing

There is no reliable way to encode the NUL character as a character constant,
so encode it as a number. See: http://sourceware.org/binutils/docs/as/Characters.html.
Octal and hex escapes do not work on at least one system (GNU assembler
version 2.22 (x86_64-linux-gnu) using BFD version (GNU Binutils for Debian) 2.22).
Without this fix, e.g. the no-real-mode option at the end of the xen.gz command
line is not detected.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: dc37e0bfffc673f4bdce1d69ad86098bfb0ab531
master date: 2013-12-04 13:26:37 +0100

11 years agofix locking in offline_page()
Jan Beulich [Mon, 9 Dec 2013 13:46:26 +0000 (14:46 +0100)]
fix locking in offline_page()

Coverity ID 1055655

Apart from the Coverity-detected lock order reversal (a domain's
page_alloc_lock taken with the heap lock already held), calling
put_page() with heap_lock held is a bad idea too, as put_page() can
end up calling free_heap_pages(), which wants to take this very
lock.

From all I can tell, the region over which heap_lock was held was far
too large: all we need to protect are the calls to mark_page_offline()
and reserve_heap_page() (and I'd even question the need for the
former). Hence, by slightly rearranging the if/else-if chain, we
can drop the lock much earlier, and at the same time no longer cover
the two put_page() invocations.

While at it, do a little other cleanup: put the "pod_replace" code
path inline rather than at its own label, and drop the effectively
unused variable "ret".
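The lock-narrowing argument can be illustrated with a small sketch, assuming a single CPU and a non-recursive lock modelled by a flag (taking it twice trips an assert instead of deadlocking). All names here are hypothetical; `put_page()` and `free_heap_pages()` only mimic the call chain described above and are not Xen's implementations.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a non-recursive lock: re-acquiring it
 * would deadlock, so the sketch asserts instead. */
static bool heap_lock_held;

static void lock_heap(void)   { assert(!heap_lock_held); heap_lock_held = true; }
static void unlock_heap(void) { assert(heap_lock_held);  heap_lock_held = false; }

static int refcount = 2;            /* hypothetical page reference count */
static bool page_offline, page_freed;

/* Freeing needs the heap lock itself... */
static void free_heap_pages(void)
{
    lock_heap();
    page_freed = true;
    unlock_heap();
}

/* ...and dropping the last reference ends up freeing. */
static void put_page(void)
{
    if ( --refcount == 0 )
        free_heap_pages();
}

static void offline_page(void)
{
    /* Hold the lock only around the state changes that need it
     * (cf. mark_page_offline() and reserve_heap_page()). */
    lock_heap();
    page_offline = true;
    unlock_heap();

    /* Drop references only after the lock is released; calling
     * put_page() under the lock would self-deadlock on the last one. */
    put_page();
    put_page();
}
```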

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
master commit: d4837a56da4a59259dd0cf9f3bdc073159d81d7a
master date: 2013-12-03 12:40:57 +0100

11 years agoFix ptr calculation when converting from a VA
Jean-Yves Migeon [Mon, 9 Dec 2013 13:45:59 +0000 (14:45 +0100)]
Fix ptr calculation when converting from a VA

The ptr calculation must take the offset within the page into account
when ptr is valid.

A regression was reported on NetBSD's port-xen, the last known working
libxen being rev 2.9: the kernel symbol table gets corrupted when it is
not loaded on a page boundary.

The issue was tracked down by FastIce and Jeff Rizzo. See also
http://mail-index.netbsd.org/port-xen/2013/10/16/msg008088.html
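The fix amounts to adding the in-page offset back when translating a VA into a pointer into its mapped page. A hypothetical sketch, assuming 4 KiB pages and a helper `va_to_ptr()` that is told where the page containing `va` is mapped; neither is taken from the actual patched code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                        /* assumed 4 KiB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Hypothetical helper: page_map is where the page containing va is
 * mapped (assumed page-aligned). Returning page_map alone would drop
 * the offset, which only works for page-aligned data such as a symbol
 * table that happens to start on a page boundary. */
static char *va_to_ptr(char *page_map, uint64_t va)
{
    return page_map + (va & (PAGE_SIZE - 1));  /* keep the in-page offset */
}
```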

Signed-off-by: Jean-Yves Migeon <jym@NetBSD.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: cb08944a482a5e80a3ff1113f0735761cc4c6cb8
master date: 2013-11-29 11:07:01 +0000

11 years agox86: properly handle MSI-X unmask operation from guests
Feng Wu [Mon, 9 Dec 2013 13:45:00 +0000 (14:45 +0100)]
x86: properly handle MSI-X unmask operation from guests

For a pass-through device with the MSI-X capability, when the guest tries
to unmask the MSI-X interrupt for the passed-through device, Xen
doesn't clear the MSI-X mask bit in hardware in the following
scenario, which causes network disconnection:

1. Guest masks the MSI-X interrupt
2. Guest updates its address and data
3. Guest unmasks the MSI-X interrupt (this is the problematic step)

Xen doesn't handle step #3 well. When the guest tries to unmask the
MSI-X interrupt, the write traps to Xen; if Xen notices that the address
or data has been modified by the guest, it just returns to QEMU, and
QEMU then updates Xen with the latest address/data values via a
hypercall. However, the unmask operation itself is lost in this whole
process: Xen never clears the mask bit in hardware, so the MSI-X
interrupt remains disabled, which is why the network connection is lost.

This patch fixes this issue.
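The corrected handling of the unmask write can be sketched as follows, under simplifying assumptions: the structure and `write_vector_ctrl()` are hypothetical, and the new address/data are latched inline as a stand-in for the QEMU round trip described above. Bit 0 of an MSI-X entry's Vector Control word is its mask bit, as defined by the PCI specification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one MSI-X table entry as seen by the emulator. */
struct msix_entry {
    uint64_t hw_addr, guest_addr;   /* programmed in hardware vs. written by guest */
    uint32_t hw_data, guest_data;
    bool hw_masked;                 /* mask bit in the hardware Vector Control */
};

/* Emulate a guest write to the entry's Vector Control word. */
static void write_vector_ctrl(struct msix_entry *e, uint32_t val)
{
    if ( val & 1 )                  /* guest is masking: nothing else to do */
    {
        e->hw_masked = true;
        return;
    }

    /* Only latch address/data when the guest really unmasks the entry. */
    e->hw_addr = e->guest_addr;
    e->hw_data = e->guest_data;

    /* The step that was lost: actually clear the mask bit in hardware. */
    e->hw_masked = false;
}
```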

Signed-off-by: Feng Wu <feng.wu@intel.com>
Only latch the address if the guest really is unmasking the entry.

Clean up the entire change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 74fd0036deb585a139b63b26db025805ecedc37a
master date: 2013-11-27 15:15:43 +0100

11 years agoVMX: fix cr0.cd handling
Liu Jinsong [Mon, 9 Dec 2013 13:43:34 +0000 (14:43 +0100)]
VMX: fix cr0.cd handling

This patch fixes the XSA-60 security hole:
1. For a guest w/o VT-d, and for a guest with VT-d but snooped, Xen needs
to do nothing, since the hardware snoop mechanism ensures cache coherency.

2. For a guest with VT-d but non-snooped, cache coherency cannot be
guaranteed by h/w snoop, therefore Xen needs to emulate the UC type for the guest:
2.1). if it works w/ Intel EPT, set the guest IA32_PAT fields to UC so that
all guest memory types are UC.
2.2). if it works w/ shadow, drop all shadows so that any new ones are
created on demand w/ UC.
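The case analysis in points 1 and 2 can be written as a small decision function. This is an illustrative sketch only; the enum and `cr0_cd_action()` are hypothetical names, not Xen's VMX code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three outcomes listed above. */
enum cd_action {
    CD_NOTHING,        /* hardware snooping keeps caches coherent */
    CD_PAT_ALL_UC,     /* EPT: program the guest IA32_PAT fields as UC */
    CD_DROP_SHADOWS,   /* shadow paging: rebuild shadows as UC on demand */
};

/* What to do when the guest sets cr0.cd. */
static enum cd_action cr0_cd_action(bool has_vtd, bool snooped, bool uses_ept)
{
    if ( !has_vtd || snooped )
        return CD_NOTHING;
    return uses_ept ? CD_PAT_ALL_UC : CD_DROP_SHADOWS;
}
```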

This patch also fixes a bug in shadow cr0.cd setting. The current shadow code
has a small window between cache flush and TLB invalidation, resulting in
possible cache pollution. This patch pauses vcpus so that no vcpu context is
involved in the window.

This is CVE-2013-2212 / XSA-60.

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
master commit: 62652c00efa55fb45374bcc92f7d96fc411aebb2
master date: 2013-11-06 10:12:36 +0100

11 years agoVMX: remove the problematic set_uc_mode logic
Liu Jinsong [Mon, 9 Dec 2013 13:41:44 +0000 (14:41 +0100)]
VMX: remove the problematic set_uc_mode logic

The XSA-60 security hole comes from the problematic vmx_set_uc_mode.
This patch removes the vmx_set_uc_mode logic, which will be replaced by
a PAT-based approach in a later patch.

This is CVE-2013-2212 / XSA-60.

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
master commit: 1c84d046735102e02d2df454ab07f14ac51f235d
master date: 2013-11-06 10:12:00 +0100

11 years agoVMX: disable EPT when !cpu_has_vmx_pat
Liu Jinsong [Mon, 9 Dec 2013 13:40:51 +0000 (14:40 +0100)]
VMX: disable EPT when !cpu_has_vmx_pat

Recently Oracle developers found a DoS-affecting Xen security issue,
named XSA-60. Please refer to http://xenbits.xen.org/xsa/advisory-60.html
Basically it concerns how guest cr0.cd setting is handled, which in
some environments consumes so much time that it results in DoS-like behavior.

This is a preparatory patch for fixing XSA-60. A later patch will fix XSA-60
via PAT in the Intel EPT case, which depends on cpu_has_vmx_pat.

This is CVE-2013-2212 / XSA-60.

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
master commit: c13b0d65ddedd74508edef5cd66defffe30468fc
master date: 2013-11-06 10:11:18 +0100

11 years agox86/hvm: fix segment validation
Tim Deegan [Mon, 9 Dec 2013 13:36:58 +0000 (14:36 +0100)]
x86/hvm: fix segment validation

Also Coverity CID 1055180.

Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Use _SEGMENT_* instead of plain numbers and adjust a comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6ed4bfbabd487b41021caa7ed03cee1f00ecbabf
master date: 2013-11-26 09:54:21 +0100

11 years agox86/AMD: work around erratum 793
Jan Beulich [Tue, 3 Dec 2013 13:15:34 +0000 (14:15 +0100)]
x86/AMD: work around erratum 793

The recommended workaround is to set a bit in an MSR; do this if the
firmware didn't, since otherwise we expose ourselves to a
guest-induced DoS.

This is CVE-2013-6885 / XSA-82.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
master commit: 98162f256ee33994a9881a720419dda9ad4c03a8
master date: 2013-12-03 09:49:54 +0100

11 years agox86/xsave: fix nonlazy state handling
Liu Jinsong [Mon, 2 Dec 2013 14:56:09 +0000 (15:56 +0100)]
x86/xsave: fix nonlazy state handling

Nonlazy xstates should be saved (xsave) every time vcpu_save_fpu runs.
Operations on nonlazy xstates do not trigger the #NM exception, so
they must be restored whenever a vcpu is scheduled in and saved
whenever it is scheduled out.

Currently this bug affects the AMD LWP feature, and will later affect
the Intel MPX feature. With the bugfix both LWP and MPX will work fine.
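The save rule can be sketched as a mask computation over the accumulated XCR0. The bit positions follow the XCR0 layout (x87 = bit 0, SSE = bit 1, AVX = bit 2, LWP = bit 62), but the macro names, the lazy/nonlazy split shown and `xsave_mask()` are assumptions for illustration, not Xen's i387.c/xstate.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XCR0 component bits (names are this sketch's own). */
#define XSTATE_FP      (UINT64_C(1) << 0)
#define XSTATE_SSE     (UINT64_C(1) << 1)
#define XSTATE_YMM     (UINT64_C(1) << 2)
#define XSTATE_LWP     (UINT64_C(1) << 62)

#define XSTATE_LAZY    (XSTATE_FP | XSTATE_SSE | XSTATE_YMM)
#define XSTATE_NONLAZY (XSTATE_LWP)

/* Components to save when a vcpu is scheduled out.
 * fpu_dirty: the vcpu touched lazy state (took #NM) since its last restore. */
static uint64_t xsave_mask(uint64_t xcr0_accum, bool fpu_dirty)
{
    /* Nonlazy state is always saved: no #NM tells us it was used. */
    uint64_t mask = xcr0_accum & XSTATE_NONLAZY;

    if ( fpu_dirty )
        mask |= xcr0_accum & XSTATE_LAZY;
    return mask;
}
```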

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Furthermore, during restore we also need to set nonlazy_xstate_used
according to the incoming accumulated XCR0.

Also adjust the changes to i387.c such that there won't be a pointless
clts()/stts() pair.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
master commit: 7d8b5dd98463524686bdee8b973b53c00c232122
master date: 2013-11-25 11:19:04 +0100