Jan Beulich [Thu, 2 May 2013 15:22:36 +0000 (17:22 +0200)]
x86: make vcpu_destroy_pagetables() preemptible
... as it may take significant amounts of time.
The function, being moved to mm.c as the better home for it anyway, and
to avoid having to make a new helper function there non-static, is
given a "preemptible" parameter temporarily (until, in a subsequent
patch, its other caller is also being made capable of dealing with
preemption).
This is part of CVE-2013-1918 / XSA-45.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
master commit: 6cdc9be2a5f2a87b4504404fbf648d16d9503c19
master date: 2013-05-02 16:34:21 +0200
Ian Jackson [Thu, 18 Apr 2013 16:44:17 +0000 (17:44 +0100)]
libxl: Fix SEGV in network-attach
When "device/vif" directory exists but is empty l!=NULL, but nb==0, so
l[nb-1] is invalid. Add missing check.
Signed-off-by: Marek Marczykowski <marmarek@invisiblethingslab.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
(cherry picked from commit 9f1a6ff38b8e7bb97a016794115de28553a6559f)
Conflicts:
tools/libxl/libxl.c Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Thu, 18 Apr 2013 15:38:17 +0000 (17:38 +0200)]
Fix rcu domain locking for transitive grants
When acquiring a transitive grant for copy then the owning domain
needs to be locked down as well as the granting domain. This was being
done, but the unlocking was not. The acquire code now stores the
struct domain * of the owning domain (rather than the domid) in the
active entry in the granting domain. The release code then does the
unlock on the owning domain. Note that I believe I also fixed a bug
where, for non-transitive grants the active entry contained a
reference to the acquiring domain rather than the granting
domain. From my reading of the code this would stop the release code
for transitive grants from terminating its recursion correctly.
Jan Beulich [Thu, 18 Apr 2013 14:24:08 +0000 (16:24 +0200)]
x86: fix various issues with handling guest IRQs
- properly revoke IRQ access in map_domain_pirq() error path
- don't permit replacing an in use IRQ
- don't accept inputs in the GSI range for MAP_PIRQ_TYPE_MSI
- track IRQ access permission in host IRQ terms, not guest IRQ ones
(and with that, also disallow Dom0 access to IRQ0)
Jan Beulich [Thu, 18 Apr 2013 14:23:07 +0000 (16:23 +0200)]
x86: clear EFLAGS.NT in SYSENTER entry path
... as it causes problems if we happen to exit back via IRET: In the
course of trying to handle the fault, the hypervisor creates a stack
frame by hand, and uses PUSHFQ to set the respective EFLAGS field, but
expects to be able to IRET through that stack frame to the second
portion of the fixup code (which causes a #GP due to the stored EFLAGS
having NT set).
And even if this worked (e.g if we cleared NT in that path), it would
then (through the fail safe callback) cause a #GP in the guest with the
SYSENTER handler's first instruction as the source, which in turn would
allow guest user mode code to crash the guest kernel.
Inject a #GP on the fake (NULL) address of the SYSENTER instruction
instead, just like in the case where the guest kernel didn't register
a corresponding entry point.
On 32-bit we also need to make sure we clear SYSENTER_CS for all CPUs
(neither #RESET nor #INIT guarantee this).
This is CVE-2013-1917 / XSA-44.
Reported-by: Andrew Cooper <andrew.cooper3@citirx.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: fdac9515607b757c044e7ef0d61b1453ef999b08
master date: 2013-04-18 16:00:35 +0200
On the crash path in nmi_shootdown_cpus(), we shut down the IOMMU, then
disable the IOAPIC.
On systems which support interrupt remapping, the variable iommu_intremap
remains set, meaning that disable_IO_APIC() issues interrupt remapping
invalidate requests.
IOAPIC interrupt remapping used to be conditional on iommu_enabled, but is now
conditional on iommu_intremap, following the above changeset.
This behaviour can be fixed by also indicating that interrupt remapping is not
enabled after shutting down the IOMMU.
disabled the SMEP bit if a guest VCPU was using HAP and was not
in paging mode. However I could observe VCPUs getting stuck in
the trampoline after the following patch in the Linux kernel
changed the way CR4 gets set up:
x86, realmode: read cr4 and EFER from kernel for 64-bit trampoline
The change will set CR4 from already set flags which includes the
SMEP bit. On bare metal this does not matter as the CPU is in non-
paging mode at that time. But Xen seems to use the emulated non-
paging mode regardless of HAP (I verified that on the guests I was
seeing the issue, HAP was not used).
Therefor it seems right to unset the SMEP bit for a VCPU that is
not in paging-mode, regardless of its HAP usage.
VMX: disable SMEP feature when guest is in non-paging mode
SMEP is disabled if CPU is in non-paging mode in hardware.
However Xen always uses paging mode to emulate guest non-paging
mode with HAP. To emulate this behavior, SMEP needs to be manually
disabled when guest switches to non-paging mode.
We met an issue that, SMP Linux guest with recent kernel (enable
SMEP support, for example, 3.5.3) would crash with triple fault if
setting unrestricted_guest=0 in grub. This is because Xen uses an
identity mapping page table to emulate the non-paging mode, where
the page table is set with USER flag. If SMEP is still enabled in
this case, guest will meet unhandlable page fault and then crash.
Tim Deegan [Tue, 9 Apr 2013 14:19:53 +0000 (16:19 +0200)]
x86/mm/shadow: spurious warning when unmapping xenheap pages.
Xenheap pages will always have an extra typecount, taken in
share_xen_page_with_guest(), which doesn't come from a shadow PTE.
Adjust the warning in sh_remove_all_mappings() to account for it.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Tim Deegan <tim@xen.org> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: cfc515dabe91e3d6c690c68c6a669d6d77fb7ac4
master date: 2013-04-04 10:14:30 +0100
Ben Guthro [Tue, 9 Apr 2013 14:13:11 +0000 (16:13 +0200)]
x86/S3: Restore broken vcpu affinity on resume
When in SYS_STATE_suspend, and going through the cpu_disable_scheduler
path, save a copy of the current cpu affinity, and mark a flag to
restore it later.
Later, in the resume process, when enabling nonboot cpus restore these
affinities.
Signed-off-by: Ben Guthro <benjamin.guthro@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
master commit: 41e71c2607e036f1ac00df898b8f4acb2d4df7ee
master date: 2013-04-02 09:52:32 +0200
Jan Beulich [Tue, 9 Apr 2013 14:10:29 +0000 (16:10 +0200)]
x86: irq_move_cleanup_interrupt() must ignore legacy vectors
Since the main loop in the function includes legacy vectors, and since
vector_irq[] gets set up for legacy vectors regardless of whether those
get handled through the IO-APIC, it must not do anything on this vector
range. In fact, we should never get past the move_cleanup_count check
for IRQs not handled through the IO-APIC. Adding a respective assertion
woulkd make those iterations more expensive (due to the lock acquire).
For such an assertion to not have false positives we however ought to
suppress setting up IRQ2 as an 8259A interrupt (which wasn't correct
anyway), which is being done here despite the assertion not actually
getting added.
Furthermore, there's no point iterating over the vectors past
LAST_HIPRIORITY_VECTOR, so terminate the loop accordingly.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
master commit: af699220ad6d111ba76fc3040342184e423cc9a1
master date: 2013-04-02 08:30:03 +0200
To make erst table size checking code works on all systems, both
testing are treated as PASS.
Same situation applies to einj_tab->header_length, so corresponding
table size checking is changed in similar way too.
Originally-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Huang Ying <ying.huang@intel.com>
- use switch() for better readability
- add comment explaining why a formally invalid size it also being
accepted
- check erst_tab->header.length before even looking at
erst_tab->header_length
- prefer sizeof(*erst_tab) over sizeof(struct acpi_table_erst)
Some actions in APEI ERST and EINJ tables are optional, for example,
ACPI_EINJ_BEGIN_OPERATION action is used to do some preparation for
error injection, and firmware may choose to do nothing here. While
some other actions are mandatory, for example, firmware must provide
ACPI_EINJ_GET_ERROR_TYPE implementation.
Original implementation treats all actions as optional (that is, can
have no instructions), that may cause issue if firmware does not
provide some mandatory actions. To fix this, this patch adds
apei_exec_run_optional, which should be used for optional actions.
The original apei_exec_run should be used for mandatory actions.
Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 72af01bf6f7489e54ad59270222a29d3e8c501d1
master date: 2013-03-22 12:46:25 +0100
Andrew Cooper [Tue, 2 Apr 2013 10:37:28 +0000 (12:37 +0200)]
ACPI/APEI: Unlock apei_iomaps_lock on error path
This causes deadlocks during early boot on hardware with broken/buggy
APEI implementations, such as a Dell Poweredge 2950 with the latest
currently available BIOS.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Don't use goto or another special error path, as handling the error
case in normal flow is quite simple.
ACPI/APEI: fix ERST MOVE_DATA instruction implementation
The src_base and dst_base fields in apei_exec_context are physical
address, so they should be ioremaped before being used in ERST
MOVE_DATA instruction.
Reported-by: Javier Martinez Canillas <martinez.javier@gmail.com> Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Huang Ying <ying.huang@intel.com>
Replace use of ioremap() by __acpi_map_table()/set_fixmap(). Fix error
handling.
x86: reserve pages when SandyBridge integrated graphics
SNB graphics devices have a bug that prevent them from accessing certain
memory ranges, namely anything below 1M and in the pages listed in the
table.
Xen does not initialize below 1MB to heap, i.e. below 1MB pages don't be
allocated, so it's unnecessary to reserve memory below the 1 MB mark
that has not already been reserved.
So reserve those pages listed in the table at xen boot if set detect a
SNB gfx device on the CPU to avoid GPU hangs.
Jan Beulich [Tue, 2 Apr 2013 10:31:36 +0000 (12:31 +0200)]
AMD IOMMU: allow disabling only interrupt remapping when certain IVRS consistency checks fail
After some more thought on the XSA-36 and specifically the comments we
got regarding disabling the IOMMU in this situation altogether making
things worse instead of better, I came to the conclusion that we can
actually restrict the action in affected cases to just disabling
interrupt remapping. That doesn't make the situation worse than prior
to the XSA-36 fixes (where interrupt remapping didn't really protect
domains from one another), but allows at least DMA isolation to still
be utilized.
To do so, disabling of interrupt remapping must be explicitly requested
on the command line - respective checks will then be skipped.
Stepping B-3 has two errata (#47 and #53) related to Interrupt
remapping, to which the workaround is for the BIOS to completely disable
interrupt remapping. These errata are fixed in stepping C-2.
Unfortunately this chipset stepping is very common and many BIOSes are
not disabling interrupt remapping on this stepping . We can detect this in
Xen and prevent Xen from using the problematic interrupt remapping feature.
The Intel 5500/5520/X58 chipset does not support VT-d
Extended Interrupt Mode(EIM). This means the iommu_supports_eim() check
always fails and so x2apic mode cannot be enabled in Xen before this quirk
disables the interrupt remapping feature.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Gate the function call to check the quirk on interrupt remapping being
requested to get enabled, and upon failure disable the IOMMU to be in
line with what the changes for XSA-36 (plus follow-ups) did.
Jan Beulich [Tue, 2 Apr 2013 10:29:34 +0000 (12:29 +0200)]
IOMMU: properly check whether interrupt remapping is enabled
... rather than the IOMMU as a whole.
That in turn required to make sure iommu_intremap gets properly
cleared when the respective initialization fails (or isn't being
done at all).
Along with making sure interrupt remapping doesn't get inconsistently
enabled on some IOMMUs and not on others in the VT-d code, this in turn
allowed quite a bit of cleanup on the VT-d side (removed from the
backport).
Andrew Cooper [Tue, 2 Apr 2013 10:27:20 +0000 (12:27 +0200)]
AMD/IOMMU: Process softirqs while building dom0 iommu mappings
Recent changes which have made their way into xen-4.2 stable have pushed the
runtime of construct_dom0() over 5 seconds, which has caused regressions in
XenServer testing because of our 5 second watchdog.
The root cause is that amd_iommu_dom0_init() does not process softirqs and in
particular the nmi_timer which causes the watchdog to decide that no useful
progress is being made.
This patch adds periodic calls to process_pending_softirqs() at the same
interval as the Intel variant of this function. The server which was failing
with the watchdog test now boots reliably with a timeout of 1 second.
Jan Beulich [Tue, 2 Apr 2013 10:26:22 +0000 (12:26 +0200)]
x86/MCA: suppress bank clearing for certain injected events
As the bits indicating validity of the ADDR and MISC bank MSRs may be
injected in a way that isn't consistent with what the underlying
hardware implements (while the bank must be valid for injection to
work, the auxiliary MSRs may not be implemented - and hence cause #GP
upon access - if the hardware never sets the corresponding valid bits.
Consequently we need to do the clearing writes only if no value was
interposed for the respective MSR (which also makes sense the other way
around: there's no point in clearing a hardware register when all data
read came from software). Of course this all requires the injection
tool to do things in a consistent way (but that had been a requirement
before already).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Ren Yongjie <yongjie.ren@intel.com> Acked-by: Liu Jinsong <jinsong.liu@intel.com>
master changeset: b0583c0e64cc8bb6229c95c3304fdac2051f79b3
master date: 2013-03-12 15:53:30 +0100
Try to fix the the issue that "some AMD systems may round the
frequencies in ACPI tables to 100MHz boundaries. We can obtain the real
frequencies from MSRs, so add a quirk to fix these frequencies up
on AMD systems." (from f594065..)
In discussion (around 9855d8..) "it turned out that indeed real
HW/BIOSes may choose to not set the valid bit and thus mark the
P-state as invalid. So this could be considered a fix for broken
BIOSes." (from 9855d8..)
which is great for Linux. Unfortunatly the Linux kernel, when
it tries to do the RDMSR under Xen it fails to get the right
value (it gets zero) as Xen traps it and returns zero. Hence
when dom0 uploads the P-states they will be unmodified and
we should take care of updating the frequencies with the right
values.
I've tested it under Dell Inc. PowerEdge T105 /0RR825, BIOS 1.3.2
08/20/2008 where this quirk can be observed (x86 == 0x10, model == 2).
Also on other AMD (x86 == 0x12, A8-3850; x86 = 0x14, AMD E-350) to
make sure the quirk is not applied there.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: stefan.bader@canonical.com
Do the MSR access here (and while at it, also the one reading
MSR_PSTATE_CUR_LIMIT) on the target CPU, and bound the loop over
amd_fixup_frequency() by max_hw_pstate (matching the one in
powernow_cpufreq_cpu_init()).
Ian Jackson [Fri, 15 Mar 2013 18:09:51 +0000 (18:09 +0000)]
Add DomU xz kernel decompression
Signed-off-by: Jan Beulich <jbeulich@novell.com>
xen-unstable changeset: 4afea3d65321c40bb8afec833c860f92176bfb42
xen-unstable date: Wed Mar 9 16:19:36 2011 +0000 Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ This is byte-for-byte identical to Bastian Blank's backport of the
same changeset to xen-4.1, as found in Debian xen_4.1.4-2.*
patch debian/patches/upstream-23002:eb64b8f8eebb -iwj ]
Jan Beulich [Tue, 12 Mar 2013 15:46:09 +0000 (16:46 +0100)]
x86/MSI: add mechanism to fully protect MSI-X table from PV guest accesses
This adds two new physdev operations for Dom0 to invoke when resource
allocation for devices is known to be complete, so that the hypervisor
can arrange for the respective MMIO ranges to be marked read-only
before an eventual guest getting such a device assigned even gets
started, such that it won't be able to set up writable mappings for
these MMIO ranges before Xen has a chance to protect them.
This also addresses another issue with the code being modified here,
in that so far write protection for the address ranges in question got
set up only once during the lifetime of a device (i.e. until either
system shutdown or device hot removal), while teardown happened when
the last interrupt was disposed of by the guest (which at least allowed
the tables to be writable when the device got assigned to a second
guest [instance] after the first terminated).
George Dunlap [Tue, 12 Mar 2013 15:30:41 +0000 (16:30 +0100)]
credit1: Use atomic bit operations for the flags structure
The flags structure is not protected by locks (or more precisely,
it is protected using an inconsistent set of locks); we therefore need
to make sure that all accesses are atomic-safe. This is particulary
important in the case of the PARKED flag, which if clobbered while
changing the YIELD bit will leave a vcpu wedged in an offline state.
Using the atomic bitops also requires us to change the size of the "flags"
element.
Spotted-by: Igor Pavlikevich <ipavlikevich@gmail.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
master changeset: be6507509454adf3bb5a50b9406c88504e996d5a
master date: 2013-03-04 13:37:39 +0100
Jan Beulich [Tue, 12 Mar 2013 15:30:04 +0000 (16:30 +0100)]
x86: defer processing events on the NMI exit path
Otherwise, we may end up in the scheduler, keeping NMIs masked for a
possibly unbounded period of time (until whenever the next IRET gets
executed). Enforce timely event processing by sending a self IPI.
Of course it's open for discussion whether to always use the straight
exit path from handle_ist_exception.
Jan Beulich [Tue, 12 Mar 2013 15:29:11 +0000 (16:29 +0100)]
SEDF: avoid gathering vCPU-s on pCPU0
The introduction of vcpu_force_reschedule() in 14320:215b799fa181 was
incompatible with the SEDF scheduler: Any vCPU using
VCPUOP_stop_periodic_timer (e.g. any vCPU of half way modern PV Linux
guests) ends up on pCPU0 after that call. Obviously, running all PV
guests' (and namely Dom0's) vCPU-s on pCPU0 causes problems for those
guests rather sooner than later.
So the main thing that was clearly wrong (and bogus from the beginning)
was the use of cpumask_first() in sedf_pick_cpu(). It is being replaced
by a construct that prefers to put back the vCPU on the pCPU that it
got launched on.
However, there's one more glitch: When reducing the affinity of a vCPU
temporarily, and then widening it again to a set that includes the pCPU
that the vCPU was last running on, the generic scheduler code would not
force a migration of that vCPU, and hence it would forever stay on the
pCPU it last ran on. Since that can again create a load imbalance, the
SEDF scheduler wants a migration to happen regardless of it being
apparently unnecessary.
Of course, an alternative to checking for SEDF explicitly in
vcpu_set_affinity() would be to introduce a flags field in struct
scheduler, and have SEDF set a "always-migrate-on-affinity-change"
flag.
Jan Beulich [Tue, 12 Mar 2013 15:24:58 +0000 (16:24 +0100)]
x86: make certain memory sub-ops return valid values
When a domain's shared info field "max_pfn" is zero,
domain_get_maximum_gpfn() so far returned ULONG_MAX, which
do_memory_op() in turn converted to -1 (i.e. -EPERM). Make the former
always return a sensible number (i.e. zero if the field was zero) and
have the latter no longer truncate return values.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
master changeset: 7ffc9779aa5120c5098d938cb88f69a1dda9a0fe
master date: 2013-03-04 10:16:04 +0100
Juergen Gross [Tue, 12 Mar 2013 15:23:15 +0000 (16:23 +0100)]
Avoid stale pointer when moving domain to another cpupool
When a domain is moved to another cpupool the scheduler private data pointers
in vcpu and domain structures must never point to an already freed memory
area.
While at it, simplify sched_init_vcpu() by using DOM2OP instead VCPU2OP.
xen, cpupools: Fix cpupool-move to make more consistent
The full order for creating new private data structures when moving
from one pool to another is now:
* Allocate all new structures
- Allocate a new private domain structure (but don't point there yet)
- Allocate per-vcpu data structures (but don't point there yet)
* Remove old structures
- Remove each vcpu, freeing the associated data structure
- Free the domain data structure
* Switch to the new structures
- Set the domain to the new cpupool, with the new private domain
structure
- Set each vcpu to the respective new structure, and insert
This is in line with a (fairly reasonable) assumption in credit2 that
the private structure of the domain will be the private structure
pointed to by the per-vcpu private structure.
Also fix a bug, in which insert_vcpu was called with the *old* vcpu
ops rather than the new ones.
Tim Deegan [Tue, 12 Mar 2013 15:22:03 +0000 (16:22 +0100)]
vmx: fix handling of NMI VMEXIT.
Call do_nmi() directly and explicitly re-enable NMIs rather than
raising an NMI through the APIC. Since NMIs are disabled after the
VMEXIT, the raised NMI would be blocked until the next IRET
instruction (i.e. the next real interrupt, or after scheduling a PV
guest) and in the meantime the guest will spin taking NMI VMEXITS.
Also, handle NMIs before re-enabling interrupts, since if we handle an
interrupt (and therefore IRET) before calling do_nmi(), we may end up
running the NMI handler with NMIs enabled.
Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
master changeset: 7dd3b06ff031c9a8c727df16c5def2afb382101c
master date: 2013-02-28 14:00:18 +0000
Jan Beulich [Fri, 8 Mar 2013 12:59:02 +0000 (13:59 +0100)]
x86: fix CMCI injection
This fixes the wrong use of literal vector 0xF7 with an "int"
instruction (invalidated by 25113:14609be41f36) and the fact that doing
the injection via a software interrupt was never valid anyway (because
cmci_interrupt() acks the LAPIC, which does the wrong thing if the
interrupt didn't get delivered though it).
In order to do latter, the patch introduces send_IPI_self(), at once
removing two opend coded uses of "genapic" in the IRQ handling code.
The IOMMU may stop processing page translations due to a perceived lack
of credits for writing upstream peripheral page service request (PPR)
or event logs. If the L2B miscellaneous clock gating feature is enabled
the IOMMU does not properly register credits after the log request has
completed, leading to a potential system hang.
BIOSes are supposed to disable L2B micellaneous clock gating by setting
L2_L2B_CK_GATE_CONTROL[CKGateL2BMiscDisable](D0F2xF4_x90[2]) = 1b. This
patch corrects that for those which do not enable this workaround.
Tim Deegan [Fri, 8 Mar 2013 12:46:22 +0000 (13:46 +0100)]
x86/setup: don't relocate the VGA hole.
Copying the contents of the VGA hole is at best pointless and at worst
dangerous. Booting Xen on Xen, it causes a very long delay as each
byte is referred to qemu.
Since we were already discarding the first 1MB of the relocated area,
just avoid copying it in the first place.
Reported-by: Jon Ludlam <jonathan.ludlam@eu.citrix.com> Signed-off-by: Tim Deegan <tim@xen.org>
master changeset: 0b76ce20de85ad7c23c47ee3275020859b91d46b
master date: 2013-02-14 12:20:58 +0000
Michael Young [Wed, 13 Feb 2013 17:00:15 +0000 (17:00 +0000)]
tools: Fix memset(&p,0,sizeof(p)) idiom in several places.
gcc 4.8 identifies several places where code of the form memset(x, 0,
sizeof(x)); is used incorrectly, meaning that less memory is set to
zero than required.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk> Committed-by: Keir Fraser <keir@xen.org>
(cherry picked from commit d119301b5816b39b5ba722a2f8b301b37e8e34bd)
libxl: Fix uninitialized variable in libxl_create_stubdom
It is used for result domid from libxl__domain_make, but actually this
function have assert on an initial value.
This patch is intended for xen-4.1 only - 4.2 and later have reworked
this part of code already containing the fix.
Signed-off-by: Marek Marczykowski <marmarek@invisiblethingslab.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Dario Faggioli [Fri, 15 Feb 2013 14:31:55 +0000 (15:31 +0100)]
xen: sched_credit: improve picking up the idle CPU for a VCPU
In _csched_cpu_pick() we try to select the best possible CPU for
running a VCPU, considering the characteristics of the underlying
hardware (i.e., how many threads, core, sockets, and how busy they
are). What we want is "the idle execution vehicle with the most
idling neighbours in its grouping".
In order to achieve it, we select a CPU from the VCPU's affinity,
giving preference to its current processor if possible, as the basis
for the comparison with all the other CPUs. Problem is, to discount
the VCPU itself when computing this "idleness" (in an attempt to be
fair wrt its current processor), we arbitrarily and unconditionally
consider that selected CPU as idle, even when it is not the case,
for instance:
1. If the CPU is not the one where the VCPU is running (perhaps due
to the affinity being changed);
2. The CPU is where the VCPU is running, but it has other VCPUs in
its runq, so it won't go idle even if the VCPU in question goes.
22005(...) line (the first line) means _csched_cpu_pick() was called
on VCPU 1 of domain 10, while it is running on CPU 0, and it choose
CPU 8, which is busy ('|'), even if there are plenty of idle
CPUs. That is because, as a consequence of changing the VCPU affinity,
CPU 8 was chosen as the basis for the comparison, and therefore
considered idle (its bit gets unconditionally set in the bitmask
representing the idle CPUs). 28004(...) line means the VCPU is woken
up and queued on CPU 8's runq, where it waits for a context switch or
a migration, in order to be able to execute.
This change fixes things by only considering the "guessed" CPU idle if
the VCPU in question is both running there and is its only runnable
VCPU.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
xen-unstable changeset: 26287:127c2c47d440
xen-unstable date: Tue Dec 18 18:10:18 UTC 2012
Jan Beulich [Fri, 15 Feb 2013 14:30:38 +0000 (15:30 +0100)]
AMD IOMMU: also spot missing IO-APIC entries in IVRS table
Apart from dealing duplicate conflicting entries, we also have to
handle firmware omitting IO-APIC entries in IVRS altogether. Not doing
so has resulted in c/s 26517:601139e2b0db to crash such systems during
boot (whereas with the change here the IOMMU gets disabled just as is
being done in the other cases, i.e. unless global tables are being
used).
Debugging this issue has also pointed out that the debug log output is
pretty ugly to look at - consolidate the output, and add one extra
item for the IVHD special entries, so that future issues are easier
to analyze.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26531:e68f14b9e739
xen-unstable date: Thu Feb 14 08:40:52 UTC 2013
Ian Campbell [Fri, 15 Feb 2013 11:50:45 +0000 (11:50 +0000)]
tools/ocaml: oxenstored: correctly handle a full ring.
Change 26521:2c0fd406f02c (part of XSA-38 / CVE-2013-0215) incorrectly
caused us to ignore rather than process a completely full ring. Check if
producer and consumer are equal before masking to avoid this, since prod ==
cons + PAGE_SIZE after masking becomes prod == cons.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org> Committed-by: Ian Campbell <ian.campbell@citrix.com>
xen-unstable changeset: 26539:759574df84a6 Backport-requested-by: security@xen.org Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 12 Feb 2013 12:33:19 +0000 (13:33 +0100)]
x86: restore (optional) forwarding of PCI SERR induced NMI to Dom0
c/s 22949:54fe1011f86b removed the forwarding of NMIs to Dom0 when they
were caused by PCI SERR. NMI buttons as well as BMCs (like HP's iLO)
may however want such events to be seen in Dom0 (e.g. to trigger a
dump).
Therefore restore most of the functionality which named c/s removed
(adjusted for subsequent changes, and adjusting the public interface to
use the modern term, retaining the old one for backwards
compatibility).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26440:5af4f2ab06f3
xen-unstable date: Tue Jan 22 08:33:10 UTC 2013
Boris Ostrovsky [Tue, 12 Feb 2013 12:32:05 +0000 (13:32 +0100)]
x86/AMD: Enable WC+ memory type on family 10 processors
In some cases BIOS may not enable WC+ memory type on family 10 processors,
instead converting what would be WC+ memory to CD type. On guests using
nested pages this could result in performance degradation. This patch
enables WC+.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
xen-unstable changeset: 26427:8f6dd5dc5d6c
xen-unstable date: Fri Jan 18 11:20:58 UTC 2013
Ian Jackson [Thu, 7 Feb 2013 14:26:37 +0000 (14:26 +0000)]
oxenstored: Enforce a maximum message size of 4096 bytes
The maximum size of a message is part of the protocol spec in
xen/include/public/io/xs_wire.h
Before this patch a client which sends an overly large message can
cause a buffer read overrun.
Note if a badly-behaved client sends a very large message
then it will be difficult for them to make their connection
work again-- they will probably need to reboot.
This is a security issue, part of XSA-38 / CVE-2013-0215.
Signed-off-by: David Scott <dave.scott@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 26522:ffd30e7388ad Backport-requested-by: security@xen.org Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 7 Feb 2013 14:26:29 +0000 (14:26 +0000)]
tools/ocaml: oxenstored: Be more paranoid about ring reading
oxenstored makes use of the OCaml Xenbus bindings, in which the
function xs_ring_read in tools/ocaml/libs/xb/xs_ring_stubs.c is used
to read from the shared memory Xenstore ring.
This function does not correctly handle all possible (prod, cons)
states when MASK_XENSTORE_IDX(prod) > MASK_XENSTORE_IDX(cons).
The root cause is the use of the unmasked values of prod and cons to
calculate to_read. If prod is set to an out-of-range value, the ring
peer can cause to_read to be too large or even negative. This allows
the ring peer to force oxenstored to read and write out of range for
the buffers leading to a crash or possibly to privilege escalation.
Correct this by masking the values of cons and prod at the start, so
we only deal with masked values. This makes the logic simpler, as
semantically inappropriate values of the upper bits of the ring
pointers are simply ignored.
The same vulnerability does not exist in the ring writer because the
only use made of the unmasked value is the check which prevents the
prod pointer overtaking the cons pointer. A ring peer which defeats
this check will suffer only lost data.
However, additionally, precautions need to be taken to ensure that
req_cons and req_prod are only read once in each function. Without
the use of volatile or some asm construct, the compiler can "prove"
that req_cons and req_prod do not change unexpectedly and is permitted
to "amplify" the read of (say) req_cons into two reads at different
times, giving two different values for use as cons, and then use the
two sources of cons interchangeably. (The use of xen_mb() does not
forbid this.)
Therefore do the reads of req_cons and req_prod through a volatile
pointer in both xs_ring_read and xs_ring_write.
This is currently believed to be a theoretical vulnerability as we are
not aware of any compilers which amplify reads in this way.
This is a security issue, part of XSA-38 / CVE-2013-0215.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: Matthew Daley <mattjd@gmail.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen-unstable changeset: 26521:2c0fd406f02c Backport-requested-by: security@xen.org Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Boris Ostrovsky [Tue, 5 Feb 2013 14:36:11 +0000 (15:36 +0100)]
AMD,IOMMU: Disable IOMMU if SATA Combined mode is on
AMD's SP5100 chipset can be placed into SATA Combined mode
that may cause prevent dom0 from booting when IOMMU is
enabled and per-device interrupt remapping table is used.
While SP5100 erratum 28 requires BIOSes to disable this mode,
some may still use it.
This patch checks whether this mode is on and, if per-device
table is in use, disables IOMMU.
This is XSA-36 / CVE-2013-0153.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Flipped operands of && in amd_iommu_init() to make the message issued
by amd_sp5100_erratum28() match reality (when amd_iommu_perdev_intremap
is zero, there's really no point in calling the function).
Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26518:e379a23b0465
xen-unstable date: Tue Feb 5 14:21:25 UTC 2013
Jan Beulich [Tue, 5 Feb 2013 14:35:44 +0000 (15:35 +0100)]
AMD,IOMMU: Clean up old entries in remapping tables when creating new one
When changing the affinity of an IRQ associated with a passed
through PCI device, clear previous mapping.
This is XSA-36 / CVE-2013-0153.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
In addition, because some BIOSes may incorrectly program IVRS
entries for IOAPIC try to check for entry's consistency. Specifically,
if conflicting entries are found disable IOMMU if per-device
remapping table is used. If entries refer to bogus IOAPIC IDs
disable IOMMU unconditionally
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
xen-unstable changeset: 26517:601139e2b0db
xen-unstable date: Tue Feb 5 14:20:47 UTC 2013
Boris Ostrovsky [Tue, 5 Feb 2013 14:34:55 +0000 (15:34 +0100)]
ACPI: acpi_table_parse() should return handler's error code
Currently, the error code returned by acpi_table_parse()'s handler
is ignored. This patch will propagate handler's return value to
acpi_table_parse()'s caller.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com> Committed-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 26516:32d4516a97f0
xen-unstable date: Tue Feb 5 14:18:18 UTC 2013
Tim Deegan [Thu, 17 Jan 2013 12:43:26 +0000 (13:43 +0100)]
x86/mm: Fix loop increment in paging_log_dirty_range()
In 23417:53ef1f35a0f8 (the fix for XSA-27 / CVE-2012-5511), the
loop variable gets incremented twice, so the loop only clears every
second page of the bitmap. This might cause the tools to think that
pages are dirty when they are not.
Reported-by: Steven Noonan <snoonan@amazon.com> Reported-by: Matt Wilson <msw@amazon.com> Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Jan Beulich <jbeulich@suse.com>
Andre Przywara [Tue, 8 Jan 2013 09:23:37 +0000 (10:23 +0100)]
x86, amd: Disable way access filter on Piledriver CPUs
The Way Access Filter in recent AMD CPUs may hurt the performance of
some workloads, caused by aliasing issues in the L1 cache.
This patch disables it on the affected CPUs.
The issue is similar to that one of last year:
http://lkml.indiana.edu/hypermail/linux/kernel/1107.3/00041.html
This new patch does not replace the old one, we just need another
quirk for newer CPUs.
The performance penalty without the patch depends on the
circumstances, but is a bit less than the last year's 3%.
The workloads affected would be those that access code from the same
physical page under different virtual addresses, so different
processes using the same libraries with ASLR or multiple instances of
PIE-binaries. The code needs to be accessed simultaneously from both
cores of the same compute unit.
More details can be found here:
http://developer.amd.com/Assets/SharedL1InstructionCacheonAMD15hCPU.pdf
CPUs affected are anything with the core known as Piledriver.
That includes the new parts of the AMD A-Series (aka Trinity) and the
just released new CPUs of the FX-Series (aka Vishera).
The model numbering is a bit odd here: FX CPUs have model 2,
A-Series has model 10h, with possible extensions to 1Fh. Hence the
range of model ids.
Signed-off-by: Andre Przywara <osp@andrep.de>
Add and use MSR_AMD64_IC_CFG. Update the value whenever it is found to
not have all bits set, rather than just when it's zero.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26294:5fb0b8b838da
xen-unstable date: Wed Dec 19 10:42:09 UTC 2012
Jan Beulich [Tue, 8 Jan 2013 09:22:32 +0000 (10:22 +0100)]
IOMMU/ATS: fix maximum queue depth calculation
The capabilities register field is a 5-bit value, and the 5 bits all
being zero actually means 32 entries.
Under the assumption that amd_iommu_flush_iotlb() really just tried
to correct for the miscalculation above when adding 32 to the value,
that adjustment is also being removed.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by Xiantao Zhang <xiantao.zhang@intel.com> Acked-by: Wei Huang <wei.huang2@amd.com>
xen-unstable changeset: 26235:670b07e8d738
xen-unstable date: Wed Dec 5 08:52:14 UTC 2012
Jan Beulich [Tue, 8 Jan 2013 09:18:59 +0000 (10:18 +0100)]
x86/HPET: fix FSB interrupt masking
HPET_TN_FSB is not really suitable for masking interrupts - it merely
switches between the two delivery methods. The right way of masking is
through the HPET_TN_ENABLE bit (which really is an interrupt enable,
not a counter enable or some such). This is even more so with certain
chip sets not even allowing HPET_TN_FSB to be cleared on some of the
channels.
Further, all the setup of the channel should happen before actually
enabling the interrupt, which requires splitting legacy and FSB logic.
Finally this also fixes an S3 resume problem (HPET_TN_FSB did not get
set in hpet_broadcast_resume(), and hpet_msi_unmask() doesn't get
called from the general resume code either afaict).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
xen-unstable changeset: 26183:c139ca92edca
xen-unstable date: Thu Nov 22 09:03:23 UTC 2012
Jan Beulich [Tue, 8 Jan 2013 09:16:55 +0000 (10:16 +0100)]
passthrough/PCI: replace improper uses of pci_find_next_cap()
Using pci_find_next_cap() without prior pci_find_cap_offset() is bogus
(and possibly wrong, given that the latter doesn't check the
PCI_STATUS_CAP_LIST flag, which so far was checked in an open-coded way
only for the non-bridge case).
Once at it, fold the two calls into one, as we need its result in any
case.
Question is whether, without any caller left, pci_find_next_cap()
should be purged as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
xen-unstable changeset: 26179:ae6fb202b233
xen-unstable date: Tue Nov 20 07:58:31 UTC 2012
Lasse Collin [Wed, 19 Dec 2012 11:29:25 +0000 (12:29 +0100)]
XZ: Fix incorrect XZ_BUF_ERROR
From: Lasse Collin <lasse.collin@tukaani.org>
xz_dec_run() could incorrectly return XZ_BUF_ERROR if all of the
following was true:
- The caller knows how many bytes of output to expect and only
provides
that much output space.
- When the last output bytes are decoded, the caller-provided input
buffer ends right before the LZMA2 end of payload marker. So LZMA2
won't provide more output anymore, but it won't know it yet and
thus
won't return XZ_STREAM_END yet.
- A BCJ filter is in use and it hasn't left any unfiltered bytes in
the
temp buffer. This can happen with any BCJ filter, but in practice
it's more likely with filters other than the x86 BCJ.
This fixes <https://bugzilla.redhat.com/show_bug.cgi?id=3D735408>
where Squashfs thinks that a valid file system is corrupt.
This also fixes a similar bug in single-call mode where the
uncompressed size of a block using BCJ + LZMA2 was 0 bytes and caller
provided no output space. Many empty .xz files don't contain any
blocks and thus don't trigger this bug.
This also tweaks a closely related detail: xz_dec_bcj_run() could call
xz_dec_lzma2_run() to decode into temp buffer when it was known to be
useless. This was harmless although it wasted a minuscule number of
CPU cycles.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 23870:5c97b02f48fc
xen-unstable date: Thu Sep 22 17:34:27 UTC 2011
Lasse Collin [Wed, 19 Dec 2012 11:28:13 +0000 (12:28 +0100)]
XZ decompressor: Fix decoding of empty LZMA2 streams
From: Lasse Collin <lasse.collin@tukaani.org>
The old code considered valid empty LZMA2 streams to be corrupt.
Note that a typical empty .xz file has no LZMA2 data at all,
and thus most .xz files having no uncompressed data are handled
correctly even without this fix.
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen-unstable changeset: 23869:db1ea4b127cd
xen-unstable date: Thu Sep 22 17:33:48 UTC 2011
Greg Wettstein [Thu, 13 Dec 2012 14:35:58 +0000 (14:35 +0000)]
libxl: avoid blktap2 deadlock on cleanup
Establishes correct cleanup behavior for blktap devices. This patch
implements the release of the backend device before calling for
the destruction of the userspace component of the blktap device.
Without this patch the kernel xen-blkback driver deadlocks with
the blktap2 user control plane until the IPC channel is terminated by the
timeout on the select() call. This results in a noticeable delay
in the termination of the guest and causes the blktap minor
number which had been allocated to be orphaned.
Signed-off-by: Greg Wettstein <greg@enjellic.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 4 Dec 2012 18:50:03 +0000 (18:50 +0000)]
memop: limit guest specified extent order
Allowing unbounded order values here causes almost unbounded loops
and/or partially incomplete requests, particularly in PoD code.
The added range checks in populate_physmap(), decrease_reservation(),
and the "in" one in memory_exchange() architecturally all could use
PADDR_BITS - PAGE_SHIFT, and are being artificially constrained to
MAX_ORDER.
This is XSA-31 / CVE-2012-5515.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com>
Jan Beulich [Tue, 4 Dec 2012 18:50:01 +0000 (18:50 +0000)]
xen: fix error handling of guest_physmap_mark_populate_on_demand()
The only user of the "out" label bypasses a necessary unlock, thus
enabling the caller to lock up Xen.
Also, the function was never meant to be called by a guest for itself,
so rather than inspecting the code paths in depth for potential other
problems this might cause, and adjusting e.g. the non-guest printk()
in the above error path, just disallow the guest access to it.
Finally, the printk() (considering its potential of spamming the log,
the more that it's not using XENLOG_GUEST), is being converted to
P2M_DEBUG(), as debugging is what it apparently was added for in the
first place.
This is XSA-30 / CVE-2012-5514.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com>
Jan Beulich [Tue, 4 Dec 2012 18:49:56 +0000 (18:49 +0000)]
xen: add missing guest address range checks to XENMEM_exchange handlers
Ever since its existence (3.0.3 iirc) the handler for this has been
using non address range checking guest memory accessors (i.e.
the ones prefixed with two underscores) without first range
checking the accessed space (via guest_handle_okay()), allowing
a guest to access and overwrite hypervisor memory.
This is XSA-29 / CVE-2012-5513.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com>
Jan Beulich [Tue, 4 Dec 2012 18:49:53 +0000 (18:49 +0000)]
x86/HVM: range check xen_hvm_set_mem_access.hvmmem_access before use
Otherwise an out of bounds array access can happen if changing the
default access is being requested, which - if it doesn't crash Xen -
would subsequently allow reading arbitrary memory through
HVMOP_get_mem_access (again, unless that operation crashes Xen).
This is XSA-28 / CVE-2012-5512.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com>
Tim Deegan [Tue, 4 Dec 2012 18:49:49 +0000 (18:49 +0000)]
hvm: Limit the size of large HVM op batches
Doing large p2m updates for HVMOP_track_dirty_vram without preemption
ties up the physical processor. Integrating preemption into the p2m
updates is hard so simply limit to 1GB which is sufficient for a 15000
* 15000 * 32bpp framebuffer.
For HVMOP_modified_memory and HVMOP_set_mem_type preemptible add the
necessary machinery to handle preemption.
This is CVE-2012-5511 / XSA-27.
Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com>
x86/paging: Don't allocate user-controlled amounts of stack memory.
This is XSA-27 / CVE-2012-5511.
Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
v2: Provide definition of GB to fix x86-32 compile.
Signed-off-by: Jan Beulich <JBeulich@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 4 Dec 2012 18:49:42 +0000 (18:49 +0000)]
gnttab: fix releasing of memory upon switches between versions
gnttab_unpopulate_status_frames() incompletely freed the pages
previously used as status frame in that they did not get removed from
the domain's xenpage_list, thus causing subsequent list corruption
when those pages did get allocated again for the same or another purpose.
Similarly, grant_table_create() and gnttab_grow_table() both improperly
clean up in the event of an error - pages already shared with the guest
can't be freed by just passing them to free_xenheap_page(). Fix this by
sharing the pages only after all allocations succeeded.
This is CVE-2012-5510 / XSA-26.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Committed-by: Ian Jackson <ian.jackson.citrix.com>