Jan Beulich [Fri, 28 Feb 2014 16:04:04 +0000 (17:04 +0100)]
vsprintf: introduce %pv extended format specifier to print domain/vcpu ID pair
... in a simplified and consistent way.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Thu, 21 Nov 2013 13:02:34 +0000 (13:02 +0000)]
x86/mm: Don't allow p2m allocation after memory is allocated.
This avoids a potentially long loop populating the p2m table from the
m2p. Since there's no reason to turn on translate mode after the
domain is already running, this shouldn't be a problem.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Thu, 30 Jan 2014 21:34:16 +0000 (22:34 +0100)]
mem_event: Return previous value of CR0/CR3/CR4 on change.
This patch extends the information returned for CR0/CR3/CR4 register
write events with the previous value of the register. The old value
was already passed to the trap processing function, just never placed
into the returned request. By returning this value, applications
subscribing the CR events obtain additional context about the event.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Acked-by: Tim Deegan <tim@xen.org>
ns16550: Add support for UART present in Broadcom TruManage capable NetXtreme chips
Since it is an MMIO device, the code has been modified to accept MMIO based
devices as well. MMIO device settings are populated in the 'uart_config' table.
It also advertises 64 bit BAR. Therefore, code is reworked to account for 64
bit BAR and 64 bit MMIO lengths.
Some more quirks are - the need to shift the register offset by a specific
value and we also need to verify (UART_LSR_THRE && UART_LSR_TEMT) bits before
transmitting data.
While testing, include com1=115200,8n1,pci,0 on the xen cmdline to observe
output on console using SoL.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Thomas Lendacky <Thomas.Lendacky@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Wed, 26 Feb 2014 16:21:22 +0000 (17:21 +0100)]
x86/time: Remove redundant RTC REG_B read
RTC_ALWAYS_BCD is always defined by default, meaning that we will
unconditionally enter the if statement. Reordering the condition allows
short-circuit evaluation to remove a redundant CMOS read.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
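The effect of reordering the condition can be sketched as follows. This is only an illustration: cmos_read_stub() and the read counter are hypothetical stand-ins for the real CMOS access, and RTC_ALWAYS_BCD is assumed to be a non-zero constant as the message states.

```c
#include <assert.h>

#define RTC_ALWAYS_BCD 1      /* assumed non-zero, as the commit message says */
#define RTC_DM_BINARY  0x04   /* data-mode bit in REG_B (illustrative value) */

static unsigned int cmos_reads;   /* counts simulated CMOS accesses */

/* Hypothetical stand-in for a slow CMOS register read. */
static unsigned char cmos_read_stub(void)
{
    cmos_reads++;
    return 0;
}

/* Old order: the register is read before the constant is considered. */
static int needs_bcd_old(void)
{
    return !(cmos_read_stub() & RTC_DM_BINARY) || RTC_ALWAYS_BCD;
}

/* New order: the constant short-circuits the || so the read is skipped. */
static int needs_bcd_new(void)
{
    return RTC_ALWAYS_BCD || !(cmos_read_stub() & RTC_DM_BINARY);
}
```

Both variants yield the same truth value; only the number of register reads differs.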
Jan Beulich [Tue, 25 Feb 2014 08:40:31 +0000 (09:40 +0100)]
x86: generic MSRs save/restore
This patch introduces a generic MSRs save/restore mechanism, so that
in the future new MSRs' save/restore could be added w/ smaller change
than the full blown addition of a new save/restore type.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Liu Jinsong <jinsong.liu@intel.com> Acked-by: Keir Fraser <keir@xen.org>
Xudong Hao [Tue, 25 Feb 2014 08:38:21 +0000 (09:38 +0100)]
x86: MPX IA32_BNDCFGS msr handle
When MPX is supported, a new guest-state field for IA32_BNDCFGS
is added to the VMCS. In addition, two new controls are added:
- a VM-exit control called "clear BNDCFGS"
- a VM-entry control called "load BNDCFGS."
VM exits always save IA32_BNDCFGS into BNDCFGS field of VMCS.
Signed-off-by: Xudong Hao <xudong.hao@intel.com> Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Unlikely, but in case VMX support is not available, do not expose
MPX to HVM guests.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 25 Feb 2014 08:34:04 +0000 (09:34 +0100)]
x86/xsave: enable support for new ISA extensions
Intel has released a new version of Intel Architecture Instruction Set
Extensions Programming Reference, adding new features like AVX-512,
MPX, etc. Refer to
http://download-software.intel.com/sites/default/files/319433-015.pdf
This patch adds support for these new instruction set extensions
without enabling this support for guest use, yet.
It also adjusts XCR0 validation, at once fixing the definition of
XSTATE_ALL (which is not supposed to include bit 63).
Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 25 Feb 2014 08:30:59 +0000 (09:30 +0100)]
x86/mce: Reduce boot-time logspam
When booting with "no-mce", the user does not need to be told that "MCE
support [was] disabled by bootparam" for each cpu. Furthermore, a file:line
reference is unnecessary.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tim Deegan [Tue, 25 Feb 2014 08:29:26 +0000 (09:29 +0100)]
x86/hvm/rtc: inject RTC periodic interrupts from the vpt code
Let the vpt code drive the RTC's timer interrupts directly, as it does
for other periodic time sources, and fix up the register state in a
vpt callback when the interrupt is injected.
This fixes a hang seen on Windows 2003 in no-missed-ticks mode, where
when a tick was pending, the early callback from the VPT code would
always set REG_C.PF on every VMENTER; meanwhile the guest was in its
interrupt handler reading REG_C in a loop and waiting to see it clear.
One drawback is that a guest that attempts to suppress RTC periodic
interrupts by failing to read REG_C will receive up to 10 spurious
interrupts, even in 'strict' mode. However:
- since all previous RTC models have had this property (including
the current one, since 'no-ack' mode is hard-coded on) we're
pretty sure that all guests can handle this; and
- we're already playing some other interesting games with this
interrupt in the vpt code.
One other corner case: a guest that enables the PF timer interrupt,
masks the interrupt in the APIC and then polls REG_C looking for PF
will not see PF getting set. The more likely case of enabling the
timers and masking the interrupt with REG_B.PIE is already handled
correctly.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tim Deegan [Tue, 25 Feb 2014 08:26:45 +0000 (09:26 +0100)]
x86/hvm/rtc: don't run the vpt timer when !REG_B.PIE
If the guest has not asked for interrupts, don't run the vpt timer
to generate them. This is a prerequisite for a patch to simplify how
the vpt interacts with the RTC, and also gets rid of a timer series in
Xen in a case where it's unlikely to be needed.
Instead, calculate the correct value for REG_C.PF whenever REG_C is
read or PIE is enabled. This allows a guest to poll for the PF bit
while not asking for actual timer interrupts. Such a guest would no
longer get the benefit of the vpt's timer modes.
Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Mon, 24 Feb 2014 12:57:53 +0000 (12:57 +0000)]
libxl: Fix libxl_postfork_child_noexec deadlock etc.
libxl_postfork_child_noexec would nestedly reacquire the non-recursive
"no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
The result on Linux is that the process always deadlocks before
returning from this function.
This is used by xl's console child. So, the ultimate effect is that
xl with pygrub does not manage to connect to the pygrub console.
This behaviour was reported by Michael Young in Xen 4.4.0 RC5.
Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
not correct with SIGCHLD sharing. libxl_postfork_child_noexec is
documented to suffice if called only on one ctx. So deregistering the
ctx it's called on is not sufficient. Instead, we need a new approach
which discards the whole sigchld_user list and unconditionally removes
our SIGCHLD handler if we had one.
Prompted by this, clarify the semantics of
libxl_postfork_child_noexec. Specifically, expand on the meaning of
"quickly" by explaining what operations are not permitted; and
document the fact that the function doesn't reclaim the resources in
the ctxs.
And add a comment in libxl_postfork_child_noexec explaining the
internal concurrency situation.
This is an important bugfix. IMO the bug is a blocker for Xen 4.4.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reported-by: M A Young <m.a.young@durham.ac.uk> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
(cherry picked from commit 5be1e95318147855713709094e6847e3104ae910)
Julien Grall [Mon, 24 Feb 2014 11:33:00 +0000 (12:33 +0100)]
iommu: don't need to map dom0 page when the PT is shared
Currently iommu_init_dom0 walks the page list and calls the map_page callback
on each page.
On both AMD and VTD drivers, the function will directly return if the page
table is shared with the processor. So Xen can safely avoid running through
the page list.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 24 Feb 2014 11:11:01 +0000 (12:11 +0100)]
x86/MSI: don't risk division by zero
The check in question is redundant with the one in the immediately
following if(), where dividing by zero gets carefully avoided.
Spotted-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Yang Zhang [Mon, 24 Feb 2014 11:09:52 +0000 (12:09 +0100)]
Nested VMX: update nested paging mode on vmexit
Since SVM and VMX use different mechanisms to emulate the virtual-vmentry
and virtual-vmexit, it's hard to update the nested paging mode correctly in
common code. So we need to update the nested paging mode in their respective
code paths.
SVM already updates the nested paging mode on vmexit. This patch adds the same
logic on the VMX side.
Previous discussion is here:
http://lists.xen.org/archives/html/xen-devel/2013-12/msg01759.html
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Christoph Egger <chegger@amazon.de>
vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
vmce_amd_[rd|wr]msr functions can handle accesses to AMD thresholding
registers. But due to this statement here:
switch ( msr & (MSR_IA32_MC0_CTL | 3) )
we are wrongly masking off top two bits which meant the register
accesses never made it to vmce_amd_* functions.
Corrected this problem by modifying the mask in this patch to allow
AMD thresholding registers to fall to 'default' case which in turn
allows vmce_amd_* functions to handle access to the registers.
While at it, remove some clutter in the vmce_amd* functions. Retained
current policy of returning zero for reads and ignoring writes.
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
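The masking problem can be illustrated with a small sketch. The MSR numbers and the wider mask below are assumptions made for the example (the message does not quote the corrected mask), and classify_old()/classify_new() are hypothetical helpers rather than functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_MC0_CTL   0x00000400u   /* base of the per-bank MCi registers */
#define MSR_F10_MC4_MISC1  0xc0000408u   /* an AMD thresholding MSR (assumed value) */

/* Mask quoted in the message: every address bit above the base is dropped. */
static uint32_t classify_old(uint32_t msr)
{
    return msr & (MSR_IA32_MC0_CTL | 3);
}

/*
 * One possible wider mask (an assumption, not quoted from the patch):
 * keep all high address bits and fold away only the bank number.
 */
static uint32_t classify_new(uint32_t msr)
{
    return msr & (-MSR_IA32_MC0_CTL | 3);
}
```

With the narrow mask the thresholding register is indistinguishable from MC0_CTL and is swallowed by that case; with the wider mask it matches none of the bank cases and falls through to 'default', while ordinary bank registers still fold onto bank 0.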
Frediano Ziglio [Mon, 24 Feb 2014 11:07:41 +0000 (12:07 +0100)]
x86/MCE: Fix race condition in mctelem_reserve
These lines (in mctelem_reserve)
newhead = oldhead->mcte_next;
if (cmpxchgptr(freelp, oldhead, newhead) == oldhead) {
are racy. After the newhead pointer has been read, another flow (thread
or recursive invocation) can change the whole list but leave the head set
to the same value. oldhead then still equals *freelp, but the new head
being installed could point to any element (even one already in use).
This patch uses a bit array and atomic bit operations instead.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
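A minimal sketch of the bit-array idea, assuming C11 atomics and a pool small enough to fit in one word. The names slot_reserve()/slot_release() and the pool size are hypothetical; the actual patch uses Xen's own atomic bit operations.

```c
#include <assert.h>
#include <stdatomic.h>

#define NSLOTS 32   /* illustrative pool size: one bit per slot in a 32-bit word */

static atomic_uint free_map = 0xffffffffu;   /* bit set => slot is free */

/* Claim a free slot; returns its index, or -1 if the pool is exhausted. */
static int slot_reserve(void)
{
    for (;;) {
        unsigned int map = atomic_load(&free_map);
        if (map == 0)
            return -1;
        int bit = __builtin_ctz(map);   /* lowest free slot (GCC/Clang builtin) */
        unsigned int old = atomic_fetch_and(&free_map, ~(1u << bit));
        if (old & (1u << bit))
            return bit;   /* we were the one to clear the bit: slot is ours */
        /* Another flow claimed it first; retry with a fresh snapshot. */
    }
}

/* Return a slot to the pool. */
static void slot_release(int bit)
{
    assert(bit >= 0 && bit < NSLOTS);
    atomic_fetch_or(&free_map, 1u << bit);
}
```

Ownership is decided by an atomic test-and-clear of a single bit rather than by comparing a head pointer, so a list that was modified and then restored to the same head value cannot mislead the reserver.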
Ian Jackson [Tue, 18 Feb 2014 16:43:42 +0000 (16:43 +0000)]
libxl: Properly declare libxlu_disk_l.h in AUTOINCS
This is necessary so that make doesn't do things which depend on this
file until flex has finished producing it.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Olaf Hering <olaf@aepfle.de> Tested-by: Olaf Hering <olaf@aepfle.de> CC: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Julien Grall [Tue, 18 Feb 2014 13:58:21 +0000 (13:58 +0000)]
xen/arm: Save/restore GICH_VMCR on domain context switch
The GICH_VMCR register contains aliases of important bits of the GICV interface such as:
- priority mask of the CPU
- EOImode
- ...
We were safe because Linux guests always use the same value for these bits.
When new guests handle priority or change the EOI mode, VCPU interrupt
management will be in the wrong state.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Tue, 18 Feb 2014 16:56:17 +0000 (16:56 +0000)]
xen/arm: Correctly handle non-page aligned pointer in raw_copy_from_guest
The current implementation of the raw_copy_guest helper may lead to data corruption
and sometimes a Xen crash when the guest virtual address is not aligned to
PAGE_SIZE.
When the total length is higher than a page, the length to read is wrongly
computed with
min(len, (unsigned)(PAGE_SIZE - offset))
As the offset is only computed once per function call, if the start address was
not aligned to PAGE_SIZE, an iteration can end up:
- reading across a page boundary => Xen crash
- reading the previous page => data corruption
This issue can be resolved by setting offset to 0 at the end of the first
iteration. Indeed, after it, the virtual guest address is always aligned
to PAGE_SIZE.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: George Dunlap <george.dunlap@citrix.com>
[ ijc -- duplicated the comment in the other two functions with this behaviour ]
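The fix can be sketched with a flat buffer standing in for guest memory. copy_chunked() and its chunk counter are hypothetical; the real helper translates and maps each guest page separately, which is why a chunk must never cross a page boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

/*
 * Copy len bytes starting at byte offset 'start' of 'mem', one page-bounded
 * chunk at a time. 'chunks' (optional) receives the number of chunks used.
 */
static size_t copy_chunked(unsigned char *dst, const unsigned char *mem,
                           size_t start, size_t len, size_t *chunks)
{
    size_t offset = start & (PAGE_SIZE - 1);   /* only non-zero for the first chunk */
    size_t pos = start, copied = 0;

    while (len) {
        size_t room = PAGE_SIZE - offset;
        size_t size = len < room ? len : room;

        /* A chunk must stay within a single page. */
        assert(pos / PAGE_SIZE == (pos + size - 1) / PAGE_SIZE);
        memcpy(dst + copied, mem + pos, size);

        pos += size;
        copied += size;
        len -= size;
        offset = 0;   /* the fix: every later chunk starts page aligned */
        if (chunks)
            (*chunks)++;
    }
    return copied;
}
```

Without the offset reset, the second iteration would still use the first page's offset to size its chunk, so the chunk boundaries would no longer line up with page boundaries.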
Mukesh Rathor [Thu, 13 Feb 2014 16:56:39 +0000 (17:56 +0100)]
pvh: Fix regression due to assumption that HVM paths MUST use io-backend device
The commit 09bb434748af9bfe3f7fca4b6eef721a7d5042a4
"Nested VMX: prohibit virtual vmentry/vmexit during IO emulation"
assumes that the HVM paths are only taken by HVM guests. With the PVH
enabled that is no longer the case - which means that we do not have
to have the IO-backend device (QEMU) enabled.
****************************************
Panic on CPU 7:
FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: 000000000000001e
****************************************
as we do not have an io based backend. In the case that the
PVH guest does run an HVM guest inside it - we need to do
further work to support this - and for now the check will
bail us out.
We also fix spelling mistakes and the sentence structure.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: "Zhang, Yang Z" <yang.z.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yang Zhang [Thu, 13 Feb 2014 15:50:22 +0000 (15:50 +0000)]
When enabling log dirty mode, it sets all of the guest's memory to readonly.
In a HAP enabled domain, it modifies all EPT entries to clear the write bit
to make sure memory is readonly. This causes a problem if VT-d shares its page
table with EPT: the device may issue a DMA write request, the VT-d engine then
finds the target memory readonly, and the result is a VT-d fault.
Currently, there are two places that enable log dirty mode: migration and vram
tracking. Migration with a device assigned is not allowed, so that case is OK. For vram,
it is not necessary to set all memory to readonly; tracking only the vram range is enough.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 13 Feb 2014 15:13:07 +0000 (15:13 +0000)]
xen: Don't use __builtin_stdarg_start().
Cset fca49a00 ("netbsd: build fix with gcc 4.5") changed the
definition of va_start() to use __builtin_va_start() rather than
__builtin_stdarg_start() for GCCs >= 4.5, but in fact GCC dropped
__builtin_stdarg_start() before v3.3.
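As an illustration, a stdarg-style wrapper can be built purely on the __builtin_va_* family, with no version check. The sketch_* names and sum_ints() are hypothetical and chosen to avoid clashing with a real <stdarg.h>; GCC or Clang is assumed.

```c
#include <assert.h>

/* Illustrative names only; a real header would use the usual va_* spellings. */
typedef __builtin_va_list sketch_va_list;
#define sketch_va_start(ap, last) __builtin_va_start(ap, last)
#define sketch_va_arg(ap, type)   __builtin_va_arg(ap, type)
#define sketch_va_end(ap)         __builtin_va_end(ap)

/* Sum 'count' int arguments using only the compiler builtins. */
static int sum_ints(int count, ...)
{
    sketch_va_list ap;
    int total = 0;

    assert(count >= 0);
    sketch_va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += sketch_va_arg(ap, int);
    sketch_va_end(ap);
    return total;
}
```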
Tim Deegan [Thu, 13 Feb 2014 12:13:58 +0000 (12:13 +0000)]
xen: stop trying to use the system <stdarg.h> and <stdbool.h>
We already have our own versions of the stdarg/stdbool definitions, for
systems where those headers are installed in /usr/include.
On linux, they're typically installed in compiler-specific paths, but
finding them has proved unreliable. Drop that and use our own versions
everywhere.
Daniel De Graaf [Tue, 11 Feb 2014 15:25:17 +0000 (10:25 -0500)]
docs/vtpm: fix auto-shutdown reference
The automatic shutdown feature of the vTPM was removed because it
interfered with pv-grub measurement support and was also not triggered
if the guest did not use the vTPM. Virtual TPM domains will need to be
shut down or destroyed on guest shutdown via a script or other user
action.
This also fixes an incorrect reference to the vTPM being PV-only.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 12 Feb 2014 14:27:37 +0000 (14:27 +0000)]
xl: suppress suspend/resume functions on platforms which do not support it.
ARM does not (currently) support migration, so stop offering tasty looking
treats like "xl migrate".
Apart from the UI improvement my intention is to use this in osstest to detect
whether to attempt the save/restore/migrate tests.
Other than the additions of the #define/#ifdef there is a tiny bit of code
motion ("dump-core" in the command list and core_dump_domain in the
implementations) which serves to put ifdeffable bits next to each other.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 12 Feb 2014 16:52:26 +0000 (16:52 +0000)]
xen: Drop N from rcN in XEN_EXTRAVERSION
Having this here means we have to wait for a push gate pass, or fart
about with explicit pushes to master, to make an RC. The boot
messages for git builds already contain the git revision (as a
shorthash).
I will change the tarball creation checklist to seddery the -rc back
to -rcN, along with the other release-management-related changes (like
using an embedded copy of qemu).
If this patch meets with approval it should be thrown into the push
gate today, along with the patch for XSA-88, and then hopefully
nothing much else, so that we can get something suitable for making an
RC from by Friday.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Release-Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Wed, 12 Feb 2014 12:49:11 +0000 (13:49 +0100)]
blkif: drop struct blkif_request_segment_aligned
Commit 5148b7b5 ("blkif: add indirect descriptors interface to public
headers") added this without really explaining why it is needed: The
structure is identical to struct blkif_request_segment apart from the
padding field not being given a name in the pre-existing type. Their
size and alignment - which are what is relevant - are identical as long
as __alignof__(uint32_t) == 4 (which I think we rely upon in various
other places, so we can take as given).
Also correct a few minor glitches in the description, including for it
to no longer assume PAGE_SIZE == 4096.
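The equivalence argument can be checked at compile time. The two structures below are modelled on the description above with hypothetical names; they are not copied from the public header.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t grant_ref_t;

/* Padding left implicit: the compiler adds two tail bytes. */
struct seg_implicit_pad {
    grant_ref_t gref;
    uint8_t first_sect, last_sect;
};

/* Same layout with the padding spelled out as a named field. */
struct seg_named_pad {
    grant_ref_t gref;
    uint8_t first_sect, last_sect;
    uint16_t _pad;
};

/* Both hold as long as uint32_t is 4-byte aligned. */
_Static_assert(sizeof(struct seg_implicit_pad) == sizeof(struct seg_named_pad),
               "segment descriptors differ in size");
_Static_assert(_Alignof(struct seg_implicit_pad) == _Alignof(struct seg_named_pad),
               "segment descriptors differ in alignment");
```

If the assertions compile, arrays of either type have the same element stride, which is what matters for the ring and indirect-page layout.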
Ian Campbell [Tue, 11 Feb 2014 14:11:04 +0000 (14:11 +0000)]
xen: arm: correct terminology for cache flush macros
The term "flush" is slightly ambiguous. The correct ARM term for this
operation is clean, as opposed to clean+invalidate for which we also now have a
function.
This is a pure rename, no functional change.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
This approach has a shortcoming in that it breaks when a guest enables its
MMU (SCTLR.M, disabling HCR.DC) without enabling caches (SCTLR.C) first/at the
same time. It turns out that FreeBSD does this.
This has now been fixed (yet) another way (third time is the charm!) so remove
this support. The original commit contained some fixes which are still
relevant even with the revert of the bulk of the patch:
- Correction to HSR_SYSREG_CRN_MASK
- Rename of HSR_SYSCTL macros to avoid naming clash
- Definition of some additional cp reg specifications
Since these are still useful they are not reverted.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Tue, 11 Feb 2014 14:11:02 +0000 (14:11 +0000)]
xen/arm: clean and invalidate all guest caches by VMID after domain build.
Guests are initially started with caches disabled and so we need to make sure
they see consistent data in RAM (requiring a cache clean) but also that they
do not have old stale data suddenly appear in the caches when they enable
their caches (requiring the invalidate).
This can be split into two halves. First we must flush each page as it is
allocated to the guest. It is not sufficient to do the flush at scrub time
since this will miss pages which are ballooned out by the guest (where the
guest must scrub if it cares about not leaking the page content). We need to
clean as well as invalidate to make sure that any scrubbing which has occurred
gets committed to real RAM. To achieve this add a new cacheflush_page function,
which is a stub on x86.
Secondly we need to flush anything which the domain builder touches, which we
do via a new domctl.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Cc: keir@xen.org
Jan Beulich [Tue, 11 Feb 2014 10:14:10 +0000 (11:14 +0100)]
flask: check permissions first thing in flask_security_set_bool()
Nothing else should be done if the caller isn't permitted to set
boolean values.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Tue, 11 Feb 2014 10:13:22 +0000 (11:13 +0100)]
flask: fix error propagation from flask_security_set_bool()
The function should return an error when flask_security_make_bools()
fails as well as when the input ID is out of range.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Tue, 11 Feb 2014 10:11:48 +0000 (11:11 +0100)]
flask: fix memory leaks
Plus, in the case of security_preserve_bools(), prevent double freeing
in the case of security_get_bools() failing.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Mon, 10 Feb 2014 09:05:24 +0000 (10:05 +0100)]
AMD IOMMU: fail if there is no southbridge IO-APIC
... but interrupt remapping is requested (with per-device remapping
tables). Without it, the timer interrupt is usually not working.
Inspired by Linux's "iommu/amd: Work around wrong IOAPIC device-id in
IVRS table" (commit c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059) by Joerg
Roedel <joerg.roedel@amd.com>.
Reported-by: Eric Houby <ehouby@yahoo.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Eric Houby <ehouby@yahoo.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
x86/AMD: apply workaround for AMD F16h erratum 792
The workaround for the erratum will only be in BIOSes spun from
Jan 2014 onwards, but initial production parts shipped in 2013.
Since there is a coverage hole, we should carry this fix
in software in case the BIOS does not do the right thing or someone
is using an old BIOS.
Description:
Processor does not ensure DRAM scrub read/write sequence is atomic wrt
accesses to CC6 save state area. Therefore if a concurrent scrub
read/write access is to the same address the entry may appear as if it is
not written. This quirk applies to Fam16h models 00h-0Fh.
See "Revision Guide" for AMD F16h models 00h-0fh, document 51810 rev.
3.04, Nov 2013.
Equivalent Linux patch link:
http://marc.info/?l=linux-kernel&m=139066012217149&w=2
Tested the patch on Fam16h server platform and it works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Corrected checking for boot CPU. Made warning message conditional.
Compacted warning message text. Moved comment to commit message.
Ian Jackson [Thu, 6 Feb 2014 19:17:26 +0000 (19:17 +0000)]
libxl: test programs: Fix make race re libxenlight.so
The test programs were getting the proper libxenlight.so on their link
line. Filter it out. Also change the soname of the test library to
match the real one, so that libxutil is satisfied with it.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 6 Feb 2014 18:41:24 +0000 (18:41 +0000)]
libxl: test programs: Fix Makefile race re headers
We need to include the new TEST_PROG_OBJS and LIBXL_TEST_OBJS in the
appropriate dependencies. Otherwise we risk trying to build the test
program before gentypes is run.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <Ian.Campbell@citrix.com>
libvchan: Fix handling of invalid ring buffer indices
The remote (hostile) process can set ring buffer indices to any value
at any time. If that happens, it is possible to get "buffer space"
(either for writing data, or ready for reading) negative or greater
than buffer size. This will end up with buffer overflow in the second
memcpy inside of do_send/do_recv.
Fix this by introducing new available bytes accessor functions
raw_get_data_ready and raw_get_buffer_space which are robust against
mad ring states, and only return sanitised values.
Proof sketch of correctness:
Now {rd,wr}_{cons,prod} are only ever used in the raw available bytes
functions, and in do_send and do_recv.
The raw available bytes functions do unsigned arithmetic on the
returned values. If the result is "negative" or too big it will be
>ring_size (since we used unsigned arithmetic). Otherwise the result
is a positive in-range value representing a reasonable ring state, in
which case we can safely convert it to int (as the rest of the code
expects).
do_send and do_recv immediately mask the ring index value with the
ring size. The result is always going to be plausible. If the ring
state has become mad, the worst case is that our behaviour is
inconsistent with the peer's ring pointer. I.e. we read or write to
arguably-incorrect parts of the ring - but always parts of the ring.
And of course if a peer misoperates the ring they can achieve this
effect anyway.
So the security problem is fixed.
This is XSA-86.
(The patch is essentially Ian Jackson's work, although parts of the
commit message are by Marek.)
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
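The unsigned-arithmetic argument in the proof sketch can be illustrated as below. The function names are hypothetical, the ring size is assumed to be a power of two, and reporting 0 for a mad ring state is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes available for reading, sanitised against hostile indices. */
static int data_ready_sketch(uint32_t prod, uint32_t cons, uint32_t ring_size)
{
    uint32_t ready = prod - cons;   /* unsigned: a "negative" result wraps to a huge value */

    if (ready > ring_size)
        return 0;                   /* mad ring state: report nothing available */
    return (int)ready;
}

/* Bytes of free space for writing, sanitised the same way. */
static int buffer_space_sketch(uint32_t prod, uint32_t cons, uint32_t ring_size)
{
    uint32_t space = ring_size - (prod - cons);

    if (space > ring_size)
        return 0;
    return (int)space;
}

/* Index masking keeps every access inside the ring, whatever the peer wrote. */
static uint32_t ring_index_sketch(uint32_t idx, uint32_t ring_size)
{
    return idx & (ring_size - 1);
}
```

Because both out-of-range cases ("negative" and too large) end up greater than ring_size after the unsigned subtraction, a single comparison covers them, and legitimate index wraparound at 2^32 still yields the correct small difference.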
Jan Beulich [Thu, 6 Feb 2014 15:33:50 +0000 (16:33 +0100)]
flask: fix reading strings from guest memory
Since the string size is being specified by the guest, we must range
check it properly before doing allocations based on it. While for the
two cases that are exposed only to trusted guests (via policy
restriction) this just uses an arbitrary upper limit (PAGE_SIZE), for
the FLASK_[GS]ETBOOL case (which any guest can use) the upper limit
gets enforced based on the longest name across all boolean settings.
This is XSA-84.
Reported-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
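The pattern described, checking a caller-supplied size against an upper limit before allocating, might look like the sketch below. copy_bounded_string() is hypothetical, plain memcpy() stands in for the guest copy, and the error codes are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy 'size' bytes of an untrusted string into a freshly allocated,
 * NUL-terminated buffer. The size is validated before any allocation.
 */
static int copy_bounded_string(const char *src, size_t size, size_t max_size,
                               char **out)
{
    char *buf;

    if (size > max_size)
        return -EINVAL;   /* reject before allocating anything */

    buf = malloc(size + 1);
    if (!buf)
        return -ENOMEM;

    memcpy(buf, src, size);   /* stand-in for a copy from guest memory */
    buf[size] = '\0';
    *out = buf;
    return 0;
}
```

The caller picks max_size: a fixed bound such as the page size for the trusted-only paths, or the longest known boolean name for the path any guest can reach.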
Ian Jackson [Fri, 31 Jan 2014 15:07:55 +0000 (15:07 +0000)]
libxl: timeouts: Record deregistration when one occurs
When a timeout has occurred, it is deregistered. However, we failed
to record this fact by updating etime->func. As a result,
libxl__ev_time_isregistered would say `true' for a timeout which has
already happened.
It is necessary to clear etime->func before the callback, because the
callback might want to reinstate the timeout, or might free the etime
(or its containing struct) entirely.
The results are that we might try to have the timeout occur again
(causing problems for the call site), and/or corrupt the timeout list.
This fixes the timedereg event system unit test.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
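The ordering requirement can be sketched as follows. The structure and function names are hypothetical simplifications; the real libxl code also manipulates the timeout list.

```c
#include <assert.h>
#include <stddef.h>

struct etime_sketch;
typedef void etime_callback(struct etime_sketch *ev);

struct etime_sketch {
    etime_callback *func;    /* non-NULL iff the timeout is registered */
    int fired;               /* how many times the callback ran */
    int seen_registered;     /* what the callback observed */
};

static int etime_isregistered(const struct etime_sketch *ev)
{
    return ev->func != NULL;
}

/* Deliver a timeout: record the deregistration, then call back. */
static void time_occurs_sketch(struct etime_sketch *ev)
{
    etime_callback *func = ev->func;

    assert(func != NULL);
    ev->func = NULL;   /* must happen first: the callback may re-register or free ev */
    func(ev);          /* ev must not be touched after this point */
}

/* Example callback: notes whether the timeout still looked registered. */
static void record_state_cb(struct etime_sketch *ev)
{
    ev->fired++;
    ev->seen_registered = etime_isregistered(ev);
}
```

Clearing func after the callback instead would either overwrite a registration the callback had just reinstated or write to memory the callback had already freed.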
Ian Jackson [Fri, 31 Jan 2014 15:04:37 +0000 (15:04 +0000)]
libxl: timeouts: Break out time_occurs
Bring together the two places where etime->func() is called into a new
function time_occurs. For one call site this is pure code motion.
For the other the only semantic change is the introduction of a new
debugging message.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Mon, 3 Feb 2014 14:25:13 +0000 (14:25 +0000)]
libxl: events: timedereg internal unit test
Test timeout deregistration idempotency. In the current tree this
test fails because ev->func is not cleared, meaning that a timeout
can be removed from the list more than once, corrupting the list.
It is necessary to use multiple timeouts to demonstrate this bug,
because removing the very same entry twice from a list in quick
succession, without modifying the list in other ways in between,
doesn't actually corrupt the list. (Since removing an entry from a
doubly-linked list just copies next and back from the disappearing
entry into its neighbours.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Mon, 3 Feb 2014 14:17:46 +0000 (14:17 +0000)]
libxl: events: Makefile builds internal unit tests
We provide a new LIBXL_TESTS facility in the Makefile.
Also provide some helpful common routines for unit tests to use.
We don't want to put the weird test case entrypoints and the weird
test case code in the main libxl.so library. Symbol hiding prevents
us from simply directly linking the libxl_test_FOO.o in later. So
instead we provide a special library libxenlight_test.so which is used
only locally.
There are not yet any test cases defined; that will come in the next
patch.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Tue, 21 Jan 2014 15:05:37 +0000 (15:05 +0000)]
libxl: fork: Make SIGCHLD self-pipe nonblocking
Use the new libxl__pipe_nonblock and _close functions, rather than
open coding the same logic. Now the pipe is nonblocking, which avoids
a race which could result in libxl deadlocking in a multithreaded
program.
Reported-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
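A nonblocking self-pipe can be sketched with plain POSIX calls. The *_sketch functions are hypothetical and only loosely modelled on the helpers named above; error handling is reduced to a return code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a pipe with O_NONBLOCK set on both ends. */
static int pipe_nonblock_sketch(int fds[2])
{
    if (pipe(fds) < 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(fds[i], F_GETFL);
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    return 0;
}

/* Wake the reader. A full pipe means a wakeup is already pending. */
static int self_pipe_wakeup_sketch(int wfd)
{
    static const char byte = 0;

    for (;;) {
        ssize_t r = write(wfd, &byte, 1);
        if (r == 1)
            return 0;
        if (r < 0 && errno == EINTR)
            continue;
        if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return 0;   /* never block, e.g. inside a signal handler */
        return -1;
    }
}

/* Drain all pending wakeup bytes without ever blocking. */
static int self_pipe_eatall_sketch(int rfd)
{
    char buf[256];

    for (;;) {
        ssize_t r = read(rfd, buf, sizeof(buf));
        if (r > 0)
            continue;
        if (r == 0)
            return -1;   /* write end closed */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;    /* pipe is empty */
        return -1;
    }
}
```

With a blocking pipe, a writer facing a full pipe or a reader facing an empty one would sleep while possibly holding a lock another thread needs; the nonblocking variant turns both cases into an immediate return.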
Ian Jackson [Tue, 21 Jan 2014 14:58:10 +0000 (14:58 +0000)]
libxl: events: Break out libxl__pipe_nonblock, _close
Break out the pipe creation and destruction from the poller code
into two new functions libxl__pipe_nonblock and libxl__pipe_close.
Also change direct use of pipe() to libxl_pipe.
No overall functional difference other than minor differences in exact
log messages.
Also move libxl__self_pipe_wakeup and libxl__self_pipe_eatall into the
new pipe utilities section in libxl_event.c; this is pure code motion.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Mention that we switched pipe() -> libxl_pipe()
Ian Jackson [Fri, 17 Jan 2014 11:58:55 +0000 (11:58 +0000)]
libxl: fork: Share SIGCHLD handler amongst ctxs
Previously, an application which had multiple libxl ctxs in multiple
threads would itself have to plumb SIGCHLD through to each ctx.
Instead, permit multiple libxl ctxs to all share the SIGCHLD handler.
We keep a list of all the ctxs which are interested in SIGCHLD and
notify all of their self-pipes.
In more detail:
* sigchld_owner, the ctx* of the SIGCHLD owner, is replaced by
sigchld_users, a list of SIGCHLD users.
* Each ctx keeps track of whether it is on the users list, so that
libxl__sigchld_needed and libxl__sigchld_notneeded now, instead of
idempotently installing and removing the handler, idempotently add
or remove the ctx from the list.
We ensure that we always have the SIGCHLD handler installed
iff the sigchld_users list is nonempty. To make this a bit
easier we make sigchld_installhandler_core and
sigchld_removehandler_core idempotent.
Specifically, the call sites for sigchld_installhandler_core and
sigchld_removehandler_core are updated to manipulate sigchld_users
and only call the install or remove functions as applicable.
* In the signal handler we walk the list of SIGCHLD users and write
to each of their self-pipes. That means that we need to arrange to
defer SIGCHLD when we are manipulating the list (to avoid the
signal handler interrupting our list manipulation); this is quite
tiresome to arrange.
The code as written will, on the first installation of the SIGCHLD
handler, firstly install the real handler, then immediately replace
it with the deferral handler. Doing it this way makes the code
clearer as it makes the SIGCHLD deferral machinery much more
self-contained (and hence easier to reason about).
* The first part of libxl__sigchld_notneeded is broken out into a new
function sigchld_user_remove (which is also needed for postfork).
And of course that first part of the function is now rather
different, as explained above.
* sigchld_installhandler_core no longer takes the gc argument,
because it now deals with SIGCHLD for all ctxs.
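The list-of-users scheme described above can be sketched roughly as follows. This is a single-threaded illustration under stated assumptions, not the libxl code: struct user stands in for a ctx, all function names are hypothetical, and SIGCHLD is deferred with sigprocmask here rather than with the deferral handler and mutex that the patch uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for a libxl ctx: only the fields needed here. */
struct user {
    int wakeup_fd;              /* write end of this user's self-pipe */
    int registered;             /* nonzero while on the users list */
    struct user *next, *prev;
};

static struct user *users;              /* head of the SIGCHLD users list */
static struct sigaction saved_action;   /* disposition before we installed */
static int handler_installed;

/* The shared handler: notify every registered user via its self-pipe. */
static void sigchld_handler(int signo)
{
    int saved_errno = errno;
    (void)signo;
    for (struct user *u = users; u; u = u->next) {
        ssize_t r = write(u->wakeup_fd, "", 1);   /* one NUL byte */
        (void)r;
    }
    errno = saved_errno;
}

/* Keep the handler from running while the list is being modified.
 * Assumption: sigprocmask is used for brevity; it only covers the
 * single-threaded case. */
static void defer_sigchld(sigset_t *old)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGCHLD);
    sigprocmask(SIG_BLOCK, &set, old);
}

static void release_sigchld(const sigset_t *old)
{
    sigprocmask(SIG_SETMASK, old, NULL);
}

/* Idempotently add u to the list; install the handler on first use. */
static void sigchld_needed(struct user *u)
{
    sigset_t old;
    if (u->registered)
        return;
    defer_sigchld(&old);
    u->prev = NULL;
    u->next = users;
    if (users)
        users->prev = u;
    users = u;
    u->registered = 1;
    if (!handler_installed) {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = sigchld_handler;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
        sigaction(SIGCHLD, &sa, &saved_action);
        handler_installed = 1;
    }
    release_sigchld(&old);
}

/* Idempotently remove u; restore the old disposition when the list empties. */
static void sigchld_notneeded(struct user *u)
{
    sigset_t old;
    if (!u->registered)
        return;
    defer_sigchld(&old);
    if (u->prev)
        u->prev->next = u->next;
    else
        users = u->next;
    if (u->next)
        u->next->prev = u->prev;
    u->registered = 0;
    if (!users && handler_installed) {
        sigaction(SIGCHLD, &saved_action, NULL);
        handler_installed = 0;
    }
    release_sigchld(&old);
}
```

The invariant is the one stated above: the handler is installed iff the users list is nonempty. Blocking the signal in one thread does not stop another thread from running the handler, which is why a multithreaded implementation needs more machinery than this sketch has.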
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Include bugfixes from "Fixup SIGCHLD sharing" patch:
* Use a mutex for defer_sigchld, to guard against concurrency
between the thread calling defer_sigchld and an instance of the
primary signal handler on another thread.
* libxl_sigchld_owner_libxl_always is incompatible with SIGCHLD
sharing. Document this correctly.
Fix "have have" error in comment.
Move removal of newly unused variables to previous patch.
v2.1: Provide feature test macro LIBXL_HAVE_SIGCHLD_SHARING
Ian Jackson [Fri, 17 Jan 2014 15:42:31 +0000 (15:42 +0000)]
libxl: fork: Break out sigchld_sethandler_raw
We are going to want to introduce another call site in the final
substantive patch.
Pure code motion; no functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
---
v3: Remove now-unused variables from sigchld_installhandler_core
Ian Jackson [Fri, 17 Jan 2014 12:01:24 +0000 (12:01 +0000)]
libxl: fork: Break out sigchld_installhandler_core
Pure code motion. This is going to make the final substantive patch
easier to read.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Fri, 17 Jan 2014 11:45:57 +0000 (11:45 +0000)]
libxl: fork: Rename sigchld handler functions
We are going to change these functions so that different libxl ctx's
can share a single SIGCHLD handler. Rename them now to a new name
which doesn't imply unconditional handler installation or removal.
Also note in the comments that they are idempotent.
No functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 16 Jan 2014 17:03:34 +0000 (17:03 +0000)]
libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
This is the feature test macro for libxl_childproc_sigchld_occurred
and libxl_sigchld_owner_libxl_always_selective_reap.
It is split out into this separate patch because: a single feature
test is sensible because we do not intend anyone to release or ship
libxl versions with one of these but not the other; but, the two
features are in separate patches for clarity; and, this just makes
reading the actual code easier.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 16 Jan 2014 17:01:50 +0000 (17:01 +0000)]
libxl: fork: Provide ..._always_selective_reap
Applications exist which want to use libxl in an event-driven mode but
which do not integrate child termination into their event system, but
instead reap all their own children synchronously.
In such an application libxl must own SIGCHLD but avoid reaping any
children that don't belong to libxl.
Provide libxl_sigchld_owner_libxl_always_selective_reap which has this
behaviour.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
---
v2: Document the new mode in the big "Subprocess handling" comment.
Ian Jackson [Thu, 16 Jan 2014 16:57:27 +0000 (16:57 +0000)]
libxl: fork: Provide libxl_childproc_sigchld_occurred
Applications exist which don't keep track of all their child processes
in a manner suitable for coherent dispatch of their termination. In
such a situation, nothing in the whole process may call wait, or
waitpid(-1,,). Doing so reaps processes belonging to other parts of
the application and there is then no way to deliver the exit status to
the right place.
To facilitate this, provide a facility for such an application to ask
libxl to call waitpid on each of its children individually.
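As an illustration of reaping children individually, here is a small sketch. The child table and the names spawn_child and sigchld_occurred are hypothetical and are not the libxl implementation. The library polls only the pids it forked itself, so a child belonging to another part of the program stays available for its own owner to wait for.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 8

/* Hypothetical table of the children owned by "the library". */
struct child {
    pid_t pid;
    int reaped;     /* nonzero once waitpid has collected it */
    int status;     /* wait status, or -1 if the child was lost */
};

static struct child children[MAX_CHILDREN];
static int nchildren;

/* Fork a child that exits immediately with exit_code and record its pid. */
static pid_t spawn_child(int exit_code)
{
    assert(nchildren < MAX_CHILDREN);
    pid_t pid = fork();
    if (pid == 0)
        _exit(exit_code);
    if (pid > 0) {
        children[nchildren].pid = pid;
        children[nchildren].reaped = 0;
        children[nchildren].status = 0;
        nchildren++;
    }
    return pid;
}

/* To be called after a SIGCHLD: poll each of our own children with
 * WNOHANG. Never calls wait() or waitpid(-1, ...).
 * Returns the number of children newly accounted for. */
static int sigchld_occurred(void)
{
    int n = 0;
    for (int i = 0; i < nchildren; i++) {
        struct child *c = &children[i];
        pid_t r;
        if (c->reaped)
            continue;
        do {
            r = waitpid(c->pid, &c->status, WNOHANG);
        } while (r < 0 && errno == EINTR);
        if (r == c->pid) {
            c->reaped = 1;
            n++;
        } else if (r < 0) {
            /* e.g. ECHILD: somebody else reaped it; record it as lost */
            c->reaped = 1;
            c->status = -1;
            n++;
        }
    }
    return n;
}
```

The cost is one waitpid call per outstanding child on each SIGCHLD, instead of a single waitpid(-1,,) loop.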
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Thu, 16 Jan 2014 17:12:31 +0000 (17:12 +0000)]
libxl: fork: assert that chldmode is right
In libxl_childproc_reaped, check that the chldmode is as expected.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
libxl_sigchld_owner_libxl ought to have been mentioned in the list of
options for chldowner. Since it's the default, move the description
of its behaviour into the description of that option.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 16 Jan 2014 16:55:04 +0000 (16:55 +0000)]
libxl: fork: Clarify docs for libxl_sigchld_owner
Clarify that libxl_sigchld_owner_libxl causes libxl to reap all the
process's children, and clarify the wording of the description of
libxl_sigchld_owner_libxl_always.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Thu, 16 Jan 2014 16:40:05 +0000 (16:40 +0000)]
libxl: fork: Break out childproc_reaped_ours
We're going to want to do this again at a new call site.
No functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 16 Jan 2014 16:37:44 +0000 (16:37 +0000)]
libxl: fork: Break out checked_waitpid
This is a simple error-handling wrapper for waitpid. We're going to
want to call waitpid somewhere else and this avoids some of the
duplication.
No functional change in this patch. (Technically, we used to check
chldmode_ours again in the EINTR case, and don't now, but that can't
have changed because we continuously hold the libxl ctx lock.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 5 Feb 2014 14:43:38 +0000 (14:43 +0000)]
tools: Bump library SONAMEs for 4.4
There have been ABI/API changes in libxc. Bump its MAJOR (which
affects libxenguest et al too.)
There have been ABI changes in libxl. Bump its MAJOR.
(The API changes have been dealt with as we go along - there is
already a LIBXL_API_VERSION 0x040400.)
None of the other libraries have changed their interfaces. I have
verified this by building the tools and searching the dist/install
tree for files matching *.so.*. For each library that showed up, I
did this:
git-diff RELEASE-4.3.0..staging -- `find tools/FOO/ -name \*.h`
where FOO is the corresponding source directory.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
The function domain_page_map_to_mfn can be used to translate a virtual
address mapped by both map_domain_page and map_domain_page_global.
The latter uses vmap to map the mfn, therefore domain_page_map_to_mfn
will always fail because the address is not in DOMHEAP range.
Check if the address is in vmap range and use __pa to translate it.
This patch fixes guest shutdown when the event fifo is used.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: George Dunlap <george.dunlap@citrix.com>
Since gic_remove_from_queues() and gic_irq_disable() are called from
non-interrupt context and acquire the same lock as gic_set_guest_irq(),
which is called from interrupt context, we must disable interrupts in
these functions to avoid possible deadlocks.
Andrew Cooper [Tue, 4 Feb 2014 18:01:10 +0000 (18:01 +0000)]
tools/libxc: Prevent erroneous success from xc_domain_restore
The variable 'rc' is set to 1 at the top of xc_domain_restore, and for the
most part is left alone until success, at which point it is set to 0.
There is a separate 'frc' which for the most part is used to check function
calls, keeping errors separate from 'rc'.
For a toolstack which sets callbacks->toolstack_restore(), and the function
returns 0, any subsequent error will end up with code flow going to "out:",
resulting in the migration being declared a success.
For consistency, update the callsites of xc_dom_gnttab{,_hvm}_seed() to use
'frc', even though their use of 'rc' is currently safe.
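The failure mode can be illustrated with a reduced sketch. This is not the libxc code: the function names and the two-step structure are hypothetical, and it assumes (as the description implies) that the callback's return value was being stored in 'rc'.

```c
#include <assert.h>

/* Hypothetical steps: return 0 on success, nonzero on failure. */
typedef int (*step_fn)(void);

static int step_ok(void)   { return 0; }
static int step_fail(void) { return -1; }

/* Buggy shape: the callback result is stored in 'rc', so a successful
 * callback leaves rc == 0 before the function has really finished. */
static int restore_buggy(step_fn toolstack_restore, step_fn later_step)
{
    int rc = 1, frc;

    rc = toolstack_restore();
    if (rc != 0)
        goto out;

    frc = later_step();
    if (frc != 0)
        goto out;       /* rc is still 0: failure reported as success */

    rc = 0;
 out:
    return rc;
}

/* Fixed shape: every call is checked via 'frc'; 'rc' stays 1 until the
 * single point where overall success is known. */
static int restore_fixed(step_fn toolstack_restore, step_fn later_step)
{
    int rc = 1, frc;

    frc = toolstack_restore();
    if (frc != 0)
        goto out;

    frc = later_step();
    if (frc != 0)
        goto out;

    rc = 0;
 out:
    return rc;
}
```

Calling restore_buggy(step_ok, step_fail) returns 0 even though the later step failed, whereas restore_fixed(step_ok, step_fail) returns 1.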
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: George Dunlap <george.dunlap@eu.citrix.com>
xen: arm: Remove determining reset specific values from dts for XGENE.
This patch stops reading the reset specific values (address, size and mask)
from dts and uses values defined in the code instead. The xgene reset driver
(submitted to Linux) is currently going through a change which is not yet
accepted, and the new driver has a new type of dts bindings for reset. Hence,
until the Linux driver comes to some conclusion, we will use hardcoded values
instead of reading from dts so that the Xen code will not break due to the
Linux transition.
Anthony PERARD [Fri, 31 Jan 2014 16:35:47 +0000 (16:35 +0000)]
libxl: Fix vcpu-set for PV guest.
vcpu-set will try to use the HVM path (through QEMU) instead of the PV
path (through xenstore) for a PV guest, if there is a QEMU running for
this domain. This patch checks which kind of guest is running
before doing any call.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Thu, 6 Feb 2014 11:20:48 +0000 (12:20 +0100)]
domctl: pause vCPU for context reads
"Base" context reads already paused the subject vCPU when being the
current one, but that special case isn't being properly dealt with
anyway (at the very least when x86's fsgsbase feature is in use), so
just disallow it.
"Extended" context reads so far didn't do any pausing.
While we can't avoid the reported data being stale by the time it
arrives at the caller, this way we at least guarantee that it is
consistent.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Thu, 6 Feb 2014 11:20:20 +0000 (12:20 +0100)]
domctl: also pause domain for "extended" context updates
This is not just for consistency with "base" context updates, but
actually needed so that guest side accesses can't race with control
domain side updates.
This would have been a security issue if XSA-77 hadn't waived them on
the affected domctl operation.
While looking at the code I also spotted a redundant NULL check in the
"base" context update handling code, which is being removed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Thu, 6 Feb 2014 11:19:06 +0000 (12:19 +0100)]
x86: fix FS/GS base handling when using the fsgsbase feature
In that case, due to the respective instructions not being privileged,
we can't rely on our in-memory data to always be correct: While the
guest is running, it may change without us knowing about it. Therefore
we need to
- read the correct values from hardware during context switch out
(save_segments())
- read the correct values from hardware during RDMSR emulation
- update in-memory values during guest mode change
(toggle_guest_mode())
For completeness/consistency, WRMSR emulation is also being switched
to use wr[fg]sbase().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Ian Campbell [Thu, 16 Jan 2014 15:27:59 +0000 (15:27 +0000)]
tools: libxl: do not set the PoD target on ARM
ARM does not implement PoD and so returns ENOSYS from XENMEM_set_pod_target.
The correct solution here would be to check for ENOSYS in libxl, unfortunately
xc_domain_set_pod_target suffers from the same broken error reporting as the
rest of libxc and throws away the errno.
So for now conditionally define xc_domain_set_pod_target to return success
(which is what PoD does if nothing needs doing). xc_domain_get_pod_target sets
errno==-1 and returns -1, which matches the broken error reporting of the
existing function. It appears to have no in tree callers in any case.
The conditional should be removed once libxc has been fixed.
This makes ballooning (xl mem-set) work for ARM domains.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org> Cc: george.dunlap@citrix.com
Ian Campbell [Fri, 24 Jan 2014 14:23:07 +0000 (14:23 +0000)]
xen: arm: correct use of find_next_bit
find_next_bit takes a "const unsigned long *" but forcing a cast of an
"uint32_t *" throws away the alignment constraints and ends up causing an
alignment fault on arm64 if the input happened to be 4 but not 8 byte aligned.
Instead of casting use a temporary variable of the right type.
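A sketch of the temporary-variable approach is below. find_next_bit_sketch is a simplified stand-in written for this example, not Xen's find_next_bit, and next_set_bit_u32 is a hypothetical wrapper name.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Simplified stand-in for find_next_bit(): index of the first set bit at
 * or after 'offset', or 'size' if there is none. */
static unsigned long find_next_bit_sketch(const unsigned long *addr,
                                          unsigned long size,
                                          unsigned long offset)
{
    for (unsigned long i = offset; i < size; i++)
        if (addr[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
            return i;
    return size;
}

/* Scan a 32-bit mask without casting a uint32_t pointer: copy the value
 * into a temporary of the type the bit-search function expects, so the
 * pointer passed in has the alignment of an unsigned long. */
static unsigned long next_set_bit_u32(uint32_t mask, unsigned long offset)
{
    const unsigned long tmp = mask;
    return find_next_bit_sketch(&tmp, 32, offset);
}
```

Passing (const unsigned long *)&mask instead would hand the function a pointer that is only guaranteed to be 4 byte aligned, which is the situation described above.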
I've had a look around for similar constructs and the only thing I found was
maintenance_interrupt which casts a uint64_t down to an unsigned long, which
although perhaps not best advised is safe I think.
This was observed with the AArch64 Linaro toolchain 2013.12 but I think that
is just coincidental due to subtle changes to the stack layout etc.
Reported-by: Fu Wei <fu.wei@linaro.org> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
Wei Liu [Mon, 27 Jan 2014 17:53:38 +0000 (17:53 +0000)]
libxc: fix claim mode when creating HVM guest
The original code is wrong because:
* claim mode wants to know the total number of pages needed while
original code provides the additional number of pages needed.
* if pod is enabled memory will already be allocated by the time we try
to claim memory.
So the fix would be:
* move claim mode before actual memory allocation.
* pass the right number of pages to hypervisor.
The "right number of pages" should be number of pages of target memory
minus VGA_HOLE_SIZE, regardless of whether PoD is enabled.
This fixes bug #32.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com>