Tim Deegan [Thu, 28 Nov 2013 15:40:48 +0000 (15:40 +0000)]
bitmaps/bitops: Clarify tests for small constant size.
No semantic changes, just makes the control flow a bit clearer.
I was looking at this because the (-!__builtin_constant_p(x) | x__)
formula is too clever for Coverity, but in fact it always takes me a
minute or two to understand it too. :)
Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
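For readers who, like the author, need a minute with that formula, here is a minimal sketch of why the clever form and an explicit pair of conditions agree. It is not the Xen macro itself: is_constant stands in for __builtin_constant_p(x), and the helper names and SKETCH_BITS_PER_LONG are invented for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Assumption: stands in for Xen's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Old style: OR the size with all-ones unless it is a compile-time
 * constant, so a non-constant size can never look "small".
 */
static bool small_const_old(bool is_constant, unsigned long nbits)
{
    unsigned long key = -(unsigned long)!is_constant | nbits;

    return key <= SKETCH_BITS_PER_LONG;
}

/* Explicit style: the same decision written as two plain conditions. */
static bool small_const_new(bool is_constant, unsigned long nbits)
{
    return is_constant && nbits <= SKETCH_BITS_PER_LONG;
}
```

Both helpers return the same answer for every input; the second one just says so directly.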
Andrew Cooper [Tue, 4 Mar 2014 10:19:20 +0000 (11:19 +0100)]
x86: identify reset_stack_and_jump() as noreturn
reset_stack_and_jump() is actually a macro, but can effectively become noreturn
by giving it an unreachable() declaration.
Propagate the 'noreturn-ness' up through the direct and indirect callers.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:18:28 +0000 (11:18 +0100)]
misc cleanup as a result of the previous patches
This includes:
* A stale comment in sh_skip_sync()
* A dead for ever loop in __bug()
* A prototype for machine_power_off() which is not implemented on any architecture
* Replacing a for(;;); loop with unreachable()
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 4 Mar 2014 10:17:03 +0000 (11:17 +0100)]
identify panic and reboot/halt functions as noreturn
On an x86 build (GCC Debian 4.7.2-5), this substantially reduces the size of
.text and .init.text sections.
Experimentally, even in a non-debug build, GCC uses `call` rather than `jmp`
so there should be no impact on any stack trace generation.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Make a formal define for noreturn in compiler.h, and fix up opencoded uses of
__attribute__((noreturn)). This includes removing redundant uses with
function definitions which have a public declaration.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
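A minimal sketch of what such a define and its use might look like, assuming a GCC/Clang-style attribute. The helper names sketch_fatal() and checked_div() are hypothetical and only illustrate how the annotation propagates to callers; the actual contents of compiler.h may differ.

```c
#include <stdlib.h>

/* Assumption: a GCC/Clang-style attribute wrapped in a single macro. */
#define noreturn __attribute__((__noreturn__))

/* Hypothetical fatal-error helper; it never returns to its caller. */
static noreturn void sketch_fatal(void)
{
    abort();
}

/*
 * Because sketch_fatal() is noreturn, the compiler knows the division
 * below is only reached with a non-zero divisor.
 */
static int checked_div(int a, int b)
{
    if (b == 0)
        sketch_fatal();
    return a / b;
}
```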
Andrew Cooper [Tue, 4 Mar 2014 10:14:53 +0000 (11:14 +0100)]
x86/crash: fix up declaration of do_nmi_crash()
... so it can correctly be annotated as noreturn. Move the declaration of
nmi_crash() to be effectively private in crash.c
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 10:01:57 +0000 (11:01 +0100)]
correctly use gcc's -x option
In Linux the improper use was found to cause problems with certain
distributed build environments. Even if not directly affecting us, be
on the safe side.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 4 Mar 2014 09:52:20 +0000 (10:52 +0100)]
IOMMU: generalize and correct softirq processing during Dom0 device setup
c/s 21039:95f5a4ce8f24 ("VT-d: reduce default verbosity") having put a
call to process_pending_softirqs() in VT-d's domain_context_mapping()
was wrong in two ways: For one we shouldn't be doing this when setting
up a device during DomU assignment. And then - I didn't check whether
that was the case already back then - we shouldn't call that function
with the pcidevs_lock (or in fact any spin lock) held.
Move the "preemption" into generic code, at once dealing with further
actual (too much output elsewhere - particularly on systems with very
many host bridge like devices - having been observed to still cause the
watchdog to trigger when enabled) and potential (other IOMMU code may
also end up being too verbose) issues.
Do the "preemption" once per device actually being set up when in
verbose mode, and once per bus otherwise.
Note that dropping pcidevs_lock around the process_pending_softirqs()
invocation is specifically not a problem here: We're in an __init
function and aren't racing with potential additions/removals of PCI
devices. Not acquiring the lock in setup_dom0_pci_devices() otoh is not
an option, as there are too many places that assert the lock being
held.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Wei Liu [Fri, 28 Feb 2014 16:35:15 +0000 (17:35 +0100)]
mm: ensure useful progress in decrease_reservation
During my fun time playing with balloon driver I found that hypervisor's
preemption check kept decrease_reservation from doing any useful work
for 32 bit guests, resulting in hanging the guests.
As Andrew suggested, we can force the check to fail for the first
iteration to ensure progress. We did this in d3a55d7d9 "x86/mm: Ensure
useful progress in alloc_l2_table()" already.
After this change I cannot see the hang caused by continuation logic
anymore.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
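The idea of forcing the check to fail on the first iteration can be sketched as follows. This is an illustration only, not the hypervisor code: process_extents(), the preempt_check_fn type and always_pending() are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the hypervisor's preemption check. */
typedef bool (*preempt_check_fn)(void);

/*
 * Process extents [start, nr). Returns the index to resume from, or nr
 * once everything is done. The check is skipped on the first iteration,
 * so each invocation handles at least one extent.
 */
static unsigned long process_extents(unsigned long start, unsigned long nr,
                                     preempt_check_fn need_preempt)
{
    unsigned long i;

    for (i = start; i < nr; i++) {
        if (i != start && need_preempt())
            return i;   /* a caller would set up a continuation here */
        /* ... release extent i ... */
    }
    return nr;
}

/* Worst case for the demonstration: preemption is always pending. */
static bool always_pending(void)
{
    return true;
}
```

Even with preemption permanently pending, every call advances by one extent, so a chain of continuations eventually finishes.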
Jan Beulich [Fri, 28 Feb 2014 16:04:04 +0000 (17:04 +0100)]
vsprintf: introduce %pv extended format specifier to print domain/vcpu ID pair
... in a simplified and consistent way.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Tim Deegan [Thu, 21 Nov 2013 13:02:34 +0000 (13:02 +0000)]
x86/mm: Don't allow p2m allocation after memory is allocated.
This avoids a potentially long loop populating the p2m table from the
m2p. Since there's no reason to turn on translate mode after the
domain is already running, this shouldn't be a problem.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Thu, 30 Jan 2014 21:34:16 +0000 (22:34 +0100)]
mem_event: Return previous value of CR0/CR3/CR4 on change.
This patch extends the information returned for CR0/CR3/CR4 register
write events with the previous value of the register. The old value
was already passed to the trap processing function, just never placed
into the returned request. By returning this value, applications
subscribing the CR events obtain additional context about the event.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Acked-by: Tim Deegan <tim@xen.org>
ns16550: Add support for UART present in Broadcom TruManage capable NetXtreme chips
Since it is an MMIO device, the code has been modified to accept MMIO based
devices as well. MMIO device settings are populated in the 'uart_config' table.
It also advertises 64 bit BAR. Therefore, code is reworked to account for 64
bit BAR and 64 bit MMIO lengths.
Some more quirks are - the need to shift the register offset by a specific
value and we also need to verify (UART_LSR_THRE && UART_LSR_TEMT) bits before
transmitting data.
While testing, include com1=115200,8n1,pci,0 on the xen cmdline to observe
output on console using SoL.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Thomas Lendacky <Thomas.Lendacky@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
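The two quirks mentioned above can be sketched like this. The LSR bit values are the standard 16550 ones; tx_fully_idle() and reg_offset() are hypothetical helpers, and the shift amount is whatever the device's table entry specifies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard 16550 line status register bits. */
#define UART_LSR_THRE 0x20  /* transmit holding register empty */
#define UART_LSR_TEMT 0x40  /* transmitter (shift register) empty */

/* Hypothetical helper: true only when both THRE and TEMT are set. */
static bool tx_fully_idle(uint8_t lsr)
{
    const uint8_t both = UART_LSR_THRE | UART_LSR_TEMT;

    return (lsr & both) == both;
}

/* Hypothetical helper: scale a register index by a per-device shift. */
static uint32_t reg_offset(uint32_t reg, unsigned int reg_shift)
{
    return reg << reg_shift;
}
```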
Andrew Cooper [Wed, 26 Feb 2014 16:21:22 +0000 (17:21 +0100)]
x86/time: Remove redundant RTC REG_B read
RTC_ALWAYS_BCD is always defined by default, meaning that we will
unconditionally enter the if statement. Reordering the condition allows
short-circuit evaluation to remove a redundant CMOS read.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
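A small sketch of the effect, with a counter standing in for the CMOS access. SKETCH_ALWAYS_BCD, reg_b_says_binary() and need_bcd_conversion() are hypothetical names; the real condition lives in the x86 time code.

```c
#include <stdbool.h>

/* Assumption: mirrors a build where RTC_ALWAYS_BCD is defined as 1. */
#define SKETCH_ALWAYS_BCD 1

static unsigned int cmos_reads;   /* counts simulated CMOS accesses */

/* Hypothetical stand-in for a slow CMOS register read. */
static bool reg_b_says_binary(void)
{
    cmos_reads++;
    return false;
}

/* Constant first: the read on the right-hand side is never evaluated. */
static bool need_bcd_conversion(void)
{
    return SKETCH_ALWAYS_BCD || !reg_b_says_binary();
}
```

With the operands in the opposite order the read would happen every time, even though its result could not change the outcome.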
Jan Beulich [Tue, 25 Feb 2014 08:40:31 +0000 (09:40 +0100)]
x86: generic MSRs save/restore
This patch introduces a generic MSRs save/restore mechanism, so that
in the future new MSRs' save/restore could be added w/ smaller change
than the full blown addition of a new save/restore type.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Liu Jinsong <jinsong.liu@intel.com> Acked-by: Keir Fraser <keir@xen.org>
Xudong Hao [Tue, 25 Feb 2014 08:38:21 +0000 (09:38 +0100)]
x86: MPX IA32_BNDCFGS msr handle
When MPX is supported, a new guest-state field for IA32_BNDCFGS
is added to the VMCS. In addition, two new controls are added:
- a VM-exit control called "clear BNDCFGS"
- a VM-entry control called "load BNDCFGS."
VM exits always save IA32_BNDCFGS into BNDCFGS field of VMCS.
Signed-off-by: Xudong Hao <xudong.hao@intel.com> Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Unlikely, but in case VMX support is not available, do not expose
MPX to HVM guests.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Tue, 25 Feb 2014 08:34:04 +0000 (09:34 +0100)]
x86/xsave: enable support for new ISA extensions
Intel has released a new version of Intel Architecture Instruction Set
Extensions Programming Reference, adding new features like AVX-512,
MPX, etc. Refer to
http://download-software.intel.com/sites/default/files/319433-015.pdf
This patch adds support for these new instruction set extensions
without enabling this support for guest use, yet.
It also adjusts XCR0 validation, at once fixing the definition of
XSTATE_ALL (which is not supposed to include bit 63).
Signed-off-by: Jan Beulich <jbeulich@novell.com> Acked-by: Keir Fraser <keir@xen.org>
Andrew Cooper [Tue, 25 Feb 2014 08:30:59 +0000 (09:30 +0100)]
x86/mce: Reduce boot-time logspam
When booting with "no-mce", the user does not need to be told that "MCE
support [was] disabled by bootparam" for each cpu. Furthermore, a file:line
reference is unnecessary.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tim Deegan [Tue, 25 Feb 2014 08:29:26 +0000 (09:29 +0100)]
x86/hvm/rtc: inject RTC periodic interrupts from the vpt code
Let the vpt code drive the RTC's timer interrupts directly, as it does
for other periodic time sources, and fix up the register state in a
vpt callback when the interrupt is injected.
This fixes a hang seen on Windows 2003 in no-missed-ticks mode, where
when a tick was pending, the early callback from the VPT code would
always set REG_C.PF on every VMENTER; meanwhile the guest was in its
interrupt handler reading REG_C in a loop and waiting to see it clear.
One drawback is that a guest that attempts to suppress RTC periodic
interrupts by failing to read REG_C will receive up to 10 spurious
interrupts, even in 'strict' mode. However:
- since all previous RTC models have had this property (including
the current one, since 'no-ack' mode is hard-coded on) we're
pretty sure that all guests can handle this; and
- we're already playing some other interesting games with this
interrupt in the vpt code.
One other corner case: a guest that enables the PF timer interrupt,
masks the interrupt in the APIC and then polls REG_C looking for PF
will not see PF getting set. The more likely case of enabling the
timers and masking the interrupt with REG_B.PIE is already handled
correctly.
Signed-off-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tim Deegan [Tue, 25 Feb 2014 08:26:45 +0000 (09:26 +0100)]
x86/hvm/rtc: don't run the vpt timer when !REG_B.PIE
If the guest has not asked for interrupts, don't run the vpt timer
to generate them. This is a prerequisite for a patch to simplify how
the vpt interacts with the RTC, and also gets rid of a timer series in
Xen in a case where it's unlikely to be needed.
Instead, calculate the correct value for REG_C.PF whenever REG_C is
read or PIE is enabled. This allows a guest to poll for the PF bit
while not asking for actual timer interrupts. Such a guest would no
longer get the benefit of the vpt's timer modes.
Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Jackson [Mon, 24 Feb 2014 12:57:53 +0000 (12:57 +0000)]
libxl: Fix libxl_postfork_child_noexec deadlock etc.
libxl_postfork_child_noexec would nestedly reacquire the non-recursive
"no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
The result on Linux is that the process always deadlocks before
returning from this function.
This is used by xl's console child. So, the ultimate effect is that
xl with pygrub does not manage to connect to the pygrub console.
This behaviour was reported by Michael Young in Xen 4.4.0 RC5.
Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
not correct with SIGCHLD sharing. libxl_postfork_child_noexec is
documented to suffice if called only on one ctx. So deregistering the
ctx it's called on is not sufficient. Instead, we need a new approach
which discards the whole sigchld_user list and unconditionally removes
our SIGCHLD handler if we had one.
Prompted by this, clarify the semantics of
libxl_postfork_child_noexec. Specifically, expand on the meaning of
"quickly" by explaining what operations are not permitted; and
document the fact that the function doesn't reclaim the resources in
the ctxs.
And add a comment in libxl_postfork_child_noexec explaining the
internal concurrency situation.
This is an important bugfix. IMO the bug is a blocker for Xen 4.4.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reported-by: M A Young <m.a.young@durham.ac.uk> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com> Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
(cherry picked from commit 5be1e95318147855713709094e6847e3104ae910)
Julien Grall [Mon, 24 Feb 2014 11:33:00 +0000 (12:33 +0100)]
iommu: don't need to map dom0 page when the PT is shared
Currently iommu_init_dom0 walks the page list and calls the map_page callback
on each page.
On both the AMD and VT-d drivers, the function returns immediately if the page
table is shared with the processor. So Xen can safely avoid walking
the page list.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 24 Feb 2014 11:11:01 +0000 (12:11 +0100)]
x86/MSI: don't risk division by zero
The check in question is redundant with the one in the immediately
following if(), where dividing by zero gets carefully avoided.
Spotted-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Yang Zhang [Mon, 24 Feb 2014 11:09:52 +0000 (12:09 +0100)]
Nested VMX: update nested paging mode on vmexit
Since SVM and VMX use different mechanisms to emulate the virtual-vmentry
and virtual-vmexit, it's hard to update the nested paging mode correctly in
common code. So we need to update the nested paging mode in their respective
code path.
SVM already updates the nested paging mode on vmexit. This patch adds the same
logic on the VMX side.
Previous discussion is here:
http://lists.xen.org/archives/html/xen-devel/2013-12/msg01759.html
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Reviewed-by: Christoph Egger <chegger@amazon.de>
vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
vmce_amd_[rd|wr]msr functions can handle accesses to AMD thresholding
registers. But due to this statement here:
switch ( msr & (MSR_IA32_MC0_CTL | 3) )
we were wrongly masking off the top two bits, which meant the register
accesses never made it to the vmce_amd_* functions.
Corrected this problem by modifying the mask in this patch to allow
AMD thresholding registers to fall to 'default' case which in turn
allows vmce_amd_* functions to handle access to the registers.
While at it, remove some clutter in the vmce_amd* functions. Retained
current policy of returning zero for reads and ignoring writes.
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
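To see why the narrow mask swallows the AMD registers, here is a small worked sketch. MSR_IA32_MC0_CTL is the architectural 0x400; the example MSR number and the wider mask are assumptions chosen for illustration and are not claimed to be the exact values the patch uses.

```c
#include <stdint.h>

#define MSR_IA32_MC0_CTL 0x00000400u

/* Assumption: example AMD thresholding MSR number, for illustration. */
#define SKETCH_AMD_THRESHOLD_MSR 0xc0000408u

/* Narrow mask from the statement quoted above. */
static uint32_t narrow_key(uint32_t msr)
{
    return msr & (MSR_IA32_MC0_CTL | 3);
}

/* One possible wider mask: keep every bit from MC0_CTL upwards. */
static uint32_t wide_key(uint32_t msr)
{
    return msr & (-MSR_IA32_MC0_CTL | 3);
}
```

With the narrow mask the example register collapses onto the MC0_CTL case label; with the wider one it keeps its high bits, matches no bank case, and falls through to 'default'.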
Frediano Ziglio [Mon, 24 Feb 2014 11:07:41 +0000 (12:07 +0100)]
x86/MCE: Fix race condition in mctelem_reserve
These lines (in mctelem_reserve)
newhead = oldhead->mcte_next;
if (cmpxchgptr(freelp, oldhead, newhead) == oldhead) {
are racy. After the newhead pointer has been read, another flow (a thread or
a recursive invocation) can change the whole list but leave the head with the
same value. oldhead then still equals *freelp, but the new head being set
could point to any element (even one already in use).
This patch uses a bit array and atomic bit operations instead.
Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com> Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
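A minimal sketch of reserving an entry from a bitmap with atomic bit operations, written with C11 atomics rather than Xen's own primitives. reserve_slot() is a hypothetical name and a single word stands in for the bit array; the point is that claiming a bit does not depend on a previously read "next" pointer, so the ABA problem described above does not arise.

```c
#include <stdatomic.h>

/*
 * Claim one free slot from a bitmap in which a set bit means "free".
 * Returns the slot index, or -1 if nothing is free.
 */
static int reserve_slot(atomic_ulong *freemap)
{
    unsigned long seen = atomic_load(freemap);

    while (seen != 0) {
        int bit = 0;
        unsigned long mask, prev;

        /* Find the lowest bit that looked free in our snapshot. */
        while (!(seen & (1UL << bit)))
            bit++;
        mask = 1UL << bit;

        /* Atomically clear it; prev tells us whether we won the race. */
        prev = atomic_fetch_and(freemap, ~mask);
        if (prev & mask)
            return bit;

        /* Somebody else took it first: retry with the fresher view. */
        seen = prev & ~mask;
    }
    return -1;
}
```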
Ian Jackson [Tue, 18 Feb 2014 16:43:42 +0000 (16:43 +0000)]
libxl: Properly declare libxlu_disk_l.h in AUTOINCS
This is necessary so that make doesn't do things which depend on this
file until flex has finished producing it.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Olaf Hering <olaf@aepfle.de> Tested-by: Olaf Hering <olaf@aepfle.de> CC: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Julien Grall [Tue, 18 Feb 2014 13:58:21 +0000 (13:58 +0000)]
xen/arm: Save/restore GICH_VMCR on domain context switch
GICH_VMCR register contains alias to important bits of GICV interface such as:
- priority mask of the CPU
- EOImode
- ...
We were safe because Linux guests always use the same value for these bits.
When new guests handle priority or change the EOI mode, VCPU interrupt
management will be in the wrong state.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Tue, 18 Feb 2014 16:56:17 +0000 (16:56 +0000)]
xen/arm: Correctly handle non-page aligned pointer in raw_copy_from_guest
The current implementation of raw_copy_guest helper may lead to data corruption
and sometimes Xen crash when the guest virtual address is not aligned to
PAGE_SIZE.
When the total length is larger than a page, the length to read is wrongly
computed with
min(len, (unsigned)(PAGE_SIZE - offset))
As the offset is only computed one time per function, if the start address was
not aligned to PAGE_SIZE, we can end up in same iteration:
- to read across page boundary => xen crash
- read the previous page => data corruption
This issue can be resolved by setting offset to 0 at the end of the first
iteration. Indeed, after it, the virtual guest address is always aligned
to PAGE_SIZE.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Cc: George Dunlap <george.dunlap@citrix.com>
[ ijc -- duplicated the comment in the other two functions with this behaviour ]
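The fix can be illustrated with a sketch that only computes the per-iteration lengths. split_by_page() and SKETCH_PAGE_SIZE are hypothetical; the real helpers also map and copy each page.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, for the sake of the example. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Split [addr, addr + len) into per-page chunk lengths. Returns the
 * number of chunks written to chunks[] (at most max_chunks).
 */
static size_t split_by_page(unsigned long addr, size_t len,
                            size_t *chunks, size_t max_chunks)
{
    size_t offset = addr & (SKETCH_PAGE_SIZE - 1);
    size_t n = 0;

    while (len > 0 && n < max_chunks) {
        size_t size = SKETCH_PAGE_SIZE - offset;

        if (size > len)
            size = len;
        chunks[n++] = size;
        len -= size;
        /* After the first chunk, the address is always page aligned. */
        offset = 0;
    }
    return n;
}
```

Without the reset of offset, the second and later chunks would be sized as if they still started mid-page.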
Mukesh Rathor [Thu, 13 Feb 2014 16:56:39 +0000 (17:56 +0100)]
pvh: Fix regression due to assumption that HVM paths MUST use io-backend device
The commit 09bb434748af9bfe3f7fca4b6eef721a7d5042a4
"Nested VMX: prohibit virtual vmentry/vmexit during IO emulation"
assumes that the HVM paths are only taken by HVM guests. With the PVH
enabled that is no longer the case - which means that we do not have
to have the IO-backend device (QEMU) enabled.
****************************************
Panic on CPU 7:
FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: 000000000000001e
****************************************
as we do not have an io based backend. In the case that the
PVH guest does run an HVM guest inside it - we need to do
further work to support this - and for now the check will
bail us out.
We also fix spelling mistakes and the sentence structure.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: "Zhang, Yang Z" <yang.z.zhang@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yang Zhang [Thu, 13 Feb 2014 15:50:22 +0000 (15:50 +0000)]
When enabling log dirty mode, it sets all guest's memory to readonly.
And in HAP enabled domain, it modifies all EPT entries to clear write bit
to make sure it is readonly. This will cause problem if VT-d shares page
table with EPT: the device may issue a DMA write request, then VT-d engine
tells it the target memory is readonly and result in VT-d fault.
Currently, there are two places that enable log dirty mode: migration and vram
tracking. Migration with a device assigned is not allowed, so that case is OK. For
vram, there is no need to set all memory to readonly; tracking only the vram range is enough.
Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com> Acked-by: Tim Deegan <tim@xen.org>
Tim Deegan [Thu, 13 Feb 2014 15:13:07 +0000 (15:13 +0000)]
xen: Don't use __builtin_stdarg_start().
Cset fca49a00 ("netbsd: build fix with gcc 4.5") changed the
definition of va_start() to use __builtin_va_start() rather than
__builtin_stdarg_start() for GCCs >= 4.5, but in fact GCC dropped
__builtin_stdarg_start() before v3.3.
Signed-off-by: Tim Deegan <tim@xen.org> Tested-by: Roger Pau Monné <roger.pau@citrix.com>
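For illustration, a freestanding definition built directly on the compiler builtins might look like the sketch below. It assumes GCC or Clang; the sketch_ prefixes are only there so the example cannot clash with a system <stdarg.h>, and sum_ints() is a hypothetical user.

```c
/* Assumption: GCC/Clang builtins; no system <stdarg.h> is needed. */
typedef __builtin_va_list sketch_va_list;
#define sketch_va_start(ap, last) __builtin_va_start(ap, last)
#define sketch_va_arg(ap, type)   __builtin_va_arg(ap, type)
#define sketch_va_end(ap)         __builtin_va_end(ap)

/* Add up 'count' int arguments passed after the count itself. */
static int sum_ints(int count, ...)
{
    sketch_va_list ap;
    int total = 0;

    sketch_va_start(ap, count);
    while (count-- > 0)
        total += sketch_va_arg(ap, int);
    sketch_va_end(ap);
    return total;
}
```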
Tim Deegan [Thu, 13 Feb 2014 12:13:58 +0000 (12:13 +0000)]
xen: stop trying to use the system <stdarg.h> and <stdbool.h>
We already have our own versions of the stdarg/stdbool definitions, for
systems where those headers are installed in /usr/include.
On linux, they're typically installed in compiler-specific paths, but
finding them has proved unreliable. Drop that and use our own versions
everywhere.
Daniel De Graaf [Tue, 11 Feb 2014 15:25:17 +0000 (10:25 -0500)]
docs/vtpm: fix auto-shutdown reference
The automatic shutdown feature of the vTPM was removed because it
interfered with pv-grub measurement support and was also not triggered
if the guest did not use the vTPM. Virtual TPM domains will need to be
shut down or destroyed on guest shutdown via a script or other user
action.
This also fixes an incorrect reference to the vTPM being PV-only.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 12 Feb 2014 14:27:37 +0000 (14:27 +0000)]
xl: suppress suspend/resume functions on platforms which do not support it.
ARM does not (currently) support migration, so stop offering tasty looking
treats like "xl migrate".
Apart from the UI improvement my intention is to use this in osstest to detect
whether to attempt the save/restore/migrate tests.
Other than the additions of the #define/#ifdef there is a tiny bit of code
motion ("dump-core" in the command list and core_dump_domain in the
implementations) which serves to put ifdeffable bits next to each other.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 12 Feb 2014 16:52:26 +0000 (16:52 +0000)]
xen: Drop N from rcN in XEN_EXTRAVERSION
Having this here means we have to wait for a push gate pass, or fart
about with explicit pushes to master, to make an RC. The boot
messages for git builds already contain the git revision (as a
shorthash).
I will change the tarball creation checklist to seddery the -rc back
to -rcN, along with the other release-management-related changes (like
using an embedded copy of qemu).
If this patch meets with approval it should be thrown into the push
gate today, along with the patch for XSA-88, and then hopefully
nothing much else, so that we can get something suitable for making an
RC from by Friday.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Release-Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Jan Beulich [Wed, 12 Feb 2014 12:49:11 +0000 (13:49 +0100)]
blkif: drop struct blkif_request_segment_aligned
Commit 5148b7b5 ("blkif: add indirect descriptors interface to public
headers") added this without really explaining why it is needed: The
structure is identical to struct blkif_request_segment apart from the
padding field not being given a name in the pre-existing type. Their
size and alignment - which are what is relevant - are identical as long
as __alignof__(uint32_t) == 4 (which I think we rely upon in various
other places, so we can take as given).
Also correct a few minor glitches in the description, including for it
to no longer assume PAGE_SIZE == 4096.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
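The size and alignment argument can be checked mechanically. The two structures below are hypothetical copies written for this sketch (field names are illustrative), not the public header itself.

```c
#include <stdint.h>

/* Hypothetical copy of the pre-existing layout: padding left implicit. */
struct sketch_segment {
    uint32_t gref;
    uint8_t first_sect, last_sect;
};

/* Hypothetical copy of the dropped variant: padding spelled out. */
struct sketch_segment_aligned {
    uint32_t gref;
    uint8_t first_sect, last_sect;
    uint16_t _pad;
};

/* Both hold on ABIs where uint32_t is 4-byte aligned. */
_Static_assert(sizeof(struct sketch_segment) ==
               sizeof(struct sketch_segment_aligned), "size differs");
_Static_assert(_Alignof(struct sketch_segment) ==
               _Alignof(struct sketch_segment_aligned), "alignment differs");
```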
Ian Campbell [Tue, 11 Feb 2014 14:11:04 +0000 (14:11 +0000)]
xen: arm: correct terminology for cache flush macros
The term "flush" is slightly ambiguous. The correct ARM term for this
operation is clean, as opposed to clean+invalidate for which we also now have a
function.
This is a pure rename, no functional change.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
This approach has a shortcoming in that it breaks when a guest enables its
MMU (SCTLR.M, disabling HCR.DC) without enabling caches (SCTLR.C) first/at the
same time. It turns out that FreeBSD does this.
This has now been fixed (yet) another way (third time is the charm!) so remove
this support. The original commit contained some fixes which are still
relevant even with the revert of the bulk of the patch:
- Correction to HSR_SYSREG_CRN_MASK
- Rename of HSR_SYSCTL macros to avoid naming clash
- Definition of some additional cp reg specifications
Since these are still useful they are not reverted.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
Ian Campbell [Tue, 11 Feb 2014 14:11:02 +0000 (14:11 +0000)]
xen/arm: clean and invalidate all guest caches by VMID after domain build.
Guests are initially started with caches disabled and so we need to make sure
they see consistent data in RAM (requiring a cache clean) but also that they
do not have old stale data suddenly appear in the caches when they enable
their caches (requiring the invalidate).
This can be split into two halves. First we must flush each page as it is
allocated to the guest. It is not sufficient to do the flush at scrub time
since this will miss pages which are ballooned out by the guest (where the
guest must scrub if it cares about not leaking the page content). We need to
clean as well as invalidate to make sure that any scrubbing which has occurred
gets committed to real RAM. To achieve this add a new cacheflush_page function,
which is a stub on x86.
Secondly we need to flush anything which the domain builder touches, which we
do via a new domctl.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Cc: keir@xen.org
Jan Beulich [Tue, 11 Feb 2014 10:14:10 +0000 (11:14 +0100)]
flask: check permissions first thing in flask_security_set_bool()
Nothing else should be done if the caller isn't permitted to set
boolean values.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Tue, 11 Feb 2014 10:13:22 +0000 (11:13 +0100)]
flask: fix error propagation from flask_security_set_bool()
The function should return an error when flask_security_make_bools()
fails as well as when the input ID is out of range.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Tue, 11 Feb 2014 10:11:48 +0000 (11:11 +0100)]
flask: fix memory leaks
Plus, in the case of security_preserve_bools(), prevent double freeing
in the case of security_get_bools() failing.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Mon, 10 Feb 2014 09:05:24 +0000 (10:05 +0100)]
AMD IOMMU: fail if there is no southbridge IO-APIC
... but interrupt remapping is requested (with per-device remapping
tables). Without it, the timer interrupt is usually not working.
Inspired by Linux's "iommu/amd: Work around wrong IOAPIC device-id in
IVRS table" (commit c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059) by Joerg
Roedel <joerg.roedel@amd.com>.
Reported-by: Eric Houby <ehouby@yahoo.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Eric Houby <ehouby@yahoo.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
x86/AMD: apply workaround for AMD F16h erratum 792
Workaround for the Erratum will be in BIOSes spun only after
Jan 2014 onwards. But initial production parts shipped in 2013
itself. Since there is a coverage hole, we should carry this fix
in software in case BIOS does not do the right thing or someone
is using old BIOS.
Description:
Processor does not ensure DRAM scrub read/write sequence is atomic wrt
accesses to CC6 save state area. Therefore if a concurrent scrub
read/write access is to same address the entry may appear as if it is
not written. This quirk applies to Fam16h models 00h-0Fh
See "Revision Guide" for AMD F16h models 00h-0fh, document 51810 rev.
3.04, Nov 2013.
Equivalent Linux patch link:
http://marc.info/?l=linux-kernel&m=139066012217149&w=2
Tested the patch on Fam16h server platform and it works fine.
Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Corrected checking for boot CPU. Made warning message conditional.
Compacted warning message text. Moved comment to commit message.
Ian Jackson [Thu, 6 Feb 2014 19:17:26 +0000 (19:17 +0000)]
libxl: test programs: Fix make race re libxenlight.so
The test programs were getting the proper libxenlight.so on their link
line. Filter it out. Also change the soname of the test library to
match the real one, so that libxutil is satisfied with it.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 6 Feb 2014 18:41:24 +0000 (18:41 +0000)]
libxl: test programs: Fix Makefile race re headers
We need to include the new TEST_PROG_OBJS and LIBXL_TEST_OBJS in the
appropriate dependencies. Otherwise we risk trying to build the test
program before gentypes is run.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Cc: Ian Campbell <Ian.Campbell@citrix.com>
libvchan: Fix handling of invalid ring buffer indices
The remote (hostile) process can set ring buffer indices to any value
at any time. If that happens, it is possible to get "buffer space"
(either for writing data, or ready for reading) negative or greater
than buffer size. This will end up with buffer overflow in the second
memcpy inside of do_send/do_recv.
Fix this by introducing new available bytes accessor functions
raw_get_data_ready and raw_get_buffer_space which are robust against
mad ring states, and only return sanitised values.
Proof sketch of correctness:
Now {rd,wr}_{cons,prod} are only ever used in the raw available bytes
functions, and in do_send and do_recv.
The raw available bytes functions do unsigned arithmetic on the
returned values. If the result is "negative" or too big it will be
>ring_size (since we used unsigned arithmetic). Otherwise the result
is a positive in-range value representing a reasonable ring state, in
which case we can safely convert it to int (as the rest of the code
expects).
do_send and do_recv immediately mask the ring index value with the
ring size. The result is always going to be plausible. If the ring
state has become mad, the worst case is that our behaviour is
inconsistent with the peer's ring pointer. I.e. we read or write to
arguably-incorrect parts of the ring - but always parts of the ring.
And of course if a peer misoperates the ring they can achieve this
effect anyway.
So the security problem is fixed.
This is XSA-86.
(The patch is essentially Ian Jackson's work, although parts of the
commit message are by Marek.)
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
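The sanitising accessors described in the libvchan commit above can be sketched roughly as follows. This is a simplified illustration, not the libvchan code: the struct, the sketch_* names, the power-of-two ring size, and the choice to report an implausible ring as empty (0 bytes) are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical, simplified ring descriptor; not the libvchan structure. */
struct ring_sketch {
    uint32_t prod;   /* producer index, can be set to anything by the peer */
    uint32_t cons;   /* consumer index, can be set to anything by the peer */
    uint32_t size;   /* ring size in bytes, assumed to be a power of two */
};

/* Bytes ready for reading, or 0 if the indices are implausible. */
static int sketch_data_ready(const struct ring_sketch *r)
{
    /* Unsigned subtraction wraps modulo 2^32, so a "negative" difference
     * shows up as a huge value and is caught by the same comparison. */
    uint32_t ready = r->prod - r->cons;

    if (ready > r->size)
        return 0;            /* assumption: treat a mad ring as empty */
    return (int)ready;       /* in range, so the conversion is safe */
}

/* Bytes free for writing, or 0 if the indices are implausible. */
static int sketch_buffer_space(const struct ring_sketch *r)
{
    uint32_t ready = r->prod - r->cons;

    if (ready > r->size)
        return 0;
    return (int)(r->size - ready);
}

/* Masking an index with the ring size always yields an in-ring offset. */
static uint32_t sketch_offset(const struct ring_sketch *r, uint32_t index)
{
    return index & (r->size - 1);
}
```

With prod=10, cons=4 and a 1024-byte ring the accessors report 6 bytes ready and 1018 bytes free; with the indices swapped the difference wraps to a value far above the ring size and both accessors report 0.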
Jan Beulich [Thu, 6 Feb 2014 15:33:50 +0000 (16:33 +0100)]
flask: fix reading strings from guest memory
Since the string size is being specified by the guest, we must range
check it properly before doing allocations based on it. While for the
two cases that are exposed only to trusted guests (via policy
restriction) this just uses an arbitrary upper limit (PAGE_SIZE), for
the FLASK_[GS]ETBOOL case (which any guest can use) the upper limit
gets enforced based on the longest name across all boolean settings.
This is XSA-84.
Reported-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
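The range check described in the flask commit above amounts to validating a caller-supplied length before allocating anything based on it. A minimal user-space sketch, assuming a plain pointer stands in for guest memory and memcpy() for the copy-from-guest primitive; the function name, the error codes and the use of malloc() are illustrative assumptions, not Xen's code.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy "size" bytes from an untrusted source into a
 * freshly allocated, NUL-terminated buffer, refusing anything longer than
 * max_size.  Assumes max_size < SIZE_MAX so that size + 1 cannot overflow.
 * Returns 0 and stores the copy in *out, or a negative errno value.
 */
static int sketch_copy_string(const char *src, size_t size, size_t max_size,
                              char **out)
{
    char *buf;

    *out = NULL;

    /* Range-check the untrusted size before allocating based on it. */
    if (size > max_size)
        return -EINVAL;

    buf = malloc(size + 1);
    if (!buf)
        return -ENOMEM;

    memcpy(buf, src, size);   /* stand-in for the copy from guest memory */
    buf[size] = '\0';

    *out = buf;
    return 0;
}
```

A caller serving trusted guests might pass a fixed limit such as the page size, while a caller reachable by any guest would pass the length of the longest name it could legitimately be asked about.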
Ian Jackson [Fri, 31 Jan 2014 15:07:55 +0000 (15:07 +0000)]
libxl: timeouts: Record deregistration when one occurs
When a timeout has occurred, it is deregistered. However, we failed
to record this fact by updating etime->func. As a result,
libxl__ev_time_isregistered would say `true' for a timeout which has
already happened.
It is necessary to clear etime->func before the callback, because the
callback might want to reinstate the timeout, or might free the etime
(or its containing struct) entirely.
The results are that we might try to have the timeout occur again
(causing problems for the call site), and/or corrupt the timeout list.
This fixes the timedereg event system unit test.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
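The ordering constraint in the commit above (clear the function pointer first, then invoke the callback) can be sketched as below. The struct and the sketch_* names are hypothetical and much simpler than libxl's event machinery; the only point illustrated is saving the pointer, recording the deregistration, and only then calling out.

```c
#include <stddef.h>

struct sketch_timer;
typedef void sketch_timer_cb(struct sketch_timer *t);

/* Hypothetical timer: func is non-NULL iff the timer is registered. */
struct sketch_timer {
    sketch_timer_cb *func;
    int fired;
};

static int sketch_timer_isregistered(const struct sketch_timer *t)
{
    return t->func != NULL;
}

/* Called when the timeout expires. */
static void sketch_timer_occurs(struct sketch_timer *t)
{
    sketch_timer_cb *func = t->func;   /* save the callback */

    t->func = NULL;   /* record deregistration before calling out */
    func(t);          /* callback may re-register t, or free it entirely */
    /* t must not be touched after this point. */
}

/* Example callback that just counts. */
static void sketch_cb_count(struct sketch_timer *t)
{
    t->fired++;
}

/* Example callback that reinstates the timeout from inside the callback. */
static void sketch_cb_rearm(struct sketch_timer *t)
{
    t->fired++;
    t->func = sketch_cb_rearm;
}
```

Clearing the pointer after the callback instead would overwrite a re-registration made by sketch_cb_rearm, and would touch freed memory if the callback had released the timer.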
Ian Jackson [Fri, 31 Jan 2014 15:04:37 +0000 (15:04 +0000)]
libxl: timeouts: Break out time_occurs
Bring together the two places where etime->func() is called into a new
function time_occurs. For one call site this is pure code motion.
For the other the only semantic change is the introduction of a new
debugging message.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Mon, 3 Feb 2014 14:25:13 +0000 (14:25 +0000)]
libxl: events: timedereg internal unit test
Test timeout deregistration idempotency. In the current tree this
test fails because ev->func is not cleared, meaning that a timeout
can be removed from the list more than once, corrupting the list.
It is necessary to use multiple timeouts to demonstrate this bug,
because removing the very same entry twice from a list in quick
succession, without modifying the list in other ways in between,
doesn't actually corrupt the list. (Since removing an entry from a
doubly-linked list just copies next and back from the disappearing
entry into its neighbours.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
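The list behaviour described in the commit above can be reproduced with a small circular doubly-linked list. This is an illustrative sketch with hypothetical names; libxl uses queue macros with a different layout, so only the principle carries over.

```c
#include <stddef.h>

/* Hypothetical circular doubly-linked list with a sentinel head node. */
struct sketch_node {
    struct sketch_node *next, *prev;
};

static void sketch_list_init(struct sketch_node *head)
{
    head->next = head->prev = head;
}

static void sketch_list_append(struct sketch_node *head, struct sketch_node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Removal only copies n's own next/prev into its neighbours; n itself keeps
 * its (now stale) pointers, which is what makes a second removal dangerous. */
static void sketch_list_remove(struct sketch_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static int sketch_count_forward(const struct sketch_node *head)
{
    int count = 0;
    const struct sketch_node *n;

    for (n = head->next; n != head; n = n->next)
        count++;
    return count;
}

static int sketch_count_backward(const struct sketch_node *head)
{
    int count = 0;
    const struct sketch_node *n;

    for (n = head->prev; n != head; n = n->prev)
        count++;
    return count;
}
```

With entries a, b, c on the list, removing b twice in a row rewrites the same two pointers with the same values and the list stays intact. Removing b, then a, then b again is different: the stale pointers in b make c's back pointer refer to the already-removed a, so forward and backward traversal disagree. This is why the test needs more than one timeout.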
Ian Jackson [Mon, 3 Feb 2014 14:17:46 +0000 (14:17 +0000)]
libxl: events: Makefile builds internal unit tests
We provide a new LIBXL_TESTS facility in the Makefile.
Also provide some helpful common routines for unit tests to use.
We don't want to put the weird test case entrypoints and the weird
test case code in the main libxl.so library. Symbol hiding prevents
us from simply directly linking the libxl_test_FOO.o in later. So
instead we provide a special library libxenlight_test.so which is used
only locally.
There are not yet any test cases defined; that will come in the next
patch.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Tue, 21 Jan 2014 15:05:37 +0000 (15:05 +0000)]
libxl: fork: Make SIGCHLD self-pipe nonblocking
Use the new libxl__pipe_nonblock and _close functions, rather than
open coding the same logic. Now the pipe is nonblocking, which avoids
a race which could result in libxl deadlocking in a multithreaded
program.
Reported-by: Jim Fehlig <jfehlig@suse.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
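A nonblocking self-pipe of the kind the commit above relies on can be sketched with plain POSIX calls. The sketch_* names and exact error handling are assumptions for illustration; they are not the signatures of libxl__pipe_nonblock or the other libxl helpers.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Create a pipe and set O_NONBLOCK on both ends.  Returns 0 or -1. */
static int sketch_pipe_nonblock(int fds[2])
{
    int i, flags;

    if (pipe(fds) < 0)
        return -1;

    for (i = 0; i < 2; i++) {
        flags = fcntl(fds[i], F_GETFL);
        if (flags < 0 || fcntl(fds[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fds[0]);
            close(fds[1]);
            fds[0] = fds[1] = -1;
            return -1;
        }
    }
    return 0;
}

/* Write one wakeup byte.  A full pipe is not an error: a wakeup is already
 * pending, and because the fd is nonblocking we never stall here. */
static int sketch_self_pipe_wakeup(int wfd)
{
    const char byte = 0;

    for (;;) {
        ssize_t r = write(wfd, &byte, 1);
        if (r == 1)
            return 0;
        if (r < 0 && errno == EINTR)
            continue;
        if (r < 0 && errno == EAGAIN)
            return 0;
        return -1;
    }
}

/* Drain all pending wakeup bytes without ever blocking. */
static int sketch_self_pipe_eatall(int rfd)
{
    char buf[256];

    for (;;) {
        ssize_t r = read(rfd, buf, sizeof(buf));
        if (r == (ssize_t)sizeof(buf))
            continue;               /* there may be more to read */
        if (r >= 0)
            return 0;               /* short read or EOF: drained */
        if (errno == EINTR)
            continue;
        if (errno == EAGAIN)
            return 0;               /* pipe is empty */
        return -1;
    }
}
```

With a blocking pipe, a drain that races with another thread which has already emptied the pipe would sleep in read(); with O_NONBLOCK it returns immediately.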
Ian Jackson [Tue, 21 Jan 2014 14:58:10 +0000 (14:58 +0000)]
libxl: events: Break out libxl__pipe_nonblock, _close
Break out the pipe creation and destruction from the poller code
into two new functions libxl__pipe_nonblock and libxl__pipe_close.
Also change direct use of pipe() to libxl_pipe.
No overall functional difference other than minor differences in exact
log messages.
Also move libxl__self_pipe_wakeup and libxl__self_pipe_eatall into the
new pipe utilities section in libxl_event.c; this is pure code motion.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
--
v3: Mention that we switched pipe() -> libxl_pipe()
Ian Jackson [Fri, 17 Jan 2014 11:58:55 +0000 (11:58 +0000)]
libxl: fork: Share SIGCHLD handler amongst ctxs
Previously, an application which had multiple libxl ctxs in multiple
threads, would have to itself plumb SIGCHLD through to each ctx.
Instead, permit multiple libxl ctxs to all share the SIGCHLD handler.
We keep a list of all the ctxs which are interested in SIGCHLD and
notify all of their self-pipes.
In more detail:
* sigchld_owner, the ctx* of the SIGCHLD owner, is replaced by
sigchld_users, a list of SIGCHLD users.
* Each ctx keeps track of whether it is on the users list, so that
libxl__sigchld_needed and libxl__sigchld_notneeded now, instead of
idempotently installing and removing the handler, idempotently add
or remove the ctx from the list.
We ensure that we always have the SIGCHLD handler installed
iff the sigchld_users list is nonempty. To make this a bit
easier we make sigchld_installhandler_core and
sigchld_removehandler_core idempotent.
Specifically, the call sites for sigchld_installhandler_core and
sigchld_removehandler_core are updated to manipulate sigchld_users
and only call the install or remove functions as applicable.
* In the signal handler we walk the list of SIGCHLD users and write
to each of their self-pipes. That means that we need to arrange to
defer SIGCHLD when we are manipulating the list (to avoid the
signal handler interrupting our list manipulation); this is quite
tiresome to arrange.
The code as written will, on the first installation of the SIGCHLD
handler, firstly install the real handler, then immediately replace
it with the deferral handler. Doing it this way makes the code
clearer as it makes the SIGCHLD deferral machinery much more
self-contained (and hence easier to reason about).
* The first part of libxl__sigchld_notneeded is broken out into a new
function sigchld_user_remove (which is also needed for
postfork). And of course that first part of the function is now
rather different, as explained above.
* sigchld_installhandler_core no longer takes the gc argument,
because it now deals with SIGCHLD for all ctxs.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Include bugfixes from "Fixup SIGCHLD sharing" patch:
* Use a mutex for defer_sigchld, to guard against concurrency
between the thread calling defer_sigchld and an instance of the
primary signal handler on another thread.
* libxl_sigchld_owner_libxl_always is incompatible with SIGCHLD
sharing. Document this correctly.
Fix "have have" error in comment.
Move removal of newly unused variables to previous patch.
v2.1: Provide feature test macro LIBXL_HAVE_SIGCHLD_SHARING
Ian Jackson [Fri, 17 Jan 2014 15:42:31 +0000 (15:42 +0000)]
libxl: fork: Break out sigchld_sethandler_raw
We are going to want to introduce another call site in the final
substantive patch.
Pure code motion; no functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
---
v3: Remove now-unused variables from sigchld_installhandler_core
Ian Jackson [Fri, 17 Jan 2014 12:01:24 +0000 (12:01 +0000)]
libxl: fork: Break out sigchld_installhandler_core
Pure code motion. This is going to make the final substantive patch
easier to read.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Fri, 17 Jan 2014 11:45:57 +0000 (11:45 +0000)]
libxl: fork: Rename sigchld handler functions
We are going to change these functions so that different libxl ctx's
can share a single SIGCHLD handler. Rename them now to a new name
which doesn't imply unconditional handler installation or removal.
Also note in the comments that they are idempotent.
No functional change.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 16 Jan 2014 17:03:34 +0000 (17:03 +0000)]
libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
This is the feature test macro for libxl_childproc_sigchld_occurred
and libxl_sigchld_owner_libxl_always_selective_reap.
It is split out into this separate patch because: a single feature
test is sensible because we do not intend anyone to release or ship
libxl versions with one of these but not the other; but, the two
features are in separate patches for clarity; and, this just makes
reading the actual code easier.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Ian Jackson [Thu, 16 Jan 2014 17:01:50 +0000 (17:01 +0000)]
libxl: fork: Provide ..._always_selective_reap
Applications exist which want to use libxl in an event-driven mode but
which do not integrate child termination into their event system, but
instead reap all their own children synchronously.
In such an application libxl must own SIGCHLD but avoid reaping any
children that don't belong to libxl.
Provide libxl_sigchld_owner_libxl_always_selective_reap which has this
behaviour.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
---
v2: Document the new mode in the big "Subprocess handling" comment.
Ian Jackson [Thu, 16 Jan 2014 16:57:27 +0000 (16:57 +0000)]
libxl: fork: Provide libxl_childproc_sigchld_occurred
Applications exist which don't keep track of all their child processes
in a manner suitable for coherent dispatch of their termination. In
such a situation, nothing in the whole process may call wait, or
waitpid(-1,,). Doing so reaps processes belonging to other parts of
the application and there is then no way to deliver the exit status to
the right place.
To facilitate this, provide a facility for such an application to ask
libxl to call waitpid on each of its children individually.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Cc: Jim Fehlig <jfehlig@suse.com> Cc: Ian Campbell <Ian.Campbell@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
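The per-child reaping described in the commit above (never wait() or waitpid(-1,,), only waitpid on each known child) can be sketched as below. The struct and function names are hypothetical; this is not the libxl implementation, only the underlying POSIX pattern.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical record for one child that this component started itself. */
struct sketch_child {
    pid_t pid;
    int reaped;    /* nonzero once waitpid has collected it */
    int status;    /* wait status, valid once reaped */
};

/*
 * Poll each of our own children individually.  Children started by other
 * parts of the application are never touched, so their exit statuses remain
 * available to whoever owns them.  Returns the number newly reaped.
 */
static int sketch_reap_own(struct sketch_child *kids, int n)
{
    int i, count = 0;

    for (i = 0; i < n; i++) {
        pid_t r;

        if (kids[i].reaped)
            continue;
        do {
            r = waitpid(kids[i].pid, &kids[i].status, WNOHANG);
        } while (r < 0 && errno == EINTR);

        if (r == kids[i].pid) {
            kids[i].reaped = 1;
            count++;
        }
        /* r == 0: still running.  r < 0: left for the caller to diagnose. */
    }
    return count;
}
```

An application would call something like this whenever it learns that SIGCHLD has occurred; the cost is one waitpid call per outstanding child rather than a single waitpid(-1,,).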