xenbits.xensource.com Git - people/liuw/libxenctrl-split/xen.git/log
11 years agobitmaps/bitops: Clarify tests for small constant size.
Tim Deegan [Thu, 28 Nov 2013 15:40:48 +0000 (15:40 +0000)]
bitmaps/bitops: Clarify tests for small constant size.

No semantic changes, just makes the control flow a bit clearer.

I was looking at this because the (-!__builtin_constant_p(x) | x__)
formula is too clever for Coverity, but in fact it always takes me a
minute or two to understand it too. :)

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
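
As a rough illustration of the formula mentioned above, here is a minimal
sketch for GCC-compatible compilers. The names size_key() and
is_small_const() are hypothetical and are not the macros from the Xen tree;
they only contrast the compact expression with a spelled-out test.

```c
#include <limits.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/*
 * Compact style: when nbits is not a compile-time constant,
 * !__builtin_constant_p() is 1, its negation is -1 (all bits set),
 * and OR-ing anything into -1 still gives -1. A constant nbits
 * passes through unchanged.
 */
#define size_key(nbits) (-!__builtin_constant_p(nbits) | (nbits))

/* Spelled-out style: state both conditions directly. */
#define is_small_const(nbits) \
    (__builtin_constant_p(nbits) && (nbits) > 0 && (nbits) <= BITS_PER_LONG)
```

Both forms separate "small compile-time constant" from everything else; the
second one simply says so in plain conditions.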
11 years agox86/mem_sharing: drop unused variable.
Tim Deegan [Thu, 28 Nov 2013 15:02:39 +0000 (15:02 +0000)]
x86/mem_sharing: drop unused variable.

Coverity CID 1087198

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
11 years agox86/shadow: Drop shadow_mode_trap_reads()
Tim Deegan [Thu, 28 Nov 2013 14:59:07 +0000 (14:59 +0000)]
x86/shadow: Drop shadow_mode_trap_reads()

This was never actually implemented, and is confusing Coverity.

Coverity CID 1090354

Signed-off-by: Tim Deegan <tim@xen.org>
11 years agocommon/vsprintf: Explicitly treat negative lengths as 'unlimited'
Tim Deegan [Thu, 28 Nov 2013 14:33:06 +0000 (14:33 +0000)]
common/vsprintf: Explicitly treat negative lengths as 'unlimited'

The old code relied on implicitly casting negative numbers to size_t
making a very large limit, which was correct but non-obvious.

Coverity CID 1128575

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
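
The idea of making the "negative means unlimited" rule explicit can be
sketched as follows. effective_limit() is a hypothetical helper for
illustration only, not a function from Xen's vsprintf code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: turn a signed length into an explicit byte limit. */
static size_t effective_limit(int len)
{
    if (len < 0)
        return SIZE_MAX;    /* negative means "no limit", stated outright */
    return (size_t)len;
}
```

An implicit (size_t) cast of -1 yields the same SIZE_MAX value, which is why
the older behaviour was correct, but the explicit branch documents the intent.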
11 years agox86: identify reset_stack_and_jump() as noreturn
Andrew Cooper [Tue, 4 Mar 2014 10:19:20 +0000 (11:19 +0100)]
x86: identify reset_stack_and_jump() as noreturn

reset_stack_and_jump() is actually a macro, but can effectively become noreturn
by giving it an unreachable() declaration.

Propagate the 'noreturn-ness' up through the direct and indirect callers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
11 years agomisc cleanup as a result of the previous patches
Andrew Cooper [Tue, 4 Mar 2014 10:18:28 +0000 (11:18 +0100)]
misc cleanup as a result of the previous patches

This includes:
 * A stale comment in sh_skip_sync()
 * A dead for ever loop in __bug()
 * A prototype for machine_power_off() which is unimplemented in any architecture
 * Replacing a for(;;); loop with unreachable()

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoidentify panic and reboot/halt functions as noreturn
Andrew Cooper [Tue, 4 Mar 2014 10:17:03 +0000 (11:17 +0100)]
identify panic and reboot/halt functions as noreturn

On an x86 build (GCC Debian 4.7.2-5), this substantially reduces the size of
.text and .init.text sections.

Experimentally, even in a non-debug build, GCC uses `call` rather than `jmp`
so there should be no impact on any stack trace generation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agocompiler: replace opencoded __attribute__((noreturn))
Andrew Cooper [Tue, 4 Mar 2014 10:15:47 +0000 (11:15 +0100)]
compiler: replace opencoded __attribute__((noreturn))

Make a formal define for noreturn in compiler.h, and fix up opencoded uses of
__attribute__((noreturn)).  This includes removing redundant uses with
function definitions which have a public declaration.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
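
A minimal sketch of wrapping the attribute in a single define, assuming a
GCC-compatible compiler. The macro is named sketch_noreturn here (a
hypothetical name) to avoid clashing with other headers, and fatal() and
checked_div() are illustrative helpers, not Xen functions.

```c
#include <stdio.h>
#include <stdlib.h>

/* One central define instead of open-coding the attribute everywhere. */
#define sketch_noreturn __attribute__((__noreturn__))

sketch_noreturn static void fatal(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

static int checked_div(int a, int b)
{
    if (b == 0)
        fatal("division by zero");  /* compiler knows control stops here */
    return a / b;
}
```

Because the compiler knows fatal() never returns, it does not need to emit
code for the path that would follow the call.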
11 years agox86/crash: fix up declaration of do_nmi_crash()
Andrew Cooper [Tue, 4 Mar 2014 10:14:53 +0000 (11:14 +0100)]
x86/crash: fix up declaration of do_nmi_crash()

... so it can correctly be annotated as noreturn.  Move the declaration of
nmi_crash() to be effectively private in crash.c

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoinclude: parallelize compat/xlat.h generation
Jan Beulich [Tue, 4 Mar 2014 10:03:13 +0000 (11:03 +0100)]
include: parallelize compat/xlat.h generation

Splitting this up into pieces significantly speeds up building on
multi-CPU systems when making use of make's -j option.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agocorrectly use gcc's -x option
Jan Beulich [Tue, 4 Mar 2014 10:01:57 +0000 (11:01 +0100)]
correctly use gcc's -x option

In Linux the improper use was found to cause problems with certain
distributed build environments. Even if not directly affecting us, be
on the safe side.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/ACPI: also print address space for PM1x fields
Jan Beulich [Tue, 4 Mar 2014 10:00:26 +0000 (11:00 +0100)]
x86/ACPI: also print address space for PM1x fields

At least one vendor is in the process of making systems available where
these live in MMIO, not in I/O port space.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/AMD: re-use function wide variables in init_amd()
Jan Beulich [Tue, 4 Mar 2014 09:59:44 +0000 (10:59 +0100)]
x86/AMD: re-use function wide variables in init_amd()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: don't propagate acpi_skip_timer_override to Dom0
Jan Beulich [Tue, 4 Mar 2014 09:58:19 +0000 (10:58 +0100)]
x86: don't propagate acpi_skip_timer_override to Dom0

It's unclear why c/s 4850:923dd9975981 added this - Dom0 isn't
controlling the timer interrupt, and hence has no need to know.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86/time: avoid redundant this_cpu()
Andrew Cooper [Tue, 4 Mar 2014 09:55:56 +0000 (10:55 +0100)]
x86/time: avoid redundant this_cpu()

this_cpu() makes use of RELOC_HIDE() to prevent unsafe optimisations, forcing
a recalculation of the per-cpu data area.  Don't use it needlessly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86/time: cleanup
Jan Beulich [Tue, 4 Mar 2014 09:54:21 +0000 (10:54 +0100)]
x86/time: cleanup

Eliminate effectively unused variables mistakenly left in place by
9539:08aede767c63 ("Rename update_dom_time() to
update_vcpu_system_time()").

Drop the pointless casts.

Use SECONDS() instead of open coding it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoIOMMU: generalize and correct softirq processing during Dom0 device setup
Jan Beulich [Tue, 4 Mar 2014 09:52:20 +0000 (10:52 +0100)]
IOMMU: generalize and correct softirq processing during Dom0 device setup

c/s 21039:95f5a4ce8f24 ("VT-d: reduce default verbosity") having put a
call to process_pending_softirqs() in VT-d's domain_context_mapping()
was wrong in two ways: For one we shouldn't be doing this when setting
up a device during DomU assignment. And then - I didn't check whether
that was the case already back then - we shouldn't call that function
with the pcidevs_lock (or in fact any spin lock) held.

Move the "preemption" into generic code, at once dealing with further
actual (too much output elsewhere - particularly on systems with very
many host bridge like devices - having been observed to still cause the
watchdog to trigger when enabled) and potential (other IOMMU code may
also end up being too verbose) issues.

Do the "preemption" once per device actually being set up when in
verbose mode, and once per bus otherwise.

Note that dropping pcidevs_lock around the process_pending_softirqs()
invocation is specifically not a problem here: We're in an __init
function and aren't racing with potential additions/removals of PCI
devices. Not acquiring the lock in setup_dom0_pci_devices() otoh is not
an option, as there are too many places that assert the lock being
held.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
11 years agomm: ensure useful progress in decrease_reservation
Wei Liu [Fri, 28 Feb 2014 16:35:15 +0000 (17:35 +0100)]
mm: ensure useful progress in decrease_reservation

During my fun time playing with the balloon driver I found that the
hypervisor's preemption check kept decrease_reservation from doing any
useful work for 32-bit guests, which made the guests hang.

As Andrew suggested, we can force the check to fail for the first
iteration to ensure progress. We did this in d3a55d7d9 "x86/mm: Ensure
useful progress in alloc_l2_table()" already.

After this change I cannot see the hang caused by continuation logic
anymore.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
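
The "skip the check on the first iteration" idea can be sketched like this.
preempt_pending() and process_extents() are hypothetical stand-ins and do not
reproduce the hypervisor's actual code; the stub always reports pending work
to model the worst case.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the preemption check; worst case: always true. */
static bool preempt_pending(void)
{
    return true;
}

/* Handle items [start, nr); returns the index at which processing stopped. */
static unsigned long process_extents(unsigned long start, unsigned long nr)
{
    unsigned long i;

    for (i = start; i < nr; i++) {
        /* Never bail out before the first item, so each call makes progress. */
        if (i != start && preempt_pending())
            return i;   /* a caller would set up a continuation from i */
        /* ... release extent i ... */
    }
    return i;
}
```

Even with a check that always fires, each call advances by one item, so a
sequence of continuations eventually completes.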
11 years agoxsm: streamline xsm_default_action()
Jan Beulich [Fri, 28 Feb 2014 16:13:47 +0000 (17:13 +0100)]
xsm: streamline xsm_default_action()

The privileges being strongly ordered is better reflected by using fall
through within the respective switch statement.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
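
A minimal sketch of how strongly ordered privilege levels map onto switch
fall-through. The enum values, struct dom and default_action() are
hypothetical and simplified; they are not the real XSM definitions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum level { LVL_HOOK, LVL_TARGET, LVL_DM_PRIV, LVL_PRIV };

struct dom {
    bool is_privileged;
    const struct dom *target;
};

static int default_action(enum level lvl, const struct dom *src,
                          const struct dom *tgt)
{
    switch (lvl) {
    case LVL_HOOK:
        return 0;
    case LVL_TARGET:
        if (src == tgt)
            return 0;
        /* fall through: a device model may also do this */
    case LVL_DM_PRIV:
        if (tgt && src->target == tgt)
            return 0;
        /* fall through: a privileged domain may do anything above */
    case LVL_PRIV:
        if (src->is_privileged)
            return 0;
        return -EPERM;
    }
    return -EPERM;
}
```

Each case adds one way to pass and then defers to the next stricter level, so
the ordering of the privileges is visible in the code layout.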
11 years agoxsm: use # printk format modifier
Jan Beulich [Fri, 28 Feb 2014 16:13:05 +0000 (17:13 +0100)]
xsm: use # printk format modifier

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agoflask: use xzalloc()
Jan Beulich [Fri, 28 Feb 2014 16:12:13 +0000 (17:12 +0100)]
flask: use xzalloc()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agoflask: add compat mode guest support
Jan Beulich [Fri, 28 Feb 2014 16:08:36 +0000 (17:08 +0100)]
flask: add compat mode guest support

... which has been missing since the introduction of the new interface
in the 4.2 development cycle.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Keir Fraser <keir@xen.org>
11 years agovsprintf: introduce %pv extended format specifier to print domain/vcpu ID pair
Jan Beulich [Fri, 28 Feb 2014 16:04:04 +0000 (17:04 +0100)]
vsprintf: introduce %pv extended format specifier to print domain/vcpu ID pair

... in a simplified and consistent way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/p2m: drop second pass looking for shared pages.
Tim Deegan [Wed, 18 Dec 2013 14:12:31 +0000 (14:12 +0000)]
x86/p2m: drop second pass looking for shared pages.

We have run relinquish_shared_pages() already by the time this
teardown happens, and page_make_sharable() exits early if the owning
domain is dying.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Andres Lagar-Cavilla <andres@lagarcavilla.org>
11 years agox86/mm: Don't allow p2m allocation after memory is allocated.
Tim Deegan [Thu, 21 Nov 2013 13:02:34 +0000 (13:02 +0000)]
x86/mm: Don't allow p2m allocation after memory is allocated.

This avoids a potentially long loop populating the p2m table from the
m2p.  Since there's no reason to turn on translate mode after the
domain is already running, this shouldn't be a problem.

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agomem_event: Return previous value of CR0/CR3/CR4 on change.
Tamas K Lengyel [Thu, 30 Jan 2014 21:34:16 +0000 (22:34 +0100)]
mem_event: Return previous value of CR0/CR3/CR4 on change.

This patch extends the information returned for CR0/CR3/CR4 register
write events with the previous value of the register. The old value
was already passed to the trap processing function, just never placed
into the returned request. By returning this value, applications
subscribing to the CR events obtain additional context about the event.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agons16550: Add support for UART present in Broadcom TruManage capable NetXtreme chips
Aravind Gopalakrishnan [Wed, 26 Feb 2014 16:25:04 +0000 (17:25 +0100)]
ns16550: Add support for UART present in Broadcom TruManage capable NetXtreme chips

Since it is an MMIO device, the code has been modified to accept MMIO based
devices as well. MMIO device settings are populated in the 'uart_config' table.
It also advertises 64 bit BAR. Therefore, code is reworked to account for 64
bit BAR and 64 bit MMIO lengths.

Two more quirks: the register offset needs to be shifted by a specific
value, and the (UART_LSR_THRE && UART_LSR_TEMT) bits need to be verified
before transmitting data.

While testing, include com1=115200,8n1,pci,0 on the xen cmdline to observe
output on console using SoL.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Thomas Lendacky <Thomas.Lendacky@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/faulting: Use formal defines instead of opencoded bits
Andrew Cooper [Wed, 26 Feb 2014 16:23:47 +0000 (17:23 +0100)]
x86/faulting: Use formal defines instead of opencoded bits

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/cpu: Store extended cpuid level in cpuinfo_x86
Andrew Cooper [Wed, 26 Feb 2014 16:22:30 +0000 (17:22 +0100)]
x86/cpu: Store extended cpuid level in cpuinfo_x86

To save finding it repeatedly with cpuid instructions.  The name
"extended_cpuid_level" is chosen to match Linux.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86/time: Remove redundant RTC REG_B read
Andrew Cooper [Wed, 26 Feb 2014 16:21:22 +0000 (17:21 +0100)]
x86/time: Remove redundant RTC REG_B read

RTC_ALWAYS_BCD is always defined by default, meaning that we will
unconditionally enter the if statement.  Reordering the condition allows
short-circuit evaluation to remove a redundant CMOS read.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
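
The effect of reordering a condition so that short-circuit evaluation skips
an expensive read can be sketched as follows. read_reg_b(), ALWAYS_BCD and
DM_BINARY are hypothetical stand-ins, not the identifiers from Xen's time
code; the counter only exists to make the skipped read observable.

```c
#include <stdbool.h>

static int cmos_reads;      /* counts simulated slow register reads */

/* Hypothetical stand-in for an expensive CMOS register read. */
static unsigned char read_reg_b(void)
{
    cmos_reads++;
    return 0x02;
}

#define ALWAYS_BCD 1        /* assumed to be defined by default */
#define DM_BINARY  0x04

/* Register read first: the read always happens. */
static bool uses_bcd_slow(void)
{
    return !(read_reg_b() & DM_BINARY) || ALWAYS_BCD;
}

/* Constant first: || short-circuits and the read is never issued. */
static bool uses_bcd_fast(void)
{
    return ALWAYS_BCD || !(read_reg_b() & DM_BINARY);
}
```

Both functions return the same result; only the second avoids touching the
simulated hardware.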
11 years agox86: MSR_IA32_BNDCFGS save/restore
Jan Beulich [Tue, 25 Feb 2014 08:41:40 +0000 (09:41 +0100)]
x86: MSR_IA32_BNDCFGS save/restore

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: generic MSRs save/restore
Jan Beulich [Tue, 25 Feb 2014 08:40:31 +0000 (09:40 +0100)]
x86: generic MSRs save/restore

This patch introduces a generic MSRs save/restore mechanism, so that
in the future new MSRs' save/restore could be added w/ smaller change
than the full blown addition of a new save/restore type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86: MPX IA32_BNDCFGS msr handle
Xudong Hao [Tue, 25 Feb 2014 08:38:21 +0000 (09:38 +0100)]
x86: MPX IA32_BNDCFGS msr handle

When MPX is supported, a new guest-state field for IA32_BNDCFGS
is added to the VMCS. In addition, two new controls are added:
 - a VM-exit control called "clear BNDCFGS"
 - a VM-entry control called "load BNDCFGS."
VM exits always save IA32_BNDCFGS into BNDCFGS field of VMCS.

Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
It is unlikely, but in case VMX support is not available, do not expose
MPX to HVM guests.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agox86/xsave: enable support for new ISA extensions
Jan Beulich [Tue, 25 Feb 2014 08:34:04 +0000 (09:34 +0100)]
x86/xsave: enable support for new ISA extensions

Intel has released a new version of Intel Architecture Instruction Set
Extensions Programming Reference, adding new features like AVX-512,
MPX, etc. Refer to
http://download-software.intel.com/sites/default/files/319433-015.pdf

This patch adds support for these new instruction set extensions
without enabling this support for guest use, yet.

It also adjusts XCR0 validation, at once fixing the definition of
XSTATE_ALL (which is not supposed to include bit 63).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Keir Fraser <keir@xen.org>
11 years agoxsm: Fix xsm_map_gfmn_foreign prototype when XSM is enabled
Julien Grall [Tue, 25 Feb 2014 08:31:29 +0000 (09:31 +0100)]
xsm: Fix xsm_map_gfmn_foreign prototype when XSM is enabled

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agox86/mce: Reduce boot-time logspam
Andrew Cooper [Tue, 25 Feb 2014 08:30:59 +0000 (09:30 +0100)]
x86/mce: Reduce boot-time logspam

When booting with "no-mce", the user does not need to be told that "MCE
support [was] disabled by bootparam" for each cpu.  Furthermore, a file:line
reference is unnecessary.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 years agox86/hvm/rtc: always deassert the IRQ line when clearing REG_C.IRQF
Tim Deegan [Tue, 25 Feb 2014 08:30:21 +0000 (09:30 +0100)]
x86/hvm/rtc: always deassert the IRQ line when clearing REG_C.IRQF

Even in no-ack mode, there's no reason to leave the line asserted
after an explicit ack of the interrupt.

Furthermore, rtc_update_irq() is an unconditional noop having just cleared
REG_C.

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agox86/hvm/rtc: inject RTC periodic interrupts from the vpt code
Tim Deegan [Tue, 25 Feb 2014 08:29:26 +0000 (09:29 +0100)]
x86/hvm/rtc: inject RTC periodic interrupts from the vpt code

Let the vpt code drive the RTC's timer interrupts directly, as it does
for other periodic time sources, and fix up the register state in a
vpt callback when the interrupt is injected.

This fixes a hang seen on Windows 2003 in no-missed-ticks mode, where
when a tick was pending, the early callback from the VPT code would
always set REG_C.PF on every VMENTER; meanwhile the guest was in its
interrupt handler reading REG_C in a loop and waiting to see it clear.

One drawback is that a guest that attempts to suppress RTC periodic
interrupts by failing to read REG_C will receive up to 10 spurious
interrupts, even in 'strict' mode.  However:
 - since all previous RTC models have had this property (including
   the current one, since 'no-ack' mode is hard-coded on) we're
   pretty sure that all guests can handle this; and
 - we're already playing some other interesting games with this
   interrupt in the vpt code.

One other corner case: a guest that enables the PF timer interrupt,
masks the interrupt in the APIC and then polls REG_C looking for PF
will not see PF getting set.  The more likely case of enabling the
timers and masking the interrupt with REG_B.PIE is already handled
correctly.

Signed-off-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agox86/hvm/rtc: don't run the vpt timer when !REG_B.PIE
Tim Deegan [Tue, 25 Feb 2014 08:26:45 +0000 (09:26 +0100)]
x86/hvm/rtc: don't run the vpt timer when !REG_B.PIE

If the guest has not asked for interrupts, don't run the vpt timer
to generate them.  This is a prerequisite for a patch to simplify how
the vpt interacts with the RTC, and also gets rid of a timer series in
Xen in a case where it's unlikely to be needed.

Instead, calculate the correct value for REG_C.PF whenever REG_C is
read or PIE is enabled.  This allows a guest to poll for the PF bit
while not asking for actual timer interrupts.  Such a guest would no
longer get the benefit of the vpt's timer modes.

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agolibxl: Fix libxl_postfork_child_noexec deadlock etc.
Ian Jackson [Mon, 24 Feb 2014 12:57:53 +0000 (12:57 +0000)]
libxl: Fix libxl_postfork_child_noexec deadlock etc.

libxl_postfork_child_noexec would nestedly reacquire the non-recursive
"no_forking" mutex: atfork_lock uses it, as does sigchld_user_remove.
The result on Linux is that the process always deadlocks before
returning from this function.

This is used by xl's console child.  So, the ultimate effect is that
xl with pygrub does not manage to connect to the pygrub console.
This behaviour was reported by Michael Young in Xen 4.4.0 RC5.

Also, the use of sigchld_user_remove in libxl_postfork_child_noexec is
not correct with SIGCHLD sharing.  libxl_postfork_child_noexec is
documented to suffice if called only on one ctx.  So deregistering the
ctx it's called on is not sufficient.  Instead, we need a new approach
which discards the whole sigchld_user list and unconditionally removes
our SIGCHLD handler if we had one.

Prompted by this, clarify the semantics of
libxl_postfork_child_noexec.  Specifically, expand on the meaning of
"quickly" by explaining what operations are not permitted; and
document the fact that the function doesn't reclaim the resources in
the ctxs.

And add a comment in libxl_postfork_child_noexec explaining the
internal concurrency situation.

This is an important bugfix.  IMO the bug is a blocker for Xen 4.4.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Reported-by: M A Young <m.a.young@durham.ac.uk>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
(cherry picked from commit 5be1e95318147855713709094e6847e3104ae910)

11 years agoiommu: don't need to map dom0 page when the PT is shared
Julien Grall [Mon, 24 Feb 2014 11:33:00 +0000 (12:33 +0100)]
iommu: don't need to map dom0 page when the PT is shared

Currently iommu_init_dom0 walks the page list and calls the map_page callback
on each page.

In both the AMD and VT-d drivers, the function returns immediately if the page
table is shared with the processor. So Xen can safely avoid walking
the page list.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 years agovtd: don't export iommu_set_pgd
Julien Grall [Mon, 24 Feb 2014 11:32:00 +0000 (12:32 +0100)]
vtd: don't export iommu_set_pgd

iommu_set_pgd is only used internally in
xen/drivers/passthrough/vtd/iommu.c

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoMerge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging
Jan Beulich [Mon, 24 Feb 2014 11:31:28 +0000 (12:31 +0100)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

11 years agovtd: don't export iommu_domain_teardown
Julien Grall [Mon, 24 Feb 2014 11:21:54 +0000 (12:21 +0100)]
vtd: don't export iommu_domain_teardown

iommu_domain_teardown is only used internally in
xen/drivers/passthrough/vtd/iommu.c

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 years agolibxl: comments cleanup on libxl_dm.c
Fabio Fantoni [Sat, 22 Feb 2014 10:35:54 +0000 (11:35 +0100)]
libxl: comments cleanup on libxl_dm.c

Removed some useless comment lines.

Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agox86: expose RDSEED, ADX, and PREFETCHW to dom0
Xudong Hao [Mon, 24 Feb 2014 11:11:53 +0000 (12:11 +0100)]
x86: expose RDSEED, ADX, and PREFETCHW to dom0

This patch explicitly exposes new Intel features to dom0, including
RDSEED and ADX. As for PREFETCHW, it doesn't need explicit exposing.

Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agox86/MSI: don't risk division by zero
Jan Beulich [Mon, 24 Feb 2014 11:11:01 +0000 (12:11 +0100)]
x86/MSI: don't risk division by zero

The check in question is redundant with the one in the immediately
following if(), where dividing by zero gets carefully avoided.

Spotted-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
11 years agoNested VMX: update nested paging mode on vmexit
Yang Zhang [Mon, 24 Feb 2014 11:09:52 +0000 (12:09 +0100)]
Nested VMX: update nested paging mode on vmexit

Since SVM and VMX use different mechanisms to emulate the virtual-vmentry
and virtual-vmexit, it's hard to update the nested paging mode correctly in
common code. So we need to update the nested paging mode in their respective
code paths.
SVM already updates the nested paging mode on vmexit. This patch adds the same
logic on the VMX side.

Previous discussion is here:
http://lists.xen.org/archives/html/xen-devel/2013-12/msg01759.html

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Reviewed-by: Christoph Egger <chegger@amazon.de>
11 years agovmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs
Aravind Gopalakrishnan [Mon, 24 Feb 2014 11:09:14 +0000 (12:09 +0100)]
vmce: Allow vmce_amd_* functions to handle AMD thresholding MSRs

vmce_amd_[rd|wr]msr functions can handle accesses to AMD thresholding
registers. But due to this statement here:
switch ( msr & (MSR_IA32_MC0_CTL | 3) )
we are wrongly masking off top two bits which meant the register
accesses never made it to vmce_amd_* functions.

Corrected this problem by modifying the mask in this patch to allow
AMD thresholding registers to fall to 'default' case which in turn
allows vmce_amd_* functions to handle access to the registers.

While at it, remove some clutter in the vmce_amd* functions. Retained
current policy of returning zero for reads and ignoring writes.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
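
The masking problem can be checked with plain arithmetic. In this sketch the
value 0xC0000408 is an assumed example of an AMD thresholding MSR number, and
wide_mask() is only a guess at what a corrected mask could look like; neither
is taken from the patch itself.

```c
#include <stdint.h>

#define MC0_CTL    0x00000400u  /* value of MSR_IA32_MC0_CTL */
#define AMD_THRESH 0xC0000408u  /* assumed example AMD thresholding MSR */

/* Mask from the quoted statement: keeps only bit 10 and the low two bits. */
static uint32_t narrow_mask(uint32_t msr)
{
    return msr & (MC0_CTL | 3);
}

/* Assumed wider mask: also keeps every bit above bit 10. */
static uint32_t wide_mask(uint32_t msr)
{
    return msr & (-MC0_CTL | 3);
}
```

With the narrow mask the AMD register collapses onto the MC0_CTL case; with
the wider one it keeps its high bits, matches no bank case, and would reach
the default branch.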
11 years agox86/MCE: Fix race condition in mctelem_reserve
Frediano Ziglio [Mon, 24 Feb 2014 11:07:41 +0000 (12:07 +0100)]
x86/MCE: Fix race condition in mctelem_reserve

These lines (in mctelem_reserve)

        newhead = oldhead->mcte_next;
        if (cmpxchgptr(freelp, oldhead, newhead) == oldhead) {

are racy. After the newhead pointer has been read, another flow (a thread
or a recursive invocation) can change the whole list yet leave the head
with the same value. oldhead then still equals *freelp, but the new head
being set could point to any element (even one already in use).

This patch instead uses a bit array and atomic bit operations.

Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
Reviewed-by: Liu Jinsong <jinsong.liu@intel.com>
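
A minimal sketch of reserving slots through a bitmap with atomic bit
operations, written with C11 atomics. slot_reserve(), slot_release() and
NSLOTS are hypothetical names; this does not reproduce the mctelem code,
which uses Xen's own bitops.

```c
#include <stdatomic.h>

#define NSLOTS 8

/* One bit per slot; a set bit means the slot is free. */
static atomic_uint free_map = (1u << NSLOTS) - 1;

/* Claim any free slot; returns its index, or -1 when none is left. */
static int slot_reserve(void)
{
    for (int i = 0; i < NSLOTS; i++) {
        unsigned int bit = 1u << i;

        /* Atomic test-and-clear: only one claimant can observe the bit set. */
        if (atomic_fetch_and(&free_map, ~bit) & bit)
            return i;
    }
    return -1;
}

static void slot_release(int i)
{
    atomic_fetch_or(&free_map, 1u << i);
}
```

Since ownership is decided per bit rather than by comparing a list head,
there is no stale "next" pointer that could be installed after the list has
changed underneath the reader.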
11 years agoQEMU_TAG, QEMU_UPSTREAM_REVISION: Branching
Ian Jackson [Fri, 21 Feb 2014 16:59:45 +0000 (16:59 +0000)]
QEMU_TAG, QEMU_UPSTREAM_REVISION: Branching

QEMU_UPSTREAM_REVISION set back to master, to track the tip.

QEMU_TAG set to the specific changeset as is customary.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
11 years agoREADME, xen/Makefile: Branching for 4.5
Ian Jackson [Fri, 21 Feb 2014 16:59:14 +0000 (16:59 +0000)]
README, xen/Makefile: Branching for 4.5

Change version numbers to 4.5-unstable.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
11 years agolibxl: Properly declare libxlu_disk_l.h in AUTOINCS
Ian Jackson [Tue, 18 Feb 2014 16:43:42 +0000 (16:43 +0000)]
libxl: Properly declare libxlu_disk_l.h in AUTOINCS

This is necessary so that make doesn't do things which depend on this
file until flex has finished producing it.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
CC: Ian Campbell <Ian.Campbell@citrix.com>
CC: Olaf Hering <olaf@aepfle.de>
Tested-by: Olaf Hering <olaf@aepfle.de>
CC: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
11 years agoxen/arm: Save/restore GICH_VMCR on domain context switch
Julien Grall [Tue, 18 Feb 2014 13:58:21 +0000 (13:58 +0000)]
xen/arm: Save/restore GICH_VMCR on domain context switch

The GICH_VMCR register contains aliases of important bits of the GICV interface, such as:
    - priority mask of the CPU
    - EOImode
    - ...

We were safe because Linux guests always use the same value for these bits.
When new guests handle priority or change the EOI mode, VCPU interrupt
management will be in a wrong state.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
11 years agoxen/arm: Correctly handle non-page aligned pointer in raw_copy_from_guest
Julien Grall [Tue, 18 Feb 2014 16:56:17 +0000 (16:56 +0000)]
xen/arm: Correctly handle non-page aligned pointer in raw_copy_from_guest

The current implementation of the raw_copy_guest helper may lead to data
corruption and sometimes a Xen crash when the guest virtual address is not
aligned to PAGE_SIZE.

When the total length is larger than a page, the length to read is wrongly
computed with
    min(len, (unsigned)(PAGE_SIZE - offset))

As the offset is only computed once per function, if the start address was
not aligned to PAGE_SIZE, an iteration can end up:
    - reading across a page boundary => Xen crash
    - reading the previous page => data corruption

This issue can be resolved by setting offset to 0 at the end of the first
iteration. Indeed, after it, the virtual guest address is always aligned
to PAGE_SIZE.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
[ ijc -- duplicated the comment in the other two functions with this behaviour ]
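
The fix described above (reset the offset after the first iteration) can be
sketched by computing only the per-iteration chunk sizes. chunk_sizes() is a
hypothetical helper for illustration; it does no guest memory access and is
not the Xen function.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u

/*
 * Record how many bytes each iteration would copy for a buffer that
 * starts at addr and is len bytes long. Returns the number of chunks.
 */
static unsigned int chunk_sizes(size_t addr, size_t len,
                                size_t *out, unsigned int max)
{
    size_t offset = addr & (PAGE_SIZE - 1);
    unsigned int n = 0;

    while (len && n < max) {
        size_t size = PAGE_SIZE - offset;

        if (size > len)
            size = len;
        out[n++] = size;
        len -= size;
        offset = 0;     /* every later chunk starts on a page boundary */
    }
    return n;
}
```

For a start address 4000 bytes into a page and a length of 5000, this yields
chunks of 96, 4096 and 808 bytes, none of which crosses a page boundary.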

11 years agoUpdate QEMU_UPSTREAM_REVISION for 4.4.0-rc4
Ian Jackson [Mon, 17 Feb 2014 16:33:48 +0000 (16:33 +0000)]
Update QEMU_UPSTREAM_REVISION for 4.4.0-rc4

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
11 years agopvh: Fix regression due to assumption that HVM paths MUST use io-backend device
Mukesh Rathor [Thu, 13 Feb 2014 16:56:39 +0000 (17:56 +0100)]
pvh: Fix regression due to assumption that HVM paths MUST use io-backend device

The commit 09bb434748af9bfe3f7fca4b6eef721a7d5042a4
"Nested VMX: prohibit virtual vmentry/vmexit during IO emulation"
assumes that the HVM paths are only taken by HVM guests. With the PVH
enabled that is no longer the case - which means that we do not have
to have the IO-backend device (QEMU) enabled.

As such, that patch can crash the hypervisor:

Xen call trace:
    [<ffff82d0801ddd9a>] nvmx_switch_guest+0x4d/0x903
    [<ffff82d0801de95b>] vmx_asm_vmexit_handler+0x4b/0xc0

Pagetable walk from 000000000000001e:
  L4[0x000] = 0000000000000000 ffffffffffffffff

****************************************
Panic on CPU 7:
FATAL PAGE FAULT
[error_code=0000]
Faulting linear address: 000000000000001e
****************************************

as we do not have an io based backend. In the case that the
PVH guest does run an HVM guest inside it - we need to do
further work to support this - and for now the check will
bail us out.

We also fix spelling mistakes and the sentence structure.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Release-acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: "Zhang, Yang Z" <yang.z.zhang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agoWhen enabling log dirty mode, it sets all guest's memory to readonly.
Yang Zhang [Thu, 13 Feb 2014 15:50:22 +0000 (15:50 +0000)]
When enabling log dirty mode, it sets all guest's memory to readonly.
And in a HAP enabled domain, it modifies all EPT entries to clear the write bit
to make sure it is readonly. This will cause a problem if VT-d shares the page
table with EPT: the device may issue a DMA write request, then the VT-d engine
tells it the target memory is readonly, resulting in a VT-d fault.

Currently, there are two places that enable log dirty mode: migration and vram
tracking. Migration with an assigned device is not allowed, so that case is fine.
For vram, there is no need to set all memory to readonly; tracking only the vram
range is enough.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
Acked-by: Tim Deegan <tim@xen.org>
11 years agoxen: Don't use __builtin_stdarg_start().
Tim Deegan [Thu, 13 Feb 2014 15:13:07 +0000 (15:13 +0000)]
xen: Don't use __builtin_stdarg_start().

Cset fca49a00 ("netbsd: build fix with gcc 4.5") changed the
definition of va_start() to use __builtin_va_start() rather than
__builtin_stdarg_start() for GCCs >= 4.5, but in fact GCC dropped
__builtin_stdarg_start() before v3.3.
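
For illustration, a minimal sketch of stdarg definitions built only on
__builtin_va_start() and friends. The sketch_ names and sum_ints() are
hypothetical, and the builtins assume a GCC-compatible compiler; this
is not the header from the Xen tree.

```c
#include <assert.h>

/* Hypothetical local stdarg definitions on top of compiler builtins. */
typedef __builtin_va_list sketch_va_list;
#define sketch_va_start(ap, last) __builtin_va_start(ap, last)
#define sketch_va_arg(ap, type)   __builtin_va_arg(ap, type)
#define sketch_va_end(ap)         __builtin_va_end(ap)

/* Sum 'count' int arguments using only the definitions above. */
static int sum_ints(int count, ...)
{
    sketch_va_list ap;
    int total = 0;
    int i;

    sketch_va_start(ap, count);
    for (i = 0; i < count; i++)
        total += sketch_va_arg(ap, int);
    sketch_va_end(ap);

    return total;
}
```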

Signed-off-by: Tim Deegan <tim@xen.org>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
11 years agodocs: mention whitespace handling diskspec target= parsing
Olaf Hering [Thu, 13 Feb 2014 14:43:24 +0000 (15:43 +0100)]
docs: mention whitespace handling diskspec target= parsing

disk=[ ' target=/dev/loop0 ' ] will fail to parse because
'/dev/loop0 ' does not exist.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen: stop trying to use the system <stdarg.h> and <stdbool.h>
Tim Deegan [Thu, 13 Feb 2014 12:13:58 +0000 (12:13 +0000)]
xen: stop trying to use the system <stdarg.h> and <stdbool.h>

We already have our own versions of the stdarg/stdbool definitions, for
systems where those headers are installed in /usr/include.

On Linux, they're typically installed in compiler-specific paths, but
finding them has proved unreliable.  Drop that and use our own versions
everywhere.

Signed-off-by: Tim Deegan <tim@xen.org>
Tested-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Keir Fraser <keir@xen.org>
11 years agotools/configure: correct --enable-blktap1 help text
Jan Beulich [Thu, 13 Feb 2014 12:57:43 +0000 (12:57 +0000)]
tools/configure: correct --enable-blktap1 help text

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agodocs/vtpm: fix auto-shutdown reference
Daniel De Graaf [Tue, 11 Feb 2014 15:25:17 +0000 (10:25 -0500)]
docs/vtpm: fix auto-shutdown reference

The automatic shutdown feature of the vTPM was removed because it
interfered with pv-grub measurement support and was also not triggered
if the guest did not use the vTPM. Virtual TPM domains will need to be
shut down or destroyed on guest shutdown via a script or other user
action.

This also fixes an incorrect reference to the vTPM being PV-only.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agox86/pci: Store VF's memory space displacement in a 64-bit value
Boris Ostrovsky [Thu, 13 Feb 2014 09:49:55 +0000 (10:49 +0100)]
x86/pci: Store VF's memory space displacement in a 64-bit value

VF's memory space offset can be greater than 4GB and therefore needs
to be stored in a 64-bit variable.
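
A small sketch of why the width matters. The function name and the
numbers are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement of VF number vf_index from the start of the VF BAR area. */
static uint64_t vf_displacement_sketch(uint64_t vf_bar_size,
                                       unsigned int vf_index)
{
    return vf_bar_size * vf_index;
}
```

With a 1GB per-VF BAR, VF 5 sits 5GB into the area (0x140000000).
Storing that in a 32-bit variable would keep only 0x40000000.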

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
11 years agoxl: suppress suspend/resume functions on platforms which do not support it.
Ian Campbell [Wed, 12 Feb 2014 14:27:37 +0000 (14:27 +0000)]
xl: suppress suspend/resume functions on platforms which do not support it.

ARM does not (currently) support migration, so stop offering tasty looking
treats like "xl migrate".

Apart from the UI improvement my intention is to use this in osstest to detect
whether to attempt the save/restore/migrate tests.

Other than the additions of the #define/#ifdef there is a tiny bit of code
motion ("dump-core" in the command list and core_dump_domain in the
implementations) which serves to put ifdeffable bits next to each other.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agolibxc: Fix out-of-memory error handling in xc_cpupool_getinfo()
Andrew Cooper [Wed, 22 Jan 2014 17:47:21 +0000 (17:47 +0000)]
libxc: Fix out-of-memory error handling in xc_cpupool_getinfo()

Avoid freeing info then returning it to the caller.

This is XSA-88.
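
The class of bug can be sketched like this. It is a hypothetical
reduction, not the libxc code: struct info_sketch, get_info_sketch()
and the simulate_oom switch exist only for the example.

```c
#include <assert.h>
#include <stdlib.h>

struct info_sketch {
    unsigned int id;
    unsigned char *cpumap;
};

/* Returns a fully built object, or NULL on any allocation failure. */
static struct info_sketch *get_info_sketch(unsigned int id, int simulate_oom)
{
    struct info_sketch *info = calloc(1, sizeof(*info));

    if (!info)
        return NULL;

    /* simulate_oom stands in for the second allocation failing. */
    info->cpumap = simulate_oom ? NULL : calloc(1, 8);
    if (!info->cpumap) {
        free(info);
        info = NULL;    /* without this the caller gets a freed pointer */
        goto out;
    }

    info->id = id;

out:
    return info;
}
```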

Coverity-ID: 1056192
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 years agoxen: Drop N from rcN in XEN_EXTRAVERSION
Ian Jackson [Wed, 12 Feb 2014 16:52:26 +0000 (16:52 +0000)]
xen: Drop N from rcN in XEN_EXTRAVERSION

Having this here means we have to wait for a push gate pass, or fart
about with explicit pushes to master, to make an RC.  The boot
messages for git builds already contain the git revision (as a
shorthash).

I will change the tarball creation checklist to seddery the -rc back
to -rcN, along with the other release-management-related changes (like
using an embedded copy of qemu).

If this patch meets with approval it should be thrown into the push
gate today, along with the patch for XSA-88, and then hopefully
nothing much else, so that we can get something suitable for making an
RC from by Friday.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Release-Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
11 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Wed, 12 Feb 2014 12:59:14 +0000 (12:59 +0000)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

11 years agoblkif: drop struct blkif_request_segment_aligned
Jan Beulich [Wed, 12 Feb 2014 12:49:11 +0000 (13:49 +0100)]
blkif: drop struct blkif_request_segment_aligned

Commit 5148b7b5 ("blkif: add indirect descriptors interface to public
headers") added this without really explaining why it is needed: The
structure is identical to struct blkif_request_segment apart from the
padding field not being given a name in the pre-existing type. Their
size and alignment - which are what is relevant - are identical as long
as __alignof__(uint32_t) == 4 (which I think we rely upon in various
other places, so we can take as given).

Also correct a few minor glitches in the description, including for it
to no longer assume PAGE_SIZE == 4096.
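
The size and alignment argument can be checked mechanically. The two
structs below are hypothetical stand-ins with the layout described
above (one with implicit tail padding, one with a named padding
field); the field names are assumptions, not copied from the public
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the pre-existing type: tail padding is implicit. */
struct seg_plain_sketch {
    uint32_t gref;
    uint8_t  first_sect, last_sect;
};

/* Stand-in for the dropped type: the same fields plus named padding. */
struct seg_aligned_sketch {
    uint32_t gref;
    uint8_t  first_sect, last_sect;
    uint16_t _pad;
};

/* Both hold as long as __alignof__(uint32_t) == 4. */
_Static_assert(sizeof(struct seg_plain_sketch) ==
               sizeof(struct seg_aligned_sketch), "sizes differ");
_Static_assert(_Alignof(struct seg_plain_sketch) ==
               _Alignof(struct seg_aligned_sketch), "alignments differ");
```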

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
11 years agoxen: arm: correct terminology for cache flush macros
Ian Campbell [Tue, 11 Feb 2014 14:11:04 +0000 (14:11 +0000)]
xen: arm: correct terminology for cache flush macros

The term "flush" is slightly ambiguous. The correct ARM term for this
operation is clean, as opposed to clean+invalidate for which we also now have a
function.

This is a pure rename, no functional change.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoRevert "xen: arm: force guest memory accesses to cacheable when MMU is disabled"
Ian Campbell [Tue, 11 Feb 2014 14:11:03 +0000 (14:11 +0000)]
Revert "xen: arm: force guest memory accesses to cacheable when MMU is disabled"

This reverts commit 89eb02c2204a0b42a0aa169f107bc346a3fef802.

This approach has a shortcoming in that it breaks when a guest enables its
MMU (SCTLR.M, disabling HCR.DC) without enabling caches (SCTLR.C) first/at the
same time. It turns out that FreeBSD does this.

This has now been fixed (yet) another way (third time is the charm!) so remove
this support. The original commit contained some fixes which are still
relevant even with the revert of the bulk of the patch:
 - Correction to HSR_SYSREG_CRN_MASK
 - Rename of HSR_SYSCTL macros to avoid naming clash
 - Definition of some additional cp reg specifications

Since these are still useful they are not reverted.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen/arm: clean and invalidate all guest caches by VMID after domain build.
Ian Campbell [Tue, 11 Feb 2014 14:11:02 +0000 (14:11 +0000)]
xen/arm: clean and invalidate all guest caches by VMID after domain build.

Guests are initially started with caches disabled and so we need to make sure
they see consistent data in RAM (requiring a cache clean) but also that they
do not have old stale data suddenly appear in the caches when they enable
their caches (requiring the invalidate).

This can be split into two halves. First we must flush each page as it is
allocated to the guest. It is not sufficient to do the flush at scrub time
since this will miss pages which are ballooned out by the guest (where the
guest must scrub if it cares about not leaking the page content). We need to
clean as well as invalidate to make sure that any scrubbing which has occurred
gets committed to real RAM. To achieve this add a new cacheflush_page function,
which is a stub on x86.

Secondly we need to flush anything which the domain builder touches, which we
do via a new domctl.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Cc: keir@xen.org
11 years agoxen: arm: rename p2m next_gfn_to_relinquish to lowest_mapped_gfn
Ian Campbell [Tue, 11 Feb 2014 14:11:01 +0000 (14:11 +0000)]
xen: arm: rename p2m next_gfn_to_relinquish to lowest_mapped_gfn

This has uses other than during relinquish, so rename it for clarity.

This is a pure rename.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen: arm: rename create_p2m_entries to apply_p2m_changes
Ian Campbell [Tue, 11 Feb 2014 14:11:00 +0000 (14:11 +0000)]
xen: arm: rename create_p2m_entries to apply_p2m_changes

This function hasn't been only about creating for quite a while.

This is purely a rename.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
11 years agoxen/arm: Correctly boot with an initrd and no linux command line
Julien Grall [Mon, 10 Feb 2014 17:34:46 +0000 (17:34 +0000)]
xen/arm: Correctly boot with an initrd and no linux command line

When the DOM0 device tree is being built, the properties for initrd will
only be added if there is a linux command line. This will result in a panic
later:

(XEN) *** LOADING DOMAIN 0 ***
(XEN) Populate P2M 0x20000000->0x40000000 (1:1 mapping for dom0)
(XEN) Loading kernel from boot module 2
(XEN) Loading zImage from 0000000001000000 to 0000000027c00000-0000000027eafb48
(XEN) Loading dom0 initrd from 0000000002000000 to 0x0000000028200000-0x0000000028c00000
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Cannot fix up "linux,initrd-start" property
(XEN) ****************************************
(XEN)

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxenlight_stubs.c: Allow it to build with ocaml 3.09.3
Don Slutz [Fri, 7 Feb 2014 21:51:51 +0000 (16:51 -0500)]
xenlight_stubs.c: Allow it to build with ocaml 3.09.3

This code was copied from:

http://docs.camlcity.org/docs/godisrc/oasis-ocaml-fd-1.1.1.tar.gz/ocaml-fd-1.1.1/lib/fd_stubs.c

Signed-off-by: Don Slutz <dslutz@verizon.com>
Acked-by: David Scott <dave.scott@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoxen: arm: arm64: Fix memory clobbering issues during VFP save restore.
Pranavkumar Sawargaonkar [Fri, 7 Feb 2014 12:57:16 +0000 (18:27 +0530)]
xen: arm: arm64: Fix memory clobbering issues during VFP save restore.

This patch addresses the memory clobbering issue mentioned by Julien Grall
with my earlier patch -
Commit Id: 712eb2e04da2cbcd9908f74ebd47c6df60d6d12f

Discussion related to this fix -
http://www.gossamer-threads.com/lists/xen/devel/316247

Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
Signed-off-by: Anup Patel <anup.patel@linaro.org>
Acked-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoflask: check permissions first thing in flask_security_set_bool()
Jan Beulich [Tue, 11 Feb 2014 10:14:10 +0000 (11:14 +0100)]
flask: check permissions first thing in flask_security_set_bool()

Nothing else should be done if the caller isn't permitted to set
boolean values.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agoflask: fix error propagation from flask_security_set_bool()
Jan Beulich [Tue, 11 Feb 2014 10:13:22 +0000 (11:13 +0100)]
flask: fix error propagation from flask_security_set_bool()

The function should return an error when flask_security_make_bools()
fails as well as when the input ID is out of range.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agoflask: fix memory leaks
Jan Beulich [Tue, 11 Feb 2014 10:11:48 +0000 (11:11 +0100)]
flask: fix memory leaks

Plus, in the case of security_preserve_bools(), prevent double freeing
in the case of security_get_bools() failing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agoAMD IOMMU: fail if there is no southbridge IO-APIC
Jan Beulich [Mon, 10 Feb 2014 09:05:24 +0000 (10:05 +0100)]
AMD IOMMU: fail if there is no southbridge IO-APIC

... but interrupt remapping is requested (with per-device remapping
tables). Without it, the timer interrupt is usually not working.

Inspired by Linux's "iommu/amd: Work around wrong IOAPIC device-id in
IVRS table" (commit c2ff5cf5294bcbd7fa50f7d860e90a66db7e5059) by Joerg
Roedel <joerg.roedel@amd.com>.

Reported-by: Eric Houby <ehouby@yahoo.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Eric Houby <ehouby@yahoo.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
11 years agox86/AMD: apply workaround for AMD F16h erratum 792
Aravind Gopalakrishnan [Fri, 7 Feb 2014 10:12:22 +0000 (11:12 +0100)]
x86/AMD: apply workaround for AMD F16h erratum 792

The workaround for the erratum will only be in BIOSes spun from Jan 2014
onwards, but initial production parts already shipped in 2013. Since
there is a coverage hole, we should carry this fix in software in case
the BIOS does not do the right thing or someone is using an old BIOS.

Description:
 Processor does not ensure DRAM scrub read/write sequence is atomic wrt
 accesses to CC6 save state area. Therefore if a concurrent scrub
 read/write access is to same address the entry may appear as if it is
 not written. This quirk applies to Fam16h models 00h-0Fh

See "Revision Guide" for AMD F16h models 00h-0fh, document 51810 rev.
3.04, Nov 2013.

Equivalent Linux patch link:
 http://marc.info/?l=linux-kernel&m=139066012217149&w=2

Tested the patch on Fam16h server platform and it works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Corrected checking for boot CPU. Made warning message conditional.
Compacted warning message text. Moved comment to commit message.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
11 years agolibxl: test programs: Fix make race re libxenlight.so
Ian Jackson [Thu, 6 Feb 2014 19:17:26 +0000 (19:17 +0000)]
libxl: test programs: Fix make race re libxenlight.so

The test programs were getting the proper libxenlight.so on their link
line.  Filter it out.  Also change the soname of the test library to
match the real one, so that libxlutil is satisfied with it.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibxl: test programs: Fix Makefile race re headers
Ian Jackson [Thu, 6 Feb 2014 18:41:24 +0000 (18:41 +0000)]
libxl: test programs: Fix Makefile race re headers

We need to include the new TEST_PROG_OBJS and LIBXL_TEST_OBJS in the
appropriate dependencies.  Otherwise we risk trying to build the test
program before gentypes is run.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Olaf Hering <olaf@aepfle.de>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibvchan: Fix handling of invalid ring buffer indices
Marek Marczykowski-Górecki [Thu, 6 Feb 2014 15:44:41 +0000 (16:44 +0100)]
libvchan: Fix handling of invalid ring buffer indices

The remote (hostile) process can set ring buffer indices to any value
at any time. If that happens, it is possible to get "buffer space"
(either for writing data, or ready for reading) negative or greater
than buffer size.  This will end up with buffer overflow in the second
memcpy inside of do_send/do_recv.

Fix this by introducing new available bytes accessor functions
raw_get_data_ready and raw_get_buffer_space which are robust against
mad ring states, and only return sanitised values.

Proof sketch of correctness:

Now {rd,wr}_{cons,prod} are only ever used in the raw available bytes
functions, and in do_send and do_recv.

The raw available bytes functions do unsigned arithmetic on the
returned values.  If the result is "negative" or too big it will be
>ring_size (since we used unsigned arithmetic).  Otherwise the result
is a positive in-range value representing a reasonable ring state, in
which case we can safely convert it to int (as the rest of the code
expects).

do_send and do_recv immediately mask the ring index value with the
ring size.  The result is always going to be plausible.  If the ring
state has become mad, the worst case is that our behaviour is
inconsistent with the peer's ring pointer.  I.e. we read or write to
arguably-incorrect parts of the ring - but always parts of the ring.
And of course if a peer misoperates the ring they can achieve this
effect anyway.

So the security problem is fixed.
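
The argument above can be sketched in a few lines. The function names
carry a _sketch suffix because they are illustrations rather than the
libvchan source; in particular, reporting zero available bytes for a
mad ring and a power-of-two ring size are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bytes ready for reading. prod and cons come from shared memory and
 * may hold any value at any time.
 */
static int data_ready_sketch(uint32_t prod, uint32_t cons,
                             uint32_t ring_size)
{
    /* Unsigned subtraction: a "negative" result wraps to a huge value. */
    uint32_t ready = prod - cons;

    if (ready > ring_size)
        return 0;           /* mad ring state: claim nothing is there */

    return (int)ready;      /* in range, so the conversion is safe */
}

/* Mask an index so that it always lands inside the ring. */
static uint32_t ring_offset_sketch(uint32_t idx, uint32_t ring_size)
{
    return idx & (ring_size - 1);   /* ring_size must be a power of two */
}
```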

This is XSA-86.

(The patch is essentially Ian Jackson's work, although parts of the
commit message are by Marek.)

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
11 years agoxsm/flask: correct off-by-one in flask_security_avc_cachestats cpu id check
Matthew Daley [Thu, 6 Feb 2014 15:42:36 +0000 (16:42 +0100)]
xsm/flask: correct off-by-one in flask_security_avc_cachestats cpu id check

This is XSA-85.
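
A sketch of the usual shape of such a check, assuming the bug was a
"greater than" comparison where "greater than or equal" was needed.
The array, the constant and the function name are hypothetical.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 4

static const unsigned int cache_stats[NR_CPUS_SKETCH] = { 10, 20, 30, 40 };

/* Returns 0 and fills *out for a valid cpu id, -1 otherwise. */
static int get_cachestat_sketch(unsigned int cpu, unsigned int *out)
{
    /*
     * Valid ids are 0 .. NR_CPUS_SKETCH - 1. Writing the test as
     * "cpu > NR_CPUS_SKETCH" would let cpu == NR_CPUS_SKETCH through
     * and read one element past the end of the array.
     */
    if (cpu >= NR_CPUS_SKETCH)
        return -1;

    *out = cache_stats[cpu];
    return 0;
}
```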

Signed-off-by: Matthew Daley <mattd@bugfuzz.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
11 years agoflask: fix reading strings from guest memory
Jan Beulich [Thu, 6 Feb 2014 15:33:50 +0000 (16:33 +0100)]
flask: fix reading strings from guest memory

Since the string size is being specified by the guest, we must range
check it properly before doing allocations based on it. While for the
two cases that are exposed only to trusted guests (via policy
restriction) this just uses an arbitrary upper limit (PAGE_SIZE), for
the FLASK_[GS]ETBOOL case (which any guest can use) the upper limit
gets enforced based on the longest name across all boolean settings.
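
As an illustration of the range check, a minimal sketch follows. The
function name and parameters are hypothetical, and a plain memcpy()
stands in for the real copy from guest memory.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy a guest-supplied string of guest_size bytes into a fresh,
 * NUL-terminated buffer. Returns NULL if the size is out of range or
 * the allocation fails. max_size must be smaller than SIZE_MAX.
 */
static char *read_guest_string_sketch(const char *guest_buf,
                                      size_t guest_size, size_t max_size)
{
    char *s;

    /* Check before allocating: the guest controls guest_size. */
    if (guest_size > max_size)
        return NULL;

    s = malloc(guest_size + 1);     /* cannot overflow after the check */
    if (!s)
        return NULL;

    memcpy(s, guest_buf, guest_size);
    s[guest_size] = '\0';

    return s;
}
```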

This is XSA-84.

Reported-by: Matthew Daley <mattd@bugfuzz.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
11 years agolibxl: timeouts: Record deregistration when one occurs
Ian Jackson [Fri, 31 Jan 2014 15:07:55 +0000 (15:07 +0000)]
libxl: timeouts: Record deregistration when one occurs

When a timeout has occurred, it is deregistered.  However, we failed
to record this fact by updating etime->func.  As a result,
libxl__ev_time_isregistered would say `true' for a timeout which has
already happened.

It is necessary to clear etime->func before the callback, because the
callback might want to reinstate the timeout, or might free the etime
(or its containing struct) entirely.

The results are that we might try to have the timeout occur again
(causing problems for the call site), and/or corrupt the timeout list.
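
The ordering requirement can be sketched as below. The types and names
are simplified, hypothetical versions of the libxl ones; the example
callback re-registers itself once to show why func has to be cleared
before the call rather than after it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ev_time_sketch ev_time_sketch;
typedef void ev_time_callback_sketch(ev_time_sketch *ev);

struct ev_time_sketch {
    ev_time_callback_sketch *func;  /* non-NULL iff registered */
    int fired;
};

static int ev_time_isregistered_sketch(const ev_time_sketch *ev)
{
    return ev->func != NULL;
}

static void time_occurs_sketch(ev_time_sketch *ev)
{
    ev_time_callback_sketch *func = ev->func;

    ev->func = NULL;    /* record the deregistration first */
    func(ev);           /* may re-register or free ev: do not touch ev now */
}

/* Example callback: reinstates the timeout the first time it fires. */
static void rearm_once_sketch(ev_time_sketch *ev)
{
    ev->fired++;
    if (ev->fired < 2)
        ev->func = rearm_once_sketch;
}
```

Clearing func after the callback instead would wipe out the
re-registration done by rearm_once_sketch().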

This fixes the timedereg event system unit test.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibxl: timeouts: Break out time_occurs
Ian Jackson [Fri, 31 Jan 2014 15:04:37 +0000 (15:04 +0000)]
libxl: timeouts: Break out time_occurs

Bring together the two places where etime->func() is called into a new
function time_occurs.  For one call site this is pure code motion.
For the other the only semantic change is the introduction of a new
debugging message.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibxl: events: timedereg internal unit test
Ian Jackson [Mon, 3 Feb 2014 14:25:13 +0000 (14:25 +0000)]
libxl: events: timedereg internal unit test

Test timeout deregistration idempotency.  In the current tree this
test fails because ev->func is not cleared, meaning that a timeout
can be removed from the list more than once, corrupting the list.

It is necessary to use multiple timeouts to demonstrate this bug,
because removing the very same entry twice from a list in quick
succession, without modifying the list in other ways in between,
doesn't actually corrupt the list.  (Since removing an entry from a
doubly-linked list just copies next and back from the disappearing
entry into its neighbours.)
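
A sketch of the effect, using a simplified circular doubly-linked list
rather than the list macros libxl actually uses. All names are
hypothetical.

```c
#include <assert.h>

struct node_sketch {
    struct node_sketch *prev, *next;
};

static void list_init_sketch(struct node_sketch *head)
{
    head->prev = head->next = head;
}

static void list_add_tail_sketch(struct node_sketch *head,
                                 struct node_sketch *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n. n's own prev/next are left as they were (now stale). */
static void list_remove_sketch(struct node_sketch *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}
```

With entries a, b, c on the list, removing b twice in a row just writes
the same neighbour pointers twice. Removing b, then a, then b again
makes c.prev point at the already removed a, so the list is corrupt.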

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibxl: events: Makefile builds internal unit tests
Ian Jackson [Mon, 3 Feb 2014 14:17:46 +0000 (14:17 +0000)]
libxl: events: Makefile builds internal unit tests

We provide a new LIBXL_TESTS facility in the Makefile.
Also provide some helpful common routines for unit tests to use.

We don't want to put the weird test case entrypoints and the weird
test case code in the main libxl.so library.  Symbol hiding prevents
us from simply directly linking the libxl_test_FOO.o in later.  So
instead we provide a special library libxenlight_test.so which is used
only locally.

There are not yet any test cases defined; that will come in the next
patch.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibxl: fork: Make SIGCHLD self-pipe nonblocking
Ian Jackson [Tue, 21 Jan 2014 15:05:37 +0000 (15:05 +0000)]
libxl: fork: Make SIGCHLD self-pipe nonblocking

Use the new libxl__pipe_nonblock and _close functions, rather than
open coding the same logic.  Now the pipe is nonblocking, which avoids
a race which could result in libxl deadlocking in a multithreaded
program.
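
For reference, a minimal sketch of creating a nonblocking pipe with
plain POSIX calls. The function names are hypothetical and this is not
the body of libxl__pipe_nonblock.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK on fd, keeping its other status flags. */
static int set_nonblock_sketch(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;

    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Create a pipe with both ends nonblocking. Returns 0 or -1. */
static int pipe_nonblock_sketch(int fds[2])
{
    if (pipe(fds))
        return -1;

    if (set_nonblock_sketch(fds[0]) || set_nonblock_sketch(fds[1])) {
        close(fds[0]);
        close(fds[1]);
        fds[0] = fds[1] = -1;
        return -1;
    }

    return 0;
}
```

A read from the empty pipe then fails with EAGAIN instead of blocking,
and a write to a full pipe fails the same way instead of stalling the
writer.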

Reported-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
11 years agolibxl: events: Break out libxl__pipe_nonblock, _close
Ian Jackson [Tue, 21 Jan 2014 14:58:10 +0000 (14:58 +0000)]
libxl: events: Break out libxl__pipe_nonblock, _close

Break out the pipe creation and destruction from the poller code
into two new functions libxl__pipe_nonblock and libxl__pipe_close.
Also change direct use of pipe() to libxl_pipe.

No overall functional difference other than minor differences in exact
log messages.

Also move libxl__self_pipe_wakeup and libxl__self_pipe_eatall into the
new pipe utilities section in libxl_event.c; this is pure code motion.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
--
v3: Mention that we switched pipe() -> libxl_pipe()

11 years agolibxl: fork: Share SIGCHLD handler amongst ctxs
Ian Jackson [Fri, 17 Jan 2014 11:58:55 +0000 (11:58 +0000)]
libxl: fork: Share SIGCHLD handler amongst ctxs

Previously, an application which had multiple libxl ctxs in multiple
threads, would have to itself plumb SIGCHLD through to each ctx.
Instead, permit multiple libxl ctxs to all share the SIGCHLD handler.

We keep a list of all the ctxs which are interested in SIGCHLD and
notify all of their self-pipes.

In more detail:

 * sigchld_owner, the ctx* of the SIGCHLD owner, is replaced by
   sigchld_users, a list of SIGCHLD users.

 * Each ctx keeps track of whether it is on the users list, so that
   libxl__sigchld_needed and libxl__sigchld_notneeded now instead of
   idempotently installing and removing the handler, idempotently add
   or remove the ctx from the list.

   We ensure that we always have the SIGCHLD handler installed
   iff the sigchld_users list is nonempty.  To make this a bit
   easier we make sigchld_installhandler_core and
   sigchld_removehandler_core idempotent.

   Specifically, the call sites for sigchld_installhandler_core and
   sigchld_removehandler_core are updated to manipulate sigchld_users
   and only call the install or remove functions as applicable.

 * In the signal handler we walk the list of SIGCHLD users and write
   to each of their self-pipes.  That means that we need to arrange to
   defer SIGCHLD when we are manipulating the list (to avoid the
   signal handler interrupting our list manipulation); this is quite
   tiresome to arrange.

   The code as written will, on the first installation of the SIGCHLD
   handler, firstly install the real handler, then immediately replace
   it with the deferral handler.  Doing it this way makes the code
   clearer as it makes the SIGCHLD deferral machinery much more
   self-contained (and hence easier to reason about).

 * The first part of libxl__sigchld_notneeded is broken out into a new
   function sigchld_user_remove (which is also needed during
   postfork).  And of course that first part of the function is now
   rather different, as explained above.

 * sigchld_installhandler_core no longer takes the gc argument,
   because it now deals with SIGCHLD for all ctxs.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v3: Include bugfixes from "Fixup SIGCHLD sharing" patch:

    * Use a mutex for defer_sigchld, to guard against concurrency
      between the thread calling defer_sigchld and an instance of the
      primary signal handler on another thread.

    * libxl_sigchld_owner_libxl_always is incompatible with SIGCHLD
      sharing.  Document this correctly.

    Fix "have have" error in comment.

    Move removal of newly unused variables to previous patch.

v2.1: Provide feature test macro LIBXL_HAVE_SIGCHLD_SHARING

11 years agolibxl: fork: Break out sigchld_sethandler_raw
Ian Jackson [Fri, 17 Jan 2014 15:42:31 +0000 (15:42 +0000)]
libxl: fork: Break out sigchld_sethandler_raw

We are going to want to introduce another call site in the final
substantive patch.

Pure code motion; no functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
---
v3: Remove now-unused variables from sigchld_installhandler_core

11 years agolibxl: fork: Break out sigchld_installhandler_core
Ian Jackson [Fri, 17 Jan 2014 12:01:24 +0000 (12:01 +0000)]
libxl: fork: Break out sigchld_installhandler_core

Pure code motion.  This is going to make the final substantive patch
easier to read.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibxl: fork: Rename sigchld handler functions
Ian Jackson [Fri, 17 Jan 2014 11:45:57 +0000 (11:45 +0000)]
libxl: fork: Rename sigchld handler functions

We are going to change these functions so that different libxl ctx's
can share a single SIGCHLD handler.  Rename them now to a new name
which doesn't imply unconditional handler installation or removal.

Also note in the comments that they are idempotent.

No functional change.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP
Ian Jackson [Thu, 16 Jan 2014 17:03:34 +0000 (17:03 +0000)]
libxl: fork: Provide LIBXL_HAVE_SIGCHLD_SELECTIVE_REAP

This is the feature test macro for libxl_childproc_sigchld_occurred
and libxl_sigchld_owner_libxl_always_selective_reap.

It is split out into this separate patch because: a single feature
test is sensible because we do not intend anyone to release or ship
libxl versions with one of these but not the other; but, the two
features are in separate patches for clarity; and, this just makes
reading the actual code easier.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
11 years agolibxl: fork: Provide ..._always_selective_reap
Ian Jackson [Thu, 16 Jan 2014 17:01:50 +0000 (17:01 +0000)]
libxl: fork: Provide ..._always_selective_reap

Applications exist which want to use libxl in an event-driven mode but
which do not integrate child termination into their event system, but
instead reap all their own children synchronously.

In such an application libxl must own SIGCHLD but avoid reaping any
children that don't belong to libxl.

Provide libxl_sigchld_owner_libxl_always_selective_reap which has this
behaviour.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
---
v2: Document the new mode in the big "Subprocess handling" comment.

11 years agolibxl: fork: Provide libxl_childproc_sigchld_occurred
Ian Jackson [Thu, 16 Jan 2014 16:57:27 +0000 (16:57 +0000)]
libxl: fork: Provide libxl_childproc_sigchld_occurred

Applications exist which don't keep track of all their child processes
in a manner suitable for coherent dispatch of their termination.  In
such a situation, nothing in the whole process may call wait, or
waitpid(-1,,).  Doing so reaps processes belonging to other parts of
the application and there is then no way to deliver the exit status to
the right place.

To facilitate this, provide a facility for such an application to ask
libxl to call waitpid on each of its children individually.
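
What "call waitpid on each of its children individually" amounts to
can be sketched like this. struct child_sketch and
reap_own_children_sketch() are hypothetical; the real bookkeeping in
libxl is different.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct child_sketch {
    pid_t pid;
    int status;     /* valid once reaped is set */
    int reaped;
};

/*
 * Poll only the children we know about. Never calls wait() or
 * waitpid(-1, ...), so children of other parts of the process are
 * left alone. Returns how many children were reaped by this call.
 */
static size_t reap_own_children_sketch(struct child_sketch *kids, size_t n)
{
    size_t reaped = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        int status;

        if (kids[i].reaped)
            continue;

        if (waitpid(kids[i].pid, &status, WNOHANG) == kids[i].pid) {
            kids[i].status = status;
            kids[i].reaped = 1;
            reaped++;
        }
    }

    return reaped;
}
```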

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>