xenbits.xensource.com Git - xen.git/log
9 years ago x86/mce: translate passed-in GPA to host machine address
Haozhong Zhang [Wed, 16 Sep 2015 09:40:26 +0000 (11:40 +0200)]
x86/mce: translate passed-in GPA to host machine address

This patch adds a new flag MC_MSRINJ_F_GPADDR to
xen_mc_msrinject.mcinj_flags, and makes do_mca() translate the
guest physical address passed in through
xen_mc_msrinject.mcinj_msr[i].value to the host machine address if
this flag is present.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Christoph Egger <chegger@amazon.de>
9 years ago tools/xen-mceinj: fix code style
Haozhong Zhang [Wed, 16 Sep 2015 09:40:16 +0000 (11:40 +0200)]
tools/xen-mceinj: fix code style

Remove trailing whitespace in xen-mceinj.c.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
9 years ago x86/mce: fix code style
Haozhong Zhang [Wed, 16 Sep 2015 09:39:16 +0000 (11:39 +0200)]
x86/mce: fix code style

Remove trailing whitespace and fix indentation in mce.c and xen_mca.h.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Christoph Egger <chegger@amazon.de>
9 years ago x86/sysctl: don't clobber memory if NCAPINTS > ARRAY_SIZE(pi->hw_cap)
Andrew Cooper [Wed, 16 Sep 2015 09:22:00 +0000 (11:22 +0200)]
x86/sysctl: don't clobber memory if NCAPINTS > ARRAY_SIZE(pi->hw_cap)

There is no current problem, as both NCAPINTS and pi->hw_cap are 8 entries,
but the limit should be calculated appropriately so as to avoid hypervisor
stack corruption if the two do get out of sync.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
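
The limit calculation described in the x86/sysctl entry above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: `NCAPINTS_DEMO`, `struct physinfo_demo`, and `copy_caps()` are hypothetical names, and the source array is deliberately made larger than the destination to show the clamp taking effect.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define NCAPINTS_DEMO 10   /* hypothetical: source holds more words than the destination */

/* Hypothetical stand-in for the structure that receives the capability words. */
struct physinfo_demo {
    uint32_t hw_cap[8];
};

/* Copy at most as many capability words as the destination array can hold. */
static size_t copy_caps(struct physinfo_demo *pi, const uint32_t *caps,
                        size_t ncaps)
{
    size_t n = ncaps < ARRAY_SIZE(pi->hw_cap) ? ncaps : ARRAY_SIZE(pi->hw_cap);

    memcpy(pi->hw_cap, caps, n * sizeof(pi->hw_cap[0]));
    return n;   /* number of words actually copied */
}
```

Taking the smaller of the two sizes means the copy stays in bounds even if the source and destination definitions drift apart later.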
9 years ago vtd: correct loglevel when check group devices
Tiejun Chen [Wed, 16 Sep 2015 09:20:54 +0000 (11:20 +0200)]
vtd: correct loglevel when check group devices

Since commit 3848058e7dd6 (vtd/iommu: permit group devices to
passthrough in relaxed mode) was introduced, we always print this
message as XENLOG_G_WARNING, but that is not correct in the case of
strict mode. So make the log level of this message depend on the
specific mode.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years ago x86/MSI: fail if no hardware support
Jan Beulich [Wed, 16 Sep 2015 09:20:27 +0000 (11:20 +0200)]
x86/MSI: fail if no hardware support

This is to guard against buggy callers (luckily Dom0 only) invoking
the respective hypercall for a device that is not MSI-capable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86/boot: remove unneeded instruction
Daniel Kiper [Wed, 16 Sep 2015 09:18:38 +0000 (11:18 +0200)]
x86/boot: remove unneeded instruction

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years ago arm: reduce power use by contented spin locks with WFE/SEV
David Vrabel [Mon, 3 Aug 2015 11:29:19 +0000 (12:29 +0100)]
arm: reduce power use by contented spin locks with WFE/SEV

Instead of cpu_relax() while spinning and observing the ticket head,
introduce arch_lock_relax() which executes a WFE instruction.  After
the ticket head is changed, call arch_lock_signal() to execute an SEV
instruction (with the required DSB first) to wake any spinners.

This should reduce power consumption when locks are contended and
CPUs are spinning.

For consistency also move arch_lock_(acquire|release)_barrier to
asm/spinlock.h.

Booted the result on arm32 (Midway) and arm64 (Mustang). Build tested
only on amd64.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[ijc: add barrier, rename as arch_lock_*, move arch_lock_*_barrier, test]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years ago xen/arm: hvm_domain drop unused field instropection_enabled
Julien Grall [Mon, 14 Sep 2015 15:30:38 +0000 (16:30 +0100)]
xen/arm: hvm_domain drop unused field instropection_enabled

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago xl: tighten parsing of "irq" and "iomem" list elements
Jan Beulich [Mon, 14 Sep 2015 13:53:27 +0000 (07:53 -0600)]
xl: tighten parsing of "irq" and "iomem" list elements

While "ioport" list element parsing already validates that the entire
input string got consumed, its two siblings so far didn't.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
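
The "entire input string got consumed" check from the xl parsing entry above can be sketched with the standard `strtoul()` end pointer. This is only an illustration of the technique; `parse_whole_ulong()` is a hypothetical helper, not the function xl uses.

```c
#include <errno.h>
#include <stdlib.h>

/* Return 0 and store the value only if the whole string is a number. */
static int parse_whole_ulong(const char *s, unsigned long *out)
{
    char *end;
    unsigned long v;

    errno = 0;
    v = strtoul(s, &end, 0);
    if (errno != 0 || end == s || *end != '\0')
        return -1;   /* overflow, no digits at all, or trailing junk */
    *out = v;
    return 0;
}
```

With this check an element such as "42abc" is rejected instead of being silently read as 42.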
9 years ago libxl: slightly refine pci-assignable-{add, remove} handling
Jan Beulich [Thu, 10 Sep 2015 12:36:54 +0000 (06:36 -0600)]
libxl: slightly refine pci-assignable-{add, remove} handling

While it appears to be intentional for "xl pci-assignable-remove" to
not re-bind the original driver by default (requires the -r option),
permanently losing the information which driver was originally used
seems bad. Make "add; remove; add; remove -r" re-bind the original
driver by allowing "remove" to delete the information only upon
successful re-bind.

In the course of this I also noticed that binding information is lost
when upon first "add" pciback isn't loaded yet, due to its presence not
being checked for early enough. Adjust pciback_dev_is_assigned()
accordingly, and properly distinguish "yes" and "error" returns in the
"add" case (removing a redundant error message from the "remove" path
for consistency).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago x86/PoD: use clear_domain_page()
Jan Beulich [Mon, 14 Sep 2015 11:40:04 +0000 (13:40 +0200)]
x86/PoD: use clear_domain_page()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years ago x86/p2m: fix mismatched unlock
Jan Beulich [Mon, 14 Sep 2015 11:39:19 +0000 (13:39 +0200)]
x86/p2m: fix mismatched unlock

Luckily, due to gfn_unlock() currently mapping to p2m_unlock(), this is
only a cosmetic issue right now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago vtd/iommu: permit group devices to passthrough in relaxed mode
Tiejun Chen [Mon, 14 Sep 2015 11:38:02 +0000 (13:38 +0200)]
vtd/iommu: permit group devices to passthrough in relaxed mode

Currently we don't allow passing through any group devices that
share the same RMRR entry, since doing so would break security among
VMs. Indeed, we expect to figure out a better way to handle this kind
of case completely.

But until group assignment is implemented, we can make this
permission dependent on our RMRR policy. So it is now allowed
in relaxed mode.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago xl: handle empty vnuma configuration
Wei Liu [Fri, 11 Sep 2015 13:50:09 +0000 (14:50 +0100)]
xl: handle empty vnuma configuration

When the user specifies vnuma = [], we need to skip the whole parser
function, otherwise the parser sets b_info->max_memkb to a garbage value.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago xl/libxl: disallow saving a guest with vNUMA configured
Wei Liu [Wed, 9 Sep 2015 16:11:24 +0000 (17:11 +0100)]
xl/libxl: disallow saving a guest with vNUMA configured

This is because the migration stream does not preserve node information.

Note this is not a regression for migration v2 vs legacy migration
because neither of them preserves node information.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- some grammar fixes to the doc and fixed a comment ]

9 years ago libxc: introduce xc_domain_getvnuma
Wei Liu [Fri, 11 Sep 2015 13:50:07 +0000 (14:50 +0100)]
libxc: introduce xc_domain_getvnuma

A simple wrapper for XENMEM_get_vnumainfo.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxl: format fd flags with 0x since they are hex.
Ian Campbell [Fri, 11 Sep 2015 14:19:54 +0000 (15:19 +0100)]
libxl: format fd flags with 0x since they are hex.

Commit 93f5194e7270 "libxl: clear O_NONBLOCK|O_NDELAY on migration fd
and reinstate afterwards" added some logging of the fcntl F_GETFL flags
as %x without a 0x prefix to make it clear that the numbers are hex. Fix
this, along with an inadvertent logging of the fd itself as hex.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
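
The formatting convention from the libxl entry above (fd in decimal, flags in hex with an explicit 0x prefix) might look like the following sketch. `format_fd_flags()` and the message wording are hypothetical and are not taken from libxl's actual log output.

```c
#include <stdio.h>

/* Format an fd as decimal and its flags as hex with an explicit 0x prefix. */
static int format_fd_flags(char *buf, size_t len, int fd, int flags)
{
    return snprintf(buf, len, "fd %d flags 0x%x", fd, (unsigned int)flags);
}
```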
9 years ago x86/hvm: fix saved pmtimer and hpet values
Kouya Shimura [Fri, 11 Sep 2015 14:24:56 +0000 (16:24 +0200)]
x86/hvm: fix saved pmtimer and hpet values

The ACPI PM timer is sometimes broken on live migration, because
vcpu->arch.hvm_vcpu.guest_time is always zero in modes other than
"delay for missed ticks mode". Even in "delay for missed ticks mode",
the vcpu's guest_time field is not valid (i.e. zero) when
the state of the vcpu is "blocked". (see pt_save_timer function)

The original author (Tim Deegan) of pmtimer_save() must have intended
that it save the last scheduled time of the vcpu. Unfortunately, the
code already had this bug latent in it. FYI, there was no timer mode
other than "delay for missed ticks mode" back then.

For consistency with HPET, pmtimer_save() should refer to
hvm_get_guest_time() to update the counter, just as hpet_save() does.

Without this patch, the clock of Windows Server 2012 R2 without HPET
might leap forward several minutes on live migration.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Retain use of ->arch.hvm_vcpu.guest_time when non-zero. Do the inverse
adjustment for vHPET.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kouya Shimura <kouya@jp.fujitsu.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago libxl: clear O_NONBLOCK|O_NDELAY on migration fd and reinstate afterwards
Ian Campbell [Fri, 11 Sep 2015 10:42:51 +0000 (11:42 +0100)]
libxl: clear O_NONBLOCK|O_NDELAY on migration fd and reinstate afterwards

The fd passed to us by libvirt for both save and restore has at least
O_NONBLOCK set, which libxl does not expect and therefore fails to
handle any EAGAIN which might arise.

This has been observed with migration v2, but if v1 used to work I
think that would just be by luck and/or coincidence.

Unix convention (and the principle of least surprise) is usually to
ensure that an fd has no "strange" properties, such as being
non-blocking, when handing it to another component.

However, for the convenience of the application, arrange instead for
libxl to clear any unexpected flags on the file descriptors it is
given for save or restore, and to restore them to their original state
at the end. O_NDELAY could be similarly problematic, so clear that as
well as O_NONBLOCK.

To do this, introduce a pair of new helper functions, one to modify+save
the flags and another to restore them, and call them in the appropriate
places.

The migration v1 code appeared to do some things with O_NONBLOCK in
the checkpoint case. Migration v2 doesn't seem to do so, and in any
case I wouldn't expect it to be relying on libvirt's setting of
O_NONBLOCK when xl doesn't use that flag.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Cc: Yang Hongyang <yanghy@cn.fujitsu.com>
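
The pair of helpers described in the libxl entry above (one to modify+save the flags, one to restore them) can be sketched with plain POSIX `fcntl()`. The names `fd_flags_clear_nonblock()` and `fd_flags_restore()` are hypothetical; the real libxl helpers may differ in name, signature, and error handling.

```c
#include <fcntl.h>

/* Clear O_NONBLOCK/O_NDELAY on fd, saving the original flags in *saved. */
static int fd_flags_clear_nonblock(int fd, int *saved)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    *saved = flags;
    if (fcntl(fd, F_SETFL, flags & ~(O_NONBLOCK | O_NDELAY)) == -1)
        return -1;
    return 0;
}

/* Put back the flags recorded by fd_flags_clear_nonblock(). */
static int fd_flags_restore(int fd, int saved)
{
    return fcntl(fd, F_SETFL, saved) == -1 ? -1 : 0;
}
```

A caller would wrap the save or restore operation between the two calls, so the fd is handed back to the application in the state it arrived in.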
9 years ago xen: arm: Support <32MB frametables
Chris Brand [Fri, 21 Aug 2015 21:30:37 +0000 (14:30 -0700)]
xen: arm: Support <32MB frametables

setup_frametable_mappings() rounds frametable_size up to a multiple
of 32MB. This is wasteful on systems with less than 4GB of RAM,
although it does allow the "contig" bit to be set in the PTEs.

Where the frametable is less than 32MB in size, instead round up
to a multiple of 2MB, not setting the "contig" bit in the PTEs.

Signed-off-by: Chris Brand <chris.brand@broadcom.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
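
The rounding rule in the frametable entry above can be sketched as follows. This is an illustration only: `frametable_mapped_size()` and `round_up_pow2()` are hypothetical names, and the real setup_frametable_mappings() does considerably more than choose a size.

```c
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)

/* Round size up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t size, uint64_t align)
{
    return (size + align - 1) & ~(align - 1);
}

/*
 * Pick the rounding granularity from the unrounded frametable size:
 * below 32MB round to 2MB without the "contig" bit, otherwise round
 * to 32MB and allow the "contig" bit.
 */
static uint64_t frametable_mapped_size(uint64_t size, int *use_contig)
{
    *use_contig = size >= MB(32);
    return round_up_pow2(size, *use_contig ? MB(32) : MB(2));
}
```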
9 years ago xen: arm: Be explicit about bit values in mfn_to_xen_entry()
Chris Brand [Thu, 10 Sep 2015 18:56:29 +0000 (11:56 -0700)]
xen: arm: Be explicit about bit values in mfn_to_xen_entry()

Ensure that every relevant bit is given an explicit value.
This has no effect on the generated code, but makes it
a little easier to follow.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Chris Brand <chris.brand@broadcom.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago xen: arm re-order assignments in mfn_to_xen_entry()
Chris Brand [Thu, 10 Sep 2015 18:56:28 +0000 (11:56 -0700)]
xen: arm re-order assignments in mfn_to_xen_entry()

Shuffle lines around so that the assignments in mfn_to_xen_entry()
occur in the same order as the bits are declared in lpae_pt_t.
This makes it easier to see which ones are never given a value.
No change in behaviour.

Also fix a minor comment typo.

Signed-off-by: Chris Brand <chris.brand@broadcom.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago QEMU_TAG update
Ian Jackson [Fri, 11 Sep 2015 10:18:53 +0000 (11:18 +0100)]
QEMU_TAG update

9 years ago libxl: add LIBXL_DEVICE_MODEL_SAVE_FILE
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:58:26 +0000 (16:58 +0200)]
libxl: add LIBXL_DEVICE_MODEL_SAVE_FILE

Use this in libxl_dm instead of hard-coding.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxc: support XEN_DOMCTL_soft_reset operation
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:58:17 +0000 (16:58 +0200)]
libxc: support XEN_DOMCTL_soft_reset operation

Introduce xc_domain_soft_reset() function supporting XEN_DOMCTL_soft_reset.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years ago arch-specific hooks for domain_soft_reset()
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:57:40 +0000 (16:57 +0200)]
arch-specific hooks for domain_soft_reset()

x86-specific hook cleans up the pirq-emuirq mappings, destroys all ioreq
servers and replaces the shared_info frame with an empty page to support
a subsequent XENMAPSPACE_shared_info call.

ARM-specific hook is -ENOSYS for now.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years ago flask: DOMCTL_soft_reset support
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:54:48 +0000 (16:54 +0200)]
flask: DOMCTL_soft_reset support

Add new soft_reset vector to domain2 class, add it to create_domain
in the default policy.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
9 years ago introduce XEN_DOMCTL_soft_reset
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:54:09 +0000 (16:54 +0200)]
introduce XEN_DOMCTL_soft_reset

New domctl resets state for a domain allowing it to 'start over': register
vcpu_info, switch to FIFO ABI for event channels. Grants that are still
active are logged to help debug misbehaving backends.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years ago grant_table: implement grant_table_warn_active_grants()
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:53:36 +0000 (16:53 +0200)]
grant_table: implement grant_table_warn_active_grants()

Log the first 10 active grants for a domain. This function is going to be
used for soft reset; active grants on this path usually mean misbehaving
backends refusing to release their mappings on shutdown. We need that in
addition to the existing 'g' keyhandler, as such misbehaving backends can
cause a domain to crash right after the soft reset operation and the 'g'
option won't be available in this case.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years ago evtchn: make evtchn_reset() ready for soft reset
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:53:08 +0000 (16:53 +0200)]
evtchn: make evtchn_reset() ready for soft reset

We need to close all event channels so the domain performing soft reset
will be able to reopen them.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years ago xl: introduce enum domain_restart_type
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:52:58 +0000 (16:52 +0200)]
xl: introduce enum domain_restart_type

In preparation for adding a new restart type (soft reset), put all
restart types into an enum.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years ago libxl: support SHUTDOWN_soft_reset shutdown reason
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:52:08 +0000 (16:52 +0200)]
libxl: support SHUTDOWN_soft_reset shutdown reason

Use letter 'S' to indicate a domain in such state. Introduce new
'on_soft_reset' action and default it to 'restart' for now.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago introduce SHUTDOWN_soft_reset shutdown reason
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:51:26 +0000 (16:51 +0200)]
introduce SHUTDOWN_soft_reset shutdown reason

This special type of shutdown is supposed to be used by PVHVM guests when
they want to perform some sort of kexec/kdump.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years ago libxl: set ret to non-zero value in failure path
Wei Liu [Wed, 9 Sep 2015 17:03:36 +0000 (18:03 +0100)]
libxl: set ret to non-zero value in failure path

... otherwise we have something like:

xl: libxl_create.c:968: initiate_domain_create: Assertion `ret' failed.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago configure: don't silently disable systemd support
Wei Liu [Thu, 10 Sep 2015 11:18:03 +0000 (12:18 +0100)]
configure: don't silently disable systemd support

Originally, when the user ran ./configure --enable-systemd and the systemd
development library was not available, the build system silently disabled
systemd support. This is not in line with normal expectations.

Instead, configure should error out when the user has asked for systemd
support but the development libraries can't be found.

Reported-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago Rerun autogen.sh to pickup new version number
Ian Campbell [Thu, 10 Sep 2015 11:30:40 +0000 (12:30 +0100)]
Rerun autogen.sh to pickup new version number

315a8722b4d7ba6141c6cc85009b6e09f5b20424 bumped the version after 4.6
branched. This picks up that change into the generated files.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxc/x86/pvh: Allow creation of 32b PVH guests
Boris Ostrovsky [Wed, 9 Sep 2015 15:10:39 +0000 (17:10 +0200)]
libxc/x86/pvh: Allow creation of 32b PVH guests

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago x86/pvh: handle hypercalls for 32b PVH guests
Boris Ostrovsky [Wed, 9 Sep 2015 15:10:13 +0000 (17:10 +0200)]
x86/pvh: handle hypercalls for 32b PVH guests

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago x86/compat: test both PV and PVH guests for compat mode
Boris Ostrovsky [Wed, 9 Sep 2015 15:09:34 +0000 (17:09 +0200)]
x86/compat: test both PV and PVH guests for compat mode

Add is_pvh_32bit_domain() macro and use it alongside is_pv_32bit_domain() where
necessary.

Since PVH guests cannot change execution mode, has_32bit_shinfo is a good
indicator of whether the guest is PVH and 32-bit.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
9 years ago x86/pvh: do not allow 32-bit PVH guests to clear CR4's PAE bit
Boris Ostrovsky [Wed, 9 Sep 2015 15:08:56 +0000 (17:08 +0200)]
x86/pvh: do not allow 32-bit PVH guests to clear CR4's PAE bit

.. since we only support 32-bit PV(H) guests in PAE mode.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago x86/pvh: set 32b PVH guest mode in XEN_DOMCTL_set_address_size
Boris Ostrovsky [Wed, 9 Sep 2015 15:08:12 +0000 (17:08 +0200)]
x86/pvh: set 32b PVH guest mode in XEN_DOMCTL_set_address_size

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years ago domctl: lower loglevel of XEN_DOMCTL_memory_mapping
Tiejun Chen [Wed, 9 Sep 2015 14:29:11 +0000 (16:29 +0200)]
domctl: lower loglevel of XEN_DOMCTL_memory_mapping

We should lower loglevel to XENLOG_G_DEBUG while mapping or
unmapping memory via XEN_DOMCTL_memory_mapping since it's
fair enough to check this info just while debugging.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
9 years ago x86: wrap kexec feature with CONFIG_KEXEC
Jonathan Creekmore [Wed, 9 Sep 2015 14:28:27 +0000 (16:28 +0200)]
x86: wrap kexec feature with CONFIG_KEXEC

Add the appropriate #if checks around the kexec code in the x86 codebase
so that the feature can actually be turned off by the flag instead of
always being required to be enabled on x86.

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
9 years ago x86: clean up vm_event-related code in asm-x86/domain.h
Razvan Cojocaru [Wed, 9 Sep 2015 14:27:24 +0000 (16:27 +0200)]
x86: clean up vm_event-related code in asm-x86/domain.h

As suggested by Jan Beulich, moved struct monitor_write_data from
struct arch_domain to struct arch_vcpu, as well as moving all
vm_event-related data from asm-x86/domain.h to struct vm_event,
and allocating it dynamically only when needed.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago ACPI / table: Replace '1' with specific error return values
Hanjun Guo [Wed, 9 Sep 2015 14:26:37 +0000 (16:26 +0200)]
ACPI / table: Replace '1' with specific error return values

After commit 7f8f97c3cc (ACPI: acpi_table_parse() now returns
success/fail, not count), acpi_table_parse() returns '1' when it is
unable to find the table, but it should return a negative error code
in that case.  Make it return -ENODEV instead.

Fix the same problem in acpi_table_init() analogously.

Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
[rjw: Subject and changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit 95df812dbdc350bfcf31e247e9100c378a472480]
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
9 years ago ACPI: disable ACPI cleanly when bad RSDP found
Len Brown [Wed, 9 Sep 2015 14:26:12 +0000 (16:26 +0200)]
ACPI: disable ACPI cleanly when bad RSDP found

When ACPI is disabled in the BIOS of this VIA C3 box,
it invalidates the RSDP, which Linux notices:

ACPI Error (tbxfroot-0218): A valid RSDP was not found [20080926]

But Linux neglected to disable ACPI at that stage,
and later scribbled on smp_found_config:

ACPI: No APIC-table, disabling MPS

But this box doesn't run well in legacy PIC mode,
it needed IOAPIC mode to perform correctly:

http://lkml.org/lkml/2009/2/5/39

So exit ACPI mode cleanly when we first detect
that it is hopeless.

Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit 9e3a9d1ed8cc8db93e5c53e9a5b09065bd95de8b]
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
9 years ago ACPI/table: Always count matched and successfully parsed entries
Tomasz Nowicki [Wed, 9 Sep 2015 14:25:42 +0000 (16:25 +0200)]
ACPI/table: Always count matched and successfully parsed entries

acpi_parse_entries() allows traversing all available table entries (aka
subtables) by passing max_entries parameter equal to 0, but since its count
variable is only incremented if max_entries is not 0, the function always
returns 0 for max_entries equal to 0.  It would be more useful if it returned
the number of entries matched instead, so make it increment count in that
case too.

Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
[Linux commit 4ceacd02f5a1795c5c697e0345ee10beef675290]
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
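
The counting behaviour described in the ACPI/table entry above can be sketched on a plain array of type ids. This is a simplified illustration, not acpi_parse_entries() itself: `count_matching()` is a hypothetical function and there is no handler callback or table walking here.

```c
#include <stddef.h>

/*
 * Count entries whose type equals wanted. max_entries == 0 means
 * "no limit"; matches are counted in that case as well.
 */
static unsigned int count_matching(const int *types, size_t n, int wanted,
                                   unsigned int max_entries)
{
    unsigned int count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (types[i] != wanted)
            continue;
        if (max_entries && count >= max_entries)
            break;   /* a limit was given and has been reached */
        count++;     /* incremented whether or not a limit was given */
    }
    return count;
}
```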
9 years ago x86/VPMU: Set VPMU context pointer to NULL when freeing it
Boris Ostrovsky [Wed, 9 Sep 2015 13:32:00 +0000 (15:32 +0200)]
x86/VPMU: Set VPMU context pointer to NULL when freeing it

Otherwise we may hit an assertion in vpmu_initialise() if a vcpu is offlined
and then onlined again.

For tidiness, set priv_context to NULL as well.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago efi: introduce efi_arch_flush_dcache_area
Stefano Stabellini [Wed, 9 Sep 2015 13:29:06 +0000 (15:29 +0200)]
efi: introduce efi_arch_flush_dcache_area

Objects loaded by FileHandle->Read need to be flushed from dcache,
otherwise copy_from_paddr will read stale data when copying the kernel,
causing a failure to boot.

Introduce efi_arch_flush_dcache_area and call it from read_file.

This commit introduces no functional changes on x86.

Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago Branch for 4.6: Update staging version to 4.7-unstable
Ian Jackson [Wed, 9 Sep 2015 12:19:57 +0000 (13:19 +0100)]
Branch for 4.6: Update staging version to 4.7-unstable

* Change README to say `Xen 4.6-rc'
* Change XEN_EXTRAVERSION so that we are `4.6.0-rc'

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
9 years ago Branch for 4.6: Unpin QEMU_UPSTREAM_REVISION
Ian Jackson [Wed, 9 Sep 2015 12:18:13 +0000 (13:18 +0100)]
Branch for 4.6: Unpin QEMU_UPSTREAM_REVISION

9 years ago MAINTAINERS: stable backports should be requested on xen-devel
Ian Campbell [Sat, 25 Jul 2015 08:28:58 +0000 (08:28 +0000)]
MAINTAINERS: stable backports should be requested on xen-devel

As well as CC-ing the correct people. I just saw such a request on
xen-users and thought this was worth clarifying here too.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxc: add assertion to avoid setting same bit more than once 4.6.0-rc3
Wei Liu [Sun, 6 Sep 2015 20:05:39 +0000 (21:05 +0100)]
libxc: add assertion to avoid setting same bit more than once

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago libxc: don't populate same pfn more than once in populate_pfns
Wei Liu [Sun, 6 Sep 2015 20:05:38 +0000 (21:05 +0100)]
libxc: don't populate same pfn more than once in populate_pfns

The original implementation of populate_pfns didn't consider that the same
pfn can be present multiple times in the array. The mechanism to prevent
populating the same pfn multiple times only worked if the recurring pfn
appeared in different batches.

This bug was discovered by a Linux 4.1 32-bit kernel save / restore test,
which has several ptes pointing to the same pfn, resulting in an array
containing a recurring pfn.  When libxc called x86_pv_localise_page, the
original implementation would populate the same pfn more than once.

The fix is to set the bit in the populated bitmap as we generate the list
of pfns to be populated.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
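
The fix described in the populate_pfns entry above, marking each pfn in the bitmap at the moment it is queued, can be sketched as follows. All names here (`collect_unpopulated()`, `test_bit_demo()`, `set_bit_demo()`) are hypothetical; the real libxc code uses its own context structure and bitmap helpers.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static int test_bit_demo(const unsigned long *map, unsigned long nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
}

static void set_bit_demo(unsigned long *map, unsigned long nr)
{
    map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/*
 * Collect the pfns that still need populating into out[], marking each one
 * in the bitmap as it is queued so a repeat within the same batch is skipped.
 */
static size_t collect_unpopulated(const unsigned long *pfns, size_t count,
                                  unsigned long *populated, unsigned long *out)
{
    size_t i, n = 0;

    for (i = 0; i < count; i++) {
        if (test_bit_demo(populated, pfns[i]))
            continue;
        set_bit_demo(populated, pfns[i]);
        out[n++] = pfns[i];
    }
    return n;
}
```

If the bit were only set after the whole batch had been processed, a pfn appearing twice in one batch would pass the check both times, which is the failure mode the entry describes.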
9 years ago libxc: fix indentation
Wei Liu [Sun, 6 Sep 2015 20:05:37 +0000 (21:05 +0100)]
libxc: fix indentation

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago libxc: migration v2 prefix Memory -> Frames
Wei Liu [Sun, 6 Sep 2015 20:05:36 +0000 (21:05 +0100)]
libxc: migration v2 prefix Memory -> Frames

The prefix "Memory" is confusing because the numbers shown after that
are referring to frames.

Change a bunch of prefixes from "Memory" to "Frames". Also rename
send_memory_verify to verify_frames.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago libxc: clearer migration v2 debug message
Wei Liu [Sun, 6 Sep 2015 20:05:35 +0000 (21:05 +0100)]
libxc: clearer migration v2 debug message

Previously the messages looked like:

SAVE:
xc: detail: 32 bits, 3 levels
xc: detail: max_pfn 0xfffff, p2m_frames 1024
xc: detail: max_mfn 0x130000

RESTORE:
xc: detail: max_mfn 0x130000
xc: detail: 32 bits, 3 levels
xc: detail: Expanded p2m from 0 to 0xfffff

It's not immediately clear that the last line in restore messages was
referring to max_pfn. Change the debug message a bit to make that
clearer.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago public/io/netif.h: move and amend multicast control documentation
Paul Durrant [Wed, 2 Sep 2015 11:17:05 +0000 (12:17 +0100)]
public/io/netif.h: move and amend multicast control documentation

netif.h contains a specification of the XEN_NETIF_EXTRA_TYPE_MCAST_{ADD,DEL}
extra info messages required to manipulate a multicast filter list maintained
by a backend and specifies the xenstore negotiation protocol in a comment
just above the structure definition, which is easy to miss.

This patch moves the documentation of the xenstore negotiation to be
co-located with the documentation for other features and also amends the
wording to be clearer.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- added a blank line to the comment ]

9 years ago MAINTAINERS: tools/ocaml: update David Scott's email address
David Scott [Wed, 2 Sep 2015 10:04:41 +0000 (11:04 +0100)]
MAINTAINERS: tools/ocaml: update David Scott's email address

Replace my sometimes unreliable <dave.scott@eu.citrix.com> address with
my reliable permanent address.

Reported-by: Doug Goldstein <cardoe@cardoe.com>
Signed-off-by: David Scott <dave@recoil.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago build: update top-level make help
Doug Goldstein [Tue, 1 Sep 2015 20:10:08 +0000 (15:10 -0500)]
build: update top-level make help

Update the top-level make help to include all the possible targets and
not reference targets that are deprecated while hopefully being more
clear as to what each target does.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago xen/dt: Handle correctly node without interrupt-map in dt_for_each_irq_map
Vijaya Kumar K [Mon, 31 Aug 2015 11:06:18 +0000 (16:36 +0530)]
xen/dt: Handle correctly node without interrupt-map in dt_for_each_irq_map

dt_for_each_irq_map() returns an error if no irq mapping is found.
With this patch, ignore the error and return success.

Signed-off-by: Vijaya Kumar K <Vijaya.Kumar@caviumnetworks.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agotools/xen-access: use PRI_xen_pfn
Wei Liu [Thu, 3 Sep 2015 18:27:47 +0000 (19:27 +0100)]
tools/xen-access: use PRI_xen_pfn

Otherwise when building with 32bit compiler, we get:

xen-access.c: In function 'xenaccess_init':
xen-access.c:263:5: error: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'xen_pfn_t' [-Werror=format]
cc1: all warnings being treated as errors
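
The idea behind a per-type format macro can be sketched as below. The names
demo_pfn_t and PRI_demo_pfn are hypothetical stand-ins, and the sketch assumes
a frame number type that is 64 bits wide; the real definitions of xen_pfn_t and
PRI_xen_pfn depend on the architecture and are not reproduced here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for xen_pfn_t and PRI_xen_pfn; in this sketch the
 * frame number is always 64 bits wide. */
typedef uint64_t demo_pfn_t;
#define PRI_demo_pfn PRIx64

/* Format a frame number as hex into buf; returns the number of characters
 * written (excluding the terminating NUL), as snprintf() does. */
static int format_pfn(char *buf, size_t len, demo_pfn_t pfn)
{
    assert(buf != NULL && len > 0);
    /* The macro expands to the right length modifier for demo_pfn_t, so the
     * same source builds without format warnings on 32-bit and 64-bit hosts. */
    return snprintf(buf, len, "%" PRI_demo_pfn, pfn);
}
```

Hard-coding '%llx' only matches when the type happens to be 'unsigned long
long'; routing the specifier through a macro keeps the format string and the
typedef in step.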

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
9 years agoxen/arm64: do not (incorrectly) limit size of xenheap
Julien Grall [Fri, 4 Sep 2015 12:57:00 +0000 (13:57 +0100)]
xen/arm64: do not (incorrectly) limit size of xenheap

The commit 88e3ed61642bb393458acc7a9bd2f96edc337190 "x86/NUMA: make
init_node_heap() respect Xen heap limit" breaks boot on the arm64 board
X-Gene.

The xenheap bits variable is used to know the last RAM MFN always mapped
in Xen virtual memory. If the value is 0, it means that all the memory is
always mapped in Xen virtual memory.

On X-Gene the RAM bank resides above 128GB and the last xenheap MFN is
0x4400000. With the new way to calculate the number of bits, xenheap_bits
will be equal to 38 bits. This hides all the RAM and makes it impossible
to allocate xenheap memory.

Given that aarch64 always has all the memory mapped in Xen virtual
memory, it's not necessary to call xenheap_max_mfn, which sets the number
of bits.
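
The arithmetic can be sketched as follows. The formula used here
(floor(log2(mfn + 1)) plus the page shift) is an assumption, chosen because it
reproduces the 38 bits quoted above for MFN 0x4400000; the helper names and the
4 KiB page size are likewise assumptions of this sketch, not the Xen code.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* 4 KiB pages, assumed */

/* Index of the highest set bit of v, counted from 1 (0 when v == 0). */
static unsigned int fls64_demo(uint64_t v)
{
    unsigned int n = 0;

    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Address width derived from a "last always-mapped" frame number:
 * floor(log2(mfn + 1)) + page shift. Assumed formula, see the text above. */
static unsigned int heap_bits_for_mfn(uint64_t mfn)
{
    assert(mfn != UINT64_MAX);
    return fls64_demo(mfn + 1) - 1 + DEMO_PAGE_SHIFT;
}

/* Non-zero when the whole frame lies below the 2^bits address limit. */
static int mfn_below_limit(uint64_t mfn, unsigned int bits)
{
    return ((mfn + 1) << DEMO_PAGE_SHIFT) <= ((uint64_t)1 << bits);
}
```

With these helpers, heap_bits_for_mfn(0x4400000) gives 38, and
mfn_below_limit(0x4400000, 38) is false: the frame starts at address
0x4400000000, above the 2^38 limit derived from it.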

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoMerge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging
Ian Campbell [Wed, 2 Sep 2015 13:27:01 +0000 (14:27 +0100)]
Merge branch 'staging' of ssh://xenbits.xen.org/home/xen/git/xen into staging

9 years agotmem: Spelling and full stop surgery.
Konrad Rzeszutek Wilk [Mon, 31 Aug 2015 15:14:14 +0000 (11:14 -0400)]
tmem: Spelling and full stop surgery.

I could not help myself.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agotmem: Remove extra spaces at end and some hard tabbing.
Konrad Rzeszutek Wilk [Fri, 21 Aug 2015 14:48:25 +0000 (10:48 -0400)]
tmem: Remove extra spaces at end and some hard tabbing.

My editor marks these in glowing red so removing them to
make it easier to focus on code.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotmem: Use 'struct xen_tmem_oid' in tmem_handle and move it to sysctl header.
Konrad Rzeszutek Wilk [Mon, 31 Aug 2015 15:13:50 +0000 (11:13 -0400)]
tmem: Use 'struct xen_tmem_oid' in tmem_handle and move it to sysctl header.

Instead of the three member uint64_t structure.

The structure is used by the control stack for
XEN_SYSCTL_TMEM_OP_SAVE_GET_NEXT_[PAGE|INV] only so
moving it to the sysctl header.

Also modified tmemc_save_get_next_page to deal with
the new type - and converted some of the on-stack
usage of the array to use a pointer.

Further work will be to make the xen_sysctl_tmem_op have
a union with proper type for the two: ..GET_NEXT_[PAGE|INV]
operations.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agotmem: Use 'struct xen_tmem_oid' for every user.
Konrad Rzeszutek Wilk [Mon, 31 Aug 2015 15:00:29 +0000 (11:00 -0400)]
tmem: Use 'struct xen_tmem_oid' for every user.

Patch "tmem: Make the uint64_t oid[3] a proper structure:
xen_tmem_oid" converted the sysctl API to use a
proper structure. But it did not do it for the tmem hypercall.

This expands that and converts the tmem hypercall. For this
to work we define the struct in tmem.h and include it in
sysctl.h.

This change also included work to make the compat layer
happy. That was to declare the struct xen_tmem_oid to be
checked in xlat.lst - which will construct a typedef
in the compat file with the same type, hence allowing
copying of the 'oid' member without type issues. The kicker
is that the compat layer adds the prefix 'xen' and since
our structure already has it - we must not include it.

The layout (and size) of this structure in memory for the
'struct tmem_op' (so guest facing) is the same! Verified
via pahole and with 32/64 bit guests.

--- /tmp/old    2015-08-27 16:34:00.535638730 -0400
+++ /tmp/new    2015-08-27 16:34:10.447705328 -0400
@@ -8,7 +8,7 @@
                        uint32_t   arg1;                 /*    28     4 */
                } creat;                                 /*          24 */
                struct {
-                       uint64_t   oid[3];               /*     8    24 */
+                       xen_tmem_oid_t oid;              /*     8    24 */
                        uint32_t   index;                /*    32     4 */
                        uint32_t   tmem_offset;          /*    36     4 */
                        uint32_t   pfn_offset;           /*    40     4 */
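
The same layout property can also be checked at compile time. A minimal sketch,
assuming a wrapper struct that holds nothing but the three-element array; the
demo_* types are hypothetical and are not copied from the Xen headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper type for illustration only. */
struct demo_tmem_oid {
    uint64_t oid[3];
};

/* Old and new shape of a hypothetical enclosing structure. */
struct demo_op_old {
    uint64_t header;
    uint64_t oid[3];
    uint32_t index;
};

struct demo_op_new {
    uint64_t header;
    struct demo_tmem_oid oid;
    uint32_t index;
};

/* Wrapping the array in a struct changes neither sizes nor offsets. */
static_assert(sizeof(struct demo_tmem_oid) == sizeof(uint64_t[3]),
              "wrapper adds no padding");
static_assert(offsetof(struct demo_op_old, index) ==
              offsetof(struct demo_op_new, index),
              "following member stays in place");
static_assert(sizeof(struct demo_op_old) == sizeof(struct demo_op_new),
              "total size unchanged");
```

If a later change added a member to the wrapper, the build would fail instead
of silently changing a guest-facing layout.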

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agotmem: Make the uint64_t oid[3] a proper structure: xen_tmem_oid
Konrad Rzeszutek Wilk [Mon, 31 Aug 2015 14:57:27 +0000 (10:57 -0400)]
tmem: Make the uint64_t oid[3] a proper structure: xen_tmem_oid

And use it almost everywhere. It is easy to use it for the
sysctl since the hypervisor and toolstack are intertwined.

But for the tmem hypercall we need to be diligent (as it
is guest facing) so delaying that to another patch:
"tmem: Use 'struct xen_tmem_oid' for every user" to help
with bisection issues.

We also move some of the parameters on functions to be within
the right location.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agotmem: Remove the old tmem control XSM checks as it is part of sysctl hypercall.
Konrad Rzeszutek Wilk [Wed, 26 Aug 2015 21:36:21 +0000 (17:36 -0400)]
tmem: Remove the old tmem control XSM checks as it is part of sysctl hypercall.

The sysctl is where the tmem control operations are done and the
XSM checks are done via there. The old mechanism (to check
for control tmem op XSM from do_tmem_op) is not needed anymore.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
9 years agotmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.
Konrad Rzeszutek Wilk [Wed, 26 Aug 2015 22:04:12 +0000 (18:04 -0400)]
tmem: Move TMEM_CONTROL subop of tmem hypercall to sysctl.

The operations are to be used by a control domain to set parameters,
list pools, clients, and to be used during migration.

There is no need to have them in the tmem hypercall path.

This patch moves code without adding fixes - and in fact in
some cases makes the parameters so long that they hurt the eyes - but
that is for another patch.

Note that in regards to existing users:

 - Only the control domain could call it - which meant that if
   a guest called it, it would get -EPERM, so we are OK there.
   In practice no guests called this TMEM_CONTROL command.

 - The spec: https://oss.oracle.com/projects/tmem/dist/documentation/api/tmemspec-v001.pdf
   mentions: "TBD [Not sure if this is really needed.]"
   which is a carte blanche as any to do this!

Note: The XSM check is the same - we just move it from do_tmem_op
to do_sysctl.

We also add a 32-bit pad to make the sysctl structure have the same
exact size under 32 and 64-bit toolstacks and not worry about alignment
issues.
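
Why an explicit pad helps can be sketched with a hypothetical structure (this
is not the real sysctl layout). It assumes the usual x86 ABIs, where a uint64_t
member is 4-byte aligned inside a struct for 32-bit builds and 8-byte aligned
for 64-bit builds:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure. Without the pad field, a 32-bit build could place
 * 'buf' at offset 12 while a 64-bit build would place it at offset 16. */
struct demo_ctl {
    uint32_t cmd;
    uint32_t cli_id;
    uint32_t arg1;
    uint32_t pad;   /* explicit padding keeps 'buf' 8-byte aligned everywhere */
    uint64_t buf;   /* stand-in for a 64-bit handle field */
};

static_assert(offsetof(struct demo_ctl, buf) == 16, "buf offset is fixed");
static_assert(sizeof(struct demo_ctl) == 24, "size is fixed");
```

With the pad in place the compiler has no implicit padding left to insert, so
both toolstack bitnesses see the same 24-byte structure.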

And the XLAT does not need to deal with the buf as it has been
moved to another structure which is 32/64 fixed.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agotmem: Remove xc_tmem_control mystical arg3
Konrad Rzeszutek Wilk [Wed, 26 Aug 2015 21:43:31 +0000 (17:43 -0400)]
tmem: Remove xc_tmem_control mystical arg3

It mentions it but it is never used. The hypercall interface
knows nothing of this sort of thing either. Let's just remove it.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com> [release + toolstack]
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agotmem: Remove in xc_tmem_control_oid duplicate set_xen_guest_handle call
Konrad Rzeszutek Wilk [Wed, 26 Aug 2015 21:41:34 +0000 (17:41 -0400)]
tmem: Remove in xc_tmem_control_oid duplicate set_xen_guest_handle call

We are doing another call to set_xen_guest_handle right
after the xc_hypercall_bounce_pre (the correct place to do it).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotmem: Add ASSERT in obj_rb_insert for pool->rwlock lock.
Konrad Rzeszutek Wilk [Fri, 21 Aug 2015 14:49:22 +0000 (10:49 -0400)]
tmem: Add ASSERT in obj_rb_insert for pool->rwlock lock.

Manipulating the obj-> structures requires us to hold the
pool->rwlock lock. Let's make that obvious in this function to
catch any errant users (none found, but we may in future).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotmem: Don't crash/hang/leak hypervisor when using shared pools within an guest.
Konrad Rzeszutek Wilk [Wed, 26 Aug 2015 17:15:03 +0000 (13:15 -0400)]
tmem: Don't crash/hang/leak hypervisor when using shared pools within an guest.

This is a regression introduced by a36590e1b5040838af19d2ca712a516f07a6062b
"tmem: reorg the shared pool allocate path".

When we are using shared pools we have a global array
(on which we put the pool), and an array of pools per domain.
We also have a shared list of clients (guests) _except_
for the very first domain that created the shared pool.

To deal with multiple guests using a shared pool we have a
ref count and a linked list. Whenever a new user of
the shared pool joins we increase the ref count and add to
the linked list. Whenever a user quits the shared pool
we decrement and remove from the linked list.

Unfortunately this ref counting and linked list never
worked properly. There are multiple issues:

 1) If we have one shared pool (and only one guest creating it)
    - we do not add it to the shared list of clients. Which
    means the logic in 'shared_pool_quit' never removed
    the pool from global_shared_pools. That meant when the pool
    was de-allocated - we still had a pointer to the pool
    which would be accessed by tmemc_list_client (xl tmem-list -a)
    and hit a NULL page!

 2) If we have two shared pools in a domain - it (shared_pool_quit)
    would remove the domain from the share_list linked list, decrement
    the refcount to zero - and remove the pool from the global shared pool.
    When done it would also clear the client->pools[] to NULL for itself.
    Which is good. However since there are two shared pools in the domain
    the next entry in the client->pools[] would have a stale pointer to
    the just de-allocated pool. Accessing it and trying to de-allocate it
    would lead to crashes or hypervisor hang - depending on the build.

Fun times!

To trigger this use
http://xenbits.xen.org/gitweb/?p=xentesttools/bootstrap.git;a=blob;f=root_image/drivers/tmem_test/tmem_test.c

This patch fixes it by making the very first domain that created
a shared pool follow the same logic as every domain that is
joining a shared pool. That is, increment the refcount and also
add itself to the shared list of domains using it.

We also remove an ASSERT that incorrectly assumed
that only one shared pool would exist for a domain.

And to mirror the reporting logic in shared_pool_join
we also add a printk to advertise inter-domain shared pool
joining.
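
The rule that the creator joins like everyone else can be sketched with a toy
model. All names below are hypothetical and the fixed-size array stands in for
the linked list; it is not the tmem code:

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_CLIENTS 8

/* Toy model of a shared pool: a reference count plus the domains using it. */
struct demo_shared_pool {
    int refcount;
    int nr_clients;
    int clients[DEMO_MAX_CLIENTS];
};

/* Every user, including the creator, takes a reference and is listed. */
static bool demo_pool_join(struct demo_shared_pool *p, int domid)
{
    if (p->nr_clients == DEMO_MAX_CLIENTS)
        return false;
    p->clients[p->nr_clients++] = domid;
    p->refcount++;
    return true;
}

/* Drop a user; returns true when the last reference went away and the
 * caller should unlink the pool from the global table and free it. */
static bool demo_pool_quit(struct demo_shared_pool *p, int domid)
{
    for (int i = 0; i < p->nr_clients; i++) {
        if (p->clients[i] != domid)
            continue;
        /* Fill the hole with the last entry; order does not matter here. */
        p->clients[i] = p->clients[--p->nr_clients];
        assert(p->refcount > 0);
        return --p->refcount == 0;
    }
    return false; /* not a user of this pool: nothing to release */
}
```

Because the creator is counted and listed too, the last demo_pool_quit() call
always sees the count reach zero, whichever domain it comes from, and the
global reference can be cleared before the pool is freed.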

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools: Honor Config.mk debug value, rather than setting our own
George Dunlap [Wed, 2 Sep 2015 09:34:55 +0000 (10:34 +0100)]
tools: Honor Config.mk debug value, rather than setting our own

Changeset 1166ecf ('tools/Rules.mk: Don't optimize debug builds; add
macro debugging information') exposed a bug whereby the autoconf stuff
in tools was setting its own debug value (defaulting to ENABLED, even
for releases) instead of honoring the value set in Config.mk.

After that changeset, if the global build has -D_FORTIFY_SOURCE
enabled (as is the default in CentOS 7 rpmbuild), then the tools build
will fail (because debug builds default to on).

There should be only one place to specify whether to build debug or
not, and Config.mk is already included by the relevant makefiles.  So
simply remove the tools/configure debug option and everything falls
into place naturally.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: mm: Do not dump the p2m when mapping a foreign gfn
Julien Grall [Thu, 13 Aug 2015 11:03:43 +0000 (12:03 +0100)]
xen/arm: mm: Do not dump the p2m when mapping a foreign gfn

The physmap operation XENMAPSPACE_gmfn_foreign is dumping the p2m when
an error occurs by calling dump_p2m_lookup. But this function is not
using ratelimited printk.

Any domain able to map foreign gmfn would be able to flood the Xen
console.

The information wasn't useful, so drop it.

This is XSA-141.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoocaml/xs: prefer using character device
Doug Goldstein [Tue, 1 Sep 2015 18:34:02 +0000 (13:34 -0500)]
ocaml/xs: prefer using character device

Since 9c89dc95201ffed5fead17b35754bf9440fdbdc0 libxenstore prefers using
/dev/xen/xenbus over /proc/xen/xenbus. This makes the OCaml xenstore
library contain the same preference.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: David Scott <dave.scott@citrix.com>
9 years agoUpdate QEMU_UPSTREAM_REVISION for rc3
Ian Jackson [Tue, 1 Sep 2015 17:24:57 +0000 (18:24 +0100)]
Update QEMU_UPSTREAM_REVISION for rc3

As requested by Wei.

9 years agox86/mm: make {set,clear}_identity_p2m_mapping() work for PV guests
Jan Beulich [Tue, 1 Sep 2015 14:51:44 +0000 (16:51 +0200)]
x86/mm: make {set,clear}_identity_p2m_mapping() work for PV guests

Namely Dom0 suffers from commit 5ae03990c1 ("xen/vtd: create RMRR
mapping") having removed the creation of such mappings for non-
translated guests.

Reported-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/NUMA: make init_node_heap() respect Xen heap limit
Jan Beulich [Tue, 1 Sep 2015 12:02:57 +0000 (14:02 +0200)]
x86/NUMA: make init_node_heap() respect Xen heap limit

On NUMA systems, where we try to use node local memory for the basic
control structures of the buddy allocator, this special case needs to
take into consideration a possible address width limit placed on the
Xen heap. In turn this (but also other, more abstract considerations)
requires that xenheap_max_mfn() not be called more than once (at most
we might permit it to be called a second time with a larger value than
was passed the first time), and be called only before calling
end_boot_allocator().

While inspecting all the involved code, a couple of off-by-one issues
were found (and are being corrected here at once):
- arch_init_memory() cleared one too many page table slots
- the highmem_start based invocation of xenheap_max_mfn() passed too
  big a value
- xenheap_max_mfn() calculated the wrong bit count in edge cases

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxenstore: prefer using the character device
Jonathan Creekmore [Thu, 27 Aug 2015 14:04:38 +0000 (09:04 -0500)]
libxenstore: prefer using the character device

With the addition of FMODE_ATOMIC_POS in the Linux 3.14 kernel,
concurrent blocking file accesses to a single open file descriptor can
cause a deadlock trying to grab the file position lock. If a watch has
been set up, causing a read_thread to perform a blocking read on the file
descriptor, then future writes that would cause the background read to
complete will block waiting on the file position lock before they can
execute. This race condition only occurs when libxenstore is accessing
the xenstore daemon through the /proc/xen/xenbus file and not through
the unix domain socket, which is the case when the xenstore daemon is
running as a stub domain or when oxenstored is passed
--disable-socket. Accessing the daemon from the true character device
also does not exhibit this problem.

On Linux, prefer using the character device file over the proc file if
the character device exists.
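
The preference can be sketched as a small helper. This is a minimal sketch, not
the libxenstore implementation; the function name is hypothetical and the two
paths are passed in so the logic can be exercised on a machine without Xen:

```c
#include <assert.h>
#include <unistd.h>

/* Return dev_path when it exists, otherwise fall back to proc_path. */
static const char *pick_xenbus_path(const char *dev_path, const char *proc_path)
{
    assert(dev_path != NULL && proc_path != NULL);
    /* access(F_OK) only checks existence; the later open() can still fail. */
    return access(dev_path, F_OK) == 0 ? dev_path : proc_path;
}
```

A caller would use it as
pick_xenbus_path("/dev/xen/xenbus", "/proc/xen/xenbus") and open the result.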

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agopython/xc: add missing Py_DECREF() to fix a memory leak
Zhigang Wang [Fri, 28 Aug 2015 21:35:18 +0000 (17:35 -0400)]
python/xc: add missing Py_DECREF() to fix a memory leak

PyList_Append() increases the reference count of the item. We have to
decrease the reference count afterwards so that the item can be garbage
collected.

We missed the Py_DECREF() for 'cpuinfo_obj' here, thus we have a memory leak.
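
The ownership rule can be illustrated with a toy model in plain C. The toy_*
names are hypothetical and only mimic the behaviour described above; this is
not the CPython API:

```c
#include <assert.h>
#include <stdlib.h>

/* Toy reference-counted object and fixed-size list. */
struct toy_obj { int refcnt; };

struct toy_list { struct toy_obj *items[16]; int len; };

static struct toy_obj *toy_new(void)
{
    struct toy_obj *o = malloc(sizeof(*o));

    if (o)
        o->refcnt = 1;          /* the creator owns one reference */
    return o;
}

/* Drop one reference; returns 1 if the object was freed, 0 otherwise. */
static int toy_decref(struct toy_obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        free(o);
        return 1;
    }
    return 0;
}

/* Like the append described above: the list takes its own reference. */
static int toy_list_append(struct toy_list *l, struct toy_obj *o)
{
    if (l->len == 16)
        return -1;
    l->items[l->len++] = o;
    o->refcnt++;
    return 0;
}
```

After toy_new() and toy_list_append() the count is 2. If the creator never
calls toy_decref(), dropping the list's reference later leaves the count at 1
and the object is never freed, which is the shape of the leak fixed here.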

The memory leak could be easily confirmed by:

  # python
  >>> import xen.lowlevel.xc
  >>> xc = xen.lowlevel.xc.xc()
  >>> for i in range(1000): xc.getcpuinfo(1)

And check the python process memory usage before and after:

  # ps f -o vsize,rss,%mem,size,cmd -p <pid>

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/NUMA: don't account hotplug regions
Jan Beulich [Mon, 31 Aug 2015 11:52:24 +0000 (13:52 +0200)]
x86/NUMA: don't account hotplug regions

... except in cases where they really matter: node_memblk_range[] now
is the only place all regions get stored. nodes[] and NODE_DATA() track
present memory only. This improves the reporting when nodes have
disjoint "normal" and hotplug regions, with the hotplug region sitting
above the highest populated page. In such cases a node's spanned-pages
value (visible in both XEN_SYSCTL_numainfo and 'u' debug key output)
covered all the way up to top of populated memory, giving quite
a different picture from what an otherwise identically configured
system without any hotplug regions would report. Note, however, that
the actual hotplug case (as well as cases of nodes with multiple
disjoint present regions) is still not being handled such that the
reported values would represent how much memory a node really has (but
that can be considered intentional).

Reported-by: Jim Fehlig <jfehlig@suse.com>
This at once makes nodes_cover_memory() no longer consider E820_RAM
regions covered by SRAT hotplug regions.

Also reject self-overlaps with mismatching hotplug flags.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/NUMA: fix setup_node()
Jan Beulich [Mon, 31 Aug 2015 11:51:52 +0000 (13:51 +0200)]
x86/NUMA: fix setup_node()

The function referenced an __initdata object (nodes_found). Since this
being a node mask was more complicated than needed, the variable gets
replaced by a simple counter. Check at once that the count of nodes
doesn't go beyond MAX_NUMNODES.

Also consolidate four printk()s related to the function's use into just
one.

Finally (quite the opposite of the above issue) __init-annotate
nodes_cover_memory().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86: adjustments to memory_add()
Jan Beulich [Mon, 31 Aug 2015 11:50:56 +0000 (13:50 +0200)]
x86: adjustments to memory_add()

The function should clean up after a failed map_pages_to_xen().

Sharing the M2P table with Dom0 needs to happen before adding the new
pages to the heap (so pages handed out by the allocator will be
represented in what a tool stack may need to map).

Avoid the IOMMU mapping loop whenever possible.

Drop a redundant setting of 'ret'.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoQEMU_TAG update
Ian Jackson [Fri, 28 Aug 2015 15:01:55 +0000 (16:01 +0100)]
QEMU_TAG update

9 years agox86/vmx: fix vmx_is_singlestep_supported return value
Tamas K Lengyel [Fri, 28 Aug 2015 10:17:05 +0000 (12:17 +0200)]
x86/vmx: fix vmx_is_singlestep_supported return value

The function is supposed to return a boolean but instead it returned
the value 0x8000000, which is the Intel internal flag for MTF. This has
caused various checks using this function to falsely report no MTF
capability.
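
One plausible way such a value ends up reading as "false" is narrowing. The
sketch below assumes the result passes through an 8-bit boolean type; that
assumption, and all demo_* and mtf_* names, are illustrative and not taken
from the patch:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MTF_FLAG 0x08000000u   /* bit 27, the flag value quoted above */

/* Assumed for this sketch: an 8-bit boolean typedef. */
typedef unsigned char demo_bool_t;

static_assert((DEMO_MTF_FLAG & 0xffu) == 0, "flag lies outside the low byte");

/* Buggy shape: the masked value is narrowed to 8 bits, so bit 27 is lost. */
static demo_bool_t mtf_supported_raw(uint32_t caps)
{
    return (demo_bool_t)(caps & DEMO_MTF_FLAG);
}

/* Fixed shape: normalise to 0 or 1 before narrowing. */
static demo_bool_t mtf_supported(uint32_t caps)
{
    return !!(caps & DEMO_MTF_FLAG);
}
```

With the flag set, mtf_supported_raw() yields 0 while mtf_supported() yields 1,
which matches the symptom of checks reporting no MTF capability.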

Signed-off-by: Tamas K Lengyel <tlengyel@novetta.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: Fix installation of man8 pages
Andrew Cooper [Thu, 27 Aug 2015 19:13:16 +0000 (20:13 +0100)]
docs: Fix installation of man8 pages

c/s a430436 "docs: Support for generating man(8) pages" accidentally
failed to update the install and clean rules for man8 pages, meaning
that c/s 7b21214 "docs: Move xentrace.8 to docs/man/xentrace.pod.8"
caused a packaging regression when it came to xentop.8.

To avoid similar bugs in the future, move the generation of the build,
install and clean rules into the manpage metarule.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agodocs: Move xentrace.8 to docs/man/xentrace.pod.8
Andrew Cooper [Wed, 26 Aug 2015 09:15:20 +0000 (09:15 +0000)]
docs: Move xentrace.8 to docs/man/xentrace.pod.8

And transform to POD to match our other manpages.

The content is identical, although the formatting was altered slightly
to conform to more usual manpage layout.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: Support for generating man(8) pages
Andrew Cooper [Wed, 26 Aug 2015 09:15:20 +0000 (09:15 +0000)]
docs: Support for generating man(8) pages

The manpage rules are very repetitive, because of the section number being
present in the filenames.

Instead of adding another set of 3 rules, switch to using a metarule to
automate the repetitive action.  New rules for different manpage sections can
be added simply by extending the $(foreach).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: Move xentrace_format.1 to docs/man/xentrace_format.pod.1
Andrew Cooper [Wed, 26 Aug 2015 09:15:12 +0000 (09:15 +0000)]
docs: Move xentrace_format.1 to docs/man/xentrace_format.pod.1

And transform to POD to match our other manpages.

The content is identical (other than one correction), although the
layout differs slightly with certain indentation.

As a correction, remove the reference to xentrace_cpusplit(1) which was
removed in c/s 9b9ca98b6ab16, more than 10 years ago!

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: Move xentop.1 to docs/man/xentop.pod.1
Andrew Cooper [Wed, 26 Aug 2015 09:15:20 +0000 (09:15 +0000)]
docs: Move xentop.1 to docs/man/xentop.pod.1

And transform to POD to match our other manpages.

The content is identical, although the layout differs slightly with
certain indentation.

In addition, adjust the MAN{1,5}SRC-y `find` runes to be more general.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: Move misc README's into docs/misc/
Andrew Cooper [Wed, 26 Aug 2015 09:15:20 +0000 (09:15 +0000)]
docs: Move misc README's into docs/misc/

To live with the other documentation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild: use correct qemu emulator binary
Doug Goldstein [Tue, 25 Aug 2015 13:49:49 +0000 (13:49 +0000)]
build: use correct qemu emulator binary

Per http://wiki.qemu.org/ChangeLog/1.0 and the fact that no currently
supported distro ships the x86 system emulator binary as 'qemu', this
changes the default when a user specifies --with-system-qemu without a
PATH to 'qemu-system-i386', otherwise the default results in a
non-functional setup.

[ Reran autogen.sh -iwj ]

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild: fix tarball stubdom build
Wei Liu [Thu, 27 Aug 2015 15:54:01 +0000 (16:54 +0100)]
build: fix tarball stubdom build

When we create a source code tarball, mini-os is extracted to
extras/mini-os directory. When building a source code tarball, we
shouldn't clone mini-os again.

Only clone mini-os when that directory doesn't exist. This fixes tarball
build and doesn't affect non-tarball build.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
9 years agoIOMMU: skip domains without page tables when dumping
Jan Beulich [Thu, 27 Aug 2015 15:40:38 +0000 (17:40 +0200)]
IOMMU: skip domains without page tables when dumping

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/MSI: don't flag non-maskable entries as host-masked
Jan Beulich [Thu, 27 Aug 2015 15:39:37 +0000 (17:39 +0200)]
x86/MSI: don't flag non-maskable entries as host-masked

'M' debug key output looks confusing without this adjustment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoetherboot: Build fix for GCC 5.1.1
Konrad Rzeszutek Wilk [Mon, 24 Aug 2015 19:48:58 +0000 (15:48 -0400)]
etherboot: Build fix for GCC 5.1.1

Specifically we are pulling in the upstream patch (commit
1b56452121672e6408c38ac8926bdd6998a39004):
[ath9k] Remove confusing logic inversion in an ANI variable

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>