xenbits.xensource.com Git - people/julieng/xen-unstable.git/log
9 years ago libxc: allow creating domains without emulated devices
Roger Pau Monné [Tue, 15 Dec 2015 13:12:18 +0000 (14:12 +0100)]
libxc: allow creating domains without emulated devices

Introduce a new flag in xc_dom_image that turns on and off the emulated
devices. This prevents creating the VGA hole, the hvm_info page and the
ioreq server pages. libxl unconditionally sets it to true for all HVM
domains at the moment.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86: allow disabling all emulated devices inside of Xen
Roger Pau Monné [Tue, 15 Dec 2015 13:11:49 +0000 (14:11 +0100)]
x86: allow disabling all emulated devices inside of Xen

Only allow enabling or disabling all the emulated devices inside of Xen;
right now Xen doesn't support enabling specific emulated devices only.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86: set the vPMU interface based on the presence of a lapic
Roger Pau Monné [Tue, 15 Dec 2015 13:11:11 +0000 (14:11 +0100)]
x86: set the vPMU interface based on the presence of a lapic

Instead of choosing the interface to expose to guests based on the guest
type, do it based on whether the guest has an emulated local apic or not.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years ago xen: arm: Drop trailing ; from DEFINE_XEN_GUEST_HANDLE
Ian Campbell [Mon, 14 Dec 2015 16:21:31 +0000 (16:21 +0000)]
xen: arm: Drop trailing ; from DEFINE_XEN_GUEST_HANDLE

This is always present at the point of use, which with -pedantic
provokes:

error: ISO C does not allow extra ';' outside of a function [-Werror=pedantic]

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
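
As an illustration of the issue described above, here is a minimal sketch using a hypothetical macro, DEFINE_DEMO_HANDLE, in place of the real DEFINE_XEN_GUEST_HANDLE. The macro body deliberately ends without a semicolon, so the one written at the point of use is the only one and -pedantic has nothing to complain about. All names in the sketch are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for the real macro: the definition ends WITHOUT a
 * semicolon, so the one written at the point of use terminates it. */
#define DEFINE_DEMO_HANDLE(name) \
    typedef struct { name *p; } demo_handle_##name

typedef unsigned long demo_ulong;

/* Expands to exactly one declaration followed by exactly one ';'. */
DEFINE_DEMO_HANDLE(demo_ulong);

/* Small helper so the handle type is actually exercised. */
demo_ulong demo_handle_read(demo_handle_demo_ulong h)
{
    assert(h.p != 0);
    return *h.p;
}
```

Had the macro body itself ended in ';', the use above would expand to two semicolons at file scope, which is what ISO C rejects.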
9 years ago libxl: re-implement libxl__xs_printf()
Paul Durrant [Tue, 1 Dec 2015 13:55:25 +0000 (13:55 +0000)]
libxl: re-implement libxl__xs_printf()

This patch adds a new libxl__xs_vprintf() which actually checks the
success of the underlying call to xs_write() (logging if it fails) and
then re-implements libxl__xs_printf() using this (and replacing the
call to vasprintf() with a call to libxl__vsprintf()).

libxl__xs_vprintf() is added to the 'checked' section of libxl_internal.h
and, since it now underpins libxl__xs_printf(), that declaration is
moved into the same section.

Looking at call sites of libxl__xs_printf() it seems as though several
of them expected a failure if the underlying xs_write() failed, so this
patch should actually fulfil the semantic that was intended all along.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
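
The layering described above (a va_list variant that checks the write, with the printf-style function built on top of it) can be sketched roughly as follows. This is not the libxl code: demo_store_write() is a hypothetical stand-in for xs_write(), and every demo_* name is an assumption made for the example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical backing store standing in for xenstore. */
static char demo_store[128];

/* Returns 1 on success and 0 on failure; fails on an empty path. */
static int demo_store_write(const char *path, const char *data, size_t len)
{
    if (path == NULL || path[0] == '\0' || len >= sizeof(demo_store))
        return 0;
    memcpy(demo_store, data, len);
    demo_store[len] = '\0';
    return 1;
}

/* va_list variant: formats, writes, and reports failure to the caller. */
int demo_vprintf(const char *path, const char *fmt, va_list ap)
{
    char buf[128];
    int n;

    assert(fmt != NULL);
    n = vsnprintf(buf, sizeof(buf), fmt, ap);
    if (n < 0 || (size_t)n >= sizeof(buf))
        return -1;                      /* formatting error or truncation */
    if (!demo_store_write(path, buf, (size_t)n)) {
        fprintf(stderr, "write to %s failed\n", path ? path : "(null)");
        return -1;                      /* logged and propagated */
    }
    return 0;
}

/* printf-style wrapper implemented on top of the va_list variant. */
int demo_printf(const char *path, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = demo_vprintf(path, fmt, ap);
    va_end(ap);
    return rc;
}
```

The point of the split is that callers of the printf-style function now see a failure when the underlying write fails, instead of the result being silently dropped.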
9 years ago libxl: re-name libxl__xs_write() to libxl__xs_printf()...
Paul Durrant [Tue, 1 Dec 2015 13:55:24 +0000 (13:55 +0000)]
libxl: re-name libxl__xs_write() to libxl__xs_printf()...

...to denote what it actually does.

The name libxl__xs_write() suggests something taking a buffer and length,
akin to write(2), whereas the semantics of the function are actually more
akin to printf(3).

This patch is a textual substitution of libxl__xs_write with
libxl__xs_printf with some associated formatting fixes.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
9 years ago xen/arm: p2m: Remove translation table when it's empty
Julien Grall [Tue, 1 Dec 2015 17:52:12 +0000 (17:52 +0000)]
xen/arm: p2m: Remove translation table when it's empty

Currently, the translation table is left in place even if no entries
are in use. Because of how the p2m code has been implemented,
replacing a translation table by a block (i.e. a superpage) is not
supported. Therefore, any remapping of a superpage size will be split
into smaller chunks, making the translation less efficient.

Replacing a table by a block when a new mapping is added would be too
complicated because it requires us to check if all the upper levels
are not in use and free them if necessary.

Instead, we will remove the empty translation table when mappings are
removed. To avoid going through the whole table to check whether any
entry is in use, a counter representing the number of entries currently
in use is kept per translation table and updated when an entry changes
state (i.e. valid <-> invalid).

As Xen allocates a page for each translation table, it's possible to
store the counter in the struct page_info. A new field p2m_refcount
has been introduced in the in use union for this purpose. This is fine
as the page is only used by the P2M code and nobody touches the other
field of the union type_info.

For the record, type_info has not been used because it would require
more work to use it properly as Xen on ARM doesn't yet have the
concept of type.

Once Xen has finished removing a mapping and all the references to
each translation table have been updated, then the higher levels will
be processed and freed as needed. This will allow us to propagate the
number of references and free multiple translation tables at different
levels in one go.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- updated commit message as discussed ]
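
The per-table counter described above can be illustrated with a small sketch. It is a simplified model rather than the Xen implementation: struct demo_table, DEMO_ENTRIES and the helper names are hypothetical, and the counter lives in the table itself instead of in struct page_info.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_ENTRIES 512   /* entries per table; an assumption for the sketch */

/* Hypothetical translation table with a per-table count of valid entries. */
struct demo_table {
    bool valid[DEMO_ENTRIES];
    unsigned int refcount;
};

/* Change one entry, adjusting the counter only on a valid <-> invalid flip. */
void demo_set_entry(struct demo_table *t, unsigned int idx, bool valid)
{
    assert(idx < DEMO_ENTRIES);
    if (t->valid[idx] == valid)
        return;                 /* no state change, counter untouched */
    t->valid[idx] = valid;
    if (valid)
        t->refcount++;
    else
        t->refcount--;
}

/* An empty table can be detected without scanning all of its entries. */
bool demo_table_is_empty(const struct demo_table *t)
{
    return t->refcount == 0;
}
```

With the counter in place, deciding whether a table can be freed after a removal is a single comparison rather than a walk over every entry.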

9 years ago xen/arm: p2m: Introduce a helper to remove an entry in the page table
Julien Grall [Tue, 1 Dec 2015 17:52:11 +0000 (17:52 +0000)]
xen/arm: p2m: Introduce a helper to remove an entry in the page table

Factorize the code to remove an entry in p2m_remove_pte so we can re-use
it later.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago xen/arm: p2m: Store the page for each mapping
Julien Grall [Tue, 1 Dec 2015 17:52:10 +0000 (17:52 +0000)]
xen/arm: p2m: Store the page for each mapping

The page will be used later for reference counting, so we need quick
access to the page associated with the mapping.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago xen/arm: p2m: Flush for every exit paths in apply_p2m_changes
Julien Grall [Tue, 1 Dec 2015 17:52:09 +0000 (17:52 +0000)]
xen/arm: p2m: Flush for every exit paths in apply_p2m_changes

Currently, the TLB is not flushed if an error occurred while updating the
stage-2 p2m. However, the TLB will contain stale mappings for any entry
updated so far.

To avoid such a situation, flush on every exit path when the variable
"flush" is set.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
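
A minimal sketch of the "flush on every exit path" pattern follows. It does not reproduce apply_p2m_changes(): the function, the failure injection via fail_at, and the flush counter are all hypothetical and exist only to make the control flow observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts flushes so the behaviour can be observed; hypothetical stand-in. */
unsigned int demo_flush_count;

static void demo_flush_tlb(void)
{
    demo_flush_count++;
}

/*
 * Update `count` entries, failing at index `fail_at` (pass a negative value
 * for no failure). Every exit goes through `out`, so entries already
 * changed are flushed even when the loop stops early.
 */
int demo_apply_changes(bool *entries, int count, int fail_at)
{
    bool flush = false;
    int rc = 0;

    assert(entries != NULL || count == 0);
    for (int i = 0; i < count; i++) {
        if (i == fail_at) {
            rc = -1;
            goto out;
        }
        entries[i] = true;
        flush = true;           /* something changed, a flush is now owed */
    }

out:
    if (flush)
        demo_flush_tlb();
    return rc;
}
```

Returning directly from the error branch instead of jumping to `out` would recreate the problem the commit describes: updated entries with no matching flush.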
9 years ago VT-d: Correct order of parameters to memset() in setup_posted_irte()
Andrew Cooper [Thu, 10 Dec 2015 16:25:18 +0000 (17:25 +0100)]
VT-d: Correct order of parameters to memset() in setup_posted_irte()

Introduced in c/s 83ea9229 "vt-d: add API to update IRTE when VT-d PI is
used".  Spotted by Coverity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
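
For readers unfamiliar with this class of bug, the sketch below shows the correct memset() argument order on a hypothetical 16-byte structure. The commit message does not say which arguments were swapped; the comment assumes the common case of the fill value and the length being exchanged.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical 16-byte entry; the real IRTE layout is not reproduced here. */
struct demo_entry {
    unsigned char raw[16];
};

/* Correct order: destination, fill byte, then length. */
void demo_clear_entry(struct demo_entry *e)
{
    assert(e != NULL);
    memset(e, 0, sizeof(*e));
    /* The swapped form, memset(e, sizeof(*e), 0), compiles but writes zero
     * bytes, leaving the entry untouched. */
}
```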
9 years ago sched: fix (ACPI S3) resume with cpupools with different schedulers
Dario Faggioli [Thu, 10 Dec 2015 16:24:51 +0000 (17:24 +0100)]
sched: fix (ACPI S3) resume with cpupools with different schedulers

In fact, with 2 cpupools, one (the default) Credit and
one Credit2 (with at least 1 pCPU in the latter), trying
a (e.g., ACPI S3) suspend/resume crashes like this:

(XEN) [  150.587779] ----[ Xen-4.7-unstable  x86_64  debug=y  Not tainted ]----
(XEN) [  150.587783] CPU:    6
(XEN) [  150.587786] RIP:    e008:[<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [  150.587796] RFLAGS: 0000000000010086   CONTEXT: hypervisor
(XEN) [  150.587801] rax: ffff83031fa3c020   rbx: ffff830322c1b4b0   rcx: 0000000000000000
(XEN) [  150.587806] rdx: ffff83031fa78000   rsi: 000000000000000a   rdi: ffff82d0802a9788
(XEN) [  150.587811] rbp: ffff83031fa7fe20   rsp: ffff83031fa7fd30   r8:  ffff83031fa80000
(XEN) [  150.587815] r9:  0000000000000006   r10: 000000000008f7f2   r11: 0000000000000006
(XEN) [  150.587819] r12: ffff8300dbdf3000   r13: ffff830322c1b4b0   r14: 0000000000000006
(XEN) [  150.587823] r15: 0000000000000000   cr0: 000000008005003b   cr4: 00000000000026e0
(XEN) [  150.587827] cr3: 00000000dbaa8000   cr2: 0000000000000000
(XEN) [  150.587830] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [  150.587835] Xen stack trace from rsp=ffff83031fa7fd30:
... ... ...
(XEN) [  150.587962] Xen call trace:
(XEN) [  150.587966]    [<ffff82d080123a10>] sched_credit.c#csched_schedule+0xf2/0xc3d
(XEN) [  150.587974]    [<ffff82d08012a98b>] schedule.c#schedule+0x128/0x635
(XEN) [  150.587979]    [<ffff82d08012dc16>] softirq.c#__do_softirq+0x82/0x8d
(XEN) [  150.587983]    [<ffff82d08012dc6e>] do_softirq+0x13/0x15
(XEN) [  150.587988]    [<ffff82d080162ddd>] domain.c#idle_loop+0x5b/0x6b
(XEN) [  151.272182]
(XEN) [  151.274174] ****************************************
(XEN) [  151.279624] Panic on CPU 6:
(XEN) [  151.282915] Xen BUG at sched_credit.c:655
(XEN) [  151.287415] ****************************************

During suspend, the pCPUs are not removed from their
pools with the standard procedure (which would involve
schedule_cpu_switch()). During resume, they:
 1) are assigned to the default cpupool (CPU_UP_PREPARE
    phase);
 2) are moved to the pool they were in before suspend,
    via schedule_cpu_switch() (CPU_ONLINE phase)

During resume, scheduling (even if just the idle loop)
can happen right after the CPU_STARTING phase (before
CPU_ONLINE), i.e., before the pCPU is put back in its
pool. In this case, it is the default pool's scheduler
that is invoked (Credit1, in the example above). But
the Credit2 specific vCPU data is not freed during
suspend, and Credit1 specific vCPU data is not
allocated during resume.

Therefore, Credit1 schedules on pCPUs whose idle vCPU's
sched_priv points to Credit2 vCPU data, and we crash.

Fix things by properly deallocating scheduler specific
data of the pCPU's pool scheduler during pCPU teardown,
and re-allocating them --always for &ops-- during pCPU
bringup.

This also fixes another (latent) bug. In fact, it avoids,
still in schedule_cpu_switch(), that Credit1's free_vdata()
is used to deallocate data allocated with Credit2's
alloc_vdata(). This is not easy to trigger, but only
because the other bug shown above manifests first and
crashes the host.

The downside of this patch is that it adds one more
allocation on the resume path, which is not ideal. Still,
there is no better way of fixing the described bugs at
the moment. Removing (ideally all) allocations happening
during resume should continue to be pursued in the long
run.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years ago libxl: update check-xl-disk-parse
Wei Liu [Wed, 9 Dec 2015 10:43:36 +0000 (10:43 +0000)]
libxl: update check-xl-disk-parse

The block-attach command now returns 1 when it fails. Update the first
test case to expect return value 1 instead of 255.

The parser now doesn't generate output for default values. Remove them
from expected output.

According to 417e6b70 ("libxl: add option for discard support to xl disk
configuration"), the "discard=" variant is never supported, delete two
test cases with that variant.

Reported-by: Jim Fehlig <jfehlig@suse.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Jim Fehlig <jfehlig@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago VT-d: make flush-all actually flush all
Jan Beulich [Thu, 10 Dec 2015 12:17:49 +0000 (13:17 +0100)]
VT-d: make flush-all actually flush all

Passing gfn=0 and page_count=0 actually avoids the
iommu_flush_iotlb_dsi() and results in page-specific invalidation
instead.

Reported-by: "张智" <zhangzhi2014@caep.cn>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Feng Wu <feng.wu@intel.com>
9 years ago x86: re-enable NX if disabled
Jan Beulich [Thu, 10 Dec 2015 12:17:21 +0000 (13:17 +0100)]
x86: re-enable NX if disabled

I noticed Linux 4.4 doing this universally now, and I think it's a good
idea to override such anti-security BIOS settings (we certainly have no
compatibility problem due to NX being enabled).

Secondary changes:
- no need to check supported extended CPUID level for leaves 80000000
  and 80000001 (required on x86-64)
- no need to update c->cpuid_level in early_init_intel() (done anyway
  in generic_identify())
- alignment of trampoline data items

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86/hvm: loosen up the ASSERT in hvm_cr4_guest_reserved_bits and hvm_efer_valid
Roger Pau Monné [Thu, 10 Dec 2015 12:16:15 +0000 (13:16 +0100)]
x86/hvm: loosen up the ASSERT in hvm_cr4_guest_reserved_bits and hvm_efer_valid

Loosen up the condition so we make sure that the current vcpu belongs to the
same domain.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86/VPMU: support only versions 2 through 4 of architectural performance monitoring
Boris Ostrovsky [Thu, 10 Dec 2015 12:15:35 +0000 (13:15 +0100)]
x86/VPMU: support only versions 2 through 4 of architectural performance monitoring

We need to have at least version 2 since it's the first version to
support various control and status registers (such as
MSR_CORE_PERF_GLOBAL_CTRL) that VPMU relies on always having.

We don't fully emulate version 4 but since it's backward compatible with
earlier versions we can fall back to v3. At this point there is no
compatibility statement for v5 so anything above 4 is not supported.

For guests querying the PMU version via CPUID leaf 0xa, clip it at v3.

With explicit testing for PMU version we can now remove CPUID model
check.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
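
The version policy stated above (accept versions 2 through 4, report at most v3 to guests) can be written down as a short sketch. The function names are hypothetical and the real checks in the VPMU code are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Versions 2 through 4 are accepted, per the commit message. */
bool demo_pmu_version_supported(unsigned int version)
{
    return version >= 2 && version <= 4;
}

/* Version reported to a guest: never higher than 3. */
unsigned int demo_guest_pmu_version(unsigned int host_version)
{
    assert(demo_pmu_version_supported(host_version));
    return host_version > 3 ? 3 : host_version;
}
```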
9 years ago x86: fixup IRQs when CPUs go down during shutdown
Ross Lagerwall [Thu, 10 Dec 2015 12:14:53 +0000 (13:14 +0100)]
x86: fixup IRQs when CPUs go down during shutdown

Commit fc0c3fa2ad5c ("x86/IO-APIC: fix setup of Xen internally used IRQs
(take 2)") introduced a regression on some hardware where Xen would hang
during shutdown, repeating the following message:
APIC error on CPU0: 08(08), Receive accept error

This appears to be because an interrupt (in this case from the serial
console) destined for a CPU other than the boot CPU is left unhandled so
an APIC error on CPU 0 is generated instead.

To fix this, before taking down the non-boot CPUs, call fixup_irqs()
with a CPU mask of only the boot CPU to reset the IRQ affinities
correctly.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago vmx: properly handle notification event when vCPU is running
Feng Wu [Thu, 10 Dec 2015 12:14:04 +0000 (13:14 +0100)]
vmx: properly handle notification event when vCPU is running

When a vCPU is running in Root mode and a notification event
has been injected to it, we need to set VCPU_KICK_SOFTIRQ for
the current cpu, so the pending interrupt in PIRR will be
synced to vIRR before VM-Exit in time.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years ago pass-through: update IRTE according to guest interrupt config changes
Feng Wu [Thu, 10 Dec 2015 12:13:33 +0000 (13:13 +0100)]
pass-through: update IRTE according to guest interrupt config changes

When the guest changes its interrupt configuration (such as the vector)
for direct-assigned devices, we need to update the associated IRTE
with the new guest vector, so external interrupts from the assigned
devices can be injected to guests without VM-Exit.

For lowest-priority interrupts, we use a vector-hashing mechanism to find
the destination vCPU. This follows the hardware behavior, since modern
Intel CPUs use vector hashing to handle the lowest-priority interrupt.

Interrupts with a multicast/broadcast destination cannot be handled via
interrupt posting, so interrupt remapping is still used for them.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
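
One plausible reading of "vector hashing" for lowest-priority delivery is sketched below: the vector selects one of the candidate destination vCPUs. The choice of "vector modulo number of candidates" as the hash is an assumption of this sketch and is not stated in the commit message; demo_pick_dest() is a hypothetical helper.

```c
#include <assert.h>

/*
 * Pick a destination among `n` candidate vCPU ids by hashing on the
 * vector. Using "vector modulo n" as the hash is an assumption of this
 * sketch, not something the commit message specifies.
 */
unsigned int demo_pick_dest(const unsigned int *candidates, unsigned int n,
                            unsigned int vector)
{
    assert(candidates != 0 && n > 0);
    return candidates[vector % n];
}
```

The useful property is determinism: the same vector always lands on the same candidate, so a single IRTE can name one fixed destination.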
9 years ago vt-d: add API to update IRTE when VT-d PI is used
Feng Wu [Thu, 10 Dec 2015 12:13:01 +0000 (13:13 +0100)]
vt-d: add API to update IRTE when VT-d PI is used

This patch adds an API which is used to update the IRTE
for posted-interrupt when guest changes MSI/MSI-X information.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years ago vmx: suppress posting interrupts when 'SN' is set
Feng Wu [Thu, 10 Dec 2015 12:12:06 +0000 (13:12 +0100)]
vmx: suppress posting interrupts when 'SN' is set

Currently, we don't support urgent interrupts; all interrupts
are treated as non-urgent, so we cannot send a
posted-interrupt when 'SN' is set.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years ago VT-d Posted-intterrupt (PI) design
Feng Wu [Thu, 10 Dec 2015 12:11:25 +0000 (13:11 +0100)]
VT-d Posted-intterrupt (PI) design

Add the design doc for VT-d PI.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years ago Revert "tools: Refactor "xentoollog" into its own library"
Ian Campbell [Thu, 10 Dec 2015 10:21:34 +0000 (10:21 +0000)]
Revert "tools: Refactor "xentoollog" into its own library"

This reverts commit c7d3afbb44b47af9103be0b914afd588a84d9e62 which
broke the libvirt build, since libvirt uses xtl_* and hence needs
updating to link against the new library when necessary.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago memory: fix XSA-158 fix
Jan Beulich [Wed, 9 Dec 2015 12:53:13 +0000 (13:53 +0100)]
memory: fix XSA-158 fix

For one, the uses of domu_max_order and ptdom_max_order were swapped.

And then gcc warns about an unused result of a __must_check function
in the control part of a conditional expression when both other
expressions can be determined by the compiler to produce the same value
(see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68039), which happens
when HAS_PASSTHROUGH is undefined (i.e. for ARM on 4.4 and older).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago tools: Refactor "xentoollog" into its own library
Ian Campbell [Thu, 3 Dec 2015 11:22:02 +0000 (11:22 +0000)]
tools: Refactor "xentoollog" into its own library

In attempting to disaggregate libxenctrl I found that many of the
pieces were going to want access to this library, so split it out (as
it probably should always have been).

Various build adjustments are needed. In particular things which use
xtl_* themselves now need to explicitly link against the library.

This has a nice side effect which is that users of libxl no longer
need to link against libxenctrl just to create a logger, which was
counter to the principle that applications using libxl shouldn't be
required to look behind the curtain. This means that xl no longer
links against libxenctrl.

The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- Update QEMU_TRADITIONAL_REVISION and MINIOS_UPSTREAM_REVISION ]

9 years ago tools/Rules.mk: Properly handle libraries with recursive dependencies.
Ian Campbell [Thu, 3 Dec 2015 11:22:01 +0000 (11:22 +0000)]
tools/Rules.mk: Properly handle libraries with recursive dependencies.

In tree libraries which link against other in tree libraries in a way
which is opaque to their callers need special handling, specifically
correct use of -Wl,-rpath-link for the recursively used libraries.

Currently this is rather simple, but upcoming changes are going to
introduce transitive dependencies more than 1 step deep.

Introduce a SHDEPS idiom to contain all the recursive deps for a
library and include those in both LDLIBS (for linking) and SHLIB (for
recursive uses).

Try and document the whole thing.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago tools/ocaml: simplify compile/link of test apps
Ian Campbell [Thu, 3 Dec 2015 11:22:00 +0000 (11:22 +0000)]
tools/ocaml: simplify compile/link of test apps

xtl doesn't require the full LDLIBS_libxenctrl, just the -L and
xenlight.cmxa, the latter of which contains LDLIBS_libxenctrl as needed.
Fixing this avoids the need to be concerned about LDLIBS_libxenctrl
becoming more than one word in the future.

Since the tests are pure ocaml (no C components) CFLAGS and
LIBS_xenlight are not required.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: David Scott <dave@recoil.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: David Scott <dave@recoil.org>
9 years ago mce-test: do not include libxenguest internal headers
Ian Campbell [Thu, 3 Dec 2015 11:21:59 +0000 (11:21 +0000)]
mce-test: do not include libxenguest internal headers

As far as I can tell there is no requirement for these and it builds
fine without them.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago QEMU_TAG update
Ian Jackson [Wed, 9 Dec 2015 11:48:27 +0000 (11:48 +0000)]
QEMU_TAG update

9 years ago libxc: try to find last used pfn when migrating
Juergen Gross [Wed, 2 Dec 2015 07:42:17 +0000 (08:42 +0100)]
libxc: try to find last used pfn when migrating

For migration the last used pfn of a guest is needed to size the
logdirty bitmap and as an upper bound of the page loop. Unfortunately
there are pv-kernels advertising a much higher maximum pfn than they
are really using in order to support memory hotplug. This will lead
to allocation of much more memory in Xen tools during migration than
really needed.

Try to find the last used guest pfn of a pv-domu by scanning the p2m
tree from the last entry towards its start and search for an entry
not being invalid.

Normally the mid pages of the p2m tree containing all invalid entries
are being reused, so we can just scan the top page for identical
entries and skip all but the first one.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[ ijc -- added errno = E2BIG to one error path ]
Acked-by: Ian Campbell <ian.campbell@citrix.com>
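
The backward scan described above can be sketched on a flat array. This is a deliberate simplification: the real p2m is a tree and the commit also skips repeated all-invalid mid pages, which the sketch does not model. DEMO_INVALID and demo_last_used_pfn() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_INVALID (~0UL)   /* hypothetical marker for an unused entry */

/*
 * Scan a flat p2m-like array backwards and return the index of the last
 * entry that is not invalid, or -1 if every entry is invalid.
 */
long demo_last_used_pfn(const unsigned long *p2m, size_t nr)
{
    assert(p2m != NULL || nr == 0);
    for (size_t i = nr; i > 0; i--)
        if (p2m[i - 1] != DEMO_INVALID)
            return (long)(i - 1);
    return -1;
}
```

Sizing the bitmap from this result rather than from the advertised maximum is what avoids the oversized allocation for guests that reserve room for memory hotplug.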
9 years ago Fix regression in xendomains initscript: test for privcmd char device
Sander Eikelenboom [Tue, 8 Dec 2015 15:07:03 +0000 (16:07 +0100)]
Fix regression in xendomains initscript: test for privcmd char device

Since commit:
"xendomains initscript: test for privcmd char device"
(1367e9e5ba4d1612e303123ec0bbf961100fcfa1)
due to incorrect negation the xendomains initscript bails out
early when both: "/dev/xen/privcmd" and "/proc/xen/privcmd"
are present in dom0.

Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years ago tools/libxc: Identify problematic file in error messages
Andrew Cooper [Mon, 7 Dec 2015 13:09:08 +0000 (13:09 +0000)]
tools/libxc: Identify problematic file in error messages

Error messages along the lines of:

   xc: error: panic: xc_dom_core.c:207: failed to open file: No such file or
   directory: Internal error

are of very little use.

Include the filename in the error messages, so the user does not have to
resort to debug level logging to identify the problem.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxl: Introduce a template for devices with a controller
George Dunlap [Tue, 1 Dec 2015 12:09:58 +0000 (12:09 +0000)]
libxl: Introduce a template for devices with a controller

We have several outstanding patch series which add devices that have
two levels: a controller and individual devices attached to that
controller.

In the interest of consistency, this patch introduces a section that
sketches out a template for interfaces for such devices.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Chun Yan Liu <cyliu@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago libxl: Fix bootloader-related virtual memory leak on pv build failure
Ian Jackson [Wed, 18 Nov 2015 15:34:54 +0000 (15:34 +0000)]
libxl: Fix bootloader-related virtual memory leak on pv build failure

The bootloader may call libxl__file_reference_map(), which mmap's the
pv_kernel and pv_ramdisk into process memory.  This was only unmapped,
however, on the success path of libxl__build_pv().  If there were a
failure anywhere between libxl_bootloader.c:parse_bootloader_result()
and the end of libxl__build_pv(), the calls to
libxl__file_reference_unmap() would be skipped, leaking the mapped
virtual memory.

Ideally this would be fixed by adding the unmap calls to the
destruction path for libxl__domain_build_state.  Unfortunately the
lifetime of the libxl__domain_build_state is opaque, and it doesn't
have a proper destruction path.  But, the only thing in it that isn't
from the gc are these bootloader references, and they are only ever
set for one libxl__domain_build_state, the one which is
libxl__domain_create_state.build_state.

So we can clean up in the exit path from libxl__domain_create_*, which
always comes through domcreate_complete.

Remove the now-redundant unmaps in libxl__build_pv's success path.

This is XSA-160.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
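
The shape of the fix described above (release the mappings in the one completion path that both success and failure go through, not only on the success path of the build step) can be modelled as follows. Mappings are represented by flags and a counter; none of the names correspond to real libxl functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state holding two mappings, tracked as simple flags. */
struct demo_build_state {
    bool kernel_mapped;
    bool ramdisk_mapped;
};

int demo_live_mappings;   /* number of mappings currently held */

static void demo_map(bool *flag)
{
    if (!*flag) { *flag = true; demo_live_mappings++; }
}

static void demo_unmap(bool *flag)
{
    if (*flag) { *flag = false; demo_live_mappings--; }
}

/* Single completion path: runs for success and failure alike. */
static void demo_create_complete(struct demo_build_state *s)
{
    demo_unmap(&s->kernel_mapped);
    demo_unmap(&s->ramdisk_mapped);
}

/* Build step that may fail after the bootloader has mapped its files. */
int demo_create(struct demo_build_state *s, bool build_fails)
{
    int rc;

    assert(s != 0);
    demo_map(&s->kernel_mapped);
    demo_map(&s->ramdisk_mapped);
    rc = build_fails ? -1 : 0;      /* no unmap here, on either outcome */
    demo_create_complete(s);
    return rc;
}
```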
9 years ago memory: fix XENMEM_exchange error handling
Jan Beulich [Tue, 8 Dec 2015 13:01:43 +0000 (14:01 +0100)]
memory: fix XENMEM_exchange error handling

assign_pages() can fail due to the domain getting killed in parallel,
which should not result in a hypervisor crash.

Reported-by: Julien Grall <julien.grall@citrix.com>
Also delete a redundant put_gfn() - all relevant paths leading to the
"fail" label already do this (and there are also paths where it was
plain wrong). All of the put_gfn()-s got introduced by 51032ca058
("Modify naming of queries into the p2m"), including the otherwise
unneeded initializer for k (with even a kind of misleading comment -
the compiler warning could actually have served as a hint that the use
is wrong).

This is CVE-2015-8339 + CVE-2015-8340 / XSA-159.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago memory: split and tighten maximum order permitted in memops
Jan Beulich [Tue, 8 Dec 2015 13:00:33 +0000 (14:00 +0100)]
memory: split and tighten maximum order permitted in memops

Introduce and enforce separate limits for ordinary DomU, DomU with
pass-through device(s), control domain, and hardware domain.

The DomU defaults were determined based on what so far was allowed by
multipage_allocation_permitted().

The x86 hwdom default was chosen based on linux-2.6.18-xen.hg c/s
1102:82782f1361a9 indicating 2Mb is not enough, plus some slack.

The ARM hwdom default was chosen to allow 2Mb (order-9) mappings, plus
a little bit of slack.

This is CVE-2015-8338 / XSA-158.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago x86/time: fix domain type check in tsc_set_info()
Haozhong Zhang [Tue, 8 Dec 2015 08:46:30 +0000 (09:46 +0100)]
x86/time: fix domain type check in tsc_set_info()

Replace is_hvm_domain() in tsc_set_info() by has_hvm_container_domain()
to keep consistent with other domain type checks in tsc_set_info().

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
9 years ago svm: fix incorrect TSC scaling
Haozhong Zhang [Tue, 8 Dec 2015 08:46:12 +0000 (09:46 +0100)]
svm: fix incorrect TSC scaling

SVM TSC ratio is incorrectly used in the current
svm_get_tsc_offset(). This patch replaces the scaling logic in
svm_get_tsc_offset() with a correct implementation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
9 years ago x86: refine nr_sockets calculation
Jan Beulich [Tue, 8 Dec 2015 08:45:29 +0000 (09:45 +0100)]
x86: refine nr_sockets calculation

The previous variant didn't work for non-contiguous socket numbers.

Reported-by: Ed Swierk <eswierk@skyportsystems.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Ed Swierk <eswierk@skyportsystems.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
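
To see why non-contiguous socket numbers are a problem, consider data indexed by socket id. The sketch below is not the calculation the patch introduces (the commit message does not give it); it only shows, under that assumption, why counting sockets is not enough and the highest id plus one is needed. demo_nr_sockets() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of slots needed to index per-socket data by socket id. With
 * sparse ids such as {0, 2}, counting distinct sockets gives 2, which is
 * too small to hold index 2; the highest id plus one is used instead.
 */
unsigned int demo_nr_sockets(const unsigned int *socket_ids, size_t nr_cpus)
{
    unsigned int max_id = 0;

    assert(socket_ids != NULL && nr_cpus > 0);
    for (size_t i = 0; i < nr_cpus; i++)
        if (socket_ids[i] > max_id)
            max_id = socket_ids[i];
    return max_id + 1;
}
```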
9 years ago Revert "vVMX: use latched VMCS machine address"
Jan Beulich [Tue, 8 Dec 2015 08:43:59 +0000 (09:43 +0100)]
Revert "vVMX: use latched VMCS machine address"

This reverts commit d02e84b9d9d16b6b56186f0dfdcb3c90b83c82a3,
which caused a regression on some systems.

9 years ago x86/libxc: add an arch domain config parameter to xc_domain_create
Roger Pau Monne [Fri, 13 Nov 2015 11:05:51 +0000 (12:05 +0100)]
x86/libxc: add an arch domain config parameter to xc_domain_create

With the addition of HVMlite the hypervisor now always requires a non-null
arch domain config, which is different between HVM and PV guests.

Add a new parameter to xc_domain_create that contains a pointer to an arch
domain config. If the pointer is null, create a default arch domain config
based on guest type.

Fix all the in-tree callers to provide a null arch domain config in order to
mimic previous behaviour.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
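
The "null pointer means default config based on guest type" behaviour described above can be sketched as below. The types, the DEMO_EMU_ALL value and demo_pick_config() are hypothetical simplifications; the real xc_domain_create() signature and the real arch config structure are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much simplified stand-ins for the real types. */
enum demo_guest_type { DEMO_PV, DEMO_HVM };

struct demo_arch_config {
    unsigned int emulation_flags;
};

#define DEMO_EMU_ALL 0xffu    /* hypothetical "all emulated devices" value */

/*
 * Return the configuration that would be handed on: the caller's own if
 * one was supplied, otherwise a default chosen from the guest type.
 */
struct demo_arch_config demo_pick_config(enum demo_guest_type type,
                                         const struct demo_arch_config *cfg)
{
    struct demo_arch_config def;

    assert(type == DEMO_PV || type == DEMO_HVM);
    if (cfg != NULL)
        return *cfg;
    def.emulation_flags = (type == DEMO_HVM) ? DEMO_EMU_ALL : 0;
    return def;
}
```

Existing callers that pass a null pointer keep their previous behaviour, while new callers can supply an explicit configuration.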
9 years ago build: fix clean to remove all *.o and .*.d files
Jonathan Creekmore [Thu, 3 Dec 2015 14:35:09 +0000 (15:35 +0100)]
build: fix clean to remove all *.o and .*.d files

In commit 8b6ef9c152edceabecc7f90c811cd538a7b7a110, several files in
xen/common/compat were changed to be built using the Makefile in
xen/common, by appending the compat prefix to the object
files. Additionally, the xen/common/compat directory was removed from
the subdirs-y variable, so it is no longer visited by the clean
rule. This resulted in some object files and dependency files being
generated by inclusion into obj-y, but not cleaned because they lived in a
directory that was unvisited by the clean rules.

Since there is a desire for all of the object files and dependency files
to be cleaned, just search for all objects and dependency files and
delete them on clean. The previous method of only tracking with the
$(DEPS) and *.o in the clean rules had the disadvantage that, if the
configuration changed between a build and a clean, some of the
dependencies or objects could get left behind. This method does not have
the same disadvantage.

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
[dropped removal of *.o and $(DEPS) from xen/Rules.mk's clean rule]
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agox86: __{cpu,dev}initdata drop follow-up
Jan Beulich [Thu, 3 Dec 2015 14:34:41 +0000 (15:34 +0100)]
x86: __{cpu,dev}initdata drop follow-up

While reviewing those patches I noticed a few types that could do with
tweaking.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86: make sure the HVM callback vector is correctly set
Roger Pau Monné [Thu, 3 Dec 2015 14:33:40 +0000 (15:33 +0100)]
x86: make sure the HVM callback vector is correctly set

If certain devices (like the local or the io apic) are disabled some modes
of operation of the HVM event channel callback cannot be used. Make sure Xen
doesn't try to set them up.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agoVT-d: drop unneeded Ivybridge quirk workaround
Jan Beulich [Thu, 3 Dec 2015 14:33:10 +0000 (15:33 +0100)]
VT-d: drop unneeded Ivybridge quirk workaround

We've been told by Intel that server chipsets don't need the workaround
anymore starting with Ivybridge (Xeon E5/E7 v2); the second half of the
workaround was missing anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agox86/PCI: make all config space writes subject to XSM checking
Jan Beulich [Thu, 3 Dec 2015 14:32:30 +0000 (15:32 +0100)]
x86/PCI: make all config space writes subject to XSM checking

Now that we intercept them all, there's no reason not to also uniformly
hand them to XSM. Reads (which are expected to be of less interest) get
handled as before (MMCFG accesses un-audited).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agolibxc: do proper return code checking of allocator in domain builder
Juergen Gross [Tue, 1 Dec 2015 17:14:54 +0000 (18:14 +0100)]
libxc: do proper return code checking of allocator in domain builder

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxc: replace INVALID_P2M_ENTRY by INVALID_PFN
Juergen Gross [Tue, 1 Dec 2015 17:14:53 +0000 (18:14 +0100)]
libxc: replace INVALID_P2M_ENTRY by INVALID_PFN

INVALID_P2M_ENTRY is defined as (xen_pfn_t)-1 and is often used
according to its type for an invalid pfn. Change the name of the
macro to INVALID_PFN.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxendomains initscript: test for privcmd char device
Doug Goldstein [Tue, 1 Dec 2015 19:27:55 +0000 (13:27 -0600)]
xendomains initscript: test for privcmd char device

Allow the init script to continue if either the character device or the
proc file is available.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools: update outdated header comment on privcmd.h
Doug Goldstein [Tue, 1 Dec 2015 19:27:54 +0000 (13:27 -0600)]
tools: update outdated header comment on privcmd.h

The BSDs have always accessed privcmd via /dev/xen/privcmd while Linux
has used /proc/xen/privcmd but things are shifting to /dev/xen/privcmd
as well.

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxc: prefer using privcmd character device
Doug Goldstein [Tue, 1 Dec 2015 19:27:53 +0000 (13:27 -0600)]
libxc: prefer using privcmd character device

Prefer using the character device over the proc file if the character
device exists. This follows similar conversions of xenbus to avoid
issues with FMODE_ATOMIC_POS added in Linux 3.14 and newer.
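
A minimal sketch of the "character device first, proc file second" order.
The two paths are the ones named in this series; the helper names are made
up for illustration, and the sketch simply tries to open the preferred path
and falls back on failure, which may differ from how libxc detects the
device.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

/* Open 'preferred' if possible, otherwise 'fallback'.
 * Returns a file descriptor, or -1 if neither can be opened. */
static int open_first_available(const char *preferred, const char *fallback)
{
    int fd;

    assert(preferred != NULL && fallback != NULL);

    fd = open(preferred, O_RDWR);
    if (fd == -1)
        fd = open(fallback, O_RDWR);
    return fd;
}

/* Hypothetical wrapper using the paths mentioned in the commit messages. */
static int open_privcmd_sketch(void)
{
    return open_first_available("/dev/xen/privcmd", "/proc/xen/privcmd");
}
```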

CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: Ian Campbell <ian.campbell@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/build: disable default built-in rules and variables
Doug Goldstein [Wed, 2 Dec 2015 14:22:56 +0000 (15:22 +0100)]
xen/build: disable default built-in rules and variables

Disable the built-in rules and variables from GNU make to improve
build performance and avoid awkward corner cases with the built-in
rules. Currently none of the implicit rules are used but this is helpful
to do when developing changes to the build system.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agoMAINTAINERS: restore original maintainership of arch VPMU files
Boris Ostrovsky [Wed, 2 Dec 2015 14:22:39 +0000 (15:22 +0100)]
MAINTAINERS: restore original maintainership of arch VPMU files

It was lost when vpmu* files were moved from xen/arch/x86/hvm/{vmx|svm}/ to
xen/arch/x86/cpu/

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
9 years agoevtchn: don't reuse ports that are still "busy"
David Vrabel [Wed, 2 Dec 2015 14:21:46 +0000 (15:21 +0100)]
evtchn: don't reuse ports that are still "busy"

When using the FIFO ABI a guest may close an event channel that is
still LINKED.  If this port is reused, subsequent events may be lost
because they may become pending on the wrong queue.

This could be fixed by requiring guests to only close event channels
that are not linked.  This is difficult since: a) irq cleanup in the
guest may be done in a context that cannot wait for the event to be
unlinked; b) the guest may attempt to rebind a PIRQ whose previous
close is still pending; and c) existing guests already have the
problematic behaviour.

Instead, simply check a port is not "busy" (i.e., it's not linked)
before reusing it.

Guests should still drain any queues for VCPUs that are being
offlined, or the port will become unusable until the VCPU is onlined
and starts processing events again.
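
The "busy" check can be pictured with the following sketch. The structure
and function names are hypothetical and much simpler than the hypervisor's
event channel code; the point is only that a closed port which is still
linked is skipped when looking for a port to reuse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state. */
struct port_sketch {
    bool in_use; /* port is currently bound */
    bool linked; /* port is still on a FIFO queue, i.e. "busy" */
};

/* Return the first port that is neither in use nor busy, or -1. */
static int find_free_port(const struct port_sketch *ports, size_t n)
{
    size_t i;

    assert(ports != NULL);

    for (i = 0; i < n; i++)
        if (!ports[i].in_use && !ports[i].linked)
            return (int)i;
    return -1;
}
```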

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/HVM: XSETBV intercept needs to check CPL on SVM only
Jan Beulich [Wed, 2 Dec 2015 14:21:15 +0000 (15:21 +0100)]
x86/HVM: XSETBV intercept needs to check CPL on SVM only

VMX doesn't need a software CPL check on the XSETBV intercept, and
SVM can do that check without resorting to hvm_get_segment_register().

Clean up what is left of hvm_handle_xsetbv(), namely make it return a
proper error code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agox86/vmx: enable PML by default
Kai Huang [Wed, 2 Dec 2015 14:20:19 +0000 (15:20 +0100)]
x86/vmx: enable PML by default

Since the PML series was merged (but disabled by default) we have conducted
lots of PML tests (live migration, GUI display) and PML has been working fine;
therefore turn it on by default.

The documentation of the PML command line option is adjusted accordingly as
well.

Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
Tested-by: Robert Hu <robert.hu@intel.com>
Tested-by: Xudong Hao <xudong.hao@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agox86/ept: remove unnecessary sync after resolving misconfigured entries
David Vrabel [Wed, 2 Dec 2015 14:19:53 +0000 (15:19 +0100)]
x86/ept: remove unnecessary sync after resolving misconfigured entries

When using EPT, type changes are done with the following steps:

1. Set entry as invalid (misconfigured) by setting a reserved memory
type.

2. Flush all EPT and combined translations (ept_sync_domain()).

3. Fixup misconfigured entries as required (on EPT_MISCONFIG vmexits or
when explicitly setting an entry).

Since resolve_misconfig() only updates entries that were misconfigured,
there is no need to invalidate any translations since the hardware
does not cache misconfigured translations (vol 3, section 28.3.2).

Remove the unnecessary (and very expensive) ept_sync_domain() calls.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agolibxc: refactor memory allocation functions
Wei Liu [Tue, 1 Dec 2015 11:39:16 +0000 (11:39 +0000)]
libxc: refactor memory allocation functions

There were some problems with the original memory allocation functions:
1. xc_dom_alloc_segment and xc_dom_alloc_pad ended up calling
   xc_dom_chk_alloc_pages while xc_dom_alloc_page open-coded everything.
2. xc_dom_alloc_pad didn't call dom->allocate.

Refactor the code so that:
1. xc_dom_alloc_{segment,pad,page} end up calling
   xc_dom_chk_alloc_pages.
2. xc_dom_chk_alloc_pages calls dom->allocate.

This way we avoid scattering dom->allocate over multiple locations and
open-coding.

Also change the return type of xc_dom_alloc_page to xen_pfn_t and return
an invalid pfn when xc_dom_chk_alloc_pages fails.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Tested-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc: correct domain builder for 64 bit guest with 32 bit tools
Juergen Gross [Tue, 1 Dec 2015 07:49:49 +0000 (08:49 +0100)]
libxc: correct domain builder for 64 bit guest with 32 bit tools

Commit 8c45adec18e0512c3d34dcafb13414ecba21be6a ("create unmapped
initrd in domain builder if supported") introduced an error for
building a 64 bit guest with a 32 bit toolset.

The initrd start address and size were stored in an unsigned long
instead of using a 64 bit type.

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxc: use correct return type for do_memory_op()
Juergen Gross [Fri, 27 Nov 2015 09:00:51 +0000 (10:00 +0100)]
libxc: use correct return type for do_memory_op()

Currently do_memory_op() returns int, while the hypervisor returns
long. This leads to wrong return values as soon as e.g. a pfn larger
than about 2 billion (8 TB) is returned.

Use the correct long return type instead and correct the functions
expecting a pfn via the return value of do_memory_op().
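
The "2 billion (8 TB)" figure follows from the size of int. The helper
names below are made up for this worked example, and it assumes a 32 bit
int and 4 KiB pages (page shift 12).

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Number of bytes covered by the pfns that still fit in a non-negative
 * int, for a given page shift. */
static uint64_t max_bytes_for_int_pfn(unsigned int page_shift)
{
    uint64_t first_bad_pfn = (uint64_t)INT_MAX + 1; /* 2^31 with 32 bit int */

    assert(page_shift < 32);
    return first_bad_pfn << page_shift;
}

/* True if a long-sized hypercall result survives being returned as int. */
static int fits_in_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```

With 4 KiB pages, 2^31 pages times 2^12 bytes is 2^43 bytes, i.e. 8 TiB.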

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoocaml/xc: add softreset shutdown reason
Wei Liu [Mon, 16 Nov 2015 12:43:19 +0000 (12:43 +0000)]
ocaml/xc: add softreset shutdown reason

According to public/sched.h, there is a new shutdown_reason called
soft_reset. Propagate that value to ocaml.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave@recoil.org>
9 years agolibxl: implement libxl__xs_mknod using XS_WRITE rather than XS_MKDIR
Paul Durrant [Wed, 25 Nov 2015 14:51:00 +0000 (14:51 +0000)]
libxl: implement libxl__xs_mknod using XS_WRITE rather than XS_MKDIR

This patch modifies the implementation of libxl__xs_mknod() to use XS_WRITE
rather than XS_MKDIR since passing an empty value to the former will
ensure that the path is both existent and empty upon return, rather than
merely existent. The function return type is also changed to a libxl
error value rather than a boolean, its declaration is accordingly moved
into the 'checked' section in libxl_internal.h, and a comment is added to
clarify its semantics.

This patch also contains a small whitespace fix in the definition of
libxl__xs_mknod() and the addition of 'ok' to CODING_STYLE as the
canonical variable name for holding return values from boolean functions.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxl: replace libxl__xs_mkdir() with libxl__xs_mknod()
Paul Durrant [Wed, 25 Nov 2015 14:50:59 +0000 (14:50 +0000)]
libxl: replace libxl__xs_mkdir() with libxl__xs_mknod()

This patch is purely cosmetic, it contains no functional change. A
change in the implementation of libxl__xs_mknod() will be made in a
subsequent patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
9 years agomwait_idle: Skylake Client Support
Len Brown [Mon, 30 Nov 2015 11:02:22 +0000 (12:02 +0100)]
mwait_idle: Skylake Client Support

Skylake Client CPU idle Power states (C-states)
are similar to the previous generation, Broadwell.
However, Skylake does get its own table with updated
worst-case latency and average energy-break-even residency values.

Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit 493f133f47750aa5566fafa9403617e3f0506f8c]

mwait_idle: Skylake Client Support - updated

Addition of PC9 state, and minor tweaks to existing PC6 and PC8 states.

Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit 135919a3a80565070b9645009e65f73e72c661c0]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agodrop unused __devexit{,data} and CONFIG_HOTPLUG
Andrew Cooper [Mon, 30 Nov 2015 11:01:21 +0000 (12:01 +0100)]
drop unused __devexit{,data} and CONFIG_HOTPLUG

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Also CONFIG_HOTPLUG_CPU.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agodrop empty __devinit annotation, and aliased __pminit
Andrew Cooper [Mon, 30 Nov 2015 11:00:53 +0000 (12:00 +0100)]
drop empty __devinit annotation, and aliased __pminit

x86 is the only architecture which uses __devinit, and also has CONFIG_HOTPLUG
enabled, making the annotation empty.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agodrop empty __devinitdata annotation
Andrew Cooper [Mon, 30 Nov 2015 10:58:09 +0000 (11:58 +0100)]
drop empty __devinitdata annotation

x86 is the only architecture which uses __devinitdata, and also has
CONFIG_HOTPLUG enabled, making the annotation empty.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agodrop empty __cpuinitdata annotation
Andrew Cooper [Mon, 30 Nov 2015 10:57:34 +0000 (11:57 +0100)]
drop empty __cpuinitdata annotation

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agodrop unused fastcall annotation
Andrew Cooper [Mon, 30 Nov 2015 10:57:04 +0000 (11:57 +0100)]
drop unused fastcall annotation

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86: properly macroize the two XRSTOR flavors
Jan Beulich [Mon, 30 Nov 2015 10:56:20 +0000 (11:56 +0100)]
x86: properly macroize the two XRSTOR flavors

All they differ by is the REX64 prefix. Create a single macro covering
both, at once allowing to get rid of the disconnect between the current
partial macro and its two use sites.

No change in generated code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86: drop dummy input from alternative_{input,io}()
Jan Beulich [Mon, 30 Nov 2015 10:55:49 +0000 (11:55 +0100)]
x86: drop dummy input from alternative_{input,io}()

We don't need the claimed API compatibility. No change in generated
code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/cpu: introduce cpu_dev.c_early_init()
Andrew Cooper [Mon, 30 Nov 2015 10:54:11 +0000 (11:54 +0100)]
x86/cpu: introduce cpu_dev.c_early_init()

The name is chosen to be consistent with Linux.  Doing this allows
early_intel_workaround() to be removed from common code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86: allow disabling the emulated local apic
Roger Pau Monné [Thu, 26 Nov 2015 15:01:27 +0000 (16:01 +0100)]
x86: allow disabling the emulated local apic

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agox86/vlapic: fixes for HVM code when running without a vlapic
Roger Pau Monné [Thu, 26 Nov 2015 15:00:56 +0000 (16:00 +0100)]
x86/vlapic: fixes for HVM code when running without a vlapic

The HVM related code (SVM, VMX) generally assumed that a local apic is
always present. With the introduction of a HVM mode where the local apic can
be removed, some of these broken code paths surfaced.

The SVM exit/resume paths unconditionally checked the state of the lapic,
which is wrong if it's been disabled by hardware; fix this by adding the
necessary checks. On the VMX side, make sure we don't add mappings for a
local apic if it's disabled.

In the generic vlapic code, add checks to prevent setting the TSC deadline
timer if the lapic is disabled, and also prevent trying to inject interrupts
from the PIC if the lapic is also disabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agox86: suppress bogus log message
Jan Beulich [Thu, 26 Nov 2015 14:51:49 +0000 (15:51 +0100)]
x86: suppress bogus log message

The way we populate mpc_cpufeature is not compatible with modern CPUs,
and hence the message printed using that information is useless/bogus.
It's of interest only anyway when not using ACPI, so move it into MPS
parsing code. This at once significantly reduces boot time logging on
huge systems.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoHVM/save: allow the usage of zeroextend and a fixup function
Roger Pau Monné [Thu, 26 Nov 2015 14:51:00 +0000 (15:51 +0100)]
HVM/save: allow the usage of zeroextend and a fixup function

With the current compat implementation in the save/restore context handling,
only one compat structure is allowed, and using _zeroextend prevents the
fixup function from being called.

In order to allow for the compat handling layer to be able to handle
different compat versions allow calling the fixup function with
hvm_load_entry_zeroextend.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoHVM/save: pass a size parameter to the HVM compat functions
Roger Pau Monné [Thu, 26 Nov 2015 14:50:36 +0000 (15:50 +0100)]
HVM/save: pass a size parameter to the HVM compat functions

In order to cope with types having multiple compat versions pass a size
parameter to the fixup function so we can identify which compat version
Xen is dealing with.
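
Taken together with the zeroextend change, the idea can be sketched as
below. The record layouts, FIELD_B_DEFAULT and both function names are
invented for illustration and do not correspond to actual HVM save records;
the sketch only shows a fixup that uses the saved size to tell which older
layout it was given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rec_v1 { uint32_t a; };             /* hypothetical older layout */
struct rec_v2 { uint32_t a; uint32_t b; }; /* hypothetical current layout */

#define FIELD_B_DEFAULT 0xffffffffu /* assumed value for the missing field */

/* Fixup: the saved size tells us which compat version we were given. */
static int fixup_sketch(struct rec_v2 *cur, size_t saved_size)
{
    if (saved_size == sizeof(struct rec_v2))
        return 0;                 /* current layout, nothing to do */
    if (saved_size == sizeof(struct rec_v1)) {
        cur->b = FIELD_B_DEFAULT; /* field did not exist in v1 */
        return 0;
    }
    return -1;                    /* unknown compat version */
}

/* Zero-extend the saved record, then still run the fixup. */
static int load_entry_zeroextend_sketch(struct rec_v2 *dst, const void *src,
                                        size_t saved_size)
{
    assert(dst != NULL && src != NULL);

    if (saved_size > sizeof(*dst))
        return -1;
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, saved_size);
    return fixup_sketch(dst, saved_size);
}
```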

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agobuild: fix dependencies for files compiled from their parent directory
Jan Beulich [Thu, 26 Nov 2015 14:50:07 +0000 (15:50 +0100)]
build: fix dependencies for files compiled from their parent directory

The use of $(basename ...) here was wrong (yet I'm sure I tested it).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoMAINTAINERS: change the vt-d maintainer
Yang Zhang [Thu, 26 Nov 2015 14:49:29 +0000 (15:49 +0100)]
MAINTAINERS: change the vt-d maintainer

add Feng as the new maintainer of VT-d stuff

Signed-off-by: Yang Zhang <yang.z.zhang@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agox86/viridian: flush remote tlbs by hypercall
Paul Durrant [Thu, 26 Nov 2015 14:48:41 +0000 (15:48 +0100)]
x86/viridian: flush remote tlbs by hypercall

The Microsoft Hypervisor Top Level Functional Spec. (section 3.4) defines
two bits in CPUID leaf 0x40000004:EAX for the hypervisor to recommend
whether or not to issue a hypercall for local or remote TLB flush.

Whilst it's doubtful whether using a hypercall for local TLB flush would
be any more efficient than a specific INVLPG VMEXIT, a remote TLB flush
may well be more efficiently done. This is because the alternative
mechanism is to IPI all the vCPUs in question which (in the absence of
APIC virtualisation) will require emulation and scheduling of the vCPUs
only to have them immediately VMEXIT for local TLB flush.

This patch therefore adds a viridian option which, if selected, enables
the hypercall for remote TLB flush and implements it using ASID
invalidation for targeted vCPUs followed by an IPI only to the set of
CPUs that happened to be running a targeted vCPU (which may be the empty
set). The flush may be more severe than requested since the hypercall can
request flush only for a specific address space (CR3) but Xen neither
keeps a mapping of ASID to guest CR3 nor allows invalidation of a specific
ASID, but on a host with contended CPUs performance is still likely to
be better than a more specific flush using IPIs.

The implementation of the patch introduces per-vCPU viridian_init() and
viridian_deinit() functions to allow a scratch cpumask to be allocated.
This avoids needing to put this potentially large data structure on stack
during hypercall processing. It also modifies the hypercall input and
output bit-fields to allow a check for the 'fast' calling convention,
and a white-space fix in the definition of HVMPV_feature_mask (to remove
hard tabs).
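
How the set of CPUs to IPI could be derived is sketched below. The types and
names are hypothetical, the masks are limited to 64 entries for simplicity,
and the ASID invalidation itself is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a vCPU: is it running, and on which physical CPU. */
struct vcpu_sketch {
    bool running;
    unsigned int pcpu;
};

/* Build the mask of physical CPUs that need an IPI: only those currently
 * running one of the targeted vCPUs. The result may be empty. */
static uint64_t pcpus_to_ipi(const struct vcpu_sketch *vcpus, size_t n,
                             uint64_t vcpu_mask)
{
    uint64_t pcpu_mask = 0;
    size_t i;

    assert(vcpus != NULL && n <= 64);

    for (i = 0; i < n; i++) {
        if (!(vcpu_mask & (UINT64_C(1) << i)))
            continue;
        /* Real code would invalidate the vCPU's ASID at this point. */
        if (vcpus[i].running) {
            assert(vcpus[i].pcpu < 64);
            pcpu_mask |= UINT64_C(1) << vcpus[i].pcpu;
        }
    }
    return pcpu_mask;
}
```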

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agopublic/event_channel.h: correct comment
Peng Fan [Wed, 25 Nov 2015 16:26:09 +0000 (17:26 +0100)]
public/event_channel.h: correct comment

According to the definition of structure evtchn_alloc_unbound,
there is an entry "domid_t remote_dom", not "rdom". So
use "remote_dom" in comments instead of "rdom".

Signed-off-by: Peng Fan <van.freenix@gmail.com>
9 years agox86/boot: check for not allowed sections before linking
Daniel Kiper [Wed, 25 Nov 2015 16:24:36 +0000 (17:24 +0100)]
x86/boot: check for not allowed sections before linking

Currently the check for disallowed sections is performed just after
compilation. However, if compilation succeeds and the check fails, then a
second build will create xen.gz/xen.efi without any visible error.
This happens because the %.o: %.c recipe created the object file during the
first run and make does not execute this recipe during the second run. So,
look for disallowed sections before linking. This way the check will be
executed every time.

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agolibxc: expose xsaves/xgetbv1/xsavec to hvm guest
Shuai Ruan [Wed, 25 Nov 2015 16:24:17 +0000 (17:24 +0100)]
libxc: expose xsaves/xgetbv1/xsavec to hvm guest

This patch exposes xsaves/xgetbv1/xsavec to hvm guests.
The reserved bits of eax/ebx/ecx/edx must be cleaned up
when calling cpuid(0dh) with leaf 1 or 2..63.

According to the spec the following bits must be reserved:
For leaf 1, bits 03-04/08-31 of ecx are reserved. Edx is reserved.
For leaf 2...63, bits 01-31 of ecx are reserved. Edx is reserved.

But as no XSS features are currently supported, even in HVM guests,
for leaf 2...63, ecx should be zero at the moment.
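
The masking rules listed above translate into something like the following.
This is only a sketch of the reserved-bit handling as stated in this
message; struct cpuid_regs_sketch and mask_leaf_0xd_sketch() are
hypothetical names, and the real libxc policy code may filter further bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpuid_regs_sketch {
    uint32_t eax, ebx, ecx, edx;
};

/* Clear the bits described as reserved for cpuid(0dh) sub-leaves. */
static void mask_leaf_0xd_sketch(uint32_t subleaf,
                                 struct cpuid_regs_sketch *r)
{
    assert(r != NULL);

    if (subleaf == 1) {
        /* Bits 03-04 and 08-31 of ecx are reserved, as is all of edx. */
        r->ecx &= ~((UINT32_C(0x3) << 3) | UINT32_C(0xffffff00));
        r->edx = 0;
    } else if (subleaf >= 2 && subleaf <= 63) {
        /* No XSS features are supported, so ecx is zero for now. */
        r->ecx = 0;
        r->edx = 0;
    }
}
```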

Signed-off-by: Shuai Ruan <shuai.ruan@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agox86/xsaves: enable xsaves/xrstors for hvm guest
Shuai Ruan [Wed, 25 Nov 2015 16:23:51 +0000 (17:23 +0100)]
x86/xsaves: enable xsaves/xrstors for hvm guest

This patch enables xsaves for hvm guests. It includes:
1. handle xsaves vmcs init and vmexit.
2. add logic to write/read the XSS msr.

Add IA32_XSS_MSR save/restore support.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/xsaves: enable xsaves/xrstors/xsavec in xen
Shuai Ruan [Wed, 25 Nov 2015 16:20:05 +0000 (17:20 +0100)]
x86/xsaves: enable xsaves/xrstors/xsavec in xen

This patch uses xsaves/xrstors/xsavec instead of xsaveopt/xrstor
to perform the xsave_area switching so that xen itself
can benefit from them when available.

For xsaves/xrstors/xsavec only use the compact format. Add format conversion
support for use when performing guest OS migration. Also, pv guests will not
support xsaves/xrstors.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
[dropped redundant uses of XRSTOR_FIXUP and fix formatting]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/xsaves: using named operand instead numbered operand in xrstor
Shuai Ruan [Wed, 25 Nov 2015 16:19:45 +0000 (17:19 +0100)]
x86/xsaves: using named operand instead numbered operand in xrstor

This is a prerequisite patch for the later xsaves patch. It introduces
a macro to handle restore fixup, and also uses named operands instead of
numbered operands in the restore fixup code.

Signed-off-by: Shuai Ruan <shuai.ruan@intel.com>
[with the expectation of later doing some cleanup:]
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agobuild: remove .d files from xen/ on a clean
Jonathan Creekmore [Wed, 25 Nov 2015 16:19:01 +0000 (17:19 +0100)]
build: remove .d files from xen/ on a clean

Dependency files were getting left behind in the xen
directory (since 8b6ef9c152edceabecc7f90c811cd538a7b7a110),
so append the $(DEPS) to the clean rule that runs in the
hypervisor directory.

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
9 years agoconsole: make printk() line continuation tracking per-CPU
Jan Beulich [Wed, 25 Nov 2015 16:18:21 +0000 (17:18 +0100)]
console: make printk() line continuation tracking per-CPU

This avoids cases where split messages (with other than the initial
part not carrying a log level; single line messages only of course)
issued on multiple CPUs interfere with each other, causing messages to
be issued which are supposed to be suppressed due to the log level
setting. E.g.

CPU A CPU B
XENLOG_G_DEBUG "abc"
XENLOG_G_DEBUG "def\n"
"xyz\n"

would cause the last message to be logged despite this obviously not
being intended (at default log levels).
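
The interference can be reproduced with a small model. Everything here
(the level values, the threshold, struct cont_state and log_part()) is an
assumption made for illustration, not the console code: a message part
without a level inherits the decision of the line in progress, and the
question is whether that tracking state is shared or kept per CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LEVEL_NONE   (-1) /* message part carries no log level prefix */
#define LEVEL_WARN     1  /* assumed default level for unprefixed lines */
#define LEVEL_DEBUG    3
#define LOG_THRESHOLD  2  /* parts with a level above this are suppressed */

struct cont_state {
    bool mid_line; /* previous part did not end in '\n' */
    bool printing; /* decision taken for the line in progress */
};

/* Decide whether one message part is printed, updating the tracking state. */
static bool log_part(struct cont_state *st, int level, const char *text)
{
    size_t len;
    bool print;

    assert(st != NULL && text != NULL);

    if (st->mid_line && level == LEVEL_NONE)
        print = st->printing; /* continuation inherits the decision */
    else
        print = (level == LEVEL_NONE ? LEVEL_WARN : level) <= LOG_THRESHOLD;

    len = strlen(text);
    if (len > 0)
        st->mid_line = text[len - 1] != '\n';
    st->printing = print;
    return print;
}
```

Feeding the three parts from the example through one shared cont_state lets
"xyz\n" through, while giving CPU A and CPU B their own state keeps it
suppressed.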

Suggested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Make clear that GICD_*SPI_* registers are reserved
Julien Grall [Wed, 18 Nov 2015 17:28:06 +0000 (17:28 +0000)]
xen/arm: vgic-v3: Make clear that GICD_*SPI_* registers are reserved

Our vGIC emulation has GICD_TYPER.MBIS set to 0 which means that
GICD_*SPI_* registers are reserved. Implement them using the *_reserved
labels.

Also, implement these registers for the read part.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Don't implement write-only register read as zero
Julien Grall [Wed, 18 Nov 2015 17:28:05 +0000 (17:28 +0000)]
xen/arm: vgic-v3: Don't implement write-only register read as zero

The result of a read from a write-only register is unknown. Use a memorable
value to differentiate from an actual RAZ register.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Remove spurious return in GICR_INVALLR
Julien Grall [Wed, 18 Nov 2015 17:28:04 +0000 (17:28 +0000)]
xen/arm: vgic-v3: Remove spurious return in GICR_INVALLR

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Emulate read to GICD_ICACTIVER<n>
Julien Grall [Wed, 18 Nov 2015 17:28:03 +0000 (17:28 +0000)]
xen/arm: vgic-v3: Emulate read to GICD_ICACTIVER<n>

The GICD_ICACTIVER<n> registers are missing in the read emulation of the
distributor.

Call the common emulation for the whole range.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic: Re-order the register emulations to match the memory map
Julien Grall [Wed, 18 Nov 2015 17:28:02 +0000 (17:28 +0000)]
xen/arm: vgic: Re-order the register emulations to match the memory map

It helps to find quickly whether we forgot to emulate a register or not.

At the same time add the missing reserved/implementation defined
registers. All other missing registers will be added in a follow-up if
necessary.

Note that only the distributor register map explicitly says the
size of a register (see 8.8 in ARM IHI 0069A). When the size is not
known, the implementation defined/reserved may not be emulated
correctly.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Remove GICR_MOVALLR and GICR_MOVLPIR
Julien Grall [Wed, 18 Nov 2015 17:28:01 +0000 (17:28 +0000)]
xen/arm: vgic-v3: Remove GICR_MOVALLR and GICR_MOVLPIR

The 2 registers are not described in the software spec (ARM IHI 0069A)
and their offsets are marked "implementation defined".

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic: Properly emulate the full register
Julien Grall [Wed, 18 Nov 2015 17:28:00 +0000 (17:28 +0000)]
xen/arm: vgic: Properly emulate the full register

The offset in the emulation is byte-based. As most of the registers
are 64/32 bits wide, they span multiple bytes.

However, the current emulation only cares about the first offset. As a
result, any access to the register at any other offset is not properly
emulated.

Introduce new macros to help implement access to multiple bytes and
use them throughout the vGIC emulation.

Note that I didn't convert the reserved/implementation defined
registers. It will be done in a follow-up.
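
The difference between matching only the first offset and matching the full
register can be sketched with plain range checks. The helper names are
hypothetical and this is not the macro form used in the patch; it only
shows which byte offsets a 32 bit register, or a range of them, covers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if 'offset' falls on any of the 4 bytes of the 32 bit register
 * starting at 'reg'. */
static bool offset_in_reg32(uint32_t offset, uint32_t reg)
{
    return offset >= reg && offset - reg < 4;
}

/* True if 'offset' falls within a run of 32 bit registers whose first
 * register starts at 'first' and whose last register starts at 'last'. */
static bool offset_in_range32(uint32_t offset, uint32_t first, uint32_t last)
{
    assert(first <= last);
    return offset >= first && offset - first < (last - first) + 4;
}
```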

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Only emulate identification registers required by the spec
Julien Grall [Wed, 18 Nov 2015 17:27:59 +0000 (17:27 +0000)]
xen/arm: vgic-v3: Only emulate identification registers required by the spec

Most of the identification registers space contains implementation
defined registers (see 8.1.13 in ARM IHI 0069A) and only GIC{D,R}_PIDR2
is required to be implemented.

Currently the emulation of those registers mimics the ARM implementation,
but it's untrue to say that we properly emulate such an implementation.

Keep only GIC{D,R}_PIDR2 implemented with the "implementation defined
bits" to zero and the ArchRev field (bits[7:4]) to 0x3 as we emulate a
GICv3.

Note that the emulation of the range wasn't valid anyway because the
registers are split in 2 sets (PIDR4-PIDR7 and PIDR0-PIDR2).
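
As a worked example of the value described above, with the implementation
defined bits left at zero and ArchRev in bits[7:4], the register reads as
0x30. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Compose a PIDR2 value with only the ArchRev field (bits[7:4]) set. */
static uint32_t pidr2_sketch(uint32_t arch_rev)
{
    assert(arch_rev <= 0xf);
    return arch_rev << 4;
}
```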

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Use the correct offset GICR_IGRPMODR0
Julien Grall [Wed, 18 Nov 2015 17:27:58 +0000 (17:27 +0000)]
xen/arm: vgic-v3: Use the correct offset GICR_IGRPMODR0

The offset is 0x0D00 and not 0x0F80.

Also re-order the definition to keep all the definitions ordered.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Don't try to emulate IROUTER which do not exist in the spec
Julien Grall [Wed, 18 Nov 2015 17:27:57 +0000 (17:27 +0000)]
xen/arm: vgic-v3: Don't try to emulate IROUTER which do not exist in the spec

The valid IROUTER<n> range is n = 32 - 1019 (see 8.9.13 in IHI 0069A),
which corresponds to the offsets 0x6100-0x7FD8.

Other offsets are invalid and therefore should not be emulated.

Also remove the now unused label read_as_zero_64 and write_ignore_64.

Note that GICD_IROUTER is kept to accommodate the GICv3 driver, which has
been in part taken from Linux.
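
The offsets quoted above can be checked with a little arithmetic. The
sketch assumes 8 byte IROUTER registers laid out from a base of 0x6000
(so that n = 32 lands on 0x6100); the macro and function names are
illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IROUTER_BASE_SKETCH 0x6000u /* assumed: offset of IROUTER<0> */
#define IROUTER_FIRST_N     32u
#define IROUTER_LAST_N      1019u

/* Offset of IROUTER<n> for a valid n. */
static uint32_t irouter_offset(uint32_t n)
{
    assert(n >= IROUTER_FIRST_N && n <= IROUTER_LAST_N);
    return IROUTER_BASE_SKETCH + 8u * n;
}

/* True if a byte offset falls inside a valid IROUTER<n> register. */
static bool irouter_offset_valid(uint32_t offset)
{
    return offset >= irouter_offset(IROUTER_FIRST_N) &&
           offset < irouter_offset(IROUTER_LAST_N) + 8u;
}
```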

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v2: Implement correctly ICFGR{0, 1} read-only
Julien Grall [Wed, 18 Nov 2015 17:27:56 +0000 (17:27 +0000)]
xen/arm: vgic-v2: Implement correctly ICFGR{0, 1} read-only

Each ITARGETSR register is 4 bytes wide and the offset is in bytes.

The current implementation computes the offset of ICFGR1 and ICFG2
incorrectly, so only the first 2 bytes of the ICFGR<n> range are emulated
as read-only. The rest will be treated as read-write.

For convenience introduce ITARGETSR1 and ITARGETSR2.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- typoes in commit message ]