xenbits.xensource.com Git - people/iwj/xen.git/log
Wei Liu [Tue, 25 Sep 2018 14:19:31 +0000 (15:19 +0100)]
automation: introduce a new variable to control container user

Sometimes it is handy to create a container and play with its setup
manually as root.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Jan Beulich [Wed, 26 Sep 2018 08:49:38 +0000 (10:49 +0200)]
x86emul: fix test harness build after e8dfbc2962

There was another stdio.h inclusion left in place. Re-order #include-s
altogether in test_x86_emulator.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Tue, 25 Sep 2018 12:56:58 +0000 (06:56 -0600)]
arm: fix Dom build after cd8015b634

The removal of the VLA there has changed sizeof() for the array.
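To illustrate why removing a VLA can change sizeof(), here is a minimal standalone sketch. `MAX_REG_CELLS`, `vla_reg_bytes()`, `fixed_reg_sizeof()` and `fixed_reg_used_bytes()` are hypothetical names, not code from the Xen tree: with a VLA, sizeof() tracks the runtime length, whereas with a fixed worst-case bound it always reports the full bound, so any length derived from sizeof() has to be recomputed from the actual count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worst-case bound standing in for the fixed array size. */
#define MAX_REG_CELLS 8

/* VLA version: sizeof() is evaluated at run time and tracks nr_cells. */
static size_t vla_reg_bytes(unsigned int nr_cells)
{
    unsigned int reg[nr_cells];      /* requires nr_cells > 0 */

    return sizeof(reg);
}

/* Fixed-bound version: sizeof(reg) is always the worst case. */
static size_t fixed_reg_sizeof(void)
{
    unsigned int reg[MAX_REG_CELLS];

    return sizeof(reg);
}

/* What a caller should use instead: the length actually populated. */
static size_t fixed_reg_used_bytes(unsigned int nr_cells)
{
    unsigned int reg[MAX_REG_CELLS];

    assert(nr_cells <= MAX_REG_CELLS);
    return nr_cells * sizeof(reg[0]);
}
```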

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Razvan Cojocaru [Tue, 25 Sep 2018 14:35:52 +0000 (15:35 +0100)]
x86/altp2m: clean up p2m_{get/set}_suppress_ve()

Move p2m_{get/set}_suppress_ve() to p2m.c, replace incorrect
ASSERT() in p2m-pt.c (since a guest can run in shadow mode even on
a system with virt exceptions, which would trigger the ASSERT()),
move the VMX-isms (cpu_has_vmx_virt_exceptions checks) to
p2m_ept_{get/set}_entry(), and fix locking code in
p2m_get_suppress_ve().

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Christopher Clark [Tue, 25 Sep 2018 14:30:32 +0000 (16:30 +0200)]
fuzz, test x86_emulator: disable sse before including always_inline fns

Workaround for compiler rejection of SSE-using always_inlines defined before
SSE is disabled.

With gcc 8.2.0 and glibc 2.28, compiling with _FORTIFY_SOURCE or higher
levels of optimization enabled will always_inline several library fns
(memset, memcpy, ...).

In the fuzz and x86_emulator tests, the compiler is instructed not to
generate SSE instructions via #pragma GCC target("no-sse"), because those
registers are needed for use by the workload.

The combination above causes compilation failure as the inline functions
use those instructions. This is resolved by reordering the inclusion of
<stdio.h> and <string.h> to after the pragma disabling SSE generation.

It would be preferable to locate the no-sse pragma within x86-emulate.h at the
top of the file, prior to including any other headers; unfortunately doing so
before <stdlib.h> causes compilation failure due to the declaration of 'atof':
  "SSE register return with SSE disabled".
Fortunately there is no (known) current dependency on any SSE-using
always_inline function declared in <stdlib.h> or any of its dependencies, so
the pragma is issued immediately after the inclusion of <stdlib.h>, with a
comment explaining its location there.

Add compile-time checks for unwanted prior inclusion of <string.h> and
<stdio.h>, which are the two headers that provide the library functions that
are handled with wrappers and listed within "x86-emulate.h" as ones "we think
might access any of the FPU state".
* Use standard-defined "EOF" macro to detect prior <stdio.h> inclusion.
* Use "_STRING_H" (non-standardized guard macro) as best-effort
  detection of prior <string.h> inclusion. This is not universally
  viable, but it will produce an error on common glibc systems, so it
  provides some defensive coverage.

Add a conditional #include <stdio.h> to x86-emulate.h, because fwrite,
printf, etc. are referenced when WRAP has been defined.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 25 Sep 2018 14:29:59 +0000 (16:29 +0200)]
x86/mm: don't crash on unknown memory types in page_get_ram_type

Instead return RAM_TYPE_UNKNOWN.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Tue, 25 Sep 2018 14:29:18 +0000 (16:29 +0200)]
x86/hvm: change gethvmcontext_partial error code for offline vcpus

This patch is needed in order to return different errors for an invalid
vcpu and an offline vcpu on the per-vcpu kind.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 17 Sep 2018 14:49:14 +0000 (15:49 +0100)]
xen: Disallow variable length arrays

Variable length arrays result in excess stack utilisation, with a risk
of stack overflow if the length is too large.  They also result in fairly
poor asm generation, because of requiring a divide as part of the space
calculation.

Xen no longer has any variable length arrays, so take the opportunity to
formally disallow them.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 17 Sep 2018 15:32:32 +0000 (16:32 +0100)]
x86/hvm: Adjust hvmemul_rep_stos() to compile with -Wvla

When using -Wvla, the typecast of buf triggers a Variable Length Array
warning.  This is less than ideal, as this typecast doesn't occupy any stack
space, but we don't have a finer grain option to use.

Alter the asm expression to avoid the typecast, which necessitates the
introduction of a memory clobber as the compiler can no longer identify
the total quantity of written memory.

Despite the memory clobber, there is no change to the generated asm.
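A minimal sketch of the asm change, assuming GCC-style inline asm on x86; `rep_stosb()` is a hypothetical helper, not the actual hvmemul_rep_stos() code, and the commented-out operand only approximates the kind of typecast that -Wvla flags. With just the pointer and count passed as register operands, the "memory" clobber tells the compiler that memory it cannot see was written. Other architectures fall back to memset().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill n bytes at buf with val, using REP STOSB where available. */
static void rep_stosb(void *buf, unsigned char val, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    /*
     * Typecast style (flagged by -Wvla), roughly:
     *     "=m" (*(char (*)[n])buf)
     * which told the compiler exactly which bytes were written.
     * Register-only style: pointer and count live in RDI/RCX, so a
     * "memory" clobber is required to report the store to memory.
     */
    __asm__ __volatile__ ( "rep stosb"
                           : "+D" (buf), "+c" (n)
                           : "a" (val)
                           : "memory" );
#else
    memset(buf, val, n);
#endif
}
```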

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 17 Sep 2018 15:30:53 +0000 (16:30 +0100)]
x86/PoD: Avoid using variable length arrays in p2m_pod_zero_check()

Callers of p2m_pod_zero_check() pass a count of up to POD_SWEEP_STRIDE.
Move the definition of POD_SWEEP_STRIDE and give the arrays a fixed
bound.
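The same approach in a self-contained form, assuming a hypothetical `SWEEP_STRIDE` bound and a toy `struct page` type (the real POD_SWEEP_STRIDE value and p2m_pod_zero_check() logic are not reproduced here): the array is sized by the maximum stride, and the callers' promise that count never exceeds it becomes an explicit check.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stride and page size, kept tiny for illustration. */
#define SWEEP_STRIDE 16
#define PAGE_WORDS   4

struct page {
    unsigned long words[PAGE_WORDS];
};

/* Count how many of the first 'count' pages are entirely zero. */
static unsigned int zero_check(const struct page *pages, unsigned int count)
{
    static const struct page zero_page;  /* all-zero reference page */
    bool is_zero[SWEEP_STRIDE];          /* fixed bound, not is_zero[count] */
    unsigned int i, nr_zero = 0;

    assert(count <= SWEEP_STRIDE);       /* callers never exceed one stride */

    for ( i = 0; i < count; i++ )
        is_zero[i] = !memcmp(&pages[i], &zero_page, sizeof(zero_page));

    for ( i = 0; i < count; i++ )
        nr_zero += is_zero[i];

    return nr_zero;
}
```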

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Mon, 17 Sep 2018 15:21:53 +0000 (16:21 +0100)]
x86/PoD: Simplify handling of the quick check

There is no need to duplicate the contents of the skip block.

While cleaning up this function, change 4 ints to be unsigned.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Alexandru Isaila [Tue, 25 Sep 2018 09:10:38 +0000 (12:10 +0300)]
x86/hvm: Add check for cpu_has_vmx_virt_exceptions

This is useful so that HVMOP_altp2m_vcpu_enable_notify will fail rather than
silently succeed. It saves a call to HVMOP_altp2m_set_suppress_ve.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Tue, 25 Sep 2018 09:47:10 +0000 (10:47 +0100)]
Make credit2 the default scheduler

Credit2 was declared "supported" in 4.8, and as of 4.10 had two other
critical features implemented (soft affinity / NUMA and caps).

Why change the default?

The code is better: more predictable, less jitter, easier to determine
how modifications will affect overall behavior, easier in the future
to make load-balancing behavior more subtle (e.g., taking into account
the cost of powering up extra cores, &c).

Overall performance compared to Credit1 is somewhat of a mixed bag.
Unfortunately most of what I have are tests using XenServer's internal
perf testing system, so I can't share the raw data (via links anyway).

Here is a summary of data from an internal e-mail Dario sent in the
past:

* DVDbench: On underloaded systems, credit2 outperformed credit1 by
about 4%.  On overloaded systems, credit2 underperformed by about 3%.

* On a range of tests (unixbench, lmbench, &c), credit and credit2
perform within 5% of each other (up and down).

* Credit2 fairly consistently beats credit for TCP-style workloads.

* Credit2 is sometimes equal to, sometimes 5-15% worse than, credit for
synthetic CPU workloads (e.g., Dhrystone).

* On LoginVSI, credit2 fairly consistently outperforms credit by about 10%.

Credit2, like credit, has a number of workloads / setups for which
performance could be improved.  Personally I think networking and
partially-loaded systems are going to be more representative of what
Xen is actually used for; so I think credit2 is on the whole the
better scheduler to use by default.  And in any case, making those
improvements on credit2 will be easier than on credit.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
Wei Liu [Fri, 21 Sep 2018 15:54:52 +0000 (16:54 +0100)]
x86: expose CONFIG_HVM

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Fri, 21 Sep 2018 15:54:51 +0000 (16:54 +0100)]
x86/mm: put HVM only code under CONFIG_HVM

Going through the code, HAP, EPT, PoD and ALTP2M depend on HVM code.
Put these components under CONFIG_HVM. This further requires putting
one of the vm events under CONFIG_HVM.

Altp2m requires a bit more attention because its code is embedded in
generic x86 p2m code.

Also make hap_enabled evaluate to false when !CONFIG_HVM. Make sure it
evaluates its parameter to avoid unused variable warnings in its users.
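One way to express a hap_enabled() that is always false yet still evaluates its argument is a comma expression. This is a sketch under assumptions: the cut-down `struct domain` and its `hap_enabled_flag` field are hypothetical, and Xen's actual definition may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down domain structure for illustration. */
struct domain {
    bool hap_enabled_flag;
};

/* CONFIG_HVM is deliberately left undefined to model a !CONFIG_HVM build. */
#ifdef CONFIG_HVM
#define hap_enabled(d) ((d)->hap_enabled_flag)
#else
/*
 * Always false, but 'd' is still evaluated, so a caller whose variable is
 * only ever used as the macro argument gets no unused-variable warning.
 */
#define hap_enabled(d) ((void)(d), false)
#endif
```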

Also sort items in Makefile while at it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Wei Liu [Fri, 21 Sep 2018 15:54:50 +0000 (16:54 +0100)]
x86/mm: put nested p2m code under CONFIG_HVM

These functions are only useful for nested hvm, which isn't enabled
when CONFIG_HVM is false.

Enclose relevant code and fields in CONFIG_HVM.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Wei Liu [Fri, 21 Sep 2018 15:54:49 +0000 (16:54 +0100)]
x86/p2m/pod: make it build with !CONFIG_HVM

Populate-on-demand is HVM only.

Provide a bunch of stubs for common p2m code and guard one invocation
of guest_physmap_mark_populate_on_demand with is_hvm_domain.

Put relevant fields in p2m_domain and code which touches those fields
under CONFIG_HVM.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Wed, 21 Feb 2018 17:54:13 +0000 (17:54 +0000)]
x86: Clean up the Xen MSR infrastructure

Rename them to guest_{rd,wr}msr_xen() for consistency, and because the _regs
suffix isn't very appropriate.

Update them to take a vcpu pointer rather than presuming that they act on
current, and switch to using X86EMUL_* return values.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 20 Sep 2017 17:33:59 +0000 (17:33 +0000)]
x86/viridan: Clean up Viridian MSR infrastructure

Rename the functions to guest_{rd,wr}msr_viridian() for consistency, and
because the _regs() suffix isn't very appropriate.

Update them to take a vcpu pointer rather than presuming that they act on
current, which is safe for all implemented operations, and switch their return
ABI to use X86EMUL_*.

The default cases no longer need to deal with MSRs out of the Viridian range;
their printks are now restricted to debug builds and identify the value being
written.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 20 Sep 2017 17:33:59 +0000 (18:33 +0100)]
x86/msr: Dispatch Xen and Viridian MSRs from guest_{wr,rd}msr()

Despite the complicated diff in {svm,vmx}_msr_write_intercept(), it is just
the 0 case losing one level of indentation, as part of removing the call to
wrmsr_hypervisor_regs().

The case blocks in guest_{wr,rd}msr() use raw numbers, partly for consistency
with the CPUID side of things, but mainly because this is clearer code to
follow.  In particular, the Xen block may overlap with the Viridian block if
Viridian is not enabled for the domain, and trying to express this with named
literals caused more confusion than it solved.

Future changes will clean up the individual APIs, including allowing these
MSRs to be usable for vcpus other than current (no callers exist with v !=
current).
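A simplified model of the dispatch described above, with hypothetical types and handler names (`EMUL_OKAY`, `rdmsr_viridian()`, `rdmsr_xen()`). The MSR ranges are assumptions for illustration: the Xen block sits at 0x40000000 when Viridian is disabled and moves up to 0x40000200 when Viridian occupies the bottom of the range, which is the overlap that makes named literals confusing. Plain range checks stand in for the case blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical return codes in the style of X86EMUL_*. */
enum { EMUL_OKAY, EMUL_EXCEPTION };

/* Hypothetical, cut-down vcpu state. */
struct vcpu {
    bool viridian_enabled;
};

static int rdmsr_viridian(const struct vcpu *v, uint32_t msr, uint64_t *val)
{
    (void)v;
    *val = 0x1000 + (msr - 0x40000000);   /* placeholder value */
    return EMUL_OKAY;
}

static int rdmsr_xen(const struct vcpu *v, uint32_t msr, uint32_t base,
                     uint64_t *val)
{
    (void)v;
    *val = 0x2000 + (msr - base);         /* placeholder value */
    return EMUL_OKAY;
}

/* Takes an explicit vcpu rather than assuming 'current'. */
static int guest_rdmsr(const struct vcpu *v, uint32_t msr, uint64_t *val)
{
    /* Assumed layout: Xen's block is displaced when Viridian is enabled. */
    uint32_t xen_base = v->viridian_enabled ? 0x40000200 : 0x40000000;

    if ( v->viridian_enabled && msr >= 0x40000000 && msr <= 0x400001ff )
        return rdmsr_viridian(v, msr, val);

    if ( msr >= xen_base && msr <= xen_base + 0x1ff )
        return rdmsr_xen(v, msr, xen_base, val);

    return EMUL_EXCEPTION;                /* unknown MSR: would inject #GP */
}
```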

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 24 Sep 2018 13:00:02 +0000 (14:00 +0100)]
ARM/dom0: Avoid using a variable length array in make_memory_node()

The reg[] array can have a maximum size of 8 in practice, so use the worst
case calculation rather than making it variable length.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Amit Singh Tomar [Tue, 11 Sep 2018 16:48:06 +0000 (22:18 +0530)]
xen:arm: Populate arm64 image header

This patch adds the image size and flags to the Xen image header. It uses
those fields according to the updated Linux kernel image definition.

With this patch the bootloader can place the Xen image anywhere in system
RAM at a 2MB-aligned address without having to worry about relocation.
For instance, it fixes booting Xen on Amlogic SoCs, where the bootloader
(U-Boot) always relocates the Xen image to an address range reserved for
firmware data.
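For reference, a sketch of the 64-byte arm64 Image header layout as described in the Linux kernel's arm64 booting documentation, including the flag bits relevant here. The `example_flags` value (little endian, 4K pages, placement anywhere) is a hypothetical combination, not necessarily what Xen's header sets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* arm64 Image header; all multi-byte fields are little endian. */
struct arm64_image_header {
    uint32_t code0;        /* executable code */
    uint32_t code1;        /* executable code */
    uint64_t text_offset;  /* image load offset */
    uint64_t image_size;   /* effective image size */
    uint64_t flags;        /* kernel flags, see below */
    uint64_t res2;
    uint64_t res3;
    uint64_t res4;
    uint32_t magic;        /* 0x644d5241, i.e. "ARM\x64" */
    uint32_t res5;
};

#define IMAGE_FLAG_BE       (UINT64_C(1) << 0)  /* bit 0: 1 = big endian */
#define IMAGE_FLAG_PAGE_4K  (UINT64_C(1) << 1)  /* bits 1-2: 1 = 4K pages */
#define IMAGE_FLAG_PHYS_ANY (UINT64_C(1) << 3)  /* bit 3: any 2MB-aligned base */

/* Hypothetical: little endian, 4K pages, loadable anywhere in RAM. */
static const uint64_t example_flags = IMAGE_FLAG_PAGE_4K | IMAGE_FLAG_PHYS_ANY;
```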

Signed-off-by: Amit Singh Tomar <amittomer25@gmail.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Wei Liu [Fri, 21 Sep 2018 15:54:48 +0000 (16:54 +0100)]
x86/mem_access: put p2m_{get/set}_suppress_ve under CONFIG_HVM

They are used by HVM code only.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Roger Pau Monne [Thu, 20 Sep 2018 10:40:25 +0000 (12:40 +0200)]
libxl: keep assigned pci devices across domain reboots

Fill the from_xenstore libxl_device_type hook for PCI devices so that
libxl_retrieve_domain_configuration can properly retrieve PCI devices
from xenstore.

This fixes PCI devices disappearing across domain reboots.

Reported-by: Andreas Kinzler <hfp@posteo.de>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Fri, 21 Sep 2018 10:23:44 +0000 (12:23 +0200)]
x86/pvh: copy data from low 1MB to Dom0 physmap instead of mapping it

Identity mapping RAM regions in the low 1MB for Dom0 is not ideal,
since there's data there that could be used by Xen during runtime
(like the AP trampoline). So instead of identity mapping the low 1MB
into the Dom0 physmap, populate those RAM regions and copy the data.

Note that this allows removing unshare_xen_page_with_guest, since the
only caller was the PVH Dom0 builder.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 21 Sep 2018 10:22:38 +0000 (12:22 +0200)]
iommu: setup inclusive mappings before enabling iommu

Or else it can lead to freezes when enabling the iommu on certain
Intel hardware:

[...]
(XEN) ELF: addresses:
(XEN)     virt_base        = 0xffffffff80000000
(XEN)     elf_paddr_offset = 0x0
(XEN)     virt_offset      = 0xffffffff80000000
(XEN)     virt_kstart      = 0xffffffff81000000
(XEN)     virt_kend        = 0xffffffff82953000
(XEN)     virt_entry       = 0xffffffff8274e180
(XEN)     p2m_base         = 0x8000000000
(XEN)  Xen  kernel: 64-bit, lsb, compat32
(XEN)  Dom0 kernel: 64-bit, PAE, lsb, paddr 0x1000000 -> 0x295300
<freeze>

This restores the behavior before commit 66a9274cc3435, which changed
the order and enabled the iommu without having the inclusive mappings
set up.

Note that on AMD hardware the order is also changed to add inclusive
mappings before adding any devices.

Reported-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Fri, 21 Sep 2018 10:21:32 +0000 (12:21 +0200)]
x86/mm: re-indent after "re-arrange get_page_from_l<N>e() vs pv_l1tf_check_l<N>e()"

That earlier change introduced two "else switch ()" constructs which now
get converted back to "normal" style (indentation). To limit indentation
depth, a conditional gets inverted in ptwr_emulated_update().

No functional change intended.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Adrian Pop [Tue, 4 Sep 2018 04:59:22 +0000 (07:59 +0300)]
x86/altp2m: Allow setting the #VE info page for an arbitrary VCPU

In a classic HVI + Xen setup, the introspection engine would monitor
legacy guest page-tables by marking them read-only inside the EPT; this
way any modification explicitly made by the guest or implicitly made by
the CPU page walker would trigger an EPT violation, which would be
forwarded by Xen to the SVA and thus the HVI agent.  The HVI agent would
analyse the modification, and act upon it - for example, a virtual page
may be remapped (its guest physical address changed inside the
page-table), in which case the introspection logic would update the
protection accordingly (remove EPT hook on the old gpa, and place a new
EPT hook on the new gpa).  In other cases, the modification may be of no
interest to the introspection engine - for example, the accessed/dirty
bits may be cleared by the operating system or the accessed/dirty bits
may be set by the CPU page walker.

In our tests we discovered that the vast majority of guest page-table
modifications fall in the second category (especially on Windows 10 RS4
x64 - more than 95% of ALL the page-table modifications are irrelevant to
us) - they are of no interest to the introspection logic, but they
trigger a very costly EPT violation nonetheless.  Therefore, we decided
to make use of the new #VE & VMFUNC features in recent Intel CPUs to
accelerate the monitoring of guest page-tables in the following way:

1. Each monitored page-table would be flagged as being convertible
   inside the EPT, thus enabling the CPU to deliver a virtualization
   exception to the guest instead of generating a traditional EPT
   violation.
2. We inject a small filtering driver inside the protected guest VM,
   which would intercept the virtualization exception in order to handle
   guest page-table modifications.
3. We create a dedicated EPT view (altp2m) for the in-guest agent, which
   would isolate the agent from the rest of the operating system; the
   agent will switch in and out of the protected EPT view via the VMFUNC
   instruction placed inside a trampoline page, thus making the agent
   immune to malicious code inside the guest.

This way, all the page-table accesses would generate a
virtualization exception inside the guest instead of a costly EPT
violation; the #VE agent would emulate and analyse the modification, and
decide whether it is relevant for the main introspection logic; if it is
relevant, it would do a VMCALL and notify the introspection engine
about the modification; otherwise, it would resume normal instruction
execution, thus avoiding a very costly VM exit.
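The filtering decision made by the in-guest agent can be sketched as a comparison of the old and new page-table entry. This is a hypothetical illustration, not the actual agent: `pte_write_is_relevant()` reports a write only if it changes bits other than Accessed (bit 5) and Dirty (bit 6) of an x86 PTE; the emulation, VMCALL and VMFUNC steps are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 PTE bits routinely toggled by the CPU page walker and the OS. */
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)

/*
 * Hypothetical filter: only writes that change something other than the
 * accessed/dirty bits (e.g. the mapped frame or permissions) are worth
 * reporting to the introspection engine.
 */
static bool pte_write_is_relevant(uint64_t old_pte, uint64_t new_pte)
{
    uint64_t changed = old_pte ^ new_pte;

    return (changed & ~(PTE_ACCESSED | PTE_DIRTY)) != 0;
}
```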

Signed-off-by: Adrian Pop <apop@bitdefender.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrii Anisov [Wed, 12 Sep 2018 17:03:27 +0000 (20:03 +0300)]
Change timestamps representation for keyhandlers

For several keyhandlers, replace the hex-with-delimiter representation
of time with PRI_stime, which is currently decimal ns.
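A minimal sketch of decimal-nanosecond formatting, assuming `s_time_t` is a signed 64-bit nanosecond count and `PRI_stime` expands to `PRId64` (mirroring, as an assumption, Xen's definitions); `format_stime()` is a hypothetical helper.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Assumed definitions: system time as signed 64-bit nanoseconds. */
typedef int64_t s_time_t;
#define PRI_stime PRId64

/* Format a timestamp as plain decimal nanoseconds. */
static int format_stime(char *buf, size_t len, s_time_t t)
{
    return snprintf(buf, len, "%" PRI_stime, t);
}
```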

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Adrian Pop [Wed, 12 Sep 2018 07:50:06 +0000 (10:50 +0300)]
x86/altp2m: Add a hvmop for querying the suppress #VE bit

Signed-off-by: Adrian Pop <apop@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Adrian Pop [Wed, 12 Sep 2018 07:50:05 +0000 (10:50 +0300)]
x86/altp2m: Add a hvmop for setting the suppress #VE bit

Introduce a new hvmop, HVMOP_altp2m_set_suppress_ve, which allows a
domain to change the value of the #VE suppress bit for a page.

Add a libxc wrapper for invoking this hvmop.

Signed-off-by: Adrian Pop <apop@bitdefender.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Paul Durrant [Tue, 11 Sep 2018 15:01:08 +0000 (16:01 +0100)]
tools: add option to explicitly enable VirtFS in QEMU build

9pfs support has been a documented feature since Xen 4.9, but QEMU will
not be built with backend support unless VirtFS is enabled, which is
predicated on the libcap and libattr dev packages being installed. This is
not obvious to anyone intending to use 9pfs.

This patch adds an 'enable-9pfs' option to configure which, if specified,
will cause '--enable-virtfs' to be passed to QEMU's configure. This will
cause the dependency on libcap and libattr to be called out if the packages
are not installed.

For completeness, specifying 'disable-9pfs' will cause '--disable-virtfs' to
be passed to QEMU's configure, while not specifying an option will keep the
previous behaviour of predicating VirtFS on whether the libcap and libattr
packages are installed.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Tue, 18 Sep 2018 15:50:44 +0000 (16:50 +0100)]
xen: sched/Credit2: fix bug when moving CPUs between two Credit2 cpupools

Whether or not a CPU is assigned to a runqueue (and, if yes, to which
one) within a Credit2 scheduler instance must be tracked both per-cpu
and per scheduler instance.

In fact, when we move a CPU between cpupools, we first set up its per-cpu
data in the new pool, and then clean up its per-cpu data from the old
pool. In Credit2, as there currently is no per-scheduler, per-cpu
data (the cpu-to-runqueue map is stored on a per-cpu basis only),
this means that the cleanup of the old per-cpu data can mess with the
new per-cpu data, leading to crashes like this:

https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg23306.html
https://www.mail-archive.com/xen-devel@lists.xenproject.org/msg23350.html

Basically, when csched2_deinit_pdata() is called for CPU 13, for fully
removing the CPU from Pool-0, per_cpu(13,runq_map) already contains the
id of the runqueue to which the CPU has been assigned in the scheduler
of Pool-1, which means wrong runqueue manipulations happen in Pool-0's
scheduler. Furthermore, at the end of such call, that same runq_map is
updated with -1, which is what causes the BUG_ON in csched2_schedule(),
on CPU 13, to trigger.

So, instead of reverting a2c4e5ab59d "xen: credit2: make the cpu to
runqueue map per-cpu" (as we don't want to go back to having the huge
array in struct csched2_private) add a per-cpu scheduler specific data
structure, like, for instance, Credit1 has already. That (for now) only
contains one field: the id of the runqueue the CPU is assigned to.
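A toy model of the bug and the fix, with hypothetical names (`shared_runq_map`, `struct csched2_pcpu_model`, `struct sched_instance`): a single per-CPU map shared by all Credit2 instances lets the old pool's deinit read and clobber the new pool's entry, whereas per-instance per-CPU data keeps the two apart.

```c
#include <assert.h>

#define NR_CPUS 16

/* Buggy layout: one runqueue id per CPU, shared by every instance. */
static int shared_runq_map[NR_CPUS];

static void shared_init(unsigned int cpu, int rqi)
{
    shared_runq_map[cpu] = rqi;
}

static int shared_deinit(unsigned int cpu)
{
    int rqi = shared_runq_map[cpu];

    shared_runq_map[cpu] = -1;     /* clobbers whoever set it last */
    return rqi;
}

/* Fixed layout: each scheduler instance owns its per-CPU data. */
struct csched2_pcpu_model {
    int runq_id;
};

struct sched_instance {
    struct csched2_pcpu_model pcpu[NR_CPUS];
};

static void init_pdata(struct sched_instance *s, unsigned int cpu, int rqi)
{
    s->pcpu[cpu].runq_id = rqi;
}

/* Returns the runqueue this instance believes the CPU was on. */
static int deinit_pdata(struct sched_instance *s, unsigned int cpu)
{
    int rqi = s->pcpu[cpu].runq_id;

    s->pcpu[cpu].runq_id = -1;     /* only touches this instance */
    return rqi;
}
```

The test below moves CPU 13 from pool 0 to pool 1 in the same order as the cpupool code: the new pool's setup happens before the old pool's cleanup.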

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Wei Liu [Mon, 17 Sep 2018 08:33:41 +0000 (09:33 +0100)]
automation: skip some branches in gitlab CI

Ignore branches which are always fast-forwarded to staging* branches.
List of filters taken from Travis CI setup.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Andrew Cooper [Wed, 5 Sep 2018 17:32:52 +0000 (17:32 +0000)]
xen/vcpu: Introduce vcpu_destroy()

Like _domain_destroy(), this will eventually idempotently free all parts of a
struct vcpu.

While breaking apart the failure path of vcpu_create(), rework the codeflow to
be in a line at the end of the function for clarity.
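A sketch of the pattern, assuming a hypothetical cut-down `struct vcpu` with two allocations: vcpu_destroy() frees whatever exists and NULLs the pointers, which makes it safe both on a partially constructed vcpu and when called twice, so every failure in vcpu_create() can jump to one cleanup label. The `fail_arch` parameter exists only to exercise the error path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, cut-down vcpu with two separately allocated parts. */
struct vcpu {
    int *sched_priv;
    int *arch_state;
};

/* Idempotent: safe on a partially built vcpu and safe to call twice. */
static void vcpu_destroy(struct vcpu *v)
{
    free(v->arch_state);
    v->arch_state = NULL;
    free(v->sched_priv);
    v->sched_priv = NULL;
}

/* All failure paths funnel into a single cleanup at the end. */
static struct vcpu *vcpu_create(bool fail_arch)
{
    struct vcpu *v = calloc(1, sizeof(*v));

    if ( !v )
        return NULL;

    if ( (v->sched_priv = calloc(1, sizeof(int))) == NULL )
        goto fail;

    if ( fail_arch || (v->arch_state = calloc(1, sizeof(int))) == NULL )
        goto fail;

    return v;

 fail:
    vcpu_destroy(v);
    free(v);
    return NULL;
}
```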

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Wed, 5 Sep 2018 16:48:02 +0000 (16:48 +0000)]
xen/vcpu: Rename the common interfaces for consistency

The vcpu functions are far less consistent than the domain side of things;
in particular, vcpu_destroy() is used for architecture specific functionality.

Perform the following renames:

  * alloc_vcpu      => vcpu_create
  * vcpu_initialise => arch_vcpu_create
  * vcpu_destroy    => arch_vcpu_destroy

which makes the vcpu hierarchy consistent with the domain hierarchy.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Wei Liu [Thu, 13 Sep 2018 16:38:08 +0000 (17:38 +0100)]
xen: connect guest creation with CONFIG_HVM

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 13 Sep 2018 16:38:04 +0000 (17:38 +0100)]
x86: provide stubs, declarations and macros in hvm.h

Make sure hvm_enabled evaluates to false, then provide the necessary
stubs, declarations and macros to make Xen build when !CONFIG_HVM.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Fri, 17 Aug 2018 10:23:28 +0000 (11:23 +0100)]
x86/mm: put paging_update_nestedmode under CONFIG_HVM

Nested HVM is not enabled when !CONFIG_HVM.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Juergen Gross [Fri, 7 Sep 2018 09:16:54 +0000 (11:16 +0200)]
tools: correct tools/tests/depriv/Makefile

tools/tests/depriv/Makefile directly builds the target program from
its C-source. This is problematic when an incremental build is needed
after a header the program is depending on has been modified: in this
case all headers are added into the gcc call and the build will fail.

Correct that by adding a rule for building the program from its .o
file.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Fri, 31 Aug 2018 07:02:42 +0000 (01:02 -0600)]
tools/tests: allow depriv-fd-checker to build with really old Linux headers

Assuming it was intentional for this test utility, other than most other
ones, to always be built, I think it would be nice if it didn't fail to
build on really old distros just because of the lack of a TUNGETIFF
definition.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Fri, 24 Aug 2018 20:01:40 +0000 (21:01 +0100)]
xen: decouple HVM and IOMMU capabilities

HVM and IOMMU are two distinct hardware features, yet they were
bundled together in sysctl and xl's output.

Decouple them at the sysctl level. At the toolstack level we still need
to maintain sensible semantics for `xl info`. Massage the information
according to the following table:

pv      hvm     iommu           flags in xl info
0       0       0               n/a
0       0       1               n/a
0       1       0               hvm
0       1       1               hvm hvm_directio
1       0       0               NIL
1       0       1               directio
1       1       0               hvm
1       1       1               hvm hvm_directio directio
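The table can be expressed as a small function; `virt_caps_string()` and `append_flag()` are hypothetical helpers, not libxl's implementation. The rows with neither PV nor HVM return NULL (n/a), and the NIL row yields an empty string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Append 'flag' to buf, separated by a space if buf is non-empty. */
static void append_flag(char *buf, size_t len, const char *flag)
{
    size_t used = strlen(buf);

    snprintf(buf + used, len - used, "%s%s", used ? " " : "", flag);
}

/* Build the xl info flags from the three capabilities (len must be >= 1). */
static const char *virt_caps_string(bool pv, bool hvm, bool iommu,
                                    char *buf, size_t len)
{
    if ( !pv && !hvm )
        return NULL;                     /* n/a rows */

    buf[0] = '\0';
    if ( hvm )
        append_flag(buf, len, "hvm");
    if ( hvm && iommu )
        append_flag(buf, len, "hvm_directio");
    if ( pv && iommu )
        append_flag(buf, len, "directio");

    return buf;                          /* "" is the NIL row */
}
```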

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:27:00 +0000 (16:27 +0200)]
x86/domctl: don't pause the whole domain if only getting vcpu state

This patch changes hvm_save_one() to save one typecode from one vcpu.
Now that the save functions get data from a single vcpu, we can pause
that specific vcpu instead of the whole domain.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:27:00 +0000 (16:27 +0200)]
x86/hvm: remove redundant save functions

This patch removes the redundant save functions and renames the
save_one* to save. It then changes the domain param to vcpu in the
save funcs and adapts print messages in order to match the format of the
other save related messages.
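A sketch of the resulting shape, with hypothetical cut-down types and names (`save_vcpu_fn`, `lapic_save_one()`, `save_all()`): each save handler takes a single vcpu, a whole-domain save is a loop over vcpus, and a single-vcpu query calls the handler for one vcpu only, which is what allows pausing just that vcpu. Real handlers write into an HVM save context rather than a plain array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VCPUS 4

/* Hypothetical, cut-down types. */
struct vcpu {
    unsigned int id;
    unsigned int lapic_reg;
};

struct domain {
    unsigned int nr_vcpus;
    struct vcpu vcpu[MAX_VCPUS];
};

/* A save handler takes a single vcpu rather than a whole domain. */
typedef int (*save_vcpu_fn)(const struct vcpu *v, unsigned int *out);

static int lapic_save_one(const struct vcpu *v, unsigned int *out)
{
    *out = v->lapic_reg;
    return 0;
}

/* A whole-domain save becomes a loop over the per-vcpu handler. */
static int save_all(const struct domain *d, save_vcpu_fn fn, unsigned int *out)
{
    unsigned int i;

    for ( i = 0; i < d->nr_vcpus; i++ )
    {
        int rc = fn(&d->vcpu[i], &out[i]);

        if ( rc )
            return rc;
    }

    return 0;
}
```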

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:27:00 +0000 (16:27 +0200)]
x86/domctl: use hvm_save_vcpu_handler

This patch makes hvm_save() use the new save_one functions.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:27:00 +0000 (16:27 +0200)]
x86/hvm: add handler for save_one funcs

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/hvm: introduce lapic_save_regs_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/hvm: introduce lapic_save_hidden_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/hvm: introduce viridian_save_vcpu_ctxt_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/hvm: introduce hvm_save_mtrr_msr_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/hvm: introduce hvm_save_cpu_msrs_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/hvm: introduce hvm_save_cpu_xsave_states_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/hvm: introduce hvm_save_cpu_ctxt_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/hvm: introduce hvm_save_tsc_adjust_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alexandru Isaila [Mon, 10 Sep 2018 14:26:00 +0000 (16:26 +0200)]
x86/cpu: introduce vmce_save_vcpu_ctxt_one()

This is used to save data from a single instance.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Vlad Ioan Topan [Wed, 12 Sep 2018 07:50:00 +0000 (09:50 +0200)]
x86/mm: change default value for suppress #VE in set_mem_access()

The default value for the "suppress #VE" bit set by set_mem_access()
currently depends on whether the call is made from the same domain (the
bit is set when called from another domain and cleared if called from
the same domain). This patch changes that behavior to inherit the old
suppress #VE bit value if it is already set and to set it to 1
otherwise, which is safer and more reliable.
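A minimal C sketch of the new default. It assumes that "already set" means the existing entry is valid and therefore carries a meaningful suppress #VE bit; struct p2m_entry_stub, its fields and choose_suppress_ve() are hypothetical simplifications, not Xen's p2m code. A requested value of -1 stands for "no explicit value, use the default".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of an existing p2m entry. */
struct p2m_entry_stub {
    bool valid;        /* assumption: entry already populated */
    bool suppress_ve;  /* current "suppress #VE" bit */
};

/*
 * Pick the "suppress #VE" bit for a set_mem_access()-style update.
 * requested: 0 or 1 for an explicit value, -1 for "use the default".
 * Default: inherit the old bit from an existing valid entry, otherwise
 * set it to 1 (no #VE is delivered). The calling domain plays no role.
 */
static bool choose_suppress_ve(const struct p2m_entry_stub *old, int requested)
{
    assert(requested >= -1 && requested <= 1);

    if (requested != -1)
        return requested;

    return old->valid ? old->suppress_ve : true;
}
```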

Signed-off-by: Vlad Ioan Topan <itopan@bitdefender.com>
Signed-off-by: Adrian Pop <apop@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Roger Pau Monné [Fri, 7 Sep 2018 09:08:00 +0000 (11:08 +0200)]
x86/iommu: add map-reserved dom0-iommu option to map reserved memory ranges

Several people have reported hardware issues (malfunctioning USB
controllers) due to iommu page faults on Intel hardware. Those faults
are caused by missing RMRR (VTd) entries in the ACPI tables. They can
be worked around on VTd hardware by manually adding RMRR entries on
the command line; however, this is limited to Intel hardware and quite
cumbersome to do.

To solve those issues, add a new dom0-iommu=map-reserved
option that identity maps all regions marked as reserved in the memory
map. Regions used by devices emulated by Xen (LAPIC, IO-APIC
or PCIe MCFG regions) are specifically avoided. Note that this option
is available to all Dom0 modes (as opposed to the inclusive option,
which only works for PV Dom0).
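The selection logic can be sketched as follows. Everything here is hypothetical: a toy memory map expressed in page frame numbers, a list of emulated ranges, and a map_page() callback standing in for the real IOMMU mapping path (NULL gives a dry run that only counts). Reserved pages are identity mapped (frame N to frame N) unless they fall inside a range Xen emulates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum region_type { REGION_RAM, REGION_RESERVED };

struct region {            /* hypothetical memory-map entry, in pfns */
    uint64_t start, end;   /* half-open range [start, end) */
    enum region_type type;
};

/* Is pfn inside one of the (hypothetical) emulated ranges? */
static bool is_emulated(uint64_t pfn, const struct region *emul, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (pfn >= emul[i].start && pfn < emul[i].end)
            return true;
    return false;
}

/* Identity map every reserved, non-emulated pfn; returns pages mapped. */
static size_t map_reserved(const struct region *map, size_t nmap,
                           const struct region *emul, size_t nemul,
                           void (*map_page)(uint64_t dfn, uint64_t mfn))
{
    size_t mapped = 0;

    assert(map != NULL || nmap == 0);

    for (size_t i = 0; i < nmap; i++) {
        if (map[i].type != REGION_RESERVED)
            continue;
        for (uint64_t pfn = map[i].start; pfn < map[i].end; pfn++) {
            if (is_emulated(pfn, emul, nemul))
                continue;             /* e.g. LAPIC, IO-APIC, MCFG */
            if (map_page)
                map_page(pfn, pfn);   /* identity: dfn == mfn */
            mapped++;
        }
    }
    return mapped;
}
```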

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Roger Pau Monné [Fri, 7 Sep 2018 09:08:00 +0000 (11:08 +0200)]
x86/iommu: switch the hwdom mapping function to use page_get_type

This avoids repeated calls to page_is_ram_type(), which improves
performance and makes the code easier to read.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Fri, 7 Sep 2018 09:08:00 +0000 (11:08 +0200)]
mm: introduce a helper to get the memory type of a page

Returns all the memory types applicable to a page.

This function is unimplemented for ARM.
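One way to return all the memory types applicable to a page is a bitmask, so a caller can test several types after a single lookup instead of calling page_is_ram_type() repeatedly. This sketch is hypothetical: the TYPE_* flags, struct mm_range and page_types() are illustrative and are not Xen's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical type flags; a single page may carry several of them. */
#define TYPE_RAM      (1u << 0)
#define TYPE_RESERVED (1u << 1)
#define TYPE_ACPI     (1u << 2)

struct mm_range {          /* hypothetical memory-map entry, in bytes */
    uint64_t addr, size;
    unsigned int type;     /* one of the TYPE_* flags */
};

/* OR together the types of every range overlapping page 'pfn'. */
static unsigned int page_types(const struct mm_range *map, size_t n,
                               uint64_t pfn)
{
    uint64_t start = pfn << PAGE_SHIFT;
    uint64_t end = start + (1ull << PAGE_SHIFT);
    unsigned int types = 0;

    assert(map != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (map[i].addr < end && map[i].addr + map[i].size > start)
            types |= map[i].type;

    return types;
}
```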

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 7 Sep 2018 09:08:00 +0000 (11:08 +0200)]
iommu: make iommu_inclusive_mapping a suboption of dom0-iommu

Introduce a new dom0-iommu=map-inclusive generic option that
supersedes iommu_inclusive_mapping. The previous behavior is preserved
and the option should only be enabled by default on Intel hardware.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Roger Pau Monné [Fri, 7 Sep 2018 09:08:00 +0000 (11:08 +0200)]
iommu: introduce dom0-iommu option

To select the iommu configuration used by Dom0. This option supersedes
iommu=dom0-strict|dom0-passthrough.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Fri, 7 Sep 2018 09:07:00 +0000 (11:07 +0200)]
iommu: rename iommu_dom0_strict and iommu_passthrough

To iommu_hwdom_strict and iommu_hwdom_passthrough, which are more
descriptive of their usage. Also change their type from bool_t to
bool.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Andrew Cooper [Thu, 6 Sep 2018 13:40:56 +0000 (14:40 +0100)]
xen/sched: Re-position the domain_update_node_affinity() call during vcpu construction

alloc_vcpu()'s call to domain_update_node_affinity() has existed for a decade,
but its effort is mostly wasted.

alloc_vcpu() is called in a loop for each vcpu, bringing them into existence.
The affinity masks still have their default values at that point: all cpus
in general, or a processor singleton for pinned domains.

Furthermore, domain_update_node_affinity() itself loops over all vcpus
accumulating the masks, making it quadratic with the number of vcpus.

Move it to be called once after all vcpus are constructed, which has the same
net effect, but with fewer intermediate memory allocations and less cpumask
arithmetic.
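A toy model of the cost: if each of n vcpu constructions triggers an update that itself walks every vcpu created so far, the total work is 1 + 2 + ... + n = n(n+1)/2 visits, compared with n visits for a single update after the loop. The functions below are hypothetical and only count visits; they do not model the cpumask arithmetic itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for domain_update_node_affinity():
 * walks every existing vcpu once and returns the number visited. */
static size_t update_affinity(size_t nr_existing_vcpus)
{
    size_t visits = 0;
    for (size_t v = 0; v < nr_existing_vcpus; v++)
        visits++;              /* accumulate this vcpu's masks */
    return visits;
}

/* Old placement: one update after each vcpu is allocated. */
static size_t visits_per_vcpu_update(size_t n)
{
    size_t total = 0;
    for (size_t i = 1; i <= n; i++)
        total += update_affinity(i);
    return total;              /* n(n+1)/2 */
}

/* New placement: a single update once all vcpus exist. */
static size_t visits_single_update(size_t n)
{
    return update_affinity(n); /* n */
}
```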

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Andrii Anisov [Tue, 11 Sep 2018 15:36:32 +0000 (18:36 +0300)]
xen/domain: Remove trailing whitespace

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 11 Sep 2018 13:06:23 +0000 (15:06 +0200)]
x86/HVM: don't #GP/#SS on wrapping virt->linear translations

Real hardware wraps silently in most cases, so we should behave the
same. Also split real and VM86 mode handling, as the latter really
ought to have limit checks applied.
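With a 32-bit address size, the linear address is segment base plus offset taken modulo 2^32, so an overflowing sum simply wraps. A minimal sketch of that arithmetic, using a hypothetical helper that deliberately ignores segment limit checks (which, per the text above, VM86 mode should still apply):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit linear address for seg_base + offset.
 * The sum is computed in uint32_t, so overflow wraps modulo 2^32,
 * as on real hardware, instead of raising #GP/#SS. */
static uint32_t linear_addr32(uint32_t seg_base, uint32_t offset)
{
    return seg_base + offset;
}
```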

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 11 Sep 2018 13:05:09 +0000 (15:05 +0200)]
x86/shadow: a little bit of style cleanup

Correct indentation of a piece of code, adjusting comment style at the
same time. Constify gl3e pointers and drop a bogus (and useless once
corrected) cast.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Wed, 29 Aug 2018 16:39:10 +0000 (16:39 +0000)]
xen: Fix inconsistent callers of panic()

Callers are inconsistent with whether they pass a newline to panic(),
including adjacent calls in the same function using different styles.

panic() not expecting a newline is inconsistent with most other printing
functions, which is most likely why we've gained so many inconsistencies.

Switch panic() to expect a newline, and update all callers which currently
lack a newline to include one.

This actually reduces the size of .rodata (0x07e3e8 down to 0x07e3a8) because
a number of strings are passed to both panic() and printk().  As they
previously differed by \n alone, they couldn't be merged.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Tue, 11 Sep 2018 09:06:41 +0000 (11:06 +0200)]
SVM: limit GIF=0 region

Use EFLAGS.IF for most ordinary purposes; in particular, there's no need
to unduly defer NMI/#MC. Clear GIF only immediately before VMRUN itself.
This has the additional advantage that svm_stgi_label now indeed marks
the only place where GIF gets set.

Note regarding the main STI placement: Quite counterintuitively, the
host's EFLAGS.IF continues to have a meaning while the guest runs; see
PM Vol 2 section "Physical (INTR) Interrupt Masking in EFLAGS". Hence we
need to set the flag for as long as we are in guest context.
However, SPEC_CTRL_ENTRY_FROM_HVM wants to be carried out with EFLAGS.IF
clear.

Note regarding the main STGI placement: It could be moved further up,
but at present SPEC_CTRL_EXIT_TO_HVM is not NMI/#MC-safe.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Jan Beulich [Tue, 11 Sep 2018 09:03:46 +0000 (11:03 +0200)]
x86/HVM: split page straddling emulated accesses in more cases

Assuming consecutive linear addresses map to all RAM or all MMIO is not
correct. Nor is assuming that a page-straddling MMIO access will access
the same emulating component for both parts of the access. If a guest
RAM read fails with HVMTRANS_bad_gfn_to_mfn and the access straddles
a page boundary, issue accesses separately for both parts.

The extra call to known_gla() from hvmemul_write() is just to preserve
original behavior; for consistency the check also gets added to
hvmemul_rmw() (although I remain unsure whether we would be better off
dropping both).

Note that the correctness of this depends on the MMIO caching used
elsewhere in the emulation code.
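The splitting step can be sketched as follows: an access that crosses a page boundary is issued as two parts, one up to the end of the first page and one for the remainder, so each part can be handled on its own (for example, one as RAM and the other as MMIO). The split_access() helper, struct access_part and the fixed 4 KiB page size are assumptions for illustration; this is not the emulator's actual code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

struct access_part {
    uint64_t addr;
    unsigned int bytes;
};

/* Split [addr, addr + bytes) at a page boundary.
 * Returns the number of parts (1 or 2) written to out[]. */
static int split_access(uint64_t addr, unsigned int bytes,
                        struct access_part out[2])
{
    unsigned int offset = addr & (PAGE_SIZE - 1);

    assert(bytes > 0 && bytes <= PAGE_SIZE);

    if (offset + bytes <= PAGE_SIZE) {        /* fits in one page */
        out[0].addr = addr;
        out[0].bytes = bytes;
        return 1;
    }

    out[0].addr = addr;
    out[0].bytes = PAGE_SIZE - offset;        /* up to the boundary */
    out[1].addr = addr + out[0].bytes;        /* start of next page */
    out[1].bytes = bytes - out[0].bytes;
    return 2;
}
```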

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Tue, 11 Sep 2018 09:03:14 +0000 (11:03 +0200)]
x86/HVM: add known_gla() emulation helper

... as a central place to check whether the translation for the linear
address is both available and usable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Tue, 11 Sep 2018 09:02:37 +0000 (11:02 +0200)]
x86/HVM: drop hvm_fetch_from_guest_linear()

It can easily be expressed through hvm_copy_from_guest_linear(), and in
two cases this even simplifies callers.

Suggested-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Roger Pau Monné [Tue, 11 Sep 2018 09:01:13 +0000 (11:01 +0200)]
xsm: fix clang build

ebitmap.c:244:32: error: invalid conversion specifier 'Z' [-Werror,-Wformat-invalid-specifier]
               "match my size %Zd (high bit was %d)\n", mapunit,
                              ~^
ebitmap.c:245:16: error: format specifies type 'int' but the argument has type 'unsigned long'
      [-Werror,-Wformat]
               sizeof(u64) * 8, e->highbit);
               ^~~~~~~~~~~~~~~
ebitmap.c:245:33: error: data argument not used by format string [-Werror,-Wformat-extra-args]
               sizeof(u64) * 8, e->highbit);

Use %zd instead of %Zd, which is compliant with C99.
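For reference, the C99 z length modifier is the portable way to print size_t values; the Z spelling is a non-standard extension that clang rejects. Because sizeof yields an unsigned size_t, %zu is the exactly matching conversion, while %zd expects the corresponding signed type. A minimal example with a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a size_t portably using C99's %zu. */
static int format_bits(char *buf, size_t len, size_t bits)
{
    return snprintf(buf, len, "match my size %zu", bits);
}
```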

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Tue, 11 Sep 2018 09:00:01 +0000 (11:00 +0200)]
x86/HVM: meet xentrace's expectations on emulation event data

According to the logic in hvm_mmio_assist_process(), 64 bits of data are
expected with 64-bit addresses, and 32 bits of data with 32-bit ones. I
don't think this is very reasonable, but I'm also not going to touch the
consumer side, all the more since it is not very helpful anyway for the
code here to only ever supply 32 bits of data (despite the field being 64
bits wide, as it was even in the 32-bit days of Xen).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Wei Liu [Wed, 5 Sep 2018 14:05:01 +0000 (15:05 +0100)]
docs: document ~/control/sysrq

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Wei Liu [Fri, 7 Sep 2018 10:41:31 +0000 (11:41 +0100)]
mkdeb: use compression level 0

This requires calling dpkg-deb directly and passing it -z0.

It reduces the time to run the mkdeb script from 14 seconds to 3
seconds on my workstation with an SSD, and from 87s to 15s on a machine
with an HDD. The deb file grows from 49M to 58M.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Olaf Hering [Thu, 30 Aug 2018 10:05:11 +0000 (12:05 +0200)]
tools/mkrpm: switch payload to gzip to reduce turnaround time

rpmbuild -bb spends a lot of time compressing the binaries. Reduce the
turnaround time of 'make rpmball' by using gzip as the compression tool.
This reduces the build time from 138 seconds ('w9.xzdio') to 88 seconds
('w1.gzdio') in my environment.
The downside is an increased file size of xen.rpm: 37MB instead of 19MB.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Tue, 4 Sep 2018 16:15:23 +0000 (17:15 +0100)]
libxl: don't set PoD target for PV guests

Previously PoD target was unconditionally set for both PV and HVM
guests, but in fact PoD has always been an HVM-only (and now also PVH)
feature.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Tue, 4 Sep 2018 16:15:25 +0000 (17:15 +0100)]
x86/hvm: rearrange content of hvm.h

Move enum and function declarations to the first half of the file.

Static inline functions and macros that reference HVM-specific
fields directly are grouped together in the second half of the file.

The movement is needed because in a later patch the second half is
going to be enclosed in CONFIG_HVM.

Pure code movement. No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 6 Sep 2018 14:55:59 +0000 (15:55 +0100)]
automation: specify -j$(nproc) in build script

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Roger Pau Monné [Fri, 7 Sep 2018 07:29:20 +0000 (09:29 +0200)]
pvshim: introduce a PV shim defconfig

This makes it possible to build a tailored pvshim-only binary from Xen.
Switch the PV shim build in the tools firmware to use the new defconfig.

A diff of the .config generated for the pvshim firmware build before
and after this change shows no differences.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Zhenzhong Duan [Fri, 7 Sep 2018 07:27:19 +0000 (09:27 +0200)]
x86/dmar: zap DMAR signature for dom0 once in TBOOT case

Commit 6c298ecc1f ("vtd: Reinstate ACPI DMAR on system shutdown or
S3/S4/S5") did everything needed to make the acpi_dmar_zap() call
unnecessary, except for invoking the function from acpi_parse_dmar(),
which 123c779379 ("VTd/dmar: Tweak how the DMAR table is clobbered")
added several years later.

Some stale comments are also removed. No functional change.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Andrew Cooper [Wed, 29 Aug 2018 16:27:44 +0000 (16:27 +0000)]
xen/ARM+sched: Don't opencode %pv in printk()'s

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Wei Liu [Tue, 4 Sep 2018 16:15:22 +0000 (17:15 +0100)]
x86: PIT emulation is common to both PV and HVM

Move the file to x86 common code and change its name to emul-i8254.c.

Put HVM-only code under CONFIG_HVM or is_hvm_domain.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 6 Sep 2018 15:18:31 +0000 (16:18 +0100)]
x86: XENMEM_resource_ioreq_server is HVM only

Put the entire case branch under CONFIG_HVM.

Lift the check from hvm_get_ioreq_server_frame into its caller.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 4 Sep 2018 16:15:19 +0000 (17:15 +0100)]
x86: introduce and use a set of internal emulation flags

Use these flags in has_* tests and emulation_flags_ok.

Not using the raw flags directly enables dead code elimination (DCE) to
kick in for has_* tests, while at the same time making sure
emulation_flags_ok won't go out of sync.
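A sketch of the idea, with made-up flag names and values rather than Xen's actual definitions: when HVM support is compiled out, the internal flag is defined as 0, so a has_*() test becomes a compile-time constant false and the compiler can drop the guarded code. Because the validation helper uses the same internal definition, the two cannot drift apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical public (ABI) flag value. */
#define PUB_EMU_LAPIC (1u << 0)

/* Internal flag: collapses to 0 when HVM support is compiled out. */
#ifdef CONFIG_HVM
#define INT_EMU_LAPIC PUB_EMU_LAPIC
#else
#define INT_EMU_LAPIC 0u
#endif

struct dom { unsigned int emulation_flags; };

/* With INT_EMU_LAPIC == 0 this is constant false, enabling DCE. */
#define has_vlapic(d) (!!((d)->emulation_flags & INT_EMU_LAPIC))

/* Validation uses the same internal flag, so it stays in sync. */
static bool emulation_flags_ok_sketch(unsigned int flags)
{
    return (flags & ~INT_EMU_LAPIC) == 0;
}
```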

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Fri, 10 Aug 2018 15:43:42 +0000 (16:43 +0100)]
x86/viridian: set shutdown_code in response to CrashNotify

When Windows writes the CrashNotify bit in the CRASH_CTL MSR, we know
it is crashing, so set the domain shutdown code appropriately.
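A minimal sketch of the MSR-write check. The MSR index 0x40000105 (HV_X64_MSR_CRASH_CTL) and CrashNotify as bit 63 are taken from the Hyper-V TLFS and should be checked against it; struct dom_stub, crash_ctl_write() and the SHUTDOWN_crash value of 3 (as in Xen's public sched.h) are assumptions for illustration, not the viridian code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HV_X64_MSR_CRASH_CTL      0x40000105u
#define HV_CRASH_CTL_CRASH_NOTIFY (1ull << 63)
#define SHUTDOWN_crash            3   /* as in Xen's public sched.h */

struct dom_stub {                     /* hypothetical domain state */
    bool shutdown_code_set;
    int shutdown_code;
};

/* Handle a guest WRMSR; returns true if the MSR was recognised. */
static bool crash_ctl_write(struct dom_stub *d, uint32_t idx, uint64_t val)
{
    if (idx != HV_X64_MSR_CRASH_CTL)
        return false;

    if (val & HV_CRASH_CTL_CRASH_NOTIFY) {
        /* The guest reports it is crashing: record the shutdown reason. */
        d->shutdown_code = SHUTDOWN_crash;
        d->shutdown_code_set = true;
    }
    return true;
}
```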

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 27 Feb 2018 17:22:40 +0000 (17:22 +0000)]
xen/domctl: Drop vcpu_alloc_lock

Since its introduction in c/s 8cbb5278e "x86/AMD: Add support for AMD's OSVW
feature in guests", the OSVW data has been corrected to be per-domain rather
than per-vcpu, and is initialised during XEN_DOMCTL_createdomain.

Furthermore, because XENPF_microcode_update uses hypercall continuations to
move between CPUs, it drops the vcpu_alloc_lock mid-update, meaning that it
didn't provide the interlock guarantee that the OSVW patch was looking for in
the first place.

This interlock serves no purpose, so take the opportunity to drop it and
remove a global spinlock from the hypervisor.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 6 Sep 2018 14:05:52 +0000 (16:05 +0200)]
x86emul: fix test harness dependencies

The generated header files are what needs to spell out dependencies on
other (real) headers in the main Makefile here, not the intermediate
(helper) .o files produced through testcase.mk.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 6 Sep 2018 14:04:51 +0000 (16:04 +0200)]
x86/hvm: remove default ioreq server (again)

My recent patch [1] to qemu-xen-traditional removes the last use of the
'default' ioreq server in Xen. (This is a catch-all ioreq server that is
used if no explicitly registered I/O range is targeted).

This patch can be applied once that patch is committed, to remove the
(>100 lines of) redundant code in Xen.

The previous version of this patch caused a QEMU build failure. This has
been fixed by extending the #ifdef around deprecated HVM_PARAM declarations
to __XEN_TOOLS__ as well as __XEN__.

NOTE: The removal of the special case for HVM_PARAM_DM_DOMAIN in
      hvm_allow_set_param() is not directly related to removal of
      default ioreq servers. It could have been cleaned up at any time
      after commit 9a422c03 "x86/hvm: stop passing explicit domid to
      hvm_create_ioreq_server()". It is now added to the new
      deprecated sets introduced by this patch.

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg00270.html

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Olaf Hering [Thu, 6 Sep 2018 14:02:58 +0000 (16:02 +0200)]
xen: add DEBUG_INFO Kconfig symbol

Debug info generated during the build is not strictly required at runtime.
Make it optional by introducing a new Kconfig knob "DEBUG_INFO".
If disabled, this slightly reduces build time and disk usage.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Mon, 3 Sep 2018 12:59:42 +0000 (14:59 +0200)]
tools/xl: refuse to set number of vcpus to 0 via xl vcpu-set

Trying to set the number of vcpus of a domain to 0 isn't refused.
We should not allow that.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 31 Aug 2018 15:22:05 +0000 (17:22 +0200)]
xen: fill topology info for all present cpus

The topology information obtainable via XEN_SYSCTL_cputopoinfo is
filled in rather oddly: the size of the array is derived from the highest
online cpu number, so any trailing offline cpus are not included.

On a dual-core system with 4 threads booted with smt=0, xl info -n
prints the following without this patch:

cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       0        0        0
  2:       1        0        0

while with this patch the output is:

cpu_topology           :
cpu:    core    socket     node
  0:       0        0        0
  1:       0        0        0
  2:       1        0        0
  3:       1        0        0
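The sizing difference can be sketched with a toy CPU mask: deriving the array length from the highest online CPU drops trailing offline CPUs, while using the highest present CPU covers them. The masks and helpers below are hypothetical and mirror the example above (4 threads, smt=0, so CPUs 0 and 2 online and CPUs 1 and 3 offline).

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit in mask, or -1 if the mask is empty. */
static int highest_cpu(uint64_t mask)
{
    int cpu = -1;
    for (int i = 0; i < 64; i++)
        if (mask & (1ull << i))
            cpu = i;
    return cpu;
}

/* Number of topology entries reported when sizing from 'mask'. */
static unsigned int nr_entries(uint64_t mask)
{
    return (unsigned int)(highest_cpu(mask) + 1);
}
```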

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Juergen Gross [Fri, 31 Aug 2018 15:22:04 +0000 (17:22 +0200)]
tools/libxl: correct vcpu affinity output with sparse physical cpu map

When not all physical cpus are online (e.g. with smt=0), the output of
the vcpu affinities is wrong, as the affinity bitmaps are capped after
nr_cpus bits instead of using max_cpu_id.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Vitaly Kuznetsov [Tue, 4 Sep 2018 11:39:29 +0000 (13:39 +0200)]
libxl: create control/sysrq xenstore node

The 'xl sysrq' command doesn't work with modern Linux guests; the following
message appears in the guest's log:

 xen:manage: sysrq_handler: Error -13 writing sysrq in control/sysrq

xenstore trace confirms:

 IN 0x24bd9a0 20180904 04:36:32 WRITE (control/sysrq )
 OUT 0x24bd9a0 20180904 04:36:32 ERROR (EACCES )

The problem seems to be that we don't pre-create the control/sysrq
xenstore node, so libxl_send_sysrq(), via libxl__xs_printf(), creates it
as read-only. As we want to allow guests to clear 'control/sysrq' after the
requested action is performed, we need to make this node writable.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 3 Sep 2018 11:26:30 +0000 (13:26 +0200)]
tools/xl: fix output of xl vcpu-pin dry run with smt=0

Fix another smt=0 fallout: xl -N vcpu-pin prints only parts of the
affinities as it is using the number of online cpus instead of the
maximum cpu number.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Tue, 4 Sep 2018 16:15:21 +0000 (17:15 +0100)]
x86: monitor.o is currently HVM only

There have been plans to make it work for PV, but that is not there yet.  Provide
stubs to make it build with !CONFIG_HVM.
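The usual shape of such stubs is a header that declares the real function when the feature is built and otherwise provides a static inline fallback, so callers compile unchanged. A minimal sketch with a hypothetical monitor_domctl_sketch() and an assumed -EOPNOTSUPP return value; the names and error codes in Xen's actual monitor header may differ.

```c
#include <assert.h>
#include <errno.h>

struct domain;   /* opaque in this sketch */

#ifdef CONFIG_HVM
/* Real implementation lives in monitor.c, built only with HVM. */
int monitor_domctl_sketch(struct domain *d, unsigned int op);
#else
/* Stub: lets callers build without HVM and reports "unsupported". */
static inline int monitor_domctl_sketch(struct domain *d, unsigned int op)
{
    (void)d;
    (void)op;
    return -EOPNOTSUPP;
}
#endif
```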

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Wei Liu [Tue, 4 Sep 2018 16:15:18 +0000 (17:15 +0100)]
x86: change name of parameter for various invlpg functions

They all incorrectly named a parameter "virtual address" when it should
have been "linear address".

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Mon, 3 Sep 2018 12:56:55 +0000 (13:56 +0100)]
xen/domain: Make rangeset_domain_destroy() idempotent

... and move it into the common __domain_destroy() path.
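Idempotent here means that calling the destroy function a second time is harmless. A common way to achieve that is to tear down only what exists and leave the state empty afterwards. The list type, struct dom_sketch and rangesets_destroy() below are hypothetical and are not Xen's rangeset code.

```c
#include <assert.h>
#include <stdlib.h>

struct rangeset {                    /* hypothetical singly linked list */
    struct rangeset *next;
};

struct dom_sketch {
    struct rangeset *rangesets;      /* NULL when none are allocated */
};

/* Free every rangeset and leave the list empty: repeat calls are no-ops. */
static void rangesets_destroy(struct dom_sketch *d)
{
    while (d->rangesets != NULL) {
        struct rangeset *r = d->rangesets;
        d->rangesets = r->next;
        free(r);
    }
}
```

Because repeat calls are harmless, such a function can be called from a shared teardown path without tracking whether it already ran.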

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 3 Sep 2018 11:48:13 +0000 (12:48 +0100)]
xen/domain: Fold xsm_free_security_domain() paths together

xsm_free_security_domain() is idempotent (both the dummy handler, and the
flask handler).  Move it into the shared __domain_destroy() path, and drop the
INIT_xsm flag from domain_create().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 3 Sep 2018 11:10:48 +0000 (12:10 +0100)]
xen/domain: Call lock_profile_deregister_struct() from common code

lock_profile_register_struct() is called from common code, but the matching
deregister was previously only called from x86 code.

The practical upshot of this is that, when using CONFIG_LOCK_PROFILE, destroyed
domains on ARM (and in particular, the freed page behind struct domain) remain
on the lockprofile linked list, which will become corrupt when the page is reused.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>