xenbits.xensource.com Git - people/sstabellini/xen-unstable.git/.git/log
6 years ago xen/arm: add reserved-memory regions to the dom0 memory node [iomem_cache-v2]
Stefano Stabellini [Tue, 30 Apr 2019 20:56:40 +0000 (13:56 -0700)]
xen/arm: add reserved-memory regions to the dom0 memory node

Reserved memory regions are automatically remapped to dom0. Their device
tree nodes are also added to dom0 device tree. However, the dom0 memory
node is not currently extended to cover the reserved memory regions
ranges as required by the spec.  This commit fixes it.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
6 years ago xen/arm: map reserved-memory regions as normal memory in dom0
Stefano Stabellini [Tue, 30 Apr 2019 20:55:40 +0000 (13:55 -0700)]
xen/arm: map reserved-memory regions as normal memory in dom0

reserved-memory regions should be mapped as normal memory. At the
moment, they get remapped as device memory in dom0 because Xen doesn't
know any better. Add an explicit check for it.

reserved-memory regions overlap with memory nodes. The overlapping
memory is reserved-memory and should be handled accordingly:
consider_modules and dt_unreserved_regions should skip these regions the
same way they are already skipping mem-reserve regions.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---
Changes in v2:
- fix commit message: full overlap
- remove check_reserved_memory
- extend consider_modules and dt_unreserved_regions

6 years ago xen/arm: keep track of reserved-memory regions
Stefano Stabellini [Tue, 30 Apr 2019 20:54:56 +0000 (13:54 -0700)]
xen/arm: keep track of reserved-memory regions

As we parse the device tree in Xen, keep track of the reserved-memory
regions as they need special treatment (follow-up patches will make use
of the stored information.)

Reuse process_memory_node to add reserved-memory regions to the
bootinfo.reserved_mem array. Remove the warning if there is no reg in
process_memory_node because it is a normal condition for
reserved-memory.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---

Not done: create an e820-like structure on ARM.

Changes in v2:
- call process_memory_node from process_reserved_memory_node to avoid
  duplication

6 years ago xen/arm: make process_memory_node a device_tree_node_func
Stefano Stabellini [Tue, 30 Apr 2019 20:53:19 +0000 (13:53 -0700)]
xen/arm: make process_memory_node a device_tree_node_func

Change the signature of process_memory_node to match
device_tree_node_func.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---
Changes in v2:
- new

6 years ago xen/arm: extend device_tree_for_each_node
Stefano Stabellini [Tue, 30 Apr 2019 20:52:19 +0000 (13:52 -0700)]
xen/arm: extend device_tree_for_each_node

Add two new parameters to device_tree_for_each_node: node and depth.
node is the node to start the search from and depth is the minimum depth
of the search.

Passing 0, 0 triggers the old behavior.
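
The extended traversal can be pictured with a small sketch. This is not
the Xen implementation: the flattened demo_node array, demo_for_each_node
and the callback type are hypothetical names, and reading "min depth" as
"do not call the callback for shallower nodes" is an assumption made for
illustration only.

```c
#include <stddef.h>

/* Hypothetical flattened tree: nodes in depth-first order with their depth. */
struct demo_node {
    const char *name;
    int depth;
};

typedef int (*demo_node_func)(const struct demo_node *node, void *data);

/*
 * Walk the subtree rooted at index `start`, calling `func` only for nodes
 * whose depth is at least `min_depth`. The walk stops when it leaves the
 * subtree or when `func` returns non-zero.
 */
static int demo_for_each_node(const struct demo_node *nodes, size_t count,
                              size_t start, int min_depth,
                              demo_node_func func, void *data)
{
    size_t i;

    for (i = start; i < count; i++) {
        int ret;

        /* A later node no deeper than the start node ends the subtree. */
        if (i > start && nodes[i].depth <= nodes[start].depth)
            break;

        if (nodes[i].depth < min_depth)
            continue;

        ret = func(&nodes[i], data);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callback: count the visited nodes. */
static int demo_count(const struct demo_node *node, void *data)
{
    (void)node;
    ++*(int *)data;
    return 0;
}
```

In this sketch, starting at index 0 with a minimum depth of 0 visits every
node, which mirrors the "0, 0 triggers the old behavior" remark.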

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---
Changes in v2:
- new

6 years ago libxl/xl: add memory policy option to iomem
Stefano Stabellini [Tue, 30 Apr 2019 20:51:19 +0000 (13:51 -0700)]
libxl/xl: add memory policy option to iomem

Add a new memory policy option for the iomem parameter.
Possible values are:
- arm_devmem: device nGRE, the default on ARM
- arm_memory: WB cacheable memory
- x86_uc: uncacheable memory, the default on x86

Store the parameter in a new field in libxl_iomem_range.

Pass the memory policy option to xc_domain_mem_map_policy.
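
As a rough illustration of how the option strings could be turned into a
policy value, here is a minimal sketch. The function name, the DEMO_
macros and the numeric values are assumptions (the series only states
that the default is numerically 0); this is not the libxl parser.

```c
#include <string.h>

/* Assumed values: each architecture's default is taken to alias 0. */
#define DEMO_MEMORY_POLICY_DEFAULT      0u
#define DEMO_MEMORY_POLICY_ARM_DEV_nGRE 0u
#define DEMO_MEMORY_POLICY_ARM_MEM_WB   1u
#define DEMO_MEMORY_POLICY_X86_UC       0u

/* Returns 0 and stores the policy on success, -1 for an unknown name. */
static int demo_parse_memory_policy(const char *s, unsigned int *policy)
{
    if (s == NULL || s[0] == '\0')
        *policy = DEMO_MEMORY_POLICY_DEFAULT;   /* option omitted */
    else if (strcmp(s, "arm_devmem") == 0)
        *policy = DEMO_MEMORY_POLICY_ARM_DEV_nGRE;
    else if (strcmp(s, "arm_memory") == 0)
        *policy = DEMO_MEMORY_POLICY_ARM_MEM_WB;
    else if (strcmp(s, "x86_uc") == 0)
        *policy = DEMO_MEMORY_POLICY_X86_UC;
    else
        return -1;

    return 0;
}
```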

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: ian.jackson@eu.citrix.com
CC: wei.liu2@citrix.com
---
Changes in v2:
- add #define LIBXL_HAVE_MEMORY_POLICY
- ability to parse the memory policy parameter even if gfn is not passed
- rename cache_policy to memory policy
- rename MEMORY_POLICY_DEVMEM to MEMORY_POLICY_ARM_DEV_nGRE
- rename MEMORY_POLICY_MEMORY to MEMORY_POLICY_ARM_MEM_WB
- rename memory to arm_memory and devmem to arm_devmem
- expand the non-security support status to non device passthrough iomem
  configurations
- rename iomem options
- add x86 specific iomem option

6 years ago libxc: introduce xc_domain_mem_map_policy
Stefano Stabellini [Tue, 30 Apr 2019 20:50:19 +0000 (13:50 -0700)]
libxc: introduce xc_domain_mem_map_policy

Introduce a new libxc function that makes use of the new memory_policy
parameter added to the XEN_DOMCTL_memory_mapping hypercall.

The parameter values are the same as for the XEN_DOMCTL_memory_mapping
hypercall (0 is MEMORY_POLICY_DEFAULT). Pass MEMORY_POLICY_DEFAULT by
default -- no changes in behavior.

We could extend xc_domain_memory_mapping, but QEMU makes use of it, so
it is easier and less disruptive to introduce a new libxc function and
change the implementation of xc_domain_memory_mapping to call into it.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: ian.jackson@eu.citrix.com
CC: wei.liu2@citrix.com
---
Changes in v2:
- rename cache_policy to memory policy
- rename MEMORY_POLICY_DEVMEM to MEMORY_POLICY_ARM_DEV_nGRE
- rename MEMORY_POLICY_MEMORY to MEMORY_POLICY_ARM_MEM_WB
- introduce xc_domain_mem_map_policy

6 years ago xen: extend XEN_DOMCTL_memory_mapping to handle memory policy
Stefano Stabellini [Tue, 30 Apr 2019 20:49:19 +0000 (13:49 -0700)]
xen: extend XEN_DOMCTL_memory_mapping to handle memory policy

Reuse the existing padding field to pass memory policy information.  On
Arm, the caller can specify whether the memory should be mapped as
device nGRE, which is the default and the only possibility today, or as
write-back cacheable memory. On x86, the only option is uncacheable. The
current behavior becomes the default (numerically '0').

On ARM, map device nGRE as p2m_mmio_direct_dev (as it is already done
today) and WB cacheable memory as p2m_mmio_direct_c.

On x86, return error if the memory policy requested is not
MEMORY_POLICY_X86_UC.
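
The policy handling described above might look roughly like the sketch
below. All DEMO_ names, the helper functions and the numeric values are
assumptions for illustration (only "the default is 0" and the use of
EOPNOTSUPP come from this series); the real hypercall code is not shown
here.

```c
#include <errno.h>

/* Assumed values: each architecture's default is taken to alias 0. */
#define DEMO_MEMORY_POLICY_DEFAULT      0u
#define DEMO_MEMORY_POLICY_ARM_DEV_nGRE 0u
#define DEMO_MEMORY_POLICY_ARM_MEM_WB   1u
#define DEMO_MEMORY_POLICY_X86_UC       0u

/* Stand-ins for p2m_mmio_direct_dev and p2m_mmio_direct_c. */
enum demo_p2m_type {
    DEMO_P2M_MMIO_DIRECT_DEV, /* device nGRE */
    DEMO_P2M_MMIO_DIRECT_C,   /* write-back cacheable */
};

/* Arm: translate the requested policy into a p2m type. */
static int demo_arm_policy_to_p2mt(unsigned int policy,
                                   enum demo_p2m_type *p2mt)
{
    if (policy == DEMO_MEMORY_POLICY_ARM_DEV_nGRE)
        *p2mt = DEMO_P2M_MMIO_DIRECT_DEV;
    else if (policy == DEMO_MEMORY_POLICY_ARM_MEM_WB)
        *p2mt = DEMO_P2M_MMIO_DIRECT_C;
    else
        return -EOPNOTSUPP;

    return 0;
}

/* x86: anything other than uncacheable is rejected. */
static int demo_x86_check_policy(unsigned int policy)
{
    return policy == DEMO_MEMORY_POLICY_X86_UC ? 0 : -EOPNOTSUPP;
}
```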

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: JBeulich@suse.com
CC: andrew.cooper3@citrix.com
---
Changes in v2:
- rebase
- use p2m_mmio_direct_c
- use EOPNOTSUPP
- rename cache_policy to memory policy
- rename MEMORY_POLICY_DEVMEM to MEMORY_POLICY_ARM_DEV_nGRE
- rename MEMORY_POLICY_MEMORY to MEMORY_POLICY_ARM_MEM_WB
- add MEMORY_POLICY_X86_UC
- add MEMORY_POLICY_DEFAULT and use it

6 years ago xen: rename un/map_mmio_regions to un/map_regions
Stefano Stabellini [Tue, 30 Apr 2019 20:48:19 +0000 (13:48 -0700)]
xen: rename un/map_mmio_regions to un/map_regions

Now that map_mmio_regions takes a p2mt parameter, there is no need to
keep "mmio" in the name. The p2mt parameter does a better job at
expressing what the mapping is about. Let's save the environment 5
characters at a time.

Also fix the comment on top of map_mmio_regions.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: JBeulich@suse.com
CC: andrew.cooper3@citrix.com
---
Changes in v2:
- new patch

6 years ago xen: add a p2mt parameter to map_mmio_regions
Stefano Stabellini [Tue, 30 Apr 2019 20:47:16 +0000 (13:47 -0700)]
xen: add a p2mt parameter to map_mmio_regions

Add a p2mt parameter to map_mmio_regions, pass p2m_mmio_direct_dev on
ARM and p2m_mmio_direct on x86 -- no changes in behavior.

On ARM, given the similarity between map_mmio_regions after the change
and map_regions_p2mt, remove un/map_regions_p2mt.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
CC: JBeulich@suse.com
CC: andrew.cooper3@citrix.com
---
Changes in v2:
- new patch

6 years ago x86/msr: Fix fallout from mostly c/s 832c180
Andrew Cooper [Tue, 9 Apr 2019 15:18:46 +0000 (16:18 +0100)]
x86/msr: Fix fallout from mostly c/s 832c180

 * Fix the shim build by providing a !CONFIG_HVM declaration for
   hvm_get_guest_bndcfgs(), and removing the introduced
   ASSERT(is_hvm_domain(d))'s.  They are needed for DCE to keep the build
   working.  Furthermore, in this way, the risk of runtime type confusion is
   removed.
 * Revert the de-const'ing of the vcpu pointer in vmx_get_guest_bndcfgs().
   vmx_vmcs_enter() really does mutate the vcpu, and may cause it to undergo a
   full de/reschedule, which is contrary to the programmer's expectation of
   hvm_get_guest_bndcfgs().  guest_rdmsr() was always going to need to lose
   its const parameter, and this was the correct time for it to happen.
 * The MSRs in vcpu_msrs are in numeric order.  Re-position XSS to match.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years ago timers: move back migrate_timers_from_cpu() invocation
Jan Beulich [Thu, 11 Apr 2019 10:45:41 +0000 (04:45 -0600)]
timers: move back migrate_timers_from_cpu() invocation

Commit 597fbb8be6 ("xen/timers: Fix memory leak with cpu unplug/plug")
went a little too far: Migrating timers away from a CPU being offlined
needs to happen independently of whether it gets parked or fully offlined.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
6 years ago xl: handle PVH type in apply_global_affinity_masks again
Wei Liu [Fri, 12 Apr 2019 10:03:25 +0000 (11:03 +0100)]
xl: handle PVH type in apply_global_affinity_masks again

A call site in create_domain can call it with PVH type. That site was
missed during the review of 48dab9767.

Reinstate PVH type in the switch.

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago xmalloc: statically initialize pool list head and lock
Jan Beulich [Thu, 11 Apr 2019 08:25:22 +0000 (10:25 +0200)]
xmalloc: statically initialize pool list head and lock

There's no need to execute any instructions for doing so.
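
For readers unfamiliar with the idiom, a minimal sketch of static
initialization follows. The demo_ types and macros are hypothetical
stand-ins written for this example, not Xen's list or spinlock
definitions; the point is only that the initial state is emitted into
the data section at build time, so no init code has to run.

```c
/* Hypothetical doubly linked list head, in the style of kernel lists. */
struct demo_list_head {
    struct demo_list_head *next, *prev;
};

/* An empty list points at itself; this is resolvable at link time. */
#define DEMO_LIST_HEAD_INIT(name) { &(name), &(name) }

/* Hypothetical trivial lock: 0 means unlocked. */
struct demo_spinlock {
    int locked;
};

#define DEMO_SPIN_LOCK_UNLOCKED { 0 }

/* Both objects are fully initialized without executing any instructions. */
static struct demo_list_head demo_pool_list = DEMO_LIST_HEAD_INIT(demo_pool_list);
static struct demo_spinlock demo_pool_lock = DEMO_SPIN_LOCK_UNLOCKED;

static int demo_list_empty(const struct demo_list_head *head)
{
    return head->next == head;
}
```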

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86: fix build race when generating temporary object files
Jan Beulich [Thu, 11 Apr 2019 08:25:05 +0000 (10:25 +0200)]
x86: fix build race when generating temporary object files

The rules to generate xen-syms and xen.efi may run in parallel, but both
recursively invoke $(MAKE) to build symbol/relocation table temporary
object files. These recursive builds would both re-generate the .*.d2
files (where needed). Both would in turn invoke the same rule, thus
allowing for a race on the .*.d2.tmp intermediate files.

The dependency files of the temporary .xen*.o files live in xen/ rather
than xen/arch/x86/ anyway, so won't be included no matter what. Take the
opportunity and delete them, as the just re-generated .xen*.S files will
trigger a proper re-build of the .xen*.o ones anyway.

Empty the DEPS variable in case the set of goals consists of just those
temporary object files, thus eliminating the race.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86/mm: Clean up p2m_finish_type_change return value
Alexandru Stefan ISAILA [Wed, 10 Apr 2019 10:08:39 +0000 (11:08 +0100)]
x86/mm: Clean up p2m_finish_type_change return value

In the case of any errors, finish_type_change() passes values returned
from p2m->recalc() up the stack (with some exceptions in the case where
an error is expected); this eventually ends up being returned to the
XEN_DMOP_map_mem_type_to_ioreq_server hypercall.

However, on Intel processors (but not on AMD processors), p2m->recalc()
can also return '1' as well as '0'.  This case is handled very
inconsistently: finish_type_change() will return the value of the final
entry it attempts, discarding results for other entries;
p2m_finish_type_change() will attempt to accumulate '1's, so that it
returns '1' if any of the calls to finish_type_change() returns '1'; and
dm_op() will again return '1' only if the very last call to
p2m_finish_type_change() returns '1'.  The result is that the
XEN_DMOP_map_mem_type_to_ioreq_server() hypercall will sometimes return
0 and sometimes return 1 on success, in an unpredictable manner.

The hypercall documentation doesn't mention return values; but it's not
clear what the caller could do with the information about whether
entries had been changed or not.  At the moment it's always 0 on AMD
boxes, and *usually* 1 on Intel boxes; so nothing can be relying on a
'1' return value for correctness (or if it is, it's broken).

Make the return value on success consistently '0' by only returning
0/-ERROR from finish_type_change().  Also remove the accumulation code
from p2m_finish_type_change().
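
The intended return-value convention can be sketched as follows. The
names are hypothetical, the callback merely stands in for p2m->recalc(),
and the "expected error" exceptions mentioned above are left out; this
illustrates the 0/-ERROR rule and is not the patched function.

```c
/* Stand-in for p2m->recalc(): <0 on error, 0 if unchanged, 1 if changed. */
typedef int (*demo_recalc_fn)(unsigned long gfn, void *ctx);

/*
 * Process `count` entries starting at `first_gfn`. Any negative value is
 * propagated immediately; a positive "entry changed" result is discarded,
 * so success is always reported as 0.
 */
static int demo_finish_type_change(unsigned long first_gfn,
                                   unsigned long count,
                                   demo_recalc_fn recalc, void *ctx)
{
    unsigned long i;

    for (i = 0; i < count; i++) {
        int rc = recalc(first_gfn + i, ctx);

        if (rc < 0)
            return rc;
    }
    return 0;
}

/* Example callback: return the canned result stored at ctx[gfn]. */
static int demo_recalc_from_table(unsigned long gfn, void *ctx)
{
    return ((const int *)ctx)[gfn];
}
```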

Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years ago x86/hvm: Fix altp2m_op hypercall continuations
Andrew Cooper [Fri, 5 Apr 2019 14:59:27 +0000 (15:59 +0100)]
x86/hvm: Fix altp2m_op hypercall continuations

c/s 9383de210 "x86/altp2m: support for setting restrictions for an array of
pages" introduced this logic, but do_hvm_op() was already capable of handling
-ERESTART correctly.

More problematic however is a continuation from compat_altp2m_op().  The arg
written back into register state points into the hypercall XLAT area, not at
the original parameter passed by the guest.  It may be truncated by the
vmentry, but definitely won't be correct on the next invocation.

Delete the hypercall_create_continuation() call, and return -ERESTART, which
will cause the compat case to start working correctly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years ago x86/smt: Support for enabling/disabling SMT at runtime
Andrew Cooper [Thu, 28 Mar 2019 14:37:00 +0000 (14:37 +0000)]
x86/smt: Support for enabling/disabling SMT at runtime

Currently, a user can in principle combine the output of `xl info -n`, the
ACPI tables, and some manual CPUID data to figure out which CPU numbers to
feed into `xen-hptool cpu-offline` to effectively disable SMT at runtime.

A more convenient option is to teach Xen how to perform this action.

Extend XEN_SYSCTL_cpu_hotplug with two new operations.  Introduce a new
smt_up_down_helper() which wraps the cpu_{up,down}_helper() helpers with logic
which understands siblings based on their APIC_ID.

Add libxc stubs, and extend xen-hptool with smt-{enable,disable} options.
These are intended to be shorthands for a loop over cpu-{online,offline}.

To simplify the implementation, they will strictly enable/disable secondary
siblings (those with a non-zero thread id).  This functionality is intended
for use in production scenarios where debugging options such as `maxcpus=` or
other manual plug/unplug configuration has not been used.
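
A minimal sketch of the "secondary sibling" selection, assuming the
thread id is held in the low smt_bits bits of the APIC ID (a common x86
topology convention, but an assumption here). The function names and the
online[] array are hypothetical and only model the loop over
cpu-offline; they are not the Xen or xen-hptool code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Thread id within a core: the low `smt_bits` bits of the APIC ID. */
static uint32_t demo_thread_id(uint32_t apic_id, unsigned int smt_bits)
{
    return apic_id & ((UINT32_C(1) << smt_bits) - 1);
}

/*
 * Model of an smt-disable pass: mark every online CPU with a non-zero
 * thread id as offline and return how many CPUs were changed.
 */
static unsigned int demo_smt_disable(const uint32_t *apic_ids, bool *online,
                                     unsigned int nr_cpus,
                                     unsigned int smt_bits)
{
    unsigned int cpu, offlined = 0;

    for (cpu = 0; cpu < nr_cpus; cpu++) {
        if (online[cpu] && demo_thread_id(apic_ids[cpu], smt_bits) != 0) {
            online[cpu] = false;
            offlined++;
        }
    }
    return offlined;
}
```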

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago tools/xl: use libxl_domain_info to get domain type for vcpu-pin
Igor Druzhinin [Tue, 9 Apr 2019 12:01:58 +0000 (13:01 +0100)]
tools/xl: use libxl_domain_info to get domain type for vcpu-pin

Parsing the config seems to be overkill for this particular task
and the config might simply be absent. Type returned from libxl_domain_info
should be either LIBXL_DOMAIN_TYPE_HVM or LIBXL_DOMAIN_TYPE_PV but in
that context distinction between PVH and HVM should be irrelevant.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago libxl: Document device_add_domain_config
Anthony PERARD [Fri, 5 Apr 2019 17:58:11 +0000 (18:58 +0100)]
libxl: Document device_add_domain_config

Commit 03e1a56d81c16eece735e4d0ef74bfb10eaaba07 replaced DEVICE_ADD()
calls by device_add_domain_config() calls but also removed the comment
of DEVICE_ADD(). Copy the useful part of that comment to
device_add_domain_config().

Also, rename the parameter `type` to `dev`, because that parameter isn't
used as a type but as the device we want to add/update to d_config.

Also, constify `dev` because it isn't modified.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago libxl: Constify src of device_compare_fn_t
Anthony PERARD [Fri, 5 Apr 2019 17:58:10 +0000 (18:58 +0100)]
libxl: Constify src of device_compare_fn_t

All functions libxl_device_*_copy which implement device_compare_fn_t
already have the `src' parameter defined with const.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago libxl: Constify libxl_device_*_compare functions
Anthony PERARD [Fri, 5 Apr 2019 17:58:09 +0000 (18:58 +0100)]
libxl: Constify libxl_device_*_compare functions

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/IOMMU: abstract Intel-specific adjust_vtd_irq_affinities()
Jan Beulich [Tue, 9 Apr 2019 13:12:43 +0000 (15:12 +0200)]
x86/IOMMU: abstract Intel-specific adjust_vtd_irq_affinities()

This can't be folded into the resume hook, as that runs before bringing
back up APs, but the affinity adjustment wants to happen with all CPUs
back online. Hence a separate hook is needed such that AMD can then
leverage it as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago VT-d: posted interrupts require interrupt remapping
Jan Beulich [Tue, 9 Apr 2019 13:12:07 +0000 (15:12 +0200)]
VT-d: posted interrupts require interrupt remapping

Initially I had just noticed the unnecessary indirection in the call
from pi_update_irte(). The generic wrapper having an iommu_intremap
conditional made me look at the setup code though. So first of all
enforce the necessary dependency.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86: remove defunct init/load/save_msr() hvm_funcs
Paul Durrant [Thu, 14 Mar 2019 13:55:00 +0000 (14:55 +0100)]
x86: remove defunct init/load/save_msr() hvm_funcs

These hvm_funcs are no longer required since no MSR values are saved or
restored by implementation-specific code.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years ago x86: stop handling MSR_IA32_XSS save/restore in implementation code
Paul Durrant [Thu, 14 Mar 2019 13:55:00 +0000 (14:55 +0100)]
x86: stop handling MSR_IA32_XSS save/restore in implementation code

Saving and restoring the value of this MSR is currently handled by
implementation-specific code despite it being architectural. This patch
moves handling of accesses to this MSR from hvm.c into msr.c, thus
allowing the common MSR save/restore code to handle it.

This patch also adds proper checks of CPUID policy in the new get/set code.

NOTE: MSR_IA32_XSS is the last MSR to be saved and restored by
      implementation-specific code. This patch therefore removes the
      (VMX) definitions of the init_msr(), save_msr() and
      load_msr() hvm_funcs, as they are no longer necessary. The
      declarations of and calls to those hvm_funcs will be cleaned up
      by a subsequent patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
6 years ago x86: move the saved value of MSR_IA32_XSS into struct vcpu_msrs
Paul Durrant [Thu, 14 Mar 2019 13:54:00 +0000 (14:54 +0100)]
x86: move the saved value of MSR_IA32_XSS into struct vcpu_msrs

Currently the value is saved directly in struct hvm_vcpu. This patch simply
co-locates it with other saved MSR values. No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
6 years ago x86: stop handling MSR_IA32_BNDCFGS save/restore in implementation code
Paul Durrant [Thu, 14 Mar 2019 13:56:00 +0000 (14:56 +0100)]
x86: stop handling MSR_IA32_BNDCFGS save/restore in implementation code

Saving and restoring the value of this MSR is currently handled by
implementation-specific code despite it being architectural. This patch
moves handling of accesses to this MSR from hvm.c into msr.c, thus
allowing the common MSR save/restore code to handle it.

NOTE: Because vmx_get/set_guest_bndcfgs() call vmx_vmcs_enter(), the
      struct vcpu pointer passed in, and hence the vcpu pointer passed to
      guest_rdmsr() cannot be const.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
6 years ago xen/arm: memaccess: Initialize correctly *access in __p2m_get_mem_access
Julien Grall [Wed, 27 Mar 2019 18:45:23 +0000 (18:45 +0000)]
xen/arm: memaccess: Initialize correctly *access in __p2m_get_mem_access

The commit 8d84e701fd "xen/arm: initialize access" initializes
*access using the wrong enumeration type. This results in a warning
when using clang:

mem_access.c:50:20: error: implicit conversion from enumeration type
'p2m_access_t' to different enumeration type 'xenmem_access_t'
[-Werror,-Wenum-conversion]
    *access = p2m->default_access;
            ~ ~~~~~^~~~~~~~~~~~~~

The correct solution is to use the array memaccess that will do the
conversion between the 2 enums.
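
The lookup-table approach can be illustrated with a small sketch. The
enumerators below are invented and deliberately ordered differently in
the two enums; only the idea of an array indexed by one enum and holding
values of the other is taken from the message above. The contents of the
real memaccess array are not reproduced here.

```c
/* Hypothetical internal access type. */
enum demo_p2m_access {
    DEMO_P2M_ACCESS_N,
    DEMO_P2M_ACCESS_R,
    DEMO_P2M_ACCESS_W,
    DEMO_P2M_ACCESS_RW,
    DEMO_P2M_ACCESS_COUNT
};

/* Hypothetical public access type with a different numeric order. */
enum demo_xenmem_access {
    DEMO_XENMEM_ACCESS_RW,
    DEMO_XENMEM_ACCESS_W,
    DEMO_XENMEM_ACCESS_R,
    DEMO_XENMEM_ACCESS_N
};

/* Conversion table: indexed by the internal type, holds the public one. */
static const enum demo_xenmem_access demo_memaccess[DEMO_P2M_ACCESS_COUNT] = {
    [DEMO_P2M_ACCESS_N]  = DEMO_XENMEM_ACCESS_N,
    [DEMO_P2M_ACCESS_R]  = DEMO_XENMEM_ACCESS_R,
    [DEMO_P2M_ACCESS_W]  = DEMO_XENMEM_ACCESS_W,
    [DEMO_P2M_ACCESS_RW] = DEMO_XENMEM_ACCESS_RW,
};

/* Explicit conversion instead of an implicit enum-to-enum assignment. */
static enum demo_xenmem_access demo_to_xenmem_access(enum demo_p2m_access a)
{
    return demo_memaccess[a];
}
```

Assigning one enum directly to the other would silently reuse the numeric
value, which is wrong whenever the two orderings differ, as they do in
this sketch.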

Fixes: 8d84e701fd ("xen/arm: initialize access")
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
6 years ago xen/console: Properly buffer domU output when using CONSOLEIO_write
Julien Grall [Tue, 2 Apr 2019 16:42:35 +0000 (17:42 +0100)]
xen/console: Properly buffer domU output when using CONSOLEIO_write

The output will be buffered if the buffer provided by the DomU does not
contain a newline. This can also happen if the buffer provided by the DomU
is split into multiple parts (Xen can only process 127 characters at a
time).

As Xen will remove any non-printable characters, the output buffer may
be smaller than the buffer provided. However, Xen will buffer using the
original length. This means that the NUL character and garbage will be
copied into the internal buffer.

Once the newline is found or the internal buffer is full, only part of
the internal buffer will end up being printed.

An easy way to reproduce it is:

HYPERVISOR_consoleio(CONSOLEIO_write, "\33", 1);
HYPERVISOR_consoleio(CONSOLEIO_write, "d", 1);
HYPERVISOR_consoleio(CONSOLEIO_write, "\n", 1);

In the current code, the character 'd' will not be printed.

This problem can be solved by computing the size of the output buffer
(i.e. the buffer without the non-printable characters).
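
The idea of accounting only for the characters that survive filtering
can be sketched as below. The structure, names and buffer sizes are
hypothetical and the printed[] array merely stands in for the real
console; this models the approach and is not the Xen console code.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct demo_console {
    char pending[128];   /* characters waiting for a newline */
    size_t pending_len;
    char printed[256];   /* stand-in for the real console output */
    size_t printed_len;
};

/* Move the pending characters to the (bounded) printed buffer. */
static void demo_flush(struct demo_console *c)
{
    size_t room = sizeof(c->printed) - 1 - c->printed_len;
    size_t n = c->pending_len < room ? c->pending_len : room;

    memcpy(c->printed + c->printed_len, c->pending, n);
    c->printed_len += n;
    c->printed[c->printed_len] = '\0';
    c->pending_len = 0;
}

static void demo_console_write(struct demo_console *c,
                               const char *in, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)in[i];

        /* Drop non-printable characters, but keep the newline. */
        if (ch != '\n' && !isprint(ch))
            continue;

        /* Only kept characters advance the pending length. */
        c->pending[c->pending_len++] = (char)ch;

        if (ch == '\n' || c->pending_len == sizeof(c->pending))
            demo_flush(c);
    }
}
```

With a zero-initialized struct demo_console, replaying the three writes
from the reproducer above through demo_console_write() leaves the 'd'
and the newline in printed[], because the dropped escape character never
contributes to pending_len.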

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago x86/vmx: Fixup removals of MSR load/save list entries
Igor Druzhinin [Thu, 4 Apr 2019 16:25:10 +0000 (17:25 +0100)]
x86/vmx: Fixup removals of MSR load/save list entries

Commit 540d5422 ("x86/vmx: Support removing MSRs from the host/guest
load/save lists") introduced infrastructure finally exposed by
commit fd32dcfe ("x86/vmx: Don't leak EFER.NXE into guest context")
that led to a functional regression on Harpertown and earlier cores
(Gen 1 VT-x) due to MSR count being incorrectly set in VMCS.
As a result, as soon as guest EFER becomes equal to Xen EFER
(which eventually happens in almost every 64-bit VM) and its MSR
entry is supposed to be removed, a stale version of EFER is loaded
into the guest instead, causing almost immediate guest failure.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
6 years ago MAINTAINERS: add ARM meson serial driver
Amit Singh Tomar [Thu, 21 Mar 2019 10:25:35 +0000 (15:55 +0530)]
MAINTAINERS: add ARM meson serial driver

The meson-uart.c is an ARM specific UART driver for the Amlogic MESON
SoC family.

Signed-off-by: Amit Singh Tomar <amittomer25@gmail.com>
Reviewed-by: Andre Pryzwara <andre.pryzwara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years ago xen/arm: Add MESON UART driver for Amlogic Meson SoCs
Amit Singh Tomar [Thu, 21 Mar 2019 10:25:34 +0000 (15:55 +0530)]
xen/arm: Add MESON UART driver for Amlogic Meson SoCs

This patch adds a driver for the UART controller present on Amlogic Meson
SoCs. It has been tested on a Nanopi K2 board based on the S905 SoC.

The controller register definitions are taken from Linux 4.20.
https://github.com/torvalds/linux/blob/v4.20-rc1/drivers/tty/serial/meson_uart.c

Signed-off-by: Amit Singh Tomar <amittomer25@gmail.com>
Reviewed-by: Andre Pryzwara <andre.pryzwara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years ago xen/cpu: Fix ARM build following c/s 597fbb8
Andrew Cooper [Mon, 8 Apr 2019 17:20:07 +0000 (18:20 +0100)]
xen/cpu: Fix ARM build following c/s 597fbb8

c/s 597fbb8 "xen/timers: Fix memory leak with cpu unplug/plug" broke the ARM
build by being the first patch to add park_offline_cpus to common code.

While it is currently specific to Intel hardware (for reasons of being able to
handle machine check exceptions without an immediate system reset), it isn't
inherently architecture specific, so define it to be false on ARM for now.

Add a comment in both smp.h headers explaining the intended behaviour of the
option.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years ago gitlab-ci: use git clean -ffdx in build each commit test
Wei Liu [Mon, 8 Apr 2019 11:50:31 +0000 (12:50 +0100)]
gitlab-ci: use git clean -ffdx in build each commit test

The build script invoked is designed to run in a pristine checkout.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years ago cameraif: add ABI for para-virtual camera
Oleksandr Andrushchenko [Fri, 22 Mar 2019 07:37:42 +0000 (09:37 +0200)]
cameraif: add ABI for para-virtual camera

This is the ABI for the two halves of a para-virtualized
camera driver which extends Xen's multimedia capabilities even
further, enabling it for video conferencing, In-Vehicle Infotainment,
high definition maps etc.

The initial goal is to support the most needed functionality, while
keeping the protocol extensible if need be:

1. Provide means for base virtual device configuration:
 - pixel formats
 - resolutions
 - frame rates
2. Support basic camera controls:
 - contrast
 - brightness
 - hue
 - saturation
3. Support streaming control

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Reviewed-by: Juergen Gross <jgross@suse.com>
6 years ago x86/IOMMU: initialize iommu_ops in vendor-independent code
Jan Beulich [Mon, 8 Apr 2019 11:08:05 +0000 (13:08 +0200)]
x86/IOMMU: initialize iommu_ops in vendor-independent code

Move this into iommu_hardware_setup() and make that function non-
inline. Move its declaration into common code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years ago x86/IOMMU: abstract Intel-specific iommu_{en,dis}able_x2apic_IR()
Jan Beulich [Mon, 8 Apr 2019 11:06:54 +0000 (13:06 +0200)]
x86/IOMMU: abstract Intel-specific iommu_{en,dis}able_x2apic_IR()

Introduce respective elements in struct iommu_init_ops as well as a
pointer to the main ops structure.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago x86/IOMMU: abstract Intel-specific iommu_supports_eim()
Jan Beulich [Mon, 8 Apr 2019 11:05:12 +0000 (13:05 +0200)]
x86/IOMMU: abstract Intel-specific iommu_supports_eim()

Introduce a respective element in struct iommu_init_ops.

Take the liberty and also switch intel_iommu_supports_eim() to bool/
true/false, to fully match the hook's type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
6 years ago x86/IOMMU: introduce init-ops structure
Jan Beulich [Mon, 8 Apr 2019 11:04:23 +0000 (13:04 +0200)]
x86/IOMMU: introduce init-ops structure

Do away with the CPU vendor dependency, and set the init ops pointer
based on which ACPI tables have been found.

Also take the opportunity and add __read_mostly to iommu_ops.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years ago x86/ACPI: also parse AMD IOMMU tables early
Jan Beulich [Mon, 8 Apr 2019 11:03:07 +0000 (13:03 +0200)]
x86/ACPI: also parse AMD IOMMU tables early

In order to be able to initialize x2APIC mode we need to parse
respective ACPI tables early. Split amd_iov_detect() into two parts for
this purpose, and call the initial part earlier on.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
6 years ago gitlab-ci: log commit range in build test
Wei Liu [Mon, 8 Apr 2019 10:08:56 +0000 (11:08 +0100)]
gitlab-ci: log commit range in build test

It is easier to debug stuff when the target range is clearly visible
at the top.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years ago xen/arm: Cap the number of interrupt lines for dom0
Lukas Juenger [Fri, 5 Apr 2019 13:54:04 +0000 (15:54 +0200)]
xen/arm: Cap the number of interrupt lines for dom0

Dom0 vGIC will use the same number of interrupt lines as the hardware GIC.
While the hardware GIC can support up to 1020 interrupt lines,
the vGIC only supports up to 992 interrupt lines.
This means that Xen will not be able to boot on platforms where the hardware
GIC supports more than 992 interrupt lines.
While it would make sense to increase the limits in the vGICs, this is not
trivial because of the design choices.
At the moment, only models seem to report the maximum number of interrupt lines.
They also do not have any interrupt wired above the 992 limit.
So it should be fine to cap the number of interrupt lines for dom0 to 992 lines.
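
The cap amounts to a simple clamp, sketched below. The macro and
function names are hypothetical; only the 992 limit comes from the
message above.

```c
/* Limit taken from the commit message: the vGIC handles up to 992 lines. */
#define DEMO_VGIC_MAX_LINES 992u

/* Number of interrupt lines dom0 gets for a given hardware GIC size. */
static unsigned int demo_dom0_nr_lines(unsigned int hw_lines)
{
    return hw_lines > DEMO_VGIC_MAX_LINES ? DEMO_VGIC_MAX_LINES : hw_lines;
}
```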

Signed-off-by: Lukas Juenger <juenger@ice.rwth-aachen.de>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years ago xen/timers: Fix memory leak with cpu unplug/plug
Andrew Cooper [Fri, 29 Mar 2019 16:17:24 +0000 (16:17 +0000)]
xen/timers: Fix memory leak with cpu unplug/plug

timer_softirq_action() realloc's itself a larger timer heap whenever
necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
ever freed this allocation.

CPU plug and unplug has the side effect of zeroing the percpu data area, which
clears ts->heap.  This in turn causes new timers to be put on the list rather
than the heap, and for timer_softirq_action() to bootstrap itself again.

This in practice leaks ts->heap every time a CPU is unplugged and replugged.

Implement free_percpu_timers() which includes freeing ts->heap when
appropriate, and update the notifier callback with the recent cpu parking
logic and free-avoidance across suspend.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years ago docs/hypervisor-guide: Code Coverage
Andrew Cooper [Tue, 26 Mar 2019 11:54:34 +0000 (11:54 +0000)]
docs/hypervisor-guide: Code Coverage

During a discussion in person, it was identified that Coverage doesn't
currently work for ARM.  Also, there are a number of errors with the
existing coverage document.

Take the opportunity to rewrite it in RST, making it easier to follow for a
non-expert user.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years ago docs/sphinx: Introduce a hypervisor guide section
Andrew Cooper [Tue, 26 Mar 2019 11:54:32 +0000 (11:54 +0000)]
docs/sphinx: Introduce a hypervisor guide section

Include (and retrofit to the user guide) an introductory paragraph describing
the intended audience.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86emul: don't read mask register on AVX512F-incapable platforms
Jan Beulich [Fri, 5 Apr 2019 15:27:13 +0000 (17:27 +0200)]
x86emul: don't read mask register on AVX512F-incapable platforms

Nor when register state isn't sufficiently enabled.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoautomation: fix "build each commit" test
Wei Liu [Fri, 5 Apr 2019 11:21:57 +0000 (12:21 +0100)]
automation: fix "build each commit" test

An error was introduced while rebasing 9b8b3f30. The new test
shouldn't depend on anything, otherwise artefacts will be downloaded
from build stage and cause the script to abort.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years agox86/entry: drop unused header inclusions
Jan Beulich [Fri, 5 Apr 2019 14:28:31 +0000 (16:28 +0200)]
x86/entry: drop unused header inclusions

I'm in particular after getting rid of asm/apicdef.h, but there are more
no longer (or perhaps never having been) used ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
6 years agoMAINTAINERS: Move xen/lib/x86 under x86 maintainership
Julien Grall [Thu, 4 Apr 2019 14:04:10 +0000 (15:04 +0100)]
MAINTAINERS: Move xen/lib/x86 under x86 maintainership

At the moment, xen/lib/x86 is covered by the "REST". However, this is
x86-only, so this can fall under the x86 maintainership.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agodocs/cmdline: Partially revert 3860d5534df4
Andrew Cooper [Fri, 5 Apr 2019 12:32:08 +0000 (13:32 +0100)]
docs/cmdline: Partially revert 3860d5534df4

This hunk modifies the cpuid= documentation, which is unrelated to the
spec-ctrl= section.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agovm_event: fix XEN_VM_EVENT_RESUME domctl
Petre Pircalabu [Fri, 5 Apr 2019 13:42:03 +0000 (15:42 +0200)]
vm_event: fix XEN_VM_EVENT_RESUME domctl

Make XEN_VM_EVENT_RESUME return 0 in case of success, instead of
-EINVAL.
Remove vm_event_resume from the vm_event.h header and set the function's
visibility to static as it is used only in vm_event.c.
Move the vm_event_check_ring test inside vm_event_resume in order to
simplify the code.

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
6 years agox86/gnttab: relax a get_gfn() invocation
Jan Beulich [Fri, 5 Apr 2019 13:41:24 +0000 (15:41 +0200)]
x86/gnttab: relax a get_gfn() invocation

In the case here only a query is intended, i.e. without populating a
possible PoD or paged out entry, as the intention is to replace the
current (grant) entry anyway. Use get_gfn_query() there instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86: don't allow clearing of TF_kernel_mode for other than 64-bit PV
Jan Beulich [Fri, 5 Apr 2019 13:40:42 +0000 (15:40 +0200)]
x86: don't allow clearing of TF_kernel_mode for other than 64-bit PV

The flag is really only meant for those, both HVM and 32-bit PV tell
kernel from user mode based on CPL/RPL. Remove the all-question-marks
comment and let's be on the safe side here and also suppress clearing
for 32-bit PV (this isn't a fast path after all).

Remove no longer necessary is_pv_32bit_*() from sh_update_cr3() and
sh_walk_guest_tables(). Note that shadow_one_bit_disable() already
assumes the new behavior.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agocommon/domain: block speculative out-of-bound accesses
Norbert Manthey [Thu, 14 Mar 2019 12:57:00 +0000 (13:57 +0100)]
common/domain: block speculative out-of-bound accesses

When issuing a vcpu_op hypercall, guests have control over the
vcpuid variable. In the old code, this allowed speculative
out-of-bound accesses to be performed. To block this, we make use
of the domain_vcpu function.
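
As an illustration of the general technique (not the actual domain_vcpu
implementation, which is not reproduced here), a guest-controlled index can
be clamped with a branchless mask so that even a mispredicted bounds check
cannot select an out-of-bound element. The sketch below is modelled on the
generic masking approach used by array_index_nospec in Linux; all names are
hypothetical, and it assumes that converting to long wraps and that right
shifts of negative values are arithmetic, as with gcc and clang.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* All-ones when index < size, zero otherwise, computed without a branch. */
static unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    assert(size != 0 && size <= (unsigned long)LONG_MAX);
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (BITS_PER_LONG - 1));
}

/* In-range indices pass through unchanged; all others collapse to 0. */
static unsigned long clamp_index_nospec(unsigned long index, unsigned long size)
{
    return index & index_mask_nospec(index, size);
}

struct vcpu { int id; };  /* hypothetical stand-in */

/* Bounds check plus clamped access, in the spirit of a vcpu lookup helper. */
static struct vcpu *lookup_vcpu(struct vcpu **vcpus, unsigned long max,
                                unsigned long id)
{
    if (id >= max)
        return NULL;
    return vcpus[clamp_index_nospec(id, max)];
}
```

The architectural result is unchanged for valid indices; the mask only
matters on a speculative path that has skipped the comparison.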

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/hvm: add nospec to hvmop param
Norbert Manthey [Thu, 14 Mar 2019 12:56:00 +0000 (13:56 +0100)]
x86/hvm: add nospec to hvmop param

The params array in hvm can be accessed with get and set functions.
As the index is guest controlled, make sure no out-of-bound accesses
can be performed.

As we cannot influence how future compilers might modify the
instructions that enforce the bounds, we furthermore block speculation,
so that the update is visible in the architectural state.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agocommon/memory: block speculative out-of-bound accesses
Norbert Manthey [Thu, 14 Mar 2019 12:56:00 +0000 (13:56 +0100)]
common/memory: block speculative out-of-bound accesses

The get_page_from_gfn method returns a pointer to a page that belongs
to a gfn. Before returning the pointer, the gfn is checked for being
valid. Under speculation, these checks can be bypassed, so that
the function get_page is still executed partially. Consequently, the
function page_get_owner_and_reference might be executed partially as
well. In this function, the computed pointer is accessed, resulting in
a speculative out-of-bound address load. As the gfn can be controlled by
a guest, this access is problematic.

To mitigate the root cause, an lfence instruction is added via the
evaluate_nospec macro. To make the protection generic, we do not
introduce the lfence instruction for this single check, but add it to
the mfn_valid function. This way, other potentially problematic accesses
are protected as well.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agois_hvm/pv_domain: block speculation
Norbert Manthey [Thu, 14 Mar 2019 12:56:00 +0000 (13:56 +0100)]
is_hvm/pv_domain: block speculation

When checking for being an hvm domain, or PV domain, we have to make
sure that speculation cannot bypass that check, and eventually access
data that should not end up in cache for the current domain type.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoMerge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging
Jan Beulich [Fri, 5 Apr 2019 10:16:52 +0000 (12:16 +0200)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

6 years agoautomation: introduce a test to build each commit
Wei Liu [Wed, 27 Feb 2019 17:26:42 +0000 (17:26 +0000)]
automation: introduce a test to build each commit

This is added to the test stage so that its failure won't block other
things.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agoautomation: add a script to build newly pushed commits in Gitlab CI
Wei Liu [Thu, 28 Feb 2019 12:50:02 +0000 (12:50 +0000)]
automation: add a script to build newly pushed commits in Gitlab CI

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agois_control_domain: block speculation
Norbert Manthey [Thu, 14 Mar 2019 12:56:00 +0000 (13:56 +0100)]
is_control_domain: block speculation

Checks of domain properties, such as is_hardware_domain or is_hvm_domain,
might be bypassed by speculatively executing these instructions. A reason
for bypassing these checks is that these macros access the domain
structure via a pointer, and check a certain field. Since this memory
access is slow, the CPU assumes a returned value and continues the
execution.

In case an is_control_domain check is bypassed, for example during a
hypercall, data that should only be accessible by the control domain could
be loaded into the cache.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoautomation: set ret for potential error in build-test.sh
Wei Liu [Wed, 27 Feb 2019 18:22:34 +0000 (18:22 +0000)]
automation: set ret for potential error in build-test.sh

`git rev-list` can fail if the base..tip range contains invalid
commit(s). If that happens ret never gets a chance to be set.

Set ret beforehand to fix the issue.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agoautomation: allow build-test.sh to run in detached HEAD state
Wei Liu [Wed, 27 Feb 2019 17:42:07 +0000 (17:42 +0000)]
automation: allow build-test.sh to run in detached HEAD state

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agonospec: introduce evaluate_nospec
Norbert Manthey [Thu, 14 Mar 2019 12:55:00 +0000 (13:55 +0100)]
nospec: introduce evaluate_nospec

Since the L1TF vulnerability of Intel CPUs, loading hypervisor data into
L1 cache is problematic, because when hyperthreading is used as well, a
guest running on the sibling core can leak this potentially secret data.

To prevent these speculative accesses, we block speculation after
accessing the domain property field by adding lfence instructions. This
way, the CPU continues executing and loading data only once the condition
is actually evaluated.

As this protection is typically used in if statements, the lfence has to
come in a compatible way. Therefore, a function that returns true after an
lfence instruction is introduced. To protect both branches after a
conditional, an lfence instruction has to be added for the two branches.
To be able to block speculation after several evaluations, the generic
barrier macro block_speculation is also introduced.
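
A minimal sketch of this shape, assuming a stubbed barrier: the names
evaluate_nospec and block_speculation are taken from the text above, while
barrier_true(), barrier_count, struct domain and read_secret() are
hypothetical. The stub only counts invocations; real x86 code would issue
an lfence at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static unsigned int barrier_count;  /* instrumentation for this sketch only */

/* Stand-in for the speculation barrier; always returns true. */
static inline bool barrier_true(void)
{
    barrier_count++;  /* a real implementation would execute lfence here */
    return true;
}

/* The value of cond is preserved, and a barrier sits on both outcomes. */
#define evaluate_nospec(cond) ((cond) ? barrier_true() : !barrier_true())

/* Generic barrier for use after several evaluations. */
static inline void block_speculation(void)
{
    (void)barrier_true();
}

struct domain { bool is_privileged; };  /* hypothetical stand-in */

static int read_secret(const struct domain *d)
{
    assert(d != NULL);
    if (evaluate_nospec(d->is_privileged))
        return 42;   /* only reached once the condition is resolved */
    return -1;
}
```

Placing the barrier inside the conditional expression keeps call sites
unchanged apart from wrapping the condition.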

As the L1TF vulnerability is only present on the x86 architecture, there is
no need to add protection for other architectures. Hence, the introduced
functions are defined but empty.

On the x86 architecture, by default, the lfence instruction is not present
either. Only when an L1TF vulnerable platform is detected is the lfence
instruction patched in via alternative patching. Similarly, PV guests
are protected wrt L1TF by default, so that the protection is furthermore
disabled in case HVM is excluded via the build configuration.

Introducing the lfence instructions catches a lot of potential leaks with
a simple unintrusive code change. During performance testing, we did not
notice performance effects.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agospec: add l1tf-barrier
Norbert Manthey [Thu, 14 Mar 2019 12:55:00 +0000 (13:55 +0100)]
spec: add l1tf-barrier

To control the runtime behavior on L1TF vulnerable platforms better, the
command line option l1tf-barrier is introduced. This option controls
whether on vulnerable x86 platforms the lfence instruction is used to
prevent speculative execution from bypassing the evaluation of
conditionals that are protected with the evaluate_nospec macro.

By now, Xen is capable of identifying L1TF vulnerable hardware. However,
this information cannot be used for alternative patching, as a CPU feature
is required. To control alternative patching with the command line option,
a new x86 feature "X86_FEATURE_SC_L1TF_VULN" is introduced. This feature
is used to patch the lfence instruction into the arch_barrier_nospec_true
function. The feature is enabled only if L1TF vulnerable hardware is
detected and the command line option does not prevent using this feature.

The status of hyperthreading is considered when automatically enabling the
addition of the lfence instruction. Since platforms without hyperthreading can
still be vulnerable to L1TF in case the L1 cache is not flushed properly,
the additional lfence instructions are patched in if either hyperthreading
is enabled, or L1 cache flushing is missing.
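
The decision described above can be sketched as a small predicate. The
function and parameter names are hypothetical, and the tri-state encoding
of the command line option (-1 for "not given") is an assumption rather
than a statement about the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether the lfence should be patched in.
 * opt_l1tf_barrier: -1 = not given on the command line, 0 = off, 1 = on
 * (assumed encoding).
 */
static bool want_l1tf_barrier(bool cpu_has_l1tf_bug, int opt_l1tf_barrier,
                              bool hyperthreading_active,
                              bool l1d_flush_in_use)
{
    assert(opt_l1tf_barrier >= -1 && opt_l1tf_barrier <= 1);

    /* Never patch on hardware that is not vulnerable. */
    if (!cpu_has_l1tf_bug)
        return false;

    /* An explicit command line choice wins over the automatic default. */
    if (opt_l1tf_barrier != -1)
        return opt_l1tf_barrier == 1;

    /* Default: needed with hyperthreading, or when L1 is not flushed. */
    return hyperthreading_active || !l1d_flush_in_use;
}
```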

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/msr: Fix handling of MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV
Andrew Cooper [Mon, 1 Apr 2019 10:08:28 +0000 (11:08 +0100)]
x86/msr: Fix handling of MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV

There are a number of bugs.  There are no read/write hooks on the HVM side, so
guest accesses fall into the "read/write-discard" defaults, which bypass the
correct faulting behaviour and the Intel special case.

For the PV side, writes are discarded (again, bypassing proper faulting),
except for a pinned dom0, which is permitted to actually write values other
than 0.  This is pointless with the read hook implementing the Intel special
case.

However, implementing the Intel special case is itself pointless.  First of
all, OS software can't guarantee to read back 0 in the first place, because a)
this behaviour isn't guaranteed in the SDM, and b) there are SMM handlers
which use the CPUID instruction.  Secondly, when a guest executes CPUID, this
doesn't typically result in Xen executing a CPUID instruction in practice.

With the dom0 special case removed, there are now no writes to this MSR other
than Xen's microcode loading facilities, which means that the value held in
the MSR will be properly up-to-date.  Forward it directly, without jumping
through any hoops.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/cpu: Renumber X86_VENDOR_* to form a bitmap
Andrew Cooper [Thu, 4 Apr 2019 18:39:08 +0000 (19:39 +0100)]
x86/cpu: Renumber X86_VENDOR_* to form a bitmap

CPUs from different vendors sometimes share characteristics.  All users of
X86_VENDOR_* are now direct equal/not-equal comparisons.  By expressing the
X86_VENDOR_* constants in a bitmap fashion, we can more concisely and
efficiently test whether a vendor is one of a group.
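
A minimal sketch of the idea, with illustrative values: each vendor gets
its own bit, so membership of a group becomes a single AND. The specific
bit assignments and the vendor_in_group() helper are assumptions and may
not match the real numbering.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values: one bit per vendor. */
#define X86_VENDOR_UNKNOWN 0u
#define X86_VENDOR_INTEL   (1u << 0)
#define X86_VENDOR_AMD     (1u << 1)
#define X86_VENDOR_CENTAUR (1u << 2)

/* True if vendor is any member of the OR-ed group. */
static bool vendor_in_group(unsigned int vendor, unsigned int group)
{
    /* A vendor value carries at most one set bit. */
    assert((vendor & (vendor - 1)) == 0);
    return (vendor & group) != 0;
}
```

With sequential numbering the same test needs one comparison per vendor,
for example (v == A || v == B).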

Update all parts of the code which can already benefit from this improvement.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/cpu: Introduce x86_cpuid_vendor_to_str() and drop cpu_dev.c_vendor[]
Andrew Cooper [Thu, 4 Apr 2019 18:19:20 +0000 (19:19 +0100)]
x86/cpu: Introduce x86_cpuid_vendor_to_str() and drop cpu_dev.c_vendor[]

cpu_dev.c_vendor[] is a char[8] array which is printed using %s in two
locations.  This leads to subtle lack-of-NUL bugs when using an 8 character
vendor name.

Introduce x86_cpuid_vendor_to_str() to turn an x86_vendor into a printable
string, use it in the two locations that c_vendor is used, and drop c_vendor.
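
The hazard and the replacement approach can be sketched as follows. The
vendor constants, the returned strings and both helper names are
hypothetical; "Shanghai" is used only as an example of an 8 character name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative vendor identifiers. */
#define VENDOR_INTEL    (1u << 0)
#define VENDOR_AMD      (1u << 1)
#define VENDOR_SHANGHAI (1u << 2)

/* True if name plus its terminating NUL fits in a buffer of size bytes. */
static int fits_with_nul(const char *name, size_t size)
{
    assert(name != NULL);
    return strlen(name) + 1 <= size;
}

/*
 * Returning string literals sidesteps fixed-size buffers entirely:
 * every result is NUL terminated and safe to print with %s.
 */
static const char *vendor_to_str(unsigned int vendor)
{
    switch (vendor) {
    case VENDOR_INTEL:    return "Intel";
    case VENDOR_AMD:      return "AMD";
    case VENDOR_SHANGHAI: return "Shanghai";
    default:              return "Unknown";
    }
}
```

An 8 character name fills a char[8] completely, leaving no room for the
terminator, which is why printing such a field with %s can read past it.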

This drops the final user of X86_VENDOR_NUM, so drop that as well.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/cpu: Drop cpu_devs[] and $VENDOR_init_cpu() hooks
Andrew Cooper [Thu, 4 Apr 2019 14:51:25 +0000 (15:51 +0100)]
x86/cpu: Drop cpu_devs[] and $VENDOR_init_cpu() hooks

These helpers each fill in a single cpu_devs[] pointer, and since c/s
00b4f4d0f "x86/cpuid: Drop get_cpu_vendor() completely", this array is read
exactly once on boot.

Delete the hooks and cpu_devs[], and have early_cpu_detect() pick the
appropriate cpu_dev structure directly.

As early_cpu_init() is empty now other than a call to early_cpu_detect(), and
this isn't expected to change moving forwards, rename the latter and delete
the former.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86emul: support AVX512{F,BW} down conversion moves
Jan Beulich [Fri, 5 Apr 2019 08:42:39 +0000 (10:42 +0200)]
x86emul: support AVX512{F,BW} down conversion moves

Note that the vpmov{,s,us}{d,q}w table entries in evex-disp8.c are
slightly different from what one would expect, due to them requiring
EVEX.W to be zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: support AVX512{F,BW} zero- and sign-extending moves
Jan Beulich [Fri, 5 Apr 2019 08:41:59 +0000 (10:41 +0200)]
x86emul: support AVX512{F,BW} zero- and sign-extending moves

Note that the testing in simd.c doesn't really follow the ISA extension
pattern - to fit the scheme, extensions from byte and word granular
vectors can (currently) sensibly only happen in the AVX512BW case (and
hence respective abstraction macros will be added there rather than
here).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: basic AVX512VL testing
Jan Beulich [Fri, 5 Apr 2019 08:41:12 +0000 (10:41 +0200)]
x86emul: basic AVX512VL testing

Test the 128- and 256-bit variants of the insns which have been
implemented already.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: support AVX512{F,BW,DQ} integer broadcast insns
Jan Beulich [Fri, 5 Apr 2019 08:40:33 +0000 (10:40 +0200)]
x86emul: support AVX512{F,BW,DQ} integer broadcast insns

Note that the pbroadcastw table entry in evex-disp8.c is slightly
different from what one would expect, due to it requiring EVEX.W to be
zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: basic AVX512F testing
Jan Beulich [Fri, 5 Apr 2019 08:40:02 +0000 (10:40 +0200)]
x86emul: basic AVX512F testing

Test various of the insns which have been implemented already.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: support AVX512{F,BW,DQ} insert insns
Jan Beulich [Fri, 5 Apr 2019 08:39:17 +0000 (10:39 +0200)]
x86emul: support AVX512{F,BW,DQ} insert insns

Also correct the comment of the AVX form of VINSERTPS.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86emul: support AVX512{F,BW,DQ} extract insns
Jan Beulich [Fri, 5 Apr 2019 08:38:38 +0000 (10:38 +0200)]
x86emul: support AVX512{F,BW,DQ} extract insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoviridian: add implementation of the HvSendSyntheticClusterIpi hypercall
Paul Durrant [Tue, 19 Mar 2019 15:29:00 +0000 (16:29 +0100)]
viridian: add implementation of the HvSendSyntheticClusterIpi hypercall

This patch adds an implementation of the hypercall as documented in the
specification [1], section 10.5.2. This enlightenment, as with others, is
advertised by CPUID leaf 0x40000004 and is under control of a new
'hcall_ipi' option in libxl.

If used, this enlightenment should mean the guest only takes a single VMEXIT
to issue IPIs to multiple vCPUs rather than the multiple VMEXITs that would
result from using the emulated local APIC.

[1] https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: add implementation of synthetic timers
Paul Durrant [Tue, 19 Mar 2019 15:29:00 +0000 (16:29 +0100)]
viridian: add implementation of synthetic timers

This patch introduces an implementation of the STIMER0-15_CONFIG/COUNT MSRs
and hence the first SynIC message source.

The new (and documented) 'stimer' viridian enlightenment group may be
specified to enable this feature.

While in the neighbourhood, this patch adds a missing check for an
attempt to write the time reference count MSR, which should result in an
exception (but not be reported as an unimplemented MSR).

NOTE: It is necessary for correct operation that timer expiration and
      message delivery time-stamping use the same time source as the guest.
      The specification is ambiguous but testing with a Windows 10 1803
      guest has shown that using the partition reference counter as a
      source whilst the guest is using RDTSC and the reference tsc page
      does not work correctly. Therefore the time_now() function is used.
      This implements the algorithm for acquiring partition reference time
      that is documented in the specification.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: add implementation of synthetic interrupt MSRs
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: add implementation of synthetic interrupt MSRs

This patch introduces an implementation of the SCONTROL, SVERSION, SIEFP,
SIMP, EOM and SINT0-15 SynIC MSRs. No message source is added and, as such,
nothing will yet generate a synthetic interrupt. A subsequent patch will
add an implementation of synthetic timers which will need the infrastructure
added by this patch to deliver expiry messages to the guest.

NOTE: A 'synic' option is added to the toolstack viridian enlightenments
      enumeration but is deliberately not documented as enabling these
      SynIC registers without a message source is only useful for
      debugging.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: stop directly calling viridian_time_ref_count_freeze/thaw()...
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: stop directly calling viridian_time_ref_count_freeze/thaw()...

...from arch_domain_shutdown/pause/unpause().

A subsequent patch will introduce an implementation of synthetic timers
which will also need freeze/thaw hooks, so make the exported hooks more
generic and call through to (re-named and static) time_ref_count_freeze/thaw
functions.

NOTE: This patch also introduces a new time_ref_count() helper to return
      the current counter value. This is currently only used by the MSR
      read handler but the synthetic timer code will also need to use it.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: use viridian_map/unmap_guest_page() for reference tsc page
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: use viridian_map/unmap_guest_page() for reference tsc page

Whilst the reference tsc page does not currently need to be kept mapped
after it is initially set up (or updated after migrate), the code can
be simplified by using the common guest page map/unmap and dump functions.
New functionality added by a subsequent patch will also require the page to
be kept mapped for the lifetime of the domain.

NOTE: Because the reference tsc page is per-domain rather than per-vcpu
      this patch also changes viridian_map_guest_page() to take a domain
      pointer rather than a vcpu pointer. The domain pointer cannot be
      const, unlike the vcpu pointer.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years agoviridian: add missing context save helpers into synic and time modules
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: add missing context save helpers into synic and time modules

Currently the time module lacks vcpu context save helpers and the synic
module lacks domain context save helpers. These helpers are not yet
required but subsequent patches will require at least some of them so this
patch completes the set to avoid introducing them in an ad-hoc way.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years agoviridian: extend init/deinit hooks into synic and time modules
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: extend init/deinit hooks into synic and time modules

This patch simply adds domain and vcpu init/deinit hooks into the synic
and time modules and wires them into viridian_[domain|vcpu]_[init|deinit]().
Only one of the hooks is currently needed (to unmap the 'VP Assist' page)
but subsequent patches will make use of the others.

NOTE: To perform the unmap of the VP Assist page,
      viridian_unmap_guest_page() is now directly called in the new
      viridian_synic_vcpu_deinit() function (which is safe even if
      is_viridian_vcpu() evaluates to false). This replaces the slightly
      hacky mechanism of faking a zero write to the
      HV_X64_MSR_VP_ASSIST_PAGE MSR in viridian_cpu_deinit().

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
6 years agoviridian: make 'fields' struct anonymous...
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: make 'fields' struct anonymous...

...inside viridian_page_msr and viridian_guest_os_id_msr unions.

There's no need to name it and the code is shortened by not doing so.
No functional change.
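
For illustration, the difference between a named and an anonymous inner
struct looks roughly like this. The union names and bit-field members are
hypothetical; anonymous structs require C11, and 64-bit bit-fields are a
compiler extension accepted by gcc and clang.

```c
#include <assert.h>
#include <stdint.h>

/* Before: the inner struct is named, so every access goes via .fields */
union page_msr_named {
    uint64_t raw;
    struct {
        uint64_t enabled:1;
        uint64_t reserved:11;
        uint64_t pfn:48;
    } fields;
};

/* After: the inner struct is anonymous, members are reached directly. */
union page_msr_anon {
    uint64_t raw;
    struct {
        uint64_t enabled:1;
        uint64_t reserved:11;
        uint64_t pfn:48;
    };
};

/* The layout is untouched; only the spelling of accesses changes. */
static_assert(sizeof(union page_msr_anon) == sizeof(union page_msr_named),
              "dropping the name must not change the size");

static uint64_t named_pfn(const union page_msr_named *m)
{
    return m->fields.pfn;
}

static uint64_t anon_pfn(const union page_msr_anon *m)
{
    return m->pfn;
}
```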

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: use stack variables for viridian_vcpu and viridian_domain...
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: use stack variables for viridian_vcpu and viridian_domain...

...where there is more than one dereference inside a function.

This shortens the code and makes it more readable. No functional change.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: separately allocate domain and vcpu structures
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: separately allocate domain and vcpu structures

Currently the viridian_domain and viridian_vcpu structures are inline in
the hvm_domain and hvm_vcpu structures respectively. Subsequent patches
will need to add sizable extra fields to the viridian structures which
will cause the PAGE_SIZE limit of the overall vcpu structure to be
exceeded. This patch, therefore, uses the new init hooks to separately
allocate the structures and converts the 'viridian' fields in hvm_domain
and hvm_vcpu to be pointers to these allocations. These separate allocations
also allow some vcpu and domain pointers to become const.

Ideally, now that they are no longer inline, the allocations of the
viridian structures could be made conditional on whether the toolstack
is going to configure the viridian enlightenments. However the toolstack
is currently unable to convey this information to the domain creation code
so such an enhancement is deferred until that becomes possible.

NOTE: The patch also introduced the 'is_viridian_vcpu' macro to avoid
      introducing a second evaluation of 'is_viridian_domain' with an
      open-coded 'v->domain' argument. This macro will also be further
      used in a subsequent patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoviridian: add init hooks
Paul Durrant [Tue, 19 Mar 2019 15:25:00 +0000 (16:25 +0100)]
viridian: add init hooks

This patch adds domain and vcpu init hooks for viridian features. The init
hooks do not yet do anything; the functionality will be added to by
subsequent patches.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agohvmloader: add SMBIOS type 2 info for customized string
Xin Li [Fri, 5 Apr 2019 08:16:16 +0000 (10:16 +0200)]
hvmloader: add SMBIOS type 2 info for customized string

Extend the SMBIOS type 2 struct to match the specification, and add support
for writing it when a customized string is provided and no SMBIOS is passed in.

Signed-off-by: Xin Li <xin.li@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/mm: drop redundant local variable from _get_page_type()
Jan Beulich [Fri, 5 Apr 2019 08:15:10 +0000 (10:15 +0200)]
x86/mm: drop redundant local variable from _get_page_type()

Instead of the separate iommu_ret, the general rc can be used even for
the IOMMU operations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agoxen: vcpu_migrate_start can be static
Wei Liu [Thu, 4 Apr 2019 14:13:36 +0000 (15:13 +0100)]
xen: vcpu_migrate_start can be static

It's not used outside of schedule.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years agobuild: don't mandate availability of a fetcher program
Wei Liu [Thu, 14 Mar 2019 14:08:47 +0000 (14:08 +0000)]
build: don't mandate availability of a fetcher program

It is common that build hosts are isolated from the outside world. They
don't necessarily have wget or ftp installed.

Turn the error into a warning in configure. And point FETCHER to `false'
command if neither wget nor ftp is available, so any attempt to
download will result in error.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
6 years agox86/sysctl: Clean up XEN_SYSCTL_cpu_hotplug
Andrew Cooper [Fri, 29 Mar 2019 12:14:37 +0000 (12:14 +0000)]
x86/sysctl: Clean up XEN_SYSCTL_cpu_hotplug

A future change is going to introduce two more cases.  Instead of open-coding the
XSM checks and continue_hypercall logic, collect the data into local variables.

Switch the default return value to -EOPNOTSUPP to distinguish a bad op from a
bad cpu index.
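
The resulting structure can be sketched as below. All names are
hypothetical, the per-op handlers are stubs, and returning -EINVAL for an
out-of-range cpu index is an assumption; only the -EOPNOTSUPP default for
an unrecognised op is taken from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum hotplug_op { HOTPLUG_ONLINE, HOTPLUG_OFFLINE };  /* hypothetical ops */
#define NR_CPUS_SKETCH 4u

static int do_online(unsigned int cpu)  { (void)cpu; return 0; }
static int do_offline(unsigned int cpu) { (void)cpu; return 0; }

static int cpu_hotplug(int op, unsigned int cpu)
{
    int (*fn)(unsigned int) = NULL;
    int ret = -EOPNOTSUPP;      /* default: the op itself is not known */

    /* Each case only records what to do, in local variables. */
    switch (op) {
    case HOTPLUG_ONLINE:
        fn = do_online;
        break;
    case HOTPLUG_OFFLINE:
        fn = do_offline;
        break;
    default:
        break;
    }

    if (fn == NULL)
        return ret;

    /* Checks shared by every op are written once, after the switch. */
    if (cpu >= NR_CPUS_SKETCH)
        return -EINVAL;

    return fn(cpu);
}
```

Adding a further op then means adding one case, without duplicating the
shared checks.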

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/cpu: Distinguish "cpu already in that state" in cpu_{up,down}()
Andrew Cooper [Tue, 2 Apr 2019 13:21:56 +0000 (14:21 +0100)]
xen/cpu: Distinguish "cpu already in that state" in cpu_{up,down}()

All methods of querying the online state of a CPU are racy without the hotplug
lock held, which can lead to a TOCTOU race trying to online or offline CPUs.

Distinguish this case with -EEXIST rather than -EINVAL, so the caller can take
other actions if necessary.
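
A simplified model of the new return values, assuming a plain array in
place of the real online map and omitting the hotplug lock. The function
names and the -EINVAL result for an out-of-range index are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4u

/* Stand-in for the online map; CPU 0 starts online. */
static bool cpu_online_map[NR_CPUS_SKETCH] = { true };

static int cpu_up_sketch(unsigned int cpu)
{
    int err = 0;

    /* The hotplug lock would be taken here. */
    if (cpu >= NR_CPUS_SKETCH)
        err = -EINVAL;                /* no such CPU */
    else if (cpu_online_map[cpu])
        err = -EEXIST;                /* already in the requested state */
    else
        cpu_online_map[cpu] = true;
    /* Single exit path: the lock would be dropped here. */

    return err;
}

/* A caller that lost a race can now treat "already online" as success. */
static int online_cpu_idempotent(unsigned int cpu)
{
    int rc = cpu_up_sketch(cpu);

    return rc == -EEXIST ? 0 : rc;
}
```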

While adjusting this, rework the code slightly to fold the exit paths, which
results in a minor reduction in compiled code size.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/pv: Drop redundant CONFIG_PV ifdefary
Andrew Cooper [Wed, 3 Apr 2019 18:55:55 +0000 (19:55 +0100)]
x86/pv: Drop redundant CONFIG_PV ifdefary

These were made redundant by c/s 23058e7b3 "x86/shadow: put PV L1TF functions
under CONFIG_PV" but make the surrounding code read as if it is outside of the
ifdef.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agogitlab-ci: add fedora gcc build jobs
Wei Liu [Thu, 4 Apr 2019 11:23:02 +0000 (12:23 +0100)]
gitlab-ci: add fedora gcc build jobs

Although the image comes with clang, clang builds don't work yet.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agoautomation: add Fedora image to containerize script
Wei Liu [Thu, 4 Apr 2019 11:23:01 +0000 (12:23 +0100)]
automation: add Fedora image to containerize script

At the same time sort the list alphabetically.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years agoautomation: add a Fedora image
Wei Liu [Thu, 4 Apr 2019 11:23:00 +0000 (12:23 +0100)]
automation: add a Fedora image

Use the latest and greatest.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
6 years ago public/io/blkif.h: try to fix the semantics of sector based quantities
Paul Durrant [Thu, 4 Apr 2019 11:40:02 +0000 (12:40 +0100)]
public/io/blkif.h: try to fix the semantics of sector based quantities

The semantics of sector based quantities, such as first_sect and last_sect
in blkif_request_segment, and the value of "sectors" in the backend info
in xenstore have become confused. Some comments in the header suggest they
should be supplied/interpreted strictly in terms of 512-byte units, others
suggest they should be scaled by the value of "sector-size" i.e. the
logical block size of the underlying backend storage.
This confusion has caused mixed semantics to become ingrained in frontend
implementations. For instance Linux xen-blkfront.c contains code such as:

    fsect = offset >> 9;
    lsect = fsect + (len >> 9) - 1;

whereas the Windows XENVBD frontend contains the following equivalent code:

    Segment->FirstSector = (UCHAR)((Offset + SectorSize - 1) / SectorSize);
    *SectorsNow = __min(SectorsLeft, SectorsPerPage - Segment->FirstSector);
    Segment->LastSector = (UCHAR)(Segment->FirstSector + *SectorsNow - 1);

(where SectorSize is the "sector-size" value advertized in xenstore).

Thus it has become unsafe for a backend to set "sector-size" to anything
other than 512 as it does not know which way the frontend is coded.
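
To make the mismatch concrete, the sketch below computes the first and
last sector of one segment under both conventions. The helper names
segment_512() and segment_scaled() are hypothetical, and the scaled
variant is a simplification of the XENVBD excerpt above (it omits the
SectorsLeft clamping):

```c
#include <assert.h>

/* Fixed 512-byte units, as in the xen-blkfront.c excerpt. */
void segment_512(unsigned int offset, unsigned int len,
                 unsigned int *first, unsigned int *last)
{
    *first = offset >> 9;
    *last = *first + (len >> 9) - 1;
}

/* Units of the advertized "sector-size" (simplified XENVBD style). */
void segment_scaled(unsigned int offset, unsigned int len,
                    unsigned int sector_size,
                    unsigned int *first, unsigned int *last)
{
    assert(sector_size != 0);

    *first = (offset + sector_size - 1) / sector_size;
    *last = *first + len / sector_size - 1;
}
```

For a 4096-byte buffer at offset 0 and a "sector-size" of 4096, the first
helper yields sectors 0 to 7 while the second yields 0 to 0. The two only
agree when "sector-size" is 512.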

This patch is intended to clarify the situation and also introduce a
mechanism to allow logical block sizes of more than 512 to be supported...

A new frontend feature node is specified: 'feature-large-sector-size'.
If this node is present and set to "1" then it means that the frontend is
coded to supply and interpret all sector based quantities in terms of the
advertized "sector-size" value rather than a hardcoded size of 512.
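
A minimal sketch of how a backend might pick the unit for sector based
quantities from this node. The function sector_unit() and its arguments
are hypothetical: feature_value stands for the string read from the
frontend's 'feature-large-sector-size' node, or NULL if the node is
absent. How the value is read from xenstore is not shown:

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return the unit, in bytes, in which sector based quantities from
 * this frontend are expressed (illustrative only).
 */
unsigned int sector_unit(const char *feature_value,
                         unsigned int sector_size)
{
    bool large = feature_value != NULL &&
                 strcmp(feature_value, "1") == 0;

    /* Without the feature node, assume the legacy 512-byte unit. */
    return large ? sector_size : 512u;
}
```

Falling back to 512 when the node is absent or not "1" follows the
description above; any further backend policy is outside this sketch.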

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
6 years ago xen/sched: don't disable scheduler on cpus during suspend
Juergen Gross [Tue, 2 Apr 2019 05:34:57 +0000 (07:34 +0200)]
xen/sched: don't disable scheduler on cpus during suspend

Today there is special handling in cpu_disable_scheduler() for suspend
by forcing all vcpus to the boot cpu. In fact there is no need for that
as during resume the vcpus are put on the correct cpus again.

So we can just omit the call of cpu_disable_scheduler() when offlining
a cpu due to suspend and on resuming we can omit taking the schedule
lock for selecting the new processor.

In restore_vcpu_affinity() we should be careful when applying affinity
as the cpu might not have come back to life. This in turn enables us
to even support affinity_broken across suspend/resume.

Avoid the rest of the scheduler dealloc/alloc dance during suspend and
resume, too. It is enough to react to cpus failing to come up again on
resume.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>