xenbits.xensource.com Git - people/royger/xen.git/log
7 years ago pci: add support to size ROM BARs to pci_size_mem_bar
Roger Pau Monne [Fri, 23 Feb 2018 13:32:23 +0000 (13:32 +0000)]
pci: add support to size ROM BARs to pci_size_mem_bar

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes since v6:
 - Remove the rom local variable.

Changes since v5:
 - Use the flags field.
 - Introduce a mask local variable.
 - Simplify return.

Changes since v4:
 - New in this version.

7 years ago pci: split code to size BARs from pci_add_device
Roger Pau Monne [Fri, 23 Feb 2018 13:32:23 +0000 (13:32 +0000)]
pci: split code to size BARs from pci_add_device

So that it can be called from outside in order to get the size of regular PCI
BARs. This will be required in order to map the BARs from PCI devices into PVH
Dom0 p2m.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
---
Changes since v7:
 - Do not return error from pci_size_mem_bar in order to keep previous
   behavior.

Changes since v6:
 - Remove the vf and addr local variables.
 - Change the way flags are declared.
 - Move the last bool parameter to the flags field.

Changes since v5:
 - Introduce a flags field for pci_size_mem_bar.
 - Use pci_sbdf_t.

Changes since v4:
 - Restore printing whether the BAR is from a vf.
 - Make the psize pointer parameter not optional.
 - s/u64/uint64_t.
 - Remove some unneeded parentheses.
 - Assert the return value is never 0.
 - Use the newly introduced pci_sbdf_t type.

Changes since v3:
 - Rename function to size BARs to pci_size_mem_bar.
 - Change the parameters passed to the function. Pass the position and
   whether the BAR is the last one, instead of the (base, max_bars,
   *index) tuple.
 - Make the function return the number of BARs consumed (1 for 32b, 2
   for 64b BARs).
 - Change the dprintk back to printk.
 - Do not log another error message in pci_add_device in case
   pci_size_mem_bar fails.
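
For illustration only, the sizing arithmetic behind a helper of this kind can be sketched as below. This is not the Xen implementation: `bar_size_sketch`, its parameters and the `SKETCH_` constants are hypothetical names, and the config space accesses (writing all ones to the BAR and reading the value back) are assumed to have been done by the caller. Only the return convention (1 slot for a 32-bit BAR, 2 for a 64-bit one, never 0) is taken from the changelog above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI memory BAR layout: bits 3:0 are flags, bits 2:1 the type. */
#define SKETCH_BAR_MEM_FLAGS_MASK 0xfu
#define SKETCH_BAR_MEM_TYPE_MASK  0x6u
#define SKETCH_BAR_MEM_TYPE_64    0x4u

/*
 * Hypothetical helper: lo/hi are the values read back after writing all
 * ones to the BAR register(s). Stores the decoded size in *psize (0 if the
 * BAR is not implemented) and returns the number of BAR slots consumed.
 */
static unsigned int bar_size_sketch(uint32_t lo, uint32_t hi, uint64_t *psize)
{
    bool is64 = (lo & SKETCH_BAR_MEM_TYPE_MASK) == SKETCH_BAR_MEM_TYPE_64;
    uint64_t mask = lo & ~SKETCH_BAR_MEM_FLAGS_MASK;

    if ( is64 )
        mask |= (uint64_t)hi << 32;
    else if ( mask )
        mask |= UINT64_C(0xffffffff00000000); /* upper half implicitly set */

    /* The lowest writable address bit gives the size: ~mask + 1. */
    *psize = mask ? ~mask + 1 : 0;

    return is64 ? 2 : 1;
}
```

A caller walking the BARs would advance its index by the returned value, so that the upper half of a 64-bit BAR is not sized a second time.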

7 years ago x86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0
Roger Pau Monne [Fri, 23 Feb 2018 13:32:23 +0000 (13:32 +0000)]
x86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0

So that MMCFG regions not present in the MCFG ACPI table can be added
at run time by the hardware domain.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v7:
 - Add newline in hvm_physdev_op for non-fallthrough case.

Changes since v6:
 - Do not return EEXIST if the same exact region is already tracked by
   Xen.

Changes since v5:
 - Check for has_vpci before calling register_vpci_mmcfg_handler
   instead of checking for is_hvm_domain.

Changes since v4:
 - Change the hardware_domain check in hvm_physdev_op to a vpci check.
 - Only register the MMCFG area, but don't scan it.

Changes since v3:
 - New in this version.

7 years ago x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas
Roger Pau Monne [Fri, 23 Feb 2018 13:32:22 +0000 (13:32 +0000)]
x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas

Introduce a set of handlers for the accesses to the MMCFG areas. Those
areas are set up based on the contents of the hardware MMCFG tables,
and the list of handled MMCFG areas is stored inside of the hvm_domain
struct.

Reads and writes are forwarded to the generic vpci handlers once the
address is decoded in order to obtain the device and register the
guest is trying to access.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v7:
 - Add check for end_bus >= start_bus to register_vpci_mmcfg_handler.
 - Protect destroy_vpci_mmcfg with the mmcfg_lock.

Changes since v6:
 - Move allocation of mmcfg outside of the locked region.
 - Do proper overlap checks when adding mmcfg regions.
 - Return _RETRY if the mcfg region cannot be found in the read/write
   handlers. This means the mcfg area has been removed between the
   accept and the read/write calls.

Changes since v5:
 - Switch to use pci_sbdf_t.
 - Switch to the new per vpci locks.
 - Move the mmcfg related external definitions to asm-x86/pci.h.

Changes since v4:
 - Change the attribute of pvh_setup_mmcfg to __hwdom_init.
 - Try to add as many MMCFG regions as possible, even if one fails to
   add.
 - Change some fields of the hvm_mmcfg struct: turn size into a
   unsigned int, segment into uint16_t and bus into uint8_t.
 - Convert some address parameters from unsigned long to paddr_t for
   consistency.
 - Make vpci_mmcfg_decode_addr return the decoded register in the
   return of the function.
 - Introduce a new macro to convert a MMCFG address into a BDF, and
   use it in vpci_mmcfg_decode_addr to clarify the logic.
 - In vpci_mmcfg_{read/write} unify the logic for 8B accesses and
   smaller ones.
 - Add the __hwdom_init attribute to register_vpci_mmcfg_handler.
 - Test that reg + size doesn't cross a device boundary.

Changes since v3:
 - Propagate changes from previous patches: drop xen_ prefix for vpci
   functions, pass slot and func instead of devfn and fix the error
   paths of the MMCFG handlers.
 - s/ecam/mmcfg/.
 - Move the destroy code to a separate function, so the hvm_mmcfg
   struct can be private to hvm/io.c.
 - Constify the return of vpci_mmcfg_find.
 - Use d instead of v->domain in vpci_mmcfg_accept.
 - Allow 8byte accesses to the mmcfg.

Changes since v1:
 - Added locking.

7 years ago vpci: introduce basic handlers to trap accesses to the PCI config space
Roger Pau Monne [Fri, 23 Feb 2018 13:32:22 +0000 (13:32 +0000)]
vpci: introduce basic handlers to trap accesses to the PCI config space

This functionality is going to reside in vpci.c (and the corresponding
vpci.h header), and should be arch-agnostic. The handlers introduced
in this patch set up the basic functionality required in order to trap
accesses to the PCI config space, and allow decoding the address and
finding the corresponding handler that should handle the access
(although no handlers are implemented).

Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
set up inside of an x86 HVM file, since that's not shared with other
arches.

A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
whether a domain should use the newly introduced vPCI handlers; this
is only enabled for PVH Dom0 at the moment.

A very simple user-space test is also provided, so that the basic
functionality of the vPCI traps can be asserted. This has proven
quite helpful during development, since the logic to handle partial
accesses or accesses that span multiple registers is not
trivial.

The handlers for the registers are added to a linked list that's kept
sorted at all times. Both the read and write handlers support accesses
that span multiple emulated registers and contain gaps not
emulated.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
[IO parts]
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
---
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Julien Grall <julien.grall@arm.com>
Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v9:
 - Introduce HAS_VPCI Kconfig option.
 - Drop Jan and Wei's RB (keep Paul's since the HAS_VPCI addition
   doesn't change IO code).

Changes since v8:
 - Rebase on top of XSA-256.

Changes since v7:
 - Constify d in vpci_portio_read.
 - ASSERT the correctness of the address in the read/write handlers.
 - Add newlines between non-fallthrough case statements.

Changes since v6:
 - Align the vpci handlers in the linker script.
 - Switch add/remove register functions to take a vpci parameter
   instead of a pci_dev.
 - Expand comment of merge_result.
 - Return X86EMUL_UNHANDLEABLE if accessing cfc and cf8 is disabled.

Changes since v5:
 - Use a spinlock per pci device.
 - Use the recently introduced pci_sbdf_t type.
 - Fix test harness to use the right handler type and the newly
   introduced lock.
 - Move the position of the vpci sections in the linker scripts.
 - Constify domain and pci_dev in vpci_{read/write}.
 - Fix typos in comments.
 - Use _XEN_VPCI_H_ as header guard.

Changes since v4:
* User-space test harness:
 - Do not redirect the output of the test.
 - Add main.c and emul.h as dependencies of the Makefile target.
 - Use the same rule to modify the vpci and list headers.
 - Remove underscores from local macro variables.
 - Add _check suffix to the test harness multiread function.
 - Change the value written by every different size in the multiwrite
   test.
 - Use { } to initialize the r16 and r20 arrays (instead of { 0 }).
 - Perform some of the read checks with the local variable directly.
 - Expand some comments.
 - Implement a dummy rwlock.
* Hypervisor code:
 - Guard the linker script changes with CONFIG_HAS_PCI.
 - Rename vpci_access_check to vpci_access_allowed and make it return
   bool.
 - Make hvm_pci_decode_addr return the register as return value.
 - Use ~3 instead of 0xfffc to remove the register offset when
   checking accesses to IO ports.
 - s/head/prev in vpci_add_register.
 - Add parentheses around & in vpci_add_register.
 - Fix register removal.
 - Change the BUGs in vpci_{read/write}_hw helpers to
   ASSERT_UNREACHABLE.
 - Make merge_result static and change the computation of the mask to
   avoid using a uint64_t.
 - Modify vpci_read to only read from hardware the not-emulated gaps.
 - Remove the vpci_val union and use a uint32_t instead.
 - Change handler read type to return a uint32_t instead of modifying
   a variable passed by reference.
 - Constify the data opaque parameter of read handlers.
 - Change the size parameter of the vpci_{read/write} functions to
   unsigned int.
 - Place the array of initialization handlers in init.rodata or
   .rodata depending on whether late-hwdom is enabled.
 - Remove the pci_devs lock, assume the Dom0 is well behaved and won't
   remove the device while trying to access it.
 - Change the recursive spinlock into a rw lock for performance
   reasons.

Changes since v3:
* User-space test harness:
 - Fix spaces in container_of macro.
 - Implement dummy locking functions.
 - Remove the 'current' macro and make current a pointer to the
   statically allocated vcpu.
 - Remove unneeded parentheses in the pci_conf_readX macros.
 - Fix the name of the write test macro.
 - Remove the dummy EXPORT_SYMBOL macro (this was needed by the RB
   code only).
 - Import the max macro.
 - Test all possible read/write size combinations with all possible
   emulated register sizes.
 - Introduce a test for register removal.
* Hypervisor code:
 - Use a sorted list in order to store the config space handlers.
 - Remove some unneeded 'else' branches.
 - Make the IO port handlers always return X86EMUL_OKAY, and set the
   data to all 1's in case of read failure (writes are simply ignored).
 - In hvm_select_ioreq_server reuse local variables when calling
   XEN_DMOP_PCI_SBDF.
 - Store the pointers to the initialization functions in the .rodata
   section.
 - Do not ignore the return value of xen_vpci_add_handlers in
   setup_one_hwdom_device.
 - Remove the vpci_init macro.
 - Do not hide the pointers inside of the vpci_{read/write}_t
   typedefs.
 - Rename priv_data to private in vpci_register.
 - Simplify checking for register overlap in vpci_register_cmp.
 - Check that the offset and the length match before removing a
   register in xen_vpci_remove_register.
 - Make vpci_read_hw return a value rather than storing it in a
   pointer passed by parameter.
 - Handler dispatcher functions vpci_{read/write} no longer return an
   error code, errors on reads/writes should be treated like hardware
   (writes ignored, reads return all 1's or garbage).
 - Make sure pcidevs is locked before calling pci_get_pdev_by_domain.
 - Use a recursive spinlock for the vpci lock, so that spin_is_locked
   checks that the current CPU is holding the lock.
 - Make the code less error-chatty by removing some of the printk's.
 - Pass the slot and the function as separate parameters to the
   handler dispatchers (instead of passing devfn).
 - Allow handlers to be registered with either a read or write
   function only, the missing handler will be replaced by a dummy
   handler (writes ignored, reads return 1's).
 - Introduce PCI_CFG_SPACE_* defines from Linux.
 - Simplify the handler dispatchers by removing the recursion, now the
   dispatchers iterate over the list of sorted handlers and call them
   in order.
 - Remove the GENMASK_BYTES, SHIFT_RIGHT_BYTES and ADD_RESULT macros,
   and instead provide a merge_result function in order to merge a
   register output into a partial result.
 - Rename the fields of the vpci_val union to u8/u16/u32.
 - Remove the return values from the read/write handlers, errors
   should be handled internally and signaled as would be done on
   native hardware.
 - Remove the usage of the GENMASK macro.

Changes since v2:
 - Generalize the PCI address decoding and use it for IOREQ code also.

Changes since v1:
 - Allow access to cross a word-boundary.
 - Add locking.
 - Add cleanup to xen_vpci_add_handlers in case of failure.

7 years ago vvmx: fixes after CR4 trapping optimizations
Roger Pau Monne [Thu, 1 Mar 2018 16:54:24 +0000 (16:54 +0000)]
vvmx: fixes after CR4 trapping optimizations

Commit 406817 doesn't update nested VMX code in order to take into
account L1 CR4 host mask when nested guest (L2) writes to CR4, and
thus the mask written to CR4_GUEST_HOST_MASK is likely not as
restrictive as it should be.

Also the VVMCS GUEST_CR4 value should be updated to match the
underlying value when syncing the VVMCS state.

Fixes: 406817 ("vmx/hap: optimize CR4 trapping")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Jun Nakajima <jun.nakajima@intel.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Sergey Dyasli <sergey.dyasli@citrix.com>
---
I've manually tested and AFAICT this fixes the osstest failure
detected in 120076 ("test-amd64-amd64-qemuu-nested-intel").
---
Changes since v1:
 - Use guest_cr[4] in order to update the nested VMCS GUEST_CR4.

7 years ago xl: remove apic option for PVH guests
Roger Pau Monne [Wed, 28 Feb 2018 09:48:50 +0000 (09:48 +0000)]
xl: remove apic option for PVH guests

XSA-256 forces the local APIC to always be enabled for PVH guests, so
ignore any apic option for PVH guests. Update the documentation
accordingly.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <George.Dunlap@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tim Deegan <tim@xen.org>
---
Changes since v1:
 - Remove the trailing "if present" from the AP startup.

7 years ago x86/PV: convert page table emulation code from paddr_t to intpte_t
Jan Beulich [Thu, 1 Mar 2018 14:14:29 +0000 (15:14 +0100)]
x86/PV: convert page table emulation code from paddr_t to intpte_t

It's dealing with PTEs after all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago x86emul: make all FPU emulation use the stub
Jan Beulich [Thu, 1 Mar 2018 14:11:45 +0000 (15:11 +0100)]
x86emul: make all FPU emulation use the stub

While this means quite some reduction of (source) code, the main
purpose is to no longer have exceptions raised from other than stubs.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago ignores: update .hgignore
Roger Pau Monné [Thu, 1 Mar 2018 14:11:07 +0000 (15:11 +0100)]
ignores: update .hgignore

To add the shim build output and build directory.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
7 years ago ignores: update list of git ignored files
Roger Pau Monné [Thu, 1 Mar 2018 14:10:54 +0000 (15:10 +0100)]
ignores: update list of git ignored files

Add the shim build symbol file and remove the xen-shim binary (which
is no longer created).

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
7 years ago firmware/shim: better filtering of intermediate files during Xen tree setup
Jan Beulich [Thu, 1 Mar 2018 14:10:15 +0000 (15:10 +0100)]
firmware/shim: better filtering of intermediate files during Xen tree setup

I have no idea what *.1 is meant to cover. Instead also exclude
preprocessed and non-source assembly files.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years ago firmware/shim: better filtering of dependency files during Xen tree setup
Jan Beulich [Thu, 1 Mar 2018 14:10:02 +0000 (15:10 +0100)]
firmware/shim: better filtering of dependency files during Xen tree setup

I have no idea what *.d1 is supposed to refer to - we only have .*.d
and .*.d2 files (note also the leading dot). Also switch to passing
-name instead of -path to find - that's a requirement for .*.d et al to
work, but would probably have been better from the beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago libxc: really tolerate empty PV records
Jan Beulich [Thu, 1 Mar 2018 14:09:38 +0000 (15:09 +0100)]
libxc: really tolerate empty PV records

Commit 119ee4d773 ("tools/libxc: Tolerate specific zero-content records
in migration v2 streams") meant to tolerate those, but failed to set rc
accordingly.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years ago x86/hvm: Constify the read side of vlapic handling
Andrew Cooper [Wed, 21 Feb 2018 18:10:00 +0000 (18:10 +0000)]
x86/hvm: Constify the read side of vlapic handling

This is in preparation to make hvm_x2apic_msr_read() take a const vcpu
pointer.  One modification is to alter vlapic_get_tmcct() to not use current.

This in turn needs an alteration to hvm_get_guest_time_fixed(), which is safe
because the only mutable action it makes is to take the domain plt lock.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/vmx: Simplify the default cases in vmx_msr_{read,write}_intercept()
Andrew Cooper [Mon, 26 Feb 2018 14:23:03 +0000 (14:23 +0000)]
x86/vmx: Simplify the default cases in vmx_msr_{read,write}_intercept()

The default case of vmx_msr_write_intercept() in particular is very tangled.

First of all, fold long_mode_do_msr_{read,write}() into their callers.  These
functions were split out in the past because of the 32bit build of Xen, but it
is unclear why the cases weren't simply #ifdef'd in place.

Next, invert the vmx_write_guest_msr()/is_last_branch_msr() logic to break if
the condition is satisfied, rather than nesting if it wasn't.  This allows the
wrmsr_hypervisor_regs() call to be un-nested with respect to the other default
logic.

No practical difference from a guest's point of view.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years ago x86/hvm: fix domain crash when CR3 has the noflush bit set
Razvan Cojocaru [Wed, 28 Feb 2018 10:38:15 +0000 (12:38 +0200)]
x86/hvm: fix domain crash when CR3 has the noflush bit set

In hardware, when PCID support is enabled and the NOFLUSH bit is set
when writing a CR3 value, the hardware will clear that bit and
change the CR3 without flushing the TLB. hvm_set_cr3(), however, was
ignoring this bit; the result was that post-vm_event checks detected
an invalid CR3 value and crashed the domain.

Handle NOFLUSH in hvm_set_cr3() by:
1. Clearing the bit
2. Passing a "noflush" flag to lower-level cr3 setting functions to
indicate that a flush should not be performed.

Also clear X86_CR3_NOFLUSH when reporting monitored CR3 writes.

This allows introspection to be used on VMs whose operating system uses
the NOFLUSH bit.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reported-by: Bitweasil <bitweasil@cryptohaze.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
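
The two steps listed in the message (clear the bit, pass a "noflush" flag down) can be sketched as follows. This is an illustration under assumptions, not the patched hvm_set_cr3(): the function name, the `SKETCH_CR3_NOFLUSH` constant and the `pcid_enabled` parameter are hypothetical, while the position of the no-flush bit (bit 63 of the value written to CR3 when PCID is enabled) is architectural.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR3_NOFLUSH (UINT64_C(1) << 63)

/*
 * Return the CR3 value to validate and load, with the no-flush request
 * stripped out and reported through *noflush. The bit only has this
 * meaning when PCID is enabled; otherwise the value is left untouched so
 * that later validity checks still see it.
 */
static uint64_t strip_cr3_noflush_sketch(uint64_t value, bool pcid_enabled,
                                         bool *noflush)
{
    *noflush = pcid_enabled && (value & SKETCH_CR3_NOFLUSH);

    if ( *noflush )
        value &= ~SKETCH_CR3_NOFLUSH;

    return value;
}
```

The stripped value is what later checks and monitor events would see, and the flag tells the lower-level CR3 update whether the TLB flush can be skipped.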
7 years ago xen: sched/credit: convert scheduling parameter to s_time_t when set
Dario Faggioli [Fri, 23 Feb 2018 16:41:33 +0000 (17:41 +0100)]
xen: sched/credit: convert scheduling parameter to s_time_t when set

Basically, instead of converting integers to s_time_t
at usage time (hot paths), do the conversion when the
values are set (cold paths).

This applies to the timeslice and the ratelimit
parameters of Credit1.

Note that, when changing the type of the fields of
struct csched_private (from unsigned to s_time_t),
ncpus is moved up a bit, for better packing.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
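
A small sketch of the "convert when set, not when used" pattern described above. All names are hypothetical, and the units (timeslice in milliseconds, ratelimit in microseconds, internal time in nanoseconds) are assumptions made for the example rather than details stated in this commit message.

```c
#include <stdint.h>

typedef int64_t s_time_sketch_t; /* nanoseconds */

#define SKETCH_MILLISECS(ms) ((s_time_sketch_t)(ms) * 1000000)
#define SKETCH_MICROSECS(us) ((s_time_sketch_t)(us) * 1000)

/* Parameters are stored already converted, so hot paths only add. */
struct sched_params_sketch {
    s_time_sketch_t tslice;
    s_time_sketch_t ratelimit;
};

/* Cold path: runs when the administrator changes the parameters. */
static void set_params_sketch(struct sched_params_sketch *p,
                              unsigned int tslice_ms,
                              unsigned int ratelimit_us)
{
    p->tslice = SKETCH_MILLISECS(tslice_ms);
    p->ratelimit = SKETCH_MICROSECS(ratelimit_us);
}

/* Hot path: no multiplication, just use the stored value. */
static s_time_sketch_t next_deadline_sketch(const struct sched_params_sketch *p,
                                            s_time_sketch_t now)
{
    return now + p->tslice;
}
```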
7 years ago xen/arm: Flush TLBs before turning on the MMU to avoid stale entries
Julien Grall [Tue, 27 Feb 2018 11:15:57 +0000 (11:15 +0000)]
xen/arm: Flush TLBs before turning on the MMU to avoid stale entries

We don't know what the state of the TLBs is when booting Xen. To avoid
stale entries, it is necessary to flush the TLBs before turning on the
MMU.

Reported-by: Iain Hunter <iain@hunterembedded.co.uk>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years ago release-checklist.txt: Say to increment SUPPORT.md version number
Ian Jackson [Wed, 31 Jan 2018 13:02:32 +0000 (13:02 +0000)]
release-checklist.txt: Say to increment SUPPORT.md version number

CC: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years ago SUPPORT.md: increment version number
Ian Jackson [Wed, 31 Jan 2018 13:02:01 +0000 (13:02 +0000)]
SUPPORT.md: increment version number

CC: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years ago common/gnttab: Introduce command line feature controls
Andrew Cooper [Thu, 24 Aug 2017 14:31:08 +0000 (15:31 +0100)]
common/gnttab: Introduce command line feature controls

This patch was originally released as part of XSA-226.  It retains the same
command line syntax (as various downstreams are mitigating XSA-226 using this
mechanism) but the defaults have been updated due to the revised XSA-226
patches, after which transitive grants are believed to be functioning
properly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/HVM: don't give the wrong impression of WRMSR succeeding
Jan Beulich [Tue, 27 Feb 2018 14:12:23 +0000 (15:12 +0100)]
x86/HVM: don't give the wrong impression of WRMSR succeeding

... for non-existent MSRs: wrmsr_hypervisor_regs()'s comment clearly
says that the function returns 0 for unrecognized MSRs, so
{svm,vmx}_msr_write_intercept() should not convert this into success. We
don't want to unconditionally fail the access though, as we can't be
certain the list of handled MSRs is complete enough for the guest types
we care about, so instead mirror what we do on the read paths and probe
the MSR to decide whether to raise #GP.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years ago vmx/hap: optimize CR4 trapping
Roger Pau Monné [Tue, 27 Feb 2018 13:10:33 +0000 (14:10 +0100)]
vmx/hap: optimize CR4 trapping

There are a bunch of bits in CR4 that should be allowed to be set directly
by the guest without requiring Xen intervention. Currently this is
already done by passing through guest writes into the CR4 used when
running in non-root mode, but taking an expensive vmexit in order to
do so.

xenalyze reports the following when running a PV guest in shim mode:

 CR_ACCESS             3885950  6.41s 17.04%  3957 cyc { 2361| 3378| 7920}
   cr4  3885940  6.41s 17.04%  3957 cyc { 2361| 3378| 7920}
   cr3        1  0.00s  0.00%  3480 cyc { 3480| 3480| 3480}
     *[  0]        1  0.00s  0.00%  3480 cyc { 3480| 3480| 3480}
   cr0        7  0.00s  0.00%  7112 cyc { 3248| 5960|17480}
   clts        2  0.00s  0.00%  4588 cyc { 3456| 5720| 5720}

After this change this turns into:

 CR_ACCESS                  12  0.00s  0.00%  9972 cyc { 3680|11024|24032}
   cr4        2  0.00s  0.00% 17528 cyc {11024|24032|24032}
   cr3        1  0.00s  0.00%  3680 cyc { 3680| 3680| 3680}
     *[  0]        1  0.00s  0.00%  3680 cyc { 3680| 3680| 3680}
   cr0        7  0.00s  0.00%  9209 cyc { 4184| 7848|17488}
   clts        2  0.00s  0.00%  8232 cyc { 5352|11112|11112}

Note that this optimized trapping is currently only applied to guests
running with HAP on Intel hardware. If using shadow paging more CR4
bits need to be unconditionally trapped, which makes this approach
unlikely to yield any important performance improvements.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
7 years ago x86/PV: fix off-by-one in I/O bitmap limit check
Jan Beulich [Tue, 27 Feb 2018 13:10:00 +0000 (14:10 +0100)]
x86/PV: fix off-by-one in I/O bitmap limit check

With everyone having their tags below agreeing that putting things the
other way around in the comparison makes things easier to understand, do
that rearrangement while changing the line anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago hvm/svm: implement CPUID events
Alexandru Isaila [Tue, 27 Feb 2018 13:09:21 +0000 (14:09 +0100)]
hvm/svm: implement CPUID events

At this moment the CPUID events for the AMD architecture are not
forwarded to the monitor layer.

This patch adds the CPUID event to the common capabilities and then
forwards the event to the monitor layer.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
7 years ago x86/hvm: Disallow the creation of HVM domains without Local APIC emulation
Andrew Cooper [Tue, 27 Feb 2018 13:08:36 +0000 (14:08 +0100)]
x86/hvm: Disallow the creation of HVM domains without Local APIC emulation

There are multiple problems, not necessarily limited to:

 * Guests which configure event channels via hvmop_set_evtchn_upcall_vector(),
   or which hit %cr8 emulation will cause Xen to fall over a NULL vlapic->regs
   pointer.

 * On Intel hardware, disabling the TPR_SHADOW execution control without
   reenabling CR8_{LOAD,STORE} interception means that the guest's %cr8
   accesses interact with the real TPR.  Amongst other things, setting the
   real TPR to 0xf blocks even IPIs from interrupting this CPU.

 * On hardware which sets up the use of Interrupt Posting, including
   IOMMU-Posting, guests run without the appropriate non-root configuration,
   which at a minimum will result in dropped interrupts.

Whether no-LAPIC mode is of any use at all remains to be seen.

This is XSA-256.

Reported-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago gnttab: don't blindly free status pages upon version change
Jan Beulich [Tue, 27 Feb 2018 13:07:12 +0000 (14:07 +0100)]
gnttab: don't blindly free status pages upon version change

There may still be active mappings, which would trigger the respective
BUG_ON(). Split the loop into one dealing with the page attributes and
the second (when the first fully passed) freeing the pages. Return an
error if any pages still have pending references.

This is part of XSA-255.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
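
The split into two loops can be illustrated with a simplified sketch. It is not the grant table code: the struct, the function name and the use of -EBUSY are assumptions, and the first pass here only checks reference counts, whereas the message above says the real first loop deals with page attributes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a status page and its outstanding mappings. */
struct status_page_sketch {
    unsigned int refs;
    bool allocated;
};

static int free_status_pages_sketch(struct status_page_sketch *pages,
                                    size_t nr)
{
    size_t i;

    /* Pass 1: bail out before freeing anything if a page is still in use. */
    for ( i = 0; i < nr; i++ )
        if ( pages[i].refs )
            return -EBUSY;

    /* Pass 2: only reached when every page passed the first loop. */
    for ( i = 0; i < nr; i++ )
        pages[i].allocated = false;

    return 0;
}
```

Because nothing is freed until the whole first pass has succeeded, a failure leaves the set of pages intact instead of half torn down.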
7 years ago gnttab/ARM: don't corrupt shared GFN array
Jan Beulich [Tue, 27 Feb 2018 13:04:44 +0000 (14:04 +0100)]
gnttab/ARM: don't corrupt shared GFN array

... by writing status GFNs to it. Introduce a second array instead.
Also implement gnttab_status_gmfn() properly now that the information is
suitably being tracked.

While touching it anyway, remove a misguided (but luckily benign) upper
bound check from gnttab_shared_gmfn(): We should never access beyond the
bounds of that array.

This is part of XSA-255.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago memory: don't implicitly unpin for decrease-reservation
Jan Beulich [Tue, 27 Feb 2018 13:03:27 +0000 (14:03 +0100)]
memory: don't implicitly unpin for decrease-reservation

It very likely was a mistake (copy-and-paste from domain cleanup code)
to implicitly unpin here: The caller should really unpin itself before
(or after, if they so wish) requesting the page to be removed.

This is XSA-252.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years ago grant: Release domain lock on 'map' path in cache_flush
George Dunlap [Tue, 27 Feb 2018 11:16:55 +0000 (11:16 +0000)]
grant: Release domain lock on 'map' path in cache_flush

common/grant_table.c:cache_flush() grabs the rcu lock for the current
domain, but only releases it on error paths.

Note that this is not a security issue, as the preempt count is used
exclusively for assertions at the moment.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years ago x86/time: Rework pv_soft_rdtsc() to aid further cleanup
Andrew Cooper [Mon, 19 Feb 2018 14:54:57 +0000 (14:54 +0000)]
x86/time: Rework pv_soft_rdtsc() to aid further cleanup

Having pv_soft_rdtsc() emulate all parts of an rdtscp is awkward, and gets in
the way of some intended cleanup.

 * Drop the rdtscp parameter and always make the caller responsible for ecx
   updates when appropriate.
 * Switch the function from being void, and return the main timestamp in the
   return value.

The regs parameter is still needed, but only for the stats collection, once
again bringing into question their utility.  The parameter can however switch
to being const.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
Andrew Cooper [Mon, 19 Feb 2018 10:40:20 +0000 (10:40 +0000)]
x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context

If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
advertised/hidden.

At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
Therefore, default PV guests can read the value from a previously scheduled
HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.

Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case.

To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
update of the MSR, and use this function consistently across the codebase.
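
A minimal sketch of what such a lazy update could look like, assuming a
cached copy of the last value written. The cache would be per-CPU in
the hypervisor; here it is a plain static, and hw_write_tsc_aux() is a
hypothetical stand-in for the real MSR write:

```c
#include <stdint.h>

static uint32_t cached_tsc_aux;   /* stands in for a per-CPU variable */
static unsigned int hw_writes;    /* counts simulated MSR writes */

/* Hypothetical stand-in for the expensive hardware write. */
static void hw_write_tsc_aux(uint32_t val)
{
    (void)val;
    hw_writes++;
}

/* Only touch the MSR when the requested value differs from the cache. */
void wrmsr_tsc_aux_sketch(uint32_t val)
{
    if ( cached_tsc_aux != val )
    {
        hw_write_tsc_aux(val);
        cached_tsc_aux = val;
    }
}
```

Repeated context switches between guests that all use 0 then cost a
compare rather than an MSR write.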

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agoxen/arm: vpsci: Rework the logic to start AArch32 vCPU in Thumb mode
Julien Grall [Fri, 23 Feb 2018 18:57:29 +0000 (18:57 +0000)]
xen/arm: vpsci: Rework the logic to start AArch32 vCPU in Thumb mode

A 32-bit domain is able to select the instruction set (ARM vs Thumb) to
use when booting a new vCPU via CPU_ON. This is indicated via bit[0] of
the entry point address (see "T32 support" in PSCI v1.1 DEN0022D). bit[0]
must be cleared when setting the PC.

At the moment, Xen sets CPSR.T but never clears bit[0]. Clear
it to match the specification.

At the same time, slightly rework the code to make clear Thumb is only for
32-bit domains. Lastly, take the opportunity to switch is_thumb from int
to bool.
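
The intended handling for a 32-bit domain can be sketched as follows.
This is not the vpsci code itself: struct vcpu_regs_sketch,
PSR_THUMB_SKETCH and set_aarch32_entry() are hypothetical names, and
the only architectural fact relied on is that CPSR.T is bit 5:

```c
#include <stdbool.h>
#include <stdint.h>

#define PSR_THUMB_SKETCH (1u << 5)  /* CPSR.T */

/* Hypothetical, reduced register context for illustration. */
struct vcpu_regs_sketch {
    uint32_t pc;
    uint32_t cpsr;
};

/* bit[0] of the entry point selects Thumb and must not reach the PC. */
void set_aarch32_entry(struct vcpu_regs_sketch *regs, uint32_t entry_point)
{
    bool is_thumb = entry_point & 1;

    if ( is_thumb )
    {
        regs->cpsr |= PSR_THUMB_SKETCH;
        entry_point &= ~1u;
    }

    regs->pc = entry_point;
}
```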

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
7 years agoxen/arm: vpsci: Introduce and use PSCI_INVALID_ADDRESS
Julien Grall [Fri, 23 Feb 2018 18:57:28 +0000 (18:57 +0000)]
xen/arm: vpsci: Introduce and use PSCI_INVALID_ADDRESS

PSCI 1.0 added the error return PSCI_INVALID_ADDRESS. It is used to
indicate the entry point address is known to be invalid.

In Xen's case, this error could be returned when a 64-bit vCPU is using a
Thumb entry address.

For the PSCI 0.1 implementation, return PSCI_INVALID_PARAMETERS instead.
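
A rough sketch of that selection, assuming the error values from the
PSCI specification (-2 and -9). check_entry_point() and its parameters
are hypothetical and only model the decision described above:

```c
#include <stdbool.h>
#include <stdint.h>

#define PSCI_SUCCESS              0
#define PSCI_INVALID_PARAMETERS (-2)
#define PSCI_INVALID_ADDRESS    (-9)  /* only defined from PSCI 1.0 on */

/* Reject a Thumb entry point (bit[0] set) for a 64-bit vCPU. */
int32_t check_entry_point(bool is_64bit_domain, uint64_t entry_point,
                          bool caller_is_psci_0_1)
{
    bool is_thumb = entry_point & 1;

    if ( is_64bit_domain && is_thumb )
        return caller_is_psci_0_1 ? PSCI_INVALID_PARAMETERS
                                  : PSCI_INVALID_ADDRESS;

    return PSCI_SUCCESS;
}
```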

Suggested-by: mirela.simonovic@aggios.com
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: mirela.simonovic@aggios.com
7 years agoxen/arm: vpsci: Update the return type for MIGRATE_INFO_TYPE
Julien Grall [Fri, 23 Feb 2018 18:57:27 +0000 (18:57 +0000)]
xen/arm: vpsci: Update the return type for MIGRATE_INFO_TYPE

The PSCI call MIGRATE_INFO_TYPE returns int32_t. Update the function
return type to match it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: mirela.simonovic@aggios.com
7 years agoxen/arm: psci: Prefix with static any functions not exported
Julien Grall [Fri, 23 Feb 2018 18:57:26 +0000 (18:57 +0000)]
xen/arm: psci: Prefix with static any functions not exported

A bunch of PSCI functions are not prefixed with static even though no one
is using them outside the file and their prototypes are not available in
psci.h.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: psci: Consolidate PSCI version print
Julien Grall [Fri, 23 Feb 2018 18:57:25 +0000 (18:57 +0000)]
xen/arm: psci: Consolidate PSCI version print

Xen prints the PSCI version the same way for 0.1, 0.2 and later.
The only difference is that the former is hardcoded.

Furthermore, PSCI is now used for things other than SMP bring-up. So only
print the PSCI version in psci_init.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: vpsci: Remove parameter 'ver' from do_common_cpu
Julien Grall [Fri, 23 Feb 2018 18:57:24 +0000 (18:57 +0000)]
xen/arm: vpsci: Remove parameter 'ver' from do_common_cpu

Currently, the behavior of do_common_cpu changes slightly depending
on the PSCI version passed as a parameter. Looking at the code, the
behavior specific to 0.2 could be moved out of the function or adapted
for 0.1:

    - x0/r0 can be updated on PSCI 0.1 because general purpose registers
    are undefined upon CPU on. This was deduced from the spec not
    mentioning the state of general purpose registers on CPU on.
    - PSCI 0.1 does not define PSCI_ALREADY_ON. However, it would be
    safer to bail out if the CPU is already on.

Based on this, the parameter 'ver' is removed and do_psci_cpu_on
(implementation for PSCI 0.1) is adapted to avoid returning
PSCI_ALREADY_ON.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
7 years agoxen/arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
Julien Grall [Fri, 23 Feb 2018 18:57:23 +0000 (18:57 +0000)]
xen/arm64: Kill PSCI_GET_VERSION as a variant-2 workaround

Now that we've standardised on SMCCC v1.1 to perform the branch
prediction invalidation, let's drop the previous band-aid. If vendors
haven't updated their firmware to do SMCCC 1.1, they haven't updated
PSCI either, so we don't lose anything.

This is aligned with the Linux commit 3a0a397ff5ff.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support
Julien Grall [Fri, 23 Feb 2018 18:57:22 +0000 (18:57 +0000)]
xen/arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening support

Add the detection and runtime code for ARM_SMCCC_ARCH_WORKAROUND_1.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
7 years agoxen/arm: smccc: Implement SMCCC v1.1 inline primitive
Julien Grall [Fri, 23 Feb 2018 18:57:21 +0000 (18:57 +0000)]
xen/arm: smccc: Implement SMCCC v1.1 inline primitive

One of the major improvements of SMCCC v1.1 is that it only clobbers the
first 4 registers, both on 32 and 64bit. This means that it becomes very
easy to provide an inline version of the SMC call primitive, and avoid
performing a function call to stash the registers that would otherwise
be clobbered by SMCCC v1.0.

This patch has been adapted to Xen from Linux commit f2d3b2e8759a. The
changes made are:
    - Using Xen coding style
    - Remove HVC as not used by Xen
    - Add arm_smccc_res structure

Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: psci: Detect SMCCC version
Julien Grall [Fri, 23 Feb 2018 18:57:20 +0000 (18:57 +0000)]
xen/arm: psci: Detect SMCCC version

PSCI 1.0 and later allows the SMCCC version to be (indirectly) probed
via PSCI_FEATURES. If the PSCI_FEATURES does not exist (PSCI 0.2 or
earlier) and the function returns an error, then we assume SMCCC 1.0
is implemented.
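
The probing order can be sketched as below. This is a model, not the
psci.c code: the firmware is represented by a hypothetical callback
(fw_call_t), fw_smccc_1_1() and fw_legacy() are toy firmware models,
and the function IDs are the ones assumed from the Arm specifications:

```c
#include <stdint.h>

#define SMCCC_VERSION_1_0   0x10000u
#define SMCCC_VERSION_FID   0x80000000u  /* SMCCC_VERSION function ID */
#define PSCI_FEATURES_FID   0x8400000Au  /* PSCI_FEATURES function ID */
#define FW_NOT_SUPPORTED    (-1)

/* Hypothetical model of a firmware call taking one argument. */
typedef int32_t (*fw_call_t)(uint32_t fid, uint32_t arg);

uint32_t detect_smccc_version(unsigned int psci_major, fw_call_t fw)
{
    int32_t ret;

    /* PSCI_FEATURES does not exist before PSCI 1.0. */
    if ( psci_major < 1 )
        return SMCCC_VERSION_1_0;

    /* Ask PSCI_FEATURES whether SMCCC_VERSION is implemented. */
    if ( fw(PSCI_FEATURES_FID, SMCCC_VERSION_FID) == FW_NOT_SUPPORTED )
        return SMCCC_VERSION_1_0;

    ret = fw(SMCCC_VERSION_FID, 0);

    return ret == FW_NOT_SUPPORTED ? SMCCC_VERSION_1_0 : (uint32_t)ret;
}

/* Toy firmware that implements SMCCC 1.1. */
int32_t fw_smccc_1_1(uint32_t fid, uint32_t arg)
{
    if ( fid == PSCI_FEATURES_FID )
        return arg == SMCCC_VERSION_FID ? 0 : FW_NOT_SUPPORTED;
    if ( fid == SMCCC_VERSION_FID )
        return 0x10001;
    return FW_NOT_SUPPORTED;
}

/* Toy firmware that knows none of the calls. */
int32_t fw_legacy(uint32_t fid, uint32_t arg)
{
    (void)fid;
    (void)arg;
    return FW_NOT_SUPPORTED;
}
```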

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
7 years agoxen/arm: smccc: Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR}
Julien Grall [Fri, 23 Feb 2018 18:57:19 +0000 (18:57 +0000)]
xen/arm: smccc: Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR}

Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR} to easily convert
between a 32-bit value and a version number. The encoding is based on
2.2.2 in "Firmware interfaces for mitigating CVE-2017-5715" (ARM DEN 0070A).

Also re-use them to define ARM_SMCCC_VERSION_1_0 and ARM_SMCCC_VERSION_1_1.
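
For illustration, one possible shape of these macros, assuming the
major number sits above bit 16 and the minor number in the low 16
bits. The helper names ending in _SHIFT and _MASK are assumptions, not
necessarily what smccc.h uses:

```c
/* Assumed layout: major above bit 16, minor in bits 15:0. */
#define SMCCC_VERSION_MAJOR_SHIFT   16
#define SMCCC_VERSION_MINOR_MASK    ((1U << SMCCC_VERSION_MAJOR_SHIFT) - 1)
#define SMCCC_VERSION_MAJOR_MASK    (~SMCCC_VERSION_MINOR_MASK)

#define SMCCC_VERSION_MAJOR(ver) \
    (((ver) & SMCCC_VERSION_MAJOR_MASK) >> SMCCC_VERSION_MAJOR_SHIFT)
#define SMCCC_VERSION_MINOR(ver) \
    ((ver) & SMCCC_VERSION_MINOR_MASK)

#define SMCCC_VERSION(major, minor) \
    (((major) << SMCCC_VERSION_MAJOR_SHIFT) | (minor))

#define ARM_SMCCC_VERSION_1_0   SMCCC_VERSION(1, 0)
#define ARM_SMCCC_VERSION_1_1   SMCCC_VERSION(1, 1)
```

Under that assumption version 1.1 encodes as 0x10001 and decodes back
to major 1, minor 1.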

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm64: Print a per-CPU message with the BP hardening method used
Julien Grall [Fri, 23 Feb 2018 18:57:18 +0000 (18:57 +0000)]
xen/arm64: Print a per-CPU message with the BP hardening method used

This will make it easier to know whether BP hardening has been enabled for
a CPU and which method is used.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm64: Implement a fast path for handling SMCCC_ARCH_WORKAROUND_1
Julien Grall [Fri, 23 Feb 2018 18:57:17 +0000 (18:57 +0000)]
xen/arm64: Implement a fast path for handling SMCCC_ARCH_WORKAROUND_1

The function SMCCC_ARCH_WORKAROUND_1 will be called by the guest for
hardening the branch predictor. So we want the handling to be as fast as
possible.

As the mitigation is applied on every guest exit, we can check for the
call before saving all the context and return very early.

For now, only provide a fast path for HVC64 calls. Because the code relies
on 2 registers, x0 and x1 are saved in advance.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
7 years agoxen/arm: Adapt smccc.h to be able to use it in assembly code
Julien Grall [Fri, 23 Feb 2018 18:57:16 +0000 (18:57 +0000)]
xen/arm: Adapt smccc.h to be able to use it in assembly code

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: vsmc: Implement SMCCC_ARCH_WORKAROUND_1 BP hardening support
Julien Grall [Fri, 23 Feb 2018 18:57:15 +0000 (18:57 +0000)]
xen/arm: vsmc: Implement SMCCC_ARCH_WORKAROUND_1 BP hardening support

SMCCC 1.1 offers firmware-based CPU workarounds. In particular,
SMCCC_ARCH_WORKAROUND_1 provides BP hardening for variant 2 of XSA-254
(CVE-2017-5715).

If the hypervisor has some mitigation for this issue, report that we
deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the hypervisor
workaround on every guest exit.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
7 years agoxen/arm: vsmc: Implement SMCCC 1.1
Julien Grall [Fri, 23 Feb 2018 18:57:14 +0000 (18:57 +0000)]
xen/arm: vsmc: Implement SMCCC 1.1

The new SMC Calling Convention (v1.1) allows for a reduced overhead when
calling into the firmware, and provides a new feature discovery
mechanism. See "Firmware interfaces for mitigating CVE-2017-5715"
ARM DEN 0070A.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: vpsci: Add support for PSCI 1.1
Julien Grall [Fri, 23 Feb 2018 18:57:13 +0000 (18:57 +0000)]
xen/arm: vpsci: Add support for PSCI 1.1

At the moment, Xen provides a virtual PSCI interface compliant with 0.1
and 0.2. Since then, the specification has been updated and the latest
version is 1.1 (see ARM DEN 0022D).

From an implementation point of view, only PSCI_FEATURES is mandatory.
The rest is optional and can be left unimplemented for now.

At the same time, the compatible string for the PSCI node has been
updated to expose "arm,psci-1.0".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: mirela.simonovic@aggios.com
7 years agoxen/arm: psci: Rework the PSCI definitions
Julien Grall [Fri, 23 Feb 2018 18:57:12 +0000 (18:57 +0000)]
xen/arm: psci: Rework the PSCI definitions

Some PSCI functions are only available in the 32-bit version. After
recent changes, Xen always needs to know whether the call was made using
a 32-bit ID or a 64-bit ID, so that reserved ones are not emulated.

With the current naming scheme, it is not easy to know which call
supports 32-bit and 64-bit IDs. So rework the definitions to encode the
version in the name. From now on the functions will be named
PSCI_0_2_FNxx where xx is 32 or 64.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
Andrew Cooper [Tue, 20 Feb 2018 11:08:32 +0000 (11:08 +0000)]
x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()

c/s d2f86bf604 which introduced "struct hvm_save_descriptor *d" accidentally
ended up shadowing the "struct domain *d" function parameter.  Rename the
former to desc.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/clang: allow integrated assembler usage
Roger Pau Monne [Fri, 23 Feb 2018 14:11:00 +0000 (14:11 +0000)]
x86/clang: allow integrated assembler usage

If the required features are present.

Modify as-option-add to add an option in case the test fails, and use
it to detect whether the required clang integrated assembler features
are present.

This patch has been tested with clang 3.5, clang 6, gcc 6.4.0 without
retpoline support and gcc 7.3.1 with retpoline support.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: add .size/.type directives to indirect thunk generation macro
Jan Beulich [Fri, 23 Feb 2018 13:25:54 +0000 (14:25 +0100)]
x86: add .size/.type directives to indirect thunk generation macro

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoget_maintainers.pl: Avoid THE_REST when files are added or removed
Alan Robinson [Fri, 23 Feb 2018 13:24:56 +0000 (14:24 +0100)]
get_maintainers.pl: Avoid THE_REST when files are added or removed

When files are added or removed, /dev/null is used as a placeholder
name in the patch for the absent file.  Don't try to find a
MAINTAINER for this placeholder: it only ever flags and then
spams THE REST.  Behaviour for a real filename is
unchanged.

Signed-off-by: Alan Robinson <Alan.Robinson@ts.fujitsu.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agobuild: Rename as-insn-check to as-option-add
Andrew Cooper [Wed, 21 Feb 2018 18:20:15 +0000 (18:20 +0000)]
build: Rename as-insn-check to as-option-add

as-insn-check mutates the passed-in flags.  Rename it to as-option-add, in
line with cc-option-add, and update all callers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agobuild: Help attempts to syntax highlight Config.mk
Andrew Cooper [Wed, 21 Feb 2018 17:58:04 +0000 (17:58 +0000)]
build: Help attempts to syntax highlight Config.mk

Some attempts to syntax highlight Config.mk end up thinking that most of
Config.mk is a string, due to the unbalanced squote.  Provide a balancing
squote in a comment to compensate.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen: append EXTRA_CFLAGS_XEN_CORE to CFLAGS
Doug Goldstein [Fri, 23 Feb 2018 10:05:35 +0000 (11:05 +0100)]
xen: append EXTRA_CFLAGS_XEN_CORE to CFLAGS

Allow a user to supply extra CFLAGS via the EXTRA_CFLAGS_XEN_CORE
environment variable for hypervisor builds. This is not a
configuration that is supported but is only aimed to help support
testing and troubleshooting when you need to make changes.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agobuild: remove shim related targets
Roger Pau Monné [Fri, 23 Feb 2018 10:05:19 +0000 (11:05 +0100)]
build: remove shim related targets

There's no need to have shim specific targets, so just use the regular
xen makefile targets in order to build the shim binary.

When the shim is built as part of the firmware directory, install the
stripped Xen binary to the firmware directory and place a binary with
symbols in the debug directory.

The objcopy step of the shim build is also removed in this patch:
since the shim is booted in PVH mode there's no need for the resulting
binary to be in elf32 format. Xen can load PVH kernels with either a
32 or 64bit elf header.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/svm: enable pause filtering threshold
Brian Woods [Fri, 23 Feb 2018 10:04:48 +0000 (11:04 +0100)]
x86/svm: enable pause filtering threshold

If available, enable the pause filtering threshold feature.  See the
previous commit for more information.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agox86/svm: add support for pause filtering threshold
Brian Woods [Fri, 23 Feb 2018 10:03:36 +0000 (11:03 +0100)]
x86/svm: add support for pause filtering threshold

Add support for enabling the pause filtering threshold feature.  This
causes the pause filtering count to reset if the number of cycles between
pauses is greater than or equal to the pause filtering threshold.  See AMD
APM Vol 2 Section 15.14.4 for more details.

The values of the pause filtering count and threshold were found by
iterating over different values of the count and threshold while running
kernbench and a pi spigot algorithm with yields placed in it.  A
balanced setting for both variables provides:

(Using averaged elapsed time with kernbench)
old = 852.0
new = 848.8
improvement = .4%

For systems without pause filtering threshold, the change from 3000 to
4000 for the count should not negatively affect system performance.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Babu Moger <babu.moger@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agox86: fix indirect thunk usage of CONFIG_INDIRECT_THUNK
Roger Pau Monné [Fri, 23 Feb 2018 10:00:31 +0000 (11:00 +0100)]
x86: fix indirect thunk usage of CONFIG_INDIRECT_THUNK

When indirect_thunk_asm.h is instantiated directly into assembly files
CONFIG_INDIRECT_THUNK might not be defined, and thus using .if against
it is wrong.

Add a check to define CONFIG_INDIRECT_THUNK to 0 if not defined, so
that using .if CONFIG_INDIRECT_THUNK is always correct.

This suppresses the following clang error:

<instantiation>:8:9: error: expected absolute expression
    .if CONFIG_INDIRECT_THUNK == 1
        ^
<instantiation>:1:1: note: while in macro instantiation
INDIRECT_BRANCH call %rdx
^
entry.S:589:9: note: while in macro instantiation
        INDIRECT_CALL %rdx
        ^

Note that this is a preparatory patch in order to enable clang's
integrated assembler; the integrated assembler is not yet enabled for
assembly files.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoVT-d: use two 32-bit writes to update DMAR fault address registers
Haozhong Zhang [Fri, 23 Feb 2018 09:59:31 +0000 (10:59 +0100)]
VT-d: use two 32-bit writes to update DMAR fault address registers

The 64-bit DMAR fault address is composed of two 32-bit registers,
DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d spec:
"Software is expected to access 32-bit registers as aligned doublewords",
a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
DMAR_FEUADDR_REG separately in order to update a 64-bit fault address,
rather than a 64-bit write to DMAR_FEADDR_REG. Note that when x2APIC
is not enabled DMAR_FEUADDR_REG is reserved and it's not necessary to
update it.

Though I haven't seen any errors caused by such one 64-bit write on
real machines, it's still better to follow the specification.
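
As a sketch of the access pattern only: the register file below is a
plain array, reg_write32() is a hypothetical stand-in for the MMIO
accessor, and the offsets 0x40 and 0x44 are assumed from the VT-d
register layout:

```c
#include <stdbool.h>
#include <stdint.h>

#define DMAR_FEADDR_REG   0x40  /* fault event address (assumed offset) */
#define DMAR_FEUADDR_REG  0x44  /* fault event upper address (assumed) */

/* Fake register file standing in for the IOMMU MMIO window. */
static uint32_t fake_regs[0x100 / 4];

/* Hypothetical 32-bit MMIO write helper. */
static void reg_write32(unsigned int offset, uint32_t val)
{
    fake_regs[offset / 4] = val;
}

/* Update the fault address with aligned 32-bit writes only. */
void write_fault_addr(uint64_t addr, bool x2apic_enabled)
{
    reg_write32(DMAR_FEADDR_REG, (uint32_t)addr);

    /* The upper register is reserved unless x2APIC is enabled. */
    if ( x2apic_enabled )
        reg_write32(DMAR_FEUADDR_REG, (uint32_t)(addr >> 32));
}
```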

Fixes: ae05fd3912b ("VT-d: use qword MMIO access for MSI address writes")
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/svm: add EFER SVME support for VGIF/VLOAD
Brian Woods [Tue, 20 Feb 2018 22:27:02 +0000 (16:27 -0600)]
x86/svm: add EFER SVME support for VGIF/VLOAD

Only enable virtual VMLOAD/SAVE and VGIF if the guest EFER.SVME is set.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agosysctl: correct comment in xen_sysctl_pcitopoinfo
Olaf Hering [Wed, 21 Feb 2018 13:44:58 +0000 (14:44 +0100)]
sysctl: correct comment in xen_sysctl_pcitopoinfo

Refer to correct member of struct xen_sysctl_pcitopoinfo in comment.

Fixes: commit 61319fbfd9 ("sysctl: add sysctl interface for querying PCI topology")
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/tmem: Convert the file common/tmem_xen.c to use typesafe MFN
Julien Grall [Wed, 21 Feb 2018 14:02:44 +0000 (14:02 +0000)]
xen/tmem: Convert the file common/tmem_xen.c to use typesafe MFN

The file common/tmem_xen.c is now converted to use typesafe MFN. This
requires overriding the macro page_to_mfn to make it work with mfn_t.

Note that all variables converted to mfn_t have their initial value,
when set, switched from 0 to INVALID_MFN. This is fine because the initial
values were always overridden before use.
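
For readers unfamiliar with the idea, a simplified sketch of a typesafe
frame number follows. The names mirror Xen's (mfn_t, _mfn(), mfn_x(),
mfn_eq(), INVALID_MFN), but the definitions here are written out by
hand for illustration and are not copied from the headers:

```c
#include <stdbool.h>

/* Wrapping the integer in a struct stops accidental mixing of types. */
typedef struct { unsigned long mfn; } mfn_t;

static inline mfn_t _mfn(unsigned long n)
{
    mfn_t m = { n };
    return m;
}

static inline unsigned long mfn_x(mfn_t m)
{
    return m.mfn;
}

static inline bool mfn_eq(mfn_t a, mfn_t b)
{
    return mfn_x(a) == mfn_x(b);
}

/* Sentinel used instead of 0 for "no frame yet". */
#define INVALID_MFN _mfn(~0UL)
```

Passing a bare unsigned long where an mfn_t is expected then fails to
compile instead of silently working.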

Also add a couple of missing newlines suggested by Andrew in the code.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agobuild: filter out command line assembler arguments
Roger Pau Monne [Tue, 20 Feb 2018 14:10:12 +0000 (14:10 +0000)]
build: filter out command line assembler arguments

If the assembler is not used. This happens when using cc -E or cc -S
for example. GCC will just ignore the -Wa,... when the assembler is
not called, but clang will complain loudly and fail.

Also enable passing -Wa,-I$(BASEDIR)/include to clang now that it's
safe to do so.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agobuild: do not hardcode AFLAGS for as-insn tests
Roger Pau Monne [Tue, 20 Feb 2018 14:10:11 +0000 (14:10 +0000)]
build: do not hardcode AFLAGS for as-insn tests

Hardcoding as-insn to use AFLAGS is not correct. For one, the test is
performed using a C file with inline assembly, and secondly the flags
used can be passed by the caller together with the CC.

Fix as-insn-check to pass the flags given as parameter to the test.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Fix usage comments as they are changing]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/arm: vgic: Make sure the number of SPIs is a multiple of 32
Julien Grall [Fri, 16 Feb 2018 14:59:56 +0000 (14:59 +0000)]
xen/arm: vgic: Make sure the number of SPIs is a multiple of 32

The vGIC relies on having a pending_irq available for every IRQ
described in the ranks. As each rank describes 32 interrupts, we need to
make sure the number of SPIs is a multiple of 32.
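
The rounding itself is simple. A sketch, with round_up_spis() and
SPIS_PER_RANK as hypothetical names:

```c
#define SPIS_PER_RANK 32u

/* Round the SPI count up to the next multiple of 32. */
unsigned int round_up_spis(unsigned int nr_spis)
{
    return (nr_spis + SPIS_PER_RANK - 1) / SPIS_PER_RANK * SPIS_PER_RANK;
}
```

So a request for 33 SPIs ends up with 64, which covers two full ranks.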

Reported-by: Jeff Kubascik <Jeff.Kubascik@dornerworks.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Cc: Jarvis Roach <Jarvis.Roach@dornerworks.com>
7 years agoasm-x86/monitor: Add MONITOR_EVENT_INTERRUPT to common capabilities
Alexandru Isaila [Mon, 19 Feb 2018 13:07:06 +0000 (15:07 +0200)]
asm-x86/monitor: Add MONITOR_EVENT_INTERRUPT to common capabilities

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
7 years agox86/msr: add Raw and Host domain policies
Sergey Dyasli [Mon, 19 Feb 2018 11:29:26 +0000 (11:29 +0000)]
x86/msr: add Raw and Host domain policies

Raw policy contains the actual values from H/W MSRs. Add the PLATFORM_INFO
MSR to the policy during probe_cpuid_faulting().

Host policy may have certain features disabled if Xen decides not
to use them. For now, make Host policy equal to Raw policy with
cpuid_faulting availability dependent on X86_FEATURE_CPUID_FAULTING.

Finally, derive HVM/PV max domain policies from the Host policy.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
Igor Druzhinin [Tue, 20 Feb 2018 09:16:56 +0000 (10:16 +0100)]
x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap

We're noticing a reproducible system boot hang on certain
Skylake platforms where the BIOS is configured in legacy
boot mode with x2APIC disabled. The system stalls immediately
after writing the first SMP initialization sequence into APIC ICR.

The cause of the problem is watchdog NMI handler execution -
somewhere near the end of NMI handling (after it's already
rescheduled the next NMI) it tries to access IO port 0x61
to get the actual NMI reason on CPU0. Unfortunately, this
port is emulated by BIOS using SMIs and this emulation for
some reason takes more time than we expect during the INIT-SIPI-SIPI
sequence. As a result, the system is constantly moving between the
NMI and SMI handlers and not making any progress.

To avoid this, initialize the watchdog after SMP bootstrap on
CPU0 and, additionally, protect the NMI handler by moving
IO port access before NMI re-scheduling. The latter should also
help in case of post-boot CPU onlining. Although we're running the
watchdog at a much lower frequency at this point, it's nevertheless
possible we may trigger the issue anyway.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoshim: allow building of just the shim with build-ID-incapable linker
Jan Beulich [Tue, 20 Feb 2018 09:10:59 +0000 (10:10 +0100)]
shim: allow building of just the shim with build-ID-incapable linker

The ELF note the shim build inserts causes mkelf32 to choke on the
second program header. However, the output of mkelf32 isn't really
needed when building inside tools/firmware/ - an attempt to build it is
made solely because of a wrong dependency.

Further changes to the make logic will be needed to also allow building
a shim-enabled "normal" xen with such a linker (as it looks, the --notes
option will need passing not just when the linker supports build ID
generation).

Also drop a stray variable setting from the x86 Makefile.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: libxenstat: fix format string overflow
Dario Faggioli [Fri, 16 Feb 2018 18:38:48 +0000 (19:38 +0100)]
tools: libxenstat: fix format string overflow

With gcc 7.3.0, the build fails like this:

src/xenstat_linux.c: In function ‘getBridge’
src/xenstat_linux.c:78:34: warning: ‘%s’ directive writing up to 255 bytes into a region of size 241 [-Wformat-overflow=]
     sprintf(tmp, "/sys/class/net/%s/bridge", de->d_name);
                                  ^~
src/xenstat_linux.c:78:5: note: ‘sprintf’ output between 23 and 278 bytes into a destination of size 256
     sprintf(tmp, "/sys/class/net/%s/bridge", de->d_name);
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Fix by making the buffer bigger.
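
As a sketch of the sizing argument: the format contributes 22
characters plus the terminating NUL, and the name can be up to 255
bytes, hence the 278 in the warning. The snippet below additionally
uses snprintf() to detect truncation, which is an extra precaution of
this sketch and not necessarily what the patch does. bridge_path() and
BRIDGE_PATH_MAX are hypothetical names:

```c
#include <stddef.h>
#include <stdio.h>

/* 23 bytes for the fixed text and NUL, plus up to 255 for the name. */
#define BRIDGE_PATH_MAX (sizeof("/sys/class/net//bridge") + 255)

/* Returns 0 on success, -1 if the path would not fit in buf. */
int bridge_path(char *buf, size_t len, const char *ifname)
{
    int n = snprintf(buf, len, "/sys/class/net/%s/bridge", ifname);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```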

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoshut down domain when last vCPU goes down
Jan Beulich [Mon, 19 Feb 2018 13:00:31 +0000 (14:00 +0100)]
shut down domain when last vCPU goes down

I've just had to deal with an early boot crash of Linux which occurred
so early that even "earlyprintk=xen" did not produce any useful output.
Hence the domain appeared to hang, while in fact it had brought down its
only vCPU. By translating this to a shutdown, the situation will be
better recognizable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86/PV: avoid indirect call/thunk in I/O emulation
Jan Beulich [Mon, 19 Feb 2018 12:59:37 +0000 (13:59 +0100)]
x86/PV: avoid indirect call/thunk in I/O emulation

The stub is within reach from the .text section, so there's no point
using an indirect call here. This has the added benefit of there no
longer being two sufficiently different approaches, breaking one of
which people may not even notice.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agohvm/monitor: fix usage of the control register mask
Roger Pau Monne [Fri, 16 Feb 2018 18:16:23 +0000 (18:16 +0000)]
hvm/monitor: fix usage of the control register mask

Previous usage is not correct and would prevent certain updates from
being notified to the monitor client.

For example if (value ^ old) == (PGE | PSE) and mask == PGE this
update would not be notified.
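
The example can be made concrete with a small sketch. It assumes the
mask lists the bits whose changes the monitor client wants ignored, so
an update must be reported whenever any bit outside the mask changes.
cr_write_notifies() is a hypothetical helper, and the CR4 bit
positions are the architectural ones:

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR4_PSE (1u << 4)
#define X86_CR4_PGE (1u << 7)

/* Report the write if any changed bit is not covered by the mask. */
bool cr_write_notifies(uint64_t value, uint64_t old, uint64_t mask)
{
    return ((value ^ old) & ~mask) != 0;
}
```

With PGE and PSE both changing and only PGE masked, the PSE change
alone is enough to trigger the notification.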

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
7 years agox86/microcode: Propagate microcode update errors
Uwe Dannowski [Fri, 16 Feb 2018 13:19:54 +0000 (13:19 +0000)]
x86/microcode: Propagate microcode update errors

Errors on updating the microcode in the processor were silently
dropped when invoked via the microcode_update hypercall. Also, the log
message was misleading.

Signed-off-by: Uwe Dannowski <uwed@amazon.de>
Reviewed-by: Stefan Nuernberger <snu@amazon.de>
Reviewed-by: Martin Pohlack <mpohlack@amazon.de>
Reviewed-by: Amit Shah <aams@amazon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/srat: fix end calculation in nodes_cover_memory()
Jan Beulich [Thu, 15 Feb 2018 17:17:32 +0000 (18:17 +0100)]
x86/srat: fix end calculation in nodes_cover_memory()

Along the lines of commit 7226486767 ("x86/srat: fix the end pfn check
in valid_numa_range()") nodes_cover_memory() also doesn't consistently
use "end": It's set to an inclusive value initially, but then compared
to the exclusive "end" field of struct node and also possibly set to
nodes[j].start, making it exclusive too. Change the initialization to
make the variable consistently exclusive.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/hvm/dmop: only copy what is needed to/from the guest
Ross Lagerwall [Thu, 15 Feb 2018 17:16:17 +0000 (18:16 +0100)]
x86/hvm/dmop: only copy what is needed to/from the guest

dm_op() fails with -EFAULT if the struct xen_dm_op given by the guest is
smaller than Xen's struct xen_dm_op. This is a problem because DMOP is
meant to be a stable ABI but it breaks whenever the size of struct
xen_dm_op changes.

To fix this, change how the copying to and from the guest is done. When
copying from the guest, first copy the header and inspect the op. Then,
only copy the correct amount needed for that op. When copying to the
guest, don't copy the header. Rather, copy only the correct amount
needed for that particular op.

So now the dm_op() will fail if the guest does not supply enough bytes
for the specific op. It will not fail if the guest supplies too many
bytes for the specific op, but Xen will not copy the extra bytes.
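
A minimal sketch of the copy-in side of this scheme is shown below. All
names (demo_dm_op, demo_copy_in, the two ops) are hypothetical, and a plain
memcpy() from a buffer of known length stands in for the guest copy
primitives.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the real ABI structures. */
struct demo_op_a { uint32_t x; };
struct demo_op_b { uint64_t p, q; };

struct demo_dm_op {
    uint32_t op;
    uint32_t pad;
    union {
        struct demo_op_a a;
        struct demo_op_b b;
    } u;
};

enum { DEMO_OP_A, DEMO_OP_B, DEMO_OP_NR };

/* Number of payload bytes each op actually needs. */
static const size_t demo_op_size[DEMO_OP_NR] = {
    [DEMO_OP_A] = sizeof(struct demo_op_a),
    [DEMO_OP_B] = sizeof(struct demo_op_b),
};

static int demo_copy_in(struct demo_dm_op *dst, const void *guest,
                        size_t guest_len)
{
    const size_t hdr = offsetof(struct demo_dm_op, u);

    /* Step 1: copy only the header and inspect the op. */
    if (guest_len < hdr)
        return -EFAULT;
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, guest, hdr);
    if (dst->op >= DEMO_OP_NR)
        return -EINVAL;

    /* Step 2: copy exactly the payload this op needs, ignoring any extra. */
    if (guest_len - hdr < demo_op_size[dst->op])
        return -EFAULT;
    memcpy(&dst->u, (const char *)guest + hdr, demo_op_size[dst->op]);

    return 0;
}
```

A buffer sized for the small op is accepted for that op and rejected for the
larger one, and a buffer larger than needed is accepted with the extra
bytes left uncopied.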

Remove some now unused macros and helper functions.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agohvm/svm: Enable CR events
Alexandru Isaila [Thu, 15 Feb 2018 10:22:26 +0000 (12:22 +0200)]
hvm/svm: Enable CR events

The CR_INTERCEPT_CR3_WRITE intercept is not set in vmcb->_cr_intercepts,
so the AMD side can't intercept CR events.

This patch implements the CR intercept by adding the flag on a
write_ctrlreg event. The monitor write ctrlreg event is moved from the
Intel side to the common capabilities side.

We just need to enable the SVM intercept and then hvm_mov_to_cr() will
forward the event on to the monitor when appropriate.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agohvm/svm: Enable MSR events
Alexandru Isaila [Thu, 15 Feb 2018 10:22:25 +0000 (12:22 +0200)]
hvm/svm: Enable MSR events

At this moment there is no function to enable MSR interception on SVM.

This patch implements this function and moves the mov-to-MSR monitor
event from the Intel arch side to the common capabilities.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agohvm/svm: Enable Breakpoint events
Alexandru Isaila [Thu, 15 Feb 2018 10:22:24 +0000 (12:22 +0200)]
hvm/svm: Enable Breakpoint events

This commit implements the breakpoint events for svm.
At the moment, the Breakpoint vmexit is not forwarded to the monitor
layer.
This patch adds the hvm_monitor_debug call to the VMEXIT_EXCEPTION_BP.
Also, the Software Breakpoint cap is moved from the Intel arch to the
common part of the code.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agox86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
Andrew Cooper [Mon, 12 Feb 2018 16:06:00 +0000 (16:06 +0000)]
x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings

The current XPTI implementation isolates the directmap (and therefore a lot of
guest data), but a large quantity of CPU0's state (including its stack)
remains visible.

Furthermore, an attacker able to read .text is in a vastly superior position
to normal when it comes to fingerprinting Xen for known vulnerabilities, or
scanning for ROP/Spectre gadgets.

Collect together the entrypoints in .text.entry (currently 3x4k frames, but
can almost certainly be slimmed down), and create a common mapping which is
inserted into each per-cpu shadow.  The stubs are also inserted into this
mapping by pointing at the in-use L2.  This allows stubs allocated later (SMP
boot, or CPU hotplug) to work without further changes to the common mappings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/emul: Fix the decoding of segment overrides in 64bit mode
Andrew Cooper [Thu, 5 Oct 2017 14:30:49 +0000 (14:30 +0000)]
x86/emul: Fix the decoding of segment overrides in 64bit mode

Explicit segment overrides other than %fs and %gs are documented as ignored by
both Intel and AMD.

In practice, this means that:

 * Explicit uses of %ss don't actually yield #SS[0] for non-canonical
   memory references.
 * Explicit uses of %{e,c,d}s don't override %rbp/%rsp-based memory references
   to yield #GP[0] for non-canonical memory references.
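
The rule described above can be sketched as a small decode helper. The
enum and function names are hypothetical and the model is deliberately
simplified to the two cases listed; it is not the emulator's actual code.

```c
#include <stdbool.h>

enum demo_seg {
    DEMO_SEG_NONE, DEMO_SEG_ES, DEMO_SEG_CS, DEMO_SEG_SS,
    DEMO_SEG_DS, DEMO_SEG_FS, DEMO_SEG_GS
};

enum demo_fault { DEMO_FAULT_GP, DEMO_FAULT_SS };

/* In 64bit mode only %fs and %gs overrides take effect. */
static enum demo_seg demo_effective_override(enum demo_seg prefix, bool mode64)
{
    if (mode64 && prefix != DEMO_SEG_FS && prefix != DEMO_SEG_GS)
        return DEMO_SEG_NONE;
    return prefix;
}

/* Fault raised by a non-canonical memory reference in 64bit mode. */
static enum demo_fault demo_noncanonical_fault(enum demo_seg prefix,
                                               bool base_is_rbp_or_rsp)
{
    enum demo_seg seg = demo_effective_override(prefix, true);

    /* With no effective override, the base register picks the segment. */
    if (seg == DEMO_SEG_NONE)
        seg = base_is_rbp_or_rsp ? DEMO_SEG_SS : DEMO_SEG_DS;

    return seg == DEMO_SEG_SS ? DEMO_FAULT_SS : DEMO_FAULT_GP;
}
```

An explicit %ss prefix on a non-stack reference still yields #GP[0], and an
explicit %ds prefix on a %rbp/%rsp-based reference still yields #SS[0].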

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/entry: Use 32bit xors rather than 64bit xors for clearing GPRs
Andrew Cooper [Wed, 14 Feb 2018 13:07:05 +0000 (13:07 +0000)]
x86/entry: Use 32bit xors rather than 64bit xors for clearing GPRs

Intel's Silvermont/Knights Landing architecture treats 64bit xors as full ALU
operations, rather than as zeroing idioms.

No functional change, and no change in code volume (only changing the bit
selection in the REX prefix).
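
To illustrate why the code size is unchanged, the sketch below lists the
encodings of both forms for %r8, picked here only as an example register.
The helper name is hypothetical.

```c
#include <stdint.h>

#define REX_W 0x08u  /* REX.W: selects 64bit operand size */

/* xor %r8, %r8    (64bit form) */
static const uint8_t xor_r8_64[] = { 0x4d, 0x31, 0xc0 };

/* xor %r8d, %r8d  (32bit form; the write zero-extends into %r8) */
static const uint8_t xor_r8_32[] = { 0x45, 0x31, 0xc0 };

/* Clearing REX.W turns the 64bit form's prefix into the 32bit form's. */
static uint8_t rex_clear_w(uint8_t rex)
{
    return rex & (uint8_t)~REX_W;
}
```

Both encodings are three bytes long and differ only in the W bit of the REX
prefix, which the extended registers need in either form.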

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/arm: cpuerrata: Actually check errata on non-boot CPUs
Julien Grall [Wed, 14 Feb 2018 12:22:23 +0000 (12:22 +0000)]
xen/arm: cpuerrata: Actually check errata on non-boot CPUs

The cpu errata framework was introduced in commit 8b01f6364f "xen/arm:
Detect silicon revision and set cap bits accordingly" and was meant to
detect errata present on any CPUs (via check_local_cpu_errata). However,
the function to check the MIDR (is_affected_midr_range) mistakenly
always uses the boot CPU MIDR.

Fix is_affected_midr_range to use the current CPU MIDR.
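
The difference matters on systems whose CPUs are not all the same model.
The sketch below is a simplified model, not the Xen code: the per-CPU array,
the helper names and the erratum range are hypothetical, and only the MIDR
field layout is taken from the architecture.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * MIDR fields: implementer [31:24], variant [23:20],
 * architecture [19:16], part number [15:4], revision [3:0].
 */
#define DEMO_MIDR_MODEL_MASK UINT32_C(0xff0ffff0)

#define DEMO_NR_CPUS 2
static uint32_t demo_cpu_midr[DEMO_NR_CPUS];  /* hypothetical per-CPU MIDRs */

/* Pack variant and revision into one comparable value (0xVR). */
static uint32_t demo_var_rev(uint32_t midr)
{
    return (((midr >> 20) & 0xf) << 4) | (midr & 0xf);
}

static bool demo_midr_in_range(uint32_t midr, uint32_t model,
                               uint32_t min, uint32_t max)
{
    uint32_t vr = demo_var_rev(midr);

    return (midr & DEMO_MIDR_MODEL_MASK) == model && vr >= min && vr <= max;
}

/* Buggy shape: consults CPU 0 whichever CPU is being checked. */
static bool demo_affected_boot_cpu(unsigned int cpu, uint32_t model,
                                   uint32_t min, uint32_t max)
{
    (void)cpu;
    return demo_midr_in_range(demo_cpu_midr[0], model, min, max);
}

/* Fixed shape: consults the MIDR of the CPU being checked. */
static bool demo_affected_this_cpu(unsigned int cpu, uint32_t model,
                                   uint32_t min, uint32_t max)
{
    return demo_midr_in_range(demo_cpu_midr[cpu], model, min, max);
}
```

If CPU 0 is a Cortex-A53 and CPU 1 a Cortex-A57, a check for an A57-only
erratum misses CPU 1 in the first variant and catches it in the second.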

Reported-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Blacklist SMMU on Thunder-X
Julien Grall [Wed, 14 Feb 2018 15:30:45 +0000 (15:30 +0000)]
xen/arm: Blacklist SMMU on Thunder-X

Xen does not yet support the Cavium SMMU because it requires some
workarounds. For the time being, blacklist it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Extend the number of memory banks supported
Julien Grall [Wed, 14 Feb 2018 15:30:44 +0000 (15:30 +0000)]
xen/arm: Extend the number of memory banks supported

When booting using Grub on Thunder-X, the number of memory banks available
is greater than 64. Bump the limit to 128, so we can take advantage of all
the memory.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoasm-x86/monitor: Fix monitor capability reporting on SVM systems
Alexandru Isaila [Mon, 12 Feb 2018 15:08:15 +0000 (17:08 +0200)]
asm-x86/monitor: Fix monitor capability reporting on SVM systems

No monitor features are reported as available on AMD, because all
capabilities are advertised only for the Intel processor architecture.
This means that arch_monitor_get_capabilities() returns
capabilities = 0 on AMD.

This patch separates out features which are implemented on both
systems from those implemented only on Intel, so that we advertise the
working capabilities on AMD.
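
The separation can be sketched as below. The event names and which events
fall into each group are placeholders chosen for illustration, not the
actual split made by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical event numbers. */
enum demo_event {
    DEMO_EV_GUEST_REQUEST,
    DEMO_EV_SINGLESTEP,
    DEMO_EV_CPUID
};

/* Events assumed to work on both vendors. */
#define DEMO_COMMON_CAPS (1u << DEMO_EV_GUEST_REQUEST)

/* Events assumed to be implemented only on the Intel side. */
#define DEMO_VMX_ONLY_CAPS \
    ((1u << DEMO_EV_SINGLESTEP) | (1u << DEMO_EV_CPUID))

static uint32_t demo_get_capabilities(bool is_hvm, bool has_vmx)
{
    uint32_t caps;

    if (!is_hvm)
        return 0;

    /* Advertise the common set first, then add vendor-specific events. */
    caps = DEMO_COMMON_CAPS;
    if (has_vmx)
        caps |= DEMO_VMX_ONLY_CAPS;

    return caps;
}
```

An HVM guest on a non-VMX host now gets the common set instead of 0.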

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
7 years agox86/spec_ctrl: Fix several bugs in SPEC_CTRL_ENTRY_FROM_INTR_IST
Andrew Cooper [Wed, 14 Feb 2018 10:38:34 +0000 (10:38 +0000)]
x86/spec_ctrl: Fix several bugs in SPEC_CTRL_ENTRY_FROM_INTR_IST

DO_OVERWRITE_RSB clobbers %rax, meaning in practice that the bti_ist_info
field gets zeroed.  Older versions of this code had the DO_OVERWRITE_RSB
register selectable, so reintroduce this ability and use it to cause the
INTR_IST path to use %rdx instead.

The use of %dl for the %cs.rpl check means that when an IST interrupt hits
Xen, we try to load 1 into the high 32 bits of MSR_SPEC_CTRL, suffering a #GP
fault instead.
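
The second bug follows from how WRMSR takes its operand, which the sketch
below models. The helper name is hypothetical, and the example assumes %edx
holds exactly 1 when the MSR write is reached.

```c
#include <stdint.h>

/* WRMSR writes EDX:EAX to the MSR selected by ECX. */
static uint64_t demo_wrmsr_operand(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}
```

With %edx equal to 1, the value written has bit 32 set regardless of what
%eax holds, which is what triggers the #GP fault described above.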

Also, drop an unused label which was a copy/paste mistake.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reported-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agofirmware/shim: avoid mkdir error during Xen tree setup
Jan Beulich [Wed, 14 Feb 2018 07:16:00 +0000 (08:16 +0100)]
firmware/shim: avoid mkdir error during Xen tree setup

"mkdir -p" reports a missing operand, as config/ has no subdirs. Oddly
enough this doesn't cause the whole command (and hence the build) to
fail, despite the "set -e" now covering the entire set of commands -
perhaps a quirk of the relatively old bash I've seen this with (a few
simple experiments suggest that commands inside () producing a non-
success status would exit the inner shell, but not the outer one).

Add a dummy . argument to the invocation.

Suggested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agofirmware/shim: correctly handle errors during Xen tree setup
Jan Beulich [Tue, 13 Feb 2018 17:19:33 +0000 (18:19 +0100)]
firmware/shim: correctly handle errors during Xen tree setup

"set -e" on a separate Makefile line is meaningless. Glue together all
the lines that this is supposed to cover.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agobitops: rename LOG_2 to ilog2
Sameer Goel [Tue, 13 Feb 2018 16:56:42 +0000 (17:56 +0100)]
bitops: rename LOG_2 to ilog2

Change the name of the macro from LOG_2 to ilog2. This makes the name
match its Linux counterpart. Since the macro is not used in many places,
the code churn is minimal.

This change helps in porting unchanged code from Linux.
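
For readers unfamiliar with the name, ilog2(n) conventionally yields the
integer (floor) base-2 logarithm of n. The function below is a portable
sketch of that behaviour under a hypothetical name, not the macro from the
tree.

```c
#include <stdint.h>

/* Floor of log2(n) for n > 0; returns 0 for n == 0 in this sketch. */
static unsigned int demo_ilog2(uint64_t n)
{
    unsigned int r = 0;

    while (n >>= 1)
        r++;

    return r;
}
```

For example, demo_ilog2(1) is 0, demo_ilog2(8) is 3 and demo_ilog2(9) is
also 3.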

Signed-off-by: Sameer Goel <sameer.goel@linaro.org>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agocoverage: add documentation for LLVM coverage
Roger Pau Monné [Tue, 13 Feb 2018 16:56:20 +0000 (17:56 +0100)]
coverage: add documentation for LLVM coverage

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxsm: add bodge when compiling with llvm coverage support
Roger Pau Monné [Tue, 13 Feb 2018 16:55:43 +0000 (17:55 +0100)]
xsm: add bodge when compiling with llvm coverage support

llvm coverage support seems to disable some of the optimizations
needed in order to compile xsm, and the end result is that references
to __xsm_action_mismatch_detected are left in the object files.

Since coverage support cannot be used in production, introduce
__xsm_action_mismatch_detected for llvm coverage builds.
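
The pattern at play is a link-time assertion: a function is declared but
never defined, so any call the optimizer cannot eliminate shows up as a
link error. The sketch below uses hypothetical demo_* names and a
hypothetical DEMO_COVERAGE_BUILD switch to show where a fallback definition
fits; it is not the xsm code itself.

```c
#include <stdlib.h>

/* Hypothetical switch standing in for a coverage-enabled build. */
#define DEMO_COVERAGE_BUILD 1

/* Declared everywhere, but normally never defined. */
void demo_mismatch_detected(void);

#ifdef DEMO_COVERAGE_BUILD
/* Fallback so that calls the optimizer failed to remove still link. */
void demo_mismatch_detected(void)
{
    abort();
}
#endif

/* If the compiler cannot prove cond, a reference to the symbol survives. */
#define DEMO_LINK_ASSERT(cond) \
    do { if (!(cond)) demo_mismatch_detected(); } while (0)

static inline int demo_check(int expected, int action)
{
    DEMO_LINK_ASSERT(expected == action);
    return 0;
}

static int demo_hook(void)
{
    /* Both arguments are constants, so an optimizing build drops the call. */
    return demo_check(1, 1);
}
```

Without the fallback, a build that skips the relevant optimizations would
fail to link even though the call can never be reached at run time.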

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
7 years agocoverage: introduce support for llvm profiling
Roger Pau Monné [Tue, 13 Feb 2018 16:54:09 +0000 (17:54 +0100)]
coverage: introduce support for llvm profiling

Introduce the functionality in order to fill the hooks of the
cov_sysctl_ops struct. Note that the functionality is still not wired
into the build system.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: use paging_mark_pfn_dirty()
Jan Beulich [Tue, 13 Feb 2018 16:29:50 +0000 (17:29 +0100)]
x86: use paging_mark_pfn_dirty()

... in preference over paging_mark_dirty(), when the PFN is known
anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
7 years agox86/mm: clean up SHARED_M2P{,_ENTRY} uses
Jan Beulich [Tue, 13 Feb 2018 16:28:36 +0000 (17:28 +0100)]
x86/mm: clean up SHARED_M2P{,_ENTRY} uses

Stop open-coding SHARED_M2P() and drop a pointless use of it from
paging_mfn_is_dirty() (!VALID_M2P() is a superset of SHARED_M2P()) and
another one from free_page_type() (prior assertions render this
redundant).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agotools/libxl: mark special pages as reserved in e820 map for PVH
Juergen Gross [Tue, 21 Nov 2017 11:06:06 +0000 (12:06 +0100)]
tools/libxl: mark special pages as reserved in e820 map for PVH

The "special pages" for PVH guests include the frames for console and
Xenstore ring buffers. Those have to be marked as "Reserved" in the
guest's E820 map, as otherwise conflicts might arise later e.g. when
hotplugging memory into the guest.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>