Roger Pau Monne [Fri, 23 Feb 2018 13:32:23 +0000 (13:32 +0000)]
pci: add support to size ROM BARs to pci_size_mem_bar
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
--- Cc: Jan Beulich <jbeulich@suse.com>
---
Changes since v6:
- Remove the rom local variable.
Changes since v5:
- Use the flags field.
- Introduce a mask local variable.
- Simplify return.
Roger Pau Monne [Fri, 23 Feb 2018 13:32:23 +0000 (13:32 +0000)]
pci: split code to size BARs from pci_add_device
So that it can be called from outside in order to get the size of regular PCI
BARs. This will be required in order to map the BARs from PCI devices into PVH
Dom0 p2m.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
--- Cc: Jan Beulich <jbeulich@suse.com>
---
Changes since v7:
- Do not return error from pci_size_mem_bar in order to keep previous
behavior.
Changes since v6:
- Remove the vf and addr local variables.
- Change the way flags are declared.
- Move the last bool parameter to the flags field.
Changes since v5:
- Introduce a flags field for pci_size_mem_bar.
- Use pci_sbdf_t.
Changes since v4:
- Restore printing whether the BAR is from a vf.
- Make the psize pointer parameter not optional.
- s/u64/uint64_t.
- Remove some unneeded parentheses.
- Assert the return value is never 0.
- Use the newly introduced pci_sbdf_t type.
Changes since v3:
- Rename function to size BARs to pci_size_mem_bar.
- Change the parameters passed to the function. Pass the position and
whether the BAR is the last one, instead of the (base, max_bars,
*index) tuple.
- Make the function return the number of BARs consumed (1 for 32b, 2
for 64b BARs).
- Change the dprintk back to printk.
- Do not log another error message in pci_add_device in case
pci_size_mem_bar fails.
Roger Pau Monne [Fri, 23 Feb 2018 13:32:23 +0000 (13:32 +0000)]
x86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0
So that MMCFG regions not present in the MCFG ACPI table can be added
at run time by the hardware domain.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
--- Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v7:
- Add newline in hvm_physdev_op for non-fallthrough case.
Changes since v6:
- Do not return EEXIST if the same exact region is already tracked by
Xen.
Changes since v5:
- Check for has_vpci before calling register_vpci_mmcfg_handler
instead of checking for is_hvm_domain.
Changes since v4:
- Change the hardware_domain check in hvm_physdev_op to a vpci check.
- Only register the MMCFG area, but don't scan it.
Roger Pau Monne [Fri, 23 Feb 2018 13:32:22 +0000 (13:32 +0000)]
x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas
Introduce a set of handlers for the accesses to the MMCFG areas. Those
areas are setup based on the contents of the hardware MMCFG tables,
and the list of handled MMCFG areas is stored inside of the hvm_domain
struct.
The read/writes are forwarded to the generic vpci handlers once the
address is decoded in order to obtain the device and register the
guest is trying to access.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
--- Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v7:
- Add check for end_bus >= start_bus to register_vpci_mmcfg_handler.
- Protect destroy_vpci_mmcfg with the mmcfg_lock.
Changes since v6:
- Move allocation of mmcfg outside of the locked region.
- Do proper overlap checks when adding mmcfg regions.
- Return _RETRY if the mcfg region cannot be found in the read/write
handlers. This means the mcfg area has been removed between the
accept and the read/write calls.
Changes since v5:
- Switch to use pci_sbdf_t.
- Switch to the new per vpci locks.
- Move the mmcfg related external definitions to asm-x86/pci.h.
Changes since v4:
- Change the attribute of pvh_setup_mmcfg to __hwdom_init.
- Try to add as many MMCFG regions as possible, even if one fails to
add.
- Change some fields of the hvm_mmcfg struct: turn size into a
unsigned int, segment into uint16_t and bus into uint8_t.
- Convert some address parameters from unsigned long to paddr_t for
consistency.
- Make vpci_mmcfg_decode_addr return the decoded register in the
return of the function.
- Introduce a new macro to convert a MMCFG address into a BDF, and
use it in vpci_mmcfg_decode_addr to clarify the logic.
- In vpci_mmcfg_{read/write} unify the logic for 8B accesses and
smaller ones.
- Add the __hwdom_init attribute to register_vpci_mmcfg_handler.
- Test that reg + size doesn't cross a device boundary.
Changes since v3:
- Propagate changes from previous patches: drop xen_ prefix for vpci
functions, pass slot and func instead of devfn and fix the error
paths of the MMCFG handlers.
- s/ecam/mmcfg/.
- Move the destroy code to a separate function, so the hvm_mmcfg
struct can be private to hvm/io.c.
- Constify the return of vpci_mmcfg_find.
- Use d instead of v->domain in vpci_mmcfg_accept.
- Allow 8byte accesses to the mmcfg.
Roger Pau Monne [Fri, 23 Feb 2018 13:32:22 +0000 (13:32 +0000)]
vpci: introduce basic handlers to trap accesses to the PCI config space
This functionality is going to reside in vpci.c (and the corresponding
vpci.h header), and should be arch-agnostic. The handlers introduced
in this patch setup the basic functionality required in order to trap
accesses to the PCI config space, and allow decoding the address and
finding the corresponding handler that should handle the access
(although no handlers are implemented).
Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
setup inside of a x86 HVM file, since that's not shared with other
arches.
A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
whether a domain should use the newly introduced vPCI handlers, this
is only enabled for PVH Dom0 at the moment.
A very simple user-space test is also provided, so that the basic
functionality of the vPCI traps can be asserted. This has been proven
quite helpful during development, since the logic to handle partial
accesses or accesses that expand across multiple registers is not
trivial.
The handlers for the registers are added to a linked list that's keep
sorted at all times. Both the read and write handlers support accesses
that expand across multiple emulated registers and contain gaps not
emulated.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
[IO parts] Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
--- Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tim Deegan <tim@xen.org> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Paul Durrant <paul.durrant@citrix.com>
---
Changes since v9:
- Introduce HAS_VPCI Kconfig option.
- Drop Jan and Wei's RB (keep Paul's since the HAS_VPCI addition
doesn't change IO code).
Changes since v8:
- Rebase on top of XSA-256.
Changes since v7:
- Constify d in vpci_portio_read.
- ASSERT the correctness of the address in the read/write handlers.
- Add newlines between non-fallthrough case statements.
Changes since v6:
- Align the vpci handlers in the linker script.
- Switch add/remove register functions to take a vpci parameter
instead of a pci_dev.
- Expand comment of merge_result.
- Return X86EMUL_UNHANDLEABLE if accessing cfc and cf8 is disabled.
Changes since v5:
- Use a spinlock per pci device.
- Use the recently introduced pci_sbdf_t type.
- Fix test harness to use the right handler type and the newly
introduced lock.
- Move the position of the vpci sections in the linker scripts.
- Constify domain and pci_dev in vpci_{read/write}.
- Fix typos in comments.
- Use _XEN_VPCI_H_ as header guard.
Changes since v4:
* User-space test harness:
- Do not redirect the output of the test.
- Add main.c and emul.h as dependencies of the Makefile target.
- Use the same rule to modify the vpci and list headers.
- Remove underscores from local macro variables.
- Add _check suffix to the test harness multiread function.
- Change the value written by every different size in the multiwrite
test.
- Use { } to initialize the r16 and r20 arrays (instead of { 0 }).
- Perform some of the read checks with the local variable directly.
- Expand some comments.
- Implement a dummy rwlock.
* Hypervisor code:
- Guard the linker script changes with CONFIG_HAS_PCI.
- Rename vpci_access_check to vpci_access_allowed and make it return
bool.
- Make hvm_pci_decode_addr return the register as return value.
- Use ~3 instead of 0xfffc to remove the register offset when
checking accesses to IO ports.
- s/head/prev in vpci_add_register.
- Add parentheses around & in vpci_add_register.
- Fix register removal.
- Change the BUGs in vpci_{read/write}_hw helpers to
ASSERT_UNREACHABLE.
- Make merge_result static and change the computation of the mask to
avoid using a uint64_t.
- Modify vpci_read to only read from hardware the not-emulated gaps.
- Remove the vpci_val union and use a uint32_t instead.
- Change handler read type to return a uint32_t instead of modifying
a variable passed by reference.
- Constify the data opaque parameter of read handlers.
- Change the size parameter of the vpci_{read/write} functions to
unsigned int.
- Place the array of initialization handlers in init.rodata or
.rodata depending on whether late-hwdom is enabled.
- Remove the pci_devs lock, assume the Dom0 is well behaved and won't
remove the device while trying to access it.
- Change the recursive spinlock into a rw lock for performance
reasons.
Changes since v3:
* User-space test harness:
- Fix spaces in container_of macro.
- Implement a dummy locking functions.
- Remove 'current' macro make current a pointer to the statically
allocated vpcu.
- Remove unneeded parentheses in the pci_conf_readX macros.
- Fix the name of the write test macro.
- Remove the dummy EXPORT_SYMBOL macro (this was needed by the RB
code only).
- Import the max macro.
- Test all possible read/write size combinations with all possible
emulated register sizes.
- Introduce a test for register removal.
* Hypervisor code:
- Use a sorted list in order to store the config space handlers.
- Remove some unneeded 'else' branches.
- Make the IO port handlers always return X86EMUL_OKAY, and set the
data to all 1's in case of read failure (write are simply ignored).
- In hvm_select_ioreq_server reuse local variables when calling
XEN_DMOP_PCI_SBDF.
- Store the pointers to the initialization functions in the .rodata
section.
- Do not ignore the return value of xen_vpci_add_handlers in
setup_one_hwdom_device.
- Remove the vpci_init macro.
- Do not hide the pointers inside of the vpci_{read/write}_t
typedefs.
- Rename priv_data to private in vpci_register.
- Simplify checking for register overlap in vpci_register_cmp.
- Check that the offset and the length match before removing a
register in xen_vpci_remove_register.
- Make vpci_read_hw return a value rather than storing it in a
pointer passed by parameter.
- Handler dispatcher functions vpci_{read/write} no longer return an
error code, errors on reads/writes should be treated like hardware
(writes ignored, reads return all 1's or garbage).
- Make sure pcidevs is locked before calling pci_get_pdev_by_domain.
- Use a recursive spinlock for the vpci lock, so that spin_is_locked
checks that the current CPU is holding the lock.
- Make the code less error-chatty by removing some of the printk's.
- Pass the slot and the function as separate parameters to the
handler dispatchers (instead of passing devfn).
- Allow handlers to be registered with either a read or write
function only, the missing handler will be replaced by a dummy
handler (writes ignored, reads return 1's).
- Introduce PCI_CFG_SPACE_* defines from Linux.
- Simplify the handler dispatchers by removing the recursion, now the
dispatchers iterate over the list of sorted handlers and call them
in order.
- Remove the GENMASK_BYTES, SHIFT_RIGHT_BYTES and ADD_RESULT macros,
and instead provide a merge_result function in order to merge a
register output into a partial result.
- Rename the fields of the vpci_val union to u8/u16/u32.
- Remove the return values from the read/write handlers, errors
should be handled internally and signaled as would be done on
native hardware.
- Remove the usage of the GENMASK macro.
Changes since v2:
- Generalize the PCI address decoding and use it for IOREQ code also.
Changes since v1:
- Allow access to cross a word-boundary.
- Add locking.
- Add cleanup to xen_vpci_add_handlers in case of failure.
Roger Pau Monne [Thu, 1 Mar 2018 16:54:24 +0000 (16:54 +0000)]
vvmx: fixes after CR4 trapping optimizations
Commit 406817 doesn't update nested VMX code in order to take into
account L1 CR4 host mask when nested guest (L2) writes to CR4, and
thus the mask written to CR4_GUEST_HOST_MASK is likely not as
restrictive as it should be.
Also the VVMCS GUEST_CR4 value should be updated to match the
underlying value when syncing the VVMCS state.
Fixes: 406817 ("vmx/hap: optimize CR4 trapping") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
--- Cc: Jun Nakajima <jun.nakajima@intel.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Sergey Dyasli <sergey.dyasli@citrix.com>
---
I've manually tested and AFAICT this fixes the osstest failure
detected in 120076 ("test-amd64-amd64-qemuu-nested-intel").
---
Changes since v1:
- Use guest_cr[4] in order to update the nested VMCS GUEST_CR4.
Roger Pau Monne [Wed, 28 Feb 2018 09:48:50 +0000 (09:48 +0000)]
xl: remove apic option for PVH guests
XSA-256 forces the local APIC to always be enabled for PVH guests, so
ignore any apic option for PVH guests. Update the documentation
accordingly.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
--- Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tim Deegan <tim@xen.org>
---
Changes since v1:
- Remove the trailing "if present" from the AP startup.
Jan Beulich [Thu, 1 Mar 2018 14:10:02 +0000 (15:10 +0100)]
firmware/shim: better filtering of dependency files during Xen tree setup
I have no idea what *.d1 is supposed to refer to - we only have .*.d
and .*.d2 files (note also the leading dot). Also switch to passing
-name instead of -path to find - that's a requirement for .*.d et al to
work, but would probably have been better from the beginning.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 1 Mar 2018 14:09:38 +0000 (15:09 +0100)]
libxc: really tolerate empty PV records
Commit 119ee4d773 ("tools/libxc: Tolerate specific zero-content records
in migration v2 streams") meant tolerate those, but failed to set rc
accordingly.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 21 Feb 2018 18:10:00 +0000 (18:10 +0000)]
x86/hvm: Constify the read side of vlapic handling
This is in preparation to make hvm_x2apic_msr_read() take a const vcpu
pointer. One modification is to alter vlapic_get_tmcct() to not use current.
This in turn needs an alteration to hvm_get_guest_time_fixed(), which is safe
because the only mutable action it makes is to take the domain plt lock.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 26 Feb 2018 14:23:03 +0000 (14:23 +0000)]
x86/vmx: Simplfy the default cases in vmx_msr_{read,write}_intercept()
The default case of vmx_msr_write_intercept() in particular is very tangled.
First of all, fold long_mode_do_msr_{read,write}() into their callers. These
functions were split out in the past because of the 32bit build of Xen, but it
is unclear why the cases weren't simply #ifdef'd in place.
Next, invert the vmx_write_guest_msr()/is_last_branch_msr() logic to break if
the condition is satisfied, rather than nesting if it wasn't. This allows the
wrmsr_hypervisor_regs() call to be un-nested with respect to the other default
logic.
No practical difference from a guests point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Razvan Cojocaru [Wed, 28 Feb 2018 10:38:15 +0000 (12:38 +0200)]
x86/hvm: fix domain crash when CR3 has the noflush bit set
In hardware, when PCID support is enabled and the NOFLUSH bit is set
when writing a CR3 value, the hardware will clear that that bit and
change the CR3 without flushing the TLB. hvm_set_cr3(), however, was
ignoring this bit; the result was that post-vm_event checks detected
an invalid CR3 value and crashed the domain.
Handle NOFLUSH in hvm_set_cr3() by:
1. Clearing the bit
2. Passing a "noflush" flag to lower-level cr3 setting functions to
indicate that a flush should not be performed.
Also clear X86_CR3_NOFLUSH when reporting CR3 monitored CR3 writes.
This allows introspection to be used on VMs whose operating system uses
the NOFLUSH bit.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reported-by: Bitweasil <bitweasil@cryptohaze.com> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Ian Jackson [Wed, 31 Jan 2018 13:02:32 +0000 (13:02 +0000)]
release-checklist.txt: Say to increment SUPPORT.md version number
CC: Andrew Cooper <andrew.cooper3@citrix.com> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Ian Jackson [Wed, 31 Jan 2018 13:02:01 +0000 (13:02 +0000)]
SUPPORT.md: increment version number
CC: Andrew Cooper <andrew.cooper3@citrix.com> Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Andrew Cooper [Thu, 24 Aug 2017 14:31:08 +0000 (15:31 +0100)]
common/gnttab: Introduce command line feature controls
This patch was originally released as part of XSA-226. It retains the same
command line syntax (as various downstreams are mitigating XSA-226 using this
mechanism) but the defaults have been updated due to the revised XSA-226
patched, after which transitive grants are believed to functioning
properly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 27 Feb 2018 14:12:23 +0000 (15:12 +0100)]
x86/HVM: don't give the wrong impression of WRMSR succeeding
... for non-existent MSRs: wrmsr_hypervisor_regs()'s comment clearly
says that the function returns 0 for unrecognized MSRs, so
{svm,vmx}_msr_write_intercept() should not convert this into success. We
don't want to unconditionally fail the access though, as we can't be
certain the list of handled MSRs is complete enough for the guest types
we care about, so instead mirror what we do on the read paths and probe
the MSR to decide whether to raise #GP.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Roger Pau Monné [Tue, 27 Feb 2018 13:10:33 +0000 (14:10 +0100)]
vmx/hap: optimize CR4 trapping
There a bunch of bits in CR4 that should be allowed to be set directly
by the guest without requiring Xen intervention, currently this is
already done by passing through guest writes into the CR4 used when
running in non-root mode, but taking an expensive vmexit in order to
do so.
xenalyze reports the following when running a PV guest in shim mode:
Note that this optimized trapping is currently only applied to guests
running with HAP on Intel hardware. If using shadow paging more CR4
bits need to be unconditionally trapped, which makes this approach
unlikely to yield any important performance improvements.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 27 Feb 2018 13:10:00 +0000 (14:10 +0100)]
x86/PV: fix off-by-one in I/O bitmap limit check
With everyone having their tags below agreeing that putting things the
other way around in the comparison makes things easier to understand, do
that rearrangement while changing the line anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.apu@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Alexandru Isaila [Tue, 27 Feb 2018 13:09:21 +0000 (14:09 +0100)]
hvm/svm: implement CPUID events
At this moment the CPUID events for the AMD architecture are not
forwarded to the monitor layer.
This patch adds the CPUID event to the common capabilities and then
forwards the event to the monitor layer.
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Andrew Cooper [Tue, 27 Feb 2018 13:08:36 +0000 (14:08 +0100)]
x86/hvm: Disallow the creation of HVM domains without Local APIC emulation
There are multiple problems, not necesserily limited to:
* Guests which configure event channels via hvmop_set_evtchn_upcall_vector(),
or which hit %cr8 emulation will cause Xen to fall over a NULL vlapic->regs
pointer.
* On Intel hardware, disabling the TPR_SHADOW execution control without
reenabling CR8_{LOAD,STORE} interception means that the guests %cr8
accesses interact with the real TPR. Amongst other things, setting the
real TPR to 0xf blocks even IPIs from interrupting this CPU.
* On hardware which sets up the use of Interrupt Posting, including
IOMMU-Posting, guests run without the appropriate non-root configuration,
which at a minimum will result in dropped interrupts.
Whether no-LAPIC mode is of any use at all remains to be seen.
This is XSA-256.
Reported-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 27 Feb 2018 13:07:12 +0000 (14:07 +0100)]
gnttab: don't blindly free status pages upon version change
There may still be active mappings, which would trigger the respective
BUG_ON(). Split the loop into one dealing with the page attributes and
the second (when the first fully passed) freeing the pages. Return an
error if any pages still have pending references.
This is part of XSA-255.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 27 Feb 2018 13:04:44 +0000 (14:04 +0100)]
gnttab/ARM: don't corrupt shared GFN array
... by writing status GFNs to it. Introduce a second array instead.
Also implement gnttab_status_gmfn() properly now that the information is
suitably being tracked.
While touching it anyway, remove a misguided (but luckily benign) upper
bound check from gnttab_shared_gmfn(): We should never access beyond the
bounds of that array.
This is part of XSA-255.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 27 Feb 2018 13:03:27 +0000 (14:03 +0100)]
memory: don't implicitly unpin for decrease-reservation
It very likely was a mistake (copy-and-paste from domain cleanup code)
to implicitly unpin here: The caller should really unpin itself before
(or after, if they so wish) requesting the page to be removed.
This is XSA-252.
Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 19 Feb 2018 14:54:57 +0000 (14:54 +0000)]
x86/time: Rework pv_soft_rdtsc() to aid further cleanup
Having pv_soft_rdtsc() emulate all parts of an rdtscp is awkward, and gets in
the way of some intended cleanup.
* Drop the rdtscp parameter and always make the caller responsible for ecx
updates when appropriate.
* Switch the function from being void, and return the main timestamp in the
return value.
The regs parameter is still needed, but only for the stats collection, once
again bringing into question their utility. The parameter can however switch
to being const.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 19 Feb 2018 10:40:20 +0000 (10:40 +0000)]
x86/pv: Avoid leaking other guests' MSR_TSC_AUX values into PV context
If the CPU pipeline supports RDTSCP or RDPID, a guest can observe the value in
MSR_TSC_AUX, irrespective of whether the relevant CPUID features are
advertised/hidden.
At the moment, paravirt_ctxt_switch_to() only writes to MSR_TSC_AUX if
TSC_MODE_PVRDTSCP mode is enabled, but this is not the default mode.
Therefore, default PV guests can read the value from a previously scheduled
HVM vcpu, or TSC_MODE_PVRDTSCP-enabled PV guest.
Alter the PV path to always write to MSR_TSC_AUX, using 0 in the common case.
To amortise overhead cost, introduce wrmsr_tsc_aux() which performs a lazy
update of the MSR, and use this function consistently across the codebase.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Julien Grall [Fri, 23 Feb 2018 18:57:29 +0000 (18:57 +0000)]
xen/arm: vpsci: Rework the logic to start AArch32 vCPU in Thumb mode
32-bit domain is able to select the instruction (ARM vs Thumb) to use
when boot a new vCPU via CPU_ON. This is indicated via bit[0] of the
entry point address (see "T32 support" in PSCI v1.1 DEN0022D). bit[0]
must be cleared when setting the PC.
At the moment, Xen is setting the CPSR.T but never clear bit[0]. Clear
it to match the specification.
At the same time, slighlty rework the code to make clear thumb is only for
32-bit domain. Lastly, take the opportunity to switch is_thumb from int
to bool.
Julien Grall [Fri, 23 Feb 2018 18:57:24 +0000 (18:57 +0000)]
xen/arm: vpsci: Remove parameter 'ver' from do_common_cpu
Currently, the behavior of do_common_cpu will slightly change depending
on the PSCI version passed in parameter. Looking at the code, more the
specific 0.2 behavior could move out of the function or adapted for 0.1:
- x0/r0 can be updated on PSCI 0.1 because general purpose registers
are undefined upon CPU on. This was deduced from the spec not
mentioning the state of general purpose registers on CPU on.
- PSCI 0.1 does not defined PSCI_ALREADY_ON. However, it would be
safer to bail out if the CPU is already on.
Based on this, the parameter 'ver' is removed and do_psci_cpu_on
(implementation for PSCI 0.1) is adapted to avoid returning
PSCI_ALREADY_ON.
Julien Grall [Fri, 23 Feb 2018 18:57:23 +0000 (18:57 +0000)]
xen/arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
Now that we've standardised on SMCCC v1.1 to perform the branch
prediction invalidation, let's drop the previous band-aid. If vendors
haven't updated their firmware to do SMCCC 1.1, they haven't updated
PSCI either, so we don't loose anything.
This is aligned with the Linux commit 3a0a397ff5ff.
One of the major improvement of SMCCC v1.1 is that it only clobbers the
first 4 registers, both on 32 and 64bit. This means that it becomes very
easy to provide an inline version of the SMC call primitive, and avoid
performing a function call to stash the registers that woudl otherwise
be clobbered by SMCCC v1.0.
This patch has been adapted to Xen from Linux commit f2d3b2e8759a. The
changes mades are:
- Using Xen coding style
- Remove HVC as not used by Xen
- Add arm_smccc_res structure
Julien Grall [Fri, 23 Feb 2018 18:57:20 +0000 (18:57 +0000)]
xen/arm: psci: Detect SMCCC version
PSCI 1.0 and later allows the SMCCC version to be (indirectly) probed
via PSCI_FEATURES. If the PSCI_FEATURES does not exist (PSCI 0.2 or
earlier) and the function returns an error, then we assume SMCCC 1.0
is implemented.
Add macros SMCCC_VERSION, SMCCC_VERSION_{MINOR, MAJOR} to easily convert
between a 32-bit value and a version number. The encoding is based on
2.2.2 in "Firmware interfaces for mitigation CVE-2017-5715" (ARM DEN 0070A).
Also re-use them to define ARM_SMCCC_VERSION_1_0 and ARM_SMCCC_VERSION_1_1.
Julien Grall [Fri, 23 Feb 2018 18:57:17 +0000 (18:57 +0000)]
xen/arm64: Implement a fast path for handling SMCCC_ARCH_WORKAROUND_1
The function SMCCC_ARCH_WORKAROUND_1 will be called by the guest for
hardening the branch predictor. So we want the handling to be as fast as
possible.
As the mitigation is applied on every guest exit, we can check for the
call before saving all the context and return very early.
For now, only provide a fast path for HVC64 call. Because the code rely
on 2 registers, x0 and x1 are saved in advance.
Julien Grall [Fri, 23 Feb 2018 18:57:15 +0000 (18:57 +0000)]
xen/arm: vsmc: Implement SMCCC_ARCH_WORKAROUND_1 BP hardening support
SMCCC 1.1 offers firmware-based CPU workarounds. In particular,
SMCCC_ARCH_WORKAROUND_1 provides BP hardening for variant 2 of XSA-254
(CVE-2017-5715).
If the hypervisor has some mitigation for this issue, report that we
deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the hypervisor
workaround on every guest exit.
Julien Grall [Fri, 23 Feb 2018 18:57:14 +0000 (18:57 +0000)]
xen/arm: vsmc: Implement SMCCC 1.1
The new SMC Calling Convention (v1.1) allows for a reduced overhead when
calling into the firmware, and provides a new feature discovery
mechanism. See "Firmware interfaces for mitigating CVE-2017-5715"
ARM DEN 00070A.
Julien Grall [Fri, 23 Feb 2018 18:57:13 +0000 (18:57 +0000)]
xen/arm: vpsci: Add support for PSCI 1.1
At the moment, Xen provides virtual PSCI interface compliant with 0.1
and 0.2. Since them, the specification has been updated and the latest
version is 1.1 (see ARM DEN 0022D).
>From an implementation point of view, only PSCI_FEATURES is mandatory.
The rest is optional and can be left unimplemented for now.
At the same time, the compatible for PSCI node have been updated to
expose "arm,psci-1.0".
Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: mirela.simonovic@aggios.com
Julien Grall [Fri, 23 Feb 2018 18:57:12 +0000 (18:57 +0000)]
xen/arm: psci: Rework the PSCI definitions
Some PSCI functions are only available in the 32-bit version. After
recent changes, Xen always needs to know whether the call was made using
32-bit id or 64-bit id. So we don't emulate reserved one.
With the current naming scheme, it is not easy to know which call
supports 32-bit and 64-bit id. So rework the definitions to encode the
version in the name. From now the functions will be named PSCI_0_2_FNxx
where xx is 32 or 64.
Andrew Cooper [Tue, 20 Feb 2018 11:08:32 +0000 (11:08 +0000)]
x86/hvm: Don't shadow the domain parameter in hvm_save_cpu_msrs()
c/s d2f86bf604 which introduced "struct hvm_save_descriptor *d" accidentally
ended up shadowing the "struct domain *d" function parameter. Rename the
former to desc.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Fri, 23 Feb 2018 14:11:00 +0000 (14:11 +0000)]
x86/clang: allow integrated assembler usage
If the required features are present.
Modify as-option-add to add an option in case the test fails, and use
it to detect whether the required clang integrated assembler features
are present.
This patch has been tested with clang 3.5, clang 6, gcc 6.4.0 without
retpoline support and gcc 7.3.1 with retpoline support.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Alan Robinson [Fri, 23 Feb 2018 13:24:56 +0000 (14:24 +0100)]
get_maintainers.pl: Avoid THE_REST when files are added or removed
When files are added or removed /dev/null is used as a place
holder name in the patch for the absent file. Don't try and
find a MAINTAINER for this place holder, it only ever flags
and then spams THE REST, behaviour for a real filename is
unchanged.
Signed-off-by: Alan Robinson <Alan.Robinson@ts.fujitsu.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Wed, 21 Feb 2018 17:58:04 +0000 (17:58 +0000)]
build: Help attempts to syntax highlight Config.mk
Some attempts to syntax highlight Config.mk end up thinking that most of
Config.mk is a string, due to the unbalanced squote. Provide a balancing
squote in a comment to compensate.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Doug Goldstein [Fri, 23 Feb 2018 10:05:35 +0000 (11:05 +0100)]
xen: append EXTRA_CFLAGS_XEN_CORE to CFLAGS
Allow a user to supply extra CFLAGS via the EXTRA_CFLAGS_XEN_CORE
environment variable for hypervisor builds. This is not a
configuration that is supported but is only aimed to help support
testing and troubleshooting when you need to make changes.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Fri, 23 Feb 2018 10:05:19 +0000 (11:05 +0100)]
build: remove shim related targets
There's no need to have shim specific targets, so just use the regular
xen makefile targets in order to build the shim binary.
When the shim is build as part of the firmware directory install the
stripped Xen binary to the firmware directory and place a binary with
symbols in the debug directory.
The objcopy step of the shim build is also removed in this patch:
since the shim is booted in PVH mode there's no need for the resulting
binary to be in elf32 format. Xen can load PVH kernels with either a
32 or 64bit elf header.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Brian Woods [Fri, 23 Feb 2018 10:03:36 +0000 (11:03 +0100)]
x86/svm: add support for pause filtering threshold
Add support for enabling the pause filtering threshold feature. This
causes the pause filtering count to reset if there's pause filtering
threshold cycles or greater between pauses. See AMD APM Vol 2 Section
15.14.4 for more details.
The values of the pause filtering count and threshold were found by
iterating over different values of the count and threshold while running
kernbench and a pi spigot algorithm with yields placed in it. A
balanced setting for both variable provides:
(Using averaged elapsed time with kernbench)
old = 852.0
new = 848.8
improvement = .4%
For system without pause filtering threshold, the change, from 3000 to
4000 for the count, should not negatively effect system performance.
Signed-off-by: Brian Woods <brian.woods@amd.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Roger Pau Monné [Fri, 23 Feb 2018 10:00:31 +0000 (11:00 +0100)]
x86: fix indirect thunk usage of CONFIG_INDIRECT_THUNK
When indirect_thunk_asm.h is instantiated directly into assembly files
CONFIG_INDIRECT_THUNK might not be defined, and thus using .if against
it is wrong.
Add a check to define CONFIG_INDIRECT_THUNK to 0 if not defined, so
that using .if CONFIG_INDIRECT_THUNK is always correct.
This suppresses the following clang error:
<instantiation>:8:9: error: expected absolute expression
.if CONFIG_INDIRECT_THUNK == 1
^
<instantiation>:1:1: note: while in macro instantiation
INDIRECT_BRANCH call %rdx
^
entry.S:589:9: note: while in macro instantiation
INDIRECT_CALL %rdx
^
Note that this is a preparatory patch in order to enable clang's
integrated assembler, the integrated assembler is not yet enabled for
assembly files.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Haozhong Zhang [Fri, 23 Feb 2018 09:59:31 +0000 (10:59 +0100)]
VT-d: use two 32-bit writes to update DMAR fault address registers
The 64-bit DMAR fault address is composed of two 32 bits registers
DMAR_FEADDR_REG and DMAR_FEUADDR_REG. According to VT-d spec:
"Software is expected to access 32-bit registers as aligned doublewords",
a hypervisor should use two 32-bit writes to DMAR_FEADDR_REG and
DMAR_FEUADDR_REG separately in order to update a 64-bit fault address,
rather than a 64-bit write to DMAR_FEADDR_REG. Note that when x2APIC
is not enabled DMAR_FEUADDR_REG is reserved and it's not necessary to
update it.
Though I haven't seen any errors caused by such one 64-bit write on
real machines, it's still better to follow the specification.
Fixes: ae05fd3912b ("VT-d: use qword MMIO access for MSI address writes") Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Brian Woods [Tue, 20 Feb 2018 22:27:02 +0000 (16:27 -0600)]
x86/svm: add EFER SVME support for VGIF/VLOAD
Only enable virtual VMLOAD/SAVE and VGIF if the guest EFER.SVME is set.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Brian Woods <brian.woods@amd.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Julien Grall [Wed, 21 Feb 2018 14:02:44 +0000 (14:02 +0000)]
xen/tmem: Convert the file common/tmem_xen.c to use typesafe MFN
The file common/tmem_xen.c is now converted to use typesafe. This is
requiring to override the macro page_to_mfn to make it work with mfn_t.
Note that all variables converted to mfn_t havem there initial value,
when set, switch from 0 to INVALID_MFN. This is fine because the initial
values was always overriden before used.
Also add a couple of missing newlines suggested by Andrew in the code.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Roger Pau Monne [Tue, 20 Feb 2018 14:10:12 +0000 (14:10 +0000)]
build: filter out command line assembler arguments
If the assembler is not used. This happens when using cc -E or cc -S
for example. GCC will just ignore the -Wa,... when the assembler is
not called, but clang will complain loudly and fail.
Also enable passing -Wa,-I$(BASEDIR)/include to clang now that it's
safe to do so.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Tue, 20 Feb 2018 14:10:11 +0000 (14:10 +0000)]
build: do not hardcode AFLAGS for as-insn tests
Hardcoding as-insn to use AFLAGS is not correct. For once the test is
performed using a C file with inline assembly, and secondly the flags
used can be passed by the caller together with the CC.
Fix as-insn-check to pass the flags given as parameter to the test.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Fix usage comments as they are changing] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Fri, 16 Feb 2018 14:59:56 +0000 (14:59 +0000)]
xen/arm: vgic: Make sure the number of SPIs is a multiple of 32
The vGIC relies on having a pending_irq available for every IRQs
described in the ranks. As each rank describes 32 interrupts, we need to
make sure the number of SPIs is a multiple of 32.
Sergey Dyasli [Mon, 19 Feb 2018 11:29:26 +0000 (11:29 +0000)]
x86/msr: add Raw and Host domain policies
Raw policy contains the actual values from H/W MSRs. Add PLATFORM_INFO
msr to the policy during probe_cpuid_faulting().
Host policy may have certain features disabled if Xen decides not
to use them. For now, make Host policy equal to Raw policy with
cpuid_faulting availability dependent on X86_FEATURE_CPUID_FAULTING.
Finally, derive HVM/PV max domain policies from the Host policy.
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Igor Druzhinin [Tue, 20 Feb 2018 09:16:56 +0000 (10:16 +0100)]
x86/nmi: start NMI watchdog on CPU0 after SMP bootstrap
We're noticing a reproducible system boot hang on certain
Skylake platforms where the BIOS is configured in legacy
boot mode with x2APIC disabled. The system stalls immediately
after writing the first SMP initialization sequence into APIC ICR.
The cause of the problem is watchdog NMI handler execution -
somewhere near the end of NMI handling (after it's already
rescheduled the next NMI) it tries to access IO port 0x61
to get the actual NMI reason on CPU0. Unfortunately, this
port is emulated by BIOS using SMIs and this emulation for
some reason takes more time than we expect during INIT-SIPI-SIPI
sequence. As the result, the system is constantly moving between
NMI and SMI handler and not making any progress.
To avoid this, initialize the watchdog after SMP bootstrap on
CPU0 and, additionally, protect the NMI handler by moving
IO port access before NMI re-scheduling. The latter should also
help in case of post boot CPU onlining. Although we're running
watchdog at much lower frequency at this point, it's neveretheless
possible we may trigger the issue anyway.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 20 Feb 2018 09:10:59 +0000 (10:10 +0100)]
shim: allow building of just the shim with build-ID-incapable linker
The ELF note the shim build inserts causes mkelf32 to choke on the
second program header. However, the output of mkelf32 isn't really
needed when building inside tools/firmware/ - an attempt to build it is
made solely because of a wrong dependency.
Further changes to the make logic will be needed to also allow building
a shim-enabled "normal" xen with such a linker (as it looks the --notes
option will need passing not just when the linker support build ID
generation).
Also drop a stray variable setting from the x86 Makefile.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Fri, 16 Feb 2018 18:38:48 +0000 (19:38 +0100)]
tools: libxenstat: fix format string overflow
With gcc 7.3.0, the build fails like this:
src/xenstat_linux.c: In function ‘getBridge’
src/xenstat_linux.c:78:34: warning: ‘%s’ directive writing up to 255 bytes into a region of size 241 [-Wformat-overflow=]
sprintf(tmp, "/sys/class/net/%s/bridge", de->d_name);
^~
src/xenstat_linux.c:78:5: note: ‘sprintf’ output between 23 and 278 bytes into a destination of size 256
sprintf(tmp, "/sys/class/net/%s/bridge", de->d_name);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix by making the buffer bigger.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Mon, 19 Feb 2018 13:00:31 +0000 (14:00 +0100)]
shut down domain when last vCPU goes down
I've just had to deal with an early boot crash of Linux which occurred
so early that even "earlyprintk=xen" did not produce any useful output.
Hence the domain appeared to hang, while in fact it had brought down its
only vCPU. By translating this to a shutdown, the situation will be
better recognizable.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 19 Feb 2018 12:59:37 +0000 (13:59 +0100)]
x86/PV: avoid indirect call/thunk in I/O emulation
The stub is within reach from the .text section, so there's no point
using an indirect call here. This has the added benefit of there no
longer being two sufficiently different approaches, breaking one of
which people may not even notice.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citix.com>
Uwe Dannowski [Fri, 16 Feb 2018 13:19:54 +0000 (13:19 +0000)]
x86/microcode: Propagate microcode update errors
Errors on updating the microcode in the processor were silently
dropped when invoked via the microcode_update hypercall. Also, the log
message was misleading.
Signed-off-by: Uwe Dannowski <uwed@amazon.de> Reviewed-by: Stefan Nuernberger <snu@amazon.de> Reviewed-by: Martin Pohlack <mpohlack@amazon.de> Reviewed-by: Amit Shah <aams@amazon.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 15 Feb 2018 17:17:32 +0000 (18:17 +0100)]
x86/srat: fix end calculation in nodes_cover_memory()
Along the lines of commit 7226486767 ("x86/srat: fix the end pfn check
in valid_numa_range()") nodes_cover_memory() also doesn't consistently
use "end": It's set to an inclusive value initially, but then compared
to the exclusive "end" field of struct node and also possibly set to
nodes[j].start, making it exclusive too. Change the initialization to
make the variable consistently exclusive.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ross Lagerwall [Thu, 15 Feb 2018 17:16:17 +0000 (18:16 +0100)]
x86/hvm/dmop: only copy what is needed to/from the guest
dm_op() fails with -EFAULT if the struct xen_dm_op given by the guest is
smaller than Xen's struct xen_dm_op. This is a problem because DMOP is
meant to be a stable ABI but it breaks whenever the size of struct
xen_dm_op changes.
To fix this, change how the copying to and from the guest is done. When
copying from the guest, first copy the header and inspect the op. Then,
only copy the correct amount needed for that op. When copying to the
guest, don't copy the header. Rather, copy only the correct amount
needed for that particular op.
So now the dm_op() will fail if the guest does not supply enough bytes
for the specific op. It will not fail if the guest supplies too many
bytes for the specific op, but Xen will not copy the extra bytes.
Remove some now unused macros and helper functions.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Alexandru Isaila [Thu, 15 Feb 2018 10:22:26 +0000 (12:22 +0200)]
hvm/svm: Enable CR events
The CR_INTERCEPT_CR3_WRITE intercept is out of the vmcb->_cr_intercepts
so the AMD arch can't intercept CR events.
This patch implements the CR intercept by adding the flag on a
write_ctrlreg event. The monitor write ctrlreg event is moved from the
Intel side to the common capabilities side.
We just need to enable the SVM intercept and then hvm_mov_to_cr() will
forward the event on to the monitor when appropriate.
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Alexandru Isaila [Thu, 15 Feb 2018 10:22:25 +0000 (12:22 +0200)]
hvm/svm: Enable MSR events
At this moment there is no function to enable msr interception on svm.
This patch implements this function and moves the mov to msr monitor
event
form the Intel arch side to the common capabilities.
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Alexandru Isaila [Thu, 15 Feb 2018 10:22:24 +0000 (12:22 +0200)]
hvm/svm: Enable Breakpoint events
This commit implements the breakpoint events for svm.
At the moment, the Breakpoint vmexit is not forwarded to the monitor
layer.
This patch adds the hvm_monitor_debug call to the VMEXIT_EXCEPTION_BP.
Also, the Software Breakpoint cap is moved from the Intel arch to the
common part of the code.
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Mon, 12 Feb 2018 16:06:00 +0000 (16:06 +0000)]
x86/xpti: Hide almost all of .text and all .data/.rodata/.bss mappings
The current XPTI implementation isolates the directmap (and therefore a lot of
guest data), but a large quantity of CPU0's state (including its stack)
remains visible.
Furthermore, an attacker able to read .text is in a vastly superior position
to normal when it comes to fingerprinting Xen for known vulnerabilities, or
scanning for ROP/Spectre gadgets.
Collect together the entrypoints in .text.entry (currently 3x4k frames, but
can almost certainly be slimmed down), and create a common mapping which is
inserted into each per-cpu shadow. The stubs are also inserted into this
mapping by pointing at the in-use L2. This allows stubs allocated later (SMP
boot, or CPU hotplug) to work without further changes to the common mappings.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Wed, 14 Feb 2018 12:22:23 +0000 (12:22 +0000)]
xen/arm: cpuerrata: Actually check errata on non-boot CPUs
The cpu errata framework was introduced in commit 8b01f6364f "xen/arm:
Detect silicon revision and set cap bits accordingly" and was meant to
detect errata present on any CPUs (via check_local_cpu_errata). However,
the function to check the MIDR (is_affected_midr_range) mistakenly
always use the boot CPU MIDR.
Fix is_affected_midr_range to use the current CPU MIDR.
Julien Grall [Wed, 14 Feb 2018 15:30:44 +0000 (15:30 +0000)]
xen/arm: Extend the number of memory banks supported
When booting using Grub on Thunder-X, the number of memory available is
greater than 64. Bump the number to 128, so we can take advantage of all
the memory.
Alexandru Isaila [Mon, 12 Feb 2018 15:08:15 +0000 (17:08 +0200)]
asm-x86/monitor: Fix monitor capability reporting on SVM systems
No monitor features are available on AMD and all
capabilities are passed only to the Intel processor architecture.
This means that the arch_monitor_get_capabilities returns
capabilities = 0.
This patch is separating out features which are implemented on both
systems from those implemented only on Intel, so that we advertize the
working capabilities on AMD.
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Andrew Cooper [Wed, 14 Feb 2018 10:38:34 +0000 (10:38 +0000)]
x86/spec_ctrl: Fix several bugs in SPEC_CTRL_ENTRY_FROM_INTR_IST
DO_OVERWRITE_RSB clobbers %rax, meaning in practice that the bti_ist_info
field gets zeroed. Older versions of this code had the DO_OVERWRITE_RSB
register selectable, so reintroduce this ability and use it to cause the
INTR_IST path to use %rdx instead.
The use of %dl for the %cs.rpl check means that when an IST interrupt hits
Xen, we try to load 1 into the high 32 bits of MSR_SPEC_CTRL, suffering a #GP
fault instead.
Also, drop an unused label which was a copy/paste mistake.
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Wed, 14 Feb 2018 07:16:00 +0000 (08:16 +0100)]
firmware/shim: avoid mkdir error during Xen tree setup
"mkdir -p" reports a missing operand, as config/ has no subdirs. Oddly
enough this doesn't cause the whole command (and hence the build to
fail), despite the "set -e" now covering the entire set of commands -
perhaps a quirk of the relatively old bash I've seen this with (a few
simple experiments suggest that commands inside () producing a non-
success status would exit the inner shell, but not the outer one).
Add a dummy . argument to the invocation.
Suggested-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Sameer Goel [Tue, 13 Feb 2018 16:56:42 +0000 (17:56 +0100)]
bitops: rename LOG_2 to ilog2
Changing the name of the macro from LOG_2 to ilog2.This makes the function name
similar to its Linux counterpart. Since, this is not used in multiple places,
the code churn is minimal.
This change helps in porting unchanged code from Linux.
Signed-off-by: Sameer Goel <sameer.goel@linaro.org> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Tue, 13 Feb 2018 16:55:43 +0000 (17:55 +0100)]
xsm: add bodge when compiling with llvm coverage support
llvm coverage support seems to disable some of the optimizations
needed in order to compile xsm, and the end result is that references
to __xsm_action_mismatch_detected are left in the object files.
Since coverage support cannot be used in production, introduce
__xsm_action_mismatch_detected for llvm coverage builds.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Roger Pau Monné [Tue, 13 Feb 2018 16:54:09 +0000 (17:54 +0100)]
coverage: introduce support for llvm profiling
Introduce the functionality in order to fill the hooks of the
cov_sysctl_ops struct. Note that the functionality is still not wired
into the build system.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 13 Feb 2018 16:29:50 +0000 (17:29 +0100)]
x86: use paging_mark_pfn_dirty()
... in preference over paging_mark_dirty(), when the PFN is known
anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 13 Feb 2018 16:28:36 +0000 (17:28 +0100)]
x86/mm: clean up SHARED_M2P{,_ENTRY} uses
Stop open-coding SHARED_M2P() and drop a pointless use of it from
paging_mfn_is_dirty() (!VALID_M2P() is a superset of SHARED_M2P()) and
another one from free_page_type() (prior assertions render this
redundant).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Juergen Gross [Tue, 21 Nov 2017 11:06:06 +0000 (12:06 +0100)]
tools/libxl: mark special pages as reserved in e820 map for PVH
The "special pages" for PVH guests include the frames for console and
Xenstore ring buffers. Those have to be marked as "Reserved" in the
guest's E820 map, as otherwise conflicts might arise later e.g. when
hotplugging memory into the guest.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>