xenbits.xensource.com Git - xen.git/log
7 years agoretire bitkeeper bits
Doug Goldstein [Sun, 25 Mar 2018 02:32:47 +0000 (21:32 -0500)]
retire bitkeeper bits

While the project could migrate from git to $nextscm, it's unlikely that
these bits will ever be useful again.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agoConfig.mk: update seabios to release 1.11.1
Wei Liu [Tue, 27 Mar 2018 15:26:27 +0000 (16:26 +0100)]
Config.mk: update seabios to release 1.11.1

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/vpci: validate data first in register_vpci_mmcfg_handler
Wei Liu [Tue, 27 Mar 2018 11:04:20 +0000 (12:04 +0100)]
x86/vpci: validate data first in register_vpci_mmcfg_handler

Avoid the need to deallocate memory when the data is invalid. This has
the benefit of not fragmenting memory in Xen.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agovpci: make sure handlers can deal with size == 0
Roger Pau Monné [Tue, 27 Mar 2018 08:28:24 +0000 (10:28 +0200)]
vpci: make sure handlers can deal with size == 0

The code is not prepared to handle such a case, so just return early. In
the debug case add an assert.

Coverity ID: 1430809

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
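
The early return plus debug-only assert described above can be sketched roughly as follows. This is a minimal illustration, not the vPCI code: `demo_cfg_read`, `DEMO_DEBUG` and `DEMO_ASSERT_UNREACHABLE` are hypothetical names, and returning all ones for a rejected access is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a debug-only "should not happen" check. */
#ifdef DEMO_DEBUG
#define DEMO_ASSERT_UNREACHABLE() assert(!"unreachable")
#else
#define DEMO_ASSERT_UNREACHABLE() ((void)0)
#endif

/* Read `size` bytes (little-endian) starting at `reg`; bad sizes are rejected. */
uint32_t demo_cfg_read(const uint8_t *cfg, unsigned int reg, unsigned int size)
{
    uint32_t data = 0;
    unsigned int i;

    if ( size == 0 || size > 4 )
    {
        /* Debug builds trap here; other builds just return early. */
        DEMO_ASSERT_UNREACHABLE();
        return UINT32_MAX;   /* assumption: all ones signals "no data" */
    }

    for ( i = 0; i < size; i++ )
        data |= (uint32_t)cfg[reg + i] << (8 * i);

    return data;
}
```

Built without `DEMO_DEBUG`, a zero-sized read simply yields all ones instead of running a loop that was never written with that case in mind.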
7 years agovpci/msi: fix size of the vectors fields
Roger Pau Monné [Tue, 27 Mar 2018 08:27:46 +0000 (10:27 +0200)]
vpci/msi: fix size of the vectors fields

The current size (5 bits) is not enough to store the maximum number of
vectors (32); bump it by one bit.

Also change the layout so that 'vectors' is aligned to an 8-bit
boundary.

Note that the size of the struct is still the same.

Coverity ID: 1430810

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
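
Why 5 bits cannot hold the value 32 can be checked with a small sketch. The struct layouts and function names here are illustrative assumptions, not the actual vPCI MSI structure.

```c
#include <assert.h>

/* A 5-bit unsigned field stores 0..31; storing 32 wraps around to 0. */
unsigned int demo_store_5bit(unsigned int n)
{
    struct { unsigned int vectors : 5; } s;

    s.vectors = n;          /* value is reduced modulo 2^5 */
    return s.vectors;
}

/* A 6-bit unsigned field stores 0..63, so 32 survives intact. */
unsigned int demo_store_6bit(unsigned int n)
{
    struct { unsigned int vectors : 6; } s;

    s.vectors = n;          /* value is reduced modulo 2^6 */
    return s.vectors;
}
```

With the narrower field, a device configured for the maximum of 32 vectors would read back as having none.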
7 years agox86/svm: implement debug events
Alexandru Isaila [Tue, 27 Mar 2018 08:26:42 +0000 (10:26 +0200)]
x86/svm: implement debug events

At this moment the Debug events for the AMD architecture are not
forwarded to the monitor layer.

This patch adds the Debug event to the common capabilities, adds
the VMEXIT_ICEBP then forwards the event to the monitor layer.

Chapter 2: SVM Processor and Platform Extensions: "Note: A vector 1
exception generated by the single byte INT1
instruction (also known as ICEBP) does not trigger the #DB
intercept. Software should use the dedicated ICEBP
intercept to intercept ICEBP"

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
7 years agoxsm/schedop: introduce vcpuinfo permissions verification
Andrii Anisov [Tue, 27 Mar 2018 08:26:17 +0000 (10:26 +0200)]
xsm/schedop: introduce vcpuinfo permissions verification

Introduce per-vcpu scheduler operations permission verification.
As long as Xvcpuinfo are in fact scheduler configuration manipulations
there is no need to introduce specific access vectors.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Daniel De Graaf <dgegra@tycho.nsa.gov>
7 years agoARM: GIC: Allow reading pending state of a hardware IRQ
Andre Przywara [Thu, 4 Jan 2018 12:38:58 +0000 (12:38 +0000)]
ARM: GIC: Allow reading pending state of a hardware IRQ

To synchronize level triggered interrupts which are mapped into a guest,
we need to update the virtual line level at certain points in time.
For a hardware mapped interrupt the GIC is the only place where we can
easily access this information.
Implement a gic_hw_operations member to return the pending state of a
particular interrupt. Due to hardware limitations this only works for
private interrupts of the current CPU, so there is no CPU field in the
prototype.
This adds gicv2/3_peek_irq() helper functions, to read a bit in a bitmap
spread over several MMIO registers.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoARM: GIC: Allow tweaking the active and pending state of an IRQ
Andre Przywara [Fri, 26 Jan 2018 16:09:44 +0000 (16:09 +0000)]
ARM: GIC: Allow tweaking the active and pending state of an IRQ

When playing around with hardware mapped, level triggered virtual IRQs,
there is the need to explicitly set the active or pending state of an
interrupt at some point.
To prepare the GIC for that, we introduce a set_active_state() and a
set_pending_state() function to let the VGIC manipulate the state of
an associated hardware IRQ.
This takes care of properly setting the _IRQ_INPROGRESS bit.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
7 years agoARM: GIC: add GIC_INVALID to enum gic_version
Andre Przywara [Wed, 21 Mar 2018 13:38:21 +0000 (13:38 +0000)]
ARM: GIC: add GIC_INVALID to enum gic_version

The enum gic_version at the moment just contains GIC_V2 and GIC_V3,
where GIC_V2 happens to map to 0. So without having initialised a
variable of that type, we will read back GIC_V2 (when allocated with zeroing
the memory).
To prevent ambiguities and to give an explicitly uninitialised state, add
a new first member: GIC_INVALID. Also make it obvious that this has a
"0" encoding.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Julien Grall <julien.grall@arm.com>
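
The ambiguity described above, and how an explicit zero-valued member removes it, can be sketched like this. The `demo_` names are hypothetical stand-ins that mirror the shape of the enum in the commit message.

```c
#include <assert.h>
#include <string.h>

enum demo_gic_version {
    DEMO_GIC_INVALID = 0,   /* explicit "not initialised" state */
    DEMO_GIC_V2,
    DEMO_GIC_V3,
};

struct demo_gic_info {
    enum demo_gic_version version;
    unsigned int nr_lines;
};

/* Returns 1 once the version field holds a real GIC version. */
int demo_gic_is_initialised(const struct demo_gic_info *info)
{
    return info->version != DEMO_GIC_INVALID;
}
```

A zero-filled `struct demo_gic_info` now reports "not initialised" rather than silently claiming to be a GICv2.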
7 years agovpci/msix: fix incorrect usage of bitmask
Roger Pau Monné [Mon, 26 Mar 2018 13:17:12 +0000 (15:17 +0200)]
vpci/msix: fix incorrect usage of bitmask

The bitmask to clear the low bits of the address field should be
~0xffffffffull, the current mask clears both the low and the high bits
of the address field, which is a bug.

Reported-by: Coverity
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
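
The difference between the two masks comes down to C integer types, which the following sketch illustrates. The function names are hypothetical, and the example assumes a platform where `unsigned int` is 32 bits wide.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: ~0xffffffff is a 32-bit unsigned 0, so the whole field is cleared. */
uint64_t demo_set_low_buggy(uint64_t addr, uint32_t low)
{
    addr &= ~0xffffffff;
    return addr | low;
}

/* Fixed: the ull suffix makes the mask 0xffffffff00000000. */
uint64_t demo_set_low_fixed(uint64_t addr, uint32_t low)
{
    addr &= ~0xffffffffull;
    return addr | low;
}
```

Updating the low half of `0x1122334455667788` with `0xdeadbeef` keeps the high half only in the fixed variant.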
7 years agovpci/bars: fix error message
Roger Pau Monné [Mon, 26 Mar 2018 13:16:14 +0000 (15:16 +0200)]
vpci/bars: fix error message

Error message is incorrectly using map when it should be using
map->map instead.

Coverity ID: 1430811

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/libxc: fix usage of XEN_X86_EMU_ALL after VPCI addition
Roger Pau Monne [Fri, 23 Mar 2018 10:57:56 +0000 (10:57 +0000)]
x86/libxc: fix usage of XEN_X86_EMU_ALL after VPCI addition

HVM guest should be created with (XEN_X86_EMU_ALL &
~XEN_X86_EMU_VPCI). This is not an issue for xl/libxl because it
already sets the correct emulation flags and doesn't pass a NULL
xc_domain_configuration_t to xc_domain_create.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/xenstore: fix linking libxenstore with ldl
Juergen Gross [Fri, 23 Mar 2018 07:42:53 +0000 (08:42 +0100)]
tools/xenstore: fix linking libxenstore with ldl

Commit 448c03b3cbe1487 ("tools/xenstore: try to get minimum thread
stack size for watch thread") added a dependency to libdl to
libxenstore. Unfortunately the way it was added requires now all
users of libxenstore to specify "-ldl" when linking. This can be
avoided by linking libxenstore.so specifying "-ldl" as a trailing
option. So use APPEND_LDFLAGS instead of LDFLAGS for adding the
"-ldl" option when linking libxenstore.so.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Tested-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agovpci: do not expose unneeded functions to the user-space test harness
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci: do not expose unneeded functions to the user-space test harness

Some functions in vpci.c (vpci_remove_device and vpci_add_handlers)
are not used by the user-space test harness, so guard them with
__XEN__ in order to avoid exposing them to the user-space test
harness.

Requested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agovpci/msix: add MSI-X handlers
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci/msix: add MSI-X handlers

Add handlers for accesses to the MSI-X message control field on the
PCI configuration space, and traps for accesses to the memory region
that contains the MSI-X table and PBA. These traps detect attempts from
the guest to configure MSI-X interrupts and properly set them up.

Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
BIR are not trapped by Xen at the moment.

Finally, turn the panic in the Dom0 PVH builder into a warning.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[IO]
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agovpci: add a priority parameter to the vPCI register initializer
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci: add a priority parameter to the vPCI register initializer

This is needed for MSI-X, since MSI-X will need to be initialized
before parsing the BARs, so that the header BAR handlers are aware of
the MSI-X related holes and make sure they are not mapped in order for
the trap handlers to work properly.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[ARM]
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agovpci/msi: add MSI handlers
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci/msi: add MSI handlers

Add handlers for the MSI control, address, data and mask fields in
order to detect accesses to them and set up the interrupts as requested
by the guest.

Note that the pending register is not trapped, and the guest can
freely read/write to it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[IO]
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agox86/pt: mask MSI vectors on unbind
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
x86/pt: mask MSI vectors on unbind

When an MSI device with per-vector masking capabilities is detected or
added to Xen all the vectors are masked when initializing it. This
implies that the first time the interrupt is bound to a domain it's
masked.

This however only applies to the first time the interrupt is bound
because neither the unbind nor the pirq unmap will mask the vector
again. In order to fix this re-mask the interrupt when unbinding it
from a guest. This makes sure that pairs of bind/unbind will always
get the same masking state.

Note that no issues have been reported regarding this behavior because
QEMU always uses the newly introduced XEN_PT_GFLAGSSHIFT_UNMASKED when
binding interrupts, so it's always unmasked.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agovpci: add header handlers
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci: add header handlers

Introduce a set of handlers that trap accesses to the PCI BARs and the
command register, in order to snoop BAR sizing and BAR relocation.

The command handler is used to detect changes to bit 2 (response to
memory space accesses), and maps/unmaps the BARs of the device into
the guest p2m. A rangeset is used in order to figure out which memory
to map/unmap. This makes it easier to keep track of the possible
overlaps with other BARs, and will also simplify MSI-X support, where
certain regions of a BAR might be used for the MSI-X table or PBA.

The BAR register handlers are used to detect attempts by the guest to
size or relocate the BARs.

Note that the long running BAR mapping and unmapping operations are
deferred to be performed by hvm_io_pending, so that they can be safely
preempted.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[IO]
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agoxen: introduce rangeset_consume_ranges
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
xen: introduce rangeset_consume_ranges

This function allows iterating over a rangeset while removing the
processed regions.

This will be used in order to split processing of large memory areas
when mapping them into the guest p2m.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
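
The "iterate while removing the processed regions" pattern can be sketched with a small array-backed set. Everything here (`demo_rangeset`, `demo_consume_ranges`, `demo_map_some`, the callback contract) is a hypothetical simplification for illustration and should not be read as the actual rangeset API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_range { unsigned long s, e; };             /* inclusive bounds */
struct demo_rangeset { struct demo_range r[8]; size_t nr; };

typedef int (*demo_cb)(unsigned long s, unsigned long e, void *ctxt,
                       unsigned long *consumed);

/* Feed ranges to cb, dropping whatever cb reports as consumed. */
int demo_consume_ranges(struct demo_rangeset *rs, demo_cb cb, void *ctxt)
{
    int rc = 0;

    while ( rs->nr > 0 )
    {
        struct demo_range *first = &rs->r[0];
        unsigned long consumed = 0;

        rc = cb(first->s, first->e, ctxt, &consumed);

        if ( consumed > first->e - first->s )
        {
            /* Fully processed: drop the range. */
            memmove(first, first + 1, (rs->nr - 1) * sizeof(*first));
            rs->nr--;
        }
        else
            first->s += consumed;   /* keep the unprocessed tail */

        if ( rc )
            break;                  /* callback asked to stop */
    }

    return rc;
}

/* Example callback: handles at most *budget units, then asks to stop. */
int demo_map_some(unsigned long s, unsigned long e, void *ctxt,
                  unsigned long *consumed)
{
    unsigned long *budget = ctxt;
    unsigned long size = e - s + 1;

    *consumed = size < *budget ? size : *budget;
    *budget -= *consumed;
    return *budget == 0;            /* non-zero: out of budget, resume later */
}
```

Because consumed parts are removed as they are processed, a caller that runs out of budget can call the function again later and continue from where it stopped, which is the property a preemptible mapping operation needs.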
7 years agopci: add support to size ROM BARs to pci_size_mem_bar
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
pci: add support to size ROM BARs to pci_size_mem_bar

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agopci: split code to size BARs from pci_add_device
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
pci: split code to size BARs from pci_add_device

So that it can be called from outside in order to get the size of regular PCI
BARs. This will be required in order to map the BARs from PCI devices into PVH
Dom0 p2m.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
x86/physdev: enable PHYSDEVOP_pci_mmcfg_reserved for PVH Dom0

So that MMCFG regions not present in the MCFG ACPI table can be added
at run time by the hardware domain.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agox86/mmcfg: add handlers for the PVH Dom0 MMCFG areas
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas

Introduce a set of handlers for the accesses to the MMCFG areas. Those
areas are set up based on the contents of the hardware MMCFG tables,
and the list of handled MMCFG areas is stored inside of the hvm_domain
struct.

The read/writes are forwarded to the generic vpci handlers once the
address is decoded in order to obtain the device and register the
guest is trying to access.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
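
Decoding an MMCFG access into a device and register follows the standard PCIe ECAM layout (bus in offset bits 27:20, device in 19:15, function in 14:12, register in 11:0). The sketch below shows that decode; the struct and function names are hypothetical and the region description is simplified.

```c
#include <assert.h>
#include <stdint.h>

struct demo_mmcfg { uint64_t base; uint8_t start_bus; };   /* hypothetical */
struct demo_sbdf_reg { uint8_t bus, dev, fn; uint16_t reg; };

/* Split an address inside the MMCFG region into bus/device/function/register. */
struct demo_sbdf_reg demo_mmcfg_decode(const struct demo_mmcfg *m, uint64_t addr)
{
    uint64_t off = addr - m->base;
    struct demo_sbdf_reg r;

    r.bus = (uint8_t)(m->start_bus + ((off >> 20) & 0xff));
    r.dev = (off >> 15) & 0x1f;
    r.fn  = (off >> 12) & 0x7;
    r.reg = off & 0xfff;

    return r;
}
```

Once the access has been reduced to a device and a register offset, it can be handed to the same generic handlers used for the port-based access path.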
7 years agovpci: introduce basic handlers to trap accesses to the PCI config space
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
vpci: introduce basic handlers to trap accesses to the PCI config space

This functionality is going to reside in vpci.c (and the corresponding
vpci.h header), and should be arch-agnostic. The handlers introduced
in this patch set up the basic functionality required in order to trap
accesses to the PCI config space, and allow decoding the address and
finding the corresponding handler that should handle the access
(although no handlers are implemented).

Note that the traps to the PCI IO ports registers (0xcf8/0xcfc) are
set up inside of an x86 HVM file, since that's not shared with other
arches.

A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
whether a domain should use the newly introduced vPCI handlers, this
is only enabled for PVH Dom0 at the moment.

A very simple user-space test is also provided, so that the basic
functionality of the vPCI traps can be asserted. This has been proven
quite helpful during development, since the logic to handle partial
accesses or accesses that expand across multiple registers is not
trivial.

The handlers for the registers are added to a linked list that's kept
sorted at all times. Both the read and write handlers support accesses
that expand across multiple emulated registers and contain gaps not
emulated.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[IO parts]
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
[ARM]
Acked-by: Julien Grall <julien.grall@arm.com>
[Tools]
Acked-by: Wei Liu <wei.liu2@citrix.com>
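
Keeping register handlers in a list sorted by offset might look like the sketch below. The names (`demo_reg`, `demo_add_register`) and the choice to reject overlapping registers are assumptions made for illustration, not a description of the code in vpci.c.

```c
#include <assert.h>
#include <stddef.h>

struct demo_reg {
    unsigned int offset, size;      /* register covers [offset, offset + size) */
    struct demo_reg *next;
};

/* Insert r keeping the list sorted by offset; returns -1 on overlap. */
int demo_add_register(struct demo_reg **head, struct demo_reg *r)
{
    struct demo_reg **pp = head;

    /* Skip every register that ends at or before the new one starts. */
    while ( *pp && (*pp)->offset + (*pp)->size <= r->offset )
        pp = &(*pp)->next;

    /* The next register, if any, must start at or after the new one ends. */
    if ( *pp && (*pp)->offset < r->offset + r->size )
        return -1;

    r->next = *pp;
    *pp = r;
    return 0;
}
```

With the list always sorted, an access spanning several registers can be served in a single forward walk, and any bytes between two neighbouring entries are known to be gaps that are not emulated.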
7 years agox86emul: fix #XM delivery typo
Jan Beulich [Thu, 22 Mar 2018 17:02:19 +0000 (18:02 +0100)]
x86emul: fix #XM delivery typo

This clearly wasn't meant the way it was originally written.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/shadow: fold sh_x86_emulate_{write,cmpxchg}() into their only callers
Jan Beulich [Thu, 22 Mar 2018 09:43:54 +0000 (10:43 +0100)]
x86/shadow: fold sh_x86_emulate_{write,cmpxchg}() into their only callers

The functions have a single caller only and are now guest paging type
independent (except for the tracing part), so have no need to exist as
standalone ones, let alone multiple times. Replace the two prior hooks
with just a single one for dealing with tracing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years agox86/shadow: fully move unmap-dest into common code
Jan Beulich [Thu, 22 Mar 2018 09:43:21 +0000 (10:43 +0100)]
x86/shadow: fully move unmap-dest into common code

By adding guest PTE size to shadow emulation context, the work begun by
commit 2c80710a78 ("x86/shadow: compile most write emulation code just
once") can be completed, paving the road for further movement into
common code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years agox86/HVM: use x86emul_write_xcr()
Jan Beulich [Thu, 22 Mar 2018 09:42:31 +0000 (10:42 +0100)]
x86/HVM: use x86emul_write_xcr()

...  instead of directly calling handle_xsetbv(), to make use of the
additional checking there.

Also don't call hvm_monitor_crX(XCR0, ...) for indexes other than zero
anymore.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/HVM: make use of new read-modify-write emulator hook
Jan Beulich [Thu, 22 Mar 2018 09:41:37 +0000 (10:41 +0100)]
x86/HVM: make use of new read-modify-write emulator hook

..., at least as far as currently possible, i.e. when a mapping can be
obtained.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/HVM: do actual CMPXCHG in hvmemul_cmpxchg()
Jan Beulich [Thu, 22 Mar 2018 09:41:02 +0000 (10:41 +0100)]
x86/HVM: do actual CMPXCHG in hvmemul_cmpxchg()

..., at least as far as currently possible, i.e. when a mapping can be
obtained.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
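
What "doing an actual CMPXCHG" on a mapped location means can be illustrated with C11 atomics. This is only a sketch: `demo_cmpxchg32` is a hypothetical helper, and the hypervisor uses its own primitives rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: compare-and-exchange on a mapped 32-bit location.
 * The comparison and the store happen as one atomic step. On failure,
 * *expected is updated with the value actually found in memory.
 */
bool demo_cmpxchg32(_Atomic uint32_t *mem, uint32_t *expected, uint32_t new_val)
{
    return atomic_compare_exchange_strong(mem, expected, new_val);
}
```

Compared with emulating the instruction as a separate read, compare and write, no other CPU can modify the location between the comparison and the store.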
7 years agox86emul: also handle shifts through ->rmw()
Jan Beulich [Thu, 22 Mar 2018 09:40:24 +0000 (10:40 +0100)]
x86emul: also handle shifts through ->rmw()

These don't allow LOCK, but still are read-modify-write operations, so
are better handled that way too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: add read-modify-write hook
Jan Beulich [Thu, 22 Mar 2018 09:39:32 +0000 (10:39 +0100)]
x86emul: add read-modify-write hook

In order to correctly emulate read-modify-write insns, especially
LOCKed ones, we should not issue reads and writes separately. Use a
new hook to combine both, and don't uniformly read the memory
destination anymore. Instead, DstMem opcodes without Mov now need to
have done so in their respective case blocks.

Also strip bogus _ prefixes from macro parameters when this only affects
lines which are being changed anyway.

In the test harness, besides some re-ordering to facilitate running a
few tests twice (one without and a second time with the .rmw hook in
place), tighten a few EFLAGS checks and add a test for NOT with memory
operand (in particular to verify EFLAGS don't get altered there).

For now make use of the hook optional for callers; eventually we may
want to consider making this mandatory.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: correctly handle CMPXCHG* comparison failures
Jan Beulich [Thu, 22 Mar 2018 09:38:39 +0000 (10:38 +0100)]
x86emul: correctly handle CMPXCHG* comparison failures

If the ->cmpxchg() hook finds a mismatch, we should deal with this the
same way as when the "manual" comparison reports a mismatch.

This involves reverting bfce0e62c3 ("x86/emul: Drop
X86EMUL_CMPXCHG_FAILED"), albeit with X86EMUL_CMPXCHG_FAILED now
becoming a value distinct from X86EMUL_RETRY.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years agox86emul: tell cmpxchg hook whether LOCK is in effect
Jan Beulich [Thu, 22 Mar 2018 09:38:02 +0000 (10:38 +0100)]
x86emul: tell cmpxchg hook whether LOCK is in effect

This is necessary for the hook to correctly perform the operation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
7 years agox86/HVM: eliminate custom #MF/#XM handling
Jan Beulich [Thu, 22 Mar 2018 09:37:26 +0000 (10:37 +0100)]
x86/HVM: eliminate custom #MF/#XM handling

Use the generic stub exception handling instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: adjust_bnd() should check XCR0
Jan Beulich [Thu, 22 Mar 2018 09:36:55 +0000 (10:36 +0100)]
x86emul: adjust_bnd() should check XCR0

Experimentally MPX instructions have been confirmed to behave as NOPs
unless both related XCR0 bits are set to 1. By implication branches
then also don't clear BNDn.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: abstract out XCRn accesses
Jan Beulich [Thu, 22 Mar 2018 09:35:50 +0000 (10:35 +0100)]
x86emul: abstract out XCRn accesses

Use hooks, just like done for other special purpose registers.

This includes moving XCR0 checks from hvmemul_get_fpu() to the emulator
itself as well as adding support for XGETBV emulation.

For now fuzzer reads will obtain the real values (minus the fuzzing of
the hook pointer itself).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com> [tracing parts]
7 years agoci: add new bits to MAINTAINERS combine with Travis
Doug Goldstein [Thu, 15 Mar 2018 15:54:04 +0000 (10:54 -0500)]
ci: add new bits to MAINTAINERS combine with Travis

Created a new section just called 'CI' since this is adding GitLab CI
and still leaving the old Travis CI files around. This consolidates the
two sections and adds the new files as well as adding another Travis
file that was missing.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoci: use GitLab CI to build
Doug Goldstein [Sun, 11 Mar 2018 06:08:50 +0000 (00:08 -0600)]
ci: use GitLab CI to build

Added a GitLab CI config which has a lot more flexibility to allow us to
test a lot more distro configurations than Travis can and even build
test on FreeBSD. This includes a modified copy of scripts/travis-build
that is expected to diverge further over time as we build more than what
Travis is currently building.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoci: add Dockerfile for Debian stretch
Doug Goldstein [Wed, 14 Mar 2018 16:23:31 +0000 (11:23 -0500)]
ci: add Dockerfile for Debian stretch

Added a Dockerfile which captures all the necessary dependencies to
build Xen on a Debian stretch system.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoci: add Dockerfile for Debian jessie
Doug Goldstein [Tue, 13 Mar 2018 02:32:27 +0000 (21:32 -0500)]
ci: add Dockerfile for Debian jessie

Added a Dockerfile which captures all the necessary dependencies to
build Xen on a Debian jessie system.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoci: add Dockerfile for Ubuntu 16.04
Doug Goldstein [Mon, 12 Mar 2018 17:45:00 +0000 (12:45 -0500)]
ci: add Dockerfile for Ubuntu 16.04

Added a Dockerfile which captures all the necessary dependencies to
build Xen on a Ubuntu 16.04 system.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoci: add Dockerfile for Ubuntu 14.04
Doug Goldstein [Mon, 12 Mar 2018 17:41:33 +0000 (12:41 -0500)]
ci: add Dockerfile for Ubuntu 14.04

Added a Dockerfile which captures all the necessary dependencies to
build Xen on a Ubuntu 14.04 system.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoci: add Dockerfile for CentOS 7.2
Doug Goldstein [Mon, 12 Mar 2018 17:40:45 +0000 (12:40 -0500)]
ci: add Dockerfile for CentOS 7.2

Added a Dockerfile which captures all the necessary dependencies to
build Xen on a CentOS 7.2 system.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoci: add README and makefile for containers
Doug Goldstein [Tue, 13 Mar 2018 03:15:07 +0000 (22:15 -0500)]
ci: add README and makefile for containers

Add a basic README explaining the containers and how people can use them
to locally test with if they see an error in CI and want to reproduce it
locally. Added a makefile to help with building and pushing the
containers to the container registry.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxenbaked.c: Avoid divide by zero issue
Joe Jin [Wed, 14 Mar 2018 17:14:03 +0000 (10:14 -0700)]
xenbaked.c: Avoid divide by zero issue

xenbaked.c -> dump_stats(), run_time = time(&end_time) - time(&start_time),
time() returns the value in seconds. If one cancels xenmon.py immediately
after it is started, run_time can be zero, and then xenbaked will hit a
divide-by-zero fault.

Signed-off-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
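
A guard against the zero run time described above could look like the following sketch. `demo_rate` is a hypothetical helper and not the code in xenbaked.c; returning 0 for a zero-length run is an assumption made for the example.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: events per second, guarding a zero run time. */
unsigned long demo_rate(unsigned long events, time_t start, time_t end)
{
    time_t run_time = end - start;

    /* time() has one-second resolution, so a very short run yields 0. */
    if ( run_time <= 0 )
        return 0;

    return events / (unsigned long)run_time;
}
```

Without the check, the integer division would fault whenever start and end fall within the same second.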
7 years agox86/hvm: add stricter permissions checks to ioreq server control plane
Paul Durrant [Tue, 20 Mar 2018 18:05:25 +0000 (18:05 +0000)]
x86/hvm: add stricter permissions checks to ioreq server control plane

There has always been an intention in the ioreq server API that only the
domain that creates an ioreq server should be able to manipulate it.
However, so far, nothing has enforced this. This means that two domains
with DM_PRIV over a target domain can currently manipulate each other's
ioreq servers.

A previous patch added code to take a reference and store a pointer to the
domain that creates an ioreq server. This patch now adds checks to the
functions that manipulate the ioreq server to make sure they are being
called by the same domain.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/hvm: re-structure some of the ioreq server look-up loops
Paul Durrant [Tue, 20 Mar 2018 18:05:24 +0000 (18:05 +0000)]
x86/hvm: re-structure some of the ioreq server look-up loops

This patch is a cosmetic re-structuring of some of the loops which look up
an ioreq server based on target domain and server id.

The restructuring is done separately here to ease review of a subsequent
patch.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/hvm: take a reference on ioreq server emulating domain
Paul Durrant [Tue, 20 Mar 2018 18:05:23 +0000 (18:05 +0000)]
x86/hvm: take a reference on ioreq server emulating domain

When an ioreq server is created the code currently stores the id
of the emulating domain, but does not take a reference on that domain.

This patch modifies the code to hold a reference for the lifetime of the
ioreq server.

NOTE: ioreq servers are either destroyed explicitly or destroyed implicitly
      in context of XEN_DOMCTL_destroydomain.
      If the emulating domain is shut down prior to the target then
      any domain reference held by an ioreq server will prevent it from
      being destroyed. However, if an emulating domain is shut down prior
      to its target then it is likely that the target's vcpus will block
      fairly quickly waiting for emulation that will never occur, and when
      the target domain is destroyed the reference on the zombie emulating
      domain will be dropped allowing both to be cleaned up.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/hvm: stop passing explicit domid to hvm_create_ioreq_server()
Paul Durrant [Tue, 20 Mar 2018 18:05:22 +0000 (18:05 +0000)]
x86/hvm: stop passing explicit domid to hvm_create_ioreq_server()

Only in the legacy 'default server' case do we pass anything other than
current->domain->domain_id, and in that case we pass the value of
HVM_PARAM_DM_DOMAIN.

The only known user of HVM_PARAM_DM_DOMAIN is qemu-trad (and only when
compiled as a stubdom), which always sets it to DOMID_SELF (ignoring the
return value of xc_set_hvm_param) [1] and never reads it.

This patch:

- Disallows setting HVM_PARAM_DM_DOMAIN to anything other than DOMID_SELF
  and removes the call to hvm_set_dm_domain().
- Stops passing a domid to hvm_create_ioreq_server()
- Changes hvm_create_ioreq_server() to always set
  current->domain->domain_id as the domid of the emulating domain
- Removes the hvm_set_dm_domain() implementation since it is no longer
  needed.

[1] http://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=blob;f=hw/xen_machine_fv.c;#l299

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen: sched: simplify (and speedup) checking soft-affinity
Dario Faggioli [Wed, 21 Mar 2018 17:17:47 +0000 (17:17 +0000)]
xen: sched: simplify (and speedup) checking soft-affinity

The fact of whether or not a vCPU has a soft-affinity
which is effective, i.e., with the power of actually
affecting the scheduling of the vCPU itself rarely
changes. Very, very rarely, as compared to how often
we need to check for the same thing (basically, at
every scheduling decision!).

That can be improved by storing in a (per-vCPU) flag
(it's actually a boolean field in struct vcpu) whether
or not, considering what hard-affinity and soft-affinity
look like, soft-affinity should be taken into
account during scheduling decisions.

This saves some cpumask manipulations, which is nice,
considering how frequently they were being done. Note
that we can't get rid of 100% of the cpumask operations
involved in the check, because whether soft-affinity is
effective depends not only on the relationship between
the hard and soft-affinity masks of a vCPU, but also on
the online pCPUs and/or on which pCPUs are part of the
cpupool where the vCPU lives, and that's rather
impractical to store in a per-vCPU flag. Still the
overhead is reduced to "just" one cpumask_subset() (and
only if the newly introduced flag is 'true')!
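
The split between a rarely executed update and a cheap hot-path check might be sketched as follows. This is a toy model, not the Xen code: cpumasks are plain 64-bit words, and the names soft_aff_effective, vcpu_update_affinity() and the pool_online argument are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy cpumask: bit n set means pCPU n is in the mask (at most 64 pCPUs). */
typedef uint64_t cpumask_t;

bool mask_subset(cpumask_t a, cpumask_t b)     { return (a & ~b) == 0; }
bool mask_intersects(cpumask_t a, cpumask_t b) { return (a & b) != 0; }

struct vcpu {
    cpumask_t hard_affinity;
    cpumask_t soft_affinity;
    bool soft_aff_effective;   /* hypothetical name for the cached flag */
};

/* Slow path: runs only when the affinity of the vCPU is changed. */
void vcpu_update_affinity(struct vcpu *v, cpumask_t hard, cpumask_t soft)
{
    v->hard_affinity = hard;
    v->soft_affinity = soft;
    /*
     * Soft-affinity can only matter if it excludes part of the
     * hard-affinity while still overlapping with it.
     */
    v->soft_aff_effective = !mask_subset(hard, soft) &&
                            mask_intersects(soft, hard);
}

/* Hot path: at most one subset test, and only if the flag is set. */
bool has_soft_affinity(const struct vcpu *v, cpumask_t pool_online)
{
    return v->soft_aff_effective &&
           !mask_subset(pool_online, v->soft_affinity);
}
```

The part that stays in the hot path is the comparison against the pCPUs of the cpupool, which cannot be cached per vCPU.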

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen: sched: improve checking soft-affinity
Dario Faggioli [Wed, 21 Mar 2018 17:17:46 +0000 (17:17 +0000)]
xen: sched: improve checking soft-affinity

The function has_soft_affinity() determines whether the soft-affinity
of a vcpu will have any effect -- that is, whether the affinity will
have any difference, scheduling-wise, from an empty soft-affinity
mask.

This function takes a custom cpumask as its third parameter for better
flexibility; but that mask is different from the vCPU's hard-affinity
only in one case. Getting rid of that parameter not only simplifies
the function, but enables optimizing the soft affinity check.

It's mostly mechanical, with the exception of
sched_credit.c:_csched_cpu_pick(), which was the one case where we
passed in something other than the existing hard-affinity.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen: sched: optimize exclusive pinning case (Credit1 & 2)
Dario Faggioli [Wed, 21 Mar 2018 17:17:45 +0000 (17:17 +0000)]
xen: sched: optimize exclusive pinning case (Credit1 & 2)

Exclusive pinning of vCPUs is used, sometimes, for
achieving the highest level of determinism, and the
least possible overhead, for the vCPUs in question.

Although static 1:1 pinning is not recommended, for
general use cases, optimizing the tickling code (of
Credit1 and Credit2) is easy and cheap enough, so go
for it.
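
A minimal sketch of the idea, under the assumption that a cpumask fits in a 64-bit word and that only idle pCPUs are candidates for tickling. The function names are hypothetical and the real tickling code in Credit1 and Credit2 considers more than this.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the pCPU a vCPU is exclusively pinned to, or -1 if its
 * hard-affinity mask does not contain exactly one pCPU.
 */
int exclusive_pin_target(uint64_t hard_affinity)
{
    int cpu = 0;

    /* Exactly one bit set: non-zero and a power of two. */
    if (hard_affinity == 0 || (hard_affinity & (hard_affinity - 1)) != 0)
        return -1;

    while (!(hard_affinity & 1)) {
        hard_affinity >>= 1;
        cpu++;
    }
    return cpu;
}

/* Pick an idle pCPU to tickle, or -1 if none is suitable. */
int pick_cpu_to_tickle(uint64_t hard_affinity, uint64_t idle_mask)
{
    int cpu = exclusive_pin_target(hard_affinity);

    /* Fast path: pinned vCPU whose only pCPU is idle, no mask scan. */
    if (cpu >= 0 && ((idle_mask >> cpu) & 1))
        return cpu;

    /* General path: scan the idle pCPUs allowed by hard-affinity. */
    uint64_t candidates = hard_affinity & idle_mask;
    for (cpu = 0; cpu < 64; cpu++)
        if ((candidates >> cpu) & 1)
            return cpu;

    return -1;
}
```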

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen: sched: introduce 'adjust_affinity' hook.
Dario Faggioli [Wed, 21 Mar 2018 17:17:44 +0000 (17:17 +0000)]
xen: sched: introduce 'adjust_affinity' hook.

For now, just as a way to give a scheduler a "heads up"
about the fact that the affinity changed.

This enables some optimizations, such as pre-computing
and storing (e.g., in flags) facts like a vcpu being
exclusively pinned to a pcpu, or having or not a
soft affinity. I.e., conditions that, despite the fact
that they rarely change, are right now checked very
frequently, even in hot paths.

Note that, as we expect many scheduler specific
implementations of the adjust_affinity hook to do
something with the per-scheduler vCPU private data,
this commit moves the calls to sched_set_affinity()
after that is allocated (in sched_init_vcpu()).

Note also that this may, in future, turn out to be a useful
means for, e.g., having the schedulers vet, ack or nack
the changes themselves.
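
The hook can be pictured as an optional function pointer in the scheduler's operations table. The sketch below is a simplified model: the struct layouts, the 64-bit masks and demo_adjust_affinity() are assumptions for illustration, not the definitions used in Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu {
    uint64_t hard_affinity;
    uint64_t soft_affinity;
    bool pinned;             /* example of a fact a scheduler may cache */
};

struct scheduler {
    const char *name;
    /* Optional: may be NULL for schedulers that do not care. */
    void (*adjust_affinity)(struct vcpu *v, uint64_t hard, uint64_t soft);
};

/* Example hook: remember whether the vCPU is pinned to a single pCPU. */
void demo_adjust_affinity(struct vcpu *v, uint64_t hard, uint64_t soft)
{
    (void)soft;
    v->pinned = hard != 0 && (hard & (hard - 1)) == 0;
}

/* Generic code: notify the scheduler, then record the new masks. */
void sched_set_affinity(const struct scheduler *ops, struct vcpu *v,
                        uint64_t hard, uint64_t soft)
{
    if (ops->adjust_affinity != NULL)
        ops->adjust_affinity(v, hard, soft);

    v->hard_affinity = hard;
    v->soft_affinity = soft;
}
```

Because the hook usually touches per-scheduler vCPU data, it can only be called once that data exists, which is the reason for the reordering in sched_init_vcpu() mentioned above.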

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen/arm: gic: Read unconditionally the source from the LRs
Julien Grall [Wed, 21 Mar 2018 03:34:35 +0000 (03:34 +0000)]
xen/arm: gic: Read unconditionally the source from the LRs

Commit 5cb00d1 "ARM: GIC: extend LR read/write functions to cover EOI
and source" extended gic_lr to cover the source. The new field was only
set for SGI interrupts in the read function. However, the write function
writes the field unconditionally for virtual interrupts.

This means that if the caller combines the 2 functions (e.g. to
update the LR), the source needs to be set to 0 by the caller.
Unfortunately, gic_update_one_lr is not zeroing the structure before
reading the LRs. This can trigger the assert randomly.

Instead of zeroing the structure in gic_update_one_lr, make sure that
the source is written unconditionally on read. This also simplifies
the code by avoiding an if statement in the read path.

Lastly, properly update the comments in write_lr, which mistakenly
referred to the read lr path.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
7 years agoxen/libxc: suppress direct access to Credit1's migration delay
Dario Faggioli [Thu, 15 Mar 2018 17:51:46 +0000 (18:51 +0100)]
xen/libxc: suppress direct access to Credit1's migration delay

Removes special purpose access to Credit1 vCPU
migration delay parameter.

This fixes a build breakage, occurring when Xen
is configured with SCHED_CREDIT=n.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agotools: xenpm: continue to support {set, get}-vcpu-migration-delay
Dario Faggioli [Thu, 15 Mar 2018 17:51:38 +0000 (18:51 +0100)]
tools: xenpm: continue to support {set, get}-vcpu-migration-delay

Now that it is possible to get and set the migration
delay via the SCHEDOP sysctl, use that in xenpm, instead
of the special purpose libxc interface (which will be
removed in a following commit).

The sysctl, however, requires a cpupool-id argument,
to know which scheduler it is operating on. In
this case, since we don't want to alter xenpm's command
line interface, we always use '0', which means xenpm
will always act on the default cpupool ('Pool-0').

From this commit on, `xenpm {set,get}-vcpu-migration-delay'
commands work again. But that is only for the sake of
backward compatibility, and their use is deprecated, in
favour of 'xl sched-credit -s [-c <poolid>] -m <delay>'.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools: libxl/xl: allow to get/set Credit1's vcpu_migration_delay
Dario Faggioli [Thu, 15 Mar 2018 17:51:30 +0000 (18:51 +0100)]
tools: libxl/xl: allow to get/set Credit1's vcpu_migration_delay

Make it possible to get and set a (Credit1) scheduler's
vCPU migration delay via the SCHEDOP sysctl, from both
libxl and xl (no change needed in libxc).

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen: sched/credit1: make vcpu_migration_delay per-cpupool
Dario Faggioli [Thu, 15 Mar 2018 17:51:23 +0000 (18:51 +0100)]
xen: sched/credit1: make vcpu_migration_delay per-cpupool

Right now, vCPU migration delay is controlled by
the vcpu_migration_delay boot parameter. This means
the same value will always be used for every instance
of Credit1, in any cpupool that will be created.

Also, in order to get and set such value, a special
purpose libxc interface is defined, and used by the
xenpm tool. And this is problematic if Xen is built
without Credit1 support.

This commit adds a vcpu_migr_delay field inside
struct csched_private, so that we can get/set the
migration delay independently for each Credit1 instance,
in different cpupools.

Getting and setting now happens via XEN_SYSCTL_SCHEDOP_*,
which is much better suited for this parameter.

The value of the boot time parameter is used for
initializing the vcpu_migr_delay field of the private
structure of all the scheduler instances, when they're
created.

While there, save reading NOW() and doing any s_time_t
operation, when the migration delay of a scheduler is
zero (as it is, by default), in
__csched_vcpu_is_cache_hot().
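
The early exit for a zero delay might look roughly like this. It is a sketch only: fake_now() stands in for NOW(), the structs are reduced to the one field each that matters here, and last_sched_time is an assumed field name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;

/* Per-scheduler-instance data, i.e. one per cpupool using Credit1. */
struct csched_private {
    s_time_t vcpu_migr_delay;   /* 0 means "no migration delay" */
};

struct csched_vcpu {
    s_time_t last_sched_time;   /* assumed field, for illustration */
};

/* Stand-in for NOW(); counts calls so the shortcut can be observed. */
unsigned int now_calls;
s_time_t fake_now(void)
{
    now_calls++;
    return 1000;
}

bool vcpu_is_cache_hot(const struct csched_private *prv,
                       const struct csched_vcpu *v)
{
    /* Default case: no delay configured, so skip reading the clock. */
    if (prv->vcpu_migr_delay == 0)
        return false;

    return (fake_now() - v->last_sched_time) < prv->vcpu_migr_delay;
}
```

Two cpupools can now hold different values in their own csched_private, both initialised from the boot parameter.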

Finally, note that, from this commit on, using `xenpm
{set,get}-vcpu-migration-delay' will have no effect
any longer. A subsequent commit will re-enable it, for
the sake of backwards-compatibility.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen/tools: support Python 2 and Python 3
Doug Goldstein [Wed, 28 Feb 2018 19:18:44 +0000 (13:18 -0600)]
xen/tools: support Python 2 and Python 3

These changes should make it possible to support modern Pythons as well
as the oldest Python 2 still supported.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoREADME: require Python 2.4 or newer
Doug Goldstein [Wed, 28 Feb 2018 19:18:43 +0000 (13:18 -0600)]
README: require Python 2.4 or newer

Increase the minimum required Python to 2.4 or newer.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agofix null sched build with clang and debug=n
Doug Goldstein [Tue, 20 Mar 2018 10:23:29 +0000 (11:23 +0100)]
fix null sched build with clang and debug=n

The null_dom() static inline is just used when debug=y so with clang it
results in an error with the default CFLAGS and debug=n. This function
is used in only one place and it is a one-line helper, so remove it until
we actually need it.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
7 years agox86/mwait-idle: add Gemini Lake support
David E. Box [Tue, 20 Mar 2018 10:21:58 +0000 (11:21 +0100)]
x86/mwait-idle: add Gemini Lake support

Gemini Lake uses the same C-states as Broxton and also uses the
IRTL MSRs to determine maximum C-state latency.

Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit 1b2e87687d3f951a66900cab6f1583d94099d2f7]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoARM: GIC: extend LR read/write functions to cover EOI and source
Julien Grall [Thu, 15 Mar 2018 20:30:13 +0000 (20:30 +0000)]
ARM: GIC: extend LR read/write functions to cover EOI and source

So far our LR read/write functions do not handle the EOI bit and the
source CPUID bits in an LR, because the current VGIC implementation does
not use them.
Extend the gic_lr data structure to hold these bits of information by
using a union to differentiate field used depending on whether the vIRQ
has a corresponding pIRQ.

This allows the new VGIC to use this information.

This is based on the original patch sent by Andre Przywara [1].

[1] https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg00435.html
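
To make the union idea concrete, here is a sketch that decodes and encodes a 32-bit list register. The bit positions follow the GICv2 GICH_LR layout as I understand it and should be treated as an assumption, as should the exact field names; the group bit is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct gic_lr {
    uint32_t virq;
    uint8_t priority;
    bool active;
    bool pending;
    bool hw_status;
    union {
        struct { uint32_t pirq; } hw;               /* hw_status == true  */
        struct { bool eoi; uint8_t source; } virt;  /* hw_status == false */
    };
};

/* Decode a raw LR; every field of the relevant union arm is written. */
void lr_decode(uint32_t raw, struct gic_lr *lr)
{
    lr->virq      = raw & 0x3ffu;
    lr->priority  = (raw >> 23) & 0x1fu;
    lr->pending   = (raw >> 28) & 1u;
    lr->active    = (raw >> 29) & 1u;
    lr->hw_status = (raw >> 31) & 1u;

    if (lr->hw_status) {
        lr->hw.pirq = (raw >> 10) & 0x3ffu;
    } else {
        lr->virt.eoi    = (raw >> 19) & 1u;
        lr->virt.source = (raw >> 10) & 0x7u;  /* unconditional: no stale data */
    }
}

/* Encode the structure back into a raw LR value. */
uint32_t lr_encode(const struct gic_lr *lr)
{
    uint32_t raw = (lr->virq & 0x3ffu) |
                   ((uint32_t)(lr->priority & 0x1fu) << 23);

    if (lr->pending)
        raw |= 1u << 28;
    if (lr->active)
        raw |= 1u << 29;

    if (lr->hw_status) {
        raw |= (1u << 31) | ((lr->hw.pirq & 0x3ffu) << 10);
    } else {
        raw |= (uint32_t)(lr->virt.source & 0x7u) << 10;
        if (lr->virt.eoi)
            raw |= 1u << 19;
    }
    return raw;
}
```

The union reflects that the physical interrupt ID and the EOI/source bits occupy the same bits of the register, so only one interpretation is valid at a time.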

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: GIC: Only set pirq in the LR when hw_status is set
Julien Grall [Thu, 15 Mar 2018 20:30:12 +0000 (20:30 +0000)]
xen/arm: GIC: Only set pirq in the LR when hw_status is set

The field pirq should only be valid when the virtual interrupt
is associated to a physical interrupt.

This change will help to extend gic_lr for supporting specific virtual
interrupt field (e.g eoi, source) that clashes with the PIRQ field.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: gic: Split the field state in gic_lr in 2 fields active and pending
Julien Grall [Thu, 15 Mar 2018 20:30:11 +0000 (20:30 +0000)]
xen/arm: gic: Split the field state in gic_lr in 2 fields active and pending

Mostly making the code nicer to read.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: gic: Use bool instead of uint8_t for the hw_status in gic_lr
Julien Grall [Thu, 15 Mar 2018 20:30:10 +0000 (20:30 +0000)]
xen/arm: gic: Use bool instead of uint8_t for the hw_status in gic_lr

hw_status can only be 1 or 0. So convert to a bool.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: vgic: Override the group in lr everytime
Julien Grall [Thu, 15 Mar 2018 20:30:09 +0000 (20:30 +0000)]
xen/arm: vgic: Override the group in lr everytime

At the moment, write_lr assumes the caller will set the group
correctly. However the group should always be 0 when the guest is using
vGICv2 and 1 for vGICv3. As the caller should not care about the group,
override it directly.

With that change, write_lr is now behaving like update_lr for the group.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: gic: Fix indentation in gic_update_one_lr
Julien Grall [Thu, 15 Mar 2018 20:30:08 +0000 (20:30 +0000)]
xen/arm: gic: Fix indentation in gic_update_one_lr

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
7 years agoARM: Implement vcpu_kick()
Andre Przywara [Thu, 15 Mar 2018 20:30:07 +0000 (20:30 +0000)]
ARM: Implement vcpu_kick()

If we change something in a vCPU that affects its runnability or
otherwise needs the vCPU's attention, we might need to tell the scheduler
about it.
We are using this in one place (vIRQ injection) at the moment, but will
need this at more places soon.
So let's factor out this functionality, using the already existing
vcpu_kick() prototype (used in x86 only so far), to make this available
to the rest of the Xen code.
Also adjust the perfcounter name to reflect the new usage.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoARM: VGIC: rename gic_event_needs_delivery()
Andre Przywara [Thu, 15 Mar 2018 20:30:06 +0000 (20:30 +0000)]
ARM: VGIC: rename gic_event_needs_delivery()

gic_event_needs_delivery() is not named very intuitively, especially
the gic_ prefix is somewhat misleading.
Rename it to vgic_vcpu_pending_irq(), which makes it clear that this
relates to the virtual GIC and is about interrupts.
Also add a VCPU parameter, which makes the code more flexible in the
future. The current VGIC expects this to be the current VCPU, so add
an assert to spot any regressions.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoarm/boot: Mark construct_dom0() as __init
Andrew Cooper [Mon, 19 Mar 2018 19:13:44 +0000 (19:13 +0000)]
arm/boot: Mark construct_dom0() as __init

Its sole caller, start_xen(), is __init.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/arm: Fix platform name to xilinx_zynqmp from xgene_storm
Amit Singh Tomar [Sun, 18 Mar 2018 09:20:26 +0000 (14:50 +0530)]
xen/arm: Fix platform name to xilinx_zynqmp from xgene_storm

Signed-off-by: Amit Singh Tomar <amittomer25@gmail.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agoxen/arm: p2m: Prevent deadlock when using memaccess
Julien Grall [Mon, 12 Mar 2018 15:34:52 +0000 (15:34 +0000)]
xen/arm: p2m: Prevent deadlock when using memaccess

Commit 7d623b358a4 "arm/mem_access: Add long-descriptor based gpt"
assumed the read-write lock can be taken recursively. However, this
assumption is wrong and will lead to deadlock when the lock is
contended.

The read lock is taken recursively in the following case:
    1) get_page_from_gva
        => Take the read lock (first read lock)
        => Call p2m_mem_access_check_and_get_page on failure when
        memaccess is enabled
    2) p2m_mem_access_check_and_get_page
        => If hardware translation failed fallback to software lookup
        => Call guest_walk_tables
    3) guest_walk_tables
        => Will use access_guest_memory_by_ipa to access stage-1 page-table
    4) access_guest_memory_by_ipa
        => Because Arm has no hardware instruction that performs only
        the stage-2 page-table walk, this is done in software.
        => Take the read lock (second read lock)

To avoid the nested lock, rework the locking in get_page_from_gva and
p2m_mem_access_check_and_get_page. The latter will now be called without
the p2m lock. The new locking in p2m_mem_access_check_and_get_page will
not cover the translation of the VA to an IPA.

This is fine because we can't promise that the stage-1 page-tables have
not changed behind our back (they are under guest control). Modification of
the stage-2 page-table can now happen, but I can't see any potential
issue here except with the break-before-make sequence used when updating
page-tables. gva_to_ipa may fail if the sequence is executed at the same
time on another CPU. In that case we would fall back to the software lookup
path.
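
Why a nested read lock can deadlock is easiest to see in a toy model of a lock where new readers queue behind a waiting writer. The model below is an assumption-laden sketch (single-threaded, "try" semantics instead of blocking, hypothetical names) and not Xen's rwlock implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy rwlock in which new readers must wait behind a queued writer. */
struct toy_rwlock {
    unsigned int readers;
    bool writer_waiting;
};

/* Returns false where a real lock would block. */
bool toy_read_trylock(struct toy_rwlock *l)
{
    if (l->writer_waiting)
        return false;       /* must queue behind the writer */
    l->readers++;
    return true;
}

void toy_read_unlock(struct toy_rwlock *l)
{
    assert(l->readers > 0);
    l->readers--;
}

/* A writer shows up and waits for the reader count to reach zero. */
void toy_writer_arrives(struct toy_rwlock *l)
{
    l->writer_waiting = true;
}

/* The writer can only proceed once no reader holds the lock. */
bool toy_writer_can_proceed(const struct toy_rwlock *l)
{
    return l->writer_waiting && l->readers == 0;
}
```

If a CPU holds the read lock, a writer arrives, and the same CPU then asks for the read lock again, the second request waits for the writer while the writer waits for the first read lock to be dropped. Dropping the lock before calling into the nested path, as the reworked locking does, breaks that cycle.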

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Sergej Proskurin <proskurin@sec.in.tum.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery
Julien Grall [Mon, 12 Mar 2018 13:19:35 +0000 (13:19 +0000)]
xen/arm: Relax ARM_SMCCC_ARCH_WORKAROUND_1 discovery

A recent update to the ARM SMCCC_ARCH_WORKAROUND_1 specification (see [1])
allows firmware to return a non zero, positive value, to describe that
although the mitigation is implemented at the higher exception level,
the CPU on which the call is made is not affected.

Relax the check on the return value from ARM_WORKAROUND_1 so that we
only error out if the returned value is negative.

[1] https://developer.arm.com/support/security-update/downloads
"Firmware interfaces for mitigating CVE-2017-5715 System Software on Arm
Systems"
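
The relaxed check boils down to a sign test on the value returned by firmware. A minimal sketch, assuming the result arrives in a 64-bit register and is interpreted as a signed 32-bit quantity; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether the ARM_SMCCC_ARCH_WORKAROUND_1
 * discovery call succeeded. Zero and positive values are both accepted;
 * only negative values (e.g. "not supported") count as failure.
 * The narrowing cast relies on the usual two's complement behaviour.
 */
bool workaround_1_discovered(uint64_t a0)
{
    return (int32_t)(uint32_t)a0 >= 0;
}
```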

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Restrict when a physical IRQ can be routed/removed from/to a domain
Julien Grall [Thu, 8 Mar 2018 15:24:04 +0000 (15:24 +0000)]
xen/arm: Restrict when a physical IRQ can be routed/removed from/to a domain

Xen currently allows an interrupt to be routed to, or removed from, a
domain while it is running.

However, we never sync the virtual interrupt state to the physical
interrupt. This could lead to undesirable effect on the vGIC emulation
and potentially the hardware.

One solution would be to sync the interrupt state when routing, but I am
not sure it is worth the effort as you never really know when it is safe to
route/remove the interrupt when a domain is running.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86: correct EFLAGS.IF in SYSENTER frame
Jan Beulich [Fri, 16 Mar 2018 16:27:36 +0000 (17:27 +0100)]
x86: correct EFLAGS.IF in SYSENTER frame

Commit 9d1d31ad94 ("x86: slightly reduce Meltdown band-aid overhead")
moved the STI past the PUSHF. While this isn't an active problem (as we
force EFLAGS.IF to 1 before exiting to guest context), let's not risk
internal confusion by finding a PV guest frame with interrupts
apparently off.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/mm: Clean up share_xen_page_with_guest() API
Andrew Cooper [Thu, 8 Mar 2018 19:24:58 +0000 (19:24 +0000)]
xen/mm: Clean up share_xen_page_with_guest() API

The share_xen_page_with_guest() functions are used by common code, and are
implemented the same by each arch.  Move the declarations into the common mm.h
rather than duplicating them in each arch/mm.h

Turn an int readonly into a boolean enum, to retain ro/rw context at the
callsites, but use shorter labels which avoids a large number of split lines.

Implement share_xen_page_with_privileged_guests() as a static inline wrapper
around share_xen_page_with_guest() to avoid having a call into a separate
translation unit whose only purpose is to shuffle function arguments.

No functional change.
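
A sketch of the resulting API shape, using stand-in types. The enum labels SHARE_rw and SHARE_ro are how I recall the short names and should be read as an assumption, as should dom_xen_stub and the reduced struct definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Boolean-like enum: keeps ro/rw readable at the call sites. */
enum XENSHARE_flags {
    SHARE_rw,
    SHARE_ro,
};

/* Stand-ins for the real structures. */
struct domain { int id; };
struct page_info {
    const struct domain *owner;
    bool readonly;
};

/* Placeholder for the domain that owns pages shared with privileged guests. */
struct domain dom_xen_stub = { 1 };

void share_xen_page_with_guest(struct page_info *page, struct domain *d,
                               enum XENSHARE_flags flags)
{
    page->owner = d;
    page->readonly = (flags == SHARE_ro);
}

/* Thin wrapper: only shuffles arguments, so it can live in a header. */
static inline void share_xen_page_with_privileged_guests(
    struct page_info *page, enum XENSHARE_flags flags)
{
    share_xen_page_with_guest(page, &dom_xen_stub, flags);
}
```

A call such as share_xen_page_with_guest(pg, d, SHARE_ro) documents itself, whereas a bare 1 or 0 in the last argument does not.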

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/domain: Pass the full domctl_createdomain struct to create_domain()
Andrew Cooper [Thu, 8 Mar 2018 12:39:36 +0000 (12:39 +0000)]
xen/domain: Pass the full domctl_createdomain struct to create_domain()

In future patches, the structure will be extended with further information,
and this is far cleaner than adding extra parameters.

One minor tweak is that the setting of guest_type needs to be deferred until
config is known-good to dereference, but this doesn't result in any changed
behaviour as system domains never used to pass XEN_DOMCTL_CDF_hvm_guest.

Also for completeness, move the setting of d->handle into the tail of
domain_create() where it more logically should live.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/domain: Optimise the order of actions in arch_domain_create()
Andrew Cooper [Thu, 8 Mar 2018 17:25:29 +0000 (17:25 +0000)]
x86/domain: Optimise the order of actions in arch_domain_create()

The only relevant initialisation for the idle domain is the context switch and
poisoned pointers.  Collect these bits together early in the function and exit
when complete (although as a consequence, the e820 and vtsc lock
initialisation are moved forwards).  This allows us to remove subsequent
is_idle_domain() checks and unindent most of the logic.

Furthermore, we no longer call these functions for the idle domain:
 * mapcache_domain_init() and tsc_set_info() were previously guarded against
   the idle domain, and have had their guards turned into ASSERT()s.
 * pit_init() is implicitly guarded by has_vpit().
 * psr_domain_init() no longer allocates a socket array.

Finally, two changes are introduced for the benefit of the following patch:
 * For PV hardware domains, or XEN_X86_EMU_PIT into emflags rather than into
   config->emulation_flags, to facilitate config becoming const.
 * References to domcr_flags are moved until after the idle early exit, to
   facilitate them being unavailable for system domains.

No practical change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86/domain: Remove unused parameters from {hvm,pv}_domain_initialise()
Andrew Cooper [Thu, 8 Mar 2018 13:58:41 +0000 (13:58 +0000)]
x86/domain: Remove unused parameters from {hvm,pv}_domain_initialise()

Neither domcr_flags nor config are used on either side.  Drop them, making
{hvm,pv}_domain_initialise() symmetric with all the other domain/vcpu
initialise/destroy calls.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agoxen/domain: Drop all DOMCRF_* constants
Andrew Cooper [Thu, 8 Mar 2018 11:31:47 +0000 (11:31 +0000)]
xen/domain: Drop all DOMCRF_* constants

With DOMCRF_dummy removed, all remaining DOMCRF_* identically match their
DOMCTL counterparts.  Avoid having a conversion between two different bit
layouts, and use the DOMCTL_CDF_* constants everywhere.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/domain: Drop DOMCRF_dummy
Andrew Cooper [Thu, 8 Mar 2018 11:03:17 +0000 (11:03 +0000)]
xen/domain: Drop DOMCRF_dummy

At the moment, there is a tight coupling between the domid and the use of
DOMCRF_dummy.  Instead of using DOMCRF_dummy, base the one relevant decision
on domid alone.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
7 years agoSUPPORT.md: Multiple IOREQ servers are experimental
George Dunlap [Wed, 14 Mar 2018 11:05:47 +0000 (11:05 +0000)]
SUPPORT.md: Multiple IOREQ servers are experimental

The code has been there in the hypervisor for several releases, but
there is no toolstack support.

While we're here delete some trailing whitespace.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: Implement enable_nmis() in C
Andrew Cooper [Thu, 15 Mar 2018 16:15:45 +0000 (16:15 +0000)]
xen/x86: Implement enable_nmis() in C

I don't recall why I chose to implement this in assembly to begin with, but
it can happily live in a static inline instead, and only has two callers.

Doing so reduces the quantity of code in .text.entry.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agotools/libacpi: Drop useless print messages
Andrew Cooper [Thu, 15 Mar 2018 11:56:40 +0000 (11:56 +0000)]
tools/libacpi: Drop useless print messages

Libraries have no business using stdout directly, and these have no real
value.  Dropping them removes the following output when building a PVH guest:

  [root@fusebot ~]# xl create shim.cfg
  Parsing config from shim.cfg
  S3 disabled
  S4 disabled
  CONV disabled
  [root@fusebot ~]#

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86emul: place test blobs in executable section
Jan Beulich [Thu, 15 Mar 2018 16:01:33 +0000 (17:01 +0100)]
x86emul: place test blobs in executable section

This allows the section contents to be disassembled without going
through any extra hoops, simplifying the analysis of problems in test
and/or emulation code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86emul: support 3DNow! insns
Jan Beulich [Thu, 15 Mar 2018 16:00:56 +0000 (17:00 +0100)]
x86emul: support 3DNow! insns

Yes, recent AMD CPUs don't support them anymore, but I think we should
nevertheless cope.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR
Liran Alon [Thu, 15 Mar 2018 15:59:52 +0000 (16:59 +0100)]
x86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR

According to Intel SDM section "Interrupt Acceptance for Fixed Interrupts":
"The trigger mode register (TMR) indicates the trigger mode of the
interrupt (see Figure 10-20). Upon acceptance of an interrupt
into the IRR, the corresponding TMR bit is cleared for
edge-triggered interrupts and set for level-triggered interrupts.
If a TMR bit is set when an EOI cycle for its corresponding
interrupt vector is generated, an EOI message is sent to
all I/O APICs."

Before this patch TMR-bit was cleared on LAPIC EOI which is not what
real hardware does. This was also confirmed in KVM upstream commit
a0c9a822bf37 ("KVM: dont clear TMR on EOI").

Behavior after this patch is aligned with both Intel SDM and KVM
implementation.
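
The behaviour quoted from the SDM can be modelled with two 256-bit registers. This is a simplified sketch with hypothetical names, not the vlapic code: TMR is updated when an interrupt is accepted into IRR and is only read, never cleared, at EOI.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 256 vectors, 32 per word. */
struct toy_lapic {
    uint32_t irr[8];
    uint32_t tmr[8];
};

void reg_set(uint32_t *reg, uint8_t vec)   { reg[vec / 32] |= 1u << (vec % 32); }
void reg_clear(uint32_t *reg, uint8_t vec) { reg[vec / 32] &= ~(1u << (vec % 32)); }
bool reg_test(const uint32_t *reg, uint8_t vec)
{
    return (reg[vec / 32] >> (vec % 32)) & 1u;
}

/* Acceptance into IRR: TMR follows the trigger mode of this interrupt. */
void lapic_accept_irq(struct toy_lapic *l, uint8_t vec, bool level)
{
    if (level)
        reg_set(l->tmr, vec);
    else
        reg_clear(l->tmr, vec);

    reg_set(l->irr, vec);
}

/* EOI: report whether the I/O APICs must be notified; TMR is untouched. */
bool lapic_eoi_needs_ioapic(const struct toy_lapic *l, uint8_t vec)
{
    return reg_test(l->tmr, vec);
}
```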

Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/VMX: don't risk corrupting host CR4
Jan Beulich [Thu, 15 Mar 2018 11:45:30 +0000 (12:45 +0100)]
x86/VMX: don't risk corrupting host CR4

Instead of "syncing" the live value to what mmu_cr4_features has, make
sure vCPU-s run with the value most recently loaded into %cr4, such that
after the next VM exit we continue to run with the intended value rather
than a possibly stale one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86: ignore guest microcode loading attempts
Jan Beulich [Thu, 15 Mar 2018 11:44:24 +0000 (12:44 +0100)]
x86: ignore guest microcode loading attempts

The respective MSRs are write-only, and hence attempts by guests to
write to these are - as of 1f1d183d49 ("x86/HVM: don't give the wrong
impression of WRMSR succeeding") - no longer ignored. Restore original
behavior for the two affected MSRs.
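
In outline, restoring the old behaviour means special-casing the microcode update MSRs in the guest WRMSR path. The commit text does not name the MSRs; the two constants below (Intel's microcode update trigger and AMD's patch loader) are my assumption of what is meant, and the handler and return codes are simplified stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed identities of the two write-only microcode MSRs. */
#define MSR_IA32_UCODE_WRITE 0x00000079u
#define MSR_AMD_PATCHLOADER  0xc0010020u

/* Simplified stand-ins for the emulator return codes. */
enum { TOY_OKAY, TOY_EXCEPTION };

/* Hypothetical guest WRMSR handler, reduced to the relevant cases. */
int toy_guest_wrmsr(uint32_t msr, uint64_t val)
{
    (void)val;

    switch (msr) {
    case MSR_IA32_UCODE_WRITE:
    case MSR_AMD_PATCHLOADER:
        /* Guests must not load microcode: drop the write silently. */
        return TOY_OKAY;

    default:
        /* Unknown MSR: do not pretend the write succeeded. */
        return TOY_EXCEPTION;
    }
}
```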

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoRevert "tools: detect appropriate debug optimization level"
Wei Liu [Wed, 14 Mar 2018 17:15:15 +0000 (17:15 +0000)]
Revert "tools: detect appropriate debug optimization level"

This reverts commit b43501451733193b265de30fd79a764363a2a473.

Due to the implementation of cc-option, the check is always true,
which means the build is broken for any gcc that doesn't support -Og.

This patch can be reapplied once we have fixed cc-option.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agodocs: Fix entry for the "usbdev" option
Anthony PERARD [Wed, 14 Mar 2018 15:00:14 +0000 (15:00 +0000)]
docs: Fix entry for the "usbdev" option

The man page for xl.cfg has the "devtype=hostdev" option, but xl only
understands "type=hostdev"; fix the manual to reflect the actual
implementation.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/entry: Trivial nonfunctional fixes
Andrew Cooper [Wed, 14 Mar 2018 10:36:09 +0000 (10:36 +0000)]
x86/entry: Trivial nonfunctional fixes

 * Drop unnecessary size suffixes
 * The C pseudocode refers to a trap_info object, not trap_bounce.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Fix guest crashes following f75b1a5247b "x86/pv: Drop int80_bounce from struc...
Andrew Cooper [Wed, 14 Mar 2018 10:48:36 +0000 (10:48 +0000)]
x86/pv: Fix guest crashes following f75b1a5247b "x86/pv: Drop int80_bounce from struct pv_vcpu"

The original init_int80_direct_trap() was in fact buggy; `int $0x80` is not an
exception.  This went unnoticed for years because int80_bounce and trap_bounce
were separate structures, but were combined by this change.

Exception handling is different to interrupt handling for PV guests.  By
reusing trap_bounce, the following corner case can occur:

 * Handle a guest `int $0x80` instruction.  Latches TBF_EXCEPTION into
   trap_bounce.
 * Handle an exception, which emulates to success (such as ptwr support),
   which leaves trap_bounce unmodified.
 * The exception exit path sees TBF_EXCEPTION set and re-injects the `int
   $0x80` a second time.

Drop the TBF_EXCEPTION from the int80 invocation, which matches the equivalent
logic from the syscall/sysenter paths.
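
The corner case can be replayed in a toy model. Everything here is a deliberately reduced assumption: a single flags word, an injection counter, and an exit path that re-injects whenever TBF_EXCEPTION is still latched.

```c
#include <assert.h>
#include <stdbool.h>

#define TBF_EXCEPTION 1u   /* value chosen for the example */

struct trap_bounce { unsigned int flags; };

struct toy_vcpu {
    struct trap_bounce tb;
    unsigned int int80_injections;   /* times int $0x80 reached the guest */
};

/* Guest executes `int $0x80`: deliver it to the guest handler. */
void handle_int80(struct toy_vcpu *v, bool latch_exception_flag)
{
    /* The buggy variant latches TBF_EXCEPTION; the fixed one does not. */
    v->tb.flags = latch_exception_flag ? TBF_EXCEPTION : 0;
    v->int80_injections++;
}

/* Later exception that is emulated successfully (e.g. ptwr). */
void handle_emulated_exception(struct toy_vcpu *v)
{
    /* Emulation succeeded: trap_bounce is deliberately left untouched. */

    /* Exception exit path: inject whatever is latched in trap_bounce. */
    if (v->tb.flags & TBF_EXCEPTION) {
        v->int80_injections++;       /* stale int80 delivered again */
        v->tb.flags = 0;
    }
}
```

With the flag latched, the guest sees the `int $0x80` twice; without it, the later exception exits cleanly.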

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agolibxl_qmp: Tell QEMU about live migration or snapshot
Anthony PERARD [Tue, 13 Mar 2018 11:13:18 +0000 (11:13 +0000)]
libxl_qmp: Tell QEMU about live migration or snapshot

Since version 2.10, QEMU locks the disk images so that a second QEMU
instance cannot open them. This would prevent live migration from
working correctly. A new parameter has been added to the QMP command
"xen-save-devices-state" in QEMU version 2.11, which allows unlocking the
disk image for a live migration while keeping it locked for a snapshot.

QEMU commit: 5d6c599fe1d69a1bf8c5c4d3c58be2b31cd625ad
"migration, xen: Fix block image lock issue on live migration"

The extra "live" parameter can only be used if QEMU knows about it, so
only add it if QEMU is recent enough.

The struct libxl__domain_suspend_state now records whether the suspend
is part of a live migration.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agolibxl: Add a version check of QEMU for QMP commands
Anthony PERARD [Tue, 13 Mar 2018 11:13:17 +0000 (11:13 +0000)]
libxl: Add a version check of QEMU for QMP commands

On connection to QEMU via QMP, the version of QEMU is provided, store it
for later use.

Add a function qmp_qemu_check_version that can be used to check if QEMU
is new enough for certain functionality. This will be used in a moment.

As it's a static function, it is commented out until first use, which is
in the next patch.
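
A version check of this kind is a lexicographic comparison of (major, minor, micro). The sketch below uses a hypothetical struct and function names rather than the libxl ones; the 2.11 threshold is the QEMU version in which the "live" parameter of "xen-save-devices-state" appeared.

```c
#include <assert.h>
#include <stdbool.h>

/* Version as reported by QEMU when the QMP connection is set up. */
struct qemu_version {
    int major;
    int minor;
    int micro;
};

/* True if the running QEMU is at least major.minor.micro. */
bool qemu_version_at_least(const struct qemu_version *v,
                           int major, int minor, int micro)
{
    if (v->major != major)
        return v->major > major;
    if (v->minor != minor)
        return v->minor > minor;
    return v->micro >= micro;
}

/* Only pass "live" to QEMU versions that understand it. */
bool qemu_accepts_live_param(const struct qemu_version *v)
{
    return qemu_version_at_least(v, 2, 11, 0);
}
```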

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
7 years agogitignore: ignore wrappers.c link for fuzzer
Wei Liu [Wed, 14 Mar 2018 11:02:31 +0000 (11:02 +0000)]
gitignore: ignore wrappers.c link for fuzzer

At the same time reorder the entries alphabetically.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>