xenbits.xensource.com Git - people/dwmw2/xen.git/log
5 years agoAMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Jan Beulich [Wed, 31 Jul 2019 11:17:01 +0000 (13:17 +0200)]
AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format

This is in preparation of actually enabling x2APIC mode, which requires
this wider IRTE format to be used.

A specific remark regarding the first hunk changing
amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
tables when creating new one"). Other code introduced by that change has
meanwhile disappeared or further changed, and I wonder if - rather than
adding an x2apic_enabled check to the conditional - the bypass couldn't
be deleted altogether. For now the goal is to affect the non-x2APIC
paths as little as possible.

Take the liberty and use the new "fresh" flag to suppress an unneeded
flush in update_intremap_entry_from_ioapic().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: pass IOMMU to {get,free,update}_intremap_entry()
Jan Beulich [Wed, 31 Jul 2019 11:16:14 +0000 (13:16 +0200)]
AMD/IOMMU: pass IOMMU to {get,free,update}_intremap_entry()

The functions will want to know IOMMU properties (specifically the IRTE
size) subsequently.

Rather than introducing a second error path bogusly returning -E... from
amd_iommu_read_ioapic_from_ire(), also change the existing one to follow
VT-d in returning the raw (untranslated) IO-APIC RTE.
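
A hedged sketch of the direction of the change (helper and field names
are illustrative rather than the exact Xen ones): with the IOMMU passed
in, an entry's address can be derived from a per-IOMMU entry size
instead of a global constant.

    static void *get_intremap_entry(const struct amd_iommu *iommu,
                                    unsigned int bdf, unsigned int index)
    {
        uint8_t *table = get_ivrs_mappings(iommu->seg)[bdf].intremap_table;

        /* The entry size becomes an IOMMU property once 128-bit IRTEs exist. */
        return table + index * intremap_entry_size(iommu);
    }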

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: use bit field for IRTE
Jan Beulich [Wed, 31 Jul 2019 11:15:39 +0000 (13:15 +0200)]
AMD/IOMMU: use bit field for IRTE

At the same time restrict its scope to just the single source file
actually using it, and abstract accesses by introducing a union of
pointers. (A union of the actual table entries is not used to make it
impossible to [wrongly, once the 128-bit form gets added] perform
pointer arithmetic / array accesses on derived types.)

Also move away from updating the entries piecemeal: Construct a full new
entry, and write it out.
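
A minimal sketch of the two ideas above, with illustrative type and
field names (the actual layout lives in the AMD IOMMU source file): a
union of pointers makes array arithmetic on the wrong entry width
impossible to write by accident, and updates build a complete entry
before a single whole-entry store.

    union irte_ptr {
        void *ptr;
        struct irte_basic *basic;   /* current 32-bit entry layout */
        /* A 128-bit variant is added here by a later patch. */
    };

    static void update_intremap_entry(union irte_ptr entry,
                                      unsigned int vector, unsigned int dest)
    {
        struct irte_basic new = {
            .remap_en = true,
            .vector   = vector,
            .dest     = dest,
            /* ... remaining fields ... */
        };

        *entry.basic = new;         /* write the entry out in one go */
    }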

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: use bit field for control register
Jan Beulich [Wed, 31 Jul 2019 11:15:04 +0000 (13:15 +0200)]
AMD/IOMMU: use bit field for control register

Also introduce a field in struct amd_iommu caching the most recently
written control register. All writes should now happen exclusively from
that cached value, such that it is guaranteed to be up to date.

Take the opportunity and add further fields. Also convert a few boolean
function parameters to bool, such that use of !! can be avoided.

Because of there now being definitions beyond bit 31, writel() also gets
replaced by writeq() when updating hardware.
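
A hedged sketch of the caching pattern (structure and field names are
approximations, not necessarily the exact code): updates are applied to
the cached copy, and the full cached value is what gets written to the
device, now via writeq() since bits above 31 are defined.

    struct amd_iommu {
        /* ... */
        void *mmio_base;
        union amd_iommu_control ctrl;   /* last value written to hardware */
    };

    static void set_iommu_event_log_control(struct amd_iommu *iommu,
                                            bool enable)
    {
        iommu->ctrl.event_log_en = enable;
        writeq(iommu->ctrl.raw,
               iommu->mmio_base + IOMMU_CONTROL_MMIO_OFFSET);
    }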

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: use bit field for extended feature register
Jan Beulich [Wed, 31 Jul 2019 11:14:27 +0000 (13:14 +0200)]
AMD/IOMMU: use bit field for extended feature register

This also takes care of several of the shift values wrongly having been
specified as hex rather than dec.

Take the opportunity and
- replace a readl() pair by a single readq(),
- add further fields.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agogrant_table: harden version dependent accesses
Norbert Manthey [Wed, 31 Jul 2019 11:13:09 +0000 (13:13 +0200)]
grant_table: harden version dependent accesses

Guests can issue grant table operations and provide guest controlled
data to them. This data is used as an index for memory loads after
bounds checks have been done. Depending on the grant table version, the
size of elements in containers differs. As the base data structure is
a page, the number of elements per page also differs. Consequently,
the bounds checks are version dependent, so speculative execution can
happen in several stages: past the bounds check as well as past the
version check.

This commit mitigates cases where out-of-bounds accesses could happen
due to the version comparison (see the sketch following the list
below). In cases where no different memory locations are accessed on
the code path that follows an if statement, no protection is required.
No different memory locations are accessed in the following functions
after a version check:

 * gnttab_setup_table: only calculated numbers are used, and then the
        function gnttab_grow_table is called, which is version protected

 * gnttab_transfer: the case that depends on the version check just gets
        into copying a page or not

 * acquire_grant_for_copy: the unmitigated comparison is on the abort
        path and does not access other structures, and the else branch
        accesses only structures that have been validated before

 * gnttab_set_version: all accessible data is allocated for both versions
        Furthermore, the functions gnttab_populate_status_frames and
        gnttab_unpopulate_status_frames received a block_speculation
        macro. Hence, this code will only be executed once the correct
        version is visible in the architectural state.

 * gnttab_release_mappings: this function is called only during domain
       destruction and control is not returned to the guest

 * mem_sharing_gref_to_gfn: speculation will be stopped by the second if
       statement, as that places a barrier on any path to be executed.

 * gnttab_get_status_frame_mfn: no version dependent check is needed,
       because all accesses, except the gt->status[idx] one, are not
       index based, and the gnttab_grow_table function call does not
       perform speculative out-of-bounds accesses.

 * gnttab_usage_print: cannot be triggered by the guest

This is part of the speculative hardening effort.
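
The pattern referred to above for gnttab_populate_status_frames() and
gnttab_unpopulate_status_frames() looks roughly as follows (a
simplified sketch, not the exact code): the barrier makes the caller's
version check architecturally visible before any version dependent data
is touched.

    static int gnttab_populate_status_frames(struct domain *d,
                                             struct grant_table *gt,
                                             unsigned int req_nr_frames)
    {
        /* Preceding gt->gt_version checks are now architecturally done. */
        block_speculation();

        /* ... allocate and map the v2 status frames ... */
        return 0;
    }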

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agogrant_table: harden bound accesses
Norbert Manthey [Wed, 31 Jul 2019 11:12:12 +0000 (13:12 +0200)]
grant_table: harden bound accesses

Guests can issue grant table operations and provide guest controlled
data to them. This data is used as an index for memory loads after
bounds checks have been done. To avoid speculative out-of-bounds
accesses, we use the array_index_nospec macro where applicable, or the
block_speculation macro. Note that the block_speculation macro is used
on all paths in shared_entry_header and nr_grant_entries. This way,
after a call to such a function, all bounds checks that happened before
become architecturally visible, so that no additional protection is
required for corresponding array accesses. As the way we introduce an
lfence instruction might allow the compiler to reload certain values
from memory multiple times, we try to avoid speculatively continuing
execution with stale register data by moving relevant data into
function-local variables.
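
A generic, hedged sketch of the pattern (the helper below is made up
for illustration; nr_grant_entries, shared_entry_v1 and
array_index_nospec are the existing Xen names): the bound is copied
into a local variable, and the index is clamped before the array
access.

    static void *shared_entry_of(struct grant_table *gt, grant_ref_t ref)
    {
        /* Local copy, so the bound is not re-read from memory later on. */
        unsigned int nr = nr_grant_entries(gt);

        if ( unlikely(ref >= nr) )
            return NULL;

        /* Clamp the index so a mispredicted bounds check can't read OOB. */
        return &shared_entry_v1(gt, array_index_nospec(ref, nr));
    }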

Speculative execution is not blocked in case one of the following
properties is true:
 - path cannot be triggered by the guest
 - path does not return to the guest
 - path does not result in an out-of-bounds access
 - path is unlikely to be executed repeatedly in rapid succession
Only the combination of the above properties allows actually leaking
continuous chunks of memory. Therefore, we only add the penalty of
protective mechanisms in case a potential speculative out-of-bounds
access matches all the above properties.

This commit addresses only out-of-bounds accesses whose index is
directly controlled by the guest, and where the index is checked
beforehand. Potential out-of-bounds accesses that are caused by
speculatively evaluating the version of the current table are not
addressed in this commit. Hence, speculative out-of-bounds accesses
might still be possible, for example in gnttab_get_status_frame_mfn:
when calling gnttab_grow_table, the assertion that the grant table
version equals two might not hold under speculation.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/boot: Fix build dependencies for reloc.c
Andrew Cooper [Tue, 30 Jul 2019 16:40:33 +0000 (17:40 +0100)]
x86/boot: Fix build dependencies for reloc.c

c/s 201f852eaf added start_info.h and kconfig.h to reloc.c, but only updated
start_info.h in RELOC_DEPS.

This causes reloc.c to not be regenerated when Kconfig changes.  It is most
noticeable when enabling CONFIG_PVH and finding that the resulting binary
crashes early with:

  (d9) (XEN)
  (d9) (XEN) ****************************************
  (d9) (XEN) Panic on CPU 0:
  (d9) (XEN) Magic value is wrong: c2c2c2c2
  (d9) (XEN) ****************************************
  (d9) (XEN)
  (d9) (XEN) Reboot in five seconds...
  (XEN) d9v0 Triple fault - invoking HVM shutdown action 1

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoxen: credit2: avoid using cpumask_weight() in hot-paths
Dario Faggioli [Mon, 29 Jul 2019 10:49:09 +0000 (12:49 +0200)]
xen: credit2: avoid using cpumask_weight() in hot-paths

cpumask_weight() is known to be expensive. In Credit2, we use it in
load-balancing, but only for knowing how many CPUs are active in a
runqueue.

Keeping such a count in an integer field of the per-runqueue data
structure we have completely avoids the need for cpumask_weight().

While there, remove as many other uses of it as we can, even if not in
hot-paths.
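
A hedged sketch of the idea (field and helper names are illustrative):
the per-runqueue data gains an integer that is kept in sync whenever a
CPU joins or leaves the runqueue, so load-balancing reads a plain
integer instead of calling cpumask_weight().

    struct csched2_runqueue_data {
        cpumask_t active;        /* CPUs enabled for this runqueue     */
        unsigned int nr_cpus;    /* == cpumask_weight(&active), cached */
        /* ... */
    };

    static void runq_add_cpu(struct csched2_runqueue_data *rqd,
                             unsigned int cpu)
    {
        __cpumask_set_cpu(cpu, &rqd->active);
        rqd->nr_cpus++;
    }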

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agox86: don't include {amd-,}iommu.h from fixmap.h
Jan Beulich [Tue, 30 Jul 2019 10:00:05 +0000 (12:00 +0200)]
x86: don't include {amd-,}iommu.h from fixmap.h

The #include was added by 0700c962ac ("Add AMD IOMMU support into
hypervisor") and I then didn't drop it again in d7f913b8de ("AMD IOMMU:
use ioremap()"); similarly for xen/iommu.h in 99321e0e6c ("VT-d: use
ioremap()"). Avoid needlessly re-building unrelated files when only
IOMMU definitions have changed.

Two #include-s of xen/init.h turn out necessary as replacement.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agodomain: stash xen_domctl_createdomain flags in struct domain
Paul Durrant [Tue, 30 Jul 2019 09:59:01 +0000 (11:59 +0200)]
domain: stash xen_domctl_createdomain flags in struct domain

These are the canonical source of data used to set various other flags.
If they are available directly in struct domain then the other flags are
no longer needed.

This patch simply copies the flags into a new 'options' field in
struct domain. Subsequent patches will do the related clean-up work.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoxen/arm64: head: Introduce print_reg
Julien Grall [Mon, 22 Jul 2019 21:39:28 +0000 (22:39 +0100)]
xen/arm64: head: Introduce print_reg

At the moment, the user should save x30/lr if it cares about it.

Follow-up patches will introduce more uses of putn in places where lr
should be preserved.

Furthermore, any user of putn should also move the value to register x0
if it was stored in a different register.

For convenience, a new macro is introduced to print a given register.
The macro will take care of moving the value to x0 for us and also of
preserving lr.

Lastly, the new macro is used to replace all the call sites of putn.
This will simplify rework/review later on.

Note that CurrentEL is now stored in x5 instead of x4 because the latter
will be clobbered by the macro print_reg.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Rework UART initialization on boot CPU
Julien Grall [Mon, 22 Jul 2019 21:39:27 +0000 (22:39 +0100)]
xen/arm64: head: Rework UART initialization on boot CPU

Anything executed after the label common_start can be executed on all
CPUs. However most of the instructions executed between the label
common_start and init_uart are not executed on the boot CPU.

The only instructions executed are those looking up the CPUID so it can
be printed on the console (if earlyprintk is enabled). Printing the
CPUID is not entirely useful for the boot CPU and requires a
conditional branch to bypass unused instructions.

Furthermore, the function init_uart is only called for the boot CPU,
requiring another conditional branch. This makes the code a bit tricky
to follow.

The UART initialization is now moved before the label common_start. This
now requires a slightly altered print for the boot CPU and setting the
early UART base address in each of the two paths (boot CPU and
secondary CPUs).

This has the nice effect of removing a couple of conditional branches in
the code.

After this rework, the CPUID is only used at the very beginning of the
secondary CPUs boot path. So there is no need to "reserve" x24 for the
CPUID.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Don't clobber x30/lr in the macro PRINT
Julien Grall [Mon, 22 Jul 2019 21:39:26 +0000 (22:39 +0100)]
xen/arm64: head: Don't clobber x30/lr in the macro PRINT

The current implementation of the macro PRINT will clobber x30/lr. This
means the user should save lr if it cares about it.

Follow-up patches will introduce more uses of PRINT in places where lr
should be preserved. Rather than requiring all the users to preserve
lr, the macro PRINT is modified to save and restore it.

While the comment states that x3 will be clobbered, this is not the
case. So PRINT will use x3 to preserve lr.

Lastly, take the opportunity to move the comment on top of PRINT and use
PRINT in init_uart. Both changes will be helpful in a follow-up patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Mark the end of subroutines with ENDPROC
Julien Grall [Mon, 22 Jul 2019 21:39:25 +0000 (22:39 +0100)]
xen/arm64: head: Mark the end of subroutines with ENDPROC

putn() and puts() are two subroutines. Add ENDPROC for the benefits of
static analysis tools and the reader.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: macros: Introduce an assembly macro to alias x30
Julien Grall [Mon, 22 Jul 2019 21:39:24 +0000 (22:39 +0100)]
xen/arm64: macros: Introduce an assembly macro to alias x30

The return address of a function is always stored in x30. For convenience,
introduce a register alias so "lr" can be used in assembly.

This is defined in asm-arm/arm64/macros.h to allow all assembly files
to use it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: SCTLR_EL1 is a 64-bit register on Arm64
Julien Grall [Tue, 23 Jul 2019 21:35:48 +0000 (22:35 +0100)]
xen/arm: SCTLR_EL1 is a 64-bit register on Arm64

On Arm64, system registers are always 64-bit including SCTLR_EL1.
However, Xen is assuming this is 32-bit because earlier revisions of
Armv8 had the top 32 bits as RES0 (see ARM DDI0595.b).

From Armv8.5, some bits in [63:32] will be defined and allowed to be
modified by the guest. So we would effectively reset those bits to 0
after each context switch. This means the guest may not function
correctly afterwards.

Rather than resetting to 0 the bits [63:32], preserve them across
context switch.

Note that the corresponding register on Arm32 (i.e. SCTLR) is always
32-bit. So we need to use register_t anywhere we deal with SCTLR{,_EL1}.

The outside interface is switched to use 64-bit to allow ABI
compatibility between 32-bit and 64-bit.

[Stefano: fix typo in commit message]

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/public: arch-arm: Restrict the visibility of struct vcpu_guest_core_regs
Julien Grall [Tue, 23 Jul 2019 21:35:47 +0000 (22:35 +0100)]
xen/public: arch-arm: Restrict the visibility of struct vcpu_guest_core_regs

Currently, the structure vcpu_guest_core_regs is part of the public API.
This implies that any change in the structure should be backward
compatible.

However, the structure is only needed by the tools and Xen. It is also
not expected to ever be used outside of that context. So we can save
ourselves some headache by only declaring the structure for Xen and the
tools.

[Stefano: improve comment code style]

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: irq: Don't use _IRQ_PENDING when handling host interrupt
Julien Grall [Sun, 2 Jun 2019 10:26:14 +0000 (11:26 +0100)]
xen/arm: irq: Don't use _IRQ_PENDING when handling host interrupt

While SPIs are shared between CPUs, it is not possible to receive the
same interrupt on a different CPU while the interrupt is in the active
state.

For host interrupts (i.e. routed to Xen), the deactivation of the
interrupt is done at the end of the handling. This can alternatively be
done outside of the handler by calling gic_set_active_state().

At the moment, gic_set_active_state() is only called by the vGIC for
interrupts routed to the guest. It is hard to find a reason for Xen to
directly play with the active state for interrupts routed to Xen.

To simplify the handling of host interrupts, gic_set_active_state() is
now restricted to interrupts routed to guests.

This means the _IRQ_PENDING logic is now unnecessary on Arm, as the
same interrupt can never come up while in the loop and nobody should
play with the flag behind our back.

[Stefano: improve in-code comment]

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/public: arch-arm: Use xen_mk_ullong instead of suffixing value with ULL
Julien Grall [Mon, 3 Jun 2019 16:08:29 +0000 (17:08 +0100)]
xen/public: arch-arm: Use xen_mk_ullong instead of suffixing value with ULL

There are a few places in include/public/arch-arm.h that are still
suffixing immediates with ULL instead of using xen_mk_ullong.

The latter allows a consumer to easily tweak the header if ULL is not
supported.

So switch the remaining users of ULL to xen_mk_ullong.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen: don't longjmp() after domain_crash() in check_wakeup_from_wait()
Juergen Gross [Mon, 29 Jul 2019 04:36:24 +0000 (06:36 +0200)]
xen: don't longjmp() after domain_crash() in check_wakeup_from_wait()

Continuing on the stack saved by __prepare_to_wait() on the wrong cpu
is rather dangerous.

Instead of doing so, just call the scheduler again, as already happens
in the similar case in __prepare_to_wait() when doing the setjmp()
would be wrong.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/arm: cpuerrata: Align a virtual address before unmap
Andrii Anisov [Thu, 18 Jul 2019 13:22:20 +0000 (16:22 +0300)]
xen/arm: cpuerrata: Align a virtual address before unmap

After the changes introduced by 9cc0618eb0 "xen/arm: mm: Sanity check
any update of Xen page tables" we are able to vmap/vunmap page-aligned
addresses only.

So if we add the in-page remainder of an address to the mapped virtual
address, we have to mask it out again before unmapping.
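
A hedged sketch of the pattern (the wrapper below is hypothetical, only
the masking matters): the pointer in use carries the in-page offset of
the source address, so strip that offset before handing the address
back to vunmap().

    static void unmap_remapped_data(void *va)
    {
        /* vunmap() only accepts the page-aligned address vmap() returned. */
        vunmap((void *)((unsigned long)va & PAGE_MASK));
    }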

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agotools: ipxe: update for fixing build with GCC9
Dario Faggioli [Fri, 26 Jul 2019 22:13:49 +0000 (00:13 +0200)]
tools: ipxe: update for fixing build with GCC9

Building with GCC9 (on openSUSE Tumbleweed) generates a lot of errors of
the "taking address of packed member of ... may result in an unaligned
pointer value" kind.

Updating to upstream commit 1dd56dbd11082 ("[build] Workaround compilation
error with gcc 9.1") seems to fix the problem.

For more info, see:

https://git.ipxe.org/ipxe.git/commit/1dd56dbd11082fb622c2ed21cfaced4f47d798a6

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agotools/libxl: Add iothread support for COLO
Zhang Chen [Fri, 26 Jul 2019 16:27:23 +0000 (00:27 +0800)]
tools/libxl: Add iothread support for COLO

Xen COLO and KVM COLO share lots of code in QEMU.
The colo-compare object in QEMU has required an 'iothread' property
since QEMU 2.11.

Detail:
https://wiki.qemu.org/Features/COLO

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
5 years agoRe-instate "xen/arm: fix mask calculation in pdx_init_mask"
Stefano Stabellini [Fri, 21 Jun 2019 20:20:25 +0000 (13:20 -0700)]
Re-instate "xen/arm: fix mask calculation in pdx_init_mask"

The commit 11911563610786615c2b3a01cdcaaf09a6f9e38d "xen/arm: fix mask
calculation in pdx_init_mask" was correct, but exposed a bug in
maddr_to_virt(). The bug in maddr_to_virt() was fixed by
612d476e74a314be514ee6a9744eea8db09d32e5 "xen/arm64: Correctly compute
the virtual address in maddr_to_virt()", so we can re-instate the
first commit now.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
5 years agoxen/arm64: Correctly compute the virtual address in maddr_to_virt()
Julien Grall [Thu, 18 Jul 2019 11:57:14 +0000 (12:57 +0100)]
xen/arm64: Correctly compute the virtual address in maddr_to_virt()

The helper maddr_to_virt() is used to translate a machine address to a
virtual address. To save some valuable address space, some part of the
machine address may be compressed.

In theory the PDX code is free to compress any bits, so there is no
guarantee that the machine index computed will always be greater than
xenheap_mfn_start. This would result in returning a virtual address that
is not part of the direct map and trigger a crash, at least on debug
builds, later on because of the check in virt_to_page().

A recently reverted patch (see 1191156361 "xen/arm: fix mask calculation
in pdx_init_mask") allows the PDX code to compress more bits and
triggered a crash on the AMD Seattle platform.

Avoid the crash by keeping track of the base PDX for the xenheap and
using it for computing the virtual address.
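
A hedged sketch of the resulting translation (xenheap_base_pdx stands
in for the new tracking variable; the other helpers exist in the Arm
code): the machine address is converted to a PDX, rebased against the
xenheap's first PDX, and only then turned into a direct map virtual
address.

    static inline void *maddr_to_virt(paddr_t ma)
    {
        return (void *)(XENHEAP_VIRT_START +
                        ((pfn_to_pdx(paddr_to_pfn(ma)) - xenheap_base_pdx)
                         << PAGE_SHIFT) +
                        (ma & ~PAGE_MASK));
    }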

Note that virt_to_maddr() does not need to have similar modification as
it is using the hardware to translate the virtual address to a machine
address.

Take the opportunity to fix the ASSERT(), as the direct map base address
corresponds to the start of the RAM (which is not always 0).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agosched: refactor code around vcpu_deassign() in null scheduler
Dario Faggioli [Fri, 26 Jul 2019 08:46:38 +0000 (10:46 +0200)]
sched: refactor code around vcpu_deassign() in null scheduler

vcpu_deassign() is called only once (in _vcpu_remove()).

Let's consolidate the two functions into one.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agoxen: merge temporary vcpu pinning scenarios
Juergen Gross [Fri, 26 Jul 2019 08:45:49 +0000 (10:45 +0200)]
xen: merge temporary vcpu pinning scenarios

Today there are two scenarios which pin vcpus temporarily to a single
physical cpu:

- wait_event() handling
- SCHEDOP_pin_override handling

Each of those cases is handled independently today, using its own
temporary cpumask to save the old affinity settings.

The two cases can be combined as the first case will only pin a vcpu to
the physical cpu it is already running on, while SCHEDOP_pin_override is
allowed to fail.

So merge the two temporary pinning scenarios by only using one cpumask
and a per-vcpu bitmask for specifying which of the scenarios is
currently active (they are both allowed to be active for the same vcpu).

Note that we don't need to call domain_update_node_affinity() as we
are only pinning for a brief period of time.
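
A hedged sketch of the merged interface (reason values and helper name
follow from the description above; exact details may differ): one saved
affinity mask plus a small per-vcpu reason bitmask, so both users can
be active for the same vcpu at the same time.

    #define VCPU_AFFINITY_OVERRIDE   0x01   /* SCHEDOP_pin_override */
    #define VCPU_AFFINITY_WAIT       0x02   /* wait_event() handling */

    int vcpu_temporary_affinity(struct vcpu *v, unsigned int cpu,
                                uint8_t reason)
    {
        if ( !v->affinity_broken )
            cpumask_copy(v->cpu_hard_affinity_saved, v->cpu_hard_affinity);

        v->affinity_broken |= reason;
        /* Pin to 'cpu'; clearing the last remaining reason restores the
           saved mask.  Error handling omitted. */
        return 0;
    }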

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoschedule: fix a comment misprint
Andrii Anisov [Fri, 26 Jul 2019 08:45:31 +0000 (10:45 +0200)]
schedule: fix a comment misprint

Fix the comment misprint, so it refers to the exact function name.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years agox86: optimize loading of GDT at context switch
Juergen Gross [Fri, 26 Jul 2019 08:43:42 +0000 (10:43 +0200)]
x86: optimize loading of GDT at context switch

Instead of dynamically deciding whether the previous vcpu was using the
full or the default GDT, just add a percpu variable for that purpose.
This at once removes the need to test twice whether the vcpu_ids differ.

This change improves performance by 0.5% - 1% on my test machine when
doing parallel compilation.
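
A hedged sketch of the approach (variable and helper names are
illustrative): a per-CPU flag remembers whether the full GDT is
currently loaded, so the context switch path compares two booleans
instead of comparing vcpu IDs twice.

    static DEFINE_PER_CPU(bool, full_gdt_loaded);

    static void update_gdt(const struct vcpu *next)
    {
        bool need_full = need_full_gdt(next->domain);

        if ( need_full != this_cpu(full_gdt_loaded) )
        {
            /* lgdt either the per-vcpu full GDT or Xen's default GDT. */
            this_cpu(full_gdt_loaded) = need_full;
        }
    }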

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agotboot: remove maintainers and declare orphaned
Roger Pau Monne [Thu, 25 Jul 2019 13:51:12 +0000 (15:51 +0200)]
tboot: remove maintainers and declare orphaned

Gang Wei's Intel email address has been bouncing for some time now, and
the other maintainer is non-responsive to patches [0], so remove
maintainers and declare INTEL(R) TRUSTED EXECUTION TECHNOLOGY (TXT)
orphaned.

[0] https://lists.xenproject.org/archives/html/xen-devel/2019-05/msg00563.html

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/dmi: Constify quirks data
Andrew Cooper [Wed, 24 Jul 2019 14:08:16 +0000 (15:08 +0100)]
x86/dmi: Constify quirks data

All DMI quirks tables are mutable, but are only ever read.

Update dmi_check_system() and dmi_system_id.callback to pass a const pointer,
and move all quirks tables into __initconstrel.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/dmi: Drop trivial callback functions
Andrew Cooper [Wed, 24 Jul 2019 14:05:16 +0000 (15:05 +0100)]
x86/dmi: Drop trivial callback functions

dmi_check_system() returns the number of matches.  Checking that this is
nonzero is more efficient than making a function pointer call to a
trivial function just to modify a variable.

No functional change, but this results in less compiled code, which is
also (fractionally) quicker to run.
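
A hedged before/after illustration (the table and flag names are
hypothetical):

    static bool __initdata quirk_active;   /* hypothetical flag */

    /* Before: a callback whose only job is to set the flag. */
    static int __init set_quirk(const struct dmi_system_id *d)
    {
        quirk_active = true;
        return 0;
    }

    /* After: use the match count returned by dmi_check_system() directly. */
    static void __init probe_quirks(void)
    {
        if ( dmi_check_system(quirk_table) )   /* quirk_table: hypothetical */
            quirk_active = true;
    }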

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86: Drop CONFIG_ACPI_SLEEP
Andrew Cooper [Wed, 24 Jul 2019 17:10:52 +0000 (18:10 +0100)]
x86: Drop CONFIG_ACPI_SLEEP

This option is hardcoded to 1, and the #ifdef-ary doesn't exclude wakeup.S,
which makes it useless code noise.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/dmi: Drop warning with an obsolete URL
Andrew Cooper [Wed, 24 Jul 2019 17:47:25 +0000 (18:47 +0100)]
x86/dmi: Drop warning with an obsolete URL

This quirk doesn't change anything in Xen, and the web page doesn't exist.

The wayback machine confirms that the link disappeared somewhere between
2003-06-14 and 2004-07-07.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/iommu: avoid mapping the interrupt address range for hwdom
Roger Pau Monné [Thu, 25 Jul 2019 10:17:34 +0000 (12:17 +0200)]
x86/iommu: avoid mapping the interrupt address range for hwdom

The current code only prevents mapping the lapic page into the guest
physical memory map. Expand the range to 0xFEEx_xxxx as described
in the Intel VT-d specification, section 3.13 "Handling Requests to
Interrupt Address Range".

AMD also lists this address range in the AMD SR5690 Databook, section
2.4.4 "MSI Interrupt Handling and MSI to HT Interrupt Conversion".

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agopassthrough/amd: Clean iommu_hap_pt_share enabled code
Alexandru Isaila [Thu, 25 Jul 2019 10:16:58 +0000 (12:16 +0200)]
passthrough/amd: Clean iommu_hap_pt_share enabled code

At this moment IOMMU pt sharing is disabled by commit [1].

This patch cleans up the unreachable code guarded by iommu_hap_pt_share.

[1] c2ba3db31ef2d9f1e40e7b6c16cf3be3d671d555

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoiommu / x86: move call to scan_pci_devices() out of vendor code
Paul Durrant [Thu, 25 Jul 2019 10:16:21 +0000 (12:16 +0200)]
iommu / x86: move call to scan_pci_devices() out of vendor code

It's not vendor specific so it doesn't really belong there.

Scanning the PCI topology also really doesn't have much to do with IOMMU
initialization. It doesn't depend on there even being an IOMMU. This patch
moves the call to the beginning of iommu_hardware_setup(), but only
places it there because the topology information would be otherwise unused.

Subsequent patches will actually make use of the PCI topology during
(x86) IOMMU initialization.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/IOMMU: don't restrict IRQ affinities to online CPUs
Jan Beulich [Thu, 25 Jul 2019 10:14:52 +0000 (12:14 +0200)]
x86/IOMMU: don't restrict IRQ affinities to online CPUs

In line with "x86/IRQ: desc->affinity should strictly represent the
requested value" the internally used IRQ(s) also shouldn't be restricted
to online ones. Make set_desc_affinity() (set_msi_affinity() then does
by implication) cope with a NULL mask being passed (just like
assign_irq_vector() does), and have IOMMU code pass NULL instead of
&cpu_online_map (when, for VT-d, there's no NUMA node information
available).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agox86/pv: Move async_exception_cleanup() into pv/iret.c
Andrew Cooper [Tue, 23 Jul 2019 19:46:35 +0000 (20:46 +0100)]
x86/pv: Move async_exception_cleanup() into pv/iret.c

All callers are in pv/iret.c.  Move the function and make it static.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/x86: cleanup unused NMI/MCE code
Juergen Gross [Wed, 24 Jul 2019 11:26:57 +0000 (13:26 +0200)]
xen/x86: cleanup unused NMI/MCE code

pv_raise_interrupt() is only called for NMIs these days, so the MCE
specific part can be removed. Rename pv_raise_interrupt() to
pv_raise_nmi() and NMI_MCE_SOFTIRQ to NMI_SOFTIRQ.

Additionally there is no need to pin the vcpu which the NMI is delivered
to; that is a leftover of (already removed) MCE handling. So remove the
pinning, too. Note that pinning was introduced by commit 355b0469a8
adding MCE support (with NMI support existing already). MCE using that
pinning was removed with commit 3a91769d6e again without cleaning up the
code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-and-tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agopassthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()
Andrew Cooper [Thu, 19 Oct 2017 10:50:18 +0000 (11:50 +0100)]
passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait()

DMA-ing to the stack is considered bad practice.  In this case, if a
timeout occurs because of a sluggish device which is processing the
request, the completion notification will corrupt the stack of a
subsequent deeper call tree.

Place the poll_slot in a percpu area and DMA to that instead.

Fix the declaration of saddr in struct qinval_entry, to avoid a shift by
two.  The requirement here is that the DMA address is dword aligned,
which is covered by poll_slot's type.

This change does not address other issues.  Correlating completions
after a timeout with their request is a more complicated change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/iommu: add comment regarding setting of need_sync
Roger Pau Monné [Tue, 23 Jul 2019 15:00:07 +0000 (17:00 +0200)]
x86/iommu: add comment regarding setting of need_sync

Clarify why relaxed hardware domains don't need iommu page-table
syncing.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agopci: switch pci_conf_write32 to use pci_sbdf_t
Roger Pau Monné [Tue, 23 Jul 2019 14:59:23 +0000 (16:59 +0200)]
pci: switch pci_conf_write32 to use pci_sbdf_t

This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.
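
A hedged sketch of the new shape of the interface (the call site below
is purely illustrative):

    void pci_conf_write32(pci_sbdf_t sbdf, unsigned int reg, uint32_t data);

    /* The segment/bus/slot/function quadruple collapses into one argument: */
    pci_conf_write32(PCI_SBDF(seg, bus, slot, func),
                     PCI_BASE_ADDRESS_0, bar);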

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agopci: switch pci_conf_write16 to use pci_sbdf_t
Roger Pau Monné [Tue, 23 Jul 2019 14:58:42 +0000 (16:58 +0200)]
pci: switch pci_conf_write16 to use pci_sbdf_t

This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agopci: switch pci_conf_write8 to use pci_sbdf_t
Roger Pau Monné [Tue, 23 Jul 2019 14:58:07 +0000 (16:58 +0200)]
pci: switch pci_conf_write8 to use pci_sbdf_t

This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agopci: switch pci_conf_read32 to use pci_sbdf_t
Roger Pau Monné [Tue, 23 Jul 2019 14:54:38 +0000 (16:54 +0200)]
pci: switch pci_conf_read32 to use pci_sbdf_t

This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.

While there convert {IGD/IOH}_DEV to be a pci_sbdf_t itself instead of
a device number.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agopci: switch pci_conf_read16 to use pci_sbdf_t
Roger Pau Monné [Tue, 23 Jul 2019 14:54:01 +0000 (16:54 +0200)]
pci: switch pci_conf_read16 to use pci_sbdf_t

This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agopci: switch pci_conf_read8 to use pci_sbdf_t
Roger Pau Monné [Tue, 23 Jul 2019 14:53:24 +0000 (16:53 +0200)]
pci: switch pci_conf_read8 to use pci_sbdf_t

This reduces the number of parameters of the function to two, and
simplifies some of the calling sites.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86emul: unconditionally deliver #UD for LWP insns
Jan Beulich [Tue, 23 Jul 2019 14:52:19 +0000 (16:52 +0200)]
x86emul: unconditionally deliver #UD for LWP insns

This is to accompany commit 91f86f8634 ("x86/svm: Drop support for AMD's
Lightweight Profiling").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/sched: fix locking in restore_vcpu_affinity()
Juergen Gross [Tue, 23 Jul 2019 09:20:55 +0000 (11:20 +0200)]
xen/sched: fix locking in restore_vcpu_affinity()

Commit 0763cd2687897b55e7 ("xen/sched: don't disable scheduler on cpus
during suspend") removed a lock in restore_vcpu_affinity() which needs
to stay: cpumask_scratch_cpu() must be protected by the scheduler
lock. restore_vcpu_affinity() is being called by thaw_domains(), so
with multiple domains in the system another domain might already be
running and the scheduler might make use of cpumask_scratch_cpu()
already.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/arm: remove unused dt_device_node parameter
Viktor Mitin [Tue, 18 Jun 2019 08:58:51 +0000 (11:58 +0300)]
xen/arm: remove unused dt_device_node parameter

Some of the functions generating nodes (e.g. make_timer_node) take a
dt_device_node parameter but never use it.
It is actually misused when creating the DT for a DomU.
So it is best to remove the parameter.

Suggested-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Viktor Mitin <viktor_mitin@epam.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
5 years agox86/crash: fix kexec transition breakage
Igor Druzhinin [Fri, 19 Jul 2019 13:07:48 +0000 (14:07 +0100)]
x86/crash: fix kexec transition breakage

Following 6ff560f7f ("x86/SMP: don't try to stop already stopped CPUs")
an incorrect condition was placed into the kexec transition path,
leaving the crashing CPU always online and breaking entry into the
kdump kernel. Correct it by unifying the condition with smp_send_stop().

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
5 years agoAMD/IOMMU: pass IOMMU to amd_iommu_alloc_intremap_table()
Jan Beulich [Mon, 22 Jul 2019 10:06:10 +0000 (12:06 +0200)]
AMD/IOMMU: pass IOMMU to amd_iommu_alloc_intremap_table()

The function will want to know IOMMU properties (specifically the IRTE
size) subsequently.

Correct the indentation of one of the call sites on this occasion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: pass IOMMU to iterate_ivrs_entries() callback
Jan Beulich [Mon, 22 Jul 2019 10:05:27 +0000 (12:05 +0200)]
AMD/IOMMU: pass IOMMU to iterate_ivrs_entries() callback

Both users will want to know IOMMU properties (specifically the IRTE
size) subsequently. Leverage this to avoid pointless calls to the
callback when IVRS mapping table entries are unpopulated. To avoid
leaking interrupt remapping tables (bogusly) allocated for IOMMUs
themselves, this requires suppressing their allocation in the first
place, taking a step further what commit 757122c0cf ('AMD/IOMMU: don't
"add" IOMMUs') had done.

Additionally suppress the call for alias entries, as again both users
don't care about these anyway. In fact this eliminates a fair bit of
redundancy from dump output.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: process softirqs while dumping IRTs
Jan Beulich [Mon, 22 Jul 2019 10:03:46 +0000 (12:03 +0200)]
AMD/IOMMU: process softirqs while dumping IRTs

When there are sufficiently many devices listed in the ACPI tables (no
matter if they actually exist), output may take way longer than the
watchdog would like.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: free more memory when cleaning up after error
Jan Beulich [Mon, 22 Jul 2019 09:59:01 +0000 (11:59 +0200)]
AMD/IOMMU: free more memory when cleaning up after error

The interrupt remapping in-use bitmaps were leaked in all cases. The
ring buffers and the mapping of the MMIO space were leaked for any IOMMU
that hadn't been enabled yet.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agox86/vLAPIC: avoid speculative out of bounds accesses
Jan Beulich [Mon, 22 Jul 2019 09:50:58 +0000 (11:50 +0200)]
x86/vLAPIC: avoid speculative out of bounds accesses

Array indexes used in the MSR read/write emulation functions as well as
the direct VMX / APIC-V hook are derived from guest controlled values.
Restrict their ranges to limit the side effects of speculative
execution.

Along these lines also constrain the vlapic_lvt_mask[] access.

Remove the unused vlapic_lvt_{vector,dm}() instead of adjusting them.

This is part of the speculative hardening effort.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: move {,_}clear_irq_vector()
Jan Beulich [Mon, 22 Jul 2019 09:48:08 +0000 (11:48 +0200)]
x86/IRQ: move {,_}clear_irq_vector()

This is largely to drop a forward declaration. There's one functional
change - clear_irq_vector() gets marked __init, as its only caller is
check_timer(). Beyond this only a few stray blanks get removed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: eliminate some on-stack cpumask_t instances
Jan Beulich [Mon, 22 Jul 2019 09:47:38 +0000 (11:47 +0200)]
x86/IRQ: eliminate some on-stack cpumask_t instances

Use scratch_cpumask where possible, to avoid creating these possibly
large stack objects. We can't use it in _assign_irq_vector() and
set_desc_affinity(), as these get called in IRQ context.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: tighten vector checks
Jan Beulich [Mon, 22 Jul 2019 09:47:06 +0000 (11:47 +0200)]
x86/IRQ: tighten vector checks

Use valid_irq_vector() rather than "> 0".

Also replace an open-coded use of IRQ_VECTOR_UNASSIGNED.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: drop redundant cpumask_empty() from move_masked_irq()
Jan Beulich [Mon, 22 Jul 2019 09:46:31 +0000 (11:46 +0200)]
x86/IRQ: drop redundant cpumask_empty() from move_masked_irq()

The subsequent cpumask_intersects() covers the "empty" case quite fine.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: make fixup_irqs() skip unconnected internally used interrupts
Jan Beulich [Mon, 22 Jul 2019 09:45:58 +0000 (11:45 +0200)]
x86/IRQ: make fixup_irqs() skip unconnected internally used interrupts

Since the "Cannot set affinity ..." warning is a one time one, avoid
triggering it already at boot time when parking secondary threads and
the serial console uses a (still unconnected at that time) PCI IRQ.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQs: correct/tighten vector check in _clear_irq_vector()
Jan Beulich [Mon, 22 Jul 2019 09:45:28 +0000 (11:45 +0200)]
x86/IRQs: correct/tighten vector check in _clear_irq_vector()

If any particular value was to be checked against, it would need to be
IRQ_VECTOR_UNASSIGNED.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Be more strict though and use valid_irq_vector() instead.

Take the opportunity and also convert local variables to unsigned int.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: target online CPUs when binding guest IRQ
Jan Beulich [Mon, 22 Jul 2019 09:44:50 +0000 (11:44 +0200)]
x86/IRQ: target online CPUs when binding guest IRQ

fixup_irqs() skips interrupts without action. Hence such interrupts can
retain affinity to just offline CPUs. With "noirqbalance" in effect,
pirq_guest_bind() so far would have left them alone, resulting in a non-
working interrupt.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: fix locking around vector management
Jan Beulich [Mon, 22 Jul 2019 09:44:02 +0000 (11:44 +0200)]
x86/IRQ: fix locking around vector management

All of __{assign,bind,clear}_irq_vector() manipulate struct irq_desc
fields, and hence ought to be called with the descriptor lock held in
addition to vector_lock. This is currently the case for only
set_desc_affinity() (in the common case) and destroy_irq(), which also
clarifies what the nesting behavior between the locks has to be.
Reflect the new expectation by having these functions all take a
descriptor as parameter instead of an interrupt number.

Also take care of the two special cases of calls to set_desc_affinity():
set_ioapic_affinity_irq() and VT-d's dma_msi_set_affinity() get called
directly as well, and in these cases the descriptor locks hadn't got
acquired till now. For set_ioapic_affinity_irq() this means acquiring /
releasing of the IO-APIC lock can be plain spin_{,un}lock() then.

Drop one of the two leading underscores from all three functions at
the same time.

There's one case left where descriptors get manipulated with just
vector_lock held: setup_vector_irq() assumes its caller to acquire
vector_lock, and hence can't itself acquire the descriptor locks (wrong
lock order). I don't currently see how to address this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com> [VT-d]
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: consolidate use of ->arch.cpu_mask
Jan Beulich [Mon, 22 Jul 2019 09:43:16 +0000 (11:43 +0200)]
x86/IRQ: consolidate use of ->arch.cpu_mask

Mixed meaning was implied so far by different pieces of code -
disagreement was in particular about whether to expect offline CPUs'
bits to possibly be set. Switch to a mostly consistent meaning
(exception being high priority interrupts, which would perhaps better
be switched to the same model as well in due course). Use the field to
record the vector allocation mask, i.e. potentially including bits of
offline (parked) CPUs. This implies that before passing the mask to
certain functions (most notably cpu_mask_to_apicid()) it needs to be
further reduced to the online subset.

The exception of high priority interrupts is also why for the moment
_bind_irq_vector() is left as is, despite looking wrong: It's used
exclusively for IRQ0, which isn't supposed to move off CPU0 at any time.

The prior lack of restricting to online CPUs in set_desc_affinity()
before calling cpu_mask_to_apicid() in particular allowed (in x2APIC
clustered mode) offlined CPUs to end up enabled in an IRQ's destination
field. (I wonder whether vector_allocation_cpumask_flat() shouldn't
follow a similar model, using cpu_present_map in favor of
cpu_online_map.)

For IO-APIC code it was definitely wrong to potentially store, as a
fallback, TARGET_CPUS (i.e. all online ones) into the field, as that
would have caused problems when determining on which CPUs to release
vectors when they've gone out of use. Disable interrupts instead when
no valid target CPU can be established (which code elsewhere should
guarantee to never happen), and log a message in such an unlikely event.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: desc->affinity should strictly represent the requested value
Jan Beulich [Mon, 22 Jul 2019 09:42:32 +0000 (11:42 +0200)]
x86/IRQ: desc->affinity should strictly represent the requested value

desc->arch.cpu_mask reflects the actual set of target CPUs. Don't ever
fiddle with desc->affinity itself, except to store caller requested
values. Note that assign_irq_vector() now takes a NULL incoming CPU mask
to mean "all CPUs" now, rather than just "all currently online CPUs".
This way no further affinity adjustment is needed after onlining further
CPUs.

This renders both set_native_irq_info() uses (which weren't using proper
locking anyway) redundant - drop the function altogether.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: deal with move cleanup count state in fixup_irqs()
Jan Beulich [Mon, 22 Jul 2019 09:41:55 +0000 (11:41 +0200)]
x86/IRQ: deal with move cleanup count state in fixup_irqs()

The cleanup IPI may get sent immediately before a CPU gets removed from
the online map. In such a case the IPI would get handled on the CPU
being offlined no earlier than in the interrupts disabled window after
fixup_irqs()' main loop. This is too late, however, because a possible
affinity change may incur the need for vector assignment, which will
fail when the IRQ's move cleanup count is still non-zero.

To fix this
- record the set of CPUs the cleanup IPI actually gets sent to alongside
  setting their count,
- adjust the count in fixup_irqs(), accounting for all CPUs that the
  cleanup IPI was sent to, but that are no longer online,
- bail early from the cleanup IPI handler when the CPU is no longer
  online, to prevent double accounting.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: deal with move-in-progress state in fixup_irqs()
Jan Beulich [Mon, 22 Jul 2019 09:41:02 +0000 (11:41 +0200)]
x86/IRQ: deal with move-in-progress state in fixup_irqs()

The flag being set may prevent affinity changes, as these often imply
assignment of a new vector. When there's no possible destination left
for the IRQ, the clearing of the flag needs to happen right from
fixup_irqs().

Additionally _assign_irq_vector() needs to avoid setting the flag when
there's no online CPU left in what gets put into ->arch.old_cpu_mask.
The old vector can be released right away in this case.

Also extend the log message about broken affinity to include the new
affinity as well, allowing to notice issues with affinity changes not
actually having taken place. Swap the if/else-if order there at the
same time to reduce the amount of conditions checked.

At the same time replace two open coded instances of the new helper
function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agotools/libxc: allow controlling the max C-state sub-state
Ross Lagerwall [Mon, 22 Jul 2019 09:35:19 +0000 (11:35 +0200)]
tools/libxc: allow controlling the max C-state sub-state

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Make handling in do_pm_op() more homogeneous: Before interpreting
op->cpuid as such, handle all operations not acting on a particular
CPU. Also expose the setting via xenpm.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: allow limiting the max C-state sub-state
Ross Lagerwall [Mon, 22 Jul 2019 09:34:32 +0000 (11:34 +0200)]
x86: allow limiting the max C-state sub-state

Allow limiting the max C-state sub-state by appending to the max_cstate
command-line parameter. E.g. max_cstate=1,0
The limit only applies to the highest legal C-state. For example:
 max_cstate = 1, max_csubstate = 0 ==> C0, C1 okay, but not C1E
 max_cstate = 1, max_csubstate = 1 ==> C0, C1 and C1E okay, but not C2
 max_cstate = 2, max_csubstate = 0 ==> C0, C1, C1E, C2 okay, but not C3
 max_cstate = 2, max_csubstate = 1 ==> C0, C1, C1E, C2 okay, but not C3

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/AMD: make C-state handling independent of Dom0
Jan Beulich [Mon, 22 Jul 2019 09:34:03 +0000 (11:34 +0200)]
x86/AMD: make C-state handling independent of Dom0

At least for more recent CPUs, following what BKDG / PPR suggest for the
BIOS to surface via ACPI we can make ourselves independent of Dom0
uploading respective data.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpuidle: really use C1 for "urgent" CPUs
Jan Beulich [Mon, 22 Jul 2019 09:32:20 +0000 (11:32 +0200)]
x86/cpuidle: really use C1 for "urgent" CPUs

For one, on recent AMD CPUs entering C1 (if available at all) requires
use of MWAIT, while HLT (i.e. default_idle()) would put the processor
into a state as deep as CC6. And then even on other vendors' CPUs we should
avoid entering default_idle() when the intended state can be reached
by using the active idle driver's facilities.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpuidle: switch to uniform meaning of "max_cstate="
Jan Beulich [Mon, 22 Jul 2019 09:31:38 +0000 (11:31 +0200)]
x86/cpuidle: switch to uniform meaning of "max_cstate="

While the MWAIT idle driver already takes it to mean an actual C state,
the ACPI idle driver so far used it as a list index. The list index,
however, is an implementation detail of Xen and affected by firmware
settings (i.e. not necessarily uniform for a particular system).

While touching this code also avoid invoking menu_get_trace_data()
when tracing is not active. For consistency do this also for the
MWAIT driver.

Note that I'm intentionally not adding any sorting logic to set_cx():
Before and after this patch we assume entries to arrive in order, so
this would be an orthogonal change.

Take the opportunity and add minimal documentation for the command line
option.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/shadow: ditch dangerous declarations
Jan Beulich [Mon, 22 Jul 2019 09:30:10 +0000 (11:30 +0200)]
x86/shadow: ditch dangerous declarations

This started out with me noticing the latent bug of there being HVM
related declarations in common.c that their producer doesn't see, and
that hence could go out of sync at any time. However, go farther than
fixing just that and move the functions actually using these into hvm.c.
This way the items in question can simply become static, and no separate
declarations are needed at all.

Within the moved code constify and rename or outright delete the struct
vcpu * local variables and re-format a comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/mtrr: Skip cache flushes on CPUs with cache self-snooping
Ricardo Neri [Fri, 19 Jul 2019 11:51:24 +0000 (13:51 +0200)]
x86/mtrr: Skip cache flushes on CPUs with cache self-snooping

Programming MTRR registers in multi-processor systems is a rather lengthy
process. Furthermore, all processors must program these registers in lock
step and with interrupts disabled; the process also involves flushing
caches and TLBs twice. As a result, the process may take a considerable
amount of time.

On some platforms, this can lead to a large skew of the refined-jiffies
clock source. Early when booting, if no other clock is available (e.g.,
booting with hpet=disabled), the refined-jiffies clock source is used to
monitor the TSC clock source. If the skew of refined-jiffies is too large,
Linux wrongly assumes that the TSC is unstable:

  clocksource: timekeeping watchdog on CPU1: Marking clocksource
               'tsc-early' as unstable because the skew is too large:
  clocksource: 'refined-jiffies' wd_now: fffedc10 wd_last:
               fffedb90 mask: ffffffff
  clocksource: 'tsc-early' cs_now: 5eccfddebc cs_last: 5e7e3303d4
               mask: ffffffffffffffff
  tsc: Marking TSC unstable due to clocksource watchdog

As per measurements, around 98% of the time needed by the procedure to
program MTRRs in multi-processor systems is spent flushing caches with
wbinvd(). As per Section 11.11.8 of the Intel 64 and IA-32
Architectures Software Developer's Manual, it is not necessary to flush
caches if the CPU supports cache self-snooping. Thus, skipping the cache
flushes can reduce by several tens of milliseconds the time needed to
complete the programming of the MTRR registers:

Platform                       Before    After
104-core (208 Threads) Skylake  1437ms      28ms
  2-core (  4 Threads) Haswell   114ms       2ms

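Schematically (illustrative stand-ins rather than Xen's actual helpers;
Xen's version uses alternatives patching instead of a runtime test, as
noted below), the flush policy becomes:

    #include <stdbool.h>

    /*
     * Simplified shape of the cache/TLB maintenance bracketing an MTRR
     * update.  On self-snooping CPUs the two expensive cache flushes may
     * be skipped; the TLB flushes are still required.
     */
    static void mtrr_update(bool self_snoop,
                            void (*wbinvd)(void),      /* flush caches */
                            void (*flush_tlb)(void),   /* flush TLBs   */
                            void (*program_mtrrs)(void))
    {
        if ( !self_snoop )
            wbinvd();          /* 1st flush, before reprogramming */
        flush_tlb();

        program_mtrrs();       /* done in lock step, interrupts off */

        if ( !self_snoop )
            wbinvd();          /* 2nd flush, after reprogramming */
        flush_tlb();
    }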
Reported-by: Mohammad Etemadi <mohammad.etemadi@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
[Linux commit fd329f276ecaad7a371d6f91b9bbea031d0c3440]

Use alternatives patching instead of static_cpu_has() (which we don't
have [yet]).

Interestingly we've been lacking the 2nd wbinvd(), which I'm taking the
liberty to add here.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata
Ricardo Neri [Fri, 19 Jul 2019 11:50:38 +0000 (13:50 +0200)]
x86/cpu/intel: Clear cache self-snoop capability in CPUs with known errata

From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Processors which have self-snooping capability can handle conflicting
memory types across CPUs by snooping their own caches. However, there
exist CPU models in which having conflicting memory types still leads to
unpredictable behavior, machine check errors, or hangs.

Clear this feature on affected CPUs to prevent its use.

Suggested-by: Alan Cox <alan.cox@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
[Linux commit 1e03bff3600101bd9158d005e4313132e55bdec8]

Strip Yonah - as per ark.intel.com it doesn't look to be 64-bit capable.
Call the new function on the boot CPU only. Don't clear the CPU feature
flag itself, as it is exposed to guests (who could otherwise observe it
disappear after migration).

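Schematically (a hedged sketch with stand-in names; the real model list
lives in the source, and the result is recorded internally rather than in
the guest-visible CPUID policy):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Called once, on the boot CPU: return whether this family 6 model is
     * on the list of parts where self-snooping must not be relied upon.
     */
    static bool self_snoop_unreliable(uint8_t family, uint8_t model,
                                      const uint8_t *bad_models, size_t nr)
    {
        size_t i;

        if ( family != 6 )
            return false;

        for ( i = 0; i < nr; i++ )
            if ( model == bad_models[i] )
                return true;    /* keep flushing caches around MTRR updates */

        return false;
    }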
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/mem_sharing: compile mem_sharing subsystem only when kconfig is enabled
Tamas K Lengyel [Fri, 19 Jul 2019 11:49:47 +0000 (13:49 +0200)]
x86/mem_sharing: compile mem_sharing subsystem only when kconfig is enabled

Disable it by default as it is only an experimental subsystem.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/mem_sharing: enable mem_share audit mode only in debug builds
Tamas K Lengyel [Fri, 19 Jul 2019 11:49:26 +0000 (13:49 +0200)]
x86/mem_sharing: enable mem_share audit mode only in debug builds

Improves performance for release builds.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
5 years agox86/mem_sharing: copy a page_lock version to be internal to memshr
Tamas K Lengyel [Fri, 19 Jul 2019 11:48:38 +0000 (13:48 +0200)]
x86/mem_sharing: copy a page_lock version to be internal to memshr

Patch cf4b30dca0a "Add debug code to detect illegal page_lock and put_page_type
ordering" added extra sanity checking to page_lock/page_unlock for debug builds
with the assumption that no hypervisor path ever locks two pages at once.

This assumption doesn't hold during memory sharing, so we copy a version
of page_lock/unlock to be used exclusively in the memory sharing
subsystem, without the sanity checks.

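For orientation, an approximate sketch of such a copy (it assumes Xen's
struct page_info, the PGT_* constants and cmpxchg(), and is not a
verbatim extract); the point is simply that the per-page lock is taken
without the debug-build assertion that no other page is locked by the
current CPU:

    /* Approximate shape only. */
    static bool _page_lock(struct page_info *pg)
    {
        unsigned long x, nx;

        do {
            while ( (x = pg->u.inuse.type_info) & PGT_locked )
                cpu_relax();
            nx = x + (1 | PGT_locked);   /* take a type ref plus the lock bit */
            if ( !(x & PGT_validated) ||
                 !(x & PGT_count_mask) ||
                 !(nx & PGT_count_mask) )
                return false;
        } while ( cmpxchg(&pg->u.inuse.type_info, x, nx) != x );

        /* Note: no check here that this CPU holds no other page lock. */
        return true;
    }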
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/mem_sharing: reorder when pages are unlocked and released
Tamas K Lengyel [Fri, 19 Jul 2019 11:47:17 +0000 (13:47 +0200)]
x86/mem_sharing: reorder when pages are unlocked and released

Calling _put_page_type while also holding the page_lock for that page
can cause a deadlock. There may be code-paths still in place where this
is an issue, but for normal sharing purposes this has been tested and
works.

The extra page reference previously grabbed at certain points is no
longer needed: with this reordering a reference is already held for as
long as necessary, so the extra one is redundant and gets dropped.

The comment being removed is out of date and hence incorrect.

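Schematically (helper names assumed from the surrounding mem_sharing code
rather than quoted from the patch), the release sequence becomes:

    static void release_shared_page(struct page_info *pg)
    {
        /* Drop the per-page lock first ... */
        mem_sharing_page_unlock(pg);
        /* ... and only then the type reference, so _put_page_type() is
         * never reached with the lock still held. */
        put_page_and_type(pg);
    }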
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/trace: Implement TRACE_?D() in a more efficient fashion
Andrew Cooper [Thu, 18 Jul 2019 15:24:42 +0000 (16:24 +0100)]
xen/trace: Implement TRACE_?D() in a more efficient fashion

These can easily be expressed with a variadic macro. No functional change.

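A hedged sketch of the variadic form (the tb_init_done guard and the
__trace_var() arguments follow the description in the next log entry; the
committed macros may differ in detail, and a zero-operand variant needs
separate handling since an empty initializer isn't valid C here):

    /* One variadic macro replacing the near-identical TRACE_1D()..TRACE_6D(). */
    #define TRACE_nD(event, ...)                                \
        do {                                                    \
            if ( unlikely(tb_init_done) )                       \
            {                                                   \
                uint32_t _d[] = { __VA_ARGS__ };                \
                __trace_var(event, true, sizeof(_d), _d);       \
            }                                                   \
        } while ( 0 )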
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agoxen/trace: Adjust types in function declarations
Andrew Cooper [Thu, 18 Jul 2019 13:41:48 +0000 (14:41 +0100)]
xen/trace: Adjust types in function declarations

Use uint32_t consistently for 'event', bool consistently for 'cycles',
and unsigned int consistently for 'extra'.

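Reconstructed from the description above (the exact in-tree prototype in
trace.h may differ slightly), the central declaration ends up looking
like:

    void __trace_var(uint32_t event, bool cycles, unsigned int extra,
                     const void *extra_data);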
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agoxen/trace: Add trace.h to MAINTAINERS
Andrew Cooper [Thu, 18 Jul 2019 16:53:03 +0000 (17:53 +0100)]
xen/trace: Add trace.h to MAINTAINERS

... to match the existing trace.c entry.

Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agolibxl_qmp: wait for completion of device removal
Chao Gao [Fri, 19 Jul 2019 09:24:08 +0000 (10:24 +0100)]
libxl_qmp: wait for completion of device removal

To remove a device from a domain, a qmp command is sent to qemu, but it
is handled by qemu asynchronously. Even if the qmp command is claimed to
be done, the actual handling on qemu's side may happen later.
This behavior raises two issues:
1. Attaching a device back to a domain right after detaching the device
from that domain would fail with an error:

libxl: error: libxl_qmp.c:341:qmp_handle_error_response: Domain 1:received an
error message from QMP server: Duplicate ID 'pci-pt-60_00.0' for device

2. Accesses to PCI configuration space in Qemu may overlap with a later
device reset issued by 'xl' or by pciback.

In order to avoid the issues mentioned above, wait for the completion of
device removal by querying all PCI devices via a qmp command and ensuring
the target device isn't listed. Retry only 5 times to avoid 'xl'
potentially being blocked by qemu.

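A simplified, self-contained sketch of the retry policy (the callbacks
are stand-ins for libxl's actual QMP query and timeout machinery):

    #include <stdbool.h>

    #define DEVICE_REMOVAL_RETRIES 5

    /*
     * 'still_listed' stands in for issuing a QMP query of all PCI devices
     * and looking for the removed device's ID; 'wait_a_bit' for re-arming
     * a timeout in libxl's event loop.
     */
    static int wait_for_device_removal(bool (*still_listed)(void *ctx),
                                       void (*wait_a_bit)(void *ctx), void *ctx)
    {
        int i;

        for (i = 0; i < DEVICE_REMOVAL_RETRIES; i++) {
            if (!still_listed(ctx))
                return 0;       /* gone: safe to reset or re-attach */
            wait_a_bit(ctx);
        }

        return -1;              /* give up rather than block 'xl' on qemu */
    }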
Signed-off-by: Chao Gao <chao.gao@intel.com>
Message-Id: <1562133373-19208-1-git-send-email-chao.gao@intel.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
5 years agogolang/xenlight: Fixing compilation for go 1.11
Daniel P. Smith [Thu, 18 Jul 2019 21:11:44 +0000 (22:11 +0100)]
golang/xenlight: Fixing compilation for go 1.11

This deals with two casting issues for compiling under go 1.11:
- explicitly cast to *C.xentoollog_logger for Ctx.logger pointer
- add cast to unsafe.Pointer for the C string cpath

Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agoMAINTAINERS: Make myself libxl golang binding maintainer
George Dunlap [Mon, 8 Jul 2019 10:56:24 +0000 (06:56 -0400)]
MAINTAINERS: Make myself libxl golang binding maintainer

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agox86emul: Ignore ssse3-{aes,pclmul}.[ch] as well
Andrew Cooper [Thu, 18 Jul 2019 15:09:27 +0000 (16:09 +0100)]
x86emul: Ignore ssse3-{aes,pclmul}.[ch] as well

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/trace: Fix build with !CONFIG_TRACEBUFFER
Andrew Cooper [Thu, 18 Jul 2019 13:29:35 +0000 (14:29 +0100)]
xen/trace: Fix build with !CONFIG_TRACEBUFFER

GCC reports:

In file included from hvm.c:24:0:
/local/xen.git/xen/include/xen/trace.h: In function ‘tb_control’:
/local/xen.git/xen/include/xen/trace.h:60:13: error: ‘ENOSYS’
undeclared (first use in this function)
     return -ENOSYS;
             ^~~~~~

Include xen/errno.h to resolve the issue.  While tweaking this, add comments
to the #else and #endif, as they are a fair distance apart.

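The shape of the fix, paraphrased (the stub's parameter type is elided in
this sketch; see xen/include/xen/trace.h for the real declarations):

    #include <xen/errno.h>        /* for the -ENOSYS returns in the stubs */

    #ifdef CONFIG_TRACEBUFFER
    /* ... full tracing interface ... */
    #else /* !CONFIG_TRACEBUFFER */

    static inline int tb_control(void *tbc /* type elided in this sketch */)
    {
        return -ENOSYS;
    }

    #endif /* CONFIG_TRACEBUFFER */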
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/mm: Provide more useful information in diagnostics
Andrew Cooper [Sat, 13 Apr 2019 21:03:05 +0000 (22:03 +0100)]
x86/mm: Provide more useful information in diagnostics

 * alloc_l?_table() should identify the failure, not just state that there is
   one.
 * get_page() should use %pd for the two domains, to render system domains in
   a more obvious way (see the sketch below).

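Illustrative use of %pd (the message text here is made up; %pd renders a
struct domain pointer compactly, including system domains):

    gdprintk(XENLOG_WARNING,
             "Could not get page ref for mfn %"PRI_mfn": owner %pd, caller %pd\n",
             mfn_x(page_to_mfn(page)), page_get_owner(page), current->domain);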
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86emul: add a PCLMUL/VPCLMUL test case to the harness
Jan Beulich [Wed, 17 Jul 2019 13:46:08 +0000 (15:46 +0200)]
x86emul: add a PCLMUL/VPCLMUL test case to the harness

Also use this for AVX512_VBMI2 VPSH{L,R}D{,V}{D,Q,W} testing (only the
quad word right shifts get actually used; the assumption is that their
"left" counterparts as well as the double word and word forms then work
as well).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: add a SHA test case to the harness
Jan Beulich [Wed, 17 Jul 2019 13:45:34 +0000 (15:45 +0200)]
x86emul: add a SHA test case to the harness

Also use this for AVX512VL VPRO{L,R}{,V}D as well as some further shifts
testing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: add an AES/VAES test case to the harness
Jan Beulich [Wed, 17 Jul 2019 13:44:54 +0000 (15:44 +0200)]
x86emul: add an AES/VAES test case to the harness

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: restore ordering within main switch statement
Jan Beulich [Wed, 17 Jul 2019 13:43:57 +0000 (15:43 +0200)]
x86emul: restore ordering within main switch statement

Incremental additions and/or mistakes have led to some code blocks
sitting in "unexpected" places. Re-sort the case blocks (opcode space;
major opcode; 66/F3/F2 prefix; legacy/VEX/EVEX encoding).

As an exception the opcode space 0x0f EVEX-encoded VPEXTRW is left at
its current place, to keep it close to the "pextr" label.

Pure code movement.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support GFNI insns
Jan Beulich [Wed, 17 Jul 2019 13:43:06 +0000 (15:43 +0200)]
x86emul: support GFNI insns

As to the feature dependency adjustment, while strictly speaking SSE is
a sufficient prereq (to have XMM registers), vectors of bytes and qwords
were introduced only with SSE2. gcc, for example, uses a similar
connection in its respective intrinsics header.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support VAES insns
Jan Beulich [Wed, 17 Jul 2019 13:41:58 +0000 (15:41 +0200)]
x86emul: support VAES insns

As to the feature dependency adjustment, just like for VPCLMULQDQ, while
strictly speaking AVX is a sufficient prereq (to have YMM registers),
256-bit vectors of integers were only fully introduced with AVX2.

A new test case (also covering AESNI) will be added to the harness by a
subsequent patch.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support VPCLMULQDQ insns
Jan Beulich [Wed, 17 Jul 2019 13:41:20 +0000 (15:41 +0200)]
x86emul: support VPCLMULQDQ insns

As to the feature dependency adjustment, while strictly speaking AVX is
a sufficient prereq (to have YMM registers), 256-bit vectors of integers
were only fully introduced with AVX2. Sadly gcc can't be used as a
reference here: it doesn't provide any AVX512-independent built-in at
all.

Along the lines of PCLMULQDQ, since the insns here and in particular
their memory access patterns follow the usual scheme, I didn't think it
was necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support AVX512_VNNI insns
Jan Beulich [Wed, 17 Jul 2019 13:40:42 +0000 (15:40 +0200)]
x86emul: support AVX512_VNNI insns

Along the lines of the 4FMAPS case, convert the 4VNNIW-based table
entries to a decoder adjustment. Because of the current sharing of table
entries between different (implied) opcode prefixes and with the same
major opcodes being used for vp4dpwssd{,s}, which have a different
memory operand size and different Disp8 scaling, the pre-existing table
entries get converted to a decoder override. The table entries will now
represent the insns here, in line with other table entries preferably
representing the prefix-66 insns.

As in a few cases before, since the insns here and in particular their
memory access patterns follow the usual scheme, I didn't think it was
necessary to add a contrived test specifically for them, beyond the
Disp8 scaling one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: support AVX512_4VNNIW insns
Jan Beulich [Wed, 17 Jul 2019 13:39:54 +0000 (15:39 +0200)]
x86emul: support AVX512_4VNNIW insns

As in a few cases before, since the insns here and in particular their
memory access patterns follow the AVX512_4FMAPS scheme, I didn't think
it was necessary to add contrived tests specifically for them, beyond
the Disp8 scaling ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>