xenbits.xensource.com Git - people/royger/xen.git/log
3 years ago cirrus: update FreeBSD to 12.3 update-freebsd gitlab/update-freebsd
Roger Pau Monne [Tue, 22 Feb 2022 13:36:30 +0000 (14:36 +0100)]
cirrus: update FreeBSD to 12.3

Switch from using a FreeBSD 12.2 to a 12.3 image.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago CI/Coverity: Do not build QEMU, SeaBIOS or OVMF
Roger Pau Monne [Fri, 18 Feb 2022 12:00:42 +0000 (13:00 +0100)]
CI/Coverity: Do not build QEMU, SeaBIOS or OVMF

Such external projects should have their own Coverity runs, and
there's not much point in also making them part of our scan (apart
from greatly increasing the amount of code scanned).

Trim the dependencies now that QEMU is not built.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago CI: add github workflow to run Coverity scans
Roger Pau Monne [Fri, 18 Feb 2022 12:00:41 +0000 (13:00 +0100)]
CI: add github workflow to run Coverity scans

Add a workflow that performs a build like the one done by the osstest
Coverity flight and uploads the result to Coverity for analysis. The
build process is exactly the same as the one currently used in
osstest, and it's also run at the same time (bi-weekly).

This has one big benefit over using osstest: we no longer have to care
about keeping the Coverity tools up to date in osstest.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago CI: Clean up alpine containers
Andrew Cooper [Thu, 17 Feb 2022 21:16:35 +0000 (21:16 +0000)]
CI: Clean up alpine containers

 * `apk --no-cache` is the preferred way of setting up containers, and it does
   shrink the image by a few MB.
 * Neither container needs curl-dev or automake.
 * Flex and bison are needed for Xen, so move to the Xen block.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago CI: Add gnu grep to alpine containers
Andrew Cooper [Tue, 15 Feb 2022 20:49:10 +0000 (20:49 +0000)]
CI: Add gnu grep to alpine containers

A forthcoming change is going to want more support than busybox's grep can
provide.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago x86: replace a few do_div() uses
Jan Beulich [Fri, 18 Feb 2022 13:47:25 +0000 (14:47 +0100)]
x86: replace a few do_div() uses

When the macro's "return value" is not used, the macro use can be
replaced by a simple division, avoiding some obfuscation.

According to my observations, no change to generated code.
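
As a rough illustration of the point above (a sketch, not Xen's actual
macro): a do_div()-style helper divides a 64-bit value in place and hands
back the remainder. All names below are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a do_div()-style helper: divides *n in place
 * by a 32-bit divisor and returns the remainder.
 */
static inline uint32_t do_div_sketch(uint64_t *n, uint32_t base)
{
    assert(n != 0 && base != 0);

    uint32_t rem = (uint32_t)(*n % base);

    *n /= base;
    return rem;
}

/* Remainder unused: the helper only obscures a plain division. */
static inline uint64_t ns_to_us_with_helper(uint64_t ns)
{
    (void)do_div_sketch(&ns, 1000);
    return ns;
}

/* Equivalent result, and easier to read. */
static inline uint64_t ns_to_us_plain(uint64_t ns)
{
    return ns / 1000;
}
```

Both conversion functions return the same quotient; only the first one
hides the division behind a side-effecting call.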

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago video/vesa: adjust (not just) command line option handling
Jan Beulich [Fri, 18 Feb 2022 13:46:27 +0000 (14:46 +0100)]
video/vesa: adjust (not just) command line option handling

Document the remaining option. Add section annotation to the variable
holding the parsed value as well as a few adjacent ones. Adjust the
types of font_height and vga_compat.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago video/vesa: drop "vesa-remap" command line option
Jan Beulich [Fri, 18 Feb 2022 13:45:45 +0000 (14:45 +0100)]
video/vesa: drop "vesa-remap" command line option

If we get mode dimensions wrong, having the remapping size controllable
via command line option isn't going to help much. Drop the option.

While adjusting this also
- add __initdata to the variable,
- use ROUNDUP() instead of open-coding it.
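
For readers unfamiliar with the idiom, the difference between an open-coded
round-up and a ROUNDUP()-style macro can be sketched as follows. The macro
and function names are illustrative, and the 4096-byte granularity is an
assumption, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative power-of-two round-up macro; not copied from Xen. */
#define ROUNDUP_SKETCH(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Assumed granularity for the example only. */
#define GRANULE_SKETCH ((size_t)4096)

/* Open-coded form: the intent has to be decoded from the arithmetic. */
static inline size_t size_open_coded(size_t bytes)
{
    return (bytes + GRANULE_SKETCH - 1) & ~(GRANULE_SKETCH - 1);
}

/* Macro form: same arithmetic, but the name states the intent. */
static inline size_t round_up_pow2(size_t x, size_t align)
{
    /* The mask trick is only valid for power-of-two alignments. */
    assert(align != 0 && (align & (align - 1)) == 0);
    return ROUNDUP_SKETCH(x, align);
}
```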

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago video/vesa: drop "vesa-mtrr" command line option
Jan Beulich [Fri, 18 Feb 2022 13:45:14 +0000 (14:45 +0100)]
video/vesa: drop "vesa-mtrr" command line option

Now that we use ioremap_wc() for mapping the frame buffer, there's no
need for this option anymore. As noted in the change introducing the
use of ioremap_wc(), mtrr_add() didn't work in certain cases anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago video/vesa: unmap frame buffer when relinquishing console
Jan Beulich [Fri, 18 Feb 2022 13:44:32 +0000 (14:44 +0100)]
video/vesa: unmap frame buffer when relinquishing console

There's no point in keeping the VA space occupied when no further output
will occur.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago x86: move .text.kexec
Jan Beulich [Fri, 18 Feb 2022 13:43:58 +0000 (14:43 +0100)]
x86: move .text.kexec

The source file requests page alignment - avoid a padding hole by
placing it right after .text.entry. On average this yields a .text size
reduction of 2k.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago x86: introduce ioremap_wc()
Jan Beulich [Fri, 18 Feb 2022 13:42:39 +0000 (14:42 +0100)]
x86: introduce ioremap_wc()

In order for a to-be-introduced ERMS form of memcpy() to not regress
boot performance on certain systems when video output is active, we
first need to arrange for avoiding further dependency on firmware
setting up MTRRs in a way we can actually further modify. On many
systems, due to the continuously growing amounts of installed memory,
MTRRs get configured with at least one huge WB range, and with MMIO
ranges below 4Gb then forced to UC via overlapping MTRRs. mtrr_add(), as
it is today, can't deal with such a setup. Hence on such systems we
presently leave the frame buffer mapped UC, leading to significantly
reduced performance when using REP STOSB / REP MOVSB.

On post-PentiumII hardware (i.e. any that's capable of running 64-bit
code), an effective memory type of WC can be achieved without MTRRs, by
simply referencing the respective PAT entry from the PTEs. While this
will leave the switch to ERMS forms of memset() and memcpy() with
largely unchanged performance, the change here on its own improves
performance on affected systems quite significantly: Measuring just the
individual affected memcpy() invocations yielded a speedup by a factor
of over 250 on my initial (Skylake) test system. memset() isn't getting
improved by as much there, but still by a factor of about 20.

While adding {__,}PAGE_HYPERVISOR_WC, also add {__,}PAGE_HYPERVISOR_WT
to, at the very least, make clear what PTE flags this memory type uses.
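
To make "referencing the respective PAT entry from the PTEs" a little more
concrete, here is a minimal sketch of the architectural lookup for 4KiB
PTEs: the PAT, PCD and PWT bits form a 3-bit index into the eight entries
of the PAT MSR. The PAT value used below is a made-up layout with WC in
entry 4; it is an assumption for illustration and not necessarily what Xen
programs.

```c
#include <assert.h>
#include <stdint.h>

/* Cache-control bits of a 4KiB x86 page table entry. */
#define PTE_PWT (UINT64_C(1) << 3)
#define PTE_PCD (UINT64_C(1) << 4)
#define PTE_PAT (UINT64_C(1) << 7)

/* Architectural memory type encodings used in PAT entries. */
enum { MT_UC = 0, MT_WC = 1, MT_WT = 4, MT_WP = 5, MT_WB = 6, MT_UC_MINUS = 7 };

/* Hypothetical PAT MSR value: power-on default, but entry 4 set to WC. */
#define PAT_SKETCH UINT64_C(0x0007040100070406)

/* Combine PAT:PCD:PWT into the 3-bit PAT entry index. */
static inline unsigned int pat_index(uint64_t pte)
{
    unsigned int idx = 0;

    if (pte & PTE_PWT)
        idx |= 1;
    if (pte & PTE_PCD)
        idx |= 2;
    if (pte & PTE_PAT)
        idx |= 4;
    return idx;
}

/* Extract the memory type stored in PAT entry idx. */
static inline unsigned int pat_memory_type(uint64_t pat_msr, unsigned int idx)
{
    assert(idx < 8);
    return (unsigned int)((pat_msr >> (idx * 8)) & 0x7);
}
```

With this made-up layout, a PTE that sets only the PAT bit selects entry 4
and therefore gets WC, with no MTRR involved.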

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago IOMMU/PCI: propagate get_device_group_id() failure
Jan Beulich [Fri, 18 Feb 2022 13:19:42 +0000 (14:19 +0100)]
IOMMU/PCI: propagate get_device_group_id() failure

The VT-d hook can indicate an error, which shouldn't be ignored. Convert
the hook's return value to a proper error code, and let that bubble up.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years ago VT-d: replace flush_all_cache()
Jan Beulich [Fri, 18 Feb 2022 13:18:51 +0000 (14:18 +0100)]
VT-d: replace flush_all_cache()

Let's use infrastructure we have available instead of an open-coded
wbinvd() invocation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years ago VT-d / x86: re-arrange cache syncing
Jan Beulich [Fri, 18 Feb 2022 13:18:01 +0000 (14:18 +0100)]
VT-d / x86: re-arrange cache syncing

The actual function should always have lived in core x86 code; move it
there, replacing get_cache_line_size() by readily available (except very
early during boot; see the code comment) data. Also rename the function.

Drop the respective IOMMU hook, (re)introducing a respective boolean
instead. Replace a true and an almost open-coding instance of
iommu_sync_cache().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years ago x86/cpuid: add CPUID flag for Extended Destination ID support
Roger Pau Monné [Fri, 18 Feb 2022 08:17:47 +0000 (09:17 +0100)]
x86/cpuid: add CPUID flag for Extended Destination ID support

Introduce the CPUID flag to be used in order to signal the support for
using an extended destination ID in IO-APIC RTEs and MSI address
fields. This format expands the maximum target APIC ID from 255 to
32767 (15 bits) without requiring the usage of interrupt remapping.

The design document describing the feature can be found at:

http://david.woodhou.se/15-bit-msi.pdf
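
A minimal sketch of how such a 15-bit destination ID could be packed into
the low MSI address word, based on my reading of the design document: the
low 8 bits of the ID stay in bits 19:12, and the upper 7 bits go into the
previously reserved bits 11:5. The function and macro names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MSI_ADDR_BASE_SKETCH   UINT32_C(0xFEE00000)
#define EXT_DEST_ID_MAX_SKETCH UINT32_C(0x7FFF)   /* 15 bits */

/* Build the low MSI address word for an extended destination ID. */
static inline uint32_t msi_addr_ext_dest(uint32_t dest_id)
{
    assert(dest_id <= EXT_DEST_ID_MAX_SKETCH);

    return MSI_ADDR_BASE_SKETCH |
           ((dest_id & 0xFF) << 12) |  /* bits 19:12: ID bits 7:0  */
           ((dest_id >> 8) << 5);      /* bits 11:5:  ID bits 14:8 */
}
```

IDs up to 255 produce the same address as the classic format, which is
what allows the feature to be negotiated via a CPUID flag.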

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago tools/libxl: don't allow IOMMU usage with PoD
Roger Pau Monné [Fri, 18 Feb 2022 08:03:08 +0000 (09:03 +0100)]
tools/libxl: don't allow IOMMU usage with PoD

Prevent libxl from creating guests that attempt to use PoD together
with an IOMMU, even if no devices are actually assigned.

While the hypervisor could support using PoD together with an IOMMU as
long as no devices are assigned, such usage seems doubtful. There's no
guarantee that PoD is no longer active in the guest, and thus a later
assignment of a PCI device to such a domain could fail.

Preventing the usage of PoD together with an IOMMU at guest creation
avoids having to add checks for active PoD entries in the device
assignment paths.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago tools/xenstore: add error indicator to ring page
Juergen Gross [Fri, 18 Feb 2022 08:02:48 +0000 (09:02 +0100)]
tools/xenstore: add error indicator to ring page

If Xenstore detects a malicious ring page modification (e.g.
an invalid producer or consumer index set by a guest) it will ignore
the connection of that guest in future.

Add a new error field to the ring page indicating that case. Add a new
feature bit in order to signal the presence of that error field.

Move the ignore_connection() function to xenstored_domain.c in order
to be able to access the ring page for setting the error indicator.
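
The idea can be sketched roughly as below. This is not the xenstored code:
the structure layout, the ring size, the error value and all names are
assumptions made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE_SKETCH          1024u
#define RING_ERROR_CORRUPT_SKETCH 1u     /* hypothetical error code */

/* Hypothetical, heavily simplified view of a shared ring page. */
struct ring_page_sketch {
    uint32_t cons;   /* consumer index */
    uint32_t prod;   /* producer index */
    uint32_t error;  /* 0 means "no error reported" */
};

/* Indices are free-running; they are valid if at most one ring apart. */
static inline bool ring_indexes_valid(uint32_t cons, uint32_t prod)
{
    return (uint32_t)(prod - cons) <= RING_SIZE_SKETCH;
}

/* Returns false and records the error in the page if the ring is bogus. */
static inline bool check_ring_sketch(struct ring_page_sketch *ring)
{
    assert(ring != NULL);

    if (ring_indexes_valid(ring->cons, ring->prod))
        return true;

    ring->error = RING_ERROR_CORRUPT_SKETCH;
    return false;
}
```

Writing the error into the shared page lets the guest notice why its
connection has gone silent, rather than waiting indefinitely.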

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago x86/console: process softirqs between warning prints
Roger Pau Monné [Fri, 18 Feb 2022 08:02:16 +0000 (09:02 +0100)]
x86/console: process softirqs between warning prints

Process softirqs while printing end of boot warnings. Each warning can
be several lines long, and on slow consoles printing multiple ones
without processing softirqs can result in the watchdog triggering:

(XEN) [   22.277806] ***************************************************
(XEN) [   22.417802] WARNING: CONSOLE OUTPUT IS SYNCHRONOUS
(XEN) [   22.556029] This option is intended to aid debugging of Xen by ensuring
(XEN) [   22.696802] that all output is synchronously delivered on the serial line.
(XEN) [   22.838024] However it can introduce SIGNIFICANT latencies and affect
(XEN) [   22.978710] timekeeping. It is NOT recommended for production use!
(XEN) [   23.119066] ***************************************************
(XEN) [   23.258865] Booted on L1TF-vulnerable hardware with SMT/Hyperthreading
(XEN) [   23.399560] enabled.  Please assess your configuration and choose an
(XEN) [   23.539925] explicit 'smt=<bool>' setting.  See XSA-273.
(XEN) [   23.678860] ***************************************************
(XEN) [   23.818492] Booted on MLPDS/MFBDS-vulnerable hardware with SMT/Hyperthreading
(XEN) [   23.959811] enabled.  Mitigations will not be fully effective.  Please
(XEN) [   24.100396] choose an explicit smt=<bool> setting.  See XSA-297.
(XEN) [   24.240254] *************************************************(XEN) [   24.247302] Watchdog timer detects that CPU0 is stuck!
(XEN) [   24.386785] ----[ Xen-4.17-unstable  x86_64  debug=y  Tainted:   C    ]----
(XEN) [   24.527874] CPU:    0
(XEN) [   24.662422] RIP:    e008:[<ffff82d04025b84a>] drivers/char/ns16550.c#ns16550_tx_ready+0x3a/0x90
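
The shape of the fix can be sketched as follows, under the assumption that
warnings are stored as an array of strings. The softirq hook is stubbed
out with a counter; every name here is hypothetical and not the actual
Xen function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Counts how often the (stubbed) softirq processing ran. */
static unsigned int softirq_runs_sketch;

/* Hypothetical stand-in for the hypervisor's softirq processing hook. */
static void process_softirqs_sketch(void)
{
    softirq_runs_sketch++;
}

/* Print each multi-line warning, then let pending work run before the next. */
static void print_warnings_sketch(const char *const *warnings, size_t nr)
{
    assert(warnings != NULL || nr == 0);

    for (size_t i = 0; i < nr; i++) {
        fputs(warnings[i], stdout);
        process_softirqs_sketch();
    }
}
```

Yielding once per warning bounds the time spent without servicing pending
work to the cost of printing a single warning on a slow console.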

Fixes: ee3fd57acd ('xen: add warning infrastructure')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago rwlock: remove unneeded subtraction
Roger Pau Monné [Fri, 18 Feb 2022 08:01:27 +0000 (09:01 +0100)]
rwlock: remove unneeded subtraction

There's no need to subtract _QR_BIAS from the lock value for storing
in the local cnts variable in the read lock slow path: the users of
the value in cnts only care about the writer-related bits and use a
mask to get the value.

Note that further settings of cnts in rspin_until_writer_unlock already
do not subtract _QR_BIAS.

Originally _QR_BIAS was subtracted from the result of
atomic_add_return_acquire in order to prevent GCC from emitting an
unneeded ADD instruction. This being in the lock slow path such
optimizations don't seem likely to make any relevant performance
difference. Also modern GCC and CLANG versions will already avoid
emitting the ADD instruction.
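
Why the subtraction cannot matter to the users of cnts can be shown with a
small sketch. The bit layout below is an assumption chosen for brevity
(writer bits at the bottom, reader count above) and does not reproduce
Xen's actual constants; the argument only needs the reader bias to be a
power of two above every writer bit.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only: two writer bits, reader count above them. */
#define QW_WMASK_SKETCH UINT32_C(0x3)
#define QR_SHIFT_SKETCH 2
#define QR_BIAS_SKETCH  (UINT32_C(1) << QR_SHIFT_SKETCH)

/* What the slow path actually looks at: the writer-related bits. */
static inline uint32_t writer_bits(uint32_t cnts)
{
    return cnts & QW_WMASK_SKETCH;
}

/* Subtracting the reader bias never changes bits below the bias. */
static inline int bias_is_irrelevant_sketch(uint32_t cnts)
{
    assert(QR_BIAS_SKETCH > QW_WMASK_SKETCH);
    return writer_bits(cnts) == writer_bits(cnts - QR_BIAS_SKETCH);
}
```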

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
3 years ago x86/Intel: re-indent family 6 switch() in intel_log_freq()
Jan Beulich [Fri, 18 Feb 2022 08:00:10 +0000 (09:00 +0100)]
x86/Intel: re-indent family 6 switch() in intel_log_freq()

This was left at its previous indentation by e6e3cf191d37 ("x86/Intel:
also display CPU freq for family 0xf") to ease review. Remove the now
unnecessary level of indentation.

No functional change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago build: fix enforce unique symbols for recent clang version
Anthony PERARD [Fri, 18 Feb 2022 07:59:03 +0000 (08:59 +0100)]
build: fix enforce unique symbols for recent clang version

clang 6.0 and newer behave like gcc with regard to the FILE symbol, storing
only the filename rather than the full path to the source file.

clang 3.8.1-24 (in our debian:stretch container) and 3.5.0-10
(in our debian:jessie container) do store the full path to the source
file in the FILE symbol.

Also we have commit 81ecb38b83 ("build: provide option to disambiguate
symbol names") which was using clang 5, and LLVM's commit f5040b9685a7
[1] ("Make .file directive to have basename only") which is part of
the "llvmorg-6.0.0" tag but not the "release/5.x" branch. Both suggest that
clang's change of behavior happened with clang 6.0.

This means that we also need to check the clang version to figure out
which command we need to use to redefine symbols.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
[1] https://github.com/llvm/llvm-project/commit/f5040b9685a760e584c576e9185295e54635d51e

3 years ago build: rework cloc recipe
Anthony PERARD [Fri, 18 Feb 2022 07:58:52 +0000 (08:58 +0100)]
build: rework cloc recipe

We are going to make other modifications to the cloc recipe, so this
patch prepares the ground to make those modifications easier.

We replace the Makefile meta programming by just a shell script which
should be easier to read and is actually faster to execute.

Instead of looking for files in "$(BASEDIR)", we use "." which gives
the same result overall. We also avoid the need for a temporary file
as cloc can read the list of files from stdin.

No change intended to the output of `cloc`.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: prepare to always invoke $(MAKE) from xen/, use $(obj)
Anthony PERARD [Fri, 18 Feb 2022 07:58:01 +0000 (08:58 +0100)]
build: prepare to always invoke $(MAKE) from xen/, use $(obj)

In a future patch, when building a subdirectory, we will set
"obj=$subdir" rather than change directory.

Before that, we add "$(obj)" and "$(src)" in as many places as
possible where we will need to know which subdirectory is being built.
"$(obj)" is for files being generated during the build, and "$(src)" is
for files present in the source tree.

For now, we set both to "." in Rules.mk and Makefile.clean.

A few places don't tolerate the addition of "./"; this is because make
removes the leading "./" in targets and dependencies in rules, so these
will be changed later.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com> # XSM
3 years ago build: set ALL_OBJS in main Makefile; move prelink.o to main Makefile
Anthony PERARD [Fri, 18 Feb 2022 07:57:03 +0000 (08:57 +0100)]
build: set ALL_OBJS in main Makefile; move prelink.o to main Makefile

This is to avoid arch/$arch/Makefile having to recurse into parent
directories.

This avoids duplication of the logic to build prelink.o between arches.

In order to do that, we cut the $(TARGET) target in the main Makefile in
two: there is a "prepare" phase/target run before starting to build
"prelink.o" which will prepare "include/" among other things, then all
the $(ALL_OBJS) will be generated in order to build "prelink.o" and
finally $(TARGET) will be generated by calling into "arch/*/" to make
$(TARGET).

Now we don't need to prefix $(ALL_OBJS) with $(BASEDIR) as it is now
only used from the main Makefile. Another change is to use "$<" instead
of spelling out "prelink.o" in the target "$(TARGET)" in both
arch/*/Makefile.

Besides "prelink.o" being at a different location, no other functional
change intended.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago xen/docs: Document how to do passthrough without IOMMU
Stefano Stabellini [Mon, 14 Feb 2022 03:19:56 +0000 (03:19 +0000)]
xen/docs: Document how to do passthrough without IOMMU

This commit creates a new doc to document how to do passthrough without IOMMU.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen/arm: if direct-map domain use native UART address and IRQ number for vPL011
Stefano Stabellini [Mon, 14 Feb 2022 03:19:55 +0000 (03:19 +0000)]
xen/arm: if direct-map domain use native UART address and IRQ number for vPL011

We always use a fixed address to map the vPL011 to domains. The address
could be a problem for direct-map domains.

So, for domains that are directly mapped, reuse the address of the
physical UART on the platform to avoid potential clashes.

Do the same for the virtual IRQ number: instead of always using
GUEST_VPL011_SPI, try to reuse the physical SPI number if possible.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen/arm: if direct-map domain use native addresses for GICv3
Stefano Stabellini [Mon, 14 Feb 2022 03:19:54 +0000 (03:19 +0000)]
xen/arm: if direct-map domain use native addresses for GICv3

Today we use native addresses to map the GICv3 for Dom0 and fixed
addresses for DomUs.

This patch changes the behavior so that native addresses are used for
all domains which use the host memory layout.

Considering that DOM0 may not always be directly mapped in the future,
this patch introduces a new helper "domain_use_host_layout()" that
wraps the two checks "is_domain_direct_mapped(d) || is_hardware_domain(d)"
for more flexible usage.
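
A minimal sketch of the helper described above. The expression is the one
quoted in the commit message; the domain structure and the two predicate
bodies are simplified, hypothetical stand-ins so that the example is
self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down domain descriptor for the example. */
struct domain_sketch {
    bool directmap;        /* guest physical == host physical */
    bool hardware_domain;  /* e.g. dom0 */
};

static inline bool is_domain_direct_mapped(const struct domain_sketch *d)
{
    assert(d != NULL);
    return d->directmap;
}

static inline bool is_hardware_domain(const struct domain_sketch *d)
{
    assert(d != NULL);
    return d->hardware_domain;
}

/* Single place to ask "does this domain see the host memory layout?" */
static inline bool domain_use_host_layout(const struct domain_sketch *d)
{
    return is_domain_direct_mapped(d) || is_hardware_domain(d);
}
```

Callers then test one predicate, so a future hardware domain that is not
direct-mapped needs no changes at the call sites.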

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen/arm: gate make_gicv3_domU_node with CONFIG_GICV3
Penny Zheng [Mon, 14 Feb 2022 03:19:53 +0000 (03:19 +0000)]
xen/arm: gate make_gicv3_domU_node with CONFIG_GICV3

This commit gates function make_gicv3_domU_node with CONFIG_GICV3.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen/arm: if direct-map domain use native addresses for GICv2
Stefano Stabellini [Mon, 14 Feb 2022 03:19:52 +0000 (03:19 +0000)]
xen/arm: if direct-map domain use native addresses for GICv2

Today we use native addresses to map the GICv2 for Dom0 and fixed
addresses for DomUs.

This patch changes the behavior so that native addresses are used for
all domains that are direct-mapped.

The NEW VGIC has a different naming scheme, such as referring to the
distributor base address as vgic_dist_base rather than dbase. So this patch
also introduces vgic_dist_base/vgic_cpu_base accessors to access the correct
distributor base address/CPU interface base address in the various scenarios.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen/arm: add ASSERT_UNREACHABLE in allocate_static_memory
Penny Zheng [Mon, 14 Feb 2022 03:19:51 +0000 (03:19 +0000)]
xen/arm: add ASSERT_UNREACHABLE in allocate_static_memory

Helper allocate_static_memory is not meant to be reachable when built with
!CONFIG_STATIC_MEMORY, so this commit adds ASSERT_UNREACHABLE in it to catch
potential misuse.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen/arm: introduce direct-map for domUs
Penny Zheng [Mon, 14 Feb 2022 03:19:50 +0000 (03:19 +0000)]
xen/arm: introduce direct-map for domUs

Cases where a domU needs a direct-map memory map:
  * IOMMU not present in the system.
  * IOMMU disabled if it doesn't cover a specific device and all the guests
    are trusted. Consider a mixed scenario, where a few devices are behind an
    IOMMU and a few are not: guest DMA security still cannot be fully
    guaranteed, so users may want to disable the IOMMU to at least gain some
    performance.
  * IOMMU disabled as a workaround when it doesn't have enough bandwidth.
    To be specific, in a few extreme situations, when multiple devices do DMA
    concurrently, these requests may exceed the IOMMU's transmission capacity.
  * IOMMU disabled when it adds too much latency on DMA. For example, a
    TLB may be missing in some IOMMU hardware, which may add latency to DMA
    progress, so users may want to disable it in some realtime scenarios.
  * Guest OS relies on the host memory layout.

This commit introduces a new helper assign_static_memory_11 to allocate
static memory as guest RAM for direct-map domain.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen/arm: introduce new helper parse_static_mem_prop and acquire_static_memory_bank
Penny Zheng [Mon, 14 Feb 2022 03:19:49 +0000 (03:19 +0000)]
xen/arm: introduce new helper parse_static_mem_prop and acquire_static_memory_bank

Later, we will introduce assign_static_memory_11 for allocating static
memory for direct-map domains, and it will share a lot of common code with
the existing allocate_static_memory.

In order not to duplicate a lot of code, and also to make the whole
code more readable, this commit extracts the common code into two new
helpers, parse_static_mem_prop and acquire_static_memory_bank.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen/arm: Allow device-passthrough even the IOMMU is off
Stefano Stabellini [Mon, 14 Feb 2022 03:19:48 +0000 (03:19 +0000)]
xen/arm: Allow device-passthrough even the IOMMU is off

At the moment, we are only supporting device-passthrough when Xen has
enabled the IOMMU. There are some use cases where it is not possible to
use the IOMMU (e.g. doesn't exist, hardware limitation, performance) yet
it would be OK to assign a device to a trusted domain so long as the domain
is direct-mapped or the device doesn't do DMA.

Note that when the IOMMU is disabled, it will be necessary to add
xen,force-assign-without-iommu for every device that needs to be assigned.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago xen: introduce CDF_directmap
Stefano Stabellini [Mon, 14 Feb 2022 03:19:47 +0000 (03:19 +0000)]
xen: introduce CDF_directmap

This commit introduces a new arm-specific flag CDF_directmap to specify
that a domain should have its memory direct-mapped (guest physical address
== host physical address) at domain creation.

Also, add a directmap flag under struct arch_domain and use it to
reimplement is_domain_direct_mapped.

For now, direct-map is only available when statically allocated memory is
used for the domain, that is, "xen,static-mem" must be also defined in the
domain configuration.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago xen: introduce internal CDF_xxx flags for domain creation
Stefano Stabellini [Mon, 14 Feb 2022 03:19:46 +0000 (03:19 +0000)]
xen: introduce internal CDF_xxx flags for domain creation

We are passing an internal-only boolean flag at domain creation to
specify whether we want the domain to be privileged (i.e. dom0) or
not. Another flag will be introduced later in this series.

This commit extends the original "boolean" to an "unsigned int" covering both
the existing "is_priv" and our new "directmap", which will be introduced later.

To make the relationship visible, we name the respective constants CDF_xxx
(with no XEN_DOMCTL_ prefix) to distinguish them from the public
constants XEN_DOMCTL_CDF_xxx.

Allocate bit 0 as CDF_privileged: whether a domain is privileged or not.
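
The move from a boolean to a flags word could look roughly like the sketch
below. CDF_privileged at bit 0 comes from this commit message; the bit
position given to CDF_directmap and the helper names are assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define CDF_privileged (1U << 0)  /* bit 0, as stated in the commit message */
#define CDF_directmap  (1U << 1)  /* assumption: the later flag takes bit 1 */
#define CDF_ALL_SKETCH (CDF_privileged | CDF_directmap)

/* Build the internal flags word from what used to be separate booleans. */
static inline unsigned int cdf_flags_sketch(bool is_priv, bool directmap)
{
    return (is_priv ? CDF_privileged : 0U) | (directmap ? CDF_directmap : 0U);
}

/* Replacement for the old "is_priv" parameter check. */
static inline bool cdf_is_privileged_sketch(unsigned int flags)
{
    assert((flags & ~CDF_ALL_SKETCH) == 0);  /* reject unknown bits */
    return (flags & CDF_privileged) != 0;
}
```

A single unsigned int keeps the domain creation signature stable as
further internal flags are added.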

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago MAINTAINERS: make Bertrand ARM maintainer
Stefano Stabellini [Thu, 10 Feb 2022 19:08:47 +0000 (11:08 -0800)]
MAINTAINERS: make Bertrand ARM maintainer

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Asked-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years ago x86emul: fix SIMD test overriding of VBROADCASTS{S,D}
Jan Beulich [Mon, 14 Feb 2022 09:09:15 +0000 (10:09 +0100)]
x86emul: fix SIMD test overriding of VBROADCASTS{S,D}

Despite their suffixes these aren't scalar instructions, and hence the
128- and 256-bit EVEX forms may not be used without AVX512VL. Gcc11 ends
up generating such instances for simd-sg.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago x86emul: fix VPBLENDMW with mask and memory operand
Jan Beulich [Mon, 14 Feb 2022 09:08:38 +0000 (10:08 +0100)]
x86emul: fix VPBLENDMW with mask and memory operand

Element size for this opcode depends on EVEX.W, not the low opcode bit.
Make use of AVX512BW being a prereq to AVX512_BITALG and move the case
label there, adding an AVX512BW feature check.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago x86emul: work around gcc11 bug in SIMD tests
Jan Beulich [Mon, 14 Feb 2022 09:08:17 +0000 (10:08 +0100)]
x86emul: work around gcc11 bug in SIMD tests

Gcc11 looks to have trouble with conditional expressions used with
vector operands: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104497.
Replace two instances causing SEGV there in certain cases.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago cpuid: initialize cpuinfo with boot_cpu_data
Norbert Manthey [Mon, 14 Feb 2022 09:07:36 +0000 (10:07 +0100)]
cpuid: initialize cpuinfo with boot_cpu_data

When re-identifying CPU data, we might use uninitialized data when
checking for the cache line property to adapt the cache
alignment. The data that depends on this uninitialized read is
currently not forwarded.

To avoid problems in the future, initialize the data cpuinfo
structure before re-identifying the CPU again.

The trace to hit the uninitialized read reported by Coverity is:

bool recheck_cpu_features(unsigned int cpu)
...
    struct cpuinfo_x86 c;
    ...
    identify_cpu(&c);

void identify_cpu(struct cpuinfo_x86 *c)
...
    generic_identify(c)

static void generic_identify(struct cpuinfo_x86 *c)
...
        if (this_cpu->c_early_init)
                this_cpu->c_early_init(c); // which is early_init_intel

static void early_init_intel(struct cpuinfo_x86 *c)
...
    if (c->x86 == 15 && c->x86_cache_alignment == 64)
        c->x86_cache_alignment = 128;

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.
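
Going by the commit title, the remedy is to start from boot_cpu_data rather
than from an uninitialized stack object. A reduced sketch of that pattern
follows; the structure, the boot data values and all function names are
simplified, hypothetical stand-ins rather than the real Xen definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-field reduction of the CPU info structure. */
struct cpuinfo_sketch {
    unsigned int x86;                  /* CPU family */
    unsigned int x86_cache_alignment;  /* in bytes */
};

/* Stand-in for the data gathered on the boot CPU (values are made up). */
static const struct cpuinfo_sketch boot_cpu_data_sketch = {
    .x86 = 6,
    .x86_cache_alignment = 64,
};

/* Mirrors the check from the trace: it reads x86_cache_alignment. */
static void early_init_sketch(struct cpuinfo_sketch *c)
{
    assert(c != NULL);

    if (c->x86 == 15 && c->x86_cache_alignment == 64)
        c->x86_cache_alignment = 128;
}

static struct cpuinfo_sketch recheck_sketch(void)
{
    /* Copy known-good data first, so no field is read uninitialized. */
    struct cpuinfo_sketch c = boot_cpu_data_sketch;

    early_init_sketch(&c);
    return c;
}
```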

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/Intel: also display CPU freq for family 0xf
Jan Beulich [Mon, 14 Feb 2022 09:06:11 +0000 (10:06 +0100)]
x86/Intel: also display CPU freq for family 0xf

Actually we can do better than simply bailing for there not being any
PLATFORM_INFO MSR on these. The "max" part of the information is
available in another MSR, alongside the scaling factor (which is
encoded in similar ways to Core/Core2, and hence the decoding table can
be shared).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago x86/Intel: skip CORE_THREAD_COUNT read on family 0xf
Jan Beulich [Mon, 14 Feb 2022 09:05:38 +0000 (10:05 +0100)]
x86/Intel: skip CORE_THREAD_COUNT read on family 0xf

This avoids an unnecessary (and always somewhat scary) log message for
the recovered from #GP(0).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago x86/Intel: skip PLATFORM_INFO reads on family 0xf
Jan Beulich [Mon, 14 Feb 2022 09:04:35 +0000 (10:04 +0100)]
x86/Intel: skip PLATFORM_INFO reads on family 0xf

This avoids unnecessary (and always somewhat scary) log messages for the
recovered from #GP(0).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago xen/serial: scif: add support for HSCIF
Volodymyr Babchuk [Tue, 8 Feb 2022 11:23:55 +0000 (11:23 +0000)]
xen/serial: scif: add support for HSCIF

HSCIF is a high-speed variant of the Renesas SCIF serial interface. From
Xen's point of view, they are almost the same; the only difference is in
FIFO size.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago x86/Intel: don't log bogus frequency range on Core/Core2 processors
Jan Beulich [Wed, 9 Feb 2022 11:52:01 +0000 (12:52 +0100)]
x86/Intel: don't log bogus frequency range on Core/Core2 processors

Models 0F and 17 don't have PLATFORM_INFO documented. While it exists on
at least model 0F, the information there doesn't match the scheme used
on newer models (I'm observing a range of 700 ... 600 MHz reported on a
Xeon E5345).

Sadly the Enhanced Intel Core instance of the table entry is not self-
consistent: The numeric description of the low 3 bits doesn't match the
subsequent more textual description in some of the cases; I'm using the
former here.

Include the older Core model 0E as well as the two other Core2 models,
none of which have respective MSR tables in the SDM.

Fixes: f6b6517cd5db ("x86: retrieve and log CPU frequency information")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoxen: add option to disable GNTTABOP_transfer
Juergen Gross [Wed, 9 Feb 2022 11:51:05 +0000 (12:51 +0100)]
xen: add option to disable GNTTABOP_transfer

The grant table operation GNTTABOP_transfer is meant to be used in
PV device backends, and it hasn't been used in Linux since the old
Xen-o-Linux days.

Add a command line sub-option to the "gnttab" option for disabling the
GNTTABOP_transfer functionality.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/mm: tidy XENMEM_{get,set}_pod_target handling
Jan Beulich [Wed, 9 Feb 2022 11:50:28 +0000 (12:50 +0100)]
x86/mm: tidy XENMEM_{get,set}_pod_target handling

Do away with the "pod_target_out_unlock" label. In particular by folding
if()-s, the logic can be expressed with less code (and no goto-s) this
way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/P2M: move map_domain_gfn() (again)
Jan Beulich [Wed, 9 Feb 2022 11:48:59 +0000 (12:48 +0100)]
x86/P2M: move map_domain_gfn() (again)

The main user is the guest walking code, so move it back there; commit
9a6787cc3809 ("x86/mm: build map_domain_gfn() just once") would perhaps
better have kept it there in the first place. This way it'll only get
built when it's actually needed (and still only once).

This also eliminates one more CONFIG_HVM conditional from p2m.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agox86/P2M: drop a few CONFIG_HVM
Jan Beulich [Wed, 9 Feb 2022 11:47:40 +0000 (12:47 +0100)]
x86/P2M: drop a few CONFIG_HVM

This is to make it easier to see which parts of p2m.c still aren't HVM-
specific: In one case the conditionals sat in an already guarded region,
while in the other case P2M_AUDIT implies HVM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agox86/spec-ctrl: Support Intel PSFD for guests
Andrew Cooper [Tue, 19 Oct 2021 20:22:27 +0000 (21:22 +0100)]
x86/spec-ctrl: Support Intel PSFD for guests

The Feb 2022 microcode from Intel retrofits AMD's MSR_SPEC_CTRL.PSFD interface
to Sunny Cove (IceLake) and later cores.

Update the MSR_SPEC_CTRL emulation, and expose it to guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/cpuid: Infrastructure for cpuid word 7:2.edx
Andrew Cooper [Thu, 27 Jan 2022 21:07:40 +0000 (21:07 +0000)]
x86/cpuid: Infrastructure for cpuid word 7:2.edx

While in principle it would be nice to keep leaf 7 in order, that would
involve having an extra 5 words of zeros in a featureset.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agotests/tsx: Extend test-tsx to check MSR_MCU_OPT_CTRL
Andrew Cooper [Thu, 10 Jun 2021 11:34:45 +0000 (12:34 +0100)]
tests/tsx: Extend test-tsx to check MSR_MCU_OPT_CTRL

This MSR needs to be identical across the system for TSX to have identical
behaviour everywhere.  Furthermore, its CPUID bit (SRBDS_CTRL) shouldn't be
visible to guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/tsx: Cope with TSX deprecation on WHL-R/CFL-R
Andrew Cooper [Wed, 16 Sep 2020 15:15:52 +0000 (16:15 +0100)]
x86/tsx: Cope with TSX deprecation on WHL-R/CFL-R

The February 2022 microcode is formally de-featuring TSX on the TAA-impacted
client CPUs.  The backup TAA mitigation (VERW regaining its flushing side
effect) is being dropped, meaning that `smt=0 spec-ctrl=md-clear` no longer
protects against TAA on these parts.

The new functionality enumerates itself via the RTM_ALWAYS_ABORT CPUID
bit (the same as June 2021), but has its control in MSR_MCU_OPT_CTRL as
opposed to MSR_TSX_FORCE_ABORT.

TSX now defaults to being disabled on ucode load.  Furthermore, if SGX is
enabled in the BIOS, TSX is locked and cannot be re-enabled.  In this case,
override opt_tsx to 0, so the RTM/HLE CPUID bits get hidden by default.

While updating the command line documentation, take the opportunity to add a
paragraph explaining what TSX being disabled actually means, and how migration
compatibility works.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/tsx: Move has_rtm_always_abort to an outer scope
Andrew Cooper [Wed, 23 Jun 2021 20:53:58 +0000 (21:53 +0100)]
x86/tsx: Move has_rtm_always_abort to an outer scope

We are about to introduce a second path which needs to conditionally force the
presence of RTM_ALWAYS_ABORT.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/spec-ctrl: Clean up MSR_MCU_OPT_CTRL handling
Andrew Cooper [Wed, 19 May 2021 18:40:28 +0000 (19:40 +0100)]
x86/spec-ctrl: Clean up MSR_MCU_OPT_CTRL handling

Introduce cpu_has_srbds_ctrl as more users are going to appear shortly.

MSR_MCU_OPT_CTRL is gaining extra functionality, meaning that the current
default_xen_mcu_opt_ctrl is no longer a good fit.

Introduce two new helpers, update_mcu_opt_ctrl() which does a full RMW cycle
on the MSR, and set_in_mcu_opt_ctrl() which lets callers configure specific
bits at a time without clobbering each other's settings.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
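
The split between a full read-modify-write helper and a per-caller "set these bits" helper can be sketched as below. This is only an illustration of the pattern, not the code from the commit: the register is simulated by a plain variable, and the names fake_msr, ctrl_mask, ctrl_val, update_ctrl() and set_in_ctrl() are hypothetical.

```c
#include <stdint.h>

uint32_t fake_msr;          /* hypothetical stand-in for the hardware MSR */
static uint32_t ctrl_mask;  /* union of all bits some caller has claimed */
static uint32_t ctrl_val;   /* desired values of the claimed bits */

/* Full read-modify-write cycle: only claimed bits are changed. */
void update_ctrl(void)
{
    uint32_t v = fake_msr;  /* read */

    v &= ~ctrl_mask;        /* clear the bits we own */
    v |= ctrl_val;          /* set them to the desired values */
    fake_msr = v;           /* write back */
}

/* Let one caller configure its own bits without touching the others. */
void set_in_ctrl(uint32_t mask, uint32_t val)
{
    ctrl_mask |= mask;
    ctrl_val = (ctrl_val & ~mask) | (val & mask);
    update_ctrl();
}
```

Because each caller passes its own mask, two callers configuring different bits cannot overwrite each other, and bits nobody has claimed keep whatever value the register already held.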
3 years agotools/configure.ac: Replace macro AC_HELP_STRING
Michal Orzel [Tue, 1 Feb 2022 17:03:21 +0000 (18:03 +0100)]
tools/configure.ac: Replace macro AC_HELP_STRING

... with AS_HELP_STRING as the former is obsolete according
to GNU autoconf 2.67 documentation.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/libs: Fix build dependencies
Anthony PERARD [Tue, 8 Feb 2022 10:39:59 +0000 (10:39 +0000)]
tools/libs: Fix build dependencies

Some libs' Makefiles aren't loading the dependency files *.d2.

We can load them from "libs.mk" as none of the Makefiles here are
changing $(DEPS) or $(DEPS_INCLUDE) so it is fine to move the
"include" to "libs.mk".

As a little improvement, don't load the dependency files (and thus
avoid regenerating the *.d2 files) during `make clean`.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years agotools/include: remove xen-external directory
Juergen Gross [Tue, 8 Feb 2022 07:06:38 +0000 (08:06 +0100)]
tools/include: remove xen-external directory

There is no user of tools/include/xen-external/* left. Remove it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/libs/evtchn: use _xen_list.h
Juergen Gross [Tue, 8 Feb 2022 07:06:37 +0000 (08:06 +0100)]
tools/libs/evtchn: use _xen_list.h

Instead of including xen-external/bsd-sys-queue.h use the header
_xen_list.h in minios.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/libs/toolcore: replace _xentoolcore_list.h with _xen_list.h
Juergen Gross [Tue, 8 Feb 2022 07:06:36 +0000 (08:06 +0100)]
tools/libs/toolcore: replace _xentoolcore_list.h with _xen_list.h

Remove generating _xentoolcore_list.h and use the common _xen_list.h
instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/libs/light: replace _libxl_list.h with _xen_list.h
Juergen Gross [Tue, 8 Feb 2022 07:06:35 +0000 (08:06 +0100)]
tools/libs/light: replace _libxl_list.h with _xen_list.h

Remove generating _libxl_list.h and use the common _xen_list.h instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/include: generate a _xen_list.h file
Juergen Gross [Tue, 8 Feb 2022 07:06:34 +0000 (08:06 +0100)]
tools/include: generate a _xen_list.h file

Today tools/include contains two basically identical header files
generated from the same source. They just differ by the used name space
and they are being generated from different Makefiles via a perl
script.

Prepare to have only one such header by using a more generic namespace
"XEN" for _xen_list.h.

As the original header hasn't been updated in the Xen tree since its
introduction about 10 years ago, and the updates on the FreeBSD side have
mostly covered BSD internal debugging aids, just don't generate the
new header during build, especially as using the current FreeBSD
version of the file would require some updates of the perl script,
which are potentially more work than just doing the needed editing by
hand. Additionally this allows removing the unneeded debugging
extensions of FreeBSD.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agodom0/pvh: fix processing softirqs during memory map population
Roger Pau Monne [Mon, 7 Feb 2022 11:20:08 +0000 (12:20 +0100)]
dom0/pvh: fix processing softirqs during memory map population

Make sure softirqs are processed after every successful call to
guest_physmap_add_page. Even if only a single page is to be added,
it's unknown whether the p2m or the IOMMU will require splitting the
provided page into smaller ones, and thus in case of having to break
a 1G page into 4K entries the amount of time taken by a single one of
those additions will be non-trivial. Stay on the safe side and check
for pending softirqs on every successful loop iteration.

Fixes: 5427134eae ('x86: populate PVHv2 Dom0 physical memory map')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
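
The loop shape described above, checking for pending work after every successful addition rather than every N pages, might look like the following sketch. The two stub functions are hypothetical stand-ins (they are not guest_physmap_add_page() or Xen's softirq code); the stub fails on frame 5 purely so the error path can be exercised.

```c
unsigned int softirq_checks;   /* counts how often pending work was checked */

/* Hypothetical stand-in for the softirq processing call. */
void process_pending_stub(void)
{
    softirq_checks++;
}

/* Hypothetical stand-in for adding one page; fails on frame 5. */
int add_page_stub(unsigned long gfn)
{
    return gfn == 5 ? -1 : 0;
}

int populate(unsigned long start, unsigned long nr)
{
    for ( unsigned long i = 0; i < nr; i++ )
    {
        int rc = add_page_stub(start + i);

        if ( rc )
            return rc;          /* stop on the first failure */

        /* A single addition may be slow, so check every iteration. */
        process_pending_stub();
    }

    return 0;
}
```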
3 years agoxen/smp: Speed up on_selected_cpus()
Andrew Cooper [Fri, 4 Feb 2022 20:12:04 +0000 (20:12 +0000)]
xen/smp: Speed up on_selected_cpus()

cpumask_weight() is an incredibly expensive way to find if no bits are set,
made worse by the fact that the calculation is performed with the global
call_lock held.

This appears to be a missing optimisation from c/s 433f14699d48 ("x86: Clean
up smp_call_function handling.") in 2011 which dropped the logic requiring the
count of CPUs.

Switch to using cpumask_empty() instead, which will short circuit as soon as
it finds any set bit in the cpumask.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
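
To illustrate why an emptiness test is cheaper than a weight calculation, here is a minimal sketch on a hypothetical fixed-size bitmap. struct mask, mask_weight() and mask_empty() are illustrative names only and are not Xen's cpumask API.

```c
#include <stdbool.h>
#include <stddef.h>

#define MASK_WORDS 4   /* hypothetical size: 4 x 64 bits */

/* Hypothetical stand-in for a CPU mask. */
struct mask {
    unsigned long long bits[MASK_WORDS];
};

/* Counts every set bit, so it always has to walk the whole mask. */
unsigned int mask_weight(const struct mask *m)
{
    unsigned int n = 0;

    for ( size_t i = 0; i < MASK_WORDS; i++ )
        for ( unsigned long long w = m->bits[i]; w; w &= w - 1 )
            n++;                /* clear the lowest set bit each round */

    return n;
}

/* Returns as soon as any non-zero word is seen. */
bool mask_empty(const struct mask *m)
{
    for ( size_t i = 0; i < MASK_WORDS; i++ )
        if ( m->bits[i] )
            return false;

    return true;
}
```

A caller that only needs to know whether there is anything to do can use mask_empty(m) in place of mask_weight(m) == 0 and get the same answer with less work.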
3 years agox86/hvm: Fix boot on systems where HVM isn't available
Andrew Cooper [Fri, 4 Feb 2022 17:01:41 +0000 (17:01 +0000)]
x86/hvm: Fix boot on systems where HVM isn't available

c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
alt-call") went too far with dropping NULL function pointer checks.

smp_callin() and S3 resume call hvm_cpu_up() unconditionally.  When the
platform doesn't support HVM, hvm_enable() exits without filling in hvm_funcs,
after which the altcall pass nukes the (now unconditional) indirect call,
causing:

  (XEN) ----[ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]----
  (XEN) CPU:    1
  (XEN) RIP:    e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7
  (XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor
  ...
  (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7):
  (XEN)  ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7
  (XEN)    [<ffff82d0402000e2>] F __high_start+0x42/0x60

To make matters worse, several paths including __stop_this_cpu() call
hvm_cpu_down() unconditionally too, so what happens next is:

  (XEN) ----[ Xen-4.17.0-10.18-d  x86_64  debug=y  Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c
  (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
  ...
  (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c):
  (XEN)  48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c
  (XEN)    [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8
  (XEN)    [<ffff82d04034a229>] F machine_restart+0xa2/0x298
  (XEN)    [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11
  (XEN)    [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea
  (XEN)    [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37
  (XEN)    [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5
  (XEN)    [<ffff82d04039482a>] F common_interrupt+0x10a/0x120
  (XEN)    [<ffff82d04031f649>] F __udelay+0x3a/0x51
  (XEN)    [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734
  (XEN)    [<ffff82d040203c2b>] F cpu_up+0x7d/0xde
  (XEN)    [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618
  (XEN)    [<ffff82d0402000ef>] F __high_start+0x4f/0x60

which recurses until hitting a stack overflow.  The #DF handler, which resets
its stack on each invocation, loops indefinitely.

Reinstate the NULL function pointer checks for hvm_cpu_{up,down}(), along with
comments explaining how the helpers are used.

Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
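
The kind of guard being reinstated can be sketched as follows: wrappers that are called unconditionally check the hook pointer themselves. The table and function names below are hypothetical (loosely modelled on the idea of a hook table that stays zeroed when the feature is unavailable) and do not reproduce Xen's hvm_funcs or its altcall machinery.

```c
#include <stddef.h>

/* Hypothetical hook table; left zeroed when the platform lacks support. */
struct hooks {
    int  (*cpu_up)(void);
    void (*cpu_down)(void);
};

struct hooks hooks;

/* Callers invoke this unconditionally, so the NULL check lives here. */
int hooks_cpu_up(void)
{
    return hooks.cpu_up ? hooks.cpu_up() : 0;
}

void hooks_cpu_down(void)
{
    if ( hooks.cpu_down )
        hooks.cpu_down();
}

/* Hypothetical backend, present only to demonstrate a filled-in hook. */
int demo_calls;

int demo_cpu_up(void)
{
    demo_calls++;
    return 0;
}
```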
3 years agotools/guest: Fix comment regarding CPUID compatibility
Andrew Cooper [Thu, 3 Feb 2022 18:03:49 +0000 (18:03 +0000)]
tools/guest: Fix comment regarding CPUID compatibility

It was Xen 4.14 where CPUID data was added to the migration stream, and 4.13
that we need to worry about with regards to compatibility.  Xen 4.12 isn't
relevant.

Expand and correct the commentary.

Fixes: 111c8c33a8a1 ("x86/cpuid: do not expand max leaves on restore")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/pci: detect when BARs are not suitably positioned
Roger Pau Monné [Thu, 3 Feb 2022 12:13:19 +0000 (13:13 +0100)]
xen/pci: detect when BARs are not suitably positioned

One of the boxes where I was attempting to boot Xen in PVH dom0 mode
has quirky firmware, as it will hand over with a PCI device with memory
decoding enabled and a BAR of size 4K at address 0. Such BAR overlaps
with a RAM range on the e820.

This interacts badly with the dom0 PVH build, as BARs will be set up on
the p2m before RAM, so if there's a BAR positioned over a RAM region
it will trigger a domain crash when the dom0 builder attempts to
populate that region with a regular RAM page.

It's in general a very bad idea to have a BAR overlapping with any
memory region defined in the memory map, so add some sanity checks for
devices that are added with memory decoding enabled in order to ensure
that BARs are not placed on top of memory regions defined in the
memory map. If overlaps are detected just disable the memory decoding
bit for the device and expect the hardware domain to properly position
the BAR.

Note apply_quirks must be called before check_pdev so that ignore_bars
is set when calling the latter. PCI_HEADER_{NORMAL,BRIDGE}_NR_BARS
needs to be moved into pci_regs.h so it's defined even in the absence
of vPCI.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
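
The sanity check described above boils down to a range-overlap test between a BAR and each entry of the memory map. A minimal sketch, assuming half-open ranges and no address arithmetic overflow; struct region and bar_overlaps_map() are hypothetical names, not the functions added by the commit:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory-map entry covering [start, start + size). */
struct region {
    uint64_t start, size;
};

/* True if the BAR at [bar, bar + bar_size) overlaps any map entry. */
bool bar_overlaps_map(uint64_t bar, uint64_t bar_size,
                      const struct region *map, size_t nr)
{
    for ( size_t i = 0; i < nr; i++ )
        if ( bar < map[i].start + map[i].size &&
             map[i].start < bar + bar_size )
            return true;

    return false;
}
```

With a map containing a low RAM range starting at 0, a 4K BAR at address 0 is reported as overlapping, which is the situation the quirky firmware produced.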
3 years agovpci: shrink critical section in vpci_{read/write}
Roger Pau Monné [Thu, 3 Feb 2022 12:12:21 +0000 (13:12 +0100)]
vpci: shrink critical section in vpci_{read/write}

Shrink critical section in vpci_{read/write} as racing calls to
vpci_{read,write}_hw() shouldn't be a problem. Those are just wrappers
around pci_conf_{read,write} functions, and the required locking (in
case of using the IO ports) is already taken care of in pci_conf_{read,write}.

Please note that we split 64bit writes into two 32bit ones anyway,
without taking the lock for the whole duration of the access, so it is
possible to see a partially updated state as a result of a 64bit write:
the PCI(e) specification doesn't seem to specify whether the ECAM is allowed
to split memory transactions into multiple Configuration Requests and
whether those could then interleave with requests from a different CPU.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
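
The partially updated state mentioned above follows directly from performing one 64bit write as two 32bit ones. A sketch with a simulated register pair; cfg, write32() and write64_split() are hypothetical, and writing the low half first is an assumption made for the example:

```c
#include <stdint.h>

uint32_t cfg[2];   /* stand-in for two adjacent 32bit config registers */

void write32(unsigned int idx, uint32_t val)
{
    cfg[idx] = val;
}

/* A 64bit write carried out as two independent 32bit writes. */
void write64_split(uint64_t val)
{
    write32(0, (uint32_t)val);           /* low half */
    /* A reader running at this point sees new low, old high half. */
    write32(1, (uint32_t)(val >> 32));   /* high half */
}
```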
3 years agox86/mwait-idle: enable interrupts before C1 on Xeons
Artem Bityutskiy [Wed, 2 Feb 2022 09:28:29 +0000 (10:28 +0100)]
x86/mwait-idle: enable interrupts before C1 on Xeons

Enable local interrupts before requesting C1 on the last two generations
of Intel Xeon platforms: Sky Lake, Cascade Lake, Cooper Lake, Ice Lake.
This decreases average C1 interrupt latency by about 5-10%, as measured
with the 'wult' tool.

The '->enter()' function of the driver enters C-states with local
interrupts disabled by executing the 'monitor' and 'mwait' pair of
instructions. If an interrupt happens, the CPU exits the C-state and
continues executing instructions after 'mwait'. It does not jump to
the interrupt handler, because local interrupts are disabled. The
cpuidle subsystem enables interrupts a bit later, after doing some
housekeeping.

With this patch, we enable local interrupts before requesting C1. In
this case, if the CPU wakes up because of an interrupt, it will jump
to the interrupt handler right away. The cpuidle housekeeping will be
done after the pending interrupt(s) are handled.

Enabling interrupts before entering a C-state has measurable impact
for faster C-states, like C1. Deeper, but slower C-states like C6 do
not really benefit from this sort of change, because their latency is
a lot higher compared to the delay added by cpuidle housekeeping.

This change was also tested with cyclictest and dbench. In case of Ice
Lake, the average cyclictest latency decreased by 5.1%, and the average
'dbench' throughput increased by about 0.8%. Both tests were run for 4
hours with only C1 enabled (all other idle states, including 'POLL',
were disabled). CPU frequency was pinned to HFM, and uncore frequency
was pinned to the maximum value. The other platforms had similar
single-digit percentage improvements.

It is worth noting that this patch affects 'cpuidle' statistics a tiny
bit.  Before this patch, C1 residency did not include the interrupt
handling time, but with this patch, it will include it. This is similar
to what happens in case of the 'POLL' state, which also runs with
interrupts enabled.

Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[Linux commit: c227233ad64c77e57db738ab0e46439db71822a3]

We don't have a pointer into cpuidle_state_table[] readily available.
To compensate, propagate the flag into struct acpi_processor_cx.

Unlike Linux we want to
- disable IRQs again after MWAITing, as subsequently invoked functions
  assume so,
- avoid enabling IRQs if cstate_restore_tsc() is not a no-op, to avoid
  interfering with, in particular, the time rendezvous.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agomemory: XENMEM_add_to_physmap (almost) wrapping checks
Jan Beulich [Wed, 2 Feb 2022 09:26:06 +0000 (10:26 +0100)]
memory: XENMEM_add_to_physmap (almost) wrapping checks

Determining that behavior is correct (i.e. results in failure) for a
passed in GFN equaling INVALID_GFN is non-trivial. Make this quite a bit
more obvious by checking input in generic code - both for singular
requests to not match the value and for range ones to not pass / wrap
through it.

For Arm similarly make more obvious that no wrapping of MFNs passed
for XENMAPSPACE_dev_mmio and thus to map_dev_mmio_region() can occur:
Drop the "nr" parameter of the function to avoid future callers
appearing which might not themselves check for wrapping. Otherwise
the respective ASSERT() in rangeset_contains_range() could trigger.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
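
A range check of the kind described, rejecting both a range that contains the invalid value and one that would wrap past it, can be written without overflowing the arithmetic. This is a sketch under the assumption that the invalid frame number is the all-ones value; INVALID_GFN_RAW and gfn_range_ok() are hypothetical names, and rejecting empty ranges is a choice made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define INVALID_GFN_RAW UINT64_MAX   /* assumed all-ones sentinel */

/*
 * True if [gfn, gfn + nr) neither contains nor wraps past the sentinel.
 * The last frame is gfn + nr - 1 and must stay below the sentinel, i.e.
 * nr <= INVALID_GFN_RAW - gfn, which cannot overflow.
 */
bool gfn_range_ok(uint64_t gfn, uint64_t nr)
{
    return nr != 0 && nr <= INVALID_GFN_RAW - gfn;
}
```

A singular request is simply the nr == 1 case, for which the test reduces to gfn != INVALID_GFN_RAW.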
3 years agoEFI: always map EfiRuntimeServices{Code,Data}
Sergey Temerkhanov [Wed, 2 Feb 2022 09:24:56 +0000 (10:24 +0100)]
EFI: always map EfiRuntimeServices{Code,Data}

This helps overcome problems observed with some UEFI implementations
which don't set the Attributes field in memory descriptors properly.

Signed-off-by: Sergey Temerkhanov <s.temerkhanov@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agox86/vmx: Drop spec_ctrl load in VMEntry path
Andrew Cooper [Tue, 1 Feb 2022 13:34:49 +0000 (13:34 +0000)]
x86/vmx: Drop spec_ctrl load in VMEntry path

This is not needed now that the VMEntry path is not responsible for loading
the guest's MSR_SPEC_CTRL value.

Fixes: 81f0eaadf84d ("x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/cpuid: Enable MSR_SPEC_CTRL in SVM guests by default
Andrew Cooper [Mon, 17 Jan 2022 20:29:09 +0000 (20:29 +0000)]
x86/cpuid: Enable MSR_SPEC_CTRL in SVM guests by default

With all other pieces in place, MSR_SPEC_CTRL is fully working for HVM guests.

Update the CPUID derivation logic (both PV and HVM to avoid losing subtle
changes), drop the MSR intercept, and explicitly enable the CPUID bits for HVM
guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/msr: AMD MSR_SPEC_CTRL infrastructure
Andrew Cooper [Mon, 17 Jan 2022 20:29:09 +0000 (20:29 +0000)]
x86/msr: AMD MSR_SPEC_CTRL infrastructure

Fill in VMCB accessors for spec_ctrl in svm_{get,set}_reg(), and CPUID checks
for all supported bits in guest_{rd,wr}msr().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/svm: VMEntry/Exit logic for MSR_SPEC_CTRL
Andrew Cooper [Fri, 21 Jan 2022 15:59:03 +0000 (15:59 +0000)]
x86/svm: VMEntry/Exit logic for MSR_SPEC_CTRL

Hardware maintains both host and guest versions of MSR_SPEC_CTRL, but guests
run with the logical OR of both values.  Therefore, in principle we want to
clear Xen's value before entering the guest.  However, for migration
compatibility (future work), and for performance reasons with SEV-SNP guests,
we want the ability to use a nonzero value behind the guest's back.  Use
vcpu_msrs to hold this value, with the guest value in the VMCB.

On the VMEntry path, adjusting MSR_SPEC_CTRL must be done after CLGI so as to
be atomic with respect to NMIs/etc.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/spec-ctrl: Use common MSR_SPEC_CTRL logic for AMD
Andrew Cooper [Fri, 21 Jan 2022 15:59:03 +0000 (15:59 +0000)]
x86/spec-ctrl: Use common MSR_SPEC_CTRL logic for AMD

Currently, amd_init_ssbd() works by being the only write to MSR_SPEC_CTRL in
the system.  This ceases to be true when using the common logic.

Include AMD MSR_SPEC_CTRL in has_spec_ctrl to activate the common paths, and
introduce an AMD specific block to control alternatives.  Also update the
boot/resume paths to configure default_xen_spec_ctrl.

svm.h needs an adjustment to remove a dependency on include order.

For now, only activate alternatives for HVM - PV will require more work.  No
functional change, as no alternatives are defined for HVM yet.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/spec-ctrl: Record the last write to MSR_SPEC_CTRL
Andrew Cooper [Fri, 28 Jan 2022 11:57:19 +0000 (11:57 +0000)]
x86/spec-ctrl: Record the last write to MSR_SPEC_CTRL

In some cases, writes to MSR_SPEC_CTRL do not have interesting side effects,
and we should implement lazy context switching like we do with other MSRs.

In the short term, this will be used by the SVM infrastructure, but I expect
to extend it to other contexts in due course.

Introduce cpu_info.last_spec_ctrl for the purpose, and cache writes made from
the boot/resume paths.  The value can't live in regular per-cpu data when it
is eventually used for PV guests when XPTI might be active.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
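
Lazy switching based on a cached "last written" value can be sketched as below. The hardware register is simulated, and the globals and lazy_write_spec_ctrl() are hypothetical; in particular this sketch keeps the cache in a plain global, whereas the commit places it in cpu_info. Skipping the write is only valid in the cases where the write has no interesting side effects.

```c
#include <stdint.h>

unsigned int hw_writes;    /* counts simulated register writes */
uint32_t hw_spec_ctrl;     /* stand-in for the hardware register */
uint32_t last_spec_ctrl;   /* cached copy of the last value written */

void lazy_write_spec_ctrl(uint32_t val)
{
    if ( val == last_spec_ctrl )
        return;                /* register already holds val */

    hw_spec_ctrl = val;        /* the (expensive) write */
    last_spec_ctrl = val;      /* keep the cache in sync */
    hw_writes++;
}
```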
3 years agox86/spec-ctrl: Don't use spec_ctrl_{enter,exit}_idle() for S3
Andrew Cooper [Fri, 28 Jan 2022 12:03:42 +0000 (12:03 +0000)]
x86/spec-ctrl: Don't use spec_ctrl_{enter,exit}_idle() for S3

'idle' here refers to hlt/mwait.  The S3 path isn't an idle path - it is a
platform reset.

We need to load default_xen_spec_ctrl unilaterally on the way back up.
Currently it happens as a side effect of X86_FEATURE_SC_MSR_IDLE or the next
return-to-guest, but that's fragile behaviour.

Conversely, there is no need to clear IBRS and flush the store buffers on the
way down; we're microseconds away from cutting power.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/spec-ctrl: Introduce new has_spec_ctrl boolean
Andrew Cooper [Tue, 25 Jan 2022 17:14:48 +0000 (17:14 +0000)]
x86/spec-ctrl: Introduce new has_spec_ctrl boolean

Most MSR_SPEC_CTRL setup will be common between Intel and AMD.  Instead of
opencoding an OR of two features everywhere, introduce has_spec_ctrl instead.

Reword the comment above the Intel specific alternatives block to highlight
that it is Intel specific, and pull the setting of default_xen_spec_ctrl.IBRS
out because it will want to be common.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/spec-ctrl: Drop use_spec_ctrl boolean
Andrew Cooper [Tue, 25 Jan 2022 16:09:59 +0000 (16:09 +0000)]
x86/spec-ctrl: Drop use_spec_ctrl boolean

Several bugfixes have reduced the utility of this variable from its original
purpose, and now all it does is aid in the setup of SCF_ist_wrmsr.

Simplify the logic by dropping the variable, and doubling up the setting of
SCF_ist_wrmsr for the PV and HVM blocks, which will make the AMD SPEC_CTRL
support easier to follow.  Leave a comment explaining why SCF_ist_wrmsr is
still necessary for the VMExit case.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/cpuid: Advertise SSB_NO to guests by default
Andrew Cooper [Thu, 27 Jan 2022 21:28:48 +0000 (21:28 +0000)]
x86/cpuid: Advertise SSB_NO to guests by default

This is a statement of hardware behaviour, and not related to controls for the
guest kernel to use.  Pass it straight through from hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoiommu/arm: Remove code duplication in all IOMMU drivers
Oleksandr Tyshchenko [Thu, 27 Jan 2022 19:55:52 +0000 (21:55 +0200)]
iommu/arm: Remove code duplication in all IOMMU drivers

All IOMMU drivers on Arm perform almost the same generic actions in
hwdom_init callback. Move this code to common arch_iommu_hwdom_init()
in order to get rid of code duplication.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Use refcount for the micro-TLBs
Oleksandr Tyshchenko [Thu, 27 Jan 2022 19:55:51 +0000 (21:55 +0200)]
iommu/ipmmu-vmsa: Use refcount for the micro-TLBs

Reference-count the micro-TLBs as several bus masters can be
connected to the same micro-TLB (and drop TODO comment).
This wasn't an issue so far, since the platform devices
(this driver deals with) get assigned/deassigned together during
domain creation/destruction. But, in order to support PCI devices
(which are hot-pluggable) in the near future we will need to
take care of this.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
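
Reference counting a shared resource so that the first user enables it and the last user disables it can be sketched as follows. The arrays and the utlb_get()/utlb_put() helpers are hypothetical and do not touch any hardware; they only show the counting discipline.

```c
#include <stdbool.h>

#define NR_UTLBS 8   /* hypothetical number of micro-TLBs */

unsigned int utlb_refcount[NR_UTLBS];
bool utlb_enabled[NR_UTLBS];

/* Called when a bus master attached to this micro-TLB is assigned. */
void utlb_get(unsigned int utlb)
{
    if ( utlb_refcount[utlb]++ == 0 )
        utlb_enabled[utlb] = true;    /* first user enables it */
}

/* Called when a bus master attached to this micro-TLB is deassigned. */
void utlb_put(unsigned int utlb)
{
    if ( utlb_refcount[utlb] && --utlb_refcount[utlb] == 0 )
        utlb_enabled[utlb] = false;   /* last user disables it */
}
```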
3 years agogitignore: remove stale entries
Juergen Gross [Mon, 31 Jan 2022 09:58:24 +0000 (10:58 +0100)]
gitignore: remove stale entries

The entries referring to tools/security became stale more than
10 years ago. Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agotools/libs/light: don't touch nr_vcpus_out if listing vcpus and returning NULL
Dario Faggioli [Mon, 31 Jan 2022 09:58:07 +0000 (10:58 +0100)]
tools/libs/light: don't touch nr_vcpus_out if listing vcpus and returning NULL

If we are in libxl_list_vcpu() and we are returning NULL, let's avoid
touching the output parameter *nr_vcpus_out, which the caller should
have initialized to 0.

The current behavior could be problematic if we are creating a domain and,
in the meantime, an existing one is destroyed when we have already done
some steps of the loop. At which point, we'd return a NULL list of vcpus
but with something different than 0 as the number of vcpus in that list.
And this can cause trouble in the callers (e.g., nr_vcpus_on_nodes()),
when they do a libxl_vcpuinfo_list_free().

Crashes due to this are rare and difficult to reproduce, but have been
observed, with stack traces looking like this one:

#0  libxl_bitmap_dispose (map=map@entry=0x50) at libxl_utils.c:626
#1  0x00007fe72c993a32 in libxl_vcpuinfo_dispose (p=p@entry=0x38) at _libxl_types.c:692
#2  0x00007fe72c94e3c4 in libxl_vcpuinfo_list_free (list=0x0, nr=<optimized out>) at libxl_utils.c:1059
#3  0x00007fe72c9528bf in nr_vcpus_on_nodes (vcpus_on_node=0x7fe71000eb60, suitable_cpumap=0x7fe721df0d38, tinfo_elements=48, tinfo=0x7fe7101b3900, gc=0x7fe7101bbfa0) at libxl_numa.c:258
#4  libxl__get_numa_candidate (gc=gc@entry=0x7fe7100033a0, min_free_memkb=4233216, min_cpus=4, min_nodes=min_nodes@entry=0, max_nodes=max_nodes@entry=0, suitable_cpumap=suitable_cpumap@entry=0x7fe721df0d38, numa_cmpf=0x7fe72c940110 <numa_cmpf>, cndt_out=0x7fe721df0cf0, cndt_found=0x7fe721df0cb4) at libxl_numa.c:394
#5  0x00007fe72c94152b in numa_place_domain (d_config=0x7fe721df11b0, domid=975, gc=0x7fe7100033a0) at libxl_dom.c:209
#6  libxl__build_pre (gc=gc@entry=0x7fe7100033a0, domid=domid@entry=975, d_config=d_config@entry=0x7fe721df11b0, state=state@entry=0x7fe710077700) at libxl_dom.c:436
#7  0x00007fe72c92c4a5 in libxl__domain_build (gc=0x7fe7100033a0, d_config=d_config@entry=0x7fe721df11b0, domid=975, state=0x7fe710077700) at libxl_create.c:444
#8  0x00007fe72c92de8b in domcreate_bootloader_done (egc=0x7fe721df0f60, bl=0x7fe7100778c0, rc=<optimized out>) at libxl_create.c:1222
#9  0x00007fe72c980425 in libxl__bootloader_run (egc=egc@entry=0x7fe721df0f60, bl=bl@entry=0x7fe7100778c0) at libxl_bootloader.c:403
#10 0x00007fe72c92f281 in initiate_domain_create (egc=egc@entry=0x7fe721df0f60, dcs=dcs@entry=0x7fe7100771b0) at libxl_create.c:1159
#11 0x00007fe72c92f456 in do_domain_create (ctx=ctx@entry=0x7fe71001c840, d_config=d_config@entry=0x7fe721df11b0, domid=domid@entry=0x7fe721df10a8, restore_fd=restore_fd@entry=-1, send_back_fd=send_back_fd@entry=-1, params=params@entry=0x0, ao_how=0x0, aop_console_how=0x7fe721df10f0) at libxl_create.c:1856
#12 0x00007fe72c92f776 in libxl_domain_create_new (ctx=0x7fe71001c840, d_config=d_config@entry=0x7fe721df11b0, domid=domid@entry=0x7fe721df10a8, ao_how=ao_how@entry=0x0, aop_console_how=aop_console_how@entry=0x7fe721df10f0) at libxl_create.c:2075

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Tested-by: James Fehlig <jfehlig@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agoIOMMU/x86: switch to alternatives-call patching in further instances
Jan Beulich [Mon, 31 Jan 2022 09:57:27 +0000 (10:57 +0100)]
IOMMU/x86: switch to alternatives-call patching in further instances

This is, once again, to limit the number of indirect calls as much as
possible. The only hook invocation which isn't sensible to convert is
setup(). And of course Arm-only use sites are left alone as well.

Note regarding the introduction / use of local variables in pci.c:
struct pci_dev's involved fields are const. This const propagates, via
typeof(), to the local helper variables in the altcall macros. These
helper variables are, however, used as outputs (and hence can't be
const). In iommu_get_device_group() make use of the new local variables
to also simplify some adjacent code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
3 years agoVMX: sync VM-exit perf counters with known VM-exit reasons
Jan Beulich [Mon, 31 Jan 2022 09:56:28 +0000 (10:56 +0100)]
VMX: sync VM-exit perf counters with known VM-exit reasons

This has gone out of sync over time. Introduce a simplistic mechanism to
hopefully keep things in sync going forward.

Also limit the array index to just the "basic exit reason" part, which is
what the pseudo-enumeration covers.
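
As a rough illustration of limiting the index to the basic exit reason, the sketch below masks a raw exit reason down to its low 16 bits before using it as a counter index. The names demo_vmexit_count, demo_count_vmexit and the table size are hypothetical and are not taken from the patch; only the fact that the basic exit reason occupies bits 15:0 of the exit reason field comes from the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table size; the real value depends on the known exit reasons. */
#define DEMO_NR_BASIC_REASONS 80u

static unsigned long demo_vmexit_count[DEMO_NR_BASIC_REASONS];

/*
 * Count one VM exit.  Only bits 15:0 (the basic exit reason) are used as
 * the index, so flag bits such as bit 31 (VM-entry failure) cannot push
 * the index out of range.  Returns the basic exit reason.
 */
static uint16_t demo_count_vmexit(uint32_t exit_reason)
{
    uint16_t basic = (uint16_t)(exit_reason & 0xffffu);

    if (basic < DEMO_NR_BASIC_REASONS)
        demo_vmexit_count[basic]++;

    return basic;
}
```

Without the mask, an exit reason with a high flag bit set would index far beyond the end of the counter array.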

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agopublic: add XEN_RING_NR_UNCONSUMED_*() macros to ring.h
Juergen Gross [Fri, 28 Jan 2022 10:47:00 +0000 (11:47 +0100)]
public: add XEN_RING_NR_UNCONSUMED_*() macros to ring.h

Today RING_HAS_UNCONSUMED_*() macros are returning the number of
unconsumed requests or responses instead of a boolean as the name of
the macros would imply.

As this "feature" is already being used, rename the macros to
XEN_RING_NR_UNCONSUMED_*() and define the RING_HAS_UNCONSUMED_*() macros
by using the new XEN_RING_NR_UNCONSUMED_*() macros. In order to avoid
future misuse let RING_HAS_UNCONSUMED_*() optionally really return a
boolean (can be activated by defining XEN_RING_HAS_UNCONSUMED_IS_BOOL).

Note that the known misuses need to be switched to the new
XEN_RING_NR_UNCONSUMED_*() macros when using the RING_HAS_UNCONSUMED_*()
variants returning a boolean value.
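
The split between a counting macro and a boolean macro might look roughly like the sketch below. It uses a hypothetical, simplified struct demo_ring and DEMO_-prefixed macro names, so it is an illustration of the idea under those assumptions and not the contents of ring.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified ring with free-running indices. */
struct demo_ring {
    uint32_t rsp_prod;  /* advanced by the producer */
    uint32_t rsp_cons;  /* advanced by the consumer */
};

/* Number of unconsumed entries; unsigned subtraction copes with wrap-around. */
#define DEMO_RING_NR_UNCONSUMED(r) ((r)->rsp_prod - (r)->rsp_cons)

/* Opt-in switch, analogous in spirit to XEN_RING_HAS_UNCONSUMED_IS_BOOL. */
#define DEMO_RING_HAS_UNCONSUMED_IS_BOOL

#ifdef DEMO_RING_HAS_UNCONSUMED_IS_BOOL
/* Strict variant: really yields 0 or 1. */
#define DEMO_RING_HAS_UNCONSUMED(r) (!!DEMO_RING_NR_UNCONSUMED(r))
#else
/* Legacy variant: yields the count, as existing users may expect. */
#define DEMO_RING_HAS_UNCONSUMED(r) DEMO_RING_NR_UNCONSUMED(r)
#endif

/* A caller that needs the count uses the NR macro, never the HAS macro. */
static unsigned int demo_pending(const struct demo_ring *r)
{
    unsigned int nr = DEMO_RING_NR_UNCONSUMED(r);

    assert(!!nr == !!DEMO_RING_HAS_UNCONSUMED(r));
    return nr;
}
```

Code that previously relied on the HAS macro returning a count would silently see 1 once the boolean variant is enabled, which is why such callers need to move to the NR macro first.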

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: fix exported variable name CFLAGS_stack_boundary
Anthony PERARD [Fri, 28 Jan 2022 10:44:33 +0000 (11:44 +0100)]
build: fix exported variable name CFLAGS_stack_boundary

Exporting a variable with a dash in its name doesn't work reliably:
such variables may be stripped from the environment when calling a
sub-make or sub-shell.

CFLAGS-stack-boundary starts to be removed from the environment in
patch "build: set ALL_OBJS in main Makefile; move prelink.o to main
Makefile" when running `make "ALL_OBJS=.."`, due to the addition of
the quotes. At least in my empirical tests.

Fixes: 2740d96efd ("xen/build: have the root Makefile generates the CFLAGS")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: avoid re-executing the main Makefile by introducing build.mk
Anthony PERARD [Fri, 28 Jan 2022 10:42:24 +0000 (11:42 +0100)]
build: avoid re-executing the main Makefile by introducing build.mk

Currently, the xen/Makefile is re-parsed several times: once to start
the build process, and several more times with Rules.mk including it.
This makes it difficult to work with a Makefile used for several
purposes, and it actually slows down the build process.

So this patch introduces "build.mk", which Rules.mk will use when
present instead of the "Makefile" of a directory. (Linux's Kbuild
named that file "Kbuild".)

We have a few targets to move to "build.mk", identified by them being
built via "make -f Rules.mk" without changing directory.

As for the main targets like "build", we can have them depend on
their underscore-prefixed targets like "_build" without having to use
"Rules.mk", while still retaining the check for unsupported
architectures. (Those main rules are changed to be single-colon as
there should only be a single recipe for them.)

With nearly everything that needed to move to "build.mk" moved, there
is a single dependency left from "Rules.mk": the variable $(TARGET),
so its assignment is moved to the main Makefile.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: set XEN_BUILD_EFI earlier
Anthony PERARD [Fri, 28 Jan 2022 10:41:09 +0000 (11:41 +0100)]
build: set XEN_BUILD_EFI earlier

We are going to need the variable XEN_BUILD_EFI earlier.

But a side effect of calculating the value of $(XEN_BUILD_EFI) is to
also generate "efi/check.o", which is used for further checks.
Thus the whole chain that checks for EFI support is moved to
"arch.mk".

Some other changes are made to avoid too much duplication:
    - $(efi-check): Used to avoid repeating "efi/check.*". We don't
      set it to the path to the source as it would be wrong as soon
      as we support out-of-tree build.
    - $(LD_PE_check_cmd): As it is called twice, with an updated
      $(EFI_LDFLAGS).

$(nr-fixups) is renamed to $(efi-nr-fixups) as the former might be
a bit too generic.

In order to avoid exporting MKRELOC, the variable is added to the
$(MAKE) command line. The only modification needed is in target
"build"; the modification of target "$(TARGET)" will be needed with a
following patch, "build: avoid re-executing the main Makefile by
introducing build.mk".

We can now revert 24b0ce9a5da2, as we don't need to override efi-y on
recursion anymore.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoautomation: remove python-dev from debian unstable build containers
Stefano Stabellini [Wed, 26 Jan 2022 01:45:28 +0000 (17:45 -0800)]
automation: remove python-dev from debian unstable build containers

Debian unstable doesn't have the legacy python-dev package anymore.
Note: only the arm64v8 container has been rebuilt.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/msr: Fix migration compatibility issue with MSR_SPEC_CTRL
Andrew Cooper [Wed, 19 Jan 2022 19:55:02 +0000 (19:55 +0000)]
x86/msr: Fix migration compatibility issue with MSR_SPEC_CTRL

This bug existed early in 2018, between MSR_SPEC_CTRL arriving in microcode
and SSBD arriving a few months later.  It went unnoticed presumably because
everyone was busy rebooting everything.

The same bug will reappear when adding PSFD support.

Clamp the guest MSR_SPEC_CTRL value to that permitted by CPUID on migrate.
The guest is already playing with reserved bits at this point, and clamping
the value will prevent a migration to a less capable host from failing.
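
A minimal sketch of the clamping idea follows, assuming a hypothetical struct demo_policy holding the guest-visible CPUID features. The bit positions are the architectural MSR_SPEC_CTRL ones, but the structure, the helper names and the one-to-one feature mapping are simplifications for illustration and not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in MSR_SPEC_CTRL. */
#define DEMO_SPEC_CTRL_IBRS   (UINT64_C(1) << 0)
#define DEMO_SPEC_CTRL_STIBP  (UINT64_C(1) << 1)
#define DEMO_SPEC_CTRL_SSBD   (UINT64_C(1) << 2)
#define DEMO_SPEC_CTRL_PSFD   (UINT64_C(1) << 7)

/* Hypothetical view of the features the guest's CPUID policy advertises. */
struct demo_policy {
    bool ibrs, stibp, ssbd, psfd;
};

/* Build the mask of MSR_SPEC_CTRL bits the guest is allowed to set. */
static uint64_t demo_spec_ctrl_valid_bits(const struct demo_policy *p)
{
    return (p->ibrs  ? DEMO_SPEC_CTRL_IBRS  : 0) |
           (p->stibp ? DEMO_SPEC_CTRL_STIBP : 0) |
           (p->ssbd  ? DEMO_SPEC_CTRL_SSBD  : 0) |
           (p->psfd  ? DEMO_SPEC_CTRL_PSFD  : 0);
}

/* On migrate, drop any bits the policy does not permit instead of failing. */
static uint64_t demo_clamp_spec_ctrl(const struct demo_policy *p, uint64_t val)
{
    uint64_t clamped = val & demo_spec_ctrl_valid_bits(p);

    assert((clamped & ~val) == 0);  /* clamping never sets new bits */
    return clamped;
}
```

With this shape, a guest that set a bit its policy never advertised has that bit cleared in the migration stream, so the receiving host sees only values it can accept.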

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/Intel: use CPUID bit to determine PPIN availability
Jan Beulich [Thu, 27 Jan 2022 12:54:42 +0000 (13:54 +0100)]
x86/Intel: use CPUID bit to determine PPIN availability

As of SDM revision 076 there is a CPUID bit for this functionality. Use
it to amend the existing model-based logic.
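
One way to picture "amending" a model list with a CPUID bit is sketched below. The helper name and the model_listed parameter are hypothetical; the only architectural fact assumed is that PPIN is enumerated in CPUID leaf 7, subleaf 1, EBX bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=1):EBX bit 0 enumerates PPIN. */
#define DEMO_CPUID_7_1_EBX_PPIN (UINT32_C(1) << 0)

/*
 * Decide whether PPIN is worth probing.  The CPUID bit is consulted first;
 * the pre-existing model list (reduced here to a boolean supplied by the
 * caller) remains as a fallback for CPUs that predate the bit.
 */
static bool demo_may_have_ppin(uint32_t leaf7_1_ebx, bool model_listed)
{
    bool enumerated = (leaf7_1_ebx & DEMO_CPUID_7_1_EBX_PPIN) != 0;

    assert(DEMO_CPUID_7_1_EBX_PPIN == 1u);  /* bit 0 */
    return enumerated || model_listed;
}
```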

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/cpuid: Infrastructure for leaf 7:1.ebx
Jan Beulich [Thu, 27 Jan 2022 12:54:42 +0000 (12:54 +0000)]
x86/cpuid: Infrastructure for leaf 7:1.ebx

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/cpuid: Disentangle logic for new feature leaves
Andrew Cooper [Thu, 27 Jan 2022 13:56:04 +0000 (13:56 +0000)]
x86/cpuid: Disentangle logic for new feature leaves

Adding a new feature leaf is a reasonable amount of boilerplate and for the
patch to build, at least one feature from the new leaf needs defining.  This
typically causes two non-trivial changes to be merged together.

First, have gen-cpuid.py write out some extra placeholder defines:

  #define CPUID_BITFIELD_11 bool :1, :1, lfence_dispatch:1, ...
  #define CPUID_BITFIELD_12 uint32_t :32 /* placeholder */
  #define CPUID_BITFIELD_13 uint32_t :32 /* placeholder */
  #define CPUID_BITFIELD_14 uint32_t :32 /* placeholder */
  #define CPUID_BITFIELD_15 uint32_t :32 /* placeholder */

This allows DECL_BITFIELD() to be added to struct cpuid_policy without
requiring a XEN_CPUFEATURE() declared for the leaf.  The choice of 4 is
arbitrary, and allows us to add more than one leaf at a time if necessary.

Second, rework generic_identify() to not use specific feature names.

The choice of deriving the index from a feature was to avoid mismatches, but
its correctness depends on bugs like c/s 249e0f1d8f20 ("x86/cpuid: Fix
TSXLDTRK definition") not happening.

Switch to using FEATURESET_* just like the policy/featureset helpers.  This
breaks the cognitive complexity of needing to know which leaf a specifically
named feature should reside in, and is shorter to write.  It is also far
easier to identify as correct at a glance, given the correlation with the
CPUID leaf being read.

In addition, tidy up some other bits of generic_identify():
 * Drop leading zeros from leaf numbers.
 * Don't use a locked update for X86_FEATURE_APERFMPERF.
 * Rework extended_cpuid_level calculation to avoid setting it twice.
 * Use "leaf >= $N" consistently so $N matches with the CPUID input.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/vmx: Fold VMCS logic in vmx_{get,set}_segment_register()
Andrew Cooper [Fri, 21 Jan 2022 11:00:09 +0000 (11:00 +0000)]
x86/vmx: Fold VMCS logic in vmx_{get,set}_segment_register()

Xen's segment enumeration almost matches the VMCS encoding order, while the
VMCS encoding order has the system segments immediately following the user
segments for all relevant attributes.

Use a sneaky xor to hide the difference in encoding order to fold the switch
statements, dropping 10 __vmread() and 10 __vmwrite() calls.  Bloat-o-meter
reports:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-433 (-433)
  Function                                     old     new   delta
  vmx_set_segment_register                     804     593    -211
  vmx_get_segment_register                     778     556    -222

showing that these wrappers aren't trivial.  In addition, 20 BUGs worth of
metadata are dropped.

No functional change.
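
The xor trick can be illustrated with the sketch below. It assumes an enumeration in which tr precedes ldtr, while the VMCS guest selector fields (per the Intel SDM) place LDTR before TR and are spaced two apart starting at 0x800. The demo_ names are hypothetical and the real functions handle more fields than just the selector.

```c
#include <assert.h>

/* Assumed enumeration order: matches VMCS order except that tr/ldtr are swapped. */
enum demo_seg {
    DEMO_ES, DEMO_CS, DEMO_SS, DEMO_DS, DEMO_FS, DEMO_GS,
    DEMO_TR, DEMO_LDTR,
    DEMO_NR
};

/* Encoding of the first guest selector field (ES); later ones follow in steps of 2. */
#define DEMO_GUEST_ES_SELECTOR 0x0800u

/* Map an enumeration value to its position in VMCS encoding order. */
static unsigned int demo_vmcs_index(enum demo_seg seg)
{
    assert(seg < DEMO_NR);

    /* Flipping bit 0 swaps 6 <-> 7 (tr <-> ldtr) and leaves 0..5 untouched. */
    return (unsigned int)seg ^ (unsigned int)(seg >= DEMO_TR);
}

/* One arithmetic expression replaces a switch with one case per segment. */
static unsigned int demo_selector_field(enum demo_seg seg)
{
    return DEMO_GUEST_ES_SELECTOR + 2 * demo_vmcs_index(seg);
}
```

Because every per-segment attribute follows the same ordering, the same index can be reused for each attribute's base encoding, which is what allows the per-segment switch statements to collapse.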

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agolibxl: force netback to wait for hotplug execution before connecting
Roger Pau Monné [Thu, 27 Jan 2022 12:51:19 +0000 (13:51 +0100)]
libxl: force netback to wait for hotplug execution before connecting

By writing an empty "hotplug-status" xenstore node in the backend path
libxl can force Linux netback to wait for hotplug script execution
before proceeding to the 'connected' state.

This is required so that netback doesn't skip state 2 (InitWait) and
thus block libxl, which waits for that state in order to launch the
hotplug script (see libxl__wait_device_connection).

Reported-by: James Dingwall <james-xen@dingwall.me.uk>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: James Dingwall <james-xen@dingwall.me.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Tested-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
3 years agox86/Intel: IceLake D + Sapphire Rapids Xeons also support PPIN
Jan Beulich [Thu, 27 Jan 2022 12:50:19 +0000 (13:50 +0100)]
x86/Intel: IceLake D + Sapphire Rapids Xeons also support PPIN

This is as per Linux commits a331f5fdd36d ("x86/mce: Add Xeon Sapphire
Rapids to list of CPUs that support PPIN") and [tip.git] e464121f2d40
("x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN"), just
in case a subsequent change making use of the respective new CPUID bit
doesn't cover either of these models.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>