]> xenbits.xensource.com Git - xen.git/log
xen.git
3 years agox86/spec-ctrl: Drop use_spec_ctrl boolean
Andrew Cooper [Tue, 25 Jan 2022 16:09:59 +0000 (16:09 +0000)]
x86/spec-ctrl: Drop use_spec_ctrl boolean

Several bugfixes have reduced the utility of this variable from it's original
purpose, and now all it does is aid in the setup of SCF_ist_wrmsr.

Simplify the logic by drop the variable, and doubling up the setting of
SCF_ist_wrmsr for the PV and HVM blocks, which will make the AMD SPEC_CTRL
support easier to follow.  Leave a comment explaining why SCF_ist_wrmsr is
still necessary for the VMExit case.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/cpuid: Advertise SSB_NO to guests by default
Andrew Cooper [Thu, 27 Jan 2022 21:28:48 +0000 (21:28 +0000)]
x86/cpuid: Advertise SSB_NO to guests by default

This is a statement of hardware behaviour, and not related to controls for the
guest kernel to use.  Pass it straight through from hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoiommu/arm: Remove code duplication in all IOMMU drivers
Oleksandr Tyshchenko [Thu, 27 Jan 2022 19:55:52 +0000 (21:55 +0200)]
iommu/arm: Remove code duplication in all IOMMU drivers

All IOMMU drivers on Arm perform almost the same generic actions in
hwdom_init callback. Move this code to common arch_iommu_hwdom_init()
in order to get rid of code duplication.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Use refcount for the micro-TLBs
Oleksandr Tyshchenko [Thu, 27 Jan 2022 19:55:51 +0000 (21:55 +0200)]
iommu/ipmmu-vmsa: Use refcount for the micro-TLBs

Reference-count the micro-TLBs as several bus masters can be
connected to the same micro-TLB (and drop TODO comment).
This wasn't an issue so far, since the platform devices
(this driver deals with) get assigned/deassigned together during
domain creation/destruction. But, in order to support PCI devices
(which are hot-pluggable) in the near future we will need to
take care of.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
3 years agogitignore: remove stale entries
Juergen Gross [Mon, 31 Jan 2022 09:58:24 +0000 (10:58 +0100)]
gitignore: remove stale entries

The entries referring to tools/security have become stale more than
10 years ago. Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agotools/libs/light: don't touch nr_vcpus_out if listing vcpus and returning NULL
Dario Faggioli [Mon, 31 Jan 2022 09:58:07 +0000 (10:58 +0100)]
tools/libs/light: don't touch nr_vcpus_out if listing vcpus and returning NULL

If we are in libxl_list_vcpu() and we are returning NULL, let's avoid
touching the output parameter *nr_vcpus_out, which the caller should
have initialized to 0.

The current behavior could be problematic if are creating a domain and,
in the meantime, an existing one is destroyed when we have already done
some steps of the loop. At which point, we'd return a NULL list of vcpus
but with something different than 0 as the number of vcpus in that list.
And this can cause troubles in the callers (e.g., nr_vcpus_on_nodes()),
when they do a libxl_vcpuinfo_list_free().

Crashes due to this are rare and difficult to reproduce, but have been
observed, with stack traces looking like this one:

#0  libxl_bitmap_dispose (map=map@entry=0x50) at libxl_utils.c:626
#1  0x00007fe72c993a32 in libxl_vcpuinfo_dispose (p=p@entry=0x38) at _libxl_types.c:692
#2  0x00007fe72c94e3c4 in libxl_vcpuinfo_list_free (list=0x0, nr=<optimized out>) at libxl_utils.c:1059
#3  0x00007fe72c9528bf in nr_vcpus_on_nodes (vcpus_on_node=0x7fe71000eb60, suitable_cpumap=0x7fe721df0d38, tinfo_elements=48, tinfo=0x7fe7101b3900, gc=0x7fe7101bbfa0) at libxl_numa.c:258
#4  libxl__get_numa_candidate (gc=gc@entry=0x7fe7100033a0, min_free_memkb=4233216, min_cpus=4, min_nodes=min_nodes@entry=0, max_nodes=max_nodes@entry=0, suitable_cpumap=suitable_cpumap@entry=0x7fe721df0d38, numa_cmpf=0x7fe72c940110 <numa_cmpf>, cndt_out=0x7fe721df0cf0, cndt_found=0x7fe721df0cb4) at libxl_numa.c:394
#5  0x00007fe72c94152b in numa_place_domain (d_config=0x7fe721df11b0, domid=975, gc=0x7fe7100033a0) at libxl_dom.c:209
#6  libxl__build_pre (gc=gc@entry=0x7fe7100033a0, domid=domid@entry=975, d_config=d_config@entry=0x7fe721df11b0, state=state@entry=0x7fe710077700) at libxl_dom.c:436
#7  0x00007fe72c92c4a5 in libxl__domain_build (gc=0x7fe7100033a0, d_config=d_config@entry=0x7fe721df11b0, domid=975, state=0x7fe710077700) at libxl_create.c:444
#8  0x00007fe72c92de8b in domcreate_bootloader_done (egc=0x7fe721df0f60, bl=0x7fe7100778c0, rc=<optimized out>) at libxl_create.c:1222
#9  0x00007fe72c980425 in libxl__bootloader_run (egc=egc@entry=0x7fe721df0f60, bl=bl@entry=0x7fe7100778c0) at libxl_bootloader.c:403
#10 0x00007fe72c92f281 in initiate_domain_create (egc=egc@entry=0x7fe721df0f60, dcs=dcs@entry=0x7fe7100771b0) at libxl_create.c:1159
#11 0x00007fe72c92f456 in do_domain_create (ctx=ctx@entry=0x7fe71001c840, d_config=d_config@entry=0x7fe721df11b0, domid=domid@entry=0x7fe721df10a8, restore_fd=restore_fd@entry=-1, send_back_fd=send_back_fd@entry=-1, params=params@entry=0x0, ao_how=0x0, aop_console_how=0x7fe721df10f0) at libxl_create.c:1856
#12 0x00007fe72c92f776 in libxl_domain_create_new (ctx=0x7fe71001c840, d_config=d_config@entry=0x7fe721df11b0, domid=domid@entry=0x7fe721df10a8, ao_how=ao_how@entry=0x0, aop_console_how=aop_console_how@entry=0x7fe721df10f0) at libxl_create.c:2075

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Tested-by: James Fehlig <jfehlig@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agoIOMMU/x86: switch to alternatives-call patching in further instances
Jan Beulich [Mon, 31 Jan 2022 09:57:27 +0000 (10:57 +0100)]
IOMMU/x86: switch to alternatives-call patching in further instances

This is, once again, to limit the number of indirect calls as much as
possible. The only hook invocation which isn't sensible to convert is
setup(). And of course Arm-only use sites are left alone as well.

Note regarding the introduction / use of local variables in pci.c:
struct pci_dev's involved fields are const. This const propagates, via
typeof(), to the local helper variables in the altcall macros. These
helper variables are, however, used as outputs (and hence can't be
const). In iommu_get_device_group() make use of the new local variables
to also simplify some adjacent code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
3 years agoVMX: sync VM-exit perf counters with known VM-exit reasons
Jan Beulich [Mon, 31 Jan 2022 09:56:28 +0000 (10:56 +0100)]
VMX: sync VM-exit perf counters with known VM-exit reasons

This has gone out of sync over time. Introduce a simplistic mechanism to
hopefully keep things in sync going forward.

Also limit the array index to just the "basic exit reason" part, which is
what the pseudo-enumeration covers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agopublic: add XEN_RING_NR_UNCONSUMED_*() macros to ring.h
Juergen Gross [Fri, 28 Jan 2022 10:47:00 +0000 (11:47 +0100)]
public: add XEN_RING_NR_UNCONSUMED_*() macros to ring.h

Today RING_HAS_UNCONSUMED_*() macros are returning the number of
unconsumed requests or responses instead of a boolean as the name of
the macros would imply.

As this "feature" is already being used, rename the macros to
XEN_RING_NR_UNCONSUMED_*() and define the RING_HAS_UNCONSUMED_*() macros
by using the new XEN_RING_NR_UNCONSUMED_*() macros. In order to avoid
future misuse let RING_HAS_UNCONSUMED_*() optionally really return a
boolean (can be activated by defining XEN_RING_HAS_UNCONSUMED_IS_BOOL).

Note that the known misuses need to be switched to the new
XEN_RING_NR_UNCONSUMED_*() macros when using the RING_HAS_UNCONSUMED_*()
variants returning a boolean value.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: fix exported variable name CFLAGS_stack_boundary
Anthony PERARD [Fri, 28 Jan 2022 10:44:33 +0000 (11:44 +0100)]
build: fix exported variable name CFLAGS_stack_boundary

Exporting a variable with a dash doesn't work reliably, they may be
striped from the environment when calling a sub-make or sub-shell.

CFLAGS-stack-boundary start to be removed from env in patch "build:
set ALL_OBJS in main Makefile; move prelink.o to main Makefile" when
running `make "ALL_OBJS=.."` due to the addition of the quote. At
least in my empirical tests.

Fixes: 2740d96efd ("xen/build: have the root Makefile generates the CFLAGS")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: avoid re-executing the main Makefile by introducing build.mk
Anthony PERARD [Fri, 28 Jan 2022 10:42:24 +0000 (11:42 +0100)]
build: avoid re-executing the main Makefile by introducing build.mk

Currently, the xen/Makefile is re-parsed several times: once to start
the build process, and several more time with Rules.mk including it.
This makes it difficult to work with a Makefile used for several
purpose, and it actually slow down the build process.

So this patch introduce "build.mk" which Rules.mk will use when
present instead of the "Makefile" of a directory. (Linux's Kbuild
named that file "Kbuild".)

We have a few targets to move to "build.mk" identified by them been
build via "make -f Rules.mk" without changing directory.

As for the main targets like "build", we can have them depends on
there underscore-prefix targets like "_build" without having to use
"Rules.mk" while still retaining the check for unsupported
architecture. (Those main rules are changed to be single-colon as
there should only be a single recipe for them.)

With nearly everything needed to move to "build.mk" moved, there is a
single dependency left from "Rules.mk": the variable $(TARGET), so its
assignement is moved to the main Makefile.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: set XEN_BUILD_EFI earlier
Anthony PERARD [Fri, 28 Jan 2022 10:41:09 +0000 (11:41 +0100)]
build: set XEN_BUILD_EFI earlier

We are going to need the variable XEN_BUILD_EFI earlier.

But a side effect of calculating the value of $(XEN_BUILD_EFI) is to
also to generate "efi/check.o" which is used for further checks.
Thus the whole chain that check for EFI support is moved to
"arch.mk".

Some other changes are made to avoid too much duplication:
    - $(efi-check): Used to avoid repeating "efi/check.*". We don't
      set it to the path to the source as it would be wrong as soon
      as we support out-of-tree build.
    - $(LD_PE_check_cmd): As it is called twice, with an updated
      $(EFI_LDFLAGS).

$(nr-fixups) is renamed to $(efi-nr-fixups) as the former might be
a bit too generic.

In order to avoid exporting MKRELOC, the variable is added to $(MAKE)
command line. The only modification needed is in target "build", the
modification target "$(TARGET)" will be needed with a following patch
"build: avoid re-executing the main Makefile by introducing build.mk".

We can now revert 24b0ce9a5da2, we don't need to override efi-y on
recursion anymore.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoautomation: remove python-dev from debian unstable build containers
Stefano Stabellini [Wed, 26 Jan 2022 01:45:28 +0000 (17:45 -0800)]
automation: remove python-dev from debian unstable build containers

Debian unstable doesn't have the legacy python-dev package anymore.
Note: only the arm64v8 container has been rebuilt.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/msr: Fix migration compatibility issue with MSR_SPEC_CTRL
Andrew Cooper [Wed, 19 Jan 2022 19:55:02 +0000 (19:55 +0000)]
x86/msr: Fix migration compatibility issue with MSR_SPEC_CTRL

This bug existed in early in 2018 between MSR_SPEC_CTRL arriving in microcode,
and SSBD arriving a few months later.  It went unnoticed presumably because
everyone was busy rebooting everything.

The same bug will reappear when adding PSFD support.

Clamp the guest MSR_SPEC_CTRL value to that permitted by CPUID on migrate.
The guest is already playing with reserved bits at this point, and clamping
the value will prevent a migration to a less capable host from failing.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/Intel: use CPUID bit to determine PPIN availability
Jan Beulich [Thu, 27 Jan 2022 12:54:42 +0000 (13:54 +0100)]
x86/Intel: use CPUID bit to determine PPIN availability

As of SDM revision 076 there is a CPUID bit for this functionality. Use
it to amend the existing model-based logic.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/cpuid: Infrastructure for leaf 7:1.ebx
Jan Beulich [Thu, 27 Jan 2022 12:54:42 +0000 (12:54 +0000)]
x86/cpuid: Infrastructure for leaf 7:1.ebx

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/cpuid: Disentangle logic for new feature leaves
Andrew Cooper [Thu, 27 Jan 2022 13:56:04 +0000 (13:56 +0000)]
x86/cpuid: Disentangle logic for new feature leaves

Adding a new feature leaf is a reasonable amount of boilerplate and for the
patch to build, at least one feature from the new leaf needs defining.  This
typically causes two non-trivial changes to be merged together.

First, have gen-cpuid.py write out some extra placeholder defines:

  #define CPUID_BITFIELD_11 bool :1, :1, lfence_dispatch:1, ...
  #define CPUID_BITFIELD_12 uint32_t :32 /* placeholder */
  #define CPUID_BITFIELD_13 uint32_t :32 /* placeholder */
  #define CPUID_BITFIELD_14 uint32_t :32 /* placeholder */
  #define CPUID_BITFIELD_15 uint32_t :32 /* placeholder */

This allows DECL_BITFIELD() to be added to struct cpuid_policy without
requiring a XEN_CPUFEATURE() declared for the leaf.  The choice of 4 is
arbitrary, and allows us to add more than one leaf at a time if necessary.

Second, rework generic_identify() to not use specific feature names.

The choice of deriving the index from a feature was to avoid mismatches, but
its correctness depends on bugs like c/s 249e0f1d8f20 ("x86/cpuid: Fix
TSXLDTRK definition") not happening.

Switch to using FEATURESET_* just like the policy/featureset helpers.  This
breaks the cognitive complexity of needing to know which leaf a specifically
named feature should reside in, and is shorter to write.  It is also far
easier to identify as correct at a glance, given the correlation with the
CPUID leaf being read.

In addition, tidy up some other bits of generic_identify()
 * Drop leading zeros from leaf numbers.
 * Don't use a locked update for X86_FEATURE_APERFMPERF.
 * Rework extended_cpuid_level calculation to avoid setting it twice.
 * Use "leaf >= $N" consistently so $N matches with the CPUID input.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/vmx: Fold VMCS logic in vmx_{get,set}_segment_register()
Andrew Cooper [Fri, 21 Jan 2022 11:00:09 +0000 (11:00 +0000)]
x86/vmx: Fold VMCS logic in vmx_{get,set}_segment_register()

Xen's segment enumeration almost matches the VMCS encoding order, while the
VMCS encoding order has the system segments immediately following the user
segments for all relevant attributes.

Use a sneaky xor to hide the difference in encoding order to fold the switch
statements, dropping 10 __vmread() and 10 __vmwrite() calls.  Bloat-o-meter
reports:

  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-433 (-433)
  Function                                     old     new   delta
  vmx_set_segment_register                     804     593    -211
  vmx_get_segment_register                     778     556    -222

showing that these wrappers aren't trivial.  In addition, 20 BUGs worth of
metadata are dropped.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agolibxl: force netback to wait for hotplug execution before connecting
Roger Pau Monné [Thu, 27 Jan 2022 12:51:19 +0000 (13:51 +0100)]
libxl: force netback to wait for hotplug execution before connecting

By writing an empty "hotplug-status" xenstore node in the backend path
libxl can force Linux netback to wait for hotplug script execution
before proceeding to the 'connected' state.

This is required so that netback doesn't skip state 2 (InitWait) and
thus blocks libxl waiting for such state in order to launch the
hotplug script (see libxl__wait_device_connection).

Reported-by: James Dingwall <james-xen@dingwall.me.uk>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: James Dingwall <james-xen@dingwall.me.uk>
Reviewed-by: Paul Durrant <paul@xen.org>
Tested-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wei.liu@kernel.org>
3 years agox86/Intel: IceLake D + Sapphire Rapids Xeons also support PPIN
Jan Beulich [Thu, 27 Jan 2022 12:50:19 +0000 (13:50 +0100)]
x86/Intel: IceLake D + Sapphire Rapids Xeons also support PPIN

This is as per Linux commits a331f5fdd36d ("x86/mce: Add Xeon Sapphire
Rapids to list of CPUs that support PPIN") and [tip.git] e464121f2d40
("x86/cpu: Add Xeon Icelake-D to list of CPUs that support PPIN"), just
in case a subsequent change making use of the respective new CPUID bit
doesn't cover either of these models.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoxen: Replace arch_mfn_in_directmap() with arch_mfns_in_directmap()
Julien Grall [Wed, 26 Jan 2022 15:59:19 +0000 (15:59 +0000)]
xen: Replace arch_mfn_in_directmap() with arch_mfns_in_directmap()

The name of arch_mfn_in_directmap() suggests that it will check against
that the passed MFN should be in the directmap.

However, the current callers are passing the next MFN and the
implementation will return true for up to one MFN past the directmap.

It would be more meaningful to test the exact MFN rather than the
next one.

That said, the current expectation is the memory will be direct-mapped
from 0 up to a given MFN. This may not be a valid assumption on all
the architectures.

For instance, on Arm32 only the xenheap that will be direct-mapped.
This may not be allocated a the beginning of the RAM.

So take the opportunity to rework the parameters and pass the
number of pages we want to check. This also requires to rename
the helper to better match the implementation.

Note that the implementation of the helper on arm32 is left as-is
for now.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoiommu/ipmmu-vmsa: Set IPMMU bit IMSCTLR_USE_SECGRP to 0
Oleksandr Tyshchenko [Mon, 20 Dec 2021 21:15:55 +0000 (23:15 +0200)]
iommu/ipmmu-vmsa: Set IPMMU bit IMSCTLR_USE_SECGRP to 0

Based on the following commits from the Renesas BSP:
8fba83d97cca709a05139c38e29408e81ed4cf62
a8d93bc07da89a7fcf4d85f34d119a030310efa5
located at:
https://github.com/renesas-rcar/linux-bsp/tree/v5.10.41/rcar-5.1.3.rc5

Original commit messages:
 commit 8fba83d97cca709a05139c38e29408e81ed4cf62
 Author: Nam Nguyen <nam.nguyen.yh@renesas.com>
 Date:   Wed Apr 28 18:54:44 2021 +0700

  iommu/ipmmu-vmsa: Set IPMMU bit IMSCTLR_USE_SECGRP to 0

  Need to set bit IMSCTLR_USE_SECGRP to 0
  because H/W initial value is unknown, without this
  dma-transfer cannot be done due to address translation doesn't work.

Signed-off-by: Nam Nguyen <nam.nguyen.yh@renesas.com>
 commit a8d93bc07da89a7fcf4d85f34d119a030310efa5
 Author: Nam Nguyen <nam.nguyen.yh@renesas.com>
 Date:   Tue Sep 7 14:46:12 2021 +0700

  iommu/ipmmu-vmsa: Update IMSCTLR register offset address for R-Car S4

  Update IMSCTLR register offset address to align with R-Car S4 H/W UM.

Signed-off-by: Nam Nguyen <nam.nguyen.yh@renesas.com>
**********

It is still a question whether this really needs to be done in Xen,
rather in firmware, but better to be on the safe side. After all,
if firmware already takes care of clearing this bit, nothing bad
will happen.

Please note the following:
1. I decided to squash both commits since the first commit adds clearing
code and only the second one makes it functional on S4. Moreover, this is
not a direct port. So it would be better to introduce complete solution
by a single patch.
2. Although patch indeed does what it claims in the subject,
the implementation is different in comparison with original changes.
On Linux the clearing is done at runtime in ipmmu_domain_setup_context().
On Xen the clearing is done at boot time in ipmmu_probe().
The IMSCTLR is not a MMU "context" register at all, so I think there is
no point in performing the clearing each time we initialize the context,
instead perform the clearing at once during initialization. Also do not
abuse ctx_offset_stride_adj field for the register's offset calculation,
instead use recently added control_offset_base field.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Add Renesas R8A779F0 (R-Car S4) support
Oleksandr Tyshchenko [Mon, 20 Dec 2021 21:15:54 +0000 (23:15 +0200)]
iommu/ipmmu-vmsa: Add Renesas R8A779F0 (R-Car S4) support

Based on the following Linux upsteam commit:
7a62ced8ebd0e1b692c9dc4781a8d4ddb0f74792

Original commit message:
 commit 7a62ced8ebd0e1b692c9dc4781a8d4ddb0f74792
 Author: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
 Date:   Tue Sep 7 17:30:20 2021 +0900

  iommu/ipmmu-vmsa: Add support for r8a779a0

  Add support for r8a779a0 (R-Car V3U). The IPMMU hardware design
  of this SoC differs than others. So, add a new ipmmu_features for it.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://lore.kernel.org/r/20210907083020.907648-3-yoshihiro.shimoda.uh@renesas.com
Signed-off-by: Joerg Roedel <jroedel@suse.de>
**********

The R-Car S4 is an automotive System-on-Chip (SoC) for Car
Server/Communication Gateway and is one of the first products
in Renesas’ 4th-generation R-Car Family.

The integrated IOMMU HW is also VMSA-compatible and supports
stage 2 translation table format, therefore can be used with
current driver with slight modifications (thanks to the prereq
work).

In the context of Xen driver the main differences between Gen3
and Gen4 are the following:
- HW capacity was enlarged to support up to 16 IPMMU contexts
  (sets of page table) and up to 64 micro-TLBs per IPMMU device
- the memory mapped registers have different bases and offsets

Please note that Linux upstream doesn't support R-Car S4 SoC
yet unlike Renesas BSP [1], but it was decided to reuse upstream
patch for R-Car V3U anyway as the IPMMU HW settings are similar.

[1]
7003b9f732cf iommu/ipmmu-vmsa: Add Renesas R8A779F0 (R-Car S4) support
https://github.com/renesas-rcar/linux-bsp/tree/v5.10.41/rcar-5.1.3.rc5

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Add utlb_offset_base
Oleksandr Tyshchenko [Mon, 20 Dec 2021 21:15:53 +0000 (23:15 +0200)]
iommu/ipmmu-vmsa: Add utlb_offset_base

This is a non-verbatim port of corresponding Linux upsteam commit:
1289f7f15001c7ed36be6d23cb145c1d5feacdc8

Original commit message:
 commit 1289f7f15001c7ed36be6d23cb145c1d5feacdc8
 Author: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
 Date:   Wed Nov 6 11:35:50 2019 +0900

  iommu/ipmmu-vmsa: Add utlb_offset_base

  Since we will have changed memory mapping of the IPMMU in the future,
  this patch adds a utlb_offset_base into struct ipmmu_features
  for IMUCTR and IMUASID registers. No behavior change.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
**********

This is a prereq work needed to add support for S4 series easily
in the future.

Almost the same change as original commit makes, but without updating
struct ipmmu_features_default which Xen driver doesn't have (there is
no support of old Arm32 based Gen2 SoCs).

No change in behavior.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Calculate context registers' offset instead of a macro
Oleksandr Tyshchenko [Mon, 20 Dec 2021 21:15:52 +0000 (23:15 +0200)]
iommu/ipmmu-vmsa: Calculate context registers' offset instead of a macro

This is a non-verbatim port of corresponding Linux upsteam commit:
3dc28d9f59eaae41461542b27afe70339347ebb3

Original commit message:
 commit 3dc28d9f59eaae41461542b27afe70339347ebb3
 Author: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
 Date:   Wed Nov 6 11:35:48 2019 +0900

  iommu/ipmmu-vmsa: Calculate context registers' offset instead of a macro

  Since we will have changed memory mapping of the IPMMU in the future,
  this patch uses ipmmu_features values instead of a macro to
  calculate context registers offset. No behavior change.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
**********

This is a prereq work needed to add support for S4 series easily
in the future.

Almost the same change as original commit makes, but without updating
struct ipmmu_features_default which Xen driver doesn't have (there is
no support of old Arm32 based Gen2 SoCs).

No change in behavior.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Add light version of Linux's ipmmu_features
Oleksandr Tyshchenko [Mon, 20 Dec 2021 21:15:51 +0000 (23:15 +0200)]
iommu/ipmmu-vmsa: Add light version of Linux's ipmmu_features

This is a prereq work needed to add support for S4 series easily
in the future.

We don't need to pull the whole struct and all instances as Xen
driver doesn't support old Arm32 based Gen2 SoCs, so there is no
point in keeping all differences between Gen2 and Gen3 here.
All what we need is a minimal support to be able to operate with
Gen3 and new S4.

Add Gen3 specific info with only two fields (number_of_contexts and
num_utlbs) for now, the subsequent patches will add remaining bits.

No change in behavior.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Add helper functions for "uTLB" registers
Oleksandr Tyshchenko [Mon, 20 Dec 2021 21:15:50 +0000 (23:15 +0200)]
iommu/ipmmu-vmsa: Add helper functions for "uTLB" registers

This is a non-verbatim port of corresponding Linux upsteam commit:
3667c9978b2911dc1ded77f5971df477885409c4

Original commit message:
 commit 3667c9978b2911dc1ded77f5971df477885409c4
 Author: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
 Date:   Wed Nov 6 11:35:49 2019 +0900

  iommu/ipmmu-vmsa: Add helper functions for "uTLB" registers

  Since we will have changed memory mapping of the IPMMU in the future,
  This patch adds helper functions ipmmu_utlb_reg() and
  ipmmu_imu{asid,ctr}_write() for "uTLB" registers. No behavior change.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
**********

This is a prereq work needed to add support for S4 series easily
in the future.

Besides changes done in the original commit, we also need to introduce
ipmmu_imuctr_read() since Xen driver contains an additional logic in
ipmmu_utlb_enable() to prevent the use cases where devices which use
the same micro-TLB are assigned to different Xen domains.

No change in behavior.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Add helper functions for MMU "context" registers
Oleksandr Tyshchenko [Mon, 20 Dec 2021 21:15:49 +0000 (23:15 +0200)]
iommu/ipmmu-vmsa: Add helper functions for MMU "context" registers

This is a non-verbatim port of corresponding Linux upsteam commit:
16d9454f5e0447f9c19cbf350b35ed377b9f64eb

Original commit message:
 commit 16d9454f5e0447f9c19cbf350b35ed377b9f64eb
 Author: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
 Date:   Wed Nov 6 11:35:47 2019 +0900

  iommu/ipmmu-vmsa: Add helper functions for MMU "context" registers

  Since we will have changed memory mapping of the IPMMU in the future,
  This patch adds helper functions ipmmu_ctx_{reg,read,write}()
  for MMU "context" registers. No behavior change.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
**********

This is a prereq work needed to add support for S4 series easily
in the future.

Besides changes done in the original commit, we also need to update
an extra call sites which Linux driver doesn't have, but Xen driver
has such as ipmmu_ctx_write_cache(), etc.

No change in behavior.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoiommu/ipmmu-vmsa: Remove all unused register definitions
Oleksandr Tyshchenko [Mon, 20 Dec 2021 21:15:48 +0000 (23:15 +0200)]
iommu/ipmmu-vmsa: Remove all unused register definitions

This is a non-verbatim port of corresponding Linux upsteam commit:
77cf983892b2e0d40dc256b784930a9ffaad4fc8

Original commit message:
 commit 77cf983892b2e0d40dc256b784930a9ffaad4fc8
 Author: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
 Date:   Wed Nov 6 11:35:45 2019 +0900

  iommu/ipmmu-vmsa: Remove all unused register definitions

  To support different registers memory mapping hardware easily
  in the future, this patch removes all unused register
  definitions.

Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
**********

This is a prereq work needed to add support for S4 series easily
in the future.

Although Linux and Xen drivers have a lot in common, the main
differences are in translation stages (table formats), VMSAv8 modes,
supported SoC generations, etc, therefore that's why there is
a slight difference in registers/bits each driver considers unused.

No change in behavior.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agopassthrough/x86: stop pirq iteration immediately in case of error
Julien Grall [Wed, 5 Jan 2022 18:09:20 +0000 (18:09 +0000)]
passthrough/x86: stop pirq iteration immediately in case of error

pt_pirq_iterate() will iterate in batch over all the PIRQs. The outer
loop will bail out if 'rc' is non-zero but the inner loop will continue.

This means 'rc' will get clobbered and we may miss any errors (such as
-ERESTART in the case of the callback pci_clean_dpci_irq()).

This is CVE-2022-23035 / XSA-395.

Fixes: c24536b636f2 ("replace d->nr_pirqs sized arrays with radix tree")
Fixes: f6dd295381f4 ("dpci: replace tasklet with softirq")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoxen/grant-table: Only decrement the refcounter when grant is fully unmapped
Julien Grall [Fri, 19 Nov 2021 11:27:47 +0000 (11:27 +0000)]
xen/grant-table: Only decrement the refcounter when grant is fully unmapped

The grant unmapping hypercall (GNTTABOP_unmap_grant_ref) is not a
simple revert of the changes done by the grant mapping hypercall
(GNTTABOP_map_grant_ref).

Instead, it is possible to partially (or even not) clear some flags.
This will leave the grant is mapped until a future call where all
the flags would be cleared.

XSA-380 introduced a refcounting that is meant to only be dropped
when the grant is fully unmapped. Unfortunately, unmap_common() will
decrement the refcount for every successful call.

A consequence is a domain would be able to underflow the refcount
and trigger a BUG().

Looking at the code, it is not clear to me why a domain would
want to partially clear some flags in the grant-table. But as
this is part of the ABI, it is better to not change the behavior
for now.

Fix it by checking if the maptrack handle has been released before
decrementing the refcounting.

This is CVE-2022-23034 / XSA-394.

Fixes: 9781b51efde2 ("gnttab: replace mapkind()")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/arm: p2m: Always clear the P2M entry when the mapping is removed
Julien Grall [Tue, 14 Dec 2021 09:53:44 +0000 (09:53 +0000)]
xen/arm: p2m: Always clear the P2M entry when the mapping is removed

Commit 2148a125b73b ("xen/arm: Track page accessed between batch of
Set/Way operations") allowed an entry to be invalid from the CPU PoV
(lpae_is_valid()) but valid for Xen (p2m_is_valid()). This is useful
to track which page is accessed and only perform an action on them
(e.g. clean & invalidate the cache after a set/way instruction).

Unfortunately, __p2m_set_entry() is only zeroing the P2M entry when
lpae_is_valid() returns true. This means the entry will not be zeroed
if the entry was valid from Xen PoV but invalid from the CPU PoV for
tracking purpose.

As a consequence, this will allow a domain to continue to access the
page after it was removed.

Resolve the issue by always zeroing the entry if it the LPAE bit is
set or the entry is about to be removed.

This is CVE-2022-23033 / XSA-393.

Reported-by: Dmytro Firsov <Dmytro_Firsov@epam.com>
Fixes: 2148a125b73b ("xen/arm: Track page accessed between batch of Set/Way operations")
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Julien Grall <jgrall@amazon.com>
3 years agox86/pvh: print dom0 memory map
Roger Pau Monne [Tue, 25 Jan 2022 10:46:36 +0000 (11:46 +0100)]
x86/pvh: print dom0 memory map

I find it useful for debugging certain issues to have the memory map
dom0 is using, so print it when using `dom0=verbose` on the command
line.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/pvh: fix population of the low 1MB for dom0
Roger Pau Monne [Mon, 24 Jan 2022 16:13:12 +0000 (17:13 +0100)]
x86/pvh: fix population of the low 1MB for dom0

RMRRs are setup ahead of populating the p2m and hence the ASSERT when
populating the low 1MB needs to be relaxed when it finds an existing
entry: it's either RAM or a RMRR resulting from the IOMMU setup.

Rework the logic a bit and introduce a local mfn variable in order to
assert that if the gfn is populated and not RAM it is an identity map.

Fixes: 6b4f6a31ac ('x86/PVH: de-duplicate mappings for first Mb of Dom0 memory')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/time: drop pmt_scale_r
Jan Beulich [Mon, 24 Jan 2022 07:44:39 +0000 (08:44 +0100)]
x86/time: drop pmt_scale_r

Its only user, ns_to_acpi_pm_tick(), is unused.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoFrom: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Artem Bityutskiy [Mon, 24 Jan 2022 07:43:44 +0000 (08:43 +0100)]
From: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
x86/mwait-idle: add SnowRidge C-state table

Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
platforms.

The following has been changed.

 1. C1E latency changed from 10us to 15us. It was measured using the
    open source "wult" tool (the "nic" method, 15us is the 99.99th
    percentile).

 2. C1E power break even changed from 20us to 25us, which may result
    in less C1E residency in some workloads.

 3. C6 latency changed from 50us to 130us. Measured the same way as C1E.

The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
table commentary about this.

On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
though C6 is present in the table, the driver will only use it if the CPU does
support it.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit: 9cf93f056f783f986c19f40d5304d1bcffa0fc0d]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/mwait-idle: switch to using bool
Jan Beulich [Mon, 24 Jan 2022 07:43:08 +0000 (08:43 +0100)]
x86/mwait-idle: switch to using bool

When the driver was first ported, we didn't have "bool" yet, so
conversion to bool_t / 0 / 1 was necessary. Undo that conversion, easing
ports of newer changes as well as tidying things up.

Requested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/mwait-idle: stop exposing platform acronyms
Jan Beulich [Mon, 24 Jan 2022 07:42:25 +0000 (08:42 +0100)]
x86/mwait-idle: stop exposing platform acronyms

This follows Linux commit de09cdd09fa1 ("intel_idle: stop exposing
platform acronyms in sysfs"), but their main justifications (sysfs
exposure and similarity with acpi-idle) don't apply to us. The field is
only used in a single printk() right now, but having the platform tags
there isn't useful either.

Requested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/time: minor adjustments to init_pit()
Jan Beulich [Mon, 24 Jan 2022 07:40:59 +0000 (08:40 +0100)]
x86/time: minor adjustments to init_pit()

For one, "using_pit" shouldn't be set ahead of the function's last
(for now: only) error path. Otherwise "clocksource=pit" on the command
line can lead to misbehavior when actually taking that error path.

And then make an implicit assumption explicit: CALIBRATE_FRAC cannot,
for example, simply be changed to 10. The way init_pit() works, the
upper bound on the calibration period is about 54ms.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/APIC: drop 32-bit days remnants
Jan Beulich [Mon, 24 Jan 2022 07:40:13 +0000 (08:40 +0100)]
x86/APIC: drop 32-bit days remnants

Mercury and Neptune were Pentium chipsets - no need to work around their
errata, even more so that the workaround looks fragile.

Also ditch a Pentium-related and stale part of a comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/APIC: no need for timer calibration when using TDT
Jan Beulich [Mon, 24 Jan 2022 07:38:55 +0000 (08:38 +0100)]
x86/APIC: no need for timer calibration when using TDT

The only global effect of calibrate_APIC_clock() is the setting of
"bus_scale"; the final __setup_APIC_LVTT(0) is (at best) redundant with
the immediately following setup_APIC_timer() invocation. Yet "bus_scale"
isn't used when using TDT. Avoid wasting 100ms for calibration in this
case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/xenstore: fix error handling of check_store()
Juergen Gross [Fri, 21 Jan 2022 13:12:19 +0000 (14:12 +0100)]
tools/xenstore: fix error handling of check_store()

check_store() has an incomplete error handling: it doesn't check
whether "root" allocation succeeded, and it is leaking the memory of
"root" in case create_hashtable() fails.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/xenstore: drop DEFINE_HASHTABLE_* macros and usage intro
Juergen Gross [Fri, 21 Jan 2022 15:21:20 +0000 (16:21 +0100)]
tools/xenstore: drop DEFINE_HASHTABLE_* macros and usage intro

The DEFINE_HASHTABLE_* macros are used nowhere, so drop them.
The usage intro isn't really needed either.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/xenstore: fix hashtable_expand() zeroing new area
Juergen Gross [Fri, 21 Jan 2022 15:21:19 +0000 (16:21 +0100)]
tools/xenstore: fix hashtable_expand() zeroing new area

When realloc()ing the hashtable for expanding it, zero out all the new
bytes at the end of the table, not only one byte for each new element.

Fixes: 186f0e02a1c ("Added hashtable implementation")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/xenstore: merge hashtable_private.h into hashtable.c
Juergen Gross [Fri, 21 Jan 2022 15:21:18 +0000 (16:21 +0100)]
tools/xenstore: merge hashtable_private.h into hashtable.c

hashtable_private.h is used in hashtable.c only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agoxen/x86: import intel-family.h from Linux
Roger Pau Monne [Thu, 20 Jan 2022 16:13:00 +0000 (17:13 +0100)]
xen/x86: import intel-family.h from Linux

Last commit to the file is:

7d697f0d5737 x86/cpu: Drop spurious underscore from RAPTOR_LAKE #define

This should help the readability of code that's currently open-coding
Intel model numbers.

No change introduced to existing code, it's expected that new code
could start using the defines. Changing existing users could cause
quite a lot of code churn.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/xenstore: use talloc_asprintf_append() in do_control_help()
Juergen Gross [Thu, 20 Jan 2022 06:59:47 +0000 (07:59 +0100)]
tools/xenstore: use talloc_asprintf_append() in do_control_help()

Instead of calculating the length of all help output and then
allocating the space for it, just use talloc_asprintf_append() to
expand the text as needed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agox86: Fix build with the get/set_reg() infrastructure
Andrew Cooper [Fri, 21 Jan 2022 10:19:00 +0000 (10:19 +0000)]
x86: Fix build with the get/set_reg() infrastructure

I clearly messed up concluding that the stubs were safe to drop.

The is_{pv,hvm}_domain() predicates are not symmetrical with both CONFIG_PV
and CONFIG_HVM.  As a result logic of the form `if ( pv/hvm ) ... else ...`
will always have one side which can't be DCE'd.

While technically only the hvm stubs are needed, due to the use of the
is_pv_domain() predicate in guest_{rd,wr}msr(), sort out the pv stubs too to
avoid leaving a bear trap for future users.

Fixes: 88d3ff7ab15d ("x86/guest: Introduce {get,set}_reg() infrastructure")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/hvm: Drop hvm_{get,set}_guest_bndcfgs() and use {get,set}_regs() instead
Andrew Cooper [Mon, 17 Jan 2022 18:40:50 +0000 (18:40 +0000)]
x86/hvm: Drop hvm_{get,set}_guest_bndcfgs() and use {get,set}_regs() instead

hvm_{get,set}_guest_bndcfgs() are thin wrappers around accessing MSR_BNDCFGS.

MPX was implemented on Skylake uarch CPUs and dropped in subsequent CPUs, and
is disabled by default in Xen VMs.

It would be nice to move all the logic into vmx_msr_{read,write}_intercept(),
but the common HVM migration code uses guest_{rd,wr}msr().  Therefore, use
{get,set}_regs() to reduce the quantity of "common" HVM code.

In lieu of having hvm_set_guest_bndcfgs() split out, use some #ifdef
CONFIG_HVM in guest_wrmsr().  In vmx_{get,set}_regs(), split the switch
statements into two depending on whether the require remote VMCS acquisition
or not.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling
Andrew Cooper [Wed, 12 Jan 2022 15:47:27 +0000 (15:47 +0000)]
x86/spec-ctrl: Fix NMI race condition with VT-x MSR_SPEC_CTRL handling

The logic was based on a mistaken understanding of how NMI blocking on vmexit
works.  NMIs are only blocked for EXIT_REASON_NMI, and not for general exits.
Therefore, an NMI can in general hit early in the vmx_asm_vmexit_handler path,
and the guest's value will be clobbered before it is saved.

Switch to using MSR load/save lists.  This causes the guest value to be saved
atomically with respect to NMIs/MCEs/etc.

First, update vmx_cpuid_policy_changed() to configure the load/save lists at
the same time as configuring the intercepts.  This function is always used in
remote context, so extend the vmx_vmcs_{enter,exit}() block to cover the whole
function, rather than having multiple remote acquisitions of the same VMCS.

Both of vmx_{add,del}_guest_msr() can fail.  The -ESRCH delete case is fine,
but all others are fatal to the running of the VM, so handle them using
domain_crash() - this path is only used during domain construction anyway.

Second, update vmx_{get,set}_reg() to use the MSR load/save lists rather than
vcpu_msrs, and update the vcpu_msrs comment to describe the new state
location.

Finally, adjust the entry/exit asm.

Because the guest value is saved and loaded atomically, we do not need to
manually load the guest value, nor do we need to enable SCF_use_shadow.  This
lets us remove the use of DO_SPEC_CTRL_EXIT_TO_GUEST.  Additionally,
SPEC_CTRL_ENTRY_FROM_PV gets removed too, because on an early entry failure,
we're no longer in the guest MSR_SPEC_CTRL context needing to switch back to
Xen's context.

The only action remaining is to load Xen's MSR_SPEC_CTRL value on vmexit.  We
could in principle use the host msr list, but is expected to complicated
future work.  Delete DO_SPEC_CTRL_ENTRY_FROM_HVM entirely, and use a shorter
code sequence to simply reload Xen's setting from the top-of-stack block.

Adjust the comment at the top of spec_ctrl_asm.h in light of this bugfix.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM
Andrew Cooper [Wed, 12 Jan 2022 16:36:29 +0000 (16:36 +0000)]
x86/spec-ctrl: Drop SPEC_CTRL_{ENTRY_FROM,EXIT_TO}_HVM

These were written before Spectre/Meltdown went public, and there was large
uncertainty in how the protections would evolve.  As it turns out, they're
very specific to Intel hardware, and not very suitable for AMD.

Drop the macros, opencoding the relevant subset of functionality, and leaving
grep-fodder to locate the logic.  No change at all for VT-x.

For AMD, the only relevant piece of functionality is DO_OVERWRITE_RSB,
although we will soon be adding (different) logic to handle MSR_SPEC_CTRL.

This has a marginal improvement of removing an unconditional pile of long-nops
from the vmentry/exit path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/msr: Split MSR_SPEC_CTRL handling
Andrew Cooper [Wed, 12 Jan 2022 13:52:47 +0000 (13:52 +0000)]
x86/msr: Split MSR_SPEC_CTRL handling

In order to fix a VT-x bug, and support MSR_SPEC_CTRL on AMD, move
MSR_SPEC_CTRL handling into the new {pv,hvm}_{get,set}_reg() infrastructure.

Duplicate the msrs->spec_ctrl.raw accesses in the PV and VT-x paths for now.
The SVM path is currently unreachable because of the CPUID policy.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/guest: Introduce {get,set}_reg() infrastructure
Andrew Cooper [Mon, 17 Jan 2022 12:28:39 +0000 (12:28 +0000)]
x86/guest: Introduce {get,set}_reg() infrastructure

Various registers have per-guest-type or per-vendor locations or access
requirements.  To support their use from common code, provide accessors which
allow for per-guest-type behaviour.

For now, just infrastructure handling default cases and expectations.
Subsequent patches will start handling registers using this infrastructure.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/hvm: Drop .is_singlestep_supported() callback
Andrew Cooper [Thu, 13 Jan 2022 18:37:13 +0000 (18:37 +0000)]
x86/hvm: Drop .is_singlestep_supported() callback

There is absolutely no need for a function pointer call here.

Drop the hook, introduce a singlestep_supported boolean, and configure it in
start_vmx() like all other optional functionality.

No functional change, but rather more efficient logic.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
3 years agoConfig.mk: update seabios to 1.15.0
Wei Liu [Sun, 16 Jan 2022 12:54:27 +0000 (12:54 +0000)]
Config.mk: update seabios to 1.15.0

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agolibs/guest: move cpu policy related prototypes to xenguest.h
Roger Pau Monné [Wed, 19 Jan 2022 12:51:26 +0000 (13:51 +0100)]
libs/guest: move cpu policy related prototypes to xenguest.h

Do this before adding any more stuff to xg_cpuid_x86.c.

The placement in xenctrl.h is wrong, as they are implemented by the
xenguest library. Note that xg_cpuid_x86.c needs to include
xg_private.h, and in turn also fix xg_private.h to include
xc_bitops.h. The bitops definition of BITS_PER_LONG needs to be
changed to not be an expression, so that xxhash.h can use it in a
preprocessor if directive.

As a result also modify xen-cpuid to include xenguest.h.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agox86/mwait-idle: Adjust the SKX C6 parameters if PC6 is disabled
Chen Yu [Wed, 19 Jan 2022 12:50:43 +0000 (13:50 +0100)]
x86/mwait-idle: Adjust the SKX C6 parameters if PC6 is disabled

Because cpuidle assumes worst-case C-state parameters, PC6 parameters
are used for describing C6, which is worst-case for requesting CC6.
When PC6 is enabled, this is appropriate. But if PC6 is disabled
in the BIOS, the exit latency and target residency should be adjusted
accordingly.

Exit latency:
Previously the C6 exit latency was measured as the PC6 exit latency.
With PC6 disabled, the C6 exit latency should be the one of CC6.

Target residency:
With PC6 disabled, the idle duration within [CC6, PC6) would make the
idle governor choose C1E over C6. This would cause low energy-efficiency.
We should lower the bar to request C6 when PC6 is disabled.

To fill this gap, check if PC6 is disabled in the BIOS in the
MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency
for C6 and set target_residency to 3 times of the new exit latency. [This
is consistent with how intel_idle driver uses _CST to calculate the
target_residency.] As a result, the OS would be more likely to choose C6
over C1E when PC6 is disabled, which is reasonable, because if C6 is
enabled, it implies that the user cares about energy, so choosing C6 more
frequently makes sense.

The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU
model number as SkX, but their CC6 exit latencies are similar to the SKX
one, 96us and 89us respectively, so reuse the SKX value for them.

There is a concern that it might be better to use a more generic approach
instead of optimizing every platform. However, if the required code
complexity and different PC6 bit interpretation on different platforms
are taken into account, tuning the code per platform seems to be an
acceptable tradeoff.

Link: https://intel.github.io/wult/
Suggested-by: Len Brown <len.brown@intel.com>
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit: 64233338499126c5c31e07165735ab5441c7e45a]

Alongside the dropping of "const" from skx_cstates[] add __read_mostly,
and extend that to other similar non-const tables.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoMerge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging
Jan Beulich [Wed, 19 Jan 2022 12:49:30 +0000 (13:49 +0100)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

3 years agox86/mwait-idle: add Icelake-D support
Artem Bityutskiy [Wed, 19 Jan 2022 12:46:05 +0000 (13:46 +0100)]
x86/mwait-idle: add Icelake-D support

This patch adds Icelake Xeon D support to the intel_idle driver.

Since Icelake D and Icelake SP C-state characteristics the same,
we use Icelake SP C-states table for Icelake D as well.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit: 22141d5f411895bb1b0df2a6b05f702e11e63918]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/mwait-idle: update ICX C6 data
Artem Bityutskiy [Wed, 19 Jan 2022 12:45:11 +0000 (13:45 +0100)]
x86/mwait-idle: update ICX C6 data

Change IceLake Xeon C6 latency from 128 us to 170 us. The latency
was measured with the "wult" tool and corresponds to the 99.99th
percentile when measuring with the "nic" method. Note, the 128 us
figure correspond to the median latency, but in intel_idle we use
the "worst case" latency figure instead.

C6 target residency was increased from 384 us to 600 us, which may
result in less C6 residency in some workloads. This value was tested
and compared to values 384, and 1000. Value 600 is a reasonable
tradeoff between power and performance.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit: d484b8bfc6fa71a088e4ac85d9ce11aa0385867e]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/mwait-idle: mention assumption that WBINVD is not needed
Alexander Monakov [Wed, 19 Jan 2022 12:44:31 +0000 (13:44 +0100)]
x86/mwait-idle: mention assumption that WBINVD is not needed

Intel SDM does not explicitly say that entering a C-state via MWAIT will
implicitly flush CPU caches as appropriate for that C-state. However,
documentation for individual Intel CPU generations does mention this
behavior.

Since intel_idle binds to any Intel CPU with MWAIT, list this assumption
of MWAIT behavior.

In passing, reword opening comment to make it clear that the driver can
load on any old and future Intel CPU with MWAIT.

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit: 8bb2e2a887afdf8a39e68fa0dccf82a168aae655]

Dropped "reword opending comment" part - this doesn't apply to our code:
First thing mwait_idle_probe() does is call x86_match_cpu(); we do not
have a 2nd such call looking for just MWAIT (in order to the use _CST
data directly, which we can't get our hands at _CST at this point yet).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agotools/libs/gnttab: remove old mini-os callback
Juergen Gross [Wed, 19 Jan 2022 07:28:23 +0000 (08:28 +0100)]
tools/libs/gnttab: remove old mini-os callback

It is possible now to delete minios_gnttab_close_fd().

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/libs/evtchn: remove old mini-os callback
Juergen Gross [Wed, 19 Jan 2022 07:28:22 +0000 (08:28 +0100)]
tools/libs/evtchn: remove old mini-os callback

It is possible now to delete minios_evtchn_close_fd() and the extern
declaration of event_queue.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoconfig: use more recent mini-os commit
Juergen Gross [Wed, 19 Jan 2022 07:28:21 +0000 (08:28 +0100)]
config: use more recent mini-os commit

In order to be able to use the recent Mini-OS features switch to the
most recent commit.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/libs/ctrl: remove file related handling
Juergen Gross [Sun, 16 Jan 2022 08:23:46 +0000 (09:23 +0100)]
tools/libs/ctrl: remove file related handling

There is no special file handling related to libxenctrl in Mini-OS
any longer, so the close hook can be removed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
3 years agotools/libs/gnttab: decouple more from mini-os
Juergen Gross [Sun, 16 Jan 2022 08:23:45 +0000 (09:23 +0100)]
tools/libs/gnttab: decouple more from mini-os

libgnttab is using implementation details of Mini-OS. Change that by
letting libgnttab use the new alloc_file_type() and get_file_from_fd()
functions and the generic dev pointer of struct file from Mini-OS.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/libs/evtchn: decouple more from mini-os
Juergen Gross [Sun, 16 Jan 2022 08:23:44 +0000 (09:23 +0100)]
tools/libs/evtchn: decouple more from mini-os

Mini-OS and libevtchn are using implementation details of each other.
Change that by letting libevtchn use the new alloc_file_type() and
get_file_from_fd() function and the generic dev pointer of struct file
from Mini-OS.

By using private struct declarations Mini-OS will be able to drop the
libevtchn specific definitions of struct evtchn_port_info and
evtchn_port_list in future. While at it use bool for "pending" and
"bound".

Switch to use xce as function parameter instead of fd where possible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoconfig: use more recent mini-os commit
Juergen Gross [Tue, 18 Jan 2022 14:21:02 +0000 (15:21 +0100)]
config: use more recent mini-os commit

In order to be able to use the recent Mini-OS features switch to the
most recent commit.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/APIC: mark wait_tick_pvh() __init
Jan Beulich [Mon, 17 Jan 2022 16:29:42 +0000 (17:29 +0100)]
x86/APIC: mark wait_tick_pvh() __init

It should have been that way right from its introduction by 02e0de011555
("x86: APIC timer calibration when running as a guest").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
3 years agoMAINTAINERS: email address update in TXT section
Lukasz Hawrylko [Mon, 17 Jan 2022 16:29:00 +0000 (17:29 +0100)]
MAINTAINERS: email address update in TXT section

As I am not working for Intel anymore, I would like to update my email address
to my private one.

Signed-off-by: Lukasz Hawrylko <lukasz@hawrylko.pl>
3 years agoMAINTAINERS: update my email address
Nick Rosbrook [Mon, 17 Jan 2022 16:28:36 +0000 (17:28 +0100)]
MAINTAINERS: update my email address

I am no longer an employee at AIS. Use my personal email address
instead.

Signed-off-by: Nick Rosbrook <rosbrookn@gmail.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agox86/HVM: convert remaining hvm_funcs hook invocations to alt-call
Jan Beulich [Mon, 17 Jan 2022 08:45:04 +0000 (09:45 +0100)]
x86/HVM: convert remaining hvm_funcs hook invocations to alt-call

The aim being to have as few indirect calls as possible (see [1]),
whereas during initial conversion performance was the main aspect and
hence rarely used hooks didn't get converted. Apparently one use of
get_interrupt_shadow() was missed at the time.

While doing this, drop NULL checks ahead of CPU management and .nhvm_*()
calls when the hook is always present. Also convert the
.nhvm_vcpu_reset() call to alternative_vcall(), as the return value is
unused and the caller has currently no way of propagating it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
[1] https://lists.xen.org/archives/html/xen-devel/2021-11/msg01822.html

3 years agobuild: adjust include/xen/compile.h generation
Jan Beulich [Fri, 14 Jan 2022 10:03:03 +0000 (11:03 +0100)]
build: adjust include/xen/compile.h generation

Prior to 19427e439e01 ("build: generate "include/xen/compile.h" with
if_changed") running "make install-xen" as root would not have printed
the banner under normal circumstances. Its printing would instead have
indicated that something was wrong (or during a normal build the lack
of printing would do so).

Further aforementioned change had another undesirable effect, which I
didn't notice during review: Originally compile.h would have been
re-generated (and final binaries re-linked) when its dependencies were
updated after an earlier build. This is no longer the case now, which
means that if some other file also was updated, then the re-build done
during "make install-xen" would happen with a stale compile.h (as its
updating is suppressed in this case).

Restore the earlier behavior for both aspects.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agox86/hvm: Improve hvm_set_guest_pat() code generation
Andrew Cooper [Wed, 12 Jan 2022 13:54:12 +0000 (13:54 +0000)]
x86/hvm: Improve hvm_set_guest_pat() code generation

This is a fastpath on virtual vmentry/exit, and forcing guest_pat to be
spilled to the stack is bad.  Performing the shift in a register is far more
efficient.

Drop the (IMO useless) log message.  MSR_PAT only gets altered on boot, and a
bad value will be entirely evident in the ensuing #GP backtrace.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/hvm: Rework nested hap functions to reduce parameters
Andrew Cooper [Tue, 30 Nov 2021 17:05:09 +0000 (17:05 +0000)]
x86/hvm: Rework nested hap functions to reduce parameters

Most functions in this call chain have 8 parameters, meaning that the final
two booleans are spilled to the stack for calls.

First, delete nestedhap_walk_L1_p2m and introduce nhvm_hap_walk_L1_p2m() as a
thin wrapper around hvm_funcs, just like all the other nhvm_*() hooks.  This
involves including xen/mm.h as the forward declaration of struct npfec is no
longer enough.

Next, replace the triple of booleans with struct npfec, which contains the
same information in the bottom 3 bits.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/hvm: Simplify hvm_enable_msr_interception()
Andrew Cooper [Tue, 30 Nov 2021 14:37:59 +0000 (14:37 +0000)]
x86/hvm: Simplify hvm_enable_msr_interception()

The sole caller doesn't check the return value, and both vendors implement the
hook.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agolibxl/PCI: Fix PV hotplug & stubdom coldplug
Jason Andryuk [Thu, 13 Jan 2022 13:33:16 +0000 (14:33 +0100)]
libxl/PCI: Fix PV hotplug & stubdom coldplug

commit 0fdb48ffe7a1 "libxl: Make sure devices added by pci-attach are
reflected in the config" broken PCI hotplug (xl pci-attach) for PV
domains when it moved libxl__create_pci_backend() later in the function.

This also broke HVM + stubdom PCI passthrough coldplug.  For that, the
PCI devices are hotplugged to a running PV stubdom, and then the QEMU
QMP device_add commands are made to QEMU inside the stubdom.

A running PV domain calls libxl__wait_for_backend().  With the current
placement of libxl__create_pci_backend(), the path does not exist and
the call immediately fails:
libxl: error: libxl_device.c:1388:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/43/0 does not exist
libxl: error: libxl_pci.c:1764:device_pci_add_done: Domain 42:libxl__device_pci_add failed for PCI device 0:2:0.0 (rc -3)
libxl: error: libxl_create.c:1857:domcreate_attach_devices: Domain 42:unable to add pci devices

The wait is only relevant when:
1) The domain is PV
2) The domain is running
3) The backend is already present

This is because:

1) xen-pcifront is only used for PV.  It does not load for HVM domains
   where QEMU is used.

2) If the domain is not running (starting), then the frontend state will
   be Initialising.  xen-pciback waits for the frontend to transition to
   at Initialised before attempting to connect.  So a wait for a
   non-running domain is not applicable as the backend will not
   transition to Connected.

3) For presence, num_devs is already used to determine if the backend
   needs to be created.  Re-use num_devs to determine if the backend
   wait is necessary.  The wait is necessary to avoid racing with
   another PCI attachment reconfiguring the front/back or changing to
   some other state like closing.  If we are creating the backend, then
   we don't have to worry about the state since it is being created.

Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are
reflected in the config")

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agobuild: correct usage comments in Kbuild.include
Jan Beulich [Thu, 13 Jan 2022 13:32:34 +0000 (14:32 +0100)]
build: correct usage comments in Kbuild.include

Macros with arguments need to be invoked via $(call ...); don't misguide
people looking up usage of such macros.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/time: improve TSC / CPU freq calibration accuracy
Jan Beulich [Thu, 13 Jan 2022 13:31:52 +0000 (14:31 +0100)]
x86/time: improve TSC / CPU freq calibration accuracy

While the problem report was for extreme errors, even smaller ones would
better be avoided: The calculated period to run calibration loops over
can (and usually will) be shorter than the actual time elapsed between
first and last platform timer and TSC reads. Adjust values returned from
the init functions accordingly.

On a Skylake system I've tested this on accuracy (using HPET) went from
detecting in some cases more than 220kHz too high a value to about
±2kHz. On other systems (or on this system, but with PMTMR) the original
error range was much smaller, with less (in some cases only very little)
improvement.

Reported-by: James Dingwall <james-xen@dingwall.me.uk>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/time: use relative counts in calibration loops
Jan Beulich [Thu, 13 Jan 2022 13:30:18 +0000 (14:30 +0100)]
x86/time: use relative counts in calibration loops

Looping until reaching/exceeding a certain value is error prone: If the
target value is close enough to the wrapping point, the loop may not
terminate at all. Switch to using delta values, which then allows to
fold the two loops each into just one.

Fixes: 93340297802b ("x86/time: calibrate TSC against platform timer")
Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agotools/libs/evtchn: Deduplicate xenevtchn_fd()
Andrew Cooper [Mon, 10 Jan 2022 12:29:05 +0000 (12:29 +0000)]
tools/libs/evtchn: Deduplicate xenevtchn_fd()

struct xenevtchn_handle is common in private.h, meaning that xenevtchn_fd()
has exactly one correct implementation.

Implement it in core.c, rather than identically for each OS.  This matches all
other libraries (call, gnttab, gntshr) which implement an fd getter.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years agoMAINTAINERS: requesting to be TXT reviewer
Daniel P. Smith [Wed, 12 Jan 2022 07:55:20 +0000 (08:55 +0100)]
MAINTAINERS: requesting to be TXT reviewer

I would like to submit myself, Daniel P. Smith, as a reviewer of TXT support in
Xen.

Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agotools/debugger: fix make distclean
Juergen Gross [Wed, 12 Jan 2022 07:54:59 +0000 (08:54 +0100)]
tools/debugger: fix make distclean

"make distclean" will complain that "-c" is no supported flag for make.

Fix that by using "-C".

The error has been present for a long time, but it was uncovered only
recently.

Fixes: 2400a9a365c5619 ("tools/debugger: Allow make to recurse into debugger/")
Fixes: f9c9b127753e9ed ("tools: fix make distclean")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agox86/paging: replace most mfn_valid() in log-dirty handling
Jan Beulich [Wed, 12 Jan 2022 07:54:20 +0000 (08:54 +0100)]
x86/paging: replace most mfn_valid() in log-dirty handling

Top level table and intermediate table entries get explicitly set to
INVALID_MFN when un-allocated. There's therefore no need to use the more
expensive mfn_valid() when checking for that sentinel.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/paging: tidy paging_mfn_is_dirty()
Jan Beulich [Wed, 12 Jan 2022 07:53:05 +0000 (08:53 +0100)]
x86/paging: tidy paging_mfn_is_dirty()

The function returning a boolean indicator, make it return bool. Also
constify its struct domain parameter, albeit requiring to also adjust
mm_locked_by_me(). Furthermore the function is used by shadow code only.

Since mm_locked_by_me() needs touching anyway, also switch its return
type to bool.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoSUPPORT.md: limit support statement for Linux and Windows frontends
Juergen Gross [Tue, 11 Jan 2022 10:43:48 +0000 (11:43 +0100)]
SUPPORT.md: limit support statement for Linux and Windows frontends

Change the support state of Linux and Windows pv frontends from
"supported" to "supported with caveats" in order to reflect that the
frontends can probably be harmed by their respective backends.

Some of the Linux frontends have been hardened already.

This is XSA-376

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/viridian: EOI MSR should always happen in affected vCPU context
Roger Pau Monné [Tue, 11 Jan 2022 10:42:49 +0000 (11:42 +0100)]
x86/viridian: EOI MSR should always happen in affected vCPU context

The HV_X64_MSR_EOI wrmsr should always happen with the target vCPU
as current, as there's no support for EOI'ing interrupts on a remote
vCPU.

While there also turn the unconditional assert at the top of the
function into an error on non-debug builds.

No functional change intended.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/altp2m: p2m_altp2m_get_or_propagate() should honor present page order
Jan Beulich [Thu, 6 Jan 2022 15:12:39 +0000 (16:12 +0100)]
x86/altp2m: p2m_altp2m_get_or_propagate() should honor present page order

Prior to XSA-304 the only caller merely happened to not use any further
the order value that it passes into the function. Already then this was
a latent issue: The function really should, in the "get" case, hand back
the order the underlying mapping actually uses (or actually the smaller
of the two), such that (going forward) there wouldn't be any action on
unrelated mappings (in particular ones which did already diverge from
the host P2M).

Similarly in the "propagate" case only the smaller of the two orders
should actually get used for creating the new entry, again to avoid
altering mappings which did already diverge from the host P2M.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
3 years agotools/xen-detect: avoid possible pitfall with cpuid()
Jan Beulich [Thu, 6 Jan 2022 15:12:15 +0000 (16:12 +0100)]
tools/xen-detect: avoid possible pitfall with cpuid()

The 64-bit form forces %ecx to 0 while the 32-bit one so far didn't - it
only ended up that way when "pv_context" is zero. While presently no
leaf queried by callers has separate subleaves, let's avoid chancing it.

While there
- replace references to operands by number,
- relax constraints where possible,
- limit PUSH/POP to just the registers not also used as input,
all where applicable also for the 64-bit variant.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agox86/spec-ctrl: Fix default calculation of opt_srb_lock
Andrew Cooper [Tue, 4 Jan 2022 14:11:55 +0000 (14:11 +0000)]
x86/spec-ctrl: Fix default calculation of opt_srb_lock

Since this logic was introduced, opt_tsx has become more complicated and
shouldn't be compared to 0 directly.  While there are no buggy logic paths,
the correct expression is !(opt_tsx & 1) but the rtm_disabled boolean is
easier and clearer to use.

Fixes: 8fe24090d940 ("x86/cpuid: Rework HLE and RTM handling")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agotools/libxc: Drop copy-in in xc_physinfo()
Andrew Cooper [Thu, 23 Dec 2021 16:10:15 +0000 (16:10 +0000)]
tools/libxc: Drop copy-in in xc_physinfo()

The first thing XEN_SYSCTL_physinfo does is zero op->u.physinfo.

Do not copy-in.  It's pointless, and most callers don't initialise their
xc_physinfo_t buffer to begin with.  Remove the redundant zeroing from the
remaining callers.

Spotted by Coverity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agoxenperf: omit meaningless trailing zeroes from output
Jan Beulich [Tue, 4 Jan 2022 09:21:12 +0000 (10:21 +0100)]
xenperf: omit meaningless trailing zeroes from output

There's no point producing a long chain of zeroes when the previously
calculated total value was zero. To guard against mistakenly skipping
non-zero individual fields, widen "sum" to "unsigned long long".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agolibxc: avoid clobbering errno in xc_domain_pod_target()
Jan Beulich [Tue, 4 Jan 2022 09:20:15 +0000 (10:20 +0100)]
libxc: avoid clobbering errno in xc_domain_pod_target()

do_memory_op() supplies return value and has "errno" set the usual way.
Don't overwrite "errno" with 1 (aka EPERM on at least Linux). There's
also no reason to overwrite "err".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agoVT-d: shorten vtd_flush_{context,iotlb}_reg()
Jan Beulich [Tue, 4 Jan 2022 09:19:32 +0000 (10:19 +0100)]
VT-d: shorten vtd_flush_{context,iotlb}_reg()

Their calculations of the value to write to the respective command
register can be partly folded, resulting in almost 100 bytes less code
for these two relatively short functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoVT-d: use DMA_TLB_IVA_ADDR()
Jan Beulich [Tue, 4 Jan 2022 09:18:18 +0000 (10:18 +0100)]
VT-d: use DMA_TLB_IVA_ADDR()

Let's use the macro in the one place it's supposed to be used, and in
favor of then unnecessary manipulations of the address in
iommu_flush_iotlb_psi(): All leaf functions then already deal correctly
with the supplied address.

There also has never been a need to require (i.e. assert for) the
passing in of 4k-aligned addresses - it'll always be the order-sized
range containing the address which gets flushed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoVT-d: properly parenthesize a number of macros
Jan Beulich [Tue, 4 Jan 2022 09:17:44 +0000 (10:17 +0100)]
VT-d: properly parenthesize a number of macros

Let's eliminate the risk of any of these macros getting used with more
complex expressions as arguments.

Where touching lines anyway, also
- switch from u64 to uint64_t,
- drop unnecessary parentheses,
- drop pointless 0x prefixes.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoxenperf: name "newer" hypercalls
Jan Beulich [Tue, 4 Jan 2022 09:16:48 +0000 (10:16 +0100)]
xenperf: name "newer" hypercalls

This table must not have got updated in quite a while; tmem_op for
example has managed to not only appear since then, but also disappear
again (adding a name for it nevertheless, to make more obvious that
something strange is going on if the slot would ever have a non-zero
value).

Also resolve arch_0 and arch_1 to more meaningful names on x86.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoVT-d: avoid allocating domid_{bit,}map[] when possible
Jan Beulich [Tue, 4 Jan 2022 09:16:04 +0000 (10:16 +0100)]
VT-d: avoid allocating domid_{bit,}map[] when possible

When an IOMMU implements the full 16 bits worth of DID in context
entries, there's no point going through a memory base translation table.
For IOMMUs not using Caching Mode we can simply use the domain IDs
verbatim, while for Caching Mode we need to avoid DID 0.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agox86/EPT: squash meaningless TLB flush
Jan Beulich [Tue, 4 Jan 2022 09:13:06 +0000 (10:13 +0100)]
x86/EPT: squash meaningless TLB flush

ept_free_entry() gets called after a flush was already issued, if one is
necessary in the first place. That behavior is similar to NPT, which
also doesn't have any further flush in p2m_free_entry(). (Furthermore,
the function being recursive, in case of recursiveness way too many
flushes would have been issued.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agomm: introduce INVALID_{G,M}FN_RAW
Jan Beulich [Tue, 21 Dec 2021 09:42:02 +0000 (10:42 +0100)]
mm: introduce INVALID_{G,M}FN_RAW

This allows properly tying together INVALID_{G,M}FN and
INVALID_{G,M}FN_INITIALIZER as well as using the actual values in
compile time constant expressions (or even preprocessor directives).

Since INVALID_PFN is unused, and with x86'es paging_mark_pfn_dirty()
being the only user of pfn_t it also doesn't seem likely that new uses
would appear, remove that one at this same occasion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>