xenbits.xensource.com Git - xen.git/log
3 years ago xen/cpupool: Allow cpupool0 to use different scheduler
Luca Fancellu [Fri, 6 May 2022 12:00:12 +0000 (13:00 +0100)]
xen/cpupool: Allow cpupool0 to use different scheduler

Currently cpupool0 can use only the default scheduler, and
cpupool_create has a hardcoded behavior when creating the pool 0
that doesn't allocate new memory for the scheduler, but uses the
default scheduler structure in memory.

With this commit it is possible to allocate a different scheduler for
the cpupool0 when using the boot time cpupool.
To achieve this the hardcoded behavior in cpupool_create is removed
and the cpupool0 creation is moved.

When compiling without boot time cpupools enabled, the current
behavior is maintained (except that cpupool0 scheduler memory will be
allocated).

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years ago arm/dom0less: assign dom0less guests to cpupools
Luca Fancellu [Fri, 6 May 2022 12:00:11 +0000 (13:00 +0100)]
arm/dom0less: assign dom0less guests to cpupools

Introduce domain-cpupool property of a xen,domain device tree node,
that specifies the cpupool device tree handle of a xen,cpupool node
that identifies a cpupool created at boot time where the guest will
be assigned on creation.

Add member to the xen_domctl_createdomain public interface so the
XEN_DOMCTL_INTERFACE_VERSION version is bumped.

Add public function to retrieve a pool id from the device tree
cpupool node.

Update documentation about the property.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years ago xen/cpupool: Don't allow removing cpu0 from cpupool0
Luca Fancellu [Fri, 6 May 2022 12:00:10 +0000 (13:00 +0100)]
xen/cpupool: Don't allow removing cpu0 from cpupool0

Cpu0 must remain in cpupool0, otherwise some operations, such as moving
cpus between cpupools, cpu hotplug, destroying cpupools, and host shutdown,
might not work in a sane way.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years ago xen/cpupool: Create different cpupools at boot time
Luca Fancellu [Fri, 6 May 2022 12:00:09 +0000 (13:00 +0100)]
xen/cpupool: Create different cpupools at boot time

Introduce a way to create different cpupools at boot time. This is
particularly useful on ARM big.LITTLE systems, where there might be the
need to have different cpupools for each type of core, but systems
using NUMA can also have different cpupools for each node.

The feature on arm relies on a specification of the cpupools from the
device tree to build pools and assign cpus to them.

ACPI is not supported for this feature.

With this patch, cpupool0 can now have fewer cpus than the number of
online ones, so update the default case for opt_dom0_max_vcpus.

Documentation is created to explain the feature.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
3 years ago xen/sched: retrieve scheduler id by name
Luca Fancellu [Fri, 6 May 2022 12:00:08 +0000 (13:00 +0100)]
xen/sched: retrieve scheduler id by name

Add a static function to retrieve the scheduler pointer using the
scheduler name.

Add a public function to retrieve the scheduler id by the scheduler
name that makes use of the new static function.

Take the occasion to replace open coded scheduler search with the
new static function in scheduler_init.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
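
As a rough illustration of the lookup this commit describes, the sketch below searches a table of schedulers by name and returns an id through a public wrapper around a static helper. The struct layout, the table contents, the ids, the -1 error value and the names `find_sched_by_name` and `sched_id_by_name` are assumptions made for this example, not Xen's actual definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified scheduler descriptor (not Xen's struct scheduler). */
struct toy_scheduler {
    const char *opt_name;   /* short name used for the lookup */
    int sched_id;           /* numeric identifier */
};

/* Illustrative table; names and ids are chosen for this sketch only. */
static const struct toy_scheduler toy_schedulers[] = {
    { "credit",  5 },
    { "credit2", 6 },
    { "null",    9 },
};

/* Static helper: return the scheduler whose name matches, or NULL. */
static const struct toy_scheduler *find_sched_by_name(const char *name)
{
    size_t i;

    if (name == NULL)
        return NULL;

    for (i = 0; i < sizeof(toy_schedulers) / sizeof(toy_schedulers[0]); i++)
        if (strcmp(toy_schedulers[i].opt_name, name) == 0)
            return &toy_schedulers[i];

    return NULL;
}

/* Public wrapper: return the id, or -1 when the name is unknown. */
int sched_id_by_name(const char *name)
{
    const struct toy_scheduler *s = find_sched_by_name(name);

    return s ? s->sched_id : -1;
}
```

Keeping the pointer lookup static and exposing only the id mirrors the split described in the message: callers outside the file never see the scheduler structure itself.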
3 years ago xen/sched: create public function for cpupools creation
Luca Fancellu [Fri, 6 May 2022 12:00:07 +0000 (13:00 +0100)]
xen/sched: create public function for cpupools creation

Create a new public function to create cpupools. It takes as a parameter
either the scheduler id or a negative value, which means the default Xen
scheduler will be used.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years ago tools/cpupools: Give a name to unnamed cpupools
Luca Fancellu [Fri, 6 May 2022 12:00:06 +0000 (13:00 +0100)]
tools/cpupools: Give a name to unnamed cpupools

With the introduction of boot time cpupools, Xen can create many
different cpupools at boot time other than cpupool with id 0.

Since these newly created cpupools can't have an
entry in Xenstore, create the entry using xen-init-dom0
helper with the usual convention: Pool-<cpupool id>.

Given the change, remove the check for poolid == 0 from
libxl_cpupoolid_to_name(...).

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago arm/its: enable LPIs before mapping the collection table
Rahul Singh [Wed, 4 May 2022 17:15:12 +0000 (18:15 +0100)]
arm/its: enable LPIs before mapping the collection table

When Xen boots on a platform that implements the GIC-600, an ITS
MAPC_LPI_OFF uncorrectable command error is observed.

As per the GIC-600 TRM (Revision: r1p6), a MAPC_LPI_OFF command error can
be reported if the MAPC command has tried to map a collection to a core
that does not have LPIs enabled. The definition of GICR_CTLR.EnableLPIs
also suggests enabling the LPIs before sending any ITS command that
involves LPIs:

0b0 LPI support is disabled. Any doorbell interrupt generated as a
    result of a write to a virtual LPI register must be discarded,
    and any ITS translation requests or commands involving LPIs in
    this Redistributor are ignored.

0b1 LPI support is enabled.

To fix the MAPC command error issue, enable the LPIs using
GICR_CTLR.EnableLPIs before mapping the collection table.

gicv3_enable_lpis() uses writel_relaxed(), so the write to the GICR_CTLR
register may not be visible before gicv3_its_setup_collection() sends the
MAPC command. Use wmb() after writel_relaxed() to make sure the register
write that enables LPIs is visible.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years ago docs: Fix SUPPORT matrix generation after a5968a553f6a
Julien Grall [Mon, 9 May 2022 08:07:07 +0000 (09:07 +0100)]
docs: Fix SUPPORT matrix generation after a5968a553f6a

Commit a5968a553f6a "SUPPORT.MD: Correct the amount of physical memory
supported for Arm" added a support statement split over two lines.

Unfortunately, docs/support-matrix-generate throws an error for it:

    Generating support matrix (origin/stable-NN )
    + docs/support-matrix-generate HEAD https://xenbits.xen.org/docs/unstable/SUPPORT.html origin/stable-NN https://xenbits.xen.org/docs/NN-testing/SUPPORT.html
    Status, x86: Supported up to 8 TiB. Hosts with more memory are
                 supported, but not security supported.
    Status, Arm32: Supported up to 12 GiB
    Status, Arm64: Supported up to 2 TiB
    ^ cannot parse status codeblock line:
                 supported, but not security supported.
     ? at docs/parse-support-md line 172, <F> chunk 1.

It would be good to allow split support statements (to keep lines below
80 characters) but my knowledge of the script is very limited.

Therefore, work around the error by describing the support statement
in one long line.

Fixes: a5968a553f6a "SUPPORT.MD: Correct the amount of physical memory supported for Arm"
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
3 years ago xen: io: Fix race between sending an I/O and domain shutdown
Julien Grall [Thu, 5 May 2022 17:51:31 +0000 (18:51 +0100)]
xen: io: Fix race between sending an I/O and domain shutdown

Xen provides hypercalls to shut down (SCHEDOP_shutdown{,_code}) and
resume a domain (XEN_DOMCTL_resumedomain). They can be used for
checkpointing, where the expectation is that the domain should continue
afterwards as if nothing happened.

hvmemul_do_io() and handle_pio() will act differently if the return
code of hvm_send_ioreq() (resp. hvmemul_do_pio_buffer()) is X86EMUL_RETRY.

In this case, the I/O state will be reset to STATE_IOREQ_NONE (i.e.
no I/O is pending) and/or the PC will not be advanced.

If the shutdown request happens right after the I/O was sent to the
IOREQ, then the emulation code will end up re-executing the instruction
and therefore forwarding the same I/O again (at least when reading an
I/O port).

This would be a problem if the access has a side-effect. A simple example
is a device implementing a counter which is incremented by one for every
access. When running shutdown/resume in a loop, the value read by the
OS may not be the old value + 1.

Add an extra boolean in the structure hvm_vcpu_io to indicate whether
the I/O was suspended. This is then used in place of checking the domain
is shutting down in hvmemul_do_io() and handle_pio() as they should
act on suspend (i.e. vcpu_start_shutdown_deferral() returns false) rather
than shutdown.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>
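
The counter example from the message above can be modelled in a few lines. This is only a toy model of the side-effect problem, not of the fix: the device type, the function names and the `retried` flag are all hypothetical and stand in for the re-executed instruction.

```c
#include <stdbool.h>

/* Toy device: every read returns the counter and then increments it. */
struct toy_counter_dev {
    unsigned int counter;
};

static unsigned int toy_dev_read(struct toy_counter_dev *dev)
{
    return dev->counter++;
}

/*
 * Emulate one guest read instruction. If retried is true, the access is
 * forwarded a second time, as happens when the instruction is re-executed.
 * Returns the value the guest finally observes.
 */
unsigned int toy_guest_read(struct toy_counter_dev *dev, bool retried)
{
    unsigned int val = toy_dev_read(dev);

    if (retried)
        val = toy_dev_read(dev);   /* the same I/O is sent again */

    return val;
}
```

Starting from a zeroed device, a normal read returns 0 and a following retried read returns 2 rather than 1, which is the "not the old value + 1" behaviour the message warns about.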
3 years ago MAINTAINERS: add myself as reviewer for IOMMU vendor independent code
Roger Pau Monné [Fri, 6 May 2022 12:53:31 +0000 (14:53 +0200)]
MAINTAINERS: add myself as reviewer for IOMMU vendor independent code

That also covers the PCI bits which I'm interested in.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago bump default SeaBIOS version to 1.16.0
Jan Beulich [Fri, 6 May 2022 12:46:52 +0000 (14:46 +0200)]
bump default SeaBIOS version to 1.16.0

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago x86: avoid SORT_BY_INIT_PRIORITY with old GNU ld
Jan Beulich [Thu, 5 May 2022 14:26:50 +0000 (16:26 +0200)]
x86: avoid SORT_BY_INIT_PRIORITY with old GNU ld

Support for this construct was added in 2.22 only. Avoid the need to
introduce logic to probe for linker script capabilities by (ab)using the
probe for a command line option having appeared at about the same time.

Note that this remains x86-specific because Arm is unaffected, by
requiring GNU ld 2.24 or newer.

Fixes: 4b7fd8153ddf ("x86: fold sections in final binaries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago tools/xenstore: don't let special watches be children of /
Juergen Gross [Mon, 2 May 2022 10:07:22 +0000 (12:07 +0200)]
tools/xenstore: don't let special watches be children of /

When firing special watches (e.g. "@releaseDomain"), they will be
regarded as valid children of the "/" node. So a domain having
registered a watch for "/" and having the privilege to receive
the special watches will receive those special watch events for the
registered "/" watch.

Fix that by calling the related fire_watches() with the "exact"
parameter set to true, causing a mismatch for the "/" node.

Reported-by: Raphael Ning <raphning@amazon.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Raphael Ning <raphning@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
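
A minimal sketch of why an "exact" comparison stops the root watch from firing for special events. The helpers `toy_is_child` and `toy_watch_matches` are hypothetical and simplified; they are not the xenstored implementation, and the rule that "/" is a parent of every name is an assumption taken from the behaviour the message describes.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: is 'child' equal to or below 'parent' in the tree? */
static bool toy_is_child(const char *child, const char *parent)
{
    size_t len = strlen(parent);

    /* In this sketch the root is treated as a parent of every name. */
    if (strcmp(parent, "/") == 0)
        return true;

    if (strncmp(child, parent, len) != 0)
        return false;

    return child[len] == '/' || child[len] == '\0';
}

/* Decide whether a watch registered on 'watch_node' fires for 'event_name'. */
bool toy_watch_matches(const char *watch_node, const char *event_name,
                       bool exact)
{
    if (exact)
        return strcmp(watch_node, event_name) == 0;

    return toy_is_child(event_name, watch_node);
}
```

With `exact` false, a watch on "/" matches "@releaseDomain"; with `exact` true, only a watch registered on "@releaseDomain" itself matches.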
3 years ago xen/arm: Advertise workaround 1 if we apply 3
Bertrand Marquis [Tue, 3 May 2022 09:38:30 +0000 (10:38 +0100)]
xen/arm: Advertise workaround 1 if we apply 3

SMCC_WORKAROUND_3 handles both Spectre v2 and Spectre BHB.
So when a guest asks whether we support workaround 1, answer yes if we
apply workaround 3 on exception entry, as it handles it.

This will allow guests not supporting Spectre BHB but impacted by
Spectre v2 to still handle it correctly.
The modified behaviour is coherent with what the Linux kernel does in
KVM for guests.

While there, use ARM_SMCCC_SUCCESS instead of 0 for the return code value
for workaround detection to be coherent with Workaround 2 handling.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago SUPPORT.MD: Correct the amount of physical memory supported for Arm
Julien Grall [Thu, 5 May 2022 10:46:57 +0000 (11:46 +0100)]
SUPPORT.MD: Correct the amount of physical memory supported for Arm

As part of XSA-385, SUPPORT.MD gained a statement regarding the amount
of physical memory supported.

However, booting Xen on an Arm platform with that amount of memory would
result in breakage because the frametable area is too small.

The wiki [1] (as of April 2022) claims we were able to support up to
5 TiB on Arm64 and 16 GiB on Arm32. However, this is not the case because
the struct page_info has always been bigger than expected (56 bytes
for 64-bit and 32 bytes for 32-bit).

I don't have any HW with such an amount of memory. So rather than
modifying the code, take the opportunity to use the limit that should
work on Arm (2 TiB for 64-bit and 12 GiB for 32-bit).

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm part
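
To make the relationship between memory size and frametable size concrete, the sketch below computes how many bytes of page_info entries a given amount of RAM needs. The function name is hypothetical and the 4 KiB page size is an assumption of the sketch; the entry sizes (56 and 32 bytes) and the memory limits come from the message above.

```c
#include <stdint.h>

/*
 * Bytes of frametable needed to describe 'mem_bytes' of RAM, assuming one
 * page_info entry per 4 KiB page (the page size is an assumption here).
 */
uint64_t toy_frametable_bytes(uint64_t mem_bytes, uint64_t page_info_size)
{
    const uint64_t page_size = 4096;

    return (mem_bytes / page_size) * page_info_size;
}
```

Under these assumptions, 2 TiB with 56-byte entries needs 28 GiB of frametable, and 12 GiB with 32-byte entries needs 96 MiB.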
3 years ago optee: immediately free RPC buffers that are released by OP-TEE
Jens Wiklander [Wed, 4 May 2022 05:49:12 +0000 (07:49 +0200)]
optee: immediately free RPC buffers that are released by OP-TEE

This commit fixes a case overlooked in [1].

There are two kinds of shared memory buffers used by OP-TEE:
1. Normal payload buffer
2. Internal command structure buffers

The internal command structure buffers are represented with a shadow
copy internally in Xen since this buffer can contain physical addresses
that may need to be translated between real physical address and guest
physical address without leaking information to the guest.

[1] fixes the problem when releasing the normal payload buffers. The
internal command structure buffers must be released in the same way.
Failure to follow this order opens a window where the guest has freed
the shared memory but Xen is still tracking the buffer.

During this window the guest may happen to recycle this particular
shared memory in some other thread and try to use it. Xen will block
this which will lead to spurious failures to register a new shared
memory block.

Fix this by freeing the internal command structure buffers first before
informing the guest that the buffer can be freed.

[1] 5b13eb1d978e ("optee: immediately free buffers that are released by OP-TEE")

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
[stefano: minor code style fix]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
3 years ago linker/lld: do not generate quoted section names
Roger Pau Monné [Mon, 2 May 2022 06:51:45 +0000 (08:51 +0200)]
linker/lld: do not generate quoted section names

LLVM LD doesn't strip the quotes from the section names, and so the
resulting binary ends up with section names like:

  [ 1] ".text"           PROGBITS         ffff82d040200000  00008000
       000000000018cbc1  0000000000000000  AX       0     0     4096

This confuses some tools (like gdb) and prevents proper parsing of the
binary.

The issue has already been reported and is being fixed in LLD.  To
work around this issue while keeping GNU ld support, define
different DECL_SECTION macros depending on the ld implementation
in use.

Drop the quotes from the definitions of the debug sections in
DECL_DEBUG{2}, as those quotes are not required for GNU ld either.

Fixes: 6254920587c3 ('x86: quote section names when defining them in linker script')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago kconfig: detect LD implementation
Roger Pau Monné [Mon, 2 May 2022 06:50:39 +0000 (08:50 +0200)]
kconfig: detect LD implementation

Detect GNU and LLVM ld implementations. This is required for further
patches that will introduce diverging behaviour depending on the
linker implementation in use.

Note that LLVM ld returns "compatible with GNU linkers" as part of the
version string, so be on the safe side and use '^' to only match at
the start of the line in case LLVM ever decides to change the text to
use "compatible with GNU ld" instead.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
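
The point about anchoring the match with '^' can be shown with a prefix check versus a substring search. This is a C sketch of the idea only, not the Kconfig test itself; the function names are hypothetical and the version strings used in the note below are illustrative.

```c
#include <stdbool.h>
#include <string.h>

/* True when 'line' begins with 'prefix' (the effect of anchoring with '^'). */
static bool toy_starts_with(const char *line, const char *prefix)
{
    return strncmp(line, prefix, strlen(prefix)) == 0;
}

/* Anchored check: only a version line that starts with "GNU ld" matches. */
bool toy_is_gnu_ld(const char *version_line)
{
    return toy_starts_with(version_line, "GNU ld");
}

/* Unanchored check, shown for contrast: matches the text anywhere. */
bool toy_mentions_gnu_ld(const char *version_line)
{
    return strstr(version_line, "GNU ld") != NULL;
}
```

A line such as "LLD 14.0.0 (compatible with GNU ld)" would pass the unanchored check but not the anchored one, which is the misdetection the commit guards against.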
3 years ago scripts/add_maintainers.pl: add -o as an alternative to --patchdir
Elliott Mitchell [Mon, 2 May 2022 06:50:02 +0000 (08:50 +0200)]
scripts/add_maintainers.pl: add -o as an alternative to --patchdir

This matches the output directory option used by `git format-patch`.  I
suspect I'm not the only one who finds matching `git format-patch` more
intuitive than -d for directory.

Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years ago x86/msr: handle reads to MSR_P5_MC_{ADDR,TYPE}
Roger Pau Monné [Mon, 2 May 2022 06:49:12 +0000 (08:49 +0200)]
x86/msr: handle reads to MSR_P5_MC_{ADDR,TYPE}

Windows Server 2019 Essentials will unconditionally attempt to read
the P5_MC_ADDR MSR at boot and throw a BSOD if a #GP is injected.

Fix this by mapping MSR_P5_MC_{ADDR,TYPE} to
MSR_IA32_MCi_{ADDR,STATUS}, as hardware reportedly also does according
to the Intel SDM section "Mapping of the Pentium Processor Machine-Check
Errors to the Machine-Check Architecture".

Reported-by: Steffen Einsle <einsle@phptrix.de>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago IOMMU/x86: disallow device assignment to PoD guests
Jan Beulich [Mon, 2 May 2022 06:48:02 +0000 (08:48 +0200)]
IOMMU/x86: disallow device assignment to PoD guests

While it is okay for IOMMU page tables to be set up for guests starting
in PoD mode, actual device assignment may only occur once all PoD
entries have been removed from the P2M. So far this was enforced only
for boot-time assignment, and only in the tool stack.

Also use the new function to replace p2m_pod_entry_count(): Its unlocked
access to p2m->pod.entry_count wasn't really okay (irrespective of the
result being stale by the time the caller gets to see it). Nor was the
use of that function in line with the immediately preceding comment: A
PoD guest isn't just one with a non-zero entry count, but also one with
a non-empty cache (e.g. prior to actually launching the guest).

To allow the tool stack to see a consistent snapshot of PoD state, move
the tail of XENMEM_{get,set}_pod_target handling into a function, adding
proper locking there.

In libxl take the liberty to use the new local variable r also for a
pre-existing call into libxc.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago Revert "page_alloc: assert IRQs are enabled in heap alloc/free"
Julien Grall [Fri, 29 Apr 2022 09:04:40 +0000 (10:04 +0100)]
Revert "page_alloc: assert IRQs are enabled in heap alloc/free"

This reverts commit fa6dc0879ffd3dffffaea2837953c7a8761a9ba0 as there
is more fallout on Arm.

3 years ago MAINTAINERS: add Rahul as SMMU maintainer
Stefano Stabellini [Tue, 26 Apr 2022 20:27:32 +0000 (13:27 -0700)]
MAINTAINERS: add Rahul as SMMU maintainer

Add Rahul as ARM SMMU maintainer. Create a new explicit entry for "ARM
SMMU", also listing Julien, who is the original contributor of the code
and continues to maintain it.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <julien@xen.org>
3 years ago x86/mem_sharing: make fork_reset more configurable
Tamas K Lengyel [Thu, 28 Apr 2022 14:15:33 +0000 (16:15 +0200)]
x86/mem_sharing: make fork_reset more configurable

Allow specifying distinct parts of the fork VM to be reset. This is useful
when a fuzzing operation involves mapping in only a handful of pages that
are known ahead of time. Throwing these pages away just to be re-copied
immediately is expensive, so allowing partial resets can speed things up.

Also allow resetting to be initiated from vm_event responses as an
optimization.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago PCI: replace "secondary" flavors of PCI_{DEVFN,BDF,SBDF}()
Jan Beulich [Thu, 28 Apr 2022 14:14:26 +0000 (16:14 +0200)]
PCI: replace "secondary" flavors of PCI_{DEVFN,BDF,SBDF}()

At their use sites the numeric suffixes are at least odd to read, first
and foremost for PCI_DEVFN2() where the suffix doesn't even match the
number of arguments. Make use of count_args() such that a single flavor
each suffices (leaving aside helper macros, which aren't supposed to be
used from the outside).

In parse_ppr_log_entry() take the opportunity and drop two local
variables and convert an assignment to an initializer.

In VT-d code fold a number of bus+devfn comparison pairs into a single
BDF comparison.

No change to generated code for the vast majority of the adjustments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
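
The idea of letting one macro name serve several argument counts can be sketched with a small argument-counting helper. Everything below is hypothetical and simplified: the `TOY_` names, the counting macro and the bit layouts are written for this illustration and are not copied from Xen's count_args() or its PCI headers.

```c
/* Count 1 to 3 arguments by shifting a descending list into position. */
#define TOY_COUNT_ARGS_(_1, _2, _3, n, ...) n
#define TOY_COUNT_ARGS(...) TOY_COUNT_ARGS_(__VA_ARGS__, 3, 2, 1, 0)

/* Two-level paste so the count is expanded before concatenation. */
#define TOY_CAT_(a, b) a##b
#define TOY_CAT(a, b) TOY_CAT_(a, b)

/* Device (5 bits) and function (3 bits) packed into one byte. */
#define TOY_PCI_DEVFN(d, f) ((((d) & 0x1f) << 3) | ((f) & 7))

/* Helper flavors, selected by argument count; not meant for direct use. */
#define TOY_PCI_BDF_2(b, df)   ((((b) & 0xff) << 8) | ((df) & 0xff))
#define TOY_PCI_BDF_3(b, d, f) TOY_PCI_BDF_2(b, TOY_PCI_DEVFN(d, f))

/* Single public flavor: accepts (bus, devfn) or (bus, dev, fn). */
#define TOY_PCI_BDF(...) \
    TOY_CAT(TOY_PCI_BDF_, TOY_COUNT_ARGS(__VA_ARGS__))(__VA_ARGS__)
```

With this arrangement `TOY_PCI_BDF(1, 2, 3)` and `TOY_PCI_BDF(1, 0x13)` both evaluate to 0x113, so call sites no longer need a numeric suffix to pick a variant.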
3 years ago PCI: replace stray uses of PCI_{DEVFN,BDF}2()
Jan Beulich [Thu, 28 Apr 2022 14:13:23 +0000 (16:13 +0200)]
PCI: replace stray uses of PCI_{DEVFN,BDF}2()

There's no good reason to use these when we already have a pci_sbdf_t
type object available. This extends to the use of PCI_BUS() in
pci_ecam_map_bus() as well.

No change to generated code (with gcc11 at least, and I have to admit
that I didn't expect compilers to necessarily be able to spot the
optimization potential on the original code).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years ago x86+libxl: correct p2m (shadow) memory pool size calculation
Jan Beulich [Thu, 28 Apr 2022 08:00:49 +0000 (10:00 +0200)]
x86+libxl: correct p2m (shadow) memory pool size calculation

The reference "to shadow the resident processes" is applicable to
domains (potentially) running in shadow mode only. Adjust the
calculations accordingly. This, however, requires further parameters.
Since the original function is deprecated anyway, and since it can't be
changed (for being part of a stable ABI), introduce a new (internal
only) function, with the deprecated one simply becoming a wrapper.

In dom0_paging_pages() also take the opportunity and stop open-coding
DIV_ROUND_UP().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago x86/mwait-idle: add SPR support
Artem Bityutskiy [Thu, 28 Apr 2022 08:00:18 +0000 (10:00 +0200)]
x86/mwait-idle: add SPR support

Add Sapphire Rapids Xeon support.

Up until very recently, the C1 and C1E C-states were independent, but this
has changed in some new chips, including Sapphire Rapids Xeon (SPR). In these
chips the C1 and C1E states cannot be enabled at the same time. The "C1E
promotion" bit in 'MSR_IA32_POWER_CTL' also has its semantics changed a bit.

Here are the C1, C1E, and "C1E promotion" bit rules on Xeons before SPR.

1. If C1E promotion bit is disabled.
   a. C1  requests end up with C1  C-state.
   b. C1E requests end up with C1E C-state.
2. If C1E promotion bit is enabled.
   a. C1  requests end up with C1E C-state.
   b. C1E requests end up with C1E C-state.

Here are the C1, C1E, and "C1E promotion" bit rules on Sapphire Rapids Xeon.
1. If C1E promotion bit is disabled.
   a. C1  requests end up with C1 C-state.
   b. C1E requests end up with C1 C-state.
2. If C1E promotion bit is enabled.
   a. C1  requests end up with C1E C-state.
   b. C1E requests end up with C1E C-state.

Before SPR Xeon, the 'intel_idle' driver was disabling C1E promotion and was
exposing C1 and C1E as independent C-states. But on SPR, C1 and C1E cannot be
enabled at the same time.

This patch adds both C1 and C1E states. However, C1E is marked with the
"CPUIDLE_FLAG_UNUSABLE" flag, which means that it won't be registered by
default. The C1E promotion bit will be cleared, which means that by default
only C1 and C6 will be registered on SPR.

The next patch will add an option for enabling C1E and disabling C1 on SPR.

Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 9edf3c0ffef0
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
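
The two rule tables in the message above can be condensed into a single function. This is a sketch that restates those rules only; the enum, the function name and the boolean parameters are hypothetical and do not correspond to anything in the driver.

```c
#include <stdbool.h>

enum toy_cstate { TOY_C1, TOY_C1E };

/*
 * Resulting C-state for a C1 or C1E request, following the rules listed
 * for Xeons before SPR and for Sapphire Rapids Xeon.
 */
enum toy_cstate toy_resulting_cstate(bool is_spr, bool c1e_promotion,
                                     enum toy_cstate request)
{
    /* With promotion enabled, every request ends up in C1E on all parts. */
    if (c1e_promotion)
        return TOY_C1E;

    /* Promotion disabled on SPR: both C1 and C1E requests end up in C1. */
    if (is_spr)
        return TOY_C1;

    /* Promotion disabled before SPR: the request is honoured as is. */
    return request;
}
```

The only case where the two generations differ is a C1E request with promotion disabled, which is why C1 and C1E cannot both be exposed on SPR.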
3 years ago x86/mwait-idle: switch to asm/intel-family.h naming
Jan Beulich [Thu, 28 Apr 2022 07:59:14 +0000 (09:59 +0200)]
x86/mwait-idle: switch to asm/intel-family.h naming

This brings us (back) closer to the original Linux source.

While touching mwait_idle_state_table_update() also drop a stray leading
blank.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago xen/public: add new macro to ring.h
Juergen Gross [Thu, 28 Apr 2022 07:58:42 +0000 (09:58 +0200)]
xen/public: add new macro to ring.h

For the initialization of a ring page by the frontend two macros are
available in ring.h: SHARED_RING_INIT() and FRONT_RING_INIT().

All known users always use both of them in direct sequence.

Add another macro XEN_FRONT_RING_INIT() combining the two macros.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
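
The pattern of folding two initialisation macros into one can be sketched with toy stand-ins. The `TOY_` types and macros below are simplified assumptions written for this example: they are not the ring.h definitions, they keep far fewer fields, and the last parameter is an entry count rather than whatever the real macros take.

```c
/* Toy shared ring page and frontend-private state (not the ring.h types). */
struct toy_sring {
    unsigned int req_prod, rsp_prod;
};

struct toy_front_ring {
    unsigned int req_prod_pvt, rsp_cons;
    unsigned int nr_ents;
    struct toy_sring *sring;
};

/* Stand-in for initialising the shared page. */
#define TOY_SHARED_RING_INIT(s) do {             \
    (s)->req_prod = (s)->rsp_prod = 0;           \
} while (0)

/* Stand-in for initialising the frontend's private view of the ring. */
#define TOY_FRONT_RING_INIT(r, s, ents) do {     \
    (r)->req_prod_pvt = 0;                       \
    (r)->rsp_cons = 0;                           \
    (r)->nr_ents = (ents);                       \
    (r)->sring = (s);                            \
} while (0)

/* Combined macro: both steps, in the order callers use them. */
#define TOY_XEN_FRONT_RING_INIT(r, s, ents) do { \
    TOY_SHARED_RING_INIT(s);                     \
    TOY_FRONT_RING_INIT(r, s, ents);             \
} while (0)
```

A frontend would then call the combined macro once with its private ring, the shared page and the ring size, instead of issuing the two calls back to back.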
3 years ago drivers/exynos4210: Remove unused-but-set variable
Michal Orzel [Wed, 27 Apr 2022 09:49:41 +0000 (11:49 +0200)]
drivers/exynos4210: Remove unused-but-set variable

Function exynos4210_uart_init_preirq defines and sets a variable
divisor but does not make use of it. Remove the definition and comment
out the assignment as this function already has some TODOs.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago platforms/omap: Remove unused-but-set variable
Michal Orzel [Wed, 27 Apr 2022 09:49:40 +0000 (11:49 +0200)]
platforms/omap: Remove unused-but-set variable

Function omap5_init_time defines and sets the variable den but does not
make use of it. Remove this variable.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago platforms/xgene: Make use of dt_device_get_address return value
Michal Orzel [Wed, 27 Apr 2022 09:49:39 +0000 (11:49 +0200)]
platforms/xgene: Make use of dt_device_get_address return value

Currently function xgene_check_pirq_eoi assigns the return value of
dt_device_get_address to a variable res but does not make use of it.
Fix it by making use of res in the condition checking the result of a
call to dt_device_get_address instead of checking the address stored in
dbase.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
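
The difference between checking a return value and checking an output parameter can be illustrated as follows. The lookup function, its error convention, the address and the stale initial value are all made up for this sketch; only the variable names `res` and `dbase` echo the message above.

```c
#include <stdint.h>

/*
 * Hypothetical lookup: returns 0 and fills *addr on success, or a negative
 * value on failure, in which case *addr is left untouched.
 */
static int toy_get_address(int node_ok, uint64_t *addr)
{
    if (!node_ok)
        return -1;

    *addr = 0x1f000000;   /* made-up address for the sketch */
    return 0;
}

/* Returns 1 when an address was obtained, 0 otherwise. */
int toy_have_address(int node_ok)
{
    uint64_t dbase = 0xdeadbeef;   /* stale value, to show why res matters */
    int res = toy_get_address(node_ok, &dbase);

    /* Check the return value rather than whatever dbase happens to hold. */
    if (res)
        return 0;

    return 1;
}
```

Testing `dbase` instead of `res` would report success here even when the lookup failed, because the variable still holds its earlier non-zero contents.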
3 years ago xen/sched: Remove unused-but-set variable
Michal Orzel [Wed, 27 Apr 2022 09:49:38 +0000 (11:49 +0200)]
xen/sched: Remove unused-but-set variable

Function schedule_cpu_add defines and sets a variable old_unit but
does not make use of it. Remove this variable.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
3 years ago xen/arm: smmu.c: Remove unused-but-set variable
Michal Orzel [Wed, 27 Apr 2022 09:49:37 +0000 (11:49 +0200)]
xen/arm: smmu.c: Remove unused-but-set variable

Function arm_smmu_init_context_bank defines and sets a variable
gr0_base but does not make use of it. Remove this variable.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago efi/boot.c: Remove unused-but-set variable
Michal Orzel [Wed, 27 Apr 2022 09:49:35 +0000 (11:49 +0200)]
efi/boot.c: Remove unused-but-set variable

Function efi_start defines and sets a variable size but does not
make use of it. Remove this variable.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago xen/arm: bootfdt.c: Remove unused-but-set variable
Michal Orzel [Wed, 27 Apr 2022 09:49:34 +0000 (11:49 +0200)]
xen/arm: bootfdt.c: Remove unused-but-set variable

Function device_tree_node_compatible defines and sets a variable
mlen but does not make use of it. Remove this variable.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago xen/arm64: io: Handle data abort due to cache maintenance instructions
Ayan Kumar Halder [Thu, 24 Mar 2022 13:37:05 +0000 (13:37 +0000)]
xen/arm64: io: Handle data abort due to cache maintenance instructions

When a data abort is caused by a cache maintenance instruction on an
address, there are three scenarios:

1. Address belongs to a non-emulated region - For this, Xen should
set the corresponding bit in the translation table entry to valid and
return to the guest to retry the instruction. This can happen because
Xen sometimes needs to set the translation table entry to invalid (e.g.
for a 'Break-Before-Make' sequence).

2. Address belongs to an emulated region - Xen should ignore the
instruction (i.e. increment the PC) and return to the guest.

3. Address is invalid - Xen should forward the data abort to the guest.

Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com>
[julien: Don't initialize p.size to 1 << info->dabt.size]
Reviewed-by: Julien Grall <jgrall@amazon.com>
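
The three scenarios above amount to a small decision table, sketched here. The enums and the function name are hypothetical; how the region is actually classified is outside the scope of this example.

```c
/* How the faulting address was classified (hypothetical categories). */
enum toy_region { TOY_NON_EMULATED, TOY_EMULATED, TOY_INVALID };

/* What the hypervisor does in response, one value per scenario. */
enum toy_action {
    TOY_FIX_ENTRY_AND_RETRY,   /* scenario 1: mark entry valid, retry */
    TOY_SKIP_INSTRUCTION,      /* scenario 2: advance the PC, return */
    TOY_FORWARD_ABORT          /* scenario 3: inject the abort into the guest */
};

enum toy_action toy_handle_cache_maint_abort(enum toy_region region)
{
    switch (region) {
    case TOY_NON_EMULATED:
        return TOY_FIX_ENTRY_AND_RETRY;
    case TOY_EMULATED:
        return TOY_SKIP_INSTRUCTION;
    default:
        return TOY_FORWARD_ABORT;
    }
}
```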
3 years ago page_alloc: assert IRQs are enabled in heap alloc/free
David Vrabel [Tue, 26 Apr 2022 08:33:01 +0000 (10:33 +0200)]
page_alloc: assert IRQs are enabled in heap alloc/free

Heap pages can only be safely allocated and freed with interrupts
enabled as they may require a TLB flush which may send IPIs (on x86).

Normally spinlock debugging would catch calls from the incorrect
context, but not from stop_machine_run() action functions as these are
called with spin lock debugging disabled.

Enhance the assertions in alloc_xenheap_pages() and
alloc_domheap_pages() to check interrupts are enabled. For consistency
the same asserts are used when freeing heap pages.

As an exception, when only 1 PCPU is online, allocations are permitted
with interrupts disabled as any TLB flushes would be local only. This
is necessary during early boot.

Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
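
The condition described above, including the single-CPU exception, can be written as one predicate. This is a sketch of the rule as stated in the message, with a hypothetical function name and plain parameters in place of any hypervisor state.

```c
#include <stdbool.h>

/*
 * True when a heap allocation or free is acceptable in this toy model:
 * either interrupts are enabled, or at most one CPU is online so any
 * TLB flush would be local only.
 */
bool toy_heap_op_allowed(bool irqs_enabled, unsigned int online_cpus)
{
    return irqs_enabled || online_cpus <= 1;
}
```

An assertion built on such a predicate fires only for the combination of interrupts disabled and more than one CPU online.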
3 years ago xen/arm: alternative: Don't call vmap() within stop_machine_run()
Julien Grall [Tue, 26 Apr 2022 20:06:29 +0000 (21:06 +0100)]
xen/arm: alternative: Don't call vmap() within stop_machine_run()

Commit 88a037e2cfe1 "page_alloc: assert IRQs are enabled in heap
alloc/free" extended the checks in the buddy allocator to catch
any use of the helpers from context with interrupts disabled.

Unfortunately, the rule is not followed in the alternative code and
this will result in a crash at boot with debug enabled:

(XEN) Xen call trace:
(XEN)    [<0022a510>] alloc_xenheap_pages+0x120/0x150 (PC)
(XEN)    [<00000000>] 00000000 (LR)
(XEN)    [<002736ac>] arch/arm/mm.c#xen_pt_update+0x144/0x6e4
(XEN)    [<002740d4>] map_pages_to_xen+0x10/0x20
(XEN)    [<00236864>] __vmap+0x400/0x4a4
(XEN)    [<0026aee8>] arch/arm/alternative.c#__apply_alternatives_multi_stop+0x144/0x1ec
(XEN)    [<0022fe40>] stop_machine_run+0x23c/0x300
(XEN)    [<002c40c4>] apply_alternatives_all+0x34/0x5c
(XEN)    [<002ce3e8>] start_xen+0xcb8/0x1024
(XEN)    [<00200068>] arch/arm/arm32/head.o#primary_switched+0xc/0x1c

The interrupts will be disabled by the state machine in stop_machine_run(),
which is why the ASSERT is hit.

For now the patch extending the checks has been reverted, but it would
be good to re-introduce it (allocation with interrupts disabled is not
desirable).

So move the re-mapping of Xen to the caller of stop_machine_run().

Signed-off-by: Julien Grall <jgrall@amazon.com>
Cc: David Vrabel <dvrabel@amazon.co.uk>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoEFI: don't mistakenly delete a file we never installed
Jan Beulich [Wed, 27 Apr 2022 07:15:03 +0000 (09:15 +0200)]
EFI: don't mistakenly delete a file we never installed

Just like for "install", make dealing with xen.efi on the EFI partition
dependent upon mount point and vendor directory being known.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agolibxl: retry QMP PCI device_add
Jason Andryuk [Wed, 27 Apr 2022 07:14:30 +0000 (09:14 +0200)]
libxl: retry QMP PCI device_add

PCI device assignment to an HVM with stubdom is potentially racy.  First
the PCI device is assigned to the stubdom via the PV PCI protocol.  Then
QEMU is sent a QMP command to attach the PCI device to QEMU running
within the stubdom.  However, the sysfs entries within the stubdom may
not have appeared by the time QEMU receives the device_add command
resulting in errors like:

libxl_qmp.c:1838:qmp_ev_parse_error_messages:Domain 10:Could not open '/sys/bus/pci/devices/0000:00:1f.3/config': No such file or directory

This patch retries the device assignment up to 10 times with a 1 second
delay between.  That roughly matches the overall hotplug timeout for
pci_add_timeout.  pci_add_timeout's initialization is moved to
do_pci_add since retries call into pci_add_qmp_device_add again.
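
The retry policy described above can be sketched in C roughly as follows.
This is a simplified, synchronous illustration only: libxl's real code is
event driven, and every name here (retry_device_add, attempt_fn, delay_fn,
fake_dev, fake_attempt, no_delay) is hypothetical. It also assumes that
"up to 10 times" means one initial attempt followed by at most 10 retries.

```c
#include <stdbool.h>

#define PCI_ADD_MAX_RETRIES   10  /* from the commit message */
#define PCI_ADD_RETRY_DELAY_S 1   /* from the commit message */

/* Hypothetical callback types standing in for the QMP command and timer. */
typedef bool (*attempt_fn)(void *ctx);
typedef void (*delay_fn)(unsigned int seconds);

/*
 * Try once, then retry after a delay until the attempt succeeds or the
 * retry budget is used up. Returns the number of retries that were
 * needed, or -1 if every attempt failed.
 */
static int retry_device_add(attempt_fn attempt, delay_fn delay, void *ctx)
{
    for (int retry = 0; ; retry++) {
        if (attempt(ctx))
            return retry;
        if (retry == PCI_ADD_MAX_RETRIES)
            return -1;
        delay(PCI_ADD_RETRY_DELAY_S);
    }
}

/* Stand-in for a device whose sysfs entries show up late. */
struct fake_dev {
    int failures_left;  /* attempts that fail before the device is ready */
    int calls;          /* total attempts seen */
};

static bool fake_attempt(void *ctx)
{
    struct fake_dev *dev = ctx;

    dev->calls++;
    if (dev->failures_left > 0) {
        dev->failures_left--;
        return false;
    }
    return true;
}

/* A real caller would sleep or arm a timer; the sketch does nothing. */
static void no_delay(unsigned int seconds)
{
    (void)seconds;
}
```

With a fake_dev that fails three times, retry_device_add() reports three
retries; a device that never becomes ready gives -1 after 11 attempts.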

The qmp_ev_parse_error_messages error is still printed since it happens
at a lower level than the pci code controlling the retries.  With that,
the "Retrying PCI add %d" message is also printed at ERROR level to
clarify what is happening.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agox86/vmx: add hvm functions to get/set non-register state
Tamas K Lengyel [Wed, 27 Apr 2022 07:13:39 +0000 (09:13 +0200)]
x86/vmx: add hvm functions to get/set non-register state

During VM forking and resetting a failed vmentry has been observed due
to the guest non-register state going out-of-sync with the guest register
state. For example, a VM fork reset right after a STI instruction can trigger
the failed entry. This is due to the guest non-register state not being saved
from the parent VM, thus the reset operation only copies the register state.

Fix this by adding a new pair of hvm functions to get/set the guest
non-register state so that the overall vCPU state remains in sync.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoRevert "page_alloc: assert IRQs are enabled in heap alloc/free"
Jan Beulich [Tue, 26 Apr 2022 14:02:21 +0000 (16:02 +0200)]
Revert "page_alloc: assert IRQs are enabled in heap alloc/free"

This reverts commit 88a037e2cfe11a723fe420d3585837ab1bdc6f8a, as
it breaks booting on Arm.

3 years agopage_alloc: assert IRQs are enabled in heap alloc/free
David Vrabel [Tue, 26 Apr 2022 08:33:01 +0000 (10:33 +0200)]
page_alloc: assert IRQs are enabled in heap alloc/free

Heap pages can only be safely allocated and freed with interrupts
enabled as they may require a TLB flush which may send IPIs (on x86).

Normally spinlock debugging would catch calls from the incorrect
context, but not from stop_machine_run() action functions as these are
called with spin lock debugging disabled.

Enhance the assertions in alloc_xenheap_pages() and
alloc_domheap_pages() to check interrupts are enabled. For consistency
the same asserts are used when freeing heap pages.

As an exception, when only 1 PCPU is online, allocations are permitted
with interrupts disabled as any TLB flushes would be local only. This
is necessary during early boot.
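
The condition described above can be sketched as a small predicate. This
is only an illustration of the rule, not the assertion added by the patch:
irqs_enabled, online_cpus and heap_op_context_ok() are hypothetical
stand-ins for the hypervisor's real state and helpers.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for hypervisor state. */
static bool irqs_enabled = true;
static unsigned int online_cpus = 1;

/*
 * Allocating or freeing heap pages may need a TLB flush, which may send
 * IPIs. That is only safe with interrupts enabled, unless this is the
 * only CPU online, in which case any flush is local.
 */
static bool heap_op_context_ok(void)
{
    return irqs_enabled || online_cpus <= 1;
}
```

An allocator entry point could then assert heap_op_context_ok() before
doing any work, so that debug builds catch callers in the wrong context.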

Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoxsm/flask: code style formatting
Daniel P. Smith [Tue, 26 Apr 2022 08:30:31 +0000 (10:30 +0200)]
xsm/flask: code style formatting

This is a quick code style cleanup patch for xsm/flask. The files flask_op.c
and hooks.c are Xen specific, thus full code style rules were applied. The
remaining files are from Linux and therefore only trailing whitespace was
removed from those files.

Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
3 years agoIOMMU: make domctl handler tolerate NULL domain
Jan Beulich [Tue, 26 Apr 2022 08:25:54 +0000 (10:25 +0200)]
IOMMU: make domctl handler tolerate NULL domain

Besides the reporter's issue of hitting a NULL deref when !CONFIG_GDBSX,
XEN_DOMCTL_test_assign_device can legitimately end up having NULL passed
here, when the domctl was passed DOMID_INVALID.

Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...")
Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years agoxen/iommu: cleanup iommu related domctl handling
Juergen Gross [Tue, 26 Apr 2022 08:23:58 +0000 (10:23 +0200)]
xen/iommu: cleanup iommu related domctl handling

Today iommu_do_domctl() is being called from arch_do_domctl() in the
"default:" case of a switch statement. This has already led to crashes
due to unvalidated parameters.

Fix that by moving the call of iommu_do_domctl() to the main switch
statement of do_domctl().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> # Arm
3 years agotools/libs/light: don't set errno to a negative value
Juergen Gross [Wed, 20 Apr 2022 07:31:19 +0000 (09:31 +0200)]
tools/libs/light: don't set errno to a negative value

Setting errno to a negative value makes no sense.
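
As a minimal sketch of the convention this fix restores: errno codes are
positive, and failure is signalled through the return value.
check_handle() is a hypothetical helper, not a libxl function.

```c
#include <errno.h>

/* Hypothetical helper: reject an invalid handle the conventional way. */
static int check_handle(int handle)
{
    if (handle < 0) {
        errno = EINVAL;  /* not -EINVAL: errno codes are positive */
        return -1;       /* the return value signals the failure */
    }
    return 0;
}
```

Callers that pass errno to strerror() or perror() depend on this; a
negated code would not map to a meaningful message.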

Fixes: e78e8b9bb649 ("libxl: Add interface for querying hypervisor about PCI topology")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/libs/guest: don't set errno to a negative value
Juergen Gross [Wed, 20 Apr 2022 07:31:18 +0000 (09:31 +0200)]
tools/libs/guest: don't set errno to a negative value

Setting errno to a negative error value makes no sense.

Fixes: cb99a64029c9 ("libxc: arm: allow passing a device tree blob to the guest")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/libs/ctrl: don't set errno to a negative value
Juergen Gross [Wed, 20 Apr 2022 07:31:17 +0000 (09:31 +0200)]
tools/libs/ctrl: don't set errno to a negative value

The claimed reason for setting errno to -1 is wrong. On x86
xc_domain_pod_target() will set errno to a sane value in the error
case.

Fixes: ff1745d5882b ("tools: libxl: do not set the PoD target on ARM")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/libs/evtchn: don't set errno to negative values
Juergen Gross [Wed, 20 Apr 2022 07:31:16 +0000 (09:31 +0200)]
tools/libs/evtchn: don't set errno to negative values

Setting errno to a negative value makes no sense.

Fixes: 6b6500b3cbaa ("tools/libs/evtchn: Add support for restricting a handle")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoxen: Split x86/debugger.h into common and arch specific parts
Bobby Eshleman [Tue, 28 Sep 2021 20:30:29 +0000 (13:30 -0700)]
xen: Split x86/debugger.h into common and arch specific parts

With all the non-CONFIG_CRASH_DEBUG functionality moved elsewhere, split
x86/debugger.h in two, with the stubs and explanation moved to xen/debugger.h.

In particular, this means that arches only need to provide an $arch/debugger.h
if they implement CONFIG_CRASH_DEBUG, and ARM's stub can be deleted.

Signed-off-by: Bobby Eshleman <bobby.eshleman@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/debugger: Misc cleanup prior to splitting
Andrew Cooper [Wed, 20 Apr 2022 13:40:45 +0000 (14:40 +0100)]
x86/debugger: Misc cleanup prior to splitting

 * Remove inappropriate semicolon from debugger_trap_immediate().
 * Try to explain what debugger_trap_fatal() is doing, and write it in a more
   legible way.
 * Drop unnecessary includes.  This includes common/domain.c which doesn't use
   any debugger functionality, even prior to this cleanup.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/gdbstub: Clean up includes
Andrew Cooper [Wed, 20 Apr 2022 12:48:05 +0000 (13:48 +0100)]
x86/gdbstub: Clean up includes

common/gdbstub.c wants struct gdb_context but only gets it transitively
through asm/debugger.h.  None of */gdbstub.c should include asm/debugger.h so
include xen/gdbstub.h instead.

Forward declare struct cpu_user_regs in xen/gdbstub.h so it doesn't depend on
the include order to compile.

x86/setup.c doesn't need xen/gdbstub.h at all, so drop it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/gdbsx: Move domain_pause_for_debugger() into gdbsx
Andrew Cooper [Wed, 20 Apr 2022 00:38:32 +0000 (01:38 +0100)]
x86/gdbsx: Move domain_pause_for_debugger() into gdbsx

domain_pause_for_debugger() is guest debugging (CONFIG_GDBSX) not host
debugging (CONFIG_CRASH_DEBUG).

Move it into the new gdbsx.c to drop the (incorrect) ifdefary, and provide a
static inline in the !CONFIG_GDBSX case so callers can optimise away
everything rather than having to emit a call to an empty function.
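
The stub pattern mentioned above, sketched under assumptions: the void
signature, the debugger_attached flag and the handle_breakpoint() caller
are illustrative and not taken from the patch.

```c
#include <stdbool.h>

/* Define CONFIG_GDBSX to model a build with guest debugging enabled. */
/* #define CONFIG_GDBSX 1 */

#ifdef CONFIG_GDBSX
/* The real implementation would live in gdbsx.c. */
void domain_pause_for_debugger(void);
#else
/* Empty inline stub: calls disappear entirely at compile time. */
static inline void domain_pause_for_debugger(void) {}
#endif

/* Hypothetical state and caller, for illustration only. */
static bool debugger_attached;

static int handle_breakpoint(void)
{
    if (debugger_attached) {
        domain_pause_for_debugger();
        return 1;  /* handled on behalf of the debugger */
    }
    return 0;      /* not ours, let normal handling continue */
}
```

Because the stub is visible at the call site, callers need no #ifdef of
their own.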

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/gdbsx: Rename debug.c to gdbsx.c
Bobby Eshleman [Tue, 28 Sep 2021 20:30:26 +0000 (13:30 -0700)]
x86/gdbsx: Rename debug.c to gdbsx.c

debug.c contains only dbg_rw_mem().  Rename it to gdbsx.c.

Move gdbsx_guest_mem_io(), and the prior setup of iop->remain, from domctl.c
to gdbsx.c, merging it with dbg_rw_mem().

Signed-off-by: Bobby Eshleman <bobby.eshleman@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/debugger: Remove debugger_trap_entry()
Bobby Eshleman [Tue, 28 Sep 2021 20:30:25 +0000 (13:30 -0700)]
x86/debugger: Remove debugger_trap_entry()

debugger_trap_entry() is unrelated to the other contents of debugger.h.  It is
a no-op for everything other than #DB/#BP, and for those it invokes guest
debugging (CONFIG_GDBSX) not host debugging (CONFIG_CRASH_DEBUG).

The reason it is a no-op for gdbstub is related to the fact that its
description is inappropriate for any kind of useful debugging.  In normal
debugging, gdb only sees things which manifest as signals; it doesn't see
things which the kernel resolves itself (some #PF, #NM, etc).  Furthermore,
without a mechanism to invoke pv_inject_event(), the current infrastructure
will livelock on faults from guest context.

As such, there is no plausible future matching its description.  Any work to
do something better than the current nothing will have to design something
more coherent.

Therefore, simplify everything by expanding debugger_trap_entry() into its two
non-empty locations, fixing bugs with their positioning (vs early exceptions
and curr not being safe to dereference) and for #DB, deferring the pause until
the changes in %dr6 are saved to v->arch.dr6 so the debugger can actually see
which condition triggered.  This also removes some logically dead code from
do_trap(), where the compiler can't prove that #DB/#BP are handled by
different codepaths.

Signed-off-by: Bobby Eshleman <bobby.eshleman@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/build: Fix MAP rule when called in isolation
Andrew Cooper [Thu, 21 Apr 2022 14:23:37 +0000 (15:23 +0100)]
xen/build: Fix MAP rule when called in isolation

Now that `make MAP` might rebuild $(TARGET), it needs removing from
no-dot-config-targets.

Otherwise the build eventually fails with:

    CPP     arch/x86/asm-macros.i
  arch/x86/asm-macros.c:1:10: fatal error: asm/asm-defns.h: No such file or
  directory
      1 | #include <asm/asm-defns.h>
        |          ^~~~~~~~~~~~~~~~~

Fixes: e1e72198213b ("xen/build: Fix dependency for the MAP rule")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/build: make linking work again with ld incapable of generating build ID
Jan Beulich [Fri, 22 Apr 2022 12:56:23 +0000 (14:56 +0200)]
x86/build: make linking work again with ld incapable of generating build ID

The retaining of .note.* in a PT_NOTE segment requires a matching
program header to be present in the first place. Drop the respective
conditional and adjust mkelf32 to deal with (ignore) the potentially
present but empty extra segment (but have the new code be generic by
dropping any excess trailing entirely empty segments).
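
The generic trimming step could look roughly like the sketch below. It
is not the mkelf32 code: struct phdr_sketch and trim_trailing_empty() are
hypothetical, and "entirely empty" is assumed to mean that both the file
size and the memory size of a segment are zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced view of a program header. */
struct phdr_sketch {
    uint32_t filesz;  /* bytes occupied in the file */
    uint32_t memsz;   /* bytes occupied in memory */
};

/*
 * Return how many leading entries remain once all trailing, entirely
 * empty segments are dropped. Empty segments followed by a non-empty
 * one are kept.
 */
static size_t trim_trailing_empty(const struct phdr_sketch *ph, size_t n)
{
    while (n > 0 && ph[n - 1].filesz == 0 && ph[n - 1].memsz == 0)
        n--;
    return n;
}
```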

Fixes: dedb0aa42c6d ("x86/build: use --orphan-handling linker option if available")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoAMD/IOMMU: drop stray TLB flush
Jan Beulich [Fri, 22 Apr 2022 12:54:59 +0000 (14:54 +0200)]
AMD/IOMMU: drop stray TLB flush

I think this flush was overlooked when flushing was moved out of the
core (un)mapping functions. The flush the caller is required to invoke
anyway will satisfy the needs resulting from the splitting of a
superpage.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoIOMMU: have vendor code announce supported page sizes
Jan Beulich [Fri, 22 Apr 2022 12:54:16 +0000 (14:54 +0200)]
IOMMU: have vendor code announce supported page sizes

Generic code will use this information to determine what order values
can legitimately be passed to the ->{,un}map_page() hooks. For now all
ops structures simply get to announce 4k mappings (as base page size),
and there is (and always has been) an assumption that this matches the
CPU's MMU base page size (eventually we will want to permit IOMMUs with
a base page size smaller than the CPU MMU's).
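
How generic code might use such an announcement can be sketched as
follows. The structure, the page_sizes field layout (bit n set meaning
that 2^n byte mappings are supported) and order_supported() are
assumptions made for illustration, not the interface added by the patch.

```c
#include <stdbool.h>

#define PAGE_SHIFT 12  /* 4k base pages, as in the commit message */

/* Hypothetical ops structure holding only the field of interest. */
struct iommu_ops_sketch {
    unsigned long page_sizes;  /* bit n set => 2^n byte mappings supported */
};

/* May a mapping of 2^order base pages be passed to the map/unmap hooks? */
static bool order_supported(const struct iommu_ops_sketch *ops,
                            unsigned int order)
{
    unsigned int shift = PAGE_SHIFT + order;

    /* Avoid shifting by the full width of the type or more. */
    if (shift >= sizeof(ops->page_sizes) * 8)
        return false;

    return (ops->page_sizes >> shift) & 1;
}
```

An ops structure announcing only 4k mappings accepts order 0 alone; one
that also sets bit 21 would additionally accept order 9 (2M) requests.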

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
3 years agoVT-d: limit page table population in domain_pgd_maddr()
Jan Beulich [Fri, 22 Apr 2022 12:53:13 +0000 (14:53 +0200)]
VT-d: limit page table population in domain_pgd_maddr()

I have to admit that I never understood why domain_pgd_maddr() wants to
populate all page table levels for DFN 0. I can only assume that despite
the comment there what is needed is population just down to the smallest
possible nr_pt_levels that the loop later in the function may need to
run to. Hence what is needed is the minimum of all possible
iommu->nr_pt_levels, to then be passed into addr_to_dma_page_maddr()
instead of literal 1.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoVT-d: have callers specify the target level for page table walks
Jan Beulich [Fri, 22 Apr 2022 12:52:40 +0000 (14:52 +0200)]
VT-d: have callers specify the target level for page table walks

In order to be able to insert/remove super-pages we need to allow
callers of the walking function to specify at which point to stop the
walk.

For intel_iommu_lookup_page() integrate the last level access into
the main walking function.

dma_pte_clear_one() gets only partly adjusted for now: Error handling
and order parameter get put in place, but the order parameter remains
ignored (just like intel_iommu_map_page()'s order part of the flags).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoAMD/IOMMU: have callers specify the target level for page table walks
Jan Beulich [Fri, 22 Apr 2022 12:51:37 +0000 (14:51 +0200)]
AMD/IOMMU: have callers specify the target level for page table walks

In order to be able to insert/remove super-pages we need to allow
callers of the walking function to specify at which point to stop the
walk. (For now at least gcc will instantiate just a variant of the
function with the parameter eliminated, so effectively no change to
generated code as far as the parameter addition goes.)

Instead of merely adjusting a BUG_ON() condition, convert it into an
error return - there's no reason to crash the entire host in that case.
Leave an assertion though for spotting issues early in debug builds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agogitlab-ci: add an ARM32 qemu-based smoke test
Stefano Stabellini [Thu, 21 Apr 2022 23:17:40 +0000 (16:17 -0700)]
gitlab-ci: add an ARM32 qemu-based smoke test

Add a minimal ARM32 smoke test based on qemu-system-arm, as provided by
the test-artifacts qemu container. The minimal test simply boots Xen
(built from previous build stages) and Dom0.

The test needs a working kernel and minimal initrd for dom0. Instead of
building our own kernel and initrd, which would mean maintaining one or
two more build scripts under automation/, we borrow a kernel and
initrd from distros.

For the kernel we pick the Debian Bullseye kernel, which has everything
we need already built-in. However, we cannot use the Debian Bullseye
initrd because it is 22MB and the large size causes QEMU to core dump.

Instead, use the tiny busybox-based rootfs provided by Alpine Linux,
which is really minimal: just 2.5MB. Note that we cannot use the Alpine
Linux kernel because that doesn't boot on Xen.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
3 years agogitlab-ci: add qemu-system-arm to the existing tests-artifacts container
Stefano Stabellini [Sat, 16 Apr 2022 00:17:00 +0000 (17:17 -0700)]
gitlab-ci: add qemu-system-arm to the existing tests-artifacts container

Add qemu-system-arm to the existing test-artifacts qemu container (which
doesn't get built for every iteration but only updated once in a while.)

With qemu-system-arm available, we'll be able to run ARM32 tests.

This patch also bumps the QEMU version to v6.0.0 for both arm32 and
arm64 (the test-artifacts container is one, shared for both).

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agox86/build: Rework binary conversion for boot/{cmdline,reloc}.c
Andrew Cooper [Thu, 14 Apr 2022 09:33:01 +0000 (10:33 +0100)]
x86/build: Rework binary conversion for boot/{cmdline,reloc}.c

There is no need to opencode .got.plt size check; it can be done with linker
asserts instead.  Extend the checking to all dynamic linkage sections, and
drop the $(OBJDUMP) pass.

Furthermore, instead of removing .got.plt specifically, take only .text when
converting to a flat binary.  This makes the process invariant of .text's
position relative to the start of the binary, which avoids needing to discard
all sections, and removes the need to work around sections that certain
linkers are unhappy discarding.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/build: Fix dependency for the MAP rule
Andrew Cooper [Thu, 14 Apr 2022 16:04:54 +0000 (17:04 +0100)]
xen/build: Fix dependency for the MAP rule

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/mm: avoid inadvertently degrading a TLB flush to local only
David Vrabel [Wed, 20 Apr 2022 08:55:01 +0000 (10:55 +0200)]
x86/mm: avoid inadvertently degrading a TLB flush to local only

If the direct map is incorrectly modified with interrupts disabled,
the required TLB flushes are degraded to flushing the local CPU only.

This could lead to very hard to diagnose problems as different CPUs will
end up with different views of memory, although no such issues have yet
been identified.

Change the check in the flush_area() macro to look at system_state
instead. This defers the switch from local to all later in the boot
(see xen/arch/x86/setup.c:__start_xen()). This is fine because
additional PCPUs are not brought up until after the system state is
SYS_STATE_smp_boot.
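
The decision described above can be sketched as a function of the boot
phase alone. The enum values and flush_scope_for() are hypothetical names
that mirror the description; they are not copied from the flush_area()
macro.

```c
/* Hypothetical boot phases, ordered as they occur. */
enum sys_state_sketch {
    ST_EARLY_BOOT,
    ST_BOOT,
    ST_SMP_BOOT,  /* secondary CPUs are brought up from here on */
    ST_ACTIVE,
};

enum flush_scope {
    FLUSH_LOCAL,  /* only this CPU's TLB */
    FLUSH_ALL,    /* every online CPU */
};

/*
 * The scope depends on the system state, not on whether interrupts are
 * currently enabled, so disabling interrupts can no longer silently
 * degrade a flush to local only.
 */
static enum flush_scope flush_scope_for(enum sys_state_sketch state)
{
    return state < ST_SMP_BOOT ? FLUSH_LOCAL : FLUSH_ALL;
}
```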

Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoVT-d: refuse to use IOMMU with reserved CAP.ND value
Jan Beulich [Wed, 20 Apr 2022 08:54:26 +0000 (10:54 +0200)]
VT-d: refuse to use IOMMU with reserved CAP.ND value

The field taking the value 7 (resulting in 18-bit DIDs when using the
calculation in cap_ndoms(), when the DID fields are only 16 bits wide)
is reserved. Instead of misbehaving in case we would encounter such an
IOMMU, refuse to use it.
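
The arithmetic behind this can be sketched as below. The helper names
are hypothetical; the sketch assumes ND occupies the low three bits of
the capability register and that the number of domains is
2^(4 + 2 * ND), which is what yields the 18 bits quoted above for ND = 7.

```c
#include <stdbool.h>
#include <stdint.h>

#define DID_FIELD_BITS 16  /* width of the DID fields, per the message */

/* ND: assumed to be the low three bits of the capability register. */
static unsigned int cap_nd(uint64_t cap)
{
    return (unsigned int)(cap & 0x7);
}

/* Number of domain ID bits implied by ND: 4, 6, ..., 16, and 18 for 7. */
static unsigned int did_bits(uint64_t cap)
{
    return 4 + 2 * cap_nd(cap);
}

/* Refuse IOMMUs whose ND value implies more bits than a DID can hold. */
static bool cap_nd_usable(uint64_t cap)
{
    return did_bits(cap) <= DID_FIELD_BITS;
}
```

ND = 6 gives exactly 16 bits and is accepted; ND = 7 gives 18 and is
rejected.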

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoVT-d: plug memory leaks in iommu_alloc()
Jan Beulich [Wed, 20 Apr 2022 08:53:57 +0000 (10:53 +0200)]
VT-d: plug memory leaks in iommu_alloc()

While 97af062b89d5 ("IOMMU/x86: maintain a per-device pseudo domain ID")
took care of not making things worse, plugging pre-existing leaks wasn't
the purpose of that change; they're not security relevant after all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoVT-d: drop ROOT_ENTRY_NR
Jan Beulich [Wed, 20 Apr 2022 08:53:19 +0000 (10:53 +0200)]
VT-d: drop ROOT_ENTRY_NR

It's not only misplaced, but entirely unused.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoIOMMU/x86: drop locking from quarantine_init() hooks
Jan Beulich [Wed, 20 Apr 2022 08:52:13 +0000 (10:52 +0200)]
IOMMU/x86: drop locking from quarantine_init() hooks

Prior extension of these functions to enable per-device quarantine page
tables already didn't add more locking there, but merely left in place
what had been there before. But really locking is unnecessary here:
We're running with pcidevs_lock held (i.e. multiple invocations of the
same function [or their teardown equivalents] are impossible, and hence
there are no "local" races), while all consuming of the data being
populated here can't race anyway due to happening sequentially
afterwards, and unlike ordinary domains' page tables quarantine ones
are never modified once fully constructed. See also the comment in
struct arch_pci_dev.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoinclude/public: add command result definitions to vscsiif.h
Juergen Gross [Wed, 20 Apr 2022 08:51:26 +0000 (10:51 +0200)]
include/public: add command result definitions to vscsiif.h

The result field of struct vscsiif_response is lacking a detailed
definition. Today the Linux kernel internal scsi definitions are being
used, which is not a sane interface for a PV device driver.

Add macros to change that by using today's values in the XEN namespace.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
3 years agoxen/arm: Add i.MX lpuart early printk support
Peng Fan [Tue, 19 Apr 2022 04:39:27 +0000 (12:39 +0800)]
xen/arm: Add i.MX lpuart early printk support

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/arm: Add i.MX lpuart driver
Peng Fan [Tue, 19 Apr 2022 04:39:26 +0000 (12:39 +0800)]
xen/arm: Add i.MX lpuart driver

The i.MX LPUART Documentation:
https://www.nxp.com/webapp/Download?colCode=IMX8QMIEC
Chapter 13.6 Low Power Universal Asynchronous Receiver/
Transmitter (LPUART)

Tested-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/arm: Make use of DT_MATCH_TIMER in make_timer_node
Michal Orzel [Thu, 14 Apr 2022 09:58:43 +0000 (11:58 +0200)]
xen/arm: Make use of DT_MATCH_TIMER in make_timer_node

DT_MATCH_TIMER stores the compatible timer ids and as such should be
used in all the places where we need to refer to them. make_timer_node
explicitly lists the same ids as the ones defined in DT_MATCH_TIMER so
make use of this macro instead.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen: cleanup gdbsx_guest_mem_io() call
Juergen Gross [Tue, 19 Apr 2022 13:52:53 +0000 (15:52 +0200)]
xen: cleanup gdbsx_guest_mem_io() call

Modify the gdbsx_guest_mem_io() interface to take the already known
domain pointer as parameter instead of the domid. This allows removing
some more code further down the call tree.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoxen: fix XEN_DOMCTL_gdbsx_guestmemio crash
Juergen Gross [Tue, 19 Apr 2022 13:52:52 +0000 (15:52 +0200)]
xen: fix XEN_DOMCTL_gdbsx_guestmemio crash

A hypervisor built without CONFIG_GDBSX will crash in case the
XEN_DOMCTL_gdbsx_guestmemio domctl is being called, as the call will
end up in iommu_do_domctl() with d == NULL:

  (XEN) CPU:    6
  (XEN) RIP:    e008:[<ffff82d040269984>] iommu_do_domctl+0x4/0x30
  (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v0)
  (XEN) rax: 00000000000003e8   rbx: ffff830856277ef8   rcx: ffff830856277fff
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d040269984>] R iommu_do_domctl+0x4/0x30
  (XEN)    [<ffff82d04035cd5f>] S arch_do_domctl+0x7f/0x2330
  (XEN)    [<ffff82d040239e46>] S do_domctl+0xe56/0x1930
  (XEN)    [<ffff82d040238ff0>] S do_domctl+0/0x1930
  (XEN)    [<ffff82d0402f8c59>] S pv_hypercall+0x99/0x110
  (XEN)    [<ffff82d0402f5161>] S arch/x86/pv/domain.c#_toggle_guest_pt+0x11/0x90
  (XEN)    [<ffff82d040366288>] S lstar_enter+0x128/0x130
  (XEN)
  (XEN) Pagetable walk from 0000000000000144:
  (XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 6:
  (XEN) FATAL PAGE FAULT
  (XEN) [error_code=0000]
  (XEN) Faulting linear address: 0000000000000144
  (XEN) ****************************************

It used to be permitted to pass DOMID_IDLE to dbg_rw_mem(), which is why the
special case skipping the domid checks exists.  Now that it is only permitted
to pass proper domids, remove the special case, making 'd' always valid.

Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com>
Fixes: e726a82ca0dc ("xen: make gdbsx support configurable")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/debug: Drop unnecessary include of compile.h
Andrew Cooper [Thu, 14 Apr 2022 09:01:53 +0000 (10:01 +0100)]
x86/debug: Drop unnecessary include of compile.h

compile.h changes across incremental builds, but nothing in debug.c uses it.
This avoids debug.c getting rebuilt on every incremental build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoIOMMU: log appropriate SBDF
Jan Beulich [Wed, 13 Apr 2022 10:36:03 +0000 (12:36 +0200)]
IOMMU: log appropriate SBDF

To handle phantom devices, several functions are passed separate "devfn"
arguments besides a PCI device. In such cases we want to log the phantom
device's coordinates instead of the main one's. (Note that not all of
the instances being changed are fallout from the referenced commit.)

Fixes: 1ee1441835f4 ("print: introduce a format specifier for pci_sbdf_t")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoAMD/IOMMU: replace a few PCI_BDF2()
Jan Beulich [Wed, 13 Apr 2022 10:35:17 +0000 (12:35 +0200)]
AMD/IOMMU: replace a few PCI_BDF2()

struct pci_dev has the wanted value directly available; use it. Note
that this fixes a - imo benign - mistake in reassign_device(): The unity
map removal ought to be based on the passed in devfn (as is the case on
the establishing side). This is benign because the mappings would be
removed anyway a little later, when the "main" device gets processed.
While there also limit the scope of two variables in that function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agobuild: adding out-of-tree support to the xen build
Anthony PERARD [Wed, 13 Apr 2022 10:33:21 +0000 (12:33 +0200)]
build: adding out-of-tree support to the xen build

This implements out-of-tree support. There are two ways to create an
out-of-tree build tree (after that, `make` in that new directory
works):
    make O=build
    mkdir build; cd build; make -f ../Makefile
Both forms also work with an absolute path.

This implementation only works if the source tree is clean, as we use
VPATH.

This patch copies most of the new code for handling out-of-tree builds
from Linux v5.12.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Tested-by: Julien Grall <jgrall@amazon.com>
Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com> # livepatch
3 years agoMAINTAINERS: add myself as Continuous Integration maintainer
Stefano Stabellini [Fri, 8 Apr 2022 00:00:47 +0000 (17:00 -0700)]
MAINTAINERS: add myself as Continuous Integration maintainer

I have contributed all the ARM tests to gitlab-ci. After checking with
Doug, I am happy to volunteer to co-maintain Continuous Integration.

Also take the opportunity to remove the stale travis-ci entries.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agotools/xl: fix vif and vcpupin parse tests
Roger Pau Monné [Mon, 11 Apr 2022 10:33:02 +0000 (12:33 +0200)]
tools/xl: fix vif and vcpupin parse tests

Current vif and vcpupin parse tests are out of sync.  First of all, xl
returns 1 on failure, so replace the expected error code.

Secondly fix the expected output from some vif tests, as xl will no
longer print the unpopulated fields.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agox86/boot: LEA -> MOV in video handling code
Jan Beulich [Mon, 11 Apr 2022 10:31:02 +0000 (12:31 +0200)]
x86/boot: LEA -> MOV in video handling code

Replace most LEA instances with (one byte shorter) MOV.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoMerge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging
Jan Beulich [Mon, 11 Apr 2022 10:30:37 +0000 (12:30 +0200)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

3 years agox86/boot: obtain video info from boot loader
Jan Beulich [Mon, 11 Apr 2022 10:30:09 +0000 (12:30 +0200)]
x86/boot: obtain video info from boot loader

With MB2 the boot loader may provide this information, allowing us to
obtain it without needing to enter real mode (assuming we don't need to
set a new mode from "vga=", but can instead inherit the one the
bootloader may have established).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago x86/boot: make "vga=current" work with graphics modes
Jan Beulich [Mon, 11 Apr 2022 10:29:14 +0000 (12:29 +0200)]
x86/boot: make "vga=current" work with graphics modes

GrUB2 can be told to leave the screen in the graphics mode it has been
using (or any other one), via "set gfxpayload=keep" (or suitable
variants thereof). In this case we can avoid doing another mode switch
ourselves. This in particular avoids possibly setting the screen to a
less desirable mode: On one of my test systems the set of modes
reported available by the VESA BIOS depends on whether the interposed
KVM switch has that machine set as the active one. If it's not active,
only modes up to 1024x768 get reported, while when active 1280x1024
modes are also included. For things to always work with an explicitly
specified mode (via the "vga=" option), that mode therefore needs to be
a 1024x768 one.

For some reason this only works for me with "multiboot2" (and
"module2"); "multiboot" (and "module") still forces the screen into text
mode, despite my reading of the sources suggesting otherwise.
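
A boot entry exercising this might look like the following GRUB2
fragment. It is a minimal sketch: the file paths and the kernel
arguments are hypothetical, and only "set gfxpayload=keep",
"multiboot2"/"module2" and "vga=current" come from the description
above.

```
# Keep the graphics mode GRUB2 is already using
set gfxpayload=keep
# Hypothetical paths; vga=current tells Xen to inherit the mode
multiboot2 /boot/xen.gz vga=current
module2 /boot/vmlinuz root=/dev/sda1
module2 /boot/initrd.img
```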

For starters I'm limiting this to graphics modes; I do think this ought
to also work for text modes, but
- I can't tell whether GrUB2 can set any text mode other than 80x25
  (I've only found plain "text" to be valid as a "gfxpayload" setting),
- I'm uncertain whether supporting that is worth it, since I'm uncertain
  how many people would be running their systems/screens in text mode,
- I'd like to limit the amount of code added to the realmode trampoline.

For starters I'm also limiting mode information retrieval to raw BIOS
accesses. This will allow things to work (in principle) also with other
boot environments where a graphics mode can be left in place. The
downside is that this then still is dependent upon switching back to
real mode, so retrieving the needed information from multiboot info is
likely going to be desirable down the road.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Henry Wang <Henry.Wang@arm.com>
3 years ago xen: Populate xen.lds.h and make use of its macros
Michal Orzel [Mon, 11 Apr 2022 07:03:00 +0000 (09:03 +0200)]
xen: Populate xen.lds.h and make use of its macros

Populate header file xen.lds.h with the first portion of macros storing
constructs common to x86 and arm linker scripts. Replace the original
constructs with these helpers.

No functional improvements to the x86 linker script.

Making use of the common macros improves the arm linker script with:
- an explicit list of debug sections that otherwise are seen as "orphans"
  by the linker. This will allow fixing issues after enabling the linker
  option --orphan-handling one day,
- an extended list of discarded sections to include: .discard, destructors
  related sections, .fini_array which can reference .text.exit,
- sections not related to debugging that are placed by ld.lld. Even
  though we do not support linking with LLD on Arm, these sections do
  not cause problems for GNU ld.

As we are replacing the hardcoded boundary specified as an argument to
the ALIGN function with POINTER_ALIGN, this changes the alignment in the
HYPFS_PARAM construct for arm32 from 8 to 4. This is fine, as there are
no 64bit values used in struct param_hypfs.
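
The effect of the alignment change can be illustrated with a little
arithmetic. This is a sketch only: it models POINTER_ALIGN as the
pointer size in bytes (4 on arm32, 8 on arm64, an assumption here), and
the location counter value is hypothetical.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


# Assumed pointer sizes in bytes, standing in for POINTER_ALIGN.
POINTER_ALIGN = {"arm32": 4, "arm64": 8}

if __name__ == "__main__":
    offset = 0x1004  # hypothetical location counter before the section
    print("hardcoded ALIGN(8):  ", hex(align_up(offset, 8)))
    print("arm32 POINTER_ALIGN: ", hex(align_up(offset, POINTER_ALIGN["arm32"])))
    print("arm64 POINTER_ALIGN: ", hex(align_up(offset, POINTER_ALIGN["arm64"])))
```

With a 4-byte boundary the section can start up to 4 bytes earlier on
arm32, which is harmless as long as no member needs 8-byte alignment.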

Please note that this patch does not aim to perform the full sync up
between the linker scripts. It creates a base for further work.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago xen: Introduce a header to store common linker scripts content
Michal Orzel [Mon, 11 Apr 2022 07:02:59 +0000 (09:02 +0200)]
xen: Introduce a header to store common linker scripts content

Both x86 and arm linker scripts share quite a lot of common content.
It is difficult to keep syncing them up, thus introduce a new header
in include/xen called xen.lds.h to store the internals common to all
the linker scripts.

Include this header in linker scripts for x86 and arm.
This patch serves as an intermediate step before populating xen.lds.h
and making use of its content in the linker scripts later on.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago SUPPORT.md: add Dom0less as Supported
Stefano Stabellini [Fri, 8 Apr 2022 00:10:37 +0000 (17:10 -0700)]
SUPPORT.md: add Dom0less as Supported

Add Dom0less to SUPPORT.md to clarify its support status. The feature is
mature enough and small enough to make it security supported.

Clarify that dom0less DomUs' memory is not scrubbed at boot when
bootscrub=on or bootscrub=off is passed as a Xen command line parameter,
and no XSAs will be issued for that.

Also see XSA-372: 371347c5b64da and fd5dc41ceaed.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago x86/irq: skip unmap_domain_pirq XSM during destruction
Jason Andryuk [Fri, 8 Apr 2022 12:51:52 +0000 (14:51 +0200)]
x86/irq: skip unmap_domain_pirq XSM during destruction

xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from
complete_domain_destroy as an RCU callback.  The source context was an
unexpected, random domain.  Since this is a Xen-internal operation,
going through the XSM hook is inappropriate.

Check d->is_dying and skip the XSM hook when set since this is a cleanup
operation for a domain being destroyed.
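
The control flow described above can be modelled as follows. This is a
Python sketch of the logic only, not Xen's C code: the Domain class, the
stand-in policy (deny unless the caller is domain 0) and the -1 return
value are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    domid: int
    is_dying: bool = False


def xsm_unmap_domain_irq(current: Domain, target: Domain) -> int:
    """Stand-in policy hook (assumption): deny unless the caller is domain 0."""
    return 0 if current.domid == 0 else -1  # -1 stands in for an error code


def unmap_domain_pirq(current: Domain, target: Domain) -> int:
    """Model of the unmap path with the is_dying check in place."""
    # During destruction the cleanup runs from an RCU callback, so
    # `current` is whatever domain happened to be running. Consulting
    # the policy hook would then check an unrelated domain, so skip it.
    if not target.is_dying:
        rc = xsm_unmap_domain_irq(current, target)
        if rc:
            return rc
    return 0  # the actual unmapping would proceed here


if __name__ == "__main__":
    bystander = Domain(domid=5)
    print(unmap_domain_pirq(bystander, Domain(domid=7)))
    print(unmap_domain_pirq(bystander, Domain(domid=7, is_dying=True)))
```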

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago x86/P2M: the majority for struct p2m_domain's fields are HVM-only
Jan Beulich [Fri, 8 Apr 2022 12:51:06 +0000 (14:51 +0200)]
x86/P2M: the majority for struct p2m_domain's fields are HVM-only

..., as are the majority of the locks involved. Conditionalize things
accordingly.

Also adjust the ioreq field's indentation on this occasion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years ago x86/P2M: p2m.c is HVM-only
Jan Beulich [Fri, 8 Apr 2022 12:50:29 +0000 (14:50 +0200)]
x86/P2M: p2m.c is HVM-only

This only requires moving p2m_percpu_rwlock elsewhere (ultimately I
think all P2M locking should go away as well when !HVM, but this looks
to require further code juggling). The two other unguarded functions are
already unneeded (by virtue of DCE) when !HVM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years ago paged_pages field is MEM_PAGING-only
Jan Beulich [Fri, 8 Apr 2022 12:48:45 +0000 (14:48 +0200)]
paged_pages field is MEM_PAGING-only

Conditionalize it and its uses accordingly.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years ago shr_pages field is MEM_SHARING-only
Jan Beulich [Fri, 8 Apr 2022 12:47:56 +0000 (14:47 +0200)]
shr_pages field is MEM_SHARING-only

Conditionalize it and its uses accordingly. The main goal though is to
demonstrate that x86's p2m_teardown() is now empty when !HVM, which in
particular means the last remaining use of p2m_lock() in this case goes
away.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years ago x86/p2m: re-arrange {,__}put_gfn()
Jan Beulich [Fri, 8 Apr 2022 12:47:11 +0000 (14:47 +0200)]
x86/p2m: re-arrange {,__}put_gfn()

All explicit callers of __put_gfn() are in HVM-only code and hold a valid
P2M pointer in their hands. Move the paging_mode_translate() check out of
there into put_gfn(), renaming __put_gfn() and making its GFN parameter
type-safe.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>