Sergiy Kibrik [Wed, 29 May 2024 07:54:22 +0000 (09:54 +0200)]
x86/intel: move vmce_has_lmce() routine to header
Moving this function out of mce_intel.c will make it possible to disable
build of Intel MCE code later on, because the function gets called from
common x86 code.
Also replace boilerplate code that checks for MCG_LMCE_P flag with
vmce_has_lmce(), which might contribute to readability a bit.
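A minimal sketch of what such a header-level helper could look like. The struct below is a hypothetical, pared-down stand-in for the per-vCPU vMCE state, so the parameter type is an assumption rather than the signature used in the tree; MCG_LMCE_P is bit 27 of the MCG_CAP MSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 27 of MCG_CAP advertises local machine check exception support. */
#define MCG_LMCE_P (1ULL << 27)

/* Hypothetical stand-in for the per-vCPU vMCE state (not the real type). */
struct vmce_state {
    uint64_t mcg_cap;
};

/* Header-friendly helper: callable from common code without mce_intel.c. */
static inline bool vmce_has_lmce(const struct vmce_state *vmce)
{
    return (vmce->mcg_cap & MCG_LMCE_P) != 0;
}
```

Callers can then write `if ( vmce_has_lmce(...) )` instead of open-coding the mask test.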
Andrew Cooper [Tue, 28 May 2024 15:29:11 +0000 (16:29 +0100)]
x86/svm: Rework VMCB_ACCESSORS() to use a plain type name
This avoids having a function call in a typeof() expression.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Tue, 28 May 2024 06:52:27 +0000 (08:52 +0200)]
x86/traps: address violation of MISRA C Rule 8.4
Rule 8.4 states: "A compatible declaration shall be visible when
an object or function with external linkage is defined".
The function do_general_protection is either used in asm code
or only within this unit, so there is no risk of this getting
out of sync with its definition, but the function must remain
extern.
Therefore, this function is deviated using a comment-based deviation.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Tue, 28 May 2024 06:52:15 +0000 (08:52 +0200)]
CHANGELOG: Mention libxl blktap/tapback support
Add entry for backendtype=tap support in libxl. blktap needs some
changes to work with libxl, which haven't been merged. They are
available from this PR: https://github.com/xapi-project/blktap/pull/394
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Henry Wang [Thu, 23 May 2024 07:40:39 +0000 (15:40 +0800)]
tools: Introduce the "xl dt-overlay attach" command
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach (in the future also detach) devices from the provided DT overlay
to domains. Support this by introducing a new "xl dt-overlay" command
and related documentation, i.e. "xl dt-overlay attach". Slightly rework
the command option parsing logic.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Henry Wang [Thu, 23 May 2024 07:40:36 +0000 (15:40 +0800)]
xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains
In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to Xen device tree, instead of assigning the device to the
hardware domain at the same time. Changing the sysctl behavior and
breaking compatibility is acceptable because this feature is
experimental.
Add the XEN_DOMCTL_dt_overlay with operations
XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.
The hypervisor first checks that the DT overlay passed from the toolstack
is valid. Then the device nodes are retrieved from the overlay tracker
based on the DT overlay. The attach of the device is implemented by
mapping the IRQ and IOMMU resources. All devices in the overlay are
assigned to a single domain.
Also take the opportunity to make one coding style fix in sysctl.h.
Introduce DT_OVERLAY_MAX_SIZE and use it to avoid repetitions of
KB(500).
xen,reg is to be used to handle non-1:1 mappings but it is currently
unsupported. For now return errors for not-1:1 mapped domains.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Vikram Garhwal <fnu.vikram@xilinx.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:35 +0000 (15:40 +0800)]
xen/arm/gic: Allow adding interrupt to running VMs
Currently, adding physical interrupts is only allowed at
the domain creation time. For use cases such as dynamic device
tree overlay addition, the adding of physical IRQ to
running domains should be allowed.
Drop the above-mentioned domain creation check. Since this
would let the interrupt state get out of sync when the
interrupt is active or pending in the guest, simply reject the
operation in these cases. Do it for both new and old
vGIC implementations.
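The rejection rule can be sketched as follows. This is an illustration only: the struct, the function name and the -EBUSY error code are hypothetical and do not reflect the vGIC data structures.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified view of a virtual interrupt's state. */
struct virq_state {
    bool active;   /* guest has acknowledged but not yet completed it */
    bool pending;  /* queued for delivery to the guest */
    bool routed;   /* already backed by a physical interrupt */
};

/*
 * Connect a physical IRQ to a virtual one on a (possibly running) domain.
 * Returns 0 on success, -EBUSY (an assumed error code) when the virtual
 * interrupt is in flight and the state could get out of sync.
 */
static int route_irq_to_running_guest(struct virq_state *virq)
{
    if (virq->active || virq->pending)
        return -EBUSY;

    virq->routed = true;
    return 0;
}
```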
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:34 +0000 (15:40 +0800)]
tools/arm: Introduce the "nr_spis" xl config entry
Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.
Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This would help to avoid
bumping the `config->arch.nr_spis` in libxl every time there is a
new platform with increased SPI numbers.
Update the doc and the golang bindings accordingly.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Henry Wang [Thu, 23 May 2024 07:40:33 +0000 (15:40 +0800)]
xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify the domain will need the IOMMU
mapping at domain construction time.
Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enable" and "disable". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.
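A minimal sketch of the property-to-flag translation described above. The helper name is hypothetical, the option strings are the ones quoted in this message, and the bit position chosen for XEN_DOMCTL_CDF_iommu is an assumption made only so the example is self-contained.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit position, for illustration only; see Xen's public domctl.h. */
#define XEN_DOMCTL_CDF_iommu (1U << 4)

/*
 * Hypothetical helper: translate the "passthrough" property string into
 * domain creation flags. A NULL value models an absent property.
 */
static int apply_passthrough_property(const char *value, uint32_t *flags)
{
    if (value == NULL)
        return 0;                       /* property absent: leave flags alone */

    if (strcmp(value, "enable") == 0) {
        *flags |= XEN_DOMCTL_CDF_iommu;
        return 0;
    }

    if (strcmp(value, "disable") == 0) {
        *flags &= ~XEN_DOMCTL_CDF_iommu;
        return 0;
    }

    return -EINVAL;                     /* unknown option */
}
```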
Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:32 +0000 (15:40 +0800)]
tools/xl: Correct the help information and exit code of the dt-overlay command
Fix the name mismatch in the xl dt-overlay command: the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.
Fix the exit code of the dt-overlay command, use EXIT_FAILURE
instead of ERROR_FAIL.
Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support") Suggested-by: Anthony PERARD <anthony@xenproject.org> Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
George Dunlap [Fri, 26 Apr 2024 13:17:33 +0000 (14:17 +0100)]
tools/xenalyze: Ignore HVM_EMUL events harder
To unify certain common sanity checks, checks are done very early in
processing based only on the top-level type.
Unfortunately, when TRC_HVM_EMUL was introduced, it broke some of the
assumptions about how the top-level types worked. Namely, traces of
this type will show up outside of HVM contexts: in idle domains and in
PV domains.
Make an explicit exception for TRC_HVM_EMUL types in a number of places:
- Pass the record info pointer to toplevel_assert_check, so that it
can exclude TRC_HVM_EMUL records from idle and vcpu data_mode
checks
- Don't attempt to set the vcpu data_type in hvm_process for
TRC_HVM_EMUL records.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Thu, 25 Apr 2024 12:03:58 +0000 (13:03 +0100)]
x86/hvm/trace: Use a different trace type for AMD processors
A long-standing usability sub-optimality with xenalyze is the
necessity to specify `--svm-mode` when analyzing AMD processors. This
fundamentally comes about because the same trace event ID is used for
both VMX and SVM, but the contents of the trace must be interpreted
differently.
Instead, allocate separate trace events for VMX and SVM vmexits in
Xen; this will allow all readers to properly interpret the meaning of
the vmexit reason.
In xenalyze, first remove the redundant call to init_hvm_data();
there's no way to get to hvm_vmexit_process() without it being already
initialized by the set_vcpu_type call in hvm_process().
Replace this with set_hvm_exit_reson_data(), and move setting of
hvm->exit_reason_* into that function.
Modify hvm_process and hvm_vmexit_process to handle all four potential
values appropriately.
If SVM entries are encountered, set opt.svm_mode so that other
SVM-specific functionality is triggered.
Remove the `--svm-mode` command-line option, since it's now redundant.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Henry Wang [Thu, 21 Mar 2024 03:57:06 +0000 (11:57 +0800)]
xen/arm: Set correct per-cpu cpu_core_mask
In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.
cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm does not support physical
CPU hotplug or NUMA, and we assume there is no multithreading. Therefore
cores_per_socket means all possible CPUs detected from the device
tree. Set the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.
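The effect on cores_per_socket can be illustrated with a toy model. The 64-bit mask type and all function names below are hypothetical stand-ins for cpumask_t and its helpers; the sketch only shows why a self-only mask yields 1 while a mask of all possible CPUs yields the CPU count.

```c
#include <stdint.h>

/* Hypothetical 64-CPU mask; Xen itself uses cpumask_t. */
typedef uint64_t cpu_mask_t;

/* Count the CPUs set in a mask. */
static unsigned int mask_weight(cpu_mask_t mask)
{
    unsigned int n = 0;

    for (; mask; mask &= mask - 1)   /* clear the lowest set bit each round */
        n++;

    return n;
}

/* Old behaviour: each CPU only marks itself in its core mask. */
static cpu_mask_t core_mask_self_only(unsigned int cpu)
{
    return (cpu_mask_t)1 << cpu;
}

/* New behaviour: the core mask covers every possible CPU. */
static cpu_mask_t core_mask_all_possible(cpu_mask_t possible)
{
    return possible;
}

/* cores_per_socket is derived from the weight of CPU0's core mask. */
static unsigned int cores_per_socket(cpu_mask_t cpu0_core_mask)
{
    return mask_weight(cpu0_core_mask);
}
```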
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
George Dunlap [Fri, 26 Apr 2024 14:18:25 +0000 (15:18 +0100)]
tools/xentrace: Remove xentrace_format
xentrace_format was always of limited utility, since trace records
across pcpus were processed out of order; it was superseded by xenalyze
over a decade ago.
But for several releases, the `formats` file it has depended on for
proper operation has not even been included in `make install` (which
generally means it doesn't get picked up by distros either); yet
nobody has seemed to complain.
Simply remove xentrace_format, and point people to xenalyze instead.
NB that there is no man page for xenalyze, so the "see also" on the
xentrace man page is simply removed for now.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Olaf Hering <olaf@aepfle.de>
Andrew Cooper [Thu, 25 Apr 2024 09:46:40 +0000 (10:46 +0100)]
tools: Drop libsystemd as a dependency
There are no more users, and we want to dissuade people from introducing new
users just for sd_notify() and friends. Drop the dependency.
We still want the overall --with{,out}-systemd to gate the generation of the
service/unit/mount/etc files.
Rerun autogen.sh, and mark the dependency as removed in the build containers.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 25 Apr 2024 09:26:58 +0000 (10:26 +0100)]
tools/{c,o}xenstored: Don't link against libsystemd
Use the local freestanding wrapper instead.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 16 May 2024 17:59:00 +0000 (18:59 +0100)]
tools: Import stand-alone sd_notify() implementation from systemd
... in order to avoid linking against the whole of libsystemd.
Only minimal changes to the upstream copy, to function as a drop-in
replacement for sd_notify() and as a header-only library.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 16 May 2024 17:50:26 +0000 (18:50 +0100)]
LICENSES: Add MIT-0 (MIT No Attribution)
We are about to import code licensed under MIT-0. It's compatible for us to
use, so identify it as a permitted license.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Commit 634cfc8beb ("Make MEM_ACCESS configurable") intended to make
MEM_ACCESS configurable on Arm to reduce the code size when the user
doesn't need it.
However, this didn't cover the arch specific code. None of the code
in arm/mem_access.c is necessary when MEM_ACCESS=n, so it can be
compiled out. This requires providing stubs for the functions
called by the common code.
Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
vpci: add initial support for virtual PCI bus topology
Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.
Please note that at the moment only function 0 of a multifunction
device can be passed through.
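A minimal sketch of how virtual device numbers on bus 0 could be handed out, assuming a simple 32-bit bitmap of used slots. The struct and function names are hypothetical; the returned value uses the conventional BDF encoding (bus << 8 | device << 3 | function) with bus 0 and function 0.

```c
#include <stdint.h>

/* Hypothetical per-domain state: one bit per device number on bus 0. */
struct virt_bus {
    uint32_t used_slots;
};

/*
 * Pick the first free device number (0..31) on virtual bus 0.
 * Returns the BDF (bus 0, function 0) or -1 when all 32 slots are taken.
 */
static int assign_virtual_slot(struct virt_bus *bus)
{
    for (unsigned int dev = 0; dev < 32; dev++) {
        uint32_t bit = UINT32_C(1) << dev;

        if (!(bus->used_slots & bit)) {
            bus->used_slots |= bit;
            return (int)(dev << 3);   /* bus 0, function 0 */
        }
    }

    return -1;
}
```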
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
vpci/header: emulate PCI_COMMAND register for guests
Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.
According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.
Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.
PCI_COMMAND_IO (bit 0)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if an I/O BAR is exposed to the guest.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
don't yet support I/O BARs for domUs.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_MEMORY (bit 1)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if a Memory BAR is exposed to the guest.
Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
regions.
Xen domU: For devices assigned to DomUs, memory decoding will be
disabled at the time of initialization.
PCI_COMMAND_MASTER (bit 2)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_SPECIAL (bit 3)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_INVALIDATE (bit 4)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_VGA_PALETTE (bit 5)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_PARITY (bit 6)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_WAIT (bit 7)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: hardwire to 0
QEMU: res_mask
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_SERR (bit 8)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_FAST_BACK (bit 9)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_INTX_DISABLE (bit 10)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU checks if INTx was mapped
for a device. If it is not, then guest can't control
PCI_COMMAND_INTX_DISABLE bit.
Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
Xen dom0: We allow dom0 to control this bit freely.
Bits 11-15
PCIe 6.1: RsvdP
PCI LB 3.0: Reserved
QEMU: res_mask
Xen domU: rsvdp_mask
Xen dom0: We allow dom0 to control these bits freely.
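The domU handling of the RsvdP bits listed above can be sketched as a mask merge. This is a simplified illustration, not the vPCI handler: the function and mask names are hypothetical, and the BAR mapping side effects of PCI_COMMAND_MEMORY and the INTx/MSI(-X) rule are deliberately left out. The bit values are the standard command register definitions.

```c
#include <stdint.h>

#define PCI_COMMAND_IO           0x0001
#define PCI_COMMAND_MEMORY       0x0002
#define PCI_COMMAND_MASTER       0x0004
#define PCI_COMMAND_SPECIAL      0x0008
#define PCI_COMMAND_INVALIDATE   0x0010
#define PCI_COMMAND_VGA_PALETTE  0x0020
#define PCI_COMMAND_PARITY       0x0040
#define PCI_COMMAND_WAIT         0x0080
#define PCI_COMMAND_SERR         0x0100
#define PCI_COMMAND_FAST_BACK    0x0200
#define PCI_COMMAND_INTX_DISABLE 0x0400
#define PCI_COMMAND_RSVD_11_15   0xf800   /* hypothetical name for bits 11-15 */

/* Bits a domU may not change, per the list above. */
#define DOMU_RSVDP_MASK (PCI_COMMAND_IO | PCI_COMMAND_PARITY |      \
                         PCI_COMMAND_WAIT | PCI_COMMAND_SERR |      \
                         PCI_COMMAND_FAST_BACK | PCI_COMMAND_RSVD_11_15)

/*
 * Compute the value to write to hardware for a domU write: RsvdP bits keep
 * the host's current setting, all other bits follow the guest.
 */
static uint16_t domu_cmd_to_hw(uint16_t hw_cmd, uint16_t guest_cmd)
{
    return (uint16_t)((hw_cmd & DOMU_RSVDP_MASK) |
                      (guest_cmd & (uint16_t)~DOMU_RSVDP_MASK));
}
```

With the host having set PCI_COMMAND_SERR, a guest write of PCI_COMMAND_MASTER alone results in both bits being set in hardware, so the host's setting survives.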
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
arm/vpci: honor access size when returning an error
Guest can try to read config space using different access sizes: 8,
16, 32, 64 bits. We need to take this into account when we are
returning an error back to MMIO handler, otherwise it is possible to
provide more data than requested: i.e. guest issues LDRB instruction
to read one byte, but we are writing 0xFFFFFFFFFFFFFFFF in the target
register.
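One way to size the error value to the access is sketched below; the function name is hypothetical and the length is assumed to be given in bytes (1, 2, 4 or 8).

```c
#include <stdint.h>

/*
 * Return an all-ones value no wider than the access, so a 1-byte read
 * yields 0xff rather than 0xffffffffffffffff.
 */
static uint64_t invalid_read_value(unsigned int len)
{
    if (len >= sizeof(uint64_t))
        return ~UINT64_C(0);            /* avoid shifting by 64 bits */

    return (UINT64_C(1) << (len * 8)) - 1;
}
```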
Jan Beulich [Thu, 23 May 2024 08:16:52 +0000 (10:16 +0200)]
x86: detect PIT aliasing on ports other than 0x4[0-3]
... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to PIT. Unlike for CMOS/RTC, do detection
pretty early, to avoid disturbing normal operation later on (even if
typically we won't use much of the PIT).
Like for CMOS/RTC a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects (beyond such that PIT
reads have anyway) in case it does not alias the PIT's.
As to the port 0x61 accesses: Unlike other accesses we do, this masks
off the top four bits (in addition to the bottom two ones), following
Intel chipset documentation saying that these (read-only) bits should
only be written with zero.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Oleksii Kurochko [Fri, 17 May 2024 13:54:55 +0000 (15:54 +0200)]
xen/riscv: introduce atomic.h
Initially the patch was introduced by Bobby, who took the header from
the Linux kernel.
The following changes were done on top of Bobby's changes:
- atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
to use __*xchg_generic()
- drop casts in write_atomic() as they are unnecessary
- drop introduction of WRITE_ONCE() and READ_ONCE().
Xen provides ACCESS_ONCE()
- remove zero-length array access in read_atomic()
- drop defines similar to pattern:
#define atomic_add_return_relaxed atomic_add_return_relaxed
- move not RISC-V specific functions to asm-generic/atomics-ops.h
- drop atomic##prefix##_{cmp}xchg_{release, acquire, release}() as they
are not used in Xen.
- update the definition of atomic##prefix##_{cmp}xchg according to
{cmp}xchg() implementation in Xen.
- some ATOMIC_OP() macros were updated:
- drop size argument for ATOMIC_OP which defines atomic##prefix##_xchg()
and atomic##prefix##_cmpxchg().
- drop c_op argument for ATOMIC_OPS which defines ATOMIC_OPS(and, and),
ATOMIC_OPS( or, or), ATOMIC_OPS(xor, xor), ATOMIC_OPS(add, add, +),
ATOMIC_OPS(sub, add, -) as c_op is always "+" for them.
- drop "" from definition of __atomic_{acquire/release"}_fence.
The current implementation is the same as in 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V could combine acquire and release into the SC
instructions and it could reduce a fence instruction to gain better
performance. Here is related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:
- .aq: The LR/SC sequence can be given acquire semantics by
setting the aq bit on the LR instruction.
- .rl: The LR/SC sequence can be given release semantics by
setting the rl bit on the SC instruction.
- .aqrl: Setting the aq bit on the LR instruction, and setting
both the aq and the rl bit on the SC instruction makes
the LR/SC sequence sequentially consistent, meaning that
it cannot be reordered with earlier or later memory
operations from the same hart.
Software should not set the rl bit on an LR instruction unless
the aq bit is also set, nor should software set the aq bit on an
SC instruction unless the rl bit is also set. LR.rl and SC.aq
instructions are not guaranteed to provide any stronger ordering
than those with both bits clear, but may result in lower
performance.
Also, the way of transforming ".rl + full barrier" to ".aqrl" was approved
by (the author of the RVWMO spec) [2]
Oleksii Kurochko [Fri, 17 May 2024 13:54:54 +0000 (15:54 +0200)]
xen/riscv: introduce cmpxchg.h
The header was taken from Linux kernel 6.4.0-rc1.
Additionally, the following were updated:
* add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
access.
* replace tabs with spaces
* replace __* variable with *__
* introduce generic version of xchg_* and cmpxchg_*.
* drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
* drop barriers and use instruction suffixes instead ( .aq, .rl, .aqrl )
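The sub-word emulation mentioned in the list above can be sketched in portable C. This is not the RISC-V LR/SC code from the header: it assumes a little-endian machine and the GCC/Clang `__atomic` builtins, and the function name is hypothetical. It performs the compare-and-exchange on the aligned 32-bit word that contains the target byte.

```c
#include <stdint.h>

/*
 * Emulate an 8-bit compare-and-exchange with a 32-bit atomic operation on
 * the aligned word containing the byte. Returns the byte's previous value.
 * Assumption: little-endian byte order.
 */
static uint8_t cmpxchg_u8_emulated(volatile uint8_t *ptr, uint8_t oldval,
                                   uint8_t newval)
{
    uintptr_t addr = (uintptr_t)ptr;
    volatile uint32_t *word = (volatile uint32_t *)(addr & ~(uintptr_t)3);
    unsigned int shift = (unsigned int)(addr & 3) * 8;
    uint32_t mask = UINT32_C(0xff) << shift;
    uint32_t cur = __atomic_load_n(word, __ATOMIC_RELAXED);

    for (;;) {
        uint8_t seen = (uint8_t)((cur & mask) >> shift);
        uint32_t next;

        if (seen != oldval)
            return seen;            /* comparison failed: nothing is written */

        next = (cur & ~mask) | ((uint32_t)newval << shift);

        /* On failure, 'cur' is refreshed with the word's current contents. */
        if (__atomic_compare_exchange_n(word, &cur, next, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
            return oldval;
    }
}
```

The neighbouring bytes of the word are carried over unchanged in `next`, and the loop retries if another CPU modified any of them between the load and the exchange.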
Implementation of 4- and 8-byte cases was updated according to the spec:
```
....
Linux Construct RVWMO AMO Mapping
...
atomic <op> amo<op>.{w|d}.aqrl
Linux Construct RVWMO LR/SC Mapping
...
atomic <op> loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
Table A.5: Mappings from Linux memory primitives to RISC-V primitives
```
The current implementation is the same as in 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V could combine acquire and release into the SC
instructions and it could reduce a fence instruction to gain better
performance. Here is related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:
- .aq: The LR/SC sequence can be given acquire semantics by
setting the aq bit on the LR instruction.
- .rl: The LR/SC sequence can be given release semantics by
setting the rl bit on the SC instruction.
- .aqrl: Setting the aq bit on the LR instruction, and setting
both the aq and the rl bit on the SC instruction makes
the LR/SC sequence sequentially consistent, meaning that
it cannot be reordered with earlier or later memory
operations from the same hart.
Software should not set the rl bit on an LR instruction unless
the aq bit is also set, nor should software set the aq bit on an
SC instruction unless the rl bit is also set. LR.rl and SC.aq
instructions are not guaranteed to provide any stronger ordering
than those with both bits clear, but may result in lower
performance.
Also, the way of transforming ".rl + full barrier" to ".aqrl" was approved
by (the author of the RVWMO spec) [2]
Otherwise it's not possible to call functions described in hvm/vlapic.h from the
inline functions of hvm/hvm.h.
This is because a static inline in vlapic.h depends on hvm.h, and pulls it
transitively through vpt.h. The ultimate cause is having hvm.h included in any
of the "v*.h" headers, so break the cycle moving the guilty inline into hvm.h.
No functional change.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 23 May 2024 08:03:33 +0000 (10:03 +0200)]
iommu/x86: print RMRR/IVMD ranges using full addresses
It's easier to correlate with the physical memory map if the addresses are
fully printed, instead of using frame numbers.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 23 May 2024 08:03:14 +0000 (10:03 +0200)]
xen/livepatch: make .livepatch.funcs read-only for in-tree tests
This matches the flags of the .livepatch.funcs section when generated using
livepatch-build-tools, which only sets the SHT_ALLOC flag.
Also constify the definitions of the livepatch_func variables in the tests
themselves, in order to better match the resulting output. Note that just
making those variables constant is not enough to force the generated sections
to be read-only.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Nicola Vetrini [Tue, 21 May 2024 14:01:17 +0000 (16:01 +0200)]
x86_64/cpu_idle: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
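A generic illustration of the hazard the rule guards against; the two macros are made up for this example and are not taken from the files touched by the patch.

```c
/* Non-compliant: the parameter is expanded without parentheses. */
#define DOUBLE_UNSAFE(x) (x * 2)

/* Compliant with Rule 20.7: the expanded parameter is parenthesised. */
#define DOUBLE_SAFE(x) ((x) * 2)

/*
 * DOUBLE_UNSAFE(1 + 2) expands to (1 + 2 * 2) and evaluates to 5,
 * whereas DOUBLE_SAFE(1 + 2) expands to ((1 + 2) * 2) and evaluates to 6.
 */
```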
Nicola Vetrini [Tue, 21 May 2024 14:00:47 +0000 (16:00 +0200)]
x86_64/uaccess: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
xlat_malloc_init is touched for consistency, despite the construct
being already deviated.
Nicola Vetrini [Tue, 21 May 2024 14:00:20 +0000 (16:00 +0200)]
x86/hvm: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Nicola Vetrini [Tue, 21 May 2024 13:59:50 +0000 (15:59 +0200)]
x86/vpmu: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Henry Wang [Tue, 21 May 2024 13:59:14 +0000 (15:59 +0200)]
xen/common/dt-overlay: Fix lock issue when add/remove the device
If CONFIG_DEBUG=y, below assertion will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) ----[ Xen-4.19-unstable arm64 debug=y Not tainted ]----
[...]
(XEN) Xen call trace:
(XEN) [<00000a0000257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN) [<00000a00002573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN) [<00000a000020797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN) [<00000a0000207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN) [<00000a0000208460>] dt_overlay_sysctl+0x428/0xc68
(XEN) [<00000a00002707f8>] arch_do_sysctl+0x1c/0x2c
(XEN) [<00000a0000230b40>] do_sysctl+0x96c/0x9ec
(XEN) [<00000a0000271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN) [<00000a0000273490>] do_trap_guest_sync+0x448/0x63c
(XEN) [<00000a000025c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) ****************************************
This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon as
getting hold of overlay_node.
Similar issue will be observed in adding the dtbo:
(XEN) Assertion 'system_state < SYS_STATE_active || rw_is_locked(&dt_host_lock)'
failed at xen-source/xen/drivers/passthrough/device_tree.c:192
(XEN) ----[ Xen-4.19-unstable arm64 debug=y Not tainted ]----
[...]
(XEN) Xen call trace:
(XEN) [<00000a00002594f4>] iommu_add_dt_device+0x7c/0x17c (PC)
(XEN) [<00000a0000259494>] iommu_add_dt_device+0x1c/0x17c (LR)
(XEN) [<00000a0000267db4>] handle_device+0x68/0x1e8
(XEN) [<00000a0000208ba8>] dt_overlay_sysctl+0x9d4/0xb84
(XEN) [<00000a000027342c>] arch_do_sysctl+0x24/0x38
(XEN) [<00000a0000231ac8>] do_sysctl+0x9ac/0xa34
(XEN) [<00000a0000274b70>] traps.c#do_trap_hypercall+0x230/0x2dc
(XEN) [<00000a0000276330>] do_trap_guest_sync+0x478/0x688
(XEN) [<00000a000025e480>] entry.o#guest_sync_slowpath+0xa8/0xd8
This is because the lock is released too early. So fix the issue by
releasing the lock after handle_device().
Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities") Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Roger Pau Monné [Tue, 21 May 2024 07:15:03 +0000 (09:15 +0200)]
xen/x86: pretty print interrupt CPU affinity masks
Print the CPU affinity masks as numeric ranges instead of plain hexadecimal
bitfields.
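To illustrate the difference in output, the following stand-alone sketch formats a 64-bit mask as a range list. It is not the hypervisor's printk format handling; the function name and the fixed 64-bit mask width are assumptions for the example.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Format the set bits of a 64-bit mask as a range list such as "0-3,8".
 * Returns the number of characters written, or -1 if the buffer is too
 * small.
 */
static int format_cpu_ranges(uint64_t mask, char *buf, size_t len)
{
    size_t used = 0;
    int first = 1;

    if (len)
        buf[0] = '\0';

    for (unsigned int cpu = 0; cpu < 64; ) {
        unsigned int start, end;
        int n;

        if (!(mask & (UINT64_C(1) << cpu))) {
            cpu++;
            continue;
        }

        /* Found the start of a run of set bits; find where it ends. */
        start = cpu;
        while (cpu < 64 && (mask & (UINT64_C(1) << cpu)))
            cpu++;
        end = cpu - 1;

        if (start == end)
            n = snprintf(buf + used, len - used, "%s%u",
                         first ? "" : ",", start);
        else
            n = snprintf(buf + used, len - used, "%s%u-%u",
                         first ? "" : ",", start, end);

        if (n < 0 || (size_t)n >= len - used)
            return -1;

        used += (size_t)n;
        first = 0;
    }

    return (int)used;
}
```

For example, the mask 0x10f is rendered as "0-3,8" instead of the hexadecimal bitfield.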
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Mon, 20 Sep 2021 12:40:21 +0000 (13:40 +0100)]
xen/trace: Drop old trace API
With all users updated to the new API, drop the old API. This includes all of
asm/hvm/trace.h, which allows us to drop some includes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Tue, 21 Sep 2021 18:55:47 +0000 (19:55 +0100)]
xen/trace: Remove final {__,}trace_var() users in favour of the new API
The cycles parameter (which gets removed as a consequence) determines whether
trace() or trace_time() is used.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Fri, 17 Sep 2021 23:31:27 +0000 (00:31 +0100)]
xen: Switch to new TRACE() API
(Almost) no functional change.
* In irq_move_cleanup_interrupt(), use the 'me' local variable rather than
calling smp_processor_id() again. This manifests as a minor code
improvement.
* In vlapic_update_timer() and lapic_rearm(), introduce a new 'timer_period'
local variable to simplify the expressions used for both the trace and
create_periodic_time() calls.
All other differences in the compiled binary are to do with line numbers
changing.
Some conversion notes:
* HVMTRACE_LONG_[234]D() and TRACE_2_LONG_[234]D() were latently buggy. They
blindly discard extra parameters, but luckily no users are impacted. They
are also obfuscated wrappers, depending on exactly one or two parameters
being TRC_PAR_LONG() to compile successfully.
* HVMTRACE_LONG_1D() behaves unlike its named companions, and takes exactly
one 64bit parameter which it splits manually. Its one user,
vmx_cr_access()'s LMSW path, gets adjusted.
* TRACE_?D() and TRACE_2_LONG_*() change to TRACE_TIME() as cycles is always
enabled.
* HVMTRACE_ND() is opencoded for VMENTRY/VMEXIT records to include cycles.
These are converted to TRACE_TIME(), with the old modifier parameter
expressed as an OR at the callsite. One callsite, svm_vmenter_helper() had
a nested tb_init_done check, which is dropped. (The optimiser also spotted
this, which is why it doesn't manifest as a binary difference.)
* All uses of *LONG() are either opencoded or swapped to using a struct, to
avoid MISRA issues.
* All HVMTRACE_?D() change to TRACE() as cycles is explicitly skipped.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Mon, 20 Sep 2021 13:07:43 +0000 (14:07 +0100)]
xen/sched: Clean up trace handling
There is no need for bitfields anywhere - use more sensible types. There is
also no need to cast 'd' to (unsigned char *) before passing it to a function
taking void *. Switch to new trace_time() API.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Fri, 17 Sep 2021 15:28:19 +0000 (16:28 +0100)]
xen/rt: Clean up trace handling
Most uses of bitfields and __packed are unnecessary. There is also no need to
cast 'd' to (unsigned char *) before passing it to a function taking void *.
Switch to new trace_time() API.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Wed, 15 Sep 2021 16:01:43 +0000 (17:01 +0100)]
xen/credit2: Clean up trace handling
There is no need for bitfields anywhere - use types with an explicit width
instead. There is also no need to cast 'd' to (unsigned char *) before
passing it to a function taking void *. Switch to new trace_time() API.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (13:49 +0200)]
tools/xen-cpuid: Drop old names
Not used any more. Split out of previous patch to aid legibility.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (12:49 +0100)]
tools/xen-cpuid: Use automatically generated feature names
Have gen-cpuid.py write out INIT_FEATURE_VAL_TO_NAME, derived from the same
data source as INIT_FEATURE_NAME_TO_VAL, although both aliases of common_1d
are needed.
In xen-cpuid.c, sanity check at build time that leaf_info[] and
feature_names[] are of sensible length.
As dump_leaf() rendered missing names as numbers, always dump leaves even if
we don't have the leaf name. This conversion was arguably missed in commit 59afdb8a81d6 ("tools/misc: Tweak reserved bit handling for xen-cpuid").
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (12:49 +0100)]
tools/xen-cpuid: Rename decodes[] to leaf_info[]
Split out of subsequent patch to aid legibility.
No functional change.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Fri, 10 May 2024 19:04:51 +0000 (20:04 +0100)]
x86/gen-cpuid: Minor cleanup
Rename INIT_FEATURE_NAMES to INIT_FEATURE_NAME_TO_VAL as we're about to gain an
inverse mapping of the same thing.
Use dict.items() unconditionally. iteritems() is a marginal perf optimisation
for Python2 only, and simply not worth the effort on a script this small.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Henry Wang [Mon, 20 May 2024 08:21:45 +0000 (16:21 +0800)]
tools/golang: Add missing golang bindings for vlan
It was noticed that commit 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
introduces a new "vlan" string field to libxl_device_nic, but the
golang bindings are missing. Add them in this patch.
Fixes: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic") Signed-off-by: Henry Wang <xin.wang2@amd.com> Acked-by: George Dunlap <george.dunlap@cloud.com>
Roger Pau Monné [Fri, 17 May 2024 13:56:05 +0000 (15:56 +0200)]
x86/msi: prevent watchdog triggering when dumping MSI state
Use the same check that's used in dump_irqs().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
The value returned by __toupper is used in arithmetic operations causing
MISRA C 10.2 violations. Cast to plain char in the toupper macro. Also
do the same in tolower for consistency.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/arm: Add DT reserve map regions to bootinfo.reserved_mem
Currently the code is listing device tree reserve map regions
as reserved memory for Xen, but they are not added into
bootinfo.reserved_mem and they are fetched in multiple places
using the same code sequence, causing duplication. Fix this
by adding them to the bootinfo.reserved_mem at early stage.
Andrew Cooper [Thu, 16 May 2024 11:09:39 +0000 (12:09 +0100)]
x86/ucode: Further fixes to identify "ucode already up to date"
When the revision in hardware is newer than anything Xen has to hand,
'microcode_cache' isn't set up. Then, `xen-ucode` initiates the update
because it doesn't know whether the revisions across the system are symmetric
or not. This involves the patch getting all the way into the
apply_microcode() hooks before being found to be too old.
This is all a giant mess and needs an overhaul, but in the short term simply
adjust the apply_microcode() to return -EEXIST.
Also, unconditionally print the preexisting microcode revision on boot. It's
relevant information which is otherwise unavailable if Xen doesn't find new
microcode to use.
Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Sergiy Kibrik [Thu, 16 May 2024 11:36:22 +0000 (13:36 +0200)]
x86/p2m: move altp2m-related code to separate file
Move altp2m code from the generic p2m.c file to altp2m.c, so it is kept separate
and can possibly be disabled in the build. We may want to disable it when
building only for a specific platform that doesn't support alternate p2m.
No functional change intended.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Thu, 16 May 2024 11:35:34 +0000 (13:35 +0200)]
x86/MCE: guard access to Intel/AMD-specific MCA MSRs
Add build-time checks for newly introduced INTEL/AMD config options when
calling vmce_{intel/amd}_{rdmsr/wrmsr}() routines.
This way platform-specific code can be omitted from the vmce code if the
platform is disabled in the config.
Oleksii Kurochko [Thu, 16 May 2024 08:08:37 +0000 (10:08 +0200)]
xen/bitops: put __ffs() into linux compatible header
The mentioned macros exist only for Linux compatibility.
The patch defines __ffs() in terms of Xen bitops. It is safe to define it
this way (as ffsl(x) - 1) because __ffs() was previously defined as
__builtin_ctzl(x), which has undefined behavior when x=0, so it is assumed
that such cases are not encountered in the current code.
To avoid including <xen/linux-compat.h> in Xen library files, __ffs() and
__ffz() were defined locally in find-next-bit.c.
Apart from the __ffs() usage in find-next-bit.c, only one use of __ffs()
remains, in smmu-v3.c. It seems that __ffs() could be changed to ffsl(x)-1
in this file, but to keep smmu-v3.c close to Linux it was decided just
to define __ffs() in xen/linux-compat.h and include it in smmu-v3.c.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Rahul Singh <rahul.singh@arm.com>
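The relationship between a 1-based "find first set" and the 0-based, Linux-style __ffs() can be sketched as follows. The names my_ffsl() and my_ffs0() are hypothetical stand-ins rather than Xen's definitions, and the sketch assumes a compiler that provides __builtin_ctzl() (GCC or Clang).

```c
#include <assert.h>

/*
 * Hypothetical stand-in for a 1-based "find first set": returns the index
 * of the lowest set bit plus one, or 0 when no bit is set.
 */
static inline unsigned int my_ffsl(unsigned long x)
{
    return x ? (unsigned int)__builtin_ctzl(x) + 1 : 0;
}

/*
 * Hypothetical 0-based variant, expressed as "ffsl(x) - 1".  For non-zero x
 * this matches __builtin_ctzl(x).  x == 0 is excluded, mirroring the fact
 * that __builtin_ctzl(0) has undefined behaviour.
 */
static inline unsigned int my_ffs0(unsigned long x)
{
    assert(x != 0);
    return my_ffsl(x) - 1;
}
```

For any non-zero input the two forms agree, for example my_ffs0(0x50) and __builtin_ctzl(0x50) both give 4, so swapping one definition for the other only matters for the x=0 case that callers are assumed to avoid.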
Jan Beulich [Thu, 16 May 2024 08:03:16 +0000 (10:03 +0200)]
x86: detect PIC aliasing on ports other than 0x[2A][01]
... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to both PICs. Unlike for CMOS/RTC, do
detection very early, to avoid disturbing normal operation later on.
Like for CMOS/RTC a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects in case it does not
alias the respective PIC's one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Jan Beulich [Thu, 16 May 2024 08:02:34 +0000 (10:02 +0200)]
x86: allow to suppress port-alias probing
By default there's already no use for this when we run in shim mode.
Plus there may also be a need to suppress the probing in case of issues
with it. Before introducing further port alias probing, introduce a
command line option allowing to bypass it, default it to on when in shim
mode, and gate RTC/CMOS port alias probing on it.
Requested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
automation/eclair_analysis: deviate macro count_args_ for MISRA Rule 20.7
The count_args_ macro violates Rule 20.7, but it can't be made
compliant with Rule 20.7 without breaking its functionality. Since
it's very unlikely for this macro to be misused, it is deviated.
Nicola Vetrini [Wed, 15 May 2024 07:51:59 +0000 (09:51 +0200)]
automation/eclair_analysis: fully deviate MISRA C Rules 21.9 and 21.10
These rules are concerned with the use of facilities provided by the
C Standard Library (qsort, bsearch for rule 21.9, and those provided
by <time.h> for rule 21.10).
Xen provides in its source code its own implementation of some of these
functions and macros, therefore a justification is provided for allowing
uses of these functions in the project.
The rules are also marked as clean as a consequence.
Roger Pau Monne [Mon, 13 May 2024 08:59:25 +0000 (10:59 +0200)]
x86/mtrr: avoid system wide rendezvous when setting AP MTRRs
There's no point in forcing a system wide update of the MTRRs on all processors
when there are no changes to be propagated. On AP startup it's only the AP
that needs to write the system wide MTRR values in order to match the rest of
the already online CPUs.
We have occasionally seen the watchdog trigger during `xen-hptool cpu-online`
in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs
on all the CPUs in the system.
While there adjust the comment to clarify why the system-wide resetting of the
MTRR registers is not needed for the purposes of mtrr_ap_init().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Leigh Brown [Wed, 8 May 2024 21:38:21 +0000 (22:38 +0100)]
tools/xl: add vlan keyword to vif option
Update parse_nic_config() to support a new `vlan' keyword. This
keyword specifies the VLAN configuration to assign to the VIF when
attaching it to the bridge port, on operating systems that support
the capability (e.g. Linux). The vlan keyword will allow one or
more VLANs to be configured on the VIF when adding it to the bridge
port. This will be done by the vif-bridge script and functions.
Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Leigh Brown [Wed, 8 May 2024 21:38:20 +0000 (22:38 +0100)]
tools/libs/light: Add vlan field to libxl_device_nic
Add `vlan' string field to libxl_device_nic, to allow a VLAN
configuration to be specified for the VIF when adding it to the
bridge device.
Update libxl_nic.c to read and write the vlan field from the
xenstore.
This provides the capability for supported operating systems (e.g.
Linux) to perform VLAN filtering on bridge ports. The Xen
hotplug scripts need to be updated to read this information from
the xenstore and perform the required configuration.
Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Leigh Brown [Tue, 14 May 2024 08:13:44 +0000 (09:13 +0100)]
tools/xentop: Fix cpu% sort order
In compare_cpu_pct(), there is a double -> unsigned long long conversion when
calling compare(). In C, this discards the fractional part, resulting in an
out-of-order sorting such as:
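The effect of the truncation can be sketched in isolation. The following is a minimal, hypothetical example and not the xentop code: the function names are invented for illustration, and only the double -> unsigned long long conversion is taken from the description above.

```c
#include <assert.h>

/* Three-way compare on integers, as a generic helper might do. */
static int compare_ull(unsigned long long a, unsigned long long b)
{
    return (a < b) ? -1 : (a > b);
}

/*
 * Problematic form: the conversion to unsigned long long drops the
 * fractional part, so 1.2 and 1.7 both become 1 and compare as equal.
 */
static int compare_pct_truncating(double a, double b)
{
    /* Converting a negative double to an unsigned type is undefined. */
    assert(a >= 0.0 && b >= 0.0);
    return compare_ull((unsigned long long)a, (unsigned long long)b);
}

/* Comparing the doubles directly keeps the fractional part. */
static int compare_pct(double a, double b)
{
    return (a < b) ? -1 : (a > b);
}
```

In this sketch compare_pct_truncating(1.2, 1.7) returns 0, so a sort is free to leave the two entries in either order, while compare_pct(1.2, 1.7) returns -1.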
Andrew Cooper [Thu, 9 May 2024 17:40:11 +0000 (18:40 +0100)]
tools/hvmloader: Further simplify SMP setup
Now that we're using hypercalls to start APs, we can replace the 'ap_cpuid'
global with a regular function parameter. This requires telling the compiler
that we'd like the parameter in a register rather than on the stack.
While adjusting, rename to cpu_setup(). It's always been used on the BSP,
making the name ap_start() specifically misleading.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Andrew Cooper [Sat, 11 May 2024 18:25:00 +0000 (19:25 +0100)]
x86/cpufreq: Rename cpuid variable/parameters to cpu
Various functions have a parameter or local variable called cpuid, but this
triggers a MISRA R5.3 violation because we also have a function called cpuid()
which wraps the real CPUID instruction.
In all these cases, it's a Xen cpu index, which is far more commonly named
just cpu in our code.
While adjusting these, fix a couple of other issues:
* cpufreq_cpu_init() is on the end of a hypercall (with in-memory parameters,
even), making EFAULT the wrong error to use. Use EOPNOTSUPP instead.
* check_est_cpu() is wrong to tie EIST to just Intel, and nowhere else using
EIST makes this restriction. Just check the feature itself, which is more
succinctly done after being folded into its single caller.
* In powernow_cpufreq_update(), replace an opencoded cpu_online().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 15 May 2024 13:35:15 +0000 (15:35 +0200)]
x86: respect mapcache_domain_init() failing
The function itself properly handles and hands onwards failure from
create_perdomain_mapping(). Therefore its caller should respect possible
failure, too.
Fixes: 4b28bf6ae90b ("x86: re-introduce map_domain_page() et al") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Juergen Gross [Wed, 15 May 2024 15:25:39 +0000 (17:25 +0200)]
xen/sched: set all sched_resource data inside locked region for new cpu
When adding a cpu to a scheduler, set all data items of struct
sched_resource inside the locked region, as otherwise a race might
happen (e.g. when trying to access the cpupool of the cpu):
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Fixes: a8c6c623192e ("sched: clarify use cases of schedule_cpu_switch()") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 2 Apr 2024 14:50:19 +0000 (15:50 +0100)]
Revert "evtchn: refuse EVTCHNOP_status for Xen-bound event channels"
The commit makes a claim without justification.
The claim is false; it broke lsevtchn in dom0, a debugging utility which
absolutely does care about all of the domain's event channels.
Whether to return information about a xen-owned evtchn is a matter of policy,
and it's not acceptable to subvert Xen's security subsystem on the decision.
Fixes: f60ab5337f96 ("evtchn: refuse EVTCHNOP_status for Xen-bound event channels") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Andrew Cooper [Fri, 10 May 2024 22:56:52 +0000 (23:56 +0100)]
xen: Use -Wuninitialized and -Winit-self
Assigning a variable to itself is an anti-pattern. It introduces definite UB
in an attempt to silence a warning about possible UB.
As it's definite undefined behaviour, it also mis-compiles in simple cases,
using whatever stale value happened to be in the allocated register.
Clang includes -Wuninitialized within -Wall, but GCC only includes it in
-Wextra, which is not used by Xen at this time.
Furthermore, the specific pattern of assigning a variable to itself in its
declaration is only diagnosed by GCC with -Winit-self. Clang does diagnose
simple forms of this pattern with a plain -Wuninitialized, but it fails to
diagnose the instances in Xen that GCC manages to find.
GCC, with -Wuninitialized and -Winit-self notices:
arch/x86/time.c: In function ‘read_pt_and_tsc’:
arch/x86/time.c:297:14: error: ‘best’ is used uninitialized in this function [-Werror=uninitialized]
297 | uint32_t best = best;
| ^~~~
arch/x86/time.c: In function ‘read_pt_and_tmcct’:
arch/x86/time.c:1022:14: error: ‘best’ is used uninitialized in this function [-Werror=uninitialized]
1022 | uint64_t best = best;
| ^~~~
Fix these up to start with a value of ~0, which is also more robust in the
case that something goes wrong.
Fixes: 23658e823238 ("x86/time: further improve TSC / CPU freq calibration accuracy") Fixes: 3f3906b462d5 ("x86/APIC: calibrate against platform timer when possible") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
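The replacement pattern can be sketched with a small, hypothetical minimum search. min_sample() is an invented name and is not the code in arch/x86/time.c; it only illustrates why starting from ~0 is well defined where a self-assignment is not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: return the smallest value in an array of samples. */
static uint32_t min_sample(const uint32_t *samples, size_t nr)
{
    /*
     * Start from all-ones (~0 for a 32-bit value) instead of writing
     * "uint32_t best = best;".  Every real sample compares <= this value,
     * and the result is still well defined if the loop never runs.
     */
    uint32_t best = UINT32_MAX;
    size_t i;

    assert(samples != NULL || nr == 0);

    for ( i = 0; i < nr; i++ )
        if ( samples[i] < best )
            best = samples[i];

    return best;
}
```

If no sample is ever recorded, the caller sees the all-ones sentinel rather than whatever stale value happened to be in a register, which is the robustness point made above.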
Nicola Vetrini [Fri, 10 May 2024 18:03:36 +0000 (20:03 +0200)]
automation/eclair_analysis: tag MISRA C Rule 1.1 as clean
Tag the rule as clean, as there are no more violations in the codebase since 93c27d54dd23 ("xen/arm: Fix MISRA regression on R1.1,
flexible array member not at the end").
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
libxl: Fix handling XenStore errors in device creation
If xenstored runs out of memory it is possible for it to fail operations
that should succeed. libxl wasn't robust against this, and could fail
to ensure that the TTY path of a non-initial console was created and
read-only for guests. This doesn't qualify for an XSA because guests
should not be able to run xenstored out of memory, but it still needs to
be fixed.
Add the missing error checks to ensure that all errors are properly
handled and that at no point can a guest make the TTY path of its
frontend directory writable.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com>
x86/hvm: Allow access to registers on the same page as MSI-X table
Some devices (notably the Intel Wifi 6 AX210 card) keep auxiliary registers
on the same page as the MSI-X table. A device model (especially one in a
stubdomain) cannot really handle those, as direct writes to that page are
refused (the page is on the mmio_ro_ranges list). Instead, extend
msixtbl_mmio_ops to handle such accesses too.
Doing this requires correlating the read/write location with the guest
MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
for PV would need to be done separately.
This will also be used to read the Pending Bit Array, if it lives on the same
page, making QEMU not needing /dev/mem access at all (especially helpful
with lockdown enabled in dom0). If PBA lives on another page, QEMU will
map it to the guest directly.
If PBA lives on the same page, discard writes and log a message.
Technically, writes outside of PBA could be allowed, but at this moment
the precise location of PBA isn't saved, and also no known device abuses
the spec in this way (at least yet).
To access those registers, msixtbl_mmio_ops need the relevant page
mapped. MSI handling already has infrastructure for that, using fixmap,
so try to map first/last page of the MSI-X table (if necessary) and save
their fixmap indexes. Note that msix_get_fixmap() does reference
counting and reuses existing mapping, so just call it directly, even if
the page was mapped before. Also, it uses a specific range of fixmap
indexes which doesn't include 0, so use 0 as default ("not mapped")
value - which simplifies code a bit.
Based on assumption that all MSI-X page accesses are handled by Xen, do
not forward adjacent accesses to other hypothetical ioreq servers, even
if the access wasn't handled for some reason (failure to map pages etc).
Relevant places log a message about that already.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
The arch_msix struct had a single "warned" field with a domid for which a
warning was issued. An upcoming patch will need a similar mechanism for a few
more warnings, so change it to save a bit field of issued warnings.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Fri, 10 May 2024 12:49:13 +0000 (14:49 +0200)]
libxl: fix population of the online vCPU bitmap for PVH
libxl passes some information to libacpi to create the ACPI table for a PVH
guest, and among that information is a bitmap of which vCPUs are online,
which can be fewer than the maximum number of vCPUs assigned to the domain.
While the population of the bitmap is done correctly for HVM based on the
number of online vCPUs, for PVH the population of the bitmap is done based on
the maximum number of vCPUs allowed. This leads to all local APIC entries in
the MADT being set as enabled, which contradicts the data in xenstore if vCPUs
is different from maximum vCPUs.
Fix by copying the internal libxl bitmap that's populated based on the vCPUs
parameter.
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com> Link: https://gitlab.com/libvirt/libvirt/-/issues/399 Reported-by: Leigh Brown <leigh@solinno.co.uk> Fixes: 14c0d328da2b ('libxl/acpi: Build ACPI tables for HVMlite guests') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Leigh Brown <leigh@solinno.co.uk> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
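The difference between the two ways of populating the bitmap can be sketched as follows. Everything here is hypothetical: struct vcpu_cfg, MAX_VCPUS and both function names are invented for illustration and do not correspond to libxl's or libacpi's actual types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VCPUS 128 /* hypothetical limit for this sketch */

/* Hypothetical configuration: bit n of avail[] set => vCPU n starts online. */
struct vcpu_cfg {
    unsigned int max_vcpus;
    uint8_t avail[MAX_VCPUS / 8];
};

/* Problematic form: marks every vCPU up to the maximum as online. */
static void fill_online_from_max(const struct vcpu_cfg *cfg, uint8_t *out)
{
    unsigned int i;

    assert(cfg->max_vcpus <= MAX_VCPUS);
    memset(out, 0, MAX_VCPUS / 8);
    for ( i = 0; i < cfg->max_vcpus; i++ )
        out[i / 8] |= 1u << (i % 8);
}

/* Corrected form: copy the bitmap that reflects the online vCPUs. */
static void fill_online_from_avail(const struct vcpu_cfg *cfg, uint8_t *out)
{
    assert(cfg->max_vcpus <= MAX_VCPUS);
    memcpy(out, cfg->avail, MAX_VCPUS / 8);
}
```

With max_vcpus set to 4 and only vCPUs 0 and 1 marked in avail[], the first function yields 0x0f in the first byte (all four enabled) while the second yields 0x03, matching the configured online set.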