vpci/header: emulate PCI_COMMAND register for guests
Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. The PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, the PCI_COMMAND register needs
proper emulation in order to honor the host's settings.
According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.
Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.
PCI_COMMAND_IO (bit 0)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if an I/O BAR is exposed to the guest.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
don't yet support I/O BARs for domUs.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_MEMORY (bit 1)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if a Memory BAR is exposed to the guest.
Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
regions.
Xen domU: For devices assigned to DomUs, memory decoding will be
disabled at the time of initialization.
PCI_COMMAND_MASTER (bit 2)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_SPECIAL (bit 3)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_INVALIDATE (bit 4)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_VGA_PALETTE (bit 5)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_PARITY (bit 6)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_WAIT (bit 7)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: hardwire to 0
QEMU: res_mask
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_SERR (bit 8)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_FAST_BACK (bit 9)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_INTX_DISABLE (bit 10)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU checks if INTx was mapped
for a device. If it was not, then the guest can't control the
PCI_COMMAND_INTX_DISABLE bit.
Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
Xen dom0: We allow dom0 to control this bit freely.
Bits 11-15
PCIe 6.1: RsvdP
PCI LB 3.0: Reserved
QEMU: res_mask
Xen domU: rsvdp_mask
Xen dom0: We allow dom0 to control these bits freely.
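The following C sketch shows roughly how the domU RsvdP bits in the table interact with a guest write. It merges the guest's value with the value currently in hardware, so that bits such as SERR keep the host's setting. The mask is assembled from the table above. `cmd_merge_domu_write()` is a hypothetical helper, not Xen's vPCI code. The sketch also ignores the BAR mapping done for PCI_COMMAND_MEMORY and the INTx/MSI(-X) check.

```c
#include <stdint.h>

/* PCI_COMMAND bit values as defined by the PCI specification. */
#define PCI_COMMAND_IO         0x0001
#define PCI_COMMAND_MEMORY     0x0002
#define PCI_COMMAND_MASTER     0x0004
#define PCI_COMMAND_PARITY     0x0040
#define PCI_COMMAND_WAIT       0x0080
#define PCI_COMMAND_SERR       0x0100
#define PCI_COMMAND_FAST_BACK  0x0200

/* Hypothetical domU RsvdP mask, read off the table above (bits 11-15 too). */
#define DOMU_CMD_RSVDP_MASK (PCI_COMMAND_IO | PCI_COMMAND_PARITY |   \
                             PCI_COMMAND_WAIT | PCI_COMMAND_SERR |   \
                             PCI_COMMAND_FAST_BACK | 0xf800)

/*
 * Combine a domU write with the current hardware value: RsvdP bits keep
 * whatever the host programmed, all other bits take the guest's value.
 */
static uint16_t cmd_merge_domu_write(uint16_t hw_val, uint16_t guest_val)
{
    return (uint16_t)((guest_val & ~DOMU_CMD_RSVDP_MASK) |
                      (hw_val & DOMU_CMD_RSVDP_MASK));
}
```

For example, if the host has set SERR (0x0100) and the guest writes MEMORY|MASTER (0x0006), the hardware ends up with 0x0106. A guest write of SERR alone leaves a host value of 0 unchanged.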
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
arm/vpci: honor access size when returning an error
A guest can try to read config space using different access sizes: 8,
16, 32 and 64 bits. We need to take this into account when returning an
error back to the MMIO handler, otherwise it is possible to provide more
data than requested: e.g. the guest issues an LDRB instruction to read
one byte, but we write 0xFFFFFFFFFFFFFFFF to the target register.
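Below is a minimal sketch of a size-aware error value. It assumes the access size is one of 1, 2, 4 or 8 bytes; a size of 0 would shift by 64, which is undefined. `invalid_read_value()` is a hypothetical helper, not the function used in Xen's Arm vPCI code.

```c
#include <stdint.h>

/*
 * All-ones value, but only as wide as the access: an LDRB (1 byte) gets
 * 0xff, a 64-bit load gets 0xffffffffffffffff.
 * Precondition: size is 1, 2, 4 or 8.
 */
static uint64_t invalid_read_value(unsigned int size)
{
    return ~0ULL >> (64 - 8 * size);
}
```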
Jan Beulich [Thu, 23 May 2024 08:16:52 +0000 (10:16 +0200)]
x86: detect PIT aliasing on ports other than 0x4[0-3]
... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to PIT. Unlike for CMOS/RTC, do detection
pretty early, to avoid disturbing normal operation later on (even if
typically we won't use much of the PIT).
Like for CMOS/RTC, a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects (beyond those that PIT
reads have anyway) in case it does not alias the PIT's.
As to the port 0x61 accesses: Unlike other accesses we do, this masks
off the top four bits (in addition to the bottom two ones), following
Intel chipset documentation saying that these (read-only) bits should
only be written with zero.
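The port 0x61 masking can be sketched as below. The sketch assumes "masks off" means clearing those bits before the value is written back, so only bits 2-3 of a value read from the port survive. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define PORT61_LOW_MASK   0x03u   /* bottom two bits, masked as elsewhere */
#define PORT61_HIGH_MASK  0xf0u   /* top four bits: read-only, write as zero */

/* Value to write back to port 0x61 after reading `readback` from it. */
static uint8_t port61_write_value(uint8_t readback)
{
    return readback & (uint8_t)~(PORT61_LOW_MASK | PORT61_HIGH_MASK);
}
```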
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Oleksii Kurochko [Fri, 17 May 2024 13:54:55 +0000 (15:54 +0200)]
xen/riscv: introduce atomic.h
The patch was initially introduced by Bobby, who took the header from
the Linux kernel.
The following changes were done on top of Bobby's changes:
- atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
to use __*xchg_generic()
- drop casts in write_atomic() as they are unnecessary
- drop introduction of WRITE_ONCE() and READ_ONCE().
Xen provides ACCESS_ONCE()
- remove zero-length array access in read_atomic()
- drop defines similar to pattern:
#define atomic_add_return_relaxed atomic_add_return_relaxed
- move not RISC-V specific functions to asm-generic/atomics-ops.h
- drop atomic##prefix##_{cmp}xchg_{release, acquire, relaxed}() as they
are not used in Xen.
- update the definition of atomic##prefix##_{cmp}xchg according to
{cmp}xchg() implementation in Xen.
- some ATOMIC_OP() macros were updated:
- drop size argument for ATOMIC_OP which defines atomic##prefix##_xchg()
and atomic##prefix##_cmpxchg().
- drop c_op argument for ATOMIC_OPS which defines ATOMIC_OPS(and, and),
ATOMIC_OPS(or, or), ATOMIC_OPS(xor, xor), ATOMIC_OPS(add, add, +),
ATOMIC_OPS(sub, add, -) as c_op is always "+" for them.
- drop "" from definition of __atomic_{acquire/release}_fence.
The current implementation is the same as in 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V can combine acquire and release into the SC
instruction, which saves a fence instruction and gives better
performance. Here is the related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:
- .aq: The LR/SC sequence can be given acquire semantics by
setting the aq bit on the LR instruction.
- .rl: The LR/SC sequence can be given release semantics by
setting the rl bit on the SC instruction.
- .aqrl: Setting the aq bit on the LR instruction, and setting
both the aq and the rl bit on the SC instruction makes
the LR/SC sequence sequentially consistent, meaning that
it cannot be reordered with earlier or later memory
operations from the same hart.
Software should not set the rl bit on an LR instruction unless
the aq bit is also set, nor should software set the aq bit on an
SC instruction unless the rl bit is also set. LR.rl and SC.aq
instructions are not guaranteed to provide any stronger ordering
than those with both bits clear, but may result in lower
performance.
Also, the approach of transforming ".rl + full barrier" into ".aqrl" was
approved by the author of the RVWMO spec [2].
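The token-pasting families generated by ATOMIC_OP()/ATOMIC_OPS() can be sketched portably with C11 atomics. This is not the RISC-V implementation. The compare-exchange loop stands in for an LR/SC retry loop, and `memory_order_seq_cst` approximates the fully ordered semantics that the `.aqrl` suffixes provide. `atomic_t` and `ATOMIC_OP_RETURN()` here are simplified, hypothetical stand-ins.

```c
#include <stdatomic.h>

/* Simplified stand-in for Xen's atomic_t. */
typedef struct { _Atomic int counter; } atomic_t;

/* One template, many operations: atomic_<op>_return() returns the new value. */
#define ATOMIC_OP_RETURN(op, c_op)                                         \
static inline int atomic_##op##_return(int i, atomic_t *v)                 \
{                                                                          \
    int old = atomic_load_explicit(&v->counter, memory_order_relaxed);     \
    int new;                                                               \
                                                                           \
    do {                                                                   \
        new = old c_op i;                                                  \
    } while ( !atomic_compare_exchange_weak_explicit(                      \
                  &v->counter, &old, new,                                  \
                  memory_order_seq_cst, memory_order_relaxed) );           \
    return new;                                                            \
}

ATOMIC_OP_RETURN(add, +)
ATOMIC_OP_RETURN(sub, -)
ATOMIC_OP_RETURN(and, &)
ATOMIC_OP_RETURN(or, |)
ATOMIC_OP_RETURN(xor, ^)
```

On a failed compare-exchange, `old` is refreshed with the current value, so the loop recomputes and retries, much like re-running the LR/SC sequence.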
Oleksii Kurochko [Fri, 17 May 2024 13:54:54 +0000 (15:54 +0200)]
xen/riscv: introduce cmpxchg.h
The header was taken from Linux kernel 6.4.0-rc1.
Additionally, the following were updated:
* add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
access.
* replace tabs with spaces
* replace __* variable with *__
* introduce generic version of xchg_* and cmpxchg_*.
* drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
* drop barriers and use instruction suffixes instead (.aq, .rl, .aqrl)
The implementations of the 4- and 8-byte cases were updated according to the spec:
```
....
Linux Construct RVWMO AMO Mapping
...
atomic <op> amo<op>.{w|d}.aqrl
Linux Construct RVWMO LR/SC Mapping
...
atomic <op> loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop
Table A.5: Mappings from Linux memory primitives to RISC-V primitives
```
The current implementation is the same as in 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V can combine acquire and release into the SC
instruction, which saves a fence instruction and gives better
performance. Here is the related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:
- .aq: The LR/SC sequence can be given acquire semantics by
setting the aq bit on the LR instruction.
- .rl: The LR/SC sequence can be given release semantics by
setting the rl bit on the SC instruction.
- .aqrl: Setting the aq bit on the LR instruction, and setting
both the aq and the rl bit on the SC instruction makes
the LR/SC sequence sequentially consistent, meaning that
it cannot be reordered with earlier or later memory
operations from the same hart.
Software should not set the rl bit on an LR instruction unless
the aq bit is also set, nor should software set the aq bit on an
SC instruction unless the rl bit is also set. LR.rl and SC.aq
instructions are not guaranteed to provide any stronger ordering
than those with both bits clear, but may result in lower
performance.
Also, the approach of transforming ".rl + full barrier" into ".aqrl" was
approved by the author of the RVWMO spec [2].
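The 1/2-byte emulation can be sketched with a 32-bit compare-and-swap on the containing aligned word. The sketch extracts the byte, compares it with the expected value and splices in the new byte. This C11 version is hypothetical. It takes the word and a byte index, counted from the least significant byte (which matches memory order on little-endian RISC-V), instead of a byte pointer. It also uses `memory_order_seq_cst` in place of the `.aqrl` suffixes.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Emulate a 1-byte cmpxchg with a 32-bit CAS on the containing word.
 * Returns the byte value observed; the exchange happened iff it equals old.
 */
static uint8_t cmpxchg_u8_via_u32(_Atomic uint32_t *word,
                                  unsigned int byte_idx,
                                  uint8_t old, uint8_t new)
{
    const unsigned int shift = byte_idx * 8;
    const uint32_t mask = (uint32_t)0xff << shift;
    uint32_t cur = atomic_load_explicit(word, memory_order_relaxed);

    for ( ; ; )
    {
        uint8_t seen = (uint8_t)((cur & mask) >> shift);
        uint32_t desired;

        if ( seen != old )
            return seen;          /* mismatch: nothing is written */

        /* Keep the neighbouring bytes, splice in the new one. */
        desired = (cur & ~mask) | ((uint32_t)new << shift);

        if ( atomic_compare_exchange_weak_explicit(word, &cur, desired,
                                                   memory_order_seq_cst,
                                                   memory_order_relaxed) )
            return old;

        /* CAS failed (a neighbouring byte changed, or a spurious failure);
         * cur now holds the fresh word, so re-check and retry. */
    }
}
```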
Otherwise it's not possible to call functions described in hvm/vlapic.h from the
inline functions of hvm/hvm.h.
This is because a static inline in vlapic.h depends on hvm.h, and pulls it
transitively through vpt.h. The ultimate cause is having hvm.h included in any
of the "v*.h" headers, so break the cycle by moving the guilty inline into hvm.h.
No functional change.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 23 May 2024 08:03:33 +0000 (10:03 +0200)]
iommu/x86: print RMRR/IVMD ranges using full addresses
It's easier to correlate with the physical memory map if the addresses are
fully printed, instead of using frame numbers.
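Converting frame numbers to full addresses is a shift by the page size. The end address should name the last byte of the last frame. Below is a sketch that assumes 4 KiB frames and an inclusive end frame. `frames_to_addrs()` is a hypothetical helper.

```c
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB frames, as on x86 */

/* Convert an inclusive frame range into an inclusive byte-address range. */
static void frames_to_addrs(uint64_t start_mfn, uint64_t end_mfn,
                            uint64_t *start_addr, uint64_t *end_addr)
{
    *start_addr = start_mfn << PAGE_SHIFT;
    /* Last byte of the last frame, not the first byte of the next one. */
    *end_addr = ((end_mfn + 1) << PAGE_SHIFT) - 1;
}
```

For example, frames 0x7f000-0x7f0ff become addresses 0x7f000000-0x7f0fffff, which can be compared directly with an e820-style memory map.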
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 23 May 2024 08:03:14 +0000 (10:03 +0200)]
xen/livepatch: make .livepatch.funcs read-only for in-tree tests
This matches the flags of the .livepatch.funcs section when generated using
livepatch-build-tools, which only sets the SHT_ALLOC flag.
Also constify the definitions of the livepatch_func variables in the tests
themselves, in order to better match the resulting output. Note that just
making those variables constant is not enough to force the generated sections
to be read-only.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Nicola Vetrini [Tue, 21 May 2024 14:01:17 +0000 (16:01 +0200)]
x86_64/cpu_idle: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
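Here is a minimal illustration of the hazard the rule guards against, using hypothetical macros. Without parentheses, an argument such as `1 + 2` is split by operator precedence after expansion.

```c
/* Hypothetical macros illustrating MISRA C Rule 20.7. */
#define SCALE_BAD(x, n)  (x * n)        /* SCALE_BAD(1 + 2, 4) -> (1 + 2 * 4) */
#define SCALE_GOOD(x, n) ((x) * (n))    /* SCALE_GOOD(1 + 2, 4) -> ((1 + 2) * (4)) */
```

`SCALE_BAD(1 + 2, 4)` evaluates to 9, while `SCALE_GOOD(1 + 2, 4)` evaluates to the intended 12.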
Nicola Vetrini [Tue, 21 May 2024 14:00:47 +0000 (16:00 +0200)]
x86_64/uaccess: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
xlat_malloc_init is touched for consistency, despite the construct
being already deviated.
Nicola Vetrini [Tue, 21 May 2024 14:00:20 +0000 (16:00 +0200)]
x86/hvm: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Nicola Vetrini [Tue, 21 May 2024 13:59:50 +0000 (15:59 +0200)]
x86/vpmu: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Henry Wang [Tue, 21 May 2024 13:59:14 +0000 (15:59 +0200)]
xen/common/dt-overlay: Fix lock issue when add/remove the device
If CONFIG_DEBUG=y, below assertion will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) ----[ Xen-4.19-unstable arm64 debug=y Not tainted ]----
[...]
(XEN) Xen call trace:
(XEN) [<00000a0000257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN) [<00000a00002573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN) [<00000a000020797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN) [<00000a0000207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN) [<00000a0000208460>] dt_overlay_sysctl+0x428/0xc68
(XEN) [<00000a00002707f8>] arch_do_sysctl+0x1c/0x2c
(XEN) [<00000a0000230b40>] do_sysctl+0x96c/0x9ec
(XEN) [<00000a0000271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN) [<00000a0000273490>] do_trap_guest_sync+0x448/0x63c
(XEN) [<00000a000025c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) ****************************************
This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon as
we get hold of overlay_node.
A similar issue is observed when adding the dtbo:
(XEN) Assertion 'system_state < SYS_STATE_active || rw_is_locked(&dt_host_lock)'
failed at xen-source/xen/drivers/passthrough/device_tree.c:192
(XEN) ----[ Xen-4.19-unstable arm64 debug=y Not tainted ]----
[...]
(XEN) Xen call trace:
(XEN) [<00000a00002594f4>] iommu_add_dt_device+0x7c/0x17c (PC)
(XEN) [<00000a0000259494>] iommu_add_dt_device+0x1c/0x17c (LR)
(XEN) [<00000a0000267db4>] handle_device+0x68/0x1e8
(XEN) [<00000a0000208ba8>] dt_overlay_sysctl+0x9d4/0xb84
(XEN) [<00000a000027342c>] arch_do_sysctl+0x24/0x38
(XEN) [<00000a0000231ac8>] do_sysctl+0x9ac/0xa34
(XEN) [<00000a0000274b70>] traps.c#do_trap_hypercall+0x230/0x2dc
(XEN) [<00000a0000276330>] do_trap_guest_sync+0x478/0x688
(XEN) [<00000a000025e480>] entry.o#guest_sync_slowpath+0xa8/0xd8
This is because the lock is released too early. So fix the issue by
releasing the lock after handle_device().
Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Roger Pau Monné [Tue, 21 May 2024 07:15:03 +0000 (09:15 +0200)]
xen/x86: pretty print interrupt CPU affinity masks
Print the CPU affinity masks as numeric ranges instead of plain hexadecimal
bitfields.
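One way to render a mask as numeric ranges is to walk the set bits and coalesce consecutive runs. The sketch below limits the mask to 64 bits for brevity and requires `len > 0`. `cpumask_to_ranges()` is hypothetical, not Xen's cpumask printing code.

```c
#include <stdint.h>
#include <stdio.h>

/* Render a 64-bit CPU mask as ranges, e.g. 0x0d0f -> "0-3,8,10-11". */
static void cpumask_to_ranges(uint64_t mask, char *buf, size_t len)
{
    size_t pos = 0;
    unsigned int cpu = 0;

    buf[0] = '\0';
    while ( cpu < 64 )
    {
        unsigned int first, last;
        int n;

        if ( !(mask & (1ULL << cpu)) )
        {
            cpu++;
            continue;
        }

        /* Extend the run while the next bit is also set. */
        first = cpu;
        while ( cpu + 1 < 64 && (mask & (1ULL << (cpu + 1))) )
            cpu++;
        last = cpu++;

        if ( first == last )
            n = snprintf(buf + pos, len - pos, "%s%u", pos ? "," : "", first);
        else
            n = snprintf(buf + pos, len - pos, "%s%u-%u", pos ? "," : "",
                         first, last);
        if ( n < 0 || (size_t)n >= len - pos )
            return;             /* output truncated; stop here */
        pos += (size_t)n;
    }
}
```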
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Mon, 20 Sep 2021 12:40:21 +0000 (13:40 +0100)]
xen/trace: Drop old trace API
With all users updated to the new API, drop the old API. This includes all of
asm/hvm/trace.h, which allows us to drop some includes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Tue, 21 Sep 2021 18:55:47 +0000 (19:55 +0100)]
xen/trace: Remove final {__,}trace_var() users in favour of the new API
The cycles parameter (which gets removed as a consequence) determines whether
trace() or trace_time() is used.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Fri, 17 Sep 2021 23:31:27 +0000 (00:31 +0100)]
xen: Switch to new TRACE() API
(Almost) no functional change.
* In irq_move_cleanup_interrupt(), use the 'me' local variable rather than
calling smp_processor_id() again. This manifests as a minor code
improvement.
* In vlapic_update_timer() and lapic_rearm(), introduce a new 'timer_period'
local variable to simplify the expressions used for both the trace and
create_periodic_time() calls.
All other differences in the compiled binary are to do with line numbers
changing.
Some conversion notes:
* HVMTRACE_LONG_[234]D() and TRACE_2_LONG_[234]D() were latently buggy. They
blindly discard extra parameters, but luckily no users are impacted. They
are also obfuscated wrappers, depending on exactly one or two parameters
being TRC_PAR_LONG() to compile successfully.
* HVMTRACE_LONG_1D() behaves unlike its named companions, and takes exactly
one 64bit parameter which it splits manually. Its one user,
vmx_cr_access()'s LMSW path, gets adjusted.
* TRACE_?D() and TRACE_2_LONG_*() change to TRACE_TIME() as cycles is always
enabled.
* HVMTRACE_ND() is opencoded for VMENTRY/VMEXIT records to include cycles.
These are converted to TRACE_TIME(), with the old modifier parameter
expressed as an OR at the callsite. One callsite, svm_vmenter_helper(), had
a nested tb_init_done check, which is dropped. (The optimiser also spotted
this, which is why it doesn't manifest as a binary difference.)
* All uses of *LONG() are either opencoded or swapped to using a struct, to
avoid MISRA issues.
* All HVMTRACE_?D() change to TRACE() as cycles is explicitly skipped.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Mon, 20 Sep 2021 13:07:43 +0000 (14:07 +0100)]
xen/sched: Clean up trace handling
There is no need for bitfields anywhere - use more sensible types. There is
also no need to cast 'd' to (unsigned char *) before passing it to a function
taking void *. Switch to new trace_time() API.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Fri, 17 Sep 2021 15:28:19 +0000 (16:28 +0100)]
xen/rt: Clean up trace handling
Most uses of bitfields and __packed are unnecessary. There is also no need to
cast 'd' to (unsigned char *) before passing it to a function taking void *.
Switch to new trace_time() API.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Wed, 15 Sep 2021 16:01:43 +0000 (17:01 +0100)]
xen/credit2: Clean up trace handling
There is no need for bitfields anywhere - use types with an explicit width
instead. There is also no need to cast 'd' to (unsigned char *) before
passing it to a function taking void *. Switch to new trace_time() API.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (13:49 +0200)]
tools/xen-cpuid: Drop old names
Not used any more. Split out of previous patch to aid legibility.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (12:49 +0100)]
tools/xen-cpuid: Use automatically generated feature names
Have gen-cpuid.py write out INIT_FEATURE_VAL_TO_NAME, derived from the same
data source as INIT_FEATURE_NAME_TO_VAL, although both aliases of common_1d
are needed.
In xen-cpuid.c, sanity check at build time that leaf_info[] and
feature_names[] are of sensible length.
As dump_leaf() rendered missing names as numbers, always dump leaves even if
we don't have the leaf name. This conversion was arguably missed in commit 59afdb8a81d6 ("tools/misc: Tweak reserved bit handling for xen-cpuid").
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (12:49 +0100)]
tools/xen-cpuid: Rename decodes[] to leaf_info[]
Split out of subsequent patch to aid legibility.
No functional change.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Fri, 10 May 2024 19:04:51 +0000 (20:04 +0100)]
x86/gen-cpuid: Minor cleanup
Rename INIT_FEATURE_NAMES to INIT_FEATURE_NAME_TO_VAL as we're about to gain an
inverse mapping of the same thing.
Use dict.items() unconditionally. iteritems() is a marginal perf optimisation
for Python2 only, and simply not worth the effort on a script this small.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Henry Wang [Mon, 20 May 2024 08:21:45 +0000 (16:21 +0800)]
tools/golang: Add missing golang bindings for vlan
Commit 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
introduced a new "vlan" string field to libxl_device_nic, but the
golang bindings were not updated. Add the missing binding in this patch.
Fixes: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Acked-by: George Dunlap <george.dunlap@cloud.com>
Roger Pau Monné [Fri, 17 May 2024 13:56:05 +0000 (15:56 +0200)]
x86/msi: prevent watchdog triggering when dumping MSI state
Use the same check that's used in dump_irqs().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
The value returned by __toupper is used in arithmetic operations causing
MISRA C 10.2 violations. Cast to plain char in the toupper macro. Also
do the same in tolower for consistency.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
xen/arm: Add DT reserve map regions to bootinfo.reserved_mem
Currently the code is listing device tree reserve map regions
as reserved memory for Xen, but they are not added into
bootinfo.reserved_mem and they are fetched in multiple places
using the same code sequence, causing duplication. Fix this
by adding them to bootinfo.reserved_mem at an early stage.
Andrew Cooper [Thu, 16 May 2024 11:09:39 +0000 (12:09 +0100)]
x86/ucode: Further fixes to identify "ucode already up to date"
When the revision in hardware is newer than anything Xen has to hand,
'microcode_cache' isn't set up. Then, `xen-ucode` initiates the update
because it doesn't know whether the revisions across the system are symmetric
or not. This involves the patch getting all the way into the
apply_microcode() hooks before being found to be too old.
This is all a giant mess and needs an overhaul, but in the short term simply
adjust the apply_microcode() hooks to return -EEXIST.
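The short-term rule amounts to a revision comparison that reports -EEXIST when the candidate patch is not newer than what is already loaded. The sketch below treats revisions as plain unsigned numbers and uses a hypothetical `check_ucode_revision()`. The real vendor-specific hooks may compare revisions differently.

```c
#include <errno.h>
#include <stdint.h>

/* Return -EEXIST ("already up to date") unless the candidate is newer. */
static int check_ucode_revision(uint32_t current_rev, uint32_t candidate_rev)
{
    if ( candidate_rev <= current_rev )
        return -EEXIST;
    return 0;
}
```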
Also, unconditionally print the preexisting microcode revision on boot. It's
relevant information which is otherwise unavailable if Xen doesn't find new
microcode to use.
Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Sergiy Kibrik [Thu, 16 May 2024 11:36:22 +0000 (13:36 +0200)]
x86/p2m: move altp2m-related code to separate file
Move altp2m code from generic p2m.c file to altp2m.c, so it is kept separately
and can possibly be disabled in the build. We may want to disable it when
building for a specific platform that doesn't support alternate p2m.
No functional change intended.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Thu, 16 May 2024 11:35:34 +0000 (13:35 +0200)]
x86/MCE: guard access to Intel/AMD-specific MCA MSRs
Add build-time checks for newly introduced INTEL/AMD config options when
calling vmce_{intel/amd}_{rdmsr/wrmsr}() routines.
This way platform-specific code can be omitted from the vmce code if that
platform is disabled in the config.
Oleksii Kurochko [Thu, 16 May 2024 08:08:37 +0000 (10:08 +0200)]
xen/bitops: put __ffs() into linux compatible header
The mentioned macros exist only for Linux compatibility.
The patch defines __ffs() in terms of Xen bitops, as ffsl(x) - 1. This is
safe because __ffs() was previously defined as __builtin_ctzl(x), which has
undefined behavior when x == 0, so it is assumed that such cases are not
encountered in the current code.
To avoid including <xen/linux-compat.h> in Xen library files, __ffs() and
__ffz() were defined locally in find-next-bit.c.
Apart from find-next-bit.c, the only remaining user of __ffs() is
smmu-v3.c. __ffs() could be replaced with ffsl(x) - 1 there, but to keep
smmu-v3.c close to Linux it was decided to define __ffs() in
xen/linux-compat.h and include that header in smmu-v3.c.
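The equivalence between the two definitions can be checked directly. This sketch assumes the GCC/Clang builtins and uses a hypothetical name, `ffs0()`, rather than redefining `__ffs()`. For any non-zero x, a 1-based find-first-set minus one equals the count of trailing zeros.

```c
/*
 * ffs0(x): 0-based index of the least significant set bit, written as a
 * 1-based find-first-set minus one. For x == 0 __builtin_ffsl() returns 0,
 * so this wraps to UINT_MAX; as with __builtin_ctzl(), callers must not
 * pass 0.
 */
static inline unsigned int ffs0(unsigned long x)
{
    return (unsigned int)(__builtin_ffsl((long)x) - 1);
}
```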
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Rahul Singh <rahul.singh@arm.com>
Jan Beulich [Thu, 16 May 2024 08:03:16 +0000 (10:03 +0200)]
x86: detect PIC aliasing on ports other than 0x[2A][01]
... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to both PICs. Unlike for CMOS/RTC, do
detection very early, to avoid disturbing normal operation later on.
Like for CMOS/RTC a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects in case it does not
alias the respective PIC's one.
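The probing idea can be sketched against a simulated chipset in which the master PIC decodes only address bit 0 within 0x20-0x3f, so that 0x23, 0x25, ... alias port 0x21. Everything here is hypothetical and not Xen's probing code. That includes the simulated `sim_inb()`/`sim_outb()`, the stride search and the test patterns. The sketch writes the interrupt mask register via the canonical port, reads it back via candidate aliases, and restores the original mask afterwards.

```c
#include <stdint.h>

/* Simulated master PIC: [0] = command/status, [1] = interrupt mask (IMR). */
static uint8_t pic_regs[2];

static uint8_t sim_inb(uint16_t port)
{
    if ( port >= 0x20 && port <= 0x3f )
        return pic_regs[port & 1];      /* only address bit 0 is decoded */
    return 0xff;                        /* undecoded port floats high */
}

static void sim_outb(uint8_t val, uint16_t port)
{
    if ( port >= 0x20 && port <= 0x3f )
        pic_regs[port & 1] = val;
}

/* Return the smallest stride at which port 0x21 is aliased, or 0 if none. */
static unsigned int probe_pic_alias(void)
{
    uint8_t saved = sim_inb(0x21);
    unsigned int offs, found = 0;

    for ( offs = 2; offs <= 0x1e && !found; offs <<= 1 )
    {
        /* Two distinct patterns make a coincidental match unlikely. */
        sim_outb(0x5a, 0x21);
        if ( sim_inb(0x21 + offs) != 0x5a )
            continue;
        sim_outb(0xa5, 0x21);
        if ( sim_inb(0x21 + offs) == 0xa5 )
            found = offs;
    }

    sim_outb(saved, 0x21);              /* restore the original mask */
    return found;
}
```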
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Jan Beulich [Thu, 16 May 2024 08:02:34 +0000 (10:02 +0200)]
x86: allow to suppress port-alias probing
By default there's already no use for this when we run in shim mode.
Plus there may also be a need to suppress the probing in case of issues
with it. Before introducing further port alias probing, introduce a
command line option allowing it to be bypassed, default it to on when in
shim mode, and gate RTC/CMOS port alias probing on it.
Requested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
automation/eclair_analysis: deviate macro count_args_ for MISRA Rule 20.7
The count_args_ macro violates Rule 20.7, but it can't be made
compliant with Rule 20.7 without breaking its functionality. Since
it's very unlikely for this macro to be misused, it is deviated.
Nicola Vetrini [Wed, 15 May 2024 07:51:59 +0000 (09:51 +0200)]
automation/eclair_analysis: fully deviate MISRA C Rules 21.9 and 21.10
These rules are concerned with the use of facilities provided by the
C Standard Library (qsort, bsearch for rule 21.9, and those provided
by <time.h> for rule 21.10).
Xen provides its own implementation of some of these functions and
macros in its source code, therefore a justification is provided for
allowing uses of these functions in the project.
The rules are also marked as clean as a consequence.
Roger Pau Monne [Mon, 13 May 2024 08:59:25 +0000 (10:59 +0200)]
x86/mtrr: avoid system wide rendezvous when setting AP MTRRs
There's no point in forcing a system wide update of the MTRRs on all processors
when there are no changes to be propagated. On AP startup it's only the AP
that needs to write the system wide MTRR values in order to match the rest of
the already online CPUs.
We have occasionally seen the watchdog trigger during `xen-hptool cpu-online`
in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs
on all the CPUs in the system.
While there, adjust the comment to clarify why the system-wide resetting of the
MTRR registers is not needed for the purposes of mtrr_ap_init().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Leigh Brown [Wed, 8 May 2024 21:38:21 +0000 (22:38 +0100)]
tools/xl: add vlan keyword to vif option
Update parse_nic_config() to support a new `vlan' keyword. This
keyword specifies the VLAN configuration to assign to the VIF when
attaching it to the bridge port, on operating systems that support
the capability (e.g. Linux). The vlan keyword will allow one or
more VLANs to be configured on the VIF when adding it to the bridge
port. This will be done by the vif-bridge script and functions.
Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Leigh Brown [Wed, 8 May 2024 21:38:20 +0000 (22:38 +0100)]
tools/libs/light: Add vlan field to libxl_device_nic
Add `vlan' string field to libxl_device_nic, to allow a VLAN
configuration to be specified for the VIF when adding it to the
bridge device.
Update libxl_nic.c to read and write the vlan field from the
xenstore.
This provides the capability for supported operating systems (e.g.
Linux) to perform VLAN filtering on bridge ports. The Xen
hotplug scripts need to be updated to read this information from
the xenstore and perform the required configuration.
Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Leigh Brown [Tue, 14 May 2024 08:13:44 +0000 (09:13 +0100)]
tools/xentop: Fix cpu% sort order
In compare_cpu_pct(), there is a double -> unsigned long long conversion when
calling compare(). In C, this discards the fractional part, resulting in an
out-of-order sorting such as:
Andrew Cooper [Thu, 9 May 2024 17:40:11 +0000 (18:40 +0100)]
tools/hvmloader: Further simplify SMP setup
Now that we're using hypercalls to start APs, we can replace the 'ap_cpuid'
global with a regular function parameter. This requires telling the compiler
that we'd like the parameter in a register rather than on the stack.
While adjusting, rename to cpu_setup(). It's always been used on the BSP,
making the name ap_start() specifically misleading.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Andrew Cooper [Sat, 11 May 2024 18:25:00 +0000 (19:25 +0100)]
x86/cpufreq: Rename cpuid variable/parameters to cpu
Various functions have a parameter or local variable called cpuid, but this
triggers a MISRA R5.3 violation because we also have a function called cpuid()
which wraps the real CPUID instruction.
In all these cases, it's a Xen cpu index, which is far more commonly named
just cpu in our code.
While adjusting these, fix a couple of other issues:
* cpufreq_cpu_init() is on the end of a hypercall (with in-memory parameters,
even), making EFAULT the wrong error to use. Use EOPNOTSUPP instead.
* check_est_cpu() is wrong to tie EIST to just Intel, and nowhere else using
EIST makes this restriction. Just check the feature itself, which is more
succinctly done after being folded into its single caller.
* In powernow_cpufreq_update(), replace an opencoded cpu_online().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 15 May 2024 13:35:15 +0000 (15:35 +0200)]
x86: respect mapcache_domain_init() failing
The function itself properly handles and hands onwards failure from
create_perdomain_mapping(). Therefore its caller should respect possible
failure, too.
Fixes: 4b28bf6ae90b ("x86: re-introduce map_domain_page() et al") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Juergen Gross [Wed, 15 May 2024 15:25:39 +0000 (17:25 +0200)]
xen/sched: set all sched_resource data inside locked region for new cpu
When adding a cpu to a scheduler, set all data items of struct
sched_resource inside the locked region, as otherwise a race might
happen (e.g. when trying to access the cpupool of the cpu):
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Fixes: a8c6c623192e ("sched: clarify use cases of schedule_cpu_switch()") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 2 Apr 2024 14:50:19 +0000 (15:50 +0100)]
Revert "evtchn: refuse EVTCHNOP_status for Xen-bound event channels"
The commit makes a claim without justification.
The claim is false; it broke lsevtchn in dom0, a debugging utility which
absolutely does care about all of the domain's event channels.
Whether to return information about a xen-owned evtchn is a matter of policy,
and it's not acceptable to subvert Xen's security subsystem on the decision.
Fixes: f60ab5337f96 ("evtchn: refuse EVTCHNOP_status for Xen-bound event channels") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Andrew Cooper [Fri, 10 May 2024 22:56:52 +0000 (23:56 +0100)]
xen: Use -Wuninitialized and -Winit-self
Assigning a variable to itself is an anti-pattern. It introduces definite UB
in an attempt to silence a warning about possible UB.
As it's definite undefined behaviour, it also mis-compiles in simple cases,
using whatever stale value happened to be in the allocated register.
Clang includes -Wuninitialized within -Wall, but GCC only includes it in
-Wextra, which is not used by Xen at this time.
Furthermore, the specific pattern of assigning a variable to itself in its
declaration is only diagnosed by GCC with -Winit-self. Clang does diagnose
simple forms of this pattern with a plain -Wuninitialized, but it fails to
diagnose the instances in Xen that GCC manages to find.
GCC, with -Wuninitialized and -Winit-self, notices:
arch/x86/time.c: In function ‘read_pt_and_tsc’:
arch/x86/time.c:297:14: error: ‘best’ is used uninitialized in this function [-Werror=uninitialized]
297 | uint32_t best = best;
| ^~~~
arch/x86/time.c: In function ‘read_pt_and_tmcct’:
arch/x86/time.c:1022:14: error: ‘best’ is used uninitialized in this function [-Werror=uninitialized]
1022 | uint64_t best = best;
| ^~~~
Fix these up to start with a value of ~0, which is also more robust in the
case that something goes wrong.
Fixes: 23658e823238 ("x86/time: further improve TSC / CPU freq calibration accuracy") Fixes: 3f3906b462d5 ("x86/APIC: calibrate against platform timer when possible") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
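The difference between the self-initialisation anti-pattern and starting from ~0 can be sketched as below. best_of() is a hypothetical "keep the smallest sample" helper in the spirit of read_pt_and_tsc(), not the actual Xen code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Keep the smallest sample seen so far.  Writing "uint32_t best = best;"
 * would read an uninitialized value (undefined behaviour).  Starting from
 * ~0 (UINT32_MAX) is well defined, and any real sample replaces it.
 */
static uint32_t best_of(const uint32_t *samples, unsigned int n)
{
    uint32_t best = ~0U;

    for ( unsigned int i = 0; i < n; i++ )
        if ( samples[i] < best )
            best = samples[i];

    return best;
}
```

If no sample is ever taken, the result is the recognisable value UINT32_MAX rather than stale register contents.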
Nicola Vetrini [Fri, 10 May 2024 18:03:36 +0000 (20:03 +0200)]
automation/eclair_analysis: tag MISRA C Rule 1.1 as clean
Tag the rule as clean, as there are no more violations in the codebase since 93c27d54dd23 ("xen/arm: Fix MISRA regression on R1.1,
flexible array member not at the end").
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
libxl: Fix handling XenStore errors in device creation
If xenstored runs out of memory it is possible for it to fail operations
that should succeed. libxl wasn't robust against this, and could fail
to ensure that the TTY path of a non-initial console was created and
read-only for guests. This doesn't qualify for an XSA because guests
should not be able to run xenstored out of memory, but it still needs to
be fixed.
Add the missing error checks to ensure that all errors are properly
handled and that at no point can a guest make the TTY path of its
frontend directory writable.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com>
x86/hvm: Allow access to registers on the same page as MSI-X table
Some devices (notably the Intel Wifi 6 AX210 card) keep auxiliary registers
on the same page as the MSI-X table. A device model (especially one in a
stubdomain) cannot really handle those, as direct writes to that page are
refused (the page is on the mmio_ro_ranges list). Instead, extend
msixtbl_mmio_ops to handle such accesses too.
Doing this requires correlating the read/write location with the guest
MSI-X table address. Since QEMU doesn't map the MSI-X table to the guest,
it requires msixtbl_entry->gtable, which is HVM-only. A similar feature
for PV would need to be done separately.
This will also be used to read the Pending Bit Array, if it lives on the same
page, so that QEMU no longer needs /dev/mem access at all (especially helpful
with lockdown enabled in dom0). If the PBA lives on another page, QEMU will
map it to the guest directly.
If the PBA lives on the same page, discard writes and log a message.
Technically, writes outside of the PBA could be allowed, but at this moment
the precise location of the PBA isn't saved, and also no known device abuses
the spec in this way (at least not yet).
To access those registers, msixtbl_mmio_ops need the relevant page
mapped. MSI handling already has infrastructure for that, using fixmap,
so try to map first/last page of the MSI-X table (if necessary) and save
their fixmap indexes. Note that msix_get_fixmap() does reference
counting and reuses existing mapping, so just call it directly, even if
the page was mapped before. Also, it uses a specific range of fixmap
indexes which doesn't include 0, so use 0 as default ("not mapped")
value - which simplifies code a bit.
Based on assumption that all MSI-X page accesses are handled by Xen, do
not forward adjacent accesses to other hypothetical ioreq servers, even
if the access wasn't handled for some reason (failure to map pages etc).
Relevant places log a message about that already.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
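The page correlation described above can be sketched as a check for whether an access falls on the first or last page of the MSI-X table but outside the table itself. The structure, macro names and 4 KiB page size below are assumptions for illustration. The 16-byte entry size comes from the PCI MSI-X table layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                     /* assumed 4 KiB pages */
#define DEMO_PFN(a)     ((a) >> DEMO_PAGE_SHIFT)

/* Hypothetical description of a guest MSI-X table. */
struct demo_msix_table {
    uint64_t gtable;          /* guest physical address of the table */
    unsigned int nr_entries;  /* each MSI-X entry is 16 bytes */
};

/* True if addr shares a page with the table but is not inside the table. */
static bool demo_is_adjacent_access(const struct demo_msix_table *t,
                                    uint64_t addr)
{
    uint64_t end = t->gtable + (uint64_t)t->nr_entries * 16; /* exclusive */
    bool in_table = addr >= t->gtable && addr < end;
    bool on_table_page = DEMO_PFN(addr) == DEMO_PFN(t->gtable) ||
                         DEMO_PFN(addr) == DEMO_PFN(end - 1);

    return on_table_page && !in_table;
}
```

Only the first and last pages can hold foreign registers, which is why the commit maps at most those two pages.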
The arch_msix struct had a single "warned" field with a domid for which a
warning was issued. An upcoming patch will need a similar mechanism for a few
more warnings, so change it to save a bit field of issued warnings.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
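A "warn once per domain, per kind" bit field can be sketched as follows. The field names, enum values and the choice to reset the bits when the domid changes are assumptions for this illustration, not the layout Xen actually uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical warning kinds; each one owns a bit in warned_kind. */
enum demo_msix_warn {
    DEMO_WARN_ADJACENT_WRITE,
    DEMO_WARN_PBA_WRITE,
    DEMO_WARN_MAP_FAILED,
};

struct demo_arch_msix {
    int warned_domid;          /* domain the bits below refer to (-1: none) */
    unsigned long warned_kind; /* bit N set: warning N already issued */
};

/* Returns true if the caller should log this warning now. */
static bool demo_warn_once(struct demo_arch_msix *m, int domid,
                           enum demo_msix_warn kind)
{
    unsigned long bit = 1UL << kind;

    if ( m->warned_domid != domid )
    {
        m->warned_domid = domid;   /* new domain: forget earlier warnings */
        m->warned_kind = 0;
    }

    if ( m->warned_kind & bit )
        return false;

    m->warned_kind |= bit;
    return true;
}
```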
Roger Pau Monne [Fri, 10 May 2024 12:49:13 +0000 (14:49 +0200)]
libxl: fix population of the online vCPU bitmap for PVH
libxl passes some information to libacpi to create the ACPI table for a PVH
guest, and among that information is a bitmap of which vCPUs are online,
which can be fewer than the maximum number of vCPUs assigned to the domain.
While the population of the bitmap is done correctly for HVM based on the
number of online vCPUs, for PVH the population of the bitmap is done based on
the maximum number of vCPUs allowed. This leads to all local APIC entries in
the MADT being set as enabled, which contradicts the data in xenstore if vCPUs
differs from maximum vCPUs.
Fix by copying the internal libxl bitmap that's populated based on the vCPUs
parameter.
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com> Link: https://gitlab.com/libvirt/libvirt/-/issues/399 Reported-by: Leigh Brown <leigh@solinno.co.uk> Fixes: 14c0d328da2b ('libxl/acpi: Build ACPI tables for HVMlite guests') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Leigh Brown <leigh@solinno.co.uk> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
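The intended bitmap can be sketched as below: bit i is set only for the first nr_online vCPUs, not for every vCPU up to the maximum. The 64-bit bitmap and the function name are simplifying assumptions. libxl itself uses its own bitmap type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: set bits 0..nr_online-1 of a 64-bit online-vCPU bitmap. */
static uint64_t demo_online_bitmap(unsigned int nr_online,
                                   unsigned int max_vcpus)
{
    uint64_t map = 0;

    assert(nr_online <= max_vcpus && max_vcpus <= 64);
    for ( unsigned int i = 0; i < nr_online; i++ )
        map |= UINT64_C(1) << i;

    return map;
}
```

Populating from max_vcpus instead would mark every local APIC entry as enabled, which is the bug described above.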
Juergen Gross [Fri, 10 May 2024 14:16:36 +0000 (16:16 +0200)]
xen: allow up to 16383 cpus
With lock handling now allowing up to 16384 cpus (spinlocks can handle
65535 cpus, rwlocks can handle 16384 cpus), raise the allowed limit for
the number of cpus to be configured to 16383.
The new limit is imposed by the requirement that IOMMU_CMD_BUFFER_MAX_ENTRIES
and QINVAL_MAX_ENTRY_NR be larger than 2 * CONFIG_NR_CPUS.
Add a support limit of physical CPUs to SUPPORT.md (4096 on x86, 128
on ARM).
automation/eclair: hide reports coming from adopted code in scheduled analysis
To improve clarity and ease of navigation do not show reports related
to adopted code in the scheduled analysis.
Configuration options are commented out because they may be useful
in the future.
automation/eclair_analysis: amend configuration for some MISRA rules
Adjust ECLAIR configuration for rules R21.14, R21.15 and R21.16 by taking
into account mem* macros defined in the Xen sources as if they were
equivalent to the ones in the Standard Library.
xen/arm: Fix MISRA regression on R1.1, flexible array member not at the end
Commit 2209c1e35b47 ("xen/arm: Introduce a generic way to access memory
bank structures") introduced a MISRA regression for Rule 1.1 because a
flexible array member is introduced in the middle of a struct. Furthermore,
this uses a GCC extension that is going to be deprecated in GCC 14, where
a warning (-Wflex-array-member-not-at-end) will be present to identify
such cases.
In order to fix this issue, use the macro __struct_group to create a
structure 'struct membanks_hdr' which will hold the common data among
structures using the 'struct membanks' interface.
Modify the 'struct shared_meminfo' and 'struct meminfo' to use this new
structure, effectively removing the flexible array member from the middle
of the structure and modify the code accessing the .common field to use
the macro container_of to maintain the functionality of the interface.
Given this change, container_of needs to be supplied with a type, and so
the macro 'kernel_info_get_mem' inside arm/include/asm/kernel.h can't be
an option since it uses const and non-const types for struct membanks, so
introduce two static inline functions, one of which will keep the const
qualifier.
Given the complexity of the interface, which carries a lot of benefit but
on the other hand could be prone to developer confusion if the access is
open-coded, introduce two static inline helpers for the
'struct kernel_info' .shm_mem member and get rid of the open-coded
shm_mem.common access.
Fixes: 2209c1e35b47 ("xen/arm: Introduce a generic way to access memory bank structures") Reported-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
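The container_of approach with a const and a non-const helper can be sketched as follows. These are simplified, hypothetical stand-ins for the Xen structures: here the header is a plain named member rather than a __struct_group, and the helper names are invented. The fixed-size struct keeps the header first so that its layout matches the generic view, whose flexible array member is now at the end.

```c
#include <assert.h>
#include <stddef.h>

struct membank { unsigned long start, size; };

struct membanks_hdr {
    unsigned int nr_banks;
    unsigned int max_banks;
};

/* Generic view: the flexible array member is last, as C requires. */
struct membanks {
    struct membanks_hdr common;
    struct membank bank[];
};

#define DEMO_NR_BANKS 4

/* Fixed-size container with the same leading layout; no FAM in the middle. */
struct meminfo {
    struct membanks_hdr common;
    struct membank bank[DEMO_NR_BANKS];
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static inline struct membanks *demo_meminfo_banks(struct meminfo *mi)
{
    return container_of(&mi->common, struct membanks, common);
}

/* Separate helper so the const qualifier is preserved for const callers. */
static inline const struct membanks *
demo_meminfo_banks_const(const struct meminfo *mi)
{
    return (const struct membanks *)((const char *)&mi->common -
                                     offsetof(struct membanks, common));
}
```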
Import __struct_group from Linux, commit 50d7bd38c3aa
("stddef: Introduce struct_group() helper macro"), in order to
allow access to the members through the anonymous structure
without also having to write the structure name, e.g.:
    struct foo {
        int one;
        struct {
            int two;
            int three, four;
        } thing;
        int five;
    };
would become:
    struct foo {
        int one;
        __struct_group(/* None */, thing, /* None */,
            int two;
            int three, four;
        );
        int five;
    };
This allows users of this structure to access the .thing members directly
as .two/.three/.four on struct foo.
This construct will become useful in order to have some generalized
interfaces that share some common members.
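How the macro achieves this can be sketched with a hypothetical re-creation modelled on Linux's __struct_group(). The member list is declared twice inside a union: once in an anonymous struct, so the members are reachable directly, and once in a named struct, so they can also be addressed as a group. This sketch uses standard C11 `__VA_ARGS__` and is not the exact macro imported into Xen.

```c
#include <assert.h>

#define DEMO_STRUCT_GROUP(TAG, NAME, ATTRS, ...) \
    union {                                      \
        struct { __VA_ARGS__ } ATTRS;            \
        struct TAG { __VA_ARGS__ } ATTRS NAME;   \
    } ATTRS

struct foo {
    int one;
    DEMO_STRUCT_GROUP(/* None */, thing, /* None */,
        int two;
        int three, four;
    );
    int five;
};
```

Because both copies of the members overlay each other in the union, f.two and f.thing.two name the same storage.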
Andrew Cooper [Tue, 23 Apr 2024 15:45:36 +0000 (16:45 +0100)]
x86/boot: Explain how moving mod[0] works
modules_headroom is a misleading name as it applies strictly to mod[0] only,
and the movement loop is deeply unintuitive and completely undocumented.
Provide help to whomever needs to look at this code next.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Jason Andryuk <jason.andryuk@gmail.com>
x86/IOMMU: address violations of MISRA C:2012 Rule 14.4
The xen sources contain violations of MISRA C:2012 Rule 14.4 whose
headline states:
"The controlling expression of an if statement and the controlling
expression of an iteration-statement shall have essentially Boolean type".
Add comparisons to avoid using enum constants as controlling expressions
to comply with Rule 14.4.
Amend the comment in the enum definition to reflect the fact that
boolean uses of iommu_intremap are no longer allowed.
No functional change.
Signed-off-by: Maria Celeste Cesario <maria.celeste.cesario@bugseng.com> Signed-off-by: Simone Ballarin <simone.ballarin@bugseng.com> Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
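The kind of change Rule 14.4 asks for can be sketched with an illustrative enum. The enumerator names are loosely modelled on iommu_intremap but are hypothetical here.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_intremap {
    demo_intremap_off,
    demo_intremap_restricted,
    demo_intremap_full,
};

static bool demo_intremap_enabled(enum demo_intremap mode)
{
    /*
     * Non-compliant: "if ( mode )" uses an enum as the controlling
     * expression.  Compliant: an explicit comparison yields a Boolean.
     */
    if ( mode != demo_intremap_off )
        return true;

    return false;
}
```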
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
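Why Rule 20.7 matters can be shown with a minimal pair of hypothetical macros. Without parentheses around the parameter, an argument such as `1 + 2` is re-associated by the surrounding operator in the expansion.

```c
#include <assert.h>

/* Non-compliant: expands DEMO_DOUBLE_BAD(1 + 2) to (1 + 2 * 2). */
#define DEMO_DOUBLE_BAD(x)  (x * 2)

/* Compliant with Rule 20.7: the parameter is parenthesised. */
#define DEMO_DOUBLE_GOOD(x) ((x) * 2)
```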
xen/unaligned: address violation of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 29 Apr 2024 16:31:03 +0000 (17:31 +0100)]
x86/hvm: Defer the size calculation in hvm_save_cpu_xsave_states()
HVM_CPU_XSAVE_SIZE() may rewrite %xcr0 twice. Defer the calculation until
after we've decided to write out an XSAVE record.
Note in hvm_load_cpu_xsave_states() that there were versions of Xen which
wrote out a useless XSAVE record. This sadly limits our ability to tidy up
the existing infrastructure. Also leave a note in xstate_ctxt_size() that 0
still needs tolerating for now.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
tools/hvmloader: Wake APs with hypercalls rather than INIT+SIPI+SIPI
... in order to change how LAPIC_ID handling works. Importantly, this allows
us to start APs by vCPU ID in order to query the LAPIC_ID, rather than needing
to know the APIC_ID in order to wake them.
Other improvements avoid:
* The 16bit entry stub
* A LMSW insn, which has no decode assist on AMD and needs emulating fully
* 13 vLAPIC emulations when 3 hypercalls will do
* 4 pages of stack when 1 is plenty
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 24 Aug 2022 10:08:28 +0000 (11:08 +0100)]
tools/hvmloader: Move various helpers to being static inlines
The IO port, MSR, IO-APIC and LAPIC accessors typically compile to single
instructions or pairs of instructions, which is less overhead than even the
stack manipulation needed to call the helpers.
Move the implementations from util.c to being static inlines in util.h.
In addition, turn ioapic_base_address into a constant as it is never modified
from 0xfec00000 (substantially shrinks the IO-APIC logic), and make use of the
"A" constraint for WRMSR/RDMSR like we already do for RDTSC.
Daniel P. Smith [Wed, 24 Apr 2024 16:34:22 +0000 (12:34 -0400)]
xen/gunzip: Move crc state into gunzip_state
Move the crc and its state into struct gunzip_state. In the process, expand
the only use of CRC_VALUE, as it hides what is being compared.
Furthermore, all variables here should be uint32_t rather than unsigned long,
which halves the storage space required. Filter the typechanges through the
logic.
Adjust the logic to hold crc in a positive form, and negate it for update in
flush_window(). This is the more normal way to write CRC algorithms, and
avoids weird-to-follow logic in gunzip().
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
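Holding the CRC in positive form and negating it only around the update can be sketched with a standard table-driven CRC-32 (reflected polynomial 0xEDB88320, as used by gzip). The struct and function names are hypothetical stand-ins for gunzip_state and flush_window().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_crc_state {
    uint32_t crc;             /* running CRC-32, kept in positive form */
    uint32_t crc_32_tab[256];
};

static void demo_crc_init(struct demo_crc_state *s)
{
    for ( uint32_t n = 0; n < 256; n++ )
    {
        uint32_t c = n;

        for ( int k = 0; k < 8; k++ )
            c = (c & 1) ? 0xedb88320U ^ (c >> 1) : c >> 1;
        s->crc_32_tab[n] = c;
    }
    s->crc = 0;               /* CRC-32 of the empty input */
}

static void demo_crc_update(struct demo_crc_state *s,
                            const uint8_t *buf, size_t len)
{
    uint32_t c = ~s->crc;     /* negate into the working form */

    for ( size_t i = 0; i < len; i++ )
        c = s->crc_32_tab[(c ^ buf[i]) & 0xff] ^ (c >> 8);

    s->crc = ~c;              /* store back in positive form */
}
```

Since the stored value is always the finished CRC, comparing it against the trailer of the gzip stream needs no extra inversion, and updates can be split across windows.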
Daniel P. Smith [Wed, 24 Apr 2024 16:34:19 +0000 (12:34 -0400)]
xen/gunzip: Move input buffer handling into gunzip_state
Move the input buffer handling, buffer pointer (inbuf), size (insize), and
index(inptr), into gunzip_state. Adjust functions and macros that consumed the
input buffer to accept a struct gunzip_state reference.
Convert get_byte() into a real function and subsume fill_inbuf(). Fix the
failure path to work correctly when error() stops being a plain panic().
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
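A get_byte() that reads from state rather than globals, and reports exhaustion instead of assuming error() never returns, can be sketched as below. The struct and the -1 return convention are assumptions for this sketch, not necessarily what Xen's version does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the decompressor state holding the input buffer. */
struct demo_gunzip_state {
    const unsigned char *inbuf;  /* input buffer */
    size_t insize;               /* number of valid bytes in inbuf */
    size_t inptr;                /* index of the next byte to read */
};

/* Return the next input byte, or -1 once the input is exhausted. */
static int demo_get_byte(struct demo_gunzip_state *s)
{
    if ( s->inptr >= s->insize )
        return -1;

    return s->inbuf[s->inptr++];
}
```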
Jan Beulich [Mon, 6 May 2024 08:08:40 +0000 (10:08 +0200)]
xen/gunzip: don't leak memory on error paths
While decompression errors are likely going to be fatal to Xen's boot
process anyway, at the latest once we aim to run the decompressor multiple
times it is likely better to avoid leaks even on error paths. All the
more so since code size actually shrinks a tiny bit this way.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
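A common way to avoid such leaks is a single exit label that frees everything allocated so far. The sketch below uses hypothetical allocation wrappers with a live-allocation counter so the absence of leaks can be checked. fail_step merely simulates an error at different points.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_allocs;   /* outstanding allocations (for checking) */

static void *demo_alloc(size_t n)
{
    void *p = malloc(n);

    if ( p )
        demo_live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    if ( p )
    {
        demo_live_allocs--;
        free(p);
    }
}

/* Hypothetical decompressor run; fail_step simulates an error part-way. */
static int demo_decompress(int fail_step)
{
    int rc = -1;
    void *window = demo_alloc(32768);
    void *tables = NULL;

    if ( !window || fail_step == 1 )
        goto out;

    tables = demo_alloc(4096);
    if ( !tables || fail_step == 2 )
        goto out;

    rc = 0;   /* the actual decompression would happen here */

 out:
    /* Every path, success or failure, releases what was allocated. */
    demo_free(tables);
    demo_free(window);
    return rc;
}
```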
Nicola Vetrini [Mon, 6 May 2024 08:52:31 +0000 (10:52 +0200)]
automation/eclair_analysis: unblock pipelines from certain repositories
Repositories under people/* only execute the analyze step if manually
triggered, but in order to avoid blocking the rest of the pipeline
if that step is not run, allow it to fail.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
George Dunlap [Thu, 25 Apr 2024 08:49:42 +0000 (09:49 +0100)]
svm: Fix MISRA 8.2 violation
MISRA C Rule 8.2 requires named parameters in prototypes. Use the name from
the implementation.
Fixes: 0d19d3aab0 ("svm/nestedsvm: Introduce nested capabilities bit") Reported-by: Andrew Cooper <andrew.cooper@cloud.com> Reported-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
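What Rule 8.2 asks for can be shown with a minimal hypothetical function: the prototype must name its parameters, ideally with the same names as the definition.

```c
#include <assert.h>

/* Non-compliant would be: static int demo_add(int, int); */
static int demo_add(int lhs, int rhs);

static int demo_add(int lhs, int rhs)
{
    return lhs + rhs;
}
```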
Andrew Cooper [Tue, 7 May 2024 11:19:41 +0000 (12:19 +0100)]
x86/cpu-policy: Fix migration from Ice Lake to Cascade Lake
Ever since Xen 4.14, there has been a latent bug with migration.
While some toolstacks can level the features properly, they don't shrink
feat.max_subleaf when all features have been dropped. This is because
we *still* have not completed the toolstack side work for full CPU Policy
objects.
As a consequence, even when properly feature levelled, VMs can't migrate
"backwards" across hardware which reduces feat.max_subleaf. One such example
is Ice Lake (max_subleaf=2 for INTEL_PSFD) to Cascade Lake (max_subleaf=0).
Extend the max policies' feat.max_subleaf to the highest number Xen knows
about, but leave the default policies matching the host. This will allow VMs
with a higher feat.max_subleaf than strictly necessary to migrate in.
Eventually we'll manage to teach the toolstack how to avoid creating such VMs
in the first place, but there's still more work to do there.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
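The effect on migration can be sketched with a simplified, hypothetical compatibility check: an incoming VM is accepted only if its feat.max_subleaf does not exceed the destination's max policy. The value 2 for the extended max policy is an assumption taken from the Ice Lake example, not necessarily the highest subleaf Xen knows about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified CPU policy. */
struct demo_policy {
    unsigned int feat_max_subleaf;
};

static bool demo_vm_fits(const struct demo_policy *vm,
                         const struct demo_policy *max)
{
    return vm->feat_max_subleaf <= max->feat_max_subleaf;
}
```

With the max policy mirroring a Cascade Lake host (0), a VM from Ice Lake (2) is rejected even when fully levelled. Extending the max policy lets it in while the default policy still matches the host.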
Andrew Cooper [Sat, 4 May 2024 01:10:33 +0000 (02:10 +0100)]
tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC
The header description for xs_open() goes as far as to suggest that the fd is
O_CLOEXEC, but it isn't actually.
`xl devd` has been observed leaking /dev/xen/xenbus into children.
Link: https://github.com/QubesOS/qubes-issues/issues/8292 Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com>
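The technique can be sketched with POSIX open() and fcntl(). /dev/null stands in for /dev/xen/xenbus, and the helper names are hypothetical rather than libxs code. O_CLOEXEC sets the close-on-exec flag atomically at open time, so no window exists in which a concurrent fork()+exec() could inherit the descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Open a path so that the fd is not inherited across exec(). */
static int demo_open_cloexec(const char *path)
{
    return open(path, O_RDWR | O_CLOEXEC);
}

/* True if the close-on-exec flag is set on fd. */
static bool demo_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    return flags >= 0 && (flags & FD_CLOEXEC);
}
```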