Andrii Sultanov [Thu, 22 Aug 2024 09:06:02 +0000 (10:06 +0100)]
tools/ocaml: Remove '-cc $(CC)' from OCAMLOPTFLAGS
This option does not work as one might expect, and needs to be the full
compiler invocation including linking arguments to operate correctly.
See https://github.com/ocaml/ocaml/issues/12284 for more details.
Signed-off-by: Andrii Sultanov <andrii.sultanov@cloud.com> Acked-by: Christian Lindig <christian.lindig@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 23 Aug 2024 07:13:07 +0000 (09:13 +0200)]
x86emul: convert op_bytes/opc checks in SIMD emulation
Raising #UD for an internal shortcoming of the emulator isn't quite
right. Similarly, BUG() is a bigger hammer than needed.
Switch to using EXPECT() instead. This way, even for insns not covered by
the test harness, fuzzers will have a chance of noticing issues, should
any still exist or new ones be introduced.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 23 Aug 2024 07:12:24 +0000 (09:12 +0200)]
x86emul: set (fake) operand size for AVX512CD broadcast insns
Back at the time I failed to pay attention to op_bytes still being zero
when reaching the respective case block: With the ext0f38_table[]
entries having simd_packed_int, the defaulting at the bottom of
x86emul_decode() won't set the field to non-zero for F3-prefixed insns.
Fixes: 37ccca740c26 ("x86emul: support AVX512CD insns")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 23 Aug 2024 07:11:15 +0000 (09:11 +0200)]
x86emul: always set operand size for AVX-VNNI-INT8 insns
Unlike for AVX-VNNI-INT16 I failed to notice that op_bytes may still be
zero when reaching the respective case block: With the ext0f38_table[]
entries having simd_packed_int, the defaulting at the bottom of
x86emul_decode() won't set the field to non-zero for F3- or F2-prefixed
insns.
Fixes: 842acaa743a5 ("x86emul: support AVX-VNNI-INT8")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
After commit c36efb7fcea6 ("automation: use expect to run QEMU") we lost
the \r filtering introduced by b576497e3b7d ("automation: remove CR
characters from serial output"). This patch reintroduces it.
Fixes: c36efb7fcea6 ("automation: use expect to run QEMU")
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Michal Orzel [Wed, 14 Aug 2024 09:43:03 +0000 (11:43 +0200)]
xen/arm: head: Do not pass physical offset to start_xen
Given no further use of boot_phys_offset in the C world, drop it from the
argument list of start_xen() and make the necessary changes in the startup
code head.S (most notably modifying launch not to expect 2 arguments to
pass to the C entry point).
Both on arm64 and arm32, the phys offset (stored in x20 or r10 respectively)
is still needed, so that it can be used in e.g. create_table_entry;
therefore keep it on the list of common register usage.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Julien Grall <jgrall@amazon.com>
Michal Orzel [Wed, 14 Aug 2024 09:43:02 +0000 (11:43 +0200)]
xen/arm: Drop {boot_}phys_offset usage
boot_phys_offset stores the physical offset (PA - VA); it is calculated in
the startup head.S code and passed to start_xen() as the first argument.
There is no point in using it given that we can ask the MMU to translate
a VA for us using e.g. virt_to_{mfn,maddr}. Drop usage of these
variables from the C world.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Julien Grall <jgrall@amazon.com>
Julien Grall [Wed, 14 Aug 2024 21:00:54 +0000 (22:00 +0100)]
xen/arm64: Hide FEAT_SME
Newer hardware may support FEAT_SME. Xen doesn't have any knowledge of it but
will still expose the feature to the VM. If the OS then tries to use
SME, it will crash.
update_boot_mapping() invokes update_identity_mapping() for the MMU-specific
code.
Later, when the MPU code is added, update_boot_mapping() will invoke the
equivalent.
The common code now invokes update_boot_mapping() instead of
update_identity_mapping(), so that there is a clear abstraction between the
common and MMU/MPU-specific logic.
This is a continuation of commit f661a20aa880 ("Extract MMU-specific MM code").
update_identity_mapping() is now marked as static as it is called from
xen/arch/arm/arm64/mmu/mm.c only. Also, amend the prototype of
update_boot_mapping(), which is now invoked from other files.
Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Amneesh Singh [Tue, 20 Aug 2024 08:22:03 +0000 (13:52 +0530)]
drivers: char: omap-uart: provide a default clock frequency
Quite a few TI K3 devices do not have clock-frequency specified in their
respective UART nodes. However, hard-coding the frequency is not a
solution, as the function clock input can differ between SoCs. So, use a
default frequency of 48MHz, which is the same as the Linux default (see
8250_omap.c), if the device tree does not specify it.
Signed-off-by: Amneesh Singh <a-singh21@ti.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
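As an illustration of the fallback above (a minimal sketch, assuming Xen's dt_property_read_u32() helper and a made-up variable name; not the actual driver hunk):

    /* If the DT carries no "clock-frequency", fall back to the Linux default. */
    if ( !dt_property_read_u32(dev, "clock-frequency", &clkspec) )
        clkspec = 48000000;   /* 48 MHz, matching 8250_omap.c */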
Andrew Cooper [Thu, 15 Aug 2024 12:18:08 +0000 (13:18 +0100)]
x86/pv: Address Coverity complaint in check_guest_io_breakpoint()
Commit 08aacc392d86 ("x86/emul: Fix misaligned IO breakpoint behaviour in PV
guests") caused a Coverity INTEGER_OVERFLOW complaint based on the reasoning
that width could be 0.
It can't, but digging into the code generation, GCC 8 and later (bisected on
godbolt) choose to emit a CSWITCH lookup table, and because of the range
(bottom 2 bits clear), it's a 16-entry lookup table.
So Coverity's complaint is understandable, given that GCC did emit a (dead)
logic path where width stayed 0.
Rewrite the logic. Introduce x86_bp_width(), which compiles to a single basic
block and replaces the switch() statement. Take the opportunity to also
make start and width loop-scope variables.
No practical change, but it should compile better and placate Coverity.
Fixes: 08aacc392d86 ("x86/emul: Fix misaligned IO breakpoint behaviour in PV guests")
Coverity-ID: 1616152
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
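One way such a helper can collapse to a single basic block (a hedged sketch, not necessarily the shape of the actual x86_bp_width()): the DR7 length-field values 0-3 select breakpoint widths of 1, 2, 8 and 4 bytes, which can be packed into a nibble table.

    /* Sketch: widths 1, 2, 8, 4 packed as nibbles, indexed by the 2-bit field. */
    static unsigned int bp_width(unsigned int len)
    {
        return (0x4821u >> (len * 4)) & 0xf;
    }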
Andrew Cooper [Thu, 24 May 2018 11:39:00 +0000 (12:39 +0100)]
x86/pv: Fix merging of new status bits into %dr6
All #DB exceptions result in an update of %dr6, but this isn't captured in
Xen's handling, and is buggy just about everywhere.
To begin resolving this issue, add a new pending_dbg field to x86_event
(unioned with cr2 to avoid taking any extra space, adjusting users to avoid
old-GCC bugs with anonymous unions), and introduce pv_inject_DB() to replace
the current callers using pv_inject_hw_exception().
Push the adjustment of v->arch.dr6 into pv_inject_event(), and use the new
x86_merge_dr6() rather than the current incorrect logic.
A key property is that pending_dbg is taken with positive polarity to deal
with RTM/BLD sensibly. Most callers pass in a constant, but callers passing
in a hardware %dr6 value need to XOR the value with X86_DR6_DEFAULT to flip to
positive polarity.
This fixes the behaviour of the breakpoint status bits, in that any left pending
are generally discarded when a new #DB is raised. In principle it would fix
RTM/BLD too, except PV guests can't turn these capabilities on to start with.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
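As an illustration of the polarity convention described above (a minimal sketch; the actual x86_merge_dr6() additionally handles discarding stale breakpoint bits):

    #define X86_DR6_DEFAULT 0xffff0ff0UL   /* architectural %dr6 default value */

    /* Callers holding a raw hardware %dr6 flip it to positive polarity before
     * handing it to the merge helper, as described above. */
    static unsigned long to_positive_polarity(unsigned long hw_dr6)
    {
        return hw_dr6 ^ X86_DR6_DEFAULT;
    }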
Andrew Cooper [Mon, 28 May 2018 14:20:18 +0000 (15:20 +0100)]
x86/pv: Introduce x86_merge_dr6() and fix do_debug()
Pretty much everywhere in Xen the logic to update %dr6 when injecting #DB is
buggy. Introduce a new x86_merge_dr6() helper, and start fixing the mess by
adjusting the dr6 merge in do_debug(). Also correct the comment.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 19 Aug 2024 13:32:31 +0000 (15:32 +0200)]
x86emul: correct #UD check for AVX512-FP16 complex multiplications
avx512_vlen_check()'s argument was inverted, while the surrounding
conditional wrongly forced the EVEX.L'L check for the scalar forms when
embedded rounding was in effect.
Fixes: d14c52cba0f5 ("x86emul: handle AVX512-FP16 complex multiplication insns")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 14 Aug 2024 13:40:30 +0000 (15:40 +0200)]
xvmalloc: please Misra C:2012 Rule 8.2
The cloning from xmalloc.h happened long before Misra work started in
earnest, leading to the missing parameter name having been overlooked
later on.
Fixes: 9102fcd9579f ("mm: introduce xvmalloc() et al and use for grant table allocations")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 14 Aug 2024 13:40:06 +0000 (15:40 +0200)]
x86emul: don't call ->read_segment() with x86_seg_none
LAR, LSL, VERR, and VERW emulation involve calling protmode_load_seg()
with x86_seg_none. The fuzzer's read_segment() hook function has an
assertion which triggers in this case. Calling the hook function,
however, makes little sense for those insns, as there's no data to
retrieve. Instead zero-filling the output structure is what properly
corresponds to those insns being invoked with a NUL selector.
While there also add a related comment at the VERR/VERW call site.
Fixes: 06a3b8cd7ad2 ("x86emul: support LAR/LSL/VERR/VERW")
Link: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=70918
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Wed, 14 Aug 2024 13:39:26 +0000 (15:39 +0200)]
x86/spec-ctrl: initialize per-domain XPTI in spec_ctrl_init_domain()
XPTI being a speculation mitigation, it feels better for it to be initialized in
spec_ctrl_init_domain().
No functional change intended, although the call to spec_ctrl_init_domain() in
arch_domain_create() needs to be moved ahead of pv_domain_initialise() for
d->arch.pv.xpti to be correctly set.
Move it ahead of most of the initialization functions, since
spec_ctrl_init_domain() doesn't depend on any member of the struct domain being
set.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 14 Aug 2024 06:50:14 +0000 (08:50 +0200)]
libxl/disk: avoid aliasing of abs()
Tool chains with -Wshadow enabled by default won't like the function
parameter name "abs", for aliasing stdlib.h's abs(). Rename the
parameter to what other similar functions use.
Fixes: a18b50614d97 ("libxl: Enable stubdom cdrom changing")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Jan Beulich [Wed, 14 Aug 2024 06:47:58 +0000 (08:47 +0200)]
mm: introduce xvmalloc() et al and use for grant table allocations
All of the array allocations in grant_table_init() can exceed a page's
worth of memory, which xmalloc()-based interfaces aren't really suitable
for after boot. We also don't need any of these allocations to be
physically contiguous. Introduce interfaces dynamically switching
between xmalloc() et al and vmalloc() et al, based on requested size,
and use them instead.
All the wrappers in the new header are cloned mostly verbatim from
xmalloc.h, with the sole adjustment to switch unsigned long to size_t
for sizes and to unsigned int for alignments, and with the cloning of
x[mz]alloc_bytes() avoided. The exception is xvmemdup(), where the
const related comment on xmemdup() is actually addressed and hence
dropped.
While adjusting grant_table_destroy() also move ahead the clearing of
the struct domain field.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
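As a sketch of the size-based dispatch described above (the helper name is made up; the real interfaces are the type-safe xv*() wrappers cloned from xmalloc.h):

    /* Small requests stay on xmalloc() (physically contiguous), larger ones
     * go to vmalloc(), i.e. the switch is purely based on requested size. */
    static void *xv_alloc_sketch(size_t size, unsigned int align)
    {
        if ( size <= PAGE_SIZE )
            return _xmalloc(size, align);
        return vmalloc(size);
    }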
Use expect to invoke QEMU so that we can terminate the test as soon as
we get the right string in the output instead of waiting until the
final timeout.
For the timeout, instead of hardcoding the value, use a Gitlab CI
variable "QEMU_TIMEOUT" that can be changed depending on the latest
status of the Gitlab CI runners.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Julien Grall [Tue, 6 Aug 2024 12:48:15 +0000 (13:48 +0100)]
xen/arm64: entry: Actually skip do_trap_*() when an SError is triggered
For SErrors, we support two configurations:
* Every SError will result in a panic in Xen
* We will forward SErrors triggered by a VM back to itself
For the latter case, we want to skip the call to do_trap_*() because the PC
was already adjusted.
However, the alternative used to decide between the two configurations
is inverted. This would result in the VM corrupting itself if:
* x19 is non-zero in the panic case
* the PC is advanced too much in the second case
Solve the issue by switching from alternative_if to alternative_if_not.
Fixes: a458d3bd0d25 ("xen/arm: entry: Ensure the guest state is synced when receiving a vSError")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Tue, 13 Aug 2024 11:49:39 +0000 (13:49 +0200)]
Arm: correct FIXADDR_TOP
While reviewing a RISC-V patch cloning the Arm code, I noticed an
off-by-1 here: with FIX_PMAP_{BEGIN,END} being an inclusive range and
FIX_LAST being the same as FIX_PMAP_END, FIXADDR_TOP cannot be derived from
FIX_LAST alone, or else the BUG_ON() in virt_to_fix() would trigger if
FIX_PMAP_END ended up being used.
While touching this area, also add a check that the fixmap and boot FDT areas
not only don't overlap, but also have at least one (unmapped) page in
between.
Fixes: 4f17357b52f6 ("xen/arm: add Persistent Map (PMAP) infrastructure")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
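As an illustration of the off-by-1 (a hedged sketch, not necessarily the exact final expression): with FIX_LAST naming the last inclusive slot, the top of the fixmap has to cover FIX_LAST + 1 slots.

    #define FIXADDR_TOP (FIXADDR_START + (FIX_LAST + 1) * PAGE_SIZE)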
Jan Beulich [Tue, 13 Aug 2024 14:41:25 +0000 (16:41 +0200)]
x86emul: fix UB multiplications in S/G handling
The conversion of the shifts to multiplications by the commits tagged
below still wasn't quite right: The multiplications (of signed values)
can overflow, too. As of 298556c7b5f8 ("x86emul: correct 32-bit address
handling for AVX2 gathers") signed multiplication wasn't necessary
anymore, though: The necessary sign-extension (if any) will happen as
well when using intermediate variables of unsigned long types, and
excess address bits are chopped off by truncate_ea().
Fixes: b6a907f8c83d ("x86emul: replace UB shifts")
Fixes: 21de9680eb59 ("x86emul: replace further UB shifts")
Oss-fuzz: 71138
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
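As a sketch of the arithmetic change (names are made up): doing the scaling on unsigned long avoids signed-overflow UB, while any needed sign extension of a 32-bit index still happens through the conversion, and excess address bits are assumed to be masked by a truncate_ea()-style step afterwards.

    #include <stdint.h>

    static unsigned long scale_index(int32_t idx, unsigned int scale)
    {
        /* (long) sign-extends; the multiplication itself is done unsigned. */
        return (unsigned long)(long)idx * scale;
    }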
Oleksii Kurochko [Tue, 13 Aug 2024 14:39:43 +0000 (16:39 +0200)]
xen/riscv: enable CONFIG_HAS_DEVICE_TREE
Enable the build of generic functionality for working with device
trees for RISC-V.
Also, a collection of functions for parsing the memory map and other
boot information from a device tree is now available.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
hvmloader's cpuid() implementation deviates from Xen's in that the value
passed in ecx is unspecified. This means that when used on leaves that
implement subleaves, it's unspecified which one you get; though it's more
than likely an invalid one.
Import Xen's implementation so there are no surprises.
Fixes: 318ac791f9f9 ("Add utilities needed for SMBIOS generation to hvmloader")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
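As a sketch of a cpuid wrapper that pins the subleaf explicitly, in the spirit of the imported implementation (names here are illustrative, not necessarily hvmloader's):

    #include <stdint.h>

    static inline void cpuid_count(uint32_t leaf, uint32_t subleaf,
                                   uint32_t *eax, uint32_t *ebx,
                                   uint32_t *ecx, uint32_t *edx)
    {
        /* ECX is always loaded, so subleaved leaves return deterministic data. */
        asm volatile ( "cpuid"
                       : "=a" (*eax), "=b" (*ebx), "=c" (*ecx), "=d" (*edx)
                       : "a" (leaf), "c" (subleaf) );
    }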
Sergiy Kibrik [Tue, 13 Aug 2024 14:38:40 +0000 (16:38 +0200)]
x86/vmx: guard access to cpu_has_vmx_* in common code
There are several places in common code, outside of arch/x86/hvm/vmx,
where cpu_has_vmx_* get accessed without first checking whether VMX is supported.
These macros rely on global variables defined in vmx code, so when VMX support
is disabled, accesses to these variables turn into build failures.
To overcome these failures, a build-time check is done before accessing the
global variables, so that DCE would remove these variables.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Paul Durrant <paul@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Oleksii Kurochko [Tue, 13 Aug 2024 14:38:15 +0000 (16:38 +0200)]
xen/riscv: enable GENERIC_BUG_FRAME
Enable GENERIC_BUG_FRAME to support BUG(), WARN(), ASSERT,
and run_in_exception_handler().
"UNIMP" is used for BUG_INSTR, which, when macros from <xen/bug.h>
are used, triggers an exception with the ILLEGAL_INSTRUCTION cause.
This instruction is encoded as a 2-byte instruction when
CONFIG_RISCV_ISA_C is enabled: ffffffffc0046ba0: 0000 unimp
and is encoded as a 4-byte instruction when CONFIG_RISCV_ISA_C
isn't enabled: ffffffffc005a460: c0001073 unimp
Using 'ebreak' as BUG_INSTR does not guarantee proper handling of macros
from <xen/bug.h>. If a debugger inserts a breakpoint (using the 'ebreak'
instruction) at a location where Xen already uses 'ebreak', it
creates ambiguity. Xen cannot distinguish whether the 'ebreak'
instruction is inserted by the debugger or is part of Xen's own code.
Remove the BUG_INSN_32 and BUG_INSN_16 macros as they encode the ebreak
instruction, which is no longer used for BUG_INSN.
Update the comment above the definition of INS_LENGTH_MASK, as the 'unimp'
instruction is now used instead of the 'ebreak' instruction.
<xen/lib.h> is included because panic() and printk() are
used in common/bug.c, and the RISC-V build fails if it is not included.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 13 Aug 2024 14:37:25 +0000 (16:37 +0200)]
x86/pass-through: documents as security-unsupported when sharing resources
When multiple devices share resources and one of them is to be passed
through to a guest, security of the entire system and of respective
guests individually cannot really be guaranteed without knowing
internals of any of the involved guests. Therefore such a configuration
cannot really be security-supported, yet making that explicit was so far
missing.
This is XSA-461 / CVE-2024-31146.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com>
Teddy Astie [Tue, 13 Aug 2024 14:36:40 +0000 (16:36 +0200)]
x86/IOMMU: move tracking in iommu_identity_mapping()
If for some reason xmalloc() fails after having mapped the reserved
regions, an error is reported, but the regions remain mapped in the P2M.
Similarly if an error occurs during set_identity_p2m_entry() (except on
the first call), the partial mappings of the region would be retained
without being tracked anywhere, and hence without there being a way to
remove them again from the domain's P2M.
Move the setting up of the list entry ahead of trying to map the region.
In cases other than the first mapping failing, keep a record of the full
region, such that the mappings can be properly torn down by a subsequent
unmapping request.
To compensate for the potentially excess unmapping requests, don't log a
warning from p2m_remove_identity_entry() when there really was nothing
mapped at a given GFN.
This is XSA-460 / CVE-2024-31145.
Fixes: 2201b67b9128 ("VT-d: improve RMRR region handling")
Fixes: c0e19d7c6c42 ("IOMMU: generalize VT-d's tracking of mapped RMRR regions")
Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
The Yocto jobs take a long time to run. We are changing the Gitlab ARM64
runners, and the new runners might not be able to finish the Yocto jobs
in a reasonable time.
For now, disable the Yocto jobs by turning them into "manual" trigger jobs
(they need to be manually executed).
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
docs: Introduce Fusa Requirement and define maintainers
The FUSA folder is expected to contain requirements and other documents
to enable safety certification of the Xen hypervisor.
Added an intro to explain how the requirements are categorized and written,
and their support status.
Also, added index.rst for inclusion in the docs build.
Jan Beulich [Thu, 8 Aug 2024 11:27:25 +0000 (13:27 +0200)]
x86emul: adjust 2nd param of idiv_dbl()
-LONG_MIN cannot be represented in a long and hence is UB, for being one
larger than LONG_MAX.
With the caller passing an unsigned long and the 1st param also being (an array
of) unsigned long, change the 2nd param accordingly while adding the
sole necessary cast. This was the original form of the function anyway.
Jan Beulich [Thu, 8 Aug 2024 11:26:38 +0000 (13:26 +0200)]
x86emul: avoid UB shift in AVX512 VPMOV* handling
For widening and narrowing moves, operand (vector) size is calculated
from a table. This calculation, for the AVX512 cases, lives ahead of
validation of EVEX.L'L (which cannot be 3 without raising #UD). Account
for the later checking by adjusting the constants in the expression such
that even EVEX.L'L == 3 will yield a non-UB shift (read: shift count
reliably >= 0).
Fixes: 3988beb08 ("x86emul: support AVX512{F,BW} zero- and sign-extending moves")
Oss-fuzz: 70914
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
automation: fix eclair gitlab jobs for merge requests
The "eclair" script calls action_push.sh even for merge requests, while
action_pull_request.sh should be called instead, resulting in a job
failure with this error:
Unexpected event pull_request
Fix the script to call action_pull_request.sh appropriately.
Non-PCI platform devices may use the ITS. Dom0 Linux drivers for such
devices are failing to register IRQs due to a missing #msi-cells
property. Add the missing #msi-cells property.
Shawn Anastasio [Tue, 6 Aug 2024 11:41:14 +0000 (13:41 +0200)]
xen/common: Move Arm's bootfdt.c to common
Move Arm's bootfdt.c to xen/common so that it can be used by other
device tree architectures like PPC and RISCV.
Remove stubs for process_shm_node() and early_print_info_shmem()
from xen/arch/arm/include/asm/static-shmem.h.
These stubs are removed to avoid introducing them for architectures
that do not support CONFIG_STATIC_SHM.
The process_shm_node() stub is now implemented in common code to
maintain the current behavior of early_scan_node() on ARM.
The early_print_info_shmem() stub is only used in early_print_info(),
so it's now guarded with #ifdef CONFIG_STATIC_SHM ... #endif.
Shawn Anastasio [Tue, 6 Aug 2024 11:41:13 +0000 (13:41 +0200)]
xen/device-tree: Move Arm's setup.c bootinfo functions to common
Arm's setup.c contains a collection of functions for parsing memory map
and other boot information from a device tree. Since these routines are
generally useful on any architecture that supports device tree booting,
move them into xen/common/device-tree.
Also, common/device_tree.c has been moved to the device-tree folder with
the corresponding updates to common/Makefile and common/device-tree/Makefile.
The mention of arm32 is changed to CONFIG_SEPARATE_XENHEAP in comparison with
the original Arm code, as the code is now moved into common code.
Michal Orzel [Wed, 10 Jul 2024 11:22:04 +0000 (13:22 +0200)]
xen/arm: bootfdt: Fix device tree memory node probing
Memory node probing is done as part of early_scan_node() that is called
for each node with depth >= 1 (the root node is at depth 0). According to the
Devicetree Specification v0.4, chapter 3.4, a /memory node can only exist
as a top-level node. However, Xen incorrectly considers all the nodes with
the unit node name "memory" as RAM. This buggy behavior can result in a
failure if there are other nodes in the device tree (at depth >= 2) with
"memory" as the unit node name. An example can be a "memory@xxx" node under
/reserved-memory. Fix it by introducing device_tree_is_memory_node() to
perform all the required checks to assess whether a node is a proper /memory
node.
Fixes: 3e99c95ba1c8 ("arm, device tree: parse the DTB for RAM location and size")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Julien Grall <julien@xen.org>
Michal Orzel [Thu, 4 Jul 2024 07:54:19 +0000 (09:54 +0200)]
xen/arm: dom0less: Add #redistributor-regions property to GICv3 node
A dom0less domain using the host memory layout may use more than one
re-distributor region (d->arch.vgic.nr_regions > 1). In that case Xen
will add them in a "reg" property of the GICv3 domU node. The guest needs to
know how many regions to search for, and therefore the GICv3 dt binding
[1] specifies that the "#redistributor-regions" property is required if more
than one redistributor region is present. However, Xen does not add this
property, which makes the guest believe there is just one such region. This
can lead to a guest boot failure when doing GIC SMP initialization. Fix it
by adding this property, which matches what we do for hwdom.
Sergiy Kibrik [Tue, 6 Aug 2024 06:35:09 +0000 (08:35 +0200)]
x86/vpmu: guard calls to vmx/svm functions
If VMX/SVM is disabled in the build, we may still want to have vPMU drivers for
PV guests. Yet in such a case, before using VMX/SVM features and functions, we
have to explicitly check whether they're available in the build. For this
purpose (and also not to complicate conditionals) two helpers are introduced --
is_{vmx,svm}_vcpu(v) -- that check both the HVM and VMX/SVM conditions at the
same time, and they replace the is_hvm_vcpu(v) macro in the Intel/AMD PMU
drivers.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Mon, 5 Aug 2024 08:18:05 +0000 (10:18 +0200)]
x86/dom0: delay setting SMAP after dom0 build is done
Delay setting X86_CR4_SMAP on the BSP until the domain building is done, so
that there's no need to disable SMAP. Note however that SMAP is enabled for
the APs on bringup, as the domain builder code strictly runs on the BSP. Delaying
the setting for the APs would mean having to do a callfunc IPI later in order
to set it on all the APs.
The fixes tag is to account for the wrong usage of cpu_has_smap in
create_dom0(), it should instead have used
boot_cpu_has(X86_FEATURE_XEN_SMAP).
While there also make cr4_pv32_mask __ro_after_init.
Fixes: 493ab190e5b1 ('xen/sm{e, a}p: allow disabling sm{e, a}p for Xen itself')
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
The current logic to choose the preferred reboot method is based on the mode Xen
has been booted into, so if the box is booted from UEFI, the preferred reboot
method will be to use the ResetSystem() run time service call.
However, that method seems to be widely untested, and quite often leads to a
result similar to:
****************************************
Panic on CPU 0:
FATAL TRAP: vector = 6 (invalid opcode)
****************************************
which in most cases does lead to a reboot; however, that's unreliable.
Change the default reboot preference to prefer ACPI over UEFI if available and
not in reduced hardware mode.
This is in line with what Linux does, so it's unlikely to cause issues on current
and future hardware, since there's a much higher chance of vendors testing
hardware with Linux rather than Xen.
Add a special case for one Acer model that does require being rebooted using
ResetSystem(). See Linux commit 0082517fa4bce for rationale.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
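A Linux-style sketch of what such a per-model exception can look like (struct layout, callback signature and match strings are assumptions based on the referenced Linux commit, not the actual Xen patch):

    static int prefer_efi_reset(const struct dmi_system_id *d)
    {
        reboot_type = BOOT_EFI;   /* keep ResetSystem() for this model */
        return 0;
    }

    static const struct dmi_system_id quirk_table[] = {
        {
            .callback = prefer_efi_reset,
            .ident = "Acer TravelMate X514-51T",   /* see Linux 0082517fa4bce */
            .matches = {
                DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
                DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate X514-51T"),
            },
        },
        { }
    };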
Add MISRA C rules 13.2 and 18.2 to rules.rst. Both rules have zero
violations reported by Eclair, but they have some cautions. We accept
both rules, and for now we'll enable scanning for them in Eclair, but only
violations will cause the Gitlab CI job to fail (cautions will not).
automation/eclair_analysis: add Rule 18.6 to the clean guidelines
MISRA C Rule 18.6 states: "The address of an object with automatic
storage shall not be copied to another object that persists after
the first object has ceased to exist."
The rule is set as monitored and tagged clean, in order to block
the CI on any violations that may arise, allowing the presence
of cautions (currently there are no violations).
Matthew Barnes [Fri, 2 Aug 2024 06:43:57 +0000 (08:43 +0200)]
tools/lsevtchn: Use errno macro to handle hypercall error cases
Currently, lsevtchn aborts its event channel enumeration when it hits
an event channel that is owned by Xen.
lsevtchn does not distinguish between different hypercall errors, which
results in lsevtchn missing potentially relevant event channels with
higher port numbers.
Use the errno macro to distinguish between hypercall errors, and
continue event channel enumeration if the hypercall error is not
critical to enumeration.
Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
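A self-contained sketch of the enumeration pattern (not lsevtchn itself; which errno values are treated as fatal is an assumption here):

    #include <errno.h>
    #include <stdio.h>
    #include <xenctrl.h>

    int main(void)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        int port;

        if ( !xch )
            return 1;

        for ( port = 0; port < 4096; port++ )
        {
            xc_evtchn_status_t status = { .dom = 0, .port = port };

            if ( xc_evtchn_status(xch, &status) < 0 )
            {
                if ( errno == EINVAL )   /* assumed fatal: stop the walk */
                    break;
                continue;                /* e.g. a Xen-owned port: skip it */
            }
            if ( status.status != EVTCHNSTAT_closed )
                printf("port %d: status %u\n", port, (unsigned int)status.status);
        }

        xc_interface_close(xch);
        return 0;
    }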
George Dunlap [Fri, 2 Aug 2024 06:42:09 +0000 (08:42 +0200)]
xen/hvm: Don't skip MSR_READ trace record
Commit 37f074a3383 ("x86/msr: introduce guest_rdmsr()") introduced a
function to combine the MSR_READ handling between PV and HVM.
Unfortunately, by returning directly, it skipped the trace generation,
leading to gaps in the trace record, as well as xenalyze errors like
this:
hvm_generic_postprocess: d2v0 Strange, exit 7c(VMEXIT_MSR) missing a handler
Replace the `return` with `goto out`.
Fixes: 37f074a3383 ("x86/msr: introduce guest_rdmsr()")
Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Thu, 1 Aug 2024 11:57:52 +0000 (13:57 +0200)]
x86/vmx: replace CONFIG_HVM with CONFIG_INTEL_VMX in vmx.h
Now that we have a separate config option for VMX, which itself depends on
CONFIG_HVM, we need to use it to provide vmx_pi_hooks_{assign,deassign}
stubs for the case when VMX is disabled while HVM is enabled.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com>
x86/PV: guard svm specific functions with using_svm() check
Replace the cpu_has_svm check with using_svm(), so that not only is SVM support
in the CPU checked at runtime, but we also ensure at build time the availability
of the functions svm_load_segs() and svm_load_segs_prefetch().
x86/traps: guard vmx specific functions with using_vmx() check
Replace the cpu_has_vmx check with using_vmx(), so that not only is VMX support
in the CPU checked at runtime, but we also ensure at build time the availability
of the functions vmx_vmcs_enter() & vmx_vmcs_exit().
Also, since CONFIG_VMX is checked in using_vmx() and it depends on CONFIG_HVM,
we can drop the #ifdef CONFIG_HVM lines around using_vmx().
x86/p2m: guard EPT functions with using_vmx() check
Replace the cpu_has_vmx check with using_vmx(), so that DCE would remove the
calls to the functions ept_p2m_init() and ept_p2m_uninit() in a non-VMX build.
Since the Intel EPT implementation currently depends on the CONFIG_INTEL_VMX
config option, these functions are unavailable when VMX is off.
Sergiy Kibrik [Thu, 1 Aug 2024 11:55:39 +0000 (13:55 +0200)]
x86: introduce using_{svm,vmx}() helpers
As we now have the AMD_SVM/INTEL_VMX config options for enabling/disabling these
features completely in the build, we need some build-time checks to ensure that
the vmx/svm code can be used and things compile. The macros cpu_has_{svm,vmx}
used to be doing such checks at runtime, however they do not check whether
SVM/VMX support is enabled in the build.
Also, cpu_has_{svm,vmx} can potentially be called from a non-{VMX,SVM} build
running on a {VMX,SVM}-enabled CPU, and so would correctly indicate that VMX/SVM
is indeed supported by the CPU, even though the code to drive it can't be used.
The new routines using_{vmx,svm}() indicate that both the CPU _and_ the build
provide the corresponding technology support, while cpu_has_{vmx,svm} still
remains for informational runtime purposes, just as their naming suggests.
These new helpers are used right away in several sites, namely to guard calls to
start_nested_{svm,vmx} and start_{svm,vmx} to fix the build when INTEL_VMX=n or
AMD_SVM=n.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
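As a sketch of the helpers' shape (the names come from the commit message; the exact definitions are assumed), combining the build-time and runtime checks:

    /* Assumes xen/kconfig.h (IS_ENABLED) and the usual cpu_has_* macros. */
    static inline bool using_vmx(void)
    {
        return IS_ENABLED(CONFIG_INTEL_VMX) && cpu_has_vmx;
    }

    static inline bool using_svm(void)
    {
        return IS_ENABLED(CONFIG_AMD_SVM) && cpu_has_svm;
    }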
Sergiy Kibrik [Thu, 1 Aug 2024 11:55:08 +0000 (13:55 +0200)]
x86: introduce CONFIG_ALTP2M Kconfig option
Add a new option to make altp2m code inclusion optional.
Currently altp2m is implemented for Intel EPT only, so the option is dependent
on VMX. Also, the prompt itself depends on EXPERT=y, so that the option is
available for fine-tuning, if one wants to play around with it.
Use this option instead of the more generic CONFIG_HVM option.
That implies the possibility of building HVM code without altp2m support,
hence we need to declare the altp2m routines for the HVM code to compile
successfully (altp2m_vcpu_initialise(), altp2m_vcpu_destroy(),
altp2m_vcpu_enable_ve()).
Also guard the altp2m routines, so that they can be disabled completely in the
build -- when the target platform does not actually support altp2m
(AMD-V & ARM as of now).
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
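As a sketch of the declare-or-stub pattern implied above (the routine names come from the commit message; their signatures and return values are assumptions):

    #ifdef CONFIG_ALTP2M
    void altp2m_vcpu_initialise(struct vcpu *v);
    void altp2m_vcpu_destroy(struct vcpu *v);
    int altp2m_vcpu_enable_ve(struct vcpu *v, gfn_t gfn);
    #else
    static inline void altp2m_vcpu_initialise(struct vcpu *v) {}
    static inline void altp2m_vcpu_destroy(struct vcpu *v) {}
    static inline int altp2m_vcpu_enable_ve(struct vcpu *v, gfn_t gfn)
    {
        return -EOPNOTSUPP;   /* assumed stub behaviour when altp2m is off */
    }
    #endif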
Sergiy Kibrik [Thu, 1 Aug 2024 11:54:23 +0000 (13:54 +0200)]
x86/monitor: guard altp2m usage
Explicitly check whether altp2m is on for the domain when getting the altp2m
index. If the explicit call to altp2m_active() always returns false, DCE will
remove the call to altp2m_vcpu_idx().
p2m_get_mem_access() expects 0 as the altp2m_idx parameter when altp2m is not
active or not supported, so 0 is used as a fallback value then.
The purpose of this is to later be able to disable altp2m support and
exclude its code from the build completely, when it is not supported by the
target platform (as of now it's supported for VT-x only).
Also, all other calls to altp2m_vcpu_idx() are guarded by altp2m_active(), so
this change puts the usage of this routine in line with the rest of the code.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
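The guarded lookup boils down to something like the line below (a sketch, not the exact hunk):

    unsigned int altp2m_idx = altp2m_active(d) ? altp2m_vcpu_idx(v) : 0;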
x86: introduce AMD-V and Intel VT-x Kconfig options
Introduce two new Kconfig options, AMD_SVM and INTEL_VMX, to allow code
specific to each virtualization technology to be separated and, when not
required, stripped.
CONFIG_AMD_SVM will be used to enable virtual machine extensions on platforms
that implement the AMD Virtualization Technology (AMD-V).
CONFIG_INTEL_VMX will be used to enable virtual machine extensions on platforms
that implement the Intel Virtualization Technology (Intel VT-x).
Both features depend on HVM support.
Since, at this point, disabling any of them would cause Xen to not compile,
the options are enabled by default if HVM and are not selectable by the user.
Sergiy Kibrik [Thu, 1 Aug 2024 07:41:03 +0000 (09:41 +0200)]
x86/cpufreq: separate powernow/hwp/acpi cpufreq code
Build the AMD Architectural P-state driver when CONFIG_AMD is on, and the
Intel Hardware P-States driver together with the ACPI Processor P-States driver
when CONFIG_INTEL is on, allowing for a platform-specific build.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Thu, 1 Aug 2024 07:40:12 +0000 (09:40 +0200)]
x86/cpufreq: move ACPI cpufreq driver into separate file
Separate the ACPI driver from the generic cpufreq initialization code.
This way acpi-cpufreq can become optional in the future and be disabled
in non-Intel builds.
No changes to the code were introduced, except:
* an acpi_cpufreq_register() helper was added
* the list of included headers was cleaned up
* the license was transformed into an SPDX line
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/fpu: Create a typedef for the x87/SSE area inside "struct xsave_struct"
Making the union non-anonymous would cause a lot of headaches, because a lot of
code relies on it being so, but it's possible to make a typedef of the anonymous
union so all callsites currently relying on typeof() can stop doing so directly.
This commit creates a `fpusse_t` typedef to the anonymous union at the head of
the XSAVE area and uses it instead of typeof().
No functional change.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Acked-by: Jan Beulich <jbeulich@suse.com>
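A self-contained illustration of the typeof() trick with made-up names (the real fpusse_t names the union at the head of struct xsave_struct):

    #include <stdint.h>

    struct area {
        union {                      /* the union type itself carries no tag */
            char raw[512];
            struct { uint16_t fcw, fsw; } regs;
        } head;
    };

    /* Name the otherwise anonymous union type exactly once... */
    typedef typeof(((struct area *)0)->head) head_t;

    /* ...so callers no longer need to spell out typeof() themselves. */
    static head_t *get_head(struct area *a) { return &a->head; }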
Sergiy Kibrik [Thu, 1 Aug 2024 07:38:00 +0000 (09:38 +0200)]
x86/intel: optional build of TSX support
Transactional Synchronization Extensions are supported on certain Intel
CPUs only, hence the support can be put under the CONFIG_INTEL build option.
The whole of TSX support, even if supported by the CPU, may need to be disabled
via options, by microcode, or through spec-ctrl, depending on a set of specific
conditions. To make sure nothing gets accidentally runtime-broken, all
modifications of the global TSX configuration variables are secured by #ifdef's,
while the variables themselves are redefined to 0, so that they can't mistakenly
be written to.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com>
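As a sketch of the #ifdef pattern described above (opt_tsx is the existing control variable; the exact guard used by the patch is an assumption):

    #ifdef CONFIG_INTEL
    extern int8_t opt_tsx;    /* runtime-controllable */
    #else
    #define opt_tsx 0         /* constant: stray writes fail to compile */
    #endif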
Andrew Cooper [Wed, 31 Jul 2024 19:05:21 +0000 (20:05 +0100)]
x86/domain: Fix domlist_insert() updating the domain hash
A last-minute review request was to dedup the expression calculating the
domain hash bucket.
While the code reads correctly, it is buggy because rcu_assign_pointer() is a
deeply misleading API, assigning by name not by value, and - contrary to its
name - does not hide an indirection.
Therefore, rcu_assign_pointer(bucket, d); updates the local bucket variable on
the stack, not domain_hash[], causing all subsequent domid lookups to fail.
Rework the logic to use pd in the same way that domlist_remove() does.
Fixes: 19995bc70cc6 ("xen/domain: Factor domlist_{insert,remove}() out of domain_{create,destroy}()")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
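An illustration of the pitfall (variable names are made up): rcu_assign_pointer(p, v) boils down to a write barrier followed by an assignment to the lvalue named p, with no extra indirection.

    struct domain *bucket = domain_hash[h];   /* local copy of the slot */

    rcu_assign_pointer(bucket, d);            /* only the local copy changes */
    rcu_assign_pointer(domain_hash[h], d);    /* this is what updates the table */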
xen/riscv: fix build issue for bullseye-riscv64 container
Address compilation error on bullseye-riscv64 container:
undefined reference to `guest_physmap_remove_page`
Since there is no current implementation of `guest_physmap_remove_page()`,
a stub function has been added.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
x86/e820: address violations of MISRA C:2012 Rule 5.3
This addresses violations of MISRA C:2012 Rule 5.3 which states as
following: An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope. Right here the conflict is with
the global named "e820".
No functional change.
Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/sched: fix error handling in cpu_schedule_up()
In case cpu_schedule_up() is failing, it needs to undo all externally
visible changes it has done before.
Reason is that cpu_schedule_callback() won't be called with the
CPU_UP_CANCELED notifier in case cpu_schedule_up() did fail.
Fixes: 207589dbacd4 ("xen/sched: move per cpu scheduler private data into struct sched_resource")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Xen's bitops.h consists of several Linux's headers:
* linux/arch/include/asm/bitops.h:
* The following functions were removed as they aren't used in Xen:
* test_and_set_bit_lock
* clear_bit_unlock
* __clear_bit_unlock
* The following functions were renamed to match the way they are
used by common code:
* __test_and_set_bit
* __test_and_clear_bit
* The declaration and implementation of the following functions
were updated to make the Xen build happy:
* clear_bit
* set_bit
* __test_and_clear_bit
* __test_and_set_bit
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
The following generic functions were introduced:
* test_bit
* generic__test_and_set_bit
* generic__test_and_clear_bit
* generic__test_and_change_bit
These functions and macros can be useful for architectures
that don't have corresponding arch-specific instructions.
Also, the patch introduces the following generics which are
used by the functions mentioned above:
* BITOP_BITS_PER_WORD
* BITOP_MASK
* BITOP_WORD
* BITOP_TYPE
The following approach was chosen for generic*() and arch*() bit
operation functions:
If the bit operation function that is going to be generic starts
with the prefix "__", then the corresponding generic/arch function
will also contain the "__" prefix. For example:
* test_bit() will be defined using arch_test_bit() and
generic_test_bit().
* __test_and_set_bit() will be defined using
arch__test_and_set_bit() and generic__test_and_set_bit().
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
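As a sketch of one of the generic helpers, built on the BITOP_* macros listed above (the real code parameterizes the word type; plain unsigned int is used here for brevity):

    #define BITOP_BITS_PER_WORD 32
    #define BITOP_MASK(nr)      (1U << ((nr) % BITOP_BITS_PER_WORD))
    #define BITOP_WORD(nr)      ((nr) / BITOP_BITS_PER_WORD)

    /* Non-atomic test-and-set: no atomicity guarantees, callers serialize. */
    static inline bool generic__test_and_set_bit(int nr, volatile void *addr)
    {
        unsigned int mask = BITOP_MASK(nr);
        volatile unsigned int *p = (volatile unsigned int *)addr + BITOP_WORD(nr);
        unsigned int old = *p;

        *p = old | mask;
        return old & mask;
    }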
The current code in ALT_CALL_ARG() won't successfully work around the clang
code-generation issue if the arg parameter has a size that's not a power of 2.
While there are no such sized parameters at the moment, improve the workaround
to also be effective when such sizes are used.
Instead of using a union with a long, use an unsigned long that's first
initialized to 0 and afterwards set to the argument value.
Reported-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Suggested-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
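As a sketch of just the widening step (the real ALT_CALL_ARG() macro additionally binds the result to a specific register):

    /* Zero the full register-width value first, then overlay the argument,
     * whatever its (possibly non-power-of-2) size. */
    #define WIDEN_ARG(arg) ({                 \
        unsigned long tmp_ = 0;               \
        *(typeof(arg) *)&tmp_ = (arg);        \
        tmp_;                                 \
    })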
x86/dom0: fix restoring %cr3 and the mapcache override on PV build error
One of the error paths in the PV dom0 builder section that runs on the guest
page-tables wasn't restoring the Xen value of %cr3, nor removing the
mapcache override.
Fixes: 079ff2d32c3d ('libelf-loader: introduce elf_load_image') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 31 Jul 2024 10:39:35 +0000 (12:39 +0200)]
public/x86: don't include common xen.h from arch-specific one
No other arch-*.h does so, and arch-x86/xen.h really just takes the role
of arch-x86_32.h and arch-x86_64.h (by those two forwarding there). With
xen.h itself including the per-arch headers, doing so is also kind of
backwards anyway, and just calling for problems. There's exactly one
place where arch-x86/xen.h is included when really xen.h is meant (for
wanting XEN_GUEST_HANDLE_64() to be made available, the default
definition of which lives in the common xen.h).
This then addresses a violation of Misra C:2012 Directive 4.10
("Precautions shall be taken in order to prevent the contents of a
header file being included more than once").