Jan Beulich [Thu, 8 Aug 2024 11:27:25 +0000 (13:27 +0200)]
x86emul: adjust 2nd param of idiv_dbl()
-LONG_MIN cannot be represented in a long and hence is UB, for being one
larger than LONG_MAX.
The caller passing an unsigned long and the 1st param also being (array
of) unsigned long, change the 2nd param accordingly while adding the
sole necessary cast. This was the original form of the function anyway.
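A minimal sketch of why an unsigned parameter avoids the problem. The helper name magnitude() is hypothetical and is not taken from the emulator; it only shows that negation done in unsigned arithmetic never evaluates -LONG_MIN.

```c
#include <limits.h>

/*
 * Magnitude of a signed value, computed without evaluating -LONG_MIN.
 * The conversion to unsigned long is well defined, and unsigned
 * subtraction wraps modulo 2^N instead of overflowing.
 */
unsigned long magnitude(long v)
{
    unsigned long u = (unsigned long)v;

    return v < 0 ? 0UL - u : u;
}
```

With this shape, magnitude(LONG_MIN) yields LONG_MAX + 1 as an unsigned long, a value that a signed long cannot hold.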
Jan Beulich [Thu, 8 Aug 2024 11:26:38 +0000 (13:26 +0200)]
x86emul: avoid UB shift in AVX512 VPMOV* handling
For widening and narrowing moves, operand (vector) size is calculated
from a table. This calculation, for the AVX512 cases, lives ahead of
validation of EVEX.L'L (which cannot be 3 without raising #UD). Account
for the later checking by adjusting the constants in the expression such
that even EVEX.L'L == 3 will yield a non-UB shift (read: shift count
reliably >= 0).
Fixes: 3988beb08 ("x86emul: support AVX512{F,BW} zero- and sign-extending moves")
Oss-fuzz: 70914 Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
automation: fix eclair gitlab jobs for merge requests
The "eclair" script calls action_push.sh even for merge requests, while
instead action_pull_request.sh should be called, resulting in a job
failure with this error:
Unexpected event pull_request
Fix the script to call action_pull_request.sh appropriately.
Non-PCI platform devices may use the ITS. Dom0 Linux drivers for such
devices are failing to register IRQs due to a missing #msi-cells
property. Add the missing #msi-cells property.
Shawn Anastasio [Tue, 6 Aug 2024 11:41:14 +0000 (13:41 +0200)]
xen/common: Move Arm's bootfdt.c to common
Move Arm's bootfdt.c to xen/common so that it can be used by other
device tree architectures like PPC and RISCV.
Remove stubs for process_shm_node() and early_print_info_shmem()
from $xen/arch/arm/include/asm/static-shmem.h.
These stubs are removed to avoid introducing them for architectures
that do not support CONFIG_STATIC_SHM.
The process_shm_node() stub is now implemented in common code to
maintain the current behavior of early_scan_node() on ARM.
The early_print_info_shmem() stub is only used in early_print_info(),
so it's now guarded with #ifdef CONFIG_STATIC_SHM ... #endif.
Shawn Anastasio [Tue, 6 Aug 2024 11:41:13 +0000 (13:41 +0200)]
xen/device-tree: Move Arm's setup.c bootinfo functions to common
Arm's setup.c contains a collection of functions for parsing memory map
and other boot information from a device tree. Since these routines are
generally useful on any architecture that supports device tree booting,
move them into xen/common/device-tree.
Also, common/device_tree.c has been moved to the device-tree folder with
the corresponding updates to common/Makefile and common/device-tree/Makefile.
Mentions of arm32 are changed to CONFIG_SEPARATE_XENHEAP compared with the
original Arm code, as the code now lives in common code.
Michal Orzel [Wed, 10 Jul 2024 11:22:04 +0000 (13:22 +0200)]
xen/arm: bootfdt: Fix device tree memory node probing
Memory node probing is done as part of early_scan_node() that is called
for each node with depth >= 1 (root node is at depth 0). According to
Devicetree Specification v0.4, chapter 3.4, /memory node can only exist
as a top level node. However, Xen incorrectly considers all the nodes with
unit node name "memory" as RAM. This buggy behavior can result in a
failure if there are other nodes in the device tree (at depth >= 2) with
"memory" as unit node name. An example can be a "memory@xxx" node under
/reserved-memory. Fix it by introducing device_tree_is_memory_node() to
perform all the required checks to assess if a node is a proper /memory
node.
Fixes: 3e99c95ba1c8 ("arm, device tree: parse the DTB for RAM location and size") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Tested-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Julien Grall <julien@xen.org>
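The depth and name checks described for the memory node probing fix can be sketched as follows. This is a simplified stand-in, not the body of device_tree_is_memory_node(): the function name and signature are hypothetical, it works on a plain node name rather than an FDT blob, and any further property checks the real helper performs are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Accept a node only if it sits directly under the root (depth 1) and
 * its name is "memory" or "memory@<unit-address>".
 */
bool is_memory_node_sketch(const char *name, int depth)
{
    static const char prefix[] = "memory";
    size_t len = sizeof(prefix) - 1;

    if ( depth != 1 )
        return false;

    if ( strncmp(name, prefix, len) != 0 )
        return false;

    /* Reject names that merely start with "memory", e.g. "memory-controller". */
    return name[len] == '\0' || name[len] == '@';
}
```

A "memory@..." node under /reserved-memory sits at depth 2, so it is rejected by the depth check alone.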
Michal Orzel [Thu, 4 Jul 2024 07:54:19 +0000 (09:54 +0200)]
xen/arm: dom0less: Add #redistributor-regions property to GICv3 node
A dom0less domain using the host memory layout may use more than one
re-distributor region (d->arch.vgic.nr_regions > 1). In that case Xen
will add them in a "reg" property of a GICv3 domU node. The guest needs to
know how many regions to search for, and therefore the GICv3 dt binding
[1] specifies that the "#redistributor-regions" property is required if more
than one redistributor region is present. However, Xen does not add this
property, which makes the guest believe there is just one such region. This
can lead to a guest boot failure when doing GIC SMP initialization. Fix it
by adding this property, which matches what we do for hwdom.
Sergiy Kibrik [Tue, 6 Aug 2024 06:35:09 +0000 (08:35 +0200)]
x86/vpmu: guard calls to vmx/svm functions
If VMX/SVM is disabled in the build, we may still want to have vPMU drivers for
PV guests. In that case, before using VMX/SVM features and functions, we have
to explicitly check whether they are available in the build. For this purpose
(and to avoid complicating conditionals) two helpers are introduced,
is_{vmx,svm}_vcpu(v), which check both the HVM and VMX/SVM conditions at the
same time and replace the is_hvm_vcpu(v) macro in the Intel/AMD PMU drivers.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Mon, 5 Aug 2024 08:18:05 +0000 (10:18 +0200)]
x86/dom0: delay setting SMAP after dom0 build is done
Delay setting X86_CR4_SMAP on the BSP until the domain building is done, so
that there's no need to disable SMAP. Note however that SMAP is enabled for
the APs on bringup, as domain builder code strictly runs on the BSP. Delaying
the setting for the APs would mean having to do a callfunc IPI later in order
to set it on all the APs.
The fixes tag is to account for the wrong usage of cpu_has_smap in
create_dom0(), it should instead have used
boot_cpu_has(X86_FEATURE_XEN_SMAP).
While there also make cr4_pv32_mask __ro_after_init.
Fixes: 493ab190e5b1 ('xen/sm{e, a}p: allow disabling sm{e, a}p for Xen itself') Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
The current logic to choose the preferred reboot method is based on the mode Xen
has been booted into, so if the box is booted from UEFI, the preferred reboot
method will be to use the ResetSystem() run time service call.
However, that method seems to be widely untested, and quite often leads to a
result similar to:
****************************************
Panic on CPU 0:
FATAL TRAP: vector = 6 (invalid opcode)
****************************************
Which in most cases does lead to a reboot, however that's unreliable.
Change the default reboot preference to prefer ACPI over UEFI if available and
not in reduced hardware mode.
This is in line with what Linux does, so it's unlikely to cause issues on current
and future hardware, since there's a much higher chance of vendors testing
hardware with Linux rather than Xen.
Add a special case for one Acer model that does require being rebooted using
ResetSystem(). See Linux commit 0082517fa4bce for rationale.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Add MISRA C rules 13.2 and 18.2 to rules.rst. Both rules have zero
violations reported by Eclair but they have some cautions. We accept
both rules and for now we'll enable scanning for them in Eclair but only
violations will cause the Gitlab CI job to fail (cautions will not).
automation/eclair_analysis: add Rule 18.6 to the clean guidelines
MISRA C Rule 18.6 states: "The address of an object with automatic
storage shall not be copied to another object that persists after
the first object has ceased to exist."
The rule is set as monitored and tagged clean, in order to block
the CI on any violations that may arise, allowing the presence
of cautions (currently there are no violations).
Matthew Barnes [Fri, 2 Aug 2024 06:43:57 +0000 (08:43 +0200)]
tools/lsevtchn: Use errno macro to handle hypercall error cases
Currently, lsevtchn aborts its event channel enumeration when it hits
an event channel that is owned by Xen.
lsevtchn does not distinguish between different hypercall errors, which
results in lsevtchn missing potentially relevant event channels with
higher port numbers.
Use the errno macro to distinguish between hypercall errors, and
continue event channel enumeration if the hypercall error is not
critical to enumeration.
Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
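A sketch of the enumeration loop described for lsevtchn, under stated assumptions: the query callback stands in for the status hypercall, and the choice of EACCES as the "skip and continue" error is an assumption made for illustration, not something the commit message states. All names are hypothetical.

```c
#include <errno.h>

/*
 * Stand-in for the status hypercall: returns 0 on success, or -1 with
 * errno set on failure.
 */
typedef int (*query_fn)(unsigned int port);

/* Count the ports that could be queried, skipping inaccessible ones. */
int count_visible_ports(query_fn query, unsigned int max_port)
{
    int visible = 0;

    for ( unsigned int port = 0; port < max_port; port++ )
    {
        if ( query(port) == 0 )
        {
            visible++;
            continue;
        }

        if ( errno == EACCES )
            continue;   /* assumed non-critical: keep enumerating */

        break;          /* any other error ends the enumeration */
    }

    return visible;
}

/* Example backend: port 2 is inaccessible, ports >= 5 do not exist. */
int fake_query(unsigned int port)
{
    if ( port == 2 )
    {
        errno = EACCES;
        return -1;
    }

    if ( port >= 5 )
    {
        errno = EINVAL;
        return -1;
    }

    return 0;
}
```

With the example backend, ports 0, 1, 3 and 4 are reported; an implementation that stopped at the first error would miss ports 3 and 4.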
George Dunlap [Fri, 2 Aug 2024 06:42:09 +0000 (08:42 +0200)]
xen/hvm: Don't skip MSR_READ trace record
Commit 37f074a3383 ("x86/msr: introduce guest_rdmsr()") introduced a
function to combine the MSR_READ handling between PV and HVM.
Unfortunately, by returning directly, it skipped the trace generation,
leading to gaps in the trace record, as well as xenalyze errors like
this:
hvm_generic_postprocess: d2v0 Strange, exit 7c(VMEXIT_MSR) missing a handler
Replace the `return` with `goto out`.
Fixes: 37f074a3383 ("x86/msr: introduce guest_rdmsr()") Signed-off-by: George Dunlap <george.dunlap@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
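The difference between returning directly and jumping to the common exit can be sketched as below. The handler names, the MSR number and the trace counter are all hypothetical; the only point is that the early return bypasses the code at the out label.

```c
/* Hypothetical counter standing in for the trace buffer. */
int trace_records;

/* Buggy shape: the early return skips the trace record. */
int read_msr_buggy(unsigned int msr, unsigned long *val)
{
    if ( msr == 0x10 )
    {
        *val = 42;
        return 0;
    }

    *val = 0;
    trace_records++;
    return 0;
}

/* Fixed shape: every path funnels through the trace record. */
int read_msr_fixed(unsigned int msr, unsigned long *val)
{
    if ( msr == 0x10 )
    {
        *val = 42;
        goto out;
    }

    *val = 0;

 out:
    trace_records++;
    return 0;
}

/* Helper: how many trace records does one call to the handler emit? */
int records_emitted(int (*handler)(unsigned int, unsigned long *),
                    unsigned int msr)
{
    unsigned long val;

    trace_records = 0;
    handler(msr, &val);

    return trace_records;
}
```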
Sergiy Kibrik [Thu, 1 Aug 2024 11:57:52 +0000 (13:57 +0200)]
x86/vmx: replace CONFIG_HVM with CONFIG_INTEL_VMX in vmx.h
Now that there is a separate config option for VMX, which itself depends on
CONFIG_HVM, use it to provide vmx_pi_hooks_{assign,deassign}
stubs for the case when VMX is disabled while HVM is enabled.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com>
x86/PV: guard svm specific functions with using_svm() check
Replace cpu_has_svm check with using_svm(), so that not only SVM support in CPU
is being checked at runtime, but also at build time we ensure the availability
of functions svm_load_segs() and svm_load_segs_prefetch().
x86/traps: guard vmx specific functions with using_vmx() check
Replace cpu_has_vmx check with using_vmx(), so that not only VMX support in CPU
is being checked at runtime, but also at build time we ensure the availability
of functions vmx_vmcs_enter() & vmx_vmcs_exit().
Also since CONFIG_VMX is checked in using_vmx and it depends on CONFIG_HVM,
we can drop #ifdef CONFIG_HVM lines around using_vmx.
x86/p2m: guard EPT functions with using_vmx() check
Replace cpu_has_vmx check with using_vmx(), so that DCE would remove calls
to functions ept_p2m_init() and ept_p2m_uninit() on non-VMX build.
Since currently Intel EPT implementation depends on CONFIG_INTEL_VMX config
option, when VMX is off these functions are unavailable.
Sergiy Kibrik [Thu, 1 Aug 2024 11:55:39 +0000 (13:55 +0200)]
x86: introduce using_{svm,vmx}() helpers
As we now have AMD_SVM/INTEL_VMX config options for enabling/disabling these
features completely in the build, we need some build-time checks to ensure that
vmx/svm code can be used and things compile. Macros cpu_has_{svm,vmx} used to be
doing such checks at runtime, however they do not check if SVM/VMX support is
enabled in the build.
Also cpu_has_{svm,vmx} can potentially be called from non-{VMX,SVM} build
yet running on {VMX,SVM}-enabled CPU, so would correctly indicate that VMX/SVM
is indeed supported by CPU, but code to drive it can't be used.
New routines using_{vmx,svm}() indicate that both CPU _and_ build provide
corresponding technology support, while cpu_has_{vmx,svm} still remains for
informational runtime purpose, just as their naming suggests.
These new helpers are used right away in several sites, namely guard calls to
start_nested_{svm,vmx} and start_{svm,vmx} to fix a build when INTEL_VMX=n or
AMD_SVM=n.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
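A rough sketch of the idea behind using_{svm,vmx}(): combine a build-time constant with the runtime CPU check. Everything here is a simplified stand-in with hypothetical names; the real helpers are built on the Kconfig symbols and cpu_has_{svm,vmx}, and in a real non-VMX build the guarded function would not exist at all.

```c
#include <stdbool.h>

/* Build-time switch standing in for the Kconfig option (assumed off). */
#define BUILD_HAS_VMX_SKETCH 0

/* Runtime capability standing in for cpu_has_vmx (assumed present). */
bool cpu_has_vmx_sketch = true;

/* True only if both the build and the CPU provide VMX support. */
#define using_vmx_sketch() (BUILD_HAS_VMX_SKETCH && cpu_has_vmx_sketch)

/*
 * Defined here only so that the sketch links on its own. In a real
 * non-VMX build it would be absent, and the compiler would drop the
 * call below because the guard is a constant false.
 */
int start_vmx_sketch(void)
{
    return 0;
}

int start_hvm_sketch(void)
{
    if ( using_vmx_sketch() )
        return start_vmx_sketch();

    return -1; /* no usable virtualization backend */
}
```

Here the CPU reports VMX, yet start_hvm_sketch() returns -1 because the build switch is off, which is the case the runtime-only check cannot express.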
Sergiy Kibrik [Thu, 1 Aug 2024 11:55:08 +0000 (13:55 +0200)]
x86: introduce CONFIG_ALTP2M Kconfig option
Add new option to make altp2m code inclusion optional.
Currently altp2m is implemented for Intel EPT only, so the option is dependent on VMX.
Also the prompt itself depends on EXPERT=y, so that the option is available
for fine-tuning, if one wants to play around with it.
Use this option instead of more generic CONFIG_HVM option.
That implies the possibility to build hvm code without altp2m support,
hence we need to declare altp2m routines for hvm code to compile successfully
(altp2m_vcpu_initialise(), altp2m_vcpu_destroy(), altp2m_vcpu_enable_ve())
Also guard altp2m routines, so that they can be disabled completely in the
build -- when target platform does not actually support altp2m
(AMD-V & ARM as of now).
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Thu, 1 Aug 2024 11:54:23 +0000 (13:54 +0200)]
x86/monitor: guard altp2m usage
Explicitly check whether altp2m is on for domain when getting altp2m index.
If explicit call to altp2m_active() always returns false, DCE will remove
call to altp2m_vcpu_idx().
p2m_get_mem_access() expects 0 as altp2m_idx parameter when altp2m not active
or not supported, so 0 is a fallback value then.
The purpose of that is later to be able to disable altp2m support and
exclude its code from the build completely, when not supported by target
platform (as of now it's supported for VT-x only).
Also all other calls to altp2m_vcpu_idx() are guarded by altp2m_active(), so
this change puts usage of this routine in line with the rest of code.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
x86: introduce AMD-V and Intel VT-x Kconfig options
Introduce two new Kconfig options, AMD_SVM and INTEL_VMX, to allow code
specific to each virtualization technology to be separated and, when not
required, stripped.
CONFIG_AMD_SVM will be used to enable virtual machine extensions on platforms
that implement the AMD Virtualization Technology (AMD-V).
CONFIG_INTEL_VMX will be used to enable virtual machine extensions on platforms
that implement the Intel Virtualization Technology (Intel VT-x).
Both features depend on HVM support.
Since, at this point, disabling any of them would cause Xen to not compile,
the options are enabled by default if HVM and are not selectable by the user.
Sergiy Kibrik [Thu, 1 Aug 2024 07:41:03 +0000 (09:41 +0200)]
x86/cpufreq: separate powernow/hwp/acpi cpufreq code
Build AMD Architectural P-state driver when CONFIG_AMD is on, and
Intel Hardware P-States driver together with ACPI Processor P-States driver
when CONFIG_INTEL is on respectively, allowing for a platform-specific build.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Thu, 1 Aug 2024 07:40:12 +0000 (09:40 +0200)]
x86/cpufreq: move ACPI cpufreq driver into separate file
Separate ACPI driver from generic initialization cpufreq code.
This way acpi-cpufreq can become optional in the future and be disabled
from non-Intel builds.
No changes to code were introduced, except:
- acpi_cpufreq_register() helper added
- clean up a list of included headers
- license transformed into an SPDX line
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/fpu: Create a typedef for the x87/SSE area inside "struct xsave_struct"
Making the union non-anonymous would cause a lot of headaches, because a lot of
code relies on it being so, but it's possible to make a typedef of the anonymous
union so all callsites currently relying on typeof() can stop doing so directly.
This commit creates a `fpusse_t` typedef to the anonymous union at the head of
the XSAVE area and uses it instead of typeof().
No functional change.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Acked-by: Jan Beulich <jbeulich@suse.com>
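The typedef technique can be sketched on a cut-down structure. The struct below is illustrative only (its fields are not the real XSAVE layout), the names carry a _sketch suffix to mark them as hypothetical, and __typeof__ is a GCC/Clang extension.

```c
#include <stdint.h>

/* Cut-down stand-in for struct xsave_struct. */
struct xsave_sketch {
    union {                  /* untagged union type, named member */
        uint8_t raw[512];
        struct {
            uint16_t fcw;
            uint16_t fsw;
        };
    } fpu_sse;
    uint64_t xstate_bv;
};

/* Name the untagged union's type once, so users need not repeat typeof(). */
typedef __typeof__(((struct xsave_sketch *)0)->fpu_sse) fpusse_sketch_t;

uint16_t read_fcw_sketch(const struct xsave_sketch *xs)
{
    const fpusse_sketch_t *fpu = &xs->fpu_sse;

    return fpu->fcw;
}
```

The union itself stays untagged, so existing code that accesses xs->fpu_sse.fcw keeps compiling unchanged.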
Sergiy Kibrik [Thu, 1 Aug 2024 07:38:00 +0000 (09:38 +0200)]
x86/intel: optional build of TSX support
Transactional Synchronization Extensions are supported on certain Intel
CPUs only, hence can be put under the CONFIG_INTEL build option.
The whole TSX support, even if supported by the CPU, may need to be disabled via
options, by microcode or through spec-ctrl, depending on a set of specific
conditions. To make sure nothing gets accidentally runtime-broken, all
modifications of global TSX configuration variables are guarded by #ifdef's,
while the variables themselves are redefined to 0, so that they can't mistakenly
be written to.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 31 Jul 2024 19:05:21 +0000 (20:05 +0100)]
x86/domain: Fix domlist_insert() updating the domain hash
A last minute review request was to dedup the expression calculating the
domain hash bucket.
While the code reads correctly, it is buggy because rcu_assign_pointer() is a
deeply misleading API assigning by name not value, and - contrary to its name
- does not hide an indirection.
Therefore, rcu_assign_pointer(bucket, d); updates the local bucket variable on
the stack, not domain_hash[], causing all subsequent domid lookups to fail.
Rework the logic to use pd in the same way that domlist_remove() does.
Fixes: 19995bc70cc6 ("xen/domain: Factor domlist_{insert,remove}() out of domain_{create,destroy}()") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
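The pitfall can be reproduced with a simplified macro. This is a sketch under assumptions: assign_pointer_sketch() omits the memory barrier of the real rcu_assign_pointer(), and the hash table, types and function names are hypothetical.

```c
#include <stddef.h>

struct domain_sketch { int domid; };

/* Like the real macro, this assigns to the lvalue that is named. */
#define assign_pointer_sketch(p, v) ((p) = (v))

#define HASH_SIZE_SKETCH 4
struct domain_sketch *hash_sketch[HASH_SIZE_SKETCH];

/* Buggy: only the local copy is updated, the table stays untouched. */
void insert_buggy(struct domain_sketch *d)
{
    struct domain_sketch *bucket = hash_sketch[d->domid % HASH_SIZE_SKETCH];

    assign_pointer_sketch(bucket, d);
    (void)bucket;
}

/* Fixed: keep a pointer to the slot and assign through it. */
void insert_fixed(struct domain_sketch *d)
{
    struct domain_sketch **pd = &hash_sketch[d->domid % HASH_SIZE_SKETCH];

    assign_pointer_sketch(*pd, d);
}

struct domain_sketch *lookup_sketch(int domid)
{
    return hash_sketch[domid % HASH_SIZE_SKETCH];
}

/* Example domain used to exercise the two variants. */
struct domain_sketch dom_example = { 1 };
```

After insert_buggy(&dom_example) a lookup of domid 1 still returns NULL; after insert_fixed(&dom_example) it returns the domain.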
xen/riscv: fix build issue for bullseye-riscv64 container
Address compilation error on bullseye-riscv64 container:
undefined reference to `guest_physmap_remove_page`
Since there is no current implementation of `guest_physmap_remove_page()`,
a stub function has been added.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
x86/e820 address violations of MISRA C:2012 Rule 5.3
This addresses violations of MISRA C:2012 Rule 5.3 which states as
following: An identifier declared in an inner scope shall not hide an
identifier declared in an outer scope. Right here the conflict is with
the global named "e820".
No functional change.
Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/sched: fix error handling in cpu_schedule_up()
In case cpu_schedule_up() is failing, it needs to undo all externally
visible changes it has done before.
Reason is that cpu_schedule_callback() won't be called with the
CPU_UP_CANCELED notifier in case cpu_schedule_up() did fail.
Fixes: 207589dbacd4 ("xen/sched: move per cpu scheduler private data into struct sched_resource") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Xen's bitops.h consists of several Linux's headers:
* linux/arch/include/asm/bitops.h:
* The following functions were removed as they aren't used in Xen:
* test_and_set_bit_lock
* clear_bit_unlock
* __clear_bit_unlock
* The following functions were renamed to match how they are
used by common code:
* __test_and_set_bit
* __test_and_clear_bit
* The declaration and implementation of the following functions
were updated to make Xen build happy:
* clear_bit
* set_bit
* __test_and_clear_bit
* __test_and_set_bit
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
The following generic functions were introduced:
* test_bit
* generic__test_and_set_bit
* generic__test_and_clear_bit
* generic__test_and_change_bit
These functions and macros can be useful for architectures
that don't have corresponding arch-specific instructions.
Also, the patch introduces the following generics which are
used by the functions mentioned above:
* BITOP_BITS_PER_WORD
* BITOP_MASK
* BITOP_WORD
* BITOP_TYPE
The following approach was chosen for generic*() and arch*() bit
operation functions:
If the bit operation function that is going to be generic starts
with the prefix "__", then the corresponding generic/arch function
will also contain the "__" prefix. For example:
* test_bit() will be defined using arch_test_bit() and
generic_test_bit().
* __test_and_set_bit() will be defined using
arch__test_and_set_bit() and generic__test_and_set_bit().
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
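A sketch of what such generic helpers might look like. The word type (unsigned int) and the function bodies are assumptions made for illustration, and the functions carry a sketch prefix to make clear they are not the actual Xen definitions.

```c
#include <limits.h>
#include <stdbool.h>

typedef unsigned int bitop_uint_t;  /* assumed word type */

#define BITOP_BITS_PER_WORD (sizeof(bitop_uint_t) * CHAR_BIT)
#define BITOP_MASK(nr) ((bitop_uint_t)1 << ((nr) % BITOP_BITS_PER_WORD))
#define BITOP_WORD(nr) ((nr) / BITOP_BITS_PER_WORD)

/* Non-atomic test-and-set: returns the old value of bit nr, then sets it. */
bool sketch__test_and_set_bit(unsigned int nr, bitop_uint_t *addr)
{
    bitop_uint_t mask = BITOP_MASK(nr);
    bitop_uint_t *p = addr + BITOP_WORD(nr);
    bitop_uint_t old = *p;

    *p = old | mask;

    return (old & mask) != 0;
}

/* Plain read of bit nr. */
bool sketch_test_bit(unsigned int nr, const bitop_uint_t *addr)
{
    return (addr[BITOP_WORD(nr)] & BITOP_MASK(nr)) != 0;
}

/* Example bitmap spanning two words. */
bitop_uint_t example_map[2];
```

An architecture with a dedicated instruction would supply its own arch variant, and the common wrapper would prefer that over the generic fallback.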
The current code in ALT_CALL_ARG() won't successfully workaround the clang
code-generation issue if the arg parameter has a size that's not a power of 2.
While there are no such sized parameters at the moment, improve the workaround
to also be effective when such sizes are used.
Instead of using a union with a long use an unsigned long that's first
initialized to 0 and afterwards set to the argument value.
Reported-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Suggested-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
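The zero-then-assign approach can be sketched with a plain function. This is not the ALT_CALL_ARG() macro itself: the 3-byte struct, the function name and the use of memcpy() are assumptions chosen to show that bytes beyond the argument are guaranteed to be zero.

```c
#include <string.h>

/* A 3-byte argument: its size is not a power of 2. */
struct three_bytes { unsigned char b[3]; };

/*
 * Widen the argument into a register-sized value. Zeroing first
 * guarantees that the bytes not covered by the argument are 0 rather
 * than stale.
 */
unsigned long widen_arg_sketch(struct three_bytes arg)
{
    unsigned long tmp = 0;

    _Static_assert(sizeof(arg) <= sizeof(tmp), "argument too large");
    memcpy(&tmp, &arg, sizeof(arg));

    return tmp;
}

/* Example argument with all three bytes set. */
const struct three_bytes example_arg = { { 0xff, 0xff, 0xff } };
```

Exactly three bytes of the result are non-zero for example_arg; which three depends on the byte order of the machine.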
x86/dom0: fix restoring %cr3 and the mapcache override on PV build error
One of the error paths in the PV dom0 builder section that runs on the guest
page-tables wasn't restoring the Xen value of %cr3, nor removing the
mapcache override.
Fixes: 079ff2d32c3d ('libelf-loader: introduce elf_load_image') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 31 Jul 2024 10:39:35 +0000 (12:39 +0200)]
public/x86: don't include common xen.h from arch-specific one
No other arch-*.h does so, and arch-x86/xen.h really just takes the role
of arch-x86_32.h and arch-x86_64.h (by those two forwarding there). With
xen.h itself including the per-arch headers, doing so is also kind of
backwards anyway, and just calling for problems. There's exactly one
place where arch-x86/xen.h is included when really xen.h is meant (for
wanting XEN_GUEST_HANDLE_64() to be made available, the default
definition of which lives in the common xen.h).
This then addresses a violation of Misra C:2012 Directive 4.10
("Precautions shall be taken in order to prevent the contents of a
header file being included more than once").
Jan Beulich [Wed, 31 Jul 2024 10:36:14 +0000 (12:36 +0200)]
x86+Arm: drop (rename) __virt_to_maddr() / __maddr_to_virt()
There's no use of them anymore except in the definitions of the non-
underscore-prefixed aliases.
On Arm convert the (renamed) inline function to a macro.
On x86 rename the inline functions, adjust the virt_to_maddr() #define,
and purge the maddr_to_virt() one, thus eliminating a bogus cast which
would have allowed the passing of a pointer type variable into
maddr_to_virt() to go silently.
Andrew Cooper [Thu, 18 Jul 2024 20:22:41 +0000 (21:22 +0100)]
arch/domain: Clean up the idle domain remnants in arch_domain_create()
With arch_domain_create() no longer being called with the idle domain, drop
the last remaining logic.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 18 Jul 2024 20:20:52 +0000 (21:20 +0100)]
xen/domain: Simplify domain_create() now the idle domain is complete earlier
With x86 implementing arch_init_idle_domain(), there is no longer any need to
call arch_domain_create() with the idle domain.
Have the idle domain exit early with all other system domains. Move the
static-analysis ASSERT() earlier. Then, remove the !is_idle_domain()
protections around the majority of domain_create() and remove one level of
indentation.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 18 Jul 2024 20:12:31 +0000 (21:12 +0100)]
x86/domain: Implement arch_init_idle_domain()
The idle domain needs d->arch.ctxt_switch initialised on x86. Implement the
new arch_init_idle_domain() in order to do this.
Intentionally remove cpu_policy's initialisation to ZERO_BLOCK_PTR. It has
never tripped since its introduction, and is weird to have in isolation
without a similar approach on other pointers.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 18 Jul 2024 19:54:05 +0000 (20:54 +0100)]
xen/domain: Introduce arch_init_idle_domain()
The idle domain causes a large amount of complexity in domain_create() because
of x86's need to initialise d->arch.ctxt_switch in arch_domain_create().
In order to address this, introduce an optional hook to perform extra
initialisation of the idle domain.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
As discussed during the last MISRA C meeting, add Rule 12.2 to the list
of MISRA C rules we accept, together with an explanation that we use gcc
-fsanitize=undefined to check for violations.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com>
In the file include/xen/event.h macro set_bit is called with argument
current->pause_flags.
Once expanded this set_bit's argument is used in sizeof operations
and thus 'current', being a macro that expands to a function
call with potential side effects, generates a violation.
To address this violation the value of current is therefore stored in a
variable called 'v' before passing it to macro set_bit.
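A sketch of the pattern behind this change. All names are hypothetical stand-ins: current_sketch mimics a macro that expands to a function call, and set_bit_sketch() mimics a macro that applies sizeof to its argument. Runtime behaviour is the same in both variants, since sizeof does not evaluate its operand; the difference is what a static analyser sees inside the sizeof operand.

```c
struct vcpu_sketch { unsigned long pause_flags; };

struct vcpu_sketch vcpu0_sketch;
int current_calls;

struct vcpu_sketch *get_current_sketch(void)
{
    current_calls++;
    return &vcpu0_sketch;
}

/* Expands to a function call with a potential side effect. */
#define current_sketch (get_current_sketch())

/* Uses its argument inside sizeof, as well as for the actual update. */
#define set_bit_sketch(nr, addr) \
    ((void)sizeof(*(addr)), *(addr) |= 1UL << (nr))

/* Flagged form: the function call appears within the sizeof operand. */
unsigned long pause_direct(void)
{
    set_bit_sketch(0, &current_sketch->pause_flags);
    return vcpu0_sketch.pause_flags;
}

/* Compliant form: latch the pointer first, then pass the local. */
unsigned long pause_via_local(void)
{
    struct vcpu_sketch *v = current_sketch;

    set_bit_sketch(0, &v->pause_flags);
    return v->pause_flags;
}
```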
Jason Andryuk [Mon, 29 Jul 2024 15:04:12 +0000 (11:04 -0400)]
libxl: Enable stubdom cdrom changing
To change the cd-rom medium, libxl will:
- QMP eject the medium from QEMU
- block-detach the old PV disk
- block-attach the new PV disk
- QMP change the medium to the new PV disk by fdset-id
The QMP code is reused, and remove and attach are implemented here.
The stubdom must internally handle adding /dev/xvdc to the appropriate
fdset. libxl in dom0 doesn't see the result of adding to the fdset as
that is internal to the stubdom, but the fdset's opaque fields will be
set to stub-devid:$devid, so libxl can identify it. $devid is common
between the stubdom and libxl, so it can be identified on both sides.
The stubdom will name the device xvdY regardless of the guest name hdY,
sdY, or xvdY, but the stubdom will be assigned the same devid
facilitating lookup. Because the stubdom add-fd call is asynchronous,
libxl needs to poll query-fdsets to identify when add-fd has completed.
For cd-eject, we still need to attach the empty vbd. This is necessary
since xenstore is used to determine that hdc exists. Otherwise after
eject, hdc would be gone and the cd-insert would fail to find the drive
to insert new media.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
Upgrade Yocto to a newer version. Use ext4 as image format for testing
with QEMU on ARM and ARM64 as the default is WIC and it is not available
for our xen-image-minimal target.
Also update the tar.bz2 filename for the rootfs.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Fri, 5 Jul 2024 11:52:05 +0000 (12:52 +0100)]
XSM/domctl: Fix permission checks on XEN_DOMCTL_createdomain
The XSM checks for XEN_DOMCTL_createdomain are problematic. There's a split
between xsm_domctl() called early, and flask_domain_create() called quite late
during domain construction.
All XSM implementations except Flask have a simple IS_PRIV check in
xsm_domctl(), and operate as expected when an unprivileged domain tries to
make a hypercall.
Flask however foregoes any action in xsm_domctl() and defers everything,
including the simple "is the caller permitted to create a domain" check, to
flask_domain_create().
As a consequence, when XSM Flask is active, and irrespective of the policy
loaded, all domains irrespective of privilege can:
* Mutate the global 'rover' variable, used to track the next free domid.
Therefore, all domains can cause a domid wraparound, and combined with a
voluntary reboot, choose their own domid.
* Cause a reasonable amount of a domain to be constructed before ultimately
failing for permission reasons, including the use of settings outside of
supported limits.
In order to remediate this, pass the ssidref into xsm_domctl() and at least
check that the calling domain is privileged enough to create domains.
Take the opportunity to also fix the sign of the cmd parameter to be unsigned.
This issue has not been assigned an XSA, because Flask is experimental and not
security supported.
Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Andrew Cooper [Mon, 15 Jul 2024 13:17:43 +0000 (14:17 +0100)]
tools/examples: Remove more obsolete content
xeninfo.pl was introduced in commit 1b0a8bb57e3e ("Added xeninfo.pl, a script
for collecting statistics from Xen hosts using the Xen-API") and has been
touched exactly twice since to remove hardcoded IP addresses and paths.
The configuration files in vnc/* date from when we had a vendored version of
Qemu living in the tree.
These have never (AFAICT) been wired into the `make install` rule.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
There's no l{1,2,3,4}e_read() implementation, so drop the _atomic suffix from
the read helpers. This allows unifying the naming with the write helpers,
which are also atomic but don't have the suffix already: l{1,2,3,4}e_write().
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The l{1,2,3,4}e_write_atomic() and non _atomic suffixed helpers share the same
implementation, so it seems pointless and possibly confusing to have both.
x86 32bit mode used to have a non-atomic PTE write that would split the write
in two halves, but with Xen only supporting x86 64bit that's no longer
present.
Remove the l{1,2,3,4}e_write_atomic() helpers and switch its user to
l{1,2,3,4}e_write(), as that's also atomic. While there also remove
pte_write{,_atomic}() and just use write_atomic() in the wrappers.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ross Lagerwall [Tue, 30 Jul 2024 09:55:56 +0000 (11:55 +0200)]
bunzip2: fix rare decompression failure
The decompression code parses a huffman tree and counts the number of
symbols for a given bit length. In rare cases, there may be >= 256
symbols with a given bit length, causing the unsigned char to overflow.
This causes a decompression failure later when the code tries and fails to
find the bit length for a given symbol.
Since the maximum number of symbols is 258, use unsigned short instead.
Fixes: ab77e81f6521 ("x86/dom0: support bzip2 and lzma compressed bzImage payloads") Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
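The overflow can be shown in isolation. This sketch does not reproduce the decompressor; it only counts n symbols of one bit length with two counter widths, and assumes an 8-bit unsigned char. The function names are hypothetical.

```c
/* Count n symbols with an 8-bit counter: wraps to 0 at 256. */
unsigned int count_with_uchar(unsigned int n)
{
    unsigned char count = 0;

    for ( unsigned int i = 0; i < n; i++ )
        count++;

    return count;
}

/* Count n symbols with an unsigned short: 258 fits comfortably. */
unsigned int count_with_ushort(unsigned int n)
{
    unsigned short count = 0;

    for ( unsigned int i = 0; i < n; i++ )
        count++;

    return count;
}
```

With 256 symbols of the same length, the narrow counter reads 0, so later code behaves as if no symbol had that length.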
When running Xen with a PVH dom0 and an HVM domU, a pirq is mapped for
a passthrough device by using its GSI; see the QEMU code path
xen_pt_realize->xc_physdev_map_pirq and the libxl code path
pci_add_dm_done->xc_physdev_map_pirq. xc_physdev_map_pirq then
calls into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq
is not allowed because currd is the PVH dom0 and PVH has no
X86_EMU_USE_PIRQ flag, so it fails at the has_pirq check.
So, allow PHYSDEVOP_map_pirq when dom0 is PVH and also allow
PHYSDEVOP_unmap_pirq for the device removal path to unmap the pirq.
Also add a new check to prevent (un)map when the subject domain
doesn't have a notion of PIRQ.
This way the interrupt of a passthrough device can be
successfully mapped to a pirq for a domU with a notion of PIRQ
when dom0 is PVH.
Add deviation comments to address violations of
MISRA C:2012 Directive 4.10 ("Precautions shall be taken in order
to prevent the contents of a header file being included more than
once").
Inclusion guards must appear at the beginning of the headers
(comments are permitted anywhere).
This patch adds deviation comments using the format specified
in docs/misra/safe.json for headers with just the direct
inclusion guard before the inclusion guard since they are
safe and not supposed to comply with the directive.
Note that with SAF-10-safe in place, failures to have proper guards later
in the header files will not be reported.
misra: modify deviations for empty and generated headers
This patch modifies deviations for Directive 4.10:
"Precautions shall be taken in order to prevent the contents of
a header file being included more than once"
This patch avoids the file-based deviation for empty headers, and
replaces it with a comment-based one using the format specified in
docs/misra/safe.json.
Generated headers are not generally safe against multi-inclusions,
whether a header is safe depends on the nature of the generated code
in the header. For that reason, this patch drops the deviation for
generated headers.
misra: add deviation for headers that explicitly avoid guards
Some headers, under specific circumstances (documented in a comment at
the beginning of the file), explicitly do not have strict inclusion
guards: the caller is responsible for including them correctly.
These files are not supposed to comply with Directive 4.10:
"Precautions shall be taken in order to prevent the contents of a header
file being included more than once"
This patch adds deviation comments for headers that avoid guards.
x86/traps: address violations of MISRA C Rule 16.3
Add break or pseudo keyword fallthrough to address violations of
MISRA C Rule 16.3: "An unconditional `break' statement shall terminate
every switch-clause".
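As an illustration of the two accepted endings of a switch-clause, here is a minimal sketch. The classify() function is made up for the example, and the fallthrough macro is an assumption modelled on the common pseudo-keyword pattern; it is not copied from the Xen headers.

```c
/*
 * Assumed definition of the pseudo keyword: use the compiler attribute
 * when it is available, otherwise fall back to an empty statement.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example function: every clause ends in break or fallthrough. */
int classify(int code)
{
    int result = 0;

    switch (code)
    {
    case 0:
        result = 1;
        break;          /* unconditional break terminates the clause */

    case 1:
        result = 10;
        fallthrough;    /* deliberate: continue into case 2 */

    case 2:
        result += 100;
        break;

    default:
        result = -1;
        break;
    }

    return result;
}
```

The explicit marker tells both the reader and the checker that code 1 continuing into the handling of code 2 is intended.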
automation/eclair: fix deviation of MISRA C Rule 16.3
Add missing escape for the final dot of the fallthrough comment,
extend the search of a fallthrough comment up to 2 lines after the last
statement and improve the text of the justification.
When building with gcc with -finstrument-functions, optimization level
-O1, CONFIG_HYPFS=y and # CONFIG_HAS_SCHED_GRANULARITY is not set, the
following build warning (error) is encountered:
common/sched/cpupool.c: In function ‘cpupool_gran_write’:
common/sched/cpupool.c:1220:26: error: ‘gran’ may be used uninitialized [-Werror=maybe-uninitialized]
 1220 |                      0 : cpupool_check_granularity(gran);
      |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
common/sched/cpupool.c:1207:21: note: ‘gran’ declared here
 1207 |     enum sched_gran gran;
      |                     ^~~~
This is a false positive. Silence the warning (error) by initializing
the variable.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Reviewed-by: Juergen Gross <jgross@suse.com>
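The shape of code that tends to trigger this kind of warning, and the fix of initializing the variable, can be sketched as follows. This is not the cpupool code: enum gran, parse_gran() and gran_lookup() are hypothetical names chosen for the illustration.

```c
#include <string.h>

enum gran { GRAN_CPU, GRAN_CORE, GRAN_SOCKET };

/*
 * Hypothetical parser: returns 0 and fills *out on success, or -1 while
 * leaving *out untouched on failure.
 */
static int parse_gran(const char *s, enum gran *out)
{
    if (strcmp(s, "cpu") == 0)
    {
        *out = GRAN_CPU;
        return 0;
    }
    if (strcmp(s, "core") == 0)
    {
        *out = GRAN_CORE;
        return 0;
    }
    if (strcmp(s, "socket") == 0)
    {
        *out = GRAN_SOCKET;
        return 0;
    }
    return -1;
}

/* Returns the parsed granularity as an int, or -1 if parsing failed. */
int gran_lookup(const char *s)
{
    /*
     * The value is only read when parse_gran() succeeded, so leaving it
     * uninitialized would be safe. Initializing it anyway removes any
     * path on which the compiler could believe it is read unset.
     */
    enum gran g = GRAN_CPU;

    return parse_gran(s, &g) ? -1 : (int)g;
}
```

The initial value is never observed on the failure path, so the initialization changes nothing at run time; it only helps the compiler's analysis.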
x86/viridian: Clarify some viridian logging strings
It's sadistically misleading to show an error without letters and expect
the dmesg reader to understand it's in hex. The patch adds a 0x prefix
to all hex numbers that don't already have it.
On the one instance in which a boolean is printed as an integer, print
it as a decimal integer instead so it's 0/1 in the common case and not
misleading if it's ever not just that due to a bug.
While at it, rename VIRIDIAN CRASH to VIRIDIAN GUEST_CRASH. Every member
of a support team that looks at the message systematically believes
"viridian" crashed, which is absolutely not what goes on. It's the guest
asking the hypervisor for a sudden shutdown because it crashed, and
stating why.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Reviewed-by: Paul Durrant <paul@xen.org>
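The ambiguity is easy to demonstrate with standard printf-style formatting. The sketch below uses snprintf() from the C library rather than Xen's logging functions, and fmt_code() and code_matches() are hypothetical helpers.

```c
#include <stdio.h>
#include <string.h>

/*
 * Format a code in hex, with or without the 0x prefix.
 * Uses a static buffer, so it is not reentrant; fine for a demonstration.
 */
const char *fmt_code(unsigned long code, int with_prefix)
{
    static char buf[32];

    snprintf(buf, sizeof(buf), with_prefix ? "0x%lx" : "%lx", code);
    return buf;
}

/* Returns 1 if the formatted code equals the expected string, else 0. */
int code_matches(unsigned long code, int with_prefix, const char *expected)
{
    return strcmp(fmt_code(code, with_prefix), expected) == 0;
}
```

Without the prefix, the value sixteen is printed as 10, which a reader of the log cannot tell apart from decimal ten; with the prefix it is printed as 0x10.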
Andrew Cooper [Thu, 9 May 2024 17:52:59 +0000 (18:52 +0100)]
hvmloader: Use fastcall everywhere
HVMLoader is a single freestanding 32bit program with no external
dependencies. Use the fastcall calling convention (up to 3 parameters in
registers) globally, which is more efficient than passing all parameters on
the stack.
Some bloat-o-meter highlights are:
add/remove: 0/0 grow/shrink: 3/118 up/down: 8/-3004 (-2996)
Function old new delta
...
hvmloader_acpi_build_tables 1125 961 -164
acpi_build_tables 1277 1081 -196
pci_setup 4756 4516 -240
construct_secondary_tables 1689 1447 -242
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 23 Jul 2024 16:32:26 +0000 (17:32 +0100)]
x86/IO-APIC: Improve APIC_TMR accesses
XenServer's instance of Coverity complains of OVERFLOW_BEFORE_WIDEN in
mask_and_ack_level_ioapic_irq(), which is ultimately because of v being
unsigned long, and (1U << ...) being 32 bits.
The reasoning isn't correct. (1U << (x & 0x1f)) can't overflow, but the
complaint is really about having to expand the RHS. While this can be fixed
by changing v to be unsigned int, take the opportunity to do better still.
Introduce an apic_tmr_read() helper like we already have for ISR and IRR, and
use it to remove the opencoded logic. Introduce an is_level boolean to
improve the legibility of the surrounding logic.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
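The idea of a per-vector bit read helper can be sketched without any hardware access. The code below is not Xen's apic_tmr_read(): a plain array stands in for the memory-mapped register bank, and bank_bit_read(), bank_bit_set() and tmr_demo() are hypothetical names. It assumes 256 vectors stored as eight 32-bit words.

```c
#include <stdbool.h>
#include <stdint.h>

#define BANK_WORDS 8   /* 8 x 32 bits = 256 vectors */

/* Read the bit for `vector`: word index is vector / 32, bit is vector % 32. */
static bool bank_bit_read(const uint32_t bank[BANK_WORDS], uint8_t vector)
{
    return (bank[vector >> 5] >> (vector & 0x1f)) & 1;
}

/* Set the bit for `vector`; the shift count is masked to 0..31. */
static void bank_bit_set(uint32_t bank[BANK_WORDS], uint8_t vector)
{
    bank[vector >> 5] |= UINT32_C(1) << (vector & 0x1f);
}

/* Demonstration: mark one vector, then query another (or the same) one. */
bool tmr_demo(uint8_t set_vector, uint8_t query_vector)
{
    uint32_t bank[BANK_WORDS] = { 0 };
    bool is_level;

    bank_bit_set(bank, set_vector);
    is_level = bank_bit_read(bank, query_vector);
    return is_level;
}
```

Because the helper returns a boolean, callers never combine a 32-bit mask with a wider variable, which is the pattern the checker complained about.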
Michal Orzel [Wed, 24 Jul 2024 09:38:13 +0000 (11:38 +0200)]
MAINTAINERS: Add me and Bertrand as device tree maintainers
With the Arm port being the major recipient of dt related patches and the
future need of incorporating dt support into other ports, we'd like to
keep an eye on these changes.