Jan Beulich [Tue, 2 Jul 2024 10:00:27 +0000 (12:00 +0200)]
cmdline: document and enforce "extra_guest_irqs" upper bounds
PHYSDEVOP_pirq_eoi_gmfn_v<N> accepting just a single GFN implies that no
more than 32k pIRQ-s can be used by a domain on x86. Document this upper
bound.
To also enforce the limit, (ab)use both arch_hwdom_irqs() (changing its
parameter type) and setup_system_domains(). This is primarily to avoid
exposing the two static variables or introducing yet further arch hooks.
While touching arch_hwdom_irqs() also mark it hwdom-init.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 28 Jun 2024 13:04:30 +0000 (14:04 +0100)]
tools/libxs: Fix CLOEXEC handling in xs_fileno()
xs_fileno() opens a pipe on first use to communicate between the watch thread
and the main thread. Nothing ever sets CLOEXEC on the file descriptors.
Check for the availability of the pipe2() function with configure. Despite
starting life as Linux-only, FreeBSD and NetBSD have gained it.
When pipe2() isn't available, try our best with pipe() and set_cloexec().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Andrew Cooper [Fri, 28 Jun 2024 13:10:12 +0000 (14:10 +0100)]
tools/libxs: Fix CLOEXEC handling in get_dev()
Move the O_CLOEXEC compatibility outside of an #ifdef USE_PTHREAD block.
Introduce set_cloexec() to wrap fcntl() setting FD_CLOEXEC. It will be reused
for other CLOEXEC fixes too.
Use set_cloexec() when O_CLOEXEC isn't available as a best-effort fallback.
Fixes: f4f2f3402b2f ("tools/libxs: Open /dev/xen/xenbus fds as O_CLOEXEC") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Andrew Cooper [Thu, 27 Jun 2024 12:22:14 +0000 (13:22 +0100)]
tools/dombuilder: Correct the length calculation in xc_dom_alloc_segment()
xc_dom_alloc_segment() is passed a size in bytes, calculates a size in pages
from it, then fills in the new segment information with a bytes value
re-calculated from the number of pages.
This causes the module information given to the guest (MB, or PVH) to have
incorrect sizes; specifically, sizes rounded up to the next page.
This in turn is problematic for Xen. When Xen finds a gzipped module, it
peeks at the end metadata to judge the decompressed size, which is a -4
backreference from the reported end of the module.
Fill in seg->vend using the correct number of bytes.
Fixes: ea7c8a3d0e82 ("libxc: reorganize domain builder guest memory allocator") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
During Gitlab CI randconfig job for RISC-V failed witn an error:
common/trace.c:57:22: error: expected '=', ',', ';', 'asm' or
'__attribute__' before '__read_mostly'
57 | static u32 data_size __read_mostly;
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@cloud.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Tue, 2 Jul 2024 06:35:56 +0000 (08:35 +0200)]
pirq_cleanup_check() leaks
Its original introduction had two issues: For one the "common" part of
the checks (carried out in the macro) was inverted. And then after
removal from the radix tree the structure wasn't scheduled for freeing.
(All structures still left in the radix tree would be freed upon domain
destruction, though.)
For the freeing to be safe even if it didn't use RCU (i.e. to avoid use-
after-free), re-arrange checks/operations in evtchn_close(), such that
the pointer wouldn't be used anymore after calling pirq_cleanup_check()
(noting that unmap_domain_pirq_emuirq() itself calls the function in the
success case).
Fixes: c24536b636f2 ("replace d->nr_pirqs sized arrays with radix tree") Fixes: 79858fee307c ("xen: fix hvm_domain_use_pirq's behavior") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
George Dunlap [Wed, 26 Jun 2024 15:07:30 +0000 (16:07 +0100)]
MAINTAINERS: Step down as maintainer and committer
Remain a Reviewer on the golang bindings and scheduler for now (using
a xenproject.org alias), since there may be architectural decisions I
can shed light on.
Remove the XENTRACE section entirely, as there's no obvious candidate
to take it over; having the respective parts fall back to the tools
and The Rest seems the most reasonable option.
Nicola Vetrini [Thu, 27 Jun 2024 11:48:08 +0000 (13:48 +0200)]
x86/traps: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Remove from the ECLAIR integration scripts an unused option, which
was already ignored, and make the help texts consistent
with the rest of the scripts.
Nicola Vetrini [Thu, 27 Jun 2024 11:47:16 +0000 (13:47 +0200)]
x86/irq: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Nicola Vetrini [Thu, 27 Jun 2024 11:46:57 +0000 (13:46 +0200)]
automation/eclair_analysis: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses".
The local helpers GRP2 and XADD in the x86 emulator use their first
argument as the constant expression for a case label. This pattern
is deviated project-wide, because it is very unlikely to induce
developer confusion and result in the wrong control flow being
carried out.
Nicola Vetrini [Thu, 27 Jun 2024 11:46:27 +0000 (13:46 +0200)]
xen/guest_access: address violations of MISRA rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Nicola Vetrini [Thu, 27 Jun 2024 11:46:02 +0000 (13:46 +0200)]
xen/self-tests: address violations of MISRA rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
Nicola Vetrini [Thu, 27 Jun 2024 11:45:18 +0000 (13:45 +0200)]
automation/eclair: address violations of MISRA C Rule 20.7
MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses".
The helper macro bitmap_switch has parameters that cannot be parenthesized
in order to comply with the rule, as that would break its functionality.
Moreover, the risk of misuse due developer confusion is deemed not
substantial enough to warrant a more involved refactor, thus the macro
is deviated for this rule.
George Dunlap [Mon, 24 Jun 2024 08:31:52 +0000 (09:31 +0100)]
CHANGELOG: Add entries related to tracing
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
George Dunlap [Mon, 24 Jun 2024 10:23:18 +0000 (11:23 +0100)]
tools/xenalyze: Remove argp_program_bug_address
xenalyze sets argp_program_bug_address to my old Citrix address. This
was done before xenalyze was in the xen.git tree; and it's the only
program in the tree which does so.
Now that xenalyze is part of the normal Xen distribution, it should be
obvious where to report bugs.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
George Dunlap [Mon, 24 Jun 2024 08:43:04 +0000 (09:43 +0100)]
CHANGELOG.md: Fix indentation of "Removed" section
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
automation/eclair_analysis: deviate and|or|xor|not for MISRA C Rule 21.2
Rule 21.2 reports identifiers reserved for the C and POSIX standard
libraries: or, and, not and xor are reserved identifiers because they
constitute alternate spellings for the corresponding operators (they are
defined as macros by iso646.h); however Xen doesn't use standard library
headers, so there is no risk of overlap.
This addresses violations arising from x86_emulate/x86_emulate.c, where
label statements named as or, and and xor appear.
Jan Beulich [Tue, 25 Jun 2024 09:37:44 +0000 (11:37 +0200)]
gnttab: fix compat query-size handling
The odd DEFINE_XEN_GUEST_HANDLE(), inconsistent with all other similar
constructs, should have caught my attention. Turns out it was needed for
the build to succeed merely because the corresponding #ifndef had a
typo. That typo in turn broke compat mode guests, by having query-size
requests of theirs wire into the domain_crash() at the bottom of the
switch().
Fixes: 8c3bb4d8ce3f ("xen/gnttab: Perform compat/native gnttab_query_size check") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <Oleksii.kurochko@gmail.com>
Jan Beulich [Tue, 25 Jun 2024 09:36:59 +0000 (11:36 +0200)]
xen: re-add type checking to {,__}copy_from_guest_offset()
When re-working them to avoid UB on guest address calculations, I failed
to add explicit type checks in exchange for the implicit ones that until
then had happened in assignments that were there anyway.
Fixes: 43d5c5d5f70b ("xen: avoid UB in guest handle arithmetic") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 21 Jun 2024 20:57:59 +0000 (21:57 +0100)]
x86/pagewalk: Address MISRA R8.3 violation in guest_walk_tables()
Commit 4c5d78a10dc8 ("x86/pagewalk: Re-implement the pagetable walker")
intentionally renamed guest_walk_tables()'s 'pfec' parameter to 'walk' because
it's not a PageFault Error Code, despite the name of some of the constants
passed in. Sadly the constants-cleanup I've been meaning to do since then
still hasn't come to pass.
Update the declaration to match, to placate MISRA.
Fixes: 4c5d78a10dc8 ("x86/pagewalk: Re-implement the pagetable walker") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
automation/eclair: add deviations of MISRA C Rule 5.5
MISRA C Rule 5.5 states that "Identifiers shall be distinct from macro
names".
Update ECLAIR configuration to deviate:
- macros expanding to their own name;
- clashes between macros and non-callable entities;
- clashes related to the selection of specific implementations of string
handling functions.
Michal Orzel [Fri, 21 Jun 2024 09:22:05 +0000 (11:22 +0200)]
xen/arm: static-shmem: request host address to be specified for 1:1 domains
As a follow up to commit cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem
should be direct-mapped for direct-mapped domains") add a check to
request that both host and guest physical address must be supplied for
direct mapped domains. Otherwise return an error to prevent unwanted
behavior.
Jan Beulich [Wed, 22 May 2024 10:17:30 +0000 (12:17 +0200)]
x86/shadow: Don't leave trace record field uninitialized
The emulation_count field is set only conditionally right now. Convert
all field setting to an initializer, thus guaranteeing that field to be
set to 0 (default initialized) when GUEST_PAGING_LEVELS != 3.
Rework trace_shadow_emulate() to be consistent with the other trace helpers.
Coverity-ID: 1598430 Fixes: 9a86ac1aa3d2 ("xentrace 5/7: Additional tracing for the shadow code") Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 22 May 2024 13:05:13 +0000 (14:05 +0100)]
x86/shadow: Rework trace_shadow_emulate_other() as sh_trace_gfn_va()
sh_trace_gfn_va() is very similar to sh_trace_gl1e_va(), and a rather shorter
name than trace_shadow_emulate_other().
It's only referenced in CONFIG_HVM=y builds, so give it a __maybe_unused to
placate randconfig builds.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 22 May 2024 12:58:22 +0000 (13:58 +0100)]
x86/shadow: Introduce sh_trace_gl1e_va()
trace_shadow_fixup() and trace_not_shadow_fault() both write out identical
trace records. Reimplement them in terms of a common sh_trace_gl1e_va().
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 22 May 2024 12:51:43 +0000 (13:51 +0100)]
x86/shadow: Rework trace_shadow_gen() into sh_trace_va()
The ((GUEST_PAGING_LEVELS - 2) << 8) expression in the event field is common
to all shadow trace events, so introduce sh_trace() as a very thin wrapper
around trace().
Then, rename trace_shadow_gen() to sh_trace_va() to better describe what it is
doing, and to be more consistent with later cleanup.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Tue, 7 May 2024 11:05:58 +0000 (12:05 +0100)]
tools/xl: Open xldevd.log with O_CLOEXEC
`xl devd` has been observed leaking /var/log/xldevd.log into children.
Note this is specifically safe; dup2() leaves O_CLOEXEC disabled on newfd, so
after setting up stdout/stderr, it's only the logfile fd which will close on
exec().
Link: https://github.com/QubesOS/qubes-issues/issues/8292 Reported-by: Demi Marie Obenour <demi@invisiblethingslab.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Demi Marie Obenour <demi@invisiblethingslab.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Thu, 20 Jun 2024 15:34:56 +0000 (17:34 +0200)]
libelf: avoid UB in elf_xen_feature_{get,set}()
When the left shift amount is up to 31, the shifted quantity wants to be
of unsigned int (or wider) type.
While there also adjust types: get doesn't alter the array and returns a
boolean, while both don't really accept negative "nr". Drop a stray
blank each as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Matthew Barnes [Thu, 20 Jun 2024 15:36:46 +0000 (16:36 +0100)]
x86/ioapic: Fix signed shifts in io_apic.c
There exists bitshifts in the IOAPIC code where signed integers are
shifted to the left by up to 31 bits, which is undefined behaviour.
This patch fixes this by changing the integers from signed to unsigned.
Signed-off-by: Matthew Barnes <matthew.barnes@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Thu, 20 Jun 2024 10:10:27 +0000 (12:10 +0200)]
livepatch: use appropriate type for buffer offset variables
As was made noticeable by the last of the commits referenced below,
using a fixed-size type for such purposes is not only against
./CODING_STYLE, but can lead to actual issues. Switch to using size_t
instead, thus also allowing calculations to be lighter-weight in 32-bit
builds.
No functional change for 64-bit builds.
Link: https://gitlab.com/xen-project/xen/-/jobs/7136417308 Fixes: b145b4a39c13 ("livepatch: Handle arbitrary size names with the list operation") Fixes: 5083e0ff939d ("livepatch: Add metadata runtime retrieval mechanism") Fixes: 43d5c5d5f70b ("xen: avoid UB in guest handle arithmetic") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Thu, 20 Jun 2024 10:09:32 +0000 (12:09 +0200)]
x86/irq: forward pending interrupts to new destination in fixup_irqs()
fixup_irqs() is used to evacuate interrupts from to be offlined CPUs. Given
the CPU is to become offline, the normal migration logic used by Xen where the
vector in the previous target(s) is left configured until the interrupt is
received on the new destination is not suitable.
Instead attempt to do as much as possible in order to prevent loosing
interrupts. If fixup_irqs() is called from the CPU to be offlined (as is
currently the case for CPU hot unplug) attempt to forward pending vectors when
interrupts that target the current CPU are migrated to a different destination.
Additionally, for interrupts that have already been moved from the current CPU
prior to the call to fixup_irqs() but that haven't been delivered to the new
destination (iow: interrupts with move_in_progress set and the current CPU set
in ->arch.old_cpu_mask) also check whether the previous vector is pending and
forward it to the new destination.
This allows us to remove the window with interrupts enabled at the bottom of
fixup_irqs(). Such window wasn't safe anyway: references to the CPU to become
offline are removed from interrupts masks, but the per-CPU vector_irq[] array
is not updated to reflect those changes (as the CPU is going offline anyway).
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Leigh Brown [Thu, 20 Jun 2024 10:09:02 +0000 (12:09 +0200)]
tools/libs/light: Fix nic->vlan memory allocation
After the following commit: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
xl list -l aborts with a double free error if a domain has at least
one vif defined:
$ sudo xl list -l
free(): double free detected in tcache 2
Aborted
Orginally, the vlan field was called vid and was defined as an integer.
It was appropriate to call libxl__xs_read_checked() with gc passed as
the string data was copied to a different variable. However, the final
version uses a string data type and the call should have been changed
to use NOGC instead of gc to allow that data to live past the gc
controlled lifetime, in line with the other string fields.
This patch makes the change to pass NOGC instead of gc and moves the
new code to be next to the other string fields (fixing a couple of
errant tabs along the way), as recommended by Jason.
Fixes: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic") Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jason Andryuk [Thu, 20 Jun 2024 10:08:42 +0000 (12:08 +0200)]
hotplug: Restore block-tap phy compatibility
backendtype=phy using the blktap kernel module needs to use write_dev,
but tapback can't support that. tapback should perform better, but make
the script compatible with the old kernel module again.
Fixes: 76a484193d ("hotplug: Update block-tap") Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Wed, 19 Jun 2024 12:11:07 +0000 (14:11 +0200)]
xen: avoid UB in guest handle arithmetic
At least XENMEM_memory_exchange can have huge values passed in the
nr_extents and nr_exchanged fields. Adding such values to pointers can
overflow, resulting in UB. Cast respective pointers to "unsigned long"
while at the same time making the necessary multiplication explicit.
Remaining arithmetic is, despite there possibly being mathematical
overflow, okay as per the C99 spec: "A computation involving unsigned
operands can never overflow, because a result that cannot be represented
by the resulting unsigned integer type is reduced modulo the number that
is one greater than the largest value that can be represented by the
resulting type." The overflow that we need to guard against is checked
for in array_access_ok().
Note that in / down from array_access_ok() the address value is only
ever cast to "unsigned long" anyway, which is why in the invocation from
guest_handle_subrange_okay() the value doesn't need casting back to
pointer type.
In compat grant table code change two guest_handle_add_offset() to avoid
passing in negative offsets.
Since {,__}clear_guest_offset() need touching anyway, also deal with
another (latent) issue there: They were losing the handle type, i.e. the
size of the individual objects accessed. Luckily the few users we
presently have all pass char or uint8 handles.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 30 Apr 2021 15:14:36 +0000 (16:14 +0100)]
x86/defns: Clean up X86_{XCR0,XSS}_* constants
With the exception of one case in read_bndcfgu() which can use ilog2(),
the *_POS defines are unused. Drop them.
X86_XCR0_X87 is the name used by both the SDM and APM, rather than
X86_XCR0_FP.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 30 Apr 2021 19:17:55 +0000 (20:17 +0100)]
x86/cpuid: Fix handling of XSAVE dynamic leaves
First, if XSAVE is available in hardware but not visible to the guest, the
dynamic leaves shouldn't be filled in.
Second, the comment concerning XSS state is wrong. VT-x doesn't manage
host/guest state automatically, but there is provision for "host only" bits to
be set, so the implications are still accurate.
Introduce xstate_compressed_size() to mirror the uncompressed one. Cross
check it at boot.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 30 Apr 2021 19:17:55 +0000 (20:17 +0100)]
x86/cpu-policy: Simplify recalculate_xstate()
Make use of xstate_uncompressed_size() helper rather than maintaining the
running calculation while accumulating feature components.
The rest of the CPUID data can come direct from the raw cpu policy. All
per-component data form an ABI through the behaviour of the X{SAVE,RSTOR}*
instructions.
Use for_each_set_bit() rather than opencoding a slightly awkward version of
it. Mask the attributes in ecx down based on the visible features. This
isn't actually necessary for any components or attributes defined at the time
of writing (up to AMX), but is added out of an abundance of caution.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 30 Apr 2021 19:17:55 +0000 (20:17 +0100)]
x86/xstate: Rework xstate_ctxt_size() as xstate_uncompressed_size()
We're soon going to need a compressed helper of the same form.
The size of the uncompressed image depends on the single element with the
largest offset + size. Sadly this isn't always the element with the largest
index.
Name the per-xstate-component cpu_policy struture, for legibility of the logic
in xstate_uncompressed_size(). Cross-check with hardware during boot, and
remove hw_uncompressed_size().
This means that the migration paths don't need to mess with XCR0 just to
sanity check the buffer size. It also means we can drop the "fastpath" check
against xfeature_mask (there to skip some XCR0 writes); this path is going to
be dead logic the moment Xen starts using supervisor states itself.
The users of hw_uncompressed_size() in xstate_init() can (and indeed need) to
be replaced with CPUID instructions. They run with feature_mask in XCR0, and
prior to setup_xstate_features() on the BSP.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 22 May 2024 23:55:34 +0000 (00:55 +0100)]
x86/boot: Collect the Raw CPU Policy earlier on boot
This is a tangle, but it's a small step in the right direction.
In the following change, xstate_init() is going to start using the Raw policy.
calculate_raw_cpu_policy() is sufficiently separate from the other policies to
safely move like this.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 21 Feb 2020 17:56:57 +0000 (17:56 +0000)]
x86/xstate: Cross-check dynamic XSTATE sizes at boot
Right now, xstate_ctxt_size() performs a cross-check of size with CPUID in for
every call. This is expensive, being used for domain create/migrate, as well
as to service certain guest CPUID instructions.
Instead, arrange to check the sizes once at boot. See the code comments for
details. Right now, it just checks hardware against the algorithm
expectations. Later patches will cross-check Xen's XSTATE calculations too.
Introduce more X86_XCR0_* and X86_XSS_* constants CPUID bits. This is to
maximise coverage in the sanity check, even if we don't expect to
use/virtualise some of these features any time soon. Leave HDC and HWP alone
for now; we don't have CPUID bits from them stored nicely.
Only perform the cross-checks when SELF_TESTS are active. It's only
developers or new hardware liable to trip these checks, and Xen at least
tracks "maximum value ever seen in xcr0" for the lifetime of the VM, which we
don't want to be tickling in the general case.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 22 May 2024 16:23:54 +0000 (17:23 +0100)]
x86/xstate: Fix initialisation of XSS cache
The clobbering of this_cpu(xcr0) and this_cpu(xss) to architecturally invalid
values is to force the subsequent set_xcr0() and set_msr_xss() to reload the
hardware register.
While XCR0 is reloaded in xstate_init(), MSR_XSS isn't. This causes
get_msr_xss() to return the invalid value, and logic of the form:
old = get_msr_xss();
set_msr_xss(new);
...
set_msr_xss(old);
to try and restore said invalid value.
The architecturally invalid value must be purged from the cache, meaning the
hardware register must be written at least once. This in turn highlights that
the invalid value must only be used in the case that the hardware register is
available.
Fixes: f7f4a523927f ("x86/xstate: reset cached register values on resume") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 14 Jun 2024 12:05:40 +0000 (13:05 +0100)]
xen/arch: Centralise __read_mostly and __ro_after_init
These living in cache.h is inherited from Linux, but cache.h is not a terribly
appropriately location for them to live.
__read_mostly is an optimisation related to data placement in order to avoid
having shared data in cachelines that are likely to be written to, but it
really is just a section of the linked image separating data by usage
patterns; it has nothing to do with cache sizes or flushing logic.
Worse, __ro_after_init was only in xen/cache.h because __read_mostly was in
arch/cache.h, and has literally nothing whatsoever to do with caches.
Move the definitions into xen/sections.h, which in particular means that
RISC-V doesn't need to repeat the problematic pattern. Take the opportunity
to provide a short descriptions of what these are used for.
For now, leave TODO comments next to the other identical definitions. It
turns out that unpicking cache.h is more complicated than it appears because a
number of files use it for transitive dependencies.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Tue, 18 Jun 2024 12:48:35 +0000 (13:48 +0100)]
xen/irq: Address MISRA Rule 8.3 violation
When centralising irq_ack_none(), different architectures had different names
for the parameter. As it's type is struct irq_desc *, it should be named
desc. Make this consistent.
No functional change.
Fixes: 8aeda4a241ab ("arch/irq: Make irq_ack_none() mandatory") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Michal Orzel [Wed, 19 Jun 2024 06:46:52 +0000 (08:46 +0200)]
xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure
Building Xen with CONFIG_STATIC_SHM=y results in a build failure:
arch/arm/static-shmem.c: In function 'process_shm':
arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized [-Werror=maybe-uninitialized]
327 | if ( is_domain_direct_mapped(d) && (pbase != gbase) )
arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
305 | paddr_t gbase, pbase, psize;
This is because the commit cb1ddafdc573 adds a check referencing
gbase/pbase variables which were not yet assigned a value. Fix it.
Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Henry Wang [Fri, 24 May 2024 22:55:20 +0000 (15:55 -0700)]
xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor
There are use cases (for example using the PV driver) in Dom0less
setup that require Dom0less DomUs start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.
An error message can seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```
The "magic page" is a terminology used in the toolstack as reserved
pages for the VM to have access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domain using statically
allocated memory but not 1:1 direct-mapped, similar error "failed to
retrieve a reserved page" can be seen as the reserved memory list is
empty at that time.
Since for init-dom0less, the magic page region is only for XenStore.
To solve above issue, this commit allocates the XenStore page for
Dom0less DomUs at the domain construction time. The PFN will be
noted and communicated to the init-dom0less application executed
from Dom0. To keep the XenStore late init protocol, set the connection
status to XENSTORE_RECONNECT.
Currently the GUEST_MAGIC_BASE in the init-dom0less application is
hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
DomUs.
Since the guest magic region allocation from init-dom0less is for
XenStore, and the XenStore page is now allocated from the hypervisor,
instead of hardcoding the guest magic pages region, use
xc_hvm_param_get() to get the XenStore page PFN. Rename alloc_xs_page()
to get_xs_page() to reflect the changes.
With this change, some existing code is not needed anymore, including:
(1) The definition of the XenStore page offset.
(2) Call to xc_domain_setmaxmem() and xc_clear_domain_page() as we
don't need to set the max mem and clear the page anymore.
(3) Foreign mapping of the XenStore page, setting of XenStore interface
status and HVM_PARAM_STORE_PFN from init-dom0less, as they are set
by the hypervisor.
Take the opportunity to do some coding style improvements when possible.
Reported-by: Alec Kwapis <alec.kwapis@medtronic.com> Suggested-by: Daniel P. Smith <dpsmith@apertussolutions.com> Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Henry Wang [Wed, 19 Jun 2024 00:27:51 +0000 (17:27 -0700)]
xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains
Currently, users are allowed to map static shared memory in a
non-direct-mapped way for direct-mapped domains. This can lead to
clashing of guest memory spaces. Also, the current extended region
finding logic only removes the host physical addresses of the
static shared memory areas for direct-mapped domains, which may be
inconsistent with the guest memory map if users map the static
shared memory in a non-direct-mapped way. This will lead to incorrect
extended region calculation results.
To make things easier, add restriction that static shared memory
should also be direct-mapped for direct-mapped domains. Check the
host physical address to be matched with guest physical address when
parsing the device tree. Document this restriction in the doc.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Michal Orzel <michal.orzel@amd.com>
Andrew Cooper [Mon, 17 Jun 2024 17:40:32 +0000 (18:40 +0100)]
xen/ubsan: Fix UB in type_descriptor declaration
struct type_descriptor is arranged with a NUL terminated string following the
kind/info fields.
The only reason this doesn't trip UBSAN detection itself (on more modern
compilers at least) is because struct type_descriptor is only referenced in
suppressed regions.
Switch the declaration to be a real flexible member. No functional change.
Fixes: 00fcf4dd8eb4 ("xen/ubsan: Import ubsan implementation from Linux 4.13") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Tue, 18 Jun 2024 13:15:10 +0000 (15:15 +0200)]
x86/irq: handle moving interrupts in _assign_irq_vector()
Currently there's logic in fixup_irqs() that attempts to prevent
_assign_irq_vector() from failing, as fixup_irqs() is required to evacuate all
interrupts from the CPUs not present in the input mask. The current logic in
fixup_irqs() is incomplete, as it doesn't deal with interrupts that have
move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.
Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so that
_assign_irq_vector() cannot fail, introduce logic in _assign_irq_vector()
to deal with interrupts that have either move_{in_progress,cleanup_count} set
and no remaining online CPUs in ->arch.cpu_mask.
If _assign_irq_vector() is requested to move an interrupt in the state
described above, first attempt to see if ->arch.old_cpu_mask contains any valid
CPUs that could be used as fallback, and if that's the case do move the
interrupt back to the previous destination. Note this is easier because the
vector hasn't been released yet, so there's no need to allocate and setup a new
vector on the destination.
Due to the logic in fixup_irqs() that clears offline CPUs from
->arch.old_cpu_mask (and releases the old vector if the mask becomes empty) it
shouldn't be possible to get into _assign_irq_vector() with
->arch.move_{in_progress,cleanup_count} set but no online CPUs in
->arch.old_cpu_mask.
However if ->arch.move_{in_progress,cleanup_count} is set and the interrupt has
also changed affinity, it's possible the members of ->arch.old_cpu_mask are no
longer part of the affinity set, move the interrupt to a different CPU part of
the provided mask and keep the current ->arch.old_{cpu_mask,vector} for the
pending interrupt movement to be completed.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 18 Jun 2024 13:14:49 +0000 (15:14 +0200)]
x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs()
Given the current logic it's possible for ->arch.old_cpu_mask to get out of
sync: if a CPU set in old_cpu_mask is offlined and then onlined
again without old_cpu_mask having been updated the data in the mask will no
longer be accurate, as when brought back online the CPU will no longer have
old_vector configured to handle the old interrupt source.
If there's an interrupt movement in progress, and the to be offlined CPU (which
is the call context) is in the old_cpu_mask, clear it and update the mask, so
it doesn't contain stale data.
Note that when the system is going down fixup_irqs() will be called by
smp_send_stop() from CPU 0 with a mask with only CPU 0 on it, effectively
asking to move all interrupts to the current caller (CPU 0) which is the only
CPU to remain online. In that case we don't care to migrate interrupts that
are in the process of being moved, as it's likely we won't be able to move all
interrupts to CPU 0 due to vector shortage anyway.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 18 Jun 2024 13:12:44 +0000 (15:12 +0200)]
x86/Intel: unlock CPUID earlier for the BSP
Intel CPUs have a MSR bit to limit CPUID enumeration to leaf two. If
this bit is set by the BIOS then CPUID evaluation does not work when
data from any leaf greater than two is needed; early_cpu_init() in
particular wants to collect leaf 7 data.
Cure this by unlocking CPUID right before evaluating anything which
depends on the maximum CPUID leaf being greater than two.
Inspired by (and description cloned from) Linux commit 0c2f6d04619e
("x86/topology/intel: Unlock CPUID before evaluating anything").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Fri, 7 Jun 2024 20:13:17 +0000 (22:13 +0200)]
automation/eclair_analysis: address remaining violations of MISRA C Rule 20.12
The DEFINE macro in asm-offsets.c (for all architectures) still generates
violations despite the file(s) being excluded from compliance, due to the
fact that in its expansion it sometimes refers entities in non-excluded files.
These corner cases are deviated by the configuration.
Penny Zheng [Fri, 24 May 2024 12:40:55 +0000 (13:40 +0100)]
xen/docs: Describe static shared memory when host address is not provided
This commit describe the new scenario where host address is not provided
in "xen,shared-mem" property and a new example is added to the page to
explain in details.
Luca Fancellu [Fri, 24 May 2024 12:40:54 +0000 (13:40 +0100)]
xen/arm: Implement the logic for static shared memory from Xen heap
This commit implements the logic to have the static shared memory banks
from the Xen heap instead of having the host physical address passed from
the user.
When the host physical address is not supplied, the physical memory is
taken from the Xen heap using allocate_domheap_memory, the allocation
needs to occur at the first handled DT node and the allocated banks
need to be saved somewhere.
Introduce the 'shm_heap_banks' for that reason, a struct that will hold
the banks allocated from the heap, its field bank[].shmem_extra will be
used to point to the bootinfo shared memory banks .shmem_extra space, so
that there is not further allocation of memory and every bank in
shm_heap_banks can be safely identified by the shm_id to reconstruct its
traceability and if it was allocated or not.
A search into 'shm_heap_banks' will reveal if the banks were allocated
or not, in case the host address is not passed, and the callback given
to allocate_domheap_memory will store the banks in the structure and
map them to the current domain, to do that, some changes to
acquire_shared_memory_bank are made to let it differentiate if the bank
is from the heap and if it is, then assign_pages is called for every
bank.
When the bank is already allocated, for every bank allocated with the
corresponding shm_id, handle_shared_mem_bank is called and the mapping
are done.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
The function allocate_bank_memory allocates pages from the heap and
maps them to the guest using guest_physmap_add_page.
As a preparation work to support static shared memory bank when the
host physical address is not provided, Xen needs to allocate memory
from the heap, so rework allocate_bank_memory moving out the page
allocation in a new function called allocate_domheap_memory.
The function allocate_domheap_memory takes a callback function and
a pointer to some extra information passed to the callback and this
function will be called for every region, until a defined size is
reached.
In order to keep allocate_bank_memory functionality, the callback
passed to allocate_domheap_memory is a wrapper for
guest_physmap_add_page.
Let allocate_domheap_memory be externally visible, in order to use
it in the future from the static shared memory module.
Take the opportunity to change the signature of allocate_bank_memory
and remove the 'struct domain' parameter, which can be retrieved from
'struct kernel_info'.
No functional changes is intended.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Luca Fancellu [Fri, 24 May 2024 12:40:52 +0000 (13:40 +0100)]
xen/arm: Parse xen,shared-mem when host phys address is not provided
Handle the parsing of the 'xen,shared-mem' property when the host physical
address is not provided, this commit is introducing the logic to parse it,
but the functionality is still not implemented and will be part of future
commits.
Rework the logic inside process_shm_node to check the shm_id before doing
the other checks, because it ease the logic itself, add more comment on
the logic.
Now when the host physical address is not provided, the value
INVALID_PADDR is chosen to signal this condition and it is stored as
start of the bank, due to that change also early_print_info_shmem and
init_sharedmem_pages are changed, to not handle banks with start equal
to INVALID_PADDR.
Another change is done inside meminfo_overlap_check, to skip banks that
are starting with the start address INVALID_PADDR, that function is used
to check banks from reserved memory, shared memory and ACPI and since
the comment above the function states that wrapping around is not handled,
it's unlikely for these bank to have the start address as INVALID_PADDR.
Same change is done inside consider_modules, find_unallocated_memory and
dt_unreserved_regions functions, in order to skip banks that starts with
INVALID_PADDR from any computation.
The changes above holds because of this consideration.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Penny Zheng [Tue, 28 May 2024 12:56:03 +0000 (13:56 +0100)]
xen/p2m: put reference for level 2 superpage
We are doing foreign memory mapping for static shared memory, and
there is a great possibility that it could be super mapped.
But today, p2m_put_l3_page could not handle superpages.
This commits implements a new function p2m_put_l2_superpage to handle
level 2 superpages, specifically for helping put extra references for
foreign superpages.
Modify relinquish_p2m_mapping as well to take into account preemption
when we have a level-2 foreign mapping.
Currently level 1 superpages are not handled because Xen is not
preemptible and therefore some work is needed to handle such superpages,
for which at some point Xen might end up freeing memory and therefore
for such a big mapping it could end up in a very long operation.
Luca Fancellu [Fri, 24 May 2024 12:40:50 +0000 (13:40 +0100)]
xen/arm: Wrap shared memory mapping code in one function
Wrap the code and logic that is calling assign_shared_memory
and map_regions_p2mt into a new function 'handle_shared_mem_bank',
it will become useful later when the code will allow the user to
don't pass the host physical address.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Luca Fancellu [Fri, 24 May 2024 12:40:49 +0000 (13:40 +0100)]
xen/arm: Lookup bootinfo shm bank during the mapping
The current static shared memory code is using bootinfo banks when it
needs to find the number of borrowers, so every time assign_shared_memory
is called, the bank is searched in the bootinfo.shmem structure.
There is nothing wrong with it, however the bank can be used also to
retrieve the start address and size and also to pass less argument to
assign_shared_memory. When retrieving the information from the bootinfo
bank, it's also possible to move the checks on alignment to
process_shm_node in the early stages.
So create a new function find_shm_bank_by_id() which takes a
'struct shared_meminfo' structure and the shared memory ID, to look for a
bank with a matching ID, take the physical host address and size from the
bank, pass the bank to assign_shared_memory() removing the now unnecessary
arguments and finally remove the acquire_nr_borrower_domain() function
since now the information can be extracted from the passed bank.
Move the "xen,shm-id" parsing early in process_shm to bail out quickly in
case of errors (unlikely), as said above, move the checks on alignment
to process_shm_node.
Drawback of this change is that now the bootinfo are used also when the
bank doesn't need to be allocated, however it will be convenient later
to use it as an argument for assign_shared_memory when dealing with
the use case where the Host physical address is not supplied by the user.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Jan Beulich [Thu, 13 Jun 2024 14:55:22 +0000 (16:55 +0200)]
x86/EPT: drop questionable mfn_valid() from epte_get_entry_emt()
mfn_valid() is RAM-focused; it will often return false for MMIO. Yet
access to actual MMIO space should not generally be restricted to UC
only; especially video frame buffer accesses are unduly affected by such
a restriction.
Since, as of 777c71d31325 ("x86/EPT: avoid marking non-present entries
for re-configuring"), the function won't be called with INVALID_MFN or,
worse, truncated forms thereof anymore, we call fully drop that check.
Fixes: 81fd0d3ca4b2 ("x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Thu, 13 Jun 2024 14:54:17 +0000 (16:54 +0200)]
x86/EPT: avoid marking non-present entries for re-configuring
For non-present entries EMT, like most other fields, is meaningless to
hardware. Make the logic in ept_set_entry() setting the field (and iPAT)
conditional upon dealing with a present entry, leaving the value at 0
otherwise. This has two effects for epte_get_entry_emt() which we'll
want to leverage subsequently:
1) The call moved here now won't be issued with INVALID_MFN anymore (a
respective BUG_ON() is being added).
2) Neither of the other two calls could now be issued with a truncated
form of INVALID_MFN anymore (as long as there's no bug anywhere
marking an entry present when that was populated using INVALID_MFN).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Thu, 13 Jun 2024 14:53:34 +0000 (16:53 +0200)]
x86/EPT: correct special page checking in epte_get_entry_emt()
mfn_valid() granularity is (currently) 256Mb. Therefore the start of a
1Gb page passing the test doesn't necessarily mean all parts of such a
range would also pass. Yet using the result of mfn_to_page() on an MFN
which doesn't pass mfn_valid() checking is liable to result in a crash
(the invocation of mfn_to_page() alone is presumably "just" UB in such a
case).
Fixes: ca24b2ffdbd9 ("x86/hvm: set 'ipat' in EPT for special pages") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jens Wiklander [Mon, 10 Jun 2024 06:53:43 +0000 (08:53 +0200)]
xen/arm: ffa: support notification
Add support for FF-A notifications, currently limited to an SP (Secure
Partition) sending an asynchronous notification to a guest.
Guests and Xen itself are made aware of pending notifications with an
interrupt. The interrupt handler triggers a tasklet to retrieve the
notifications using the FF-A ABI and deliver them to their destinations.
Update ffa_partinfo_domain_init() to return error code like
ffa_notif_domain_init().
Jens Wiklander [Mon, 10 Jun 2024 06:53:42 +0000 (08:53 +0200)]
xen/arm: add and call tee_free_domain_ctx()
Add tee_free_domain_ctx() to the TEE mediator framework.
tee_free_domain_ctx() is called from arch_domain_destroy() to allow late
freeing of the d->arch.tee context. This will simplify access to
d->arch.tee for domains retrieved with rcu_lock_domain_by_id().
Jens Wiklander [Mon, 10 Jun 2024 06:53:41 +0000 (08:53 +0200)]
xen/arm: add and call init_tee_secondary()
Add init_tee_secondary() to the TEE mediator framework and call it from
start_secondary() late enough that per-cpu interrupts can be configured
on CPUs as they are initialized. This is needed in later patches.
Jens Wiklander [Mon, 10 Jun 2024 06:53:40 +0000 (08:53 +0200)]
xen/arm: allow dynamically assigned SGI handlers
Updates so request_irq() can be used with a dynamically assigned SGI irq
as input. This prepares for a later patch where an FF-A schedule
receiver interrupt handler is installed for an SGI generated by the
secure world.
>From the Arm Base System Architecture v1.0C [1]:
"The system shall implement at least eight Non-secure SGIs, assigned to
interrupt IDs 0-7."
gic_route_irq_to_xen() don't gic_set_irq_type() for SGIs since they are
always edge triggered.
gic_interrupt() is updated to route the dynamically assigned SGIs to
do_IRQ() instead of do_sgi(). The latter still handles the statically
assigned SGI handlers like for instance GIC_SGI_CALL_FUNCTION.
Jens Wiklander [Mon, 10 Jun 2024 06:53:39 +0000 (08:53 +0200)]
xen/arm: ffa: simplify ffa_handle_mem_share()
Simplify ffa_handle_mem_share() by removing the start_page_idx and
last_page_idx parameters from get_shm_pages() and check that the number
of pages matches expectations at the end of get_shm_pages().
Jens Wiklander [Mon, 10 Jun 2024 06:53:38 +0000 (08:53 +0200)]
xen/arm: ffa: use ACCESS_ONCE()
Replace read_atomic() with ACCESS_ONCE() to match the intended use, that
is, to prevent the compiler from (via optimization) reading shared
memory more than once.
Jens Wiklander [Mon, 10 Jun 2024 06:53:37 +0000 (08:53 +0200)]
xen/arm: ffa: refactor ffa_handle_call()
Refactors the large switch block in ffa_handle_call() to use common code
for the simple case where it's either an error code or success with no
further parameters.
Jan Beulich [Wed, 12 Jun 2024 12:31:21 +0000 (14:31 +0200)]
x86/physdev: replace physdev_{,un}map_pirq() checking against DOMID_SELF
It's hardly ever correct to check for just DOMID_SELF, as guests have
ways to figure out their domain IDs and hence could instead use those as
inputs to respective hypercalls. Note, however, that for ordinary DomU-s
the adjustment is relaxing things rather than tightening them, since
- as a result of XSA-237 - the respective XSM checks would have rejected
self (un)mapping attempts for other than the control domain.
Since in physdev_map_pirq() handling overall is a little easier this
way, move obtaining of the domain pointer into the caller. Doing the
same for physdev_unmap_pirq() is just to keep both consistent in this
regard.
Fixes: 0b469cd68708 ("Interrupt remapping to PIRQs in HVM guests") Fixes: 9e1a3415b773 ("x86: fixes after emuirq changes") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 12 Jun 2024 12:30:40 +0000 (14:30 +0200)]
x86/irq: limit interrupt movement done by fixup_irqs()
The current check used in fixup_irqs() to decide whether to move around
interrupts is based on the affinity mask, but such mask can have all bits set,
and hence is unlikely to be a subset of the input mask. For example if an
interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's not
an all set CPU mask would cause that interrupt to be shuffled around
unconditionally.
What fixup_irqs() care about is evacuating interrupts from CPUs not set on the
input CPU mask, and for that purpose it should check whether the interrupt is
assigned to a CPU not present in the input mask. Assume that ->arch.cpu_mask
is a subset of the ->affinity mask, and keep the current logic that resets the
->affinity mask if the interrupt has to be shuffled around.
Doing the affinity movement based on ->arch.cpu_mask requires removing the
special handling to ->arch.cpu_mask done for high priority vectors, otherwise
the adjustment done to cpu_mask makes them always skip the CPU interrupt
movement.
While there also adjust the comment as to the purpose of fixup_irqs().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 12 Jun 2024 12:30:06 +0000 (14:30 +0200)]
x86/irq: describe how the interrupt CPU movement works
The logic to move interrupts across CPUs is complex, attempt to provide a
comment that describes the expected behavior so users of the interrupt system
have more context about the usage of the arch_irq_desc structure fields.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 12 Jun 2024 12:29:31 +0000 (14:29 +0200)]
x86/smp: do not use shorthand IPI destinations in CPU hot{,un}plug contexts
Due to the current rwlock logic, if the CPU calling get_cpu_maps() does
so from a cpu_hotplug_{begin,done}() region the function will still
return success, because a CPU taking the rwlock in read mode after
having taken it in write mode is allowed. Such corner case makes using
get_cpu_maps() alone not enough to prevent using the shorthand in CPU
hotplug regions.
Introduce a new helper to detect whether the current caller is between a
cpu_hotplug_{begin,done}() region and use it in send_IPI_mask() to restrict
shorthand usage.
Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Wed, 12 Jun 2024 08:52:56 +0000 (10:52 +0200)]
MAINTAINERS: alter EFI section
To get past the recurring friction on the approach to take wrt
workarounds needed for various firmware flaws, I'm stepping down as the
maintainer of our code interfacing with EFI firmware. Two new
maintainers are being introduced in my place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Marek Marczykowski <marmarek@invisiblethingslab.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
This tests if QEMU works in PVH dom0. QEMU in dom0 requires enabling TUN
in the kernel, so do that too.
Add it to both x86 runners, similar to the PVH domU test.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Mon, 10 Jun 2024 11:29:25 +0000 (13:29 +0200)]
x86/pvh: declare PVH dom0 supported with caveats
PVH dom0 is functionally very similar to PVH domU except for the domain
builder and the added set of hypercalls available to it.
The main concern with declaring it "Supported" is the lack of some features
when compared to classic PV dom0, hence switch it's status to supported with
caveats. List the known missing features, there might be more features missing
or not working as expected apart from the ones listed.
Note there's some (limited) PVH dom0 testing on both osstest and gitlab.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Mon, 10 Jun 2024 08:34:05 +0000 (10:34 +0200)]
x86/domain: deviate violation of MISRA C Rule 20.12
MISRA C Rule 20.12 states: "A macro parameter used as an operand to
the # or ## operators, which is itself subject to further macro replacement,
shall only be used as an operand to these operators".
In this case, builds where CONFIG_COMPAT=y the fpu_ctxt
macro is used both as a regular macro argument and as an operand for
stringification in the expansion of CHECK_FIELD_.
This is deviated using a SAF-x-safe comment.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Mon, 10 Jun 2024 08:33:22 +0000 (10:33 +0200)]
x86/irq: remove offline CPUs from old CPU mask when adjusting move_cleanup_count
When adjusting move_cleanup_count to account for CPUs that are offline also
adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
those again and create an imbalance in move_cleanup_count.
Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Nicola Vetrini [Sat, 1 Jun 2024 10:16:56 +0000 (12:16 +0200)]
xen: fix MISRA regressions on rule 20.9 and 20.12
Commit ea59e7d780d9 ("xen/bitops: Cleanup and new infrastructure ahead of
rearrangements") introduced new violations on previously clean rules 20.9 and
20.12 (clean on ARM only, right now).
The first is introduced because CONFIG_CC_IS_CLANG in xen/self-tests.h is not
defined in the configuration under analysis. Using "defined()" instead avoids
relying on the preprocessor's behaviour upon encountering an undedfined identifier
and addresses the violation.
The violation of Rule 20.12 is due to "val" being used both as an ordinary argument
in macro RUNTIME_CHECK, and as a stringification operator.
No functional change.
Fixes: ea59e7d780d9 ("xen/bitops: Cleanup and new infrastructure ahead of rearrangements") Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 24 May 2024 19:37:50 +0000 (20:37 +0100)]
xen/bitops: Rearrange the top of xen/bitops.h
The #include <asm/bitops.h> can move to the top of the file now now that
generic_ffs()/generic_fls() have been untangled.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>