Michal Orzel [Wed, 19 Jun 2024 06:46:52 +0000 (08:46 +0200)]
xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure
Building Xen with CONFIG_STATIC_SHM=y results in a build failure:
arch/arm/static-shmem.c: In function 'process_shm':
arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized [-Werror=maybe-uninitialized]
327 | if ( is_domain_direct_mapped(d) && (pbase != gbase) )
arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
305 | paddr_t gbase, pbase, psize;
This is because commit cb1ddafdc573 adds a check referencing the
gbase/pbase variables before they are assigned a value. Fix it.
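A minimal sketch of the corrected ordering (is_domain_direct_mapped() and the variable names come from the error output above; parse_shm_addresses() is a hypothetical placeholder for the real parsing code):
```c
paddr_t gbase, pbase, psize;
int ret;

/* Read the addresses from the "xen,shared-mem" property first ... */
ret = parse_shm_addresses(prop, &gbase, &pbase, &psize);
if ( ret )
    return ret;

/* ... and only then perform the direct-mapped consistency check. */
if ( is_domain_direct_mapped(d) && (pbase != gbase) )
    return -EINVAL;
```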
Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Henry Wang [Fri, 24 May 2024 22:55:20 +0000 (15:55 -0700)]
xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor
There are use cases (for example using PV drivers) in a Dom0less
setup that require Dom0less DomUs to start immediately with Dom0, but
to initialize XenStore later, after Dom0 has booted successfully and
run the init-dom0less application.
An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```
The "magic page" is a terminology used in the toolstack as reserved
pages for the VM to have access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error "failed to
retrieve a reserved page" can be seen, as the reserved memory list is
empty at that time.
For init-dom0less, the magic page region is only used for XenStore.
To solve the above issue, this commit allocates the XenStore page for
Dom0less DomUs at domain construction time. The PFN will be
noted and communicated to the init-dom0less application executed
from Dom0. To keep the XenStore late init protocol, set the connection
status to XENSTORE_RECONNECT.
Currently the GUEST_MAGIC_BASE in the init-dom0less application is
hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
DomUs.
Since the guest magic region allocation from init-dom0less is for
XenStore, and the XenStore page is now allocated from the hypervisor,
instead of hardcoding the guest magic pages region, use
xc_hvm_param_get() to get the XenStore page PFN. Rename alloc_xs_page()
to get_xs_page() to reflect the changes.
With this change, some existing code is not needed anymore, including:
(1) The definition of the XenStore page offset.
(2) Call to xc_domain_setmaxmem() and xc_clear_domain_page() as we
don't need to set the max mem and clear the page anymore.
(3) Foreign mapping of the XenStore page, setting of XenStore interface
status and HVM_PARAM_STORE_PFN from init-dom0less, as they are set
by the hypervisor.
Take the opportunity to do some coding style improvements when possible.
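A hedged sketch of what the described toolstack side might look like; xc_hvm_param_get() and HVM_PARAM_STORE_PFN are existing libxenctrl/Xen interfaces, while the surrounding helper is illustrative:
```c
#include <xenctrl.h>

/* Retrieve the XenStore page PFN that Xen allocated at domain
 * construction time (replaces the old alloc_xs_page() logic). */
static int get_xs_page(xc_interface *xch, uint32_t domid, uint64_t *pfn)
{
    return xc_hvm_param_get(xch, domid, HVM_PARAM_STORE_PFN, pfn);
}
```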
Reported-by: Alec Kwapis <alec.kwapis@medtronic.com> Suggested-by: Daniel P. Smith <dpsmith@apertussolutions.com> Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Henry Wang [Wed, 19 Jun 2024 00:27:51 +0000 (17:27 -0700)]
xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains
Currently, users are allowed to map static shared memory in a
non-direct-mapped way for direct-mapped domains. This can lead to
clashing of guest memory spaces. Also, the current extended region
finding logic only removes the host physical addresses of the
static shared memory areas for direct-mapped domains, which may be
inconsistent with the guest memory map if users map the static
shared memory in a non-direct-mapped way. This will lead to incorrect
extended region calculation results.
To make things easier, add a restriction that static shared memory
should also be direct-mapped for direct-mapped domains. Check that the
host physical address matches the guest physical address when
parsing the device tree. Document this restriction in the doc.
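A sketch of the added restriction, using the same check quoted in the build-fix entry above (the error message and return value shown here are illustrative):
```c
/* For direct-mapped domains a static shared memory bank must also be
 * direct-mapped, i.e. guest address == host address. */
if ( is_domain_direct_mapped(d) && (pbase != gbase) )
{
    printk("static shared memory must be direct-mapped for direct-mapped domain %pd\n",
           d);
    return -EINVAL;
}
```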
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Michal Orzel <michal.orzel@amd.com>
Andrew Cooper [Mon, 17 Jun 2024 17:40:32 +0000 (18:40 +0100)]
xen/ubsan: Fix UB in type_descriptor declaration
struct type_descriptor is arranged with a NUL terminated string following the
kind/info fields.
The only reason this doesn't trip UBSAN detection itself (on more modern
compilers at least) is because struct type_descriptor is only referenced in
suppressed regions.
Switch the declaration to be a real flexible member. No functional change.
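For illustration, the shape of the change, assuming the structure imported from Linux (shown here with standard integer types):
```c
struct type_descriptor {
    uint16_t type_kind;
    uint16_t type_info;
    /* Previously declared as "char type_name[1]"; accessing the string
     * past index 0 through such a declaration is undefined behaviour.
     * A real flexible array member describes the layout correctly. */
    char type_name[];
};
```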
Fixes: 00fcf4dd8eb4 ("xen/ubsan: Import ubsan implementation from Linux 4.13") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Tue, 18 Jun 2024 13:15:10 +0000 (15:15 +0200)]
x86/irq: handle moving interrupts in _assign_irq_vector()
Currently there's logic in fixup_irqs() that attempts to prevent
_assign_irq_vector() from failing, as fixup_irqs() is required to evacuate all
interrupts from the CPUs not present in the input mask. The current logic in
fixup_irqs() is incomplete, as it doesn't deal with interrupts that have
move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.
Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so that
_assign_irq_vector() cannot fail, introduce logic in _assign_irq_vector()
to deal with interrupts that have either move_{in_progress,cleanup_count} set
and no remaining online CPUs in ->arch.cpu_mask.
If _assign_irq_vector() is requested to move an interrupt in the state
described above, first attempt to see if ->arch.old_cpu_mask contains any valid
CPUs that could be used as fallback, and if that's the case do move the
interrupt back to the previous destination. Note this is easier because the
vector hasn't been released yet, so there's no need to allocate and setup a new
vector on the destination.
Due to the logic in fixup_irqs() that clears offline CPUs from
->arch.old_cpu_mask (and releases the old vector if the mask becomes empty) it
shouldn't be possible to get into _assign_irq_vector() with
->arch.move_{in_progress,cleanup_count} set but no online CPUs in
->arch.old_cpu_mask.
However if ->arch.move_{in_progress,cleanup_count} is set and the interrupt has
also changed affinity, it's possible that the members of ->arch.old_cpu_mask are
no longer part of the affinity set. In that case move the interrupt to a
different CPU that is part of the provided mask and keep the current
->arch.old_{cpu_mask,vector} for the pending interrupt movement to be completed.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 18 Jun 2024 13:14:49 +0000 (15:14 +0200)]
x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs()
Given the current logic it's possible for ->arch.old_cpu_mask to get out of
sync: if a CPU set in old_cpu_mask is offlined and then onlined
again without old_cpu_mask having been updated, the data in the mask will no
longer be accurate, as when brought back online the CPU will no longer have
old_vector configured to handle the old interrupt source.
If there's an interrupt movement in progress, and the to be offlined CPU (which
is the call context) is in the old_cpu_mask, clear it and update the mask, so
it doesn't contain stale data.
Note that when the system is going down fixup_irqs() will be called by
smp_send_stop() from CPU 0 with a mask with only CPU 0 on it, effectively
asking to move all interrupts to the current caller (CPU 0) which is the only
CPU to remain online. In that case we don't care to migrate interrupts that
are in the process of being moved, as it's likely we won't be able to move all
interrupts to CPU 0 due to vector shortage anyway.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 18 Jun 2024 13:12:44 +0000 (15:12 +0200)]
x86/Intel: unlock CPUID earlier for the BSP
Intel CPUs have an MSR bit to limit CPUID enumeration to leaf two. If
this bit is set by the BIOS then CPUID evaluation does not work when
data from any leaf greater than two is needed; early_cpu_init() in
particular wants to collect leaf 7 data.
Cure this by unlocking CPUID right before evaluating anything which
depends on the maximum CPUID leaf being greater than two.
Inspired by (and description cloned from) Linux commit 0c2f6d04619e
("x86/topology/intel: Unlock CPUID before evaluating anything").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Fri, 7 Jun 2024 20:13:17 +0000 (22:13 +0200)]
automation/eclair_analysis: address remaining violations of MISRA C Rule 20.12
The DEFINE macro in asm-offsets.c (for all architectures) still generates
violations despite the file(s) being excluded from compliance, due to the
fact that its expansion sometimes refers to entities in non-excluded files.
These corner cases are deviated by the configuration.
Penny Zheng [Fri, 24 May 2024 12:40:55 +0000 (13:40 +0100)]
xen/docs: Describe static shared memory when host address is not provided
This commit describes the new scenario where the host address is not provided
in the "xen,shared-mem" property, and a new example is added to the page to
explain it in detail.
Luca Fancellu [Fri, 24 May 2024 12:40:54 +0000 (13:40 +0100)]
xen/arm: Implement the logic for static shared memory from Xen heap
This commit implements the logic to have the static shared memory banks
allocated from the Xen heap instead of having the host physical address
passed by the user.
When the host physical address is not supplied, the physical memory is
taken from the Xen heap using allocate_domheap_memory. The allocation
needs to occur at the first handled DT node and the allocated banks
need to be saved somewhere.
Introduce 'shm_heap_banks' for that reason: a struct that holds the
banks allocated from the heap. Its field bank[].shmem_extra is used to
point to the bootinfo shared memory banks' .shmem_extra space, so that
there is no further allocation of memory and every bank in
shm_heap_banks can be safely identified by its shm_id to reconstruct its
traceability and whether it was allocated or not.
A search into 'shm_heap_banks' reveals whether the banks were allocated
or not, in case the host address is not passed. The callback given to
allocate_domheap_memory stores the banks in the structure and maps them
to the current domain; to do that, some changes to
acquire_shared_memory_bank are made to let it differentiate whether the
bank is from the heap, and if it is, assign_pages is called for every
bank.
When the bank is already allocated, handle_shared_mem_bank is called for
every bank allocated with the corresponding shm_id and the mappings are
done.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
The function allocate_bank_memory allocates pages from the heap and
maps them to the guest using guest_physmap_add_page.
As a preparation work to support static shared memory bank when the
host physical address is not provided, Xen needs to allocate memory
from the heap, so rework allocate_bank_memory moving out the page
allocation in a new function called allocate_domheap_memory.
The function allocate_domheap_memory takes a callback function and
a pointer to some extra information passed to the callback; the callback
will be called for every allocated region, until a defined size is
reached.
In order to keep allocate_bank_memory functionality, the callback
passed to allocate_domheap_memory is a wrapper for
guest_physmap_add_page.
Let allocate_domheap_memory be externally visible, in order to use
it in the future from the static shared memory module.
Take the opportunity to change the signature of allocate_bank_memory
and remove the 'struct domain' parameter, which can be retrieved from
'struct kernel_info'.
No functional change is intended.
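A hedged sketch of the callback-based interface described above (the exact prototype in the tree may differ):
```c
/* Called for each allocated chunk of pages until the requested total
 * size is reached; 'extra' carries caller-specific state. */
typedef bool (*alloc_cb)(struct domain *d, struct page_info *pg,
                         unsigned int order, void *extra);

int allocate_domheap_memory(struct domain *d, paddr_t tot_size,
                            alloc_cb cb, void *extra);
```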
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Luca Fancellu [Fri, 24 May 2024 12:40:52 +0000 (13:40 +0100)]
xen/arm: Parse xen,shared-mem when host phys address is not provided
Handle the parsing of the 'xen,shared-mem' property when the host physical
address is not provided. This commit introduces the logic to parse it,
but the functionality is still not implemented and will be part of future
commits.
Rework the logic inside process_shm_node to check the shm_id before doing
the other checks, because it eases the logic itself, and add more comments
on the logic.
Now when the host physical address is not provided, the value
INVALID_PADDR is chosen to signal this condition and is stored as the
start of the bank. Due to that change, early_print_info_shmem and
init_sharedmem_pages are also changed, so as not to handle banks whose
start is equal to INVALID_PADDR.
Another change is done inside meminfo_overlap_check, to skip banks whose
start address is INVALID_PADDR. That function is used to check banks
from reserved memory, shared memory and ACPI, and since the comment
above the function states that wrapping around is not handled, it's
unlikely for these banks to have INVALID_PADDR as their start address.
The same change is done inside the consider_modules, find_unallocated_memory
and dt_unreserved_regions functions, in order to exclude banks that start
with INVALID_PADDR from any computation.
The changes above hold because of this consideration.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Penny Zheng [Tue, 28 May 2024 12:56:03 +0000 (13:56 +0100)]
xen/p2m: put reference for level 2 superpage
We are doing foreign memory mapping for static shared memory, and
there is a good chance that it could be superpage-mapped.
But today, p2m_put_l3_page cannot handle superpages.
This commit implements a new function p2m_put_l2_superpage to handle
level 2 superpages, specifically for helping put extra references for
foreign superpages.
Modify relinquish_p2m_mapping as well to take into account preemption
when we have a level-2 foreign mapping.
Currently level 1 superpages are not handled, because Xen is not
preemptible and therefore some work is needed to handle such superpages,
for which Xen might at some point end up freeing memory; for such a big
mapping that could turn into a very long operation.
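A hedged sketch of the new helper's shape (names follow the commit message; the actual signature of p2m_put_l3_page and the entry-count macro may differ in the tree):
```c
/* Put the references held by a level-2 (2MB) foreign superpage by
 * treating it as a run of contiguous level-3 (4KB) pages. */
static void p2m_put_l2_superpage(mfn_t mfn, p2m_type_t type)
{
    unsigned int i;

    for ( i = 0; i < XEN_PT_LPAE_ENTRIES; i++ )
    {
        p2m_put_l3_page(mfn, type);
        mfn = mfn_add(mfn, 1);
    }
}
```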
Luca Fancellu [Fri, 24 May 2024 12:40:50 +0000 (13:40 +0100)]
xen/arm: Wrap shared memory mapping code in one function
Wrap the code and logic that calls assign_shared_memory
and map_regions_p2mt into a new function 'handle_shared_mem_bank'.
It will become useful later, when the code will allow the user
not to pass the host physical address.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Luca Fancellu [Fri, 24 May 2024 12:40:49 +0000 (13:40 +0100)]
xen/arm: Lookup bootinfo shm bank during the mapping
The current static shared memory code is using bootinfo banks when it
needs to find the number of borrowers, so every time assign_shared_memory
is called, the bank is searched in the bootinfo.shmem structure.
There is nothing wrong with that, however the bank can also be used to
retrieve the start address and size, and to pass fewer arguments to
assign_shared_memory. When retrieving the information from the bootinfo
bank, it's also possible to move the checks on alignment to
process_shm_node in the early stages.
So create a new function find_shm_bank_by_id() which takes a
'struct shared_meminfo' structure and the shared memory ID, to look for a
bank with a matching ID, take the physical host address and size from the
bank, pass the bank to assign_shared_memory() removing the now unnecessary
arguments and finally remove the acquire_nr_borrower_domain() function
since now the information can be extracted from the passed bank.
Move the "xen,shm-id" parsing early in process_shm to bail out quickly in
case of errors (unlikely), as said above, move the checks on alignment
to process_shm_node.
Drawback of this change is that now the bootinfo are used also when the
bank doesn't need to be allocated, however it will be convenient later
to use it as an argument for assign_shared_memory when dealing with
the use case where the Host physical address is not supplied by the user.
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Jan Beulich [Thu, 13 Jun 2024 14:55:22 +0000 (16:55 +0200)]
x86/EPT: drop questionable mfn_valid() from epte_get_entry_emt()
mfn_valid() is RAM-focused; it will often return false for MMIO. Yet
access to actual MMIO space should not generally be restricted to UC
only; especially video frame buffer accesses are unduly affected by such
a restriction.
Since, as of 777c71d31325 ("x86/EPT: avoid marking non-present entries
for re-configuring"), the function won't be called with INVALID_MFN or,
worse, truncated forms thereof anymore, we can fully drop that check.
Fixes: 81fd0d3ca4b2 ("x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Thu, 13 Jun 2024 14:54:17 +0000 (16:54 +0200)]
x86/EPT: avoid marking non-present entries for re-configuring
For non-present entries EMT, like most other fields, is meaningless to
hardware. Make the logic in ept_set_entry() setting the field (and iPAT)
conditional upon dealing with a present entry, leaving the value at 0
otherwise. This has two effects for epte_get_entry_emt() which we'll
want to leverage subsequently:
1) The call moved here now won't be issued with INVALID_MFN anymore (a
respective BUG_ON() is being added).
2) Neither of the other two calls could now be issued with a truncated
form of INVALID_MFN anymore (as long as there's no bug anywhere
marking an entry present when that was populated using INVALID_MFN).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Thu, 13 Jun 2024 14:53:34 +0000 (16:53 +0200)]
x86/EPT: correct special page checking in epte_get_entry_emt()
mfn_valid() granularity is (currently) 256Mb. Therefore the start of a
1Gb page passing the test doesn't necessarily mean all parts of such a
range would also pass. Yet using the result of mfn_to_page() on an MFN
which doesn't pass mfn_valid() checking is liable to result in a crash
(the invocation of mfn_to_page() alone is presumably "just" UB in such a
case).
Fixes: ca24b2ffdbd9 ("x86/hvm: set 'ipat' in EPT for special pages") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jens Wiklander [Mon, 10 Jun 2024 06:53:43 +0000 (08:53 +0200)]
xen/arm: ffa: support notification
Add support for FF-A notifications, currently limited to an SP (Secure
Partition) sending an asynchronous notification to a guest.
Guests and Xen itself are made aware of pending notifications with an
interrupt. The interrupt handler triggers a tasklet to retrieve the
notifications using the FF-A ABI and deliver them to their destinations.
Update ffa_partinfo_domain_init() to return an error code like
ffa_notif_domain_init().
Jens Wiklander [Mon, 10 Jun 2024 06:53:42 +0000 (08:53 +0200)]
xen/arm: add and call tee_free_domain_ctx()
Add tee_free_domain_ctx() to the TEE mediator framework.
tee_free_domain_ctx() is called from arch_domain_destroy() to allow late
freeing of the d->arch.tee context. This will simplify access to
d->arch.tee for domains retrieved with rcu_lock_domain_by_id().
Jens Wiklander [Mon, 10 Jun 2024 06:53:41 +0000 (08:53 +0200)]
xen/arm: add and call init_tee_secondary()
Add init_tee_secondary() to the TEE mediator framework and call it from
start_secondary() late enough that per-cpu interrupts can be configured
on CPUs as they are initialized. This is needed in later patches.
Jens Wiklander [Mon, 10 Jun 2024 06:53:40 +0000 (08:53 +0200)]
xen/arm: allow dynamically assigned SGI handlers
Update the code so that request_irq() can be used with a dynamically assigned SGI irq
as input. This prepares for a later patch where an FF-A schedule
receiver interrupt handler is installed for an SGI generated by the
secure world.
From the Arm Base System Architecture v1.0C [1]:
"The system shall implement at least eight Non-secure SGIs, assigned to
interrupt IDs 0-7."
gic_route_irq_to_xen() doesn't call gic_set_irq_type() for SGIs since they are
always edge triggered.
gic_interrupt() is updated to route the dynamically assigned SGIs to
do_IRQ() instead of do_sgi(). The latter still handles the statically
assigned SGI handlers like for instance GIC_SGI_CALL_FUNCTION.
Jens Wiklander [Mon, 10 Jun 2024 06:53:39 +0000 (08:53 +0200)]
xen/arm: ffa: simplify ffa_handle_mem_share()
Simplify ffa_handle_mem_share() by removing the start_page_idx and
last_page_idx parameters from get_shm_pages() and check that the number
of pages matches expectations at the end of get_shm_pages().
Jens Wiklander [Mon, 10 Jun 2024 06:53:38 +0000 (08:53 +0200)]
xen/arm: ffa: use ACCESS_ONCE()
Replace read_atomic() with ACCESS_ONCE() to match the intended use, that
is, to prevent the compiler from (via optimization) reading shared
memory more than once.
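For reference, ACCESS_ONCE() boils down to a single volatile access; this matches the common Xen/Linux definition, while the usage line below is an illustrative fragment rather than code from the patch:
```c
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

/* Usage sketch: 'desc' points into a buffer shared with the SP/guest, so
 * each field must be read exactly once (names here are illustrative). */
uint32_t handle = ACCESS_ONCE(desc->handle);
```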
Jens Wiklander [Mon, 10 Jun 2024 06:53:37 +0000 (08:53 +0200)]
xen/arm: ffa: refactor ffa_handle_call()
Refactors the large switch block in ffa_handle_call() to use common code
for the simple case where it's either an error code or success with no
further parameters.
Jan Beulich [Wed, 12 Jun 2024 12:31:21 +0000 (14:31 +0200)]
x86/physdev: replace physdev_{,un}map_pirq() checking against DOMID_SELF
It's hardly ever correct to check for just DOMID_SELF, as guests have
ways to figure out their domain IDs and hence could instead use those as
inputs to respective hypercalls. Note, however, that for ordinary DomU-s
the adjustment is relaxing things rather than tightening them, since
- as a result of XSA-237 - the respective XSM checks would have rejected
self (un)mapping attempts for other than the control domain.
Since in physdev_map_pirq() handling overall is a little easier this
way, move obtaining of the domain pointer into the caller. Doing the
same for physdev_unmap_pirq() is just to keep both consistent in this
regard.
Fixes: 0b469cd68708 ("Interrupt remapping to PIRQs in HVM guests") Fixes: 9e1a3415b773 ("x86: fixes after emuirq changes") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 12 Jun 2024 12:30:40 +0000 (14:30 +0200)]
x86/irq: limit interrupt movement done by fixup_irqs()
The current check used in fixup_irqs() to decide whether to move around
interrupts is based on the affinity mask, but such mask can have all bits set,
and hence is unlikely to be a subset of the input mask. For example if an
interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's not
an all set CPU mask would cause that interrupt to be shuffled around
unconditionally.
What fixup_irqs() cares about is evacuating interrupts from CPUs not set on the
input CPU mask, and for that purpose it should check whether the interrupt is
assigned to a CPU not present in the input mask. Assume that ->arch.cpu_mask
is a subset of the ->affinity mask, and keep the current logic that resets the
->affinity mask if the interrupt has to be shuffled around.
Doing the affinity movement based on ->arch.cpu_mask requires removing the
special handling to ->arch.cpu_mask done for high priority vectors, otherwise
the adjustment done to cpu_mask makes them always skip the CPU interrupt
movement.
While there also adjust the comment as to the purpose of fixup_irqs().
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 12 Jun 2024 12:30:06 +0000 (14:30 +0200)]
x86/irq: describe how the interrupt CPU movement works
The logic to move interrupts across CPUs is complex, attempt to provide a
comment that describes the expected behavior so users of the interrupt system
have more context about the usage of the arch_irq_desc structure fields.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 12 Jun 2024 12:29:31 +0000 (14:29 +0200)]
x86/smp: do not use shorthand IPI destinations in CPU hot{,un}plug contexts
Due to the current rwlock logic, if the CPU calling get_cpu_maps() does
so from a cpu_hotplug_{begin,done}() region the function will still
return success, because a CPU taking the rwlock in read mode after
having taken it in write mode is allowed. Such corner case makes using
get_cpu_maps() alone not enough to prevent using the shorthand in CPU
hotplug regions.
Introduce a new helper to detect whether the current caller is between a
cpu_hotplug_{begin,done}() region and use it in send_IPI_mask() to restrict
shorthand usage.
Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Jan Beulich [Wed, 12 Jun 2024 08:52:56 +0000 (10:52 +0200)]
MAINTAINERS: alter EFI section
To get past the recurring friction on the approach to take wrt
workarounds needed for various firmware flaws, I'm stepping down as the
maintainer of our code interfacing with EFI firmware. Two new
maintainers are being introduced in my place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Marek Marczykowski <marmarek@invisiblethingslab.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
This tests if QEMU works in PVH dom0. QEMU in dom0 requires enabling TUN
in the kernel, so do that too.
Add it to both x86 runners, similar to the PVH domU test.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Mon, 10 Jun 2024 11:29:25 +0000 (13:29 +0200)]
x86/pvh: declare PVH dom0 supported with caveats
PVH dom0 is functionally very similar to PVH domU except for the domain
builder and the added set of hypercalls available to it.
The main concern with declaring it "Supported" is the lack of some features
when compared to classic PV dom0, hence switch its status to supported with
caveats. List the known missing features; there might be more features missing
or not working as expected apart from the ones listed.
Note there's some (limited) PVH dom0 testing on both osstest and gitlab.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Mon, 10 Jun 2024 08:34:05 +0000 (10:34 +0200)]
x86/domain: deviate violation of MISRA C Rule 20.12
MISRA C Rule 20.12 states: "A macro parameter used as an operand to
the # or ## operators, which is itself subject to further macro replacement,
shall only be used as an operand to these operators".
In this case, in builds where CONFIG_COMPAT=y, the fpu_ctxt
macro is used both as a regular macro argument and as an operand for
stringification in the expansion of CHECK_FIELD_.
This is deviated using a SAF-x-safe comment.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Mon, 10 Jun 2024 08:33:22 +0000 (10:33 +0200)]
x86/irq: remove offline CPUs from old CPU mask when adjusting move_cleanup_count
When adjusting move_cleanup_count to account for CPUs that are offline also
adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
those again and create an imbalance in move_cleanup_count.
Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Nicola Vetrini [Sat, 1 Jun 2024 10:16:56 +0000 (12:16 +0200)]
xen: fix MISRA regressions on rule 20.9 and 20.12
Commit ea59e7d780d9 ("xen/bitops: Cleanup and new infrastructure ahead of
rearrangements") introduced new violations on previously clean rules 20.9 and
20.12 (clean on ARM only, right now).
The first is introduced because CONFIG_CC_IS_CLANG in xen/self-tests.h is not
defined in the configuration under analysis. Using "defined()" instead avoids
relying on the preprocessor's behaviour upon encountering an undefined identifier
and addresses the violation.
The violation of Rule 20.12 is due to "val" being used both as an ordinary argument
in macro RUNTIME_CHECK and as an operand of the stringification operator.
No functional change.
Fixes: ea59e7d780d9 ("xen/bitops: Cleanup and new infrastructure ahead of rearrangements") Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 24 May 2024 19:37:50 +0000 (20:37 +0100)]
xen/bitops: Rearrange the top of xen/bitops.h
The #include <asm/bitops.h> can move to the top of the file now that
generic_ffs()/generic_fls() have been untangled.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Sat, 9 Mar 2024 02:44:56 +0000 (02:44 +0000)]
xen/bitops: Clean up ffs64()/fls64() definitions
Implement ffs64() and fls64() as plain static inlines, dropping the ifdefary
and intermediate generic_f?s64() forms.
Add tests for all interesting bit positions at 32bit boundaries.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 14 Mar 2024 23:31:11 +0000 (23:31 +0000)]
x86/bitops: Improve arch_ffs() in the general case
The asm in arch_ffs() is safe but inefficient.
CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a zero
input. It is safe to pre-load the destination register with -1 and drop the
conditional logic.
However, it is common to find ffs() in a context where the optimiser knows
that x is non-zero even if the value isn't known precisely. In this case,
it's safe to drop the preload of -1 too.
There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-92 (-92)
Function                    old    new   delta
mask_write                  121    113      -8
xmem_pool_alloc            1076   1056     -20
test_bitops                 390    358     -32
pt_update_contig_markers   1236   1204     -32
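A sketch of the resulting shape, relying on the BSF destination-preservation behaviour described above (the constraint syntax is the usual GCC named-operand form; the real Xen implementation may differ in detail):
```c
static inline unsigned int arch_ffs(unsigned int x)
{
    unsigned int r;

    if ( __builtin_constant_p(x > 0) && x > 0 )
    {
        /* The optimiser knows x is non-zero: no preload needed. */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x) );
    }
    else
    {
        /* Pre-load -1; BSF leaves the destination unmodified for a zero
         * input on the CPUs we care about, so no conditional logic. */
        asm ( "bsf %[val], %[res]"
              : [res] "=r" (r)
              : [val] "rm" (x), "[res]" (-1) );
    }

    return r + 1;
}
```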
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Wed, 31 Jan 2024 18:31:16 +0000 (18:31 +0000)]
xen/bitops: Implement ffs() in common logic
Perform constant-folding unconditionally, rather than having it implemented
inconsistently between architectures.
Confirm the expected behaviour with compile time and boot time tests.
For non-constant inputs, use arch_ffs() if provided but fall back to
generic_ffsl() if not. In particular, RISC-V doesn't have a builtin that
works in all configurations.
For x86, rename ffs() to arch_ffs() and adjust the prototype.
For PPC, __builtin_ctz() is 1/3 of the size of the transform to
generic_fls(). Drop the definition entirely. ARM too benefits in the general
case by using __builtin_ctz(), but less dramatically because it is already
using optimised asm().
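A hedged sketch of the common-code shape described here (the real helper in xen/bitops.h may be structured differently):
```c
static inline unsigned int ffs(unsigned int x)
{
    if ( __builtin_constant_p(x) )
        return __builtin_ffs(x);

#ifdef arch_ffs
    /* Architectures provide arch_ffs() when they have something better. */
    return arch_ffs(x);
#else
    /* e.g. RISC-V without Zbb: fall back to the generic form. */
    return generic_ffsl(x);
#endif
}
```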
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 24 May 2024 12:36:25 +0000 (13:36 +0100)]
xen/bitops: Implement generic_ffsl()/generic_flsl() in lib/
generic_ffs()/generic_fls*() being static inline is the cause of lots of the
complexity between the common and arch-specific bitops.h.
They appear to be static inline for constant-folding reasons (ARM), but there
are better ways to achieve the same effect.
It is presumptuous that an unrolled binary search is the right algorithm to
use on all microarchitectures. Indeed, it's not for the eventual users, but
that can be addressed at a later point.
It is also nonsense to implement the int form as the base primitive and
construct the long form from 2x int in 64-bit builds, when it's just one extra
step to operate at the native register width.
Therefore, implement generic_ffsl()/generic_flsl() in lib/. They're not
actually needed in x86/ARM/PPC by the end of the cleanup (i.e. the functions
will be dropped by the linker), and they're only expected to be needed by RISC-V
on hardware which lacks the Zbb extension.
Implement generic_fls() in terms of generic_flsl() for now, but this will be
cleaned up in due course.
Provide basic runtime testing using __constructor inside the lib/ file. This
is important, as it means testing runs if and only if generic_f?sl() are used
elsewhere in Xen.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 8 Mar 2024 23:45:08 +0000 (23:45 +0000)]
xen/bitops: Cleanup and new infrastructure ahead of rearrangements
* Rename __attribute_pure__ to just __pure before it gains users.
* Introduce __constructor which is going to be used in lib/, and is
unconditionally cf_check.
* Identify the areas of xen/bitops.h which are a mess.
* Introduce xen/self-tests.h as helpers for compile and boot time testing.
This provides a statement of the ABI, and a confirmation that arch-specific
implementations behave as expected.
* Introduce HIDE() in macros.h. While it's only used in self-tests.h for
now, we're going to consolidate similar constructs in due course.
Sadly Clang 7 and older isn't happy with the compile time checks. Skip them,
and just rely on the runtime checks.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 14 Mar 2024 20:38:44 +0000 (20:38 +0000)]
xen/bitops: Delete find_first_set_bit()
No more users.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 30 May 2024 17:58:18 +0000 (18:58 +0100)]
arch/irq: Centralise no_irq_type
Having no_irq_type defined per arch, but using common callbacks, is a mess, and
is particularly hard to bootstrap a new architecture with.
Now that the ack()/end() hooks have been exported suitably, move the
definition of no_irq_type into common/irq.c, and make it const too for good
measure.
No functional change, but a whole lot less tangled.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Oleksii Kurochko [Wed, 29 May 2024 19:55:02 +0000 (21:55 +0200)]
xen/riscv: Update Kconfig in preparation for a full Xen build
Disable unnecessary configs for two cases:
1. By utilizing EXTRA_FIXED_RANDCONFIG for randconfig builds (GitLab CI jobs).
2. By using tiny64_defconfig for non-randconfig builds.
Only configs which lead to compilation issues were disabled.
Remove lines related to the disablement of configs which don't affect
compilation:
-# CONFIG_SCHED_CREDIT is not set
-# CONFIG_SCHED_RTDS is not set
-# CONFIG_SCHED_NULL is not set
-# CONFIG_SCHED_ARINC653 is not set
-# CONFIG_TRACEBUFFER is not set
-# CONFIG_HYPFS is not set
-# CONFIG_SPECULATIVE_HARDEN_ARRAY is not set
Update argo.c to include asm/p2m.h directly, rather than relying on a transitive
dependency through asm/domain.h. Update asm/p2m.h to include xen/errno.h,
rather than relying on it having been included already.
CONFIG_XSM=n as it requires an introduction of:
* boot_module_find_by_kind()
* BOOTMOD_XSM
* struct bootmodule
* copy_from_paddr()
The mentioned things aren't introduced now.
CONFIG_BOOT_TIME_CPUPOOLS requires an introduction of cpu_physical_id() and
acpi_disabled, so it is disabled for now.
PERF_COUNTERS requires asm/perf.h and asm/perfc-defn.h, so it is
also disabled for now, as RISC-V hasn't introduced these headers yet.
LIVEPATCH isn't ready for RISC-V either, and it can be overridden by randconfig,
so it is disabled for now to avoid compilation errors for randconfig.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Fix up common/argo.c rather than inserting a transitive dependency] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Thu, 30 May 2024 07:53:18 +0000 (09:53 +0200)]
x86/hvm: allow XENMEM_machine_memory_map
For HVM based control domains XENMEM_machine_memory_map must be available so
that the `e820_host` xl.cfg option can be used.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Sat, 9 Mar 2024 02:22:53 +0000 (02:22 +0000)]
xen/bitops: Replace find_first_set_bit() with ffs()/ffsl() - 1
find_first_set_bit() is a Xen-ism which has undefined behaviour with a 0
input. ffs()/ffsl(), by contrast, are well defined with an input of 0, and are
found outside of Xen too.
timer_sanitize_int_route(), pt_update_contig_markers() and
set_iommu_ptes_present() are all already operating on unsigned int data, so
switch straight to ffs().
The ffsl() in pvh_populate_memory_range() needs coercion to unsigned to keep
the typecheck in min() happy in the short term.
_init_heap_pages() is comparing the LSB of two different addresses, so the -1
cancels off both sides of the expression.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Fri, 24 May 2024 12:36:15 +0000 (13:36 +0100)]
xen/page_alloc: Coerce min(flsl(), foo) expressions to being unsigned
This is in order to maintain bisectability through the subsequent changes,
where flsl() changes sign-ness non-atomically by architecture.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Thu, 30 May 2024 10:02:16 +0000 (11:02 +0100)]
tools: (Actually) drop libsystemd as a dependency
When reinstating some of systemd.m4 between v1 and v2, I reintroduced a little
too much. While {c,o}xenstored are indeed no longer linked against
libsystemd, ./configure still looks for it.
Drop this too.
Fixes: ae26101f6bfc ("tools: Drop libsystemd as a dependency") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monné [Wed, 29 May 2024 14:11:19 +0000 (16:11 +0200)]
xen/x86: remove foreign mappings from the p2m on teardown
Iterate over the p2m up to the maximum recorded gfn and remove any foreign
mappings, in order to drop the underlying page references and thus not keep
extra page references if a domain is destroyed while still having foreign
mappings in its p2m.
The logic is similar to the one used on Arm.
Note that foreign mappings cannot be created by guests that have altp2m or
nested HVM enabled, as p2ms different than the host one are not currently
scrubbed when destroyed in order to drop references to any foreign maps.
It's unclear whether the right solution is to take an extra reference when
foreign maps are added to p2ms different than the host one, or just rely on the
host p2m already having a reference. The mapping being removed from the host
p2m should cause it to be dropped on all domain p2ms.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Roger Pau Monné [Wed, 29 May 2024 14:10:04 +0000 (16:10 +0200)]
xen: enable altp2m at create domain domctl
Enabling it using an HVM param is fragile, and complicates the logic when
deciding whether options that interact with altp2m can also be enabled.
Leave the HVM param value for consumption by the guest, but prevent it from
being set. Enabling is now done using an additional altp2m-specific field in
xen_domctl_createdomain.
Note that, albeit currently only implemented on x86, altp2m could be implemented
on other architectures, hence the field is added to xen_domctl_createdomain
instead of xen_arch_domainconfig.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Christian Lindig <christian.lindig@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> # hypervisor Acked-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> # tools/libs/ Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Andrew Cooper [Tue, 28 May 2024 14:11:54 +0000 (15:11 +0100)]
xen: Introduce CONFIG_SELF_TESTS
... and move x86's stub_selftest() under this new option.
There is value in having these tests included in release builds too.
It will shortly be used to gate the bitops unit tests on all architectures.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Wed, 29 May 2024 07:57:28 +0000 (09:57 +0200)]
x86: address violations of MISRA C Rule 8.4
Rule 8.4 states: "A compatible declaration shall be visible when an
object or function with external linkage is defined."
These variables are only referenced from assembly code, so they need to
be extern and there is negligible risk of them being used improperly
without noticing.
As a result, they can be exempted using a comment-based deviation.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Wed, 29 May 2024 07:56:57 +0000 (09:56 +0200)]
x86/MCE: optional build of AMD/Intel MCE code
Separate Intel/AMD-specific MCE code using CONFIG_{INTEL,AMD} config options.
Now we can avoid building the mcheck code if support for a specific platform is
intentionally disabled by configuration.
Also, the global variables lmce_support & cmci_support from the Intel-specific
mce_intel.c have to be moved to the common mce.c, as they get checked in common code.
Sergiy Kibrik [Wed, 29 May 2024 07:56:15 +0000 (09:56 +0200)]
x86/MCE: add default switch case in init_nonfatal_mce_checker()
The default switch case block is wanted here, to handle situations such as an
unexpected c->x86_vendor value: then no mcheck init is done, but a
misleading message still gets logged anyway.
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Sergiy Kibrik [Wed, 29 May 2024 07:54:22 +0000 (09:54 +0200)]
x86/intel: move vmce_has_lmce() routine to header
Moving this function out of mce_intel.c will make it possible to disable
build of Intel MCE code later on, because the function gets called from
common x86 code.
Also replace boilerplate code that checks for MCG_LMCE_P flag with
vmce_has_lmce(), which might contribute to readability a bit.
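A hedged sketch of what the header helper might look like (field and flag names follow the surrounding vMCE code; the exact form in the tree may differ):
```c
static inline bool vmce_has_lmce(const struct vcpu *v)
{
    return v->arch.vmce.mcg_cap & MCG_LMCE_P;
}
```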
Andrew Cooper [Tue, 28 May 2024 15:29:11 +0000 (16:29 +0100)]
x86/svm: Rework VMCB_ACCESSORS() to use a plain type name
This avoids having a function call in a typeof() expression.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Nicola Vetrini [Tue, 28 May 2024 06:52:27 +0000 (08:52 +0200)]
x86/traps: address violation of MISRA C Rule 8.4
Rule 8.4 states: "A compatible declaration shall be visible when
an object or function with external linkage is defined".
The function do_general_protection is either used in asm code
or only within this unit, so there is no risk of this getting
out of sync with its definition, but the function must remain
extern.
Therefore, this function is deviated using a comment-based deviation.
No functional change.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Tue, 28 May 2024 06:52:15 +0000 (08:52 +0200)]
CHANGELOG: Mention libxl blktap/tapback support
Add entry for backendtype=tap support in libxl. blktap needs some
changes to work with libxl, which haven't been merged. They are
available from this PR: https://github.com/xapi-project/blktap/pull/394
Signed-off-by: Jason Andryuk <jason.andryuk@amd.com> Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Henry Wang [Thu, 23 May 2024 07:40:39 +0000 (15:40 +0800)]
tools: Introduce the "xl dt-overlay attach" command
With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach (in the future also detach) devices from the provided DT overlay
to domains. Support this by introducing a new "xl dt-overlay" command
and related documentation, i.e. "xl dt-overlay attach. Slightly rework
the command option parsing logic.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Henry Wang [Thu, 23 May 2024 07:40:36 +0000 (15:40 +0800)]
xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains
In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to Xen device tree, instead of assigning the device to the
hardware domain at the same time. It is OK to change the sysctl behavior
and break compatibility, as this feature is experimental.
Add the XEN_DOMCTL_dt_overlay with operations
XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.
The hypervisor first checks that the DT overlay passed from the toolstack
is valid. Then the device nodes are retrieved from the overlay tracker
based on the DT overlay. The attach of the device is implemented by
mapping the IRQ and IOMMU resources. All devices in the overlay are
assigned to a single domain.
Also take the opportunity to make one coding style fix in sysctl.h.
Introduce DT_OVERLAY_MAX_SIZE and use it to avoid repetitions of
KB(500).
xen,reg is to be used to handle non-1:1 mappings but it is currently
unsupported. For now, return errors for non-1:1 mapped domains.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Vikram Garhwal <fnu.vikram@xilinx.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:35 +0000 (15:40 +0800)]
xen/arm/gic: Allow adding interrupt to running VMs
Currently, adding physical interrupts is only allowed at
domain creation time. For use cases such as dynamic device
tree overlay addition, adding physical IRQs to
running domains should be allowed.
Drop the above-mentioned domain creation check. Since this
would introduce interrupt state synchronisation issues when the
interrupt is active or pending in the guest, simply reject the
operation in these cases. Do it for both the new and old
vGIC implementations.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:34 +0000 (15:40 +0800)]
tools/arm: Introduce the "nr_spis" xl config entry
Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.
Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This would help to avoid
bumping `config->arch.nr_spis` in libxl every time there is a
new platform with an increased number of SPIs.
Update the doc and the golang bindings accordingly.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Henry Wang [Thu, 23 May 2024 07:40:33 +0000 (15:40 +0800)]
xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify the domain will need the IOMMU
mapping at domain construction time.
Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enable" and "disable". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.
Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Thu, 23 May 2024 07:40:32 +0000 (15:40 +0800)]
tools/xl: Correct the help information and exit code of the dt-overlay command
Fix the name mismatch in the xl dt-overlay command, the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.
Fix the exit code of the dt-overlay command, use EXIT_FAILURE
instead of ERROR_FAIL.
Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support") Suggested-by: Anthony PERARD <anthony@xenproject.org> Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Jason Andryuk <jason.andryuk@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
George Dunlap [Fri, 26 Apr 2024 13:17:33 +0000 (14:17 +0100)]
tools/xenalyze: Ignore HVM_EMUL events harder
To unify certain common sanity checks, checks are done very early in
processing based only on the top-level type.
Unfortunately, when TRC_HVM_EMUL was introduced, it broke some of the
assumptions about how the top-level types worked. Namely, traces of
this type will show up outside of HVM contexts: in idle domains and in
PV domains.
Make an explicit exception for TRC_HVM_EMUL types in a number of places:
- Pass the record info pointer to toplevel_assert_check, so that it
can exclude TRC_HVM_EMUL records from idle and vcpu data_mode
checks
- Don't attempt to set the vcpu data_type in hvm_process for
TRC_HVM_EMUL records.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Thu, 25 Apr 2024 12:03:58 +0000 (13:03 +0100)]
x86/hvm/trace: Use a different trace type for AMD processors
A long-standing usability sub-optimality with xenalyze is the
necessity to specify `--svm-mode` when analyzing AMD processors. This
fundamentally comes about because the same trace event ID is used for
both VMX and SVM, but the contents of the trace must be interpreted
differently.
Instead, allocate separate trace events for VMX and SVM vmexits in
Xen; this will allow all readers to properly interpret the meaning of
the vmexit reason.
In xenalyze, first remove the redundant call to init_hvm_data();
there's no way to get to hvm_vmexit_process() without it being already
initialized by the set_vcpu_type call in hvm_process().
Replace this with set_hvm_exit_reason_data(), and move setting of
hvm->exit_reason_* into that function.
Modify hvm_process and hvm_vmexit_process to handle all four potential
values appropriately.
If SVM entries are encountered, set opt.svm_mode so that other
SVM-specific functionality is triggered.
Remove the `--svm-mode` command-line option, since it's now redundant.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Henry Wang [Thu, 21 Mar 2024 03:57:06 +0000 (11:57 +0800)]
xen/arm: Set correct per-cpu cpu_core_mask
In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.
cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm does not support physical
CPU hotplug and NUMA, also we assume there is no multithread. Therefore
cores_per_socket means all possible CPUs detected from the device
tree. Set the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Signed-off-by: Henry Wang <xin.wang2@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
George Dunlap [Fri, 26 Apr 2024 14:18:25 +0000 (15:18 +0100)]
tools/xentrace: Remove xentrace_format
xentrace_format was always of limited utility, since trace records
across pcpus were processed out of order; it was superseded by xenalyze
over a decade ago.
But for several releases, the `formats` file it has depended on for
proper operation has not even been included in `make install` (which
generally means it doesn't get picked up by distros either); yet
nobody has seemed to complain.
Simply remove xentrace_format, and point people to xenalyze instead.
NB that there is no man page for xenalyze, so the "see also" on the
xentrace man page is simply removed for now.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Olaf Hering <olaf@aepfle.de>
Andrew Cooper [Thu, 25 Apr 2024 09:46:40 +0000 (10:46 +0100)]
tools: Drop libsystemd as a dependency
There are no more users, and we want to dissuade people from introducing new
users just for sd_notify() and friends. Drop the dependency.
We still want the overall --with{,out}-systemd to gate the generation of the
service/unit/mount/etc files.
Rerun autogen.sh, and mark the dependency as removed in the build containers.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 25 Apr 2024 09:26:58 +0000 (10:26 +0100)]
tools/{c,o}xenstored: Don't link against libsystemd
Use the local freestanding wrapper instead.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 16 May 2024 17:59:00 +0000 (18:59 +0100)]
tools: Import stand-alone sd_notify() implementation from systemd
... in order to avoid linking against the whole of libsystemd.
Only minimal changes to the upstream copy, to function as a drop-in
replacement for sd_notify() and as a header-only library.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Andrew Cooper [Thu, 16 May 2024 17:50:26 +0000 (18:50 +0100)]
LICENSES: Add MIT-0 (MIT No Attribution)
We are about to import code licensed under MIT-0. It's compatible for us to
use, so identify it as a permitted license.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Commit 634cfc8beb ("Make MEM_ACCESS configurable") intended to make
MEM_ACCESS configurable on Arm to reduce the code size when the user
doesn't need it.
However, this didn't cover the arch specific code. None of the code
in arm/mem_access.c is necessary when MEM_ACCESS=n, so it can be
compiled out. This will require providing some stubs for functions
called by the common code.
Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
vpci: add initial support for virtual PCI bus topology
Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is one where PCIe devices reside on bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.
Please note that at the moment only function 0 of a multifunction
device can be passed through.
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
vpci/header: emulate PCI_COMMAND register for guests
Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, the PCI_COMMAND register needs
proper emulation in order to honor the host's settings.
According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.
Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.
PCI_COMMAND_IO (bit 0)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if an I/O BAR is exposed to the guest.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
don't yet support I/O BARs for domUs.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_MEMORY (bit 1)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU sets this bit to 1 in
hardware if a Memory BAR is exposed to the guest.
Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
regions.
Xen domU: For devices assigned to DomUs, memory decoding will be
disabled at the time of initialization.
PCI_COMMAND_MASTER (bit 2)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_SPECIAL (bit 3)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_INVALIDATE (bit 4)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_VGA_PALETTE (bit 5)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: Pass through writes to hardware.
Xen domU/dom0: Pass through writes to hardware.
PCI_COMMAND_PARITY (bit 6)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_WAIT (bit 7)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: hardwire to 0
QEMU: res_mask
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_SERR (bit 8)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_FAST_BACK (bit 9)
PCIe 6.1: RO, hardwire to 0
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware.
Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
Xen dom0: We allow dom0 to control this bit freely.
PCI_COMMAND_INTX_DISABLE (bit 10)
PCIe 6.1: RW
PCI LB 3.0: RW
QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
writes do not propagate to hardware. QEMU checks if INTx was mapped
for a device. If it is not, then guest can't control
PCI_COMMAND_INTX_DISABLE bit.
Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
Xen dom0: We allow dom0 to control this bit freely.
Bits 11-15
PCIe 6.1: RsvdP
PCI LB 3.0: Reserved
QEMU: res_mask
Xen domU: rsvdp_mask
Xen dom0: We allow dom0 to control these bits freely.
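A hedged sketch of the general composition implied by the table above: emulated bits reflect the guest's shadow value, RsvdP bits read as zero, and the rest reflect hardware. The function and mask parameters are illustrative, not the vPCI code itself.
```c
#include <stdint.h>

static uint16_t domu_command_view(uint16_t hw_val, uint16_t shadow_val,
                                  uint16_t emu_mask, uint16_t rsvdp_mask)
{
    uint16_t val = hw_val;

    val &= ~emu_mask;             /* emulated bits never reflect hardware ... */
    val |= shadow_val & emu_mask; /* ... they reflect the guest's last write  */
    val &= ~rsvdp_mask;           /* RsvdP bits read as zero for the domU     */

    return val;
}
```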
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>