xenbits.xensource.com Git - people/tklengyel/xen.git/log
10 months ago Add scripts/oss-fuzz/build.sh x86_instr_fuzz
Tamas K Lengyel [Thu, 20 Jun 2024 20:14:22 +0000 (20:14 +0000)]
Add scripts/oss-fuzz/build.sh

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
10 months ago Add libfuzzer target to fuzz/x86_instruction_emulator
Tamas K Lengyel [Thu, 20 Jun 2024 20:09:57 +0000 (20:09 +0000)]
Add libfuzzer target to fuzz/x86_instruction_emulator

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
10 months ago Revert "xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor"
Julien Grall [Wed, 19 Jun 2024 11:48:09 +0000 (12:48 +0100)]
Revert "xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor"

Michal reported that the gitlab CI is failing because of this series
[1].

This reverts commit 6f9d90ea943b5e0c5d11a71090c49bbfd79e97ea.

Signed-off-by: Julien Grall <jgrall@amazon.com>
10 months ago Revert "docs/features/dom0less: Update the late XenStore init protocol"
Julien Grall [Wed, 19 Jun 2024 11:47:28 +0000 (12:47 +0100)]
Revert "docs/features/dom0less: Update the late XenStore init protocol"

Michal reported that the gitlab CI is failing because of this series
[1].

This reverts commit 53c5c99e8744495395c1274595d6ca55947d1d6a.

[1] https://gitlab.com/xen-project/xen/-/pipelines/1338067978

Signed-off-by: Julien Grall <jgrall@amazon.com>
10 months ago xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure
Michal Orzel [Wed, 19 Jun 2024 06:46:52 +0000 (08:46 +0200)]
xen/arm: static-shmem: fix "gbase/pbase used uninitialized" build failure

Building Xen with CONFIG_STATIC_SHM=y results in a build failure:

arch/arm/static-shmem.c: In function 'process_shm':
arch/arm/static-shmem.c:327:41: error: 'gbase' may be used uninitialized [-Werror=maybe-uninitialized]
  327 |         if ( is_domain_direct_mapped(d) && (pbase != gbase) )
arch/arm/static-shmem.c:305:17: note: 'gbase' was declared here
  305 |         paddr_t gbase, pbase, psize;

This is because the commit cb1ddafdc573 adds a check referencing
gbase/pbase variables which were not yet assigned a value. Fix it.

Fixes: cb1ddafdc573 ("xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
10 months ago docs/features/dom0less: Update the late XenStore init protocol
Henry Wang [Fri, 24 May 2024 22:55:22 +0000 (15:55 -0700)]
docs/features/dom0less: Update the late XenStore init protocol

With the new allocation strategy of Dom0less DomUs XenStore page,
update the doc of the late XenStore init protocol accordingly.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
10 months ago xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor
Henry Wang [Fri, 24 May 2024 22:55:20 +0000 (15:55 -0700)]
xen/arm: Alloc XenStore page for Dom0less DomUs from hypervisor

There are use cases (for example using the PV driver) in a Dom0less
setup that require Dom0less DomUs to start immediately with Dom0, but
initialize XenStore later after Dom0's successful boot and call to
the init-dom0less application.

An error message can be seen from the init-dom0less application on
1:1 direct-mapped domains:
```
Allocating magic pages
memory.c:238:d0v0 mfn 0x39000 doesn't belong to d1
Error on alloc magic pages
```

The "magic page" is a term used in the toolstack for reserved
pages that give the VM access to virtual platform capabilities.
Currently the magic pages for Dom0less DomUs are populated by the
init-dom0less app through populate_physmap(), and populate_physmap()
automatically assumes gfn == mfn for 1:1 direct mapped domains. This
cannot be true for the magic pages that are allocated later from the
init-dom0less application executed in Dom0. For domains using statically
allocated memory but not 1:1 direct-mapped, a similar error, "failed to
retrieve a reserved page", can be seen as the reserved memory list is
empty at that time.

Since, for init-dom0less, the magic page region is only for XenStore,
this commit solves the above issue by allocating the XenStore page for
Dom0less DomUs at the domain construction time. The PFN will be
noted and communicated to the init-dom0less application executed
from Dom0. To keep the XenStore late init protocol, set the connection
status to XENSTORE_RECONNECT.

Currently the GUEST_MAGIC_BASE in the init-dom0less application is
hardcoded, which will lead to failures for 1:1 direct-mapped Dom0less
DomUs.

Since the guest magic region allocation from init-dom0less is for
XenStore, and the XenStore page is now allocated from the hypervisor,
instead of hardcoding the guest magic pages region, use
xc_hvm_param_get() to get the XenStore page PFN. Rename alloc_xs_page()
to get_xs_page() to reflect the changes.

With this change, some existing code is not needed anymore, including:
(1) The definition of the XenStore page offset.
(2) Call to xc_domain_setmaxmem() and xc_clear_domain_page() as we
    don't need to set the max mem and clear the page anymore.
(3) Foreign mapping of the XenStore page, setting of XenStore interface
    status and HVM_PARAM_STORE_PFN from init-dom0less, as they are set
    by the hypervisor.

Take the opportunity to do some coding style improvements when possible.

Reported-by: Alec Kwapis <alec.kwapis@medtronic.com>
Suggested-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
10 months ago xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains
Henry Wang [Wed, 19 Jun 2024 00:27:51 +0000 (17:27 -0700)]
xen/arm/static-shmem: Static-shmem should be direct-mapped for direct-mapped domains

Currently, users are allowed to map static shared memory in a
non-direct-mapped way for direct-mapped domains. This can lead to
clashing of guest memory spaces. Also, the current extended region
finding logic only removes the host physical addresses of the
static shared memory areas for direct-mapped domains, which may be
inconsistent with the guest memory map if users map the static
shared memory in a non-direct-mapped way. This will lead to incorrect
extended region calculation results.

To make things easier, add the restriction that static shared memory
should also be direct-mapped for direct-mapped domains. Check that the
host physical address matches the guest physical address when
parsing the device tree. Document this restriction in the doc.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
10 months ago xen/ubsan: Fix UB in type_descriptor declaration
Andrew Cooper [Mon, 17 Jun 2024 17:40:32 +0000 (18:40 +0100)]
xen/ubsan: Fix UB in type_descriptor declaration

struct type_descriptor is arranged with a NUL terminated string following the
kind/info fields.

The only reason this doesn't trip UBSAN detection itself (on more modern
compilers at least) is because struct type_descriptor is only referenced in
suppressed regions.

Switch the declaration to be a real flexible member.  No functional change.
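
As an illustration of the kind of change described above, here is a minimal
sketch and not the actual Xen diff. The field names and the one-element array
in the "old" struct are assumptions modelled on the imported Linux header, and
make_descriptor() is a hypothetical helper that exists only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Old style (assumed): a one-element array standing in for a longer string.
 * Reading type_name[1] and beyond indexes past the declared array bound. */
struct type_descriptor_old {
    uint16_t type_kind;
    uint16_t type_info;
    char type_name[1];
};

/* New style: a C99 flexible array member, so the string length is not
 * encoded in the type and indexing into it is well defined. */
struct type_descriptor {
    uint16_t type_kind;
    uint16_t type_info;
    char type_name[];
};

/* Hypothetical helper: allocate a descriptor with room for the name. */
static struct type_descriptor *make_descriptor(uint16_t kind, uint16_t info,
                                               const char *name)
{
    size_t len = strlen(name) + 1; /* include the NUL terminator */
    struct type_descriptor *td =
        malloc(offsetof(struct type_descriptor, type_name) + len);

    if (td == NULL)
        return NULL;

    td->type_kind = kind;
    td->type_info = info;
    memcpy(td->type_name, name, len);
    return td;
}
```

Both layouts place the string at the same offset, which is consistent with the
"No functional change" note: only the declared bound of the trailing array
differs.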

Fixes: 00fcf4dd8eb4 ("xen/ubsan: Import ubsan implementation from Linux 4.13")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago x86/irq: handle moving interrupts in _assign_irq_vector()
Roger Pau Monné [Tue, 18 Jun 2024 13:15:10 +0000 (15:15 +0200)]
x86/irq: handle moving interrupts in _assign_irq_vector()

Currently there's logic in fixup_irqs() that attempts to prevent
_assign_irq_vector() from failing, as fixup_irqs() is required to evacuate all
interrupts from the CPUs not present in the input mask.  The current logic in
fixup_irqs() is incomplete, as it doesn't deal with interrupts that have
move_cleanup_count > 0 and a non-empty ->arch.old_cpu_mask field.

Instead of attempting to fixup the interrupt descriptor in fixup_irqs() so that
_assign_irq_vector() cannot fail, introduce logic in _assign_irq_vector()
to deal with interrupts that have either move_{in_progress,cleanup_count} set
and no remaining online CPUs in ->arch.cpu_mask.

If _assign_irq_vector() is requested to move an interrupt in the state
described above, first attempt to see if ->arch.old_cpu_mask contains any valid
CPUs that could be used as fallback, and if that's the case do move the
interrupt back to the previous destination.  Note this is easier because the
vector hasn't been released yet, so there's no need to allocate and setup a new
vector on the destination.

Due to the logic in fixup_irqs() that clears offline CPUs from
->arch.old_cpu_mask (and releases the old vector if the mask becomes empty) it
shouldn't be possible to get into _assign_irq_vector() with
->arch.move_{in_progress,cleanup_count} set but no online CPUs in
->arch.old_cpu_mask.

However if ->arch.move_{in_progress,cleanup_count} is set and the interrupt has
also changed affinity, it's possible the members of ->arch.old_cpu_mask are no
longer part of the affinity set, move the interrupt to a different CPU part of
the provided mask and keep the current ->arch.old_{cpu_mask,vector} for the
pending interrupt movement to be completed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 months ago x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs()
Roger Pau Monné [Tue, 18 Jun 2024 13:14:49 +0000 (15:14 +0200)]
x86/irq: deal with old_cpu_mask for interrupts in movement in fixup_irqs()

Given the current logic it's possible for ->arch.old_cpu_mask to get out of
sync: if a CPU set in old_cpu_mask is offlined and then onlined
again without old_cpu_mask having been updated the data in the mask will no
longer be accurate, as when brought back online the CPU will no longer have
old_vector configured to handle the old interrupt source.

If there's an interrupt movement in progress, and the to be offlined CPU (which
is the call context) is in the old_cpu_mask, clear it and update the mask, so
it doesn't contain stale data.
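
The mask update described above can be sketched with a plain bitmask. This is
only an illustration under stated assumptions: the struct, the 64-CPU limit and
the function name are hypothetical, and the "mask became empty" return value
mirrors the remark in the _assign_irq_vector() commit that the old vector is
released once old_cpu_mask is empty.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified CPU mask: bit n set means CPU n is in the mask (CPUs 0..63). */
typedef uint64_t cpumask_t;

/* Hypothetical, reduced view of the per-interrupt movement state. */
struct move_state {
    bool move_in_progress;
    cpumask_t old_cpu_mask;
};

/*
 * Remove the CPU being offlined from old_cpu_mask while a move is pending.
 * Returns true when the mask became empty, which is the point at which the
 * caller could release the old vector.
 */
static bool forget_offlined_cpu(struct move_state *s, unsigned int cpu)
{
    if (!s->move_in_progress)
        return false;

    s->old_cpu_mask &= ~((cpumask_t)1 << cpu);
    return s->old_cpu_mask == 0;
}
```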

Note that when the system is going down fixup_irqs() will be called by
smp_send_stop() from CPU 0 with a mask with only CPU 0 on it, effectively
asking to move all interrupts to the current caller (CPU 0) which is the only
CPU to remain online.  In that case we don't care to migrate interrupts that
are in the process of being moved, as it's likely we won't be able to move all
interrupts to CPU 0 due to vector shortage anyway.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
10 months ago x86/Intel: unlock CPUID earlier for the BSP
Jan Beulich [Tue, 18 Jun 2024 13:12:44 +0000 (15:12 +0200)]
x86/Intel: unlock CPUID earlier for the BSP

Intel CPUs have an MSR bit to limit CPUID enumeration to leaf two. If
this bit is set by the BIOS then CPUID evaluation does not work when
data from any leaf greater than two is needed; early_cpu_init() in
particular wants to collect leaf 7 data.

Cure this by unlocking CPUID right before evaluating anything which
depends on the maximum CPUID leaf being greater than two.
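
A minimal sketch of the unlocking step follows. The commit message does not
name the MSR bit; the assumption here is that it is the "Limit CPUID Maxval"
bit, bit 22 of IA32_MISC_ENABLE. The function operates on a plain value so it
can run anywhere; real code would read and write the MSR around it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: bit 22 of IA32_MISC_ENABLE limits the reported maximum CPUID leaf. */
#define MISC_ENABLE_LIMIT_CPUID (1ull << 22)

/*
 * Clear the limit bit in a copy of the MSR value.
 * Returns true if the bit was set, meaning the caller would need to write
 * the value back and re-read the maximum CPUID leaf afterwards.
 */
static bool unlock_cpuid(uint64_t *misc_enable)
{
    if (!(*misc_enable & MISC_ENABLE_LIMIT_CPUID))
        return false;

    *misc_enable &= ~MISC_ENABLE_LIMIT_CPUID;
    return true;
}
```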

Inspired by (and description cloned from) Linux commit 0c2f6d04619e
("x86/topology/intel: Unlock CPUID before evaluating anything").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago automation/eclair_analysis: add more clean MISRA guidelines
Nicola Vetrini [Fri, 7 Jun 2024 20:13:18 +0000 (22:13 +0200)]
automation/eclair_analysis: add more clean MISRA guidelines

Rules 20.12 and 14.4 are now clean on ARM and x86, so they are added
to the list of clean guidelines.

Some guidelines listed in the additional clean section for ARM are also
clean on x86, so they can be removed from there.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
[stefano: remove 20.9 from commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
10 months ago automation/eclair_analysis: address remaining violations of MISRA C Rule 20.12
Nicola Vetrini [Fri, 7 Jun 2024 20:13:17 +0000 (22:13 +0200)]
automation/eclair_analysis: address remaining violations of MISRA C Rule 20.12

The DEFINE macro in asm-offsets.c (for all architectures) still generates
violations despite the file(s) being excluded from compliance, due to the
fact that in its expansion it sometimes refers to entities in non-excluded files.
These corner cases are deviated by the configuration.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
10 months ago xen/docs: Describe static shared memory when host address is not provided
Penny Zheng [Fri, 24 May 2024 12:40:55 +0000 (13:40 +0100)]
xen/docs: Describe static shared memory when host address is not provided

This commit describes the new scenario where the host address is not provided
in the "xen,shared-mem" property, and a new example is added to the page to
explain it in detail.

Take the occasion to fix some typos in the page.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
10 months ago xen/arm: Implement the logic for static shared memory from Xen heap
Luca Fancellu [Fri, 24 May 2024 12:40:54 +0000 (13:40 +0100)]
xen/arm: Implement the logic for static shared memory from Xen heap

This commit implements the logic to have the static shared memory banks
from the Xen heap instead of having the host physical address passed from
the user.

When the host physical address is not supplied, the physical memory is
taken from the Xen heap using allocate_domheap_memory. The allocation
needs to occur at the first handled DT node and the allocated banks
need to be saved somewhere.

Introduce 'shm_heap_banks' for that reason, a struct that will hold
the banks allocated from the heap. Its field bank[].shmem_extra will be
used to point to the bootinfo shared memory banks' .shmem_extra space, so
that there is no further allocation of memory and every bank in
shm_heap_banks can be safely identified by the shm_id to reconstruct its
traceability and whether it was allocated or not.

A search into 'shm_heap_banks' will reveal whether the banks were allocated
or not, in case the host address is not passed. The callback given
to allocate_domheap_memory will store the banks in the structure and
map them to the current domain. To do that, some changes to
acquire_shared_memory_bank are made to let it differentiate whether the bank
is from the heap and, if it is, assign_pages is called for every
bank.

When the bank is already allocated, for every bank allocated with the
corresponding shm_id, handle_shared_mem_bank is called and the mappings
are done.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
10 months ago xen/arm: Rework heap page allocation outside allocate_bank_memory
Luca Fancellu [Fri, 24 May 2024 12:40:53 +0000 (13:40 +0100)]
xen/arm: Rework heap page allocation outside allocate_bank_memory

The function allocate_bank_memory allocates pages from the heap and
maps them to the guest using guest_physmap_add_page.

As preparation work to support static shared memory banks when the
host physical address is not provided, Xen needs to allocate memory
from the heap, so rework allocate_bank_memory, moving the page
allocation out into a new function called allocate_domheap_memory.

The function allocate_domheap_memory takes a callback function and
a pointer to some extra information passed to the callback. The
callback is called for every allocated region, until a defined size is
reached.

In order to keep allocate_bank_memory functionality, the callback
passed to allocate_domheap_memory is a wrapper for
guest_physmap_add_page.
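
The callback-driven allocation described above can be sketched as follows.
This is a simplified stand-in, not the Xen implementation: all names, the
fixed chunk size and the use of plain offsets instead of heap pages are
assumptions made so the example is self-contained.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: receives one region plus caller-supplied data. */
typedef bool (*region_cb)(size_t start, size_t size, void *extra);

/*
 * Stand-in allocator: hands out regions of at most `chunk` bytes until
 * `total` bytes are covered, calling `cb` once per region.
 * Returns false if `chunk` is zero or the callback reports a failure.
 */
static bool allocate_regions(size_t total, size_t chunk,
                             region_cb cb, void *extra)
{
    size_t next = 0;

    if (chunk == 0)
        return false;

    while (total > 0) {
        size_t size = total < chunk ? total : chunk;

        if (!cb(next, size, extra))
            return false;

        next += size;
        total -= size;
    }

    return true;
}

/* Example callback data: counts the regions and bytes it was given. */
struct tally {
    size_t regions;
    size_t bytes;
};

/* Example callback; a real one could map the region into a guest instead. */
static bool count_region(size_t start, size_t size, void *extra)
{
    struct tally *t = extra;

    (void)start; /* unused in this example */
    t->regions++;
    t->bytes += size;
    return true;
}
```

Passing the per-region action as a callback lets different callers reuse the
same allocation loop while deciding for themselves what to do with each region.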

Let allocate_domheap_memory be externally visible, in order to use
it in the future from the static shared memory module.

Take the opportunity to change the signature of allocate_bank_memory
and remove the 'struct domain' parameter, which can be retrieved from
'struct kernel_info'.

No functional change is intended.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
10 months ago xen/arm: Parse xen,shared-mem when host phys address is not provided
Luca Fancellu [Fri, 24 May 2024 12:40:52 +0000 (13:40 +0100)]
xen/arm: Parse xen,shared-mem when host phys address is not provided

Handle the parsing of the 'xen,shared-mem' property when the host physical
address is not provided. This commit introduces the logic to parse it,
but the functionality is not yet implemented and will be part of future
commits.

Rework the logic inside process_shm_node to check the shm_id before doing
the other checks, because it eases the logic itself, and add more comments
on the logic.
Now when the host physical address is not provided, the value
INVALID_PADDR is chosen to signal this condition and it is stored as the
start of the bank. Due to that change, early_print_info_shmem and
init_sharedmem_pages are also changed so that they do not handle banks with start equal
to INVALID_PADDR.
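
The sentinel handling described above can be sketched like this. The bank
layout and the helper name are hypothetical, and defining INVALID_PADDR as an
all-ones value is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Assumed definition: an all-ones address used as "no host address given". */
#define INVALID_PADDR (~(paddr_t)0)

/* Hypothetical, reduced memory bank. */
struct bank {
    paddr_t start;
    paddr_t size;
};

/*
 * Sum the sizes of the banks that have a host address, skipping the ones
 * whose start is the INVALID_PADDR sentinel.
 */
static paddr_t total_placed_size(const struct bank *banks, size_t nr_banks)
{
    paddr_t total = 0;

    for (size_t i = 0; i < nr_banks; i++) {
        if (banks[i].start == INVALID_PADDR)
            continue; /* no host address yet, nothing to account for */

        total += banks[i].size;
    }

    return total;
}
```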

Another change is done inside meminfo_overlap_check, to skip banks that
start with the address INVALID_PADDR. That function is used
to check banks from reserved memory, shared memory and ACPI and, since
the comment above the function states that wrapping around is not handled,
it's unlikely for these banks to have the start address as INVALID_PADDR.
The same change is done inside the consider_modules, find_unallocated_memory
and dt_unreserved_regions functions, in order to skip banks that start with
INVALID_PADDR from any computation.
The changes above hold because of this consideration.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
10 months ago xen/p2m: put reference for level 2 superpage
Penny Zheng [Tue, 28 May 2024 12:56:03 +0000 (13:56 +0100)]
xen/p2m: put reference for level 2 superpage

We are doing foreign memory mapping for static shared memory, and
there is a good chance that it is mapped using superpages.
But today, p2m_put_l3_page cannot handle superpages.

This commit implements a new function p2m_put_l2_superpage to handle
level 2 superpages, specifically for helping put extra references for
foreign superpages.

Modify relinquish_p2m_mapping as well to take into account preemption
when we have a level-2 foreign mapping.

Currently level 1 superpages are not handled because Xen is not
preemptible and therefore some work is needed to handle such superpages,
for which at some point Xen might end up freeing memory and therefore
for such a big mapping it could end up in a very long operation.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
10 months ago xen/arm: Wrap shared memory mapping code in one function
Luca Fancellu [Fri, 24 May 2024 12:40:50 +0000 (13:40 +0100)]
xen/arm: Wrap shared memory mapping code in one function

Wrap the code and logic that is calling assign_shared_memory
and map_regions_p2mt into a new function 'handle_shared_mem_bank'.
It will become useful later when the code allows the user
not to pass the host physical address.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
10 months ago xen/arm: Lookup bootinfo shm bank during the mapping
Luca Fancellu [Fri, 24 May 2024 12:40:49 +0000 (13:40 +0100)]
xen/arm: Lookup bootinfo shm bank during the mapping

The current static shared memory code is using bootinfo banks when it
needs to find the number of borrowers, so every time assign_shared_memory
is called, the bank is searched in the bootinfo.shmem structure.

There is nothing wrong with it; however, the bank can also be used to
retrieve the start address and size and to pass fewer arguments to
assign_shared_memory. When retrieving the information from the bootinfo
bank, it's also possible to move the checks on alignment to
process_shm_node in the early stages.

So create a new function find_shm_bank_by_id() which takes a
'struct shared_meminfo' structure and the shared memory ID, to look for a
bank with a matching ID, take the physical host address and size from the
bank, pass the bank to assign_shared_memory() removing the now unnecessary
arguments and finally remove the acquire_nr_borrower_domain() function
since now the information can be extracted from the passed bank.
Move the "xen,shm-id" parsing early in process_shm to bail out quickly in
case of errors (unlikely), as said above, move the checks on alignment
to process_shm_node.
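
A lookup by shared memory ID, as described above, might look like the
following sketch. The structures are deliberately simplified and hypothetical;
they are not the real 'struct shared_meminfo' layout, and the size limits are
arbitrary values chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SHM_ID_LEN 16 /* arbitrary limit for this sketch */
#define MAX_BANKS 4       /* arbitrary limit for this sketch */

/* Hypothetical, reduced shared memory bank. */
struct shm_bank {
    uint64_t start;
    uint64_t size;
    char shm_id[MAX_SHM_ID_LEN];
};

/* Hypothetical container holding the known banks. */
struct shm_info {
    unsigned int nr_banks;
    struct shm_bank bank[MAX_BANKS];
};

/* Return the first bank whose ID matches, or NULL if there is none. */
static const struct shm_bank *find_bank_by_id(const struct shm_info *info,
                                              const char *shm_id)
{
    for (unsigned int i = 0; i < info->nr_banks; i++) {
        if (strncmp(info->bank[i].shm_id, shm_id, MAX_SHM_ID_LEN) == 0)
            return &info->bank[i];
    }

    return NULL;
}
```

Once the bank is found, the caller can read the start address and size from
it directly instead of receiving them as separate arguments.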

The drawback of this change is that now the bootinfo is also used when the
bank doesn't need to be allocated; however, it will be convenient later
to use it as an argument for assign_shared_memory when dealing with
the use case where the host physical address is not supplied by the user.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
10 months ago x86/EPT: drop questionable mfn_valid() from epte_get_entry_emt()
Jan Beulich [Thu, 13 Jun 2024 14:55:22 +0000 (16:55 +0200)]
x86/EPT: drop questionable mfn_valid() from epte_get_entry_emt()

mfn_valid() is RAM-focused; it will often return false for MMIO. Yet
access to actual MMIO space should not generally be restricted to UC
only; especially video frame buffer accesses are unduly affected by such
a restriction.

Since, as of 777c71d31325 ("x86/EPT: avoid marking non-present entries
for re-configuring"), the function won't be called with INVALID_MFN or,
worse, truncated forms thereof anymore, we can fully drop that check.

Fixes: 81fd0d3ca4b2 ("x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago x86/EPT: avoid marking non-present entries for re-configuring
Jan Beulich [Thu, 13 Jun 2024 14:54:17 +0000 (16:54 +0200)]
x86/EPT: avoid marking non-present entries for re-configuring

For non-present entries EMT, like most other fields, is meaningless to
hardware. Make the logic in ept_set_entry() setting the field (and iPAT)
conditional upon dealing with a present entry, leaving the value at 0
otherwise. This has two effects for epte_get_entry_emt() which we'll
want to leverage subsequently:
1) The call moved here now won't be issued with INVALID_MFN anymore (a
   respective BUG_ON() is being added).
2) Neither of the other two calls could now be issued with a truncated
   form of INVALID_MFN anymore (as long as there's no bug anywhere
   marking an entry present when that was populated using INVALID_MFN).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago x86/EPT: correct special page checking in epte_get_entry_emt()
Jan Beulich [Thu, 13 Jun 2024 14:53:34 +0000 (16:53 +0200)]
x86/EPT: correct special page checking in epte_get_entry_emt()

mfn_valid() granularity is (currently) 256MB. Therefore the start of a
1GB page passing the test doesn't necessarily mean all parts of such a
range would also pass. Yet using the result of mfn_to_page() on an MFN
which doesn't pass mfn_valid() checking is liable to result in a crash
(the invocation of mfn_to_page() alone is presumably "just" UB in such a
case).
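
One way to act on the observation above is to test every validity granule that
a large range touches, rather than only its first frame. The sketch below is
not the code from this commit; it assumes 4 KiB pages and a 256 MiB granule,
and first_512m_valid() is a made-up stand-in for mfn_valid().

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
/* Assumed granule: 256 MiB expressed in 4 KiB pages (a power of two). */
#define VALID_GRANULE_PAGES ((256ull << 20) >> PAGE_SHIFT)

/* Predicate type standing in for mfn_valid(). */
typedef bool (*valid_fn)(uint64_t mfn);

/*
 * Check validity once per granule across [mfn, mfn + nr_pages).
 * Returns false as soon as any granule in the range fails the predicate.
 */
static bool range_valid(uint64_t mfn, uint64_t nr_pages, valid_fn valid)
{
    uint64_t end = mfn + nr_pages;

    while (mfn < end) {
        if (!valid(mfn))
            return false;

        /* Advance to the first frame of the next granule. */
        mfn = (mfn | (VALID_GRANULE_PAGES - 1)) + 1;
    }

    return true;
}

/* Made-up predicate: only the first 512 MiB of frames are "valid". */
static bool first_512m_valid(uint64_t mfn)
{
    return mfn < 2 * VALID_GRANULE_PAGES;
}
```

With this predicate, a 1 GiB range starting at frame 0 passes a check of its
first frame alone but fails the per-granule check.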

Fixes: ca24b2ffdbd9 ("x86/hvm: set 'ipat' in EPT for special pages")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago xen/arm: ffa: support notification
Jens Wiklander [Mon, 10 Jun 2024 06:53:43 +0000 (08:53 +0200)]
xen/arm: ffa: support notification

Add support for FF-A notifications, currently limited to an SP (Secure
Partition) sending an asynchronous notification to a guest.

Guests and Xen itself are made aware of pending notifications with an
interrupt. The interrupt handler triggers a tasklet to retrieve the
notifications using the FF-A ABI and deliver them to their destinations.

Update ffa_partinfo_domain_init() to return error code like
ffa_notif_domain_init().

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago xen/arm: add and call tee_free_domain_ctx()
Jens Wiklander [Mon, 10 Jun 2024 06:53:42 +0000 (08:53 +0200)]
xen/arm: add and call tee_free_domain_ctx()

Add tee_free_domain_ctx() to the TEE mediator framework.
tee_free_domain_ctx() is called from arch_domain_destroy() to allow late
freeing of the d->arch.tee context. This will simplify access to
d->arch.tee for domains retrieved with rcu_lock_domain_by_id().

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago xen/arm: add and call init_tee_secondary()
Jens Wiklander [Mon, 10 Jun 2024 06:53:41 +0000 (08:53 +0200)]
xen/arm: add and call init_tee_secondary()

Add init_tee_secondary() to the TEE mediator framework and call it from
start_secondary() late enough that per-cpu interrupts can be configured
on CPUs as they are initialized. This is needed in later patches.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago xen/arm: allow dynamically assigned SGI handlers
Jens Wiklander [Mon, 10 Jun 2024 06:53:40 +0000 (08:53 +0200)]
xen/arm: allow dynamically assigned SGI handlers

Update the code so request_irq() can be used with a dynamically assigned SGI irq
as input. This prepares for a later patch where an FF-A schedule
receiver interrupt handler is installed for an SGI generated by the
secure world.

From the Arm Base System Architecture v1.0C [1]:
"The system shall implement at least eight Non-secure SGIs, assigned to
interrupt IDs 0-7."

gic_route_irq_to_xen() doesn't call gic_set_irq_type() for SGIs since they are
always edge triggered.

gic_interrupt() is updated to route the dynamically assigned SGIs to
do_IRQ() instead of do_sgi(). The latter still handles the statically
assigned SGI handlers like for instance GIC_SGI_CALL_FUNCTION.

[1] https://developer.arm.com/documentation/den0094/

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago xen/arm: ffa: simplify ffa_handle_mem_share()
Jens Wiklander [Mon, 10 Jun 2024 06:53:39 +0000 (08:53 +0200)]
xen/arm: ffa: simplify ffa_handle_mem_share()

Simplify ffa_handle_mem_share() by removing the start_page_idx and
last_page_idx parameters from get_shm_pages() and check that the number
of pages matches expectations at the end of get_shm_pages().

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago xen/arm: ffa: use ACCESS_ONCE()
Jens Wiklander [Mon, 10 Jun 2024 06:53:38 +0000 (08:53 +0200)]
xen/arm: ffa: use ACCESS_ONCE()

Replace read_atomic() with ACCESS_ONCE() to match the intended use, that
is, to prevent the compiler from (via optimization) reading shared
memory more than once.
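
The intended use described above can be illustrated with a minimal sketch.
The macro definition shown is a common form that relies on the GNU C
__typeof__ extension and is an assumption here, not a copy of Xen's
definition; the descriptor struct, MAX_COUNT and read_count() are hypothetical.

```c
#include <stdint.h>

/* Illustrative definition: force exactly one access via a volatile lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical descriptor living in memory that another party may modify. */
struct shared_desc {
    uint32_t count;
};

#define MAX_COUNT 16u

/* Return the validated count, or -1 if it is out of range. */
static int read_count(struct shared_desc *d)
{
    /*
     * Read the shared field once into a local, so the value that is
     * checked is the same value that is used afterwards.
     */
    uint32_t count = ACCESS_ONCE(d->count);

    if (count > MAX_COUNT)
        return -1;

    return (int)count;
}
```

Without the single read, a compiler would be free to load d->count once for
the range check and again for the return value, and the two loads could see
different contents.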

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago xen/arm: ffa: refactor ffa_handle_call()
Jens Wiklander [Mon, 10 Jun 2024 06:53:37 +0000 (08:53 +0200)]
xen/arm: ffa: refactor ffa_handle_call()

Refactors the large switch block in ffa_handle_call() to use common code
for the simple case where it's either an error code or success with no
further parameters.

Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago x86/physdev: replace physdev_{,un}map_pirq() checking against DOMID_SELF
Jan Beulich [Wed, 12 Jun 2024 12:31:21 +0000 (14:31 +0200)]
x86/physdev: replace physdev_{,un}map_pirq() checking against DOMID_SELF

It's hardly ever correct to check for just DOMID_SELF, as guests have
ways to figure out their domain IDs and hence could instead use those as
inputs to respective hypercalls. Note, however, that for ordinary DomU-s
the adjustment is relaxing things rather than tightening them, since
- as a result of XSA-237 - the respective XSM checks would have rejected
self (un)mapping attempts for other than the control domain.

Since in physdev_map_pirq() handling overall is a little easier this
way, move obtaining of the domain pointer into the caller. Doing the
same for physdev_unmap_pirq() is just to keep both consistent in this
regard.

Fixes: 0b469cd68708 ("Interrupt remapping to PIRQs in HVM guests")
Fixes: 9e1a3415b773 ("x86: fixes after emuirq changes")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago x86/irq: limit interrupt movement done by fixup_irqs()
Roger Pau Monné [Wed, 12 Jun 2024 12:30:40 +0000 (14:30 +0200)]
x86/irq: limit interrupt movement done by fixup_irqs()

The current check used in fixup_irqs() to decide whether to move around
interrupts is based on the affinity mask, but such mask can have all bits set,
and hence is unlikely to be a subset of the input mask.  For example if an
interrupt has an affinity mask of all 1s, any input to fixup_irqs() that's not
an all set CPU mask would cause that interrupt to be shuffled around
unconditionally.

What fixup_irqs() cares about is evacuating interrupts from CPUs not set on the
input CPU mask, and for that purpose it should check whether the interrupt is
assigned to a CPU not present in the input mask.  Assume that ->arch.cpu_mask
is a subset of the ->affinity mask, and keep the current logic that resets the
->affinity mask if the interrupt has to be shuffled around.
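
The difference between the two checks can be sketched with plain bitmasks.
This is an illustration under stated assumptions, not the Xen code: the
descriptor is reduced to two fields, masks are limited to 64 CPUs, and the
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified CPU mask: bit n set means CPU n is in the mask (CPUs 0..63). */
typedef uint64_t cpumask_t;

/* Hypothetical, reduced interrupt descriptor. */
struct irq_sketch {
    cpumask_t affinity; /* CPUs the interrupt is allowed to target */
    cpumask_t cpu_mask; /* CPUs the interrupt currently targets */
};

/* True if every CPU in `a` is also in `b`. */
static bool is_subset(cpumask_t a, cpumask_t b)
{
    return (a & ~b) == 0;
}

/* Affinity-based check: an all-ones affinity fails for any partial mask. */
static bool needs_move_by_affinity(const struct irq_sketch *irq, cpumask_t keep)
{
    return !is_subset(irq->affinity, keep);
}

/* Target-based check: true only if a currently targeted CPU is not kept. */
static bool needs_move_by_target(const struct irq_sketch *irq, cpumask_t keep)
{
    return !is_subset(irq->cpu_mask, keep);
}
```

For an interrupt with an all-ones affinity that currently targets only CPU 0,
removing CPU 3 from a four-CPU system triggers a move under the first check
but not under the second.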

Doing the affinity movement based on ->arch.cpu_mask requires removing the
special handling to ->arch.cpu_mask done for high priority vectors, otherwise
the adjustment done to cpu_mask makes them always skip the CPU interrupt
movement.

While there also adjust the comment as to the purpose of fixup_irqs().

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months ago x86/irq: describe how the interrupt CPU movement works
Roger Pau Monné [Wed, 12 Jun 2024 12:30:06 +0000 (14:30 +0200)]
x86/irq: describe how the interrupt CPU movement works

The logic to move interrupts across CPUs is complex; attempt to provide a
comment that describes the expected behavior so users of the interrupt system
have more context about the usage of the arch_irq_desc structure fields.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months agox86/smp: do not use shorthand IPI destinations in CPU hot{,un}plug contexts
Roger Pau Monné [Wed, 12 Jun 2024 12:29:31 +0000 (14:29 +0200)]
x86/smp: do not use shorthand IPI destinations in CPU hot{,un}plug contexts

Due to the current rwlock logic, if the CPU calling get_cpu_maps() does
so from a cpu_hotplug_{begin,done}() region the function will still
return success, because a CPU taking the rwlock in read mode after
having taken it in write mode is allowed.  This corner case makes using
get_cpu_maps() alone not enough to prevent using the shorthand in CPU
hotplug regions.

Introduce a new helper to detect whether the current caller is within a
cpu_hotplug_{begin,done}() region and use it in send_IPI_mask() to restrict
shorthand usage.
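
A single-threaded toy model of the corner case follows.  Every name in it
(toy_rwlock, toy_get_cpu_maps(), toy_in_hotplug_context(), ...) is made up for
illustration and does not match the real Xen helpers; it only shows why a
successful read-lock is not sufficient and an explicit "am I the writer" check
is needed as well.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Toy write-recursive rwlock: remembers which CPU holds it for writing. */
struct toy_rwlock {
    int writer;
    unsigned int readers;
};

static struct toy_rwlock hotplug_lock = { .writer = NO_CPU };

static inline void toy_hotplug_begin(int cpu) { hotplug_lock.writer = cpu; }
static inline void toy_hotplug_done(void)     { hotplug_lock.writer = NO_CPU; }

/* Read trylock: succeeds when unlocked, or when the caller is the writer. */
static inline bool toy_get_cpu_maps(int cpu)
{
    if ( hotplug_lock.writer != NO_CPU && hotplug_lock.writer != cpu )
        return false;

    hotplug_lock.readers++;
    return true;
}

static inline void toy_put_cpu_maps(void) { hotplug_lock.readers--; }

/* The extra check: is this CPU itself inside a begin/done region? */
static inline bool toy_in_hotplug_context(int cpu)
{
    return hotplug_lock.writer == cpu;
}

/* Shorthand is allowed only if the maps are stable and we are not the writer. */
static inline bool may_use_shorthand(int cpu)
{
    bool ok;

    if ( !toy_get_cpu_maps(cpu) )
        return false;

    ok = !toy_in_hotplug_context(cpu);
    toy_put_cpu_maps();

    return ok;
}
```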

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months agoMAINTAINERS: alter EFI section
Jan Beulich [Wed, 12 Jun 2024 08:52:56 +0000 (10:52 +0200)]
MAINTAINERS: alter EFI section

To get past the recurring friction on the approach to take wrt
workarounds needed for various firmware flaws, I'm stepping down as the
maintainer of our code interfacing with EFI firmware. Two new
maintainers are being introduced in my place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Marek Marczykowski <marmarek@invisiblethingslab.com>
Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months agoMAINTAINERS: add me as scheduler maintainer
Juergen Gross [Wed, 12 Jun 2024 08:52:22 +0000 (10:52 +0200)]
MAINTAINERS: add me as scheduler maintainer

I've been active in the scheduling code for many years now. Add
me as a maintainer.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
10 months agoCI: Update FreeBSD to 13.3
Andrew Cooper [Tue, 11 Jun 2024 11:59:12 +0000 (12:59 +0100)]
CI: Update FreeBSD to 13.3

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months agoautomation: add a test for HVM domU on PVH dom0
Marek Marczykowski-Górecki [Mon, 10 Jun 2024 13:32:09 +0000 (15:32 +0200)]
automation: add a test for HVM domU on PVH dom0

This tests if QEMU works in PVH dom0. QEMU in dom0 requires enabling TUN
in the kernel, so do that too.

Add it to both x86 runners, similar to the PVH domU test.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-Acked-By: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
10 months agox86/pvh: declare PVH dom0 supported with caveats
Roger Pau Monné [Mon, 10 Jun 2024 11:29:25 +0000 (13:29 +0200)]
x86/pvh: declare PVH dom0 supported with caveats

PVH dom0 is functionally very similar to PVH domU except for the domain
builder and the added set of hypercalls available to it.

The main concern with declaring it "Supported" is the lack of some features
when compared to classic PV dom0, hence switch its status to supported with
caveats.  List the known missing features; there might be more features missing
or not working as expected apart from the ones listed.

Note there's some (limited) PVH dom0 testing on both osstest and gitlab.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
10 months agox86/domain: deviate violation of MISRA C Rule 20.12
Nicola Vetrini [Mon, 10 Jun 2024 08:34:05 +0000 (10:34 +0200)]
x86/domain: deviate violation of MISRA C Rule 20.12

MISRA C Rule 20.12 states: "A macro parameter used as an operand to
the # or ## operators, which is itself subject to further macro replacement,
shall only be used as an operand to these operators".

In this case, in builds where CONFIG_COMPAT=y, the fpu_ctxt
macro is used both as a regular macro argument and as an operand for
stringification in the expansion of CHECK_FIELD_.
This is deviated using a SAF-x-safe comment.
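
The pattern the rule targets can be reproduced in a few lines.  The sketch
below uses invented names (FIELD, NAME_AND_VALUE, struct demo); it is not the
CHECK_FIELD_ macro itself, only the same shape: a macro parameter that is
itself a macro is used once with # (not expanded) and once as an ordinary
operand (expanded).

```c
#include <assert.h>
#include <string.h>

struct demo { int value; };
struct pair { const char *name; int val; };

/* FIELD is itself subject to macro replacement, like fpu_ctxt in the text. */
#define FIELD value

/* 'f' appears as an operand of # and as an ordinary operand. */
#define NAME_AND_VALUE(s, f) { #f, (s).f }

static inline struct pair describe(struct demo d)
{
    /* #f yields "FIELD", while (s).f expands to d.value. */
    struct pair p = NAME_AND_VALUE(d, FIELD);

    return p;
}

static inline void demo_rule_20_12_check(void)
{
    struct demo d = { 42 };
    struct pair p = describe(d);

    assert(strcmp(p.name, "FIELD") == 0);
    assert(p.val == 42);
}
```

The two uses see different spellings of the same argument, which is the
surprise the rule is guarding against.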

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
10 months agox86/irq: remove offline CPUs from old CPU mask when adjusting move_cleanup_count
Roger Pau Monné [Mon, 10 Jun 2024 08:33:22 +0000 (10:33 +0200)]
x86/irq: remove offline CPUs from old CPU mask when adjusting move_cleanup_count

When adjusting move_cleanup_count to account for CPUs that are offline also
adjust old_cpu_mask, otherwise further calls to fixup_irqs() could subtract
those again and create an imbalance in move_cleanup_count.
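
A minimal model of the imbalance, assuming a 64-bit mask in place of cpumask_t
and an invented struct demo_irq; the real fields live in arch_irq_desc and the
real code is more involved.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t mask_t;                 /* stand-in for cpumask_t */

struct demo_irq {
    mask_t old_cpu_mask;
    unsigned int move_cleanup_count;
};

/* Account for CPUs in old_cpu_mask that are no longer online. */
static inline void demo_fixup(struct demo_irq *d, mask_t online)
{
    mask_t offline = d->old_cpu_mask & ~online;

    d->move_cleanup_count -=
        (unsigned int)__builtin_popcountll((unsigned long long)offline);

    /* The fix: drop those CPUs so a later call cannot subtract them again. */
    d->old_cpu_mask &= online;
}

static inline void demo_cleanup_check(void)
{
    struct demo_irq d = { .old_cpu_mask = 0xe, .move_cleanup_count = 3 };
    mask_t online = ~((mask_t)1 << 2);   /* CPU 2 went offline */

    demo_fixup(&d, online);
    assert(d.move_cleanup_count == 2);

    demo_fixup(&d, online);              /* second call must be a no-op */
    assert(d.move_cleanup_count == 2);
}
```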

Fixes: 472e0b74c5c4 ('x86/IRQ: deal with move cleanup count state in fixup_irqs()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 months agoxen: fix MISRA regressions on rule 20.9 and 20.12
Nicola Vetrini [Sat, 1 Jun 2024 10:16:56 +0000 (12:16 +0200)]
xen: fix MISRA regressions on rule 20.9 and 20.12

Commit ea59e7d780d9 ("xen/bitops: Cleanup and new infrastructure ahead of
rearrangements") introduced new violations on previously clean rules 20.9 and
20.12 (clean on ARM only, right now).

The first is introduced because CONFIG_CC_IS_CLANG in xen/self-tests.h is not
defined in the configuration under analysis. Using "defined()" instead avoids
relying on the preprocessor's behaviour upon encountering an undefined identifier
and addresses the violation.
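
As a small illustration of the Rule 20.9 point (CONFIG_DEMO_FEATURE is a
made-up symbol, deliberately left undefined): "#if CONFIG_DEMO_FEATURE" would
quietly evaluate the unknown identifier as 0, whereas the defined() form states
the intent explicitly.

```c
#include <assert.h>

/*
 * CONFIG_DEMO_FEATURE is intentionally not defined anywhere.
 *
 * The form flagged by Rule 20.9 would be:
 *     #if CONFIG_DEMO_FEATURE
 * which relies on the preprocessor treating an undefined identifier as 0.
 */
#if defined(CONFIG_DEMO_FEATURE)
# define DEMO_FEATURE_ENABLED 1
#else
# define DEMO_FEATURE_ENABLED 0
#endif

static inline int demo_feature_enabled(void)
{
    return DEMO_FEATURE_ENABLED;
}

static inline void demo_rule_20_9_check(void)
{
    assert(demo_feature_enabled() == 0);
}
```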

The violation of Rule 20.12 is due to "val" being used both as an ordinary argument
in macro RUNTIME_CHECK, and as an operand of the stringification operator.

No functional change.

Fixes: ea59e7d780d9 ("xen/bitops: Cleanup and new infrastructure ahead of rearrangements")
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agoxen/bitops: Rearrange the top of xen/bitops.h
Andrew Cooper [Fri, 24 May 2024 19:37:50 +0000 (20:37 +0100)]
xen/bitops: Rearrange the top of xen/bitops.h

The #include <asm/bitops.h> can move to the top of the file now that
generic_ffs()/generic_fls() have been untangled.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/bitops: Clean up ffs64()/fls64() definitions
Andrew Cooper [Sat, 9 Mar 2024 02:44:56 +0000 (02:44 +0000)]
xen/bitops: Clean up ffs64()/fls64() definitions

Implement ffs64() and fls64() as plain static inlines, dropping the ifdefary
and intermediate generic_f?s64() forms.

Add tests for all interesting bit positions at 32bit boundaries.
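
One way to write such plain static inlines, together with checks around the
32bit boundary, is sketched here.  The demo_ names and the use of the GCC/Clang
builtins __builtin_ctz()/__builtin_clz() for the 32bit halves are assumptions
of this sketch, not a copy of the code in the tree.

```c
#include <assert.h>
#include <stdint.h>

/* 32bit helpers standing in for ffs()/fls(); a 0 input yields 0. */
static inline unsigned int demo_ffs(uint32_t x)
{
    return x ? (unsigned int)__builtin_ctz(x) + 1 : 0;
}

static inline unsigned int demo_fls(uint32_t x)
{
    return x ? 32 - (unsigned int)__builtin_clz(x) : 0;
}

static inline unsigned int demo_ffs64(uint64_t x)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);

    if ( lo )
        return demo_ffs(lo);

    return hi ? demo_ffs(hi) + 32 : 0;
}

static inline unsigned int demo_fls64(uint64_t x)
{
    uint32_t lo = (uint32_t)x, hi = (uint32_t)(x >> 32);

    return hi ? demo_fls(hi) + 32 : demo_fls(lo);
}

/* Checks on either side of the 32bit boundary. */
static inline void demo_f_s64_check(void)
{
    assert(demo_ffs64(0) == 0);
    assert(demo_ffs64(UINT64_C(0x80000000)) == 32);
    assert(demo_ffs64(UINT64_C(0x100000000)) == 33);
    assert(demo_fls64(UINT64_C(0xffffffff)) == 32);
    assert(demo_fls64(UINT64_C(0x100000000)) == 33);
    assert(demo_fls64(~UINT64_C(0)) == 64);
}
```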

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/bitops: Implement fls()/flsl() in common logic
Oleksii Kurochko [Fri, 24 May 2024 15:49:53 +0000 (16:49 +0100)]
xen/bitops: Implement fls()/flsl() in common logic

This is most easily done together because of how arm32 is currently
structured, but it does just mirror the existing ffs()/ffsl() work.

Introduce compile and boot time testing.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/bitops: Implement ffsl() in common logic
Andrew Cooper [Wed, 31 Jan 2024 18:31:16 +0000 (18:31 +0000)]
xen/bitops: Implement ffsl() in common logic

... just as with ffs() previously.  Express the upper bound of the testing in
terms of BITS_PER_LONG as it varies between architectures.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agox86/bitops: Improve arch_ffs() in the general case
Andrew Cooper [Thu, 14 Mar 2024 23:31:11 +0000 (23:31 +0000)]
x86/bitops: Improve arch_ffs() in the general case

The asm in arch_ffs() is safe but inefficient.

CMOV would be an improvement over a conditional branch, but for 64bit CPUs
both Intel and AMD have provided enough details about the behaviour for a zero
input.  It is safe to pre-load the destination register with -1 and drop the
conditional logic.

However, it is common to find ffs() in a context where the optimiser knows
that x is non-zero even if the value isn't known precisely.  In this case,
it's safe to drop the preload of -1 too.
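
The two cases above can be sketched as follows.  This is an illustration under
assumptions, not the code from the patch: the name demo_arch_ffs() is made up,
GCC/Clang extended asm is assumed, the zero-input behaviour of BSF is taken
from the statement above, and a portable fallback is included so the sketch
also builds on non-x86 targets.

```c
#include <assert.h>

static inline unsigned int demo_arch_ffs(unsigned int x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    unsigned int r;

    if ( __builtin_constant_p(x > 0) && x > 0 )
    {
        /* The optimiser knows x is non-zero: no preload is needed. */
        __asm__ ( "bsf %[val], %[res]"
                  : [res] "=r" (r)
                  : [val] "rm" (x) );
    }
    else
    {
        /* Preload -1; BSF leaves the destination alone for a zero input. */
        __asm__ ( "bsf %[val], %[res]"
                  : [res] "=r" (r)
                  : [val] "rm" (x), "[res]" (-1) );
    }

    return r + 1;
#else
    /* Portable fallback so the sketch builds elsewhere. */
    return x ? (unsigned int)__builtin_ctz(x) + 1 : 0;
#endif
}

static inline void demo_arch_ffs_check(void)
{
    volatile unsigned int zero = 0, eight = 8;

    assert(demo_arch_ffs(zero) == 0);    /* -1 + 1 */
    assert(demo_arch_ffs(eight) == 4);
}
```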

There are only a handful of uses of ffs() in the x86 build, and all of them
improve as a result of this:

  add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-92 (-92)
  Function                                     old     new   delta
  mask_write                                   121     113      -8
  xmem_pool_alloc                             1076    1056     -20
  test_bitops                                  390     358     -32
  pt_update_contig_markers                    1236    1204     -32

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/bitops: Implement ffs() in common logic
Andrew Cooper [Wed, 31 Jan 2024 18:31:16 +0000 (18:31 +0000)]
xen/bitops: Implement ffs() in common logic

Perform constant-folding unconditionally, rather than having it implemented
inconsistently between architectures.

Confirm the expected behaviour with compile time and boot time tests.

For non-constant inputs, use arch_ffs() if provided but fall back to
generic_ffsl() if not.  In particular, RISC-V doesn't have a builtin that
works in all configurations.
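
The dispatch described above might look roughly like this.  The demo_ prefix
marks every name as illustrative; in particular demo_arch_ffs is left undefined
here so the generic fallback is taken, and the loop-based fallback is just a
placeholder for a real generic implementation.

```c
#include <assert.h>

/* Placeholder generic fallback, operating at native register width. */
static inline unsigned int demo_generic_ffsl(unsigned long x)
{
    unsigned int r = 1;

    if ( !x )
        return 0;

    while ( !(x & 1) )
    {
        x >>= 1;
        r++;
    }

    return r;
}

static inline unsigned int demo_ffs(unsigned int x)
{
    /* Constant inputs are folded by the compiler on every architecture. */
    if ( __builtin_constant_p(x) )
        return (unsigned int)__builtin_ffs((int)x);

#ifdef demo_arch_ffs
    return demo_arch_ffs(x);        /* architecture-provided primitive */
#else
    return demo_generic_ffsl(x);    /* fallback when none is provided */
#endif
}

static inline void demo_ffs_check(void)
{
    volatile unsigned int v = 0x10;  /* defeat constant folding */

    assert(demo_ffs(0) == 0);
    assert(demo_ffs(0x80000000u) == 32);
    assert(demo_ffs(v) == 5);
}
```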

For x86, rename ffs() to arch_ffs() and adjust the prototype.

For PPC, __builtin_ctz() is 1/3 of the size of the transform to
generic_fls().  Drop the definition entirely.  ARM too benefits in the general
case by using __builtin_ctz(), but less dramatically because it is using
optimised asm().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/bitops: Implement generic_ffsl()/generic_flsl() in lib/
Andrew Cooper [Fri, 24 May 2024 12:36:25 +0000 (13:36 +0100)]
xen/bitops: Implement generic_ffsl()/generic_flsl() in lib/

generic_ffs()/generic_fls*() being static inline is the cause of lots of the
complexity between the common and arch-specific bitops.h.

They appear to be static inline for constant-folding reasons (ARM), but there
are better ways to achieve the same effect.

It is presumptuous that an unrolled binary search is the right algorithm to
use on all microarchitectures.  Indeed, it's not for the eventual users, but
that can be addressed at a later point.

It is also nonsense to implement the int form as the base primitive and
construct the long form from 2x int in 64-bit builds, when it's just one extra
step to operate at the native register width.

Therefore, implement generic_ffsl()/generic_flsl() in lib/.  They're not
actually needed in x86/ARM/PPC by the end of the cleanup (i.e. the functions
will be dropped by the linker), and they're only expected to be needed by RISC-V
on hardware which lacks the Zbb extension.

Implement generic_fls() in terms of generic_flsl() for now, but this will be
cleaned up in due course.

Provide basic runtime testing using __constructor inside the lib/ file.  This
is important, as it means testing runs if and only if generic_f?sl() are used
elsewhere in Xen.
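
As a rough sketch of the idea, here is an unrolled binary search working at
the native long width, with a constructor-driven self test in the same file.
Names carry a demo_ prefix because this is not the code from lib/;
__attribute__((constructor)) is the GCC/Clang spelling assumed to sit behind
__constructor, and assert() stands in for whatever failure reporting the real
tests use.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

unsigned int demo_generic_ffsl(unsigned long x)
{
    unsigned int r = 1;

    if ( !x )
        return 0;

#if ULONG_MAX > 0xffffffffUL
    if ( !(x & 0xffffffffUL) ) { x >>= 32; r += 32; }
#endif
    if ( !(x & 0xffff) ) { x >>= 16; r += 16; }
    if ( !(x & 0xff) )   { x >>= 8;  r += 8;  }
    if ( !(x & 0xf) )    { x >>= 4;  r += 4;  }
    if ( !(x & 0x3) )    { x >>= 2;  r += 2;  }
    if ( !(x & 0x1) )    { r += 1; }

    return r;
}

static unsigned int demo_tests_run;

/* Runs before main(), but only if this object is linked into the image. */
static void __attribute__((constructor)) demo_test_generic_ffsl(void)
{
    assert(demo_generic_ffsl(0) == 0);
    assert(demo_generic_ffsl(1) == 1);
    assert(demo_generic_ffsl(3) == 1);
    assert(demo_generic_ffsl(1UL << (DEMO_BITS_PER_LONG - 1)) ==
           DEMO_BITS_PER_LONG);

    demo_tests_run = 1;
}
```

Keeping the test next to the function means a static archive pulls in both or
neither, which matches the "if and only if" property described above.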

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/bitops: Cleanup and new infrastructure ahead of rearrangements
Andrew Cooper [Fri, 8 Mar 2024 23:45:08 +0000 (23:45 +0000)]
xen/bitops: Cleanup and new infrastructure ahead of rearrangements

 * Rename __attribute_pure__ to just __pure before it gains users.
 * Introduce __constructor which is going to be used in lib/, and is
   unconditionally cf_check.
 * Identify the areas of xen/bitops.h which are a mess.
 * Introduce xen/self-tests.h as helpers for compile and boot time testing.
   This provides a statement of the ABI, and a confirmation that arch-specific
   implementations behave as expected.
 * Introduce HIDE() in macros.h.  While it's only used in self-tests.h for
   now, we're going to consolidate similar constructs in due course.

Sadly Clang 7 and older isn't happy with the compile time checks.  Skip them,
and just rely on the runtime checks.
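
A reduced sketch of the runtime half of such helpers is shown below.  The
macro names carry a DEMO_ prefix because they are not the definitions from
xen/self-tests.h or macros.h; the sketch assumes GNU C statement expressions
and uses assert() in place of the real failure path.

```c
#include <assert.h>

/* Hide a value from the optimiser so the callee really runs at run time. */
#define DEMO_HIDE(x)                                   \
    ({ __typeof__(x) x_ = (x);                         \
       __asm__ volatile ( "" : "+r" (x_) );            \
       x_; })

/* Evaluate fn(val) with the input hidden, and compare against res. */
#define DEMO_RUNTIME_CHECK(fn, val, res)               \
    do {                                               \
        __typeof__(fn(val)) real_ = fn(DEMO_HIDE(val)); \
        assert(real_ == (res));                        \
    } while ( 0 )

static inline unsigned int demo_ffs(unsigned int x)
{
    return (unsigned int)__builtin_ffs((int)x);
}

/* Returns the number of checks performed. */
unsigned int demo_run_checks(void)
{
    unsigned int n = 0;

    DEMO_RUNTIME_CHECK(demo_ffs, 0u, 0u);           n++;
    DEMO_RUNTIME_CHECK(demo_ffs, 1u, 1u);           n++;
    DEMO_RUNTIME_CHECK(demo_ffs, 0x80000000u, 32u); n++;

    return n;
}
```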

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/bitops: Delete find_first_set_bit()
Andrew Cooper [Thu, 14 Mar 2024 20:38:44 +0000 (20:38 +0000)]
xen/bitops: Delete find_first_set_bit()

No more users.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoarch/irq: Centralise no_irq_type
Andrew Cooper [Thu, 30 May 2024 17:58:18 +0000 (18:58 +0100)]
arch/irq: Centralise no_irq_type

Having no_irq_type defined per arch, but using common callbacks is a mess, and
is particularly hard to bootstrap a new architecture with.

Now that the ack()/end() hooks have been exported suitably, move the
definition of no_irq_type into common/irq.c, and make it const too for good
measure.

No functional change, but a whole lot less tangled.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoarch/irq: Make irq_ack_none() mandatory
Andrew Cooper [Thu, 30 May 2024 18:00:45 +0000 (19:00 +0100)]
arch/irq: Make irq_ack_none() mandatory

Any non-stub implementation is going to have to do something here.

The related hook irq_end_none() is more complicated and has arch-specific
interactions with irq_ack_none(), so make it optional.

For PPC, introduce a stub irq_ack_none().

For ARM and x86, export the existing {ack,end}_none() helpers, gaining an irq_
prefix for consistency with everything else in no_irq_type.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoCI: Improve serial handling in qemu-smoke-ppc64le.sh
Andrew Cooper [Wed, 29 May 2024 13:21:12 +0000 (14:21 +0100)]
CI: Improve serial handling in qemu-smoke-ppc64le.sh

Have PPC put serial to stdout like all other tests, so it shows up in the main
job log.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoCI: Use a debug build of Xen for the Xilinx HW tests
Andrew Cooper [Wed, 29 May 2024 13:20:39 +0000 (14:20 +0100)]
CI: Use a debug build of Xen for the Xilinx HW tests

... like the other hardware tests.  This gets more value out of the testing.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/riscv: Update Kconfig in preparation for a full Xen build
Oleksii Kurochko [Wed, 29 May 2024 19:55:02 +0000 (21:55 +0200)]
xen/riscv: Update Kconfig in preparation for a full Xen build

Disable unnecessary configs for two cases:
1. By utilizing EXTRA_FIXED_RANDCONFIG for randconfig builds (GitLab CI jobs).
2. By using tiny64_defconfig for non-randconfig builds.

Only configs which lead to compilation issues were disabled.

Remove lines related to disablement of configs which don't affect
compilation:
 -# CONFIG_SCHED_CREDIT is not set
 -# CONFIG_SCHED_RTDS is not set
 -# CONFIG_SCHED_NULL is not set
 -# CONFIG_SCHED_ARINC653 is not set
 -# CONFIG_TRACEBUFFER is not set
 -# CONFIG_HYPFS is not set
 -# CONFIG_SPECULATIVE_HARDEN_ARRAY is not set

Update argo.c to include asm/p2m.h directly, rather than rely on a transitive
dependency through asm/domain.h.  Update asm/p2m.h to include xen/errno.h,
rather than rely on it having been included already.

CONFIG_XSM=n as it requires an introduction of:
* boot_module_find_by_kind()
* BOOTMOD_XSM
* struct bootmodule
* copy_from_paddr()
The mentioned things aren't introduced now.

CONFIG_BOOT_TIME_CPUPOOLS requires an introduction of cpu_physical_id() and
acpi_disabled, so it is disabled for now.

PERF_COUNTERS requires asm/perf.h and asm/perfc-defn.h, so it is
also disabled for now, as RISC-V hasn't introduced these headers yet.

LIVEPATCH isn't ready for RISC-V either, and it can be overridden by randconfig,
so to avoid compilation errors for randconfig it is disabled for now.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Fix up common/argo.c rather than inserting a transitive dependency]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agox86/hvm: allow XENMEM_machine_memory_map
Roger Pau Monne [Thu, 30 May 2024 07:53:18 +0000 (09:53 +0200)]
x86/hvm: allow XENMEM_machine_memory_map

For HVM based control domains XENMEM_machine_memory_map must be available so
that the `e820_host` xl.cfg option can be used.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agoxen/bitops: Replace find_first_set_bit() with ffs()/ffsl() - 1
Andrew Cooper [Sat, 9 Mar 2024 02:22:53 +0000 (02:22 +0000)]
xen/bitops: Replace find_first_set_bit() with ffs()/ffsl() - 1

find_first_set_bit() is a Xen-ism which has undefined behaviour with a 0
input.  ffs()/ffsl() are well defined with an input of 0, and are found outside
of Xen too.

timer_sanitize_int_route(), pt_update_contig_markers() and
set_iommu_ptes_present() are all already operating on unsigned int data, so
switch straight to ffs().

The ffsl() in pvh_populate_memory_range() needs coercion to unsigned to keep
the typecheck in min() happy in the short term.

_init_heap_pages() is comparing the LSB of two different addresses, so the -1
cancels off both sides of the expression.
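
The relationship between the two idioms, and why the -1 can be dropped when
two results are only compared with each other, can be sketched like this.  The
demo_ functions are modelled on GCC/Clang builtins and are not the Xen
implementations; the comparison helper is hypothetical.

```c
#include <assert.h>

/* 1-based index of the lowest set bit; well defined (0) for a 0 input. */
static inline unsigned int demo_ffsl(unsigned long x)
{
    return x ? (unsigned int)__builtin_ctzl(x) + 1 : 0;
}

/* Old idiom: 0-based index, undefined for x == 0 (as is __builtin_ctzl). */
static inline unsigned int demo_find_first_set_bit(unsigned long x)
{
    return (unsigned int)__builtin_ctzl(x);
}

/*
 * Comparing the lowest set bit of two non-zero addresses:
 * (ffsl(a) - 1) < (ffsl(b) - 1) is the same as ffsl(a) < ffsl(b).
 */
static inline int demo_lsb_less(unsigned long a, unsigned long b)
{
    return demo_ffsl(a) < demo_ffsl(b);
}

static inline void demo_replace_check(void)
{
    unsigned long vals[] = { 1, 2, 0x1000, 0x80000000UL };
    unsigned int i;

    for ( i = 0; i < sizeof(vals) / sizeof(vals[0]); i++ )
        assert(demo_find_first_set_bit(vals[i]) == demo_ffsl(vals[i]) - 1);

    assert(demo_ffsl(0) == 0);
    assert(demo_lsb_less(0x1000, 0x2000));
}
```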

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/page_alloc: Coerce min(flsl(), foo) expressions to being unsigned
Andrew Cooper [Fri, 24 May 2024 12:36:15 +0000 (13:36 +0100)]
xen/page_alloc: Coerce min(flsl(), foo) expressions to being unsigned

This is in order to maintain bisectability through the subsequent changes,
where flsl() changes sign-ness non-atomically by architecture.
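
To illustrate why a cast keeps such expressions building during the
transition: a type-checking min() in the usual GNU C style compares the
addresses of its two temporaries, which draws a diagnostic when the operand
types differ.  The sketch below assumes that style of min(); DEMO_MIN,
demo_flsl_signed() and demo_order() are invented for the example.

```c
#include <assert.h>
#include <limits.h>

/* Type-checking min(): mismatched operand types trigger a diagnostic. */
#define DEMO_MIN(x, y)                         \
    ({ const __typeof__(x) x_ = (x);           \
       const __typeof__(y) y_ = (y);           \
       (void)(&x_ == &y_);                     \
       x_ < y_ ? x_ : y_; })

/* Models an flsl() that still returns a signed int on some architecture. */
static inline int demo_flsl_signed(unsigned long x)
{
    return x ? (int)(sizeof(x) * CHAR_BIT) - __builtin_clzl(x) : 0;
}

/* The cast makes both operands unsigned int whatever flsl() returns. */
unsigned int demo_order(unsigned long pages, unsigned int max_order)
{
    return DEMO_MIN((unsigned int)demo_flsl_signed(pages) - 1, max_order);
}

static inline void demo_min_check(void)
{
    assert(demo_order(8, 20) == 3);
    assert(demo_order(1UL << 30, 20) == 20);
}
```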

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoppc/boot: Run constructors on boot
Andrew Cooper [Fri, 24 May 2024 10:38:37 +0000 (11:38 +0100)]
ppc/boot: Run constructors on boot

PPC collects constructors, but doesn't run them yet.  Do so.

They'll shortly be used to confirm correct behaviour of the bitops primitives.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agotools: (Actually) drop libsystemd as a dependency
Andrew Cooper [Thu, 30 May 2024 10:02:16 +0000 (11:02 +0100)]
tools: (Actually) drop libsystemd as a dependency

When reinstating some of systemd.m4 between v1 and v2, I reintroduced a little
too much.  While {c,o}xenstored are indeed no longer linked against
libsystemd, ./configure still looks for it.

Drop this too.

Fixes: ae26101f6bfc ("tools: Drop libsystemd as a dependency")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agoPartial revert of "x86/MCE: optional build of AMD/Intel MCE code"
Andrew Cooper [Wed, 29 May 2024 14:11:45 +0000 (16:11 +0200)]
Partial revert of "x86/MCE: optional build of AMD/Intel MCE code"

{cmci,lmce}_support are written during S3 resume, so cannot live in
__ro_after_init.  Move them back to being __read_mostly, as they were
originally.

Link: https://gitlab.com/xen-project/xen/-/jobs/6966698361
Fixes: 19b6e9f9149f ("x86/MCE: optional build of AMD/Intel MCE code")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/x86: remove foreign mappings from the p2m on teardown
Roger Pau Monné [Wed, 29 May 2024 14:11:19 +0000 (16:11 +0200)]
xen/x86: remove foreign mappings from the p2m on teardown

Iterate over the p2m up to the maximum recorded gfn and remove any foreign
mappings, in order to drop the underlying page references and thus not keep
extra page references if a domain is destroyed while still having foreign
mappings on its p2m.

The logic is similar to the one used on Arm.

Note that foreign mappings cannot be created by guests that have altp2m or
nested HVM enabled, as p2ms different than the host one are not currently
scrubbed when destroyed in order to drop references to any foreign maps.

It's unclear whether the right solution is to take an extra reference when
foreign maps are added to p2ms different than the host one, or just rely on the
host p2m already having a reference.  The mapping being removed from the host
p2m should cause it to be dropped on all domain p2ms.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen: enable altp2m at create domain domctl
Roger Pau Monné [Wed, 29 May 2024 14:10:04 +0000 (16:10 +0200)]
xen: enable altp2m at create domain domctl

Enabling it using an HVM param is fragile, and complicates the logic when
deciding whether options that interact with altp2m can also be enabled.

Leave the HVM param value for consumption by the guest, but prevent it from
being set.  Enabling is now done using an additional altp2m specific field in
xen_domctl_createdomain.

Note that albeit only currently implemented in x86, altp2m could be implemented
in other architectures, hence why the field is added to xen_domctl_createdomain
instead of xen_arch_domainconfig.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com> # hypervisor
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com> # tools/libs/
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/x86: account number of foreign mappings in the p2m
Roger Pau Monné [Wed, 29 May 2024 14:07:55 +0000 (16:07 +0200)]
xen/x86: account number of foreign mappings in the p2m

Such information will be needed in order to remove foreign mappings during
teardown for HVM guests.

Right now the introduced counter is not consumed.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen: Introduce CONFIG_SELF_TESTS
Andrew Cooper [Tue, 28 May 2024 14:11:54 +0000 (15:11 +0100)]
xen: Introduce CONFIG_SELF_TESTS

... and move x86's stub_selftest() under this new option.

There is value in having these tests included in release builds too.

It will shortly be used to gate the bitops unit tests on all architectures.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agox86: address violations of MISRA C Rule 8.4
Nicola Vetrini [Wed, 29 May 2024 07:57:28 +0000 (09:57 +0200)]
x86: address violations of MISRA C Rule 8.4

Rule 8.4 states: "A compatible declaration shall be visible when an
object or function with external linkage is defined."

These variables are only referenced from assembly code, so they need to
be extern and there is negligible risk of them being used improperly
without noticing.

As a result, they can be exempted using a comment-based deviation.
No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/MCE: optional build of AMD/Intel MCE code
Sergiy Kibrik [Wed, 29 May 2024 07:56:57 +0000 (09:56 +0200)]
x86/MCE: optional build of AMD/Intel MCE code

Separate Intel/AMD-specific MCE code using CONFIG_{INTEL,AMD} config options.
Now we can avoid build of mcheck code if support for specific platform is
intentionally disabled by configuration.

Also global variables lmce_support & cmci_support from Intel-specific
mce_intel.c have to be moved to common mce.c, as they get checked in common code.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/MCE: add default switch case in init_nonfatal_mce_checker()
Sergiy Kibrik [Wed, 29 May 2024 07:56:15 +0000 (09:56 +0200)]
x86/MCE: add default switch case in init_nonfatal_mce_checker()

A default switch case block is wanted here to handle, e.g., an unexpected
c->x86_vendor value: without it no mcheck init is done, but a misleading
message still gets logged anyway.
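
The shape of the change can be sketched as below.  The enum, the function and
its return convention are invented for illustration; the real function takes
no such argument and logs via printk().

```c
#include <assert.h>

enum demo_vendor { DEMO_VENDOR_INTEL, DEMO_VENDOR_AMD, DEMO_VENDOR_OTHER };

/* Returns 1 if a checker was set up (so the message is accurate), else 0. */
int demo_init_nonfatal_checker(enum demo_vendor vendor)
{
    switch ( vendor )
    {
    case DEMO_VENDOR_AMD:
        /* AMD-specific setup would go here. */
        break;

    case DEMO_VENDOR_INTEL:
        /* Intel-specific setup would go here. */
        break;

    default:
        /* Nothing was initialised: bail out before the log message. */
        return 0;
    }

    /* The "checker enabled" message would be logged here. */
    return 1;
}

static inline void demo_default_case_check(void)
{
    assert(demo_init_nonfatal_checker(DEMO_VENDOR_AMD) == 1);
    assert(demo_init_nonfatal_checker(DEMO_VENDOR_OTHER) == 0);
}
```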

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/intel: move vmce_has_lmce() routine to header
Sergiy Kibrik [Wed, 29 May 2024 07:54:22 +0000 (09:54 +0200)]
x86/intel: move vmce_has_lmce() routine to header

Moving this function out of mce_intel.c will make it possible to disable
build of Intel MCE code later on, because the function gets called from
common x86 code.

Also replace boilerplate code that checks for MCG_LMCE_P flag with
vmce_has_lmce(), which might contribute to readability a bit.
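
A header-style helper of this kind could look like the following.  struct
demo_vcpu and the demo_ names are placeholders rather than the real Xen types,
and the bit position (bit 27 of MCG_CAP) is stated here as this sketch's
assumption about MCG_LMCE_P.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the LMCE capability bit in MCG_CAP. */
#define DEMO_MCG_LMCE_P (UINT64_C(1) << 27)

/* Placeholder for the per-vCPU virtual MCE state. */
struct demo_vcpu {
    uint64_t mcg_cap;
};

/* Small enough to live in a header, so common code need not link Intel code. */
static inline bool demo_vmce_has_lmce(const struct demo_vcpu *v)
{
    return v->mcg_cap & DEMO_MCG_LMCE_P;
}

static inline void demo_lmce_check(void)
{
    struct demo_vcpu with = { .mcg_cap = DEMO_MCG_LMCE_P | 0x2 };
    struct demo_vcpu without = { .mcg_cap = 0x2 };

    assert(demo_vmce_has_lmce(&with));
    assert(!demo_vmce_has_lmce(&without));
}
```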

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/svm: Rework VMCB_ACCESSORS() to use a plain type name
Andrew Cooper [Tue, 28 May 2024 15:29:11 +0000 (16:29 +0100)]
x86/svm: Rework VMCB_ACCESSORS() to use a plain type name

This avoids having a function call in a typeof() expression.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/x86: Address two misc MISRA 17.7 violations
Andrew Cooper [Tue, 21 May 2024 15:22:08 +0000 (16:22 +0100)]
xen/x86: Address two misc MISRA 17.7 violations

Neither text_poke() nor watchdog_setup() have their return value consulted.
Switch them to being void.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/x86: Drop useless non-Kconfig CONFIG_* variables
Andrew Cooper [Tue, 21 May 2024 17:07:09 +0000 (18:07 +0100)]
xen/x86: Drop useless non-Kconfig CONFIG_* variables

These are all either completely unused, or do nothing useful.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/lzo: Implement COPY{4,8} using memcpy()
Andrew Cooper [Tue, 21 May 2024 16:08:32 +0000 (17:08 +0100)]
xen/lzo: Implement COPY{4,8} using memcpy()

This is simpler and easier for both humans and compilers to read.

It also addresses 6 instances of MISRA R5.3 violation (shadowing of the ptr_
local variable inside both {put,get}_unaligned()).
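
For illustration, memcpy()-based copy macros of this kind can be as small as
the sketch below (DEMO_ names are used because this is not quoted from the
patch).  A fixed-size memcpy() is safe for unaligned pointers, and compilers
typically lower it to a single load/store pair where the target allows.

```c
#include <assert.h>
#include <string.h>

#define DEMO_COPY4(dst, src) memcpy(dst, src, 4)
#define DEMO_COPY8(dst, src) memcpy(dst, src, 8)

/* Copies between deliberately unaligned offsets; returns 1 on success. */
int demo_copy_roundtrip(void)
{
    unsigned char in[16], out[16] = { 0 };
    unsigned int i;

    for ( i = 0; i < sizeof(in); i++ )
        in[i] = (unsigned char)(i + 1);

    DEMO_COPY4(out + 1, in + 3);
    DEMO_COPY8(out + 5, in + 7);

    return memcmp(out + 1, in + 3, 4) == 0 &&
           memcmp(out + 5, in + 7, 8) == 0;
}

static inline void demo_copy_check(void)
{
    assert(demo_copy_roundtrip() == 1);
}
```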

No change, not even in the compiled binary.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agox86/traps: address violation of MISRA C Rule 8.4
Nicola Vetrini [Tue, 28 May 2024 06:52:27 +0000 (08:52 +0200)]
x86/traps: address violation of MISRA C Rule 8.4

Rule 8.4 states: "A compatible declaration shall be visible when
an object or function with external linkage is defined".

The function do_general_protection is either used in asm code
or only within this unit, so there is no risk of this getting
out of sync with its definition, but the function must remain
extern.

Therefore, this function is deviated using a comment-based deviation.
No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoCHANGELOG: Mention libxl blktap/tapback support
Jason Andryuk [Tue, 28 May 2024 06:52:15 +0000 (08:52 +0200)]
CHANGELOG: Mention libxl blktap/tapback support

Add entry for backendtype=tap support in libxl.  blktap needs some
changes to work with libxl, which haven't been merged.  They are
available from this PR: https://github.com/xapi-project/blktap/pull/394

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoautomation/eclair_analysis: avoid an ECLAIR warning about escaping
Nicola Vetrini [Mon, 27 May 2024 14:53:17 +0000 (16:53 +0200)]
automation/eclair_analysis: avoid an ECLAIR warning about escaping

The parentheses in this regular expression should be doubly
escaped because they undergo expansion twice.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
[stefano: fix commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agodocs/misra: exclude gdbsx from MISRA compliance
Nicola Vetrini [Mon, 27 May 2024 14:53:16 +0000 (16:53 +0200)]
docs/misra: exclude gdbsx from MISRA compliance

These files are used when debugging Xen, and are not meant to comply
with MISRA rules at the moment.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoautomation/eclair_analysis: add already clean rules to the analysis
Nicola Vetrini [Tue, 21 May 2024 19:34:21 +0000 (21:34 +0200)]
automation/eclair_analysis: add already clean rules to the analysis

Some MISRA C rules already have no violations in Xen, so they can be
set as clean.

Reorder the rules in tagging.ecl according to version ordering
(i.e. sort -V) and split the configuration on multiple lines for
readability.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoautomation/eclair_analysis: set MISRA C Rule 10.2 as clean
Nicola Vetrini [Fri, 17 May 2024 10:27:10 +0000 (12:27 +0200)]
automation/eclair_analysis: set MISRA C Rule 10.2 as clean

This rule has no more violations in the codebase, so it can be
set as clean.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agodocs: Add device tree overlay documentation
Vikram Garhwal [Thu, 23 May 2024 07:40:40 +0000 (15:40 +0800)]
docs: Add device tree overlay documentation

Signed-off-by: Vikram Garhwal <fnu.vikram@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
11 months agotools: Introduce the "xl dt-overlay attach" command
Henry Wang [Thu, 23 May 2024 07:40:39 +0000 (15:40 +0800)]
tools: Introduce the "xl dt-overlay attach" command

With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach (in the future also detach) devices from the provided DT overlay
to domains. Support this by introducing a new "xl dt-overlay" command
and related documentation, i.e. "xl dt-overlay attach". Slightly rework
the command option parsing logic.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains
Henry Wang [Thu, 23 May 2024 07:40:36 +0000 (15:40 +0800)]
xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

In order to support the dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to Xen device tree, instead of assigning the device to the
hardware domain at the same time. Changing the sysctl behavior and
breaking compatibility is acceptable because this feature is
experimental.

Add the XEN_DOMCTL_dt_overlay with operations
XEN_DOMCTL_DT_OVERLAY_ATTACH to do the device assignment to the domain.

The hypervisor first checks that the DT overlay passed from the toolstack
is valid. Then the device nodes are retrieved from the overlay tracker
based on the DT overlay. The attach of the device is implemented by
mapping the IRQ and IOMMU resources. All devices in the overlay are
assigned to a single domain.

Also take the opportunity to make one coding style fix in sysctl.h.
Introduce DT_OVERLAY_MAX_SIZE and use it to avoid repetitions of
KB(500).

xen,reg is to be used to handle non-1:1 mappings but it is currently
unsupported. For now return errors for not-1:1 mapped domains.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Vikram Garhwal <fnu.vikram@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
11 months agoxen/arm/gic: Allow adding interrupt to running VMs
Henry Wang [Thu, 23 May 2024 07:40:35 +0000 (15:40 +0800)]
xen/arm/gic: Allow adding interrupt to running VMs

Currently, adding physical interrupts is only allowed at
domain creation time. For use cases such as dynamic device
tree overlay addition, adding a physical IRQ to a
running domain should be allowed.

Drop the above-mentioned domain creation check. Since this
can leave the interrupt state out of sync when the
interrupt is active or pending in the guest, simply reject
the operation in those cases. Do it for both new and old
vGIC implementations.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
11 months agotools/arm: Introduce the "nr_spis" xl config entry
Henry Wang [Thu, 23 May 2024 07:40:34 +0000 (15:40 +0800)]
tools/arm: Introduce the "nr_spis" xl config entry

Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This helps to avoid
bumping the `config->arch.nr_spis` in libxl every time there is a
new platform with increased SPI numbers.

Update the doc and the golang bindings accordingly.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
11 months agoxen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
Henry Wang [Thu, 23 May 2024 07:40:33 +0000 (15:40 +0800)]
xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify that the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enable" and "disable". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.
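
The mapping from the property string to the creation flag can be sketched
as below. This is a Python illustration, not Xen's C code: the flag value,
the function name and the behavior for an absent property are assumptions
made for the example.

```python
# Placeholder bit standing in for XEN_DOMCTL_CDF_iommu (value assumed).
CDF_IOMMU = 1 << 5

def apply_passthrough(flags, prop):
    """Return creation flags updated from a "passthrough" property value."""
    if prop is None:          # property absent: leave flags untouched (assumption)
        return flags
    if prop == "enable":
        return flags | CDF_IOMMU
    if prop == "disable":
        return flags & ~CDF_IOMMU
    # Only the two options named in the commit message are accepted here.
    raise ValueError("unsupported passthrough value: %r" % prop)
```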

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
11 months agotools/xl: Correct the help information and exit code of the dt-overlay command
Henry Wang [Thu, 23 May 2024 07:40:32 +0000 (15:40 +0800)]
tools/xl: Correct the help information and exit code of the dt-overlay command

Fix the name mismatch in the xl dt-overlay command: the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.

Fix the exit code of the dt-overlay command, use EXIT_FAILURE
instead of ERROR_FAIL.
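
Why the distinction matters can be shown with a little arithmetic. The
sketch assumes that libxl's ERROR_FAIL is -3 and that the parent process
only sees the low 8 bits of the exit status, as on POSIX systems; the
helper name is made up for the example.

```python
ERROR_FAIL = -3     # libxl error code (value assumed)
EXIT_FAILURE = 1    # conventional C failure status

def status_seen_by_parent(code):
    # A POSIX-style wait status keeps only the low 8 bits of the exit code.
    return code & 0xFF

print(status_seen_by_parent(ERROR_FAIL))    # negative code wraps around
print(status_seen_by_parent(EXIT_FAILURE))
```

Under these assumptions a negative library error code reaches the shell as
a large, misleading status, whereas EXIT_FAILURE arrives unchanged.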

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD <anthony@xenproject.org>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agotools/xenalyze: Ignore HVM_EMUL events harder
George Dunlap [Fri, 26 Apr 2024 13:17:33 +0000 (14:17 +0100)]
tools/xenalyze: Ignore HVM_EMUL events harder

To unify certain common sanity checks, checks are done very early in
processing based only on the top-level type.

Unfortunately, when TRC_HVM_EMUL was introduced, it broke some of the
assumptions about how the top-level types worked.  Namely, traces of
this type will show up outside of HVM contexts: in idle domains and in
PV domains.

Make an explicit exception for TRC_HVM_EMUL types in a number of places:

 - Pass the record info pointer to toplevel_assert_check, so that it
   can exclude TRC_HVM_EMUL records from idle and vcpu data_mode
   checks

 - Don't attempt to set the vcpu data_type in hvm_process for
   TRC_HVM_EMUL records.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agox86/hvm/trace: Use a different trace type for AMD processors
George Dunlap [Thu, 25 Apr 2024 12:03:58 +0000 (13:03 +0100)]
x86/hvm/trace: Use a different trace type for AMD processors

A long-standing usability sub-optimality with xenalyze is the
necessity to specify `--svm-mode` when analyzing AMD processors.  This
fundamentally comes about because the same trace event ID is used for
both VMX and SVM, but the contents of the trace must be interpreted
differently.

Instead, allocate separate trace events for VMX and SVM vmexits in
Xen; this will allow all readers to properly interpret the meaning of
the vmexit reason.

In xenalyze, first remove the redundant call to init_hvm_data();
there's no way to get to hvm_vmexit_process() without it being already
initialized by the set_vcpu_type call in hvm_process().

Replace this with set_hvm_exit_reason_data(), and move setting of
hvm->exit_reason_* into that function.

Modify hvm_process and hvm_vmexit_process to handle all four potential
values appropriately.

If SVM entries are encountered, set opt.svm_mode so that other
SVM-specific functionality is triggered.

Remove the `--svm-mode` command-line option, since it's now redundant.
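
A rough sketch of the idea, in Python rather than xenalyze's C: once VMX
and SVM vmexits carry distinct event IDs, a reader can infer the vendor
from the record itself. The numeric IDs are hypothetical, and the sketch
assumes the four values are the VMX and SVM events in their 32-bit and
64-bit record variants.

```python
TRC_64_FLAG = 0x100   # assumed marker for the 64-bit record variant
EV_VMX_EXIT = 0x02    # hypothetical event ID
EV_SVM_EXIT = 0x03    # hypothetical event ID

def classify_vmexit(event):
    """Return (vendor, is_64bit) for a vmexit event ID."""
    base = event & ~TRC_64_FLAG
    if base == EV_VMX_EXIT:
        vendor = "vmx"
    elif base == EV_SVM_EXIT:
        vendor = "svm"
    else:
        raise ValueError("not a vmexit event: %#x" % event)
    return vendor, bool(event & TRC_64_FLAG)
```

A reader that sees an "svm" result could then switch on its SVM-specific
handling without needing a command-line option.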

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agoxen/arm: Set correct per-cpu cpu_core_mask
Henry Wang [Thu, 21 Mar 2024 03:57:06 +0000 (11:57 +0800)]
xen/arm: Set correct per-cpu cpu_core_mask

In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.

cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm does not support physical
CPU hotplug or NUMA, and we assume there is no multithreading. Therefore
cores_per_socket means all possible CPUs detected from the device
tree. Set the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.
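
The effect on the reported value can be sketched as follows, modelling a
CPU mask as an integer bitmap. This is an illustration in Python with
made-up helper names, not the Xen implementation.

```python
def cores_per_socket(cpu0_core_mask):
    # The reported value is the number of CPUs set in CPU0's core mask.
    return bin(cpu0_core_mask).count("1")

def core_mask_before(cpu):
    # Old behavior: each CPU only sets its own bit.
    return 1 << cpu

def core_mask_after(possible_cpus):
    # New behavior: every possible CPU is part of the mask.
    mask = 0
    for cpu in possible_cpus:
        mask |= 1 << cpu
    return mask

print(cores_per_socket(core_mask_before(0)))
print(cores_per_socket(core_mask_after(range(4))))
```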

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
11 months agotools/xentrace: Remove xentrace_format
George Dunlap [Fri, 26 Apr 2024 14:18:25 +0000 (15:18 +0100)]
tools/xentrace: Remove xentrace_format

xentrace_format was always of limited utility, since trace records
across pcpus were processed out of order; it was superseded by xenalyze
over a decade ago.

But for several releases, the `formats` file it has depended on for
proper operation has not even been included in `make install` (which
generally means it doesn't get picked up by distros either); yet
nobody has seemed to complain.

Simply remove xentrace_format, and point people to xenalyze instead.

NB that there is no man page for xenalyze, so the "see also" on the
xentrace man page is simply removed for now.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
11 months agotools: Drop libsystemd as a dependency
Andrew Cooper [Thu, 25 Apr 2024 09:46:40 +0000 (10:46 +0100)]
tools: Drop libsystemd as a dependency

There are no more users, and we want to dissuade people from introducing new
users just for sd_notify() and friends.  Drop the dependency.

We still want the overall --with{,out}-systemd to gate the generation of the
service/unit/mount/etc files.

Rerun autogen.sh, and mark the dependency as removed in the build containers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
11 months agotools/{c,o}xenstored: Don't link against libsystemd
Andrew Cooper [Thu, 25 Apr 2024 09:26:58 +0000 (10:26 +0100)]
tools/{c,o}xenstored: Don't link against libsystemd

Use the local freestanding wrapper instead.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
11 months agotools: Import stand-alone sd_notify() implementation from systemd
Andrew Cooper [Thu, 16 May 2024 17:59:00 +0000 (18:59 +0100)]
tools: Import stand-alone sd_notify() implementation from systemd

... in order to avoid linking against the whole of libsystemd.

Only minimal changes to the upstream copy, to function as a drop-in
replacement for sd_notify() and as a header-only library.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
11 months agoLICENSES: Add MIT-0 (MIT No Attribution)
Andrew Cooper [Thu, 16 May 2024 17:50:26 +0000 (18:50 +0100)]
LICENSES: Add MIT-0 (MIT No Attribution)

We are about to import code licensed under MIT-0.  It's compatible for us to
use, so identify it as a permitted license.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
11 months agoxen/arm: mem_access: Conditionally compile mem_access.c
Alessandro Zucchelli [Fri, 10 May 2024 12:32:11 +0000 (14:32 +0200)]
xen/arm: mem_access: Conditionally compile mem_access.c

Commit 634cfc8beb ("Make MEM_ACCESS configurable") intended to make
MEM_ACCESS configurable on Arm to reduce the code size when the user
doesn't need it.

However, this didn't cover the arch specific code. None of the code
in arm/mem_access.c is necessary when MEM_ACCESS=n, so it can be
compiled out. This requires providing stubs for the functions
called by the common code.

Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agovpci: add initial support for virtual PCI bus topology
Oleksandr Andrushchenko [Thu, 23 May 2024 08:18:47 +0000 (10:18 +0200)]
vpci: add initial support for virtual PCI bus topology

Assign SBDF to the PCI devices being passed through with bus 0.
In the resulting topology, PCIe devices reside on bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to the 32 devices that are allowed on
a single PCI bus.

Please note that at the moment only function 0 of a multifunction
device can be passed through.
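
The allocation described above can be sketched as picking the lowest free
device slot on bus 0 and composing an SBDF from it. This is a Python
illustration under assumptions: the helper names and the bitmap
representation are invented for the example, while the SBDF layout
(segment, bus, 5-bit device, 3-bit function) is the conventional one.

```python
MAX_DEVS_ON_BUS = 32  # a PCI bus has 32 device slots

def make_sbdf(seg, bus, dev, fn):
    # Conventional packing: seg[31:16] bus[15:8] dev[7:3] fn[2:0]
    return (seg << 16) | (bus << 8) | (dev << 3) | fn

def assign_guest_sbdf(used_slots):
    """Return (sbdf, new_used_slots); used_slots is a 32-bit slot bitmap."""
    for dev in range(MAX_DEVS_ON_BUS):
        if not used_slots & (1 << dev):
            # Segment 0, bus 0, function 0 only, as stated in the message.
            return make_sbdf(0, 0, dev, 0), used_slots | (1 << dev)
    raise RuntimeError("no free device slot on virtual bus 0")
```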

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agovpci/header: emulate PCI_COMMAND register for guests
Oleksandr Andrushchenko [Thu, 23 May 2024 08:18:04 +0000 (10:18 +0200)]
vpci/header: emulate PCI_COMMAND register for guests

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.

Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.

PCI_COMMAND_IO (bit 0)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if an I/O BAR is exposed to the guest.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
    don't yet support I/O BARs for domUs.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_MEMORY (bit 1)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if a Memory BAR is exposed to the guest.
  Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
    regions.
  Xen domU: For devices assigned to DomUs, memory decoding will be
    disabled at the time of initialization.

PCI_COMMAND_MASTER (bit 2)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_SPECIAL (bit 3)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_INVALIDATE (bit 4)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_VGA_PALETTE (bit 5)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_PARITY (bit 6)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_WAIT (bit 7)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: hardwire to 0
  QEMU: res_mask
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_SERR (bit 8)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_FAST_BACK (bit 9)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_INTX_DISABLE (bit 10)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU checks if INTx was mapped
    for a device. If it is not, then guest can't control
    PCI_COMMAND_INTX_DISABLE bit.
  Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
  Xen dom0: We allow dom0 to control this bit freely.

Bits 11-15
  PCIe 6.1: RsvdP
  PCI LB 3.0: Reserved
  QEMU: res_mask
  Xen domU: rsvdp_mask
  Xen dom0: We allow dom0 to control these bits freely.
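
The domU rows of the list above can be condensed into a single mask. The
sketch below is a Python illustration, not the vPCI code: it assumes that
RsvdP means a guest write leaves the hardware bit unchanged, and it models
the INTx rule by forcing INTX_DISABLE while MSI(-X) is enabled. The
function name and the `msi_enabled` parameter are made up for the example.

```python
PCI_COMMAND_IO           = 1 << 0
PCI_COMMAND_PARITY       = 1 << 6
PCI_COMMAND_WAIT         = 1 << 7
PCI_COMMAND_SERR         = 1 << 8
PCI_COMMAND_FAST_BACK    = 1 << 9
PCI_COMMAND_INTX_DISABLE = 1 << 10
RESERVED_11_15           = 0xF800

# Bits the list marks as rsvdp_mask for domU.
DOMU_RSVDP_MASK = (PCI_COMMAND_IO | PCI_COMMAND_PARITY | PCI_COMMAND_WAIT |
                   PCI_COMMAND_SERR | PCI_COMMAND_FAST_BACK | RESERVED_11_15)

def domu_command_write(hw_value, guest_value, msi_enabled=False):
    """Compute the value to write to hardware for a domU write."""
    # Keep the host's RsvdP bits, take the remaining bits from the guest.
    new = (hw_value & DOMU_RSVDP_MASK) | (guest_value & ~DOMU_RSVDP_MASK & 0xFFFF)
    if msi_enabled:
        # The guest may not enable INTx while MSI(-X) is enabled.
        new |= PCI_COMMAND_INTX_DISABLE
    return new
```

For example, if the host has set SERR and the guest writes MEMORY and
MASTER, the hardware value keeps SERR and gains the two guest bits.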

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agoarm/vpci: honor access size when returning an error
Volodymyr Babchuk [Thu, 23 May 2024 08:17:30 +0000 (10:17 +0200)]
arm/vpci: honor access size when returning an error

A guest can try to read config space using different access sizes: 8,
16, 32, 64 bits. We need to take this into account when we are
returning an error back to the MMIO handler, otherwise it is possible to
provide more data than requested: e.g. the guest issues an LDRB instruction
to read one byte, but we write 0xFFFFFFFFFFFFFFFF into the target
register.
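
The fix amounts to truncating the all-ones error value to the width of
the access. A minimal sketch in Python, with a made-up helper name and
the access size given in bytes (an assumption about how the size is
expressed):

```python
def all_ones_for_access(size_bytes):
    """Return the all-ones error value limited to the access width."""
    if size_bytes not in (1, 2, 4, 8):
        raise ValueError("unsupported access size: %r" % size_bytes)
    return (1 << (8 * size_bytes)) - 1

print(hex(all_ones_for_access(1)))  # one-byte read, as with LDRB
```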

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>