Henry Wang [Mon, 28 Aug 2023 01:32:13 +0000 (09:32 +0800)]
xen/arm: Introduce CONFIG_MMU Kconfig option
There are two types of memory system architectures available for
Arm-based systems, namely the Virtual Memory System Architecture (VMSA)
and the Protected Memory System Architecture (PMSA). According to
ARM DDI 0487G.a, a VMSA provides a Memory Management Unit (MMU) that
controls address translation, access permissions, and memory attribute
determination and checking, for memory accesses made by the PE. According
to ARM DDI 0600A.c, the PMSA supports a unified memory protection
scheme where a Memory Protection Unit (MPU) manages instruction and
data access. Currently, Xen only supports VMSA.
Introduce a Kconfig option CONFIG_MMU, which currently defaults to y
and is not user-selectable because only VMSA is supported at present.
CONFIG_MMU will be used in follow-up patches.
Suggested-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Henry Wang <Henry.Wang@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
At the moment, on an MMU system, enable_mmu() returns to an
address in the 1:1 mapping, and each path is then responsible for
switching to the virtual runtime mapping. remove_identity_mapping()
is then called on the boot CPU to remove all 1:1 mappings.
remove_identity_mapping() is not necessary on a non-MMU system, and
we want to avoid creating an empty function for non-MMU systems while
keeping a single code flow in arm64/head.S. So move the path switch and
remove_identity_mapping() into enable_mmu() on MMU systems.
As remove_identity_mapping() should be called only for the boot
CPU, introduce enable_boot_cpu_mm() for the boot CPU and
enable_secondary_cpu_mm() for secondary CPUs in this patch.
xen/arm: ioreq: add header for 'handle_ioserv' and 'try_fwd_ioserv'
The functions referenced by this patch should have had a compatible
declaration visible prior to their definition. This is achieved by
including the arch-specific header in 'xen/arch/arm/ioreq.c'
Fixes: cb9953d2f2bc ("arm/ioreq: Introduce arch specific bits for IOREQ/DM features") Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Since QEMU's PowerNV support has matured to the point where it is
now suitable for development, drop support for booting on the
paravirtualized pseries machine type and its associated interfaces.
Support for booting on pseries was broken by 74b725a64d80 ('xen/ppc:
Implement initial Radix MMU support'), and since there is little
practical value in continuing to support pseries as a target, just drop
support for it entirely.
automation: Switch ppc64le tests to PowerNV machine type
Run ppc64le tests with the PowerNV machine type (bare metal) instead of
the paravirtualized pseries machine. This requires a more modern version
of QEMU than is present in debian bullseye's repository, so update the
dockerfile to build QEMU from source.
Support for booting on pseries was broken by 74b725a64d80 ('xen/ppc:
Implement initial Radix MMU support') which resulted in CI failures. In
preparation for removing pseries support entirely, switch the CI
infrastructure to the PowerNV machine type.
xen: apply deviation for Rule 8.4 (asm-only definitions)
As stated in 'docs/misra/rules.rst' the functions that are used only by
asm modules do not need to conform to MISRA C:2012 Rule 8.4.
The deviations are carried out with a SAF comment.
Jan Beulich [Thu, 7 Sep 2023 07:22:40 +0000 (09:22 +0200)]
Arm: constrain {,u}int64_aligned_t in public header
For using a GNU extension, it may not be exposed in general, just like
is done on x86 (except that here we need to also work around not all of
the tool stack actually defining __XEN_TOOLS__). External consumers (not
using gcc or a compatible compiler) need to make this type available up
front (just like we expect {,u}int<N>_t to be supplied) - unlike on x86
the type is actually needed outside of tools-only interfaces, because
guest handle definitions use it.
While there also add underscores around "aligned".
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
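As a rough illustration of the commit above, the sketch below shows how a 64-bit type with forced 8-byte alignment might be declared behind a compiler check. The name ex_uint64_aligned_t and the exact guard condition are assumptions made for this example; they are not copied from the Xen public headers.

```c
#include <stdint.h>

/*
 * Illustrative only: the attribute is a GNU extension, so it is
 * exposed only when a GNU-compatible compiler is in use. Other
 * consumers would have to provide the type themselves up front.
 */
#if defined(__GNUC__)
typedef uint64_t __attribute__((__aligned__(8))) ex_uint64_aligned_t;
#endif
```

The double underscores around "aligned" avoid clashes with a user macro named aligned, which matches the last sentence of the commit message.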
Michal Orzel [Wed, 6 Sep 2023 10:30:14 +0000 (12:30 +0200)]
xen/arm: Fix printk specifiers and arguments in iomem_remove_cb()
When building Xen for arm32 with CONFIG_DTB_OVERLAY, the following
error is printed:
common/dt-overlay.c: In function ‘iomem_remove_cb’:
././include/xen/config.h:55:24: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=]
Function parameters s and e (denoting MMIO region) are of type unsigned
long and indicate frame numbers and not addresses. This also means that
the arguments passed to printk() are incorrect (using PAGE_ALIGN() or
PAGE_MASK ANDed with a frame number results in unwanted output). Fix it.
Take the opportunity to switch to %pd specifier to print domain id in
a consolidated way.
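The format mismatch described above can be sketched in plain C. This example uses snprintf() as a stand-in for Xen's printk(); the helper names and the 4k page size are assumptions for illustration only.

```c
#include <stdio.h>

#define EX_PAGE_SHIFT 12 /* assumption: 4k pages */

/* Frame numbers are unsigned long, so they pair with %lx, not %llx. */
static int ex_format_frames(char *buf, size_t len,
                            unsigned long s, unsigned long e)
{
    return snprintf(buf, len, "mfn %#lx - %#lx", s, e);
}

/* Converting a frame number to an address is a shift, not a mask. */
static unsigned long long ex_frame_to_addr(unsigned long frame)
{
    return (unsigned long long)frame << EX_PAGE_SHIFT;
}
```

Applying PAGE_MASK to a frame number would clear its low bits and print a misleading value, which is why the commit drops those operations.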
FFA_RXTX_MAP is currently limited to mapping only one 4k page for each
RX and TX buffer. If a guest tries to map more than one page, an error
is returned. Until this patch, we have been using FFA_RET_NOT_SUPPORTED.
However, that error code is reserved in the FF-A specification to report
that the function is not implemented. Of all the other defined error
codes, the least bad is FFA_RET_INVALID_PARAMETERS, so use that instead.
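A minimal sketch of the error-code choice described above. The numeric values follow the FF-A status code definitions; the helper name and its signature are hypothetical and not taken from the Xen sources.

```c
#include <stdint.h>

/* Status codes as defined by the FF-A specification. */
#define FFA_RET_OK                  0
#define FFA_RET_NOT_SUPPORTED      (-1)
#define FFA_RET_INVALID_PARAMETERS (-2)

/* Hypothetical check of the page count passed to FFA_RXTX_MAP. */
static int32_t ex_check_rxtx_page_count(uint32_t page_count)
{
    /* Only one 4k page per RX/TX buffer is supported for now. */
    if ( page_count != 1 )
        return FFA_RET_INVALID_PARAMETERS;

    return FFA_RET_OK;
}
```

FFA_RET_NOT_SUPPORTED stays reserved for the case where the function itself is not implemented.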
Michal Orzel [Wed, 6 Sep 2023 12:56:09 +0000 (14:56 +0200)]
tools/xl: Guard main_dt_overlay() with LIBXL_HAVE_DT_OVERLAY
main_dt_overlay() makes a call to libxl_dt_overlay() which is for now
only compiled for Arm. This causes the build failure reported by
gitlab CI and OSSTEST. Fix it by guarding the function, prototype and
entry in cmd_table[] using LIBXL_HAVE_DT_OVERLAY. This has an advantage
over a regular Arm guard: the code will not need to be modified again
if other architectures gain support for this feature.
Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Michal Orzel [Wed, 6 Sep 2023 12:54:48 +0000 (14:54 +0200)]
xen: Change parameter of generic_{fls,ffs}() to unsigned int
When running with SMMUv3 and UBSAN enabled on arm64, there are a lot of
warnings printed related to shifting into the sign bit in generic_fls(),
as it takes a parameter of type int.
Example:
(XEN) UBSAN: Undefined behaviour in ./include/xen/bitops.h:69:11
(XEN) left shift of 134217728 by 4 places cannot be represented in type 'int'
It does not make a lot of sense to ask for the last set bit of a negative
value. We don't have a direct user of this helper and all the wrappers
pass values of type unsigned {int,long}.
Linux did the same as part of commit: 3fc2579e6f16 ("fls: change parameter to unsigned int")
To keep consistency between the helpers, take the opportunity to:
- replace __inline__ with inline,
- modify generic_ffs() to take parameter of type unsigned int as well
(currently no user and the only wrapper generic_ffsl() passes unsigned
long).
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com>
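For the generic_fls() change above, the following sketch shows the common binary-search form of "find last set" with an unsigned parameter. It is modelled on the widely used pattern and named ex_generic_fls to make clear it is an illustration, not the exact Xen code.

```c
/*
 * Returns the 1-based position of the most significant set bit,
 * or 0 if x is 0. With an unsigned parameter, shifting a bit into
 * position 31 is well defined, so UBSAN has nothing to report.
 */
static inline int ex_generic_fls(unsigned int x)
{
    int r = 32;

    if ( !x )
        return 0;
    if ( !(x & 0xffff0000U) ) { x <<= 16; r -= 16; }
    if ( !(x & 0xff000000U) ) { x <<= 8;  r -= 8;  }
    if ( !(x & 0xf0000000U) ) { x <<= 4;  r -= 4;  }
    if ( !(x & 0xc0000000U) ) { x <<= 2;  r -= 2;  }
    if ( !(x & 0x80000000U) ) { x <<= 1;  r -= 1;  }

    return r;
}
```

The value from the UBSAN report, 134217728 (bit 27 set), is shifted left by 4 in the third step. With a signed int that moves a bit into the sign position, which is the undefined behaviour being flagged.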
xen/arm: Implement device tree node addition functionalities
Update sysctl XEN_SYSCTL_dt_overlay to support adding dtbo nodes
using device tree overlay.
xl dt-overlay add file.dtbo:
Each time overlay nodes are added using .dtbo, a new fdt (memcpy of
device_tree_flattened) is created and updated with overlay nodes. This
updated fdt is further unflattened to a dt_host_new. Next, it checks if any
of the overlay nodes already exist in the dt_host. If the overlay nodes don't
exist, then find the overlay nodes in dt_host_new, find the overlay node's
parent in dt_host and add the nodes as children under their parent in the
dt_host. The node is attached as the last node under the target parent.
Finally, add IRQs, add device to IOMMUs, set permissions and map MMIO for the
overlay node.
When a node is added using overlay, a new entry is allocated in the
overlay_track to keep track of the memory allocated due to the addition of the
overlay node. This is helpful for freeing the allocated memory when a device
tree node is removed.
The main purpose of this is to address the first part of dynamic programming,
i.e. making Xen aware of a new device tree node, which means updating the
dt_host with overlay node information. Here we are adding/removing nodes from
dt_host, and checking/setting IOMMU and IRQ permissions, but never mapping them
to any domain. Right now, mapping/unmapping will happen only when a new domU is
created/destroyed using "xl create".
xen/arm: Implement device tree node removal functionalities
Introduce sysctl XEN_SYSCTL_dt_overlay to remove device-tree nodes added using
device tree overlay.
xl dt-overlay remove file.dtbo:
Removes all the nodes in a given dtbo.
First, it removes IRQ permissions and MMIO accesses. Next, it finds the nodes
in dt_host and deletes the device node entries from dt_host.
A node gets removed only if it is not used by any of dom0 or domio.
Also, add the overlay_track struct to keep track of nodes added through device
tree overlay. overlay_track has dt_host_new, which is the unflattened form of the
updated fdt, and the name of the overlay nodes. When a node is removed, we also
free the memory used by overlay_track for the particular overlay node.
Nested overlay removal is supported in a sequential manner only, i.e. if
overlay_child nests under overlay_parent, it is assumed that the user first removes
overlay_child and then removes overlay_parent.
Also, this is an experimental feature, so the user is expected to make sure
correct device tree overlays are used when adding nodes and that devices
are not being used by another domain before removing them from the Xen tree.
A partial addition or removal, i.e. a failure while removing the overlay, may
cause other failures and might require a system reboot.
arm/asm/setup.h: Update struct map_range_data to add rangeset.
Add rangesets for IRQs and IOMEMs. This was done to accommodate dynamic overlay
node addition/removal operations. With overlay operations, new IRQs and IOMEMs
are added in dt_host and routed. While removing overlay nodes, nodes are removed
from dt_host and their IRQs and IOMEMs routing is also removed. Storing IRQs and
IOMEMs in the rangeset will avoid re-parsing the device tree nodes to get the
IOMEM and IRQ ranges for overlay remove ops.
Dynamic overlay node add/remove will be introduced in follow-up patches.
Dynamic programming ops will modify the dt_host and there might be other
functions which are browsing the dt_host at the same time. To avoid race
conditions, add a rwlock for browsing the dt_host during runtime. The dt_host
writer will be added in the follow-up patch for device tree overlay
functionalities.
Reason behind adding rwlock instead of spinlock:
For now, dynamic programming is the sole modifier of dt_host in Xen during
run time. All other access functions like iommu_release_dt_device() are
just reading the dt_host during run-time. So, there is a need to protect
others from browsing the dt_host while dynamic programming is modifying
it. A rwlock is better suited for this task, as a spinlock cannot
differentiate between read and write access.
asm/smp.h: Fix circular dependency for device_tree.h and rwlock.h
Dynamic programming ops will modify the dt_host and there might be other
functions which are browsing the dt_host at the same time. To avoid race
conditions, we will need to add a rwlock to protect access to the dt_host.
However, adding a rwlock in device_tree.h causes the following circular dependency:
device_tree.h->rwlock.h->smp.h->asm/smp.h->device_tree.h
To fix this, remove the "#include <xen/device_tree.h>" and forward declare
"struct dt_device_node".
Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Julien Grall <jgrall@amazon.com>
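The forward-declaration technique used to break the include cycle above can be sketched as follows. The function ex_node_is_present() is hypothetical; the point is only that code handling the structure purely by pointer does not need the full definition from device_tree.h.

```c
#include <stddef.h>

/* Forward declaration: the type stays incomplete in this header. */
struct dt_device_node;

/*
 * Hypothetical helper. Pointers to an incomplete type can be passed
 * around and compared, so no #include of the defining header is needed.
 */
static int ex_node_is_present(const struct dt_device_node *node)
{
    return node != NULL;
}
```

Only code that dereferences the pointer or needs the structure size has to include the defining header.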
xen/smmu: Add remove_device callback for smmu_iommu ops
Add remove_device callback for removing the device entry from smmu-master using
following steps:
1. Find if SMMU master exists for the device node.
2. Check if device is currently in use.
3. Remove the SMMU master.
xen/iommu: protect iommu_add_dt_device() with dtdevs_lock
Protect iommu_add_dt_device() with dtdevs_lock to prevent concurrent access
to add/remove/assign/deassign.
With the addition of the dynamic programming feature (follow-up patches in this series),
this function can be concurrently accessed by dynamic node add/remove using
device tree overlays.
xen/iommu: Move spin_lock from iommu_dt_device_is_assigned to caller
Rename iommu_dt_device_is_assigned() to iommu_dt_device_is_assigned_locked().
The spin_lock is moved to the caller to prevent concurrent access to
iommu_dt_device_is_assigned() while doing add/remove/assign/deassign. Follow-up
patches in this series introduce the node add/remove feature.
xen/device-tree: Add dt_find_node_by_path_from() to find nodes in device tree
Add dt_find_node_by_path_from() to find a matching node with path for a
dt_device_node.
Reason behind this function:
Each time overlay nodes are added using .dtbo, a new fdt (memcpy of
device_tree_flattened) is created and updated with overlay nodes. This
updated fdt is further unflattened to a dt_host_new. Next, we need to find
the overlay nodes in dt_host_new, find the overlay node's parent in dt_host
and add the nodes as child under their parent in the dt_host. Thus we need
this function to search for node in different unflattened device trees.
Rename overlay_get_target() to fdt_overlay_target_offset() and remove static
function type.
This is done to get the target path for the overlay nodes, which is very useful
in many cases. For example, the Xen hypervisor needs it when applying overlays
because Xen needs to do further processing of the overlay nodes, e.g. mapping of
resources (IRQs and IOMMUs) to other VMs, creation of SMMU pagetables, etc.
Following changes are done to __unflatten_device_tree():
1. __unflatten_device_tree() is renamed to unflatten_device_tree().
2. Remove __init and static function type.
The changes are done to make this function usable for dynamic node programming,
where new device tree overlay nodes are added to the fdt and further unflattened
to update the Xen device tree during runtime.
Remove __init from the following functions so they can be accessed at runtime:
1. map_irq_to_domain()
2. handle_device_interrupts()
3. map_range_to_domain()
4. unflatten_dt_node()
5. handle_device()
6. map_device_children()
7. map_dt_irq_to_domain()
Move map_irq_to_domain() prototype from domain_build.h to setup.h.
The above changes will cause a build error, as non-init functions are still
in the domain_build.c file. So, to avoid build failures, the following changes are done:
1. Move map_irq_to_domain(), handle_device_interrupts(), map_range_to_domain(),
handle_device(), map_device_children() and map_dt_irq_to_domain()
to device.c. After removing __init type, these functions are not specific
to domain building, so moving them out of domain_build.c to device.c.
2. Remove static type from handle_device_interrupts().
Also, renamed handle_device_interrupts() to map_device_irqs_to_domain().
Overall, these changes are done to support the dynamic programming of nodes,
where an overlay node will be added to the fdt and the unflattened node will be
added to dt_host. Furthermore, IRQ and MMIO mapping will be done for the added node.
This will be useful in dynamic node programming when new dt nodes are unflattened
during runtime. Invalid device tree node related errors should be propagated
back to the caller.
common/device_tree: handle memory allocation failure in __unflatten_device_tree()
Change __unflatten_device_tree() return type to integer so it can propagate
memory allocation failure. Add panic() in dt_unflatten_host_device_tree() for
memory allocation failure during boot.
Fixes: fb97eb614acf ("xen/arm: Create a hierarchical device tree") Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Julien Grall <jgrall@amazon.com>
xen/arm: page: Handle cache flush of an element at the top of the address space
The region that needs to be cleaned/invalidated may be at the top
of the address space. This means that 'end' (i.e. 'p + size') will
be 0 and therefore nothing will be cleaned/invalidated as the check
in the loop will always be false.
On Arm64, we only support up to a 48-bit virtual
address space, so this is not a concern there. However, for 32-bit,
the mapcache is using the last 2GB of the address space. Therefore
we may not clean/invalidate properly some pages. This could lead
to memory corruption or data leakage (the scrubbed value may
still sit in the cache when the guest could read directly the memory
and therefore read the old content).
Rework invalidate_dcache_va_range(), clean_dcache_va_range(),
clean_and_invalidate_dcache_va_range() to handle a cache flush
with an element at the top of the address space.
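The wrap-around described above can be reproduced with 32-bit arithmetic. The sketch below is not the Xen implementation: it only counts the cache lines each loop shape would visit, assumes a 64-byte line size, and uses hypothetical function names.

```c
#include <stdint.h>

#define EX_LINE_SIZE 64u /* assumption: 64-byte cache lines */

/* Loop bounded by 'end': visits nothing when p + size wraps to 0. */
static unsigned int ex_lines_naive(uint32_t p, uint32_t size)
{
    uint32_t end = p + size; /* wraps at the top of the address space */
    unsigned int n = 0;

    for ( ; p < end; p += EX_LINE_SIZE )
        n++;

    return n;
}

/* Loop bounded by the remaining byte count: immune to the wrap. */
static unsigned int ex_lines_by_count(uint32_t p, uint32_t size,
                                      uint32_t *last)
{
    unsigned int n = 0;

    for ( ; size >= EX_LINE_SIZE; p += EX_LINE_SIZE, size -= EX_LINE_SIZE )
    {
        *last = p; /* a real implementation would issue the cache op here */
        n++;
    }

    return n;
}
```

For a 128-byte region starting at 0xFFFFFF80, the first loop sees end == 0 and does no work, while the second still visits both lines.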
Jan Beulich [Fri, 1 Sep 2023 07:17:41 +0000 (09:17 +0200)]
MAINTAINERS: consolidate vm-event/monitor entry
If the F: description is to be trusted, the two xen/arch/x86/hvm/
lines were fully redundant with the earlier wildcard ones. Arch header
files, otoh, were no longer covered by anything as of the move from
include/asm-*/ to arch/*/include/asm/. Further also generalize (by
folding) the x86- and Arm-specific mem_access.c entries.
Finally, again assuming the F: description can be trusted, there's no
point listing arch/, common/, and include/ entries separately. Fold
them all.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Nicola Vetrini [Fri, 25 Aug 2023 14:00:23 +0000 (16:00 +0200)]
arm64/vfp: address MISRA C:2012 Dir 4.3
Directive 4.3 prescribes the following:
"Assembly language shall be encapsulated and isolated",
on the grounds of improved readability and ease of maintenance.
A static inline function is the chosen encapsulation mechanism.
Simone Ballarin [Mon, 28 Aug 2023 13:20:08 +0000 (15:20 +0200)]
xen/sched: address violations of MISRA C:2012 Directive 4.10
Add inclusion guards to address violations of
MISRA C:2012 Directive 4.10 ("Precautions shall be taken in order
to prevent the contents of a header file being included more than
once").
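An inclusion guard of the kind Directive 4.10 asks for looks roughly like this. The macro and structure names are made up for the example and do not correspond to an actual Xen header.

```c
#ifndef EX_SCHED_PRIVATE_H
#define EX_SCHED_PRIVATE_H

/* Contents are seen once per translation unit, however often included. */
struct ex_sched_unit {
    unsigned int id;
};

#endif /* EX_SCHED_PRIVATE_H */
```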
These were left over after a previous pci_sbdf_t conversion.
Fixes: 0c38c61aad21 ("pci: switch pci_conf_write32 to use pci_sbdf_t") Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Wed, 30 Aug 2023 08:03:53 +0000 (10:03 +0200)]
x86/irq: fix reporting of spurious i8259 interrupts
The return value of bogus_8259A_irq() is wrong: the function will
return `true` when the IRQ is real and `false` when it's a spurious
IRQ. This causes the "No irq handler for vector ..." message in
do_IRQ() to be printed for spurious i8259 interrupts which is not
intended (and not helpful).
Fix by inverting the return value of bogus_8259A_irq().
Fixes: 132906348a14 ('x86/i8259: Handle bogus spurious interrupts more quietly') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ross Lagerwall [Thu, 24 Aug 2023 09:02:58 +0000 (11:02 +0200)]
xen/console: Set the default log level to INFO for release builds
Not displaying INFO messages by default on release builds is not
helpful, as messages of that level are relevant for hypervisor
operation. For example messages related to livepatches applied and
reverted are of INFO level.
Custom builds that require less verbose output can adjust it using the
command line, but attempt to provide all relevant information by
default on release builds.
Adjust the loglevel of printks that don't have an associated level to
INFO instead of WARNING, since INFO will now be printed by default on
all builds.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Simon Gaiser [Mon, 7 Aug 2023 09:38:25 +0000 (11:38 +0200)]
x86/ACPI: Ignore entries with invalid APIC IDs when parsing MADT
It seems some firmware puts dummy entries in the ACPI MADT table for
non-existent processors. On my NUC11TNHi5 those have the invalid APIC ID
0xff. Linux already has code to handle those cases both in
acpi_parse_lapic [1] as well as in acpi_parse_x2apic [2]. So add the
same check to Xen.
xen/vpci: address violations of MISRA C:2012 Rule 7.2
The xen sources contain violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".
Add the 'U' suffix to integer literals with unsigned type and also to other
literals used in the same contexts or near violations, when their positive
nature is immediately clear. The latter changes are done for the sake of
uniformity.
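A small sketch of what Rule 7.2 means in practice. The constant names are hypothetical; the first constant is the case the rule targets, since 0x80000000 does not fit in a 32-bit int and is therefore represented in an unsigned type.

```c
#include <stdint.h>

/* Represented in an unsigned type, so Rule 7.2 requires the suffix. */
#define EX_TOP_BIT    0x80000000U
/* Suffix added for uniformity with the neighbouring constant. */
#define EX_ENTRY_SIZE 16U

static uint32_t ex_top_bit(uint32_t v)
{
    return v & EX_TOP_BIT;
}
```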
Nicola Vetrini [Mon, 28 Aug 2023 13:27:16 +0000 (15:27 +0200)]
xen/pci: drop remaining uses of bool_t
The remaining occurrences of the type bool_t in the header file can be
removed. This also resolves violations of MISRA C:2012 Rule 8.3
introduced by 870d5cd9a91f ("xen/IOMMU: Switch bool_t to bool").
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Shawn Anastasio [Mon, 28 Aug 2023 13:26:41 +0000 (15:26 +0200)]
common: Add missing #includes treewide
A few files treewide depend on definitions in headers that they
don't include. This works when arch headers end up including the
required headers by chance, but broke on ppc64 with only minimal/stub
arch headers.
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/vpci: address violations of MISRA C:2012 Rule 7.3
The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".
Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.
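For Rule 7.3, the change is purely about spelling the suffix. In the sketch below (hypothetical names), the constant is written with "UL" rather than "ul", because a lowercase "l" is easily mistaken for the digit 1.

```c
/* 4096UL rather than 4096ul: the lowercase 'l' resembles the digit '1'. */
#define EX_PAGE_SIZE 4096UL

static unsigned long ex_pages_to_bytes(unsigned long pages)
{
    return pages * EX_PAGE_SIZE;
}
```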
Anthony PERARD [Wed, 23 Aug 2023 15:23:34 +0000 (16:23 +0100)]
CI: Always move the bisect build log back
On failure of the "build"-each-commit script, the next command, which moves
the log back into the build directory, isn't executed. Fix that by
using "after_script", which is always executed even if the main
"script" fails. (We would still miss the log when the job times out.)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Thu, 24 Aug 2023 12:39:39 +0000 (13:39 +0100)]
tools/oxenstored: Additional debugging commands
These were added to aid security development, and are useful generally for
debugging.
Signed-off-by: Edwin Török <edwin.torok@cloud.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Julien Grall [Thu, 24 Aug 2023 10:42:26 +0000 (11:42 +0100)]
tools/libs: light: Remove the variable 'domainid' in do_pci_remove()
The function do_pci_remove() has two local variables 'domid' and
'domainid' containing the same value.
Looking at the history, until 2cf3b50dcd8b ("libxl_pci: Use
libxl__ao_device with pci_remove") the two variables could have
different values when using a stubdomain.
As this is not the case now, remove 'domainid'. This will reduce
the confusion between the two variables.
Note that there are other places in libxl_pci.c which are using
the two confusing names within the same function. They are left
unchanged for now.
No functional changes intended.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
During the discussions that led to the acceptance of the Rules, we
decided on a few exceptions that were not properly recorded in
rules.rst. Other times, the exceptions were decided later when it came
to enabling a rule in ECLAIR.
Either way, update rules.rst with appropriate notes.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/mem_access: address violations of MISRA C:2012 Rule 7.3
The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".
Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.
The changes in this patch are mechanical.
Signed-off-by: Gianluca Luparini <gianluca.luparini@bugseng.com> Signed-off-by: Simone Ballarin <simone.ballarin@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Juergen Gross [Tue, 22 Aug 2023 07:45:02 +0000 (09:45 +0200)]
tools/xenstore: move xenstored sources into dedicated directory
tools/xenstore currently holds both xenstored and the xenstore clients.
They are no longer sharing anything apart from the "xenstore" in their
names.
Move the xenstored sources into a new directory tools/xenstored while
dropping the "xenstored_" prefix from their names. This will make it
clearer that xenstore clients and xenstored are independent from each
other.
In order to avoid two very similar named directories below tools,
rename tools/xenstore to tools/xs-clients.
Drop the make targets [un]install-clients as those are not used in
the Xen tree.
Nicola Vetrini [Thu, 17 Aug 2023 12:39:27 +0000 (14:39 +0200)]
vpci/msix: make 'get_slot' static
The function can become static since it's used only within this file.
This also resolves a violation of MISRA C:2012 Rule 8.4 due to the absence
of a declaration before the function definition.
Fixes: b177892d2d0e ("vpci/msix: handle accesses adjacent to the MSI-X table") Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Juergen Gross [Wed, 23 Aug 2023 08:32:19 +0000 (10:32 +0200)]
stubdom: remove openssl related clean actions
When introducing polarssl into stubdom building the clean targets of
stubdom/Makefile gained actions for removing openssl directories and
files additional to polarssl ones.
As those openssl files are never downloaded or created during build,
the related actions can be dropped.
Fixes: bdd516dc6b2f ("vtpm/vtpmmgr and required libs to stubdom/Makefile") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Andrew Cooper [Fri, 9 Oct 2020 14:25:34 +0000 (15:25 +0100)]
x86/vmx: Revert "x86/VMX: sanitize rIP before re-entering guest"
At the time of XSA-170, the x86 instruction emulator was genuinely broken. It
would load arbitrary values into %rip and putting a check here probably was
the best stopgap security fix. It should have been reverted following c/s 81d3a0b26c1 "x86emul: limit-check branch targets" which corrected the emulator
behaviour.
However, everyone involved in XSA-170, myself included, failed to read the SDM
correctly. On the subject of %rip consistency checks, the SDM stated:
If the processor supports N < 64 linear-address bits, bits 63:N must be
identical
A non-canonical %rip (and SSP more recently) is an explicitly legal state in
x86, and the VMEntry consistency checks are intentionally off-by-one from a
regular canonical check.
The consequence of this bug is that Xen will currently take a legal x86 state
which would successfully VMEnter, and corrupt it into having non-architectural
behaviour.
Furthermore, in the time this bugfix has been pending in public, I
successfully persuaded Intel to clarify the SDM, adding the following
clarification:
The guest RIP value is not required to be canonical; the value of bit N-1
may differ from that of bit N.
Fixes: ffbbfda377 ("x86/VMX: sanitize rIP before re-entering guest") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
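The off-by-one difference between a regular canonical check and the VMEntry check quoted from the SDM can be sketched as below. The function names are hypothetical and n is the number of supported linear-address bits (for example 48), assumed to satisfy 1 < n < 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if bits 63:from_bit of addr are all identical (1 <= from_bit <= 63). */
static bool ex_upper_bits_identical(uint64_t addr, unsigned int from_bit)
{
    uint64_t top = addr >> from_bit;
    uint64_t all_ones = (UINT64_C(1) << (64 - from_bit)) - 1;

    return top == 0 || top == all_ones;
}

/* Regular canonical check: bits 63:N-1 must be identical. */
static bool ex_is_canonical(uint64_t addr, unsigned int n)
{
    return ex_upper_bits_identical(addr, n - 1);
}

/* VMEntry check per the SDM text: only bits 63:N must be identical. */
static bool ex_vmentry_rip_ok(uint64_t rip, unsigned int n)
{
    return ex_upper_bits_identical(rip, n);
}
```

With n = 48, the value 0x0000800000000000 is non-canonical yet passes the VMEntry check, which is the legal state the reverted code used to alter.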
The full structures cannot match in layout, as soon as a 32-bit tool
stack build comes into play. But it also doesn't need to; the part of
the layouts that needs to match is merely the union that we memcpy()
from the sysctl structure to the xc one. Keep (in adjusted form) only
the relevant ones.
Since the whole block needs touching anyway, move it closer to the
respective memcpy() and use a wrapper macro to limit verbosity. Also
tidy the full-size-check there.
Fixes: 2381dfab083f ("xen/sysctl: Nest cpufreq scaling options") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:18 +0000 (14:51 -0400)]
xenpm: Add set-cpufreq-cppc subcommand
set-cpufreq-cppc allows setting the Hardware P-State (HWP) parameters.
It can be run on all or just a single cpu. There are presets of
balance, powersave & performance. Those can be further tweaked by
param:val arguments as explained in the usage description.
Parameter names are just checked to the first 3 characters to shorten
typing.
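Matching parameter names on their first three characters could look like the sketch below. The helper and the parameter name used in the usage note are assumptions for illustration; the actual xenpm parsing code may differ.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical matcher: only the first three characters are compared. */
static bool ex_param_matches(const char *arg, const char *name)
{
    return strncmp(arg, name, 3) == 0;
}
```

Under this scheme "min" and "minimum" both select a parameter called "minimum", while "mi" does not, because the comparison hits the terminating NUL before three characters match.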
Some options are hardware dependent, and ranges can be found in
get-cpufreq-para.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:16 +0000 (14:51 -0400)]
xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op
Add SET_CPUFREQ_HWP xen_sysctl_pm_op to set HWP parameters. The sysctl
supports setting multiple values simultaneously as indicated by the
set_params bits. This allows atomically applying new HWP configuration
via a single wrmsr.
XEN_SYSCTL_HWP_SET_PRESET_BALANCE/PERFORMANCE/POWERSAVE provide three
common presets. Setting them depends on hardware limits which the
hypervisor is already caching. So using them allows skipping a
hypercall to query the limits (lowest/highest) to then set those same
values. The code is organized to allow a preset to be refined with
additional parameters if desired.
"most_efficient" and "guaranteed" could be additional presets in the
future, but they are not added now. Those levels can change at runtime,
but we don't have code in place to monitor and update for those events.
Since activity window may not be supported by all hardware, omit writing
it when not supported, and return that fact to userspace by updating
set_params.
CPPC parameter checking disallows setting reserved bytes and ensures
values are only non-zero when the corresponding set_params bit is set.
There is no range checking (0-255 is allowed) since hardware is
documented to clip internally.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
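The parameter checking described in the commit above can be sketched as follows. The structure layout, field names and flag values are hypothetical stand-ins, not the real sysctl interface; only the rules (no reserved bytes, values non-zero only when the matching set_params bit is set, no range check within 0-255) come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits and layout, for illustration only. */
#define EX_SET_MINIMUM (1u << 0)
#define EX_SET_MAXIMUM (1u << 1)
#define EX_SET_ALL     (EX_SET_MINIMUM | EX_SET_MAXIMUM)

struct ex_set_cppc {
    uint32_t set_params;
    uint32_t minimum;
    uint32_t maximum;
};

static bool ex_params_valid(const struct ex_set_cppc *p)
{
    /* Unknown flag bits are rejected. */
    if ( p->set_params & ~EX_SET_ALL )
        return false;
    /* Only the low byte is a value; the upper bytes are reserved. */
    if ( (p->minimum | p->maximum) & ~0xffu )
        return false;
    /* A value may be non-zero only if its set_params bit is set. */
    if ( !(p->set_params & EX_SET_MINIMUM) && p->minimum )
        return false;
    if ( !(p->set_params & EX_SET_MAXIMUM) && p->maximum )
        return false;

    return true;
}
```

No further range check is needed within 0-255, since the hardware is documented to clip values internally.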
Jason Andryuk [Mon, 7 Aug 2023 18:51:13 +0000 (14:51 -0400)]
cpufreq: Export HWP parameters to userspace as CPPC
Extend xen_get_cpufreq_para to return hwp parameters. HWP is an
implementation of ACPI CPPC (Collaborative Processor Performance
Control). Use the CPPC name since that might be useful in the future
for AMD P-state.
We need the features bitmask to indicate fields supported by the actual
hardware - this only applies to activity window for the time being.
The HWP most_efficient is mapped to CPPC lowest_nonlinear, and guaranteed is
mapped to nominal. CPPC has a guaranteed that is optional while nominal
is required. ACPI spec says "If this register is not implemented, OSPM
assumes guaranteed performance is always equal to nominal performance."
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:12 +0000 (14:51 -0400)]
xenpm: Change get-cpufreq-para output for hwp
When using HWP, some of the returned data is not applicable. In that
case, we should just omit it to avoid confusing the user. So switch to
printing the base and max frequencies since those are relevant to HWP.
Similarly, stop printing the CPU frequencies since those do not apply.
The scaling fields are also no longer printed.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:11 +0000 (14:51 -0400)]
xen/x86: Tweak PDC bits when using HWP
Qubes testing of HWP support had a report of a laptop, Thinkpad X1
Carbon Gen 4 with a Skylake processor, locking up during boot when HWP
is enabled. A user found a kernel bug that seems to be the same issue:
https://bugzilla.kernel.org/show_bug.cgi?id=110941.
That bug was fixed by Linux commit a21211672c9a ("ACPI / processor:
Request native thermal interrupt handling via _OSC"). The tl;dr is SMM
crashes when it receives thermal interrupts, so Linux calls the ACPI
_OSC method to take over interrupt handling.
The Linux fix looks at the CPU features to decide whether or not to call
_OSC with bit 12 set to take over native interrupt handling. Xen needs
some way to communicate HWP to Dom0 for making an equivalent call.
Xen exposes modified PDC bits via the platform_op set_pminfo hypercall.
Expand that to set bit 12 when HWP is present and in use.
Any generated interrupt would be handled by Xen's thermal driver, which
clears the status.
Bit 12 isn't named in the linux header and is open coded in Linux's
usage. Name it ACPI_PDC_CPPC_NATIVE_INTR.
This will need a corresponding linux patch to pick up and apply the PDC
bits.
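As a rough illustration of the PDC tweak, assuming a helper that receives
the PDC word and a flag saying whether HWP is in use (the helper name and
signature are hypothetical; only the macro name and bit number come from
this message):

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12: request native thermal interrupt handling. */
#define ACPI_PDC_CPPC_NATIVE_INTR (1U << 12)

/* Hypothetical helper: returns the PDC bits to expose to Dom0. */
uint32_t tweak_pdc_bits_sketch(uint32_t pdc, bool hwp_in_use)
{
    if ( hwp_in_use )
        pdc |= ACPI_PDC_CPPC_NATIVE_INTR;

    return pdc;
}
```

With HWP in use, a PDC word of 0 becomes 0x1000; without HWP the word is
returned unchanged.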
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:10 +0000 (14:51 -0400)]
cpufreq: Add Hardware P-State (HWP) driver
From the Intel SDM: "Hardware-Controlled Performance States (HWP), which
autonomously selects performance states while utilizing OS supplied
performance guidance hints."
Enable HWP to run in autonomous mode by poking the correct MSRs. HWP is
disabled by default, and cpufreq=hwp enables it.
cpufreq= parsing is expanded to allow cpufreq=hwp;xen. This allows
trying HWP and falling back to xen if not available. Only hwp and xen
are supported for this fallback feature. hdc is a sub-option under hwp
(i.e. cpufreq=hwp,hdc=0) as is verbose.
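The fallback behaviour can be sketched as below. This is not the actual
parser: the enum, the function name and the hwp_available flag are
hypothetical, and sub-options such as hdc=0 are skipped rather than
interpreted.

```c
#include <stdbool.h>
#include <string.h>

enum cpufreq_driver_sketch { DRV_NONE, DRV_XEN, DRV_HWP };

/*
 * Walk a ';'-separated driver list and return the first usable driver.
 * Each entry is "name[,sub-option...]"; only the name is looked at here.
 */
enum cpufreq_driver_sketch pick_driver_sketch(const char *arg,
                                              bool hwp_available)
{
    while ( *arg )
    {
        size_t entry_len = strcspn(arg, ";");   /* whole entry */
        size_t name_len = strcspn(arg, ",;");   /* driver name only */

        if ( name_len == 3 && !strncmp(arg, "hwp", 3) )
        {
            if ( hwp_available )
                return DRV_HWP;
            /* HWP not usable: fall through to the next entry. */
        }
        else if ( name_len == 3 && !strncmp(arg, "xen", 3) )
            return DRV_XEN;

        arg += entry_len;
        if ( *arg == ';' )
            arg++;
    }

    return DRV_NONE;
}
```

With "hwp,hdc=0;xen" and no HWP support, the sketch skips the first entry
and selects xen.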
There is no interface to configure - xen_sysctl_pm_op/xenpm will
be extended to configure in subsequent patches. It will run with the
default values, which should be the default 0x80 (out of 0x0-0xff)
energy/performance preference.
Unscientific powertop measurement of a mostly idle, customized OpenXT
install:
A 10th gen 6-core laptop showed battery discharge drop from ~9.x to
~7.x watts.
An 8th gen 4-core laptop dropped from ~10 to ~9 watts.
Power usage depends on many factors, especially display brightness, but
this does show a power saving in balanced mode when CPU utilization is
low.
HWP isn't compatible with an external governor - it doesn't take
explicit frequency requests. Therefore a minimal internal governor,
hwp, is also added as a placeholder.
While adding to the xen-command-line.pandoc entry, un-nest verbose from
minfreq. They are independent.
With cpufreq=hwp,verbose, HWP prints processor capabilities that are not
used by the code, like HW_FEEDBACK. This is done because otherwise
there isn't a convenient way to query the information.
Xen doesn't use the HWP interrupt, so it is disabled like in the Linux
pstate driver.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:09 +0000 (14:51 -0400)]
pmstat&xenpm: Re-arrange for cpufreq union
Rearrange code now that xen_sysctl_pm_op's get_para fields have the
nested union and struct. In particular, the scaling governor
information like scaling_available_governors is inside the union, so it
is not always available. Move those fields (op->u.get_para.u.s.u.*)
together as well as the common fields (ones outside the union like
op->u.get_para.turbo_enabled).
With that, gov_num may be 0, so bounce buffer handling needs
to be modified.
scaling_governor and other fields inside op->u.get_para.u.s.u.* won't be
used for hwp, so this will simplify the change when hwp support is
introduced and re-indents these lines all together.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:08 +0000 (14:51 -0400)]
xen/sysctl: Nest cpufreq scaling options
Add a union and struct so that most of the scaling variables of struct
xen_get_cpufreq_para are within a binary-compatible layout. This
allows cppc_para to live in the larger union and use uint32_ts - struct
xen_cppc_para will be 10 uint32_t's.
The new scaling struct is 3 * uint32_t + 16 bytes CPUFREQ_NAME_LEN + 4 *
uint32_t for xen_ondemand = 11 uint32_t. That means the old size is
retained, int32_t turbo_enabled doesn't move and it's binary compatible.
The out-of-context memcpy() in xc_get_cpufreq_para() now handles the
copying of the fields removed there.
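The size arithmetic can be checked at compile time with a sketch like the
one below. The struct, union and member names are hypothetical stand-ins
(the member order is also an assumption); only the counts of uint32_t
fields and the 16-byte CPUFREQ_NAME_LEN come from this message.

```c
#include <stdint.h>

#define CPUFREQ_NAME_LEN 16

/* Stand-in for xen_ondemand: 4 x uint32_t. */
struct ondemand_sketch { uint32_t v[4]; };

/* Stand-in for the new scaling struct: 3 + 4 + 4 = 11 uint32_t. */
struct scaling_sketch {
    uint32_t scaling_cur_freq;
    char scaling_governor[CPUFREQ_NAME_LEN];
    uint32_t scaling_max_freq;
    uint32_t scaling_min_freq;
    struct ondemand_sketch ondemand;
};

/* Stand-in for xen_cppc_para: 10 x uint32_t. */
struct cppc_para_sketch { uint32_t v[10]; };

union para_sketch {
    struct scaling_sketch s;
    struct cppc_para_sketch cppc_para;
};

/* The scaling struct is the larger member, so the union keeps its size. */
_Static_assert(sizeof(struct scaling_sketch) == 11 * sizeof(uint32_t),
               "scaling struct must stay at 11 uint32_t");
_Static_assert(sizeof(union para_sketch) == sizeof(struct scaling_sketch),
               "cppc_para must fit inside the scaling struct");
```

Because the union is no larger than the old run of fields, anything placed
after it, such as turbo_enabled, keeps its offset.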
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:06 +0000 (14:51 -0400)]
cpufreq: Add perf_freq to cpuinfo
acpi-cpufreq scales the aperf/mperf measurements by max_freq, but HWP
needs to scale by base frequency. Setting max_freq to base_freq
"works" but the code is not obvious, and returning values to userspace
is tricky. Add an additional perf_freq member which is used for scaling
aperf/mperf measurements.
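A minimal sketch of the scaling, assuming the usual APERF/MPERF ratio
approach where the average frequency is the reference frequency times
aperf_delta / mperf_delta. The function name and the kHz unit are
assumptions for illustration.

```c
#include <stdint.h>

/*
 * Average frequency over a sampling window. perf_freq_khz is the
 * reference the ratio is scaled by: max_freq for acpi-cpufreq, base
 * frequency for HWP.
 */
uint32_t avg_freq_khz_sketch(uint64_t aperf_delta, uint64_t mperf_delta,
                             uint32_t perf_freq_khz)
{
    if ( mperf_delta == 0 )
        return 0;   /* no reference cycles elapsed, nothing to report */

    /* 64-bit multiply; very large deltas could still overflow. */
    return (uint32_t)(perf_freq_khz * aperf_delta / mperf_delta);
}
```

For example, an aperf/mperf ratio of 1.5 against a 2000000 kHz reference
gives 3000000 kHz.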
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:05 +0000 (14:51 -0400)]
cpufreq: Allow restricting to internal governors only
For hwp, the standard governors are not usable, and only the internal
one is applicable. Add the cpufreq_governor_internal boolean to
indicate when an internal governor, like hwp, will be used. This is set
during presmp_initcall, and governor registration can be skipped when
called during initcall.
This way unusable governors are not registered, and only compatible
governors are advertised to userspace.
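The registration skip can be sketched as follows. Everything here is a
simplified stand-in: governor_internal_only_sketch plays the role of
cpufreq_governor_internal, and the governor structure, table and return
values are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

struct governor_sketch {
    const char *name;
    bool internal;      /* true for a driver-internal governor like hwp */
};

/* Stand-in for cpufreq_governor_internal, set early during boot. */
static bool governor_internal_only_sketch;

static const struct governor_sketch *registered_sketch[8];
static size_t nr_registered_sketch;

/* Returns 1 if registered, 0 if skipped, -1 if the table is full. */
int register_governor_sketch(const struct governor_sketch *g)
{
    /* Standard governors are unusable once an internal one is chosen. */
    if ( governor_internal_only_sketch && !g->internal )
        return 0;

    if ( nr_registered_sketch == 8 )
        return -1;

    registered_sketch[nr_registered_sketch++] = g;
    return 1;
}
```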
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/hypercalls: address violations of MISRA C:2012 Rule 8.3
Make function declarations and definitions consistent to address
violations of MISRA C:2012 Rule 8.3 ("All declarations of an object or
function shall use the same names and type qualifiers").
No functional change.
Signed-off-by: Federico Serafini <federico.serafini@bugseng.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Shawn Anastasio [Wed, 23 Aug 2023 07:28:02 +0000 (09:28 +0200)]
xen/ppc: Relocate kernel to physical address 0 on boot
Introduce a small assembly loop in `start` to copy the kernel to
physical address 0 before continuing. This ensures that the physical
address lines up with XEN_VIRT_START (0xc000000000000000) and allows us
to identity map the kernel when the MMU is set up in the next patch.
We are also able to start execution at XEN_VIRT_START after the copy
since hardware will ignore the top 4 address bits when operating in Real
Mode (MMU off).
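The address arithmetic behind this can be illustrated as below. The helper
name is hypothetical; it only models the statement above that the top 4
address bits are ignored in Real Mode.

```c
#include <stdint.h>

#define XEN_VIRT_START_SKETCH 0xc000000000000000ULL

/* Model of Real Mode addressing: the top 4 address bits are dropped. */
uint64_t real_mode_addr_sketch(uint64_t addr)
{
    return addr & ~(0xfULL << 60);
}
```

XEN_VIRT_START therefore resolves to physical address 0, and
XEN_VIRT_START + 0x1000 to 0x1000, which is why execution can continue at
the virtual address once the kernel has been copied to 0.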
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Shawn Anastasio [Wed, 23 Aug 2023 07:27:29 +0000 (09:27 +0200)]
xen/ppc: Bump minimum target ISA to 3.0 (POWER9)
In preparation for implementing ISA3+ Radix MMU support, drop ISA 2.07B
from the supported ISA list to avoid having a non-working
configuration in tree. It can be re-added at a later point when Hash
MMU support is added.
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 23 Aug 2023 07:26:36 +0000 (09:26 +0200)]
x86/AMD: extend Zenbleed check to models "good" ucode isn't known for
Reportedly the AMD Custom APU 0405 found on SteamDeck, models 0x90 and
0x91, (quoting the respective Linux commit) is similarly affected. Put
another instance of our Zen1 vs Zen2 distinction checks in
amd_check_zenbleed(), forcing use of the chickenbit irrespective of
ucode version (building upon real hardware never surfacing a version of
0xffffffff).
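The idea can be sketched as a plain revision comparison. The function and
macro names are hypothetical, and any known-good revision passed in by a
caller is a placeholder rather than a real microcode version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Used for models where no fixed microcode is known. */
#define ZENBLEED_NO_GOOD_UCODE_SKETCH 0xffffffffU

/*
 * The chickenbit is needed whenever the loaded microcode is older than
 * the known-good revision. With 0xffffffff as the "good" revision, every
 * revision real hardware reports compares as older, so the chickenbit is
 * always used.
 */
bool zenbleed_needs_chickenbit_sketch(uint32_t ucode_rev, uint32_t good_rev)
{
    return ucode_rev < good_rev;
}
```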
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 23 Aug 2023 07:25:52 +0000 (09:25 +0200)]
build: make cc-option properly deal with unrecognized sub-options
In options like -march=, it may be only the sub-option which is
unrecognized by the compiler. In such an event the error message often
splits option and argument, typically saying something like "bad value
'<argument>' for '<option>'". Instead of extending the grep invocation, stop
parsing compiler output altogether. Instead substitute -Wno-* options by
their -W* counterparts for probing (obviously assuming that such a
counterpart always exists).
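The substitution itself is done in the build system; purely to illustrate
the string transformation, here is a C sketch with a hypothetical helper
name. The reason for probing the positive form is that GCC accepts an
unknown -Wno-* option without complaint unless other diagnostics are
emitted.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write the option that should actually be probed into out:
 * "-Wno-foo" becomes "-Wfoo", anything else is passed through.
 */
char *probe_option_sketch(const char *opt, char *out, size_t out_len)
{
    if ( strncmp(opt, "-Wno-", 5) == 0 )
        snprintf(out, out_len, "-W%s", opt + 5);
    else
        snprintf(out, out_len, "%s", opt);

    return out;
}
```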
Suggested-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Nicola Vetrini [Tue, 22 Aug 2023 06:53:24 +0000 (08:53 +0200)]
vm_event: rework inclusions to use arch-independent header
The arch-specific header <asm/vm_event.h> should be included by the
common header <xen/vm_event.h>, so that the latter can be included
in the source files.
This also resolves violations of MISRA C:2012 Rule 8.4 that were
caused by declarations for
'vm_event_{fill_regs,set_registers,monitor_next_interrupt}'
in <asm/vm_event.h> not being visible when
defining functions in 'xen/arch/x86/vm_event.c'
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Tue, 22 Aug 2023 06:52:49 +0000 (08:52 +0200)]
mem-sharing: move (x86) / drop (Arm) arch_dump_shared_mem_info()
When !MEM_SHARING no useful output is produced. Move the function into
mm/mem_sharing.c while conditionalizing the call to it, thus allowing to
drop it altogether from Arm (and eliminating the need to introduce stubs
on PPC and RISC-V).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> #arm Acked-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Simon Gaiser [Tue, 22 Aug 2023 06:51:38 +0000 (08:51 +0200)]
x86/hpet: Disable legacy replacement mode after IRQ test
As far as I understand the HPET legacy mode is not required after the
timer IRQ test. For previous discussion see [1] and [2]. Keeping it
enabled prevents reaching deeper C-states on some systems and thereby
also S0ix residency. So disable it after the timer IRQ test worked. Note
that this code path is only reached when opt_hpet_legacy_replacement < 0,
so explicit user choice is still honored.
Wei Chen [Mon, 14 Aug 2023 04:25:26 +0000 (12:25 +0800)]
xen/arm64: prepare for moving MMU related code from head.S
We want to reuse head.S for MPU systems, but some of its code is
implemented for MMU systems only. We will move such code to a
separate MMU-specific file. Before that, fix some indentation in
this patch to make the later move easier to review:
1. Fix the indentations and incorrect style of code comments.
2. Fix the indentations for .text.header section.
3. Rename puts() to asm_puts() for global export
Julien Grall [Mon, 21 Aug 2023 17:02:05 +0000 (18:02 +0100)]
xen/public: arch-arm: All PSR_* defines should be unsigned
The PSR_* defines are fields in registers and always unsigned. So
add 'U' to clarify.
This should help with MISRA Rule 7.2.
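To illustrate why the suffix matters, the sketch below uses a hypothetical
define (not one of the real PSR_* names). An unsigned bit-31 constant
widens to 64 bits without sign extension, whereas a negative signed 32-bit
value does not.

```c
#include <stdint.h>

/* Hypothetical register field at bit 31, defined with the 'U' suffix. */
#define PSR_EXAMPLE_BIT_SKETCH (1U << 31)

/* Widening the unsigned define zero-extends: 0x0000000080000000. */
uint64_t widen_unsigned_sketch(void)
{
    return PSR_EXAMPLE_BIT_SKETCH;
}

/* Widening a signed value with bit 31 set sign-extends instead. */
uint64_t widen_signed_sketch(int32_t v)
{
    return (uint64_t)v;
}
```

Passing INT32_MIN to widen_signed_sketch() yields 0xffffffff80000000,
which is rarely what a register field comparison expects.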
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Tested-by: Henry Wang <Henry.Wang@arm.com>
Julien Grall [Mon, 21 Aug 2023 17:01:09 +0000 (18:01 +0100)]
xen/arm: vgic: Use 'unsigned int' rather than 'int' whenever it is possible
Switch to unsigned int for the return/parameters of the following
functions:
* REG_RANK_NR(): 'b' (number of bits) and the return is always positive.
'n' doesn't need to be size specific.
* vgic_rank_offset(): 'b' (number of bits), 'n' (register index),
's' (size of the access) are always positive.
* vgic_{enable, disable}_irqs(): 'n' (rank index) is always positive
* vgic_get_virq_type(): 'n' (rank index) and 'index' (register
index) are always positive.
* vgic_get_rank(): 'rank' is an index and therefore always positive.
Take the opportunity to propagate the unsignedness to the local
variable used for the arguments.
This will remove some of the warnings reported by GCC 12.2.1 when
passing the flags -Wsign-conversion/-Wconversion.
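As an illustration of the kind of helper affected, here is a simplified
sketch of a rank-number computation using unsigned types throughout. It is
not the actual REG_RANK_NR() code: the name is hypothetical, and it
assumes 32-bit registers and 32 interrupts per rank, so that register
index n with b bits per interrupt falls in rank n / b.

```c
/*
 * b: number of bits per interrupt (expected to be a power of two, 1..32)
 * n: register index
 * Both are always positive, so unsigned int avoids sign-conversion
 * warnings when callers pass unsigned indexes.
 */
unsigned int rank_nr_sketch(unsigned int b, unsigned int n)
{
    return b ? n / b : 0;
}
```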
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Tested-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>