Simone Ballarin [Mon, 28 Aug 2023 13:20:08 +0000 (15:20 +0200)]
xen/sched: address violations of MISRA C:2012 Directive 4.10
Add inclusion guards to address violations of
MISRA C:2012 Directive 4.10 ("Precautions shall be taken in order
to prevent the contents of a header file being included more than
once").
These were left over after a previous pci_sbdf_t conversion.
Fixes: 0c38c61aad21 ("pci: switch pci_conf_write32 to use pci_sbdf_t") Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Wed, 30 Aug 2023 08:03:53 +0000 (10:03 +0200)]
x86/irq: fix reporting of spurious i8259 interrupts
The return value of bogus_8259A_irq() is wrong: the function will
return `true` when the IRQ is real and `false` when it's a spurious
IRQ. This causes the "No irq handler for vector ..." message in
do_IRQ() to be printed for spurious i8259 interrupts which is not
intended (and not helpful).
Fix by inverting the return value of bogus_8259A_irq().
Fixes: 132906348a14 ('x86/i8259: Handle bogus spurious interrupts more quietly') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ross Lagerwall [Thu, 24 Aug 2023 09:02:58 +0000 (11:02 +0200)]
xen/console: Set the default log level to INFO for release builds
Not displaying INFO messages by default on release builds is not
helpful, as messages of that level are relevant for hypervisor
operation. For example messages related to livepatches applied and
reverted are of INFO level.
Custom builds that require less verbose output can adjust it using the
command line, but attempt to provide all relevant information by
default on release builds.
Adjust the loglevel of printks that don't have an associated level to
INFO instead of WARNING, since INFO will now be printed by default on
all builds.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Simon Gaiser [Mon, 7 Aug 2023 09:38:25 +0000 (11:38 +0200)]
x86/ACPI: Ignore entries with invalid APIC IDs when parsing MADT
It seems some firmwares put dummy entries in the ACPI MADT table for non
existing processors. On my NUC11TNHi5 those have the invalid APIC ID
0xff. Linux already has code to handle those cases both in
acpi_parse_lapic [1] as well as in acpi_parse_x2apic [2]. So add the
same check to Xen.
xen/vpci: address violations of MISRA C:2012 Rule 7.2
The xen sources contains violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".
Add the 'U' suffix to integers literals with unsigned type and also to other
literals used in the same contexts or near violations, when their positive
nature is immediately clear. The latter changes are done for the sake of
uniformity.
Nicola Vetrini [Mon, 28 Aug 2023 13:27:16 +0000 (15:27 +0200)]
xen/pci: drop remaining uses of bool_t
The remaining occurrences of the type bool_t in the header file can be
removed. This also resolves violations of MISRA C:2012 Rule 8.3
introduced by 870d5cd9a91f ("xen/IOMMU: Switch bool_t to bool").
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Shawn Anastasio [Mon, 28 Aug 2023 13:26:41 +0000 (15:26 +0200)]
common: Add missing #includes treewide
A few files treewide depend on defininitions in headers that they
don't include. This works when arch headers end up including the
required headers by chance, but broke on ppc64 with only minimal/stub
arch headers.
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/vpci: address violations of MISRA C:2012 Rule 7.3
The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".
Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.
Anthony PERARD [Wed, 23 Aug 2023 15:23:34 +0000 (16:23 +0100)]
CI: Always move the bisect build log back
On failure of "build"-each-commit script, the next command that move
the log back into the build directory isn't executed. Fix that by
using "after_script" which is always executed even if the main
"script" fails. (We would still miss the log when the jobs times out.)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Thu, 24 Aug 2023 12:39:39 +0000 (13:39 +0100)]
tools/oxenstored: Additional debugging commands
These were added to aid security development, and are useful generally for
debugging.
Signed-off-by: Edwin Török <edwin.torok@cloud.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Julien Grall [Thu, 24 Aug 2023 10:42:26 +0000 (11:42 +0100)]
tools/libs: light: Remove the variable 'domainid' do_pci_remove()
The function do_pci_remove() has two local variables 'domid' and
'domainid' containing the same value.
Looking at the history, until 2cf3b50dcd8b ("libxl_pci: Use
libxl__ao_device with pci_remove") the two variables may have
different value when using a stubdomain.
As this is not the case now, remove 'domainid'. This will reduce
the confusion between the two variables.
Note that there are other places in libxl_pci.c which are using
the two confusing names within the same function. They are left
unchanged for now.
No functional changes intented.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
During the discussions that led to the acceptance of the Rules, we
decided on a few exceptions that were not properly recorded in
rules.rst. Other times, the exceptions were decided later when it came
to enabling a rule in ECLAIR.
Either way, update rules.rst with appropriate notes.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/mem_access: address violations of MISRA C:2012 Rule 7.3
The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".
Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.
The changes in this patch are mechanical.
Signed-off-by: Gianluca Luparini <gianluca.luparini@bugseng.com> Signed-off-by: Simone Ballarin <simone.ballarin@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Juergen Gross [Tue, 22 Aug 2023 07:45:02 +0000 (09:45 +0200)]
tools/xenstore: move xenstored sources into dedicated directory
In tools/xenstore there are living xenstored and xenstore clients.
They are no longer sharing anything apart from the "xenstore" in their
names.
Move the xenstored sources into a new directory tools/xenstored while
dropping the "xenstored_" prefix from their names. This will make it
clearer that xenstore clients and xenstored are independent from each
other.
In order to avoid two very similar named directories below tools,
rename tools/xenstore to tools/xs-clients.
Drop the make targets [un]install-clients as those are not used in
the Xen tree.
Nicola Vetrini [Thu, 17 Aug 2023 12:39:27 +0000 (14:39 +0200)]
vpci/msix: make 'get_slot' static
The function can become static since it's used only within this file.
This also resolves a violation of MISRA C:2012 Rule 8.4 due to the absence
of a declaration before the function definition.
Fixes: b177892d2d0e ("vpci/msix: handle accesses adjacent to the MSI-X table") Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Juergen Gross [Wed, 23 Aug 2023 08:32:19 +0000 (10:32 +0200)]
stubdom: remove openssl related clean actions
When introducing polarssl into stubdom building the clean targets of
stubdom/Makefile gained actions for removing openssl directories and
files additional to polarssl ones.
As those openssl files are never downloaded or created during build,
the related actions can be dropped.
Fixes: bdd516dc6b2f ("vtpm/vtpmmgr and required libs to stubdom/Makefile") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Andrew Cooper [Fri, 9 Oct 2020 14:25:34 +0000 (15:25 +0100)]
x86/vmx: Revert "x86/VMX: sanitize rIP before re-entering guest"
At the time of XSA-170, the x86 instruction emulator was genuinely broken. It
would load arbitrary values into %rip and putting a check here probably was
the best stopgap security fix. It should have been reverted following c/s 81d3a0b26c1 "x86emul: limit-check branch targets" which corrected the emulator
behaviour.
However, everyone involved in XSA-170, myself included, failed to read the SDM
correctly. On the subject of %rip consistency checks, the SDM stated:
If the processor supports N < 64 linear-address bits, bits 63:N must be
identical
A non-canonical %rip (and SSP more recently) is an explicitly legal state in
x86, and the VMEntry consistency checks are intentionally off-by-one from a
regular canonical check.
The consequence of this bug is that Xen will currently take a legal x86 state
which would successfully VMEnter, and corrupt it into having non-architectural
behaviour.
Furthermore, in the time this bugfix has been pending in public, I
successfully persuaded Intel to clarify the SDM, adding the following
clarification:
The guest RIP value is not required to be canonical; the value of bit N-1
may differ from that of bit N.
Fixes: ffbbfda377 ("x86/VMX: sanitize rIP before re-entering guest") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
The full structures cannot match in layout, as soon as a 32-bit tool
stack build comes into play. But it also doesn't need to; the part of
the layouts that needs to match is merely the union that we memcpy()
from the sysctl structure to the xc one. Keep (in adjusted form) only
the relevant ones.
Since the whole block needs touching anyway, move it closer to the
respective memcpy() and use a wrapper macro to limit verbosity. Also
tidy the full-size-check there.
Fixes: 2381dfab083f ("xen/sysctl: Nest cpufreq scaling options") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:18 +0000 (14:51 -0400)]
xenpm: Add set-cpufreq-cppc subcommand
set-cpufreq-cppc allows setting the Hardware P-State (HWP) parameters.
It can be run on all or just a single cpu. There are presets of
balance, powersave & performance. Those can be further tweaked by
param:val arguments as explained in the usage description.
Parameter names are just checked to the first 3 characters to shorten
typing.
Some options are hardware dependent, and ranges can be found in
get-cpufreq-para.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:16 +0000 (14:51 -0400)]
xen: Add SET_CPUFREQ_HWP xen_sysctl_pm_op
Add SET_CPUFREQ_HWP xen_sysctl_pm_op to set HWP parameters. The sysctl
supports setting multiple values simultaneously as indicated by the
set_params bits. This allows atomically applying new HWP configuration
via a single wrmsr.
XEN_SYSCTL_HWP_SET_PRESET_BALANCE/PERFORMANCE/POWERSAVE provide three
common presets. Setting them depends on hardware limits which the
hypervisor is already caching. So using them allows skipping a
hypercall to query the limits (lowest/highest) to then set those same
values. The code is organized to allow a preset to be refined with
additional parameters if desired.
"most_efficient" and "guaranteed" could be additional presets in the
future, but the are not added now. Those levels can change at runtime,
but we don't have code in place to monitor and update for those events.
Since activity window may not be supported by all hardware, omit writing
it when not supported, and return that fact to userspace by updating
set_params.
CPPC parameter checking disallows setting reserved bytes and ensure
values are only non-zero when the corresponding set_params bit is set.
There is no range checking (0-255 is allowed) since hardware is
documented to clip internally.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:13 +0000 (14:51 -0400)]
cpufreq: Export HWP parameters to userspace as CPPC
Extend xen_get_cpufreq_para to return hwp parameters. HWP is an
implementation of ACPI CPPC (Collaborative Processor Performance
Control). Use the CPPC name since that might be useful in the future
for AMD P-state.
We need the features bitmask to indicate fields supported by the actual
hardware - this only applies to activity window for the time being.
The HWP most_efficient is mapped to CPPC lowest_nonlinear, and guaranteed is
mapped to nominal. CPPC has a guaranteed that is optional while nominal
is required. ACPI spec says "If this register is not implemented, OSPM
assumes guaranteed performance is always equal to nominal performance."
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:12 +0000 (14:51 -0400)]
xenpm: Change get-cpufreq-para output for hwp
When using HWP, some of the returned data is not applicable. In that
case, we should just omit it to avoid confusing the user. So switch to
printing the base and max frequencies since those are relevant to HWP.
Similarly, stop printing the CPU frequencies since those do not apply.
The scaling fields are also no longer printed.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:11 +0000 (14:51 -0400)]
xen/x86: Tweak PDC bits when using HWP
Qubes testing of HWP support had a report of a laptop, Thinkpad X1
Carbon Gen 4 with a Skylake processor, locking up during boot when HWP
is enabled. A user found a kernel bug that seems to be the same issue:
https://bugzilla.kernel.org/show_bug.cgi?id=110941.
That bug was fixed by Linux commit a21211672c9a ("ACPI / processor:
Request native thermal interrupt handling via _OSC"). The tl;dr is SMM
crashes when it receives thermal interrupts, so Linux calls the ACPI
_OSC method to take over interrupt handling.
The Linux fix looks at the CPU features to decide whether or not to call
_OSC with bit 12 set to take over native interrupt handling. Xen needs
some way to communicate HWP to Dom0 for making an equivalent call.
Xen exposes modified PDC bits via the platform_op set_pminfo hypercall.
Expand that to set bit 12 when HWP is present and in use.
Any generated interrupt would be handled by Xen's thermal drive, which
clears the status.
Bit 12 isn't named in the linux header and is open coded in Linux's
usage. Name it ACPI_PDC_CPPC_NATIVE_INTR.
This will need a corresponding linux patch to pick up and apply the PDC
bits.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:10 +0000 (14:51 -0400)]
cpufreq: Add Hardware P-State (HWP) driver
From the Intel SDM: "Hardware-Controlled Performance States (HWP), which
autonomously selects performance states while utilizing OS supplied
performance guidance hints."
Enable HWP to run in autonomous mode by poking the correct MSRs. HWP is
disabled by default, and cpufreq=hwp enables it.
cpufreq= parsing is expanded to allow cpufreq=hwp;xen. This allows
trying HWP and falling back to xen if not available. Only hwp and xen
are supported for this fallback feature. hdc is a sub-option under hwp
(i.e. cpufreq=hwp,hdc=0) as is verbose.
There is no interface to configure - xen_sysctl_pm_op/xenpm will
be extended to configure in subsequent patches. It will run with the
default values, which should be the default 0x80 (out of 0x0-0xff)
energy/performance preference.
Unscientific powertop measurement of an mostly idle, customized OpenXT
install:
A 10th gen 6-core laptop showed battery discharge drop from ~9.x to
~7.x watts.
A 8th gen 4-core laptop dropped from ~10 to ~9
Power usage depends on many factors, especially display brightness, but
this does show a power saving in balanced mode when CPU utilization is
low.
HWP isn't compatible with an external governor - it doesn't take
explicit frequency requests. Therefore a minimal internal governor,
hwp, is also added as a placeholder.
While adding to the xen-command-line.pandoc entry, un-nest verbose from
minfreq. They are independent.
With cpufreq=hwp,verbose, HWP prints processor capabilities that are not
used by the code, like HW_FEEDBACK. This is done because otherwise
there isn't a convenient way to query the information.
Xen doesn't use the HWP interrupt, so it is disabled like in the Linux
pstate driver.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:09 +0000 (14:51 -0400)]
pmstat&xenpm: Re-arrage for cpufreq union
Rearrange code now that xen_sysctl_pm_op's get_para fields has the
nested union and struct. In particular, the scaling governor
information like scaling_available_governors is inside the union, so it
is not always available. Move those fields (op->u.get_para.u.s.u.*)
together as well as the common fields (ones outside the union like
op->u.get_para.turbo_enabled).
With that, gov_num may be 0, so bounce buffer handling needs
to be modified.
scaling_governor and other fields inside op->u.get_para.u.s.u.* won't be
used for hwp, so this will simplify the change when hwp support is
introduced and re-indents these lines all together.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:08 +0000 (14:51 -0400)]
xen/sysctl: Nest cpufreq scaling options
Add a union and struct so that most of the scaling variables of struct
xen_get_cpufreq_para are within in a binary-compatible layout. This
allows cppc_para to live in the larger union and use uint32_ts - struct
xen_cppc_para will be 10 uint32_t's.
The new scaling struct is 3 * uint32_t + 16 bytes CPUFREQ_NAME_LEN + 4 *
uint32_t for xen_ondemand = 11 uint32_t. That means the old size is
retained, int32_t turbo_enabled doesn't move and it's binary compatible.
The out-of-context memcpy() in xc_get_cpufreq_para() now handles the
copying of the fields removed there.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:06 +0000 (14:51 -0400)]
cpufreq: Add perf_freq to cpuinfo
acpi-cpufreq scales the aperf/mperf measurements by max_freq, but HWP
needs to scale by base frequency. Settings max_freq to base_freq
"works" but the code is not obvious, and returning values to userspace
is tricky. Add an additonal perf_freq member which is used for scaling
aperf/mperf measurements.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Mon, 7 Aug 2023 18:51:05 +0000 (14:51 -0400)]
cpufreq: Allow restricting to internal governors only
For hwp, the standard governors are not usable, and only the internal
one is applicable. Add the cpufreq_governor_internal boolean to
indicate when an internal governor, like hwp, will be used. This is set
during presmp_initcall, and governor registration can be skipped when
called during initcall.
This way unusable governors are not registered, and only compatible
governors are advertised to userspace.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
xen/hypercalls: address violations of MISRA C:2012 Rule 8.3
Make function declarations and definitions consistent to address
violations of MISRA C:2012 Rule 8.3 ("All declarations of an object or
function shall use the same names and type qualifiers").
No functional change.
Signed-off-by: Federico Serafini <federico.serafini@bugseng.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Shawn Anastasio [Wed, 23 Aug 2023 07:28:02 +0000 (09:28 +0200)]
xen/ppc: Relocate kernel to physical address 0 on boot
Introduce a small assembly loop in `start` to copy the kernel to
physical address 0 before continuing. This ensures that the physical
address lines up with XEN_VIRT_START (0xc000000000000000) and allows us
to identity map the kernel when the MMU is set up in the next patch.
We are also able to start execution at XEN_VIRT_START after the copy
since hardware will ignore the top 4 address bits when operating in Real
Mode (MMU off).
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Shawn Anastasio [Wed, 23 Aug 2023 07:27:29 +0000 (09:27 +0200)]
xen/ppc: Bump minimum target ISA to 3.0 (POWER9)
In preparation for implementing ISA3+ Radix MMU support, drop ISA 2.07B
from the supported ISA list to avoid having a non-working
configuration in tree. It can be re-added at a later point when Hash
MMU support is added.
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 23 Aug 2023 07:26:36 +0000 (09:26 +0200)]
x86/AMD: extend Zenbleed check to models "good" ucode isn't known for
Reportedly the AMD Custom APU 0405 found on SteamDeck, models 0x90 and
0x91, (quoting the respective Linux commit) is similarly affected. Put
another instance of our Zen1 vs Zen2 distinction checks in
amd_check_zenbleed(), forcing use of the chickenbit irrespective of
ucode version (building upon real hardware never surfacing a version of
0xffffffff).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 23 Aug 2023 07:25:52 +0000 (09:25 +0200)]
build: make cc-option properly deal with unrecognized sub-options
In options like -march=, it may be only the sub-option which is
unrecognized by the compiler. In such an event the error message often
splits option and argument, typically saying something like "bad value
'<argument>' for '<option>'. Instead of extend the grep invocation, stop
parsing compiler output altogether. Instead substitute -Wno-* options by
their -W* counterparts for probing (obviously assuming that such a
counterpart always exists).
Suggested-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Nicola Vetrini [Tue, 22 Aug 2023 06:53:24 +0000 (08:53 +0200)]
vm_event: rework inclusions to use arch-indipendent header
The arch-specific header <asm/vm_event.h> should be included by the
common header <xen/vm_event.h>, so that the latter can be included
in the source files.
This also resolves violations of MISRA C:2012 Rule 8.4 that were
caused by declarations for
'vm_event_{fill_regs,set_registers,monitor_next_interrupt}'
in <asm/vm_event.h> not being visible when
defining functions in 'xen/arch/x86/vm_event.c'
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Tue, 22 Aug 2023 06:52:49 +0000 (08:52 +0200)]
mem-sharing: move (x86) / drop (Arm) arch_dump_shared_mem_info()
When !MEM_SHARING no useful output is produced. Move the function into
mm/mem_sharing.c while conditionalizing the call to it, thus allowing to
drop it altogether from Arm (and eliminating the need to introduce stubs
on PPC and RISC-V).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> #arm Acked-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Simon Gaiser [Tue, 22 Aug 2023 06:51:38 +0000 (08:51 +0200)]
x86/hpet: Disable legacy replacement mode after IRQ test
As far as I understand the HPET legacy mode is not required after the
timer IRQ test. For previous discussion see [1] and [2]. Keeping it
enabled prevents reaching deeper C-states on some systems and thereby
also S0ix residency. So disable it after the timer IRQ test worked. Note
that this code path is only reached when opt_hpet_legacy_replacement < 0,
so explicit user choice is still honored.
Wei Chen [Mon, 14 Aug 2023 04:25:26 +0000 (12:25 +0800)]
xen/arm64: prepare for moving MMU related code from head.S
We want to reuse head.S for MPU systems, but there are some
code are implemented for MMU systems only. We will move such
code to another MMU specific file. But before that we will
do some indentations fix in this patch to make them be easier
for reviewing:
1. Fix the indentations and incorrect style of code comments.
2. Fix the indentations for .text.header section.
3. Rename puts() to asm_puts() for global export
Julien Grall [Mon, 21 Aug 2023 17:02:05 +0000 (18:02 +0100)]
xen/public: arch-arm: All PSR_* defines should be unsigned
The defines PSR_* are field in registers and always unsigned. So
add 'U' to clarify.
This should help with MISRA Rule 7.2.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Tested-by: Henry Wang <Henry.Wang@arm.com>
Julien Grall [Mon, 21 Aug 2023 17:01:09 +0000 (18:01 +0100)]
xen/arm: vgic: Use 'unsigned int' rather than 'int' whenever it is possible
Switch to unsigned int for the return/parameters of the following
functions:
* REG_RANK_NR(): 'b' (number of bits) and the return is always positive.
'n' doesn't need to be size specific.
* vgic_rank_offset(): 'b' (number of bits), 'n' (register index),
's' (size of the access) are always positive.
* vgic_{enable, disable}_irqs(): 'n' (rank index) is always positive
* vgic_get_virq_type(): 'n' (rank index) and 'index' (register
index) are always positive.
* vgic_get_rank(): 'rank' is an index and therefore always positive.
Take the opportunity to propogate the unsignedness to the local
variable used for the arguments.
This will remove some of the warning reported by GCC 12.2.1 when
passing the flags -Wsign-conversion/-Wconversion.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Tested-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Julien Grall [Mon, 21 Aug 2023 16:57:07 +0000 (17:57 +0100)]
xen/arm: vmmio: The number of entries cannot be negative
The number of virtual MMIO regions cannot be negative. So switch
the field 'num_entries' and 'max_num_entries' to 'unsigned int'.
The new type is then propagated to the caller and the vGIC
code.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Tested-by: Henry Wang <Henry.Wang@arm.com>
Mark more files as "adopted" and configure Rule 8.3 in order to:
- exclude violations involving the type ret_t;
- exclude violations involving both an internal and an external file,
thus avoiding touching adopted code.
Andrew Cooper [Fri, 18 Aug 2023 09:47:46 +0000 (10:47 +0100)]
rombios: Avoid using K&R function syntax
Clang-15 complains:
tcgbios.c:598:25: error: a function declaration without a prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
void tcpa_calling_int19h()
^
void
C2x formally removes K&R syntax. The declarations for these functions in
32bitprotos.h are already ANSI compatible. Update the definitions to match.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 17 Aug 2023 20:32:53 +0000 (21:32 +0100)]
rombios: Work around GCC issue 99578
GCC 12 objects to pointers derived from a constant:
util.c: In function 'find_rsdp':
util.c:429:16: error: array subscript 0 is outside array bounds of 'uint16_t[0]' {aka 'short unsigned int[]'} [-Werror=array-bounds]
429 | ebda_seg = *(uint16_t *)ADDR_FROM_SEG_OFF(0x40, 0xe);
cc1: all warnings being treated as errors
This is a GCC bug, but work around it rather than turning array-bounds
checking off generally.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 18 Aug 2023 13:04:28 +0000 (15:04 +0200)]
x86emul: rework wrapping of libc functions in test and fuzzing harnesses
Our present approach is working fully behind the compiler's back. This
was found to not work with LTO. Employ ld's --wrap= option instead. Note
that while this makes the build work at least with new enough gcc (it
doesn't with gcc7, for example, due to tool chain side issues afaict),
according to my testing things still won't work when building the
fuzzing harness with afl-cc: While with the gcc7 tool chain I see afl-as
getting invoked, this does not happen with gcc13. Yet without using that
assembler wrapper the resulting binary will look uninstrumented to
afl-fuzz.
While checking the resulting binaries I noticed that we've gained uses
of snprintf() and strstr(), which only just so happen to not cause any
problems. Add a wrappers for them as well.
Since we don't have any actual uses of v{,sn}printf(), no definitions of
their wrappers appear (just yet). But I think we want
__wrap_{,sn}printf() to properly use __real_v{,sn}printf() right away,
which means we need delarations of the latter.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Mon, 14 Aug 2023 07:47:05 +0000 (09:47 +0200)]
tools/xenstore: introduce get_node_const()
Add a variant of get_node() returning a const struct node pointer.
Note that all callers of this new variant don't supply a pointer where
to store the canonical node name, while all callers needing a non-const
node do supply this pointer. This results in an asymmetric
simplification of the two variants.
Juergen Gross [Mon, 14 Aug 2023 07:47:03 +0000 (09:47 +0200)]
tools/xenstore: merge is_valid_nodename() into canonicalize()
Today is_valid_nodename() is always called directly after calling
canonicalize(), with the exception of do_unwatch(), where the call
is missing (which is not correct, but results just in a wrong error
reason being returned).
Juergen Gross [Mon, 14 Aug 2023 07:47:02 +0000 (09:47 +0200)]
tools/xenstore: merge get_spec_node() into get_node_canonicalized()
Add a "allow_special" parameter to get_node_canonicalized() allowing
to merge get_spec_node() into get_node_canonicalized().
Add the same parameter to is_valid_nodename(), as this will simplify
check_watch_path().
This is done in preparation to introducing a get_node() variant
returning a pointer to const struct node.
Note that this will change how special node names are going to be
validated, as now the normal restrictions for node names will be
applied:
- they can't end with "/"
- they can't contain "//"
- they can't contain characters other than the ones allowed for normal
nodes
- the length of the node name is restricted by the max path length
quota
For defined special node names this isn't any real restriction, though.
Juergen Gross [Mon, 14 Aug 2023 07:47:00 +0000 (09:47 +0200)]
tools/xenstore: alloc new memory in domain_adjust_node_perms()
In order to avoid modifying the node data in the data base in case a
domain is gone, let domain_adjust_node_perms() allocate new memory for
the permissions in case they need to be modified. As this should
happen only in very rare cases, it is fine to do this even when having
copied the node data already.
Juergen Gross [Mon, 14 Aug 2023 07:46:58 +0000 (09:46 +0200)]
tools/xenstore: don't use struct node_perms in struct node
Open code struct node_perms in struct node in order to prepare using
struct node_hdr in struct node.
Add two helpers to transfer permissions between struct node and struct
node_perms and a helper to directly get connection base permissions
from a node.
Let perms_to_strings() take a struct node as parameter and rename it
to node_perms_to_strings().
Juergen Gross [Mon, 14 Aug 2023 07:46:57 +0000 (09:46 +0200)]
tools/xenstore: rework struct xs_tdb_record_hdr
Struct xs_tdb_record_hdr is used for nodes stored in the data base.
When working on a node, struct node is being used, which is including
the same information as struct xs_tdb_record_hdr, but in a different
format. Rework struct xs_tdb_record_hdr in order to prepare including
it in struct node.
Do the following modifications:
- move its definition to xenstored_core.h, as the reason to put it into
utils.h are no longer existing
- rename it to struct node_hdr, as the "tdb" in its name has only
historical reasons
- replace the empty permission array at the end with a comment about
the layout of data in the data base (concatenation of header,
permissions, node contents, and children list)
- use narrower types for num_perms and datalen, as those are naturally
limited to XENSTORE_PAYLOAD_MAX (childlen is different here, as it is
in theory basically unlimited)
Juergen Gross [Mon, 14 Aug 2023 07:46:56 +0000 (09:46 +0200)]
tools/xenstore: move copying of node data out of db_fetch()
Today the node data is copied in db_fetch() on each data base read in
order to avoid accidental data base modifications when working on a
node.
read_node() is the only caller of db_fetch() which isn't freeing the
returned data area immediately after using it. The other callers don't
modify the returned data, so they don't need the data to be copied.
Move copying of the data into read_node(), resulting in a speedup of
the other callers due to no memory allocation and no copying being
needed anymore.
This allows to let db_fetch() return a pointer to const data.
As db_fetch() can't return any error other than ENOENT now, error
handling for the callers can be simplified.
Juergen Gross [Mon, 14 Aug 2023 07:46:52 +0000 (09:46 +0200)]
tools/xenstore: drop use of tdb
Today all Xenstore nodes are stored in a TDB data base. This data base
has several disadvantages:
- It is using a fixed sized hash table, resulting in high memory
overhead for small installations with only very few VMs, and a rather
large performance hit for systems with lots of VMs due to many
collisions.
The hash table size today is 7919 entries. This means that e.g. in
case of a simple desktop use case with 2 or 3 VMs probably far less
than 10% of the entries will be used (assuming roughly 100 nodes per
VM). OTOH a setup on a large server with 500 VMs would result in
heavy conflicts in the hash list with 5-10 nodes per hash table entry.
- TDB is using a single large memory area for storing the nodes. It
only ever increases this area and will never shrink it afterwards.
This will result in more memory usage than necessary after a peak of
Xenstore usage.
- Xenstore is only single-threaded, while TDB is designed to be fit
for multi-threaded use cases, resulting in much higher code
complexity than needed.
- Special use cases of Xenstore are not possible to implement with TDB
in an effective way, while an implementation of a data base tailored
for Xenstore could simplify some handling (e.g. transactions) a lot.
So drop using TDB and store the nodes directly in memory making them
easily accessible. Use a hash-based lookup mechanism for fast lookup
of nodes by their full path.
For now only replace TDB keeping the current access functions.
The single lock in struct ffa_ctx is complemented with rx_lock and tx_lock.
The old lock is used for small critical sections, like increasing
shm_count or adding another shm to shm_list.
rx_lock and tx_lock are only acquired using spin_trylock() which for
well-behaving guests should always succeed. Guests using the RX and TX
buffers are expected to serialize accesses before doing the FF-A
request.
xen/arm: ffa: add support to reclaim shared memory
Adds support to reclaim memory previously shared with FFA_MEM_SHARE.
A memory region that doesn't need to be shared any longer can be
reclaimed with FFA_MEM_RECLAIM once the SP doesn't use it any longer.
This is checked by the SPMC and not in control of the mediator.
Adds a check that the SP supports the needed FF-A feature
FFA_MEM_RECLAIM.
Adds support for a guest to share memory with an SP using FFA_MEM_SHARE.
Only memory regions small enough to be shared with a single call to
FFA_MEM_SHARE are supported.
With this commit we have a FF-A version 1.1 [1] mediator able to
communicate with a Secure Partition in secure world using shared memory.
The secure world must use FF-A version 1.1, but the guest is free to use
version 1.0 or version 1.1.
Adds a check that the SP supports the needed FF-A features
FFA_MEM_SHARE_64 or FFA_MEM_SHARE_32.
According to DEN0077A version 1.1 REL0
- Section 10.9.2 Memory region handle, page 167
- Table 10.18 at page 175
- Table 10.15 at page 168
- Section 10.11.4 Flags usage, page 184-187
add defines needed for sharing using the function FFA_MEM_SHARE and
friends.
Also add limits for how many shared buffers that a guest can have at
once and how large a shared buffer can be at most.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
xen/arm: move regpair_to_uint64() and uint64_to_regpair() to regs.h
Moves the two helper functions regpair_to_uint64() and
uint64_to_regpair() from xen/arch/arm/tee/optee.c to the common arm
specific regs.h. This enables reuse of these functions in the FF-A
mediator in a subsequent patch.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com>
xen/arm: ffa: support guest FFA_PARTITION_INFO_GET
Adds support in the mediator to handle FFA_PARTITION_INFO_GET requests
from a guest. The requests are forwarded to the SPMC and the response is
translated according to the FF-A version in use by the guest.
Using FFA_PARTITION_INFO_GET changes the owner of the RX buffer to the
caller (the guest in this case), so once it is done with the buffer it
must be released using FFA_RX_RELEASE before another call can be made.
Adds support in the mediator to map and unmap the RX and TX buffers
provided by the guest using the two FF-A functions FFA_RXTX_MAP and
FFA_RXTX_UNMAP.
These buffer are later used to transmit data that cannot be passed in
registers only.
xen/arm: ffa: send guest events to Secure Partitions
The FF-A specification defines framework messages sent as direct
requests when certain events occurs. For instance when a VM (guest) is
created or destroyed. Only SPs which have subscribed to these events
will receive them. An SP can subscribe to these messages in its
partition properties.
Adds a check that the SP supports the needed FF-A features
FFA_PARTITION_INFO_GET and FFA_RX_RELEASE.
The partition properties of each SP is retrieved with
FFA_PARTITION_INFO_GET which returns the information in our RX buffer.
Using FFA_PARTITION_INFO_GET changes the owner of the RX buffer to the
caller (us), so once we're done with the buffer it must be released
using FFA_RX_RELEASE before another call can be made.
Adds support for sending a FF-A direct request. Checks that the SP also
supports handling a 32-bit direct request. 64-bit direct requests are
not used by the mediator itself so there is not need to check for that.
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Thu, 17 Aug 2023 14:25:51 +0000 (16:25 +0200)]
IOMMU/x86: fix build with old gcc after IO-APIC RTE changes
Old gcc (up to at least 4.3.4) won't cope with initializers involving
unnamed struct/union fields.
Fixes: 3e033172b025 ("x86/iommu: pass full IO-APIC RTE for remapping table update") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jason Andryuk [Thu, 17 Aug 2023 14:24:49 +0000 (16:24 +0200)]
tboot: Disable CET at shutdown
tboot_shutdown() calls into tboot to perform the actual system shutdown.
tboot isn't built with endbr annotations, and Xen has CET-IBT enabled on
newer hardware. shutdown_entry isn't annotated with endbr and Xen
faults:
Panic on CPU 0:
CONTROL-FLOW PROTECTION FAULT: #CP[0003] endbranch
And Xen hangs at this point.
Disabling CET-IBT let Xen and tboot power off, but reboot was
perfoming a poweroff instead of a warm reboot. Disabling all of CET,
i.e. shadow stacks as well, lets tboot reboot properly.
Fixes: cdbe2b0a1aec ("x86: Enable CET Indirect Branch Tracking") Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Jan Beulich [Thu, 17 Aug 2023 14:24:17 +0000 (16:24 +0200)]
libxl: slightly correct JSON generation of CPU policy
The "cpuid_empty" label is also (in principle; maybe only for rubbish
input) reachable in the "cpuid_only" case. Hence the label needs to live
ahead of the check of the variable.
Fixes: 5b80cecb747b ("libxl: introduce MSR data in libxl_cpuid_policy") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Nicola Vetrini [Mon, 14 Aug 2023 09:05:30 +0000 (11:05 +0200)]
x86: address MISRA C:2012 Rule 5.3
Address some occurrences of shadowing between the global
variable 'e820' in 'xen/arch/x86/e820.c' and the function
parameter name of 'e820_add_range'.
Since the function is only ever called with the global variable
as the actual parameter, so there is no need to have it as a parameter
because both are defined in the same file (mentioned above).
This in turn causes several other functions to lose their parameter
'e820' because they are involved in the call chain that leads to
'e820_add_range'.
Similarly, 'kexec_reserve_area' is only ever called with the static
variable 'boot_e820' as a parameter, which is defined in the same file
as the function, thus it does not need that parameter, which is a cause
of shadowing, as explained above.
Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Nicola Vetrini [Mon, 14 Aug 2023 08:53:06 +0000 (10:53 +0200)]
xen/arm: traps: remove unused function 'dump_guest_s1_walk'
The function has no uses in the codebase, and can be removed.
This also avoids the violation of MISRA C:2012 Rule 8.4 and Rule 2.1
because it has no declaration and the function's code is unreachable.