xenbits.xensource.com Git - people/andrewcoop/xen.git/log
3 months ago: dbg (xen-fred)
Andrew Cooper [Fri, 27 Dec 2024 22:21:46 +0000 (22:21 +0000)]
dbg

3 months ago: entry fred
Andrew Cooper [Tue, 24 Dec 2024 23:03:06 +0000 (23:03 +0000)]
entry fred

3 months ago: opt-fred
Andrew Cooper [Sat, 28 Dec 2024 16:39:55 +0000 (16:39 +0000)]
opt-fred

3 months ago: edata + stack layout
Andrew Cooper [Mon, 30 Dec 2024 16:42:57 +0000 (16:42 +0000)]
edata + stack layout

3 months ago: load-sys
Andrew Cooper [Wed, 1 Jan 2025 15:16:33 +0000 (15:16 +0000)]
load-sys

3 months ago: x86/traps: Simplify early exception setup
Andrew Cooper [Mon, 6 Jan 2025 06:50:23 +0000 (06:50 +0000)]
x86/traps: Simplify early exception setup

Something which did not occur to me last time I

3 months ago: x86/traps: Fold init_idt_traps() and trap_init() into their single callers
Andrew Cooper [Wed, 1 Jan 2025 13:23:15 +0000 (13:23 +0000)]
x86/traps: Fold init_idt_traps() and trap_init() into their single callers

3 months ago: x86/traps: Introduce new init APIs
Andrew Cooper [Sat, 28 Dec 2024 14:56:40 +0000 (14:56 +0000)]
x86/traps: Introduce new init APIs

3 months ago: x86/traps: Move percpu_traps_init() into traps-init.c
Andrew Cooper [Tue, 31 Dec 2024 15:56:34 +0000 (15:56 +0000)]
x86/traps: Move percpu_traps_init() into traps-init.c

3 months ago: x86/traps: Move cpu_init() out of trap_init()
Andrew Cooper [Mon, 6 Jan 2025 06:36:34 +0000 (06:36 +0000)]
x86/traps: Move cpu_init() out of trap_init()

3 months ago: x86/traps: Convert pv_trap_init() to being an initcall
Andrew Cooper [Fri, 3 Jan 2025 17:17:38 +0000 (17:17 +0000)]
x86/traps: Convert pv_trap_init() to being an initcall

With most of pv_trap_init() being done at build time, opening of NMI_SOFTIRQ
can be a regular initcall, simplifying trap_init().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/idt: Don't rewrite bsp_idt[] at boot time
Andrew Cooper [Fri, 3 Jan 2025 15:16:45 +0000 (15:16 +0000)]
x86/idt: Don't rewrite bsp_idt[] at boot time

Now that bsp_idt[] is constructed at build time, we do not need to manually
initialise it in init_idt_traps() and trap_init().

The only edit needed to the bsp_idt[] is to switch from the early #PF handler
to the normal one, and this can be done using _update_gate_addr_lower() as we
do on the kexec path for NMI and #MC.
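A minimal sketch of why rewriting only the lower half of a gate can be enough, assuming the 64-bit IDT gate layout from the x86 manuals: the handler offset is split across bits 0-15 and 48-63 of the low qword, with offset[63:32] in bits 0-31 of the high qword. The `idt_entry_t` type and all helper names are hypothetical stand-ins, not Xen's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte 64-bit IDT gate, as two 64-bit halves. */
typedef struct { uint64_t a, b; } idt_entry_t;

/* Bits 16-47 of the low half hold the selector, IST, type, DPL and P. */
#define GATE_LOW_KEEP 0x0000ffffffff0000ULL

static void write_gate_addr(idt_entry_t *e, uint64_t addr)
{
    e->a = (e->a & GATE_LOW_KEEP) |
           (addr & 0xffffULL) | ((addr & 0xffff0000ULL) << 32);
    e->b = (e->b & 0xffffffff00000000ULL) | (addr >> 32);
}

/*
 * Only touch the low half.  Valid when the old and new handlers share
 * offset[63:32], i.e. both live in the same 4G region of the image.
 */
static void update_gate_addr_lower(idt_entry_t *e, uint64_t addr)
{
    assert((e->b & 0xffffffffULL) == (addr >> 32));
    e->a = (e->a & GATE_LOW_KEEP) |
           (addr & 0xffffULL) | ((addr & 0xffff0000ULL) << 32);
}

static uint64_t read_gate_addr(const idt_entry_t *e)
{
    return (e->a & 0xffffULL) | ((e->a >> 32) & 0xffff0000ULL) |
           ((e->b & 0xffffffffULL) << 32);
}
```

Because the attribute bits are masked off and preserved, swapping the early #PF handler for the normal one leaves the selector and gate type untouched.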

This in turn allows us to drop set_{intr,swint}_gate() and the underlying
infrastructure.  It also lets us drop autogen_entrypoints[] and that
underlying infrastructure.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/idt: Generate bsp_idt[] at build time
Andrew Cooper [Fri, 3 Jan 2025 14:44:19 +0000 (14:44 +0000)]
x86/idt: Generate bsp_idt[] at build time

... rather than dynamically at boot time.  Aside from less runtime overhead,
this approach is less fragile than the preexisting autogen stubs mechanism.

We can manage this with some linker calculations.  See patch comments for full
details.

For simplicity, we create a new set of entry stubs here, and clean up the old
ones in the subsequent patch.  bsp_idt[] needs to move from .bss to .data.

No functional change yet; the boot path still rewrites bsp_idt[] at this
juncture.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/idt: Make idt_tables[] be per_cpu(idt)
Andrew Cooper [Thu, 2 Jan 2025 17:47:24 +0000 (17:47 +0000)]
x86/idt: Make idt_tables[] be per_cpu(idt)

This can be a plain per_cpu() variable, and __read_mostly seeing as it's
allocated once and never touched again.

This removes an NR_CPUS-sized array, and improves NUMA locality of access
for both the VT-x and SVM context switch paths.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/idt: Rename idt_table[] to bsp_idt[]
Andrew Cooper [Thu, 2 Jan 2025 17:17:30 +0000 (17:17 +0000)]
x86/idt: Rename idt_table[] to bsp_idt[]

Having variables named idt_table[] and idt_tables[] is less than clear.

Use X86_IDT_VECTORS and remove IDT_ENTRIES.  State the size of bsp_idt[] in
idt.h so that load_system_tables() and cpu_smpboot_alloc() can use sizeof()
rather than opencoding the calculation.
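A minimal sketch of the sizeof() point, assuming a 16-byte entry type (a hypothetical stand-in for the real one) and 256 vectors. Once the bound is visible in the declaration, users can write `sizeof(bsp_idt)` rather than repeating `X86_IDT_VECTORS * sizeof(idt_entry_t)`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define X86_IDT_VECTORS 256

/* Hypothetical stand-in for the 16-byte IDT entry type. */
typedef struct { uint64_t a, b; } idt_entry_t;

/* In a header this would be: extern idt_entry_t bsp_idt[X86_IDT_VECTORS]; */
static idt_entry_t bsp_idt[X86_IDT_VECTORS];

/* e.g. seeding another CPU's table from the BSP's, with no open-coded size. */
static void copy_bsp_idt(idt_entry_t *dst)
{
    memcpy(dst, bsp_idt, sizeof(bsp_idt));
}
```

The same expression gives the IDTR limit as `sizeof(bsp_idt) - 1`.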

Move the variable into a new traps-init.c, to make a start at splitting
traps.c in half.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/idt: Rename X86_NR_VECTORS to X86_IDT_VECTORS
Andrew Cooper [Thu, 2 Jan 2025 16:56:59 +0000 (16:56 +0000)]
x86/idt: Rename X86_NR_VECTORS to X86_IDT_VECTORS

Observant readers may have noticed that the FRED spec has another 8 bits of
space reserved immediately following the vector field.

Make the existing constant more precise.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/idt: Move IDT related content into idt.h
Andrew Cooper [Wed, 1 Jan 2025 15:43:20 +0000 (15:43 +0000)]
x86/idt: Move IDT related content into idt.h

Logic concerning the IDT is somewhat different to the other system tables, and
in particular ought not to be in asm/processor.h.  Collect it together from
asm/processor.h and asm/desc.h into a new header.

Adjust set_ist() to use volatile rather than ACCESS_ONCE(), as
_write_gate_lower() already does, which avoids needing to include xen/lib.h.
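A minimal sketch of a volatile-based `set_ist()`, assuming the architectural placement of the IST index in bits 32-34 of a gate's low qword. The entry type is a hypothetical stand-in, and this is not a copy of Xen's helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16-byte IDT entry type. */
typedef struct { uint64_t a, b; } idt_entry_t;

#define IST_MAX 7u

static void set_ist(idt_entry_t *idt, unsigned int ist)
{
    /* Store through a volatile lvalue, the job ACCESS_ONCE() would do. */
    volatile uint64_t *lower = &idt->a;

    assert(ist <= IST_MAX);
    *lower = (idt->a & ~(7ULL << 32)) | ((uint64_t)ist << 32);
}
```

The volatile qualifier stops the compiler from dropping or merging the store; no extra header is needed for it.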

Move the BUILD_BUG_ON() from subarch_percpu_traps_init() into mm.c's
build_assertions(), rather than including idt.h into x86_64/traps.c.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: hdr-srt
Andrew Cooper [Wed, 1 Jan 2025 15:51:57 +0000 (15:51 +0000)]
hdr-srt

domain.c: Remove duplicate asm/spec_ctrl.h include
power.c: Remove duplicate xen/sched.h include
setup.c: Remove duplicate xen/serial.h include

3 months ago: FRED vmx enum
Andrew Cooper [Tue, 24 Dec 2024 22:47:47 +0000 (22:47 +0000)]
FRED vmx enum

3 months ago: x86: FRED enumerations
Andrew Cooper [Fri, 18 Sep 2020 15:50:15 +0000 (16:50 +0100)]
x86: FRED enumerations

3 months ago: docs: FRED support in Xen
Andrew Cooper [Sat, 28 Dec 2024 16:45:14 +0000 (16:45 +0000)]
docs: FRED support in Xen

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
An initial RFC discussion and plan.  Open TODOs are at the end.

I've got an 8-patch series doing the cpu_user_regs disentangling vs the public
API.  That's in pretty good shape now.

FRED itself is orders and orders of magnitude simpler than IDT, both in
terms of setup and operation, but I'm in the middle of a very large
cleanup (35 patches and counting) to setup.c and traps.c in order to make FRED
able to be cleanly integrated into Xen, and that's still before any of the GS
changes to keep PV guests functioning correctly.

3 months ago: ---abi---
Andrew Cooper [Mon, 30 Dec 2024 13:53:15 +0000 (13:53 +0000)]
---abi---

3 months ago: drop-vm86
Andrew Cooper [Sun, 29 Dec 2024 14:46:34 +0000 (14:46 +0000)]
drop-vm86

3 months ago: fold
Andrew Cooper [Sun, 29 Dec 2024 14:30:17 +0000 (14:30 +0000)]
fold

3 months ago: ABI rename
Andrew Cooper [Sun, 29 Dec 2024 17:40:54 +0000 (17:40 +0000)]
ABI rename

3 months ago: x86/emul: Adjust put_fpu()
Andrew Cooper [Mon, 30 Dec 2024 16:31:46 +0000 (16:31 +0000)]
x86/emul: Adjust put_fpu()

The use of regs->?s here is buggy in almost all cases.  For HVM guests,
they're poison from hvm_sanitize_regs_fields(), and for PV guests the data
segment selectors are stale from the last context switch.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
I'm honestly not sure what this path is doing here.  The only users of FPU
emulation are HVM guests, which have a working read_segment().

3 months ago: x86/pv: Store the data segment selectors outside of cpu_user_regs
Andrew Cooper [Mon, 30 Dec 2024 15:50:56 +0000 (15:50 +0000)]
x86/pv: Store the data segment selectors outside of cpu_user_regs

In order to support FRED, we're going to have to remove the {ds..gs} fields
from struct cpu_user_regs.  This will impact v->arch.user_regs.

These fields are unused for HVM guests, but for PV hold the selector values
when the vCPU is scheduled out.

Introduce new fields for the selectors in struct pv_vcpu, and update:

 * {save,load}_segments(), context switching
 * arch_{get,set}_info_guest(), hypercalls
 * vcpu_show_registers(), diagnostics
 * dom0_construct(), PV dom0

to use the new storage.  This removes the final user of read_sregs() so drop
it too.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/domctl: Stop using XLAT_cpu_user_regs()
Andrew Cooper [Mon, 30 Dec 2024 11:49:14 +0000 (11:49 +0000)]
x86/domctl: Stop using XLAT_cpu_user_regs()

In order to support FRED, we're going to have to remove the {ds..gs} fields
from struct cpu_user_regs, meaning that it is going to have to become a
different type to the structure embedded in vcpu_guest_context_u.

In both arch_{get,set}_info_guest(), expand the memcpy()/XLAT_cpu_user_regs()
to copy the fields individually.  This will allow us to eventually make them
different types.

No practical change.  The compat cases are identical, while the non-compat
cases no longer copy _pad fields.
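A minimal sketch of why copying named fields decouples the two types where memcpy() cannot. Both structures are hypothetical, heavily cut-down layouts: the guest-visible one keeps the selector and padding fields, the internal one has already dropped them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guest-visible ABI layout, selectors still present. */
struct guest_user_regs {
    uint64_t rip, rsp, rflags;
    uint16_t cs, _pad0[3];
    uint16_t ds, es, fs, gs;
};

/* Hypothetical internal layout after the selectors have been removed. */
struct internal_user_regs {
    uint64_t rip, rsp, rflags;
    uint16_t cs;
};

/* Field-by-field copy: no layout coupling, and _pad fields are skipped. */
static void regs_to_guest(struct guest_user_regs *g,
                          const struct internal_user_regs *r)
{
    g->rip = r->rip;
    g->rsp = r->rsp;
    g->rflags = r->rflags;
    g->cs = r->cs;
}
```

A memcpy() between these two types would silently misplace or truncate data; the explicit copy keeps compiling and stays correct as the layouts diverge.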

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
Should we really be copying error_code/entry_vector?  They're already listed
as explicitly private fields, and I don't think anything good can come of
providing/consuming them to/from the guest.

3 months ago: Revert "x86/traps: 'Fix' safety of read_registers() in #DF path"
Andrew Cooper [Mon, 30 Dec 2024 14:07:18 +0000 (14:07 +0000)]
Revert "x86/traps: 'Fix' safety of read_registers() in #DF path"

This reverts commit 6065a05adf152a556fb9f11a5218c89e41b62893.

The discussed "proper fix" has now been implemented, and the #DF path no
longer writes out-of-bounds.  Restore the proper #DF IST pointer.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/traps: Avoid OoB accesses to print the data selectors
Andrew Cooper [Sun, 29 Dec 2024 19:06:10 +0000 (19:06 +0000)]
x86/traps: Avoid OoB accesses to print the data selectors

_show_registers() prints the data selectors from struct cpu_user_regs, but
these fields are sometimes out-of-bounds.  See commit 6065a05adf15
("x86/traps: 'Fix' safety of read_registers() in #DF path").

There are 3 callers of _show_registers():

 1. vcpu_show_registers(), which always operates on a scheduled-out vCPU,
    where v->arch.user_regs (or aux_regs on the stack) is always in-bounds.

 2. show_registers() where regs is always an on-stack frame.  regs is copied
    into a local variable first (which is an OoB read for constructs such as
    WARN()), before being modified (so no OoB write).

 3. do_double_fault(), where regs is adjacent to the stack guard page, and
    written into directly.  This is an out-of-bounds read and write, with a
    bodge to avoid the writes hitting the guard page.

Furthermore, these fields are a vestigial remnant of vm86 mode, and need to
be changed in order to support FRED.

Therefore, include the data segment selectors in struct extra_state, and use
those fields instead of the fields in regs.  This resolves the OoB write on
the #DF path.  The OoB read in show_registers() is resolved by doing a partial
memcpy() rather than full structure copy.
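A minimal sketch of the partial copy, assuming a hypothetical frame type whose trailing selector fields may lie beyond the real stack frame. Using `offsetof()` as the length copies only the in-bounds prefix, so the selector bytes are never read; their values would instead come from a separate structure such as `extra_state`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame: the selector fields may be out-of-bounds. */
struct frame {
    uint64_t rax, rip, rsp;
    uint16_t ds, es, fs, gs;
};

/* Copy only up to (not including) the first possibly-OoB field. */
static void snapshot_frame(struct frame *dst, const struct frame *src)
{
    memcpy(dst, src, offsetof(struct frame, ds));
}
```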

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/traps: Rework register state printing to use an extra_state struct
Andrew Cooper [Sun, 29 Dec 2024 19:23:03 +0000 (19:23 +0000)]
x86/traps: Rework register state printing to use an extra_state struct

... in preference to the crs[8] array.  This avoids abusing crs[5..7] for the
fs/gs bases, giving them proper named fields instead, and avoids storage for
cr1 which doesn't exist.

In show_registers(), remove a redundant read_cr2().  read_registers() already
did the same, and it is only the PV path which needs to override with
arch_get_cr2().

In vcpu_show_registers(), express the gsb/gss decision using SWAP().  The
determination is going to get even more complicated under FRED.
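A minimal sketch of expressing the GS base choice with a swap, assuming a PV vCPU stores separate user and kernel GS bases and that guest kernel mode means the kernel base is the active one. The `SWAP()` macro here is a local stand-in (GNU C `__typeof__`), and the structure and field names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SWAP(a, b) do { __typeof__(a) t_ = (a); (a) = (b); (b) = t_; } while ( 0 )

/* Hypothetical view of a scheduled-out PV vCPU. */
struct pv_vcpu_view {
    uint64_t gs_base_kernel, gs_base_user;
    bool kernel_mode;
};

struct gs_state { uint64_t gsb, gss; };   /* active base, shadow base */

static struct gs_state gs_for_display(const struct pv_vcpu_view *v)
{
    /* Guest user mode: user base active, kernel base in the shadow. */
    struct gs_state s = { .gsb = v->gs_base_user, .gss = v->gs_base_kernel };

    /* Guest kernel mode: the other way round. */
    if ( v->kernel_mode )
        SWAP(s.gsb, s.gss);

    return s;
}
```

Starting from one default and conditionally swapping keeps a single assignment site, which is easier to extend than two mirrored branches.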

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: ---cleanup---
Andrew Cooper [Mon, 30 Dec 2024 13:53:09 +0000 (13:53 +0000)]
---cleanup---

3 months ago: x86/traps: Drop incorrect BUILD_BUG_ON() and comment in load_system_tables()
Andrew Cooper [Mon, 6 Jan 2025 10:59:19 +0000 (10:59 +0000)]
x86/traps: Drop incorrect BUILD_BUG_ON() and comment in load_system_tables()

It is only the hardware task switching mechanism which checks that a TSS is at
least 0x67 bytes long.  It is perfectly possible to load a shorter TSS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: drop textsection
Andrew Cooper [Wed, 1 Jan 2025 12:44:03 +0000 (12:44 +0000)]
drop textsection

3 months ago: x86/boot: Fix zap_low_mappings() to map less of the trampoline
Andrew Cooper [Tue, 31 Dec 2024 16:52:39 +0000 (16:52 +0000)]
x86/boot: Fix zap_low_mappings() to map less of the trampoline

Regular data access into the trampoline is via the directmap.

As now discussed quite extensively in asm/trampoline.h, the trampoline is
arranged so that only the AP and S3 paths need an identity mapping, and that
they fit within a single page.

Right now, PFN_UP(trampoline_end - trampoline_start) is 2, causing more than
expected of the trampoline to be mapped.  Cut it down to just the single page
it ought to be.
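A short worked example of the rounding, assuming the usual round-up definition of `PFN_UP()` with 4k pages. Any region even one byte larger than a page rounds up to 2 pages, which is how the whole trampoline can end up identity-mapped when only its first page needs to be. The sizes below are illustrative, not the real trampoline size, and `pages_for()` is hypothetical.

```c
#include <assert.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Number of 4k pages needed to cover x bytes, rounding up. */
#define PFN_UP(x)  (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)

/* Hypothetical: pages an identity mapping of size bytes would cover. */
static unsigned long pages_for(unsigned long size)
{
    return PFN_UP(size);
}
```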

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
There's not an obvious candidate for a Fixes tag.

3 months ago: move activate_debugregs
Andrew Cooper [Fri, 3 Jan 2025 15:19:49 +0000 (15:19 +0000)]
move activate_debugregs

3 months ago: x86/traps: Move guest_{rd,wr}msr_xen() into msr.c
Andrew Cooper [Tue, 31 Dec 2024 11:02:49 +0000 (11:02 +0000)]
x86/traps: Move guest_{rd,wr}msr_xen() into msr.c

They are out of place in traps.c, and only have a single caller each.  Make
them static inside msr.c.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: x86/traps: Move cpuid_hypervisor_leaves() into cpuid.c
Andrew Cooper [Tue, 31 Dec 2024 10:56:00 +0000 (10:56 +0000)]
x86/traps: Move cpuid_hypervisor_leaves() into cpuid.c

It's out of place in traps.c, and only has a single caller.  Make it static
inside cpuid.c.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: rename MSR_INTERRUPT_SSP_TABLE -> MSR_ISST
Andrew Cooper [Tue, 24 Dec 2024 22:03:49 +0000 (22:03 +0000)]
rename MSR_INTERRUPT_SSP_TABLE -> MSR_ISST

3 months ago: x86/elf: Improve code generation in elf_core_save_regs()
Andrew Cooper [Sun, 29 Dec 2024 14:06:18 +0000 (14:06 +0000)]
x86/elf: Improve code generation in elf_core_save_regs()

A CALL with 0 displacement is handled specially, and is why this logic
functions even with CET Shadow Stacks active, but a rip-relative LEA is the
more normal way of doing this in 64bit code.

The retrieval of flags modifies the stack pointer so needs to state a
dependency on the stack pointer.  Despite its name, ASM_CALL_CONSTRAINT is
the way to do this.

read_sreg() forces the answer through a register, causing code generation of
the form:

    mov    %gs, %eax
    mov    %eax, %eax
    mov    %rax, 0x140(%rsi)

Encode the reads directly with a memory operand.  This results in a 16bit
store instead of a 64bit store, but the backing memory is zeroed.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: xen: Don't cast away const-ness in vcpu_show_registers()
Andrew Cooper [Mon, 30 Dec 2024 06:41:46 +0000 (06:41 +0000)]
xen: Don't cast away const-ness in vcpu_show_registers()

The final hunk is `(struct vcpu *)v` expressed using a runtime pointer chase
through memory and a technicality of the C type system.

For anyone interested, this is one reason why C cannot optimise any reads
across sequence points, even for a function purporting to take a const object.

Anyway, have the function correctly state that it needs a mutable vcpu.  All
callers have a mutable vCPU to hand, and it removes the runtime pointer chase
in x86.
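A minimal sketch of the "runtime pointer chase" pattern, using hypothetical cut-down `vcpu` and `domain` structures. The function promises not to modify `*v`, yet legally recovers a mutable pointer to the same object through `v->domain->vcpu[v->vcpu_id]`. Because this is valid C, a compiler cannot assume `*v` is unchanged across such a call.

```c
#include <assert.h>

struct domain;

/* Hypothetical, minimal shapes of the real structures. */
struct vcpu {
    unsigned int vcpu_id;
    struct domain *domain;
    unsigned long counter;
};

struct domain {
    struct vcpu *vcpu[4];
};

/* Claims const, but mutates the object via a second, non-const path. */
static void poke(const struct vcpu *v)
{
    struct vcpu *mut = v->domain->vcpu[v->vcpu_id];

    mut->counter++;
}
```

Declaring the parameter as `struct vcpu *` states the real requirement and removes the loads through `domain`.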

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: NOPOST force perfc
Andrew Cooper [Sun, 29 Dec 2024 18:12:49 +0000 (18:12 +0000)]
NOPOST force perfc

3 months ago: xen/perfc: Cleanup
Andrew Cooper [Sun, 29 Dec 2024 19:36:34 +0000 (19:36 +0000)]
xen/perfc: Cleanup

 * Strip trailing whitespace.
 * Remove PRIperfc.  It has never been used and doesn't make sense in context.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: xen/perfc: Trim includes
Andrew Cooper [Sun, 29 Dec 2024 18:01:34 +0000 (18:01 +0000)]
xen/perfc: Trim includes

This is mostly for the removal of xen/lib.h and xen/smp.h from perfc.h.  All
that is needed is xen/macros.h.

Trim and sort the includes for perfc.c too.  There's no need for smp.h,
keyhandler.h or mm.h, but cpumask.h is needed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: xen/perfc: Add perfc_defn.h to asm-generic
Andrew Cooper [Sun, 29 Dec 2024 18:18:22 +0000 (18:18 +0000)]
xen/perfc: Add perfc_defn.h to asm-generic

... and hook it up for RISC-V and PPC.

On RISC-V at least, no combination of headers pulls in errno.h, so include it
explicitly.

Guard the hypercalls array declaration based on NR_hypercalls existing.  This
is sufficient to get PERF_COUNTERS fully working on RISC-V and PPC, so drop
the randconfig override.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: xen/perfc: Drop arch_perfc_{gather,reset}()
Andrew Cooper [Sun, 29 Dec 2024 18:31:32 +0000 (18:31 +0000)]
xen/perfc: Drop arch_perfc_{gather,reset}()

These were only ever used by the IA64 port, which was dropped in commit
570c311ca2c7 ("remove ia64").

Remove them, and clean up the arm/x86 stub headers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

3 months ago: ---infra---
Andrew Cooper [Tue, 31 Dec 2024 15:17:59 +0000 (15:17 +0000)]
---infra---

3 months ago: make hd.img
Andrew Cooper [Wed, 25 Dec 2024 01:49:48 +0000 (01:49 +0000)]
make hd.img

3 months ago: x86/amd: Misc setup for Fam1Ah processors
Andrew Cooper [Tue, 31 Dec 2024 14:15:22 +0000 (14:15 +0000)]
x86/amd: Misc setup for Fam1Ah processors

Fam1Ah is similar to Fam19h in these regards.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
With this patch, I think we're in an ok position to declare support on Zen5
CPUs.  I'm very disappointed that AMD don't have any documentation about ERAPS,
but to the best of my (backchannel) knowledge, Xen should behave safely.

3 months ago: x86/pv: Fix build with Clang and CONFIG_PERF_COUNTERS
Andrew Cooper [Thu, 2 Jan 2025 19:46:19 +0000 (19:46 +0000)]
x86/pv: Fix build with Clang and CONFIG_PERF_COUNTERS

Clang, of at least version 17, complains:

  arch/x86/pv/hypercall.c:30:10: error: variable 'eax' is used uninitialized
  whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
     30 |     if ( !compat )
        |          ^~~~~~~
  arch/x86/pv/hypercall.c:87:29: note: uninitialized use occurs here
     87 |     perfc_incra(hypercalls, eax);
        |                             ^~~

This function is forced always_inline to cause compat to be
constant-propagated through, but that is only a heuristic to try and get the
compiler to do what we want, not a guarantee that it does.

Clang doesn't appear to be able to see that the only case where compat is
true (and therefore the if() is false) is when there's an else clause on the
end which sets eax too.

Initialise eax to -1, which ought to be optimised out, but if for whatever
reason it happens not to be, then perfc_incra() will fail its bounds check
and do nothing.
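A minimal sketch of why `-1` is a safe defensive initialiser for an unsigned index guarding a bounded counter array. The counter array, its size, and both helpers are hypothetical, written only in the spirit of `perfc_incra()`.

```c
#include <assert.h>

#define NR_HYPERCALLS 64   /* hypothetical bound */

static unsigned long hypercall_counts[NR_HYPERCALLS];

/* Bounded increment: out-of-range indices are silently ignored. */
static void count_hypercall(unsigned int idx)
{
    if ( idx < NR_HYPERCALLS )
        hypercall_counts[idx]++;
}

static void dispatch(int compat, unsigned int native_nr, unsigned int compat_nr)
{
    /*
     * Defensive initialiser: converts to UINT_MAX, so if a compiler cannot
     * prove that one of the branches below always runs, the value it falls
     * back on fails the bounds check rather than indexing the array.
     */
    unsigned int eax = -1;

    if ( !compat )
        eax = native_nr;
    else
        eax = compat_nr;

    count_hypercall(eax);
}
```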

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

3 months ago: x86/traps: Rework LER initialisation and support Zen5/Diamond Rapids
Andrew Cooper [Tue, 31 Dec 2024 14:06:19 +0000 (14:06 +0000)]
x86/traps: Rework LER initialisation and support Zen5/Diamond Rapids

AMD have always used the architectural MSRs for LER.  As the first processor
to support LER was the K7 (which was 32bit), we can assume its presence
unconditionally in 64bit mode.

Intel are about to run out of space in Family 6 and start using 19.  It is
only the Pentium 4 which uses non-architectural LER MSRs.

percpu_traps_init(), which runs on every CPU, contains a lot of code which
should be init-only, and is the only reason why opt_ler can't be in initdata.

Write a brand new init_ler() which expects all future Intel and AMD CPUs to
continue using the architectural MSRs, and does all setup together.  Call it
from trap_init(), and remove the setup logic from percpu_traps_init() except for
the single path configuring MSR_IA32_DEBUGCTLMSR.
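A minimal sketch of the MSR selection policy described above, assuming the architectural "LER from" MSR at 0x1dd and the Pentium 4 specific one at 0x1d7. The vendor enum and `ler_from_msr()` are hypothetical and are not Xen's `init_ler()`.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_LASTINTFROMIP 0x000001ddu   /* architectural */
#define MSR_P4_LER_FROM_LIP    0x000001d7u   /* Pentium 4 only */

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/*
 * Return the "LER from" MSR to use, or 0 if LER is unsupported.  Every
 * 64-bit AMD CPU, and every Intel CPU except the Pentium 4 (family 0xf),
 * is assumed to use the architectural MSRs, including future families.
 */
static uint32_t ler_from_msr(enum vendor v, unsigned int family)
{
    switch ( v )
    {
    case VENDOR_AMD:
        return MSR_IA32_LASTINTFROMIP;
    case VENDOR_INTEL:
        return family == 0xf ? MSR_P4_LER_FROM_LIP : MSR_IA32_LASTINTFROMIP;
    default:
        return 0;
    }
}
```

Defaulting new families to the architectural MSRs means a new family number, such as Intel's move to 19, needs no new model table entry.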

Leave behind a warning if the user asked for LER and Xen couldn't enable it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

4 months ago: eclair-analysis: tidy toolchain.ecl configuration and mark Rule 1.1 clean
Nicola Vetrini [Sun, 22 Dec 2024 14:04:08 +0000 (15:04 +0100)]
eclair-analysis: tidy toolchain.ecl configuration and mark Rule 1.1 clean

Reformat the list of GNU extensions and non-standard tokens used by Xen
in the ECLAIR configuration to make it easier to review any changes to it.

The extension "ext_missing_varargs_arg", which captures the GNU extension that
allows variadic functions and macros not to require at least one named parameter
before C23, has been renamed to "ext_c_missing_varargs_arg" in the current version
of ECLAIR used in CI, therefore this resolves regressions on MISRA C Rule 1.1:

"The program shall contain no violations of the standard C syntax and constraints,
and shall not exceed the implementation's translation limits."

As a result, Rule 1.1 now has no violations and is tagged as such.

Remove two unused configurations, that were already commented out.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Fixes: 631f535a3d4f ("xen: update ECLAIR service identifiers from MC3R1 to MC3A2.")
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

4 months ago: xen/scripts: Fix regex syntax warnings with Python 3.12
Ariel Otilibili [Thu, 19 Dec 2024 18:10:43 +0000 (19:10 +0100)]
xen/scripts: Fix regex syntax warnings with Python 3.12

Same fix as commit 826a9eb072 (tools: Fix regex syntax warnings with Python 3.12).

It clears out the warning:

```
$ xen/scripts/xen-analysis.py
xen/scripts/xen_analysis/cppcheck_analysis.py:94: SyntaxWarning: invalid escape sequence '\*'
  comment_line_starts = re.match('^[ \t]*/\*.*$', line)
```

The warning appears only the first time the command is run; after that, it disappears.

Fixes: 02b26c02c7 (xen/scripts: add cppcheck tool to the xen-analysis.py script)
Signed-off-by: Ariel Otilibili <Ariel.Otilibili-Anieli@eurecom.fr>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
--
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Anthony PERARD <anthony.perard@vates.tech>
Cc: Michal Orzel <michal.orzel@amd.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>

4 months ago: x86/spec-ctrl: Support for SRSO_U/S_NO and SRSO_MSR_FIX
Andrew Cooper [Mon, 25 Mar 2024 15:14:46 +0000 (15:14 +0000)]
x86/spec-ctrl: Support for SRSO_U/S_NO and SRSO_MSR_FIX

AMD have updated the SRSO whitepaper[1] with further information.  These
features exist on AMD Zen5 CPUs and are necessary for Xen to use.

The two features are in principle unrelated:

 * SRSO_U/S_NO is an enumeration saying that SRSO attacks can't cross the
   User(CPL3) / Supervisor(CPL<3) boundary.  i.e. Xen doesn't need to use
   IBPB-on-entry for PV64.  PV32 guests are explicitly unsupported for
   speculative issues, and excluded from consideration for simplicity.

 * SRSO_MSR_FIX is an enumeration identifying that the BP_SPEC_REDUCE bit is
   available in MSR_BP_CFG.  When set, SRSO attacks can't cross the host/guest
   boundary.  i.e. Xen doesn't need to use IBPB-on-entry for HVM.

Extend ibpb_calculations() to account for these when calculating
opt_ibpb_entry_{pv,hvm} defaults.  Add a `bp-spec-reduce=<bool>` option to
control the use of BP_SPEC_REDUCE, with it active by default.
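A simplified sketch of how the two enumerations could feed the SRSO part of the IBPB-on-entry defaults. It considers only these inputs: the real `ibpb_calculations()` weighs many more factors, and every name here is hypothetical. PV64 guest code (kernel and user alike) runs at CPL3, so SRSO_U/S_NO covers the guest-to-Xen boundary; HVM needs BP_SPEC_REDUCE to be both available and enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs. */
struct srso_caps {
    bool srso_us_no;      /* SRSO attacks can't cross CPL3 -> CPL<3 */
    bool srso_msr_fix;    /* BP_SPEC_REDUCE available in MSR_BP_CFG */
    bool bp_spec_reduce;  /* bp-spec-reduce=<bool>, default true */
};

struct ibpb_defaults { bool pv, hvm; };

/* Only the SRSO contribution to the IBPB-on-entry defaults. */
static struct ibpb_defaults srso_ibpb_defaults(const struct srso_caps *c)
{
    struct ibpb_defaults d = {
        .pv  = !c->srso_us_no,
        .hvm = !(c->srso_msr_fix && c->bp_spec_reduce),
    };

    return d;
}
```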

Because MSR_BP_CFG is core-scoped with a race condition updating it, repurpose
amd_check_erratum_1485() into amd_check_bp_cfg() and calculate all updates at
once.

Xen also needs to advertise SRSO_U/S_NO to guests to allow the guest kernel
to skip SRSO mitigations too:

 * This is trivial for HVM guests.  It is also accurate for PV32 guests
   too, but we have already excluded them from consideration, and do so again
   here to simplify the policy logic.

 * As written, SRSO_U/S_NO does not help for the PV64 user->kernel boundary.
   However, after discussing with AMD, an implementation detail of having
   BP_SPEC_REDUCE active causes the PV64 user->kernel boundary to have the
   property described by SRSO_U/S_NO, so we can advertise SRSO_U/S_NO to
   guests when the BP_SPEC_REDUCE precondition is met.

Finally, fix a typo in the SRSO_NO's comment.

[1] https://www.amd.com/content/dam/amd/en/documents/corporate/cr/speculative-return-stack-overflow-whitepaper.pdf
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

4 months ago: xen/arch/x86: make objdump output user locale agnostic
Maximilian Engelhardt [Mon, 30 Dec 2024 21:00:31 +0000 (22:00 +0100)]
xen/arch/x86: make objdump output user locale agnostic

The objdump output is fed to grep, so make sure it doesn't change with
different user locales and break the grep parsing.
This problem was identified while updating xen in Debian and the fix is
needed for generating reproducible builds in varying environments.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

4 months ago: tools: fix typo: subsytem -> subsystem
Maximilian Engelhardt [Mon, 30 Dec 2024 21:00:33 +0000 (22:00 +0100)]
tools: fix typo: subsytem -> subsystem

This was found by the lintian tool (Debian package checker) during
packaging xen for Debian.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

4 months ago: docs/man: fix typo: hexidecimal -> hexadecimal
Maximilian Engelhardt [Mon, 30 Dec 2024 21:00:32 +0000 (22:00 +0100)]
docs/man: fix typo: hexidecimal -> hexadecimal

This was found by the lintian tool (Debian package checker) during
packaging xen for Debian.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

4 months ago: docs/man/xen-vbd-interface.7: Provide properly-formatted NAME section
Ian Jackson [Mon, 30 Dec 2024 21:00:29 +0000 (22:00 +0100)]
docs/man/xen-vbd-interface.7: Provide properly-formatted NAME section

This manpage was omitted from
   docs/man: Provide properly-formatted NAME sections
   (423c4def1f7a01eeff56fa70564180640ef3af43)
because I was previously building with markdown not installed.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
Tested-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

4 months ago: CHANGELOG: Mention LLC coloring feature on Arm
Michal Orzel [Fri, 20 Dec 2024 08:19:40 +0000 (09:19 +0100)]
CHANGELOG: Mention LLC coloring feature on Arm

It's definitely worth mentioning as one of the most notable features on
Arm this release.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>

4 months ago: tools: Introduce a xc_xenver_buildid() wrapper
Andrew Cooper [Tue, 17 Jan 2023 12:52:01 +0000 (12:52 +0000)]
tools: Introduce a xc_xenver_buildid() wrapper

... which converts binary content to hex automatically.

Update libxl to match.  No API/ABI change.

This removes a latent libxl bug for cases when the buildid is longer than 4092
bytes.
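A minimal sketch of the binary-to-hex step such a wrapper performs, sizing the output from the actual input length instead of a fixed buffer. The helper name and signature are hypothetical and do not reflect the libxc API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Render len bytes as lowercase hex into out, which must have room for
 * 2 * len + 1 chars.  Returns out.
 */
static char *bin_to_hex(char *out, const unsigned char *in, size_t len)
{
    static const char digits[] = "0123456789abcdef";

    for ( size_t i = 0; i < len; i++ )
    {
        out[2 * i]     = digits[in[i] >> 4];
        out[2 * i + 1] = digits[in[i] & 0xf];
    }
    out[2 * len] = '\0';

    return out;
}
```

A caller would allocate `2 * len + 1` bytes from the length the hypervisor reports, so there is no fixed upper bound on the build ID size.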

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

4 months ago: tools: Introduce a non-truncating xc_xenver_cmdline()
Andrew Cooper [Tue, 17 Jan 2023 12:47:44 +0000 (12:47 +0000)]
tools: Introduce a non-truncating xc_xenver_cmdline()

Update libxl to match.  No API/ABI change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

4 months ago: tools: Introduce a non-truncating xc_xenver_changeset()
Andrew Cooper [Tue, 17 Jan 2023 12:45:37 +0000 (12:45 +0000)]
tools: Introduce a non-truncating xc_xenver_changeset()

Update libxl and the ocaml stubs to match.  No API/ABI change in either.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

4 months ago: tools: Introduce a non-truncating xc_xenver_capabilities()
Andrew Cooper [Tue, 17 Jan 2023 12:39:48 +0000 (12:39 +0000)]
tools: Introduce a non-truncating xc_xenver_capabilities()

Update libxl and the ocaml stubs to match.  No API/ABI change in either.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

4 months ago: tools: Introduce a non-truncating xc_xenver_extraversion()
Andrew Cooper [Mon, 16 Jan 2023 16:56:17 +0000 (16:56 +0000)]
tools: Introduce a non-truncating xc_xenver_extraversion()

... which uses XENVER_extraversion2.

In order to do this sensibly, use manual hypercall buffer handling.  Not only
does this avoid an extra bounce buffer (we need to strip the xen_varbuf_t
header anyway), it's also shorter and easier to follow.
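A minimal sketch of stripping a length header and producing a NUL-terminated string in a single copy. The header here is a hypothetical `uint32_t` length, not the exact `xen_varbuf_t` layout, and the function is illustrative rather than the libxc code; it also rejects lengths that overrun the buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header: a byte count followed by unterminated data. */
struct varbuf_hdr {
    uint32_t len;
};

/* Returns a malloc()ed string, or NULL on a bad length or OOM. */
static char *varbuf_to_string(const void *raw, size_t raw_size)
{
    struct varbuf_hdr hdr;
    char *str;

    if ( raw_size < sizeof(hdr) )
        return NULL;
    memcpy(&hdr, raw, sizeof(hdr));
    if ( hdr.len > raw_size - sizeof(hdr) )
        return NULL;

    str = malloc((size_t)hdr.len + 1);
    if ( !str )
        return NULL;
    memcpy(str, (const char *)raw + sizeof(hdr), hdr.len);
    str[hdr.len] = '\0';

    return str;
}
```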

Update libxl and the ocaml stubs to match.  No API/ABI change in either.

With this change made, `xl info` can now correctly access a >15 char
extraversion:

  # xl info xen_version
  4.18-unstable+REALLY LONG EXTRAVERSION

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>

4 months ago: tools/libxc: Move xc_version() out of xc_private.c into its own file
Andrew Cooper [Mon, 16 Jan 2023 14:40:07 +0000 (14:40 +0000)]
tools/libxc: Move xc_version() out of xc_private.c into its own file

kexec-tools uses xc_version(), meaning that it is not a private API.  As we're
going to extend the functionality substantially, move it to its own file.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

4 months ago: xen/version: Misc style fixes
Andrew Cooper [Tue, 20 Dec 2022 16:45:23 +0000 (16:45 +0000)]
xen/version: Misc style fixes

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

4 months ago: xen/version: Fold build_id handling into xenver_varbuf_op()
Andrew Cooper [Tue, 3 Jan 2023 19:06:43 +0000 (19:06 +0000)]
xen/version: Fold build_id handling into xenver_varbuf_op()

struct xen_build_id and struct xen_varbuf are identical from an ABI point of
view, so XENVER_build_id can reuse xenver_varbuf_op() rather than having its
own almost identical copy of the logic.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

4 months ago: xen/version: Introduce non-truncating deterministically-signed XENVER_* subops
Andrew Cooper [Tue, 20 Dec 2022 13:12:52 +0000 (13:12 +0000)]
xen/version: Introduce non-truncating deterministically-signed XENVER_* subops

In XenServer, we have encountered problems caused by both XENVER_extraversion
and XENVER_commandline having fixed bounds.

More than just the invariant size, the APIs/ABIs are also broken by typedef-ing
an array, and by using an unqualified 'char', which has implementation-specific
signedness.

Provide brand new ops, which are capable of expressing variable length
strings, and mark the older ops as broken.

This fixes all issues around XENVER_extraversion being longer than 15 chars.
Further work beyond just this API is needed to remove other assumptions about
XENVER_commandline being 1023 chars long.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
---
Non-technical objections to this patch were raised, and subsequently rejected
by a community-wide vote.  The results of the vote had not been shared with
the community at the time of committing.

4 months ago | xen/version: Calculate xen_capabilities_info once at boot
Andrew Cooper [Fri, 13 Jan 2023 17:20:41 +0000 (17:20 +0000)]
xen/version: Calculate xen_capabilities_info once at boot

The arch_get_xen_caps() infrastructure is horribly inefficient for something
that is constant after features have been resolved on boot.

Every instance used snprintf() to format constants into a string (which gets
shorter when %d gets resolved!), which then got double-buffered on the stack.

Switch to using string literals with the "3.0" inserted - these numbers
haven't changed in 19 years; the Xen 3.0 release was Dec 5th 2005.

Use initcalls to format the data into xen_cap_info, which is deliberately not
of type xen_capabilities_info_t, because a 1k array is a silly overhead for
storing a maximum of 77 chars (the x86 version) and isn't liable to need any
more space in the foreseeable future.  RISC-V and PPC have their stubs dropped,
with the expectation that they won't carry this legacy interface forward.
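A minimal sketch of the "format once at boot" approach, assuming hypothetical boolean feature flags and a capability ordering modelled on the `xen-3.0-x86_64 hvm-3.0-x86_32 ...` string that xc_dom_boot_xen_init prints. It concatenates string literals into a small static buffer once, instead of running snprintf() into a 1k stack buffer on every hypercall.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical feature flags; in Xen these are resolved during boot. */
static bool pv64_ok = true, pv32_ok = true, hvm_ok = true;

/* Sized for the longest string we can build, not a 1k array. */
static char xen_cap_info[80];

/* Append one space-separated capability token. */
static void append_cap(const char *cap)
{
    if (xen_cap_info[0] != '\0')
        strcat(xen_cap_info, " ");
    strcat(xen_cap_info, cap);
}

/* Run once, like an initcall, after features are known. */
static void init_xen_cap_info(void)
{
    if (pv64_ok)
        append_cap("xen-3.0-x86_64");
    if (pv32_ok)
        append_cap("xen-3.0-x86_32p");
    if (hvm_ok) {
        append_cap("hvm-3.0-x86_32");
        append_cap("hvm-3.0-x86_32p");
        append_cap("hvm-3.0-x86_64");
    }
}
```

The hypercall handler then only needs to copy xen_cap_info out, so its cost no longer depends on string formatting.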

This speeds up the XENVER_capabilities hypercall, but the purpose of the
change is to allow us to introduce a better XENVER_* API that doesn't force
the use of a 1k buffer on the stack.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 months ago | MAINTAINERS: Add myself as maintainer for NXP S32G3
Andrei Cherechesu [Thu, 19 Dec 2024 11:23:15 +0000 (13:23 +0200)]
MAINTAINERS: Add myself as maintainer for NXP S32G3

Add myself as maintainer for NXP S32G3 SoCs Family,
and the S32 Linux Team as relevant reviewers list.

Signed-off-by: Andrei Cherechesu <andrei.cherechesu@nxp.com>
Acked-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | SUPPORT.md: Describe SCMI-SMC layer feature
Andrei Cherechesu [Thu, 19 Dec 2024 11:23:14 +0000 (13:23 +0200)]
SUPPORT.md: Describe SCMI-SMC layer feature

Describe the layer which forwards SCMI-over-SMC calls to EL3 FW when
they are issued by the hardware domain. If the SCMI firmware node is
not found in the host DT during initialization, the layer fails
silently, as it is not mandatory.

Trapping SCMI SMCs at EL2 now lets hwdom perform SCMI ops for
interacting with system-level resources almost as if it were
running natively.

Signed-off-by: Andrei Cherechesu <andrei.cherechesu@nxp.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | CHANGELOG.md: Add NXP S32G3 and SCMI-SMC layer support mentions
Andrei Cherechesu [Thu, 19 Dec 2024 11:23:13 +0000 (13:23 +0200)]
CHANGELOG.md: Add NXP S32G3 and SCMI-SMC layer support mentions

Signed-off-by: Andrei Cherechesu <andrei.cherechesu@nxp.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
4 months ago | xen/arm: platforms: Add NXP S32G3 Processors config
Andrei Cherechesu [Thu, 19 Dec 2024 11:23:12 +0000 (13:23 +0200)]
xen/arm: platforms: Add NXP S32G3 Processors config

Platforms based on NXP S32G3 processors use the NXP LINFlexD
UART driver for console by default, and rely on Dom0 having
access to SCMI services for system-level resources from
firmware at EL3.

Signed-off-by: Andrei Cherechesu <andrei.cherechesu@nxp.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
4 months ago | xen/arm: vsmc: Enable handling SiP-owned SCMI SMC calls
Andrei Cherechesu [Thu, 19 Dec 2024 11:23:11 +0000 (13:23 +0200)]
xen/arm: vsmc: Enable handling SiP-owned SCMI SMC calls

Change the handling of SiP SMC calls to be more generic,
instead of directly relying on the `platform_smc()` callback
implementation.

Try to handle the SiP SMC first through the `platform_smc()`
callback (if implemented). Otherwise, try to handle it as an SCMI
message.
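The dispatch order described above can be sketched as follows. `sketch_regs`, `scmi_handle_smc()` and the SMC ID value are hypothetical stand-ins; only the ordering (platform callback first, then SCMI) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register frame: only the SMC function ID matters here. */
struct sketch_regs {
    uint32_t fid;
};

/* Optional platform hook; NULL when the platform does not provide one. */
typedef bool (*platform_smc_fn)(struct sketch_regs *regs);

/* SMC ID owned by the SCMI layer (value assumed for the sketch). */
static uint32_t scmi_smc_id = 0x82000002u;

static bool scmi_handle_smc(struct sketch_regs *regs)
{
    if (regs->fid != scmi_smc_id)
        return false;            /* not an SCMI call: leave it unhandled */

    /* ...forward to EL3 firmware if issued by the hardware domain... */
    return true;
}

static bool handle_sip(platform_smc_fn platform_smc, struct sketch_regs *regs)
{
    /* 1. The platform-specific callback gets the first chance. */
    if (platform_smc != NULL && platform_smc(regs))
        return true;

    /* 2. Otherwise, try to handle it as an SCMI message. */
    return scmi_handle_smc(regs);
}
```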

Signed-off-by: Andrei Cherechesu <andrei.cherechesu@nxp.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <jgrall@amazon.com>
4 months ago | xen/arm: firmware: Add SCMI over SMC calls handling layer
Andrei Cherechesu [Thu, 19 Dec 2024 11:23:10 +0000 (13:23 +0200)]
xen/arm: firmware: Add SCMI over SMC calls handling layer

Introduce the SCMI-SMC layer to have some basic degree of
awareness about SCMI calls that are based on the ARM System
Control and Management Interface (SCMI) specification (DEN0056E).

The SCMI specification includes various protocols for managing
system-level resources, such as: clocks, pins, reset, system power,
power domains, performance domains, etc. The clients are named
"SCMI agents" and the server is named "SCMI platform".

Only support the shared-memory based transport with SMCs as
the doorbell mechanism for notifying the platform. Also, this
implementation only handles the "arm,scmi-smc" compatible,
requiring the following properties:
- "arm,smc-id" (unique SMC ID)
- "shmem" (one or more phandles pointing to shmem zones
for each channel)

The initialization is done as an initcall, since it needs SMCs,
and PSCI should already have probed EL3 FW for SMCCC support.
If no "arm,scmi-smc" compatible node is found in the host
DT, the initialization fails silently, as it's not mandatory.
Otherwise, we get the 'arm,smc-id' DT property from the node,
to know the SCMI SMC ID we handle. The 'shmem' memory ranges
are not validated, as the SMC calls are only passed through
to EL3 FW if coming from the hardware domain.
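A sketch of the initialization flow described above, using a hypothetical pre-parsed node list instead of Xen's real device-tree API. A missing "arm,scmi-smc" node is modelled as success (SCMI is optional), while a matching node without "arm,smc-id" is rejected; the -EINVAL choice is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, pre-parsed view of one host DT node. */
struct dt_node_sketch {
    const char *compatible;
    bool has_smc_id;          /* whether "arm,smc-id" is present */
    uint32_t smc_id;          /* value of "arm,smc-id" */
};

static bool scmi_enabled;
static uint32_t scmi_smc_id;

static int scmi_smc_init_sketch(const struct dt_node_sketch *nodes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(nodes[i].compatible, "arm,scmi-smc") != 0)
            continue;

        if (!nodes[i].has_smc_id)
            return -EINVAL;   /* required property missing */

        /* "shmem" ranges are deliberately not validated here. */
        scmi_smc_id = nodes[i].smc_id;
        scmi_enabled = true;
        return 0;
    }

    return 0;                 /* no node: optional, fail silently */
}
```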

Create a new 'firmware' folder to keep the SCMI code separate
from the generic ARM code.

Signed-off-by: Andrei Cherechesu <andrei.cherechesu@nxp.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | xen/arm: add cache coloring support for Xen image
Carlo Nonato [Tue, 17 Dec 2024 17:06:37 +0000 (18:06 +0100)]
xen/arm: add cache coloring support for Xen image

The Xen image is relocated to a new colored physical space. Some relocation
functionality must be brought back:
- the virtual address of the new space is taken from 0c18fb76323b
  ("xen/arm: Remove unused BOOT_RELOC_VIRT_START").
- relocate_xen() and get_xen_paddr() are taken from f60658c6ae47
  ("xen/arm: Stop relocating Xen").

setup_pagetables() must be adapted for coloring and for relocation. Runtime
page tables are used to map the colored space, but they are also linked in
boot tables so that the new space is temporarily available for relocation.
This implies that Xen protection must happen after the copy.

Finally, since the alternative framework needs to remap the Xen text and
inittext sections, this operation must be done in a coloring-aware way.
The function xen_remap_colored() is introduced for that.

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Reviewed-by: Jan Beulich <jbeulich@suse.com> # common
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | xen/arm: make consider_modules() available for xen relocation
Carlo Nonato [Tue, 17 Dec 2024 17:06:36 +0000 (18:06 +0100)]
xen/arm: make consider_modules() available for xen relocation

Cache coloring must physically relocate Xen in order to color the hypervisor
and consider_modules() is a key function that is needed to find a new
available physical address.

672d67f339c0 ("xen/arm: Split MMU-specific setup_mm() and related code out")
moved consider_modules() under arm32. Move it to mmu/setup.c and make it
non-static so that it can be used outside.

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | xen/arm: add Xen cache colors command line parameter
Luca Miccio [Tue, 17 Dec 2024 17:06:35 +0000 (18:06 +0100)]
xen/arm: add Xen cache colors command line parameter

Add a new command line parameter to configure Xen cache colors.
These colors are dumped together with other coloring info.

Benchmarking the VM interrupt response time provides an estimation of
LLC usage by Xen's most latency-critical runtime task. Results on Arm
Cortex-A53 on Xilinx Zynq UltraScale+ XCZU9EG show that one color, which
reserves 64 KiB of L2, is enough to attain best responsiveness:
- Xen 1 color latency:  3.1 us
- Xen 2 color latency:  3.1 us
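The 64 KiB figure follows from the usual page-coloring arithmetic. This sketch assumes a 1 MiB, 16-way L2 and 4 KiB pages (assumed values for the Cortex-A53 cluster on the XCZU9EG); with those numbers one way spans 64 KiB, which gives 16 colors of 64 KiB each.

```c
#include <stdint.h>

/* Assumed cache geometry and page size for the illustration. */
#define LLC_SIZE      (1024u * 1024u)   /* 1 MiB L2 */
#define LLC_WAYS      16u               /* 16-way set associative */
#define PAGE_SIZE_4K  4096u

/* One way of the cache covers this many bytes of physical address space. */
static uint32_t llc_way_size(void)
{
    return LLC_SIZE / LLC_WAYS;
}

/* Pages landing in distinct set ranges within one way are distinct colors. */
static uint32_t llc_nr_colors(void)
{
    return llc_way_size() / PAGE_SIZE_4K;
}

/* Cache capacity reserved by a single color, summed over all ways. */
static uint32_t llc_bytes_per_color(void)
{
    return LLC_SIZE / llc_nr_colors();
}
```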

Since this is the most common target for Arm cache coloring, the default
number of Xen colors is set to one.

More colors are instead very likely to be needed on processors whose L1
cache is physically-indexed and physically-tagged, such as Cortex-A57.
In such cases, coloring applies to L1 also, and there typically are two
distinct L1-colors. Therefore, reserving only one color for Xen would
needlessly partition a cache memory that is already private, i.e.
underutilize it.

Signed-off-by: Luca Miccio <lucmiccio@gmail.com>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 months ago | xen: add cache coloring allocator for domains
Carlo Nonato [Tue, 17 Dec 2024 17:06:34 +0000 (18:06 +0100)]
xen: add cache coloring allocator for domains

Add a new memory page allocator that implements the cache coloring mechanism.
The allocation algorithm enforces equal frequency distribution of cache
partitions, following the coloring configuration of a domain. This allows
for an even utilization of cache sets for every domain.

Pages are stored in a color-indexed array of lists. Those lists are filled
by a simple init function which computes the color of each page.
When a domain requests a page, the allocator extracts the page from the list
with the maximum number of free pages among those that the domain can access,
given its coloring configuration.
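A minimal sketch of the selection rule, assuming a plain array of per-color free counts in place of Xen's color-indexed page lists (`free_count`, `pick_color()` and `alloc_colored_page()` are hypothetical names). Ties go to the first color in the domain's list.

```c
#include <stddef.h>

#define NR_COLORS 16   /* assumed number of LLC colors */

/* Free-page count per color; stands in for the color-indexed lists. */
static unsigned long free_count[NR_COLORS];

/*
 * Among the colors assigned to the domain, pick the one with the most
 * free pages. Returns -1 if none of them has a free page.
 */
static int pick_color(const unsigned int *colors, size_t num_colors)
{
    int best = -1;
    unsigned long best_free = 0;

    for (size_t i = 0; i < num_colors; i++) {
        unsigned int c = colors[i];

        if (free_count[c] > best_free) {
            best_free = free_count[c];
            best = (int)c;
        }
    }
    return best;
}

/* Order-0 allocation: take one page from the chosen color's list. */
static int alloc_colored_page(const unsigned int *colors, size_t num_colors)
{
    int c = pick_color(colors, num_colors);

    if (c >= 0)
        free_count[c]--;
    return c;
}
```

With 2 and 3 free pages on a domain's two colors, the first allocation drains the fuller color, after which allocations alternate between the two, which is the equal-frequency behaviour described above.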

The allocator can only handle requests of order-0 pages. This allows for an
easier implementation and, since cache coloring targets only embedded systems,
it's assumed not to be a major problem.

The buddy allocator must coexist with the colored one because the Xen heap
isn't colored. For this reason a new Kconfig option and a command line
parameter are added to let the user set the amount of memory reserved for
the buddy allocator. Even when cache coloring is enabled, this memory
isn't managed by the colored allocator.

Colored heap information is dumped in the dump_heap() debug-key function.

Based on original work from: Luca Miccio <lucmiccio@gmail.com>

Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | xen/arm: add support for cache coloring configuration via device-tree
Carlo Nonato [Tue, 17 Dec 2024 17:06:32 +0000 (18:06 +0100)]
xen/arm: add support for cache coloring configuration via device-tree

Add the "llc-colors" Device Tree property to express DomUs and Dom0less
color configurations.

Based on original work from: Luca Miccio <lucmiccio@gmail.com>

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Reviewed-by: Jan Beulich <jbeulich@suse.com> # non-Arm
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | tools: add support for cache coloring configuration
Carlo Nonato [Tue, 17 Dec 2024 17:06:31 +0000 (18:06 +0100)]
tools: add support for cache coloring configuration

Add a new "llc_colors" parameter that defines the LLC color assignment for
a domain. The user can specify one or more color ranges using the same
syntax used everywhere else for color config described in the
documentation.
The parameter is defined as a list of strings that represent the color
ranges.
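A sketch of parsing one entry of such a list, assuming each string is either a single color ("7") or an inclusive range ("0-3"). The function name, the 64-color bound and the error convention are assumptions, not the libxl implementation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_COLORS 64   /* assumed upper bound for the sketch */

/*
 * Parse one range string ("5" or "0-3", inclusive) and mark the matching
 * entries in colors[]. Returns 0 on success, -EINVAL on malformed input.
 */
static int parse_color_range(const char *s, bool colors[MAX_COLORS])
{
    char *end;
    unsigned long start = strtoul(s, &end, 10), last;

    if (end == s)
        return -EINVAL;                  /* no leading number */

    if (*end == '-') {
        const char *p = end + 1;

        last = strtoul(p, &end, 10);
        if (end == p)
            return -EINVAL;              /* "N-" with no upper bound */
    } else {
        last = start;                    /* single color */
    }

    if (*end != '\0' || start > last || last >= MAX_COLORS)
        return -EINVAL;

    for (unsigned long c = start; c <= last; c++)
        colors[c] = true;
    return 0;
}
```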

Documentation is also added.
Golang bindings are regenerated.

Based on original work from: Luca Miccio <lucmiccio@gmail.com>

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>
4 months ago | xen: extend domctl interface for cache coloring
Carlo Nonato [Tue, 17 Dec 2024 17:06:30 +0000 (18:06 +0100)]
xen: extend domctl interface for cache coloring

Add a new domctl hypercall to allow the user to set LLC coloring
configurations. Colors can be set only once, just after domain creation,
since recoloring isn't supported.

Based on original work from: Luca Miccio <lucmiccio@gmail.com>

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 months ago | xen/arm: add Dom0 cache coloring support
Carlo Nonato [Tue, 17 Dec 2024 17:06:29 +0000 (18:06 +0100)]
xen/arm: add Dom0 cache coloring support

Add a command line parameter to allow the user to set the coloring
configuration for Dom0.
A common configuration syntax for cache colors is introduced and
documented.
Take the opportunity to also add:
 - default configuration notion.
 - function to check well-formed configurations.
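A sketch of the two helpers mentioned in the list, under assumed rules: the default configuration selects every available color, and a configuration is well-formed when it is non-empty, in range and free of duplicates. `max_nr_colors` and the function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical: number of colors the hardware provides (from the LLC probe). */
static unsigned int max_nr_colors = 16;

/* Default configuration: every available color. Returns the count. */
static unsigned int set_default_colors(unsigned int *colors)
{
    for (unsigned int i = 0; i < max_nr_colors; i++)
        colors[i] = i;
    return max_nr_colors;
}

/* Well-formedness: sane count, in-range colors, no duplicates (assumed rules). */
static bool check_colors_sketch(const unsigned int *colors,
                                unsigned int num_colors)
{
    if (num_colors == 0 || num_colors > max_nr_colors)
        return false;

    for (unsigned int i = 0; i < num_colors; i++) {
        if (colors[i] >= max_nr_colors)
            return false;
        for (unsigned int j = 0; j < i; j++)
            if (colors[j] == colors[i])
                return false;
    }
    return true;
}
```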

Direct mapping Dom0 isn't possible when coloring is enabled, so
CDF_directmap flag is removed when creating it.

Based on original work from: Luca Miccio <lucmiccio@gmail.com>

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | xen/arm: permit non direct-mapped Dom0 construction
Carlo Nonato [Tue, 17 Dec 2024 17:06:28 +0000 (18:06 +0100)]
xen/arm: permit non direct-mapped Dom0 construction

Cache coloring requires Dom0 not to be direct-mapped because of its
non-contiguous mapping nature, so allocate_memory() is needed in this case.
8d2c3ab18cc1 ("arm/dom0less: put dom0less feature code in a separate module")
moved allocate_memory() to dom0less_build.c. In order to use it
in Dom0 construction, bring it back to domain_build.c and declare it in
domain_build.h.

Adapt the implementation of allocate_memory() so that it uses the host
layout when called on the hwdom, via find_unallocated_memory().

Since gnttab information is needed in the process, move find_gnttab_region()
before allocate_memory() in construct_dom0().

Introduce add_hwdom_free_regions() callback to add hwdom banks in descending
order.

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | xen/arm: add initial support for LLC coloring on arm64
Carlo Nonato [Tue, 17 Dec 2024 17:06:27 +0000 (18:06 +0100)]
xen/arm: add initial support for LLC coloring on arm64

LLC coloring needs to know the last level cache layout in order to make the
best use of it. This can be probed by inspecting the CLIDR_EL1 register,
so the Last Level is defined as the last level visible by this register.
Note that this excludes system caches in some platforms.
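A sketch of finding the last level from CLIDR_EL1. Per the Arm Architecture Reference Manual, each cache level n from 1 to 7 has a 3-bit Ctype field at bits [3(n-1)+2:3(n-1)], and a value of 0 means there is no cache at that level or beyond. The register value is passed in as a parameter so the sketch runs on any host; reading the real register, and then CCSIDR_EL1 for the geometry, is left out.

```c
#include <stdint.h>

/*
 * Return the last cache level reported by a CLIDR_EL1 value, or 0 if
 * none. Ctype encodings: 1 = I-only, 2 = D-only, 3 = split, 4 = unified.
 */
static unsigned int last_cache_level(uint64_t clidr)
{
    unsigned int level = 0;

    for (unsigned int n = 1; n <= 7; n++) {
        unsigned int ctype = (unsigned int)(clidr >> (3 * (n - 1))) & 0x7;

        if (ctype == 0)
            break;          /* no cache here, nor at any outer level */
        level = n;
    }
    return level;
}
```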

Static memory allocation and cache coloring are incompatible because static
memory can't be guaranteed to use only colors assigned to the domain.
Panic during DomUs creation when both are enabled.

Based on original work from: Luca Miccio <lucmiccio@gmail.com>

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 months ago | xen/common: add cache coloring common code
Carlo Nonato [Tue, 17 Dec 2024 17:06:26 +0000 (18:06 +0100)]
xen/common: add cache coloring common code

Last Level Cache (LLC) coloring allows partitioning the cache into smaller
chunks called cache colors.

Since not all architectures can actually implement it, add a HAS_LLC_COLORING
Kconfig option.
LLC_COLORS_ORDER Kconfig option has a range maximum of 10 (2^10 = 1024)
because that's the number of colors that fit in a 4 KiB page when integers
are 4 bytes long.
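The 1024 bound is simple arithmetic, and a compile-time check makes it explicit. The macro names here are illustrative and assume 32-bit color entries and 4 KiB pages.

```c
#include <stdint.h>

#define PAGE_SIZE_4K      4096u
#define LLC_COLORS_ORDER  10u                        /* Kconfig range maximum */
#define LLC_MAX_COLORS    (1u << LLC_COLORS_ORDER)   /* 2^10 = 1024 */

/* 1024 colors of 4 bytes each exactly fill one 4 KiB page. */
_Static_assert(LLC_MAX_COLORS * sizeof(uint32_t) == PAGE_SIZE_4K,
               "color array must fit in one page");

/* Order 11 would need 8 KiB, i.e. more than one page. */
_Static_assert((1u << (LLC_COLORS_ORDER + 1)) * sizeof(uint32_t) > PAGE_SIZE_4K,
               "order 11 would overflow the page");
```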

LLC colors are a property of the domain, so struct domain has to be extended.

Based on original work from: Luca Miccio <lucmiccio@gmail.com>

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Signed-off-by: Marco Solieri <marco.solieri@minervasys.tech>
Acked-by: Michal Orzel <michal.orzel@amd.com>
4 months ago | automation: Pin down CONFIG_QEMU_PLATFORM for RISC-V's randconfig job
Oleksii Kurochko [Thu, 19 Dec 2024 11:18:31 +0000 (12:18 +0100)]
automation: Pin down CONFIG_QEMU_PLATFORM for RISC-V's randconfig job

In addition to setting CONFIG_QEMU_PLATFORM=y in tiny64_defconfig,
CONFIG_QEMU_PLATFORM should also be pinned for RISC-V's randconfig job.
Otherwise, the randconfig job will hit the deliberate compilation error,
since clean_and_invalidate_dcache_va_range() and clean_dcache_va_range()
are currently implemented only for the QEMU platform.

Additionally, sort the EXTRA_FIXED_RANDCONFIG list alphabetically.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: f92e2709bd ("xen/riscv: implement data and instruction cache operations")
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 months ago | xen/ioreq: Fix check for CONFIG_ARCH_VCPU_IOREQ_COMPLETION
Sergiy Kibrik [Thu, 19 Dec 2024 11:13:26 +0000 (13:13 +0200)]
xen/ioreq: Fix check for CONFIG_ARCH_VCPU_IOREQ_COMPLETION

It should be CONFIG_ARCH_VCPU_IOREQ_COMPLETION (as in Kconfig) and not
misspelled CONFIG_VCPU_ARCH_IOREQ_COMPLETION.

Fixes: 979cfdd3e58c ("ioreq: do not build arch_vcpu_ioreq_completion() for non-VMX configurations")
Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 months ago | tools/xg: increase LZMA_BLOCK_SIZE for uncompressing the kernel
Marek Marczykowski-Górecki [Tue, 8 Oct 2024 21:32:23 +0000 (23:32 +0200)]
tools/xg: increase LZMA_BLOCK_SIZE for uncompressing the kernel

Linux 6.12-rc2 fails to decompress with the current 128MiB, contrary to
the code comment. It results in a failure like this:

    domainbuilder: detail: xc_dom_kernel_file: filename="/var/lib/qubes/vm-kernels/6.12-rc2-1.1.fc37/vmlinuz"
    domainbuilder: detail: xc_dom_malloc_filemap    : 12104 kB
    domainbuilder: detail: xc_dom_module_file: filename="/var/lib/qubes/vm-kernels/6.12-rc2-1.1.fc37/initramfs"
    domainbuilder: detail: xc_dom_malloc_filemap    : 7711 kB
    domainbuilder: detail: xc_dom_boot_xen_init: ver 4.19, caps xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
    domainbuilder: detail: xc_dom_parse_image: called
    domainbuilder: detail: xc_dom_find_loader: trying multiboot-binary loader ...
    domainbuilder: detail: loader probe failed
    domainbuilder: detail: xc_dom_find_loader: trying HVM-generic loader ...
    domainbuilder: detail: loader probe failed
    domainbuilder: detail: xc_dom_find_loader: trying Linux bzImage loader ...
    domainbuilder: detail: _xc_try_lzma_decode: XZ decompression error: Memory usage limit reached
    xc: error: panic: xg_dom_bzimageloader.c:761: xc_dom_probe_bzimage_kernel unable to XZ decompress kernel: Invalid kernel
    domainbuilder: detail: loader probe failed
    domainbuilder: detail: xc_dom_find_loader: trying ELF-generic loader ...
    domainbuilder: detail: loader probe failed
    xc: error: panic: xg_dom_core.c:689: xc_dom_find_loader: no loader found: Invalid kernel
    libxl: error: libxl_dom.c:566:libxl__build_dom: xc_dom_parse_image failed

The important part: XZ decompression error: Memory usage limit reached

This looks to be related to the following change in Linux:
8653c909922743bceb4800e5cc26087208c9e0e6 ("xz: use 128 MiB dictionary and force single-threaded mode")

Fix this by increasing the block size to 256MiB, and remove the
misleading comment (for lack of better ideas).

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@vates.tech>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 months ago | x86/hvm: Use constants for x86 modes
Teddy Astie [Mon, 2 Dec 2024 09:49:14 +0000 (09:49 +0000)]
x86/hvm: Use constants for x86 modes

In many places in the x86 HVM code, integer constants are used to indicate
which mode the CPU is running in (real, vm86, 16-bit, 32-bit, 64-bit).
However, these constants are written directly as integers, which hides the
actual meaning of these modes.

This patch introduces X86_MODE_* macros and replaces those occurrences with them.
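A sketch of the kind of macros this introduces. The numeric values are assumed (0 and 1 for real and vm86 mode, then the address width in bytes for the 16/32/64-bit modes), and `x86_mode_name()` is a hypothetical helper showing the named constants replacing bare integers.

```c
/* Assumed values, for illustration only. */
#define X86_MODE_REAL   0
#define X86_MODE_VM86   1
#define X86_MODE_16BIT  2
#define X86_MODE_32BIT  4
#define X86_MODE_64BIT  8

/* A comparison like "mode == 8" becomes "mode == X86_MODE_64BIT". */
static const char *x86_mode_name(int mode)
{
    switch (mode) {
    case X86_MODE_REAL:  return "real";
    case X86_MODE_VM86:  return "vm86";
    case X86_MODE_16BIT: return "16-bit";
    case X86_MODE_32BIT: return "32-bit";
    case X86_MODE_64BIT: return "64-bit";
    default:             return "unknown";
    }
}
```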

Signed-off-by: Teddy Astie <teddy.astie@vates.tech>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Teddy Astie <teddy.astie@vates.tech>
4 months ago | tools/libxg: Don't gunzip the guests initrd
Andrew Cooper [Thu, 27 Jun 2024 12:55:51 +0000 (13:55 +0100)]
tools/libxg: Don't gunzip the guests initrd

Decompressing the kernel is necessary to inspect the ELF notes, but the
dombuilder will gunzip() secondary modules too.  Specifically gunzip(), no
other decompression algorithms.

This may have been necessary in the dim and distant past, but it is broken
today.  Linux specifically supports concatenating CPIO fragments of differing
compressions, and any attempt to interpret it with a single algorithm may
corrupt later parts.
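To make the concatenation point concrete, an initrd may start with an uncompressed newc CPIO fragment (early microcode is a common case) followed by a compressed one. This hypothetical helper classifies a fragment by its leading magic bytes. It only illustrates why one gunzip() pass over the whole module cannot be correct, and is not code from the dombuilder.

```c
#include <stddef.h>
#include <string.h>

enum frag_kind { FRAG_UNKNOWN, FRAG_CPIO_NEWC, FRAG_GZIP, FRAG_XZ };

/* Classify the start of an initrd fragment by its magic bytes. */
static enum frag_kind classify_fragment(const unsigned char *p, size_t len)
{
    static const unsigned char gz[] = { 0x1f, 0x8b };
    static const unsigned char xz[] = { 0xfd, '7', 'z', 'X', 'Z', 0x00 };

    if (len >= sizeof(gz) && memcmp(p, gz, sizeof(gz)) == 0)
        return FRAG_GZIP;
    if (len >= sizeof(xz) && memcmp(p, xz, sizeof(xz)) == 0)
        return FRAG_XZ;
    if (len >= 6 && memcmp(p, "070701", 6) == 0)
        return FRAG_CPIO_NEWC;   /* uncompressed "newc" CPIO header */
    return FRAG_UNKNOWN;
}
```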

This was an unexpected discovery while trying to test Xen's gunzip()
logic (Xen as a PVH guest, with a gzipped XTF kernel as dom0).

Interpreting secondary modules should be left as an exercise to the guest.
This reduces work done in dom0.

This is not expected to cause a practical difference to guests these days.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
4 months ago | x86/boot: Use printk_once() instead of opencoding it
Andrew Cooper [Tue, 3 Sep 2024 23:14:24 +0000 (00:14 +0100)]
x86/boot: Use printk_once() instead of opencoding it

Adjust the message for brevity.  No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
4 months ago | x86/sched: Drop unused includes from credit2.c
Andrew Cooper [Thu, 12 Sep 2024 02:02:37 +0000 (03:02 +0100)]
x86/sched: Drop unused includes from credit2.c

Sort the remaining includes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
4 months ago | xen/sched: Untangle credit2 vs cpu_nr_siblings()
Andrew Cooper [Thu, 12 Sep 2024 01:18:40 +0000 (02:18 +0100)]
xen/sched: Untangle credit2 vs cpu_nr_siblings()

Credit2 has no business including asm/cpufeature.h or asm/processor.h.

This was caused by a bad original abstraction, and an even less wise attempt
to fix the build on my behalf.  It is also the sole reason why PPC and RISC-V
need a cpufeature.h header.

Worst of all, cpu_data[cpu].x86_num_siblings doesn't even have the same
meaning between vendors on x86 CPUs.

Implement cpu_nr_siblings() locally in credit2.c, leaving behind a TODO.  Drop
the stub from each architecture.

Fixes: 8e2aa76dc167 ("xen: credit2: limit the max number of CPUs in a runqueue")
Fixes: ad33a573c009 ("xen/credit2: Fix build following c/s 8e2aa76dc (take 2)")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
4 months ago | xen/riscv: relocating and unflattening host device tree
Oleksii Kurochko [Thu, 19 Dec 2024 09:23:48 +0000 (10:23 +0100)]
xen/riscv: relocating and unflattening host device tree

Introduce relocate_fdt() and call it to relocate the FDT to the Xen heap
instead of using the early mapping, as it is expected that
discard_initial_modules() (which is to be called in the future) discards
the FDT boot module and remove_early_mappings() destroys the early mapping.
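A minimal sketch of the relocation step, assuming `malloc()` as a stand-in for Xen's heap allocator and reading only the standard FDT header fields (big-endian `magic` at offset 0 and `totalsize` at offset 4). `relocate_fdt_sketch()` is a hypothetical name, and unflattening the copy is out of scope here.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FDT_MAGIC        0xd00dfeedu
#define FDT_HEADER_SIZE  40u   /* v17 header: ten 32-bit fields */

/* FDT header fields are stored big-endian. */
static uint32_t be32_at(const void *p, size_t off)
{
    const unsigned char *b = (const unsigned char *)p + off;

    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/*
 * Copy a flattened device tree out of its boot-time location into a
 * freshly allocated buffer, so the original mapping can later be torn
 * down. Trusts totalsize, as boot code must. Returns NULL on failure.
 */
static void *relocate_fdt_sketch(const void *fdt)
{
    uint32_t size;
    void *copy;

    if (be32_at(fdt, 0) != FDT_MAGIC)
        return NULL;

    size = be32_at(fdt, 4);
    if (size < FDT_HEADER_SIZE)
        return NULL;

    copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, fdt, size);
    return copy;
}
```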

Unflatten a device tree, creating the tree of struct device_node.
It also fills the "name" and "type" pointers of the nodes so the normal
device-tree walking functions can be used.

Set device_tree_flattened to NULL in the case when acpi_disabled is
equal to false.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 months ago | xen/riscv: implement prereq for DTB relocation
Oleksii Kurochko [Thu, 19 Dec 2024 09:23:28 +0000 (10:23 +0100)]
xen/riscv: implement prereq for DTB relocation

DTB relocation in the Xen heap requires the following functions, which are
introduced in this patch:
- xvmalloc_array()
- copy_from_paddr()

For internal use of xvmalloc, the functions flush_page_to_ram() and
virt_to_page() are introduced. virt_to_page() is also required for
free_xenheap_pages().

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 months ago | xen/riscv: implement data and instruction cache operations
Oleksii Kurochko [Thu, 19 Dec 2024 09:22:46 +0000 (10:22 +0100)]
xen/riscv: implement data and instruction cache operations

Implement the following cache operations:
- clean_and_invalidate_dcache_va_range()
- clean_dcache_va_range()
- invalidate_icache()

The first two functions may require support for the CMO (Cache Management
Operations) extension and/or hardware-specific instructions.
Currently, only QEMU is supported, which does not model cache behavior.
Therefore, clean_and_invalidate_dcache_va_range() and clean_dcache_va_range()
are implemented to simply return 0. For other platforms, generate a compilation
error so that a user won't forget to update these functions if necessary.
If hardware supports CMO or hardware-specific instructions, these functions
should be updated accordingly. To support the current implementation of these
functions, CONFIG_QEMU_PLATFORM is introduced.
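The build-time guard described above can be sketched like this. The signatures mirror the style of Xen's Arm helpers but are assumptions here, and the sketch defines CONFIG_QEMU_PLATFORM itself in place of Kconfig; removing that line triggers the #error.

```c
/* Stand-in for Kconfig selecting the QEMU platform. */
#define CONFIG_QEMU_PLATFORM 1

#ifdef CONFIG_QEMU_PLATFORM
/* QEMU does not model caches, so there is nothing to clean or invalidate. */
static inline int clean_and_invalidate_dcache_va_range(const void *p,
                                                       unsigned long size)
{
    (void)p;
    (void)size;
    return 0;
}

static inline int clean_dcache_va_range(const void *p, unsigned long size)
{
    (void)p;
    (void)size;
    return 0;
}
#else
/* Fail the build so a port to real hardware cannot silently skip this. */
#error "Implement D-cache maintenance (e.g. via CMO) for this platform"
#endif
```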

invalidate_icache() is implemented using fence.i instruction as
mentioned in the unpriv spec:
  The FENCE.I instruction was designed to support a wide variety of
  implementations. A simple implementation can flush the local instruction
  cache and the instruction pipeline when the FENCE.I is executed.
  A more complex implementation might snoop the instruction (data) cache
  on every data (instruction) cache miss, or use an inclusive unified
  private L2 cache to invalidate lines from the primary instruction cache
  when they are being written by a local store instruction.
  If instruction and data caches are kept coherent in this way, or if the
  memory system consists of only uncached RAMs, then just the fetch pipeline
  needs to be flushed at a FENCE.I.
The FENCE.I instruction requires the presence of the Zifencei extension,
which might not always be available. However, Xen uses the RV64G ISA, which
guarantees the presence of the Zifencei extension. According to the
unprivileged ISA specification (version 20240411):
  One goal of the RISC-V project is that it be used as a stable software
  development target. For this purpose, we define a combination of a base ISA
  (RV32I or RV64I) plus selected standard extensions (IMAFD, Zicsr, Zifencei)
  as a "general-purpose" ISA, and we use the abbreviation G for the
  IMAFDZicsr_Zifencei combination of instruction-set extensions.

Set CONFIG_QEMU_PLATFORM=y in tiny64_defconfig to have a proper implementation of
clean_and_invalidate_dcache_va_range() and clean_dcache_va_range() for CI.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 months ago | xen/riscv: update layout table in config.h
Oleksii Kurochko [Thu, 19 Dec 2024 09:21:11 +0000 (10:21 +0100)]
xen/riscv: update layout table in config.h

Make all upper bounds (end addresses) for areas inclusive to align
with the corresponding definitions.

For the Direct map region, the upper bound was calculated incorrectly
in efadb18dd58aba ("xen/riscv: add VM space layout"). As an exclusive
bound, it should be 0x7f80000000 instead of 0x7f40000000. Therefore,
the inclusive upper bound for that region is 0x7f80000000 - 1.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 months ago | xen/page_alloc: introduce preserved page flags macro
Carlo Nonato [Thu, 19 Dec 2024 09:05:14 +0000 (10:05 +0100)]
xen/page_alloc: introduce preserved page flags macro

PGC_static and PGC_extra need to be preserved when assigning a page.
Define a new macro that groups those flags and use it instead of or'ing
every time.
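A sketch of the macro and its use, with hypothetical bit positions (the real PGC_* values are architecture-specific). `assign_page_flags()` is an illustrative helper: grouping the flags means an assignment path needs only one mask to keep the state that must survive.

```c
/* Hypothetical flag bits in a page's count_info word. */
#define PGC_static     (1ul << 0)
#define PGC_extra      (1ul << 1)
#define PGC_allocated  (1ul << 2)

/* Flags that must survive when a page is (re)assigned. */
#define PGC_preserved  (PGC_extra | PGC_static)

/* Reset a page's state to "allocated", keeping only the preserved flags. */
static unsigned long assign_page_flags(unsigned long count_info)
{
    return (count_info & PGC_preserved) | PGC_allocated;
}
```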

Signed-off-by: Carlo Nonato <carlo.nonato@minervasys.tech>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 months ago | tools: add a dedicated header file for barrier definitions
Juergen Gross [Thu, 19 Dec 2024 09:04:45 +0000 (10:04 +0100)]
tools: add a dedicated header file for barrier definitions

Instead of having to include xenctrl.h to get the definitions of CPU
barriers, add a dedicated header for that purpose.

Switch the xen-9pfsd daemon to use the new header instead of xenctrl.h.

This is in preparation for making Xenstore independent from libxenctrl.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@vates.tech>