xenbits.xensource.com Git - xen.git/log
xen.git
2 years ago x86/pagewalk: Support PKS
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/pagewalk: Support PKS

PKS is very similar to the existing PKU behaviour, operating on pagewalks for
any supervisor mapping.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/hvm: Enable guest access to MSR_PKRS
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/hvm: Enable guest access to MSR_PKRS

Have guest_{rd,wr}msr(), via hvm_{get,set}_reg(), access either the live
register, or stashed state, depending on context.  Include MSR_PKRS for
migration, and let the guest have full access.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/hvm: Context switch MSR_PKRS
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/hvm: Context switch MSR_PKRS

Under PKS, MSR_PKRS is available based on the CPUID policy alone, and is
usable independently of CR4.PKS.  See the large comment in prot-key.h for
details of the context switching arrangement.

Use WRMSRNS right away, as we don't care about serialising properties for
context switching this MSR.

Sanitise MSR_PKRS on boot.  In anticipation of wanting to use PKS for Xen in
the future, arrange for the sanitisation to occur prior to potentially setting
CR4.PKS; if PKEY0.{AD,WD} leak in from a previous context, we will triple
fault immediately on setting CR4.PKS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
2 years ago x86: Initial support for WRMSRNS
Andrew Cooper [Mon, 9 Jan 2023 15:25:11 +0000 (15:25 +0000)]
x86: Initial support for WRMSRNS

WRMSR Non-Serialising is an optimisation intended for cases where an MSR needs
updating, but architectural serialising properties are not needed.

It is anticipated that this will apply to most if not all MSRs modified on
context switch paths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/prot-key: Split PKRU infrastructure out of asm/processor.h
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/prot-key: Split PKRU infrastructure out of asm/processor.h

asm/processor.h is in desperate need of splitting up, and protection key
functionality is only used in the emulator and pagewalk.  Introduce a new
asm/prot-key.h and move the relevant content over.

Rename the PKRU_* constants to drop the user part and to use the architectural
terminology.

Drop the read_pkru_{ad,wd}() helpers entirely.  The pkru infix is about to
become wrong, and the sole user is shorter and easier to follow without the
helpers.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/prot-key: Enumeration for Protection Key Supervisor
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/prot-key: Enumeration for Protection Key Supervisor

Protection Key Supervisor works in a very similar way to Protection Key User,
except that instead of a PKRU register used by the {RD,WR}PKRU instructions,
the supervisor protection settings live in MSR_PKRS and are accessed using
normal {RD,WR}MSR instructions.

PKS has the same problematic interactions with PV guests as PKU (more in fact,
given the guest kernel's CPL), so we'll only support this for HVM guests for
now.
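
As a rough illustration of the layout that PKU and PKS share: both PKRU and
MSR_PKRS hold two bits per protection key, Access Disable at bit 2*key and
Write Disable at bit 2*key+1. The sketch below decodes them. The helper names
are hypothetical (not taken from Xen), and the CR0.WP interaction for
supervisor writes is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-key bits, identical for PKRU and MSR_PKRS. */
#define PKEY_AD 1u /* Access Disable */
#define PKEY_WD 2u /* Write Disable */

/* Hypothetical helper: extract the {AD,WD} pair for one key (0..15). */
static inline unsigned int pkey_bits(uint32_t pkr, unsigned int key)
{
    return (pkr >> (key * 2)) & (PKEY_AD | PKEY_WD);
}

/*
 * Hypothetical helper: would a data access tagged with this key be refused?
 * Simplification: ignores CR0.WP, which also gates WD for supervisor writes.
 */
static inline bool pkey_denies(uint32_t pkr, unsigned int key, bool is_write)
{
    unsigned int bits = pkey_bits(pkr, key);

    if ( bits & PKEY_AD )
        return true;

    return is_write && (bits & PKEY_WD);
}
```

With a register value of 0x8, key 1 has only WD set, so reads pass and writes
are refused, while key 0 is unrestricted.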

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/boot: Sanitise PKRU on boot
Andrew Cooper [Tue, 10 Jan 2023 10:57:21 +0000 (10:57 +0000)]
x86/boot: Sanitise PKRU on boot

While the reset value of the register is 0, it might not be after kexec/etc.
If PKEY0.{WD,AD} have leaked in from an earlier context, construction of a PV
dom0 will explode.

Sequencing wise, this must come after setting CR4.PKE, and before we touch any
user mappings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/vmx: Partially revert "x86/vmx: implement Notify VM Exit"
Andrew Cooper [Wed, 18 Jan 2023 19:20:05 +0000 (19:20 +0000)]
x86/vmx: Partially revert "x86/vmx: implement Notify VM Exit"

The original patch tried to do two things - implement VMNotify, and
re-optimise VT-x to not intercept #DB/#AC by default.

The second part is buggy in multiple ways.  Both GDBSX and Introspection need
to conditionally intercept #DB, which was not accounted for.  Also, #DB
interception has nothing at all to do with cpu_has_monitor_trap_flag.

Revert the second half, leaving #DB/#AC intercepted unilaterally, but with
VMNotify active by default when available.

Fixes: 573279cde1c4 ("x86/vmx: implement Notify VM Exit")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
2 years ago x86/shadow: Drop dubious lastpage diagnostic
Andrew Cooper [Fri, 20 Jan 2023 11:01:52 +0000 (11:01 +0000)]
x86/shadow: Drop dubious lastpage diagnostic

This is a global variable (actually 3, one per GUEST_PAGING_LEVEL), operated
on using atomics only (with no regard to what else shares the same cacheline),
which emits a diagnostic (in debug builds only) without changing any program
behaviour.

It is presumably left-over debugging, as it interlinks the behaviour of all
vCPUs in chronological order.  Based on the read-only p2m types, this
diagnostic can be tripped by entirely legitimate guest behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/hvm: Drop pat_entry_2_pte_flags
Andrew Cooper [Mon, 16 Jan 2023 11:01:26 +0000 (11:01 +0000)]
x86/hvm: Drop pat_entry_2_pte_flags

Converting from PAT to PTE is trivial, and shorter to encode with bitwise
logic than the space taken by a table counting from 0 to 7 in non-adjacent
bits.
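
A minimal sketch of what such bitwise logic might look like, assuming the
4K PTE layout (PWT in bit 3, PCD in bit 4, PAT in bit 7). The function name
and constants are illustrative and not necessarily what the patch uses.

```c
#include <stdint.h>

/* 4K PTE cacheability bits (architectural positions). */
#define PTE_PWT 0x008u
#define PTE_PCD 0x010u
#define PTE_PAT 0x080u

/* Hypothetical helper: spread a 3-bit PAT index over PWT/PCD/PAT. */
static inline uint32_t pat_index_to_pte_flags(unsigned int pat)
{
    /* Index bits 0-1 land on PWT/PCD (bits 3-4), index bit 2 on PAT (bit 7). */
    return ((pat & 3) << 3) | ((pat & 4) << 5);
}
```

Index 7, for instance, yields PTE_PWT | PTE_PCD | PTE_PAT, which is the entry
a lookup table would have stored in its last slot.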

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years ago xen/version: Drop bogus return values for XENVER_platform_parameters
Andrew Cooper [Tue, 3 Jan 2023 13:45:48 +0000 (13:45 +0000)]
xen/version: Drop bogus return values for XENVER_platform_parameters

A split in virtual address space is only applicable for x86 PV guests.
Furthermore, the information returned for x86 64bit PV guests is wrong.

Explain the problem in version.h, stating the other information that PV guests
need to know.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years ago Revert "xen/arm: Use the correct format specifier"
Julien Grall [Fri, 20 Jan 2023 17:32:00 +0000 (17:32 +0000)]
Revert "xen/arm: Use the correct format specifier"

This is breaking the CI. See:

https://lore.kernel.org/all/ba37ee02-c07c-2803-0867-149c779890b6@amd.com/

This reverts commit 13bfdceda0991214854f3e563a36f621e9da3bec.

Signed-off-by: Julien Grall <jgrall@amazon.com>
2 years ago xen/arm: Harden setup_frametable_mappings
Michal Orzel [Tue, 17 Jan 2023 11:43:32 +0000 (12:43 +0100)]
xen/arm: Harden setup_frametable_mappings

The amount of supported physical memory depends on the frametable size
and the number of struct page_info entries that can fit into it. Define
a macro PAGE_INFO_SIZE to store the current size of the struct page_info
(i.e. 56B for arm64 and 32B for arm32) and add a sanity check in
setup_frametable_mappings to be notified whenever the size of the
structure changes. Also call a panic if the calculated frametable_size
exceeds the limit defined by FRAMETABLE_SIZE macro.

Update the comments regarding the frametable in asm/config.h.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago xen/arm32: Remove unused macro FRAMETABLE_VIRT_END
Michal Orzel [Tue, 17 Jan 2023 11:43:31 +0000 (12:43 +0100)]
xen/arm32: Remove unused macro FRAMETABLE_VIRT_END

This macro is unused and the corresponding one for arm64 has already
been removed as part of the commit 6dc9a1fe982f ("xen/arm: Remove most
of the *_VIRT_END defines").

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago xen/arm64: Fix incorrect DIRECTMAP_SIZE calculation
Michal Orzel [Tue, 17 Jan 2023 11:43:30 +0000 (12:43 +0100)]
xen/arm64: Fix incorrect DIRECTMAP_SIZE calculation

The direct mapped area occupies L0 slots from 256 to 265 inclusive
(i.e. 10 slots), resulting in 5TB (512GB * 10) of virtual address space.
However, due to incorrect slot subtraction (we take 9 slots into account)
we set DIRECTMAP_SIZE to 4.5TB instead. Fix it.

Note that we only support up to 2TB of physical memory so this is
a latent issue.
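
The off-by-one can be reproduced with a small sketch. The helper names are
hypothetical; only the slot numbers and the 512GB slot size come from the
description above.

```c
#include <stdint.h>

/* One L0 slot covers 512GB of virtual address space. */
#define L0_SLOT_SIZE (512ULL << 30)

/* Size of an inclusive slot range [first, last]: the corrected form. */
static inline uint64_t slot_range_size(unsigned int first, unsigned int last)
{
    return (uint64_t)(last - first + 1) * L0_SLOT_SIZE;
}

/* The faulty form: plain subtraction drops the last slot. */
static inline uint64_t slot_range_size_buggy(unsigned int first,
                                             unsigned int last)
{
    return (uint64_t)(last - first) * L0_SLOT_SIZE;
}
```

For slots 256 to 265 the first helper gives 5TB, the second 4.5TB.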

Fixes: 5263507b1b4a ("xen: arm: Use a direct mapping of RAM on arm64")
Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago xen/arm: domain_build: Replace use of paddr_t in find_domU_holes()
Ayan Kumar Halder [Tue, 17 Jan 2023 17:43:50 +0000 (17:43 +0000)]
xen/arm: domain_build: Replace use of paddr_t in find_domU_holes()

bankbase, banksize and bankend are used to hold values of type 'unsigned
long long'. This can be represented as 'uint64_t' instead of 'paddr_t'.
This will ensure consistency with allocate_static_memory() (where we use
'uint64_t' for rambase and ramsize).

In future, paddr_t can be used for 'uint32_t' as well to represent 32bit
physical addresses.

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
2 years ago xen/arm: Use the correct format specifier
Ayan Kumar Halder [Tue, 17 Jan 2023 17:43:49 +0000 (17:43 +0000)]
xen/arm: Use the correct format specifier

1. One should use 'PRIpaddr' to display 'paddr_t' variables.
2. One should use 'PRIx64' to display 'u64' in hex format. The current
use of 'PRIpaddr' for printing PTE is buggy as this is not a physical
address.
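
A minimal sketch of the two rules, using stand-in definitions: the paddr_t
typedef and PRIpaddr macro below are assumptions made for this example (Xen
provides its own), and format_mapping() is a hypothetical helper.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for illustration only; Xen defines its own paddr_t/PRIpaddr. */
typedef uint64_t paddr_t;
#define PRIpaddr "016" PRIx64

/* Physical addresses use PRIpaddr; a raw 64-bit PTE value uses PRIx64. */
static inline int format_mapping(char *buf, size_t len,
                                 paddr_t pa, uint64_t pte)
{
    return snprintf(buf, len, "pa=0x%" PRIpaddr " pte=0x%" PRIx64, pa, pte);
}
```

Keeping the specifier tied to the type means that if paddr_t later changes
width, only the PRIpaddr definition needs to follow, and PTE output is
unaffected.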

Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years ago tools/xenstore: make output of "xenstore-control help" more pretty
Juergen Gross [Wed, 18 Jan 2023 09:50:15 +0000 (10:50 +0100)]
tools/xenstore: make output of "xenstore-control help" more pretty

Using a tab for separating the command from the options in the output
of "xenstore-control help" results in a rather ugly list.

Use a fixed size for the command instead.
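
One way to get such a fixed-size column is a left-justified printf field.
This is a sketch only: the width of 16 and the helper name are assumptions,
not values taken from the patch.

```c
#include <stdio.h>

/* Hypothetical column width for the command name. */
#define CMD_WIDTH 16

/* Pad the command to CMD_WIDTH so the options line up in one column. */
static inline int format_help_line(char *buf, size_t len,
                                   const char *cmd, const char *opts)
{
    return snprintf(buf, len, "%-*s %s", CMD_WIDTH, cmd, opts);
}
```

Unlike a tab, the padded field puts the options at the same column regardless
of how long each command name is (as long as it fits the width).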

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: let check_store() check the accounting data
Juergen Gross [Wed, 18 Jan 2023 09:50:14 +0000 (10:50 +0100)]
tools/xenstore: let check_store() check the accounting data

Today check_store() is only testing the correctness of the node tree.

Add verification of the accounting data (number of nodes) and correct
the data if it is wrong.

Do the initial check_store() call only after Xenstore entries of a
live update have been read. This is wanted to make sure the accounting
data is correct after a live update.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: introduce trace classes
Juergen Gross [Wed, 18 Jan 2023 09:50:13 +0000 (10:50 +0100)]
tools/xenstore: introduce trace classes

Make the xenstored internal trace configurable by adding classes
which can be switched on and off independently from each other.

Define the following classes:

- obj: Creation and deletion of interesting "objects" (watch,
  transaction, connection)
- io: incoming requests and outgoing responses
- wrl: write limiting

By default "obj" and "io" are switched on.

Entries written via trace() will always be printed (if tracing is on
at all).
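
Independently switchable classes are commonly kept as a bitmask. The sketch
below shows that idea with the three class names and defaults listed above;
the flag values, variable and function names are hypothetical and may differ
from xenstored's.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag values, one bit per trace class. */
#define TRACE_OBJ 0x1u
#define TRACE_IO  0x2u
#define TRACE_WRL 0x4u

/* "obj" and "io" are on by default, "wrl" is off. */
static unsigned int trace_flags = TRACE_OBJ | TRACE_IO;

/* Switch one class on or off by name; returns false for unknown names. */
static inline bool trace_set_class(const char *name, bool on)
{
    unsigned int flag;

    if (!strcmp(name, "obj"))
        flag = TRACE_OBJ;
    else if (!strcmp(name, "io"))
        flag = TRACE_IO;
    else if (!strcmp(name, "wrl"))
        flag = TRACE_WRL;
    else
        return false;

    if (on)
        trace_flags |= flag;
    else
        trace_flags &= ~flag;

    return true;
}

/* Check whether output for a class is currently enabled. */
static inline bool trace_enabled(unsigned int flag)
{
    return trace_flags & flag;
}
```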

Add the capability to control the trace settings via the "log"
command and via a new "--log-control" command line option.

Add a missing trace_create() call for creating a transaction.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: switch hashtable to use the talloc framework
Juergen Gross [Wed, 18 Jan 2023 09:50:12 +0000 (10:50 +0100)]
tools/xenstore: switch hashtable to use the talloc framework

Instead of using malloc() and friends, let the hashtable implementation
use the talloc framework.

This is more consistent with the rest of xenstored and it allows memory
usage to be tracked via "xenstore-control memreport".

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: don't let hashtable_remove() return the removed value
Juergen Gross [Wed, 18 Jan 2023 09:50:11 +0000 (10:50 +0100)]
tools/xenstore: don't let hashtable_remove() return the removed value

Letting hashtable_remove() return the value of the removed element is
not used anywhere in Xenstore, and it conflicts with a hashtable
created specifying the HASHTABLE_FREE_VALUE flag.

So just drop returning the value.

This of course requires freeing the value if the HASHTABLE_FREE_VALUE
flag was specified, as otherwise it would be a memory leak.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: let chk_domain_generation() return a bool
Juergen Gross [Wed, 18 Jan 2023 09:50:10 +0000 (10:50 +0100)]
tools/xenstore: let chk_domain_generation() return a bool

Instead of returning 0 or 1 let chk_domain_generation() return a
boolean value.

Simplify the only caller by removing the ret variable.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: make domain_is_unprivileged() an inline function
Juergen Gross [Wed, 18 Jan 2023 09:50:09 +0000 (10:50 +0100)]
tools/xenstore: make domain_is_unprivileged() an inline function

clang 14 is complaining about a NULL dereference for constructs like:

  domain_is_unprivileged(conn) ? conn->in : NULL

as it can't know that domain_is_unprivileged(conn) will return false
if conn is NULL.

Fix that by making domain_is_unprivileged() an inline function (and
related to that domid_is_unprivileged(), too).

To avoid having to make struct domain public, use conn->id instead
of conn->domain->domid for the test.
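
A reduced sketch of the inline form follows. struct connection is cut down to
the two fields the example needs, conn_input() is a hypothetical wrapper
around the construct quoted above, and the privilege test is simplified to a
comparison against dom0_domid; the real xenstored code may check more.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced, hypothetical version of the connection structure. */
struct connection {
    unsigned int id;   /* domid of the connected domain */
    const char *in;    /* stand-in for the input buffer */
};

static unsigned int dom0_domid = 0;

static inline bool domid_is_unprivileged(unsigned int domid)
{
    return domid != dom0_domid;
}

/* Visible to the compiler at each call site, including the NULL check. */
static inline bool domain_is_unprivileged(const struct connection *conn)
{
    return conn && domid_is_unprivileged(conn->id);
}

/* The construct from the commit message, wrapped for illustration. */
static inline const char *conn_input(const struct connection *conn)
{
    return domain_is_unprivileged(conn) ? conn->in : NULL;
}
```

Because the body is visible at the call site, the compiler can see that the
conn->in dereference is only reached when conn is non-NULL.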

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: replace literal domid 0 with dom0_domid
Juergen Gross [Wed, 18 Jan 2023 09:50:08 +0000 (10:50 +0100)]
tools/xenstore: replace literal domid 0 with dom0_domid

There are some places left where dom0 is associated with domid 0.

Use dom0_domid instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: change per-domain node accounting interface
Juergen Gross [Wed, 18 Jan 2023 09:50:07 +0000 (10:50 +0100)]
tools/xenstore: change per-domain node accounting interface

Rework the interface and the internals of the per-domain node
accounting:

- rename the functions to domain_nbentry_*() in order to better match
  the related counter name

- switch from node pointer to domid as interface, as all nodes have the
  owner filled in

- use a common internal function for adding a value to the counter

For the transaction case add a helper function to get the list head
of the per-transaction changed domains, which makes it possible to
eliminate the transaction_entry_*() functions.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: move changed domain handling
Juergen Gross [Wed, 18 Jan 2023 09:50:06 +0000 (10:50 +0100)]
tools/xenstore: move changed domain handling

Move all code related to struct changed_domain from
xenstored_transaction.c to xenstored_domain.c.

This will be needed later in order to simplify the accounting data
updates in cases of errors during a request.

Split the code to have a more generic base framework.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: replace watch->relative_path with a prefix length
Juergen Gross [Wed, 18 Jan 2023 09:50:05 +0000 (10:50 +0100)]
tools/xenstore: replace watch->relative_path with a prefix length

Instead of storing a pointer to the path which is prepended to
relative paths in struct watch, just use the length of the prepended
path.

It should be noted that the now removed special case of the
relative path being "" in get_watch_path() can't happen at all.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: introduce dummy nodes for special watch paths
Juergen Gross [Wed, 18 Jan 2023 09:50:04 +0000 (10:50 +0100)]
tools/xenstore: introduce dummy nodes for special watch paths

Instead of special casing the permission handling and watch event
firing for the special watch paths "@introduceDomain" and
"@releaseDomain", use static dummy nodes added to the data base when
starting Xenstore.

The node accounting needs to reflect that change by adding the special
nodes in the domain_entry_fix() call in setup_structure().

Note that this requires reworking the calls of fire_watches() for the
special events in order to avoid leaking memory.

Move the check for a valid node name from get_node() to
get_node_canonicalized(), as this allows get_node() to be used for the
special nodes, too.

In order to avoid read and write accesses to the special nodes use a
special variant for obtaining the current node data for the permission
handling.

This allows quite some code to be simplified. In future, sub-nodes of
the special nodes will be possible due to this change, allowing more
fine-grained permission control of special events for specific domains.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: make log macro globally available
Juergen Gross [Wed, 18 Jan 2023 09:50:03 +0000 (10:50 +0100)]
tools/xenstore: make log macro globally available

Move the definition of the log() macro to xenstored_core.h in order
to make it usable from other source files, too.

While at it preserve errno from being modified.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: add hashlist for finding struct domain by domid
Juergen Gross [Wed, 18 Jan 2023 09:50:02 +0000 (10:50 +0100)]
tools/xenstore: add hashlist for finding struct domain by domid

Today finding a struct domain by its domain id requires scanning the
list of domains until the correct domid is found.

Add a hashlist to speed this up. This allows the linking of struct
domain in a list to be removed. Note that the list of changed
domains per transaction is kept as a list, as there are no known use
cases with more than 4 domains being touched in a single transaction
(this would be a device handled by a driver domain and being assigned
to a HVM domain with device model in a stubdom, plus the control
domain).

Some simple performance tests comparing the scanning and hashlist have
shown that the hashlist will win as soon as more than 6 entries need
to be scanned.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: remove all watches when a domain has stopped
Juergen Gross [Wed, 18 Jan 2023 09:50:01 +0000 (10:50 +0100)]
tools/xenstore: remove all watches when a domain has stopped

When a domain has been released by Xen tools, remove all its
registered watches. This avoids sending watch events to the dead domain
when all the nodes related to it are being removed by the Xen tools.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago tools/xenstore: let talloc_free() preserve errno
Juergen Gross [Wed, 18 Jan 2023 09:50:00 +0000 (10:50 +0100)]
tools/xenstore: let talloc_free() preserve errno

Today talloc_free() is not guaranteed to preserve errno, especially in
case a custom destructor is being used.

So preserve errno in talloc_free().

This allows some errno saving outside of talloc.c to be removed.
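
The general save/restore pattern can be sketched without talloc. Here plain
free() stands in for the allocator, and both function names are hypothetical;
this is not the talloc implementation.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: run an optional destructor and release the memory,
 * leaving errno exactly as the caller had it.
 */
static inline void free_preserving_errno(void *ptr,
                                         void (*destructor)(void *))
{
    int saved_errno = errno;

    if (destructor)
        destructor(ptr);
    free(ptr);

    errno = saved_errno;
}

/* Example destructor that clobbers errno as a side effect. */
static inline void clobbering_destructor(void *ptr)
{
    (void)ptr;
    errno = EIO;
}
```

An error path can then release memory first and still report the original
errno afterwards, without saving it by hand at every call site.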

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
2 years ago xen/riscv: introduce sbi call to putchar to console
Bobby Eshleman [Fri, 20 Jan 2023 08:26:31 +0000 (09:26 +0100)]
xen/riscv: introduce sbi call to putchar to console

The SBI implementation for Xen was originally introduced by
Bobby Eshleman <bobby.eshleman@gmail.com>. For simplicity, everything
except the SBI call for putting a character to the console has been
removed from it.

The patch introduces sbi_putchar() SBI call which is necessary
to implement initial early_printk.

Signed-off-by: Bobby Eshleman <bobby.eshleman@gmail.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Bobby Eshleman <bobby.eshleman@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
2 years ago xen/riscv: introduce asm/types.h header file
Oleksii Kurochko [Fri, 20 Jan 2023 08:25:44 +0000 (09:25 +0100)]
xen/riscv: introduce asm/types.h header file

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
2 years ago xen/include: change <asm/types.h> to <xen/types.h> in early_printk.h
Oleksii Kurochko [Fri, 20 Jan 2023 08:24:20 +0000 (09:24 +0100)]
xen/include: change <asm/types.h> to <xen/types.h> in early_printk.h

In the patch "include/types: move stddef.h-kind types to common
header" [1] size_t was moved from <asm/types.h> to <xen/types.h>
so early_printk should be updated correspondingly.

[1] https://lore.kernel.org/xen-devel/5a0a9e2a-c116-21b5-8081-db75fe4178d7@suse.com/

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/shadow: fix PAE check for top-level table unshadowing
Jan Beulich [Fri, 20 Jan 2023 08:23:42 +0000 (09:23 +0100)]
x86/shadow: fix PAE check for top-level table unshadowing

Clearly within the for_each_vcpu() the vCPU of this loop is meant, not
the (loop invariant) one the fault occurred on.

Fixes: 3d5e6a3ff383 ("x86 hvm: implement HVMOP_pagetable_dying")
Fixes: ef3b0d8d2c39 ("x86/shadow: shadow_table[] needs only one entry for PV-only configs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years ago x86/acpi: separate AMD-Vi and VT-d specific functions
Xenia Ragiadakou [Fri, 20 Jan 2023 08:22:42 +0000 (09:22 +0100)]
x86/acpi: separate AMD-Vi and VT-d specific functions

The functions acpi_dmar_init() and acpi_dmar_zap/reinstate() are
VT-d specific while the function acpi_ivrs_init() is AMD-Vi specific.
To eliminate dead code, they need to be guarded under CONFIG_INTEL_IOMMU
and CONFIG_AMD_IOMMU, respectively.

Instead of adding #ifdef guards around the function calls, implement them
as empty static inline functions.
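
The stub pattern looks roughly like this. The -ENODEV return value and the
probe_iommu() caller are assumptions made for the sketch; only the function
name acpi_dmar_init() and the CONFIG_INTEL_IOMMU symbol come from the text
above.

```c
#include <errno.h>

#ifdef CONFIG_INTEL_IOMMU
/* Real implementation provided by the VT-d driver. */
int acpi_dmar_init(void);
#else
/* Stub: compiles away, so callers need no #ifdef of their own. */
static inline int acpi_dmar_init(void)
{
    return -ENODEV; /* assumed return value for "no such device" */
}
#endif

/* Hypothetical caller in common code: identical in both configurations. */
static inline int probe_iommu(void)
{
    return acpi_dmar_init();
}
```

The call site stays free of preprocessor conditionals, and with the option
disabled the compiler can drop the call entirely.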

Take the opportunity to move the declaration of acpi_dmar_init from the
x86 arch-specific header to the common header, since Intel VT-d has also
been used on IA-64 platforms.

No functional change intended.

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/iommu: iommu_igfx and iommu_qinval are Intel VT-d specific
Xenia Ragiadakou [Fri, 20 Jan 2023 08:22:13 +0000 (09:22 +0100)]
x86/iommu: iommu_igfx and iommu_qinval are Intel VT-d specific

Use CONFIG_INTEL_IOMMU to guard the usage of iommu_igfx and iommu_qinval
in common code.

No functional change intended.

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/iommu: amd_iommu_perdev_intremap is AMD-Vi specific
Xenia Ragiadakou [Fri, 20 Jan 2023 08:21:37 +0000 (09:21 +0100)]
x86/iommu: amd_iommu_perdev_intremap is AMD-Vi specific

Move its definition to the AMD-Vi driver and use CONFIG_AMD_IOMMU
to guard its usage in common code.

Take the opportunity to replace bool_t with bool and 1 with true.

No functional change intended.

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago x86/shadow: further correct MMIO handling in _sh_propagate()
Jan Beulich [Fri, 20 Jan 2023 08:20:26 +0000 (09:20 +0100)]
x86/shadow: further correct MMIO handling in _sh_propagate()

While c61a6f74f80e ("x86: enforce consistent cachability of MMIO
mappings") correctly converted one !mfn_valid() check there, two others
were wrongly left untouched: Both cachability control and log-dirty
tracking ought to be uniformly handled/excluded for all (non-)MMIO
ranges, not just ones qualifiable by mfn_valid().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years ago include/types: move stddef.h-kind types to common header
Jan Beulich [Fri, 20 Jan 2023 08:18:39 +0000 (09:18 +0100)]
include/types: move stddef.h-kind types to common header

size_t, ssize_t, and ptrdiff_t are all expected to be uniformly defined
on any ports Xen might gain. In particular I hope new ports can rely on
__SIZE_TYPE__ and __PTRDIFF_TYPE__ being made available by the compiler.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years ago x86/shadow: L2H shadow type is PV32-only
Jan Beulich [Fri, 20 Jan 2023 08:17:33 +0000 (09:17 +0100)]
x86/shadow: L2H shadow type is PV32-only

Like for the various HVM-only types, save a little bit of code by suitably
"masking" this type out when !PV32.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years ago x86: split populating of struct vcpu_time_info into a separate function
Jan Beulich [Fri, 20 Jan 2023 08:12:48 +0000 (09:12 +0100)]
x86: split populating of struct vcpu_time_info into a separate function

This is to facilitate subsequent re-use of this code.

While doing so add const in a number of places, extending to
gtime_to_gtsc() and then for symmetry also its inverse function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper@citrix.com>
2 years ago xen/arm: Add 0x prefix when printing memory size in construct_domU
Michal Orzel [Tue, 3 Jan 2023 10:25:19 +0000 (11:25 +0100)]
xen/arm: Add 0x prefix when printing memory size in construct_domU

Printing memory size in hex without 0x prefix can be misleading, so
add it. Also, take the opportunity to adhere to the 80-character line
length limit by moving the printk arguments to the next line.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years ago xen/arm: linker: The identitymap check should cover the whole .text.header
Julien Grall [Thu, 12 Jan 2023 22:07:42 +0000 (22:07 +0000)]
xen/arm: linker: The identitymap check should cover the whole .text.header

At the moment, we only check that some part of .text.header is part of
the identity mapping. However, this doesn't take into account
the literal pool which will be located at the end of the section.

While we could try to avoid using a literal pool, in the near future we
will also want to use an identity mapping for switch_ttbr().

Not everything in .text.header needs to be part of the identity
mapping. But it is below a page size (i.e. 4KB) so take a shortcut and
check that .text.header is smaller than a page size.

With that _end_boot can be removed as it is now unused. Take the
opportunity to avoid assuming that a page size is always 4KB in the
error message and comment.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years ago xen/arm: linker: Indent correctly _stext
Julien Grall [Thu, 12 Jan 2023 22:06:42 +0000 (22:06 +0000)]
xen/arm: linker: Indent correctly _stext

_stext is indented by one space more compared to the other lines. This
doesn't seem warranted, so delete the extra space.

Signed-off: Julien Grall <jgrall@amazon.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
2 years ago x86/vmx: Support for CPUs without model-specific LBR
Andrew Cooper [Mon, 9 Jan 2023 10:58:31 +0000 (10:58 +0000)]
x86/vmx: Support for CPUs without model-specific LBR

Ice Lake (server at least) has both architectural LBR and model-specific LBR.
Sapphire Rapids does not have model-specific LBR at all.  I.e. on SPR and
later, model_specific_lbr will always be NULL, so we must make changes to
avoid reliably hitting the domain_crash().

The Arch LBR spec states that CPUs without model-specific LBR implement
MSR_DBG_CTL.LBR by discarding writes and always returning 0.

Do this for any CPU for which we lack model-specific LBR information.
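
The "discard writes, read as 0" behaviour amounts to masking one bit. The
sketch below assumes the LBR enable is bit 0 of the debug control MSR; the
helper name and the boolean parameter are hypothetical and do not reflect
how vmx.c is structured.

```c
#include <stdbool.h>
#include <stdint.h>

/* LBR enable bit in the debug control MSR (bit 0). */
#define DEBUGCTL_LBR (1ULL << 0)

/*
 * Hypothetical helper: value to actually load for a guest write. Without
 * model-specific LBR information the LBR bit is silently dropped, so a
 * later read returns 0 for it.
 */
static inline uint64_t sanitise_debugctl(uint64_t val, bool have_ms_lbr)
{
    return have_ms_lbr ? val : (val & ~DEBUGCTL_LBR);
}
```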

Adjust the now-stale comment, now that the Arch LBR spec has created a way to
signal "no model specific LBR" to guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
2 years ago x86/vmx: Calculate model-specific LBRs once at start of day
Andrew Cooper [Mon, 9 Jan 2023 11:42:22 +0000 (11:42 +0000)]
x86/vmx: Calculate model-specific LBRs once at start of day

There is no point repeating this calculation at runtime, especially as it is
in the fallback path of the WRMSR/RDMSR handlers.

Move the infrastructure higher in vmx.c to avoid forward declarations,
renaming last_branch_msr_get() to get_model_specific_lbr() to highlight that
these are model-specific only.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
2 years ago xen/version: Drop compat/kernel.c
Andrew Cooper [Tue, 3 Jan 2023 15:08:56 +0000 (15:08 +0000)]
xen/version: Drop compat/kernel.c

kernel.c is mostly in an #ifndef COMPAT guard, because compat/kernel.c
re-includes kernel.c to recompile xen_version() in a compat form.

However, the xen_version hypercall is almost guest-ABI-agnostic; only
XENVER_platform_parameters has a compat split.  Handle this locally, and do
away with the re-include entirely.  Also drop the CHECK_TYPE()'s between types
that are simply char-arrays in their native and compat form.

In particular, this removes the final instances of obfuscation via the DO()
macro.

No functional change.  Also saves 2k of .text in the x86 build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years ago public/version: Change xen_feature_info to have a fixed size
Andrew Cooper [Tue, 20 Dec 2022 15:51:07 +0000 (15:51 +0000)]
public/version: Change xen_feature_info to have a fixed size

This is technically an ABI change, but Xen doesn't operate in any environment
where "unsigned int" is different to uint32_t, so switch to the explicit form.
This avoids the need to derive (identical) compat logic for handling the
subop.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years ago include/compat: produce stubs for headers not otherwise generated
Jan Beulich [Thu, 12 Jan 2023 15:17:54 +0000 (16:17 +0100)]
include/compat: produce stubs for headers not otherwise generated

Public headers can include other public headers. Such interdependencies
are retained in their compat counterparts. Since some compat headers are
generated only in certain configurations, the referenced headers still
need to exist. The lack thereof was observed with hvm/hvm_op.h needing
trace.h, where generation of the latter depends on TRACEBUFFER=y. Make
empty stubs in such cases (as generating the extra headers is relatively
slow and hence better to avoid). Changes to .config followed by incremental
(re-)building are covered by the respective .*.cmd then no longer
matching the command to be used, resulting in the necessary re-creation
of the (possibly stub) header.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
2 years agox86/shadow: call sh_detach_old_tables() directly
Jan Beulich [Thu, 12 Jan 2023 10:14:50 +0000 (11:14 +0100)]
x86/shadow: call sh_detach_old_tables() directly

There's nothing really mode specific in this function anymore (the
varying number of valid entries in v->arch.paging.shadow.shadow_table[]
is dealt with fine by the zero check, and we have other similar cases of
iterating through the full array in common.c), and hence there's neither
a need to have multiple instances of it, nor does it need calling
through a function pointer.

While moving the function drop a non-conforming and not very useful
(anymore) comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/shadow: reduce effort of hash calculation
Jan Beulich [Thu, 12 Jan 2023 10:12:35 +0000 (11:12 +0100)]
x86/shadow: reduce effort of hash calculation

The "n" input is a GFN/MFN value and hence bounded by the physical
address bits in use on a system. The hash quality won't improve by also
including the upper always-zero bits in the calculation. To keep things
as compile-time-constant as they were before, use PADDR_BITS (not
paddr_bits) for loop bounding. This reduces loop iterations from 8 to 5.

While there also drop the unnecessary conversion to an array of unsigned
char, moving the value off the stack altogether (at least with
optimization enabled).
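
The loop bound described above can be sketched as follows.  The constants
(52 physical address bits, 4k pages, 251 buckets) and all names are
assumptions chosen for this illustration, not values taken from the patch.

```c
#include <stdint.h>

#define SKETCH_PADDR_BITS   52    /* assumed compile-time maximum */
#define SKETCH_PAGE_SHIFT   12    /* assumed 4k pages */
#define SKETCH_HASH_BUCKETS 251   /* assumed bucket count */

/* Number of bytes that can hold non-zero frame-number bits. */
#define SKETCH_HASH_BYTES \
    ((SKETCH_PADDR_BITS - SKETCH_PAGE_SHIFT + 7) / 8)

static inline uint32_t sketch_hash(uint64_t n, uint32_t t)
{
    uint32_t k = t;
    unsigned int i;

    /*
     * Consume one byte per iteration by shifting, so n never has to be
     * viewed as an array of unsigned char.
     */
    for ( i = 0; i < SKETCH_HASH_BYTES; ++i, n >>= 8 )
        k = (uint8_t)n + (k << 6) + (k << 16) - k;

    return k % SKETCH_HASH_BUCKETS;
}
```

With these assumed constants the bound works out to (52 - 12 + 7) / 8 = 5
iterations, compared with sizeof(unsigned long) = 8 when every byte is
hashed.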

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/shadow: drop a few uses of mfn_valid()
Jan Beulich [Thu, 12 Jan 2023 10:11:47 +0000 (11:11 +0100)]
x86/shadow: drop a few uses of mfn_valid()

v->arch.paging.shadow.shadow_table[], v->arch.paging.shadow.oos[],
v->arch.paging.shadow.oos_{snapshot[],fixup[].smfn[]} as well as the
hash table are all only ever written with valid MFNs or INVALID_MFN.
Avoid the somewhat expensive mfn_valid() when checking MFNs coming from
these arrays.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/iommu: introduce AMD-Vi and Intel VT-d Kconfig options
Xenia Ragiadakou [Thu, 12 Jan 2023 10:09:16 +0000 (11:09 +0100)]
x86/iommu: introduce AMD-Vi and Intel VT-d Kconfig options

Introduce two new Kconfig options, AMD_IOMMU and INTEL_IOMMU, to allow code
specific to each IOMMU technology to be separated and, when not required,
stripped. AMD_IOMMU will be used to enable IOMMU support for platforms that
implement the AMD I/O Virtualization Technology. INTEL_IOMMU will be used to
enable IOMMU support for platforms that implement the Intel Virtualization
Technology for Directed I/O.

Since, at this point, disabling any of them would cause Xen to not compile,
the options are not visible to the user and are enabled by default if X86.

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen/riscv: introduce stack stuff
Oleksii Kurochko [Tue, 10 Jan 2023 15:17:56 +0000 (17:17 +0200)]
xen/riscv: introduce stack stuff

The patch introduces and sets up a stack in order to enter the C environment.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agoxen: Remove the arch specific header init.h
Julien Grall [Wed, 11 Jan 2023 11:44:09 +0000 (11:44 +0000)]
xen: Remove the arch specific header init.h

Both the x86 and (soon) RISC-V versions of init.h are empty. On Arm, it contains
a structure that should not be used by any common code.

The structure init_info is used to store information to set up the CPU
currently being brought up. setup.h seems to be more suitable even though
the header is getting quite crowded.

Looking through the history, <asm/init.h> was introduced at the same
time as the ia64 port because for some reason most of the macros
were duplicated. This was changed in 72c07f413879 and I don't
foresee any reason to require an arch specific definition for init.h
in the near future.

Therefore remove asm/init.h for both x86 and arm (the only definition
is moved into setup.h). With that, RISC-V will not need to introduce
an empty header.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
2 years agotools: Fix build with recent QEMU, use "--enable-trace-backends"
Anthony PERARD [Wed, 11 Jan 2023 09:45:29 +0000 (10:45 +0100)]
tools: Fix build with recent QEMU, use "--enable-trace-backends"

The configure option "--enable-trace-backend" isn't accepted anymore
and we should use "--enable-trace-backends" instead, which was
introduced in 2014 and allows multiple backends.

"--enable-trace-backends" was introduced by:
    5b808275f3bb ("trace: Multi-backend tracing")
The backward compatible option "--enable-trace-backend" is removed by
    10229ec3b0ff ("configure: remove backwards-compatibility and obsolete options")

As we already use ./configure options that wouldn't be accepted by
older versions of QEMU's configure, we will simply use the new spelling
for the option and avoid trying to detect which spelling to use.

We already make use of "--firmwarepath=" which was introduced by
    3d5eecab4a5a ("Add --firmwarepath to configure")
which already includes the new spelling for "--enable-trace-backends".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
2 years agox86/S3: Restore Xen's MSR_PAT value on S3 resume
Andrew Cooper [Thu, 5 Jan 2023 20:42:58 +0000 (20:42 +0000)]
x86/S3: Restore Xen's MSR_PAT value on S3 resume

There are two paths in the trampoline, and Xen's PAT needs setting up in both,
not just the boot path.

Fixes: 4304ff420e51 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen: Drop $ARCH/trace.h
Andrew Cooper [Mon, 20 Sep 2021 16:12:56 +0000 (17:12 +0100)]
xen: Drop $ARCH/trace.h

Each architecture's main trace.h is empty.  Drop them all, so as not to force
all new architectures to create an empty file too.

While moving the declaration of tb_init_done, change from int to bool.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years agoautomation: add qemu-system-riscv to riscv64.dockerfile
Oleksii Kurochko [Mon, 9 Jan 2023 09:50:32 +0000 (11:50 +0200)]
automation: add qemu-system-riscv to riscv64.dockerfile

qemu-system-riscv will be used to run RISC-V Xen binary and
gather logs for smoke tests.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agoautomation: temporarily disable CONFIG_COVERAGE for RISC-V randconfig jobs
Oleksii Kurochko [Fri, 6 Jan 2023 10:28:01 +0000 (12:28 +0200)]
automation: temporarily disable CONFIG_COVERAGE for RISC-V randconfig jobs

As common isn't built for the RISC-V architecture yet, common/coverage
(where the __gcov_* functions are defined) isn't built either, but
randconfig may decide to enable CONFIG_COVERAGE, which leads to
the following compilation error:

riscv64-linux-gnu-ld: prelink.o: in function `.L0 ':
arch/riscv/early_printk.c:(.text+0x18):
    undefined reference to `__gcov_init'
riscv64-linux-gnu-ld: arch/riscv/early_printk.c:(.text+0x40):
    undefined reference to `__gcov_exit'

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agoMAINTAINERS: Clarify check-in requirements for mixed-author patches
George Dunlap [Mon, 5 Dec 2022 16:41:39 +0000 (16:41 +0000)]
MAINTAINERS: Clarify check-in requirements for mixed-author patches

There was a question raised recently about the requirements for
checking in a patch which was originally written by one maintainer,
then picked up and modified by a second maintainer, and which they now both
agree should be checked in.

It was proposed that in that case, the following set of tags would suffice:

Signed-off-by: First Author <...>
Signed-off-by: Second Author <...>
Reviewed-by: First Author <...>

The rationale was as follows:

1. The patch will be a mix of code, whose copyright is owned by the
various authors (or the companies they work for).  It's important to
keep this information around in the event, for instance, of a license
change or something else requiring knowledge of the copyright owner.

2. The Signed-off-by of the Second Author approves not only their own
code, but First Author's code; the Reviewed-by of the First Author
approves not only their own code, but the Second Author's code.  Thus
all the code has been approved by a maintainer, as well as someone who
was not the author.

In support of this, several arguments were put forward:

* We shouldn't make it harder for maintainers to get their code in
  than for non-maintainers

* The system we set up should not add pointless bureaucracy; nor
  discourage collaboration; nor encourage contributors to get around
  the rules by dropping important information.  (For instance, by
  removing the first SoB, so that the patch appears to have been
  written entirely by Second Author.)

Concerns were raised about two maintainers from the same company
colluding to get a patch in from their company; but such maintainers
could already collude, by working on the patch in secret, and posting
it publicly with only a single author's SoB, and having the other
person review it.

There's also something slightly strange about adding "Reviewed-by" to
code that you've written; but in the end you're reviewing not only the
code itself, but the final arrangement of it.  There's no need to
overcomplicate things.

Encode this in MAINTAINERS as follows:

* Refine the wording of requirement #2 in the check-in policy; such
that *each change* must have approval from someone other than *the
person who wrote it*.

* Add a paragraph explicitly stating that the multiple-SoB-approval
  system satisfies the requirements, and why.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years agoxen/include: include <asm/types.h> in <xen/early_printk.h>
Oleksii Kurochko [Mon, 9 Jan 2023 13:29:49 +0000 (14:29 +0100)]
xen/include: include <asm/types.h> in <xen/early_printk.h>

<asm/types.h> should be included because the second argument of
early_puts has type 'size_t', which is defined in <asm/types.h>.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years agox86/shadow: sh_remove_all_mappings() is HVM-only
Jan Beulich [Mon, 9 Jan 2023 13:29:13 +0000 (14:29 +0100)]
x86/shadow: sh_remove_all_mappings() is HVM-only

All callers live in hvm.c. Moving the function there is undesirable, as
hash walking is local to common.c and probably better remains so. Hence
move an #endif, allowing to drop an #ifdef.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/shadow: correct shadow type bounds checks
Jan Beulich [Mon, 9 Jan 2023 13:26:54 +0000 (14:26 +0100)]
x86/shadow: correct shadow type bounds checks

In sh_remove_shadow_via_pointer() the type range checks, besides being
bogus (should be ">= min && <= max"), are fully redundant with the has-
up-pointer assertion. In sh_hash_audit_bucket() properly use "min"
instead of assuming a certain order of type numbers.
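
A minimal sketch of an inclusive range check of the form mentioned above
follows.  The enum values and names are hypothetical and exist only to make
the example self-contained.

```c
#include <stdbool.h>

/* Hypothetical type numbering, for illustration only. */
enum sketch_type {
    SK_type_none,
    SK_type_l1,
    SK_type_l2,
    SK_type_other,
    SK_type_min_shadow = SK_type_l1,   /* first type in the checked range */
    SK_type_max_shadow = SK_type_l2,   /* last type in the checked range */
};

/* Inclusive check against both bounds: ">= min && <= max". */
static bool sketch_is_shadow_type(unsigned int t)
{
    return t >= SK_type_min_shadow && t <= SK_type_max_shadow;
}
```

Comparing against the named "min" bound keeps the check valid even if the
numbering of the types is rearranged later.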

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/shadow: simplify conditionals in sh_{get,put}_ref()
Jan Beulich [Mon, 9 Jan 2023 13:26:12 +0000 (14:26 +0100)]
x86/shadow: simplify conditionals in sh_{get,put}_ref()

In both cases the "entry_pa != 0" check is redundant; storing 0 when the
field already is 0 is quite fine. Move the cheaper remaining part first
in sh_get_ref(). In sh_put_ref() convert the has-up-pointer check into
an assertion (requiring the zero check to be retained there).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/shadow: move bogus HVM checks in sh_pagetable_dying()
Jan Beulich [Mon, 9 Jan 2023 13:25:29 +0000 (14:25 +0100)]
x86/shadow: move bogus HVM checks in sh_pagetable_dying()

Perhaps these should have been dropped right in 2fb2dee1ac62 ("x86/mm:
pagetable_dying() is HVM-only"). Convert both to assertions, noting that
in particular the one in the 3-level variant of the function came too
late anyway - first thing there we access the HVM part of a union.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/shadow: convert sh_audit_flags()'es 1st parameter to domain
Jan Beulich [Mon, 9 Jan 2023 13:24:18 +0000 (14:24 +0100)]
x86/shadow: convert sh_audit_flags()'es 1st parameter to domain

Nothing in there is vCPU-specific.

With the introduction of the local variable in sh_audit_l1_table(),
convert other uses of v->domain as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agoautomation: Add RISC-V 64 cross-build tests for Xen
Oleksii Kurochko [Thu, 5 Jan 2023 12:01:46 +0000 (14:01 +0200)]
automation: Add RISC-V 64 cross-build tests for Xen

Add build jobs to cross-compile Xen-only for RISC-V 64.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agoxen/riscv: Initial RISC-V support to build/run minimal Xen
Oleksii Kurochko [Thu, 5 Jan 2023 12:01:45 +0000 (14:01 +0200)]
xen/riscv: Initial RISC-V support to build/run minimal Xen

The patch provides the minimal set of changes needed to start
building and running a minimal Xen binary in GitLab CI/CD, which will
allow continuous checking of the build status of RISC-V Xen.

Apart from the introduction of new files, the following changes were made:
* Redefinition of the ALIGN define from '.align 2' to '.align 4', as 2 was
  an incorrect choice made previously.
* ALL_OBJ-y and ALL_LIBS-y were temporarily overwritten to produce
  a minimal hypervisor image; otherwise it would be necessary to push
  a huge amount of headers and stubs for common, drivers, libs etc. which
  aren't needed for now.
* Section changed from .text to .text.header for start function
  to make it the first one executed.
* Rework riscv64/Makefile logic to rebase over changes since the first
  RISC-V commit.

RISC-V Xen can be built by the following instructions:
  $ CONTAINER=riscv64 ./automation/scripts/containerize \
       make XEN_TARGET_ARCH=riscv64 -C xen tiny64_defconfig
  $ CONTAINER=riscv64 ./automation/scripts/containerize \
       make XEN_TARGET_ARCH=riscv64 -C xen build

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agohvmloader: use memory type constants
Jan Beulich [Thu, 5 Jan 2023 15:21:13 +0000 (16:21 +0100)]
hvmloader: use memory type constants

Now that we have them available in a header which is okay to use from
hvmloader sources, do away with respective literal numbers and silent
assumptions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
2 years agox86/mm: avoid hard-coding PAT in get_page_from_l1e()
Demi Marie Obenour [Thu, 5 Jan 2023 15:19:43 +0000 (16:19 +0100)]
x86/mm: avoid hard-coding PAT in get_page_from_l1e()

get_page_from_l1e() relied on Xen's choice of PAT, which is brittle in
the face of future PAT changes.  Instead, compute the actual cacheability
used by the CPU and switch on that, as this will work no matter what PAT
Xen uses.
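
As a sketch of the idea, the PAT-selected memory type of a 4k mapping can be
derived from the PAT MSR value and the PWT/PCD/PAT bits of the L1 entry.
The bit positions are architectural; the function and macro names are
assumptions made for this example, and the interaction with MTRRs is left
out.

```c
#include <stdint.h>

/* Caching-attribute bits in a 4k (L1) page table entry. */
#define SKETCH_PAGE_PWT (1u << 3)
#define SKETCH_PAGE_PCD (1u << 4)
#define SKETCH_PAGE_PAT (1u << 7)

/* Return the PAT memory type selected by an L1 entry with these flags. */
static unsigned int sketch_pte_flags_to_memtype(uint64_t pat_msr,
                                                unsigned int flags)
{
    /* The three bits form an index into the eight PAT entries. */
    unsigned int idx = (!!(flags & SKETCH_PAGE_PAT) << 2) |
                       (!!(flags & SKETCH_PAGE_PCD) << 1) |
                        !!(flags & SKETCH_PAGE_PWT);

    /* Each entry occupies one byte; the low 3 bits encode the type. */
    return (pat_msr >> (idx * 8)) & 7;
}
```

Because the lookup goes through the MSR value, the result follows whatever
PAT layout is in use rather than assuming which index maps to which type.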

No functional change intended.  This code is itself questionable and may
be removed in the future, but removing it would be an observable
behavior change and so is out of scope for this patch series.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoCI: Simplify the MUSL check
Andrew Cooper [Thu, 29 Dec 2022 22:19:40 +0000 (22:19 +0000)]
CI: Simplify the MUSL check

There's no need to do ad-hoc string parsing.  Use grep -q instead.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoCI: Fix build script when CROSS_COMPILE is in use
Andrew Cooper [Thu, 29 Dec 2022 21:46:50 +0000 (21:46 +0000)]
CI: Fix build script when CROSS_COMPILE is in use

Some testcases use a cross compiler.  Presently it's only arm32 and due to
previous cleanup the only thing which is now wrong is printing the compiler
version at the start of day.

Construct $cc to match what `make` will eventually choose given CROSS_COMPILE,
taking care not to modify $CC.  Use $cc throughout the rest of the script.

Also correct the compiler detection logic.  Plain "gcc" was wrong, and
"clang"* was a bodge highlighting the issue, but neither survives the
CROSS_COMPILE correction.  Instead, construct cc_is_{gcc,clang} booleans like
we do elsewhere in the build system, by querying the --version text for gcc or
clang.

While making this change, adjust cc_ver to be calculated once at the same time
as cc_is_* are calculated.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoCI: Express HYPERVISOR_ONLY in build.yml
Andrew Cooper [Thu, 29 Dec 2022 15:52:50 +0000 (15:52 +0000)]
CI: Express HYPERVISOR_ONLY in build.yml

Whether to build only Xen, or everything, is a property of container,
toolchain and/or testcase.  It is not a property of XEN_TARGET_ARCH.

Capitalise HYPERVISOR_ONLY and have it set by all the
debian-unstable-gcc-arm32-* testcases at the point that arm32 gets matched with
a container that can only build Xen.

To reduce the churn elsewhere, retain the RANDCONFIG implies HYPERVISOR_ONLY
property.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoCI: Only calculate ./configure args if needed
Andrew Cooper [Thu, 29 Dec 2022 20:05:33 +0000 (20:05 +0000)]
CI: Only calculate ./configure args if needed

This is purely code motion of the cfgargs construction, into the case where it
is used.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoCI: Remove guesswork about which artefacts to preserve
Andrew Cooper [Thu, 29 Dec 2022 20:01:52 +0000 (20:01 +0000)]
CI: Remove guesswork about which artefacts to preserve

Preserve the artefacts based on the `make` rune we actually ran, rather than
guesswork about which rune we would have run based on other settings.

Note that the ARM qemu smoke tests depend on finding binaries/xen even from
full builds.  Also the Jessie-32 containers build tools but not Xen.

This means the x86_32 builds now store relevant artefacts.  No change in other
configurations.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoCI: Drop automation/configs/
Andrew Cooper [Thu, 29 Dec 2022 15:39:13 +0000 (15:39 +0000)]
CI: Drop automation/configs/

Having 3 extra hypervisor builds on the end of a full build is deeply
confusing to debug if one of them fails, because the .config file presented in
the artefacts is not the one which caused a build failure.  Also, the log
tends to be truncated in the UI.

PV-only is tested as part of PV-Shim in a full build anyway, so doesn't need
repeating.  HVM-only and neither appear frequently in randconfig, so drop all
the logic here to simplify things.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoxen/riscv: Introduce asm/page-bits.h
Alistair Francis [Wed, 28 Dec 2022 05:20:18 +0000 (15:20 +1000)]
xen/riscv: Introduce asm/page-bits.h

Define PADDR_BITS and PAGE_SHIFT for the RISC-V 64-bit architecture.

Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agoxen/arm: vpl011: add ASSERT_UNREACHABLE in vpl011_mmio_read
Jiamei Xie [Mon, 5 Dec 2022 07:26:40 +0000 (15:26 +0800)]
xen/arm: vpl011: add ASSERT_UNREACHABLE in vpl011_mmio_read

In the vpl011_mmio_read switch block, all cases should have a return. Add
ASSERT_UNREACHABLE to catch the case where a return is missing.

Signed-off-by: Jiamei Xie <jiamei.xie@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years agoxen/arm: vpl011: emulate non-SBSA registers as WI/RAZ
Jiamei Xie [Mon, 5 Dec 2022 07:26:39 +0000 (15:26 +0800)]
xen/arm: vpl011: emulate non-SBSA registers as WI/RAZ

When the guest kernel enables the DMA engine with "CONFIG_DMA_ENGINE=y",
the Linux SBSA PL011 driver accesses the PL011 DMACR register in some
functions. As chapter "B Generic UART" of the "ARM Server Base System
Architecture"[1] documentation describes, the SBSA UART doesn't support
DMA. With the current code, when the kernel tries to access the DMACR
register, Xen injects a data abort:
Unhandled fault at 0xffffffc00944d048
Mem abort info:
  ESR = 0x96000000
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x00: ttbr address size fault
Data abort info:
  ISV = 0, ISS = 0x00000000
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000020e2e000
[ffffffc00944d048] pgd=100000003ffff803, p4d=100000003ffff803, pud=100000003ffff803, pmd=100000003fffa803, pte=006800009c090f13
Internal error: ttbr address size fault: 96000000 [#1] PREEMPT SMP
...
Call trace:
 pl011_stop_rx+0x70/0x80
 tty_port_shutdown+0x7c/0xb4
 tty_port_close+0x60/0xcc
 uart_close+0x34/0x8c
 tty_release+0x144/0x4c0
 __fput+0x78/0x220
 ____fput+0x1c/0x30
 task_work_run+0x88/0xc0
 do_notify_resume+0x8d0/0x123c
 el0_svc+0xa8/0xc0
 el0t_64_sync_handler+0xa4/0x130
 el0t_64_sync+0x1a0/0x1a4
Code: b9000083 b901f001 794038a0 8b000042 (b9000041)
---[ end trace 83dd93df15c3216f ]---
note: bootlogd[132] exited with preempt_count 1
/etc/rcS.d/S07bootlogd: line 47: 132 Segmentation fault start-stop-daemon

As discussed in [2], this commit makes the access to non-SBSA registers
RAZ/WI as an improvement.
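
A minimal sketch of RAZ/WI (read-as-zero, write-ignored) handling is shown
below.  The register offsets, struct and function names are hypothetical and
are not taken from the vpl011 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register offsets, for illustration only. */
#define SKETCH_REG_DATA 0x000u
#define SKETCH_REG_FLAG 0x018u

struct sketch_uart {
    uint32_t data;
    uint32_t flag;
};

/* Returns true when the access was handled, i.e. no fault is injected. */
static bool sketch_mmio_read(const struct sketch_uart *u, uint32_t offset,
                             uint32_t *val)
{
    switch ( offset )
    {
    case SKETCH_REG_DATA: *val = u->data; return true;
    case SKETCH_REG_FLAG: *val = u->flag; return true;
    default:
        /* Unimplemented register: read-as-zero instead of faulting. */
        *val = 0;
        return true;
    }
}

static bool sketch_mmio_write(struct sketch_uart *u, uint32_t offset,
                              uint32_t val)
{
    switch ( offset )
    {
    case SKETCH_REG_DATA: u->data = val; return true;
    default:
        /* Unimplemented register: the write is silently ignored. */
        return true;
    }
}
```

A guest driver that touches a register outside the emulated set then sees a
zero on reads and no effect on writes, rather than a data abort.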

[1] https://developer.arm.com/documentation/den0094/c/?lang=en
[2] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2211161552420.4020@ubuntu-linux-20-04-desktop/

Signed-off-by: Jiamei Xie <jiamei.xie@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
2 years agoxen/common: page_alloc: Re-order includes
Julien Grall [Fri, 23 Dec 2022 09:26:36 +0000 (09:26 +0000)]
xen/common: page_alloc: Re-order includes

Order the includes with the xen headers first, then asm headers and
last public headers. Within each category, they are sorted alphabetically.

Note that the includes protected by CONFIG_X86 haven't been sorted
to avoid adding multiple #ifdef.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/shadow: don't open-code copy_domain_page()
Jan Beulich [Thu, 22 Dec 2022 09:08:31 +0000 (10:08 +0100)]
x86/shadow: don't open-code copy_domain_page()

Let's use the library-like function that we have.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/shadow: adjust and move sh_type_to_size[]
Jan Beulich [Thu, 22 Dec 2022 09:07:50 +0000 (10:07 +0100)]
x86/shadow: adjust and move sh_type_to_size[]

Drop the SH_type_none entry - there are no allocation attempts with
this type, and there also shouldn't be any. Adjust the shadow_size()
alternative path to match that change. Also generalize two related
assertions.

While there move the entire table and the respective part of the comment
there to hvm.c, resulting in one less #ifdef. In the course of the
movement switch to using designated initializers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/PV: drop dead paging_update_paging_modes() call during Dom0 construction
Jan Beulich [Thu, 22 Dec 2022 09:06:57 +0000 (10:06 +0100)]
x86/PV: drop dead paging_update_paging_modes() call during Dom0 construction

The function won't ever be invoked, as paging_mode_enabled() always
returns false here due to the immediately preceding clearing of
d->arch.paging.mode. While compilers recognize this and eliminate the
call, make this explicit in the source (which likely 9a28170f2da2 ["pvh
dom0: construct_dom0 changes"] should have done right away, albeit even
before that the call looks to have been pointless - shadow mode enabling
has occurred later virtually forever).

While there also update an adjacent partly stale comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/paging: fold HAP and shadow memory alloc related fields
Jan Beulich [Thu, 22 Dec 2022 09:05:21 +0000 (10:05 +0100)]
x86/paging: fold HAP and shadow memory alloc related fields

Especially with struct shadow_domain and struct hap_domain not living in
a union inside struct paging_domain, let's avoid the duplication: The
fields are named and used in identical ways, and only one of HAP or
shadow can be in use for a domain. This then also renders involved
expressions slightly more legible.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agopublic: misra rule 20.7 fix on memory.h
Luca Fancellu [Thu, 22 Dec 2022 09:04:34 +0000 (10:04 +0100)]
public: misra rule 20.7 fix on memory.h

Cppcheck has found a violation of rule 20.7 for the macro
XENMEM_SHARING_OP_FIELD_MAKE_GREF: the argument "val" is used in an
expression, hence add parentheses around the argument "val" to fix the
violation.
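
To illustrate what the rule guards against, the sketch below contrasts a
macro that uses its parameter bare with one that parenthesises it.  The
macro and function names are made up for this example.

```c
#include <stdint.h>

#define SKETCH_FLAG (UINT64_C(1) << 62)

/* Parameter used bare: precedence can split the caller's expression. */
#define SKETCH_MAKE_BAD(val)  (SKETCH_FLAG | val)

/* Parameter parenthesised, in the spirit of MISRA C Rule 20.7. */
#define SKETCH_MAKE_GOOD(val) (SKETCH_FLAG | (val))

/* Expands to (SKETCH_FLAG | use_first ? 1 : 2), which always yields 1. */
static uint64_t sketch_bad(int use_first)
{
    return SKETCH_MAKE_BAD(use_first ? 1 : 2);
}

/* Expands to (SKETCH_FLAG | (use_first ? 1 : 2)), as the caller intended. */
static uint64_t sketch_good(int use_first)
{
    return SKETCH_MAKE_GOOD(use_first ? 1 : 2);
}
```

The two functions differ only in the parentheses around the parameter, yet
the first one loses the flag entirely.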

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agopublic: misra rule 20.7 fix on errno.h
Luca Fancellu [Thu, 22 Dec 2022 09:04:17 +0000 (10:04 +0100)]
public: misra rule 20.7 fix on errno.h

Cppcheck has found a violation of rule 20.7 for the macro XEN_ERRNO.
While the macro parameter is never used as an expression, it doesn't
harm the code or the readability to add parentheses, so add them.

This finding is reported also by eclair and coverity.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen/arm: Allow to set grant table related limits for dom0less domUs
Michal Orzel [Mon, 19 Dec 2022 08:59:08 +0000 (09:59 +0100)]
xen/arm: Allow to set grant table related limits for dom0less domUs

At the moment, for dom0less domUs, we do not have a way to specify
per domain grant table related limits (unlike when using xl), namely
max version, max number of grant frames, max number of maptrack frames.
This means that such domains always use the values specified by the Xen
command line parameters or their default values if unspecified.

In order to have more control over dom0less domUs, introduce the
following device-tree properties that can be set under domUs nodes:
 - max_grant_version to set the maximum grant table version the domain
   is allowed to use,
 - max_grant_frames to set the maximum number of grant frames the domain
   is allowed to have,
 - max_maptrack_frames to set the maximum number of grant maptrack frames
   the domain is allowed to have.

Update documentation accordingly.

Note that the values obtained from device tree are of type uint32_t,
whereas the d_cfg.max_{grant_frames,maptrack_frames} are of type int32_t.
Call panic in case of overflow. Other sanity checks are already there in
grant_table_init() resulting in panic in case of errors, therefore no
need to repeat them in create_domUs().
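
The overflow check described above can be sketched as a small helper.  The
helper name is hypothetical; in the scenario described, a false return would
correspond to the point where panic is called.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Convert a uint32_t device-tree value into an int32_t field.
 * Returns false if the value does not fit.
 */
static bool sketch_dt_u32_to_s32(uint32_t dt_val, int32_t *out)
{
    if ( dt_val > INT32_MAX )
        return false;

    *out = (int32_t)dt_val;
    return true;
}
```

Rejecting values above INT32_MAX up front avoids a silent wrap to a negative
number when the value is stored in the signed field.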

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agox86/ucode: load microcode earlier on boot CPU
Sergey Dyasli [Mon, 19 Dec 2022 14:45:33 +0000 (14:45 +0000)]
x86/ucode: load microcode earlier on boot CPU

Call early_microcode_init() straight after multiboot modules become
accessible. Modify it to load the ucode directly from the blob bypassing
populating microcode_cache because xmalloc is still not available at
that point during Xen boot.

Introduce early_microcode_init_cache() for populating microcode_cache.
It needs to rescan the modules in order to find the new virtual address
of the ucode blob because it changes during the boot process, e.g.
from 0x00000000010802fc to 0xffff83204dac52fc.

While at it, drop alternative_vcall() from early_microcode_init() since
it's not useful in an __init function.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/ucode: allow cpu_request_microcode() to skip memory allocation
Sergey Dyasli [Mon, 19 Dec 2022 14:45:32 +0000 (14:45 +0000)]
x86/ucode: allow cpu_request_microcode() to skip memory allocation

This is a preparatory step in order to do earlier microcode loading on
the boot CPU when the domain heap has not been initialized yet and
xmalloc is still unavailable.

Add a make_copy argument which allows loading microcode directly from
the blob, bypassing microcode_cache.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen/multiboot: add proper struct definitions to typedefs
Sergey Dyasli [Mon, 19 Dec 2022 14:45:31 +0000 (14:45 +0000)]
xen/multiboot: add proper struct definitions to typedefs

This allows them to be used for forward declarations in other headers.
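
A short sketch of why a struct tag matters for forward declarations is given
below.  The type and function names are hypothetical and unrelated to the
real multiboot definitions.

```c
/*
 * An anonymous struct behind a typedef, e.g.
 *     typedef struct { ... } sketch_module_t;
 * has no tag, so other headers cannot forward-declare it.
 */

/* With a tag, a forward declaration is enough to use pointers to it. */
struct sketch_module;
unsigned long sketch_module_size(const struct sketch_module *m);

/* The full definition, as it would appear in the owning header. */
struct sketch_module {
    unsigned long start;
    unsigned long end;
};
typedef struct sketch_module sketch_module_t;

unsigned long sketch_module_size(const sketch_module_t *m)
{
    return m->end - m->start;
}
```

Headers that only pass pointers around can then declare the struct tag
instead of including the full definition.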

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agox86: derive XEN_MSR_PAT from its individual entries
Demi Marie Obenour [Tue, 20 Dec 2022 15:51:55 +0000 (16:51 +0100)]
x86: derive XEN_MSR_PAT from its individual entries

This avoids it being a magic constant that is difficult for humans to
decode.  Use BUILD_BUG_ON to check that the old and new values are
identical.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86: replace EPT_EMT_* constants with X86_MT_*
Demi Marie Obenour [Tue, 20 Dec 2022 15:51:18 +0000 (16:51 +0100)]
x86: replace EPT_EMT_* constants with X86_MT_*

This allows eliminating the former.  No functional change intended.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86: replace MTRR_* constants with X86_MT_* constants
Demi Marie Obenour [Tue, 20 Dec 2022 15:50:38 +0000 (16:50 +0100)]
x86: replace MTRR_* constants with X86_MT_* constants

This allows eliminating the former, with the exception of
MTRR_NUM_TYPES.  MTRR_NUM_TYPES is kept, as due to a quirk of the x86
architecture X86_MT_UCM (7) is not valid in an MTRR.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86: replace PAT_* with X86_MT_*
Demi Marie Obenour [Tue, 20 Dec 2022 15:49:35 +0000 (16:49 +0100)]
x86: replace PAT_* with X86_MT_*

This allows eliminating the former.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86: add memory type constants
Demi Marie Obenour [Tue, 20 Dec 2022 15:49:16 +0000 (16:49 +0100)]
x86: add memory type constants

These are not currently used, so there is no functional change.  Future
patches will use these constants.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen/arm: smmuv3: mark arm_smmu_disable_pasid __maybe_unused
Stewart Hildebrand [Thu, 15 Dec 2022 21:26:19 +0000 (16:26 -0500)]
xen/arm: smmuv3: mark arm_smmu_disable_pasid __maybe_unused

When building with clang 12 and CONFIG_ARM_SMMU_V3=y, we observe the
following build error:

drivers/passthrough/arm/smmu-v3.c:1408:20: error: unused function 'arm_smmu_disable_pasid' [-Werror,-Wunused-function]
static inline void arm_smmu_disable_pasid(struct arm_smmu_master *master) { }
                   ^

arm_smmu_disable_pasid is not currently called from anywhere in Xen, but
it is inside a section of code guarded by CONFIG_PCI_ATS, which may be
helpful in the future if the PASID feature is to be implemented. Add the
attribute __maybe_unused to the function.
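
A minimal sketch of the technique follows.  The macro definition is an
assumption (GCC and Clang both accept the underlying attribute), and the
struct and function are stand-ins rather than the SMMUv3 code.

```c
/* Assumed definition of the annotation, for illustration only. */
#define sketch_maybe_unused __attribute__((__unused__))

struct sketch_master {
    int pasid_enabled;
};

/*
 * A static function may have no callers in some configurations; the
 * attribute keeps -Wunused-function quiet without deleting the code.
 */
static inline sketch_maybe_unused void
sketch_disable_pasid(struct sketch_master *m)
{
    m->pasid_enabled = 0;
}
```

The function stays available for future callers, and builds with
-Werror,-Wunused-function no longer fail on it.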

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>