Cpuid leaf 4 contains information about the state of the tsc, its
mode, and some additional details. A commit that is queued for
Linux would like to use this to determine whether the tsc mode has been
set to 'no emulation' in order to make some decisions about which
clocksource is more reliable.
Expose this information in the public API headers so that they can
subsequently be imported into Linux and used there.
Juergen Gross [Mon, 6 Feb 2023 07:52:15 +0000 (08:52 +0100)]
xen/public: move xenstore related doc into 9pfs.h
The Xenstore related documentation is currently to be found in
docs/misc/9pfs.pandoc, instead of the related header file
xen/include/public/io/9pfs.h like for most other paravirtualized
device protocols.
There is a comment in the header pointing at the document, but the
given file name is wrong. Additionally such headers are meant to be
copied into consuming projects (Linux kernel, qemu, etc.), so pointing
at a doc file in the Xen git repository isn't really helpful for the
consumers of the header.
This situation is far from ideal, which is already being proved by the
fact that neither qemu nor the Linux kernel are implementing the
device attach/detach protocol correctly.
Change that by moving the Xenstore related 9pfs documentation from
docs/misc/9pfs.pandoc into xen/include/public/io/9pfs.h.
x86/vpmu: remove unused svm and vmx specific headers
Fixes: 8c20aca6751b ("x86/vPMU: invoke <vendor>_vpmu_initialise() through a hook as well") Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: 2191599bacb7 ("x86/emul: Simplfy emulation state setup") Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
tools/python: change 's#' size type for Python >= 3.10
Python < 3.10 by default uses the 'int' type for data+size string types
(s#), unless PY_SSIZE_T_CLEAN is defined, in which case it uses
Py_ssize_t. The former behavior was removed in Python 3.10, and it is now
required to define PY_SSIZE_T_CLEAN before including Python.h and to use
Py_ssize_t for the length argument. The PY_SSIZE_T_CLEAN behavior has been
supported since Python 2.5.
Adjust bindings accordingly.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Luca Fancellu [Mon, 30 Jan 2023 11:01:32 +0000 (11:01 +0000)]
xen/cppcheck: add parameter to skip given MISRA rules
Add a parameter to skip the passed MISRA rules during the cppcheck
analysis. The rules are specified as a comma-separated list
using the MISRA number notation (e.g. 1.1,1.3,...).
Modify the convert_misra_doc.py script to take an extra parameter
giving a comma-separated list of MISRA rules to be skipped.
While there, fix some typos in the help and print functions.
Modify settings.py and cppcheck_analysis.py to have a new
parameter (--cppcheck-skip-rules) used to specify a list of
MISRA rules to be skipped during the cppcheck analysis.
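As a rough illustration of the comma-separated notation, the following
Python sketch splits such a list into individual rule numbers. The helper
name parse_skip_rules and its validation are assumptions made for this
example; this is not the code from convert_misra_doc.py or
cppcheck_analysis.py.

```python
import re

# Hypothetical helper, not taken from the Xen scripts.
_RULE_RE = re.compile(r"^\d+\.\d+$")


def parse_skip_rules(arg):
    """Split a string such as "1.1,1.3" into ["1.1", "1.3"]."""
    rules = [item.strip() for item in arg.split(",") if item.strip()]
    for rule in rules:
        # Reject anything that is not in <number>.<number> notation.
        if not _RULE_RE.match(rule):
            raise ValueError("invalid MISRA rule number: %r" % rule)
    return rules
```

An empty string yields an empty list, so the parameter can be left unset
without special casing in the caller.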
Sort the cppcheck report entries alphabetically when producing the text
report. This helps when comparing different reports and groups
together findings from the same file.
The sort operation is performed with two criteria: the first one is
sorting by MISRA rule, the second one is sorting by file.
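A two-criteria sort of this kind can be sketched in Python with a tuple
key. The (file, rule, message) tuple layout and the function name are
hypothetical, and the sketch assumes that the rule is the primary key and
the file the secondary one, using plain string comparison.

```python
def sort_findings(findings):
    """Sort (file, rule, message) tuples by rule first, then by file."""
    # Tuples compare element by element, so the rule decides the order
    # and the file name only breaks ties between equal rules.
    return sorted(findings, key=lambda f: (f[1], f[0]))
```

Because the comparison is alphabetical, a rule such as "10.1" sorts before
"2.1"; a numeric key would be needed if that ordering is not wanted.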
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
[stefano: add black line for code style] Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
xen/arm: Probe the load/entry point address of an uImage correctly
Currently, kernel_uimage_probe() does not read the load/entry point address
set in the uImage header. Thus, info->zimage.start is 0 (default value). This
causes kernel_zimage_place() to treat the binary (contained within uImage)
as a position independent executable. Thus, it loads it at an incorrect
address.
The correct approach would be to read "uimage.load" and set
info->zimage.start. This will ensure that the binary is loaded at the
correct address. Also, read "uimage.ep" and set info->entry (i.e. the kernel
entry address).
If user provides load address (ie "uimage.load") as 0x0, then the image is
treated as position independent executable. Xen can load such an image at
any address it considers appropriate. A position independent executable
cannot have a fixed entry point address.
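The rule described above can be modelled with a short Python sketch that
reads the load and entry point fields from a legacy 64-byte uImage header.
The function name probe_uimage and the returned tuple are assumptions for
illustration only; this is not Xen's kernel_uimage_probe().

```python
import struct

IH_MAGIC = 0x27051956                    # legacy uImage magic number
# magic, hcrc, time, size, load, ep, dcrc, os, arch, type, comp, name
UIMAGE_HDR = struct.Struct(">7I4B32s")   # 64 bytes, big-endian


def probe_uimage(header):
    """Return (load, entry, position_independent) for a uImage header."""
    fields = UIMAGE_HDR.unpack(header[:UIMAGE_HDR.size])
    magic, load, ep = fields[0], fields[4], fields[5]
    if magic != IH_MAGIC:
        raise ValueError("not a legacy uImage")
    # Per the commit message: a load address of 0x0 means the image is
    # treated as position independent and may be placed anywhere.
    return load, ep, load == 0
```

A caller would use the load address as the placement address only when the
third element is False.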
This behavior is applicable for both arm32 and arm64 platforms.
Earlier, for arm32 and arm64 platforms, Xen was ignoring the load and entry
point address set in the uImage header. With this commit, Xen will use them.
This makes the behavior of Xen consistent with U-Boot for uImage headers.
Users who want to use Xen with statically partitioned domains can provide
a non-zero load address and entry address for the dom0/domU kernel. The
load and entry address provided must be within the memory
region allocated by Xen.
A deviation from U-Boot behaviour is that we consider load address == 0x0
to denote that the image supports position independent execution. This
is to make the behavior consistent across uImage and zImage.
Andrew Cooper [Wed, 25 Jan 2023 16:18:16 +0000 (16:18 +0000)]
x86/shadow: Fix PV32 shadowing in !HVM builds
The OSSTest bisector identified an issue with c/s 1894049fa283 ("x86/shadow:
L2H shadow type is PV32-only") in !HVM builds.
The bug is ultimately caused by sh_type_to_size[] not actually being specific
to HVM guests, and its position in shadow/hvm.c misled the reasoning.
To fix the issue that OSSTest identified, SH_type_l2h_64_shadow must still
have the value 1 in any CONFIG_PV32 build. But simply adjusting this leaves
us with misleading logic, and a reasonable chance of making a related error
again in the future.
In hindsight, moving sh_type_to_size[] out of common.c in the first place was
a mistake. Therefore, move sh_type_to_size[] back to living in common.c,
leaving a comment explaining why it happens to be inside an HVM conditional.
This effectively reverts the second half of 4fec945409fc ("x86/shadow: adjust
and move sh_type_to_size[]") while retaining the other improvements from the
same changeset.
While making this change, also adjust the sh_type_to_size[] declaration to
match its definition.
Fixes: 4fec945409fc ("x86/shadow: adjust and move sh_type_to_size[]") Fixes: 1894049fa283 ("x86/shadow: L2H shadow type is PV32-only") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@cloud.com>
Jason Andryuk [Thu, 26 Jan 2023 09:58:23 +0000 (10:58 +0100)]
libxl: fix guest kexec - skip cpuid policy
When a domain performs a kexec (soft reset), libxl__build_pre() is
called with the existing domid. Calling libxl__cpuid_legacy() on the
existing domain fails since the cpuid policy has already been set, and
the guest isn't rebuilt and doesn't kexec.
xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM
During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
issue. Before commit 34990446ca91, the libxl__cpuid_legacy() failure
would have been ignored, so kexec would continue.
Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy") Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
I'm observing guest kexec trigger xenstored to abort on a double free.
gdb output:
Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140645614258112) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
at ./nptl/pthread_kill.c:44
at ./nptl/pthread_kill.c:78
at ./nptl/pthread_kill.c:89
at ../sysdeps/posix/raise.c:26
at talloc.c:119
ptr=ptr@entry=0x559fae724290) at talloc.c:232
at xenstored_core.c:2945
(gdb) frame 5
at talloc.c:119
119 TALLOC_ABORT("Bad talloc magic value - double free");
(gdb) frame 7
at xenstored_core.c:2945
2945 talloc_increase_ref_count(conn);
(gdb) p conn
$1 = (struct connection *) 0x559fae724290
The trace shows that DESTROY was called for connection 0x559fae724290,
but that is the same pointer (conn) main() was looping through from
connections. So it wasn't actually removed from the connections list?
Reverting commit e8e6e42279a5 "tools/xenstore: simplify loop handling
connection I/O" fixes the abort/double free. I think the use of
list_for_each_entry_safe is incorrect. list_for_each_entry_safe makes
traversal safe for deleting the current iterator, but RELEASE/do_release
will delete some other entry in the connections list. I think the
observed abort is because list_for_each_entry_safe has next pointing to the
deleted connection, and it is used in the subsequent iteration.
Add a comment explaining the unsuitability of list_for_each_entry_safe.
Also notice that the old code takes a reference on next, which would
prevent a use-after-free.
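The failure mode can be modelled in a few lines of Python. This is a toy
model, not xenstored code: the Conn class, the helper names and the
'freed' flag standing in for talloc_free() are all made up for
illustration. It shows that caching 'next' before running the loop body
protects against deleting the current entry only, not the following one.

```python
class Conn:
    """Toy stand-in for struct connection in a singly linked list."""
    def __init__(self, name):
        self.name = name
        self.next = None
        self.freed = False


def make_list(names):
    head = prev = None
    for name in names:
        node = Conn(name)
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    return head


def walk_safe(head, body):
    """Model of list_for_each_entry_safe(): 'next' is cached up front."""
    visited = []
    cur = head
    while cur is not None:
        nxt = cur.next                  # cached before the body runs
        visited.append((cur.name, cur.freed))
        body(cur)
        cur = nxt                       # may now refer to a freed entry
    return visited


def release_following(cur):
    """Loop body that frees the entry after 'a', like a RELEASE would."""
    victim = cur.next
    if cur.name == "a" and victim is not None:
        cur.next = victim.next          # unlink the victim from the list
        victim.freed = True             # stand-in for talloc_free()
```

Walking the list a, b, c with release_following as the body still visits
b after it has been freed, which corresponds to the access that trips the
talloc magic check in the trace.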
Michal Orzel [Mon, 23 Jan 2023 13:10:23 +0000 (14:10 +0100)]
automation: Modify static-mem check in qemu-smoke-dom0less-arm64.sh
At the moment, the static-mem check relies on the way Xen exposes the
memory banks in device tree. As this might change, the check should be
modified to be generic and not to rely on device tree. In this case,
let's use /proc/iomem which exposes the memory ranges in %08x format
as follows:
<start_addr>-<end_addr> : <description>
This way, we can grep in /proc/iomem for an entry containing memory
region defined by the static-mem configuration with "System RAM"
description. If it exists, mark the test as passed. Also, take the
opportunity to add the 0x prefix to the domu_{base,size} definitions rather
than adding it in front of each occurrence.
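To make the expected entry concrete, here is a small Python sketch that
builds the line to look for from a base and a size. The function names
are hypothetical, and the sketch assumes the end address shown in
/proc/iomem is inclusive (base + size - 1). The actual test is a shell
script and is not reproduced here.

```python
def iomem_entry(base, size, description="System RAM"):
    """Format the /proc/iomem line expected for 'size' bytes at 'base'."""
    end = base + size - 1       # inclusive end address
    return "%08x-%08x : %s" % (base, end, description)


def region_present(iomem_text, base, size):
    """Return True if some line of iomem_text contains the expected entry."""
    wanted = iomem_entry(base, size)
    return any(wanted in line for line in iomem_text.splitlines())
```

With a made-up base of 0x50000000 and size of 0x20000000, the entry to
match is "50000000-6fffffff : System RAM".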
Julien Grall [Tue, 24 Jan 2023 19:32:14 +0000 (19:32 +0000)]
xen/arm32: head: Remove restriction where to load Xen
At the moment, bootloaders can load Xen anywhere in memory except the
region 2MB - 4MB. While I am not aware of any issue, we have no way
to tell the bootloader to avoid that region.
In addition to that, in the future, Xen may grow over 2MB if we
enable features like UBSAN or GCOV. To avoid widening the restriction
on the load address, it would be better to get rid of it.
When the identity mapping is clashing with the Xen runtime mapping,
we need an extra indirection to be able to replace the identity
mapping with the Xen runtime mapping.
Reserve a new memory region that will be used to temporarily map Xen.
For convenience, the new area is re-using the same first slot as the
domheap which is used for per-cpu temporary mapping after a CPU has
booted.
Furthermore, directly map boot_second (which covers Xen and more)
to the temporary area. This avoids allocating an extra page-table
for the second-level and will be helpful for follow-up patches (we will
want to use the fixmap whilst in the temporary mapping).
Lastly, some part of the code now needs to know whether the temporary
mapping was created. So reserve r12 to store this information.
Julien Grall [Tue, 24 Jan 2023 19:31:11 +0000 (19:31 +0000)]
xen/arm32: head: Introduce a helper to flush the TLBs
The sequence for flushing the TLBs is 4 instructions long and often
requires an explanation of how it works.
So create a helper and use it in the boot code (switch_ttbr() is left
alone until we decide the semantic of the call).
Note that in secondary_switched, we were also flushing the instruction
cache and branch predictor. Neither of them was necessary because:
* We are only supporting IVIPT cache on arm32, so the instruction
cache flush is only necessary when executable code is modified.
None of the boot code is doing that.
* The instruction cache is not invalidated and misprediction is not
a problem at boot.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Tested-by: Henry Wang <Henry.Wang@arm.com>
Julien Grall [Tue, 24 Jan 2023 19:31:08 +0000 (19:31 +0000)]
xen/arm32: head: Jump to the runtime mapping in enable_mmu()
At the moment, enable_mmu() will return to an address in the 1:1 mapping
and each path is responsible for switching to the runtime mapping.
In a follow-up patch, the behavior to switch to the runtime mapping
will become more complex. So to avoid more code/comment duplication,
move the switch into enable_mmu().
Lastly, take the opportunity to replace load from literal pool with
mov_w.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Tested-by: Henry Wang <Henry.Wang@arm.com>
Julien Grall [Tue, 24 Jan 2023 19:27:49 +0000 (19:27 +0000)]
xen/arm: Clean-up the memory layout
In a follow-up patch, the base address for the common mappings will
vary between arm32 and arm64. To avoid any duplication, define
every mapping in the common region from the previous one.
Take the opportunity to:
* add missing *_SIZE for FIXMAP_VIRT_* and XEN_VIRT_*
* switch to MB()/GB() to avoid hexadecimal (easier to read)
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com>
Julien Grall [Tue, 24 Jan 2023 19:26:09 +0000 (19:26 +0000)]
xen/arm32: flushtlb: Reduce scope of barrier for local TLB flush
Per G5-9224 in ARM DDI 0487I.a:
"A DSB NSH is sufficient to ensure completion of TLB maintenance
instructions that apply to a single PE. A DSB ISH is sufficient to
ensure completion of TLB maintenance instructions that apply to PEs
in the same Inner Shareable domain.
"
This is quoting the Armv8 specification because I couldn't find an
explicit statement in the Armv7 specification. Instead, I could find
bits in various places that confirm the same implementation.
Furthermore, Linux has been using 'nsh' since 2013 (62cbbc42e001
"ARM: tlb: reduce scope of barrier domains for TLB invalidation").
This means barrier after local TLB flushes could be reduced to
non-shareable.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Tested-by: Henry Wang <Henry.Wang@arm.com>
Julien Grall [Tue, 24 Jan 2023 19:25:50 +0000 (19:25 +0000)]
xen/arm64: flushtlb: Implement the TLBI repeat workaround for TLB flush by VA
Looking at the Neoverse N1 errata document, it is not clear to me
why the TLBI repeat workaround is not applied for TLB flush by VA.
The TLB flush by VA helpers are used in flush_xen_tlb_range_va_local()
and flush_xen_tlb_range_va(). So if the range size is a fixed size smaller
than a PAGE_SIZE, it would be possible for the compiler to remove the loop
and therefore replicate the sequence described in the erratum 1286807.
So the TLBI repeat workaround should also be applied for the TLB flush
by VA helpers.
Fixes: 22e323d115d8 ("xen/arm: Add workaround for Cortex-A76/Neoverse-N1 erratum #1286807") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Tested-by: Henry Wang <Henry.Wang@arm.com>
Julien Grall [Tue, 24 Jan 2023 19:25:19 +0000 (19:25 +0000)]
xen/arm64: flushtlb: Reduce scope of barrier for local TLB flush
Per D5-4929 in ARM DDI 0487H.a:
"A DSB NSH is sufficient to ensure completion of TLB maintenance
instructions that apply to a single PE. A DSB ISH is sufficient to
ensure completion of TLB maintenance instructions that apply to PEs
in the same Inner Shareable domain.
"
This means barrier after local TLB flushes could be reduced to
non-shareable.
Note that the scope of the barrier in the workaround has not been
changed because Linux v6.1-rc8 is also using 'ish' and I couldn't
find anything in the Neoverse N1 suggesting that a 'nsh' would
be sufficient.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Tested-by: Henry Wang <Henry.Wang@arm.com>
ns16550: fix the use of simple_strtoul() for parsing u64
One should be using simple_strtoull() (instead of simple_strtoul())
to assign a value to a 'u64' variable. The reason is that u64 can be
represented by 'unsigned long long' on all the platforms (i.e. Arm32,
Arm64 and x86).
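The effect of parsing into a too-narrow type can be shown with a small
Python model. On Arm32 'unsigned long' is 32 bits wide while 'unsigned
long long' is 64 bits. The helper below is hypothetical and assumes that
overflow simply wraps to the low bits; it models the width only and is
not Xen's simple_strtoul().

```python
def parse_unsigned(text, bits):
    """Parse a decimal or 0x-prefixed string, keeping the low 'bits' bits."""
    # Base 0 lets int() accept both "1234" and "0x4d2".
    return int(text, 0) & ((1 << bits) - 1)
```

For example, "0x100001000" parsed with a 32-bit result loses its top bit
and yields 0x1000, whereas a 64-bit result keeps the full value.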
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Mon, 23 Jan 2023 14:03:58 +0000 (15:03 +0100)]
build: fix building flask headers before descending in flask/ss/
Unfortunately, adding a prerequisite to "$(obj)/ss/built_in.o" doesn't
work because we have "$(obj)/%/built_in.o: $(obj)/% ;" in Rules.mk.
So, make is allowed to try to build objects in "xsm/flask/ss/" before
generating the headers.
Adding a prerequisite on "$(obj)/ss" instead will fix the issue as
that's the target used to run make in this subdirectory.
Unfortunately, that target is also used when running `make clean`, so
we want to ignore it in this case. $(MAKECMDGOALS) can't be used in
this case as it is empty, but we can guess which operation is done by
looking at the list of loaded makefiles.
Fixes: 7a3bcd2babcc ("build: build everything from the root dir, use obj=$subdir") Reported-by: "Daniel P. Smith" <dpsmith@apertussolutions.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/hvm: Enable guest access to MSR_PKRS
Have guest_{rd,wr}msr(), via hvm_{get,set}_reg(), access either the live
register, or stashed state, depending on context. Include MSR_PKRS for
migration, and let the guest have full access.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/hvm: Context switch MSR_PKRS
Under PKS, MSR_PKRS is available and based on the CPUID policy alone, and
usable independently of CR4.PKS. See the large comment in prot-key.h for
details of the context switching arrangement.
Use WRMSRNS right away, as we don't care about serialising properties for
context switching this MSR.
Sanitise MSR_PKRS on boot. In anticipation of wanting to use PKS for Xen in
the future, arrange for the sanitisation to occur prior to potentially setting
CR4.PKS; if PKEY0.{AD,WD} leak in from a previous context, we will triple
fault immediately on setting CR4.PKS.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <JBeulich@suse.com>
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/prot-key: Split PKRU infrastructure out of asm/processor.h
asm/processor.h is in desperate need of splitting up, and protection key
functionality is only used in the emulator and pagewalk. Introduce a new
asm/prot-key.h and move the relevant content over.
Rename the PKRU_* constants to drop the user part and to use the architectural
terminology.
Drop the read_pkru_{ad,wd}() helpers entirely. The pkru infix is about to
become wrong, and the sole user is shorter and easier to follow without the
helpers.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 14 Dec 2021 16:51:28 +0000 (16:51 +0000)]
x86/prot-key: Enumeration for Protection Key Supervisor
Protection Key Supervisor works in a very similar way to Protection Key User,
except that instead of a PKRU register used by the {RD,WR}PKRU instructions,
the supervisor protection settings live in MSR_PKRS and are accessed using
normal {RD,WR}MSR instructions.
PKS has the same problematic interactions with PV guests as PKU (more in fact,
given the guest kernel's CPL), so we'll only support this for HVM guests for
now.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 10 Jan 2023 10:57:21 +0000 (10:57 +0000)]
x86/boot: Sanitise PKRU on boot
While the reset value of the register is 0, it might not be after kexec/etc.
If PKEY0.{WD,AD} have leaked in from an earlier context, construction of a PV
dom0 will explode.
Sequencing wise, this must come after setting CR4.PKE, and before we touch any
user mappings.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 18 Jan 2023 19:20:05 +0000 (19:20 +0000)]
x86/vmx: Partially revert "x86/vmx: implement Notify VM Exit"
The original patch tried to do two things - implement VMNotify, and
re-optimise VT-x to not intercept #DB/#AC by default.
The second part is buggy in multiple ways. Both GDBSX and Introspection need
to conditionally intercept #DB, which was not accounted for. Also, #DB
interception has nothing at all to do with cpu_has_monitor_trap_flag.
Revert the second half, leaving #DB/#AC intercepted unilaterally, but with
VMNotify active by default when available.
Fixes: 573279cde1c4 ("x86/vmx: implement Notify VM Exit") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Fri, 20 Jan 2023 11:01:52 +0000 (11:01 +0000)]
x86/shadow: Drop dubious lastpage diagnostic
This is a global variable (actually 3, one per GUEST_PAGING_LEVEL), operated
on using atomics only (with no regard to what else shares the same cacheline),
which emits a diagnostic (in debug builds only) without changing any program
behaviour.
It is presumably left-over debugging, as it interlinks the behaviour of all
vCPUs in chronological order. Based on the read-only p2m types, this
diagnostic can be tripped by entirely legitimate guest behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 16 Jan 2023 11:01:26 +0000 (11:01 +0000)]
x86/hvm: Drop pat_entry_2_pte_flags
Converting from PAT to PTE is trivial, and shorter to encode with bitwise
logic than the space taken by a table counting from 0 to 7 in non-adjacent
bits.
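One way to express the conversion with bitwise logic is sketched below in
Python. The exact expression used by the commit is not shown in this log,
so treat this as one equivalent formulation: PAT index bits 0, 1 and 2
map to the PWT (bit 3), PCD (bit 4) and PAT (bit 7) flags of a 4K PTE.
The function name and the comparison table are illustrative.

```python
_PAGE_PWT = 0x008   # PTE bit 3
_PAGE_PCD = 0x010   # PTE bit 4
_PAGE_PAT = 0x080   # PTE bit 7 (4K mappings)


def pat_to_pte_flags(pat):
    """Map a PAT index (0..7) to the PWT/PCD/PAT bits of a 4K PTE."""
    # Low two index bits move to bits 3..4, the top index bit to bit 7.
    return ((pat & 3) << 3) | ((pat & 4) << 5)


# The kind of lookup table that the bitwise form can replace.
TABLE = [
    0,
    _PAGE_PWT,
    _PAGE_PCD,
    _PAGE_PCD | _PAGE_PWT,
    _PAGE_PAT,
    _PAGE_PAT | _PAGE_PWT,
    _PAGE_PAT | _PAGE_PCD,
    _PAGE_PAT | _PAGE_PCD | _PAGE_PWT,
]
```

The table counts from 0 to 7 spread over bits 3, 4 and 7, which is why
two shifts and two masks are enough to reproduce it.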
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Michal Orzel [Tue, 17 Jan 2023 11:43:32 +0000 (12:43 +0100)]
xen/arm: Harden setup_frametable_mappings
The amount of supported physical memory depends on the frametable size
and the number of struct page_info entries that can fit into it. Define
a macro PAGE_INFO_SIZE to store the current size of the struct page_info
(i.e. 56B for arm64 and 32B for arm32) and add a sanity check in
setup_frametable_mappings to be notified whenever the size of the
structure changes. Also call a panic if the calculated frametable_size
exceeds the limit defined by FRAMETABLE_SIZE macro.
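The relationship between the frametable size and the supported memory can
be sketched as follows. The struct sizes come from the text above; the
function name and the 4KB page size are assumptions of this example, and
no particular FRAMETABLE_SIZE value is implied.

```python
PAGE_INFO_SIZE = {"arm64": 56, "arm32": 32}   # bytes per struct page_info


def max_supported_memory(frametable_size, arch, page_size=4096):
    """Bytes of RAM whose page_info entries fit in frametable_size bytes."""
    # One struct page_info describes exactly one page of memory.
    nr_pages = frametable_size // PAGE_INFO_SIZE[arch]
    return nr_pages * page_size
```

If struct page_info grows, the same frametable covers fewer pages, which
is why the sanity check on the structure size is useful.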
Update the comments regarding the frametable in asm/config.h.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
This macro is unused and the corresponding one for arm64 has already
been removed as part of the commit 6dc9a1fe982f ("xen/arm: Remove most
of the *_VIRT_END defines").
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
The direct mapped area occupies L0 slots from 256 to 265 included
(i.e. 10 slots), resulting in 5TB (512GB * 10) of virtual address space.
However, due to incorrect slot subtraction (we take 9 slots into account)
we set DIRECTMAP_SIZE to 4.5TB instead. Fix it.
Note that we only support up to 2TB of physical memory so this is
a latent issue.
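The off-by-one can be checked with a few lines of arithmetic, written
here in Python. The slot numbers and the 512GB slot size are taken from
the text above; the variable names are only illustrative.

```python
GIB = 1 << 30
L0_SLOT_SIZE = 512 * GIB        # each L0 slot covers 512GB


def slots_inclusive(first, last):
    """Number of slots in the inclusive range [first, last]."""
    return last - first + 1


correct = slots_inclusive(256, 265) * L0_SLOT_SIZE   # 10 slots
off_by_one = (265 - 256) * L0_SLOT_SIZE              # 9 slots
```

Here 'correct' is 5TB and 'off_by_one' is 4.5TB, matching the two sizes
quoted in the commit message.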
Fixes: 5263507b1b4a ("xen: arm: Use a direct mapping of RAM on arm64") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
xen/arm: domain_build: Replace use of paddr_t in find_domU_holes()
bankbase, banksize and bankend are used to hold values of type 'unsigned
long long'. This can be represented as 'uint64_t' instead of 'paddr_t'.
This will ensure consistency with allocate_static_memory() (where we use
'uint64_t' for rambase and ramsize).
In future, paddr_t can be used for 'uint32_t' as well to represent 32bit
physical addresses.
1. One should use 'PRIpaddr' to display 'paddr_t' variables.
2. One should use 'PRIx64' to display 'u64' in hex format. The current
use of 'PRIpaddr' for printing PTE is buggy as this is not a physical
address.
Juergen Gross [Wed, 18 Jan 2023 09:50:14 +0000 (10:50 +0100)]
tools/xenstore: let check_store() check the accounting data
Today check_store() is only testing the correctness of the node tree.
Add verification of the accounting data (number of nodes) and correct
the data if it is wrong.
Do the initial check_store() call only after Xenstore entries of a
live update have been read. This is wanted to make sure the accounting
data is correct after a live update.
Juergen Gross [Wed, 18 Jan 2023 09:50:11 +0000 (10:50 +0100)]
tools/xenstore: don't let hashtable_remove() return the removed value
The value of the removed element returned by hashtable_remove() is
not used anywhere in Xenstore, and returning it conflicts with a hashtable
created specifying the HASHTABLE_FREE_VALUE flag.
So just drop returning the value.
This of course requires freeing the value if the HASHTABLE_FREE_VALUE
flag was specified, as otherwise it would be a memory leak.
Rework the interface and the internals of the per-domain node
accounting:
- rename the functions to domain_nbentry_*() in order to better match
the related counter name
- switch from node pointer to domid as interface, as all nodes have the
owner filled in
- use a common internal function for adding a value to the counter
For the transaction case add a helper function to get the list head
of the per-transaction changed domains, making it possible to eliminate
the transaction_entry_*() functions.
Juergen Gross [Wed, 18 Jan 2023 09:50:04 +0000 (10:50 +0100)]
tools/xenstore: introduce dummy nodes for special watch paths
Instead of special casing the permission handling and watch event
firing for the special watch paths "@introduceDomain" and
"@releaseDomain", use static dummy nodes added to the database when
starting Xenstore.
The node accounting needs to reflect that change by adding the special
nodes in the domain_entry_fix() call in setup_structure().
Note that this requires reworking the calls of fire_watches() for the
special events in order to avoid leaking memory.
Move the check for a valid node name from get_node() to
get_node_canonicalized(), as this allows get_node() to be used for the
special nodes, too.
In order to avoid read and write accesses to the special nodes use a
special variant for obtaining the current node data for the permission
handling.
This allows quite some code to be simplified. In future, sub-nodes of the
special nodes will be possible due to this change, allowing more
fine-grained permission control of special events for specific domains.
Juergen Gross [Wed, 18 Jan 2023 09:50:02 +0000 (10:50 +0100)]
tools/xenstore: add hashlist for finding struct domain by domid
Today finding a struct domain by its domain id requires scanning the
list of domains until the correct domid is found.
Add a hashlist to speed this up. This allows removing
the linking of struct domain in a list. Note that the list of changed
domains per transaction is kept as a list, as there are no known use
cases with more than 4 domains being touched in a single transaction
(this would be a device handled by a driver domain and being assigned
to a HVM domain with device model in a stubdom, plus the control
domain).
Some simple performance tests comparing the scanning and hashlist have
shown that the hashlist will win as soon as more than 6 entries need
to be scanned.
Juergen Gross [Wed, 18 Jan 2023 09:50:01 +0000 (10:50 +0100)]
tools/xenstore: remove all watches when a domain has stopped
When a domain has been released by Xen tools, remove all its
registered watches. This avoids sending watch events to the dead domain
when all the nodes related to it are being removed by the Xen tools.
Bobby Eshleman [Fri, 20 Jan 2023 08:26:31 +0000 (09:26 +0100)]
xen/riscv: introduce sbi call to putchar to console
The SBI implementation for Xen was originally introduced by
Bobby Eshleman <bobby.eshleman@gmail.com>, but for simplicity
everything was removed from it except the SBI call for putting
a character to the console.
The patch introduces the sbi_putchar() SBI call, which is necessary
to implement the initial early_printk.
Signed-off-by: Bobby Eshleman <bobby.eshleman@gmail.com> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Bobby Eshleman <bobby.eshleman@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Oleksii Kurochko [Fri, 20 Jan 2023 08:24:20 +0000 (09:24 +0100)]
xen/include: change <asm/types.h> to <xen/types.h> in early_printk.h
In the patch "include/types: move stddef.h-kind types to common
header" [1] size_t was moved from <asm/types.h> to <xen/types.h>
so early_printk should be updated correspondingly.
Jan Beulich [Fri, 20 Jan 2023 08:23:42 +0000 (09:23 +0100)]
x86/shadow: fix PAE check for top-level table unshadowing
Clearly within the for_each_vcpu() the vCPU of this loop is meant, not
the (loop invariant) one the fault occurred on.
Fixes: 3d5e6a3ff383 ("x86 hvm: implement HVMOP_pagetable_dying") Fixes: ef3b0d8d2c39 ("x86/shadow: shadow_table[] needs only one entry for PV-only configs") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Xenia Ragiadakou [Fri, 20 Jan 2023 08:22:42 +0000 (09:22 +0100)]
x86/acpi: separate AMD-Vi and VT-d specific functions
The functions acpi_dmar_init() and acpi_dmar_zap/reinstate() are
VT-d specific while the function acpi_ivrs_init() is AMD-Vi specific.
To eliminate dead code, they need to be guarded under CONFIG_INTEL_IOMMU
and CONFIG_AMD_IOMMU, respectively.
Instead of adding #ifdef guards around the function calls, implement them
as empty static inline functions.
Take the opportunity to move the declaration of acpi_dmar_init from the
x86 arch-specific header to the common header, since Intel VT-d has
also been used on IA-64 platforms.
No functional change intended.
Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 20 Jan 2023 08:20:26 +0000 (09:20 +0100)]
x86/shadow: further correct MMIO handling in _sh_propagate()
While c61a6f74f80e ("x86: enforce consistent cachability of MMIO
mappings") correctly converted one !mfn_valid() check there, two others
were wrongly left untouched: Both cachability control and log-dirty
tracking ought to be uniformly handled/excluded for all (non-)MMIO
ranges, not just ones qualifiable by mfn_valid().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 20 Jan 2023 08:18:39 +0000 (09:18 +0100)]
include/types: move stddef.h-kind types to common header
size_t, ssize_t, and ptrdiff_t are all expected to be uniformly defined
on any ports Xen might gain. In particular I hope new ports can rely on
__SIZE_TYPE__ and __PTRDIFF_TYPE__ being made available by the compiler.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Michal Orzel [Tue, 3 Jan 2023 10:25:19 +0000 (11:25 +0100)]
xen/arm: Add 0x prefix when printing memory size in construct_domU
Printing memory size in hex without 0x prefix can be misleading, so
add it. Also, take the opportunity to adhere to 80 chars line length
limit by moving the printk arguments to the next line.
Julien Grall [Thu, 12 Jan 2023 22:07:42 +0000 (22:07 +0000)]
xen/arm: linker: The identitymap check should cover the whole .text.header
At the moment, we are only checking that only some part of .text.header
is part of the identity mapping. However, this doesn't take into account
the literal pool which will be located at the end of the section.
While we could try to avoid using a literal pool, in the near future we
will also want to use an identity mapping for switch_ttbr().
Not everything in .text.header needs to be part of the identity
mapping. But it is below a page size (i.e. 4KB) so take a shortcut and
check that .text.header is smaller than a page size.
With that _end_boot can be removed as it is now unused. Take the
opportunity to avoid assuming that a page size is always 4KB in the
error message and comment.
Andrew Cooper [Mon, 9 Jan 2023 10:58:31 +0000 (10:58 +0000)]
x86/vmx: Support for CPUs without model-specific LBR
Ice Lake (server at least) has both architectural LBR and model-specific LBR.
Sapphire Rapids does not have model-specific LBR at all. I.e. on SPR and
later, model_specific_lbr will always be NULL, so we must make changes to
avoid reliably hitting the domain_crash().
The Arch LBR spec states that CPUs without model-specific LBR implement
MSR_DBG_CTL.LBR by discarding writes and always returning 0.
Do this for any CPU for which we lack model-specific LBR information.
Adjust the now-stale comment, now that the Arch LBR spec has created a way to
signal "no model specific LBR" to guests.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
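The "discard writes, always read 0" behaviour described above can be sketched as follows. This is not the code from the patch: DEBUGCTL_LBR, struct ex_lbr_info and ex_guest_debugctl_write() are hypothetical names, and only the pointer name model_specific_lbr comes from the commit message. The LBR enable being bit 0 of the debug control MSR is the one architectural fact assumed here.

```c
#include <stddef.h>
#include <stdint.h>

#define DEBUGCTL_LBR (UINT64_C(1) << 0)   /* LBR enable bit */

/* Hypothetical description of a model-specific LBR MSR range. */
struct ex_lbr_info {
    uint32_t base;
    unsigned int count;
};

/* NULL when no model-specific LBR layout is known for this CPU. */
static const struct ex_lbr_info *model_specific_lbr;

/* Returns the value a guest would read back after writing 'val'. */
static uint64_t ex_guest_debugctl_write(uint64_t val)
{
    /* Without model-specific LBR, silently drop the LBR bit. */
    if ( (val & DEBUGCTL_LBR) && !model_specific_lbr )
        val &= ~DEBUGCTL_LBR;

    return val;
}
```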
Andrew Cooper [Mon, 9 Jan 2023 11:42:22 +0000 (11:42 +0000)]
x86/vmx: Calculate model-specific LBRs once at start of day
There is no point repeating this calculation at runtime, especially as it is
in the fallback path of the WRMSR/RDMSR handlers.
Move the infrastructure higher in vmx.c to avoid forward declarations,
renaming last_branch_msr_get() to get_model_specific_lbr() to highlight that
these are model-specific only.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 3 Jan 2023 15:08:56 +0000 (15:08 +0000)]
xen/version: Drop compat/kernel.c
kernel.c is mostly in an #ifndef COMPAT guard, because compat/kernel.c
re-includes kernel.c to recompile xen_version() in a compat form.
However, the xen_version hypercall is almost guest-ABI-agnostic; only
XENVER_platform_parameters has a compat split. Handle this locally, and do
away with the re-include entirely. Also drop the CHECK_TYPE()'s between types
that are simply char-arrays in their native and compat form.
In particular, this removes the final instances of obfuscation via the DO()
macro.
No functional change. Also saves 2k of .text in the x86 build.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 20 Dec 2022 15:51:07 +0000 (15:51 +0000)]
public/version: Change xen_feature_info to have a fixed size
This is technically an ABI change, but Xen doesn't operate in any environment
where "unsigned int" is different to uint32_t, so switch to the explicit form.
This avoids the need to derive (identical) compat logic for handling the
subop.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
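A short sketch of why explicit-width fields remove the need for a compat variant: the layout is then the same for every guest ABI. The structure name ex_feature_info and its field names are assumptions made for this example and should not be read as the public header's definition.

```c
#include <stdint.h>

/* Hypothetical structure using only fixed-width fields. */
struct ex_feature_info {
    uint32_t submap_idx;   /* input: which 32-bit submap to query */
    uint32_t submap;       /* output: the selected submap */
};

/* Identical size for 32-bit and 64-bit callers, so no translation. */
_Static_assert(sizeof(struct ex_feature_info) == 8,
               "layout must not depend on the ABI");

/* The assumption stated in the commit message, checked at build time. */
_Static_assert(sizeof(unsigned int) == sizeof(uint32_t),
               "unsigned int is expected to be 32 bits wide");
```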
Jan Beulich [Thu, 12 Jan 2023 15:17:54 +0000 (16:17 +0100)]
include/compat: produce stubs for headers not otherwise generated
Public headers can include other public headers. Such interdependencies
are retained in their compat counterparts. Since some compat headers are
generated only in certain configurations, the referenced headers still
need to exist. The lack thereof was observed with hvm/hvm_op.h needing
trace.h, where generation of the latter depends on TRACEBUFFER=y. Make
empty stubs in such cases (as generating the extra headers is relatively
slow and hence better to avoid). Changes to .config and incrementally
(re-)building are covered by the respective .*.cmd then no longer
matching the command to be used, resulting in the necessary re-creation
of the (possibly stub) header.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Thu, 12 Jan 2023 10:14:50 +0000 (11:14 +0100)]
x86/shadow: call sh_detach_old_tables() directly
There's nothing really mode specific in this function anymore (the
varying number of valid entries in v->arch.paging.shadow.shadow_table[]
is dealt with fine by the zero check, and we have other similar cases of
iterating through the full array in common.c), and hence there's neither
a need to have multiple instances of it, nor does it need calling
through a function pointer.
While moving the function drop a non-conforming and not very useful
(anymore) comment.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 12 Jan 2023 10:12:35 +0000 (11:12 +0100)]
x86/shadow: reduce effort of hash calculation
The "n" input is a GFN/MFN value and hence bounded by the physical
address bits in use on a system. The hash quality won't improve by also
including the upper always-zero bits in the calculation. To keep things
as compile-time-constant as they were before, use PADDR_BITS (not
paddr_bits) for loop bounding. This reduces loop iterations from 8 to 5.
While there also drop the unnecessary conversion to an array of unsigned
char, moving the value off the stack altogether (at least with
optimization enabled).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
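The iteration count quoted above follows from the address width: with 52 physical address bits and 12-bit page offsets, a frame number fits in 40 bits, i.e. (52 - 12 + 7) / 8 = 5 bytes. The sketch below assumes those two values, an arbitrary bucket count, and an sdbm-style mixing step; all EX_ and ex_ names are hypothetical and the actual hash in the patch may differ in detail.

```c
#include <stdint.h>

#define EX_PADDR_BITS   52    /* assumed compile-time upper bound */
#define EX_PAGE_SHIFT   12    /* assumed 4 KiB pages */
#define EX_HASH_BUCKETS 251   /* assumed bucket count */

/* Bytes needed to cover any frame number: (52 - 12 + 7) / 8 = 5. */
#define EX_HASH_BYTES ((EX_PADDR_BITS - EX_PAGE_SHIFT + 7) / 8)

static uint32_t ex_hash(uint64_t n, unsigned int t)
{
    uint32_t k = t;
    unsigned int i;

    /* Consume n one byte at a time by shifting; no byte array needed. */
    for ( i = 0; i < EX_HASH_BYTES; ++i, n >>= 8 )
        k = (uint8_t)n + (k << 6) + (k << 16) - k;

    return k % EX_HASH_BUCKETS;
}
```

Because the loop bound is a compile-time constant, the compiler remains free to unroll it, and bits above the covered range never influence the result.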
Jan Beulich [Thu, 12 Jan 2023 10:11:47 +0000 (11:11 +0100)]
x86/shadow: drop a few uses of mfn_valid()
v->arch.paging.shadow.shadow_table[], v->arch.paging.shadow.oos[],
v->arch.paging.shadow.oos_{snapshot[],fixup[].smfn[]} as well as the
hash table are all only ever written with valid MFNs or INVALID_MFN.
Avoid the somewhat expensive mfn_valid() when checking MFNs coming from
these arrays.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Xenia Ragiadakou [Thu, 12 Jan 2023 10:09:16 +0000 (11:09 +0100)]
x86/iommu: introduce AMD-Vi and Intel VT-d Kconfig options
Introduce two new Kconfig options, AMD_IOMMU and INTEL_IOMMU, to allow code
specific to each IOMMU technology to be separated and, when not required,
stripped. AMD_IOMMU will be used to enable IOMMU support for platforms that
implement the AMD I/O Virtualization Technology. INTEL_IOMMU will be used to
enable IOMMU support for platforms that implement the Intel Virtualization
Technology for Directed I/O.
Since, at this point, disabling any of them would cause Xen to not compile,
the options are not visible to the user and are enabled by default if X86.
Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Oleksii Kurochko [Tue, 10 Jan 2023 15:17:56 +0000 (17:17 +0200)]
xen/riscv: introduce stack stuff
The patch introduces and sets up a stack in order to go to the C environment.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Wed, 11 Jan 2023 11:44:09 +0000 (11:44 +0000)]
xen: Remove the arch specific header init.h
Both x86 and (soon) RISC-V versions of init.h are empty. On Arm, it contains
a structure that should not be used by any common code.
The structure init_info is used to store information to setup the CPU
currently being brought-up. setup.h seems to be more suitable even though
the header is getting quite crowded.
Looking through the history, <asm/init.h> was introduced at the same
time as the ia64 port because for some reason most of the macros
were duplicated. This was changed in 72c07f413879 and I don't
foresee any reason to require arch specific definition for init.h
in the near future.
Therefore remove asm/init.h for both x86 and arm (the only definition
is moved in setup.h). With that RISC-V will not need to introduce
an empty header.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Julien Grall <jgrall@amazon.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Alistair Francis <alistair.francis@wdc.com>
Anthony PERARD [Wed, 11 Jan 2023 09:45:29 +0000 (10:45 +0100)]
tools: Fix build with recent QEMU, use "--enable-trace-backends"
The configure option "--enable-trace-backend" isn't accepted anymore
and we should use "--enable-trace-backends" instead, which was
introduced in 2014 and allows multiple backends.
"--enable-trace-backends" was introduced by: 5b808275f3bb ("trace: Multi-backend tracing")
The backward compatible option "--enable-trace-backend" is removed by 10229ec3b0ff ("configure: remove backwards-compatibility and obsolete options")
As we already use ./configure options that wouldn't be accepted by
older versions of QEMU's configure, we will simply use the new spelling
for the option and avoid trying to detect which spelling to use.
We already make use of "--firmwarepath=" which was introduced by 3d5eecab4a5a ("Add --firmwarepath to configure")
which already includes the new spelling for "--enable-trace-backends".
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Andrew Cooper [Thu, 5 Jan 2023 20:42:58 +0000 (20:42 +0000)]
x86/S3: Restore Xen's MSR_PAT value on S3 resume
There are two paths in the trampoline, and Xen's PAT needs setting up in both,
not just the boot path.
Fixes: 4304ff420e51 ("x86/S3: Drop {save,restore}_rest_processor_state() completely") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
automation: temporarily disable CONFIG_COVERAGE for RISC-V randconfig jobs
As common isn't built for the RISC-V architecture yet,
common/coverage (where the __gcov_* functions are defined) isn't built
either, but randconfig may decide to enable CONFIG_COVERAGE, which will
lead to the following link error:
riscv64-linux-gnu-ld: prelink.o: in function `.L0 ':
arch/riscv/early_printk.c:(.text+0x18):
undefined reference to `__gcov_init'
riscv64-linux-gnu-ld: arch/riscv/early_printk.c:(.text+0x40):
undefined reference to `__gcov_exit'
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Mon, 5 Dec 2022 16:41:39 +0000 (16:41 +0000)]
MAINTAINERS: Clarify check-in requirements for mixed-author patches
There was a question raised recently about the requirements for
checking in a patch which was originally written by one maintainer,
then picked up and modified by a second maintainer, and which they now both
agree should be checked in.
It was proposed that in that case, the following set of tags would suffice:
  Signed-off-by: First Author <...>
  Signed-off-by: Second Author <...>
  Reviewed-by: First Author <...>
The rationale was as follows:
1. The patch will be a mix of code, whose copyright is owned by the
various authors (or the companies they work for). It's important to
keep this information around in the event, for instance, of a license
change or something else requiring knowledge of the copyright owner.
2. The Signed-off-by of the Second Author approves not only their own
code, but First Author's code; the Reviewed-by of the First Author
approves not only their own code, but the Second Author's code. Thus
all the code has been approved by a maintainer, as well as someone who
was not the author.
In support of this, several arguments were put forward:
* We shouldn't make it harder for maintainers to get their code in
than for non-maintainers
* The system we set up should not add pointless bureaucracy; nor
discourage collaboration; nor encourage contributors to get around
the rules by dropping important information. (For instance, by
removing the first SoB, so that the patch appears to have been
written entirely by Second Author.)
Concerns were raised about two maintainers from the same company
colluding to get a patch in from their company; but such maintainers
could already collude, by working on the patch in secret, and posting
it publicly with only a single author's SoB, and having the other
person review it.
There's also something slightly strange about adding "Reviewed-by" to
code that you've written; but in the end you're reviewing not only the
code itself, but the final arrangement of it. There's no need to
overcomplicate things.
Encode this in MAINTAINERS as follows:
* Refine the wording of requirement #2 in the check-in policy; such
that *each change* must have approval from someone other than *the
person who wrote it*.
* Add a paragraph explicitly stating that the multiple-SoB-approval
system satisfies the requirements, and why.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Mon, 9 Jan 2023 13:29:13 +0000 (14:29 +0100)]
x86/shadow: sh_remove_all_mappings() is HVM-only
All callers live in hvm.c. Moving the function there is undesirable, as
hash walking is local to common.c and probably better remains so. Hence
move an #endif, allowing an #ifdef to be dropped.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 9 Jan 2023 13:26:54 +0000 (14:26 +0100)]
x86/shadow: correct shadow type bounds checks
In sh_remove_shadow_via_pointer() the type range checks, besides being
bogus (should be ">= min && <= max"), are fully redundant with the has-
up-pointer assertion. In sh_hash_audit_bucket() properly use "min"
instead of assuming a certain order of type numbers.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
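For reference, an inclusive bounds check of the form named above (">= min && <= max") can be sketched as below. EX_TYPE_MIN, EX_TYPE_MAX and ex_type_in_range() are hypothetical stand-ins, not the shadow type constants used by the patch.

```c
#include <stdbool.h>

#define EX_TYPE_MIN 1u   /* hypothetical lowest valid type number */
#define EX_TYPE_MAX 5u   /* hypothetical highest valid type number */

/* True only when t lies within [EX_TYPE_MIN, EX_TYPE_MAX]. */
static bool ex_type_in_range(unsigned int t)
{
    /* Both comparisons must hold, hence && rather than ||. */
    return t >= EX_TYPE_MIN && t <= EX_TYPE_MAX;
}
```

Using the named minimum on the lower side avoids depending on how the type numbers happen to be ordered.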
Jan Beulich [Mon, 9 Jan 2023 13:26:12 +0000 (14:26 +0100)]
x86/shadow: simplify conditionals in sh_{get,put}_ref()
In both cases the "entry_pa != 0" check is redundant; storing 0 when the
field already is 0 is quite fine. Move the cheaper remaining part first
in sh_get_ref(). In sh_put_ref() convert the has-up-pointer check into
an assertion (requiring the zero check to be retained there).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 9 Jan 2023 13:25:29 +0000 (14:25 +0100)]
x86/shadow: move bogus HVM checks in sh_pagetable_dying()
Perhaps these should have been dropped right in 2fb2dee1ac62 ("x86/mm:
pagetable_dying() is HVM-only"). Convert both to assertions, noting that
in particular the one in the 3-level variant of the function came too
late anyway - first thing there we access the HVM part of a union.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>