Jan Beulich [Mon, 3 Apr 2023 10:48:12 +0000 (12:48 +0200)]
x86emul: move various utility functions to separate source files
Many are needed by the hypervisor only - have one file for this purpose.
Some are also needed by the harness (but not the fuzzer) - have another
file for these.
Code moved gets slightly adjusted in a few places, e.g. replacing
"state" by "s" (as was done for other code that has been split off).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 3 Apr 2023 10:47:08 +0000 (12:47 +0200)]
x86emul: move x86_emul_blk() to separate source file
The function is already non-trivial and is expected to further grow.
Code moved gets slightly adjusted in a few places, e.g. replacing EXC_*
by X86_EXC_* (such that EXC_* don't need to move as well; we want these
to be phased out anyway).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 3 Apr 2023 10:46:08 +0000 (12:46 +0200)]
x86emul: split off insn decoding
This is a fair chunk of code and data and can easily live separate from
the main emulation function.
Code moved gets slightly adjusted in a few places, e.g. replacing EXC_*
by X86_EXC_* (such that EXC_* don't need to move as well; we want these
to be phased out anyway).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 3 Apr 2023 10:44:59 +0000 (12:44 +0200)]
x86emul: split off FPU opcode handling
Some of the helper functions/macros are needed only for this, and the
code is otherwise relatively independent of other parts of the emulator.
Code moved gets slightly adjusted in a few places, e.g. replacing EXC_*
by X86_EXC_* (such that EXC_* don't need to move as well; we want these
to be phased out anyway).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 3 Apr 2023 10:43:51 +0000 (12:43 +0200)]
x86emul: split off opcode 0fc7 handling
There's a fair amount of sub-cases (with some yet to be implemented), so
a separate function seems warranted.
Code moved gets slightly adjusted in a few places, e.g. replacing EXC_*
by X86_EXC_* (such that EXC_* don't need to move as well; we want these
to be phased out anyway).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 3 Apr 2023 10:42:44 +0000 (12:42 +0200)]
x86emul: split off opcode 0fae handling
There's a fair amount of sub-cases (with some yet to be implemented), so
a separate function seems warranted.
Code moved gets slightly adjusted in a few places, e.g. replacing EXC_*
by X86_EXC_* (such that EXC_* don't need to move as well; we want these
to be phased out anyway).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 3 Apr 2023 10:41:08 +0000 (12:41 +0200)]
x86emul: split off opcode 0f01 handling
There's a fair amount of sub-cases (with some yet to be implemented), so
a separate function seems warranted.
Code moved gets slightly adjusted in a few places, e.g. replacing EXC_*
by X86_EXC_* (such that EXC_* don't need to move as well; we want these
to be phased out anyway).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 31 Mar 2023 06:12:45 +0000 (08:12 +0200)]
x86/shadow: drop redundant present bit checks from FOREACH_PRESENT_L<N>E() "bodies"
FOREACH_PRESENT_L<N>E(), as their names (now) say, already invoke the
"body" only when the present bit is set; no need to re-do the check.
While there also
- stop open-coding mfn_to_maddr() in code being touched (re-indented)
anyway,
- stop open-coding mfn_eq() in code being touched or in adjacent code,
- drop local variables when they're no longer used at least twice.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 31 Mar 2023 06:11:53 +0000 (08:11 +0200)]
x86/shadow: rename SHADOW_FOREACH_L<N>E() to FOREACH_PRESENT_L<N>E()
These being local macros, the SHADOW prefix doesn't gain us much. What
is more important to be aware of at use sites is that the supplied code
is invoked for present entries only.
While making the adjustment also properly use NULL for the 3rd argument
at respective invocation sites.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
1. One should use 'PRIpaddr' to display 'paddr_t' variables. However,
while creating nodes in fdt, the address (if present in the node name)
should be represented using 'PRIx64'. This is to be in conformance
with the following rule present in https://elinux.org/Device_Tree_Linux
. node names
"unit-address does not have leading zeros"
As 'PRIpaddr' introduces leading zeros, we cannot use it.
So, we have introduced a wrapper, i.e. domain_fdt_begin_node(), which will
represent physical address using 'PRIx64'.
2. One should use 'PRIx64' to display 'u64' in hex format. The current
use of 'PRIpaddr' for printing PTE is buggy as this is not a physical
address.
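The difference between the two format choices can be sketched in standalone C. This is only an illustration, assuming PRIpaddr behaves like a zero-padded 16-digit hex format; EXAMPLE_PRIpaddr and example_fdt_node_name() are hypothetical names, not Xen's.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed stand-in for a zero-padded physical address format. */
#define EXAMPLE_PRIpaddr "016" PRIx64

/*
 * Hypothetical helper: build a node name of the form "prefix@unit-address".
 * The unit address is printed with PRIx64, so it has no leading zeros.
 * Returns the snprintf() result (length of the formatted name).
 */
static int example_fdt_node_name(char *buf, size_t len, const char *prefix,
                                 uint64_t addr)
{
    assert(buf != NULL && prefix != NULL);
    return snprintf(buf, len, "%s@%" PRIx64, prefix, addr);
}
```

With the padded format the same address would be rendered with leading zeros, which is what the device tree node naming rule quoted above forbids.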
Andrew Cooper [Mon, 6 Feb 2023 20:06:45 +0000 (20:06 +0000)]
tools/ocaml/mmap: Drop the len parameter from Xenmmap.write
Strings in Ocaml carry their own length. Absolutely nothing good can come
from having caml_string_length(data) be different to len.
Use the appropriate accessor, String_val(), but retain the workaround for the
Ocaml -safe-string constness bug in the same way as we've done elsewhere in
Xen.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@cloud.com>
Jan Beulich [Thu, 30 Mar 2023 11:07:16 +0000 (13:07 +0200)]
x86emul: pull permission check ahead for REP INS/OUTS
Based on observations on a fair range of hardware from both primary
vendors even zero-iteration-count instances of these insns perform the
port related permission checking first.
Fixes: fe300600464c ("x86: Fix emulation of REP prefix") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
When vgic_reserve_virq() fails and backend is in domain, we should also
free the allocated event channel.
When backend is in Xen and call to xzalloc() returns NULL, there is no
need to call xfree(). This should be done instead on an error path
from vgic_reserve_virq(). Moreover, we should be calling XFREE() to
prevent an extra free in domain_vpl011_deinit().
In order not to repeat the same steps twice, call domain_vpl011_deinit()
on an error path whenever there is more work to do than returning rc.
Since this function can now be called from different places and more
than once, add proper guards, use XFREE() instead of xfree() and move
vgic_free_virq() to it.
Take the opportunity to return -ENOMEM instead of -EINVAL when memory
allocation fails.
Fixes: 1ee1e4b0d1ff ("xen/arm: Allow vpl011 to be used by DomU") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
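A minimal sketch of why a free-and-clear macro makes a teardown function safe to call more than once. It uses plain malloc()/free() rather than Xen's allocator, and struct example_vpl011, EXAMPLE_XFREE() and example_vpl011_deinit() are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

/* Free a pointer and clear it, so that a repeated call becomes a no-op. */
#define EXAMPLE_XFREE(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical stand-in for the per-domain state. */
struct example_vpl011 {
    char *ring_buf;
    int virq_reserved;
    int virq_frees;     /* counts how often the vIRQ was released */
};

/* Idempotent teardown: every step is guarded, so calling it twice is safe. */
static void example_vpl011_deinit(struct example_vpl011 *v)
{
    assert(v != NULL);
    if (v->virq_reserved) {
        v->virq_reserved = 0;   /* stands in for vgic_free_virq() */
        v->virq_frees++;
    }
    EXAMPLE_XFREE(v->ring_buf); /* free(NULL) is a no-op on later calls */
}
```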
Michal Orzel [Thu, 23 Mar 2023 13:56:35 +0000 (14:56 +0100)]
xen/arm: domain_build: Check return code of domain_vpl011_init
We are assigning return code of domain_vpl011_init() to a variable
without checking it for an error. Fix it.
Fixes: 3580c8b2dfc3 ("xen/arm: if direct-map domain use native UART address and IRQ number for vPL011") Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
Juergen Gross [Tue, 28 Mar 2023 14:43:27 +0000 (16:43 +0200)]
tools/xenstore: fix quota check in acc_fix_domains()
Today when finalizing a transaction, the node quota is checked to ensure
it is not exceeded after the transaction. This check is always done,
even if the transaction is being performed by a privileged connection,
or if there were no nodes created in the transaction.
Correct that by checking quota only if:
- the transaction is being performed by an unprivileged guest, and
- at least one node was created in the transaction
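The two conditions above can be sketched as a single predicate. The function name and parameters below are hypothetical and do not mirror the xenstored data structures.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: report a quota violation only for an unprivileged
 * connection that created at least one node in the transaction.
 */
static bool example_quota_exceeded(bool privileged, unsigned int nodes_created,
                                   unsigned int nodes_after, unsigned int quota)
{
    if (privileged || nodes_created == 0)
        return false;

    return nodes_after > quota;
}
```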
Roger Pau Monné [Wed, 29 Mar 2023 12:56:33 +0000 (14:56 +0200)]
vpci/msix: restore PBA access length and alignment restrictions
Accesses to the PBA array have the same length and alignment
limitations as accesses to the MSI-X table:
"For all accesses to MSI-X Table and MSI-X PBA fields, software must
use aligned full DWORD or aligned full QWORD transactions; otherwise,
the result is undefined."
Introduce such length and alignment checks into the handling of PBA
accesses for vPCI. This was a mistake of mine for not reading the
specification correctly.
Note that accesses must now be aligned, and hence there's no longer a
need to check that the end of the access falls into the PBA region as
both the access and the region addresses must be aligned.
Fixes: b177892d2d ('vpci/msix: handle accesses adjacent to the MSI-X table') Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
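The length and alignment rule quoted from the specification can be sketched as a small check. The function name is hypothetical and this is not the vPCI code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: accept only aligned full DWORD (4-byte) or aligned
 * full QWORD (8-byte) accesses.
 */
static bool example_msix_access_allowed(uint64_t addr, unsigned int len)
{
    if (len != 4 && len != 8)
        return false;

    /* len is a power of two here, so (len - 1) masks the misaligned bits. */
    return (addr & (len - 1)) == 0;
}
```

Since an accepted access is naturally aligned and at most 8 bytes long, it cannot straddle the end of a region that is itself 8-byte aligned, which is why the end-of-access check becomes unnecessary.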
Jan Beulich [Wed, 29 Mar 2023 12:55:37 +0000 (14:55 +0200)]
ns16550: correct name/value pair parsing for PCI port/bridge
First of all these were inverted: "bridge=" caused the port coordinates
to be established, while "port=" controlled the bridge coordinates. And
then the error messages being identical also wasn't helpful. While
correcting this also move both case blocks close together.
Fixes: 97fd49a7e074 ("ns16550: add support for UART parameters to be specifed with name-value pairs") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Tue, 28 Mar 2023 12:20:35 +0000 (14:20 +0200)]
vpci/msix: handle accesses adjacent to the MSI-X table
The handling of the MSI-X table accesses by Xen requires that any
pages part of the MSI-X related tables are not mapped into the domain
physmap. As a result, any device registers in the same pages as the
start or the end of the MSIX or PBA tables are not currently
accessible, as the accesses are just dropped.
Note the spec forbids such placing of registers, as the MSIX and PBA
tables must be 4K isolated from any other registers:
"If a Base Address register that maps address space for the MSI-X
Table or MSI-X PBA also maps other usable address space that is not
associated with MSI-X structures, locations (e.g., for CSRs) used in
the other address space must not share any naturally aligned 4-KB
address range with one where either MSI-X structure resides."
Yet the 'Intel Wi-Fi 6 AX201' device on one of my boxes has registers
in the same page as the MSIX tables, and thus won't work on a PVH dom0
without this fix.
In order to cope with this behavior, pass through any accesses that fall
on the same page as the MSIX tables (but don't fall in between) to the
underlying hardware. Such forwarding also takes care of the PBA
accesses, so it allows removing the code doing this handling in
msix_{read,write}. Note that as a result accesses to the PBA array
are no longer limited to 4 and 8 byte sizes, as there's no access size
restriction for PBA accesses documented in the specification.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 28 Mar 2023 12:20:16 +0000 (14:20 +0200)]
include: don't mention stub headers more than once in a make rule
When !GRANT_TABLE and !PV_SHIM headers-n contains grant_table.h twice,
causing make to complain "target '...' given more than once in the same
rule" for the rule generating the stub headers. We don't need duplicate
entries in headers-n anywhere, so zap them (by using $(sort ...)) right
where the final value of the variable is constructed.
Fixes: 6bec713f871f ("include/compat: produce stubs for headers not otherwise generated") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Dmitry Isaykin [Tue, 28 Mar 2023 12:18:46 +0000 (14:18 +0200)]
x86/monitor: add new monitor event to catch I/O instructions
Adds monitor support for I/O instructions.
Signed-off-by: Dmitry Isaykin <isaikin-dmitry@yandex.ru> Signed-off-by: Anton Belousov <abelousov@ptsecurity.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Andrew Cooper [Fri, 24 Feb 2023 18:23:38 +0000 (18:23 +0000)]
CI: Minor updates to buster-gcc-ibt
* Update from GCC 11.2 to 11.3
* Use python3-minimal instead of python
* Use --no-install-recommends, requiring ca-certificates, g++-multilib and
build-essential to be listed explicitly
The resulting container is ~50M smaller
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Fri, 24 Mar 2023 17:59:56 +0000 (17:59 +0000)]
CI: Remove llvm-8 from the Debian Stretch container
For similar reasons to c/s a6b1e2b80fe20. While this container is still
build-able for now, all the other problems with explicitly-versioned compilers
remain.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Fri, 24 Mar 2023 20:09:33 +0000 (20:09 +0000)]
configure: Drop --enable-githttp
Following Demi's work to use HTTPS everywhere, all users of GIT_HTTP have
been removed from the build system. Drop the configure knob.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Demi Marie Obenour <demi@invisiblethingslab.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Andrew Cooper [Mon, 6 Dec 2021 13:07:40 +0000 (13:07 +0000)]
x86/boot: Restrict directmap permissions for .text/.rodata
While we've been diligent to ensure that the main text/data/rodata mappings
have suitable restrictions, their aliases via the directmap were left fully
read/write. Worse, we even had pieces of code making use of this as a
feature.
Restrict the permissions for .text/rodata, as we have no legitimate need for
writeability of these areas via the directmap alias. Note that the
compile-time allocated pagetables do get written through their directmap
alias, so need to remain writeable.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 4 May 2020 12:32:21 +0000 (13:32 +0100)]
x86/ucode: Fix error paths control_thread_fn()
These two early exits skipped re-enabling the watchdog, restoring the NMI
callback, and clearing the nmi_patch global pointer. Always execute the tail
of the function on the way out.
Fixes: 8dd4dfa92d62 ("x86/microcode: Synchronize late microcode loading") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
automation: add a smoke and suspend test on an Alder Lake system
This is a first test using Qubes OS CI infra. The gitlab-runner has
access to ssh-based control interface (control@thor.testnet, ssh key
exposed to the test via ssh-agent) and pre-configured HTTP dir for boot
files (mapped under /scratch/gitlab-runner/tftp inside the container).
Details about the setup are described on
https://www.qubes-os.org/news/2022/05/05/automated-os-testing-on-physical-laptops/
There are two tests. The first is a simple dom0+domU boot smoke test, similar
to other existing tests. The second one boots Xen and checks whether S3
works. It runs on an ADL-based desktop system. The test script is based
on the Xilinx one.
The machine needs a newer kernel than the other x86 tests run, so use the
6.1.x kernel added in the previous commit.
The usage of fakeroot is necessary to preserve device nodes (/dev/null
etc) when repacking rootfs. The test runs in a rootless podman
container, which doesn't have full root permissions. BTW the same
applies to docker with user namespaces enabled (but it's only an opt-in
feature there).
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
It will be used in tests added in subsequent patches.
Enable config options needed for those tests.
While at it, migrate all the x86 tests to the newer kernel, and
introduce x86-64-test-needs to allow deduplication later (for now it's
used only once).
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 24 Feb 2022 19:40:15 +0000 (19:40 +0000)]
x86/vmx: Don't spuriously crash the domain when INIT is received
In VMX operation, the handling of INIT IPIs is changed. Instead of the CPU
resetting, the next VMEntry fails with EXIT_REASON_INIT. From the TXT spec,
the intent of this behaviour is so that an entity which cares can scrub
secrets from RAM before participating in an orderly shutdown.
Right now, Xen's behaviour is that when an INIT arrives, the HVM VM which
schedules next is killed (citing an unknown VMExit), *and* we ignore the INIT
and continue blindly onwards anyway.
This patch addresses only the first of these two problems by ignoring the INIT
and continuing without crashing the VM in question.
The second wants addressing too, just as soon as we've figured out something
better to do...
Discovered as collateral damage from when an AP triple faults on S3 resume on
Intel TigerLake platforms.
After spending ages sorting out Gitlab CI, it appears that OSSTest too has an
out-of-date Let's Encrypt cert. Revert again in the short term while we fix
this up.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Obtaining code over an insecure transport is a terrible idea for
blatantly obvious reasons. Even for non-executable data, insecure
transports are considered deprecated.
This patch enforces the use of secure transports in misc places.
All URLs are known to work.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
configure: Replace git:// and http:// with https://
Obtaining code over an insecure transport is a terrible idea for
blatantly obvious reasons. Even for non-executable data, insecure
transports are considered deprecated.
This patch enforces the use of secure transports in the build system.
Some URLs returned 301 or 302 redirects, so I replaced them with the
URLs that were redirected to.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
build: Change remaining xenbits.xen.org link to HTTPS
Obtaining code over an insecure transport is a terrible idea for
blatantly obvious reasons. Even for non-executable data, insecure
transports are considered deprecated.
This patch enforces the use of secure transports for all xenbits.xen.org
URLs. All altered links have been tested and are known to work.
Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
build: Use HTTPS for all xenbits.xen.org Git repos
Obtaining code over an insecure transport is a terrible idea for
blatantly obvious reasons. Even for non-executable data, insecure
transports are considered deprecated.
This patch enforces the use of secure transports for all xenbits git
repositories. It was generated with the following shell script:
Andrew Cooper [Wed, 15 Sep 2021 16:01:43 +0000 (17:01 +0100)]
xen/credit2: Remove tail padding from TRC_CSCHED2_* records
All three of these records have tail padding, leaking stack rubble into the
trace buffer. Introduce an explicit _pad field and have the compiler zero the
padding automatically.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
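A minimal sketch of the explicit padding technique, assuming a hypothetical record layout (struct example_trace_rec is not one of the actual TRC_CSCHED2_* structures). Naming the padding as a field means a designated initializer zeroes it, whereas compiler-inserted tail padding is not guaranteed to be cleared.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: without _pad, 2 bytes of tail padding would remain. */
struct example_trace_rec {
    uint32_t id;
    uint16_t cpu;
    uint16_t _pad;  /* explicit, so the initializer below zeroes it */
};

/* Build a record and copy its full object representation into out. */
static void example_fill(unsigned char *out, uint32_t id, uint16_t cpu)
{
    /* Fields not named in a designated initializer are zero-initialized. */
    struct example_trace_rec rec = { .id = id, .cpu = cpu };

    assert(out != NULL);
    memcpy(out, &rec, sizeof(rec));
}
```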
Andrew Cooper [Wed, 15 Sep 2021 15:49:01 +0000 (16:49 +0100)]
xen/memory: Remove tail padding from TRC_MEM_* records
Four TRC_MEM_* records supply custom structures with tail padding, leaking
stack rubble into the trace buffer. Three of the records were fine in 32-bit
builds of Xen, due to the relaxed alignment of 64-bit integers, but
POD_SUPERPAGE_SPLINTER was broken right from the outset.
We could pack the data structures to remove the padding, but xentrace_format
has no way of rendering the upper half of a 16-bit field. Instead, expand all
16-bit fields to 32-bit.
For POD_SUPERPAGE_SPLINTER, introduce an order field as it is relevant
information, and to match DECREASE_RESERVATION, and so it doesn't require a
__packed attribute to drop tail padding.
Update xenalyze's structures to match, and introduce xentrace_format rendering
which was absent previously.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Andrew Cooper [Thu, 16 Sep 2021 09:24:26 +0000 (10:24 +0100)]
xen/trace: Don't over-read trace objects
In the case that 'extra' isn't a multiple of uint32_t, the calculation rounds
the number of bytes up, causing later logic to read unrelated bytes beyond the
end of the object.
Also, asserting that the object is within TRACE_EXTRA_MAX, but truncating it
in release builds is rude. Instead, reject any out-of-spec records, leaving
enough of a message to identify the faulty caller.
There is one buggy trace record, TRC_RTDS_BUDGET_BURN. As it must remain
__packed (as cur_budget is misaligned), change bool has_extratime to uint32_t
to compensate.
It turns out that the new printk() can also be hit by HVMOP_xentrace, because
the hypercall is broken. It cannot be used outside of custom debugging, as
none of the tooling was ever updated to understand TRC_GUEST, nor is there any
evidence of the hypercall ever being used in public.
While the hypercall was clearly intended to be used with units of uint32_t,
that's not how the API/ABI works - Xen will in fact read the entire structure
rather than the initialised subset out of guest memory (most likely, stack
rubble), then copy up to 3 bytes of it (rounding up to the next uint32_t) into
the real tracebuffer.
There are several possible ways to fix this, but as the hypercall is broken
and does not plausibly have any users, go with the one that needs the least
logic in Xen, by rejecting tracing attempts that are not of uint32_t size.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
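The size validation described above can be sketched as follows. EXAMPLE_TRACE_EXTRA_MAX is an assumed limit standing in for TRACE_EXTRA_MAX, and example_extra_ok() is a hypothetical name; the point is only that a size is rejected outright instead of being rounded up or truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed limit on the extra data, counted in uint32_t units. */
#define EXAMPLE_TRACE_EXTRA_MAX 7

/*
 * Hypothetical check: accept extra data only if its size in bytes is a
 * whole number of uint32_t units and does not exceed the limit.
 */
static bool example_extra_ok(size_t extra)
{
    if (extra % sizeof(uint32_t))
        return false;   /* rounding up would read past the object */

    return extra / sizeof(uint32_t) <= EXAMPLE_TRACE_EXTRA_MAX;
}
```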
Edwin Török [Mon, 16 May 2022 19:45:13 +0000 (20:45 +0100)]
x86/hvm: Improve hvm_set_guest_pat() code generation again
Following on from cset 9ce0a5e207f3 ("x86/hvm: Improve hvm_set_guest_pat()
code generation"), and the discovery that Clang/LLVM makes some especially
disastrous code generation for the loop at -O2
https://github.com/llvm/llvm-project/issues/54644
Edvin decided to remove the loop entirely by fully vectorising it. This is
substantially more efficient than the loop, and rather harder for a typical
compiler to mess up.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 3 Dec 2021 20:33:57 +0000 (20:33 +0000)]
x86/boot: Factor move_xen() out of __start_xen()
Partly for clarity because there is a lot of subtle magic at work here.
Expand the commentary of what's going on.
Also because there is no need to double copy the stack (32kB). Spilled
content does need accounting for, but this can be sorted by copying only
a handful of words.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 23 Mar 2023 23:41:20 +0000 (23:41 +0000)]
x86/shadow: Fix build with no PG_log_dirty
Gitlab Randconfig found:
arch/x86/mm/shadow/common.c: In function 'shadow_prealloc':
arch/x86/mm/shadow/common.c:1023:18: error: implicit declaration of function
'paging_logdirty_levels'; did you mean 'paging_log_dirty_init'? [-Werror=implicit-function-declaration]
1023 | count += paging_logdirty_levels();
| ^~~~~~~~~~~~~~~~~~~~~~
| paging_log_dirty_init
arch/x86/mm/shadow/common.c:1023:18: error: nested extern declaration of 'paging_logdirty_levels' [-Werror=nested-externs]
The '#if PG_log_dirty' expression is currently SHADOW_PAGING && !HVM &&
PV_SHIM_EXCLUSIVE. Move the declaration outside.
Fixes: 33fb3a661223 ("x86/shadow: account for log-dirty mode when pre-allocating") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 24 Aug 2022 09:48:48 +0000 (10:48 +0100)]
x86/hvmloader: Don't build as PIC
HVMLoader is not relocatable in memory, and 32bit PIC code has a large
overhead. Override the compiler's choice of pic/no-pic and force it to be
non-relocatable.
Andrew Cooper [Thu, 20 Jan 2022 15:45:02 +0000 (15:45 +0000)]
xen: Modify domain_crash() to take a print string
There are two problems with domain_crash().
First, that it is frequently not preceded by a printk() at all, or only by a
dprintk(). Either way, critical diagnostic information is missing for an
event which is fatal to the guest.
Second, the embedded __LINE__ is an issue for livepatching, creating unwanted
churn in the binary diff. This is the final __LINE__ remaining in
livepatching-relevant contexts.
The end goal is to have domain_crash() require a print string which gets fed
to printk(), making it far less easy to omit relevant diagnostic information.
However, modifying all callers at once is far too big and complicated, so use
some macro magic to tolerate the old API (no print string) in the short term.
Adjust two callers in load_segments() to demonstrate the new API.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
<eval_nospec_test>:
0f ae e8 lfence
85 ff test %edi,%edi
74 02 je <eval_nospec_test+0x9>
90 nop
c3 ret
90 nop
c3 ret
which is not safe because the lfence has been hoisted above the conditional
jump. Clang concludes that both barrier_nospec_true()'s have identical side
effects and can safely be merged.
Clang can be persuaded that the side effects are different if there are
different comments in the asm blocks. This is fragile, but no more fragile
than other aspects of this construct.
Introduce barrier_nospec_false() with a separate internal comment to prevent
Clang merging it with barrier_nospec_true() despite the otherwise-identical
content. The generated code now becomes:
<eval_nospec_test>:
85 ff test %edi,%edi
74 05 je <eval_nospec_test+0x9>
0f ae e8 lfence
90 nop
c3 ret
0f ae e8 lfence
90 nop
c3 ret
which has the correct number of lfence's, and in the correct place.
Link: https://github.com/llvm/llvm-project/issues/55084 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 14 Oct 2022 13:39:55 +0000 (14:39 +0100)]
xen/argo: Fixes to argo_dprintk()
Rewrite argo_dprintk() so printk() format typechecking can always be
performed. This also fixes the fact that parameters are not evaluated at all
in the default case.
Emit the messages at XENLOG_DEBUG.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
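One common way to keep format checking active for a debug macro that is disabled by default is sketched below. It is an illustration with printf() and hypothetical names (EXAMPLE_DEBUG, example_dprintk, example_messages), not the argo code, and it relies on the GNU `##__VA_ARGS__` extension.

```c
#include <assert.h>
#include <stdio.h>

#ifndef EXAMPLE_DEBUG
#define EXAMPLE_DEBUG 0     /* debugging is off by default */
#endif

static int example_messages;    /* counts messages actually emitted */

/*
 * The printf() call is always compiled, so format/argument mismatches are
 * diagnosed even when EXAMPLE_DEBUG is 0. The constant condition lets the
 * compiler discard the call in that case.
 */
#define example_dprintk(fmt, ...)                       \
    do {                                                \
        if (EXAMPLE_DEBUG) {                            \
            example_messages++;                         \
            printf("example: " fmt, ##__VA_ARGS__);     \
        }                                               \
    } while (0)
```

The alternative of defining the macro to nothing when debugging is off hides the arguments from the compiler entirely, so mistakes in them go unnoticed until someone enables debugging.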
Jan Beulich [Fri, 24 Mar 2023 10:20:59 +0000 (11:20 +0100)]
x86/shadow: OOS mode is HVM-only
XEN_DOMCTL_CDF_oos_off is forced set for PV domains, so the logic can't
ever be engaged for them. Conditionalize respective fields and remove
the respective bit from SHADOW_OPTIMIZATIONS when !HVM. As a result the
SH_type_oos_snapshot constant can disappear altogether in that case, and
a couple of #ifdef-s can also be dropped/combined.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
These aren't mode dependent (see 06f04f54ba97 ["x86/shadow:
sh_{write,cmpxchg}_guest_entry() are PV-only"], where they were moved
out of multi.c) and hence there's no need to have pointers to the
functions in struct shadow_paging_mode. Due to include dependencies,
however, the "paging" wrappers need to move out of paging.h; they're
needed from PV memory management code only anyway, so by moving them
their exposure is reduced at the same time.
By carefully placing the (moved and renamed) shadow function
declarations, #ifdef can also be dropped from the "paging" wrappers
(paging_mode_shadow() is constant false when !SHADOW_PAGING).
While moving the code, drop the (largely wrong) comment from
paging_write_guest_entry() and reduce that of
paging_cmpxchg_guest_entry().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Fri, 24 Mar 2023 10:13:43 +0000 (11:13 +0100)]
tools: add container_of() macro to xen-tools/common-macros.h
Instead of having 3 identical copies of the definition of a
container_of() macro in different tools header files, add that macro
to xen-tools/common-macros.h and use that instead.
Delete the other copies of that macro.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
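For readers unfamiliar with the idiom, a simplified container_of() can be sketched as below. This portable variant omits the type checking that real definitions usually add via typeof, and all example_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Recover a pointer to the enclosing structure from a pointer to a member. */
#define example_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct example_list_node {
    struct example_list_node *next;
};

struct example_item {
    int value;
    struct example_list_node node;  /* embedded list linkage */
};

/* Map an embedded list node back to the item that contains it. */
static struct example_item *example_item_from_node(struct example_list_node *n)
{
    assert(n != NULL);
    return example_container_of(n, struct example_item, node);
}
```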
Juergen Gross [Fri, 24 Mar 2023 10:12:32 +0000 (11:12 +0100)]
tools: get rid of additional min() and max() definitions
Defining min(), min_t(), max() and max_t() at other places than
xen-tools/common-macros.h isn't needed, as the definitions in said
header can be used instead.
Same applies to BUILD_BUG_ON() in hvmloader.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Fri, 24 Mar 2023 10:11:48 +0000 (11:11 +0100)]
x86/PV: conditionalize arch_set_info_guest()'s call to update_cr3()
sh_update_paging_modes() as its last action already invokes
sh_update_cr3(). Therefore there is no reason to invoke update_cr3()
another time immediately after calling paging_update_paging_modes(),
especially as sh_update_cr3() does not short-circuit the "nothing
changed" case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 Mar 2023 10:10:41 +0000 (11:10 +0100)]
x86/shadow: replace memcmp() in sh_resync_l1()
Ordinary scalar operations are used in a multitude of other places, so
do so here as well. In fact take the opportunity and drop a local
variable then as well, first and foremost to get rid of a bogus cast.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 Mar 2023 10:08:36 +0000 (11:08 +0100)]
x86/shadow: fold/rename sh_unhook_*_mappings()
The "32b" and "pae" functions are identical at the source level (they
differ in what they get compiled to, due to differences in
SHADOW_FOREACH_L2E()), leaving aside a comment the PAE variant has and
the non-PAE one doesn't. Replace these infixes by the more usual l<N>
ones (and then also for the "64b" one for consistency; that'll also
allow for re-use once we support 5-level paging, if need be). The two
different instances are still distinguishable by their "level" suffix.
While fiddling with the names, convert the last parameter to boolean
as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 Mar 2023 10:07:08 +0000 (11:07 +0100)]
x86/shadow: fix and improve sh_page_has_multiple_shadows()
While no caller currently invokes the function without first making sure
there is at least one shadow [1], we'd better eliminate UB here:
find_first_set_bit() requires input to be non-zero to return a well-
defined result.
Further, using find_first_set_bit() isn't very efficient in the first
place for the intended purpose.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[1] The function has exactly two uses, and both are from OOS code, which
is HVM-only. For HVM (but not for PV) sh_mfn_is_a_page_table(),
guarding the call to sh_unsync(), guarantees at least one shadow.
Hence even if sh_page_has_multiple_shadows() returned a bogus value
when invoked for a PV domain, the subsequent is_hvm_vcpu() and
oos_active checks (the former being redundant with the latter) will
compensate. (Arguably that oos_active check should come first, for
both clarity and efficiency reasons.)
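The point about find_first_set_bit() can be illustrated with a minimal sketch. It assumes the shadow types of a page are tracked as a bitmask with one bit per type. The helper name has_multiple_bits() is hypothetical and is not claimed to be the function's actual body.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if more than one bit is set in mask.
 * (mask & (mask - 1)) clears the lowest set bit, so the result is
 * non-zero only when at least two bits were set. Unlike a
 * find-first-set primitive, this is well defined for mask == 0
 * (unsigned arithmetic wraps), where it yields false.
 */
static bool has_multiple_bits(uint32_t mask)
{
    return (mask & (mask - 1)) != 0;
}
```

A single AND and a decrement replace the bit scan, and no separate zero check is needed.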
Juergen Gross [Thu, 23 Mar 2023 08:18:12 +0000 (09:18 +0100)]
tools/xl: make split_string_into_pair() more usable
Today split_string_into_pair() will not really do what its name is
suggesting: instead of splitting a string into a pair of strings using
a delimiter, it will return the first two strings of the initial string
by using the delimiter.
This is never what the callers want, so modify split_string_into_pair()
to split the string only at the first delimiter found, resulting in
something like "x=a=b" to be split into "x" and "a=b" when being called
with "=" as the delimiter. Today the returned strings would be "x" and
"a".
At the same time switch the delimiter from "const char *" (allowing
multiple delimiter characters) to "char" (a single character only), as
this makes the function simpler without breaking any use cases.
Suggested-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
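The new splitting behaviour can be sketched as follows. This is a stand-alone illustration, not the xl source: the name split_at_first(), its return convention and its use of malloc() are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the behaviour described above: split s at
 * the FIRST occurrence of delim only.
 * On success returns 0 and stores two malloc()ed strings in *a and *b,
 * which the caller must free(). Returns -1 if delim is NUL or absent,
 * or if allocation fails.
 */
static int split_at_first(const char *s, char delim, char **a, char **b)
{
    const char *p;
    size_t len;

    if (delim == '\0')
        return -1;

    p = strchr(s, delim);
    if (!p)
        return -1;

    len = (size_t)(p - s);
    *a = malloc(len + 1);
    *b = malloc(strlen(p + 1) + 1);
    if (!*a || !*b) {
        free(*a);
        free(*b);
        *a = *b = NULL;
        return -1;
    }

    memcpy(*a, s, len);          /* everything before the delimiter */
    (*a)[len] = '\0';
    strcpy(*b, p + 1);           /* everything after it, delimiters included */
    return 0;
}
```

Called with "x=a=b" and '=', this sketch yields "x" and "a=b", matching the behaviour the commit message describes.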
Juergen Gross [Thu, 23 Mar 2023 08:17:57 +0000 (09:17 +0100)]
tools: use libxenlight for writing xenstore-stubdom console nodes
Instead of duplicating libxl__device_console_add() work in
init-xenstore-domain.c, just use libxenlight.
This requires adding a small wrapper function to libxenlight, as
libxl__device_console_add() is an internal function.
This at once removes a theoretical race between starting xenconsoled
and xenstore-stubdom, as the old code wasn't using a single
transaction for writing all the entries, leading to the possibility
that xenconsoled would see only some of the entries being written.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
VT-d: fix iommu=no-igfx if the IOMMU scope contains fake device(s)
If the scope for IGD's IOMMU contains an additional device that doesn't
actually exist, iommu=no-igfx would not disable that IOMMU. In this
particular case (Thinkpad x230) it included 00:02.1, but there is no
such device on this platform. Consider only existing devices for the
"gfx only" check as well as the establishing of IGD DRHD address
(underlying is_igd_drhd(), which is used to determine applicability of
two workarounds).
Fixes: 2d7f191b392e ("VT-d: generalize and correct "iommu=no-igfx" handling") Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Juergen Gross [Wed, 22 Mar 2023 09:00:09 +0000 (10:00 +0100)]
tools/xl: allow split_string_into_pair() to trim values
Most use cases of split_string_into_pair() require the returned
strings to be white space trimmed.
In order to avoid the same code pattern multiple times, add a predicate
parameter to split_string_into_pair() which can be specified to call
trim() with that predicate for the string pair returned. Specifying
NULL for the predicate will avoid the call of trim().
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
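The optional-predicate idea can be sketched like this. The function trim_in_place() and the trim_pred_t type are hypothetical names for illustration. They are not the signatures of xl's trim() or of its predicate type.

```c
#include <ctype.h>
#include <string.h>

/* Predicate type: returns non-zero for characters that should be trimmed. */
typedef int (*trim_pred_t)(int c);

/*
 * Hypothetical in-place trim: strips leading and trailing characters
 * matching pred. A NULL pred leaves the string untouched.
 * Returns s for convenience.
 */
static char *trim_in_place(char *s, trim_pred_t pred)
{
    size_t start = 0, end = strlen(s);

    if (!pred)
        return s;

    while (start < end && pred((unsigned char)s[start]))
        start++;
    while (end > start && pred((unsigned char)s[end - 1]))
        end--;

    memmove(s, s + start, end - start);
    s[end - start] = '\0';
    return s;
}
```

Passing isspace as the predicate trims white space, while passing NULL skips trimming entirely, so one call site pattern covers both kinds of caller.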
Jan Beulich [Wed, 22 Mar 2023 08:58:25 +0000 (09:58 +0100)]
move {,vcpu_}show_execution_state() declarations to common header
These are used from common code, so their signatures should be
consistent across architectures. This is achieved / guaranteed easiest
when their declarations are in a common header.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Juergen Gross [Wed, 22 Mar 2023 08:57:19 +0000 (09:57 +0100)]
tools: rename xen-tools/libs.h file to common-macros.h
In order to better reflect the contents of the header and to make it
more appropriate to use it for different runtime environments like
programs, libraries, and firmware, rename the libs.h include file to
common-macros.h. Additionally add a comment pointing out the need to be
self-contained.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> # tools/python/xen/lowlevel/xc/xc.c Acked-by: Christian Lindig <christian.lindig@cloud.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 10 Feb 2023 21:11:14 +0000 (21:11 +0000)]
x86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path
As stated (correctly) by the comment next to SPEC_CTRL_ENTRY_FROM_PV, between
the two hunks visible in the patch, RET's are not safe prior to this point.
CR4_PV32_RESTORE hides a CALL/RET pair in certain configurations (PV32
compiled in, SMEP or SMAP active), and the RET can be attacked with one of
several known speculative issues.
Furthermore, CR4_PV32_RESTORE also hides a reference to the cr4_pv32_mask
global variable, which is not safe when XPTI is active before restoring Xen's
full pagetables.
This crash has gone unnoticed because it is only AMD CPUs which permit the
SYSCALL instruction in compatibility mode, and these are not vulnerable to
Meltdown so don't activate XPTI by default.
This is XSA-429 / CVE-2022-42331
Fixes: 5e7962901131 ("x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point") Fixes: 5784de3e2067 ("x86: Meltdown band-aid against malicious 64-bit PV guests") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: serialize pinned cache attribute list manipulation
While the RCU variants of list insertion and removal allow lockless list
traversal (with RCU just read-locked), insertions and removals still
need serializing amongst themselves. To keep things simple, use the
domain lock for this purpose.
This is CVE-2022-42334 / part of XSA-428.
Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: bound number of pinned cache attribute regions
This is exposed via DMOP, i.e. to potentially not fully privileged
device models. With that we may not permit registration of an (almost)
unbounded amount of such regions.
This is CVE-2022-42333 / part of XSA-428.
Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 21 Mar 2023 11:58:50 +0000 (11:58 +0000)]
x86/shadow: account for log-dirty mode when pre-allocating
Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there is no risk that just-made shadows or
shadows being acted upon disappear under our feet. The amount of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).
Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)
It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.
This is CVE-2022-42332 / XSA-427.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Thu, 16 Mar 2023 17:53:56 +0000 (17:53 +0000)]
x86/vmx: Provide named fields for IO exit qualification
This removes most of the opencoded bit logic on the exit qualification.
Unfortunately, size is 1-based not 0-based, so needs adjusting in a separate
variable.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
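As a rough illustration of what named fields buy, the sketch below decodes an I/O exit qualification into a struct. It assumes the layout given in the Intel SDM (bits 2:0 size minus one, bit 3 direction, bit 4 string, bit 5 REP, bit 6 immediate operand, bits 31:16 port). The struct and function names are hypothetical and do not reflect Xen's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view; field names are illustrative only. */
struct io_exit_info {
    unsigned int bytes;   /* access width in bytes: 1, 2 or 4 */
    bool in;              /* true for IN/INS, false for OUT/OUTS */
    bool str;             /* string instruction (INS/OUTS) */
    bool rep;             /* REP prefix present */
    bool imm;             /* port given as immediate rather than in DX */
    uint16_t port;        /* I/O port number */
};

static struct io_exit_info decode_io_qual(uint64_t qual)
{
    struct io_exit_info info = {
        /* The raw field holds size - 1, hence the adjustment. */
        .bytes = (unsigned int)(qual & 7) + 1,
        .in    = (qual >> 3) & 1,
        .str   = (qual >> 4) & 1,
        .rep   = (qual >> 5) & 1,
        .imm   = (qual >> 6) & 1,
        .port  = (uint16_t)(qual >> 16),
    };

    return info;
}
```

The "+ 1" on the size field corresponds to the adjustment the commit message mentions keeping in a separate variable.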
Jan Beulich [Tue, 21 Mar 2023 08:23:25 +0000 (09:23 +0100)]
AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
An earlier change with the same title (commit 1ba66a870eba) altered only
the path where x2apic_phys was already set to false (perhaps from the
command line). The same of course needs applying when the variable
wasn't modified yet from its initial value.
Reported-by: Elliott Mitchell <ehem+xen@m5p.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jiamei Xie [Thu, 16 Mar 2023 09:12:24 +0000 (09:12 +0000)]
automation: arm64: Create test jobs for testing static shared memory on qemu
Create 2 new test jobs, called qemu-smoke-dom0less-arm64-gcc-static-shared-mem
and qemu-smoke-dom0less-arm64-gcc-debug-static-shared-mem.
Adjust qemu-smoke-dom0less-arm64.sh script to accommodate the static
shared memory test as a new test variant. The test variant is determined
based on the first argument passed to the script. For testing static
shared memory, the argument is 'static-shared-mem'.
The test configures two dom0less DOMUs with a static shared memory
region and adds a check in the init script.
The check consists in comparing the contents of the /proc/device-tree/reserved-memory
xen-shmem entry with the static shared memory range and id with which
DOMUs were configured. If the memory layout is correct, a message gets
printed by DOMU.
At the end of the qemu run, the script searches for the specific message
in the logs and fails if not found.
Jiamei Xie [Thu, 16 Mar 2023 09:12:23 +0000 (09:12 +0000)]
automation: arm64: Create test jobs for testing static heap on qemu
Create 2 new test jobs, called qemu-smoke-dom0less-arm64-gcc-staticheap
and qemu-smoke-dom0less-arm64-gcc-debug-staticheap.
Add property "xen,static-heap" under /chosen node to enable static-heap.
If the domU can start successfully with static-heap enabled, then this
test passes.
ImageBuilder sets the kernel and ramdisk range based on the file size.
It will use the memory range from 0x45600000 to 0x47AED1E8. It uses
MEMORY_START and MEMORY_END from the cfg file as a range in which it can
instruct u-boot where to place the images.
Change MEMORY_END to 0x50000000 for all test cases.
Michal Orzel [Mon, 20 Mar 2023 16:12:51 +0000 (17:12 +0100)]
xen/console: skip switching serial input to non existing domains
At the moment, we direct serial input to hardware domain by default.
This does not make any sense when running in true dom0less mode, since
such domain does not exist. As a result, users wishing to write to
an emulated UART of a domU are always forced to execute CTRL-AAA first.
The same issue arises when rotating among serial inputs, where we always
have to go through hardware domain case. This problem can be elaborated
further to all the domains that no longer exist.
Modify switch_serial_input() so that we skip switching serial input to
non existing domains. Take the opportunity to define and make use of
macro max_console_rx to make it clear what 'max_init_domid + 1' means
in the console code context. Also, modify call to printk() to use correct
format specifier for unsigned int.
For now, to minimize the required changes and to match the current
behavior with hwdom, the default input goes to the first real domain.
The choice is more or less arbitrary since dom0less domUs are supposedly
equal. This will be handled in the future by adding support in boot time
configuration for marking a specific domain preferred in terms of
directing serial input to.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
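The skipping logic can be modelled with a small sketch. It assumes input target 0 means Xen itself and target n (n >= 1) means the domain with ID n - 1. The exists[] array stands in for a real domain lookup, and next_serial_target() is a hypothetical name, not the actual switch_serial_input().

```c
#include <stdbool.h>

/*
 * Hypothetical model of rotating the serial input target.
 * exists[i] says whether domain i is present; max_rx plays the role of
 * 'max_init_domid + 1'. Returns the next target after cur, skipping
 * domains that do not exist, and wrapping back to 0 (Xen) at the end.
 */
static unsigned int next_serial_target(unsigned int cur, const bool *exists,
                                       unsigned int max_rx)
{
    unsigned int next = cur;

    for ( ; ; ) {
        if (next++ >= max_rx)
            return 0;            /* ran past the last domain: back to Xen */
        if (exists[next - 1])
            return next;         /* first existing domain after cur */
    }
}
```

With only domain 1 present out of three possible IDs, rotating from Xen lands directly on domain 1, and the next rotation returns to Xen without stopping at absent domains.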
David Woodhouse [Mon, 20 Mar 2023 16:12:34 +0000 (17:12 +0100)]
libacpi: fix PCI hotplug AML
The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).
The AML that Xen gives to a guest gets the operand order for the odd-
numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
instead.
As far as I can tell, this was the wrong way round in Xen from the
moment that PCI hotplug was first introduced in commit 83d82e6f35a8:
+ ShiftRight (0x4, \_GPE.PH00, Local1)
+ Return (Local1) /* IN status as the _STA */
Or maybe there's bizarre AML operand ordering going on there, like
Intel's wrong-way-round assembler, and it only broke later when it was
changed to being generated?
Either way, it's definitely wrong now, and instrumenting a Linux guest
shows that it correctly sees _STA being 0x00 in function 0 of an empty
slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
an adapter is present in every slot in /sys/bus/pci/slots/*
Quite why Linux wants to look for function 1 being physically present
when function 0 isn't... I don't want to think about right now.
Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug") Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Jan Beulich <jbeulich@suse.com>
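The effect of the swapped operands can be reproduced in plain C. This is only an arithmetic illustration of the two expressions quoted above. The function names are made up, and the guard on the shift count exists solely because oversized shifts are undefined in C.

```c
#include <stdint.h>

/* Intended: status nybble of the odd-numbered function (high nybble). */
static uint8_t sta_odd_intended(uint8_t ph)
{
    return ph >> 4;
}

/* Operands swapped, as described for the AML: 0x04 shifted right by ph. */
static uint8_t sta_odd_swapped(uint8_t ph)
{
    /* Shift counts >= the width of int are undefined in C, so guard. */
    return ph < 32 ? (uint8_t)(0x04 >> ph) : 0;
}
```

For an empty slot the register value is 0x00, so the intended expression gives 0x00 while the swapped one gives 0x04, which is the _STA value the instrumented Linux guest observed for function 1.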
Oleksii Kurochko [Mon, 20 Mar 2023 16:12:04 +0000 (17:12 +0100)]
xen/riscv: initialize .bss section
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Oleksii Kurochko [Mon, 20 Mar 2023 16:11:13 +0000 (17:11 +0100)]
xen/riscv: read/save hart_id and dtb_base passed by bootloader
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Oleksii Kurochko [Mon, 20 Mar 2023 16:10:34 +0000 (17:10 +0100)]
xen/riscv: disable fpu
Disable FPU to detect illegal usage of floating point in kernel
space.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Michal Orzel [Fri, 3 Mar 2023 12:53:46 +0000 (13:53 +0100)]
automation: Drop sles11sp4 dockerfile
It has reached EOL and there are no jobs using it on any branch.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Tue, 14 Mar 2023 10:53:51 +0000 (10:53 +0000)]
tools: Use -s for python shebangs
This is mandated by the Fedora packaging guidelines because it is a security
vulnerability otherwise in suid scripts. While Xen doesn't have suid scripts,
it's a very good idea generally because it prevents the user's local python
environment from interfering with system packaged scripts.
pygrub is the odd-script-out, being installed by distutils rather than
manually with INSTALL_PYTHON_PROG. distutils has no nice way of editing the
shebang, so arrange to use INSTALL_PYTHON_PROG for pygrub too.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Tue, 14 Mar 2023 10:59:25 +0000 (10:59 +0000)]
tools/python: Improve unit test handling
* Add X86_{CPUID,MSR}_POLICY_FORMAT checks which were missed previously.
* Drop test_suite(). It hasn't been necessary since the Py2.3 era.
* Drop the __main__ logic. This can't be used without manually adjusting the
include path, and `make test` knows how to do the right thing.
* For `make test`, use `-v` to see which tests have been discovered and run.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Tue, 14 Mar 2023 13:17:19 +0000 (13:17 +0000)]
tools: Delete trailing whitespace in python scripts
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Tue, 14 Mar 2023 13:31:32 +0000 (13:31 +0000)]
tools/misc: Drop xencons
This script is not python3 compatible, but has its shebang altered to say
python3 by INSTALL_PYTHON_PROG.
The most recent reference I can find to this script (which isn't incidental
adjustments in the makefile) is from the Xen book, fileish 561e30b80402 which
says
%% <snip> Alternatively, if the
%% Xen machine is connected to a serial-port server then we supply a
%% dumb TCP terminal client, {\tt xencons}.
So this is a not-invented-here version of telnet. Delete it.
Resolves: xen-project/xen#159 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Tue, 14 Mar 2023 13:18:41 +0000 (13:18 +0000)]
tools/python: Drop pylintrc
This was added in 2004 in c/s b7d4a69f0ccb5 and has never been referenced
since. Given the commit message of simply "Added .", it was quite
possibly a mistake in the first place.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Jan Beulich [Thu, 16 Mar 2023 13:48:23 +0000 (14:48 +0100)]
x86/paging: move and conditionalize flush_tlb() hook
The hook isn't mode dependent, hence it's misplaced in struct
paging_mode. (Or alternatively I see no reason why the alloc_page() and
free_page() hooks don't also live there.) Move it to struct
paging_domain.
The hook also is used for HVM guests only, so make respective pieces
conditional upon CONFIG_HVM.
While there also add __must_check to the hook declaration, as it's
imperative that callers deal with getting back "false".
While moving the shadow implementation, introduce a "curr" local
variable.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 16 Mar 2023 13:46:31 +0000 (14:46 +0100)]
x86/paging: move update_paging_modes() hook
The hook isn't mode dependent, hence it's misplaced in struct
paging_mode. (Or alternatively I see no reason why the alloc_page() and
free_page() hooks don't also live there.) Move it to struct
paging_domain.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 16 Mar 2023 13:43:31 +0000 (14:43 +0100)]
x86/paging: drop set-allocation from final-teardown
The fixes for XSA-410 have arranged for P2M pages being freed by P2M
code to be properly freed directly, rather than being put back on the
paging pool list. Therefore whatever p2m_teardown() may return will no
longer need taking care of here. Drop the code, leaving the assertions
in place and adding "total" back to the PAGING_PRINTK() message.
With merely the (optional) log message and the assertions left, there's
really no point anymore to hold the paging lock there, so drop that too.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 16 Mar 2023 13:42:04 +0000 (14:42 +0100)]
x86/paging: fold most HAP and shadow final teardown
HAP does a few things beyond what's common, which are left there at
least for now. Common operations, however, are moved to
paging_final_teardown(), allowing shadow_final_teardown() to go away.
While moving (and hence generalizing) the respective SHADOW_PRINTK()
drop the logging of total_pages from the 2nd instance - the value is
necessarily zero after {hap,shadow}_set_allocation() - and shorten the
messages, in part accounting for PAGING_PRINTK() logging __func__
already.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 16 Mar 2023 12:23:14 +0000 (13:23 +0100)]
x86: don't include processor.h from system.h
processor.h in particular pulls in xen/smp.h, which is overly heavy for
a supposedly pretty fundamental header like system.h. To keep things
building, move the declarations of struct cpuinfo_x86 and boot_cpu_data
to asm/cpufeature.h (which arguably also is where they belong). In the
course of the move switch away from using fixed-width types and convert
plain "int" to "unsigned int" for the two x86_cache_* fields.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 16 Mar 2023 12:21:50 +0000 (13:21 +0100)]
console: use more appropriate domain RCU-locking function
While both 19afff14b4cb ("xen: support console_switching between Dom0
and DomUs on ARM") and 1ee1e4b0d1ff ("xen/arm: Allow vpl011 to be used
by DomU") were part of the same series (iirc), the latter correctly used
rcu_lock_domain_by_id() in console_input_domain(), whereas the former
for some reason used rcu_lock_domain_by_any_id() instead, despite that
code only kind of open-coding console_input_domain(). There's no point
here to deal with DOMID_SELF, which is the sole difference between the
two functions.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>