xenbits.xensource.com Git - people/tklengyel/xen.git/log

x86/gdbsx: Move domain_pause_for_debugger() into gdbsx
Andrew Cooper [Wed, 20 Apr 2022 00:38:32 +0000 (01:38 +0100)]

domain_pause_for_debugger() is guest debugging (CONFIG_GDBSX) not host
debugging (CONFIG_CRASH_DEBUG).

Move it into the new gdbsx.c to drop the (incorrect) ifdefary, and provide a
static inline in the !CONFIG_GDBSX case so callers can optimise away
everything rather than having to emit a call to an empty function.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
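
The stub pattern described in this commit can be sketched as follows. This is a minimal illustration, not the code from gdbsx.c: the reduced struct domain, the function signature taking a domain pointer, and the handle_breakpoint() caller are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for Xen's struct domain. */
struct domain {
    bool paused_for_debugger;
};

#ifdef CONFIG_GDBSX
/* With guest debugging built in, the helper does real work. */
static void domain_pause_for_debugger(struct domain *d)
{
    d->paused_for_debugger = true;
}
#else
/* Without it, an empty static inline lets the compiler drop the call. */
static inline void domain_pause_for_debugger(struct domain *d)
{
    (void)d;
}
#endif

/* Illustrative caller: no #ifdef is needed at the call site. */
static bool handle_breakpoint(struct domain *d)
{
    domain_pause_for_debugger(d);
    return d->paused_for_debugger;
}
```

Because the empty variant is visible in the header, callers compile to nothing in the !CONFIG_GDBSX case instead of emitting a call to an out-of-line empty function.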

x86/gdbsx: Rename debug.c to gdbsx.c
Bobby Eshleman [Tue, 28 Sep 2021 20:30:26 +0000 (13:30 -0700)]

debug.c contains only dbg_rw_mem().  Rename it to gdbsx.c.

Move gdbsx_guest_mem_io(), and the prior setup of iop->remain, from domctl.c
to gdbsx.c, merging it with dbg_rw_mem().

Signed-off-by: Bobby Eshleman <bobby.eshleman@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/debugger: Remove debugger_trap_entry()
Bobby Eshleman [Tue, 28 Sep 2021 20:30:25 +0000 (13:30 -0700)]

debugger_trap_entry() is unrelated to the other contents of debugger.h.  It is
a no-op for everything other than #DB/#BP, and for those it invokes guest
debugging (CONFIG_GDBSX) not host debugging (CONFIG_CRASH_DEBUG).

The reason it is a no-op for gdbstub is related to the fact that its
description is inappropriate for any kind of useful debugging.  In normal
debugging, gdb only sees things which manifest as signals; it doesn't see
things which the kernel resolves itself (some #PF, #NM, etc).  Furthermore,
without a mechanism to invoke pv_inject_event(), the current infrastructure
will livelock on faults from guest context.

As such, there is no plausible future matching its description.  Any work to
do something better than the current nothing will have to design something
more coherent.

Therefore, simplify everything by expanding debugger_trap_entry() into its two
non-empty locations, fixing bugs with their positioning (vs early exceptions
and curr not being safe to dereference) and for #DB, deferring the pause until
the changes in %dr6 are saved to v->arch.dr6 so the debugger can actually see
which condition triggered.  This also removes some logically dead code from
do_trap(), where the compiler can't prove that #DB/#BP are handled by
different codepaths.

Signed-off-by: Bobby Eshleman <bobby.eshleman@gmail.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

xen/build: Fix MAP rule when called in isolation
Andrew Cooper [Thu, 21 Apr 2022 14:23:37 +0000 (15:23 +0100)]

Now that `make MAP` might rebuild $(TARGET), it needs removing from
no-dot-config-targets.

Otherwise the build eventually fails with:

    CPP     arch/x86/asm-macros.i
  arch/x86/asm-macros.c:1:10: fatal error: asm/asm-defns.h: No such file or
  directory
      1 | #include <asm/asm-defns.h>
        |          ^~~~~~~~~~~~~~~~~

Fixes: e1e72198213b ("xen/build: Fix dependency for the MAP rule")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/build: make linking work again with ld incapable of generating build ID
Jan Beulich [Fri, 22 Apr 2022 12:56:23 +0000 (14:56 +0200)]

The retaining of .note.* in a PT_NOTE segment requires a matching
program header to be present in the first place. Drop the respective
conditional and adjust mkelf32 to deal with (ignore) the potentially
present but empty extra segment (but have the new code be generic by
dropping any excess trailing entirely empty segments).

Fixes: dedb0aa42c6d ("x86/build: use --orphan-handling linker option if available")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
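
The generic "drop any excess trailing entirely empty segments" behaviour can be sketched as below. This is not mkelf32's actual code: the reduced struct phdr and the trim_trailing_empty() helper are hypothetical, and only the file and memory size fields are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced program header: only the two size fields matter here. */
struct phdr {
    uint32_t filesz;
    uint32_t memsz;
};

/* Return how many headers remain once trailing, entirely empty ones are dropped. */
static size_t trim_trailing_empty(const struct phdr *ph, size_t n)
{
    while ( n > 0 && ph[n - 1].filesz == 0 && ph[n - 1].memsz == 0 )
        --n;

    return n;
}
```

Only trailing entries are removed; an empty segment followed by a non-empty one is kept, so the indices of the remaining headers do not change.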

AMD/IOMMU: drop stray TLB flush
Jan Beulich [Fri, 22 Apr 2022 12:54:59 +0000 (14:54 +0200)]

I think this flush was overlooked when flushing was moved out of the
core (un)mapping functions. The flush the caller is required to invoke
anyway will satisfy the needs resulting from the splitting of a
superpage.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

IOMMU: have vendor code announce supported page sizes
Jan Beulich [Fri, 22 Apr 2022 12:54:16 +0000 (14:54 +0200)]

Generic code will use this information to determine what order values
can legitimately be passed to the ->{,un}map_page() hooks. For now all
ops structures simply get to announce 4k mappings (as base page size),
and there is (and always has been) an assumption that this matches the
CPU's MMU base page size (eventually we will want to permit IOMMUs with
a base page size smaller than the CPU MMU's).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
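
One way generic code could turn an announced set of page sizes into a check on order values is sketched here. The struct name, the bitmap encoding (bit N set meaning 2^N byte mappings are supported) and the order_is_supported() helper are assumptions for illustration, not the interface added by this commit.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT   12                    /* 4k base pages */
#define PAGE_SIZE_4K (1UL << PAGE_SHIFT)

/* Hypothetical ops structure: bit N set means 2^N byte mappings are supported. */
struct iommu_ops_sketch {
    unsigned long page_sizes;
};

/* An order is legitimate only if the matching page size was announced. */
static bool order_is_supported(const struct iommu_ops_sketch *ops,
                               unsigned int order)
{
    unsigned int bit = PAGE_SHIFT + order;

    if ( bit >= sizeof(ops->page_sizes) * 8 )
        return false;

    return (ops->page_sizes >> bit) & 1;
}
```

With only 4k announced, as all ops structures do for now according to the commit message, order 0 is the only value that passes.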

VT-d: limit page table population in domain_pgd_maddr()
Jan Beulich [Fri, 22 Apr 2022 12:53:13 +0000 (14:53 +0200)]

I have to admit that I never understood why domain_pgd_maddr() wants to
populate all page table levels for DFN 0. I can only assume that despite
the comment there what is needed is population just down to the smallest
possible nr_pt_levels that the loop later in the function may need to
run to. Hence what is needed is the minimum of all possible
iommu->nr_pt_levels, to then be passed into addr_to_dma_page_maddr()
instead of literal 1.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

VT-d: have callers specify the target level for page table walks
Jan Beulich [Fri, 22 Apr 2022 12:52:40 +0000 (14:52 +0200)]

In order to be able to insert/remove super-pages we need to allow
callers of the walking function to specify at which point to stop the
walk.

For intel_iommu_lookup_page() integrate the last level access into
the main walking function.

dma_pte_clear_one() gets only partly adjusted for now: Error handling
and order parameter get put in place, but the order parameter remains
ignored (just like intel_iommu_map_page()'s order part of the flags).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

AMD/IOMMU: have callers specify the target level for page table walks
Jan Beulich [Fri, 22 Apr 2022 12:51:37 +0000 (14:51 +0200)]

In order to be able to insert/remove super-pages we need to allow
callers of the walking function to specify at which point to stop the
walk. (For now at least gcc will instantiate just a variant of the
function with the parameter eliminated, so effectively no change to
generated code as far as the parameter addition goes.)

Instead of merely adjusting a BUG_ON() condition, convert it into an
error return - there's no reason to crash the entire host in that case.
Leave an assertion though for spotting issues early in debug builds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
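
The idea of a caller-specified target level, combined with an error return in place of BUG_ON(), can be sketched as follows. Everything here is hypothetical: the walker does not touch real page tables, it only computes the index a walk would use at each level, and the DEBUG_BUILD switch stands in for Xen's debug-build assertions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define PTE_PER_TABLE_SHIFT 9   /* 512 entries per table (4k tables, 8-byte entries) */

#ifdef DEBUG_BUILD
#define ASSERT_UNREACHABLE_SKETCH() assert(!"unexpected target level")
#else
#define ASSERT_UNREACHABLE_SKETCH() ((void)0)
#endif

/* Index of dfn within the table at the given level (level 1 holds leaf entries). */
static unsigned int pt_index(uint64_t dfn, unsigned int level)
{
    return (dfn >> ((level - 1) * PTE_PER_TABLE_SHIFT)) &
           ((1u << PTE_PER_TABLE_SHIFT) - 1);
}

/* Walk from root_level down to target; report the index at the target level. */
static int walk_to_level(uint64_t dfn, unsigned int root_level,
                         unsigned int target, unsigned int *index)
{
    unsigned int level;

    if ( target < 1 || target > root_level )
    {
        /* Spot the problem early in debug builds, but do not crash the host. */
        ASSERT_UNREACHABLE_SKETCH();
        return -EINVAL;
    }

    for ( level = root_level; level > target; --level )
    {
        /* A real walker would follow (or allocate) the next-level table here. */
        (void)pt_index(dfn, level);
    }

    *index = pt_index(dfn, target);

    return 0;
}
```

Stopping at level 2 instead of level 1 is what would let a caller install or remove a superpage entry there.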

gitlab-ci: add an ARM32 qemu-based smoke test
Stefano Stabellini [Thu, 21 Apr 2022 23:17:40 +0000 (16:17 -0700)]

Add a minimal ARM32 smoke test based on qemu-system-arm, as provided by
the test-artifacts qemu container. The minimal test simply boots Xen
(built from previous build stages) and Dom0.

The test needs a working kernel and minimal initrd for dom0. Instead of
building our own kernel and initrd, which would mean maintaining one or
two more build scripts under automation/, we borrow a kernel and
initrd from distros.

For the kernel we pick the Debian Bullseye kernel, which has everything
we need already built-in. However, we cannot use the Debian Bullseye
initrd because it is 22MB and the large size causes QEMU to core dump.

Instead, use the tiny busybox-based rootfs provided by Alpine Linux,
which is really minimal: just 2.5MB. Note that we cannot use the Alpine
Linux kernel because that doesn't boot on Xen.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>

gitlab-ci: add qemu-system-arm to the existing tests-artifacts container
Stefano Stabellini [Sat, 16 Apr 2022 00:17:00 +0000 (17:17 -0700)]

Add qemu-system-arm to the existing test-artifacts qemu container (which
doesn't get built for every iteration but is only updated once in a while).

With qemu-system-arm available, we'll be able to run ARM32 tests.

This patch also bumps the QEMU version to v6.0.0 for both arm32 and
arm64 (the test-artifacts container is one, shared for both).

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

x86/build: Rework binary conversion for boot/{cmdline,reloc}.c
Andrew Cooper [Thu, 14 Apr 2022 09:33:01 +0000 (10:33 +0100)]

There is no need to open-code the .got.plt size check; it can be done with linker
asserts instead.  Extend the checking to all dynamic linkage sections, and
drop the $(OBJDUMP) pass.

Furthermore, instead of removing .got.plt specifically, take only .text when
converting to a flat binary.  This makes the process invariant of .text's
position relative to the start of the binary, which avoids needing to discard
all sections, and removes the need to work around sections that certain
linkers are unhappy discarding.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

xen/build: Fix dependency for the MAP rule
Andrew Cooper [Thu, 14 Apr 2022 16:04:54 +0000 (17:04 +0100)]

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/mm: avoid inadvertently degrading a TLB flush to local only
David Vrabel [Wed, 20 Apr 2022 08:55:01 +0000 (10:55 +0200)]

If the direct map is incorrectly modified with interrupts disabled,
the required TLB flushes are degraded to flushing the local CPU only.

This could lead to very hard to diagnose problems as different CPUs will
end up with different views of memory, although no such issues have yet
been identified.

Change the check in the flush_area() macro to look at system_state
instead. This defers the switch from local to all later in the boot
(see xen/arch/x86/setup.c:__start_xen()). This is fine because
additional PCPUs are not brought up until after the system state is
SYS_STATE_smp_boot.

Signed-off-by: David Vrabel <dvrabel@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
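
The decision described above can be sketched as a pure function of the boot phase. The enum below is a reduced model of Xen's system_state, and the exact comparison against SYS_STATE_smp_boot is an assumption inferred from the commit message rather than a copy of the flush_area() macro.

```c
#include <assert.h>

/* Boot phases in order; modelled on Xen's system_state, reduced for illustration. */
enum system_state_sketch {
    SYS_STATE_early_boot,
    SYS_STATE_boot,
    SYS_STATE_smp_boot,
    SYS_STATE_active,
};

enum flush_scope { FLUSH_LOCAL, FLUSH_ALL };

/* Before secondary CPUs can exist a local flush suffices; afterwards flush all. */
static enum flush_scope flush_scope_for(enum system_state_sketch state)
{
    return state < SYS_STATE_smp_boot ? FLUSH_LOCAL : FLUSH_ALL;
}
```

The point of keying on the boot phase is that the scope no longer depends on whether interrupts happen to be disabled at the time of the call.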

VT-d: refuse to use IOMMU with reserved CAP.ND value
Jan Beulich [Wed, 20 Apr 2022 08:54:26 +0000 (10:54 +0200)]

The field taking the value 7 (resulting in 18-bit DIDs when using the
calculation in cap_ndoms(), when the DID fields are only 16 bits wide)
is reserved. Instead of misbehaving in case we would encounter such an
IOMMU, refuse to use it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
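
The arithmetic behind the reserved value can be worked through in a few lines. The formula 4 + 2 * ND for the number of domain ID bits is an assumption about what cap_ndoms() computes, chosen because it matches the 18-bit figure quoted in the commit message; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DID_FIELD_WIDTH 16   /* DID fields are 16 bits wide, per the commit message */

/* Number of domain ID bits implied by the 3-bit CAP.ND field. */
static unsigned int nd_to_did_bits(unsigned int nd)
{
    return 4 + 2 * (nd & 7);
}

/* ND = 7 would imply 18-bit DIDs, which cannot fit the 16-bit fields. */
static bool nd_is_usable(unsigned int nd)
{
    return nd_to_did_bits(nd) <= DID_FIELD_WIDTH;
}
```

ND = 6 gives exactly 16 bits and is the largest usable encoding; ND = 7 overflows the field, which is why an IOMMU reporting it is refused.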

VT-d: plug memory leaks in iommu_alloc()
Jan Beulich [Wed, 20 Apr 2022 08:53:57 +0000 (10:53 +0200)]

While 97af062b89d5 ("IOMMU/x86: maintain a per-device pseudo domain ID")
took care of not making things worse, plugging pre-existing leaks wasn't
the purpose of that change; they're not security relevant after all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

VT-d: drop ROOT_ENTRY_NR
Jan Beulich [Wed, 20 Apr 2022 08:53:19 +0000 (10:53 +0200)]

It's not only misplaced, but entirely unused.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

IOMMU/x86: drop locking from quarantine_init() hooks
Jan Beulich [Wed, 20 Apr 2022 08:52:13 +0000 (10:52 +0200)]

Prior extension of these functions to enable per-device quarantine page
tables already didn't add more locking there, but merely left in place
what had been there before. But really locking is unnecessary here:
We're running with pcidevs_lock held (i.e. multiple invocations of the
same function [or their teardown equivalents] are impossible, and hence
there are no "local" races), while all consuming of the data being
populated here can't race anyway due to happening sequentially
afterwards, and unlike ordinary domains' page tables quarantine ones
are never modified once fully constructed. See also the comment in
struct arch_pci_dev.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>

include/public: add command result definitions to vscsiif.h
Juergen Gross [Wed, 20 Apr 2022 08:51:26 +0000 (10:51 +0200)]

The result field of struct vscsiif_response is lacking a detailed
definition. Today the Linux kernel internal scsi definitions are being
used, which is not a sane interface for a PV device driver.

Add macros to change that by using today's values in the XEN namespace.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

xen/arm: Add i.MX lpuart early printk support
Peng Fan [Tue, 19 Apr 2022 04:39:27 +0000 (12:39 +0800)]

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Add i.MX lpuart driver
Peng Fan [Tue, 19 Apr 2022 04:39:26 +0000 (12:39 +0800)]

The i.MX LPUART Documentation:
https://www.nxp.com/webapp/Download?colCode=IMX8QMIEC
Chapter 13.6 Low Power Universal Asynchronous Receiver/
Transmitter (LPUART)

Tested-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Make use of DT_MATCH_TIMER in make_timer_node
Michal Orzel [Thu, 14 Apr 2022 09:58:43 +0000 (11:58 +0200)]

DT_MATCH_TIMER stores the compatible timer ids and as such should be
used in all the places where we need to refer to them. make_timer_node
explicitly lists the same ids as the ones defined in DT_MATCH_TIMER so
make use of this macro instead.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

xen: cleanup gdbsx_guest_mem_io() call
Juergen Gross [Tue, 19 Apr 2022 13:52:53 +0000 (15:52 +0200)]

Modify the gdbsx_guest_mem_io() interface to take the already known
domain pointer as parameter instead of the domid. This allows removing
some more code further down the call tree.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen: fix XEN_DOMCTL_gdbsx_guestmemio crash
Juergen Gross [Tue, 19 Apr 2022 13:52:52 +0000 (15:52 +0200)]

A hypervisor built without CONFIG_GDBSX will crash in case the
XEN_DOMCTL_gdbsx_guestmemio domctl is being called, as the call will
end up in iommu_do_domctl() with d == NULL:

  (XEN) CPU:    6
  (XEN) RIP:    e008:[<ffff82d040269984>] iommu_do_domctl+0x4/0x30
  (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor (d0v0)
  (XEN) rax: 00000000000003e8   rbx: ffff830856277ef8   rcx: ffff830856277fff
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d040269984>] R iommu_do_domctl+0x4/0x30
  (XEN)    [<ffff82d04035cd5f>] S arch_do_domctl+0x7f/0x2330
  (XEN)    [<ffff82d040239e46>] S do_domctl+0xe56/0x1930
  (XEN)    [<ffff82d040238ff0>] S do_domctl+0/0x1930
  (XEN)    [<ffff82d0402f8c59>] S pv_hypercall+0x99/0x110
  (XEN)    [<ffff82d0402f5161>] S arch/x86/pv/domain.c#_toggle_guest_pt+0x11/0x90
  (XEN)    [<ffff82d040366288>] S lstar_enter+0x128/0x130
  (XEN)
  (XEN) Pagetable walk from 0000000000000144:
  (XEN)  L4[0x000] = 0000000000000000 ffffffffffffffff
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 6:
  (XEN) FATAL PAGE FAULT
  (XEN) [error_code=0000]
  (XEN) Faulting linear address: 0000000000000144
  (XEN) ****************************************

It used to be permitted to pass DOMID_IDLE to dbg_rw_mem(), which is why the
special case skipping the domid checks exists.  Now that it is only permitted
to pass proper domids, remove the special case, making 'd' always valid.

Reported-by: Cheyenne Wills <cheyenne.wills@gmail.com>
Fixes: e726a82ca0dc ("xen: make gdbsx support configurable")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/debug: Drop unnecessary include of compile.h
Andrew Cooper [Thu, 14 Apr 2022 09:01:53 +0000 (10:01 +0100)]

compile.h changes across incremental builds, but nothing in debug.c uses it.
This avoids debug.c getting rebuilt on every incremental build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

IOMMU: log appropriate SBDF
Jan Beulich [Wed, 13 Apr 2022 10:36:03 +0000 (12:36 +0200)]

To handle phantom devices, several functions are passed separate "devfn"
arguments besides a PCI device. In such cases we want to log the phantom
device's coordinates instead of the main one's. (Note that not all of
the instances being changed are fallout from the referenced commit.)

Fixes: 1ee1441835f4 ("print: introduce a format specifier for pci_sbdf_t")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

AMD/IOMMU: replace a few PCI_BDF2()
Jan Beulich [Wed, 13 Apr 2022 10:35:17 +0000 (12:35 +0200)]

struct pci_dev has the wanted value directly available; use it. Note
that this fixes a - imo benign - mistake in reassign_device(): The unity
map removal ought to be based on the passed in devfn (as is the case on
the establishing side). This is benign because the mappings would be
removed anyway a little later, when the "main" device gets processed.
While there also limit the scope of two variables in that function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

build: adding out-of-tree support to the xen build
Anthony PERARD [Wed, 13 Apr 2022 10:33:21 +0000 (12:33 +0200)]

This implements out-of-tree support. There are two ways to create an
out-of-tree build tree (after that, `make` in that new directory
works):
    make O=build
    mkdir build; cd build; make -f ../Makefile
Both also work with an absolute path.

This implementation only works if the source tree is clean, as we use
VPATH.

This patch copies most of the new code handling out-of-tree builds from
Linux v5.12.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Tested-by: Julien Grall <jgrall@amazon.com>
Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com> # livepatch

MAINTAINERS: add myself as Continuous Integration maintainer
Stefano Stabellini [Fri, 8 Apr 2022 00:00:47 +0000 (17:00 -0700)]

I have contributed all the ARM tests to gitlab-ci. After checking with
Doug, I am happy to volunteer to co-maintain Continuous Integration.

Also take the opportunity to remove the stale travis-ci entries.

Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

tools/xl: fix vif and vcpupin parse tests
Roger Pau Monné [Mon, 11 Apr 2022 10:33:02 +0000 (12:33 +0200)]

Current vif and vcpupin parse tests are out of sync.  First of all, xl
returns 1 on failure, so replace the expected error code.

Secondly fix the expected output from some vif tests, as xl will no
longer print the unpopulated fields.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>

x86/boot: LEA -> MOV in video handling code
Jan Beulich [Mon, 11 Apr 2022 10:31:02 +0000 (12:31 +0200)]

Replace most LEA instances with (one byte shorter) MOV.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging
Jan Beulich [Mon, 11 Apr 2022 10:30:37 +0000 (12:30 +0200)]

x86/boot: obtain video info from boot loader
Jan Beulich [Mon, 11 Apr 2022 10:30:09 +0000 (12:30 +0200)]

With MB2 the boot loader may provide this information, allowing us to
obtain it without needing to enter real mode (assuming we don't need to
set a new mode from "vga=", but can instead inherit the one the
bootloader may have established).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

x86/boot: make "vga=current" work with graphics modes
Jan Beulich [Mon, 11 Apr 2022 10:29:14 +0000 (12:29 +0200)]

GrUB2 can be told to leave the screen in the graphics mode it has been
using (or any other one), via "set gfxpayload=keep" (or suitable
variants thereof). In this case we can avoid doing another mode switch
ourselves. This in particular avoids possibly setting the screen to a
less desirable mode: On one of my test systems the set of modes
reported available by the VESA BIOS depends on whether the interposed
KVM switch has that machine set as the active one. If it's not active,
only modes up to 1024x768 get reported, while when active 1280x1024
modes are also included. For things to always work with an explicitly
specified mode (via the "vga=" option), that mode therefore needs to be a
1024x768 one.

For some reason this only works for me with "multiboot2" (and
"module2"); "multiboot" (and "module") still forces the screen into text
mode, despite my reading of the sources suggesting otherwise.

For starters I'm limiting this to graphics modes; I do think this ought
to also work for text modes, but
- I can't tell whether GrUB2 can set any text mode other than 80x25
  (I've only found plain "text" to be valid as a "gfxpayload" setting),
- I'm uncertain whether supporting that is worth it, since I'm uncertain
  how many people would be running their systems/screens in text mode,
- I'd like to limit the amount of code added to the realmode trampoline.

For starters I'm also limiting mode information retrieval to raw BIOS
accesses. This will allow things to work (in principle) also with other
boot environments where a graphics mode can be left in place. The
downside is that this then still is dependent upon switching back to
real mode, so retrieving the needed information from multiboot info is
likely going to be desirable down the road.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Henry Wang <Henry.Wang@arm.com>

xen: Populate xen.lds.h and make use of its macros
Michal Orzel [Mon, 11 Apr 2022 07:03:00 +0000 (09:03 +0200)]

Populate header file xen.lds.h with the first portion of macros storing
constructs common to x86 and arm linker scripts. Replace the original
constructs with these helpers.

No functional improvements to x86 linker script.

Making use of common macros improves arm linker script with:
- explicit list of debug sections that otherwise are seen as "orphans"
  by the linker. This will allow fixing issues after enabling the linker
  option --orphan-handling one day,
- extended list of discarded sections to include: .discard, destructor
  related sections, .fini_array which can reference .text.exit,
- sections not related to debugging that are placed by ld.lld. Even
  though we do not support linking with LLD on Arm, these sections do
  not cause problems for GNU ld.

As we are replacing hardcoded boundary specified as an argument to ALIGN
function with POINTER_ALIGN, this changes the alignment in HYPFS_PARAM
construct for arm32 from 8 to 4. It is fine as there are no 64bit values
used in struct param_hypfs.

Please note that this patch does not aim to perform the full sync up
between the linker scripts. It creates a base for further work.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
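
The effect of the POINTER_ALIGN change on arm32 comes down to rounding a location up to 4 instead of 8. The following is a small worked example of that arithmetic; the align_up() helper and the sample addresses are made up for illustration and do not come from the linker scripts.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t align)
{
    return (addr + align - 1) & ~(align - 1);
}
```

A location counter at 0x1004 stays at 0x1004 with 4-byte alignment but is padded to 0x1008 with 8-byte alignment, so the change can only remove padding. That is safe as long as the structure holds no 64-bit values, which the commit message states is the case for struct param_hypfs.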

xen: Introduce a header to store common linker scripts content
Michal Orzel [Mon, 11 Apr 2022 07:02:59 +0000 (09:02 +0200)]

Both x86 and arm linker scripts share quite a lot of common content.
It is difficult to keep syncing them up, thus introduce a new header
in include/xen called xen.lds.h to store the internals mutual to all
the linker scripts.

Include this header in linker scripts for x86 and arm.
This patch serves as an intermediate step before populating xen.lds.h
and making use of its content in the linker scripts later on.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

SUPPORT.md: add Dom0less as Supported
Stefano Stabellini [Fri, 8 Apr 2022 00:10:37 +0000 (17:10 -0700)]

Add Dom0less to SUPPORT.md to clarify its support status. The feature is
mature enough and small enough to make it security supported.

Clarify that dom0less DomUs' memory is not scrubbed at boot when
bootscrub=on or bootscrub=off are passed as Xen command line parameters,
and no XSAs will be issued for that.

Also see XSA-372: 371347c5b64da and fd5dc41ceaed.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>

x86/irq: skip unmap_domain_pirq XSM during destruction
Jason Andryuk [Fri, 8 Apr 2022 12:51:52 +0000 (14:51 +0200)]

xsm_unmap_domain_irq was seen denying unmap_domain_pirq when called from
complete_domain_destroy as an RCU callback.  The source context was an
unexpected, random domain.  Since this is a xen-internal operation,
going through the XSM hook is inappropriate.

Check d->is_dying and skip the XSM hook when set since this is a cleanup
operation for a domain being destroyed.

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
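
The control flow of the fix can be sketched as below. The struct, the xsm_check_sketch() stand-in for the XSM hook and the boolean policy argument are all hypothetical; only the "skip the hook when d->is_dying is set" logic is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical reduced domain: only the field the check depends on. */
struct domain_sketch {
    bool is_dying;
};

/* Hypothetical stand-in for the XSM hook: 0 if permitted, -EPERM otherwise. */
static int xsm_check_sketch(bool policy_allows)
{
    return policy_allows ? 0 : -EPERM;
}

static int unmap_pirq_sketch(const struct domain_sketch *d, bool policy_allows)
{
    /* Cleanup of a dying domain is Xen-internal, so the hook is skipped. */
    if ( !d->is_dying )
    {
        int rc = xsm_check_sketch(policy_allows);

        if ( rc )
            return rc;
    }

    /* The actual unmapping work would happen here. */
    return 0;
}
```

A live domain is still subject to the policy, while teardown from an RCU callback no longer fails because of whichever domain happened to be the source context.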

x86/P2M: the majority for struct p2m_domain's fields are HVM-only
Jan Beulich [Fri, 8 Apr 2022 12:51:06 +0000 (14:51 +0200)]

..., as are the majority of the locks involved. Conditionalize things
accordingly.

Also adjust the ioreq field's indentation at this occasion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

x86/P2M: p2m.c is HVM-only
Jan Beulich [Fri, 8 Apr 2022 12:50:29 +0000 (14:50 +0200)]

This only requires moving p2m_percpu_rwlock elsewhere (ultimately I
think all P2M locking should go away as well when !HVM, but this looks
to require further code juggling). The two other unguarded functions are
already unneeded (by virtue of DCE) when !HVM.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

paged_pages field is MEM_PAGING-only
Jan Beulich [Fri, 8 Apr 2022 12:48:45 +0000 (14:48 +0200)]

Conditionalize it and its uses accordingly.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

shr_pages field is MEM_SHARING-only
Jan Beulich [Fri, 8 Apr 2022 12:47:56 +0000 (14:47 +0200)]

Conditionalize it and its uses accordingly. The main goal though is to
demonstrate that x86's p2m_teardown() is now empty when !HVM, which in
particular means the last remaining use of p2m_lock() in this case goes
away.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

x86/p2m: re-arrange {,__}put_gfn()
Jan Beulich [Fri, 8 Apr 2022 12:47:11 +0000 (14:47 +0200)]

All explicit callers of __put_gfn() are in HVM-only code and hold a valid
P2M pointer in their hands. Move the paging_mode_translate() check out of
there into put_gfn(), renaming __put_gfn() and making its GFN parameter
type-safe.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

x86/P2M: derive HVM-only variant from __get_gfn_type_access()
Jan Beulich [Fri, 8 Apr 2022 12:46:30 +0000 (14:46 +0200)]

Introduce an inline wrapper dealing with the non-translated-domain case,
while stripping that logic from the main function, which gets renamed to
p2m_get_gfn_type_access(). HVM-only callers can then directly use the
main function.

Along with renaming the main function also make its and the new inline
helper's GFN parameters type-safe.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
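
What "type-safe" means for a GFN parameter can be sketched with a single-member struct wrapper. The helper names below are modelled on Xen's gfn_t accessors, but this standalone version is an illustration written for this log, not the definitions from the Xen headers.

```c
#include <assert.h>

/* Wrapping the raw number in a struct makes mixing it up a compile-time error. */
typedef struct {
    unsigned long gfn;
} gfn_t;

/* Box a raw frame number. */
static inline gfn_t _gfn(unsigned long n)
{
    return (gfn_t){ n };
}

/* Unbox it again where arithmetic on the raw value is needed. */
static inline unsigned long gfn_x(gfn_t g)
{
    return g.gfn;
}

/* Offset a GFN while staying in the typed world. */
static inline gfn_t gfn_add(gfn_t g, unsigned long i)
{
    return _gfn(gfn_x(g) + i);
}
```

A function declared to take gfn_t rejects a bare unsigned long (or a differently wrapped frame number type) at compile time, which is the kind of mistake the conversion of these parameters guards against.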

x86/P2M: p2m_get_page_from_gfn() is HVM-only
Jan Beulich [Fri, 8 Apr 2022 12:45:37 +0000 (14:45 +0200)]

This function is the wrong layer to go through for PV guests. It happens
to work, but produces results which aren't fully consistent with
get_page_from_gfn(). The latter function, however, cannot be used in
map_domain_gfn() as it may not be the host P2M we mean to act on.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agox86/P2M: split out init/teardown functions
Jan Beulich [Fri, 8 Apr 2022 12:44:05 +0000 (14:44 +0200)]
x86/P2M: split out init/teardown functions

Mostly just code movement, and certainly no functional change intended.
In p2m_final_teardown() the calls to p2m_teardown_{alt,nested}p2m() need
to be guarded by an is_hvm_domain() check now, though. This matches
p2m_init(). And p2m_is_logdirty_range() also gets moved inside the (so
far) adjacent #ifdef.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agox86/P2M: PoD, altp2m, and nested-p2m are HVM-only
Jan Beulich [Fri, 8 Apr 2022 12:41:51 +0000 (14:41 +0200)]
x86/P2M: PoD, altp2m, and nested-p2m are HVM-only

There's no need to initialize respective data for PV domains. Note that
p2m_teardown_{alt,nested}p2m() will handle the lack-of-initialization
case fine.

As a result, although PV domains have a host P2M associated with them,
and hence using XENMEM_get_pod_target on them may not be a real problem,
calling p2m_pod_set_mem_target() for a PV domain is surely wrong, even
if benign at present. Add a guard there as well.

In p2m_pod_demand_populate() the situation is a little different: This
function is reachable only for HVM domains anyway, but following from
other PoD functions only ever acting on the host P2M (and hence PoD
entries only ever existing in host P2Ms), assert and bail from there for
non-host-P2Ms.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agox86/P2M: p2m_{alloc,free}_ptp() and p2m_alloc_table() are HVM-only
Jan Beulich [Fri, 8 Apr 2022 12:40:46 +0000 (14:40 +0200)]
x86/P2M: p2m_{alloc,free}_ptp() and p2m_alloc_table() are HVM-only

This also includes the two p2m related fields.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agox86/mm: split set_identity_p2m_entry() into PV and HVM parts
Jan Beulich [Fri, 8 Apr 2022 12:39:43 +0000 (14:39 +0200)]
x86/mm: split set_identity_p2m_entry() into PV and HVM parts

..., moving the former into the new physmap.c. Also call the new
functions directly from arch_iommu_hwdom_init() and
vpci_make_msix_hole(), as the PV/HVM split is explicit there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agolivepatch: avoid relocations referencing ignored section symbols
Roger Pau Monné [Fri, 8 Apr 2022 08:27:11 +0000 (10:27 +0200)]
livepatch: avoid relocations referencing ignored section symbols

Track whether symbols belong to ignored sections in order to avoid
applying relocations referencing those symbols. The address of such
symbols won't be resolved and thus the relocation will likely fail or
write garbage to the destination.

Return an error in that case, as leaving unresolved relocations would
lead to malfunctioning payload code.
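
The idea can be sketched as follows. This is an illustration under
simplifying assumptions, not the livepatch loader: the toy_* structures, the
"store symbol value plus addend" relocation, and the use of -EINVAL as the
error are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified symbol record. */
struct toy_sym {
    const char *name;
    uint64_t value;   /* resolved address; meaningless if ignored */
    bool ignored;     /* symbol belongs to a section that was not loaded */
};

/* Hypothetical, simplified relocation: dest[offset] = symbol + addend. */
struct toy_rela {
    size_t sym_idx;
    size_t offset;    /* index into dest[], in units of uint64_t */
    int64_t addend;
};

static int toy_apply_relas(uint64_t *dest, size_t dest_len,
                           const struct toy_sym *syms, size_t nr_syms,
                           const struct toy_rela *relas, size_t nr_relas)
{
    assert(dest && syms && relas);

    for ( size_t i = 0; i < nr_relas; i++ )
    {
        const struct toy_rela *r = &relas[i];

        if ( r->sym_idx >= nr_syms || r->offset >= dest_len )
            return -EINVAL;

        /* Refuse, rather than write an address that was never resolved. */
        if ( syms[r->sym_idx].ignored )
            return -EINVAL;

        dest[r->offset] = syms[r->sym_idx].value + (uint64_t)r->addend;
    }

    return 0;
}
```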

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Bjoern Doebel <doebel@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
3 years agolivepatch: do not ignore sections with 0 size
Roger Pau Monné [Fri, 8 Apr 2022 08:24:10 +0000 (10:24 +0200)]
livepatch: do not ignore sections with 0 size

A side effect of ignoring such sections is that symbols belonging to
them won't be resolved, and that could make relocations belonging to
other sections that reference those symbols fail.

For example, an empty .altinstr_replacement section is likely to have
symbols pointing to it, and marking the section as ignored will
prevent the symbols from being resolved, which in turn will cause any
relocations against them to fail.

In order to solve this do not ignore sections with 0 size, only ignore
sections that don't have the SHF_ALLOC flag set.
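
As a rough sketch of the rule change (the toy_* names and the reduced section
header are assumptions; only the SHF_ALLOC value of 0x2 comes from the ELF
specification):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ELF section flag: the section occupies memory at run time. */
#define TOY_SHF_ALLOC UINT64_C(0x2)

/* Hypothetical, reduced section header. */
struct toy_shdr {
    uint64_t sh_flags;
    uint64_t sh_size;
};

/* Old rule, as described above: empty sections were ignored as well. */
static bool toy_ignored_old(const struct toy_shdr *s)
{
    assert(s);
    return !(s->sh_flags & TOY_SHF_ALLOC) || s->sh_size == 0;
}

/* New rule: only the SHF_ALLOC flag decides. */
static bool toy_ignored_new(const struct toy_shdr *s)
{
    assert(s);
    return !(s->sh_flags & TOY_SHF_ALLOC);
}
```

With the new rule an empty but allocatable section is kept, so symbols
pointing into it can still be resolved.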

Special case such empty sections in move_payload so they are not taken
into account in order to decide whether a livepatch can be safely
re-applied after a revert.

Fixes: 98b728a7b2 ('livepatch: Disallow applying after an revert')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Bjoern Doebel <doebel@amazon.de>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
3 years agovPCI: fix MSI-X PBA read/write gprintk()s
Jan Beulich [Thu, 7 Apr 2022 16:01:24 +0000 (18:01 +0200)]
vPCI: fix MSI-X PBA read/write gprintk()s

%pp wants the address of an SBDF, not that of a PCI device.

Fixes: b4f211606011 ("vpci/msix: fix PBA accesses")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agobuild: shuffle main Makefile
Anthony PERARD [Thu, 7 Apr 2022 15:58:44 +0000 (17:58 +0200)]
build: shuffle main Makefile

Reorganize the Makefile a bit ahead of patch
"build: adding out-of-tree support to the xen build"

We are going to want to calculate all the $(*srctree) and $(*objtree)
once, when we can calculate them. This can happen within the
"$(root-make-done)" guard, in an out-of-tree build scenario, so move
those variables there.

$(XEN_ROOT) is going to depend on the value of $(abs_srctree) so
needs to move as well. "Kbuild.include" also depends on $(srctree).

Next, "Config.mk" depends on $(XEN_ROOT) and $(TARGET_*ARCH) depends
on "Config.mk" so those need to move as well.

This should only be code movement without functional changes.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: specify source tree in include/ for prerequisite
Anthony PERARD [Thu, 7 Apr 2022 15:57:44 +0000 (17:57 +0200)]
build: specify source tree in include/ for prerequisite

When doing an out-of-tree build, and thus setting VPATH,
GNU Make 3.81 on Ubuntu Trusty complains about a circular dependency
between include/Makefile and include/xlat.lst and drops them. The build
fails later due to malformed headers.

This might be due to bug #13529
    "Incorrect circular dependancy"
    https://savannah.gnu.org/bugs/?13529
which was fixed in 3.82.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: rework "headers*.chk" prerequisite in include/
Anthony PERARD [Thu, 7 Apr 2022 15:56:53 +0000 (17:56 +0200)]
build: rework "headers*.chk" prerequisite in include/

Listing public headers becomes more annoying once out-of-tree builds
are involved, as the path to every header needs to start with
"$(srctree)/$(src)", or $(wildcard ) will not work. This means more
repetition. ("$(srcdir)" is a shortcut for "$(srctree)/$(src)".)

This patch attempts to reduce the amount of duplication and make better
use of make's metaprogramming capability. The filters are now listed
in a variable and don't have to repeat the path to the header files,
as this is added later as needed.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: replace $(BASEDIR) and use $(srctree)
Anthony PERARD [Thu, 7 Apr 2022 15:56:00 +0000 (17:56 +0200)]
build: replace $(BASEDIR) and use $(srctree)

$(srctree) is a better description for the source directory than
$(BASEDIR), which has been used for both the source and the build
directory (which were the same).

This adds $(srctree) to a few paths where make's VPATH=$(srctree) won't
apply, and replaces $(BASEDIR) by $(srctree).

Introduce "$(srcdir)" as a shortcut for "$(srctree)/$(src)" as the
latter is used often enough.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com> # XSM
3 years agobuild: replace $(BASEDIR) by $(objtree)
Anthony PERARD [Thu, 7 Apr 2022 15:54:42 +0000 (17:54 +0200)]
build: replace $(BASEDIR) by $(objtree)

We need to differentiate between source files and generated/built
files. We will be replacing $(BASEDIR) by $(objtree) for files that
are generated.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com>
3 years agox86/cpuid: Clobber CPUID leaves 0x800000{1d..20} in policies
Andrew Cooper [Wed, 6 Apr 2022 21:40:20 +0000 (22:40 +0100)]
x86/cpuid: Clobber CPUID leaves 0x800000{1d..20} in policies

c/s 1a914256dca5 increased the AMD max leaf from 0x8000001c to 0x80000021, but
did not adjust anything in the calculate_*_policy() chain.  As a result, on
hardware supporting these leaves, we read the real hardware values into the
raw policy, then copy into host, and all the way into the PV/HVM default
policies.

All 4 of these leaves have enable bits (first two by TopoExt, next by SEV,
next by PQOS), so any software following the rules is fine and will leave them
alone.  However, leaf 0x8000001d takes a subleaf input and at least two
userspace utilities have been observed to loop indefinitely under Xen (clearly
waiting for eax to report "no more cache levels").

Such userspace is buggy, but Xen's behaviour isn't great either.

In the short term, clobber all information in these leaves.  This is a giant
bodge, but there are complexities with implementing all of these leaves
properly.
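
For illustration, a defensive way for software to walk leaf 0x8000001d is to
bound the subleaf loop instead of relying solely on the "no more cache
levels" terminator (cache type in EAX[4:0] being 0). The sketch below is an
assumption-laden example: the callback-based CPUID accessor, the two mock
implementations and the bound of 16 are hypothetical and exist only so the
loop can be exercised without hardware.

```c
#include <assert.h>
#include <stdint.h>

struct toy_regs { uint32_t eax, ebx, ecx, edx; };

/* Hypothetical CPUID accessor, so the loop can be run against mocks. */
typedef struct toy_regs (*toy_cpuid_fn)(uint32_t leaf, uint32_t subleaf);

#define TOY_CACHE_LEAF    0x8000001dU
#define TOY_MAX_SUBLEAVES 16U   /* arbitrary safety bound (assumption) */

/* Count cache levels; a cache type of 0 in EAX[4:0] ends the list. */
static unsigned int toy_count_cache_levels(toy_cpuid_fn cpuid)
{
    unsigned int n;

    assert(cpuid);

    for ( n = 0; n < TOY_MAX_SUBLEAVES; n++ )
        if ( (cpuid(TOY_CACHE_LEAF, n).eax & 0x1f) == 0 )
            break;

    return n;   /* equals the bound if no terminator was ever seen */
}

/* Mock: three cache levels, then a null entry. */
static struct toy_regs toy_cpuid_sane(uint32_t leaf, uint32_t subleaf)
{
    struct toy_regs r = { 0, 0, 0, 0 };

    (void)leaf;
    if ( subleaf < 3 )
        r.eax = 1;   /* cache type 1: data cache */

    return r;
}

/* Mock: never reports a null entry, so an unbounded loop would spin forever. */
static struct toy_regs toy_cpuid_stuck(uint32_t leaf, uint32_t subleaf)
{
    struct toy_regs r = { 1, 0, 0, 0 };

    (void)leaf;
    (void)subleaf;

    return r;
}
```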

Fixes: 1a914256dca5 ("x86/cpuid: support LFENCE always serialising CPUID bit")
Link: https://github.com/QubesOS/qubes-issues/issues/7392
Reported-by: fosslinux <fosslinux@aussies.space>
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoVT-d: avoid infinite recursion on domain_context_mapping_one() error path
Jan Beulich [Thu, 7 Apr 2022 10:31:16 +0000 (12:31 +0200)]
VT-d: avoid infinite recursion on domain_context_mapping_one() error path

Despite the comment there infinite recursion was still possible, by
flip-flopping between two domains. This is because prev_dom is derived
from the DID found in the context entry, which was already updated by
the time error recovery is invoked. Simply introduce yet another mode
flag to prevent rolling back an in-progress roll-back of a prior
mapping attempt.
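
A toy model of such a mode flag, not the VT-d code itself: the flag name, the
structure and the failure injection are hypothetical. It only shows how
marking a call as a roll-back stops the error path from recursing when the
roll-back fails as well.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MODE_ROLLBACK 0x1u   /* hypothetical flag: "this call is a roll-back" */

struct toy_ctx {
    int owner;            /* domain recorded in the (toy) context entry */
    bool always_fail;     /* inject a failure after the entry was written */
    unsigned int calls;   /* number of mapping attempts made so far */
};

static int toy_map(struct toy_ctx *c, int domain, unsigned int mode)
{
    int prev;

    assert(c);

    prev = c->owner;
    if ( prev == domain )
        return 0;          /* nothing to do */

    c->calls++;
    c->owner = domain;     /* the entry is updated before the failure shows */

    if ( !c->always_fail )
        return 0;

    /* Roll back to the previous owner, but never roll back a roll-back. */
    if ( !(mode & TOY_MODE_ROLLBACK) )
        toy_map(c, prev, mode | TOY_MODE_ROLLBACK);

    return -1;
}
```

Without the flag check, a failing roll-back would try to restore the other
domain again, and the two calls would keep flip-flopping.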

Also drop the existing recursion prevention, which was dead code anyway:
Earlier in the function we already bail when prev_dom == domain.

Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoVT-d: avoid NULL deref on domain_context_mapping_one() error paths
Jan Beulich [Thu, 7 Apr 2022 10:30:19 +0000 (12:30 +0200)]
VT-d: avoid NULL deref on domain_context_mapping_one() error paths

First there's a printk() which actually wrongly uses pdev in the first
place: We want to log the coordinates of the (perhaps fake) device
acted upon, which may not be pdev.

Then it was quite pointless for eb19326a328d ("VT-d: prepare for per-
device quarantine page tables (part I)") to add a domid_t parameter to
domain_context_unmap_one(): It's only used to pass back here via
me_wifi_quirk() -> map_me_phantom_function(). Drop the parameter again.

Finally there's the invocation of domain_context_mapping_one(), which
needs to be passed the correct domain ID. Avoid taking that path when
pdev is NULL and the quarantine state is what would need restoring to.
This means we can't security-support non-PCI-Express devices with RMRRs
(if such exist in practice) any longer; note that as of the 1st of the
two commits referenced below assigning them to DomU-s is unsupported
anyway.

Fixes: 8f41e481b485 ("VT-d: re-assign devices directly")
Fixes: 14dd241aad8a ("IOMMU/x86: use per-device page tables for quarantining")
Coverity ID: 1503784
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoVT-d: don't needlessly look up DID
Jan Beulich [Thu, 7 Apr 2022 10:29:03 +0000 (12:29 +0200)]
VT-d: don't needlessly look up DID

If get_iommu_domid() in domain_context_unmap_one() fails, we had better
not clear the context entry in the first place, as we're then unable
to issue the corresponding flush. However, we have no need to look up the
DID in the first place: What needs flushing is very specifically the DID
that was in the context entry before our clearing of it.
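
Sketched in a simplified form (the toy_* entry layout and the flush
bookkeeping are hypothetical), the approach is to capture the DID from the
entry itself before clearing it, so no separate lookup can fail:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified context entry. */
struct toy_context_entry {
    bool present;
    uint16_t did;
};

/* Bookkeeping standing in for a real flush. */
static unsigned int toy_flush_count;
static uint16_t toy_last_flushed_did;

static void toy_flush_did(uint16_t did)
{
    toy_flush_count++;
    toy_last_flushed_did = did;
}

static void toy_context_unmap(struct toy_context_entry *ce)
{
    uint16_t did;

    assert(ce);

    if ( !ce->present )
        return;            /* nothing mapped, nothing to flush */

    did = ce->did;         /* capture the DID before clearing the entry */
    ce->present = false;
    ce->did = 0;

    toy_flush_did(did);    /* flush exactly what the entry referred to */
}
```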

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoplatform/cpufreq: add public defines for CPUFREQ_SHARED_TYPE_
Roger Pau Monné [Thu, 7 Apr 2022 10:25:42 +0000 (12:25 +0200)]
platform/cpufreq: add public defines for CPUFREQ_SHARED_TYPE_

The values set in the shared_type field of xen_processor_performance
have so far relied on Xen and Linux having the same
CPUFREQ_SHARED_TYPE_ defines, as those have never been part of the
public interface.

Formalize by adding the defines for the allowed values in the public
header, while renaming them to use the XEN_CPUPERF_SHARED_TYPE_ prefix
for clarity.

Set the Xen internal defines for CPUFREQ_SHARED_TYPE_ using the newly
introduced XEN_CPUPERF_SHARED_TYPE_ public defines in order to avoid
unnecessary code churn.  While there also drop
CPUFREQ_SHARED_TYPE_NONE as it's unused.
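
The aliasing arrangement could look roughly like this. The identifier
prefixes come from the text above; the numeric values are an assumption
(taken to match Linux's CPUFREQ_SHARED_TYPE_* constants, since both sides are
said to have relied on identical defines), and toy_shared_type_valid() is a
hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Public names. The numeric values are an assumption, taken to match
 * Linux's CPUFREQ_SHARED_TYPE_* constants.
 */
#define XEN_CPUPERF_SHARED_TYPE_HW   1 /* hardware does the coordination */
#define XEN_CPUPERF_SHARED_TYPE_ALL  2 /* all dependent CPUs should set freq */
#define XEN_CPUPERF_SHARED_TYPE_ANY  3 /* any dependent CPU may set freq */

/* Internal names become aliases, so existing users need no changes. */
#define CPUFREQ_SHARED_TYPE_HW   XEN_CPUPERF_SHARED_TYPE_HW
#define CPUFREQ_SHARED_TYPE_ALL  XEN_CPUPERF_SHARED_TYPE_ALL
#define CPUFREQ_SHARED_TYPE_ANY  XEN_CPUPERF_SHARED_TYPE_ANY

static_assert(XEN_CPUPERF_SHARED_TYPE_HW != 0, "0 is not an allowed value");

/* Hypothetical helper: validate a shared_type value handed in by the caller. */
static bool toy_shared_type_valid(unsigned int shared_type)
{
    switch ( shared_type )
    {
    case CPUFREQ_SHARED_TYPE_HW:
    case CPUFREQ_SHARED_TYPE_ALL:
    case CPUFREQ_SHARED_TYPE_ANY:
        return true;
    }

    return false;
}
```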

Fixes: 2fa7bee0a0 ('Get ACPI Px from dom0 and choose Px controller')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoEFI: correct indentation in efi_tables()
Jan Beulich [Thu, 7 Apr 2022 06:39:03 +0000 (08:39 +0200)]
EFI: correct indentation in efi_tables()

Eliminate hard tabs. While there also cast to the intended type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agox86/boot: fold two MOVs into an ADD
Jan Beulich [Thu, 7 Apr 2022 06:37:27 +0000 (08:37 +0200)]
x86/boot: fold two MOVs into an ADD

There's no point going through %ax; the addition can be done directly in
%di.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/boot: fold/replace moves in video handling code
Jan Beulich [Thu, 7 Apr 2022 06:36:02 +0000 (08:36 +0200)]
x86/boot: fold/replace moves in video handling code

Replace (mainly) MOV forms with shorter insns (or sequences thereof).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/boot: fold branches in video handling code
Jan Beulich [Thu, 7 Apr 2022 06:34:58 +0000 (08:34 +0200)]
x86/boot: fold branches in video handling code

Using Jcc to branch around a JMP is necessary only in pre-386 code,
where Jcc is limited to disp8. Use the opposite Jcc directly in two
places. Since it's adjacent, also convert an ORB to TESTB.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/boot: simplify mode_table
Jan Beulich [Thu, 7 Apr 2022 06:34:07 +0000 (08:34 +0200)]
x86/boot: simplify mode_table

There's no point in writing 80x25 text mode information via multiple
insns all storing immediate values. The data can simply be included
first thing in the vga_modes table, allowing the already present
REP MOVSB to take care of everything in one go.

While touching this also correct a related but stale comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/EFI: retrieve EDID
Jan Beulich [Thu, 7 Apr 2022 06:33:09 +0000 (08:33 +0200)]
x86/EFI: retrieve EDID

When booting directly from EFI, obtaining this information from EFI is
the only possible way. And even when booting with a boot loader
interposed, it's cleaner not to use legacy BIOS calls for this
purpose. (The downside being that there are no "capabilities" that we
can retrieve the EFI way.)

To achieve this we need to propagate the handle used to obtain the
EFI_GRAPHICS_OUTPUT_PROTOCOL instance for further obtaining an
EFI_EDID_*_PROTOCOL instance, which has been part of the spec since 2.5.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> # Arm, common
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm
3 years agox86/mm: move guest_physmap_{add,remove}_page()
Jan Beulich [Thu, 7 Apr 2022 06:30:36 +0000 (08:30 +0200)]
x86/mm: move guest_physmap_{add,remove}_page()

... to a new file, separating the functions from their HVM-specific
backing ones, themselves only dealing with the non-translated case.

To avoid having a new CONFIG_HVM conditional in there, do away with
the inline placeholder.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agox86/P2M: introduce p2m_{add,remove}_page()
Jan Beulich [Thu, 7 Apr 2022 06:29:33 +0000 (08:29 +0200)]
x86/P2M: introduce p2m_{add,remove}_page()

Rename guest_physmap_add_entry() to p2m_add_page(); make
guest_physmap_remove_page() a trivial wrapper around p2m_remove_page().
This way callers can use suitable pairs of functions (previously
violated by hvm/grant_table.c).

In HVM-specific code further avoid going through the guest_physmap_*()
layer, and instead use the two new/renamed functions directly.

Ultimately the goal is to have guest_physmap_...() functions cover all
types of guests, but p2m_...() dealing only with translated ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agox86/P2M: rename p2m_remove_page()
Jan Beulich [Thu, 7 Apr 2022 06:28:38 +0000 (08:28 +0200)]
x86/P2M: rename p2m_remove_page()

This is in preparation for re-using the original name.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
3 years agoIOMMU/x86: use per-device page tables for quarantining
Jan Beulich [Tue, 5 Apr 2022 12:24:18 +0000 (14:24 +0200)]
IOMMU/x86: use per-device page tables for quarantining

Devices with RMRRs / unity mapped regions, due to it being unspecified
how/when these memory regions may be accessed, may not be left
disconnected from the mappings of these regions (as long as it's not
certain that the device has been fully quiesced). Hence even the page
tables used when quarantining such devices need to have mappings of
those regions. This implies installing page tables in the first place
even when not in scratch-page quarantining mode.

This is CVE-2022-26361 / part of XSA-400.

While for the purpose here it would be sufficient to have devices with
RMRRs / unity mapped regions use per-device page tables, extend this to
all devices (in scratch-page quarantining mode). This allows the leaf
pages to be mapped r/w, thus covering also memory writes (rather than
just reads) issued by non-quiescent devices.

Set up quarantine page tables as late as possible, yet early enough to
not encounter failure during de-assign. This means setup generally
happens in assign_device(), while (for now) the one in deassign_device()
is there mainly to be on the safe side.

As to the removal of QUARANTINE_SKIP() from domain_context_unmap_one():
I think this was never really needed there, as the function explicitly
deals with finding a non-present context entry. Leaving it there would
require propagating pgd_maddr into the function (like was done by "VT-d:
prepare for per-device quarantine page tables" for
domain_context_mapping_one()).

In VT-d's DID allocation function don't require the IOMMU lock to be
held anymore: All involved code paths hold pcidevs_lock, so this way we
avoid the need to acquire the IOMMU lock around the new call to
context_set_domain_id().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoAMD/IOMMU: abstract maximum number of page table levels
Jan Beulich [Tue, 5 Apr 2022 12:20:04 +0000 (14:20 +0200)]
AMD/IOMMU: abstract maximum number of page table levels

We will want to use the constant elsewhere.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoIOMMU/x86: drop TLB flushes from quarantine_init() hooks
Jan Beulich [Tue, 5 Apr 2022 12:19:42 +0000 (14:19 +0200)]
IOMMU/x86: drop TLB flushes from quarantine_init() hooks

The page tables just created aren't hooked up yet anywhere, so there's
nothing that could be present in any TLB, and hence nothing to flush.
Dropping this flush is, at least on the VT-d side, a prereq to per-
device domain ID use when quarantining devices, as dom_io isn't going
to be assigned a DID anymore: The warning in get_iommu_did() would
trigger.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoIOMMU/x86: maintain a per-device pseudo domain ID
Jan Beulich [Tue, 5 Apr 2022 12:19:10 +0000 (14:19 +0200)]
IOMMU/x86: maintain a per-device pseudo domain ID

In order to subsequently enable per-device quarantine page tables, we'll
need domain-ID-like identifiers to be inserted in the respective device
(AMD) or context (Intel) table entries alongside the per-device page
table root addresses.

Make use of "real" domain IDs occupying only half of the value range
coverable by domid_t.
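
A minimal sketch of how the unused upper half of a 16-bit ID space can host
per-device pseudo IDs. The pool size, the toy_* names and the linear search
are assumptions for illustration, not the allocator this change introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t domid_t;

#define TOY_DOMID_MASK    0x7fffu   /* "real" domain IDs fit in the low 15 bits */
#define TOY_DOMID_INVALID 0x7ff4u   /* failure indicator used by this sketch */
#define TOY_NR_PSEUDO     64u       /* pool size, chosen arbitrarily */

struct toy_pool {
    bool used[TOY_NR_PSEUDO];
};

/* Pseudo IDs are exactly those with the top bit set. */
static bool toy_is_pseudo(domid_t id)
{
    return id > TOY_DOMID_MASK;
}

static domid_t toy_alloc_pseudo(struct toy_pool *p)
{
    assert(p);

    for ( unsigned int i = 0; i < TOY_NR_PSEUDO; i++ )
        if ( !p->used[i] )
        {
            p->used[i] = true;
            return (domid_t)(i | (TOY_DOMID_MASK + 1u));
        }

    return TOY_DOMID_INVALID;   /* pool exhausted */
}

static void toy_free_pseudo(struct toy_pool *p, domid_t id)
{
    unsigned int idx = id & TOY_DOMID_MASK;

    assert(p && toy_is_pseudo(id));
    assert(idx < TOY_NR_PSEUDO && p->used[idx]);

    p->used[idx] = false;
}
```

Because real and pseudo IDs occupy disjoint halves of the range, a single
comparison tells them apart.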

Note that in VT-d's iommu_alloc() I didn't want to introduce new memory
leaks in case of error, but existing ones don't get plugged - that'll be
the subject of a later change.

The VT-d changes are slightly asymmetric, but this way we can avoid
assigning pseudo domain IDs to devices which would never be mapped while
still avoiding the addition of a new parameter to domain_context_unmap().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoVT-d: prepare for per-device quarantine page tables (part II)
Jan Beulich [Tue, 5 Apr 2022 12:18:48 +0000 (14:18 +0200)]
VT-d: prepare for per-device quarantine page tables (part II)

Replace the passing of struct domain * by domid_t in preparation of
per-device quarantine page tables also requiring per-device pseudo
domain IDs, which aren't going to be associated with any struct domain
instances.

No functional change intended (except for slightly adjusted log message
text).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoVT-d: prepare for per-device quarantine page tables (part I)
Jan Beulich [Tue, 5 Apr 2022 12:18:26 +0000 (14:18 +0200)]
VT-d: prepare for per-device quarantine page tables (part I)

Arrange for domain ID and page table root to be passed around, the latter in
particular to domain_pgd_maddr() such that taking it from the per-domain
fields can be overridden.

No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoAMD/IOMMU: re-assign devices directly
Jan Beulich [Tue, 5 Apr 2022 12:18:04 +0000 (14:18 +0200)]
AMD/IOMMU: re-assign devices directly

Devices with unity map ranges, due to it being unspecified how/when
these memory ranges may get accessed, may not be left disconnected from
their unity mappings (as long as it's not certain that the device has
been fully quiesced). Hence rather than tearing down the old root page
table pointer and then establishing the new one, re-assignment needs to
be done in a single step.

This is CVE-2022-26360 / part of XSA-400.

Similarly, quarantining scratch-page mode relies on page tables to be
continuously wired up.

To avoid complicating things more than necessary, treat all devices
mostly equally, i.e. regardless of their association with any unity map
ranges.  The main difference is when it comes to updating DTEs, which need
to be atomic when there are unity mappings. Yet atomicity can only be
achieved with CMPXCHG16B, availability of which we can't take for granted.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoVT-d: re-assign devices directly
Jan Beulich [Tue, 5 Apr 2022 12:17:42 +0000 (14:17 +0200)]
VT-d: re-assign devices directly

Devices with RMRRs, due to it being unspecified how/when the specified
memory regions may get accessed, may not be left disconnected from their
respective mappings (as long as it's not certain that the device has
been fully quiesced). Hence rather than unmapping the old context and
then mapping the new one, re-assignment needs to be done in a single
step.

This is CVE-2022-26359 / part of XSA-400.

Similarly, quarantining scratch-page mode relies on page tables to be
continuously wired up.

To avoid complicating things more than necessary, treat all devices
mostly equally, i.e. regardless of their association with any RMRRs. The
main difference is when it comes to updating context entries, which need
to be atomic when there are RMRRs. Yet atomicity can only be achieved
with CMPXCHG16B, availability of which we can't take for granted.

The seemingly complicated choice of non-negative return values for
domain_context_mapping_one() is to limit code churn: This way callers
passing NULL for pdev don't need fiddling with.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoVT-d: drop ownership checking from domain_context_mapping_one()
Jan Beulich [Tue, 5 Apr 2022 12:17:21 +0000 (14:17 +0200)]
VT-d: drop ownership checking from domain_context_mapping_one()

Despite putting in quite a bit of effort it was not possible to
establish why exactly this code exists (beyond possibly sanity
checking). Instead of a subsequent change further complicating this
logic, simply get rid of it.

Take the opportunity and move the respective unmap_vtd_domain_page() out
of the locked region.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoIOMMU/x86: tighten iommu_alloc_pgtable()'s parameter
Jan Beulich [Tue, 5 Apr 2022 12:16:46 +0000 (14:16 +0200)]
IOMMU/x86: tighten iommu_alloc_pgtable()'s parameter

This is to make it more obvious that nothing outside of domain_iommu(d)
actually changes or is otherwise needed by the function.

No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoVT-d: fix add/remove ordering when RMRRs are in use
Jan Beulich [Tue, 5 Apr 2022 12:16:10 +0000 (14:16 +0200)]
VT-d: fix add/remove ordering when RMRRs are in use

In the event that the RMRR mappings are essential for device operation,
they should be established before updating the device's context entry,
while they should be torn down only after the device's context entry was
successfully cleared.
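
The intended ordering can be modelled as below. This is a toy state machine
under stated assumptions (the structure, the failure injection and the
invariant tracking are hypothetical); it merely checks that the context
entry is never live without the RMRR mappings being in place.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_dev {
    bool rmrr_mapped;       /* RMRR mappings established */
    bool ctx_present;       /* context entry points at the page tables */
    bool fail_ctx_update;   /* inject a context entry update failure */
    bool violated;          /* set if the entry was ever live without mappings */
};

static void toy_check(struct toy_dev *d)
{
    if ( d->ctx_present && !d->rmrr_mapped )
        d->violated = true;
}

static int toy_add(struct toy_dev *d)
{
    assert(d);

    d->rmrr_mapped = true;        /* 1: establish the mappings */
    toy_check(d);

    if ( d->fail_ctx_update )
    {
        d->rmrr_mapped = false;   /* entry never went live, safe to undo */
        return -1;
    }

    d->ctx_present = true;        /* 2: only now update the context entry */
    toy_check(d);

    return 0;
}

static int toy_remove(struct toy_dev *d)
{
    assert(d);

    if ( d->fail_ctx_update )
        return -1;                /* entry still live: keep the mappings */

    d->ctx_present = false;       /* 1: clear the context entry */
    toy_check(d);

    d->rmrr_mapped = false;       /* 2: tear down only after success */
    toy_check(d);

    return 0;
}
```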

Also switch to %pd in related log messages.

Fixes: fa88cfadf918 ("vt-d: Map RMRR in intel_iommu_add_device() if the device has RMRR")
Fixes: 8b99f4400b69 ("VT-d: fix RMRR related error handling")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoVT-d: fix (de)assign ordering when RMRRs are in use
Jan Beulich [Tue, 5 Apr 2022 12:15:33 +0000 (14:15 +0200)]
VT-d: fix (de)assign ordering when RMRRs are in use

In the event that the RMRR mappings are essential for device operation,
they should be established before updating the device's context entry,
while they should be torn down only after the device's context entry was
successfully updated.

Also adjust a related log message.

This is CVE-2022-26358 / part of XSA-400.

Fixes: 8b99f4400b69 ("VT-d: fix RMRR related error handling")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agoVT-d: correct ordering of operations in cleanup_domid_map()
Jan Beulich [Tue, 5 Apr 2022 12:12:27 +0000 (14:12 +0200)]
VT-d: correct ordering of operations in cleanup_domid_map()

The function may be called without any locks held (leaving aside the
domctl one, which we surely don't want to depend on here), so needs to
play safe wrt other accesses to domid_map[] and domid_bitmap[]. This is
to avoid context_set_domain_id()'s writing of domid_map[] to be reset to
zero right away in the case of it racing the freeing of a DID.

For the interaction with context_set_domain_id() and did_to_domain_id()
see the code comment.
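
One way to picture the ordering concern, using C11 atomics in a self-contained
toy (the structure, the per-DID flag array in place of a real bitmap, and all
toy_* names are assumptions): when freeing a DID, the map slot is vacated
before the DID is marked free. If the order were reversed, a concurrent
allocator could claim the DID and fill in the slot, only to have the freeing
path overwrite it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_DIDS       8u
#define TOY_DOMID_INVALID 0x7ff4u   /* "slot vacant" marker in this sketch */

struct toy_iommu {
    _Atomic uint16_t domid_map[TOY_NR_DIDS];
    atomic_bool      domid_bitmap[TOY_NR_DIDS];   /* one flag per DID */
};

static void toy_init(struct toy_iommu *i)
{
    assert(i);

    for ( unsigned int n = 0; n < TOY_NR_DIDS; n++ )
    {
        atomic_init(&i->domid_map[n], TOY_DOMID_INVALID);
        atomic_init(&i->domid_bitmap[n], false);
    }
}

/* Claim a free DID first, then record which domain it belongs to. */
static int toy_alloc_did(struct toy_iommu *i, uint16_t domid)
{
    assert(i);

    for ( unsigned int n = 0; n < TOY_NR_DIDS; n++ )
    {
        bool expected = false;

        if ( atomic_compare_exchange_strong(&i->domid_bitmap[n],
                                            &expected, true) )
        {
            atomic_store(&i->domid_map[n], domid);
            return (int)n;
        }
    }

    return -1;
}

static void toy_free_did(struct toy_iommu *i, unsigned int did)
{
    assert(i && did < TOY_NR_DIDS);

    /* Vacate the map slot first ... */
    atomic_store(&i->domid_map[did], TOY_DOMID_INVALID);
    /* ... and only then make the DID allocatable again. */
    atomic_store(&i->domid_bitmap[did], false);
}
```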

{check_,}cleanup_domid_map() are called with pcidevs_lock held or during
domain cleanup only (and pcidevs_lock is also held around
context_set_domain_id()), i.e. racing calls with the same (dom, iommu)
tuple cannot occur.

domain_iommu_domid(), besides its use by cleanup_domid_map(), has its
result used only to control flushing, and hence a stale result would
only lead to a stray extra flush.

This is CVE-2022-26357 / XSA-399.

Fixes: b9c20c78789f ("VT-d: per-iommu domain-id")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/hap: do not switch on log dirty for VRAM tracking
Roger Pau Monne [Wed, 23 Feb 2022 08:40:40 +0000 (09:40 +0100)]
x86/hap: do not switch on log dirty for VRAM tracking

XEN_DMOP_track_dirty_vram possibly calls into paging_log_dirty_enable
when using HAP mode, and it can interact badly with other ongoing
paging domctls, as XEN_DMOP_track_dirty_vram is not holding the domctl
lock.

This was detected as a result of the following assert triggering when
doing repeated migrations of a HAP HVM domain with a stubdom:

Assertion 'd->arch.paging.log_dirty.allocs == 0' failed at paging.c:198
----[ Xen-4.17-unstable  x86_64  debug=y  Not tainted ]----
CPU:    34
RIP:    e008:[<ffff82d040314b3b>] arch/x86/mm/paging.c#paging_free_log_dirty_bitmap+0x606/0x6
RFLAGS: 0000000000010206   CONTEXT: hypervisor (d0v23)
[...]
Xen call trace:
   [<ffff82d040314b3b>] R arch/x86/mm/paging.c#paging_free_log_dirty_bitmap+0x606/0x63a
   [<ffff82d040279f96>] S xsm/flask/hooks.c#domain_has_perm+0x5a/0x67
   [<ffff82d04031577f>] F paging_domctl+0x251/0xd41
   [<ffff82d04031640c>] F paging_domctl_continuation+0x19d/0x202
   [<ffff82d0403202fa>] F pv_hypercall+0x150/0x2a7
   [<ffff82d0403a729d>] F lstar_enter+0x12d/0x140

The assert triggered because the stubdom used
XEN_DMOP_track_dirty_vram while dom0 was in the middle of executing
XEN_DOMCTL_SHADOW_OP_OFF, and so log dirty became enabled while
retiring the old structures, thus leading to new entries being
populated in already clear slots.

Fix this by not enabling log dirty for VRAM tracking, similar to what
is done when using shadow instead of HAP. Call
p2m_enable_hardware_log_dirty when enabling VRAM tracking in order to
get some hardware assistance if available. As a side effect the memory
pressure on the p2m pool should go down if only VRAM tracking is
enabled, as the dirty bitmap is no longer allocated.

Note that paging_log_dirty_range (used to get the dirty bitmap for
VRAM tracking) doesn't use the log dirty bitmap, and instead relies on
checking whether each gfn on the range has been switched from
p2m_ram_logdirty to p2m_ram_rw in order to account for dirty pages.
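
The range scan described above can be pictured with a toy model in
Python. This is only a sketch, not Xen's implementation: the dictionary
standing in for the p2m, the P2MType enum and log_dirty_range() are
hypothetical names, and re-arming a page by switching it back to the
log-dirty type is an assumption about how tracking continues.

```python
from enum import Enum


class P2MType(Enum):
    # Only the two types named in the commit message are modelled.
    RAM_RW = "p2m_ram_rw"
    RAM_LOGDIRTY = "p2m_ram_logdirty"


def log_dirty_range(p2m, begin_gfn, nr):
    """Report which pages in [begin_gfn, begin_gfn + nr) were written.

    A page counts as dirty when its type is no longer RAM_LOGDIRTY,
    i.e. a write flipped it to RAM_RW. No separate bitmap is consulted.
    """
    dirty = []
    for gfn in range(begin_gfn, begin_gfn + nr):
        was_written = p2m.get(gfn) is P2MType.RAM_RW
        dirty.append(was_written)
        if was_written:
            # Assumption: re-arm tracking so the next scan starts clean.
            p2m[gfn] = P2MType.RAM_LOGDIRTY
    return dirty
```

In this model the only state is the per-page type, which is why no
log dirty bitmap needs to be allocated for VRAM tracking alone.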

This is CVE-2022-26356 / XSA-397.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/time: use fake read_tsc()
Jan Beulich [Tue, 5 Apr 2022 09:40:58 +0000 (11:40 +0200)]
x86/time: use fake read_tsc()

Go a step further than bed9ae54df44 ("x86/time: switch platform timer
hooks to altcall") did and eliminate the "real" read_tsc() altogether:
It's not used except in pointer comparisons, and hence it looks safer
overall to simply poison plt_tsc's read_counter hook.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago include: move STR() and IS_ALIGNED()
Jan Beulich [Tue, 5 Apr 2022 09:39:12 +0000 (11:39 +0200)]
include: move STR() and IS_ALIGNED()

lib.h is imo a better fit for them than config.h.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago x86/APIC: make connections between seemingly arbitrary numbers
Jan Beulich [Tue, 5 Apr 2022 09:38:04 +0000 (11:38 +0200)]
x86/APIC: make connections between seemingly arbitrary numbers

Making adjustments to arbitrarily chosen values shouldn't require
auditing the code for possible derived numbers - such a change should
be doable in a single place, having an effect on all code depending on
that choice.

For one, make the TDCR write actually use APIC_DIVISOR. With the
necessary mask constant introduced, also use that in vLAPIC code. While
introducing the constant, drop APIC_TDR_DIV_TMBASE: The bit has been
undefined in halfway recent SDM and PM versions.

And then introduce a constant tying together the scale used when
converting nanoseconds to bus clocks.
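
The idea of deriving the register value from a single constant can be
sketched as follows. This is a toy model in Python, not the patch
itself: tdcr_encoding() and TDCR_DIV_MASK are hypothetical names, and
the APIC_DIVISOR value used here is just an example. The only hardware
fact relied upon is that the divide value is encoded in bits 0, 1 and 3
of the divide configuration register.

```python
APIC_DIVISOR = 16     # example value only; change it in this one place
TDCR_DIV_MASK = 0xB   # bits 0, 1 and 3 hold the divide value; bit 2 stays clear


def tdcr_encoding(divisor):
    """Derive the divide configuration encoding from a plain divisor."""
    if divisor not in (1, 2, 4, 8, 16, 32, 64, 128):
        raise ValueError("divisor must be a power of two between 1 and 128")
    # log2(divisor) - 1, wrapping to 7 for divide-by-1.
    step = (divisor.bit_length() - 2) & 0x7
    # The low two bits stay in place, the third bit moves up to bit 3.
    return (step & 0x3) | ((step & 0x4) << 1)


# Every dependent value follows from APIC_DIVISOR automatically.
TDCR_VALUE = tdcr_encoding(APIC_DIVISOR)
```

With this shape, picking a different divisor needs no audit of the
code for hand-computed register values.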

No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago x86/APIC: calibrate against platform timer when possible
Jan Beulich [Tue, 5 Apr 2022 09:36:32 +0000 (11:36 +0200)]
x86/APIC: calibrate against platform timer when possible

Use the original calibration against PIT only when the platform timer
is PIT. This implicitly excludes the "xen_guest" case from using the PIT
logic (init_pit() fails there, and as of 5e73b2594c54 ["x86/time: minor
adjustments to init_pit()"] using_pit also isn't being set too early
anymore), so the respective hack there can be dropped at the same time.
This also reduces calibration time from 100ms to 50ms, albeit this step
is being skipped as of 0731a56c7c72 ("x86/APIC: no need for timer
calibration when using TDT") anyway.

While re-indenting the PIT logic in calibrate_APIC_clock(), besides
adjusting style also switch around the 2nd TSC/TMCCT read pair, to match
the order of the 1st one, yielding more consistent deltas.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago build: grab common EFI source files in arch specific dir
Anthony PERARD [Tue, 5 Apr 2022 09:33:29 +0000 (11:33 +0200)]
build: grab common EFI source files in arch specific dir

Rather than preparing the efi source file, we will make the symbolic
link as needed from the build location.

The `ln` command is run every time so that the link gets updated in
case the source tree changes location.
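
The effect of re-running the link step on every build can be
illustrated with a small Python sketch. It is not the Makefile rule
from this patch: refresh_link() is a hypothetical helper that mimics
what `ln -sf` achieves, using only standard library calls.

```python
import os


def refresh_link(source, link_path):
    """Point link_path at source, replacing any stale link in one step."""
    tmp = link_path + ".tmp"
    if os.path.lexists(tmp):
        os.unlink(tmp)
    os.symlink(source, tmp)
    # Renaming over the old link swaps the target without a gap in which
    # the link does not exist.
    os.replace(tmp, link_path)
```

Because the link is recreated unconditionally, a build directory keeps
working after the source tree has been moved.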

This patch also introduces "efi-common.mk", which allows reusing the
common make instructions without having to duplicate them into each
arch.

And now that we have a list of common source files, we can start to
remove the links to the source files on clean.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago tools/firmware: do not add a .note.gnu.property section
Roger Pau Monne [Mon, 4 Apr 2022 10:40:44 +0000 (12:40 +0200)]
tools/firmware: do not add a .note.gnu.property section

Prevent the assembler from creating a .note.gnu.property section on
the output objects, as it's not useful for firmware related binaries,
and breaks the resulting rombios image.

This requires modifying the cc-option Makefile macro so it can test
assembler options (by replacing the usage of the -S flag with -c) and
also stripping the -Wa, prefix if present when checking for the test
output.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago tools/firmware: fix setting of fcf-protection=none
Roger Pau Monne [Mon, 4 Apr 2022 10:40:43 +0000 (12:40 +0200)]
tools/firmware: fix setting of fcf-protection=none

Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the
Makefile doesn't get it propagated to the subdirectories, so instead
set the flag in firmware/Rules.mk, like it's done for other compiler
flags.

Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago libxl: Re-scope qmp_proxy_spawn.ao usage
Jason Andryuk [Fri, 1 Apr 2022 14:33:10 +0000 (10:33 -0400)]
libxl: Re-scope qmp_proxy_spawn.ao usage

I've observed this failed assertion:
libxl_event.c:2057: libxl__ao_inprogress_gc: Assertion `ao' failed.

AFAICT, this is happening in qmp_proxy_spawn_outcome where
sdss->qmp_proxy_spawn.ao is NULL.

The out label of spawn_stub_launch_dm() calls qmp_proxy_spawn_outcome(),
but it is only in the success path that sdss->qmp_proxy_spawn.ao gets
set to the current ao.

qmp_proxy_spawn_outcome() should instead use sdss->dm.spawn.ao, which is
the already in-use ao when spawn_stub_launch_dm() is called.  The same
is true for spawn_qmp_proxy().

With this, move sdss->qmp_proxy_spawn.ao initialization to
spawn_qmp_proxy() since its use is for libxl__spawn_spawn() and it can
be initialized along with the rest of sdss->qmp_proxy_spawn.

Fixes: 83c845033dc8 ("libxl: use vchan for QMP access with Linux stubdomain")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago libxl: Don't segfault on soft-reset failure
Jason Andryuk [Fri, 1 Apr 2022 14:32:56 +0000 (10:32 -0400)]
libxl: Don't segfault on soft-reset failure

If domain_soft_reset_cb can't rename the save file, it doesn't call
initiate_domain_create() and instead calls domcreate_complete().

Skipping initiate_domain_create() means dcs->console_xswait is
uninitialized and all 0s.

We have:
  domcreate_complete()
    libxl__xswait_stop()
      libxl__ev_xswatch_deregister().

The uninitialized slotnum 0 is considered valid (-1 is the invalid
sentinel), so the NULL path pointer is passed to xs_unwatch(), which
segfaults.

libxl__ev_xswatch_deregister:watch w=0x12bc250 wpath=(null) token=0/0: deregister slotnum=0
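
The failure mode can be modelled in a few lines of Python. This is a
sketch of the sentinel problem only, not libxl code: Watch,
watch_init() and watch_deregister() are hypothetical stand-ins, and the
default field values imitate zero-filled memory.

```python
from dataclasses import dataclass
from typing import Optional

INVALID_SLOT = -1  # the "not registered" sentinel


@dataclass
class Watch:
    # Defaults imitate zero-filled memory: slot 0 and a NULL path.
    slotnum: int = 0
    path: Optional[str] = None


def watch_init(watch):
    """Put the watch into its proper idle state."""
    watch.slotnum = INVALID_SLOT
    watch.path = None


def watch_deregister(watch, unwatch):
    """Call unwatch(path) unless the watch is idle; return whether it ran."""
    if watch.slotnum == INVALID_SLOT:
        return False
    unwatch(watch.path)  # with zero-filled state, path is None here
    watch.slotnum = INVALID_SLOT
    return True
```

A zero-filled Watch looks registered and hands a None path to the
unwatch callback, whereas one that went through watch_init() is skipped.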

Move dcs->console_xswait initialization into the callers of
initiate_domain_create, do_domain_create() and do_domain_soft_reset(),
so it is initialized along with the other dcs state.

Fixes: c57e6ebd8c3e ("(lib)xl: soft reset support")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago libxl: constify libxl__stubdomain_is_linux
Jason Andryuk [Wed, 30 Mar 2022 18:17:22 +0000 (14:17 -0400)]
libxl: constify libxl__stubdomain_is_linux

libxl__stubdomain_is_linux can take a const pointer, so make the change.

This isn't an issue in-tree, but was found with an OpenXT patch where it
was called with only const libxl_domain_build_info available.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago xl: Fix global pci options
Jason Andryuk [Wed, 30 Mar 2022 18:17:41 +0000 (14:17 -0400)]
xl: Fix global pci options

commit babde47a3fed "introduce a 'passthrough' configuration option to
xl.cfg..." moved the pci list parsing ahead of the global pci option
parsing.  This broke the global pci configuration options since they
need to be set first so that looping over the pci devices assigns their
values.

Move the global pci options ahead of the pci list to restore their
function.
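
Why the order matters can be seen in a simplified Python model. It is
not xl's parser: parse_pci() and its globals_first switch are
hypothetical, and only two global options are modelled.

```python
def parse_pci(config, globals_first=True):
    """Build per-device settings from a config dictionary."""
    settings = {"permissive": False, "seize": False}

    def read_globals():
        settings["permissive"] = bool(config.get("pci_permissive", 0))
        settings["seize"] = bool(config.get("pci_seize", 0))

    def read_devices():
        # Each device copies whatever the global settings are right now.
        return [{"bdf": bdf, **settings} for bdf in config.get("pci", [])]

    if globals_first:
        read_globals()
        return read_devices()

    # Broken order: devices are built before the globals are read.
    devices = read_devices()
    read_globals()
    return devices
```

With the globals read first, every device picks them up; with the
order swapped, devices silently keep the built-in defaults.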

Fixes: babde47a3fed ("introduce a 'passthrough' configuration option to xl.cfg...")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
3 years ago livepatch: account for patch offset when applying NOP patch
Jan Beulich [Thu, 31 Mar 2022 08:45:46 +0000 (10:45 +0200)]
livepatch: account for patch offset when applying NOP patch

While not triggered by the trivial xen_nop in-tree patch on
staging/master, that patch exposes a problem on the stable trees, where
all functions have ENDBR inserted. When NOP-ing out a range, we need to
account for this. Handle this right in livepatch_insn_len().

This requires livepatch_insn_len() to be called _after_ ->patch_offset
was set.
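
The length adjustment can be sketched as below. This is a Python model
under stated assumptions, not the code from this patch: insn_len() is a
hypothetical stand-in for livepatch_insn_len(), a missing replacement
address is taken to mean a NOP patch, and the 5-byte jump size is an
assumption about the non-NOP case. ENDBR64 itself is 4 bytes long.

```python
ENDBR64_LEN = 4    # f3 0f 1e fa
JMP_REL32_LEN = 5  # assumed size of the jump written for a normal patch


def insn_len(new_addr, new_size, patch_offset):
    """Number of bytes to overwrite, starting at old_addr + patch_offset."""
    if new_addr is None:
        # NOP patch: the range shrinks by the bytes that are skipped.
        return new_size - patch_offset
    return JMP_REL32_LEN
```

For a 16-byte NOP range on a function starting with ENDBR64, calling
insn_len(None, 16, ENDBR64_LEN) gives 12, while computing the length
before the offset is known would give 16 and overrun the range by 4.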

Fixes: 6974c75180f1 ("xen/x86: Livepatch: support patching CET-enhanced functions")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago livepatch: fix typos
Bjoern Doebel [Thu, 31 Mar 2022 08:45:14 +0000 (10:45 +0200)]
livepatch: fix typos

Fix a couple of typos in livepatch code.

Signed-off-by: Bjoern Doebel <doebel@amazon.de>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: generic top-level rule to build individual files
Jan Beulich [Tue, 29 Mar 2022 13:48:15 +0000 (15:48 +0200)]
build: generic top-level rule to build individual files

In particular when cross-compiling or having in place other tool chain
overrides, invoking make to build individual files (e.g. object,
preprocessed, or assembly ones) so far involves putting the various
overrides on the command line instead of simply getting them from
./.config.

Furthermore, this helps work around an as yet unaddressed make quirk [1]:
Variables put on the command line are invisible to $(shell ...), unless
invoked from a recursive make: During the recursive invocation such
variables are put in the recursive make's environment and hence become
"visible".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
[1] https://savannah.gnu.org/bugs/?10593