]> xenbits.xensource.com Git - people/tklengyel/xen.git/log
people/tklengyel/xen.git
4 years agox86/AMD: expose HWCR.TscFreqSel to guests master 4.15.0-rc3
Jan Beulich [Fri, 12 Mar 2021 11:03:06 +0000 (12:03 +0100)]
x86/AMD: expose HWCR.TscFreqSel to guests

Linux has been warning ("firmware bug") about this bit being clear for a
long time. While writable in older hardware it has been readonly on more
than just most recent hardware. For simplicitly report it always set (if
anything we may want to log the issue ourselves if it turns out to be
clear on older hardware) on CPU families 10h and up (in family 0fh the
bit is part of a larger field of different purpose).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/PV: conditionally avoid raising #GP for early guest MSR reads
Jan Beulich [Fri, 12 Mar 2021 11:02:42 +0000 (12:02 +0100)]
x86/PV: conditionally avoid raising #GP for early guest MSR reads

Prior to 4.15 Linux, when running in PV mode, did not install a #GP
handler early enough to cover for example the rdmsrl_safe() of
MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
of MSR_K7_HWCR later in the same function). The respective change
(42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
backported to 4.14, but no further - presumably since it wasn't really
easy because of other dependencies.

Therefore, to prevent our change in the handling of guest MSR accesses
to render PV Linux 4.13 and older unusable on at least AMD systems, make
the raising of #GP on this paths conditional upon the guest having
installed a handler, provided of course the MSR can be read in the first
place (we would have raised #GP in that case even before). Producing
zero for reads isn't necessarily correct and may trip code trying to
detect presence of MSRs early, but since such detection logic won't work
without a #GP handler anyway, this ought to be a fair workaround.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agognttab: work around "may be used uninitialized" warning
Jan Beulich [Fri, 12 Mar 2021 16:35:54 +0000 (17:35 +0100)]
gnttab: work around "may be used uninitialized" warning

Sadly I was wrong to suggest dropping vaddrs' initializer during review
of v2 of the patch introducing this code. gcc 4.3 can't cope.

Fixes: 52531c734ea1 ("xen/gnttab: Rework resource acquisition")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen: fix for_each_cpu when NR_CPUS=1
Dario Faggioli [Fri, 12 Mar 2021 16:02:47 +0000 (17:02 +0100)]
xen: fix for_each_cpu when NR_CPUS=1

When running an hypervisor build with NR_CPUS=1 for_each_cpu does not
take into account whether the bit of the CPU is set or not in the
provided mask.

This means that whatever we have in the bodies of these loops is always
done once, even if the mask was empty and it should never be done. This
is clearly a bug and was in fact causing an assert to trigger in credit2
code.

Removing the special casing of NR_CPUS == 1 makes things work again.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agovtd: make sure QI/IR are disabled before initialisation
Igor Druzhinin [Fri, 12 Mar 2021 16:01:52 +0000 (17:01 +0100)]
vtd: make sure QI/IR are disabled before initialisation

BIOS might pass control to Xen leaving QI and/or IR in enabled and/or
partially configured state. In case of x2APIC code path where EIM is
enabled early in boot - those are correctly disabled by Xen before any
attempt to configure. But for xAPIC that step is missing which was
proven to cause QI initialization failures on some ICX based platforms
where QI is left pre-enabled and partially configured by BIOS. That
problem becomes hard to avoid since those platforms are shipped with
x2APIC opt out being advertised by default at the same time by firmware.

Unify the behaviour between x2APIC and xAPIC code paths keeping that in
line with what Linux does.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/msr: introduce an option for compatible MSR behavior selection
Roger Pau Monné [Fri, 12 Mar 2021 07:59:56 +0000 (08:59 +0100)]
x86/msr: introduce an option for compatible MSR behavior selection

Introduce an option to allow selecting a behavior similar to the pre
Xen 4.15 one for accesses to MSRs not explicitly handled. Since commit
84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly
handled by Xen result in the injection of a #GP to the guest. This
is a behavior change since previously a #GP was only injected if
accessing the MSR on the real hardware would also trigger a #GP, or if
the attempted to be set bits wouldn't match the hardware values (for
PV). The reasons for not leaking hardware MSR values and injecting a
#GP are fully valid, so the solution proposed here should be
considered a temporary workaround until all the required MSRs are
properly handled.

This seems to be problematic for some guests, so introduce an option
to fallback to this kind of legacy behavior without leaking the
underlying MSR values to the guest.

When the option is set, for both PV and HVM don't inject a #GP to the
guest on MSR read if reading the underlying MSR doesn't result in a
#GP, do the same for writes and simply discard the value to be written
on that case.

Note that for guests restored or migrated from previous Xen versions
the option is enabled by default, in order to keep a compatible
MSR behavior. Such compatibility is done at the libxl layer, to avoid
higher-level toolstacks from having to know the details about this flag.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libs: Fix headers.chk logic
Andrew Cooper [Thu, 4 Mar 2021 22:30:00 +0000 (22:30 +0000)]
tools/libs: Fix headers.chk logic

c/s 4664034cd dropped the $(LIBHEADERSGLOB) dependency for the headers.chk
rule, without replacing it.

As headers.chk uses $^, a typical build looks like:

  andrewcoop@andrewcoop:/local/xen.git$ make -C tools/libs/devicemodel/
  make: Entering directory '/local/xen.git/tools/libs/devicemodel'
  for i in ; do \
      gcc -x c -ansi -Wall -Werror -I/local/xen.git/tools/libs/devicemodel/../../../tools/include \
            -S -o /dev/null $i || exit 1; \
      echo $i; \
  done >headers.chk.new
  mv headers.chk.new headers.chk

i.e. with an empty for loop.

Reinsert a $(LIBHEADERS) dependency, so more than just the $(AUTOINCS) get
checked.

Fixes: 4664034cd ("tools/libs: move official headers to common directory")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/dmop: Strip __XEN_TOOLS__ header guard from public ABI
Andrew Cooper [Thu, 4 Mar 2021 22:30:00 +0000 (22:30 +0000)]
xen/dmop: Strip __XEN_TOOLS__ header guard from public ABI

__XEN_TOOLS__ is really there to separate the unstable from stable hypercalls.
Exactly as with c/s f40e1c52e4, stable interfaces shouldn't contain this
guard.

That change actually broke the build with:

  include/xendevicemodel.h:52:5: error: unknown type name 'ioservid_t'
       ioservid_t *id);
       ^

as libxendevicemodel.h now uses a type it can't see a typedef for.  However,
nothing noticed because the header.chk logic is also broken (fixed
subsequently).

Strip the guard from the public header, and remove compensation from
devicemodel's private.h.  Fix the dmop design doc to discuss both reasons
behind the the ABI design.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libxentoolcore: Fill in LIBHEADERS
Andrew Cooper [Mon, 1 Mar 2021 17:30:00 +0000 (17:30 +0000)]
tools/libxentoolcore: Fill in LIBHEADERS

c/s 4664034cd replaced a glob over include/*.h with an expectation that
LIBHEADER was suitably set for libraries which didn't have a single,
consistently named, header file.

This wasn't true for xentoolcore, which lost xentoolcore_internal.h as a
consequence, and failed an API/ABI check vs 4.14

Fixes: 4664034cd ("tools/libs: move official headers to common directory")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoautomation: allow doing hypervisor only builds
Roger Pau Monne [Wed, 3 Mar 2021 14:33:16 +0000 (15:33 +0100)]
automation: allow doing hypervisor only builds

For things like randconfig there's no need to do a full Xen build, a
hypervisor build only will be much faster and will achieve the same
level of testing, as randconfig only changes the hypervisor build
options.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
4 years agoxen/public: replace typeof() with __typeof__()
Elliott Mitchell [Mon, 8 Mar 2021 13:36:18 +0000 (05:36 -0800)]
xen/public: replace typeof() with __typeof__()

typeof() is available in Xen's build environment, which uses Xen's
compiler.  As these headers are public, they need strict standards
conformance.  Only __typeof__() is officially standardized.

A compiler in standards conformance mode should report:

warning: implicit declaration of function 'typeof' is invalid in C99
[-Wimplicit-function-declaration]

(this has been observed with FreeBSD's kernel build environment)

Based-on-patch-by: Julien Grall <julien@xen.org>, Sun Oct 4 20:33:04 2015 +0100
Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstore: Check the format printf for xprintf() and barf{,_perror}()
Julien Grall [Fri, 5 Mar 2021 12:40:03 +0000 (12:40 +0000)]
tools/xenstore: Check the format printf for xprintf() and barf{,_perror}()

Allow GCC to analyze the format printf for xprintf() and
barf{,_perror}().

Take the opportunity to define __noreturn to make the prototype for
barf{,_perror})() easier to read.

Also document why 'extern' is used for xprintf().

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstore: Consolidate PRINTF_ATTRIBUTE() in utils.h
Julien Grall [Fri, 5 Mar 2021 12:40:02 +0000 (12:40 +0000)]
tools/xenstore: Consolidate PRINTF_ATTRIBUTE() in utils.h

At the moment PRINTF_ATTRIBUTE() is defined in two places:
    - tdb.h: Defined as a NOP
    - talloc.h: Defined as a NOP for GCC older than 3.0 otherwise will
    add the attribute to check the printf format

Xen requires to build with minimum GCC 4.1 and we want to check the
printf format for all the printf-like functions.

Only implement PRINTF_ATTRIBUTE() once in utils.h and drop the
conditional check for GCC < 3.0.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoCHANGELOG.md: Glean some information from SUPPORT.md
Ian Jackson [Tue, 9 Mar 2021 14:39:21 +0000 (14:39 +0000)]
CHANGELOG.md: Glean some information from SUPPORT.md

Signed-off-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoCHANGELOG.md: Add sections for 4.14 (belated) and 4.15 (prospective)
Ian Jackson [Wed, 10 Mar 2021 15:53:31 +0000 (15:53 +0000)]
CHANGELOG.md: Add sections for 4.14 (belated) and 4.15 (prospective)

And update the release technician checklist to mention to edit it.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoMAINTAINERS: Make myself the owner of the changelog
Ian Jackson [Tue, 9 Mar 2021 14:47:58 +0000 (14:47 +0000)]
MAINTAINERS: Make myself the owner of the changelog

Signed-off-by: Ian Jackson <iwj@xenproject.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoautomation: enable rombios build on Alpine
Roger Pau Monne [Fri, 26 Feb 2021 08:59:08 +0000 (09:59 +0100)]
automation: enable rombios build on Alpine

It's now safe to enable the build of rombios on Alpine systems, as
hvmloader already builds fine there.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agolibxl/ACPI: add missing build dependency
Jan Beulich [Tue, 9 Mar 2021 17:04:07 +0000 (18:04 +0100)]
libxl/ACPI: add missing build dependency

Just like all other object files - wherever *.o is mentioned, *.opic
also needs mentioning to yield consistent behavior. Otherwise make may
decide to (re)build the object before recursion into $(ACPI_PATH)/ (to
update $(DSDT_FILES-y) and ssdt_*.h) was actually finished.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoautomation: Fix the Alpine clang builds to use clang
Andrew Cooper [Fri, 26 Feb 2021 11:02:33 +0000 (11:02 +0000)]
automation: Fix the Alpine clang builds to use clang

Looks like a copy&paste error.

Fixes: f6e1d8515d7 ("automation: add alpine linux x86 build jobs")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agoautomation/alpine: add g++ to the list of build depends
Roger Pau Monne [Mon, 1 Mar 2021 09:57:14 +0000 (10:57 +0100)]
automation/alpine: add g++ to the list of build depends

clang++ relies on the C++ headers installed by g++, or else a clang
build will hit the following error:

<built-in>:3:10: fatal error: 'cstring' file not found
#include "cstring"
         ^~~~~~~~~
1 error generated.
make[10]: *** [Makefile:120: headers++.chk] Error 1

Link: https://gitlab.alpinelinux.org/alpine/aports/-/issues/12492
Reported-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/shadow: suppress "fast fault path" optimization when running virtualized
Jan Beulich [Mon, 8 Mar 2021 09:41:50 +0000 (10:41 +0100)]
x86/shadow: suppress "fast fault path" optimization when running virtualized

We can't make correctness of our own behavior dependent upon a
hypervisor underneath us correctly telling us the true physical address
with hardware uses. Without knowing this, we can't be certain reserved
bit faults can actually be observed. Therefore, besides evaluating the
number of address bits when deciding whether to use the optimization,
also check whether we're running virtualized ourselves. (Note that since
we may get migrated when running virtualized, the number of address bits
may also change.)

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstored: liveupdate: Increase the maximum number of parameters
Julien Grall [Fri, 5 Mar 2021 12:10:29 +0000 (12:10 +0000)]
tools/xenstored: liveupdate: Increase the maximum number of parameters

The longest possible command line for LiveUpdate is:

  liveupdate -s -t <timeout> -F

This is 5 parameters. However, the maximum is currently specified to 4.
This means the some of the parameters will get ignored.

Update the field max_pars to 5 so and admin can specify the timeout and
force at the same time.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/shadow: suppress "fast fault path" optimization without reserved bits
Jan Beulich [Fri, 5 Mar 2021 12:29:28 +0000 (13:29 +0100)]
x86/shadow: suppress "fast fault path" optimization without reserved bits

When none of the physical address bits in PTEs are reserved, we can't
create any 4k (leaf) PTEs which would trigger reserved bit faults. Hence
the present SHOPT_FAST_FAULT_PATH machinery needs to be suppressed in
this case, which is most easily achieved by never creating any magic
entries.

To compensate a little, eliminate sh_write_p2m_entry_post()'s impact on
such hardware.

While at it, also avoid using an MMIO magic entry when that would
truncate the incoming GFN.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libxendevicemodel: Strip __XEN_TOOLS__ header guard
Andrew Cooper [Thu, 4 Mar 2021 12:44:44 +0000 (12:44 +0000)]
tools/libxendevicemodel: Strip __XEN_TOOLS__ header guard

This is inappropriate for the header file of a standalone library with stable
API and ABI.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/dmop: Fix XEN_DMOP_nr_vcpus to actually return data
Andrew Cooper [Thu, 4 Mar 2021 10:36:21 +0000 (10:36 +0000)]
xen/dmop: Fix XEN_DMOP_nr_vcpus to actually return data

The const_op boolean needs clobbering to cause data to be written back to the
caller.

Fixes: c4441ab1f1 ("dmop: Add XEN_DMOP_nr_vcpus")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agofirmware: provide a stand alone set of headers
Roger Pau Monné [Thu, 4 Mar 2021 15:49:00 +0000 (16:49 +0100)]
firmware: provide a stand alone set of headers

The current build of the firmware relies on having 32bit compatible
headers installed in order to build some of the 32bit firmware.
Usually this can be solved by using the -ffreestanding compiler option
which drops the usage of the system headers in favor of a private set
of freestanding headers provided by the compiler itself that are not
tied to libc.

However such option is broken at least in the gcc compiler provided in
Alpine Linux, as the system include path (ie: /usr/include) takes
precedence over the gcc private include path:

#include <...> search starts here:
 /usr/include
 /usr/lib/gcc/x86_64-alpine-linux-musl/10.2.1/include

And the headers in /usr/include are exclusively 64bit.

Since -ffreestanding is currently broken on at least that distro, and
for resilience against future compilers also having the option broken
provide a set of stand alone 32bit headers required for the firmware
build.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agocrypto: adjust rijndaelEncrypt() prototype for gcc11
Jan Beulich [Thu, 4 Mar 2021 15:47:51 +0000 (16:47 +0100)]
crypto: adjust rijndaelEncrypt() prototype for gcc11

The upcoming release complains, not entirely unreasonably:

In file included from rijndael.c:33:
.../xen/include/crypto/rijndael.h:55:53: note: previously declared as 'const unsigned char[]'
   55 | void    rijndaelEncrypt(const unsigned int [], int, const unsigned char [],
      |                                                     ^~~~~~~~~~~~~~~~~~~~~~
rijndael.c:865:8: error: argument 4 of type 'u8[16]' {aka 'unsigned char[16]'} with mismatched bound [-Werror=array-parameter=]
  865 |     u8 ct[16])
      |     ~~~^~~~~~
In file included from rijndael.c:33:
.../xen/include/crypto/rijndael.h:56:13: note: previously declared as 'unsigned char[]'
   56 |             unsigned char []);
      |             ^~~~~~~~~~~~~~~~

Simply declare the correct array dimensions right away. This then allows
compilers to apply checking at call sites, which seems desirable anyway.

For the moment I'm leaving untouched the disagreement between u8/u32
used in the function definition and unsigned {char,int} used in the
declaration, as making this consistent would call for touching further
functions.

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstored: liveupdate: Properly check long transaction
Julien Grall [Wed, 3 Mar 2021 17:05:26 +0000 (17:05 +0000)]
tools/xenstored: liveupdate: Properly check long transaction

As XenStored is single-threaded, conn->ta_start_time will always be
smaller than now. As we substract the latter from the former, it means
a transaction will never be considered long running.

Invert the two operands of the substraction in both lu_reject_reason()
and lu_check_allowed(). In addition to that, the former also needs to
check that conn->ta_start_time is not 0 (i.e the transaction is not
active).

Take the opportunity to document the return condition of
lu_check_allowed().

Fixes: e04e53a5be20 ("tools/xenstore: allow live update only with no transaction active")
Reported-by: Bjoern Doebel <doebel@amazon.de>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstore: Harden xs_domain_is_introduced()
Norbert Manthey [Fri, 26 Feb 2021 14:41:44 +0000 (15:41 +0100)]
tools/xenstore: Harden xs_domain_is_introduced()

The function single_with_domid() may return NULL if something
went wrong (e.g. XenStored returns an error or the connection is
in bad state).

They are unlikely but not impossible, so it would be better to
return an error and allow the caller to handle it gracefully rather
than crashing.

In this case we should treat it as the domain has disappeared (i.e.
return false) as the caller will not likely going to be able to
communicate with XenStored again.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.co.uk>
Reviewed-by: Raphael Ning <raphning@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxenstore: add missing NULL check
Michael Kurth [Fri, 26 Feb 2021 14:41:42 +0000 (15:41 +0100)]
xenstore: add missing NULL check

In case of allocation error, we should not dereference the obtained
NULL pointer.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Michael Kurth <mku@amazon.com>
Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxenstore: handle do_mkdir and do_rm failure
Norbert Manthey [Fri, 26 Feb 2021 14:41:41 +0000 (15:41 +0100)]
xenstore: handle do_mkdir and do_rm failure

In the out of memory case, we might return a NULL pointer when
canonicalizing node names. This NULL pointer is not checked when
creating a directory, or when removing a node. This change handles
the NULL pointer for these two cases.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxenstore: check formats of trace
Norbert Manthey [Fri, 26 Feb 2021 14:41:37 +0000 (15:41 +0100)]
xenstore: check formats of trace

When passing format strings to the trace function, allow gcc to analyze
those and warn on issues.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxenstore: fix print format string
Norbert Manthey [Fri, 26 Feb 2021 14:41:36 +0000 (15:41 +0100)]
xenstore: fix print format string

Use the correct format specifier for unsigned values. Additionally, a
cast was dropped, as the format specifier did not require it anymore.

This was reported by analysis with cppcheck.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxenstore: add missing NULL check
Norbert Manthey [Fri, 26 Feb 2021 14:41:35 +0000 (15:41 +0100)]
xenstore: add missing NULL check

In case of allocation error, we should not dereference the obtained
NULL pointer. Hence, fail early.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/shadow: replace bogus return path in shadow_get_page_from_l1e()
Jan Beulich [Tue, 2 Mar 2021 11:30:30 +0000 (12:30 +0100)]
x86/shadow: replace bogus return path in shadow_get_page_from_l1e()

Prior to be640b1800bb ("x86: make get_page_from_l1e() return a proper
error code") a positive return value did indicate an error. Said commit
failed to adjust this return path, but luckily the only caller has
always been inside a shadow_mode_refcounts() conditional.

Subsequent changes caused 1 to end up at the default (error) label in
the caller's switch() again, but the returning of 1 (== _PAGE_PRESENT)
is still rather confusing here, and a latent risk.

Convert to an ASSERT() instead, just in case any new caller would
appear.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agosched: fix build when NR_CPUS == 1
Jan Beulich [Tue, 2 Mar 2021 11:29:16 +0000 (12:29 +0100)]
sched: fix build when NR_CPUS == 1

In this case the compiler is recognizing that no valid array indexes
remain, and hence e.g. reports:

core.c: In function 'cpu_schedule_up':
core.c:2769:19: error: array subscript 1 is above array bounds
of 'struct vcpu *[1]' [-Werror=array-bounds]
 2769 |     if ( idle_vcpu[cpu] == NULL )
      |          ~~~~~~~~~^~~~~

Reported-by: Connor Davis <connojdavis@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/iommu: x86: Clear the root page-table before freeing the page-tables
Julien Grall [Fri, 26 Feb 2021 10:56:40 +0000 (10:56 +0000)]
xen/iommu: x86: Clear the root page-table before freeing the page-tables

The new per-domain IOMMU page-table allocator will now free the
page-tables when domain's resources are relinquished. However, the
per-domain IOMMU structure will still contain a dangling pointer to
the root page-table.

Xen may access the IOMMU page-tables afterwards at least in the case of
PV domain:

(XEN) Xen call trace:
(XEN)    [<ffff82d04025b4b2>] R iommu.c#addr_to_dma_page_maddr+0x12e/0x1d8
(XEN)    [<ffff82d04025b695>] F iommu.c#intel_iommu_unmap_page+0x5d/0xf8
(XEN)    [<ffff82d0402695f3>] F iommu_unmap+0x9c/0x129
(XEN)    [<ffff82d0402696a6>] F iommu_legacy_unmap+0x26/0x63
(XEN)    [<ffff82d04033c5c7>] F mm.c#cleanup_page_mappings+0x139/0x144
(XEN)    [<ffff82d04033c61d>] F put_page+0x4b/0xb3
(XEN)    [<ffff82d04033c87f>] F put_page_from_l1e+0x136/0x13b
(XEN)    [<ffff82d04033cada>] F devalidate_page+0x256/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d8d6>] F mm.c#put_page_from_l2e+0x8a/0xcf
(XEN)    [<ffff82d04033cc27>] F devalidate_page+0x3a3/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d807>] F mm.c#put_page_from_l3e+0x8a/0xcf
(XEN)    [<ffff82d04033cdf0>] F devalidate_page+0x56c/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d64d>] F mm.c#put_pt_page+0x6f/0x80
(XEN)    [<ffff82d04033d6c7>] F mm.c#put_page_from_l4e+0x69/0x6d
(XEN)    [<ffff82d04033cf24>] F devalidate_page+0x6a0/0x8dc
(XEN)    [<ffff82d04033d396>] F mm.c#_put_page_type+0x236/0x47e
(XEN)    [<ffff82d04033d92e>] F put_page_type_preemptible+0x13/0x15
(XEN)    [<ffff82d04032598a>] F domain.c#relinquish_memory+0x1ff/0x4e9
(XEN)    [<ffff82d0403295f2>] F domain_relinquish_resources+0x2b6/0x36a
(XEN)    [<ffff82d040205cdf>] F domain_kill+0xb8/0x141
(XEN)    [<ffff82d040236cac>] F do_domctl+0xb6f/0x18e5
(XEN)    [<ffff82d04031d098>] F pv_hypercall+0x2f0/0x55f
(XEN)    [<ffff82d04039b432>] F lstar_enter+0x112/0x120

This will result to a use after-free and possibly an host crash or
memory corruption.

It would not be possible to free the page-tables further down in
domain_relinquish_resources() because cleanup_page_mappings() will only
be called when the last reference on the page dropped. This may happen
much later if another domain still hold a reference.

After all the PCI devices have been de-assigned, nobody should use the
IOMMU page-tables and it is therefore pointless to try to modify them.

So we can simply clear any reference to the root page-table in the
per-domain IOMMU structure. This requires to introduce a new callback of
the method will depend on the IOMMU driver used.

Take the opportunity to add an ASSERT() in arch_iommu_domain_destroy()
to check if we freed all the IOMMU page tables.

Fixes: 3eef6d07d722 ("x86/iommu: convert VT-d code to use new page table allocator")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying
Julien Grall [Fri, 26 Feb 2021 10:56:39 +0000 (10:56 +0000)]
xen/x86: iommu: Ignore IOMMU mapping requests when a domain is dying

The new x86 IOMMU page-tables allocator will release the pages when
relinquishing the domain resources. However, this is not sufficient
when the domain is dying because nothing prevents page-table to be
allocated.

As the domain is dying, it is not necessary to continue to modify the
IOMMU page-tables as they are going to be destroyed soon.

At the moment, page-table allocates will only happen when iommu_map().
So after this change there will be no more page-table allocation
happening because we don't use superpage mappings yet when not sharing
page tables.

In order to observe d->is_dying correctly, we need to rely on per-arch
locking, so the check to ignore IOMMU mapping is added on the per-driver
map_page() callback.

Fixes: 15bc9a1ef51c ("x86/iommu: add common page-table allocator")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/iommu: x86: Don't try to free page tables is the IOMMU is not enabled
Julien Grall [Fri, 26 Feb 2021 10:56:38 +0000 (10:56 +0000)]
xen/iommu: x86: Don't try to free page tables is the IOMMU is not enabled

When using CONFIG_BIGMEM=y, the page_list cannot be accessed whilst it
is is unitialized. However, iommu_free_pgtables() will be called even if
the domain is not using an IOMMU.

Consequently, Xen will try to go through the page list and deference a
NULL pointer.

Bail out early if the domain is not using an IOMMU.

Fixes: 15bc9a1ef51c ("x86/iommu: add common page-table allocator")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstored: Avoid dereferencing a NULL pointer if LiveUpdate is failing
Julien Grall [Fri, 26 Feb 2021 18:26:55 +0000 (18:26 +0000)]
tools/xenstored: Avoid dereferencing a NULL pointer if LiveUpdate is failing

In case of failure in do_lu_start(), XenStored will first free lu_start
and then try to dereference it.

This will result to a NULL dereference as the destruction callback will
set lu_start to NULL.

The crash can be avoided by freeing lu_start *after* the reply has been
set.

Fixes: af216a99fb4a ("tools/xenstore: add the basic framework for doing the live update")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoautomation: Annotate that a 32bit libc is no longer a dependency
Andrew Cooper [Thu, 25 Feb 2021 19:35:39 +0000 (19:35 +0000)]
automation: Annotate that a 32bit libc is no longer a dependency

We can't drop the 32bit libc from the existing containers, because they are
used on older Xen branches as well.

However, we can avoid the dependency being propagated into newer conainers
derived from our dockerfiles.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/firmware: Build firmware as -ffreestanding
Andrew Cooper [Thu, 25 Feb 2021 19:15:08 +0000 (19:15 +0000)]
tools/firmware: Build firmware as -ffreestanding

firmware should always have been -ffreestanding, as it doesn't execute in the
host environment.  -ffreestanding implies -fno-builtin, so replace the option.

inttypes.h isn't a freestanding header, but the 32bitbios_support.c only wants
the stdint.h types so switch to the more appropriate include.

This removes the build time dependency on a 32bit libc just to compile the
hvmloader and friends.

Update README and the TravisCI configuration.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/hvmloader: Drop machelf include as well
Andrew Cooper [Thu, 25 Feb 2021 19:13:17 +0000 (19:13 +0000)]
tools/hvmloader: Drop machelf include as well

The logic behind switching to elfstructs applies to sun builds as well.

Fixes: 81b2b328a2 ("hvmloader: use Xen private header for elf structs")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agocirrus-ci: Drop obsolete dependency
Andrew Cooper [Thu, 25 Feb 2021 20:30:49 +0000 (20:30 +0000)]
cirrus-ci: Drop obsolete dependency

markdown as a dependency was dropped in 4.12

Fixes: 5d94433a66 ("cirrus-ci: introduce some basic FreeBSD testing")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoVersion numbers: Set to 4.15-rc
Ian Jackson [Mon, 1 Mar 2021 14:12:05 +0000 (14:12 +0000)]
Version numbers: Set to 4.15-rc

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agoConfig.mk: pin to fixed versions for 4.15-rc
Ian Jackson [Mon, 1 Mar 2021 14:11:43 +0000 (14:11 +0000)]
Config.mk: pin to fixed versions for 4.15-rc

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agodmop: Add XEN_DMOP_nr_vcpus
Andrew Cooper [Thu, 25 Feb 2021 15:46:10 +0000 (15:46 +0000)]
dmop: Add XEN_DMOP_nr_vcpus

Curiously absent from the stable API/ABIs is an ability to query the number of
vcpus which a domain has.  Emulators need to know this information in
particular to know how many stuct ioreq's live in the ioreq server mappings.

In practice, this forces all userspace to link against libxenctrl to use
xc_domain_getinfo(), which rather defeats the purpose of the stable libraries.

Introduce a DMOP to retrieve this information and surface it in
libxendevicemodel to help emulators shed their use of unstable interfaces.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Paul Durrant <paul@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Ian Jackson <iwj@xenproject.org>
For 4.15.  This was a surprise discovery in the massive ABI untangling effort
I'm currently doing for XenServer's new build system.

This is one new read-only op to obtain information which isn't otherwise
available under a stable API/ABI.  As such, its risk for 4.15 is very low,
with a very real quality-of-life improvement for downstreams.

I realise this is technically a new feature and we're long past feature
freeze, but I'm hoping that "really lets some emulators move off the unstable
libraries" is sufficiently convincing argument.

It's not sufficient to let Qemu move off unstable libraries yet - at a
minimum, the add_to_phymap hypercalls need stabilising to support PCI
Passthrough and BAR remapping.

I'd prefer not to duplicate the op handling between ARM and x86, and if this
weren't a release window, I'd submit a prereq patch to dedup the common dmop
handling.  That can wait to 4.16 at this point.  Also, this op ought to work
against x86 PV guests, but fixing that up will also need this rearrangement
into common code, so needs to wait.

4 years agox86/dmop: Properly fail for PV guests
Andrew Cooper [Thu, 25 Feb 2021 16:54:17 +0000 (16:54 +0000)]
x86/dmop: Properly fail for PV guests

The current code has an early exit for PV guests, but it returns 0 having done
nothing.

Fixes: 524a98c2ac5 ("public / x86: introduce __HYPERCALL_dm_op...")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/sched: Add missing memory barrier in vcpu_block()
Julien Grall [Sat, 20 Feb 2021 19:22:34 +0000 (19:22 +0000)]
xen/sched: Add missing memory barrier in vcpu_block()

The comment in vcpu_block() states that the events should be checked
/after/ blocking to avoids wakeup waiting race. However, from a generic
perspective, set_bit() doesn't prevent re-ordering. So the following
could happen:

CPU0  (blocking vCPU A)         |   CPU1 ( unblock vCPU A)
                                |
A <- read local events          |
                                |   set local events
                                |   test_and_clear_bit(_VPF_blocked)
                                |       -> Bail out as the bit if not set
                                |
set_bit(_VFP_blocked)           |
                                |
check A                         |

The variable A will be 0 and therefore the vCPU will be blocked when it
should continue running.

vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the CPU
to read any information about local events before the flag _VPF_blocked
is set.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ash Wilding <ash.j.wilding@gmail.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstore-control: Don't leak buf in live_update_start()
Julien Grall [Thu, 25 Feb 2021 17:08:49 +0000 (17:08 +0000)]
tools/xenstore-control: Don't leak buf in live_update_start()

All the error paths but one will free buf. Cover the remaining path so
buf can't be leaked.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Fixes: 7f97193e6aa8 ("tools/xenstore: add live update command to xenstore-control")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstored: control: Store the save filename in lu_dump_state
Julien Grall [Thu, 25 Feb 2021 16:33:23 +0000 (16:33 +0000)]
tools/xenstored: control: Store the save filename in lu_dump_state

The function lu_close_dump_state() will use talloc_asprintf() without
checking whether the allocation succeeded. In the unlikely case we are
out of memory, we would dereference a NULL pointer.

As we already computed the filename in lu_get_dump_state(), we can store
the name in the lu_dump_state. This is avoiding to deal with memory file
in the close path and also reduce the risk to use the different
filename.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Fixes: c0dc6a3e7c41 ("tools/xenstore: read internal state when doing live upgrade")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstored: Avoid unnecessary talloc_strdup() in do_lu_start()
Julien Grall [Thu, 25 Feb 2021 15:43:04 +0000 (15:43 +0000)]
tools/xenstored: Avoid unnecessary talloc_strdup() in do_lu_start()

At the moment, the return of talloc_strdup() is not checked. This means
we may dereference a NULL pointer if the allocation failed.

However, it is pointless to allocate the memory as send_reply() will
copy the data to a different buffer. So drop the use of talloc_strdup().

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Fixes: af216a99fb4a ("tools/xenstore: add the basic framework for doing the live update")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xenstored: Avoid unnecessary talloc_strdup() in do_control_lu()
Julien Grall [Thu, 25 Feb 2021 15:15:23 +0000 (15:15 +0000)]
tools/xenstored: Avoid unnecessary talloc_strdup() in do_control_lu()

At the moment, the return of talloc_strdup() is not checked. This means
we may dereference a NULL pointer if the allocation failed.

However, it is pointless to allocate the memory as send_reply() will
copy the data to a different buffer. So drop the use of talloc_strdup().

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Fixes: fecab256d474 ("tools/xenstore: add basic live-update command parsing")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoVMX: delay p2m insertion of APIC access page
Jan Beulich [Fri, 26 Feb 2021 09:18:59 +0000 (10:18 +0100)]
VMX: delay p2m insertion of APIC access page

Inserting the mapping at domain creation time leads to a memory leak
when the creation fails later on and the domain uses separate CPU and
IOMMU page tables - the latter requires intermediate page tables to be
allocated, but there's no freeing of them at present in this case. Since
we don't need the p2m insertion to happen this early, avoid the problem
altogether by deferring it until the last possible point. This comes at
the price of not being able to handle an error other than by crashing
the domain.

Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools: Fix typo in xc_vmtrace_set_option comment
Hubert Jasudowicz [Thu, 25 Feb 2021 11:43:07 +0000 (12:43 +0100)]
tools: Fix typo in xc_vmtrace_set_option comment

Signed-off-by: Hubert Jasudowicz <hubert.jasudowicz@cert.pl>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoautomation: Fix containerize to understand the Alpine container
Andrew Cooper [Thu, 25 Feb 2021 14:09:26 +0000 (14:09 +0000)]
automation: Fix containerize to understand the Alpine container

This was missing from the work to add the alpine container.

Fixes: a9afe7768bd ("automation: add alpine linux 3.12 x86 build container")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/PV: use get_unsafe() instead of copy_from_unsafe()
Jan Beulich [Thu, 25 Feb 2021 14:39:09 +0000 (15:39 +0100)]
x86/PV: use get_unsafe() instead of copy_from_unsafe()

The former expands to a single (memory accessing) insn, which the latter
does not guarantee (the __builtin_constant_p() based switch() statement
there is just an optimization). Yet we'd prefer to read consistent PTEs
rather than risking a split read racing with an update done elsewhere.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86: move stac()/clac() from {get,put}_unsafe_asm() ...
Jan Beulich [Thu, 25 Feb 2021 14:38:35 +0000 (15:38 +0100)]
x86: move stac()/clac() from {get,put}_unsafe_asm() ...

... to {get,put}_unsafe_size(). There's no need to have the macros
expanded once per case label in the latter. This also makes the former
well-formed single statements again. No change in generated code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86: rename copy_{from,to}_user() to copy_{from,to}_guest_pv()
Jan Beulich [Thu, 25 Feb 2021 14:37:35 +0000 (15:37 +0100)]
x86: rename copy_{from,to}_user() to copy_{from,to}_guest_pv()

Bring them (back) in line with __copy_{from,to}_guest_pv(). Since it
falls in the same group, also convert clear_user(). Instead of adjusting
__raw_clear_guest(), drop it - it's unused and would require a non-
checking __clear_guest_pv() which we don't have.

Add previously missing __user at some call sites and in the function
declarations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/gdbsx: convert "user" to "guest" accesses
Jan Beulich [Thu, 25 Feb 2021 14:36:54 +0000 (15:36 +0100)]
x86/gdbsx: convert "user" to "guest" accesses

Using copy_{from,to}_user(), this code was assuming to be called only by
PV guests. Use copy_{from,to}_guest() instead, transforming the incoming
structure field into a guest handle (the field should really have been
one in the first place). Also do not transform the debuggee address into
a pointer.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86: rename {get,put}_user() to {get,put}_guest()
Jan Beulich [Thu, 25 Feb 2021 14:36:10 +0000 (15:36 +0100)]
x86: rename {get,put}_user() to {get,put}_guest()

Bring them (back) in line with __{get,put}_guest().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/EFI: suppress GNU ld 2.36'es creation of base relocs
Jan Beulich [Thu, 25 Feb 2021 14:11:58 +0000 (15:11 +0100)]
x86/EFI: suppress GNU ld 2.36'es creation of base relocs

All of the sudden ld creates base relocations itself, for PE
executables - as a result we now have two of them for every entity to
be relocated. While we will likely want to use this down the road, it
doesn't work quite right yet in corner cases, so rather than suppressing
our own way of creating the relocations we need to tell ld to avoid
doing so.

Probe whether --disable-reloc-section (which was introduced by the same
commit making relocation generation the default) is recognized by ld's PE
emulation, and use the option if so. (To limit redundancy, move the first
part of setting EFI_LDFLAGS earlier, and use it already while probing.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86: mirror compat argument translation area for 32-bit PV
Jan Beulich [Thu, 25 Feb 2021 14:10:47 +0000 (15:10 +0100)]
x86: mirror compat argument translation area for 32-bit PV

Now that we guard the entire Xen VA space against speculative abuse
through hypervisor accesses to guest memory, the argument translation
area's VA also needs to live outside this range, at least for 32-bit PV
guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
uniformly.

While this could be conditionalized upon CONFIG_PV32 &&
CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
keeps the code more legible imo.

Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/vgic: Implement write to ISPENDR in vGICv{2, 3}
Julien Grall [Sat, 20 Feb 2021 14:04:12 +0000 (14:04 +0000)]
xen/vgic: Implement write to ISPENDR in vGICv{2, 3}

Currently, Xen will send a data abort to a guest trying to write to the
ISPENDR.

Unfortunately, recent version of Linux (at least 5.9+) will start
writing to the register if the interrupt needs to be re-triggered
(see the callback irq_retrigger). This can happen when a driver (such as
the xgbe network driver on AMD Seattle) re-enable an interrupt:

(XEN) d0v0: vGICD: unhandled word write 0x00000004000000 to ISPENDR44
[...]
[   25.635837] Unhandled fault at 0xffff80001000522c
[...]
[   25.818716]  gic_retrigger+0x2c/0x38
[   25.822361]  irq_startup+0x78/0x138
[   25.825920]  __enable_irq+0x70/0x80
[   25.829478]  enable_irq+0x50/0xa0
[   25.832864]  xgbe_one_poll+0xc8/0xd8
[   25.836509]  net_rx_action+0x110/0x3a8
[   25.840328]  __do_softirq+0x124/0x288
[   25.844061]  irq_exit+0xe0/0xf0
[   25.847272]  __handle_domain_irq+0x68/0xc0
[   25.851442]  gic_handle_irq+0xa8/0xe0
[   25.855171]  el1_irq+0xb0/0x180
[   25.858383]  arch_cpu_idle+0x18/0x28
[   25.862028]  default_idle_call+0x24/0x5c
[   25.866021]  do_idle+0x204/0x278
[   25.869319]  cpu_startup_entry+0x24/0x68
[   25.873313]  rest_init+0xd4/0xe4
[   25.876611]  arch_call_rest_init+0x10/0x1c
[   25.880777]  start_kernel+0x5b8/0x5ec

As a consequence, the OS may become unusable.

Implementing the write part of ISPENDR is somewhat easy. For
virtual interrupt, we only need to inject the interrupt again.

For physical interrupt, we need to be more careful as the de-activation
of the virtual interrupt will be propagated to the physical distributor.
For simplicity, the physical interrupt will be set pending so the
workflow will not differ from a "real" interrupt.

Longer term, we could possible directly activate the physical interrupt
and avoid taking an exception to inject the interrupt to the domain.
(This is the approach taken by the new vGIC based on KVM).

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoelfstructs: add relocation defines for i386
Roger Pau Monné [Wed, 24 Feb 2021 15:31:12 +0000 (16:31 +0100)]
elfstructs: add relocation defines for i386

Those are need by the rombios relocation code in hvmloader. Fixes the
following build error:

32bitbios_support.c: In function 'relocate_32bitbios':
32bitbios_support.c:130:18: error: 'R_386_PC32' undeclared (first use in this function); did you mean 'R_X86_64_PC32'?
             case R_386_PC32:
                  ^~~~~~~~~~
                  R_X86_64_PC32
32bitbios_support.c:130:18: note: each undeclared identifier is reported only once for each function it appears in
32bitbios_support.c:134:18: error: 'R_386_32' undeclared (first use in this function)
             case R_386_32:
                  ^~~~~~~~

Only add the two defines that are actually used, which seems to match
what we do for amd64.

Fixes: 81b2b328a26c1b ('hvmloader: use Xen private header for elf structs')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agohvmloader: use Xen private header for elf structs
Roger Pau Monné [Wed, 24 Feb 2021 11:48:13 +0000 (12:48 +0100)]
hvmloader: use Xen private header for elf structs

Do not use the system provided elf.h, and instead use elfstructs.h
from libelf.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agobuild: remove more absolute paths from dependency tracking files
Jan Beulich [Wed, 24 Feb 2021 11:47:34 +0000 (12:47 +0100)]
build: remove more absolute paths from dependency tracking files

d6b12add90da ("DEPS handling: Remove absolute paths from references to
cwd") took care of massaging the dependencies of the output file, but
for our passing of -MP to the compiler to take effect the same needs to
be done on the "phony" rules that the compiler emits.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agocirrus-ci: introduce some basic FreeBSD testing
Roger Pau Monne [Tue, 23 Feb 2021 15:53:53 +0000 (16:53 +0100)]
cirrus-ci: introduce some basic FreeBSD testing

Cirrus-CI supports FreeBSD natively, so introduce some build tests on
FreeBSD.

The Cirrus-CI requires a Github repository in order to trigger the
tests.

A sample run output can be seen at:

https://github.com/royger/xen/runs/1962451343

Note the FreeBSD 11 task fails to build QEMU and is not part of this
patch.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/misc/xen-vmtrace: use reset and enable
Tamas K Lengyel [Sat, 20 Feb 2021 21:07:38 +0000 (16:07 -0500)]
tools/misc/xen-vmtrace: use reset and enable

The expected behavior while using xen-vmtrace is to get a clean start, even if
the tool was used previously on the same VM.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libs/light: fix xl save -c handling
Juergen Gross [Fri, 19 Feb 2021 14:13:37 +0000 (15:13 +0100)]
tools/libs/light: fix xl save -c handling

libxl_domain_resume() won't work correctly for the case it was called
due to a "xl save -c" command, i.e. to continue the suspended domain.

The information to do that is not saved in libxl__dm_resume_state for
non-HVM domains.

Fixes: 6298f0eb8f443 ("libxl: Re-introduce libxl__domain_resume")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agotools/libs: Write out an ABI analysis when abi-dumper is available
Andrew Cooper [Thu, 11 Feb 2021 06:39:03 +0000 (06:39 +0000)]
tools/libs: Write out an ABI analysis when abi-dumper is available

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libs: Add rule to generate headers.lst
Andrew Cooper [Thu, 11 Feb 2021 06:29:31 +0000 (06:29 +0000)]
tools/libs: Add rule to generate headers.lst

abi-dumper needs a list of the public header files for shared objects, and
only accepts this in the form of a file.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools: Check for abi-dumper in ./configure
Andrew Cooper [Fri, 12 Feb 2021 11:51:04 +0000 (11:51 +0000)]
tools: Check for abi-dumper in ./configure

This will be optional.  No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools: Use -Og for debug builds when available
Andrew Cooper [Thu, 11 Feb 2021 14:22:44 +0000 (14:22 +0000)]
tools: Use -Og for debug builds when available

The recommended optimisation level for debugging is -Og, and is what tools
such as gdb prefer.  In practice, it equates to -01 with a few specific
optimisations turned off.

abi-dumper in particular wants the libraries it inspects in this form.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libxl: Work around unintialised variable libxl__domain_get_device_model_uid()
Andrew Cooper [Thu, 11 Feb 2021 15:29:12 +0000 (15:29 +0000)]
tools/libxl: Work around unintialised variable libxl__domain_get_device_model_uid()

Various version of gcc, when compiling with -Og, complain:

  libxl_dm.c: In function 'libxl__domain_get_device_model_uid':
  libxl_dm.c:256:12: error: 'kill_by_uid' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    256 |         if (kill_by_uid)
        |            ^

The logic is very tangled.  Set kill_by_uid on every path.

No funcational change.

Requested-by: Ian Jackson <iwj@xenproject.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Not-acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/hvm: refactor set param
Norbert Manthey [Fri, 19 Feb 2021 16:24:03 +0000 (17:24 +0100)]
x86/hvm: refactor set param

To prevent leaking HVM params via L1TF and similar issues on a
hyperthread pair, let's load values of domains only after performing all
relevant checks, and blocking speculative execution.

For both get and set, the value of the index is already checked in the
outer calling function. The block_speculation calls in hvmop_get_param
and hvmop_set_param are removed, because is_hvm_domain already blocks
speculation.

Furthermore, speculative barriers are re-arranged to make sure we do not
allow guests running on co-located VCPUs to leak hvm parameter values of
other domains.

To improve symmetry between the get and set operations, function
hvmop_set_param is made static.

This is part of the speculative hardening effort.

Reported-by: Hongyan Xia <hongyxia@amazon.co.uk>
Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()
Jan Beulich [Fri, 19 Feb 2021 16:21:41 +0000 (17:21 +0100)]
x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()

While doing this for small amounts may be okay, the unconditional use
of CPU0's value here has been found to be a problem when the boot time
TSC of the BSP was behind that of all APs by more than a second. In
particular because of get_s_time_fixed() producing insane output when
the calculated delta is negative, we can't allow this to happen.

On the first iteration have all other CPUs sort out the highest TSC
value any one of them has read. On the second iteration, if that
maximum is higher than CPU0's, update its recorded value from that
taken in the first iteration. Use the resulting value on the last
iteration to write everyone's TSCs.

To account for the possible discontinuity, have
time_calibration_rendezvous_tail() record the newly written value, but
extrapolate local stime using the value read.

Reported-by: Claudemir Todo Bom <claudemir@todobom.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/time: adjust time recording in time_calibration_tsc_rendezvous()
Jan Beulich [Fri, 19 Feb 2021 16:21:12 +0000 (17:21 +0100)]
x86/time: adjust time recording in time_calibration_tsc_rendezvous()

The (stime,tsc) tuple is the basis for extrapolation by get_s_time().
Therefore the two better get taken as close to one another as possible.
This means two things: First, reading platform time is too early when
done on the first iteration. The closest we can get is on the last
iteration, immediately before telling other CPUs to write their TSCs
(and then also writing CPU0's). While at the first glance it may seem
not overly relevant when exactly platform time is read (when assuming
that only stime is ever relevant anywhere, and hence the association
with the precise TSC values is of lower interest), both CPU frequency
changes and the effects of SMT make it unpredictable (between individual
rendezvous instances) how long the loop iterations will take. This will
in turn lead to higher an error than neccesary in how close to linear
stime movement we can get.

Second, re-reading the TSC for local recording is increasing the overall
error as well, when we already know a more precise value - the one just
written.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/time: change initiation of the calibration timer
Jan Beulich [Fri, 19 Feb 2021 16:20:46 +0000 (17:20 +0100)]
x86/time: change initiation of the calibration timer

Setting the timer a second (EPOCH) into the future at a random point
during boot (prior to bringing up APs and prior to launching Dom0) does
not yield predictable results: The timer may expire while we're still
bringing up APs (too early) or when Dom0 already boots (too late).
Instead invoke the timer handler function explicitly at a predictable
point in time, once we've established the rendezvous function to use
(and hence also once all APs are online). This will, through the raising
and handling of TIMER_SOFTIRQ, then also have the effect of arming the
timer.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/PV: harden guest memory accesses against speculative abuse
Jan Beulich [Fri, 19 Feb 2021 16:19:56 +0000 (17:19 +0100)]
x86/PV: harden guest memory accesses against speculative abuse

Inspired by
https://lore.kernel.org/lkml/f12e7d3cecf41b2c29734ea45a393be21d4a8058.1597848273.git.jpoimboe@redhat.com/
and prior work in that area of x86 Linux, suppress speculation with
guest specified pointer values by suitably masking the addresses to
non-canonical space in case they fall into Xen's virtual address range.

Introduce a new Kconfig control.

Note that it is necessary in such code to avoid using "m" kind operands:
If we didn't, there would be no guarantee that the register passed to
guest_access_mask_ptr is also the (base) one used for the memory access.

As a minor unrelated change in get_unsafe_asm() the unnecessary "itype"
parameter gets dropped and the XOR on the fixup path gets changed to be
a 32-bit one in all cases: This way we avoid pointless REX.W or operand
size overrides, or writes to partial registers.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86: split __copy_{from,to}_user() into "guest" and "unsafe" variants
Jan Beulich [Fri, 19 Feb 2021 16:19:19 +0000 (17:19 +0100)]
x86: split __copy_{from,to}_user() into "guest" and "unsafe" variants

The "guest" variants are intended to work with (potentially) fully guest
controlled addresses, while the "unsafe" variants are intended to be
used in order to access addresses not (directly) under guest control,
within Xen's part of virtual address space. Subsequently we will want
them to have distinct behavior, so as first step identify which one is
which. For now, both groups of constructs alias one another.

Double underscore prefixes are retained only on
__copy_{from,to}_guest_pv(), to allow still distinguishing them from
their "checking" counterparts once they also get renamed (to
copy_{from,to}_guest_pv()).

Add previously missing __user at some call sites.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org> [shadow]
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86: split __{get,put}_user() into "guest" and "unsafe" variants
Jan Beulich [Fri, 19 Feb 2021 16:18:27 +0000 (17:18 +0100)]
x86: split __{get,put}_user() into "guest" and "unsafe" variants

The "guest" variants are intended to work with (potentially) fully guest
controlled addresses, while the "unsafe" variants are intended to be
used in order to access addresses not (directly) under guest control,
within Xen's part of virtual address space. (For linear page table and
descriptor table accesses the low bits of the addresses may still be
guest controlled, but this still won't allow speculation to "escape"
into unwanted areas.) Subsequently we will want them to have distinct
behavior, so as first step identify which one is which. For now, both
groups of constructs alias one another.

Double underscore prefixes are retained only on __{get,put}_guest(), to
allow still distinguishing them from their "checking" counterparts once
they also get renamed (to {get,put}_guest()).

Since for them it's almost a full re-write, move what becomes
{get,put}_unsafe_size() into the "common" uaccess.h (x86_64/*.h should
disappear at some point anyway).

In __copy_to_user() one of the two casts in each put_guest_size()
invocation gets dropped. They're not needed and did break symmetry with
__copy_from_user().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org> [shadow]
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/arm : smmuv3: Fix to handle multiple StreamIds per device.
Rahul Singh [Wed, 17 Feb 2021 10:05:14 +0000 (10:05 +0000)]
xen/arm : smmuv3: Fix to handle multiple StreamIds per device.

SMMUv3 driver does not handle multiple StreamId if the master device
supports more than one StreamID.

This bug was introduced when the driver was ported from Linux to XEN.
dt_device_set_protected(..) should be called from add_device(..) not
from the dt_xlate(..).

Move dt_device_set_protected(..) from dt_xlate(..) to add_device().

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agognttab: bypass IOMMU (un)mapping when a domain is (un)mapping its own grant
Jan Beulich [Thu, 18 Feb 2021 12:16:59 +0000 (13:16 +0100)]
gnttab: bypass IOMMU (un)mapping when a domain is (un)mapping its own grant

Mappings for a domain's own pages should already be present in the
IOMMU. While installing the same mapping again is merely redundant (and
inefficient), removing the mapping when the grant mapping gets removed
is outright wrong in this case: The mapping was there before the map, so
should remain in place after unmapping.

This affects
- Arm Dom0 in the direct mapped case,
- x86 PV Dom0 in the "iommu=dom0-strict" / "dom0-iommu=strict" case,
- all x86 PV DomU-s, including driver domains.

See the code comment for why it's the original domain and not the page
owner that gets compared against.

Reported-by: Rahul Singh <Rahul.Singh@arm.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years agognttab: never permit mapping transitive grants
Jan Beulich [Thu, 18 Feb 2021 12:16:12 +0000 (13:16 +0100)]
gnttab: never permit mapping transitive grants

Transitive grants allow an intermediate domain I to grant a target
domain T access to a page which origin domain O did grant I access to.
As an implementation restriction, T is not allowed to map such a grant.
This restriction is currently tried to be enforced by marking active
entries resulting from transitive grants as is-sub-page; sub-page grants
for obvious reasons don't allow mapping. However, marking (and checking)
only active entries is insufficient, as a map attempt may also occur on
a grant not otherwise in use. When not presently in use (pin count zero)
the grant type itself needs checking. Otherwise T may be able to map an
unrelated page owned by I. This is because the "transitive" sub-
structure of the v2 union would end up being interpreted as "full_page"
sub-structure instead. The low 32 bits of the GFN used would match the
grant reference specified in I's transitive grant entry, while the upper
32 bits could be random (depending on how exactly I sets up its grant
table entries).

Note that if one mapping already exists and the granting domain _then_
changes the grant to GTF_transitive (which the domain is not supposed to
do), the changed type will only be honored after the pin count has gone
back to zero. This is no different from e.g. GTF_readonly or
GTF_sub_page becoming set when a grant is already in use.

While adjusting the implementation, also adjust commentary in the public
header to better reflect reality.

Fixes: 3672ce675c93 ("Transitive grant support")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agoIOREQ: refine when to send mapcache invalidation request
Jan Beulich [Thu, 18 Feb 2021 12:11:19 +0000 (13:11 +0100)]
IOREQ: refine when to send mapcache invalidation request

XENMEM_decrease_reservation isn't the only means by which pages can get
removed from a guest, yet all removals ought to be signaled to qemu. Put
setting of the flag into the central p2m_remove_page() underlying all
respective hypercalls as well as a few similar places, mainly in PoD
code.

Additionally there's no point sending the request for the local domain
when the domain acted upon is a different one. The latter domain's ioreq
server mapcaches need invalidating. We assume that domain to be paused
at the point the operation takes place, so sending the request in this
case happens from the hvm_do_resume() path, which as one of its first
steps calls handle_hvm_io_completion().

Even without the remote operation aspect a single domain-wide flag
doesn't do: Guests may e.g. decrease-reservation on multiple vCPU-s in
parallel. Each of them needs to issue an invalidation request in due
course, in particular because exiting to guest context should not happen
before the request was actually seen by (all) the emulator(s).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agostubdom/xenstored: Fix uninitialised variables in lu_read_state()
Andrew Cooper [Thu, 11 Feb 2021 21:10:51 +0000 (21:10 +0000)]
stubdom/xenstored: Fix uninitialised variables in lu_read_state()

Various version of gcc, when compiling with -Og, complain:

  xenstored_control.c: In function ‘lu_read_state’:
  xenstored_control.c:540:11: error: ‘state.size’ is used uninitialized in this
  function [-Werror=uninitialized]
    if (state.size == 0)
        ~~~~~^~~~~
  xenstored_control.c:543:6: error: ‘state.buf’ may be used uninitialized in
  this function [-Werror=maybe-uninitialized]
    pre = state.buf;
    ~~~~^~~~~~~~~~~
  xenstored_control.c:550:23: error: ‘state.buf’ may be used uninitialized in
  this function [-Werror=maybe-uninitialized]
     (void *)head - state.buf < state.size;
                    ~~~~~^~~~
  xenstored_control.c:550:35: error: ‘state.size’ may be used uninitialized in
  this function [-Werror=maybe-uninitialized]
     (void *)head - state.buf < state.size;
                                ~~~~~^~~~~

for the stubdom build.  This is because lu_get_dump_state() is a no-op stub in
MiniOS, and state really is operated on uninitialised.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libxl: Fix uninitialised variable in libxl__write_stub_dmargs()
Andrew Cooper [Thu, 11 Feb 2021 17:44:36 +0000 (17:44 +0000)]
tools/libxl: Fix uninitialised variable in libxl__write_stub_dmargs()

Various version of gcc, when compiling with -Og, complain:

  libxl_dm.c: In function ‘libxl__write_stub_dmargs’:
  libxl_dm.c:2166:16: error: ‘dmargs’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
               rc = libxl__xs_write_checked(gc, t, path, dmargs);
               ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It isn't actually used while uninitialised, but only because of how the
is_linux_stubdom checks line up.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libxg: Drop stale p2m logic from ARM's meminit()
Andrew Cooper [Thu, 11 Feb 2021 17:45:21 +0000 (17:45 +0000)]
tools/libxg: Drop stale p2m logic from ARM's meminit()

Various version of gcc, when compiling with -Og, complain:

  xg_dom_arm.c: In function 'meminit':
  xg_dom_arm.c:420:19: error: 'p2m_size' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    420 |     dom->p2m_size = p2m_size;
        |     ~~~~~~~~~~~~~~^~~~~~~~~~

This is actually entirely stale code since ee21f10d70^..97e34ad22d which
removed the 1:1 identity p2m for translated domains.

Drop the write of d->p2m_size, and the p2m_size local variable.  Reposition
the p2m_size field in struct xc_dom_image and correct some stale
documentation.

This change really ought to have been part of the original cleanup series.

No actual change to how ARM domains are constructed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/libxg: Fix uninitialised variable in write_x86_cpu_policy_records()
Andrew Cooper [Thu, 11 Feb 2021 14:25:57 +0000 (14:25 +0000)]
tools/libxg: Fix uninitialised variable in write_x86_cpu_policy_records()

Various version of gcc, when compiling with -Og, complain:

  xg_sr_common_x86.c: In function 'write_x86_cpu_policy_records':
  xg_sr_common_x86.c:92:12: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     92 |     return rc;
        |            ^~

The complaint is legitimate, and can occur with unexpected behaviour of two
related hypercalls in combination with a libc which permits zero-length
malloc()s.

Have an explicit rc = 0 on the success path, and make the MSRs record error
handling consistent with the CPUID record before it.

Fixes: f6b2b8ec53d ("libxc/save: Write X86_{CPUID,MSR}_DATA records")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agotools/xl: Fix exit code for `xl vkbattach`
Andrew Cooper [Thu, 11 Feb 2021 18:49:23 +0000 (18:49 +0000)]
tools/xl: Fix exit code for `xl vkbattach`

Various version of gcc, when compiling with -Og, complain:

  xl_vkb.c: In function 'main_vkbattach':
  xl_vkb.c:79:12: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
     79 |     return rc;
        |            ^~

The dryrun_only path really does leave rc uninitalised.  Introduce a done
label for success paths to use.

Fixes: a15166af7c3 ("xl: add vkb config parser and CLI")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen/iommu: Check if the IOMMU was initialized before tearing down
Julien Grall [Thu, 17 Dec 2020 12:27:21 +0000 (12:27 +0000)]
xen/iommu: Check if the IOMMU was initialized before tearing down

is_iommu_enabled() will return true even if the IOMMU has not been
initialized (e.g. the ops are not set).

In the case of an early failure in arch_domain_init(), the function
iommu_destroy_domain() will be called even if the IOMMU is not
initialized.

This will result to dereference the ops which will be NULL and an host
crash.

Fix the issue by checking that ops has been set before accessing it.

Fixes: 71e617a6b8f6 ("use is_iommu_enabled() where appropriate...")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>
4 years agoxen/page_alloc: Only flush the page to RAM once we know they are scrubbed
Julien Grall [Thu, 21 Jan 2021 10:16:08 +0000 (10:16 +0000)]
xen/page_alloc: Only flush the page to RAM once we know they are scrubbed

At the moment, each page are flushed to RAM just after the allocator
found some free pages. However, this is happening before check if the
page was scrubbed.

As a consequence, on Arm, a guest may be able to access the old content
of the scrubbed pages if it has cache disabled (default at boot) and
the content didn't reach the Point of Coherency.

The flush is now moved after we know the content of the page will not
change. This also has the benefit to reduce the amount of work happening
with the heap_lock held.

This is XSA-364.

Fixes: 307c3be3ccb2 ("mm: Don't scrub pages while holding heap lock in alloc_heap_pages()")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoSUPPORT.md: PV display frontend is unsupported in "backend allocation" mode
Jan Beulich [Tue, 16 Feb 2021 14:31:59 +0000 (15:31 +0100)]
SUPPORT.md: PV display frontend is unsupported in "backend allocation" mode

This wasn't meant to be supported, but wasn't stated this way.

This is XSA-363.

Reported-by: Jan Belich <jbeulich@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/arm: fix gnttab_need_iommu_mapping
Stefano Stabellini [Mon, 8 Feb 2021 18:49:32 +0000 (10:49 -0800)]
xen/arm: fix gnttab_need_iommu_mapping

Commit 91d4eca7add broke gnttab_need_iommu_mapping on ARM.
The offending chunk is:

 #define gnttab_need_iommu_mapping(d)                    \
-    (is_domain_direct_mapped(d) && need_iommu(d))
+    (is_domain_direct_mapped(d) && need_iommu_pt_sync(d))

On ARM we need gnttab_need_iommu_mapping to be true for dom0 when it is
directly mapped and IOMMU is enabled for the domain, like the old check
did, but the new check is always false.

In fact, need_iommu_pt_sync is defined as dom_iommu(d)->need_sync and
need_sync is set as:

    if ( !is_hardware_domain(d) || iommu_hwdom_strict )
        hd->need_sync = !iommu_use_hap_pt(d);

iommu_use_hap_pt(d) means that the page-table used by the IOMMU is the
P2M. It is true on ARM. need_sync means that you have a separate IOMMU
page-table and it needs to be updated for every change. need_sync is set
to false on ARM. Hence, gnttab_need_iommu_mapping(d) is false too,
which is wrong.

As a consequence, when using PV network from a domU on a system where
IOMMU is on from Dom0, I get:

(XEN) smmu: /smmu@fd800000: Unhandled context fault: fsr=0x402, iova=0x8424cb148, fsynr=0xb0001, cb=0
[   68.290307] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK

The fix is to go back to something along the lines of the old
implementation of gnttab_need_iommu_mapping.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Fixes: 91d4eca7add ("mm / iommu: split need_iommu() into has_iommu_pt() and need_iommu_pt_sync()")
Backport: 4.13+

4 years agoxen: workaround missing device_type property in pci/pcie nodes
Stefano Stabellini [Tue, 9 Feb 2021 19:53:34 +0000 (11:53 -0800)]
xen: workaround missing device_type property in pci/pcie nodes

PCI buses differ from default buses in a few important ways, so it is
important to detect them properly. Normally, PCI buses are expected to
have the following property:

    device_type = "pci"

In reality, it is not always the case. To handle PCI bus nodes that
don't have the device_type property, also consider the node name: if the
node name is "pcie" or "pci" then consider the bus as a PCI bus.

This commit is based on the Linux kernel commit
d1ac0002dd29 "of: address: Work around missing device_type property in
pcie nodes".

This fixes Xen boot on RPi4. Some RPi4 kernels have the following node
on their device trees:

&pcie0 {
pci@1,0 {
#address-cells = <3>;
#size-cells = <2>;
ranges;

reg = <0 0 0 0 0>;

usb@1,0 {
reg = <0x10000 0 0 0 0>;
resets = <&reset RASPBERRYPI_FIRMWARE_RESET_ID_USB>;
};
};
};

The pci@1,0 node is a PCI bus. If we parse the node and its children as
a default bus, the reg property under usb@1,0 would have to be
interpreted as an address range mappable by the CPU, which is not the
case and would break.

Link: https://lore.kernel.org/xen-devel/YBmQQ3Tzu++AadKx@mattapan.m5p.com/
[fix style on commit]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Tested-by: Elliott Mitchell <ehem+xen@m5p.com>
Tested-by: Jukka Kaartinen <jukka.kaartinen@unikie.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agoautomation: Add Ubuntu Focal builds
Andrew Cooper [Thu, 11 Feb 2021 13:25:58 +0000 (13:25 +0000)]
automation: Add Ubuntu Focal builds

Logical continuation of c/s eb52442d7f "automation: Add Ubuntu:focal
container".

No further changes required.  Everything builds fine.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agotools/libxl: Document where the magic MAC numbers come from
Andrew Cooper [Wed, 10 Feb 2021 13:51:21 +0000 (13:51 +0000)]
tools/libxl: Document where the magic MAC numbers come from

Matches the comment in the xl-network-configuration manpage.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agox86emul: fix SYSENTER/SYSCALL switching into 64-bit mode
Jan Beulich [Thu, 11 Feb 2021 16:53:10 +0000 (17:53 +0100)]
x86emul: fix SYSENTER/SYSCALL switching into 64-bit mode

When invoked by compat mode, mode_64bit() will be false at the start of
emulation. The logic after complete_insn, however, needs to consider the
mode switched into, in particular to avoid truncating RIP.

Inspired by / paralleling and extending Linux commit 943dea8af21b ("KVM:
x86: Update emulator context mode if SYSENTER xfers to 64-bit mode").

While there, tighten a related assertion in x86_emulate_wrapper() - we
want to be sure to not switch into an impossible mode when the code gets
built for 32-bit only (as is possible for the test harness).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citirix.com>
4 years agotools: rerun autoconf again
Ian Jackson [Wed, 10 Feb 2021 15:30:59 +0000 (15:30 +0000)]
tools: rerun autoconf again

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <iwj@xenproject.org>