Roger Pau Monne [Mon, 1 Mar 2021 09:51:14 +0000 (10:51 +0100)]
automation/alpine: add g++ to the list of build depends
clang++ relies on the C++ headers installed by g++, or else a clang
build will hit the following error:
<built-in>:3:10: fatal error: 'cstring' file not found
#include "cstring"
^~~~~~~~~
1 error generated.
make[10]: *** [Makefile:120: headers++.chk] Error 1
Reported-by: Stefano Stabellini <sstabellini@kernel.org> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
No real risk here from a release PoV, it's just pulling a package
required for the Alpine clang build. Worst that could happen is that
the Alpine clang build broke, but it's already broken.
Roger Pau Monne [Thu, 25 Feb 2021 11:01:17 +0000 (12:01 +0100)]
firmware: provide a stand alone set of headers
The current build of the firmware relies on having 32bit compatible
headers installed in order to build some of the 32bit firmware.
Usually this can be solved by using the -ffreestanding compiler option
which drops the usage of the system headers in favor of a private set
of freestanding headers provided by the compiler itself that are not
tied to libc.
However such option is broken at least in the gcc compiler provided in
Alpine Linux, as the system include path (ie: /usr/include) takes
precedence over the gcc private include path:
And the headers in /usr/include are exclusively 64bit.
Since -ffreestanding is currently broken on at least that distro, and
for resilience against future compilers also having the option broken
provide a set of stand alone 32bit headers required for the firmware
build.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
---
Using -ffreestanding alone seems fragile, as it's broken on some
distros. Compensate for this by providing our own set of stand alone
headers for the firmware bits. Having the include paths wrongly sorted
can easily cause the system headers to be picked up instead of the gcc
ones, and then building can randomly fail because the system headers
could be amd64 only (like the musl ones).
I've also seen clang-9 on Debian with the following include paths:
Which also seems slightly dangerous as local comes before the compiler
private path.
IMO using -ffreestanding and our own set of stand alone headers is
more resilient.
Regarding the release risks, the main one would be breaking the build
(as it's currently broken on Alpine). I think there's a very low risk
of this change successfully producing a binary image that's broken,
and hence with enough build testing it should be safe to merge.
---
Changes since v2:
- Add a __P64__ check to stdint.h.
- Reword the comment in Rules.mk.
Andrew Cooper [Thu, 25 Feb 2021 19:15:08 +0000 (19:15 +0000)]
tools/firmware: Build firmware as -ffreestanding
firmware should always have been -ffreestanding, as it doesn't execute in the
host environment. -ffreestanding implies -fno-builtin, so replace the option.
inttypes.h isn't a freestanding header, but the 32bitbios_support.c only wants
the stdint.h types so switch to the more appropriate include.
This removes the build time dependency on a 32bit libc just to compile the
hvmloader and friends.
Update README and the TravisCI configuration.
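As a minimal sketch of what "only wants the stdint.h types" means in practice: <stdint.h> is among the headers a freestanding C implementation must provide, while <inttypes.h> is not. The helper below is hypothetical (it is not taken from 32bitbios_support.c) and uses nothing beyond the fixed-width types, so it would not need a 32bit libc to compile.

```c
/* Freestanding-friendly: only <stdint.h> is required, no libc headers. */
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: sum the bytes of a firmware
 * blob modulo 256, using nothing but fixed-width integer types.
 */
uint8_t blob_checksum(const uint8_t *buf, uint32_t len)
{
    uint8_t sum = 0;

    for (uint32_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + buf[i]);

    return sum;
}
```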
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 25 Feb 2021 19:13:17 +0000 (19:13 +0000)]
tools/hvmloader: Drop machelf include as well
The logic behind switching to elfstructs applies to sun builds as well.
Fixes: 81b2b328a2 ("hvmloader: use Xen private header for elf structs") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 25 Feb 2021 20:30:49 +0000 (20:30 +0000)]
cirrus-ci: Drop obsolete dependency
markdown as a dependency was dropped in 4.12
Fixes: 5d94433a66 ("cirrus-ci: introduce some basic FreeBSD testing") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 25 Feb 2021 15:46:10 +0000 (15:46 +0000)]
dmop: Add XEN_DMOP_nr_vcpus
Curiously absent from the stable API/ABIs is an ability to query the number of
vcpus which a domain has. Emulators need to know this information in
particular to know how many struct ioreq's live in the ioreq server mappings.
In practice, this forces all userspace to link against libxenctrl to use
xc_domain_getinfo(), which rather defeats the purpose of the stable libraries.
Introduce a DMOP to retrieve this information and surface it in
libxendevicemodel to help emulators shed their use of unstable interfaces.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
--- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> CC: Paul Durrant <paul@xen.org> CC: Stefano Stabellini <sstabellini@kernel.org> CC: Julien Grall <julien@xen.org> CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> CC: Ian Jackson <iwj@xenproject.org>
For 4.15. This was a surprise discovery in the massive ABI untangling effort
I'm currently doing for XenServer's new build system.
This is one new read-only op to obtain information which isn't otherwise
available under a stable API/ABI. As such, its risk for 4.15 is very low,
with a very real quality-of-life improvement for downstreams.
I realise this is technically a new feature and we're long past feature
freeze, but I'm hoping that "really lets some emulators move off the unstable
libraries" is a sufficiently convincing argument.
It's not sufficient to let Qemu move off unstable libraries yet - at a
minimum, the add_to_phymap hypercalls need stabilising to support PCI
Passthrough and BAR remapping.
I'd prefer not to duplicate the op handling between ARM and x86, and if this
weren't a release window, I'd submit a prereq patch to dedup the common dmop
handling. That can wait until 4.16 at this point. Also, this op ought to work
against x86 PV guests, but fixing that up will also need this rearrangement
into common code, so needs to wait.
Andrew Cooper [Thu, 25 Feb 2021 16:54:17 +0000 (16:54 +0000)]
x86/dmop: Properly fail for PV guests
The current code has an early exit for PV guests, but it returns 0 having done
nothing.
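The difference between "returning 0 having done nothing" and properly failing can be sketched as follows. The structure, the function name and the choice of -EOPNOTSUPP are assumptions made for illustration; the commit message does not say which error code the real fix uses.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the domain being operated on. */
struct domain_sketch {
    bool is_hvm;
};

/*
 * Sketch of an early exit that reports failure. Before the fix, the
 * equivalent path returned 0, which callers read as success even though
 * no work had been done.
 */
int dm_op_sketch(const struct domain_sketch *d)
{
    if (!d->is_hvm)
        return -EOPNOTSUPP; /* assumed error code, see lead-in */

    /* ... the real operation would be carried out here ... */
    return 0;
}
```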
Fixes: 524a98c2ac5 ("public / x86: introduce __HYPERCALL_dm_op...") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Julien Grall [Sat, 20 Feb 2021 19:22:34 +0000 (19:22 +0000)]
xen/sched: Add missing memory barrier in vcpu_block()
The comment in vcpu_block() states that the events should be checked
/after/ blocking to avoid a wakeup waiting race. However, from a generic
perspective, set_bit() doesn't prevent re-ordering. So the following
could happen:
    CPU0 (blocking vCPU A)         |   CPU1 (unblock vCPU A)
                                   |
    A <- read local events         |
                                   |   set local events
                                   |   test_and_clear_bit(_VPF_blocked)
                                   |       -> Bail out as the bit is not set
                                   |
    set_bit(_VPF_blocked)          |
                                   |
    check A                        |
The variable A will be 0 and therefore the vCPU will be blocked when it
should continue running.
vcpu_block() is now gaining an smp_mb__after_atomic() to prevent the CPU
from reading any information about local events before the flag _VPF_blocked
is set.
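The ordering requirement can be modelled outside Xen with C11 atomics. This is only a sketch: the structure, the flag value and the function names are hypothetical, and atomic_thread_fence() stands in for smp_mb__after_atomic(). The point is that the blocker sets the flag, issues a full barrier, and only then looks at the events, while the waker does the mirror image.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define VPF_BLOCKED_SKETCH 1u /* hypothetical flag bit */

struct vcpu_sketch {
    atomic_uint pause_flags;
    atomic_bool event_pending;
};

/* Returns true if the vCPU ends up blocked, false if it keeps running. */
bool vcpu_block_sketch(struct vcpu_sketch *v)
{
    /* Set the flag; a relaxed RMW gives no ordering on its own. */
    atomic_fetch_or_explicit(&v->pause_flags, VPF_BLOCKED_SKETCH,
                             memory_order_relaxed);

    /* Full barrier: the event check below must not be hoisted above it. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&v->event_pending, memory_order_relaxed)) {
        /* An event arrived: undo the blocking and continue running. */
        atomic_fetch_and_explicit(&v->pause_flags, ~VPF_BLOCKED_SKETCH,
                                  memory_order_relaxed);
        return false;
    }
    return true;
}

/* Waker side: publish the event, then test-and-clear the blocked flag. */
bool vcpu_unblock_sketch(struct vcpu_sketch *v)
{
    atomic_store_explicit(&v->event_pending, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);

    unsigned int old = atomic_fetch_and_explicit(&v->pause_flags,
                                                 ~VPF_BLOCKED_SKETCH,
                                                 memory_order_relaxed);
    return (old & VPF_BLOCKED_SKETCH) != 0; /* true: a wakeup is needed */
}
```

With a full fence on both sides, at least one CPU is guaranteed to observe the other's store, so the vCPU cannot stay blocked with an event pending.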
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ash Wilding <ash.j.wilding@gmail.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Dario Faggioli <dfaggioli@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Julien Grall [Thu, 25 Feb 2021 16:33:23 +0000 (16:33 +0000)]
tools/xenstored: control: Store the save filename in lu_dump_state
The function lu_close_dump_state() will use talloc_asprintf() without
checking whether the allocation succeeded. In the unlikely case we are
out of memory, we would dereference a NULL pointer.
As we already computed the filename in lu_get_dump_state(), we can store
the name in the lu_dump_state. This avoids having to deal with memory
allocation in the close path and also reduces the risk of using a
different filename.
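The pattern can be sketched with plain malloc() instead of talloc. All names below are hypothetical, and removing the file on close is an assumption made for the example. The name is built and checked once when the dump is opened, and the close path only reuses it, so there is no allocation left there that could fail.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the dump state. */
struct dump_state_sketch {
    char *filename; /* computed once, when the dump is opened */
};

/* Returns 0 on success, -1 if the allocation fails. */
int dump_open_sketch(struct dump_state_sketch *s, const char *dir)
{
    /* sizeof() of the literal already accounts for the trailing NUL. */
    size_t len = strlen(dir) + sizeof("/state_dump");

    s->filename = malloc(len);
    if (!s->filename)
        return -1;

    snprintf(s->filename, len, "%s/state_dump", dir);
    return 0;
}

void dump_close_sketch(struct dump_state_sketch *s)
{
    /* Reuse the stored name: nothing is allocated on this path. */
    if (s->filename)
        remove(s->filename); /* assumed cleanup step, may fail harmlessly */

    free(s->filename);
    s->filename = NULL;
}
```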
This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.
Fixes: c0dc6a3e7c41 ("tools/xenstore: read internal state when doing live upgrade") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Julien Grall [Thu, 25 Feb 2021 15:43:04 +0000 (15:43 +0000)]
tools/xenstored: Avoid unnecessary talloc_strdup() in do_lu_start()
At the moment, the return of talloc_strdup() is not checked. This means
we may dereference a NULL pointer if the allocation failed.
However, it is pointless to allocate the memory as send_reply() will
copy the data to a different buffer. So drop the use of talloc_strdup().
This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.
Fixes: af216a99fb4a ("tools/xenstore: add the basic framework for doing the live update") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Fri, 26 Feb 2021 09:18:59 +0000 (10:18 +0100)]
VMX: delay p2m insertion of APIC access page
Inserting the mapping at domain creation time leads to a memory leak
when the creation fails later on and the domain uses separate CPU and
IOMMU page tables - the latter requires intermediate page tables to be
allocated, but there's no freeing of them at present in this case. Since
we don't need the p2m insertion to happen this early, avoid the problem
altogether by deferring it until the last possible point. This comes at
the price of not being able to handle an error other than by crashing
the domain.
Reported-by: Julien Grall <julien@xen.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 25 Feb 2021 14:09:26 +0000 (14:09 +0000)]
automation: Fix containerize to understand the Alpine container
This was missing from the work to add the alpine container.
Fixes: a9afe7768bd ("automation: add alpine linux 3.12 x86 build container") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 25 Feb 2021 14:39:09 +0000 (15:39 +0100)]
x86/PV: use get_unsafe() instead of copy_from_unsafe()
The former expands to a single (memory accessing) insn, which the latter
does not guarantee (the __builtin_constant_p() based switch() statement
there is just an optimization). Yet we'd prefer to read consistent PTEs
rather than risking a split read racing with an update done elsewhere.
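The concern can be illustrated in plain C, outside Xen. Both helpers below are hypothetical and are not the get_unsafe() or copy_from_unsafe() implementations. The first performs one aligned 64-bit volatile access, which compilers for x86-64 typically emit as a single load (an expectation about code generation, not a language guarantee). The second goes through a generic byte copy, where nothing rules out several smaller accesses.

```c
#include <stdint.h>
#include <string.h>

/* One access of the full width: the PTE is read as a unit. */
uint64_t read_pte_once(const volatile uint64_t *pte)
{
    return *pte;
}

/*
 * Generic copy: correct in the absence of concurrent writers, but the
 * copy routine is free to split the read, which could observe half of an
 * old PTE and half of a new one.
 */
uint64_t read_pte_bytewise(const void *pte)
{
    uint64_t val;

    memcpy(&val, pte, sizeof(val));
    return val;
}
```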
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 25 Feb 2021 14:38:35 +0000 (15:38 +0100)]
x86: move stac()/clac() from {get,put}_unsafe_asm() ...
... to {get,put}_unsafe_size(). There's no need to have the macros
expanded once per case label in the latter. This also makes the former
well-formed single statements again. No change in generated code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 25 Feb 2021 14:37:35 +0000 (15:37 +0100)]
x86: rename copy_{from,to}_user() to copy_{from,to}_guest_pv()
Bring them (back) in line with __copy_{from,to}_guest_pv(). Since it
falls in the same group, also convert clear_user(). Instead of adjusting
__raw_clear_guest(), drop it - it's unused and would require a non-
checking __clear_guest_pv() which we don't have.
Add previously missing __user at some call sites and in the function
declarations.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 25 Feb 2021 14:36:54 +0000 (15:36 +0100)]
x86/gdbsx: convert "user" to "guest" accesses
Using copy_{from,to}_user(), this code assumed it would only be called by
PV guests. Use copy_{from,to}_guest() instead, transforming the incoming
structure field into a guest handle (the field should really have been
one in the first place). Also do not transform the debuggee address into
a pointer.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 25 Feb 2021 14:11:58 +0000 (15:11 +0100)]
x86/EFI: suppress GNU ld 2.36'es creation of base relocs
All of a sudden ld creates base relocations itself, for PE
executables - as a result we now have two of them for every entity to
be relocated. While we will likely want to use this down the road, it
doesn't work quite right yet in corner cases, so rather than suppressing
our own way of creating the relocations we need to tell ld to avoid
doing so.
Probe whether --disable-reloc-section (which was introduced by the same
commit making relocation generation the default) is recognized by ld's PE
emulation, and use the option if so. (To limit redundancy, move the first
part of setting EFI_LDFLAGS earlier, and use it already while probing.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 25 Feb 2021 14:10:47 +0000 (15:10 +0100)]
x86: mirror compat argument translation area for 32-bit PV
Now that we guard the entire Xen VA space against speculative abuse
through hypervisor accesses to guest memory, the argument translation
area's VA also needs to live outside this range, at least for 32-bit PV
guests. To avoid extra is_hvm_*() conditionals, use the alternative VA
uniformly.
While this could be conditionalized upon CONFIG_PV32 &&
CONFIG_SPECULATIVE_HARDEN_GUEST_ACCESS, omitting such extra conditionals
keeps the code more legible imo.
Fixes: 4dc181599142 ("x86/PV: harden guest memory accesses against speculative abuse") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Julien Grall [Sat, 20 Feb 2021 14:04:12 +0000 (14:04 +0000)]
xen/vgic: Implement write to ISPENDR in vGICv{2, 3}
Currently, Xen will send a data abort to a guest trying to write to the
ISPENDR.
Unfortunately, recent versions of Linux (at least 5.9+) will start
writing to the register if the interrupt needs to be re-triggered
(see the callback irq_retrigger). This can happen when a driver (such as
the xgbe network driver on AMD Seattle) re-enables an interrupt:
Implementing the write part of ISPENDR is somewhat easy. For
virtual interrupt, we only need to inject the interrupt again.
For physical interrupt, we need to be more careful as the de-activation
of the virtual interrupt will be propagated to the physical distributor.
For simplicity, the physical interrupt will be set pending so the
workflow will not differ from a "real" interrupt.
Longer term, we could possibly directly activate the physical interrupt
and avoid taking an exception to inject the interrupt to the domain.
(This is the approach taken by the new vGIC based on KVM).
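The virtual-interrupt case described above can be sketched as a loop over the written value. Each ISPENDR register covers 32 interrupts, and only bits written as 1 have an effect. The function names and the callback type are hypothetical, and the physical-interrupt handling is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical callback used to (re-)inject one interrupt. */
typedef void (*inject_fn)(unsigned int irq, void *ctx);

/*
 * Sketch of a write to ISPENDR<reg_index>: every bit set in the written
 * value marks interrupt (reg_index * 32 + bit) pending again; zero bits
 * are ignored. Returns the number of interrupts injected.
 */
unsigned int ispendr_write_sketch(unsigned int reg_index, uint32_t value,
                                  inject_fn inject, void *ctx)
{
    unsigned int count = 0;

    for (unsigned int bit = 0; bit < 32; bit++) {
        if (value & (UINT32_C(1) << bit)) {
            inject(reg_index * 32 + bit, ctx);
            count++;
        }
    }
    return count;
}

/* Example callback: remember the last interrupt number injected. */
void record_last_irq(unsigned int irq, void *ctx)
{
    *(unsigned int *)ctx = irq;
}
```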
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Roger Pau Monné [Wed, 24 Feb 2021 15:31:12 +0000 (16:31 +0100)]
elfstructs: add relocation defines for i386
Those are needed by the rombios relocation code in hvmloader. Fixes the
following build error:
32bitbios_support.c: In function 'relocate_32bitbios':
32bitbios_support.c:130:18: error: 'R_386_PC32' undeclared (first use in this function); did you mean 'R_X86_64_PC32'?
case R_386_PC32:
^~~~~~~~~~
R_X86_64_PC32
32bitbios_support.c:130:18: note: each undeclared identifier is reported only once for each function it appears in
32bitbios_support.c:134:18: error: 'R_386_32' undeclared (first use in this function)
case R_386_32:
^~~~~~~~
Only add the two defines that are actually used, which seems to match
what we do for amd64.
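For reference, the two relocation types have the values 1 and 2 in the i386 ELF ABI, with R_386_32 computing S + A and R_386_PC32 computing S + A - P. The function below is a hypothetical sketch of how a relocation loop might use them; it is not the hvmloader code.

```c
#include <stdint.h>

/* Values as defined by the i386 ELF ABI. */
#define R_386_32   1 /* S + A     */
#define R_386_PC32 2 /* S + A - P */

/*
 * Hypothetical helper: compute the relocated 32-bit word.
 *   sym    - address of the symbol (S)
 *   addend - value stored at the place being relocated (A)
 *   place  - address of the place being relocated (P)
 * Unknown relocation types leave the addend untouched.
 */
uint32_t apply_i386_reloc(unsigned int type, uint32_t sym,
                          uint32_t addend, uint32_t place)
{
    switch (type) {
    case R_386_32:
        return sym + addend;
    case R_386_PC32:
        return sym + addend - place;
    default:
        return addend;
    }
}
```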
Fixes: 81b2b328a26c1b ('hvmloader: use Xen private header for elf structs') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Roger Pau Monné [Wed, 24 Feb 2021 11:48:13 +0000 (12:48 +0100)]
hvmloader: use Xen private header for elf structs
Do not use the system provided elf.h, and instead use elfstructs.h
from libelf.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Wed, 24 Feb 2021 11:47:34 +0000 (12:47 +0100)]
build: remove more absolute paths from dependency tracking files
d6b12add90da ("DEPS handling: Remove absolute paths from references to
cwd") took care of massaging the dependencies of the output file, but
for our passing of -MP to the compiler to take effect the same needs to
be done on the "phony" rules that the compiler emits.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Note the FreeBSD 11 task fails to build QEMU and is not part of this
patch.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 11 Feb 2021 06:39:03 +0000 (06:39 +0000)]
tools/libs: Write out an ABI analysis when abi-dumper is available
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 11 Feb 2021 06:29:31 +0000 (06:29 +0000)]
tools/libs: Add rule to generate headers.lst
abi-dumper needs a list of the public header files for shared objects, and
only accepts this in the form of a file.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Fri, 12 Feb 2021 11:51:04 +0000 (11:51 +0000)]
tools: Check for abi-dumper in ./configure
This will be optional. No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 11 Feb 2021 14:22:44 +0000 (14:22 +0000)]
tools: Use -Og for debug builds when available
The recommended optimisation level for debugging is -Og, and is what tools
such as gdb prefer. In practice, it equates to -O1 with a few specific
optimisations turned off.
abi-dumper in particular wants the libraries it inspects in this form.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 11 Feb 2021 15:29:12 +0000 (15:29 +0000)]
tools/libxl: Work around uninitialised variable libxl__domain_get_device_model_uid()
Various versions of gcc, when compiling with -Og, complain:
libxl_dm.c: In function 'libxl__domain_get_device_model_uid':
libxl_dm.c:256:12: error: 'kill_by_uid' may be used uninitialized in this function [-Werror=maybe-uninitialized]
256 | if (kill_by_uid)
| ^
The logic is very tangled. Set kill_by_uid on every path.
No functional change.
Requested-by: Ian Jackson <iwj@xenproject.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Not-acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Norbert Manthey [Fri, 19 Feb 2021 16:24:03 +0000 (17:24 +0100)]
x86/hvm: refactor set param
To prevent leaking HVM params via L1TF and similar issues on a
hyperthread pair, let's load values of domains only after performing all
relevant checks, and blocking speculative execution.
For both get and set, the value of the index is already checked in the
outer calling function. The block_speculation calls in hvmop_get_param
and hvmop_set_param are removed, because is_hvm_domain already blocks
speculation.
Furthermore, speculative barriers are re-arranged to make sure we do not
allow guests running on co-located VCPUs to leak hvm parameter values of
other domains.
To improve symmetry between the get and set operations, function
hvmop_set_param is made static.
This is part of the speculative hardening effort.
Reported-by: Hongyan Xia <hongyxia@amazon.co.uk> Signed-off-by: Norbert Manthey <nmanthey@amazon.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Fri, 19 Feb 2021 16:21:41 +0000 (17:21 +0100)]
x86/time: don't move TSC backwards in time_calibration_tsc_rendezvous()
While doing this for small amounts may be okay, the unconditional use
of CPU0's value here has been found to be a problem when the boot time
TSC of the BSP was behind that of all APs by more than a second. In
particular because of get_s_time_fixed() producing insane output when
the calculated delta is negative, we can't allow this to happen.
On the first iteration have all other CPUs sort out the highest TSC
value any one of them has read. On the second iteration, if that
maximum is higher than CPU0's, update its recorded value from that
taken in the first iteration. Use the resulting value on the last
iteration to write everyone's TSCs.
To account for the possible discontinuity, have
time_calibration_rendezvous_tail() record the newly written value, but
extrapolate local stime using the value read.
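The selection step can be sketched without any of the rendezvous synchronisation. The function and parameter names are hypothetical: given the value CPU0 read and the values the other CPUs read on the first iteration, the value to write is the larger of CPU0's and the maximum seen elsewhere, so no TSC is ever moved backwards.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the TSC value every CPU will write.
 *   cpu0_tsc - value read by CPU0
 *   ap_tsc   - values read by the other CPUs on the first iteration
 *   nr_aps   - number of entries in ap_tsc
 */
uint64_t rendezvous_tsc_to_write(uint64_t cpu0_tsc, const uint64_t *ap_tsc,
                                 size_t nr_aps)
{
    uint64_t max_ap = 0;

    for (size_t i = 0; i < nr_aps; i++)
        if (ap_tsc[i] > max_ap)
            max_ap = ap_tsc[i];

    /* Only replace CPU0's value if some other CPU is ahead of it. */
    return max_ap > cpu0_tsc ? max_ap : cpu0_tsc;
}
```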
Reported-by: Claudemir Todo Bom <claudemir@todobom.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Fri, 19 Feb 2021 16:21:12 +0000 (17:21 +0100)]
x86/time: adjust time recording in time_calibration_tsc_rendezvous()
The (stime,tsc) tuple is the basis for extrapolation by get_s_time().
Therefore the two better get taken as close to one another as possible.
This means two things: First, reading platform time is too early when
done on the first iteration. The closest we can get is on the last
iteration, immediately before telling other CPUs to write their TSCs
(and then also writing CPU0's). While at the first glance it may seem
not overly relevant when exactly platform time is read (when assuming
that only stime is ever relevant anywhere, and hence the association
with the precise TSC values is of lower interest), both CPU frequency
changes and the effects of SMT make it unpredictable (between individual
rendezvous instances) how long the loop iterations will take. This will
in turn lead to a higher error than necessary in how close to linear
stime movement we can get.
Second, re-reading the TSC for local recording is increasing the overall
error as well, when we already know a more precise value - the one just
written.
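Why the two halves of the (stime,tsc) tuple must belong together can be seen from a simplified model of the extrapolation. The structure, the names and the plain kHz scaling are assumptions for illustration; Xen's get_s_time() uses its own scaling machinery. Any skew between when stime and tsc were sampled shows up directly in every later result, and a TSC value earlier than the recorded one wraps to a huge unsigned delta.

```c
#include <stdint.h>

/* Hypothetical recorded stamp: system time in ns and the matching TSC. */
struct time_stamp_sketch {
    uint64_t stime_ns;
    uint64_t tsc;
};

/*
 * Simplified extrapolation: stime = stamp.stime + (tsc_now - stamp.tsc)
 * converted to ns. tsc_khz is the assumed TSC frequency in kHz, so
 * ticks * 1000000 / tsc_khz yields nanoseconds.
 */
uint64_t extrapolate_stime(const struct time_stamp_sketch *s,
                           uint64_t tsc_now, uint64_t tsc_khz)
{
    /* Unsigned arithmetic: wraps if tsc_now is behind the stamp. */
    uint64_t delta = tsc_now - s->tsc;

    return s->stime_ns + delta * UINT64_C(1000000) / tsc_khz;
}
```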
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Fri, 19 Feb 2021 16:20:46 +0000 (17:20 +0100)]
x86/time: change initiation of the calibration timer
Setting the timer a second (EPOCH) into the future at a random point
during boot (prior to bringing up APs and prior to launching Dom0) does
not yield predictable results: The timer may expire while we're still
bringing up APs (too early) or when Dom0 already boots (too late).
Instead invoke the timer handler function explicitly at a predictable
point in time, once we've established the rendezvous function to use
(and hence also once all APs are online). This will, through the raising
and handling of TIMER_SOFTIRQ, then also have the effect of arming the
timer.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Fri, 19 Feb 2021 16:19:56 +0000 (17:19 +0100)]
x86/PV: harden guest memory accesses against speculative abuse
Inspired by
https://lore.kernel.org/lkml/f12e7d3cecf41b2c29734ea45a393be21d4a8058.1597848273.git.jpoimboe@redhat.com/
and prior work in that area of x86 Linux, suppress speculation with
guest specified pointer values by suitably masking the addresses to
non-canonical space in case they fall into Xen's virtual address range.
Introduce a new Kconfig control.
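The arithmetic of the masking can be sketched in C. This shows the address transformation only: a compiler may turn the conditional into a branch, which is exactly what real hardening code has to avoid, so this is not a speculation-safe implementation. The upper bound of Xen's range used here is an assumed value for the example, and all names are hypothetical.

```c
#include <stdint.h>

/* Assumed end of the Xen-owned address range, for illustration only. */
#define XEN_VIRT_END_SKETCH UINT64_C(0xffff880000000000)

/*
 * Addresses above the assumed Xen range pass through unchanged. All
 * others have bit 63 cleared: lower-half addresses already have it clear,
 * while addresses inside Xen's range become non-canonical and fault.
 */
uint64_t guest_access_mask_ptr_sketch(uint64_t ptr)
{
    uint64_t mask = (ptr > XEN_VIRT_END_SKETCH - 1) ? ~UINT64_C(0)
                                                    : ~UINT64_C(0) >> 1;

    return ptr & mask;
}

/* Canonical on x86-64 with 48-bit VAs: bits 63..47 all equal. */
int is_canonical_sketch(uint64_t addr)
{
    uint64_t top = addr >> 47;

    return top == 0 || top == UINT64_C(0x1ffff);
}
```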
Note that it is necessary in such code to avoid using "m" kind operands:
If we didn't, there would be no guarantee that the register passed to
guest_access_mask_ptr is also the (base) one used for the memory access.
As a minor unrelated change in get_unsafe_asm() the unnecessary "itype"
parameter gets dropped and the XOR on the fixup path gets changed to be
a 32-bit one in all cases: This way we avoid pointless REX.W or operand
size overrides, or writes to partial registers.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Fri, 19 Feb 2021 16:19:19 +0000 (17:19 +0100)]
x86: split __copy_{from,to}_user() into "guest" and "unsafe" variants
The "guest" variants are intended to work with (potentially) fully guest
controlled addresses, while the "unsafe" variants are intended to be
used in order to access addresses not (directly) under guest control,
within Xen's part of virtual address space. Subsequently we will want
them to have distinct behavior, so as first step identify which one is
which. For now, both groups of constructs alias one another.
Double underscore prefixes are retained only on
__copy_{from,to}_guest_pv(), to allow still distinguishing them from
their "checking" counterparts once they also get renamed (to
copy_{from,to}_guest_pv()).
Add previously missing __user at some call sites.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> [shadow] Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Fri, 19 Feb 2021 16:18:27 +0000 (17:18 +0100)]
x86: split __{get,put}_user() into "guest" and "unsafe" variants
The "guest" variants are intended to work with (potentially) fully guest
controlled addresses, while the "unsafe" variants are intended to be
used in order to access addresses not (directly) under guest control,
within Xen's part of virtual address space. (For linear page table and
descriptor table accesses the low bits of the addresses may still be
guest controlled, but this still won't allow speculation to "escape"
into unwanted areas.) Subsequently we will want them to have distinct
behavior, so as first step identify which one is which. For now, both
groups of constructs alias one another.
Double underscore prefixes are retained only on __{get,put}_guest(), to
allow still distinguishing them from their "checking" counterparts once
they also get renamed (to {get,put}_guest()).
Since for them it's almost a full re-write, move what becomes
{get,put}_unsafe_size() into the "common" uaccess.h (x86_64/*.h should
disappear at some point anyway).
In __copy_to_user() one of the two casts in each put_guest_size()
invocation gets dropped. They're not needed and did break symmetry with
__copy_from_user().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> [shadow] Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Rahul Singh [Wed, 17 Feb 2021 10:05:14 +0000 (10:05 +0000)]
xen/arm : smmuv3: Fix to handle multiple StreamIds per device.
The SMMUv3 driver does not handle multiple StreamIDs if the master device
supports more than one StreamID.
This bug was introduced when the driver was ported from Linux to XEN.
dt_device_set_protected(..) should be called from add_device(..) not
from the dt_xlate(..).
Move dt_device_set_protected(..) from dt_xlate(..) to add_device().
Jan Beulich [Thu, 18 Feb 2021 12:16:59 +0000 (13:16 +0100)]
gnttab: bypass IOMMU (un)mapping when a domain is (un)mapping its own grant
Mappings for a domain's own pages should already be present in the
IOMMU. While installing the same mapping again is merely redundant (and
inefficient), removing the mapping when the grant mapping gets removed
is outright wrong in this case: The mapping was there before the map, so
should remain in place after unmapping.
This affects
- Arm Dom0 in the direct mapped case,
- x86 PV Dom0 in the "iommu=dom0-strict" / "dom0-iommu=strict" case,
- all x86 PV DomU-s, including driver domains.
See the code comment for why it's the original domain and not the page
owner that gets compared against.
Jan Beulich [Thu, 18 Feb 2021 12:16:12 +0000 (13:16 +0100)]
gnttab: never permit mapping transitive grants
Transitive grants allow an intermediate domain I to grant a target
domain T access to a page which origin domain O did grant I access to.
As an implementation restriction, T is not allowed to map such a grant.
Enforcement of this restriction is currently attempted by marking active
entries resulting from transitive grants as is-sub-page; sub-page grants
for obvious reasons don't allow mapping. However, marking (and checking)
only active entries is insufficient, as a map attempt may also occur on
a grant not otherwise in use. When not presently in use (pin count zero)
the grant type itself needs checking. Otherwise T may be able to map an
unrelated page owned by I. This is because the "transitive" sub-
structure of the v2 union would end up being interpreted as "full_page"
sub-structure instead. The low 32 bits of the GFN used would match the
grant reference specified in I's transitive grant entry, while the upper
32 bits could be random (depending on how exactly I sets up its grant
table entries).
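The misinterpretation can be reproduced with two structs that mirror the overlapping "full_page" and "transitive" layouts. The struct names are hypothetical, the field layout is modelled on the v2 grant entry as described above, and a little-endian machine is assumed. The raw bytes are written as a transitive entry and then read back as a full-page entry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of the two overlapping v2 sub-structures. */
struct hdr_sketch { uint16_t flags; uint16_t domid; };

struct full_page_sketch {
    struct hdr_sketch hdr;
    uint32_t pad0;
    uint64_t frame;
};

struct transitive_sketch {
    struct hdr_sketch hdr;
    uint16_t trans_domid;
    uint16_t pad0;
    uint32_t gref;
};

/*
 * Build the raw bytes of a transitive entry, followed by whatever bits
 * happened to sit after gref, then read them back as a full_page entry.
 * On a little-endian machine the returned frame has gref in its low
 * 32 bits and the stale bits in its upper 32 bits.
 */
uint64_t frame_seen_for_transitive(uint32_t gref, uint32_t stale_upper)
{
    unsigned char raw[sizeof(struct full_page_sketch)] = { 0 };
    struct full_page_sketch fp;
    size_t off = offsetof(struct transitive_sketch, gref);

    memcpy(raw + off, &gref, sizeof(gref));
    memcpy(raw + off + sizeof(gref), &stale_upper, sizeof(stale_upper));

    memcpy(&fp, raw, sizeof(fp));
    return fp.frame;
}
```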
Note that if one mapping already exists and the granting domain _then_
changes the grant to GTF_transitive (which the domain is not supposed to
do), the changed type will only be honored after the pin count has gone
back to zero. This is no different from e.g. GTF_readonly or
GTF_sub_page becoming set when a grant is already in use.
While adjusting the implementation, also adjust commentary in the public
header to better reflect reality.
Fixes: 3672ce675c93 ("Transitive grant support") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Thu, 18 Feb 2021 12:11:19 +0000 (13:11 +0100)]
IOREQ: refine when to send mapcache invalidation request
XENMEM_decrease_reservation isn't the only means by which pages can get
removed from a guest, yet all removals ought to be signaled to qemu. Put
setting of the flag into the central p2m_remove_page() underlying all
respective hypercalls as well as a few similar places, mainly in PoD
code.
Additionally there's no point sending the request for the local domain
when the domain acted upon is a different one. The latter domain's ioreq
server mapcaches need invalidating. We assume that domain to be paused
at the point the operation takes place, so sending the request in this
case happens from the hvm_do_resume() path, which as one of its first
steps calls handle_hvm_io_completion().
Even without the remote operation aspect a single domain-wide flag
doesn't do: Guests may e.g. decrease-reservation on multiple vCPU-s in
parallel. Each of them needs to issue an invalidation request in due
course, in particular because exiting to guest context should not happen
before the request was actually seen by (all) the emulator(s).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Julien Grall <jgrall@amazon.com>
Andrew Cooper [Thu, 11 Feb 2021 21:10:51 +0000 (21:10 +0000)]
stubdom/xenstored: Fix uninitialised variables in lu_read_state()
Various versions of gcc, when compiling with -Og, complain:
xenstored_control.c: In function ‘lu_read_state’:
xenstored_control.c:540:11: error: ‘state.size’ is used uninitialized in this
function [-Werror=uninitialized]
if (state.size == 0)
~~~~~^~~~~
xenstored_control.c:543:6: error: ‘state.buf’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
pre = state.buf;
~~~~^~~~~~~~~~~
xenstored_control.c:550:23: error: ‘state.buf’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
(void *)head - state.buf < state.size;
~~~~~^~~~
xenstored_control.c:550:35: error: ‘state.size’ may be used uninitialized in
this function [-Werror=maybe-uninitialized]
(void *)head - state.buf < state.size;
~~~~~^~~~~
for the stubdom build. This is because lu_get_dump_state() is a no-op stub in
MiniOS, and state really is operated on uninitialised.
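The commit text does not show the fix itself. One common way to make such code safe is to initialise the structure before handing it to a function that may be a no-op. The sketch below uses hypothetical names throughout and only mirrors the shape of the problem.

```c
#include <assert.h>
#include <stddef.h>

struct dump_state_model {
    void *buf;
    size_t size;
};

/* Stand-in for a platform where fetching the state does nothing. */
static void get_dump_state_noop(struct dump_state_model *state)
{
    (void)state;
}

/* Returns -1 when there is nothing to read, 0 otherwise. */
static int read_state_model(void (*get)(struct dump_state_model *))
{
    /* Initialised up front, so a no-op `get` leaves well-defined values. */
    struct dump_state_model state = { NULL, 0 };

    assert(get != NULL);
    get(&state);

    if (state.size == 0)
        return -1;

    /* ... walk state.buf up to state.size ... */
    return 0;
}
```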
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 11 Feb 2021 17:44:36 +0000 (17:44 +0000)]
tools/libxl: Fix uninitialised variable in libxl__write_stub_dmargs()
Various versions of gcc, when compiling with -Og, complain:
libxl_dm.c: In function ‘libxl__write_stub_dmargs’:
libxl_dm.c:2166:16: error: ‘dmargs’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
rc = libxl__xs_write_checked(gc, t, path, dmargs);
~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It isn't actually used while uninitialised, but only because of how the
is_linux_stubdom checks line up.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 11 Feb 2021 17:45:21 +0000 (17:45 +0000)]
tools/libxg: Drop stale p2m logic from ARM's meminit()
Various versions of gcc, when compiling with -Og, complain:
xg_dom_arm.c: In function 'meminit':
xg_dom_arm.c:420:19: error: 'p2m_size' may be used uninitialized in this function [-Werror=maybe-uninitialized]
420 | dom->p2m_size = p2m_size;
| ~~~~~~~~~~~~~~^~~~~~~~~~
This is actually entirely stale code since ee21f10d70^..97e34ad22d which
removed the 1:1 identity p2m for translated domains.
Drop the write of dom->p2m_size, and the p2m_size local variable. Reposition
the p2m_size field in struct xc_dom_image and correct some stale
documentation.
This change really ought to have been part of the original cleanup series.
No actual change to how ARM domains are constructed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 11 Feb 2021 14:25:57 +0000 (14:25 +0000)]
tools/libxg: Fix uninitialised variable in write_x86_cpu_policy_records()
Various versions of gcc, when compiling with -Og, complain:
xg_sr_common_x86.c: In function 'write_x86_cpu_policy_records':
xg_sr_common_x86.c:92:12: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
92 | return rc;
| ^~
The complaint is legitimate, and can occur with unexpected behaviour of two
related hypercalls in combination with a libc which permits zero-length
malloc()s.
Have an explicit rc = 0 on the success path, and make the MSRs record error
handling consistent with the CPUID record before it.
Fixes: f6b2b8ec53d ("libxc/save: Write X86_{CPUID,MSR}_DATA records") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 11 Feb 2021 18:49:23 +0000 (18:49 +0000)]
tools/xl: Fix exit code for `xl vkbattach`
Various versions of gcc, when compiling with -Og, complain:
xl_vkb.c: In function 'main_vkbattach':
xl_vkb.c:79:12: error: 'rc' may be used uninitialized in this function [-Werror=maybe-uninitialized]
79 | return rc;
| ^~
The dryrun_only path really does leave rc uninitialised. Introduce a done
label for success paths to use.
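The control flow can be sketched as follows. This is a reduced model, not the xl source: the option struct, its fields and the error value 1 are hypothetical, and only the "done" label for success paths is taken from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct attach_opts {
    bool dryrun_only;        /* print the config, do not touch the device */
    bool device_add_fails;   /* hypothetical knob to simulate a failure */
};

static int attach_model(const struct attach_opts *o)
{
    int rc;

    assert(o != NULL);

    if (o->dryrun_only)
        goto done;           /* success path that never assigns rc itself */

    if (o->device_add_fails) {
        rc = 1;
        goto out;
    }

done:
    rc = 0;                  /* every success path funnels through here */
out:
    return rc;
}
```

Because each success path passes through "done", rc is always assigned before it is returned.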
Fixes: a15166af7c3 ("xl: add vkb config parser and CLI") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Julien Grall [Thu, 21 Jan 2021 10:16:08 +0000 (10:16 +0000)]
xen/page_alloc: Only flush the page to RAM once we know they are scrubbed
At the moment, each page is flushed to RAM just after the allocator has
found some free pages. However, this happens before checking whether the
page was scrubbed.
As a consequence, on Arm, a guest may be able to access the old content
of the scrubbed pages if it has cache disabled (default at boot) and
the content didn't reach the Point of Coherency.
The flush is now moved to after the point where we know the content of the
page will not change. This also has the benefit of reducing the amount of
work happening with the heap_lock held.
This is XSA-364.
Fixes: 307c3be3ccb2 ("mm: Don't scrub pages while holding heap lock in alloc_heap_pages()") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
On ARM we need gnttab_need_iommu_mapping to be true for dom0 when it is
directly mapped and IOMMU is enabled for the domain, like the old check
did, but the new check is always false.
In fact, need_iommu_pt_sync is defined as dom_iommu(d)->need_sync and
need_sync is set as:
if ( !is_hardware_domain(d) || iommu_hwdom_strict )
hd->need_sync = !iommu_use_hap_pt(d);
iommu_use_hap_pt(d) means that the page-table used by the IOMMU is the
P2M. It is true on ARM. need_sync means that you have a separate IOMMU
page-table and it needs to be updated for every change. need_sync is set
to false on ARM. Hence, gnttab_need_iommu_mapping(d) is false too,
which is wrong.
As a consequence, when using PV network from a domU on a system where
IOMMU is on from Dom0, I get:
(XEN) smmu: /smmu@fd800000: Unhandled context fault: fsr=0x402, iova=0x8424cb148, fsynr=0xb0001, cb=0
[ 68.290307] macb ff0e0000.ethernet eth0: DMA bus error: HRESP not OK
The fix is to go back to something along the lines of the old
implementation of gnttab_need_iommu_mapping.
xen: workaround missing device_type property in pci/pcie nodes
PCI buses differ from default buses in a few important ways, so it is
important to detect them properly. Normally, PCI buses are expected to
have the following property:
device_type = "pci"
In reality, it is not always the case. To handle PCI bus nodes that
don't have the device_type property, also consider the node name: if the
node name is "pcie" or "pci" then consider the bus as a PCI bus.
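The matching rule can be written down as a small predicate. This is a sketch with a hypothetical function name, operating on plain strings rather than on device tree nodes; node_name is assumed to be the name without any "@unit-address" suffix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* device_type may be NULL when the property is missing from the node. */
static bool looks_like_pci_bus(const char *device_type, const char *node_name)
{
    assert(node_name != NULL);

    /* The normal case: the property is present and says "pci". */
    if (device_type != NULL && strcmp(device_type, "pci") == 0)
        return true;

    /* Fallback: no usable device_type, so go by the node name. */
    return strcmp(node_name, "pcie") == 0 || strcmp(node_name, "pci") == 0;
}
```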
This commit is based on the Linux kernel commit d1ac0002dd29 "of: address: Work around missing device_type property in
pcie nodes".
This fixes Xen boot on RPi4. Some RPi4 kernels have the following node
on their device trees:
The pci@1,0 node is a PCI bus. If we parse the node and its children as
a default bus, the reg property under usb@1,0 would have to be
interpreted as an address range mappable by the CPU, which is not the
case and would break.
Jan Beulich [Thu, 11 Feb 2021 16:53:10 +0000 (17:53 +0100)]
x86emul: fix SYSENTER/SYSCALL switching into 64-bit mode
When invoked by compat mode, mode_64bit() will be false at the start of
emulation. The logic after complete_insn, however, needs to consider the
mode switched into, in particular to avoid truncating RIP.
Inspired by / paralleling and extending Linux commit 943dea8af21b ("KVM:
x86: Update emulator context mode if SYSENTER xfers to 64-bit mode").
While there, tighten a related assertion in x86_emulate_wrapper() - we
want to be sure to not switch into an impossible mode when the code gets
built for 32-bit only (as is possible for the test harness).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 9 Feb 2021 15:28:57 +0000 (15:28 +0000)]
x86/ucode/amd: Fix microcode payload size for Fam19 processors
The original limit provided wasn't accurate. Blobs are in fact rather larger.
Fixes: fe36a173d1 ("x86/amd: Initial support for Fam19h processors") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Tue, 9 Feb 2021 20:49:07 +0000 (20:49 +0000)]
x86/ucode/amd: Handle length sanity check failures more gracefully
Currently, a failure of verify_patch_size() causes an early abort of the
microcode blob loop, which in turn causes a second go around the main
container loop, ultimately failing the UCODE_MAGIC check.
First, check for errors after the blob loop. An error here is unrecoverable,
so avoid going around the container loop again and printing an
unhelpful-at-best error concerning bad UCODE_MAGIC.
Second, split the verify_patch_size() check out of the microcode blob header
check. In the case that the sanity check fails, we can still use the
known-to-be-plausible header length to continue walking the container to
potentially find other applicable microcode blobs.
Before:
(XEN) microcode: Bad microcode data
(XEN) microcode: Wrong microcode patch file magic
(XEN) Parsing microcode blob error -22
After:
(XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa000
(XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa010
(XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa011
(XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa200
(XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa210
(XEN) microcode: Bad microcode length 0x000015c0 for cpu 0xa500
(XEN) microcode: couldn't find any matching ucode in the provided blob!
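The idea of stepping over a blob that fails the size check, instead of aborting the walk, can be modelled like this. The record format (a 32-bit length in host byte order followed by the payload) and all names are assumptions for the sketch, not the AMD container format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_HDR_LEN sizeof(uint32_t)   /* hypothetical record header */

/*
 * Returns the number of records whose payload passes the size check,
 * or -1 if the container is truncated (an unrecoverable error).
 */
static int count_usable_records(const unsigned char *buf, size_t size,
                                uint32_t max_len)
{
    int usable = 0;

    assert(buf != NULL);

    while (size > 0) {
        uint32_t len;

        if (size < REC_HDR_LEN)
            return -1;             /* cannot even read the header */
        memcpy(&len, buf, REC_HDR_LEN);
        buf += REC_HDR_LEN;
        size -= REC_HDR_LEN;

        if (len > size)
            return -1;             /* header length is not plausible */

        if (len <= max_len)
            usable++;              /* passed the size check */

        /* Either way, the header length lets us reach the next record. */
        buf += len;
        size -= len;
    }

    return usable;
}
```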
Fixes: 4de936a38a ("x86/ucode/amd: Rework parsing logic in cpu_request_microcode()") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 9 Feb 2021 22:10:54 +0000 (22:10 +0000)]
x86/ucode/amd: Fix OoB read in cpu_request_microcode()
verify_patch_size() is a maximum size check, and doesn't have a minimum bound.
If the microcode container encodes a blob with a length less than 64 bytes,
the subsequent calls to microcode_fits()/compare_header() may read off the end
of the buffer.
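A two-sided check of the kind implied here might look as follows. The function name is hypothetical, and treating the 64 bytes mentioned above as the minimum acceptable length is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BLOB_MIN_LEN 64u   /* the 64 bytes named in the text */

/* Accept a blob length only if it is within both bounds. */
static bool blob_len_ok(size_t len, size_t max_len)
{
    assert(max_len >= BLOB_MIN_LEN);
    return len >= BLOB_MIN_LEN && len <= max_len;
}
```

A maximum-only check would accept a length of, say, 8, after which code reading a full header would run past the end of the data.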
Fixes: 4de936a38a ("x86/ucode/amd: Rework parsing logic in cpu_request_microcode()") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Thu, 4 Feb 2021 09:38:33 +0000 (10:38 +0100)]
autoconf: check endian.h include path
Introduce an autoconf macro to check for the include path of certain
headers that can be different between OSes.
Use such macro to find the correct path for the endian.h header, and
modify the users of endian.h to use the output of such check.
Suggested-by: Ian Jackson <iwj@xenproject.org> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Olaf Hering [Tue, 9 Feb 2021 15:45:35 +0000 (16:45 +0100)]
xl: optionally print timestamps when running xl commands
Add a global option "-T" to xl to enable timestamps in the output from
libxl and libxc. This is most useful with long running commands such
as "migrate".
During 'xl -v.. migrate domU host' a large amount of debug output is generated.
It is difficult to map each line to the sending and receiving side.
Also the time spent for migration is not reported.
With 'xl -T migrate domU host' both sides will print timestamps and
also the pid of the invoked xl process to make it more obvious which
side produced a given log line.
Note: depending on the command, xl itself also produces other output
which does not go through libxentoollog. As a result such output will
not have timestamps prepended.
This change also adds the missing "-t" flag to "xl help" output.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Release-Acked-by: Ian Jackson <iwj@xenproject.org> Reviewed-by: Ian Jackson <iwj@xenproject.org>
Olaf Hering [Tue, 9 Feb 2021 15:45:33 +0000 (16:45 +0100)]
tools: move CONFIG_DIR and XEN_CONFIG_DIR in paths.m4
Upcoming changes need to reuse XEN_CONFIG_DIR.
In its current location the assignment happens too late. Move it up
in the file, along with CONFIG_DIR. Their only dependency is
sysconfdir, which may also be adjusted in this file.
No functional change intended.
[autoconf rerun -iwj]
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Ian Jackson [Tue, 9 Feb 2021 17:05:54 +0000 (17:05 +0000)]
tools: Regenerate autoconf
This seems to have been omitted in many recent commits. The earliest
of which are, according to git-bisect:
  154137dfdba3  stubdom/configure      stubdom: add xenstore pvh stubdom
  cc83ee4c6c37  all configure scripts  NetBSD: Fix lock directory path
but it seems that this is true of several later commits too.
Release status: I consider this discrepancy a release critical bug.
Signed-off-by: Ian Jackson <iwj@xenproject.org> Release-acked-by: Ian Jackson <iwj@xenproject.org>
Olaf Hering [Mon, 11 Jan 2021 17:41:48 +0000 (18:41 +0100)]
docs: remove stale create example from xl.1
Maybe xm create had a feature to create a domU based on a configuration
file. xl create requires the '-f' option to refer to a file.
There is no code to look into XEN_CONFIG_DIR, so remove the example.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Mon, 8 Feb 2021 14:36:32 +0000 (14:36 +0000)]
tools/libxl: Fix ARM build
Fixes: 804fe751375 ("tools/libxl: pass libxl__domain_build_state to libxl__arch_domain_create") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Igor Druzhinin [Wed, 3 Feb 2021 20:07:04 +0000 (20:07 +0000)]
tools/libxl: only set viridian flags on new domains
Domains migrating or restoring should already have the viridian HVM param key
in the migration stream, and setting it twice results in Xen returning -EEXIST
on the second attempt (later, during migration stream parsing) in case the
values don't match. That causes the migration/restore operation to fail on the
destination side.
That issue has now resurfaced with the latest commits (983524671 and 7e5cffcd1e)
extending the default viridian feature set, making the values from previous
migration streams and those set at domain construction different.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Igor Druzhinin [Wed, 3 Feb 2021 20:07:03 +0000 (20:07 +0000)]
tools/libxl: pass libxl__domain_build_state to libxl__arch_domain_create
No functional change.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Tamas K Lengyel [Sat, 30 Jan 2021 13:36:37 +0000 (08:36 -0500)]
x86/vm_event: add response flag to reset vmtrace buffer
Allow resetting the vmtrace buffer in response to a vm_event. This can be used
to optimize a use-case where detecting a looped vmtrace buffer is important.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Tamas K Lengyel [Mon, 18 Jan 2021 17:46:37 +0000 (12:46 -0500)]
x86/vm_event: Carry the vmtrace buffer position in vm_event
Add vmtrace_pos field to x86 regs in vm_event. Initialized to ~0 if
vmtrace is not in use.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Tamas K Lengyel [Fri, 11 Sep 2020 18:14:00 +0000 (20:14 +0200)]
xen/vmtrace: support for VM forks
Implement vmtrace_reset_pt function. Properly set IPT
state for VM forks.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Add a demonstration tool that uses xc_vmtrace_* calls in order
to manage external IPT monitoring for DomU.
Signed-off-by: Michał Leszczyński <michal.leszczynski@cert.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Add functions in libxc that use the new XEN_DOMCTL_vmtrace interface.
Signed-off-by: Michał Leszczyński <michal.leszczynski@cert.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Implement an interface to configure and control tracing operations. Reuse the
existing SETDEBUGGING flask vector rather than inventing a new one.
Userspace using this interface is going to need platform specific knowledge
anyway to interpret the contents of the trace buffer. While some operations
(e.g. enable/disable) can reasonably be generic, others cannot. Provide an
explicitly-platform specific pair of get/set operations to reduce API churn as
new options get added/enabled.
For the VMX specific Processor Trace implementation, tolerate reading and
modifying a safe subset of bits in CTL, STATUS and OUTPUT_MASK. This permits
userspace to control the content which gets logged, but prevents modification
of details such as the position/size of the output buffer.
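Restricting writes to a safe subset of bits is commonly done with a mask. The sketch below uses a hypothetical helper and rejects any write that would change a bit outside the mask; whether the real code rejects or silently masks is not stated in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Update *reg to val only if every changed bit lies inside safe_mask.
 * Returns 0 on success, -1 if a protected bit would change.
 */
static int set_masked(uint64_t *reg, uint64_t val, uint64_t safe_mask)
{
    assert(reg != NULL);

    if ((*reg ^ val) & ~safe_mask)
        return -1;

    *reg = val;
    return 0;
}
```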
Signed-off-by: Michał Leszczyński <michal.leszczynski@cert.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Add CPUID/MSR enumeration details for Processor Trace. For now, we will only
support its use inside VMX operation. Fill in the vmtrace_available boolean
to activate the newly introduced common infrastructure for allocating trace
buffers.
For now, Processor Trace is going to be operated in Single Output mode behind
the guest's back. Add the MSRs to struct vcpu_msrs, and set up the buffer
limit in vmx_init_ipt() as it is fixed for the lifetime of the domain.
Context switch most of the MSRs in and out of vCPU context, but the main
control register needs to reside in the MSR load/save lists. Explicitly pull
the msrs pointer out into a local variable, because the optimiser cannot keep
it live across the memory clobbers in the MSR accesses.
Signed-off-by: Michał Leszczyński <michal.leszczynski@cert.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Allow mapping the processor trace buffer using acquire_resource().
Signed-off-by: Michał Leszczyński <michal.leszczynski@cert.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Allow specifying the size of the per-vCPU trace buffer upon
domain creation. This is zero by default (meaning: not enabled).
Signed-off-by: Michał Leszczyński <michal.leszczynski@cert.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
To use vmtrace, buffers of a suitable size need allocating, and different
tasks will want different sizes.
Add a domain creation parameter, and audit it appropriately in the
{arch_,}sanitise_domain_config() functions.
For now, the x86 specific auditing is tuned to Processor Trace running in
Single Output mode, which requires a single contiguous range of memory.
The size is given an arbitrary limit of 64M which is expected to be enough for
anticipated usecases, but not large enough to get into long-running-hypercall
problems.
Signed-off-by: Michał Leszczyński <michal.leszczynski@cert.pl> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Mon, 27 Jul 2020 16:24:11 +0000 (17:24 +0100)]
xen/memory: Fix mapping grant tables with XENMEM_acquire_resource
A guest's default number of grant frames is 64, and XENMEM_acquire_resource
will reject an attempt to map more than 32 frames. This limit is caused by
the size of mfn_list[] on the stack.
Fix mapping of arbitrary size requests by looping over batches of 32 in
acquire_resource(), and using hypercall continuations when necessary.
To start with, break _acquire_resource() out of acquire_resource() to cope
with type-specific dispatching, and update the return semantics to indicate
the number of mfns returned. Update gnttab_acquire_resource() and x86's
arch_acquire_resource() to match these new semantics.
Have do_memory_op() pass start_extent into acquire_resource() so it can pick
up where it left off after a continuation, and loop over batches of 32 until
all the work is done, or a continuation needs to occur.
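The batching and resume logic can be modelled in a few lines. This is a sketch only: the function name and the batches_per_call parameter (a stand-in for a preemption check) are hypothetical, and the actual mapping work is elided.

```c
#include <assert.h>

#define BATCH 32u   /* batch size named in the text */

/*
 * Process frames [start, total) in batches of at most BATCH, doing no more
 * than batches_per_call batches. Returns the index to resume from; a return
 * value equal to total means all the work is done.
 */
static unsigned int process_frames(unsigned int start, unsigned int total,
                                   unsigned int batches_per_call)
{
    assert(start <= total && batches_per_call > 0);

    while (start < total && batches_per_call > 0) {
        unsigned int todo = total - start;

        if (todo > BATCH)
            todo = BATCH;

        /* ... map `todo` frames beginning at index `start` ... */

        start += todo;
        batches_per_call--;
    }

    return start;
}
```

A caller that gets back a value below total would re-enter with that value as the new start, which is the role start_extent plays across a continuation.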
compat_memory_op() is a bit more complicated, because it also has to marshal
frame_list in the XLAT buffer. Have it account for continuation information
itself and hide details from the upper layer, so it can marshal the buffer in
chunks if necessary.
With these fixes in place, it is now possible to map the whole grant table for
a guest.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Fri, 5 Feb 2021 13:09:42 +0000 (14:09 +0100)]
x86/EFI: work around GNU ld 2.36 issue
Our linker capability check fails with the recent binutils release's ld:
.../check.o:(.debug_aranges+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_info'
.../check.o:(.debug_info+0x6): relocation truncated to fit: R_X86_64_32 against `.debug_abbrev'
.../check.o:(.debug_info+0xc): relocation truncated to fit: R_X86_64_32 against `.debug_str'+76
.../check.o:(.debug_info+0x11): relocation truncated to fit: R_X86_64_32 against `.debug_str'+d
.../check.o:(.debug_info+0x15): relocation truncated to fit: R_X86_64_32 against `.debug_str'+2b
.../check.o:(.debug_info+0x29): relocation truncated to fit: R_X86_64_32 against `.debug_line'
.../check.o:(.debug_info+0x30): relocation truncated to fit: R_X86_64_32 against `.debug_str'+19
.../check.o:(.debug_info+0x37): relocation truncated to fit: R_X86_64_32 against `.debug_str'+71
.../check.o:(.debug_info+0x3e): relocation truncated to fit: R_X86_64_32 against `.debug_str'
.../check.o:(.debug_info+0x45): relocation truncated to fit: R_X86_64_32 against `.debug_str'+5e
.../check.o:(.debug_info+0x4c): additional relocation overflows omitted from the output
Tell the linker to strip debug info as a workaround. Debug info has been
getting stripped already anyway when linking the actual xen.efi.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Fri, 5 Feb 2021 12:19:38 +0000 (13:19 +0100)]
tools/tests: fix resource test build on FreeBSD
error.h is not a standard header, and none of the functions declared
there are actually used by the code. This fixes the build on FreeBSD,
which doesn't have error.h.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 23 Jul 2020 16:26:16 +0000 (17:26 +0100)]
tools/tests: Introduce a test for acquire_resource
For now, simply try to map 40 frames of grant table. This catches most of the
basic errors with resource sizes found and fixed through the 4.15 dev window.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Manuel Bouyer [Wed, 3 Feb 2021 16:54:19 +0000 (17:54 +0100)]
tools/xenstored: close socket connections on error
On error, don't keep socket connections in the ignored state, but close them.
When the remote end of a socket is closed, xenstored will flag it as an
error and switch the connection to ignored. But on some OSes (e.g.
NetBSD), poll(2) will return only POLLIN in this case, so sockets in ignored
state will stay open forever in xenstored (and it will loop with CPU 100%
busy).
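Detecting a closed peer without relying on POLLHUP or POLLERR can be done by peeking at the stream. The helper below is a generic sketch and not xenstored code; its name is hypothetical, and it assumes a stream socket plus a platform that provides MSG_DONTWAIT.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Close fd and return true if the peer of this stream socket is gone;
 * otherwise leave fd open and return false.
 */
static bool close_if_peer_gone(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    bool gone = false;
    char byte;

    assert(fd >= 0);

    if (poll(&pfd, 1, 0) <= 0)
        return false;                  /* nothing to report right now */

    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        gone = true;
    else if ((pfd.revents & POLLIN) &&
             recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT) == 0)
        gone = true;                   /* only POLLIN, but end-of-stream */

    if (gone)
        close(fd);

    return gone;
}
```

The second branch covers the case described above, where poll(2) reports only POLLIN for a socket whose remote end has been closed.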
Fixes: d2fa370d3ef9 ("tools/xenstore: Preserve bad client until they are destroyed") Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Juergen Gross <jgross@suse.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Manuel Bouyer [Wed, 3 Feb 2021 16:54:18 +0000 (17:54 +0100)]
tools/hotplug: Add a qemu-ifup script on NetBSD
On NetBSD, qemu-xen will use a qemu-ifup script to set up the tap interfaces
(as qemu-xen-traditional used to). Copy the script from qemu-xen-traditional,
and install it on NetBSD. While there, document parameters and environment
variables.
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Acked-by: Ian Jackson <iwj@xenproject.org>
Andrew Cooper [Thu, 4 Feb 2021 15:50:16 +0000 (15:50 +0000)]
libs/devicemodel: Fix ABI breakage from xendevicemodel_set_irq_level()
It is not permitted to edit the VERS clause for a version in a release of Xen.
Revert xendevicemodel_set_irq_level()'s inclusion in .so.1.2 and bump the
library minor version to .so.1.4 instead.
Fixes: 5d752df85f ("xen/dm: Introduce xendevicemodel_set_irq_level DM op") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org> Release-Acked-by: Ian Jackson <iwj@xenproject.org>