Plain MSI doesn't allow caching the MSI address and data fields while
the capability is enabled and not masked, hence we need to allow any
changes to those fields to update the binding of the interrupt. For
reference, the same doesn't apply to MSI-X that is allowed to cache
the data and address fields while the entry is unmasked, see section
6.8.3.5 of the PCI Local Bus Specification 3.0.
Allowing such updates means that a guest can write an invalid address
(ie: all zeros) and then a valid one, so the PIRQs shouldn't be
unmapped when the interrupt cannot be bound to the guest, since
further updates to the address or data fields can result in the
binding succeeding.
Modify the vPCI MSI arch helpers to track whether the interrupt is
bound, and make failures in vpci_msi_update not unmap the PIRQ, so
that further calls can attempt to bind the PIRQ again.
Note this requires some modifications to the MSI-X handlers, but there
shouldn't be any functional changes in that area.
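
As a rough illustration of the approach described above, here is a minimal sketch assuming a simplified state structure. The names msi_state, try_bind and msi_update are hypothetical and are not the vPCI helpers themselves. A failed bind only clears the "bound" flag and leaves the PIRQ mapped, so a later write to the address or data fields can retry.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the per-device MSI state. */
struct msi_state {
    int pirq;          /* mapped PIRQ, or -1 if none is mapped */
    bool bound;        /* whether the PIRQ is currently bound to the guest */
    uint64_t address;  /* last address written by the guest */
    uint32_t data;     /* last data value written by the guest */
};

/* Stand-in for the real bind operation: an all-zero address is rejected. */
static bool try_bind(const struct msi_state *msi)
{
    return msi->address != 0;
}

/*
 * Record the new address/data and attempt to bind. On failure the PIRQ
 * is deliberately left mapped; only the "bound" flag reflects the result.
 */
static void msi_update(struct msi_state *msi, uint64_t address, uint32_t data)
{
    msi->address = address;
    msi->data = data;
    msi->bound = try_bind(msi);
}
```

With this shape, writing an invalid address followed by a valid one ends with the interrupt bound and the same PIRQ still in use.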
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 26 Jan 2021 16:42:56 +0000 (17:42 +0100)]
tools/libs: honor build dependencies for recently moved subdirs
While the lack of proper dependency tracking of #include-d files is
wider than just the libs/ subtree, dealing with the problem universally
there or in tools/Rules.mk is too much of a risk at this point in the
release cycle. Add the missing inclusion of $(DEPS_INCLUDE) only in the
specific Makefile-s, after having checked that their prior Makefile-s
had such includes.
Interestingly the $(DEPS_RM) use is present in tools/libs/libs.mk's
clean target, so doesn't need taking care of in individual Makefile-s.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org> Release-acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Tue, 26 Jan 2021 13:42:23 +0000 (14:42 +0100)]
xen/include: compat/xlat.h may change with .config changes
$(xlat-y) getting derived from $(headers-y) means its contents may
change with changes to .config. The individual files $(xlat-y) refers
to, otoh, may not change, and hence not trigger rebuilding of xlat.h.
(Note that the issue was already present before the commit referred to
below, but it was far more limited in affecting only changes to
CONFIG_XSM_FLASK.)
Fixes: 2c8fabb2232d ("x86: only generate compat headers actually needed") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Add a DOMPRINTF() other methods have, indicating success. To facilitate
this, introduce an "outsize" local variable and update *size as well as
*blob only once done. The latter then also avoids leaving a pointer to
freed memory in dom->kernel_blob in case of a decompression error.
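
The "update the outputs only once done" pattern can be sketched as follows. This is an illustration under stated assumptions, not the libxenguest code: fake_decompress is a hypothetical stand-in for a real decompressor, and ownership of the input buffer is assumed to stay with the caller.

```c
#include <stdlib.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a real decompressor: writes every input byte
 * twice and reports an error if the input starts with 0xFF.
 */
static int fake_decompress(const unsigned char *in, size_t insize,
                           unsigned char *out, size_t outsize)
{
    size_t i;

    if (insize == 0 || outsize < 2 * insize || in[0] == 0xFF)
        return -1;
    for (i = 0; i < insize; i++)
        out[2 * i] = out[2 * i + 1] = in[i];
    return 0;
}

/*
 * Work on local variables and commit to *blob and *size only on success,
 * so an error never leaves *blob pointing at freed memory.
 */
static int decompress_blob(void **blob, size_t *size)
{
    size_t outsize = *size * 2;
    unsigned char *out;

    if (outsize == 0)
        return -1;
    out = malloc(outsize);
    if (!out)
        return -1;

    if (fake_decompress(*blob, *size, out, outsize)) {
        free(out);          /* *blob and *size are left untouched */
        return -1;
    }

    *blob = out;            /* commit both outputs together */
    *size = outsize;
    return 0;
}
```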
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 26 Jan 2021 13:16:34 +0000 (14:16 +0100)]
libxenguest: support zstd compressed kernels
This follows the logic used for other decompression methods utilizing an
external library, albeit here we can't ignore the 32-bit size field
appended to the compressed image - its presence causes decompression to
fail. Leverage the field instead to allocate the output buffer in one
go, i.e. without incrementally realloc()ing.
As far as configure.ac goes, I'm pretty sure there is a better (more
"standard") way of using PKG_CHECK_MODULES(). The construct also gets
put next to the other decompression library checks, albeit I think they
all ought to be x86-specific (e.g. placed in the existing case block a
few lines down).
Note that, where possible, instead of #ifdef-ing xen/*.h inclusions,
they get removed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 26 Jan 2021 13:14:39 +0000 (14:14 +0100)]
libxenguest: add get_unaligned_le32()
Abstract xc_dom_check_gzip()'s reading of the uncompressed size into a
helper re-usable, in particular, by other decompressor code.
Sadly in the mini-os case this conflicts with other functions of the
same name (and purpose), which can't be easily replaced individually.
Yet it was requested that no full set of helpers be introduced at this
point in the release cycle. Hence the awkward XG_NEED_UNALIGNED.
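
For reference, a helper of this kind is commonly written along the following lines. This is a sketch and not necessarily the exact code added to libxenguest; trailing_size is a hypothetical wrapper showing how the 32-bit size field appended to a compressed image could be read.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a possibly unaligned address. */
static inline uint32_t get_unaligned_le32(const void *p)
{
    const uint8_t *b = p;

    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/*
 * Hypothetical wrapper: return the uncompressed size stored in the last
 * four bytes of an image, or 0 if the image is too short to hold one.
 */
static inline uint32_t trailing_size(const uint8_t *image, size_t len)
{
    return len < 4 ? 0 : get_unaligned_le32(image + len - 4);
}
```

Assembling the value byte by byte avoids both alignment faults and any dependence on host endianness.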
Requested-by: Ian Jackson <iwj@xenproject.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 26 Jan 2021 13:13:18 +0000 (14:13 +0100)]
x86/shadow: use __put_user() instead of __copy_to_user()
In a subsequent patch I would almost have broken the logic here, if I
hadn't happened to read through the comment at the top of
safe_write_entry(): __copy_to_user() does not provide a guarantee
shadow_write_entries() requires - it's only an optimization that it
makes use of __put_user_size() for certain sizes. Use __put_user()
directly, which does expand to a single (memory accessing) insn.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Roger Pau Monne [Tue, 29 Dec 2020 16:58:01 +0000 (17:58 +0100)]
x86/msr: Don't inject #GP when trying to read FEATURE_CONTROL
Windows 10 will triple fault if #GP is injected when attempting to
read the FEATURE_CONTROL MSR on Intel or compatible hardware. Fix this
by injecting a #GP only when the vendor doesn't support the MSR, even
if there are no features to expose.
Fixes: 39ab598c50a2 ('x86/pv: allow reading FEATURE_CONTROL MSR') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Extended comment] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 26 Jun 2020 10:32:00 +0000 (11:32 +0100)]
x86/pv: Inject #UD for missing SYSCALL callbacks
Despite appearing to be a deliberate design choice of early PV64, the
resulting behaviour for unregistered SYSCALL callbacks creates an untenable
testability problem for Xen. Furthermore, the behaviour is undocumented,
bizarre, and inconsistent with related behaviour in Xen, and very liable
to introduce a security vulnerability into a PV guest if the author hasn't
studied Xen's assembly code in detail.
There are two different bugs here.
1) The current logic confuses the registered entrypoints, and may deliver a
SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
entrypoint is registered.
This has been the case ever since 2007 (c/s cd75d47348b) but up until
2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
a 32bit SYSCALL entry, making it appear as if it were a 64bit entry all along.
Xen would malfunction under these circumstances, if it were a PV guest.
Linux would as well, but PVOps has always registered both entrypoints and
discarded the Xen-provided selectors. NetBSD really does malfunction as a
consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).
2) In the case that neither SYSCALL callback is registered, the guest will
be crashed when userspace executes a SYSCALL instruction, which is a
userspace => kernel DoS.
This has been the case ever since the introduction of 64bit PV support, but
behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield
#GP/#UD in userspace before the callback is registered, and are therefore
safe by default.
This change does constitute a change in the PV ABI, for corner cases of a PV
guest kernel registering neither callback, or not registering the 32bit
callback when running on AMD/Hygon hardware.
It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
SYSENTER (safe by default, until explicitly enabled), as well as native
hardware (always delivered to the single applicable callback).
Most importantly however, and the primary reason for the change, is that it
lets us sensibly test the fast system call entrypoints under all states a PV
guest can construct, to prove correct behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Paul Durrant [Thu, 8 Oct 2020 18:57:31 +0000 (19:57 +0100)]
docs/migration: add missing definitions to libxc-migration-stream
The STATIC_DATA_END, X86_CPUID_POLICY and X86_MSR_POLICY record types have
sections explaining what they are but their values are not defined. Indeed
their values are defined as "Reserved for future mandatory records."
Also, the spec revision is adjusted to match the migration stream version
and an END record is added to the description of a 'typical save record for
an x86 HVM guest.'
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Fixes: 6f71b5b1506 ("docs/migration Specify migration v3 and STATIC_DATA_END") Fixes: ddd273d8863 ("docs/migration: Specify X86_{CPUID,MSR}_POLICY records") Acked-by: Wei Liu <wl@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Mon, 25 Jan 2021 07:23:31 +0000 (08:23 +0100)]
tools/xenstore: fix use after free bug in xenstore_control
There is a very unlikely use after free bug and a memory leak in
live_update_start() of xenstore_control. Fix those.
Coverity-Id: 1472399 Fixes: 7f97193e6aa858 ("tools/xenstore: add live update command to xenstore-control") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Rahul Singh [Fri, 22 Jan 2021 11:37:19 +0000 (11:37 +0000)]
xen/arm: smmuv3: Add support for SMMUv3 driver
Add support for ARM architected SMMUv3 implementation. It is based on
the Linux SMMUv3 driver.
Driver is currently supported as Tech Preview.
Major differences with regard to Linux driver are as follows:
2. Only Stage-2 translation is supported as compared to the Linux driver
that supports both Stage-1 and Stage-2 translations.
3. Use P2M page table instead of creating one as SMMUv3 has the
capability to share the page tables with the CPU.
4. Tasklets are used in place of threaded IRQ's in Linux for event queue
and priority queue IRQ handling.
5. Latest version of the Linux SMMUv3 code implements the commands queue
access functions based on atomic operations implemented in Linux.
Atomic functions used by the commands queue access functions are not
implemented in XEN therefore we decided to port the earlier version
of the code. Atomic operations are introduced to fix the bottleneck
of the SMMU command queue insertion operation. A new algorithm for
inserting commands into the queue is introduced, which is lock-free
on the fast-path.
Consequence of reverting the patch is that the command queue
insertion will be slow for large systems as a spinlock will be used to
serialize accesses from all CPUs to the single queue supported by
the hardware. Once the proper atomic operations are available in
XEN the driver can be updated.
6. Spin lock is used in place of mutex when attaching a device to the
SMMU, as there is no blocking locks implementation available in XEN.
This might introduce latency in XEN. Need to investigate before
driver is out for tech preview.
7. PCI ATS functionality is not supported, as there is no support
available in XEN to test the functionality. Code is not tested and
compiled. Code is guarded by the flag CONFIG_PCI_ATS.
8. MSI interrupts are not supported as there is no support available in
XEN to request MSI interrupts. Code is not tested and compiled. Code
is guarded by the flag CONFIG_MSI.
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.
Rahul Singh [Wed, 20 Jan 2021 14:52:41 +0000 (14:52 +0000)]
xen/compiler: import 'fallthrough' keyword from linux
-Wimplicit-fallthrough warns when a switch case falls through. Warning
can be suppressed by either adding a /* fallthrough */ comment, or by
using a null statement: __attribute__ ((fallthrough))
Define the pseudo keyword 'fallthrough' for the ability to convert the
various case block /* fallthrough */ style comments to null statement
"__attribute__((__fallthrough__))"
In C mode, GCC supports the __fallthrough__ attribute since 7.1,
the same time the warning and the comment parsing were introduced.
fallthrough devolves to an empty "do {} while (0)" if the compiler
version (any version less than gcc 7) does not support the attribute.
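
A minimal sketch of such a definition, modelled on the description above rather than copied from the Xen header, might look like this. The classify() function is a hypothetical example of the macro in use.

```c
/* Compilers without __has_attribute are treated as lacking the attribute. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

#if __has_attribute(__fallthrough__)
# define fallthrough __attribute__((__fallthrough__))
#else
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Hypothetical example: each level also accumulates the lower levels' flags. */
static int classify(int level)
{
    int flags = 0;

    switch (level) {
    case 2:
        flags |= 4;
        fallthrough;
    case 1:
        flags |= 2;
        fallthrough;
    default:
        flags |= 1;
        break;
    }

    return flags;
}
```

Because the macro expands to a statement in both cases, it is always written with a trailing semicolon directly before the next case label.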
Rahul Singh [Wed, 20 Jan 2021 14:52:36 +0000 (14:52 +0000)]
xen/arm: Revert atomic operation related command-queue insertion patch
Linux SMMUv3 code implements the commands-queue insertion based on
atomic operations implemented in Linux. Atomic functions used by the
commands-queue insertion are not implemented in XEN therefore revert the
patch that implemented the commands-queue insertion based on atomic
operations.
Reverted the other patches also that are implemented based on the code
that introduced the atomic-operations.
Atomic operations are introduced in the patch "iommu/arm-smmu-v3: Reduce
contention during command-queue insertion" that fixed the bottleneck of
the SMMU command queue insertion operation. A new algorithm for
inserting commands into the queue is introduced in this patch, which is
lock-free on the fast-path.
Consequence of reverting the patch is that the command queue insertion
will be slow for large systems as a spinlock will be used to serialize
accesses from all CPUs to the single queue supported by the hardware.
Once the proper atomic operations are available in XEN the driver
can be updated.
The directory structure of the SMMUv3 driver changed starting from
Linux 5.9. To revert the patches smoothly using the "git revert" command
we decided to choose Linux 5.8.18.
Only difference between latest stable Linux 5.9.12 and Linux 5.8.18
SMMUv3 driver is the use of the "fallthrough" keyword. This patch will
be merged once "fallthrough" keyword implementation is available in XEN.
It's a copy of the Linux SMMUv3 driver. Xen specific code has not
been added yet and code has not been compiled.
xen/arm: mm: Remove special case for CPU0 in dump_hyp_walk()
There is no need to have a special case for CPU0 when converting the
page-table virtual address into a physical address. The helper
virt_to_maddr() is able to translate any address as long as the root
page-tables are mapped in the virtual address space. This is the case for
all the CPUs at the moment.
Juergen Gross [Sat, 16 Jan 2021 10:33:39 +0000 (11:33 +0100)]
xen: add support for automatic debug key actions in case of crash
When the host crashes it would sometimes be nice to have additional
debug data available which could be produced via debug keys, but
halting the server for manual intervention might be impossible due to
the need to reboot/kexec rather sooner than later.
Add support for automatic debug key actions in case of crashes which
can be activated via boot- or runtime-parameter.
Depending on the type of crash the desired data might be different, so
support different settings for the possible types of crashes.
The parameter is "crash-debug" with the following syntax:
crash-debug-<type>=<string>
with <type> being one of:
panic, hwdom, watchdog, kexeccmd, debugkey
and <string> a sequence of debug key characters with '+' having the
special semantics of a 10 millisecond pause.
So "crash-debug-watchdog=0+0qr" would result in special output in case
of watchdog triggered crash (dom0 state, 10 ms pause, dom0 state,
domain info, run queues).
Don't call key handlers in early boot, as some (e.g. for 'd') require
some initializations to be finished, like scheduler or idle domain.
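
The handling of such a string can be sketched as below. This is an illustration only: replay_debug_keys, key_fn and replay_stats are hypothetical names, the key handler is passed in as a callback, and the pause is merely accounted for rather than actually performed.

```c
#include <stddef.h>

#define PAUSE_MS 10   /* pause requested by each '+' in the sequence */

/* Hypothetical summary of what a sequence asked for. */
struct replay_stats {
    unsigned int keys;      /* number of debug keys handled */
    unsigned int pause_ms;  /* total pause time requested */
};

/* Hypothetical callback type standing in for the debug key dispatcher. */
typedef void (*key_fn)(char key);

/*
 * Walk the sequence: '+' means "pause", any other character is handed
 * to the key handler. Real code would delay where pause_ms is increased.
 */
static struct replay_stats replay_debug_keys(const char *seq, key_fn handle)
{
    struct replay_stats st = { 0, 0 };

    for (; *seq; seq++) {
        if (*seq == '+') {
            st.pause_ms += PAUSE_MS;
            continue;
        }
        if (handle)
            handle(*seq);
        st.keys++;
    }

    return st;
}
```

For the "0+0qr" example this yields four key actions and a single 10 ms pause.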
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Sat, 16 Jan 2021 10:33:38 +0000 (11:33 +0100)]
xen: enable keyhandlers to work without register set specified
There are only two keyhandlers which make use of the cpu_user_regs
struct passed to them. In order to be able to call any keyhandler in
non-interrupt contexts, too, modify those two handlers to cope with a
NULL regs pointer by using run_in_exception_handler() in that case.
Juergen Gross [Sat, 16 Jan 2021 10:33:37 +0000 (11:33 +0100)]
xen/arm: add support for run_in_exception_handler()
Add support to run a function in an exception handler for Arm. Do it
as on x86 via a bug_frame, but pass the function pointer via a
register.
This needs to be done that way because GCC will not allow the use of
"i" when PIE is enabled (Xen doesn't set the flag but instead relies on
the default value from the compiler).
Use the same BUGFRAME_* #defines as on x86 in order to make a future
common header file more easily achievable.
Signed-off-by: Juergen Gross <jgross@suse.com>
[ julien: Add more details on the issue between "i" and -fpie ] Acked-by: Julien Grall <jgrall@amazon.com>
Remove copy/paste error introduced by f58976544ff4 ("automation: use
test-artifacts/qemu-system-aarch64 instead of Debian's")
Fixes: f58976544ff4 ("automation: use test-artifacts/qemu-system-aarch64 instead of Debian's") Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Fri, 8 Jan 2021 11:57:37 +0000 (11:57 +0000)]
tools/oxenstored: Use more efficient node trees
This changes the output of xenstore-ls to be sorted. Previously the keys were
listed in the order in which they were inserted. docs/misc/xenstore.txt
doesn't specify in what order keys are listed.
Map.update is used to retain semantics with replace_child: only an existing
child is replaced, if it wasn't part of the original map we don't add it.
Similarly exception behaviour is retained for del_childname and related
functions.
Entries are stored in reverse sort order, so that upon Map.fold the
constructed list is sorted in ascending order and there is no need for a
List.rev.
This changes the semantics and is not suitable as is for a backport. It
reveals bugs in buggy clients that depend on xenstore entry order, however
those clients should be fixed.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Edwin Török [Fri, 8 Jan 2021 11:57:37 +0000 (11:57 +0000)]
tools/oxenstored: Replace hand rolled GC with weak GC references
The code here is attempting to reduce memory usage by sharing common
substrings in the tree: it replaces strings with ints, and keeps a string->int
map that gets manually garbage collected using a hand-rolled mark and sweep
algorithm.
This is unnecessary: OCaml already has a mark-and-sweep Garbage Collector
runtime, and sharing of common strings in tree nodes can be achieved through
Weak references: if the string hasn't been seen yet it gets added to the Weak
reference table, and if it has we use the entry from the table instead, thus
storing a string only once. When the string is no longer referenced OCaml's
GC will drop it from the weak table: there is no need to manually do a
mark-and-sweep, or to tell OCaml when to drop it.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Edwin Török [Fri, 8 Jan 2021 11:57:37 +0000 (11:57 +0000)]
tools/ocaml/libs/xc: Backward compatible domid control at domain creation time
One can specify the domid to use when creating the domain, but this was
hardcoded to 0.
Keep the existing `domain_create` function, and make domid an optional
argument. When not specified, default to 0.
Controlling the domid can be useful during testing or migration.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Fri, 8 Jan 2021 11:57:37 +0000 (11:57 +0000)]
tools/oxenstored: Trim txhistory on xenbus reconnect
There is a global history, containing transactions from the past 0.05s, which
get trimmed whenever any transaction commits or aborts. Destroying a domain
will cause xenopsd to perform some transactions deleting the tree, so that is
fine. But I think that a domain can abuse the xenbus reconnect facility to
cause a large history to be recorded - provided that no one does any
transactions on the system in between, which may be difficult to achieve given
squeezed's constant pinging.
The theoretical situation is like this:
- a domain starts a transaction, creates as large a tree as it can, commits
it. Then repeatedly:
- start a transaction, do nothing with it, start a transaction, delete
part of the large tree, write some new unique data there, don't commit
- cause a xenbus reconnect (I think this can be done by writing something
to the ring). This causes all transactions/watches for the connection to
be cleared, but NOT the history, there were no commits, so nobody
trimmed the history, i.e. the history can contain transactions from
more than just 0.05s
- loop back and start more transactions, you can keep this up indefinitely
without hitting quotas
Now there is a periodic History.trim running every 0.05s, so I don't think you
can do much damage with it. But let's be safe and trim the transaction history
anyway on reconnect.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
See docs/misc/xenstore.txt for documentation on live-update command. Validate
that the binary exists and that the cmdline is valid, to prevent typos from
taking down xenstore; if live-update fails there is no way back due to the use
of exec().
Live update only proceeds if there are no active transactions, and no
unprocessed input or unflushed output.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Reviewed-by: Pau Ruiz Safont <pau.safont@citrix.com> Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
Edwin Török [Fri, 8 Jan 2021 11:57:37 +0000 (11:57 +0000)]
tools/oxenstored: Only quit on SIGTERM when a reload is possible
Currently when oxenstored receives SIGTERM it dumps its state and quits. It
is possible to then restart it if --restart is given, however that is not
always safe:
* Domains could have active transactions, and after a restart they would
either reuse transaction IDs of already open transactions, or get an error
back that the transaction doesn't exist
* There could be pending data to send to a VM still in oxenstored's
queue which would be lost
* There could be pending input to be processed from a VM in oxenstored's
queue which would be lost
Prevent shutting down oxenstored via SIGTERM in the above situations. Also
ignore domains marked as bad because oxenstored would never talk to them
again.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Reviewed-by: Pau Ruiz Safont <pau.safont@citrix.com> Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
Edwin Török [Fri, 15 Jan 2021 18:23:10 +0000 (18:23 +0000)]
tools/ocaml/libs/xb: Do not crash after xenbus is unmapped
Xenmmap.unmap sets the address to MAP_FAILED in xenmmap_stubs.c. If due to a
bug there were still references to the Xenbus and we attempt to use it then we
crash. Raise an exception instead of crashing.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Hans Reiser [Mon, 9 Nov 2020 14:36:00 +0000 (14:36 +0000)]
tools/libxenstat: Fix infinite loop when QEMU dies
Occasionally, "dead" xentop processes consuming 100% CPU time have been
observed.
When the QEMU process the qmp_read() function is communicating with
terminates, qmp_read() may enter an infinite loop. poll() signals EOF (POLLIN
and POLLHUP set), the subsequent read() call returns 0, and then the function
calls poll() again, which still sees the EOF condition and will return again
immediately with POLLIN and POLLHUP set, repeating ad infinitum.
A simple fix is to terminate the loop when read returns 0 (under "normal"
circumstances, poll will return with POLLIN set only if there is data to read, so
read will always read >0 bytes, except if the socket has been closed).
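
The loop shape described above can be sketched like this. It is a simplified illustration, not the actual qmp_read() code; read_until_eof and its parameters are hypothetical.

```c
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read from fd until the buffer is full, poll() times out or fails,
 * or the peer closes the connection. Returns the number of bytes read.
 */
static size_t read_until_eof(int fd, char *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    size_t total = 0;

    while (total < len && poll(&pfd, 1, timeout_ms) > 0) {
        ssize_t n = read(fd, buf + total, len - total);

        /*
         * n == 0 means EOF: poll() would keep reporting the descriptor
         * as readable, so stop here instead of polling again.
         */
        if (n <= 0)
            break;
        total += (size_t)n;
    }

    return total;
}
```

Without the check for a zero return, the loop would spin as long as the closed descriptor keeps reporting readiness.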
Signed-off-by: Hans Reiser <hr@sec.uni-passau.de> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Olaf Hering [Tue, 20 Oct 2020 12:39:28 +0000 (14:39 +0200)]
tools/xl: use proper name for bash_completion file
Files in the bash-completion dirs should be named like the commands,
without suffix. Without this change 'xl' will not be recognized as a
command with completion support if BASH_COMPLETION_DIR is set to
/usr/share/bash-completion/completions.
Fixes: 9136a919b ("xl: Add basic bash completion for xl command.") Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Juergen Gross [Fri, 22 Jan 2021 06:08:57 +0000 (07:08 +0100)]
tools/xenstore: fix unsigned < 0 compare in xenstore_control
Commit 7f97193e6aa858df ("tools/xenstore: add live update command to
xenstore-control") introduced testing an unsigned value to be less
than 0. Fix that.
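
As a generic illustration of this class of bug (not the code touched by the commit), a comparison "< 0" on an unsigned variable is always false, so a negative result has to be checked while the value is still held in a signed type. The function parse_timeout below is hypothetical.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Parse a non-negative decimal number. The range check is done on the
 * signed result of strtol(); only then is it stored as unsigned.
 */
static bool parse_timeout(const char *str, unsigned int *out)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno || end == str || *end != '\0' || val < 0 || val > INT_MAX)
        return false;

    *out = (unsigned int)val;
    return true;
}
```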
Fixes: 7f97193e6aa858df ("tools/xenstore: add live update command to xenstore-control") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Fri, 22 Jan 2021 11:14:07 +0000 (12:14 +0100)]
xen/hypfs: add support for id-based dynamic directories
Add some helpers to hypfs.c to support dynamic directories with a
numerical id as name.
The dynamic directory is based on a template specified by the user
allowing to use specific access functions and having a predefined
set of entries in the directory.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Fri, 22 Jan 2021 11:13:40 +0000 (12:13 +0100)]
xen/hypfs: support dynamic hypfs nodes
Add a HYPFS_VARDIR_INIT() macro for initializing such a directory
statically, taking a struct hypfs_funcs pointer as parameter additional
to those of HYPFS_DIR_INIT().
Modify HYPFS_VARSIZE_INIT() to take the function vector pointer as an
additional parameter as this will be needed for dynamical entries.
For being able to let the generic hypfs coding continue to work on
normal struct hypfs_entry entities even for dynamical nodes add some
infrastructure for allocating a working area for the current hypfs
request in order to store needed information for traversing the tree.
This area is anchored in a percpu pointer and can be retrieved by any
level of the dynamic entries. The normal way to handle allocation and
freeing is to allocate the data in the enter() callback of a node and
to free it in the related exit() callback.
Add a hypfs_add_dyndir() function for adding a dynamic directory
template to the tree, which is needed for having the correct reference
to its position in hypfs.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Fri, 22 Jan 2021 11:13:05 +0000 (12:13 +0100)]
x86/vioapic: check IRR before attempting to inject interrupt after EOI
In vioapic_update_EOI the irq_lock will be dropped in order to forward
the EOI to the dpci handler, so there's a window between clearing IRR
and checking if the line is asserted where IRR can change behind our
back.
Fix this by checking whether IRR is set before attempting to inject a
new interrupt.
Fixes: 06e3f8f2766 ('vt-d: Do dpci eoi outside of irq_lock.') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Wed, 13 Jan 2021 13:00:21 +0000 (14:00 +0100)]
tools/xenstore: handle dying domains in live update
A domain could just be dying when live updating Xenstore, so the case
of not being able to map the ring page or to connect to the event
channel must be handled gracefully.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Wed, 13 Jan 2021 13:00:20 +0000 (14:00 +0100)]
tools/xenstore: add read connection state for live update
Add the needed functions for reading connection state for live update.
As the connection is identified by a unique connection id in the state
records we need to add this to struct connection. Add a new function
to return the connection based on a connection id.
Juergen Gross [Wed, 13 Jan 2021 13:00:19 +0000 (14:00 +0100)]
tools/xenstore: split off domain introduction from do_introduce()
For live update the functionality to introduce a new domain similar to
the XS_INTRODUCE command is needed, so split that functionality off
into a dedicated function introduce_domain().
Switch initial dom0 initialization to use this function, too.
Juergen Gross [Wed, 13 Jan 2021 13:00:19 +0000 (14:00 +0100)]
tools/xenstore: handle CLOEXEC flag for local files and pipes
For support of live update the locally used files need to have the
"close on exec" flag set. Fortunately the used Xen libraries are
already doing this, so only the logging and tdb related files and
pipes are affected. openlog() has the close on exec attribute, too.
In order to be able to keep the event channels open specify the
XENEVTCHN_NO_CLOEXEC flag when calling xenevtchn_open().
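
Setting the flag on an already open descriptor is typically done with fcntl(), roughly as sketched below. The helper names set_cloexec and open_cloexec_pipe are hypothetical and not taken from xenstored.

```c
#include <fcntl.h>
#include <unistd.h>

/* Set the close-on-exec flag on fd. Returns 0 on success, -1 on failure. */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;

    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Create a pipe whose two ends are both closed automatically on exec(). */
static int open_cloexec_pipe(int fds[2])
{
    if (pipe(fds) < 0)
        return -1;

    if (set_cloexec(fds[0]) < 0 || set_cloexec(fds[1]) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    return 0;
}
```

Descriptors that must survive the exec(), such as the event channel handle mentioned above, are simply left without the flag.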
Juergen Gross [Wed, 13 Jan 2021 13:00:19 +0000 (14:00 +0100)]
tools/xenstore: dump the xenstore state for live update
Dump the complete Xenstore status to a file (daemon case) or memory
(stubdom case).
As we don't know the exact length of the needed area in advance we are
using an anonymous rather large mapping in stubdom case, which will
use only virtual address space until accessed. And as we are writing
the area in a sequential manner this is fine. As the initial size we
are choosing double the size of the memory allocated via talloc(),
which should be more than enough.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Julien Grall <jgrall@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Wed, 13 Jan 2021 13:00:18 +0000 (14:00 +0100)]
tools/xenstore: allow live update only with no transaction active
In order to simplify live update state dumping only allow live update
to happen when no transaction is active.
A timeout is used to detect guests which have a transaction active for
longer periods of time. In case such a guest is detected when trying
to do a live update it will be reported and the update will fail.
The admin can then either use a longer timeout, or use the force flag
to just ignore the transactions of such a guest, or kill the guest
before retrying.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Wed, 13 Jan 2021 13:00:18 +0000 (14:00 +0100)]
tools/xenstore: add the basic framework for doing the live update
Add the main framework for executing the live update. This for now
only defines the basic execution steps with empty dummy functions.
This final step returning means failure, as in case of success the
new executable will have taken over.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Thu, 14 Jan 2021 11:41:32 +0000 (12:41 +0100)]
tools/xenstore: add support for delaying execution of a xenstore request
Today a Xenstore request is processed as soon as it is seen by
xenstored. Add the framework for being able to delay processing of a
request if the right conditions aren't met.
Any delayed requests are executed at the end of the main processing
loop in xenstored. They can either delay themselves again or just do
their job. In order to enable the possibility of a timeout, the main
loop will be paused for max one second if any requests are delayed.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Wed, 13 Jan 2021 13:00:18 +0000 (14:00 +0100)]
tools/xenstore: add command line handling for live update
Updating an instance of Xenstore via live update needs to hand over
the command line parameters to the updated instance. Those can be
either the parameters used by the updated instance or new ones when
supplied when starting the live update.
So when supplied store the new command line parameters in lu_status.
As it is related, add a new option -U (or "--live-update") to the command
line of xenstored which will be added when starting the new instance.
This makes it possible to perform slightly different initializations when
started as a result of live update.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Wed, 13 Jan 2021 13:00:17 +0000 (14:00 +0100)]
tools/xenstore: introduce live update status block
Live update of Xenstore is done in multiple steps. It needs a status
block holding the current state of live update and related data. It
is allocated as child of the connection live update was started over
in order to abort live update in case the connection is closed.
Allocation of the block is done in lu_binary[_alloc](), freeing in
lu_abort() (and for now in lu_start() as long as no real live-update
is happening).
Add tests in all live-update command handlers other than lu_abort()
and lu_binary[_alloc]() for being started via the same connection
as the start of live-update.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Julien Grall <jgrall@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Wed, 13 Jan 2021 13:00:17 +0000 (14:00 +0100)]
tools/xenstore: add live update command to xenstore-control
Add the "live-update" command to xenstore-control enabling updating
xenstored to a new version in a running Xen system.
With -c <arg> it is possible to pass a different command line to the
new instance of xenstored. This will replace the command line used
for the invocation of the currently running xenstored instance.
The running xenstored (or xenstore-stubdom) needs to support live
updating, of course.
For now just add a small dummy handler to C xenstore denying any
live update action.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Julien Grall <jgrall@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 5 Jan 2021 17:46:42 +0000 (17:46 +0000)]
libxl / libxlu: support 'xl pci-attach/detach' by name
This patch modifies libxlu_pci_parse_spec_string() to parse the new 'name'
parameter of PCI_SPEC_STRING detailed in the updated documentation in
xl-pci-configuration(5) and populate the 'name' field of 'libxl_device_pci'.
If the 'name' field is non-NULL then both libxl_device_pci_add() and
libxl_device_pci_remove() will use it to look up the device BDF in
the list of assignable devices.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 5 Jan 2021 17:46:41 +0000 (17:46 +0000)]
docs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING
Since assignable devices can be named, a subsequent patch will support use
of a PCI_SPEC_STRING containing a 'name' parameter instead of a 'bdf'. In
this case the name will be used to look up the 'bdf' in the list of assignable
(or assigned) devices.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 5 Jan 2021 17:46:40 +0000 (17:46 +0000)]
xl: support naming of assignable devices
With this patch applied 'xl pci-assignable-add' will take an optional '--name'
parameter, 'xl pci-assignable-remove' can be passed either a BDF or a name and
'xl pci-assignable-list' will take an optional '--show-names' flag which
determines whether names are displayed in its output.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 5 Jan 2021 17:46:39 +0000 (17:46 +0000)]
libxl: add 'name' field to 'libxl_device_pci' in the IDL...
... and modify libxl_pci_bdf_assignable_add/remove/list() to make use of it.
libxl_pci_bdf_assignable_add() will store the name of the device in xenstore
if the field is specified (i.e. non-NULL) and libxl_pci_bdf_assignable_remove()
will remove devices specified only by name, looking up the BDF as necessary.
libxl_pci_bdf_assignable_list() will also populate the 'name' field if a name
was stored by libxl_pci_bdf_assignable_add().
NOTE: This patch also fixes whitespace in the declaration of 'libxl_device_pci'
in the IDL.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 5 Jan 2021 17:46:38 +0000 (17:46 +0000)]
libxl: stop setting 'vdevfn' in pci_struct_fill()
There are only two call-sites. One always sets it to 0 (which is unnecessary
as the structure is already initialized to zero) and the other can simply set
the 'vdevfn' field directly (after proper structure initialization), avoiding
the need for a local variable.
A subsequent patch will also make use of pci_struct_fill() in a context
where 'vdevfn' may already have been set.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 5 Jan 2021 17:46:37 +0000 (17:46 +0000)]
libxlu: introduce xlu_pci_parse_spec_string()
This patch largely re-writes the code to parse a PCI_SPEC_STRING and enters
it via the newly introduced function. The new parser also deals with 'bdf'
and 'vslot' as non-positional parameters, as per the documentation in
xl-pci-configuration(5).
The existing xlu_pci_parse_bdf() function remains, but now strictly parses
BDF values. Some existing callers of xlu_pci_parse_bdf() are
modified to call xlu_pci_parse_spec_string() as per the documentation in xl(1).
NOTE: Usage text in xl_cmdtable.c and error messages are also modified
appropriately.
As a side-effect this patch also fixes a bug where using '*' to specify
all functions would lead to an assertion failure at the end of
xlu_pci_parse_bdf().
Fixes: d25cc3ec93eb ("libxl: workaround gcc 10.2 maybe-uninitialized warning") Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 5 Jan 2021 17:46:36 +0000 (17:46 +0000)]
docs/man: modify xl(1) in preparation for naming of assignable devices
A subsequent patch will introduce code to allow a name to be specified to
'xl pci-assignable-add' such that the assignable device may be referred to
by that name in subsequent operations.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Roger Pau Monné [Thu, 21 Jan 2021 15:11:41 +0000 (16:11 +0100)]
x86/dpci: do not remove pirqs from domain tree on unbind
A fix for a previous issue removed the pirqs from the domain tree when
they are unbound in order to prevent shared pirqs from triggering a
BUG_ON in __pirq_guest_unbind if they are unbound multiple times. That
caused free_domain_pirqs to no longer unmap the pirqs because they
are gone from the domain pirq tree, thus leaving stale unbound pirqs
after domain destruction if the domain had mapped dpci pirqs after
shutdown.
Take a different approach to fix the original issue, instead of
removing the pirq from d->pirq_tree clear the flags of the dpci pirq
struct to signal that the pirq is now unbound. This prevents calling
pirq_guest_unbind multiple times for the same pirq without having to
remove it from the domain pirq tree.
This is XSA-360.
Fixes: 5b58dad089 ('x86/pass-through: avoid double IRQ unbind during domain cleanup') Reported-by: Samuel Verschelde <samuel.verschelde@vates.fr> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
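The approach described in the commit above, clearing the flags rather than removing the entry, can be illustrated with a small sketch. The structure, flag and function names below are hypothetical stand-ins, not the Xen definitions.

```c
#include <stdbool.h>

/* Hypothetical flag and structure, standing in for the dpci pirq state. */
#define DEMO_PIRQ_BOUND 0x1u

struct demo_dpci_pirq {
    unsigned int flags;   /* non-zero while bound to a guest */
};

/*
 * Unbind a pirq without removing it from whatever structure tracks it.
 * Returns true if an unbind was performed, false if it was already unbound.
 */
static bool demo_pirq_unbind(struct demo_dpci_pirq *pirq)
{
    if (!(pirq->flags & DEMO_PIRQ_BOUND))
        return false;        /* second and later calls are no-ops */

    /* ... the real guest unbind would happen here ... */

    pirq->flags = 0;         /* mark as unbound; the entry stays in place */
    return true;
}
```

Because the entry remains reachable, a later cleanup pass can still find and unmap it, while the cleared flags keep the unbind from running twice.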
Because MPIDR_AFF0_MASK is defined as a 32-bit value, we will miss out
the 3rd level affinity. As a consequence, the IPI would not be sent to
the correct vCPU.
This particular error can be solved by switching MPIDR_AFF0_MASK to use
unsigned long. However, take the opportunity to switch all the MPIDR_*
defines to use unsigned long to avoid any further issues.
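One way a 32-bit mask can lose the 3rd level affinity is through integer promotion when the mask is inverted. The sketch below assumes a 64-bit unsigned long (LP64, as on arm64); the macro and function names are hypothetical and are not the Xen definitions.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the actual Xen definitions. */
#define DEMO_AFF0_MASK_32 0xffU    /* 32-bit unsigned int */
#define DEMO_AFF0_MASK_UL 0xffUL   /* unsigned long, assumed to be 64-bit */

/*
 * Drop the level-0 affinity bits using the 32-bit mask.
 * ~0xffU is 0xffffff00, which zero-extends to 64 bits, so bits 32-63
 * (including Aff3 in bits 39:32) are cleared as well.
 */
static uint64_t demo_cluster_id_32(uint64_t mpidr)
{
    return mpidr & ~DEMO_AFF0_MASK_32;
}

/* Same operation with an unsigned long mask: only bits 7:0 are cleared. */
static uint64_t demo_cluster_id_ul(uint64_t mpidr)
{
    return mpidr & ~DEMO_AFF0_MASK_UL;
}
```

With an MPIDR value whose Aff3 field is non-zero, the first helper returns a value with Aff3 cleared, while the second preserves it.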
Julien Grall [Sat, 28 Nov 2020 11:36:42 +0000 (11:36 +0000)]
xen/irq: Propagate the error from init_one_desc_irq() in init_*_irq_data()
init_one_desc_irq() can return an error if it is unable to allocate
memory. While this is unlikely to happen during boot (called from
init_{,local_}irq_data()), it is better to harden the code by
propagating the return value.
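The hardening pattern described above amounts to checking each per-descriptor initialization and returning the first error to the caller. A minimal sketch, with hypothetical names and a simulated allocation failure rather than the Xen code:

```c
#include <errno.h>

/*
 * Hypothetical per-descriptor init: fails with -ENOMEM for one chosen
 * index to simulate an allocation failure (fail_at < 0 means never fail).
 */
static int demo_init_one_desc(int irq, int fail_at)
{
    return irq == fail_at ? -ENOMEM : 0;
}

/* Initialize all descriptors, stopping at and returning the first error. */
static int demo_init_irq_data(int nr_irqs, int fail_at)
{
    for (int irq = 0; irq < nr_irqs; irq++) {
        int rc = demo_init_one_desc(irq, fail_at);

        if (rc)
            return rc;   /* propagate instead of silently ignoring */
    }
    return 0;
}
```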
Wei Chen [Fri, 8 Jan 2021 06:21:26 +0000 (14:21 +0800)]
xen/arm: Add defensive barrier in get_cycles for Arm64
Per the discussion [1] on the mailing list, it is better to
have a barrier after reading CNTPCT in get_cycles. Without a
barrier there, if get_cycles is used in some seqlock critical
context in the future, the counter read could potentially be
speculated outside of the seqlock critical section.
When executing clock_gettime(), either in the vDSO or via a system call,
we need to ensure that the read of the counter register occurs within
the seqlock reader critical section. This ensures that updates to the
clocksource parameters (e.g. the multiplier) are consistent with the
counter value and therefore avoids the situation where time appears to
go backwards across multiple reads.
Extend the vDSO logic so that the seqlock critical section covers the
read of the counter register as well as accesses to the data page. Since
reads of the counter system registers are not ordered by memory barrier
instructions, introduce dependency ordering from the counter read to a
subsequent memory access so that the seqlock memory barriers apply to
the counter access in both the vDSO and the system call paths.
Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/linux-arm-kernel/alpine.DEB.2.21.1902081950260.1662@nanos.tec.linutronix.de/ Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Will Deacon <will.deacon@arm.com>
While we are not aware of such use in Xen, it would be best to add the
barrier to avoid any surprise.
In order to reduce the impact of the new barrier, we prefer to
enforce ordering instead of using an ISB [2].
Currently, enforce order is not applied to arm32 as this is
not done in Linux at the date of this patch. If this is done
in Linux it will need to be also done in Xen.
To avoid adding read_cntpct_enforce_ordering everywhere, we introduce
a new helper read_cntpct_stable to replace the original get_cycles, and
turn get_cycles into a wrapper to which read_cntpct_enforce_ordering can
easily be added.
Roger Pau Monné [Tue, 19 Jan 2021 15:04:06 +0000 (16:04 +0100)]
x86/CPUID: unconditionally set XEN_HVM_CPUID_IOMMU_MAPPINGS
This is a revert of f5cfa0985673 plus a rework of the comment that
accompanies the setting of the flag so we don't forget why it needs to
be unconditionally set: it's indicating whether the version of Xen has
the original issue fixed and IOMMU entries are created for
grant/foreign maps.
If the flag is only exposed when the IOMMU is enabled the guest could
resort to use bounce buffers when running backends as it would assume
the underlying Xen version still has the bug present and thus
grant/foreign maps cannot be used with devices.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 19 Jan 2021 15:03:41 +0000 (16:03 +0100)]
kconfig: ensure strndup() declaration is visible
Its guard was updated such that it is visible by default when POSIX 2008
was adopted by glibc. It's not visible by default on older glibc.
Fixes: f80fe2b34f08 ("xen: Update Kconfig to Linux v5.4") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Doug Goldstein <cardoe@cardoe.com>
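As background to the commit above, one common way to make the strndup() declaration visible on glibc versions that do not expose it by default is to request POSIX.1-2008 through a feature test macro before any system header is included. This is a generic sketch and not necessarily the change made by the commit; copy_prefix is a hypothetical helper.

```c
/* Feature test macros must be defined before the first system header. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate at most n characters of s.
 * The caller owns the result and must free() it. */
static char *copy_prefix(const char *s, size_t n)
{
    assert(s != NULL);
    return strndup(s, n);
}
```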
Manuel Bouyer [Tue, 12 Jan 2021 18:12:26 +0000 (19:12 +0100)]
tools/xenbackendd: Remove xenbackendd
NetBSD doesn't need xenbackendd with xl toolstack so don't build it.
Remove now unused xenbackendd directory/files, and remaining references
in the hotplug scripts.
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
[Also clean up stale comments in the Linux xencommons script] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
read_exact/write_exact seem not to be available here, which causes a gcc
error. Use plain read/write, the xenevtchn interface won't do partial
read/write on NetBSD anyway so it should be safe. This is in line with the
rest of the OS specific helpers.
Fixes: b7f76a699dc ('tools: Refactor /dev/xen/evtchn wrappers into libxenevtchn') Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
UBSAN catches an uninitialized use of the 'preempted' variable in
fork_hap_allocation when there is no preemption.
Fixes: 41548c5472a ("mem_sharing: VM forking") Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
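The class of bug described above typically arises when a callee writes an output flag only on one path and the caller reads it unconditionally. A minimal sketch with hypothetical names and values (not the Xen code), showing the initialized form:

```c
#include <stdbool.h>

#define DEMO_ERESTART          85     /* hypothetical "retry later" code */
#define DEMO_PREEMPT_THRESHOLD 1024u  /* hypothetical preemption trigger */

/* Hypothetical helper: writes *preempted only when it is preempted. */
static int demo_set_allocation(unsigned int pages, bool *preempted)
{
    if (pages > DEMO_PREEMPT_THRESHOLD)
        *preempted = true;
    return 0;
}

static int demo_fork_allocation(unsigned int pages)
{
    /*
     * Without this initializer, the read below would be of an
     * indeterminate value whenever no preemption takes place.
     */
    bool preempted = false;
    int rc = demo_set_allocation(pages, &preempted);

    if (!rc && preempted)
        return -DEMO_ERESTART;
    return rc;
}
```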