Manuel Bouyer [Sat, 30 Jan 2021 18:27:10 +0000 (19:27 +0100)]
xenpmd.c: use dynamic allocation
On NetBSD, d_name is larger than 256, so file_name[284] may not be large
enough (and gcc emits a format-truncation error).
Use asprintf() instead of snprintf() on a static on-stack buffer.
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Plus
Define _GNU_SOURCE for asprintf().
Harmless on NetBSD.
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
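For illustration, here is a minimal sketch of the heap-allocation pattern described above. It is not the actual xenpmd.c change, and make_file_name() and the example path are hypothetical. On glibc, asprintf() is only declared when _GNU_SOURCE is defined before the first #include, which is why the follow-up fix is needed.

```c
#define _GNU_SOURCE /* glibc declares asprintf() only with this defined */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: build "<dir>/<name>" on the heap rather than in a
 * fixed-size stack buffer, so a long d_name can never be truncated.
 * Returns NULL on failure; the caller must free() the result.
 */
static char *make_file_name(const char *dir, const char *name)
{
    char *file_name;

    assert(dir != NULL && name != NULL);
    /* On failure asprintf() returns -1 and file_name is undefined. */
    if (asprintf(&file_name, "%s/%s", dir, name) < 0)
        return NULL;
    return file_name;
}
```

The buffer is sized by the actual string length, so the gcc format-truncation warning no longer applies; the cost is one allocation per name, plus a free().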
Tamas K Lengyel [Sat, 30 Jan 2021 01:59:53 +0000 (20:59 -0500)]
x86/debug: fix page-overflow bug in dbg_rw_guest_mem
When using gdbsx, dbg_rw_guest_mem() is used to read/write guest memory. When the
buffer being accessed crosses a page boundary, the next page needs to be grabbed
to access the correct memory for the parts of the buffer that overflow the first page. While
dbg_rw_guest_mem() has logic to handle that, it broke with 229492e210a. Instead
of grabbing the next page, the code currently loops back to the
start of the first page. This results in errors like the following while trying
to use gdb with Linux's lx-dmesg:
[ 0.114457] PM: hibernation: Registered nosave memory: [mem 0xfdfff000-0xffffffff]
[ 0.114460] [mem 0x90000000-0xfbffffff] available for PCI demem 0
[ 0.114462] f]f]
Python Exception <class 'ValueError'> embedded null character:
Error occurred in Python: embedded null character
Fix this bug by moving the variable assignment outside the loop.
Fixes: 229492e210a ("x86/debugger: use copy_to/from_guest() in dbg_rw_guest_mem()") Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
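A minimal sketch of page-chunked guest access, assuming a simulated guest memory of a few pages. read_guest(), map_guest_page() and the page array are hypothetical stand-ins, not Xen code. The point is that the running guest address is initialised once, before the loop, and advanced after each chunk; re-initialising it inside the loop would reproduce the "loop back to the first page" bug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define NR_PAGES  4u

/* Simulated guest memory: NR_PAGES separately "mapped" pages. */
static uint8_t guest_pages[NR_PAGES][PAGE_SIZE];

/* Hypothetical stand-in for mapping the guest page containing addr. */
static uint8_t *map_guest_page(uint64_t addr)
{
    return guest_pages[addr / PAGE_SIZE];
}

/*
 * Read len bytes at guest address addr into buf, one page at a time.
 * addr is set once (by the caller) and advanced each iteration, so a
 * buffer straddling a page boundary continues into the next page.
 * Returns the number of bytes copied.
 */
static size_t read_guest(uint64_t addr, void *buf, size_t len)
{
    uint8_t *out = buf;
    size_t done = 0;

    while (done < len) {
        size_t off = addr % PAGE_SIZE;
        size_t chunk = PAGE_SIZE - off;   /* bytes left in this page */

        if (chunk > len - done)
            chunk = len - done;
        if (addr / PAGE_SIZE >= NR_PAGES)
            break;                        /* beyond simulated memory */

        memcpy(out + done, map_guest_page(addr) + off, chunk);
        done += chunk;
        addr += chunk;                    /* move on to the next page */
    }
    return done;
}
```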
Andrew Cooper [Wed, 20 Jan 2021 19:06:19 +0000 (19:06 +0000)]
xen+tools: Introduce XEN_SYSCTL_PHYSCAP_vmtrace
We're about to introduce support for Intel Processor Trace, but similar
functionality exists in other platforms.
Aspects of vmtrace can reasonably be common, so start with
XEN_SYSCTL_PHYSCAP_vmtrace and plumb the signal from Xen all the way down into
`xl info`.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
The frame_list is an input, or an output, depending on whether the calling
domain is translated or not. The array does not need marshalling in both
directions.
Furthermore, the copy-in loop was very inefficient, copying 4 bytes at a
time. Rewrite it to copy in all nr_frames at once, and then expand
compat_pfn_t to xen_pfn_t in place.
Re-position the copy-in loop to simplify continuation support in a future
patch, and reduce the scope of certain variables.
No change in guest-observed behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
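The "copy in everything, then widen in place" step can be sketched as follows. This is an illustration of the technique only; expand_compat_frames() is hypothetical, and the typedefs assume a 32-bit compat_pfn_t and a 64-bit xen_pfn_t. Walking from the last element down guarantees each 8-byte store only overwrites 4-byte compat values that have already been read.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t compat_pfn_t; /* assumption: 32-bit guest frame number */
typedef uint64_t xen_pfn_t;    /* assumption: native 64-bit frame number */

/*
 * frames has room for nr xen_pfn_t entries; on entry its first
 * nr * sizeof(compat_pfn_t) bytes hold the packed compat values (e.g.
 * after one bulk copy from the guest). Widen them in place, last first:
 * writing element i touches bytes [8i, 8i + 8), while the values still
 * to be read live at [4j, 4j + 4) with j < i, so nothing is lost.
 */
static void expand_compat_frames(xen_pfn_t *frames, unsigned int nr)
{
    const unsigned char *packed = (const unsigned char *)frames;
    unsigned int i = nr;

    while (i-- > 0) {
        compat_pfn_t v;

        /* memcpy() avoids strict-aliasing problems with the cast. */
        memcpy(&v, packed + (size_t)i * sizeof(v), sizeof(v));
        frames[i] = v;
    }
}
```

One bulk copy of nr_frames * 4 bytes replaces nr_frames separate 4-byte copies, and no second buffer is needed.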
Andrew Cooper [Thu, 23 Jul 2020 14:18:33 +0000 (15:18 +0100)]
xen/memory: Fix acquire_resource size semantics
Calling XENMEM_acquire_resource with a NULL frame_list is a request for the
size of the resource, but the returned 32 is bogus.
If someone tries to follow it for XENMEM_resource_ioreq_server, the acquire
call will fail as IOREQ servers currently top out at 2 frames, and 32 is only
half the size of the default grant table limit (64 frames) for guests.
Also, no users actually request a resource size, because it was never wired up
in the sole implementation of resource acquisition in Linux.
Introduce a new resource_max_frames() to calculate the size of a resource, and
implement it for the IOREQ and grant subsystems.
It is impossible to guarantee that a mapping call following a successful size
call will succeed (e.g. the target IOREQ server gets destroyed, or the domain
switches from grant v2 to v1). Document the restriction, and use the
flexibility to make the paths lockless.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 27 Jul 2020 12:40:06 +0000 (13:40 +0100)]
xen/gnttab: Rework resource acquisition
The existing logic doesn't function in the general case for mapping a guest's
grant table, due to the arbitrary 32-frame limit and the default grant table
limit being 64 frames.
In order to start addressing this, rework the existing grant table logic by
implementing a single gnttab_acquire_resource(). This is far more efficient
than the previous acquire_grant_table() in memory.c because it doesn't take
the grant table write lock, and attempt to grow the table, for every single
frame.
The new gnttab_acquire_resource() function subsumes the previous two
gnttab_get_{shared,status}_frame() helpers.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
The ABI is unfortunate, and frame being 64 bits leads to all kinds of problems
performing correct overflow checks.
Reject out-of-range values, and combinations which overflow, and use unsigned
int consistently elsewhere. This fixes several truncation bugs in the grant
call tree, as the underlying limits are expressed with unsigned int to begin
with.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
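A minimal sketch of the kind of range check described above, assuming a 64-bit frame index from the ABI and unsigned int limits. frame_range_ok() is hypothetical, not the Xen function. Rejecting frame against the limit first also rejects anything that would silently truncate when narrowed to unsigned int, and comparing via subtraction means frame + nr_frames is never computed, so it cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* May frames [frame, frame + nr_frames) be used, given max_frames? */
static bool frame_range_ok(uint64_t frame, unsigned int nr_frames,
                           unsigned int max_frames)
{
    if (frame >= max_frames)
        return false;   /* out of range; also catches frame > UINT_MAX */

    /* frame < max_frames here, so the narrowing cast is lossless. */
    return nr_frames <= max_frames - (unsigned int)frame;
}
```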
Manuel Bouyer [Tue, 26 Jan 2021 22:47:58 +0000 (23:47 +0100)]
libs/light: pass some infos to qemu
Pass bridge name to qemu as command line option
When starting qemu, set an environment variable XEN_DOMAIN_ID,
to be used by qemu helper scripts
The only functional difference of using the br parameter is that the
bridge name gets passed to the QEMU script.
NetBSD doesn't have the ioctl to rename network interfaces implemented, and
thus cannot rename the interface from tapX to vifX.Y-emu. Only qemu knows
the tap interface name, so we need to use the qemu script from qemu itself.
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Manuel Bouyer [Tue, 26 Jan 2021 22:47:57 +0000 (23:47 +0100)]
libs/light: make it build without setresuid()
NetBSD doesn't have setresuid(). Introduce libxl__setresuid(),
which on NetBSD asserts that it's never called (it should not be called when
dm restriction is off, and NetBSD doesn't support dm restriction at
this time).
On Linux and FreeBSD it just calls setresuid().
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
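The shape of such a wrapper might look like the sketch below. example_setresuid() is a hypothetical stand-in, not libxl's actual code; the real signature of libxl__setresuid() may differ.

```c
#define _GNU_SOURCE /* glibc declares setresuid() only with this defined */
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for libxl__setresuid(). */
static int example_setresuid(uid_t ruid, uid_t euid, uid_t suid)
{
#if defined(__NetBSD__)
    /* No setresuid() on NetBSD. Callers only need it for dm restriction,
     * which NetBSD does not support, so reaching this is a bug. */
    (void)ruid; (void)euid; (void)suid;
    assert(!"setresuid() called on NetBSD");
    return -1;
#else
    /* Linux and FreeBSD: plain pass-through. */
    return setresuid(ruid, euid, suid);
#endif
}
```

Passing (uid_t)-1 for an argument leaves that ID unchanged, so calling it with three -1 values is a harmless no-op on Linux.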
Manuel Bouyer [Tue, 26 Jan 2021 22:47:54 +0000 (23:47 +0100)]
libs/light: Switch NetBSD to QEMU_XEN
Switch NetBSD to QEMU_XEN.
All 3 versions of libxl__default_device_model() now return
LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN, so remove it and just set
b_info->device_model_version to LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN in
libxl__domain_build_info_setdefault().
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Manuel Bouyer [Tue, 26 Jan 2021 22:47:49 +0000 (23:47 +0100)]
NetBSD hotplug: fix block unconfigure on destroy
When a domain is destroyed, xparams may not be available any more when
the block script is called to unconfigure the vnd.
Check xparam only at configure time, and just unconfigure any vnd present
in the xenstore.
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Manuel Bouyer [Tue, 26 Jan 2021 22:47:48 +0000 (23:47 +0100)]
NetBSD hotplug: Introduce locking functions
On NetBSD, some block device configuration requires serialisation.
Introduce locking functions (derived from the Linux version), and use them
in the block script where appropriate.
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/ioreq: Make the IOREQ feature selectable on Arm
The purpose of this patch is to allow the user
to select IOREQ support on Arm (which is disabled
by default) while retaining the current behaviour on x86
(where it is selected by HVM and its prompt is not visible).
Also make IOREQ depend on CONFIG_EXPERT on Arm, since
it is considered a Tech Preview feature, and
update SUPPORT.md.
xen/ioreq: Do not let bufioreq to be used on other than x86 arches
This patch prevents device models running on non-x86
systems from using the buffered I/O feature for now.
Please note that no caller currently needs to send buffered
I/O requests on Arm; the purpose of this check is
to catch any future user of bufioreq.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Acked-by: Paul Durrant <paul@xen.org>
We need to send a mapcache invalidation request to qemu/demu every time
a page is removed from a guest.
At the moment, the Arm code doesn't explicitly remove the existing
mapping before inserting the new mapping. Instead, this is done
implicitly by __p2m_set_entry().
First of all, we need to recognize the case where the "freed" entry
contains a RAM page, in order to set the corresponding flag.
The most suitable place to do this is p2m_free_entry(), there we can
find the correct leaf type. The invalidation request will be sent
in do_trap_hypercall() later on.
Taking the following into account, do_trap_hypercall()
is the best place to send the invalidation request:
- The only way a guest can modify its P2M on Arm is via a hypercall
- When sending the invalidation request, the vCPU will be blocked
until all the IOREQ servers have acknowledged the invalidation
xen/ioreq: Make x86's send_invalidate_req() common
As IOREQ is a common feature now and we also need to
invalidate the qemu/demu mapcache on Arm when the required condition
occurs, this patch moves this function to the common code
(and renames it to ioreq_signal_mapcache_invalidate).
This patch also moves per-domain qemu_mapcache_invalidate
variable out of the arch sub-struct (and drops "qemu" prefix).
We don't put this variable inside the #ifdef CONFIG_IOREQ_SERVER
at the end of struct domain, but in the hole next to the group
of 5 bools further up which is more efficient.
The subsequent patch will add mapcache invalidation handling on Arm.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> CC: Julien Grall <julien.grall@arm.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
In an ideal world we would never get undefined behavior when
propagating the sign bit, since that bit can only be set for access
sizes smaller than the register size (i.e. byte/half-word for aarch32,
byte/half-word/word for aarch64).
In the real world we need to account for a *possible* hardware bug such as
advertising a sign extension for a 64-bit (or 32-bit) access on Arm64
(resp. Arm32).
So harden the code a bit more to prevent undefined behavior when
propagating the sign bit in case of buggy hardware.
In order to avoid code duplication (both handle_read() and
handle_ioserv() contain the same code for the sign-extension),
move this code into a common helper to be used by both.
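A minimal sketch of a hardened sign-extension helper, assuming a 64-bit register and an access size given in bytes. sign_extend() is hypothetical and is not the helper added by the patch. The guard is what prevents undefined behavior: for a full-width access there is nothing to extend, and the shift by 64 bits that a naive version would compute is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sign-extend the low size_bytes bytes of r to the full 64-bit register.
 * If the access is not narrower than the register (e.g. buggy hardware
 * reporting a sign-extended 64-bit access on Arm64), return r unchanged.
 */
static uint64_t sign_extend(uint64_t r, unsigned int size_bytes)
{
    if (size_bytes == 0 || size_bytes >= sizeof(r))
        return r;

    unsigned int bits = size_bytes * 8;          /* 8 to 56 bits */
    uint64_t sign = UINT64_C(1) << (bits - 1);   /* the access's sign bit */

    r &= (UINT64_C(1) << bits) - 1;  /* keep only the accessed bits */
    return (r ^ sign) - sign;        /* propagate the sign bit upwards */
}
```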
xen/dm: Introduce xendevicemodel_set_irq_level DM op
This patch adds the ability for the device emulator to notify the other end
(some entity running in the guest) using an SPI, and implements the Arm
specific bits for it. The proposed interface allows the emulator to set
the logical level of one of a domain's IRQ lines.
We can't reuse the existing DM op (xen_dm_op_set_isa_irq_level)
to inject an interrupt as the "isa_irq" field is only 8-bit and
able to cover IRQ 0 - 255, whereas we need a wider range (0 - 1020).
Please note, for edge-triggered interrupts (which are used for
the virtio-mmio emulation) we only trigger the interrupt on Arm
if the level is asserted (rising edge) and do nothing if the level
is deasserted (falling edge), so the call could be named "trigger_irq"
(without the level parameter). But, in order to model the line closely
(to be able to support level-triggered interrupts) we need to know whether
the line is low or high, so the proposed interface has been chosen.
However, it is worth mentioning that in case of the level-triggered
interrupt, we should keep injecting the interrupt to the guest until
the line is deasserted (this is not covered by current patch).
Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
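The edge-triggered semantics described above can be modelled in a few lines. This is a hypothetical model of one IRQ line, not Xen's vGIC code or the DM op implementation: asserting the line injects an interrupt, deasserting only records the level.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one virtual IRQ line. */
struct irq_line {
    bool level;              /* last level set by the emulator */
    unsigned int injected;   /* interrupts delivered to the guest */
};

/*
 * Asserting the line (rising edge) injects an interrupt; deasserting it
 * (falling edge) does nothing beyond recording the level. Keeping the
 * level is what level-triggered support would need in order to keep
 * re-injecting until the line is deasserted (not modelled here).
 */
static void set_irq_level(struct irq_line *line, bool level)
{
    if (level)
        line->injected++;   /* stand-in for injecting into the guest */
    line->level = level;
}
```

Passing the level, rather than a bare "trigger", costs nothing for edge-triggered interrupts but keeps the interface usable once level-triggered lines are supported.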
This patch introduces a helper whose main purpose is to check
whether a domain is using IOREQ server(s).
On Arm the current benefit is to avoid calling vcpu_ioreq_handle_completion()
(which implies iterating over all possible IOREQ servers anyway)
on every return in leave_hypervisor_to_guest() if there are no active
servers for the particular domain.
Also this helper will be used by one of the subsequent patches on Arm.
xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm
This patch implements reference counting of foreign entries
in set_foreign_p2m_entry() on Arm. This is mandatory if
we want to run the emulator (IOREQ server) in a domain other than dom0,
as we can't trust it to do the right thing if it is not running
in dom0. So we need to grab a reference on the page to avoid it
disappearing.
It is valid to always pass the "p2m_map_foreign_rw" type to
guest_physmap_add_entry() since the current and foreign domains
will always be different. A case where they are equal would be
rejected by rcu_lock_remote_domain_by_id(). In addition to the similar
comment in the code, add a corresponding ASSERT() to catch incorrect
usage in future.
It was tested with the IOREQ feature to confirm that all the pages given
to this function belong to a domain, so we can use the same approach
as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().
This involves adding an extra parameter for the foreign domain to
set_foreign_p2m_entry() and a helper to indicate whether the arch
supports the reference counting of foreign entries and the restriction
for the hardware domain in the common code can be skipped for it.
xen/arm: Call vcpu_ioreq_handle_completion() in check_for_vcpu_work()
This patch adds remaining bits needed for the IOREQ support on Arm.
Besides just calling vcpu_ioreq_handle_completion(), we need to handle
its return value to make sure that all the vCPU work is done before
we return to the guest (vcpu_ioreq_handle_completion() may return
false if there is vCPU work to do or the IOREQ state is invalid).
For that reason we use an unbounded loop in leave_hypervisor_to_guest().
The worst that can happen here is that the vCPU will never run again
(the I/O will never complete). But, in Xen's case, if the I/O never
completes then it most likely means that something went horribly
wrong with the Device Emulator, and it is most likely not safe
to continue. So letting the vCPU spin forever if the I/O never
completes is safer than letting it continue and leaving
the guest in an unclear state, and is the best we can do for now.
Please note that this loop will not spin forever on a pCPU and
prevent other vCPUs from being scheduled. On every iteration
we call check_for_pcpu_work(), which processes pending
softirqs. In case of failure, the guest will crash and the vCPU
will be unscheduled. In the normal case, if rescheduling is necessary,
the vCPU will be rescheduled to make way for someone else.
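The loop structure can be sketched with stubs. The function names follow the text above, but their bodies here are invented for illustration (the emulator "completes" one request per pass); this is not Xen's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulation state for the hypothetical stubs below. */
static unsigned int ioreqs_in_flight = 3;
static unsigned int pcpu_work_passes;

/* Stub: would process pending softirqs (and may deschedule this vCPU). */
static void check_for_pcpu_work(void)
{
    pcpu_work_passes++;
}

/* Stub: returns false while I/O is still outstanding. */
static bool vcpu_ioreq_handle_completion(void)
{
    if (ioreqs_in_flight == 0)
        return true;
    ioreqs_in_flight--;   /* pretend the emulator made progress */
    return false;
}

/* Returns true while there is still vCPU work to do. */
static bool check_for_vcpu_work(void)
{
    return !vcpu_ioreq_handle_completion();
}

static void leave_hypervisor_to_guest(void)
{
    /*
     * Deliberately unbounded: never return to the guest with I/O still
     * pending. Each iteration runs check_for_pcpu_work(), so the pCPU
     * can still switch to other vCPUs.
     */
    while (check_for_vcpu_work())
        check_for_pcpu_work();
}
```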
Julien Grall [Fri, 29 Jan 2021 01:48:42 +0000 (03:48 +0200)]
arm/ioreq: Introduce arch specific bits for IOREQ/DM features
This patch adds basic IOREQ/DM support on Arm. The subsequent
patches will improve functionality and add remaining bits.
The IOREQ/DM features are supposed to be built with IOREQ_SERVER
option enabled, which is disabled by default on Arm for now.
Please note, the "PIO handling" TODO is expected to be left unaddressed
for the current series. It is not a big issue for now while Xen
doesn't have support for vPCI on Arm. On Arm64 PIOs are only used
for PCI IO BARs, and we would probably want to expose them to the emulator
as PIO accesses to make a DM completely arch-agnostic. So "PIO handling"
should be implemented when we add support for vPCI.
xen/ioreq: Use guest_cmpxchg64() instead of cmpxchg()
The cmpxchg() in ioreq_send_buffered() operates on memory shared
with the emulator domain (and the target domain if the legacy
interface is used).
In order to be on the safe side we need to switch
to guest_cmpxchg64() to prevent a domain from DoSing Xen on Arm.
The point of using the 64-bit version of the helper is to support Arm32,
since the IOREQ code uses cmpxchg() with a 64-bit value.
As there is no plan to support the legacy interface on Arm,
the page will be mapped in a single domain at a time,
so we can use s->emulator in guest_cmpxchg64() safely.
Thankfully the only user of the legacy interface is x86 so far,
and there is no concern regarding the atomic operations there.
Please note that the legacy interface *must* not be used on Arm
without revisiting the code.
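The idea behind a "guest-safe" compare-and-exchange is that, on memory a guest can write concurrently, Xen must not retry an exclusive-access loop forever. Below is a simplified sketch using C11 atomics in place of Xen's arch helpers. bounded_cmpxchg64() and its max_tries parameter are hypothetical; this is not guest_cmpxchg64().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns false only if no definitive result was reached in max_tries
 * attempts (the caller must then deal with the misbehaving domain).
 * On true, *old holds the value found in *ptr: equal to the passed-in
 * *old iff the exchange happened, mirroring cmpxchg()'s return value.
 */
static bool bounded_cmpxchg64(_Atomic uint64_t *ptr, uint64_t *old,
                              uint64_t newval, unsigned int max_tries)
{
    const uint64_t want = *old;

    while (max_tries--) {
        uint64_t seen = want;

        if (atomic_compare_exchange_weak(ptr, &seen, newval))
            return true;          /* exchanged; *old already == want */
        if (seen != want) {
            *old = seen;          /* definitive mismatch */
            return true;
        }
        /* Spurious failure (e.g. lost exclusive reservation): retry. */
    }
    return false;                 /* gave up instead of spinning forever */
}
```

Using a 64-bit operand throughout matches the IOREQ code's use of 64-bit values, which is why a 64-bit variant is needed on Arm32 as well.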
xen/ioreq: Remove "hvm" prefixes from involved function names
This patch removes "hvm" prefixes and infixes from IOREQ related
function names in the common code and performs a renaming where
appropriate according to the more consistent new naming scheme:
- IOREQ server functions should start with "ioreq_server_"
- IOREQ functions should start with "ioreq_"
A few function names are clarified to better fit into their purposes:
handle_hvm_io_completion -> vcpu_ioreq_handle_completion
hvm_io_pending -> vcpu_ioreq_pending
hvm_ioreq_init -> ioreq_domain_init
hvm_alloc_ioreq_mfn -> ioreq_server_alloc_mfn
hvm_free_ioreq_mfn -> ioreq_server_free_mfn
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> CC: Julien Grall <julien.grall@arm.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
Julien Grall [Fri, 29 Jan 2021 01:48:39 +0000 (03:48 +0200)]
xen/mm: Make x86's XENMEM_resource_ioreq_server handling common
As x86 implementation of XENMEM_resource_ioreq_server can be
re-used on Arm later on, this patch makes it common and removes
arch_acquire_resource (and the corresponding option) as unneeded.
Also re-order #include-s alphabetically.
This support is going to be used on Arm to be able to run the device
emulator outside of the Xen hypervisor.
Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
xen/ioreq: Move x86's io_completion/io_req fields to struct vcpu
IOREQ is a common feature now and these fields will be used
on Arm as is. Move them to the common struct vcpu as part of a new
struct vcpu_io and drop the duplicate "io" prefixes. Also move
enum hvm_io_completion to xen/sched.h and remove the "hvm" prefixes.
This patch completely removes the layering violation in the common code.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> CC: Julien Grall <julien.grall@arm.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
Julien Grall [Fri, 29 Jan 2021 01:48:37 +0000 (03:48 +0200)]
xen/ioreq: Make x86's IOREQ related dm-op handling common
As a lot of x86 code can be re-used on Arm later on, this patch
moves the IOREQ related dm-op handling to the common code.
The idea is to have the top level dm-op handling arch-specific
and call into ioreq_server_dm_op() for otherwise unhandled ops.
Pros:
- More natural than doing it the other way around (top level dm-op
handling common).
- Leave compat_dm_op() in x86 code.
Cons:
- Code duplication. Both arches have to duplicate dm_op(), etc.
Make the corresponding functions static and rename them according
to the new naming scheme (including dropping the "hvm" prefixes).
Introduce common dm.c file as a resting place for the do_dm_op()
(which is identical for both Arm and x86) to minimize code duplication.
The common DM feature is supposed to be built with IOREQ_SERVER
option enabled (as well as the IOREQ feature), which is selected
for x86's config HVM for now.
Also update XSM code a bit to let dm-op be used on Arm.
This support is going to be used on Arm to be able to run the device
emulator outside of the Xen hypervisor.
Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
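The chosen split (arch-specific top level, common fallback for otherwise unhandled ops) can be sketched as below. The op numbers and both handlers are hypothetical and deliberately simplified; they do not reproduce Xen's dm_op() or ioreq_server_dm_op() signatures.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical op numbers. */
enum { OP_COMMON_IOREQ, OP_ARCH_ONLY, OP_UNKNOWN = 99 };

/* Common IOREQ-related ops, shared by all architectures. */
static int common_ioreq_op(int op)
{
    switch (op) {
    case OP_COMMON_IOREQ:
        return 0;
    default:
        return -EOPNOTSUPP;
    }
}

/* Per-architecture top level: handles its own ops first. */
static int arch_dm_op(int op)
{
    switch (op) {
    case OP_ARCH_ONLY:
        return 0;
    default:
        /* Not handled by the arch: fall back to the common code. */
        return common_ioreq_op(op);
    }
}
```

The cost of this direction is that each architecture carries its own top-level dispatcher; the benefit is that arch-only ops (and x86's compat handling) never have to pass through common code.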
xen/ioreq: Move x86's ioreq_server to struct domain
The IOREQ is a common feature now and this struct will be used
on Arm as is. Move it to common struct domain. This also
significantly reduces the layering violation in the common code
(*arch.hvm* usage).
We don't move ioreq_gfn since it is not used in the common code
(the "legacy" mechanism is x86 specific).
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> CC: Julien Grall <julien.grall@arm.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
xen/ioreq: Make x86's hvm_ioreq_(page/vcpu/server) structs common
The IOREQ is a common feature now and these structs will be used
on Arm as is. Move them to xen/ioreq.h and remove "hvm" prefixes.
Also there is no need to include public/hvm/dm_op.h from
asm-x86/hvm/domain.h anymore since #define NR_IO_RANGE_TYPES
(which uses XEN_DMOP_IO_RANGE_PCI) gets moved to another location.
Instead, include it in the 2 places (p2m-pt.c and p2m-ept.c) which
require that header but don't directly include it so far.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> CC: Julien Grall <julien.grall@arm.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
The IOREQ is a common feature now and this helper will be used
on Arm as is. Move it to xen/ioreq.h and remove "hvm" prefix.
Although PIO handling on Arm is not introduced with the current series
(it will be implemented when we add support for vPCI), technically
the PIOs exist on Arm (however they are accessed the same way as MMIO)
and it would be better not to diverge now.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> CC: Julien Grall <julien.grall@arm.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
As a lot of x86 code can be re-used on Arm later on, this patch
moves previously prepared IOREQ support to the common code
(the code movement is verbatim copy).
The "legacy" mechanism of mapping magic pages for the IOREQ servers
remains x86 specific and not exposed to the common code.
The common IOREQ feature is supposed to be built with IOREQ_SERVER
option enabled, which is selected for x86's config HVM for now.
In order to avoid having a gigantic patch here, the subsequent
patches will update remaining bits in the common code step by step:
- Make IOREQ related structs/materials common
- Drop the "hvm" prefixes and infixes
- Remove layering violation by moving corresponding fields
out of *arch.hvm* or abstracting away accesses to them
Introduce an asm/ioreq.h wrapper to be included by the common ioreq.h
instead of asm/hvm/ioreq.h, to avoid HVM-isms in the common code.
Also include <xen/domain_page.h>, which will be needed on Arm,
to avoid touching the common code again when introducing Arm specific bits.
This support is going to be used on Arm to be able to run the device
emulator outside of the Xen hypervisor.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> CC: Julien Grall <julien.grall@arm.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
x86/ioreq: Provide out-of-line wrapper for the handle_mmio()
IOREQ is about to become a common feature and Arm will have its own
implementation.
But the name of the function is pretty generic and can be confusing
on Arm (we already have a try_handle_mmio()).
In order not to rename the function globally (it is used for a varying
set of purposes on x86), and to get a non-confusing variant on Arm,
provide a wrapper arch_ioreq_complete_mmio() to be used in common
and Arm code.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Paul Durrant <paul@xen.org> CC: Julien Grall <julien.grall@arm.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
x86/ioreq: Prepare IOREQ feature for making it common
As a lot of x86 code can be re-used on Arm later on, this
patch makes some preparation to x86/hvm/ioreq.c before moving
to the common code. This way we will get a verbatim copy
for a code movement in subsequent patch.
This patch mostly introduces specific hooks to abstract arch
specific materials, taking into account the requirement to leave
the "legacy" mechanism of mapping magic pages for the IOREQ servers
x86 specific and not expose it to the common code.
These hooks are named according to the more consistent new naming
scheme right away (including dropping the "hvm" prefixes and infixes):
- IOREQ server functions should start with "ioreq_server_"
- IOREQ functions should start with "ioreq_"
Other functions will be renamed in subsequent patches.
Introduce common ioreq.h right away and put arch hook declarations
there.
Also re-order #include-s alphabetically.
This support is going to be used on Arm to be able to run the device
emulator outside of the Xen hypervisor.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> CC: Julien Grall <julien.grall@arm.com>
[On Arm only] Tested-by: Wei Chen <Wei.Chen@arm.com>
Roger Pau Monné [Fri, 29 Jan 2021 16:10:33 +0000 (17:10 +0100)]
x86/pvh: pass module command line to dom0
Both the multiboot and the HVM start info structures allow passing a
string together with a module. Implement the missing support in
pvh_load_kernel so that module strings found in the multiboot
structure are forwarded to dom0.
Fixes: 62ba982424 ('x86: parse Dom0 kernel for PVHv2') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Igor Druzhinin [Fri, 29 Jan 2021 13:18:43 +0000 (14:18 +0100)]
viridian: allow vCPU hotplug for Windows VMs
If Viridian extensions are enabled, Windows wouldn't currently allow
a hotplugged vCPU to be brought up dynamically. We need to expose a special
bit to let the guest know we allow it. Hide it behind an option to stay
on the safe side regarding compatibility with existing guests but
nevertheless set the option on by default.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Igor Druzhinin [Fri, 29 Jan 2021 13:18:01 +0000 (14:18 +0100)]
viridian: remove implicit limit of 64 VPs per partition
TLFS 7.8.1 stipulates that "a virtual processor index must be less than
the maximum number of virtual processors per partition" that "can be obtained
through CPUID leaf 0x40000005". Furthermore, "Requirements for Implementing
the Microsoft Hypervisor Interface" defines that starting from Windows Server
2012, which allowed more than 64 CPUs to be brought up, this leaf can now
contain the value -1, meaning the hypervisor imposes no restriction, while
0 (which we currently expose) means the default restriction is still present.
Along with the previous changes exposing ExProcessorMasks, this allows a recent
Windows VM with the Viridian extensions enabled to have more than 64 vCPUs without
hitting a BSOD in some cases.
Since we didn't expose the leaf before, and to keep CPUID data consistent for
incoming streams from previous Xen versions, keep it behind an option.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
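A hypothetical sketch of how a guest might interpret the "maximum virtual processors" value from CPUID leaf 0x40000005, per the description above. vp_index_allowed() is invented for illustration, and modelling the "default restriction" as 64 VPs is an assumption taken from the commit title.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEFAULT_MAX_VPS 64u   /* assumption: the implicit 64-VP limit */

/*
 *   0          -> the default restriction still applies
 *   0xffffffff -> (-1) the hypervisor imposes no restriction
 *   otherwise  -> VP indices must be less than the value (TLFS 7.8.1)
 */
static bool vp_index_allowed(uint32_t vp_index, uint32_t max_vps)
{
    if (max_vps == 0)
        return vp_index < DEFAULT_MAX_VPS;
    if (max_vps == UINT32_MAX)
        return true;
    return vp_index < max_vps;
}
```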
Norbert Kamiński [Tue, 12 Jan 2021 20:27:43 +0000 (21:27 +0100)]
x86: Support booting under Secure Startup via SKINIT
For now, this is simply enough logic to let Xen come up after the bootloader
has executed an SKINIT instruction to begin a Secure Startup.
During a Secure Startup, the BSP operates with the GIF clear (blocks all
external interrupts, even SMI/NMI), and INIT_REDIRECTION active (converts INIT
IPIs to #SX exceptions, if e.g. the platform needs to scrub secrets before
resetting). To afford APs the same Secure Startup protections as the BSP, the
INIT IPI must be skipped, and SIPI must be the first interrupt seen.
Full details are available in AMD APM Vol2 15.27 "Secure Startup with SKINIT".
Introduce skinit_enable_intr() and call it from cpu_init(), next to the
enable_nmis() which performs a related function for tboot startups.
Also introduce ap_boot_method to control the sequence of actions for AP boot.
Signed-off-by: Marek Kasiewicz <marek.kasiewicz@3mdeb.com> Signed-off-by: Norbert Kamiński <norbert.kaminski@3mdeb.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 29 Jan 2021 10:36:54 +0000 (11:36 +0100)]
x86/HVM: re-order error path of hvm_domain_initialise()
hvm_destroy_all_ioreq_servers(), called from
hvm_domain_relinquish_resources(), invokes relocate_portio_handler(),
which uses d->arch.hvm.io_handler. Defer freeing of this array
accordingly on the error path of hvm_domain_initialise().
Similarly rtc_deinit() requires d->arch.hvm.pl_time to still be around,
or else an armed timer structure would get freed, and that timer would never
get killed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 29 Jan 2021 10:34:37 +0000 (11:34 +0100)]
memory: bail from page scrubbing when CPU is no longer online
Scrubbing can significantly delay the offlining (parking) of a CPU (e.g.
when booting in smt=0 mode), to a degree that the "CPU <n>
still not dead..." messages logged on x86 in 1s intervals can be seen
multiple times. There are no softirqs involved in this process, so
extend the existing preemption check in the scrubbing logic to also exit
when the CPU is no longer observed online.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
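A minimal sketch of the extended preemption check, with hypothetical state fields standing in for Xen's softirq and CPU-online tests; this is not the actual scrubbing code. The second condition is needed because parking a CPU involves no softirqs, so the existing check alone never fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical scrubber state. */
struct scrub_state {
    unsigned int pages_left;
    bool softirq_pending;   /* existing preemption condition */
    bool cpu_online;        /* added condition: bail if being offlined */
};

static bool scrub_should_stop(const struct scrub_state *s)
{
    return s->softirq_pending || !s->cpu_online;
}

/* Scrub up to batch pages; returns the number actually scrubbed. */
static unsigned int scrub_some(struct scrub_state *s, unsigned int batch)
{
    unsigned int done = 0;

    while (s->pages_left && done < batch && !scrub_should_stop(s)) {
        s->pages_left--;    /* stand-in for scrubbing one page */
        done++;
    }
    return done;
}
```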
Roger Pau Monné [Fri, 29 Jan 2021 08:09:05 +0000 (09:09 +0100)]
libs/foreignmemory: fix MiniOS build
Keep the dummy handlers for restrict, map_resource and unmap_resource
for MiniOS, or else the build breaks with:
ld: /home/osstest/build.158759.build-amd64/xen/stubdom/mini-os-x86_64-xenstore/mini-os.o: in function `xenforeignmemory_restrict':
/home/osstest/build.158759.build-amd64/xen/stubdom/libs-x86_64/foreignmemory/core.c:137: undefined reference to `osdep_xenforeignmemory_restrict'
ld: /home/osstest/build.158759.build-amd64/xen/stubdom/mini-os-x86_64-xenstore/mini-os.o: in function `xenforeignmemory_map_resource':
/home/osstest/build.158759.build-amd64/xen/stubdom/libs-x86_64/foreignmemory/core.c:171: undefined reference to `osdep_xenforeignmemory_map_resource'
ld: /home/osstest/build.158759.build-amd64/xen/stubdom/mini-os-x86_64-xenstore/mini-os.o: in function `xenforeignmemory_unmap_resource':
/home/osstest/build.158759.build-amd64/xen/stubdom/libs-x86_64/foreignmemory/core.c:185: undefined reference to `osdep_xenforeignmemory_unmap_resource'
ld: /home/osstest/build.158759.build-amd64/xen/stubdom/mini-os-x86_64-xenstore/mini-os.o: in function `xenforeignmemory_resource_size':
/home/osstest/build.158759.build-amd64/xen/stubdom/libs-x86_64/foreignmemory/core.c:200: undefined reference to `osdep_xenforeignmemory_map_resource'
Fixes: 2b4b33ffe7d67 ('libs/foreignmemory: Implement on NetBSD') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
A recent thread [1] has exposed a couple of issues with our current way
of handling EXPERT.
1) It is not obvious that "Configure standard Xen features (expert
users)" is actually the famous EXPERT we keep talking about on xen-devel
2) It is not obvious when we need to enable EXPERT to get a specific
feature
In particular if you want to enable ACPI support so that you can boot
Xen on an ACPI platform, you have to enable EXPERT first. But searching
through the kconfig menu it is really not clear (type '/' and "ACPI"):
nothing in the description tells you that you need to enable EXPERT to
get the option.
So this patch makes things easier by doing three things:
- introduce a new kconfig option UNSUPPORTED which clearly enables
UNSUPPORTED features as defined by SUPPORT.md
- change EXPERT options to UNSUPPORTED where it makes sense: keep
depending on EXPERT for features made for experts
- tag unsupported features by adding (UNSUPPORTED) to the one-line
description
Andrew Cooper [Wed, 27 Jan 2021 19:43:32 +0000 (19:43 +0000)]
x86/boot: Drop 'noapic' suggestion from check_timer()
In practice, there is no such thing as a real 64bit system without
APICs. (PVH style virtual environments, sure, but they don't end up here).
The suggestion to try and use noapic only makes a bad situation worse.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 25 Nov 2020 13:22:08 +0000 (13:22 +0000)]
xen-release-management doc: More info on schedule
This documents our practice, established in 2018
https://lists.xen.org/archives/html/xen-devel/2018-07/msg02240.html
et seq
CC: Jürgen Groß <jgross@suse.com> CC: Paul Durrant <xadimgnik@gmail.com> CC: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Ian Jackson <iwj@xenproject.org>
Manuel Bouyer [Tue, 12 Jan 2021 18:12:21 +0000 (19:12 +0100)]
Fix error: array subscript has type 'char'
Use unsigned char variable, or cast to (unsigned char), for
tolower()/islower() and friends. Fix compiler error
array subscript has type 'char' [-Werror=char-subscripts]
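A minimal illustration of the fix pattern, using a hypothetical helper `str_tolower()` that is not taken from the Xen tree. The cast through `unsigned char` is what keeps the `<ctype.h>` argument in range.

```c
#include <ctype.h>

/* Lower-case a NUL-terminated string in place. */
static void str_tolower(char *s)
{
    for ( ; *s != '\0'; s++ )
        /* The <ctype.h> functions take an int that must be representable
         * as unsigned char (or be EOF). A plain char >= 0x80 is negative
         * where char is signed; where tolower() is a macro indexing a
         * lookup table, passing a char is exactly what triggers
         * "array subscript has type 'char'". */
        *s = (char)tolower((unsigned char)*s);
}
```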
Signed-off-by: Manuel Bouyer <bouyer@netbsd.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Thu, 23 Jul 2020 14:58:48 +0000 (15:58 +0100)]
tools/foreignmem: Support querying the size of a resource
With the Xen side of this interface (soon to be) fixed to return real sizes,
userspace needs to be able to make the query.
Introduce xenforeignmemory_resource_size() for the purpose, bumping the
library minor version.
Update all osdep_xenforeignmemory_map_resource() implementations to
understand size requests, skip the mmap() operation, and copy back the
nr_frames field.
For NetBSD, also fix up the ioctl() error path to issue an unmap(), which was
overlooked by c/s 4a64e2bb39 "libs/foreignmemory: Implement on NetBSD".
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Mon, 26 Oct 2020 15:32:12 +0000 (15:32 +0000)]
x86/ucode: Introduce ucode=allow-same for testing purposes
Many CPUs will actually reload microcode when offered the same version as
currently loaded. This allows for easy testing of the late microcode loading
path.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 26 Oct 2020 15:27:35 +0000 (15:27 +0000)]
x86/ucode/intel: Fix handling of microcode revision
For Intel microcode blobs, the revision field is signed (as documented in the
SDM) and negative revisions are used for pre-production/test microcode (not
documented publicly anywhere I can spot).
Adjust the revision checking to match the algorithm presented here:
This treats pre-production microcode as always applicable, while giving production
microcode higher precedence than pre-production. It is expected that
anyone using pre-production microcode knows what they are doing.
This is necessary to load production microcode on an SDP with pre-production
microcode embedded in firmware.
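The decision rule described above, together with the `ucode=allow-same` option from the previous commit, can be sketched as a single comparison function. This is a hypothetical illustration (`intel_ucode_should_load()` and its parameters are not Xen's names), assuming revisions are compared as signed 32-bit values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a blob with revision new_rev should be loaded over the
 * currently loaded revision cur_rev. */
static bool intel_ucode_should_load(int32_t cur_rev, int32_t new_rev,
                                    bool allow_same)
{
    /* Signed comparison: any production revision (>= 0) outranks any
     * pre-production revision (< 0), e.g. on an SDP whose firmware
     * embeds pre-production microcode. */
    if ( new_rev > cur_rev )
        return true;

    /* ucode=allow-same: reload the same revision, for testing. */
    if ( allow_same && new_rev == cur_rev )
        return true;

    /* Pre-production microcode is treated as always applicable. */
    if ( new_rev < 0 )
        return true;

    return false;
}
```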
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 6 Aug 2020 12:00:07 +0000 (13:00 +0100)]
x86/timer: Fix boot on Intel systems using ITSSPRC static PIT clock gating
Recent Intel client devices have disabled the legacy PIT for powersaving
reasons, breaking compatibility with a traditional IBM PC. Xen depends on a
legacy timer interrupt to check that the IO-APIC/PIC routing is configured
correctly, and fails to boot with:
(XEN) *******************************
(XEN) Panic on CPU 0:
(XEN) IO-APIC + timer doesn't work! Boot with apic_verbosity=debug and send report. Then try booting with the `noapic` option
(XEN) *******************************
While this setting can be undone by Xen, the details of how to do so differ by
chipset, and doing so would be very short-sighted for battery-based devices. See bit 2
"8254 Static Clock Gating Enable" in:
All impacted systems have an HPET, but there is no indication of the absence
of PIT functionality, nor a suitable way to probe for its absence. As a short
term fix, reconfigure the HPET into legacy replacement mode. A better
longterm fix would be to avoid the reliance on the timer interrupt entirely.
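For reference, legacy replacement mode is selected through the HPET General Configuration Register (offset 0x010 in the IA-PC HPET specification): bit 0 (ENABLE_CNF) starts the main counter and bit 1 (LEG_RT_CNF) routes timer 0 to IRQ0 in place of the PIT (and timer 1 to IRQ8). The sketch below only computes the register value. The macro and function names are hypothetical, and the MMIO write plus the timer 0 programming that Xen actually needs are omitted.

```c
#include <stdint.h>

#define HPET_CFG_OFFSET 0x010u      /* General Configuration Register */
#define HPET_CFG_ENABLE (1u << 0)   /* ENABLE_CNF: main counter runs */
#define HPET_CFG_LEGACY (1u << 1)   /* LEG_RT_CNF: legacy replacement routing */

/* Returns the configuration value to write back; other bits are preserved. */
static uint64_t hpet_cfg_legacy_replacement(uint64_t cfg)
{
    return cfg | HPET_CFG_LEGACY | HPET_CFG_ENABLE;
}
```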
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 27 Jan 2021 16:08:32 +0000 (17:08 +0100)]
xenstored: fix build on libc without O_CLOEXEC
The call to lu_read_state() would remain unresolved in this case. Frame
the construct by a suitable #ifdef, and while at it also frame command
line handling related pieces similarly.
Fixes: 9777fa6b6ea0 ("tools/xenstore: evaluate the live update flag when starting") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Wed, 27 Jan 2021 16:08:14 +0000 (17:08 +0100)]
libxlutil: avoid almost-undefined behavior
While only value computations of an object are disallowed in the
presence of another unsequenced side effect, at least gcc 4.3 looks to
extend this to taking the object's address. The resulting warning causes
the build to fail, because of -Werror.
While there, also correct an adjacent comment.
Fixes: bdc0799fe26a ("libxlu: introduce xlu_pci_parse_spec_string()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Wed, 27 Jan 2021 16:07:57 +0000 (17:07 +0100)]
libxenguest: drop now unused le32_to_cpup() from lz4 decompression
While gcc doesn't warn about this because of it being static inline,
clang does, causing the build to fail there because of -Werror.
Fixes: d8099d94dfaa ("libxenguest: add get_unaligned_le32()") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Wed, 27 Jan 2021 07:47:13 +0000 (08:47 +0100)]
x86/PV: use 64-bit subtract to adjust guest RIP upon missing SYSCALL callbacks
When discussing the shrunk down version of the commit in question it
was said (in reply to my conditional choosing of the width):
"However, the 32bit case isn't actually interesting here. A
guest can't execute a SYSCALL instruction on/across the 4G->0 boundary
because the M2P is mapped NX up to the 4G boundary, so we can never
reach this point with %eip < 2.
Therefore, the 64bit-only form is the appropriate one to use, which
solves any question of cleverness, or potential decode stalls it
causes."
Fixes: ca6fcf4321b3 ("x86/pv: Inject #UD for missing SYSCALL callbacks") Signed-off-by: Jan Beulich <JBeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Plain MSI doesn't allow caching the MSI address and data fields while
the capability is enabled and not masked, hence we need to allow any
changes to those fields to update the binding of the interrupt. For
reference, the same doesn't apply to MSI-X, which is allowed to cache
the data and address fields while the entry is unmasked, see section
6.8.3.5 of the PCI Local Bus Specification 3.0.
Allowing such updates means that a guest can write an invalid address
(i.e. all zeros) and then a valid one, so the PIRQs shouldn't be
unmapped when the interrupt cannot be bound to the guest, since
further updates to the address or data fields can result in the
binding succeeding.
Modify the vPCI MSI arch helpers to track whether the interrupt is
bound, and make failures in vpci_msi_update not unmap the PIRQ, so
that further calls can attempt to bind the PIRQ again.
Note this requires some modifications to the MSI-X handlers, but there
shouldn't be any functional changes in that area.
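A heavily simplified model of the new behaviour: the entry records whether it is bound, and a failed binding leaves the PIRQ mapped so a later write can retry. `struct msi_state`, `bind_pirq()` and `msi_update()` are hypothetical names, and treating an all-zero address as unbindable is an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

struct msi_state {
    uint64_t address;
    uint16_t data;
    bool bound;          /* whether the PIRQ is currently bound */
};

/* Stand-in for the real binding step. */
static int bind_pirq(const struct msi_state *msi)
{
    return msi->address ? 0 : -1;
}

/* Called on every write to the address/data fields while MSI is enabled. */
static int msi_update(struct msi_state *msi, uint64_t address, uint16_t data)
{
    int rc;

    msi->address = address;
    msi->data = data;

    rc = bind_pirq(msi);
    /* On failure only record that the interrupt is unbound; the PIRQ is
     * not unmapped, so a later update can bind it again. */
    msi->bound = (rc == 0);
    return rc;
}
```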
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 26 Jan 2021 16:42:56 +0000 (17:42 +0100)]
tools/libs: honor build dependencies for recently moved subdirs
While the lack of proper dependency tracking of #include-d files is
wider than just the libs/ subtree, dealing with the problem universally
there or in tools/Rules.mk is too much of a risk at this point in the
release cycle. Add the missing inclusion of $(DEPS_INCLUDE) only in the
specific Makefile-s, after having checked that their prior Makefile-s
had such includes.
Interestingly the $(DEPS_RM) use is present in tools/libs/libs.mk's
clean target, so doesn't need taking care of in individual Makefile-s.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org> Release-acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Tue, 26 Jan 2021 13:42:23 +0000 (14:42 +0100)]
xen/include: compat/xlat.h may change with .config changes
$(xlat-y) getting derived from $(headers-y) means its contents may
change with changes to .config. The individual files $(xlat-y) refers
to, otoh, may not change, and hence not trigger rebuilding of xlat.h.
(Note that the issue was already present before the commit referred to
below, but it was far more limited in affecting only changes to
CONFIG_XSM_FLASK.)
Fixes: 2c8fabb2232d ("x86: only generate compat headers actually needed") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Add a DOMPRINTF(), as the other methods have, indicating success. To facilitate
this, introduce an "outsize" local variable and update *size as well as
*blob only once done. The latter then also avoids leaving a pointer to
freed memory in dom->kernel_blob in case of a decompression error.
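The "publish only once done" pattern can be sketched like this. `toy_decode()` is a hypothetical decoder that simply drops a leading 'Z' byte, and `try_decode()` is not the libxenguest function. The point is that `*blob` and `*size` stay untouched on every failure path.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoder: input must start with 'Z'; output is the rest. */
static int toy_decode(const unsigned char *in, size_t in_size,
                      unsigned char *out, size_t out_size)
{
    if ( in_size < 1 || in[0] != 'Z' || out_size != in_size - 1 )
        return -1;
    memcpy(out, in + 1, out_size);
    return 0;
}

static int try_decode(void **blob, size_t *size)
{
    const unsigned char *in = *blob;
    size_t outsize;            /* local until success */
    unsigned char *out;

    if ( *size < 1 )
        return -1;
    outsize = *size - 1;

    out = malloc(outsize ? outsize : 1);
    if ( !out )
        return -1;

    if ( toy_decode(in, *size, out, outsize) != 0 )
    {
        free(out);             /* nothing was published, so no dangling pointer */
        return -1;
    }

    *blob = out;               /* update the caller's view only once done */
    *size = outsize;
    return 0;
}
```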
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 26 Jan 2021 13:16:34 +0000 (14:16 +0100)]
libxenguest: support zstd compressed kernels
This follows the logic used for other decompression methods utilizing an
external library, albeit here we can't ignore the 32-bit size field
appended to the compressed image - its presence causes decompression to
fail. Leverage the field instead to allocate the output buffer in one
go, i.e. without incrementally realloc()ing.
As far as configure.ac goes, I'm pretty sure there is a better (more
"standard") way of using PKG_CHECK_MODULES(). The construct also gets
put next to the other decompression library checks, albeit I think they
all ought to be x86-specific (e.g. placed in the existing case block a
few lines down).
Note that, where possible, instead of #ifdef-ing xen/*.h inclusions,
they get removed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 26 Jan 2021 13:14:39 +0000 (14:14 +0100)]
libxenguest: add get_unaligned_le32()
Abstract xc_dom_check_gzip()'s reading of the uncompressed size into a
helper re-usable, in particular, by other decompressor code.
Sadly in the mini-os case this conflicts with other functions of the
same name (and purpose), which can't be easily replaced individually.
Yet it was requested that no full set of helpers be introduced at this
point in the release cycle. Hence the awkward XG_NEED_UNALIGNED.
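A helper of this kind might look as follows. This is a sketch under the assumption that the size field is a 32-bit little-endian value stored in the last four bytes of the image, as gzip's ISIZE trailer is and as the zstd commit above describes for compressed kernels. `trailing_size()` is a hypothetical caller, and this is not claimed to be the libxenguest implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a possibly unaligned address,
 * byte by byte, so it works regardless of host endianness or alignment. */
static inline uint32_t get_unaligned_le32(const void *p)
{
    const uint8_t *b = p;

    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Uncompressed size from the image trailer; 0 if the image is too short. */
static uint32_t trailing_size(const void *image, size_t len)
{
    if ( len < 4 )
        return 0;
    return get_unaligned_le32((const uint8_t *)image + len - 4);
}
```

Reading the field up front is what lets a decompressor allocate its output buffer in one go instead of growing it with realloc().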
Requested-by: Ian Jackson <iwj@xenproject.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 26 Jan 2021 13:13:18 +0000 (14:13 +0100)]
x86/shadow: use __put_user() instead of __copy_to_user()
In a subsequent patch I would almost have broken the logic here, if I
hadn't happened to read through the comment at the top of
safe_write_entry(): __copy_to_user() does not provide a guarantee
shadow_write_entries() requires - it's only an optimization that it
makes use of __put_user_size() for certain sizes. Use __put_user()
directly, which does expand to a single (memory accessing) insn.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Roger Pau Monne [Tue, 29 Dec 2020 16:58:01 +0000 (17:58 +0100)]
x86/msr: Don't inject #GP when trying to read FEATURE_CONTROL
Windows 10 will triple fault if #GP is injected when attempting to
read the FEATURE_CONTROL MSR on Intel or compatible hardware. Fix this
by injecting a #GP only when the vendor doesn't support the MSR, even
if there are no features to expose.
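The resulting read logic can be sketched as below. The vendor enum, `read_feature_control()` and the exact set of vendors treated as implementing the MSR are assumptions made for illustration. Bit 0 of IA32_FEATURE_CONTROL is the architectural lock bit.

```c
#include <stdbool.h>
#include <stdint.h>

enum vendor { VENDOR_INTEL, VENDOR_CENTAUR, VENDOR_SHANGHAI,
              VENDOR_AMD, VENDOR_HYGON };

#define FEATURE_CONTROL_LOCK (1u << 0)

/* Returns false if the read should raise #GP. */
static bool read_feature_control(enum vendor v, uint64_t *val)
{
    switch ( v )
    {
    case VENDOR_INTEL:
    case VENDOR_CENTAUR:
    case VENDOR_SHANGHAI:
        /* The MSR exists on this vendor: report it locked with nothing
         * enabled instead of faulting, even with no features to expose. */
        *val = FEATURE_CONTROL_LOCK;
        return true;

    default:
        return false;
    }
}
```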
Fixes: 39ab598c50a2 ('x86/pv: allow reading FEATURE_CONTROL MSR') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Extended comment] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 26 Jun 2020 10:32:00 +0000 (11:32 +0100)]
x86/pv: Inject #UD for missing SYSCALL callbacks
Despite appearing to be a deliberate design choice of early PV64, the
resulting behaviour for unregistered SYSCALL callbacks creates an untenable
testability problem for Xen. Furthermore, the behaviour is undocumented,
bizarre, and inconsistent with related behaviour in Xen, and very liable to
introduce a security vulnerability into a PV guest if the author hasn't
studied Xen's assembly code in detail.
There are two different bugs here.
1) The current logic confuses the registered entrypoints, and may deliver a
SYSCALL from 32bit userspace to the 64bit entry, when only a 64bit
entrypoint is registered.
This has been the case ever since 2007 (c/s cd75d47348b) but up until
2018 (c/s dba899de14) the wrong selectors would be handed to the guest for
a 32bit SYSCALL entry, making it appear as if it were a 64bit entry all along.
Xen would malfunction under these circumstances, if it were a PV guest.
Linux would as well, but PVOps has always registered both entrypoints and
discarded the Xen-provided selectors. NetBSD really does malfunction as a
consequence (benignly now, but a VM DoS before the 2018 Xen selector fix).
2) In the case that neither SYSCALL callback is registered, the guest will
be crashed when userspace executes a SYSCALL instruction, which is a
userspace => kernel DoS.
This has been the case ever since the introduction of 64bit PV support, but
behaves unlike all other SYSCALL/SYSENTER callbacks in Xen, which yield
#GP/#UD in userspace before the callback is registered, and are therefore
safe by default.
This change does constitute a change in the PV ABI, for corner cases of a PV
guest kernel registering neither callback, or not registering the 32bit
callback when running on AMD/Hygon hardware.
It brings the behaviour in line with PV32 SYSCALL/SYSENTER, and PV64
SYSENTER (safe by default, until explicitly enabled), as well as native
hardware (always delivered to the single applicable callback).
Most importantly however, and the primary reason for the change, is that it
lets us sensibly test the fast system call entrypoints under all states a PV
guest can construct, to prove correct behaviour.
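The new dispatch rule can be modelled in a few lines: select the single callback matching the mode of the userspace that executed SYSCALL, and inject #UD if that one is unregistered, never falling back to the other entrypoint or crashing the guest. `struct pv_syscall_cbs` and `syscall_dispatch()` are hypothetical names, not Xen's.

```c
#include <stdbool.h>
#include <stdint.h>

/* 0 means "no callback registered". */
struct pv_syscall_cbs {
    uint64_t syscall64;   /* entry for SYSCALL from 64bit userspace */
    uint64_t syscall32;   /* entry for SYSCALL from 32bit userspace */
};

enum action { DELIVER, INJECT_UD };

static enum action syscall_dispatch(const struct pv_syscall_cbs *cbs,
                                    bool user_is_64bit, uint64_t *target)
{
    /* Only the applicable callback is considered, as on native hardware. */
    uint64_t entry = user_is_64bit ? cbs->syscall64 : cbs->syscall32;

    if ( entry == 0 )
        return INJECT_UD;   /* safe by default: #UD in userspace */

    *target = entry;
    return DELIVER;
}
```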
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Paul Durrant [Thu, 8 Oct 2020 18:57:31 +0000 (19:57 +0100)]
docs/migration: add missing definitions to libxc-migration-stream
The STATIC_DATA_END, X86_CPUID_POLICY and X86_MSR_POLICY record types have
sections explaining what they are but their values are not defined. Indeed
their values are defined as "Reserved for future mandatory records."
Also, the spec revision is adjusted to match the migration stream version
and an END record is added to the description of a 'typical save record for
an x86 HVM guest.'
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Fixes: 6f71b5b1506 ("docs/migration Specify migration v3 and STATIC_DATA_END") Fixes: ddd273d8863 ("docs/migration: Specify X86_{CPUID,MSR}_POLICY records") Acked-by: Wei Liu <wl@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Mon, 25 Jan 2021 07:23:31 +0000 (08:23 +0100)]
tools/xenstore: fix use after free bug in xenstore_control
There is a very unlikely use after free bug and a memory leak in
live_update_start() of xenstore_control. Fix those.
Coverity-Id: 1472399 Fixes: 7f97193e6aa858 ("tools/xenstore: add live update command to xenstore-control") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Rahul Singh [Fri, 22 Jan 2021 11:37:19 +0000 (11:37 +0000)]
xen/arm: smmuv3: Add support for SMMUv3 driver
Add support for ARM architected SMMUv3 implementation. It is based on
the Linux SMMUv3 driver.
Driver is currently supported as Tech Preview.
Major differences with regard to Linux driver are as follows:
2. Only Stage-2 translation is supported as compared to the Linux driver
that supports both Stage-1 and Stage-2 translations.
3. Use P2M page table instead of creating one as SMMUv3 has the
capability to share the page tables with the CPU.
4. Tasklets are used in place of threaded IRQs in Linux for event queue
and priority queue IRQ handling.
5. Latest version of the Linux SMMUv3 code implements the commands queue
access functions based on atomic operations implemented in Linux.
Atomic functions used by the commands queue access functions are not
implemented in XEN therefore we decided to port the earlier version
of the code. Atomic operations are introduced to fix the bottleneck
of the SMMU command queue insertion operation. A new algorithm for
inserting commands into the queue is introduced, which is lock-free
on the fast-path.
The consequence of reverting the patch is that command queue
insertion will be slow on large systems, as a spinlock will be used to
serialize accesses from all CPUs to the single queue supported by
the hardware. Once the proper atomic operations are available in
XEN, the driver can be updated.
6. Spin lock is used in place of mutex when attaching a device to the
SMMU, as there is no blocking lock implementation available in XEN.
This might introduce latency in XEN. This needs to be investigated
before the driver leaves tech preview.
7. PCI ATS functionality is not supported, as there is no support
available in XEN to test the functionality. The code has been neither
compiled nor tested, and is guarded by the flag CONFIG_PCI_ATS.
8. MSI interrupts are not supported as there is no support available in
XEN to request MSI interrupts. The code has been neither compiled nor tested. Code
is guarded by the flag CONFIG_MSI.
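Item 5's trade-off, a single queue serialized by a spinlock rather than lock-free insertion, can be illustrated with C11 atomics. This is a hypothetical userspace sketch (`struct cmdq`, `cmdq_insert()` and `Q_SIZE` are illustrative), not the driver's queue code. Every inserting CPU contends on the one lock, which is the bottleneck the Linux atomic-based algorithm removed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define Q_SIZE 16u   /* hypothetical queue depth */

struct cmdq {
    atomic_flag lock;
    uint32_t prod, cons;   /* free-running indices */
    uint64_t ent[Q_SIZE];
};

static void cmdq_lock(struct cmdq *q)
{
    while ( atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire) )
        ;   /* spin: all CPUs inserting commands serialize here */
}

static void cmdq_unlock(struct cmdq *q)
{
    atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Insert one command; returns false if the queue is full. */
static bool cmdq_insert(struct cmdq *q, uint64_t cmd)
{
    bool ok = false;

    cmdq_lock(q);
    if ( q->prod - q->cons < Q_SIZE )
    {
        q->ent[q->prod % Q_SIZE] = cmd;
        q->prod++;
        ok = true;
    }
    cmdq_unlock(q);
    return ok;
}
```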
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where that is the case.
Rahul Singh [Wed, 20 Jan 2021 14:52:41 +0000 (14:52 +0000)]
xen/compiler: import 'fallthrough' keyword from linux
-Wimplicit-fallthrough warns when a switch case falls through. The
warning can be suppressed by either adding a /* fallthrough */ comment, or
by using a null statement: __attribute__ ((fallthrough))
Define the pseudo keyword 'fallthrough' for the ability to convert the
various case block /* fallthrough */ style comments to null statement
"__attribute__((__fallthrough__))"
In C mode, GCC supports the __fallthrough__ attribute since 7.1,
the same time the warning and the comment parsing were introduced.
fallthrough devolves to an empty "do {} while (0)" if the compiler
version (any version less than gcc 7) does not support the attribute.
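A sketch of such a definition and its use. The exact preprocessor condition Xen uses is not quoted here; this version assumes GCC 7 or newer provides the attribute and excludes clang explicitly, falling back to the empty statement otherwise. `levels_from()` is a hypothetical example function.

```c
/* Pseudo keyword: the attribute where supported, else an empty statement. */
#if !defined(__clang__) && defined(__GNUC__) && __GNUC__ >= 7
# define fallthrough __attribute__((__fallthrough__))
#else
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Count the levels from 'start' down to 1, falling through deliberately. */
static int levels_from(int start)
{
    int n = 0;

    switch ( start )
    {
    case 3:
        n++;
        fallthrough;
    case 2:
        n++;
        fallthrough;
    case 1:
        n++;
        break;
    default:
        break;
    }

    return n;
}
```

Because the macro is used as a statement followed by `;`, each case block ends in `break;`, `fallthrough;`, `goto` or `return`, which is what makes a missing marker stand out under -Wimplicit-fallthrough.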
Rahul Singh [Wed, 20 Jan 2021 14:52:36 +0000 (14:52 +0000)]
xen/arm: Revert atomic operation related command-queue insertion patch
Linux SMMUv3 code implements the commands-queue insertion based on
atomic operations implemented in Linux. Atomic functions used by the
commands-queue insertion are not implemented in XEN therefore revert the
patch that implemented the commands-queue insertion based on atomic
operations.
Also revert the other patches that are based on the code that
introduced the atomic operations.
Atomic operations are introduced in the patch "iommu/arm-smmu-v3: Reduce
contention during command-queue insertion" that fixed the bottleneck of
the SMMU command queue insertion operation. A new algorithm for
inserting commands into the queue is introduced in this patch, which is
lock-free on the fast-path.
The consequence of reverting the patch is that command queue insertion
will be slow on large systems, as a spinlock will be used to serialize
accesses from all CPUs to the single queue supported by the hardware.
Once the proper atomic operations are available in XEN, the driver
can be updated.
Because the directory structure of the SMMUv3 driver changed starting
with Linux 5.9, we decided to use Linux 5.8.18 so that the patches can
be reverted smoothly using the "git revert" command.
The only difference between the SMMUv3 driver in the latest stable Linux
5.9.12 and in Linux 5.8.18 is the use of the "fallthrough" keyword. This
patch will be merged once a "fallthrough" keyword implementation is
available in XEN.
It's a copy of the Linux SMMUv3 driver. Xen specific code has not
been added yet and code has not been compiled.
xen/arm: mm: Remove special case for CPU0 in dump_hyp_walk()
There is no need to have a special case for CPU0 when converting the
page-table virtual address into a physical address. The helper
virt_to_maddr() is able to translate any address as long as the root
page-table is mapped in the virtual address space. This is the case for
all the CPUs at the moment.
Juergen Gross [Sat, 16 Jan 2021 10:33:39 +0000 (11:33 +0100)]
xen: add support for automatic debug key actions in case of crash
When the host crashes it would sometimes be nice to have additional
debug data available which could be produced via debug keys, but
halting the server for manual intervention might be impossible due to
the need to reboot/kexec sooner rather than later.
Add support for automatic debug key actions in case of crashes which
can be activated via boot- or runtime-parameter.
Depending on the type of crash the desired data might be different, so
support different settings for the possible types of crashes.
The parameter is "crash-debug" with the following syntax:
crash-debug-<type>=<string>
with <type> being one of:
panic, hwdom, watchdog, kexeccmd, debugkey
and <string> a sequence of debug key characters with '+' having the
special semantics of a 10 millisecond pause.
So "crash-debug-watchdog=0+0qr" would result in special output in case
of watchdog triggered crash (dom0 state, 10 ms pause, dom0 state,
domain info, run queues).
Don't call key handlers in early boot, as some (e.g. for 'd') require
some initializations to be finished, like scheduler or idle domain.
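The handling of a crash-debug-<type> string can be sketched as a simple walk over its characters. `run_debugkey()` and `pause_ms()` are hypothetical stand-ins that only record what would happen ('.' marks a pause), so that the "0+0qr" example above can be checked.

```c
#include <stddef.h>

static char actions[64];      /* hypothetical record of performed actions */
static size_t nr_actions;

static void record(char c)
{
    if ( nr_actions < sizeof(actions) - 1 )
    {
        actions[nr_actions++] = c;
        actions[nr_actions] = '\0';
    }
}

/* Stand-in: would invoke the handler registered for debug key 'key'. */
static void run_debugkey(char key) { record(key); }

/* Stand-in: would delay for 'ms' milliseconds. */
static void pause_ms(unsigned int ms) { (void)ms; record('.'); }

static void run_crash_debug_keys(const char *keys)
{
    for ( ; *keys != '\0'; keys++ )
    {
        if ( *keys == '+' )
            pause_ms(10);           /* '+' means a 10 ms pause */
        else
            run_debugkey(*keys);
    }
}
```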
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Sat, 16 Jan 2021 10:33:38 +0000 (11:33 +0100)]
xen: enable keyhandlers to work without register set specified
There are only two keyhandlers which make use of the cpu_user_regs
struct passed to them. In order to be able to call any keyhandler in
non-interrupt contexts, too, modify those two handlers to cope with a
NULL regs pointer by using run_in_exception_handler() in that case.
Juergen Gross [Sat, 16 Jan 2021 10:33:37 +0000 (11:33 +0100)]
xen/arm: add support for run_in_exception_handler()
Add support to run a function in an exception handler for Arm. Do it
as on x86 via a bug_frame, but pass the function pointer via a
register.
This needs to be done that way because GCC will not allow the "i"
constraint to be used when PIE is enabled (Xen doesn't set the flag but
instead relies on the default value from the compiler).
Use the same BUGFRAME_* #defines as on x86 in order to make a future
common header file more easily achievable.
Signed-off-by: Juergen Gross <jgross@suse.com>
[ julien: Add more details on the issue between "i" and -fpie ] Acked-by: Julien Grall <jgrall@amazon.com>
Remove copy/paste error introduced by f58976544ff4 ("automation: use
test-artifacts/qemu-system-aarch64 instead of Debian's")
Fixes: f58976544ff4 ("automation: use test-artifacts/qemu-system-aarch64 instead of Debian's") Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>