Jan Beulich [Wed, 25 Aug 2021 12:15:11 +0000 (14:15 +0200)]
AMD/IOMMU: correct device unity map handling
Blindly assuming all addresses between any two such ranges, specified by
firmware in the ACPI tables, should also be unity-mapped can't be right.
Nor can it be correct to merge ranges with differing permissions. Track
ranges individually; don't merge at all, but check for overlaps instead.
This requires bubbling up error indicators, such that IOMMU init can be
failed when allocation of a new tracking struct wasn't possible, or an
overlap was detected.
At this occasion also stop ignoring
amd_iommu_reserve_domain_unity_map()'s return value.
This is part of XSA-378 / CVE-2021-28695.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Wed, 25 Aug 2021 12:12:13 +0000 (14:12 +0200)]
AMD/IOMMU: correct global exclusion range extending
Besides unity mapping regions, the AMD IOMMU spec also provides for
exclusion ranges (areas of memory not to be subject to DMA translation)
to be specified by firmware in the ACPI tables. The spec does not put
any constraints on the number of such regions.
Blindly assuming all addresses between any two such ranges should also
be excluded can't be right. Since hardware has room for just a single
such range (comprised of the Exclusion Base Register and the Exclusion
Range Limit Register), combine only adjacent or overlapping regions (for
now; this may require further adjustment in case table entries aren't
sorted by address) with matching exclusion_allow_all settings. This
requires bubbling up error indicators, such that IOMMU init can be
failed when concatenation wasn't possible.
Furthermore, since the exclusion range specified in IOMMU registers
implies R/W access, reject requests asking for less permissions (this
will be brought closer to the spec by a subsequent change).
This is part of XSA-378 / CVE-2021-28695.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Michal Orzel [Fri, 20 Aug 2021 09:39:24 +0000 (11:39 +0200)]
xen/public: arch-arm: Add mention of argo_op hypercall
Commit 1ddc0d43c20cb1c1125d4d6cefc78624b2a9ccb7 introducing
argo_op hypercall forgot to add a mention of it in the
comment listing supported hypercalls. Fix that.
Signed-off-by: Michal Orzel <michal.orzel@arm.com> Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com> Acked-by: Julien Grall <jgrall@amazon.com>
When a device is assigned/de-assigned it is required to properly set
IOMMU domain used to protect the device. This assignment was missing,
thus it was not possible to de-assign the device:
(XEN) Deassigning device 0000:03:00.0 from dom2
(XEN) smmu: 0000:03:00.0: not attached to domain 2
(XEN) d2: deassign (0000:03:00.0) failed (-3)
Fix this by assigning IOMMU domain on arm_smmu_assign_dev and reset it
to NULL on arm_smmu_deassign_dev.
ns16550: properly gate Exar PCIe UART cards support
Arm is about to get PCI passthrough support which means CONFIG_HAS_PCI
will be enabled, so this code will fail as Arm doesn't have ns16550
PCI support:
ns16550.c:313:5: error: implicit declaration of function 'enable_exar_enhanced_bits' [-Werror=implicit-function-declaration]
313 | enable_exar_enhanced_bits(uart);
| ^~~~~~~~~~~~~~~~~~~~~~~~~
Fix this by gating Exar PCIe UART cards support with the above in mind.
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 20 Aug 2021 10:30:35 +0000 (12:30 +0200)]
AMD/IOMMU: don't leave page table mapped when unmapping ...
... an already not mapped page. With all other exit paths doing the
unmap, I have no idea how I managed to miss that aspect at the time.
Fixes: ad591454f069 ("AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Besides standard UART setup, this device needs enabling
(vendor-specific) "Enhanced Control Bits" - otherwise disabling hardware
control flow (MCR[2]) is ignored. Add appropriate quirk to the
ns16550_setup_preirq(), similar to the handle_dw_usr_busy_quirk(). The
new function act on Exar 2-, 4-, and 8- port cards only. I have tested
the functionality on 2-port card but based on the Linux driver, the same
applies to other models too.
Additionally, Exar card supports fractional divisor (DLD[3:0] register,
at 0x02). This part is not supported here yet, and seems to not
be required for working 115200bps at the very least.
The specification for the 2-port card is available at:
https://www.maxlinear.com/product/interface/uarts/pcie-uarts/xr17v352
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 20 Aug 2021 10:28:07 +0000 (12:28 +0200)]
x86/PV: account for 32-bit Dom0 in mark_pv_pt_pages_rdonly()'s ASSERT()s
Clearly I neglected the special needs here, and also failed to test the
change with a debug build of Xen.
Fixes: 6b1ca51b1a91 ("x86/PV: assert page state in mark_pv_pt_pages_rdonly()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jane Malalane [Tue, 17 Aug 2021 15:19:24 +0000 (16:19 +0100)]
libs/guest: Move the guest ABI check earlier into xc_dom_parse_image()
Xen may not support 32-bit PV guest for a number of reasons (lack of
CONFIG_PV32, explicit pv=no-32 command line argument, or implicitly
due to CET being enabled) and advertises this to the toolstack via the
absence of xen-3.0-x86_32p ABI.
Currently, when trying to boot a 32-bit PV guest, the ABI check is too
late and the build explodes in the following manner yielding an
unhelpful error message:
xc: error: panic: xg_dom_boot.c:121: xc_dom_boot_mem_init: can't allocate low memory for domain: Out of memory
libxl: error: libxl_dom.c:586:libxl__build_dom: xc_dom_boot_mem_init failed: Operation not supported
libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 1:Non-existant domain
libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 1:Unable to destroy guest
libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 1:Destruction of domain failed
Move the ABI check earlier into xc_dom_parse_image() along with other
ELF-note feature checks. With this adjustment, it now looks like
this:
xc: error: panic: xg_dom_boot.c:88: xc_dom_compat_check: guest type xen-3.0-x86_32p not supported by xen kernel, sorry: Invalid kernel
libxl: error: libxl_dom.c:571:libxl__build_dom: xc_dom_parse_image failed
domainbuilder: detail: xc_dom_release: called
libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 11:cannot (re-)build domain: -3
libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 11:Non-existant domain
libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 11:Unable to destroy guest
libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 11:Destruction of domain failed
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <iwj@xenproject.org>
Juergen Gross [Thu, 19 Aug 2021 11:38:31 +0000 (13:38 +0200)]
xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.
Fix that by letting get_cpu_idle_time() deal with this case. Drop a
redundant check in exchange.
Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Juergen Gross <jgross@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Dario Faggioli <dfaggioli@suse.com>
Jan Beulich [Thu, 19 Aug 2021 11:37:42 +0000 (13:37 +0200)]
Arm: relax iomem_access_permitted() check
Ranges checked by iomem_access_permitted() are inclusive; to permit a
mapping there's no need for access to also have been granted for the
subsequent page.
Fixes: 80f9c3167084 ("xen/arm: acpi: Map MMIO on fault in stage-2 page table for the hardware domain") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Thu, 19 Aug 2021 11:36:54 +0000 (13:36 +0200)]
x86: mark compat hypercall regs clobbering for intended fall-through
Oddly enough in the original report Coverity only complained about the
native hypercall related switch() statements. Now that it has seen those
fixed, it complains about (only HVM) compat ones. Hence the CIDs below
are all for the HVM side of things, yet while at it take care of the PV
side as well.
Coverity-ID: 1487105, 1487106, 1487107, 1487108, 1487109. Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 18 Aug 2021 07:44:14 +0000 (09:44 +0200)]
VT-d: Tylersburg errata apply to further steppings
While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
spec update, X58's also mentions B2, and searching the internet suggests
systems with this stepping are actually in use. Even worse, for X58
erratum #69 is marked applicable even to C2. Split the check to cover
all applicable steppings and to also report applicable errata numbers in
the log message. The splitting requires using the DMI port instead of
the System Management Registers device, but that's then in line (also
revision checking wise) with the spec updates.
Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Wed, 18 Aug 2021 07:40:08 +0000 (09:40 +0200)]
x86/PV: assert page state in mark_pv_pt_pages_rdonly()
About every time I look at dom0_construct_pv()'s "calculation" of
nr_pt_pages I question (myself) whether the result is precise or merely
an upper bound. I think it is meant to be precise, but I think we would
be better off having some checking in place. Hence add ASSERT()s to
verify that
- all pages have a valid L1...Ln (currently L4) page table type and
- no other bits are set, in particular the type refcount is still zero.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citirx.com>
Jan Beulich [Wed, 18 Aug 2021 07:39:08 +0000 (09:39 +0200)]
x86/PV: suppress unnecessary Dom0 construction output
v{xenstore,console}_{start,end} can only ever be zero in PV shim
configurations. Similarly reporting just zeros for an unmapped (or
absent) initrd is not useful. Particularly in case video is the only
output configured, space is scarce: Split the printk() and omit lines
carrying no information at all.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 17 Aug 2021 10:38:07 +0000 (11:38 +0100)]
x86/cet: Fix build on newer versions of GCC
Some versions of GCC complain with:
traps.c:405:22: error: 'get_shstk_bottom' defined but not used [-Werror=unused-function]
static unsigned long get_shstk_bottom(unsigned long sp)
^~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Change #ifdef to if ( IS_ENABLED(...) ) to make the sole user of
get_shstk_bottom() visible to the compiler.
Fixes: 35727551c070 ("x86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Compile-tested-by: Jan Beulich <jbeulich@suse.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Thu, 12 Aug 2021 16:39:16 +0000 (17:39 +0100)]
x86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}
This was a clear oversight in the original CET work. The BUGFRAME_run_fn and
BUGFRAME_warn paths update regs->rip without an equivalent adjustment to the
shadow stack, causing IRET to suffer #CP because of the mismatch.
One subtle, and therefore fragile, aspect of extable_shstk_fixup() was that it
required regs->rip to have its old value as a cross-check that the right word
in the shadow stack was being edited.
Rework extable_shstk_fixup() into fixup_exception_return() which takes
ownership of the update to both the regular and shadow stacks, ensuring that
the regs->rip update is ordered correctly.
Use the new fixup_exception_return() for BUGFRAME_run_fn and BUGFRAME_warn to
ensure that the shadow stack is updated too.
Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible") Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 16 Aug 2021 13:24:44 +0000 (14:24 +0100)]
x86/ACPI: Insert missing newlines into FACS error messages
Booting Xen as a PVH guest currently yields:
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:b004,1:0], pm1x_evt[1:b000,1:0]
(XEN) ACPI: FACS is not 64-byte aligned: 0xfc001010<2>ACPI: wakeup_vec[fc00101c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee00000
Insert newlines as appropriate.
Fixes: d3faf9badf52 ("[host s3] Retrieve necessary sleep information from plain-text ACPI tables (FADT/FACS), and keep one hypercall remained for sleep notification.") Fixes: 0f089bbf43ec ("x86/ACPI: fix S3 wakeup vector mapping") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Kevin Stefanov [Mon, 16 Aug 2021 13:16:56 +0000 (15:16 +0200)]
x86/ioapic: remove use of TRUE/FALSE/1/0
Also fix stray usage in VT-d.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Kevin Stefanov <kevin.stefanov@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jane Malalane [Mon, 16 Aug 2021 13:16:20 +0000 (15:16 +0200)]
x86/pv: provide more helpful error when CONFIG_PV32 is absent
Currently, when booting a 32bit dom0 kernel, the message isn't very
helpful:
(XEN) Xen kernel: 64-bit, lsb
(XEN) Dom0 kernel: 32-bit, PAE, lsb, paddr 0x100000 -> 0x112000
(XEN) Mismatch between Xen and DOM0 kernel
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not construct domain 0
(XEN) ****************************************
With this adjustment, it now looks like this:
(XEN) Xen kernel: 64-bit, lsb
(XEN) Found 32-bit PV kernel, but CONFIG_PV32 missing
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not construct domain 0
(XEN) ****************************************
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
ns16550: do not override fifo size if explicitly set
If fifo size is already set via uart_params, do not force it to 16 - which
may not match the actual hardware. Specifically Exar cards have fifo of
256 bytes.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 13 Aug 2021 14:50:09 +0000 (16:50 +0200)]
libxc: simplify HYPERCALL_BUFFER()
_hcbuf_buf1 has been there only for a pointer comparison to validate
type compatibility. The same can be achieved by not using typeof() on
the definition of what so far was _hcbuf_buf2, as the initializer has
to also be type-compatible. Drop _hcbuf_buf1 and the comaprison;
rename _hcbuf_buf2.
Since we're already using compiler extensions here, don't be shy and
also omit the middle operand of the involved ?: operator.
Bring line continuation character placement in line with that of
related macros.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 13 Aug 2021 14:49:10 +0000 (16:49 +0200)]
libxenguest: complete loops in xc_map_domain_meminfo()
minfo->p2m_size may have more than 31 significant bits. Change the
induction variable to unsigned long, and (largely for signed-ness
consistency) a helper variable to unsigned int. While there also avoid
open-coding min().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jane Malalane [Thu, 12 Aug 2021 15:14:25 +0000 (17:14 +0200)]
xen/bitmap: don't open code DIV_ROUND_UP()
Also, change bitmap_long_to_byte() and bitmap_byte_to_long() to take
'unsigned int' instead of 'int' number of bits, to match the type of
their callers.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Kevin Stefanov [Thu, 12 Aug 2021 15:10:23 +0000 (17:10 +0200)]
kexec: remove use of TRUE/FALSE
Whilst fixing this, also changed bool_t to bool, and use __read_mostly.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Kevin Stefanov <kevin.stefanov@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jane Malalane [Tue, 10 Aug 2021 07:29:52 +0000 (09:29 +0200)]
bitmap: make bitmap_long_to_byte() and bitmap_byte_to_long() static
Functions made static as there are no external callers.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Dario Faggioli [Tue, 10 Aug 2021 07:29:10 +0000 (09:29 +0200)]
credit2: avoid picking a spurious idle unit when caps are used
Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not fix completely the problem of potentially
selecting a scheduling unit that will then not be able to run.
In fact, in case caps are used and the unit we are currently looking
at, during the runqueue scan, does not have enough budget for being run,
we should continue looking instead than giving up and picking the idle
unit.
Suggested-by: George Dunlap <george.dunlap@citrix.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/arm: Do not invalidate the P2M when the PT is shared with the IOMMU
Set/Way flushes never work correctly in a virtualized environment.
Our current implementation is based on clearing the valid bit in the p2m
pagetable to track guest memory accesses. This technique doesn't work
when the IOMMU is enabled for the domain and the pagetable is shared
between IOMMU and MMU because it triggers IOMMU faults.
Specifically, p2m_invalidate_root causes IOMMU faults if
iommu_use_hap_pt returns true for the domain.
Add a check in p2m_set_way_flush: if a set/way instruction is used
and iommu_use_hap_pt returns true, rather than failing with obscure
IOMMU faults, inject an undef exception straight away into the guest,
and print a verbose error message to explain the problem.
Also add an ASSERT in p2m_invalidate_root to make sure we don't
inadvertently stumble across this problem again in the future.
Brian Woods [Tue, 3 Aug 2021 00:24:09 +0000 (17:24 -0700)]
arm,smmu: add support for generic DT bindings. Implement add_device and dt_xlate.
For the legacy path, arm_smmu_dt_add_device_legacy is called by
register_smmu_master scanning mmu-masters (a fwspec entry is also
created.) For the generic path, arm_smmu_dt_add_device_generic gets
called instead. Then, arm_smmu_dt_add_device_generic calls
arm_smmu_dt_add_device_legacy afterwards, shared with the legacy path.
This way most of the low level implementation is shared between the two
paths.
If both legacy bindings and generic bindings are present in device tree,
the legacy bindings are the ones that are used. That's because
mmu-masters is parsed by
xen/drivers/passthrough/arm/smmu.c:arm_smmu_device_dt_probe which is
called by arm_smmu_dt_init. It happens very early. iommus is parsed by
xen/drivers/passthrough/device_tree.c:iommu_add_dt_device which is
called by xen/arch/arm/domain_build.c:handle_device and happens
afterwards.
arm_smmu_dt_xlate_generic is a verbatim copy from Linux
(drivers/iommu/arm/arm-smmu/arm-smmu.c:arm_smmu_of_xlate, version
v5.10).
A workaround was introduced by cf4af9d6d6c (xen/arm: boot with device
trees with "mmu-masters" and "iommus") because the SMMU driver only
supported the legacy bindings. Remove it now.
Brian Woods [Tue, 3 Aug 2021 00:24:08 +0000 (17:24 -0700)]
arm,smmu: restructure code in preparation to new bindings support
Restructure some of the code and add supporting functions for adding
generic device tree (DT) binding support. This will allow for using
current Linux device trees with just modifying the chosen field to
enable Xen.
Brian Woods [Tue, 3 Aug 2021 00:24:06 +0000 (17:24 -0700)]
arm,smmu: switch to using iommu_fwspec functions
Modify the smmu driver so that it uses the iommu_fwspec helper
functions. This means both ARM IOMMU drivers will both use the
iommu_fwspec helper functions, making enabling generic device tree
bindings in the SMMU driver much cleaner.
xen: do not return -EEXIST if iommu_add_dt_device is called twice
iommu_add_dt_device() returns -EEXIST if the device was already
registered. At the moment, this can only happen if the device was
already assigned to a domain (either dom0 at boot or via
XEN_DOMCTL_assign_device).
In a follow-up patch, we will convert the SMMU driver to use the FW
spec. When the legacy bindings are used, all the devices will be
registered at probe. Therefore, iommu_add_dt_device() will always
returns -EEXIST.
Currently, one caller (XEN_DOMCTL_assign_device) will check the return
and ignore -EEXIST. All the other will fail because it was technically a
programming error.
However, there is no harm to call iommu_add_dt_device() twice, so we can
simply return 0.
With that in place the caller doesn't need to check -EEXIST anymore, so
remove the check.
tools/xenstored: Don't assume errno will not be overwritten in lu_arch()
At the moment, do_control_lu() will set errno to 0 before calling
lu_arch() and then check errno. The expectation is nothing in lu_arch()
will change the value unless there is an error.
However, per errno(3), a function that succeeds is allowed to change
errno. In fact, syslog() will overwrite errno if the logs are rotated
at the time it is called.
To prevent any further issue, errno is now always set before
returning NULL.
Additionally, errno is only checked when returning NULL so the client
can see the error message if there is any.
tools/xenstored: Propagate correctly the error message from lu_start()
lu_start() will only set errno when it returns NULL. For all the
other cases, the value is unknown.
This means that when lu_start() returns an error message, it may not
be propagated to the client.
The check that errno is a non-zero value is now dropped and instead
the value is returned when no error message is provided. This
relies on errno to always be set when ret == NULL.
Fixes: af216a99fb ("tools/xenstore: add the basic framework for doing the live update") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com>
tools/xenstored: Fix off-by-one in dump_state_nodes()
The maximum path length supported by Xenstored protocol is
XENSTORE_ABS_PATH_MAX (i.e 3072). This doesn't take into account the
NUL at the end of the path.
However, the code to dump the nodes will allocate a buffer
of XENSTORE_ABS_PATH. As a result it may not be possible to live-update
if there is a node name of XENSTORE_ABS_PATH.
Fix it by allocating a buffer of XENSTORE_ABS_PATH_MAX + 1 characters.
Take the opportunity to pass the max length of the buffer as a
parameter of dump_state_node_tree(). This will be clearer that the
check in the function is linked to the allocation in dump_state_nodes().
Jane Malalane [Tue, 27 Jul 2021 18:47:15 +0000 (19:47 +0100)]
xen/lib: Fix strcmp() and strncmp()
The C standard requires that each character be compared as unsigned
char. Xen's current behaviour compares as signed char, which changes
the answer when chars with a value greater than 0x7f are used.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Reviewed-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 22 Jul 2021 09:20:38 +0000 (11:20 +0200)]
x86: work around build issue with GNU ld 2.37
I suspect it is commit 40726f16a8d7 ("ld script expression parsing")
which broke the hypervisor build, by no longer accepting section names
with a dash in them inside ADDR() (and perhaps other script directives
expecting just a section name, not an expression): .note.gnu.build-id
is such a section.
Quoting all section names passed to ADDR() via DECL_SECTION() works
around the regression.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Mon, 19 Jul 2021 13:48:45 +0000 (14:48 +0100)]
tools/firmware/ovmf: Use OvmfXen platform file if exist and update OVMF
A platform introduced in EDK II named OvmfXen is now the one to use for
Xen instead of OvmfX64. It comes with PVH support.
Also, the Xen support in OvmfX64 is deprecated,
"deprecation notice: *dynamic* multi-VMM (QEMU vs. Xen) support in OvmfPkg"
https://edk2.groups.io/g/devel/message/75498
and has been removed upstream.
We need to also update to a newer version of OVMF as OvmfXen in the
release "edk2-stable202105" doesn't work well with Xen, so we need the
fix b37cfdd28071 ("OvmfPkg/XenPlatformPei: Relocate shared_info page
mapping").
Also, don't set anymore the number of thread for parallel build when
building the newer platform, OvmfPkg/build.sh is now doing parallel
build by default.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <iwj@xenproject.org>
Scott Davis [Thu, 22 Jul 2021 16:54:30 +0000 (12:54 -0400)]
tools/xl: Add stubdomain_cmdline option to xl.cfg
This adds an option to the xl domain configuration file syntax for specifying
a kernel command line for device-model stubdomains. It is intended for use with
Linux-based stubdomains.
Signed-off-by: Scott Davis <scott.davis@starlab.io> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <iwj@xenproject.org>
Igor Druzhinin [Tue, 13 Jul 2021 01:31:41 +0000 (02:31 +0100)]
tools/libxc: use uint32_t for pirq in xc_domain_irq_permission
Current unit8_t for pirq argument in this interface is too restrictive
causing failures on modern hardware with lots of GSIs. That extends down to
XEN_DOMCTL_irq_permission ABI structure where it needs to be fixed up
as well.
Internal Xen structures appear to be fine. Existing users of the interface
in tree (libxl, ocaml and python bindings) are currently using signed int
for pirq representation which should be wide enough. Converting them to
uint32_t now is desirable to avoid accidental passing of a negative
number (probably denoting an error code) by caller as pirq, but left for
the future clean up.
Domctl interface version is needed to be bumped with this change but that
was already done by 918b8842a8 ("arm64: Change type of hsr, cpsr, spsr_el1
to uint64_t") in this release cycle.
Additionally, take a change and convert allow_access argument to bool.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
AArch64 system registers are 64bit whereas AArch32 ones
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
The last place in code making use of READ/WRITE_SYSREG32
on arm64 is in TVM_REG macro defining functions vreg_emulate_<register>.
Implement a macro WRITE_SYSREG_SZ which expands as follows:
-on arm64: WRITE_SYSREG
-on arm32: WRITE_SYSREG{32/64}
As there are no other places in the code using these helpers
on arm64 - remove them.
Andrew Cooper [Mon, 19 Jul 2021 10:44:06 +0000 (11:44 +0100)]
x86/hvm: Propagate real error information up through hvm_load()
hvm_load() is currently a mix of -errno and -1 style error handling, which
aliases -EPERM. This leads to the following confusing diagnostics:
From userspace:
xc: info: Restoring domain
xc: error: Unable to restore HVM context (1 = Operation not permitted): Internal error
xc: error: Restore failed (1 = Operation not permitted): Internal error
xc_domain_restore: [1] Restore failed (1 = Operation not permitted)
From Xen:
(XEN) HVM10.0 restore: inconsistent xsave state (feat=0x2ff accum=0x21f xcr0=0x7 bv=0x3 err=-22)
(XEN) HVM10 restore: failed to load entry 16/0
The actual error was a bad backport, but the -EINVAL got converted to -EPERM
on the way out of the hypercall.
The overwhelming majority of *_load() handlers already use -errno consistenty.
Fix up the rest to be consistent, and fix a few other errors noticed along the
way.
* Failures of hvm_load_entry() indicate a truncated record or other bad data
size. Use -ENODATA.
* Don't use {g,}dprintk(). Omitting diagnostics in release builds is rude,
and almost everything uses unconditional printk()'s.
* Switch some errors for more appropriate ones.
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 19 Jul 2021 10:28:50 +0000 (12:28 +0200)]
x86/AMD: adjust SYSCFG, TOM, etc exposure to deal with running nested
In the original change I neglected to consider the case of us running as
L1 under another Xen. In this case we're not Dom0, so the underlying Xen
wouldn't permit us access to these MSRs. As an immediate workaround use
rdmsr_safe(); I don't view this as the final solution though, as the
original problem the earlier change tried to address also applies when
running nested. Yet it is then unclear to me how to properly address the
issue: We shouldn't generally expose the MSR values, but handing back
zero (or effectively any other static value) doesn't look appropriate
either.
Fixes: bfcdaae9c210 ("x86/AMD: expose SYSCFG, TOM, TOM2, and IORRs to Dom0") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Mon, 19 Jul 2021 10:28:09 +0000 (12:28 +0200)]
libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl
The hypervisor may not have enough memory to satisfy the request. While
there, make the unit of the value clear by renaming the local variable.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
stubdom: foreignmemory: Fix build after 0dbb4be739c5
Commit 0dbb4be739c5 add the inclusion of xenctrl.h from private.h and
wreck the build in an interesting way:
In file included from xen/stubdom/include/xen/domctl.h:39:0,
from xen/tools/include/xenctrl.h:36,
from private.h:4,
from minios.c:29:
xen/include/public/memory.h:407:5: error: expected specifier-qualifier-list before ‘XEN_GUEST_HANDLE_64’
XEN_GUEST_HANDLE_64(const_uint8) buffer;
^~~~~~~~~~~~~~~~~~~
This is happening because xenctrl.h defines __XEN_TOOLS__ and therefore
the public headers will start to expose the non-stable ABI. However,
xen.h has already been included by a mini-OS header before hand. So
there is a mismatch in the way the headers are included.
For now solve it in a very simple (and gross) way by including
xenctrl.h before the mini-os headers.
Fixes: 0dbb4be739c5 ("tools/libs/foreignmemory: Fix PAGE_SIZE redefinition error") Signed-off-by: Julien Grall <jgrall@amazon.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 13 Jul 2021 08:16:18 +0000 (10:16 +0200)]
IOMMU: correct parsing of "quarantine=scratch-page"
During the multiple renames of the sub-option I apparently forgot to
update the left side of the &&, and this pretty consistently.
Fixes: 980d6acf1517 ("IOMMU: make DMA containment of quarantined devices optional") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Tue, 15 Jun 2021 15:02:29 +0000 (16:02 +0100)]
tests/xenstore: Rework Makefile
In particular, fill in the install/uninstall rules so this test can be
packaged to be automated sensibly.
This causes the code to be noticed by CI, which objects as follows:
test-xenstore.c: In function 'main':
test-xenstore.c:486:5: error: ignoring return value of 'asprintf', declared
with attribute warn_unused_result [-Werror=unused-result]
asprintf(&path, "%s/%u", TEST_PATH, getpid());
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Address the CI failure by checking the asprintf() return value and exiting.
Rename xs-test to test-xenstore to be consistent with other tests. Honour
APPEND_FLAGS too.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 15 Jun 2021 14:37:49 +0000 (15:37 +0100)]
tests/cpu-policy: Rework Makefile
In particular, fill in the install/uninstall rules so this test can be
packaged to be automated sensibly.
Rework TARGET-y to be TARGETS, drop redundant -f's for $(RM), drop the
unconditional -O3 and use the default instead, and drop CFLAGS from the link
line but honour APPEND_LDFLAGS.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 15 Jun 2021 13:19:15 +0000 (14:19 +0100)]
tools/tests: Drop obsolete mce-test infrastructure
mce-test has a test suite, but it depends on xend, needs to run in-tree, and
requires manual setup of at least one guest, and manual parameters to pass
into cases. Drop the test infrasturcture.
Move the one useful remaining item, xen-mceinj, into misc/, fixing some minor
style issues as it goes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Costin Lupu [Tue, 8 Jun 2021 12:35:29 +0000 (15:35 +0300)]
tools/ocaml: Fix redefinition errors
If PAGE_SIZE is already defined in the system (e.g. in /usr/include/limits.h
header) then gcc will trigger a redefinition error because of -Werror. This
patch replaces usage of PAGE_* macros with XC_PAGE_* macros in order to avoid
confusion between control domain page granularity (PAGE_* definitions) and
guest domain page granularity (which is what we are dealing with here).
Same issue applies for redefinitions of Val_none and Some_val macros which
can be already define in the OCaml system headers (e.g.
/usr/lib/ocaml/caml/mlvalues.h).
Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro> Reviewed-by: Julien Grall <jgrall@amazon.com> Acked-by: Ian Jackson <iwj@xenproject.org> Tested-by: Dario Faggioli <dfaggioli@suse.com>
If PAGE_SIZE is already defined in the system (e.g. in /usr/include/limits.h
header) then gcc will trigger a redefinition error because of -Werror. This
patch replaces usage of PAGE_* macros with XC_PAGE_* macros in order to avoid
confusion between control domain page granularity (PAGE_* definitions) and
guest domain page granularity.
The exception is in osdep_xenforeignmemory_map() where we need the system page
size to check whether the PFN array should be allocated with mmap() or with
dynamic allocation.
Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro> Reviewed-by: Julien Grall <jgrall@amazon.com> Acked-by: Ian Jackson <iwj@xenproject.org>
If PAGE_SIZE is already defined in the system (e.g. in /usr/include/limits.h
header) then gcc will trigger a redefinition error because of -Werror. This
patch replaces usage of PAGE_* macros with XC_PAGE_* macros in order to avoid
confusion between control domain page granularity (PAGE_* definitions) and
guest domain page granularity.
The exception is in osdep_xenforeignmemory_map() where we need the system page
size to check whether the PFN array should be allocated with mmap() or with
dynamic allocation.
Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro> Reviewed-by: Julien Grall <jgrall@amazon.com> Acked-by: Ian Jackson <iwj@xenproject.org>
Costin Lupu [Tue, 8 Jun 2021 12:35:25 +0000 (15:35 +0300)]
tools/debugger: Fix PAGE_SIZE redefinition error
If PAGE_SIZE is already defined in the system (e.g. in /usr/include/limits.h
header) then gcc will trigger a redefinition error because of -Werror. This
patch replaces usage of PAGE_* macros with KDD_PAGE_* macros in order to avoid
confusion between control domain page granularity (PAGE_* definitions) and
guest domain page granularity (which is what we are dealing with here).
We chose to define the KDD_PAGE_* macros instead of using XC_PAGE_* macros
because (1) the code in kdd.c should not include any Xen headers and (2) to add
consistency for code in both kdd.c and kdd-xen.c.
Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Ian Jackson <iwj@xenproject.org>
Olaf Hering [Thu, 8 Jul 2021 13:57:04 +0000 (15:57 +0200)]
automation: use zypper dup in tumbleweed dockerfile
The 'dup' command aligns the installed packages with the packages
found in the enabled repositories, taking the repository priorities
into account. Using this command is generally a safe thing to do.
In the context of Tumbleweed using 'dup' is essential, because package
versions might be downgraded, and package names occasionally change.
Only 'dup' will do the correct thing in such cases.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: 91c3e3dc91d6 ("tools/xentop: Display '-' when stats are not available.") Signed-off-by: Richard Kojedzinszky <richard@kojedz.in> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Olaf Hering [Wed, 16 Jun 2021 13:14:35 +0000 (15:14 +0200)]
tools: ipxe: update for fixing build with GCC11
Use a snapshot which includes commit f3f568e382a5f19824b3bfc6081cde39eee661e8 ("[crypto] Add
memory output constraints for big-integer inline assembly"),
which fixes build with gcc11.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 9 Jul 2021 06:32:07 +0000 (08:32 +0200)]
x86: mark hypercall argument regs clobbering for intended fall-through
The CIDs below are all for the PV side of things, yet while at it take
care of the HVM side as well.
Coverity-ID: 1485896, 1485901, 1485906, 1485910, 1485911, Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 9 Jul 2021 06:31:28 +0000 (08:31 +0200)]
x86emul: pad blob-execution "okay" messages
We already do so in the native execution case, and a few descriptions (I
did notice this with SHA ones) are short enough for the output to look
slightly odd.
Jan Beulich [Fri, 9 Jul 2021 06:30:35 +0000 (08:30 +0200)]
x86/AMD: drop MSR_K7_HWCR
We don't support any K7 (32-bit only) hardware anymore, and the MSR is
accessible as MSR_K8_HWCR as well. Using the K7 name was particularly
odd for Hygon as well as in a Fam0F-specific piece of code.
Jan Beulich [Fri, 9 Jul 2021 06:28:14 +0000 (08:28 +0200)]
x86/AMD: expose SYSCFG, TOM, TOM2, and IORRs to Dom0
Sufficiently old Linux (3.12-ish) accesses these MSRs (with the
exception of IORRs) in an unguarded manner. Furthermore these same MSRs,
at least on Fam11 and older CPUs, are also consulted by modern Linux,
and their (bogus) built-in zapping of #GP faults from MSR accesses leads
to it effectively reading zero instead of the intended values, which are
relevant for PCI BAR placement (which ought to all live in MMIO-type
space, not in DRAM-type one).
For SYSCFG, only certain bits get exposed. Since MtrrVarDramEn also
covers the IORRs, expose them as well. Introduce (consistently named)
constants for the bits we're interested in and use them in pre-existing
code as well. While there also drop the unused and somewhat questionable
K8_MTRR_RDMEM_WRMEM_MASK. To complete the set of memory type and DRAM vs
MMIO controlling MSRs, also expose TSEG_{BASE,MASK} (the former also
gets read by Linux, dealing with which was already the subject of 6eef0a99262c ["x86/PV: conditionally avoid raising #GP for early guest
MSR reads"]).
As a welcome side effect, verbosity on/of debug builds gets (perhaps
significantly) reduced.
Note that at least as far as those MSR accesses by Linux are concerned,
there's no similar issue for DomU-s, as the accesses sit behind PCI
device matching logic. The checked for devices would never be exposed to
DomU-s in the first place. Nevertheless I think that at least for HVM we
should return sensible values, not 0 (as svm_msr_read_intercept() does
right now). The intended values may, however, need to be determined by
hvmloader, and then get made known to Xen.
Fixes: 322ec7c89f66 ("x86/pv: disallow access to unknown MSRs") Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Daniel P. Smith [Fri, 9 Jul 2021 06:19:47 +0000 (08:19 +0200)]
docs/designs/launch: Hyperlaunch design document
Adds a design document for Hyperlaunch, formerly DomB mode of dom0less.
Signed-off-by: Christopher Clark <christopher.clark@starlab.io>
Signed-off by: Daniel P. Smith <dpsmith@apertussolutions.com> Reviewed-by: Rich Persaud <rp@stacktrust.org>
Olaf Hering [Thu, 8 Jul 2021 06:54:35 +0000 (08:54 +0200)]
automation: collect log files in subdirectories
The current single *.log pattern collects just config.log, which
usually contains little useful information.
Collect also log files in subdirectories, tools/config.log usually
contains information about configure failures.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Olaf Hering [Thu, 8 Jul 2021 06:29:22 +0000 (08:29 +0200)]
automation: dump contents of /etc/os-release
To aid debugging build failures, dump /etc/os-release during build.
This helps with rolling releases such as Tumbleweed to understand the
state of the build container.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 7 Jul 2021 16:40:01 +0000 (17:40 +0100)]
automation: Check if ninja is available before building QEMU
ninja is now required to build the latest version of QEMU, and not all
distros have a suitable version. Skip the QEMU build when ninja is not
available.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 7 Jul 2021 16:40:00 +0000 (17:40 +0100)]
automation: Adding ninja-build to some docker images
This is to allow building the latest version of QEMU.
fedora/29:
In addition to adding "ninja", I've add to make some other
changes: some `go build` failed with `mkdir /.cache` no
permission, so I've created a user.
(this was discovered while testing the new container with the
script containerize.)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Michal Orzel [Tue, 6 Jul 2021 10:28:53 +0000 (12:28 +0200)]
arm: Fix arch_initialise_vcpu to be unsupported
Function arch_initialise_vcpu is not reachable as the
VCPUOP_initialise is an unsupported operation on arm.
Modify the function by adding ASSERT_UNREACHABLE() and
returning -EOPNOTSUPP.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Michal Orzel <michal.orzel@arm.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
918b8842a852 changed CPSR and SPSR to be stored as 64bit values.
This is fixing the print size in some tools to use 64bit type.
Fixes: 918b8842a852 ("arm64: Change type of hsr, cpsr, spsr_el1 to uint64_t") Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Michal Orzel <michal.orzel@arm.com> Tested-by: Michal Orzel <michal.orzel@arm.com>
tools/xen-foreign: Update the size for vcpu_guest_{core_regs, context}
Commit 918b8842a852 ("arm64: Change type of hsr, cpsr, spsr_el1 to
uint64_t") updated the size of the structure vcpu_guest_core_regs and
indirectly vcpu_guest_context.
On Arm, the two structures are only accessible to the tools and the
hypervisor (and therefore stable). However, they are still checked
by the scripts in tools/include/xen-foreign are not able to understand
that.
Ideally we should rework the scripts so we don't have to update
the size for non-stable structure. But I don't have limited time
to spend on the issue. So chose the simple solution and update
the size accordingly.
Note that we need to keep vcpu_guest_core_regs around because
the structure is used by vcpu_guest_context and therefore the
scripts expects the generated header to contain it.
Fixes: 918b8842a852 ("arm64: Change type of hsr, cpsr, spsr_el1 to uint64_t") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Michal Orzel <michal.orzel@arm.com> Tested-by: Michal Orzel <michal.orzel@arm.com>
Jan Beulich [Wed, 7 Jul 2021 10:35:54 +0000 (12:35 +0200)]
x86/mem-sharing: mov {get,put}_two_gfns()
There's no reason for every CU including p2m.h to have these two
functions compiled, when they're both mem-sharing specific right now and
for the foreseeable future.
Largely just code movement, with some style tweaks, the inline-s
dropped, and "put" being made consistent with "get" as to their NULL
checking of the passed in pointer to struct two_gfns.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Wed, 7 Jul 2021 10:35:12 +0000 (12:35 +0200)]
x86/mem-sharing: ensure consistent lock order in get_two_gfns()
While the comment validly says "Sort by domain, if same domain by gfn",
the implementation also included equal domain IDs in the first part of
the check, thus rending the second part entirely dead and leaving
deadlock potential when there's only a single domain involved.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Wed, 7 Jul 2021 10:32:45 +0000 (12:32 +0200)]
IOMMU: make DMA containment of quarantined devices optional
Containing still in flight DMA was introduced to work around certain
devices / systems hanging hard upon hitting a "not-present" IOMMU fault.
Passing through (such) devices (on such systems) is inherently insecure
(as guests could easily arrange for IOMMU faults of any kind to occur).
Defaulting to a mode where admins may not even become aware of issues
with devices can be considered undesirable. Therefore convert this mode
of operation to an optional one, not one enabled by default.
This involves resurrecting code commit ea38867831da ("x86 / iommu: set
up a scratch page in the quarantine domain") did remove, in a slightly
extended and abstracted fashion. Here, instead of reintroducing a pretty
pointless use of "goto" in domain_context_unmap(), and instead of making
the function (at least temporarily) inconsistent, take the opportunity
and replace the other similarly pointless "goto" as well.
In order to key the re-instated bypasses off of there (not) being a root
page table this further requires moving the allocate_domain_resources()
invocation from reassign_device() to amd_iommu_setup_domain_device() (or
else reassign_device() would allocate a root page table anyway); this is
benign to the second caller of the latter function.
In VT-d's domain_context_unmap(), instead of adding yet another
"goto out" when all that's wanted is a "return", eliminate the "out"
label at the same time.
Take the opportunity and also limit the control to builds supporting
PCI.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Olaf Hering [Thu, 1 Jul 2021 09:56:07 +0000 (11:56 +0200)]
tools/migration: unify type checking for data pfns in the VM
Introduce a helper which decides if a given pfn in the migration
stream is backed by memory.
This highlights more clearly that type XEN_DOMCTL_PFINFO_XALLOC (a
synthetic toolstack-only type used between Xen 4.2 to 4.5 which
indicated a dirty page on the sending side for which no data will be
send in the initial iteration) does get populated in the VM.
No change in behaviour intended, except for invalid page types which now
have a safer default.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Olaf Hering [Thu, 1 Jul 2021 09:56:05 +0000 (11:56 +0200)]
tools/migration: unify known page type checking
Users of xc_get_pfn_type_batch may want to sanity check the data
returned by Xen. Add helpers for this purpose:
is_known_page_type verifies the type returned by Xen on the saving
side, or the incoming type for a page on the restoring side, is known
by the save/restore code.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Olaf Hering [Thu, 1 Jul 2021 09:56:01 +0000 (11:56 +0200)]
tools/python: fix Python3.4 TypeError in format string
Using the first element of a tuple for a format specifier fails with
python3.4 as included in SLE12:
b = b"string/%x" % (i, )
TypeError: unsupported operand type(s) for %: 'bytes' and 'tuple'
It happens to work with python 2.7 and 3.6.
To support older Py3, format as strings and explicitly encode as ASCII.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Olaf Hering [Thu, 1 Jul 2021 09:56:00 +0000 (11:56 +0200)]
tools/python: handle libxl__physmap_info.name properly in convert-legacy-stream
The trailing member name[] in libxl__physmap_info is written as a
cstring into the stream. The current code does a sanity check if the
last byte is zero. This attempt fails with python3 because name[-1]
returns a type int. As a result the comparison with byte(\00) fails:
File "/usr/lib/xen/bin/convert-legacy-stream", line 347, in read_libxl_toolstack
raise StreamError("physmap name not NUL terminated")
StreamError: physmap name not NUL terminated
To handle both python variants, cast to bytearray().
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Michal Orzel [Mon, 5 Jul 2021 06:39:52 +0000 (08:39 +0200)]
arm64: Change type of hsr, cpsr, spsr_el1 to uint64_t
AArch64 registers are 64bit whereas AArch32 registers
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs
which can correspond to uint64_t or uint32_t.
Even though many AArch64 registers have upper 32bit reserved
it does not mean that they can't be widen in the future.
Modify type of hsr, cpsr, spsr_el1 to uint64_t.
Previously we relied on the padding after spsr_el1.
As we removed the padding, modify the union to be 64bit so we don't corrupt spsr_fiq.
No need to modify the assembly code because the accesses were based on 64bit
registers as there was a 32bit padding after spsr_el1.
Remove 32bit padding in cpu_user_regs before spsr_fiq
as it is no longer needed due to upper union being 64bit now.
Add 64bit padding in cpu_user_regs before spsr_el1
because the kernel frame should be 16-byte aligned.
Change type of cpsr to uint64_t in the public outside interface
"public/arch-arm.h" to allow ABI compatibility between 32bit and 64bit.
Increment XEN_DOMCTL_INTERFACE_VERSION.
Change type of cpsr to uint64_t in the public outside interface
"public/vm_event.h" to allow ABI compatibility between 32bit and 64bit.