Roger Pau Monné [Mon, 11 Jan 2021 13:58:00 +0000 (14:58 +0100)]
x86/acpi: remove dead code
After the recent changes to acpi_fadt_parse_sleep_info the bad label
can never be reached with facs mapped, and hence the unmap can be
removed.
Additionally remove the label altogether, since it had only a single
user, and move the relevant code to that user.
No functional change intended.
CID: 1471722
Fixes: 16ca5b3f873 ('x86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be determined')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 11 Jan 2021 13:56:23 +0000 (14:56 +0100)]
ACPI: replace casts by container_of()
The latter is slightly more type-safe. Also add const where possible,
including where this doesn't require touching further code. Additionally
replace an adjacent unnecessary use of u16.
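As a rough illustration of the difference (the struct names below are made up,
not the ones touched by this patch):

    #include <stddef.h>

    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct outer {
        unsigned int flags;
        struct inner { int x; } in;
    };

    static const struct outer *outer_from_inner(const struct inner *i)
    {
        /* A plain cast such as (const struct outer *)i encodes the layout by
         * hand and silently breaks if 'in' ever moves; container_of() derives
         * the offset from the declared member and fails to compile if the
         * member doesn't exist in the named type. */
        return container_of(i, const struct outer, in);
    }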
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 11 Jan 2021 13:55:52 +0000 (14:55 +0100)]
x86/ACPI: don't overwrite FADT
When marking fields invalid for our own purposes, we should do so in our
local copy (so we will notice later on), not in the firmware provided
one (which another entity may want to look at again, e.g. after kexec).
Also mark the function parameter const to notice such issues right away.
Instead use the pointer to the firmware copy for specifying an adjacent
printk()'s arguments. If nothing else this at least reduces the number
of relocations the assembler has to emit and the linker has to process.
Fixes: 62d1a69a4e9f ("ACPI: support v5 (reduced HW) sleep interface")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 11 Jan 2021 13:55:16 +0000 (14:55 +0100)]
ACPI: reduce verbosity by default
While they're KERN_INFO messages and hence not visible by default, we
have still had reports that the amount of output is too large, not
least because
- the command line controlled resizing of the console ring buffer
  happens only after SRAT parsing (which alone may produce more than 16k
  of output),
- the default resizing of the console ring buffer happens only after
  ACPI table parsing, since the default size gets calculated depending
  on the number of processors found.
Gate all per-processor logging behind a new "acpi=verbose", making sure
we wouldn't unintentionally pass this on to Dom0.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 11 Jan 2021 13:53:55 +0000 (14:53 +0100)]
evtchn: closing of vIRQ-s doesn't require looping over all vCPU-s
Global vIRQ-s have their event channel association tracked on vCPU 0.
Per-vCPU vIRQ-s can't have their notify_vcpu_id changed. Hence it is
well-known which vCPU's virq_to_evtchn[] needs updating.
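A toy model of the resulting logic (types and field names are simplified,
not the actual event channel code):

    #include <stdbool.h>

    #define NR_VIRQS 32

    struct vcpu   { int virq_to_evtchn[NR_VIRQS]; };
    struct domain { struct vcpu *vcpu[8]; };
    struct evtchn { int virq; int notify_vcpu_id; bool virq_is_global; };

    static void clear_virq_binding(struct domain *d, const struct evtchn *chn,
                                   int port)
    {
        /* Global vIRQs are tracked on vCPU 0; per-vCPU vIRQs stay on their
         * fixed notify vCPU - so only one vCPU's table can hold the binding. */
        struct vcpu *v = chn->virq_is_global ? d->vcpu[0]
                                             : d->vcpu[chn->notify_vcpu_id];

        if ( v->virq_to_evtchn[chn->virq] == port )
            v->virq_to_evtchn[chn->virq] = 0;
    }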
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Mon, 11 Jan 2021 13:53:02 +0000 (14:53 +0100)]
evtchn: don't call Xen consumer callback with per-channel lock held
While there don't look to be any problems with this right now, the lock
order implications from holding the lock can be very difficult to follow
(and may be easy to violate unknowingly). The present callbacks don't
(and no such callback should) have any need for the lock to be held.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Mon, 11 Jan 2021 13:51:39 +0000 (14:51 +0100)]
x86/PV: fold redundant calls to adjust_guest_l<N>e()
At least from an abstract perspective it is quite odd for us to compare
adjusted old and unadjusted new page table entries when determining
whether the fast path can be used. This is largely benign because
FASTPATH_FLAG_WHITELIST covers most of the flags which the adjustments
may set, and the flags getting set don't affect the outcome of
get_page_from_l<N>e(). There's one exception: 32-bit L3 entries get
_PAGE_RW set, but get_page_from_l3e() doesn't allow linear page tables
to be created at this level for such guests. Apart from this _PAGE_RW
is unused by get_page_from_l<N>e() (for N > 1), and hence forcing the
bit on early has no functional effect.
The main reason for the change, however, is that adjust_guest_l<N>e()
aren't exactly cheap - both in terms of pure code size and because each
one has at least one evaluate_nospec() by way of containing
is_pv_32bit_domain() conditionals.
Call the functions once ahead of the fast path checks, instead of twice
after.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Commit 8a74707a7c ("x86/nospec: Use always_inline to fix code gen for
evaluate_nospec") converted inline to always_inline for
adjust_guest_l[134]e(), but left adjust_guest_l2e() and
unadjust_guest_l3e() alone without saying why these two would differ in
the needed / wanted treatment. Adjust these two as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
MVFR2 is not available on ARMv7. It is available on ARMv8 aarch32 and
aarch64. If Xen reads MVFR2 on ARMv7 it could crash.
Avoid the issue by doing the following:
- define MVFR2_MAYBE_UNDEFINED on arm32
- if MVFR2_MAYBE_UNDEFINED, do not attempt to read MVFR2 in Xen
- keep the 3rd register_t in struct cpuinfo_arm.mvfr on arm32 so that a
guest read of the register returns '0' instead of crashing the guest.
'0' is an appropriate value to return to the guest because it is defined
as "no support for miscellaneous features".
Aarch64 Xen is not affected by this patch.
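A sketch of the guard described above (the accessor names below are
placeholders for the real system register reads, not the Xen ones):

    #include <stdint.h>

    #ifdef CONFIG_ARM_32
    #define MVFR2_MAYBE_UNDEFINED
    #endif

    static uint64_t read_mvfr0(void) { return 0; } /* stand-ins for sysreg reads */
    static uint64_t read_mvfr1(void) { return 0; }
    #ifndef MVFR2_MAYBE_UNDEFINED
    static uint64_t read_mvfr2(void) { return 0; }
    #endif

    static void fill_mvfr(uint64_t mvfr[3])
    {
        mvfr[0] = read_mvfr0();
        mvfr[1] = read_mvfr1();
    #ifndef MVFR2_MAYBE_UNDEFINED
        mvfr[2] = read_mvfr2();
    #else
        mvfr[2] = 0; /* ARMv7: MVFR2 may be UNDEFINED; guests read back 0 */
    #endif
    }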
Fixes: 9cfdb489af81 ("xen/arm: Add ID registers and complete cpuinfo")
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Roger Pau Monné [Fri, 8 Jan 2021 15:51:52 +0000 (16:51 +0100)]
x86/hypercall: fix gnttab hypercall args conditional build on pvshim
A pvshim build doesn't require the grant table functionality built in,
but it does require knowing the number of arguments the hypercall has
so the hypercall parameter clobbering works properly.
Instead of also setting the argument count for the gnttab case if PV
shim functionality is enabled, just drop all of the conditionals from
hypercall_args_table, as a hypercall having a NULL handler won't get
to use that information anyway.
Note this hasn't been detected by osstest because the tools pvshim
build is done without debug enabled, so the hypercall parameter
clobbering doesn't happen.
Fixes: d2151152dd2 ('xen: make grant table support configurable')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 8 Jan 2021 15:51:19 +0000 (16:51 +0100)]
x86/shadow: adjust TLB flushing in sh_unshadow_for_p2m_change()
Accumulating transient state of d->dirty_cpumask in a local variable is
unnecessary here: The flush is fine to make with the dirty set at the
time of the call. With this, move the invocation to a central place at
the end of the function.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 8 Jan 2021 15:50:11 +0000 (16:50 +0100)]
x86/p2m: pass old PTE directly to write_p2m_entry_pre() hook
In no case is a pointer to non-const needed. Since no pointer arithmetic
is done by the sole user of the hook, passing in the PTE itself is quite
fine.
While doing this adjustment also
- drop the intermediate sh_write_p2m_entry_pre():
sh_unshadow_for_p2m_change() can itself be used as the hook function,
moving the conditional into there,
- introduce a local variable holding the flags of the old entry.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Fri, 8 Jan 2021 15:49:23 +0000 (16:49 +0100)]
x86/p2m: avoid unnecessary calls of write_p2m_entry_pre() hook
When shattering a large page, we first construct the new page table page
and only then hook it up. The "pre" hook in this case does nothing, for
the page starting out all blank. Avoid 512 calls into shadow code in
this case by passing in INVALID_GFN, indicating the page being updated
is (not yet) associated with any GFN. (The alternative to this change
would be to actually pass in a correct GFN, which can't be all the same
on every loop iteration.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Tamas K Lengyel [Fri, 8 Jan 2021 10:51:36 +0000 (11:51 +0100)]
x86/mem_sharing: resolve mm-lock order violations when forking VMs with nested p2m
Several lock-order violations have been encountered while attempting to fork
VMs with nestedhvm=1 set. This patch resolves the issues.
The order violations stem from a call to p2m_flush_nestedp2m being performed
whenever the hostp2m changes. This function always takes the p2m lock for the
nested_p2m. However, with sharing, the p2m locks always have to be taken before
the sharing lock. To resolve this issue we avoid taking the sharing lock where
possible (it was actually unnecessary to begin with). But we also make
p2m_flush_nestedp2m aware that the p2m lock may have already been taken, and
preemptively take all nested_p2m locks before unsharing a page where taking the
sharing lock is necessary.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 8 Jan 2021 10:50:32 +0000 (11:50 +0100)]
x86: fold indirect_thunk_asm.h into asm-defns.h
There's little point in having two separate headers both getting
included by asm_defns.h. This in particular reduces the number of
instances of guarding asm(".include ...") suitably in such dual use
headers.
No change to generated code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 8 Jan 2021 10:48:09 +0000 (11:48 +0100)]
x86: drop ASM_{CL,ST}AC
Use ALTERNATIVE directly, such that at the use sites it is visible that
alternative code patching is in use. Similarly avoid hiding the fact in
SAVE_ALL.
No change to generated code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 8 Jan 2021 10:45:07 +0000 (11:45 +0100)]
x86: replace __ASM_{CL,ST}AC
Introduce proper assembler macros instead, enabled only when the
assembler itself doesn't support the insns. To avoid duplicating the
macros for assembly and C files, have them processed into asm-macros.h.
This in turn requires adding a multiple inclusion guard when generating
that header.
No change to generated code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roman Skakun [Wed, 6 Jan 2021 11:26:57 +0000 (13:26 +0200)]
xen/arm: optee: The function identifier is always 32-bit
Per the SMCCC specification (see section 3.1 in ARM DEN 0028D), the
function identifier is only stored in the least significant 32-bits.
The most significant 32-bits should be ignored.
Signed-off-by: Roman Skakun <roman_skakun@epam.com> Acked-by: Volodymyr Babchyk <volodymyr_babchuk@epam.com>
[jgrall: Reword the commit message and comment]
Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Thu, 7 Jan 2021 14:11:25 +0000 (15:11 +0100)]
xsm/dummy: harden against speculative abuse
First of all don't open-code is_control_domain(), which is already
suitably using evaluate_nospec(). Then also apply this construct to the
other paths of xsm_default_action(). Also guard two paths not using this
function.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
Roger Pau Monné [Thu, 7 Jan 2021 14:10:29 +0000 (15:10 +0100)]
x86/dpci: EOI interrupt regardless of its masking status
Modify hvm_pirq_eoi to always EOI the interrupt if required, instead
of not doing such EOI if the interrupt is routed through the vIO-APIC
and the entry is masked at the time the EOI is performed.
Not performing the EOI while the entry is masked means a further unmask of
the vIO-APIC pin won't EOI the interrupt, and thus the guest OS has to wait
for the timeout to expire and the automatic EOI to be performed.
This also allows simplifying the helpers and dropping the vioapic_redir_entry
parameter from all of them.
Fixes: ccfe4e08455 ('Intel vt-d specific changes in arch/x86/hvm/vmx/vtd.')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 7 Jan 2021 14:09:47 +0000 (15:09 +0100)]
x86: drop use of E801 memory "map" (and alike)
ACPI mandates use of E820 (or newer, e.g. EFI), and in fact firmware
has been observed to include E820_ACPI ranges in what E801 reports as
available (really "configured") memory. Since all 64-bit systems ought
to support ACPI, drop our use of older BIOS and boot loader interfaces.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 7 Jan 2021 14:03:17 +0000 (15:03 +0100)]
vPCI/MSI-X: fold clearing of entry->updated
Both call sites clear the flag after a successful call to
update_entry(). This can be simplified by moving the clearing into the
function, onto its success path.
As a result of neither caller caring about update_entry()'s return value
anymore, the function gets switched to return void.
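The shape of the refactor, as a toy (field and function names are
illustrative, not the vPCI ones):

    #include <stdbool.h>

    struct msix_entry { bool updated; /* ... address/data/masked ... */ };

    static bool program_entry(struct msix_entry *e) { (void)e; return true; }

    static void update_entry(struct msix_entry *e)
    {
        if ( !program_entry(e) )
            return;          /* leave 'updated' set, as before */
        e->updated = false;  /* previously cleared at both call sites */
    }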
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
libxl: cleanup remaining backend xs dirs after driver domain
When a device is removed, the backend domain (which may be a driver domain)
is responsible for removing the backend entries from xenstore. But in the
case of a driver domain, it does not have access to remove all of them -
specifically, the directory named after the frontend-id remains. This may
accumulate enough to exceed the xenstore quota of the driver domain,
breaking further devices.
Fix this by calling libxl__xs_path_cleanup() on the backend path from
libxl__device_destroy() in the toolstack domain too. Note
libxl__device_destroy() is called when the driver domain already removed
what it can (see device_destroy_be_watch_cb()->device_hotplug_done()).
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Tue, 5 Jan 2021 12:20:54 +0000 (13:20 +0100)]
lib/sort: adjust types
First and foremost do away with the use of plain int for sizes or size-
derived values. Use size_t, despite this requiring some adjustment to
the logic. Also replace u32 by uint32_t.
While not directly related also drop a leftover #ifdef from x86's
swap_ex - this was needed only back when 32-bit Xen was still a thing.
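One example of the kind of logic adjustment this requires (a generic sketch,
not the actual sort.c hunk): a descending index loop can no longer test for
">= 0" once the index is a size_t.

    #include <stddef.h>

    static void walk_backwards(unsigned char *base, size_t num)
    {
        /* With "int i; for ( i = num - 1; i >= 0; i-- )" the loop ends when i
         * goes negative; a size_t never does, so restructure the test. */
        for ( size_t i = num; i-- > 0; )
            base[i] ^= 0; /* placeholder for the real per-element work */
    }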
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:20:13 +0000 (13:20 +0100)]
vPCI/MSI-X: tidy init_msix()
First of all introduce a local variable for the struct to be allocated.
The compiler can't CSE all the occurrences (I'm observing 80 bytes of
code saved with gcc 10). Additionally, while the caller can cope and
there was no memory leak, globally "announce" the struct only once done
initializing it. This also removes the dependency of the function on
the caller cleaning up after it in case of an error.
Also prefer a local variable over using a structure field previously
set from this very variable.
Finally move the call to vpci_add_register() ahead of all further
initialization of the struct, to bail early in case of error.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:18:26 +0000 (13:18 +0100)]
x86/vPCI: check address in vpci_msi_update()
If the upper address bits don't match the interrupt delivery address
space window, entirely different behavior would need to be implemented.
Refuse such requests for the time being.
Replace adjacent hard tabs while introducing MSI_ADDR_BASE_MASK.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:17:54 +0000 (13:17 +0100)]
x86/vPCI: tolerate (un)masking a disabled MSI-X entry
None of the four reasons causing vpci_msix_arch_mask_entry() to get
called (there's just a single call site) are impossible or illegal prior
to an entry actually having got set up:
- the entry may remain masked (in this case, however, a prior masked ->
unmasked transition would already not have worked),
- MSI-X may not be enabled,
- the global mask bit may be set,
- the entry may not otherwise have been updated.
Hence the function asserting that the entry was previously set up was
simply wrong. Since the caller tracks the masked state (and setting up
of an entry would only be effected when that software bit is clear),
it's okay to skip both masking and unmasking requests in this case.
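A toy sketch of the resulting behaviour (names simplified; not the vPCI code
itself):

    #include <stdbool.h>

    struct msix_entry { bool set_up; bool masked; unsigned int pirq; };

    static void hw_mask(unsigned int pirq, bool mask) { (void)pirq; (void)mask; }

    static void arch_mask_entry(struct msix_entry *e, bool mask)
    {
        /* The caller already records the intended state in e->masked; if the
         * entry was never set up there is no hardware state to touch yet, so
         * skip the request instead of asserting. */
        if ( !e->set_up )
            return;
        hw_mask(e->pirq, mask);
    }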
Fixes: d6281be9d0145 ('vpci/msix: add MSI-X handlers')
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
Jan Beulich [Tue, 5 Jan 2021 12:13:18 +0000 (13:13 +0100)]
x86/build: restrict contents of asm-offsets.h when !HVM / !PV
This file has a long dependencies list (through asm-offsets.[cs]) and a
long list of dependents. IOW if any of the former changes, all of the
latter will be rebuilt, even if there's no actual change to the
generated file. Therefore avoid producing symbols we don't actually
need, depending on configuration.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:12:37 +0000 (13:12 +0100)]
x86/build: limit #include-ing by asm-offsets.c
This file has a long dependencies list and asm-offsets.h, generated from
it, has a long list of dependents. IOW if any of the former changes, all
of the latter will be rebuilt, even if there's no actual change to the
generated file. Therefore avoid including headers we don't actually need
(generally or configuration dependent).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:12:15 +0000 (13:12 +0100)]
x86/build: limit rebuilding of asm-offsets.h
This file has a long dependencies list (through asm-offsets.[cs]) and a
long list of dependents. IOW if any of the former changes, all of the
latter will be rebuilt, even if there's no actual change to the
generated file. This is the primary scenario we have the move-if-changed
macro for.
Since debug information may easily cause the file contents to change in
benign ways, also avoid emitting this into the output file.
Finally, even before this change *.new files needed to be included in what
gets removed by the "clean" target.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:11:04 +0000 (13:11 +0100)]
x86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be determined
We can be more tolerant as long as the data collected from FACS is only
needed to enter S3. A prior change already added suitable checking to
acpi_enter_sleep().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:09:55 +0000 (13:09 +0100)]
x86/ACPI: fix S3 wakeup vector mapping
Use of __acpi_map_table() here was at least close to an abuse already
before, but it will now consistently return NULL here. Drop the layering
violation and use set_fixmap() directly. Re-use of the ACPI fixmap area
is hopefully going to remain "fine" for the time being.
Add checks to acpi_enter_sleep(): The vector now needs to be contained
within a single page, but the ACPI spec requires 64-byte alignment of
FACS anyway. Also bail if no wakeup vector was determined in the first
place, in part as preparation for a subsequent relaxation change.
Fixes: 1c4aa69ca1e1 ("xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Bertrand Marquis [Thu, 17 Dec 2020 15:38:08 +0000 (15:38 +0000)]
xen/arm: Activate TID3 in HCR_EL2
Activate TID3 bit in HCR register when starting a guest.
This will trap all coprocessor ID registers so that we can give guests
values corresponding to what they can actually use, and mask some
features from guests even though they would be supported by the underlying
hardware (like SVE or MPAM).
Bertrand Marquis [Thu, 17 Dec 2020 15:38:07 +0000 (15:38 +0000)]
xen/arm: Add CP10 exception support to handle MVFR
Add support for cp10 exception decoding to be able to emulate the
values for MVFR0, MVFR1 and MVFR2 when the TID3 bit of HCR_EL2 is set.
This is required for aarch32 guests accessing MVFR registers using
vmrs and vmsr instructions.
Bertrand Marquis [Thu, 17 Dec 2020 15:38:06 +0000 (15:38 +0000)]
xen/arm: Add handler for cp15 ID registers
Add support for emulation of cp15 based ID registers (on arm32 or when
running a 32bit guest on arm64).
The handlers return the values stored in the guest_cpuinfo structure for
known registers and treat all reserved registers as RAZ.
At this stage the MVFR registers are not supported.
Bertrand Marquis [Thu, 17 Dec 2020 15:38:05 +0000 (15:38 +0000)]
xen/arm: Add handler for ID registers on arm64
Add vsysreg emulation for the registers trapped when the TID3 bit is set
in HCR_EL2.
The emulation returns the value stored in the guest_cpuinfo structure for
known registers and handles reserved registers as RAZ.
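A toy version of the emulation policy (the register encoding and the
guest_cpuinfo layout are heavily simplified here):

    #include <stdint.h>

    struct guest_cpu_info { uint64_t id_aa64pfr0; /* ... further ID regs ... */ };
    static struct guest_cpu_info guest_cpuinfo;

    static uint64_t emulate_id_read(unsigned int encoding)
    {
        switch ( encoding )
        {
        case 0: /* e.g. ID_AA64PFR0_EL1 */
            return guest_cpuinfo.id_aa64pfr0;
        default:
            return 0; /* reserved registers in the trapped range are RAZ */
        }
    }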
Bertrand Marquis [Thu, 17 Dec 2020 15:38:04 +0000 (15:38 +0000)]
xen/arm: create a cpuinfo structure for guest
Create a cpuinfo structure for guests and mask into it the features that
we do not support in Xen or that we do not want to publish to guests.
Modify some values in the guest cpuinfo structure to hide some processor
features from guests, i.e. features which we do not want to allow them to
use (like AMU) or which we do not support (like SVE):
- SVE, as this is not supported by Xen and guests are not allowed to use
  this feature (ZEN is set to 0 in CPTR_EL2),
- AMU, as HCPTR_TAM is set in CPTR_EL2 so AMU cannot be used by guests,
- RAS, as this is not supported by Xen.
All other bits are left untouched.
The code tries to group together register modifications for the same
feature, so that in the long term a feature can easily be enabled or
disabled depending on user parameters, or other register modifications
can be added in the same place (like enabling/disabling HCR bits).
Bertrand Marquis [Thu, 17 Dec 2020 15:38:03 +0000 (15:38 +0000)]
xen/arm: Add arm64 ID registers definitions
Add coprocessor register definitions for all ID registers trapped
through the TID3 bit of HCR_EL2.
Those are the ones that will be emulated in Xen to only publish to guests
the features that are supported by Xen and that are accessible to
guests.
Bertrand Marquis [Thu, 17 Dec 2020 15:38:02 +0000 (15:38 +0000)]
xen/arm: Add ID registers and complete cpuinfo
Add definitions and entries in cpuinfo for ID registers introduced in
the newer Arm Architecture Reference Manual:
- ID_PFR2: processor feature register 2
- ID_DFR1: debug feature register 1
- ID_MMFR4 and ID_MMFR5: Memory model feature registers 4 and 5
- ID_ISA6: ISA Feature register 6
Add more bitfield definitions in PFR fields of cpuinfo.
Add MVFR2 register definition for aarch32.
Add MVFRx_EL1 defines for aarch32.
Add mvfr values in cpuinfo.
Add some register definitions for arm64 in sysregs as some are not
always known by compilers.
Initialize the new values added in cpuinfo in identify_cpu during init.
Bertrand Marquis [Thu, 17 Dec 2020 15:38:01 +0000 (15:38 +0000)]
xen/arm: Use READ_SYSREG instead of 32/64 versions
Modify identify_cpu function to use READ_SYSREG instead of READ_SYSREG32
or READ_SYSREG64.
All aarch32 specific registers (for example ID_PFR0_EL1) are 64bit when
accessed from aarch64 with upper bits read as 0, so it is right to
access them as 64bit registers on a 64bit platform.
Andrew Cooper [Thu, 31 Dec 2020 16:55:20 +0000 (16:55 +0000)]
x86/p2m: Fix paging_gva_to_gfn() for nested virt
nestedhap_walk_L1_p2m() takes guest physical addresses, not frame numbers.
This means the l2 input is off-by-PAGE_SHIFT, as is the l1 value eventually
returned to the caller.
Delete the misleading comment as well.
Fixes: bab2bd8e222de ("xen/nested_p2m: Don't walk EPT tables with a regular PT walker")
Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Roger Pau Monné [Mon, 4 Jan 2021 09:03:23 +0000 (10:03 +0100)]
x86/p2m: fix p2m_add_foreign error path
One of the error paths in p2m_add_foreign could call put_page with a
NULL page, thus triggering a fault.
Split the checks into two different if statements, so the appropriate
error path can be taken.
Fixes: 173ae325026bd ('x86/p2m: tidy p2m_add_foreign() a little')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Wed, 30 Dec 2020 17:34:46 +0000 (18:34 +0100)]
xen: remove the usage of the P ar option
It's not part of the POSIX standard [0] and as such non-GNU ar
implementations don't usually have it.
It's not relevant for the use case here anyway, as the archive file is
recreated every time due to the rm invocation before the ar call. No
file name matching should happen so matching using the full path name
or a relative one should yield the same result.
This fixes the build on FreeBSD.
While there also drop the s option, as ar will already generate a
symbol table by default when creating the archive.
Andrew Cooper [Tue, 29 Dec 2020 17:51:23 +0000 (17:51 +0000)]
x86/hpet: Fix return value of hpet_setup()
hpet_setup() is idempotent if the rate has already been calculated, and
returns the cached value. However, this only works correctly when the return
statements are identical.
Use a sensibly named local variable, rather than a dead one with a bad name.
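The pattern of the fix, roughly (simplified; the real function calibrates the
rate rather than taking it as an argument):

    #include <stdint.h>

    static uint64_t cached_rate;

    static uint64_t hpet_setup_sketch(uint64_t measured)
    {
        uint64_t hpet_rate = cached_rate;

        if ( hpet_rate )
            return hpet_rate;     /* later calls: return the cached value */

        hpet_rate = measured;     /* first call: calculate and cache it */
        cached_rate = hpet_rate;
        return hpet_rate;         /* identical to what later calls return */
    }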
Fixes: a60bb68219 ("x86/time: reduce rounding errors in calculations")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 28 Sep 2020 17:14:53 +0000 (18:14 +0100)]
xen/domain: Introduce domain_teardown()
There is no common equivalent of domain_relinquish_resources(), which has
caused various pieces of common cleanup to live in inappropriate
places.
Perhaps most obviously, evtchn_destroy() is called for every continuation of
domain_relinquish_resources(), which can easily be thousands of times.
Create domain_teardown() to be a new top level facility, and call it from the
appropriate positions in domain_kill() and domain_create()'s error path. The
intention is for this to supersede domain_relinquish_resources() in due course.
No change in behaviour yet.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 22 Dec 2020 11:01:12 +0000 (12:01 +0100)]
x86/mm: p2m_add_foreign() is HVM-only
This is the case also for xenmem_add_to_physmap_one(), which is the only
caller of the function. Move the latter next to p2m_add_foreign(),
allowing p2m_add_foreign() to become static at the same time. While
moving, adjust indentation of the body of the main switch().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 22 Dec 2020 08:00:03 +0000 (09:00 +0100)]
x86/Intel: insert Tiger Lake model numbers
Both match prior generation processors as far as LBR and C-state MSRs
go (SDM rev 073). The if_pschange_mc erratum, according to the spec
update, is not applicable.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/EFI: don't insert timestamp when SOURCE_DATE_EPOCH is defined
By default a timestamp gets added to the xen efi binary. Unfortunately
ld doesn't seem to provide a way to set a custom date, like from
SOURCE_DATE_EPOCH, so set a zero value for the timestamp (option
--no-insert-timestamp) if SOURCE_DATE_EPOCH is defined. This makes
reproducible builds possible.
This is an alternative to the patch suggested in [1]. This patch only
omits the timestamp when SOURCE_DATE_EPOCH is defined.
Jan Beulich [Tue, 22 Dec 2020 07:57:19 +0000 (08:57 +0100)]
x86: verify function type (and maybe attribute) in switch_stack_and_jump()
It is imperative that the functions passed here take no arguments,
return no values, and don't return in the first place. While the type
can be checked uniformly, the attribute check is limited to gcc 9 and
newer (no clang support for this so far afaict).
Note that I didn't want to have the "true" fallback "implementation" of
__builtin_has_attribute(..., __noreturn__) generally available, as
"true" may not be a suitable fallback in other cases.
Note further that the noreturn addition to startup_cpu_idle_loop()'s
declaration requires adding unreachable() to Arm's
switch_stack_and_jump(), or else the build would break. I suppose this
should have been there already.
For vmx_asm_do_vmentry() along with adding the attribute, also restrict
its scope.
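A generic sketch of the type check (the gcc 9+ attribute check via
__builtin_has_attribute(fn, __noreturn__) is omitted; macro and function
names are illustrative, not the Xen ones):

    #include <stdnoreturn.h>

    #define check_fn_type(fn) ({                            \
        void (*chk_)(void) = (fn); /* must be void(void) */ \
        (void)chk_;                                         \
    })

    static noreturn void idle_loop(void)
    {
        for ( ;; )
            ;
    }

    static void switch_and_jump_sketch(void)
    {
        check_fn_type(idle_loop); /* compile-time only; no code generated */
        idle_loop();              /* real macro switches stacks, then jumps */
    }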
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
Julien Grall [Fri, 18 Dec 2020 13:30:54 +0000 (13:30 +0000)]
xen: Rework WARN_ON() to return whether a warning was triggered
So far, our implementation of WARN_ON() cannot be used in the following
situation:
    if ( WARN_ON(...) )
        ...
This is because WARN_ON() doesn't return whether a warning has been
triggered. Such a construction can be handy if you want to print more
information and also dump the stack trace.
Therefore, rework the WARN_ON() implementation to return whether a
warning was triggered. The idea was borrowed from Linux.
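A sketch of the reworked macro, modelled on the Linux pattern referenced
above (the exact Xen definition may differ in detail):

    #include <stdbool.h>
    #include <stdio.h>

    #define unlikely(x) __builtin_expect(!!(x), 0)
    #define WARN()      printf("WARNING at %s:%d\n", __FILE__, __LINE__)

    #define WARN_ON(p) ({                     \
        bool warn_on_ret_ = (p);              \
                                              \
        if ( unlikely(warn_on_ret_) )         \
            WARN();                           \
        unlikely(warn_on_ret_);               \
    })

A caller can then write e.g. "if ( WARN_ON(!ptr) ) return;" and add extra
diagnostics inside the if body before bailing.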
Andrew Cooper [Mon, 21 Dec 2020 14:52:26 +0000 (14:52 +0000)]
x86/shadow: Fix build with !CONFIG_SHADOW_PAGING
Implement a stub for shadow_vcpu_teardown()
Fixes: d162f36848c4 ("xen/x86: Fix memory leak in vcpu_create() error path")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 28 Sep 2020 14:25:44 +0000 (15:25 +0100)]
xen/x86: Fix memory leak in vcpu_create() error path
Various paths in vcpu_create() end up calling paging_update_paging_modes(),
which eventually allocates a monitor pagetable if one doesn't exist.
However, an error in vcpu_create() results in the vcpu being cleaned up
locally, and not put onto the domain's vcpu list. Therefore, the monitor
table is not freed by {hap,shadow}_teardown()'s loop. This is caught by
assertions later that we've successfully freed the entire hap/shadow memory
pool.
The per-vcpu loops in the domain teardown logic are conceptually wrong, but
exist due to insufficient structure in the existing logic.
Break paging_vcpu_teardown() out of paging_teardown(), with mirrored breakouts
in the hap/shadow code, and use it from arch_vcpu_create()'s error path. This
fixes the memory leak.
The new {hap,shadow}_vcpu_teardown() must be idempotent, and are written to be
as tolerant as possible, with the minimum number of safety checks possible.
In particular, drop the mfn_valid() check - if these fields are junk, then Xen
is going to explode anyway.
Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 18 Dec 2020 12:29:14 +0000 (13:29 +0100)]
x86/mm: p2m_add_foreign() is HVM-only
This is the case also for xenmem_add_to_physmap_one(), which is the only
caller of the function. Move the latter next to p2m_add_foreign(),
allowing p2m_add_foreign() to become static at the same time. While
moving, adjust indentation of the body of the main switch().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 18 Dec 2020 12:28:30 +0000 (13:28 +0100)]
x86/p2m: tidy p2m_add_foreign() a little
Drop a bogus ASSERT() - we don't typically assert incoming domain
pointers to be non-NULL, and there's no particular reason to do so here.
Replace the open-coded DOMID_SELF check by use of
rcu_lock_remote_domain_by_id(), at the same time covering the request
being made with the current domain's actual ID.
Move the "both domains same" check into just the path where it really
is meaningful.
Swap the order of the two puts, such that
- the p2m lock isn't needlessly held across put_page(),
- a separate put_page() on an error path can be avoided,
- they're inverse to the order of the respective gets.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 18 Dec 2020 12:23:42 +0000 (13:23 +0100)]
lib: move bsearch code
Convert this code to an inline function (backed by an instance in an
archive in case the compiler decides against inlining), which results
in not having it in x86 final binaries. This saves a little bit of dead
code.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:20:42 +0000 (13:20 +0100)]
lib: move list sorting code
Build the source file always, as by putting it into an archive it still
won't be linked into final binaries when not needed. This way possible
build breakage will be easier to notice, and it's more consistent with
us unconditionally building other library kind of code (e.g. sort() or
bsearch()).
While moving the source file, take the opportunity and drop the
pointless EXPORT_SYMBOL() and an unnecessary #include.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Fri, 18 Dec 2020 12:17:57 +0000 (13:17 +0100)]
lib: collect library files in an archive
In order to (subsequently) drop odd things like CONFIG_NEEDS_LIST_SORT
just to avoid bloating binaries when only some arch-es and/or
configurations need generic library routines, combine objects under lib/
into an archive, which the linker then can pick the necessary objects
out of.
Note that we can't use thin archives just yet, until we've raised the
minimum required binutils version suitably.
Note further that --start-group / --end-group get put in place right
away to allow for symbol resolution across all archives, once we gain
multiple ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Create a test job that starts Xen and Dom0 on QEMU based on the alpine
linux rootfs. Use the Linux kernel and rootfs from the tests-artifacts
containers. Add the Xen tools binaries from the Alpine Linux build job.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Acked-by: Wei Liu <wl@xen.org>
automation: make available the tests artifacts to the pipeline
In order to make available the pre-built binaries of the
automation/tests-artifacts containers to the gitlab-ci pipeline we need
to export them as gitlab artifacts.
To do that, we create two "fake" jobs that simply export the required
binaries as artifacts and do nothing else.
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Acked-by: Wei Liu <wl@xen.org>
Some tests (soon to come) will require pre-built binaries to run, such
as the Linux kernel binary. We don't want to rebuild the Linux kernel
for each gitlab-ci run: these builds should not be added to the current
list of build jobs.
Instead, create additional containers that today are built and uploaded
manually, but could be re-built automatically. The containers build the
required binaries during the "docker build" step and store them inside
the container itself.
gitlab-ci will be able to fetch these pre-built binaries during the
regular test runs, saving cycles.
Add two tests artifacts containers:
- one to build the Linux kernel ARM64
- one to create an Alpine Linux ARM64 rootfs for Dom0
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Acked-by: Wei Liu <wl@xen.org>
automation: add dom0less to the QEMU aarch64 smoke test
Add a trivial dom0less test:
- fetch the Debian arm64 kernel and use it as dom0/U kernel
- use busybox-static to create a trivial dom0/U ramdisk
- use ImageBuilder to generate the uboot boot script automatically
- install and use u-boot from the Debian package to start the test
- binaries are loaded from uboot via tftp
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Thu, 17 Dec 2020 15:50:21 +0000 (16:50 +0100)]
xen/hypfs: add new enter() and exit() per node callbacks
In order to better support resource allocation and locking for dynamic
hypfs nodes add enter() and exit() callbacks to struct hypfs_funcs.
The enter() callback is called when entering a node during hypfs user
actions (traversing, reading or writing it), while the exit() callback
is called when leaving a node (accessing another node at the same or a
higher directory level, or when returning to the user).
For avoiding recursion this requires a parent pointer in each node.
Let the enter() callback return the entry address which is stored as
the last accessed node in order to be able to use a template entry for
that purpose in case of dynamic entries.
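A toy outline of the callback shape described above (field names follow the
commit message; the real struct hypfs_funcs has further members and the exact
signatures may differ):

    struct hypfs_entry;

    struct hypfs_funcs {
        /* Called when a node is entered; may take locks / allocate, and
         * returns the entry to record as "last accessed" (a template entry
         * in the dynamic case). */
        const struct hypfs_entry *(*enter)(const struct hypfs_entry *e);
        /* Called when the node is left again; releases what enter() took. */
        void (*exit)(const struct hypfs_entry *e);
        /* ... read/write/getsize callbacks ... */
    };

    struct hypfs_entry {
        const struct hypfs_funcs *funcs;
        struct hypfs_entry *parent; /* needed to call exit() without recursion */
        /* ... name, type, size ... */
    };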
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Thu, 17 Dec 2020 15:49:49 +0000 (16:49 +0100)]
xen/hypfs: switch write function handles to const
The node specific write functions take a void user address handle as
parameter. As a write won't change the user memory, use a const_void
handle instead.
This requires a new macro for casting a guest handle to a const type.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>