Jan Beulich [Tue, 4 Jan 2022 09:18:18 +0000 (10:18 +0100)]
VT-d: use DMA_TLB_IVA_ADDR()
Let's use the macro in the one place it's supposed to be used, in favor
of the then unnecessary manipulations of the address in
iommu_flush_iotlb_psi(): all leaf functions already deal correctly
with the supplied address.
There also has never been a need to require (i.e. assert for) the
passing in of 4k-aligned addresses - it'll always be the order-sized
range containing the address which gets flushed.
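A minimal sketch of that range arithmetic, assuming 4 KiB base pages and a naturally aligned invalidation range as with VT-d page-selective invalidation. iotlb_flush_base() is a hypothetical helper for illustration, not Xen's DMA_TLB_IVA_ADDR():

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12 /* 4 KiB base page size (assumed) */

/*
 * Hypothetical helper: return the base of the naturally aligned block of
 * (1 << order) 4 KiB pages containing addr. A page-selective invalidation
 * of that order covers the whole block, so addr need not be 4 KiB aligned.
 */
static uint64_t iotlb_flush_base(uint64_t addr, unsigned int order)
{
    uint64_t size;

    assert(order < 64 - PAGE_SHIFT_4K); /* keep the shift well-defined */
    size = UINT64_C(1) << (PAGE_SHIFT_4K + order);
    return addr & ~(size - 1);
}
```

With this rule an unaligned address needs no special treatment; the low bits are simply masked away.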
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 4 Jan 2022 09:16:48 +0000 (10:16 +0100)]
xenperf: name "newer" hypercalls
This table must not have been updated in quite a while; tmem_op, for
example, has managed not only to appear since then, but also to disappear
again (a name is added for it nevertheless, to make it more obvious that
something strange is going on if the slot ever has a non-zero
value).
Also resolve arch_0 and arch_1 to more meaningful names on x86.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 4 Jan 2022 09:16:04 +0000 (10:16 +0100)]
VT-d: avoid allocating domid_{bit,}map[] when possible
When an IOMMU implements the full 16 bits worth of DID in context
entries, there's no point going through a memory-based translation table.
For IOMMUs not using Caching Mode we can simply use the domain IDs
verbatim, while for Caching Mode we need to avoid DID 0.
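A hypothetical sketch of table-free DID derivation. The function name is invented, and offsetting by one in Caching Mode is just one assumed way of avoiding DID 0, not necessarily what the patch does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: derive a VT-d domain ID (DID) straight from a Xen domain
 * ID when the IOMMU supports all 16 DID bits, so no domid_map[] is needed.
 * Caching Mode reserves DID 0, so this sketch shifts every ID up by one.
 */
static uint16_t domid_to_did(uint16_t domid, bool caching_mode)
{
    if ( !caching_mode )
        return domid;              /* use the domain ID verbatim */

    assert(domid < UINT16_MAX);    /* domid + 1 must still fit in 16 bits */
    return (uint16_t)(domid + 1);  /* never yields DID 0 */
}
```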
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 4 Jan 2022 09:13:06 +0000 (10:13 +0100)]
x86/EPT: squash meaningless TLB flush
ept_free_entry() gets called after a flush was already issued, if one is
necessary in the first place. That behavior is similar to NPT, which
also doesn't have any further flush in p2m_free_entry(). (Furthermore,
since the function is recursive, far too many flushes would have been
issued whenever it actually recursed.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 21 Dec 2021 09:42:02 +0000 (10:42 +0100)]
mm: introduce INVALID_{G,M}FN_RAW
This allows properly tying together INVALID_{G,M}FN and
INVALID_{G,M}FN_INITIALIZER as well as using the actual values in
compile time constant expressions (or even preprocessor directives).
Since INVALID_PFN is unused, and with x86's paging_mark_pfn_dirty()
being the only user of pfn_t it also doesn't seem likely that new uses
would appear, remove it on the same occasion.
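A simplified sketch of the idea, modelled loosely on Xen's type-safe mfn_t. The wrapper, boot_mfn and mfn_is_invalid() are illustrative stand-ins rather than Xen's actual definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Raw value: usable in integer constant expressions and #if directives. */
#define INVALID_MFN_RAW (~0UL)

/* Simplified type-safe wrapper in the style of Xen's mfn_t. */
typedef struct { unsigned long mfn; } mfn_t;

static inline mfn_t _mfn(unsigned long m) { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }

/* Both forms are now tied to the one raw definition. */
#define INVALID_MFN             _mfn(INVALID_MFN_RAW)
#define INVALID_MFN_INITIALIZER { INVALID_MFN_RAW }

/* Static storage needs the initializer form; _mfn() is a function call. */
static const mfn_t boot_mfn = INVALID_MFN_INITIALIZER;

/* Preprocessor use, e.g. a build-time sanity check. */
#if INVALID_MFN_RAW != ~0UL
# error "unexpected INVALID_MFN_RAW"
#endif

static bool mfn_is_invalid(mfn_t m)
{
    switch ( mfn_x(m) )
    {
    case INVALID_MFN_RAW:   /* case labels need a constant expression */
        return true;
    default:
        return false;
    }
}
```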
Jan Beulich [Tue, 21 Dec 2021 09:38:18 +0000 (10:38 +0100)]
x86/perfc: conditionalize HVM and shadow counters
There's no point including them when the respective functionality isn't
enabled in the build. Note that this covers only larger groups; finer
grained exclusion may be done later on.
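An X-macro sketch of compiling whole counter groups in or out. Xen's real perfc_defn.h layout differs, and the counter names below are hypothetical:

```c
#include <assert.h>

/* Assumed build configuration for this sketch (normally from Kconfig). */
#define CONFIG_HVM 1
/* CONFIG_SHADOW_PAGING intentionally left undefined. */

#ifdef CONFIG_HVM
# define HVM_COUNTERS(X)    X(vmexits, "VM exits")
#else
# define HVM_COUNTERS(X)
#endif

#ifdef CONFIG_SHADOW_PAGING
# define SHADOW_COUNTERS(X) X(shadow_fault, "shadow faults")
#else
# define SHADOW_COUNTERS(X)
#endif

/* Master list: whole groups drop out when their feature is compiled out. */
#define PERFCOUNTERS(X)             \
    X(exceptions, "exceptions")     \
    HVM_COUNTERS(X)                 \
    SHADOW_COUNTERS(X)

#define PERFC_ENUM(name, desc) PERFC_##name,
#define PERFC_DESC(name, desc) desc,

enum perfc_id { PERFCOUNTERS(PERFC_ENUM) NUM_PERFCOUNTERS };

static const char *const perfc_desc[] = { PERFCOUNTERS(PERFC_DESC) };
```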
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 8 Oct 2021 12:40:17 +0000 (13:40 +0100)]
x86/traps: Clean up diagnostics
do{_unhandled,}_trap() should use fatal_trap() rather than open-coding part of
it. This lets the remote stack trace logic work in more fatal error
conditions.
With do_trap() converted, there is only a single user of trapstr()
remaining. Tweak the formatting in pv_inject_event(), and remove trapstr()
entirely. Rename vec_name() to vector_name() now that it is exported.
Take the opportunity of vector_name() being exported to improve the
diagnostics in stub_selftest().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 7 Oct 2021 13:04:03 +0000 (14:04 +0100)]
x86/traps: Drop exception_table[] and use if/else dispatching
There is also a lot of redundancy in the table. 8 vectors head to do_trap(),
3 are handled in the IST logic, and that only leaves 7 others not heading to
the do_reserved_trap() catch-all. This also removes the fragility that any
accidental NULL entry in the table becomes a ticking timebomb.
Function pointers are expensive under retpoline, and different vectors have
wildly different frequencies. Drop the indirect call, and use an if/else chain
instead, which is a code layout technique used by profile-guided optimisation.
Using Xen's own perfcounter infrastructure, we see the following frequencies of
vectors measured from boot until I can SSH into dom0 and collect the stats:
vec | CFL-R   | Milan   | Notes
----+---------+---------+------------------------------------------
NMI |     345 |    3768 | Watchdog. Milan has many more CPUs.
----+---------+---------+------------------------------------------
#PF | 1233234 | 2006441 |
#GP |   90054 |   96193 |
#UD |     848 |     851 |
#NM |       0 |     132 | Per-vendor lazy vs eager FPU policy.
#DB |      67 |      67 | No clue, but it's something in userspace.
Bloat-o-meter (after some manual insertion of ELF metadata) reports:
add/remove: 0/1 grow/shrink: 2/0 up/down: 102/-256 (-154)
Function old new delta
handle_exception_saved 148 226 +78
handle_ist_exception 453 477 +24
exception_table 256 - -256
showing that the if/else chains are less than half the size that
exception_table[] was in the first place.
As part of this change, make two other minor changes. do_reserved_trap() is
renamed to do_unhandled_trap() because it is the catchall, and already covers
things that aren't reserved any more (#VE/#VC/#HV/#SX).
Furthermore, don't forward #TS to guests. #TS is specifically for errors
relating to the Task State Segment, which is a Xen-owned structure, not a
guest-owned structure. Even in the 32bit days, we never let guests register
their own Task State Segments.
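A sketch of frequency-ordered if/else dispatch using the architectural vector numbers. The handler enum is hypothetical and returned so the order can be tested; Xen's real chain calls the handlers directly, and NMI is left out because it goes through the IST path:

```c
#include <assert.h>

/* Architectural exception vector numbers. */
enum {
    X86_EXC_DB = 1,
    X86_EXC_UD = 6,
    X86_EXC_NM = 7,
    X86_EXC_GP = 13,
    X86_EXC_PF = 14,
};

/* Hypothetical handler identifiers standing in for the real calls. */
enum handler { H_PAGE_FAULT, H_GP, H_INVALID_OP, H_DEVICE_NA, H_DEBUG,
               H_UNHANDLED };

/*
 * Direct dispatch ordered by measured frequency: #PF first, then #GP, and
 * so on. No indirect call (no retpoline cost), and no table slot that can
 * accidentally be left NULL.
 */
static enum handler dispatch_exception(unsigned int vector)
{
    assert(vector < 32);  /* only exception vectors reach this path */

    if ( vector == X86_EXC_PF )
        return H_PAGE_FAULT;
    else if ( vector == X86_EXC_GP )
        return H_GP;
    else if ( vector == X86_EXC_UD )
        return H_INVALID_OP;
    else if ( vector == X86_EXC_NM )
        return H_DEVICE_NA;
    else if ( vector == X86_EXC_DB )
        return H_DEBUG;
    else
        return H_UNHANDLED;  /* catch-all, cf. do_unhandled_trap() */
}
```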
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 28 Oct 2021 03:07:02 +0000 (04:07 +0100)]
xen/domain: Remove function pointers from domain pause helpers
Function pointer calls are expensive (especially with Spectre v2 protections),
and all these do are select between the sync and nosync helpers. Pass a
boolean instead, and use direct calls everywhere.
Pause/unpause operations on behalf of dom0 are not fastpaths, so avoid
exposing the __domain_pause_by_systemcontroller() internal.
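A reduced sketch of replacing a function-pointer parameter with a boolean. struct domain and both helpers are minimal hypothetical stand-ins, not Xen's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for struct domain; only what the sketch needs. */
struct domain {
    int pause_count;
    bool synced;   /* records whether the last pause waited for vCPUs */
};

/* Hypothetical helpers standing in for the sync/nosync pause paths. */
static void vcpus_pause_sync(struct domain *d)   { d->synced = true; }
static void vcpus_pause_nosync(struct domain *d) { d->synced = false; }

/*
 * Instead of a function pointer choosing the helper (an indirect call),
 * a boolean selects between two direct calls.
 */
static void domain_pause_common(struct domain *d, bool sync)
{
    assert(d != NULL);

    d->pause_count++;
    if ( sync )
        vcpus_pause_sync(d);
    else
        vcpus_pause_nosync(d);
}
```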
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Michal Orzel [Fri, 17 Dec 2021 07:21:59 +0000 (08:21 +0100)]
xen/arm64: Zero the top 32 bits of gp registers on entry...
to hypervisor when switching from AArch32 state.
According to section D1.20.2 of Arm Arm (DDI 0487A.j):
"If the general-purpose register was accessible from AArch32 state the
upper 32 bits either become zero, or hold the value that the same
architectural register held before any AArch32 execution.
The choice between these two options is IMPLEMENTATION DEFINED"
Currently Xen does not ensure that the top 32 bits are zeroed, and this
needs to be fixed because there are places in Xen where we assume that
the top 32 bits are zero for AArch32 guests.
If they are not, Xen can misinterpret what the guest requested. For
example, hypercalls returning an error encoded in a signed long, like
do_sched_op, do_hvm_op or do_memory_op, would return -ENOSYS if the
command passed as the first argument was clobbered.
Create a macro clobber_gp_top_halves to clobber top 32 bits of gp
registers when hyp == 0 (guest mode) and compat == 1 (AArch32 mode).
Add a compile time check to ensure that save_x0_x1 == 1 if
compat == 1.
Signed-off-by: Michal Orzel <michal.orzel@arm.com>
[julieng: Tweak the comment in clobber_gp_top_halves] Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Fri, 17 Dec 2021 07:56:34 +0000 (08:56 +0100)]
hvmloader: tidy pci_mem_{start,end}
For one, pci_mem_start at least has to be precisely 32 bits wide, so use
uint32_t for both. Otherwise expressions like "pci_mem_start <<= 1"
won't have the intended effect (in their context).
Furthermore, since its introduction pci_mem_end has never been written
to. Mark it const to make this explicit.
Finally drop PCI_MEM_END: It is used just once and needlessly
disconnected from the other constant (RESERVED_MEMBASE) it needs to
match. Use RESERVED_MEMBASE as initializer of pci_mem_end instead.
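A sketch of why the 32-bit width matters, modelled on the shape of hvmloader's MMIO-hole sizing loop as we understand it. grow_mmio_hole() is a hypothetical extraction, and 0xFC000000 is used only as an example end value:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Grow the MMIO hole [pci_mem_start, pci_mem_end) downwards until it can
 * hold mmio_total bytes. Shifting a 32-bit start left by one doubles its
 * distance to 4 GiB, e.g. 0xF0000000 -> 0xE0000000, because bit 31 drops
 * out; the loop stops once another shift would wrap to 0. With a 64-bit
 * type, 0xF0000000 << 1 would instead become 0x1E0000000.
 */
static uint32_t grow_mmio_hole(uint32_t pci_mem_start, uint32_t pci_mem_end,
                               uint64_t mmio_total)
{
    assert(pci_mem_start <= pci_mem_end);

    while ( mmio_total > (uint64_t)(pci_mem_end - pci_mem_start) &&
            (uint32_t)(pci_mem_start << 1) != 0 )
        pci_mem_start <<= 1;

    return pci_mem_start;
}
```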
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
While its description is correct from an abstract or real hardware point
of view, the range is special inside HVM guests. The range being UC in particular
gets in the way of OVMF, which places itself at [FFE00000,FFFFFFFF].
While this is benign to epte_get_entry_emt() as long as the IOMMU isn't
enabled for a guest, it becomes a very noticeable problem otherwise: It
takes about half a minute for OVMF to decompress itself into its
designated address range.
And even beyond OVMF there's no reason to have e.g. the ACPI memory
range marked UC.
Fixes: c22bd567ce22 ("hvmloader: PA range 0xfc000000-0xffffffff should be UC") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Luca Fancellu [Mon, 13 Dec 2021 11:48:54 +0000 (11:48 +0000)]
arm/efi: Handle Xen bootargs from both xen.cfg and DT
Currently the Xen UEFI stub can accept Xen boot arguments from
the Xen configuration file using the "options=" keyword, but also
directly from the device tree specifying xen,xen-bootargs
property.
When the configuration file is used, device tree boot arguments
are ignored and overwritten even if the keyword "options=" is
not used.
This patch handles this case: if the Xen configuration file does not
specify boot arguments, the device tree boot arguments are used, if
they are present.
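The selection rule as a hypothetical sketch; select_bootargs() is an invented name, and "not specified" is modelled as a NULL options= value:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical: options= from xen.cfg wins when present; otherwise fall
 * back to the DT xen,xen-bootargs property (which may itself be NULL).
 */
static const char *select_bootargs(const char *cfg_options,
                                   const char *dt_bootargs)
{
    if ( cfg_options != NULL )
        return cfg_options;
    return dt_bootargs;
}
```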
Luca Fancellu [Thu, 16 Dec 2021 22:43:19 +0000 (14:43 -0800)]
xen/arm: increase memory banks number define value
Currently the maximum number of memory banks (NR_MEM_BANKS define)
is fixed to 128, but on some new platforms that have a large amount
of memory, this value is not enough and prevents Xen from booting.
Andrew Cooper [Tue, 14 Dec 2021 20:04:17 +0000 (20:04 +0000)]
x86/cpuid: Advertise SERIALIZE by default to guests
I've played with SERIALIZE, TSXLDTRK, MOVDIRI and MOVDIR64B on real hardware,
and they all seem fine, including emulation support.
SERIALIZE exists specifically to provide a userspace-usable serialising
operation without other side effects. (The only other two choices are CPUID,
which is a VMExit under virt and clobbers 4 registers, and IRET-to-self, which
is very slow and consumes content from the stack.)
TSXLDTRK is a niche TSX feature, and TSX itself is niche outside of demos of
speculative sidechannels. Leave the feature opt-in until a use case is found,
in an effort to preempt the multiple person-years of effort it has taken to
mop up TSX issues impacting every processor line.
MOVDIRI and MOVDIR64B are harder to judge. They're architectural building
blocks towards ENQCMD{,S} without obvious use cases on their own. They're of
no use to domains without PCI devices, so leave them opt-in for now.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 14 Dec 2021 16:53:36 +0000 (16:53 +0000)]
x86/cpuid: Introduce dom0-cpuid command line option
Specifically, this lets the user opt in to non-default features.
Collect all dom0 settings together in dom0_{en,dis}able_feat[], and apply them
to dom0's policy when other tweaks are being made.
As recalculate_cpuid_policy() is an expensive action, and dom0-cpuid= is
likely to only be used by the x86 maintainers for development purposes, forgo
the recalculation in the general case.
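A sketch of applying enable/disable bitmaps to a featureset. The function name and the word count are hypothetical, and a bit present in both arrays ends up disabled here, which may not match the real precedence:

```c
#include <assert.h>
#include <stdint.h>

#define NR_FEATUREWORDS 2  /* illustrative; the real policy has many more */

/*
 * Hypothetical: force the bits requested via dom0-cpuid= on or off in
 * dom0's default policy, word by word.
 */
static void apply_dom0_feat(uint32_t policy[NR_FEATUREWORDS],
                            const uint32_t enable[NR_FEATUREWORDS],
                            const uint32_t disable[NR_FEATUREWORDS])
{
    for ( unsigned int i = 0; i < NR_FEATUREWORDS; i++ )
        policy[i] = (policy[i] | enable[i]) & ~disable[i];
}
```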
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 15 Dec 2021 16:30:25 +0000 (16:30 +0000)]
x86/cpuid: Factor common parsing out of parse_xen_cpuid()
dom0-cpuid= is going to want to reuse the common parsing loop, so factor it
out into parse_cpuid().
Irritatingly, despite being static const, the features[] array gets duplicated
each time parse_cpuid() is inlined. As it is a large (and ever growing with
new CPU features) data structure, move it to file scope so all inlines
use the same single object.
No functional change.
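A hypothetical sketch of a shared parsing loop with a file-scope table. The "no-" prefix syntax, the single 32-bit word and the two table entries are simplifications assumed for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* File scope: one object shared by every inlined copy of the parser. */
static const struct feature_name {
    const char *name;
    unsigned int bit;
} features[] = {
    { "serialize", 14 },  /* example entries only */
    { "tsxldtrk",  16 },
};

/*
 * Walk a list such as "serialize,no-tsxldtrk", setting bits in *enable or
 * *disable. Returns the number of unrecognised tokens.
 */
static int parse_cpuid(const char *s, uint32_t *enable, uint32_t *disable)
{
    int unknown = 0;

    while ( *s )
    {
        const char *end = strchr(s, ',');
        size_t len = end ? (size_t)(end - s) : strlen(s);
        bool val = true, found = false;

        if ( len > 3 && strncmp(s, "no-", 3) == 0 )
        {
            val = false;
            s += 3;
            len -= 3;
        }

        for ( size_t i = 0; i < sizeof(features) / sizeof(features[0]); i++ )
        {
            if ( strlen(features[i].name) == len &&
                 strncmp(features[i].name, s, len) == 0 )
            {
                uint32_t mask = UINT32_C(1) << features[i].bit;

                *(val ? enable : disable) |= mask;
                *(val ? disable : enable) &= ~mask;
                found = true;
                break;
            }
        }
        if ( !found )
            unknown++;

        s += len;
        if ( *s == ',' )
            s++;
    }

    return unknown;
}
```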
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 15 Dec 2021 15:36:59 +0000 (15:36 +0000)]
x86/cpuid: Split dom0 handling out of init_domain_cpuid_policy()
To implement dom0-cpuid= support, the special cases would need extending.
However there is already a problem with late hwdom where the special cases
override toolstack settings, which is unintended and poor behaviour.
Introduce a new init_dom0_cpuid_policy() for the purpose, moving the ITSC and
ARCH_CAPS logic. The is_hardware_domain() check can be dropped, and for now
there is no need to rerun recalculate_cpuid_policy(); this is a relatively
expensive operation, and will become more so over time.
Rearrange the logic in create_dom0() to make room for a call to
init_dom0_cpuid_policy(). The AMX plans for having variable sized XSAVE
states require that modifications to the policy happen before vCPUs are
created.
Additionally, factor out domid into a variable so we can be slightly more
correct in the case of a failure, and also print the error from
domain_create(). This will at least help distinguish -EINVAL from -ENOMEM.
No practical change in behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Mon, 6 Dec 2021 17:02:33 +0000 (17:02 +0000)]
tools/Rules.mk: Cleanup %.pc rules
PKG_CONFIG_VARS isn't set anymore, so the logic using it is dead.
For "local" pkg-config file, we only have one headers directory now,
"tools/include", so there is no need to specify it twice. So remove
$(CFLAGS_xeninclude) from "Cflags:".
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
With "xentoolcore_internal.h" being in LIBHEADER, it was installed. But
its dependency "_xentoolcore_list.h" wasn't installed, so the header
couldn't be used anyway.
This patch also means that the "headers.chk" rule no longer checks it.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com>
Anthony PERARD [Mon, 6 Dec 2021 17:02:17 +0000 (17:02 +0000)]
tools/xcutils: rework Makefile
Use TARGETS to collect targets to build
Remove "build" target.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Clean up $(RM)] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Mon, 6 Dec 2021 17:02:15 +0000 (17:02 +0000)]
tools/misc: rework Makefile
Add the missing "xen-detect" rule. It only works without it because we
still have make's built-in rules and variables, but fix this so as not
to have to rely on them.
Rename $(TARGETS_BUILD) to $(TARGETS).
Remove the unused "build" target.
Also, there are no more "build-only" targets, so remove the extra code.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Mon, 6 Dec 2021 17:02:04 +0000 (17:02 +0000)]
tools/include/xen-foreign: avoid to rely on default .SUFFIXES
When a rule isn't a pattern rule, and thus doesn't have a %, the
value of the automatic variable stem $* depends on .SUFFIXES. The GNU make
manual explains that it is better to avoid this "bizarre" behavior,
which exists for compatibility.
Use $(basename ) instead, so that we can one day avoid make's built-in
rules and variables.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Mon, 6 Dec 2021 17:01:57 +0000 (17:01 +0000)]
tools/flask/utils: remove unused variables/targets from Makefile
There are no *.opic or *.so files in this subdir, so there is no need to
clean them.
The TEST* variables don't seem to be used anywhere, and they weren't
used by xen.git when introduced.
Neither of the CLIENTS_* variables is used.
The targets "print-dir" and "print-end" only exist in this directory
and are probably not used anywhere.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
[Drop trailing whitespace and use $(RM) consistently] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 16 Dec 2021 02:38:57 +0000 (02:38 +0000)]
xen/build: Fix `make cscope` rune
There are two problems, both in the all_sources definition.
First, everything in arch/*/include gets double hits with cscope queries,
because they end up getting listed twice in cscope.files.
Drop the first `find` rune of the three, because it's redundant with the third
rune following c/s 725381a5eab3 ("xen: move include/asm-* to
arch/*/include/asm").
Second, some EFI sources also get double hits, because they are symlinks
to common/efi. Restrict all find runes to `-type f` to skip symlinks,
because common/efi/*.c are already listed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
xen/arm: do not map PCI ECAM and MMIO space to Domain-0's p2m
PCI host bridges are special devices in terms of implementing PCI
passthrough. According to [1] the current implementation depends on
Domain-0 to perform the initialization of the relevant PCI host
bridge hardware and perform PCI device enumeration. In order to
achieve that one of the required changes is to not map all the memory
ranges in map_range_to_domain as we traverse the device tree on startup
and perform some additional checks if the range needs to be mapped to
Domain-0.
The generic PCI host controller device tree binding says [2]:
- ranges: As described in IEEE Std 1275-1994, but must provide
at least a definition of non-prefetchable memory. One
or both of prefetchable Memory and IO Space may also
be provided.
- reg : The Configuration Space base address and size, as accessed
from the parent bus. The base address corresponds to
the first bus in the "bus-range" property. If no
"bus-range" is specified, this will be bus 0 (the default).
From the above none of the memory ranges from the "ranges" property
needs to be mapped to Domain-0 at startup as MMIO mapping is going to
be handled dynamically by vPCI as we assign PCI devices, e.g. each
device assigned to Domain-0/guest will have its MMIOs mapped/unmapped
as needed by Xen.
The "reg" property covers not only ECAM space, but may also describe
memory ranges other than the configuration space, for example [3]:
- reg: Should contain rc_dbi, config registers location and length.
- reg-names: Must include the following entries:
"rc_dbi": controller configuration registers;
"config": PCIe configuration space registers.
This patch makes it possible to not map all the ranges from the
"ranges" property and also ECAM from the "reg". All the rest from the
"reg" property still needs to be mapped to Domain-0, so the PCI
host bridge remains functional in Domain-0. This is done by first
skipping the mappings while traversing the device tree as it is done for
usual devices and then by calling a dedicated pci_host_bridge_mappings
function which only maps MMIOs required by the host bridges leaving the
regions, needed for vPCI traps, unmapped.
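A hypothetical sketch of the mapping policy described above. The function and enum are invented, and matching on the reg-name "config" follows the example binding only; real host bridge drivers decide this per implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which part of a host bridge DT node a memory range comes from. */
enum range_src { RANGE_FROM_RANGES, RANGE_FROM_REG };

/*
 * Nothing from "ranges" is mapped up front (vPCI maps BARs on demand), and
 * of the "reg" entries only the ECAM/config window stays unmapped so that
 * accesses to it trap to Xen.
 */
static bool map_to_dom0(enum range_src src, const char *reg_name)
{
    if ( src == RANGE_FROM_RANGES )
        return false;                    /* MMIO handled by vPCI */
    if ( reg_name != NULL && strcmp(reg_name, "config") == 0 )
        return false;                    /* ECAM: trapped for vPCI */
    return true;                         /* e.g. "rc_dbi": map it */
}
```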
xen/arm: account IO handler for emulated PCI host bridge
At the moment, we always allocate an extra 16 slots for IO handlers
(see MAX_IO_HANDLER). So while adding an IO trap handler for the emulated
PCI host bridge we are not breaking anything, but we have a latent bug
as the maximum number of IOs may be exceeded.
Fix this by explicitly accounting for the additional IO handler.
Fixes: d59168dc05a5 ("xen/arm: Enable the existing x86 virtual PCI support for ARM") Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Acked-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Rahul Singh <rahul.singh@arm.com> Tested-by: Rahul Singh <rahul.singh@arm.com>
xen/arm: setup MMIO range trap handlers for hardware domain
In order for vPCI to work it needs to maintain guest and hardware
domain's views of the configuration space. For example, BARs and
COMMAND registers require emulation for guests and the guest view
of the registers needs to be in sync with the real contents of the
relevant registers. For that, the ECAM address space needs to be
trapped for the hardware domain as well, so we need to implement PCI host
bridge specific callbacks to properly set up MMIO handlers for those
ranges depending on the particular host bridge implementation.
If a PCI host bridge device is present in the device tree, but is
disabled, then its PCI host bridge driver was not instantiated.
This results in the failure of the pci_get_host_bridge_segment()
and the following panic during Xen start:
(XEN) Device tree generation failed (-22).
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not set up DOM0 guest OS
(XEN) ****************************************
Fix this by adding "linux,pci-domain" property for all device tree nodes
which have "pci" device type, so we know which segments will be used by
the guest for which bridges.
Fixes: 4cfab4425d39 ("xen/arm: Add linux,pci-domain property for hwdom if not available.") Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Rahul Singh <rahul.singh@arm.com> Tested-by: Rahul Singh <rahul.singh@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Wed, 15 Dec 2021 09:24:45 +0000 (10:24 +0100)]
Arm: drop memguard_{,un}guard_range() stubs
These exist for no reason: The code using them is only ever built for
Arm32. And memguard_guard_stack() has no use outside of x86-specific
code at all.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Wed, 15 Dec 2021 09:23:51 +0000 (10:23 +0100)]
x86: drop MEMORY_GUARD
The functions it guards are dead code. Worse, while intended to exist in
debug builds only, as of commit bacbf0cb7349 ("build: convert debug to
Kconfig") they also get compiled in release builds.
The remaining uses in show_stack_overflow() aren't really related to any
memory guarding anymore - with CET-SS support the stacks now get set up
the same in debug and release builds. Drop them as well; there's no harm
providing the information there in all cases.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 15 Dec 2021 09:20:35 +0000 (10:20 +0100)]
x86/PVH: permit more physdevop-s to be used by Dom0
Certain notifications of Dom0 to Xen are independent of the mode Dom0 is
running in. Permit further PCI related ones (only their modern forms).
Also include the USB2 debug port operation on this occasion. While
largely relevant for the latter, drop the has_vpci() part of the
conditional as redundant with is_hardware_domain(): There's no PVH Dom0
without vPCI.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Wed, 15 Dec 2021 09:19:54 +0000 (10:19 +0100)]
x86/PVH: improve Dom0 memory size calculation
Assuming that the accounting for IOMMU page tables will also take care
of the P2M needs was wrong: dom0_paging_pages() can determine a far
higher value, high enough for the system to run out of memory while
setting up Dom0. Hence in the case of shared page tables the larger of
the two values needs to be used (without shared page tables the sum of
both continues to be applicable).
To not further complicate the logic, eliminate the up-to-2-iteration
loop in favor of doing a few calculations twice (before and after
calling dom0_paging_pages()). While this will lead to slightly too high
a value in "cpu_pages", it is deemed better to account a few too many
than a few too few.
As a result the calculation is now deemed good enough to no longer
warrant the warning message, which therefore gets dropped.
Also uniformly use paging_mode_enabled(), not is_hvm_domain().
While there, also account for two further aspects in the PV case: With
"iommu=dom0-passthrough" no IOMMU page tables would get allocated, so
none need accounting for. And if shadow mode is to be enabled (including
only potentially, because of "pv-l1tf=dom0"), setting aside a suitable
amount for the P2M pool to get populated is also necessary (i.e. similar
to the non-shared-page-tables case of PVH).
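The max-versus-sum rule as a hypothetical helper; the name and parameters are invented, and the real calculation has further terms:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical: with shared page tables the IOMMU and the P2M use the same
 * tables, so the larger estimate suffices; without sharing both sets exist
 * and the estimates add up. With "iommu=dom0-passthrough", iommu_pages
 * would simply be 0.
 */
static unsigned long dom0_pt_overhead(unsigned long iommu_pages,
                                      unsigned long paging_pages,
                                      bool shared_pt)
{
    if ( shared_pt )
        return iommu_pages > paging_pages ? iommu_pages : paging_pages;
    return iommu_pages + paging_pages;
}
```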
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Anthony PERARD [Wed, 15 Dec 2021 09:17:34 +0000 (10:17 +0100)]
build: adjust $(TARGET).efi creation in arch/arm
There is no need to try to guess a relative path to the "xen.efi" file,
we can simply use $@. Also, there's no need to use `notdir`, as make
already does that work via $(@F).
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Anthony PERARD [Wed, 15 Dec 2021 09:16:51 +0000 (10:16 +0100)]
build: generate "include/xen/compile.h" with if_changed
This will avoid regenerating "compile.h" if the content hasn't changed.
As is currently the case, the file isn't regenerated during `sudo
make install` if it exists and belongs to a different user, thus we
can remove the target "delete-unfresh-files". Target "$(TARGET)" still
needs a phony dependency, so add "FORCE".
Use "$(dot-target).tmp" as the temporary file, as this is already covered
by the ".*.tmp" pattern in ".gitignore".
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Wed, 15 Dec 2021 09:14:13 +0000 (10:14 +0100)]
xen: move include/asm-* to arch/*/include/asm
This avoids the need to create the symbolic link "include/asm".
Whenever a comment refers to "asm" headers, this patch avoids
spelling out the arch when not needed, to avoid some code churn.
One unrelated change is to sort entries in MAINTAINERS for "INTEL(R)
VT FOR X86 (VT-X)"
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Paul Durrant <paul@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 15 Dec 2021 09:08:38 +0000 (10:08 +0100)]
build: factorise generation of the linker scripts
In the Arm and x86 makefiles, generating the linker script is the same, so
we can simply have both call the same macro.
We need to add *.lds files into extra-y so that Rules.mk can find the
.*.cmd dependency file and load it.
Change made to the command line:
- Use cpp_flags macro, which simply filters -Wa,% options from $(a_flags).
- Added -D__LINKER__ even though it is only used by Arm's lds.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Andrew Cooper [Mon, 13 Dec 2021 20:33:42 +0000 (20:33 +0000)]
x86/cpuid: Fix TSXLDTRK definition
TSXLDTRK lives in CPUID leaf 7[0].edx, not 7[0].ecx.
Bit 16 in ecx is LA57.
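Both bits are bit 16 of leaf 7 subleaf 0, only in different registers, which is how the mix-up could go unnoticed. A small illustration; the struct and helpers are hypothetical, while the bit positions are the architectural ones:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Register values returned by CPUID with EAX=7, ECX=0. */
struct cpuid_leaf { uint32_t a, b, c, d; };

#define CPUID7_0_ECX_LA57     (1u << 16)  /* 57-bit linear addresses */
#define CPUID7_0_EDX_TSXLDTRK (1u << 16)  /* XSUSLDTRK/XRESLDTRK */

static bool has_tsxldtrk(const struct cpuid_leaf *l7)
{
    return l7->d & CPUID7_0_EDX_TSXLDTRK;  /* EDX, not ECX */
}

static bool has_la57(const struct cpuid_leaf *l7)
{
    return l7->c & CPUID7_0_ECX_LA57;
}
```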
Fixes: a6d1b558471f ("x86emul: support X{SUS,RES}LDTRK") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Tue, 14 Dec 2021 08:50:07 +0000 (09:50 +0100)]
perfc: drop calls_to_multicall performance counter
The calls_to_multicall performance counter is basically redundant to
the multicall hypercall counter. The only difference is the counting
of continuation calls, which isn't really that interesting.
Drop the calls_to_multicall performance counter.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Tue, 14 Dec 2021 08:49:23 +0000 (09:49 +0100)]
x86/perfc: add hypercall performance counters for hvm, correct pv
The HVM hypercall handler is missing incrementing the per hypercall
counters. Add that.
The counters for PV are handled incorrectly, as they are not using
perfc_incra() with the number of the hypercall as index, but are
incrementing the first hypercall entry (set_trap_table) for each
hypercall. Fix that.
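The difference between the two patterns, sketched with a plain array in the spirit of perfc_incra(). The array, its size and both functions are hypothetical:

```c
#include <assert.h>

#define NR_HYPERCALLS 64  /* illustrative size */

/* One counter per hypercall number. */
static unsigned long hypercall_count[NR_HYPERCALLS];

/* Buggy pattern: every hypercall bumps slot 0 (set_trap_table). */
static void count_hypercall_buggy(unsigned int nr)
{
    (void)nr;
    hypercall_count[0]++;
}

/* Fixed pattern: index the counter array by the hypercall number. */
static void count_hypercall(unsigned int nr)
{
    if ( nr < NR_HYPERCALLS )
        hypercall_count[nr]++;
}
```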
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 14 Dec 2021 08:48:17 +0000 (09:48 +0100)]
x86emul: drop "seg" parameter from insn_fetch() hook
This is specified (and asserted for in a number of places) to always be
CS. Passing this as an argument in various places is therefore
pointless. The price to pay is two simple new functions, with the
benefit of the PTWR case now gaining a more appropriate error code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Tue, 14 Dec 2021 08:47:31 +0000 (09:47 +0100)]
SUPPORT.md: limit security support for hosts with very much memory
Sufficient and in particular regular testing on very large hosts cannot
currently be guaranteed. Anyone wanting us to support larger hosts is
free to propose so, but will need to supply not only test results, but
also a test plan.
This is a follow-up to XSA-385.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
vpci: fix function attributes for vpci_process_pending
vpci_process_pending is defined with different attributes, e.g.
with __must_check if CONFIG_HAS_VPCI is enabled and without otherwise.
Fix this by defining both variants with __must_check.
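A reduced sketch of keeping both variants' attributes in sync. __must_check is defined locally as a GCC/Clang stand-in, and the stub's return value of false is an assumption of this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for Xen's __must_check (GCC/Clang attribute). */
#define __must_check __attribute__((__warn_unused_result__))

struct vcpu;   /* opaque in this sketch */

#ifdef CONFIG_HAS_VPCI
/* Real implementation lives elsewhere; same attribute as the stub. */
bool __must_check vpci_process_pending(struct vcpu *v);
#else
/* Stub for builds without vPCI: nothing is ever pending. */
static inline bool __must_check vpci_process_pending(struct vcpu *v)
{
    (void)v;
    return false;
}
#endif
```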
Fixes: 14583a590783 ("7fbb096bf345 kconfig: don't select VPCI if building a shim-only binary") Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 13 Dec 2021 17:50:48 +0000 (17:50 +0000)]
tools/libfsimage: Fix SONAME
This gets missed on each release. Follow the same example as libs.mk and pick
the version up dynamically.
Fixes: a5706b80f42e ("Set version to 4.17: rerun autogen.sh") Suggested-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Fri, 10 Dec 2021 13:03:56 +0000 (14:03 +0100)]
x86/HVM: permit CLFLUSH{,OPT} on execute-only code segments
Both SDM and PM explicitly permit this.
Fixes: 52dba7bd0b36 ("x86emul: generalize wbinvd() hook") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 10 Dec 2021 13:02:59 +0000 (14:02 +0100)]
EFI: constify EFI_LOADED_IMAGE * function parameters
Instead of altering Arm's forward declarations, drop them. As
elsewhere, we should limit such declarations to cases where the first
use lives ahead of the definition.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Fri, 10 Dec 2021 09:27:27 +0000 (10:27 +0100)]
MAINTAINERS: widen Anthony's area
As was briefly discussed on the December Community Call, I'd like to
propose to widen Anthony's maintainership to all of tools/. This then
means that the special LIBXENLIGHT entry can go away.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <iwj@xenproject.org> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Fri, 10 Dec 2021 09:26:52 +0000 (10:26 +0100)]
x86: avoid wrong use of all-but-self IPI shorthand
With "nosmp" I did observe a flood of "APIC error on CPU0: 04(04), Send
accept error" log messages on an AMD system. And rightly so - nothing
excludes the use of the shorthand in send_IPI_mask() in this case. Set
"unaccounted_cpus" to "true" also when command line restrictions are the
cause.
Note that PV-shim mode is unaffected by this change, first and foremost
because "nosmp" and "maxcpus=" are ignored in this case.
Fixes: 5500d265a2a8 ("x86/smp: use APIC ALLBUT destination shorthand when possible") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
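The all-but-self shorthand delivers to every local APIC in the system, not only to the CPUs Xen brought online. The following is a simplified model of the decision under that assumption. Only the `unaccounted_cpus` concept comes from the commit; `struct ipi_state`, `can_use_allbut_shorthand()` and `note_cmdline_limits()` are hypothetical, and a 64-bit mask stands in for a real cpumask.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit n set means CPU n. */
struct ipi_state {
    uint64_t online;          /* CPUs brought up by the hypervisor */
    bool unaccounted_cpus;    /* CPUs present but not brought up */
};

/* The shorthand is only equivalent to the mask if every CPU the
 * hardware will deliver to is accounted for. */
static bool can_use_allbut_shorthand(const struct ipi_state *s,
                                     uint64_t mask, unsigned int self)
{
    uint64_t all_but_self = s->online & ~(UINT64_C(1) << self);

    if (s->unaccounted_cpus)
        return false;
    return mask == all_but_self;
}

/* Sketch of the fix: command-line limits ("nosmp", "maxcpus=") also
 * leave CPUs present but unaccounted for. */
static void note_cmdline_limits(struct ipi_state *s, unsigned int present,
                                unsigned int max_cpus)
{
    if (max_cpus < present)
        s->unaccounted_cpus = true;
}
```

With four CPUs present and only one in use, the shorthand is refused even though the (empty) mask formally equals "all online but self".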
hvmloader's last subdir was removed in 73b72736e6 ("acpi: Move
ACPI code to tools/libacpi"), so there is no need to use the "subdirs-*"
target anymore.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Mon, 6 Dec 2021 17:01:54 +0000 (17:01 +0000)]
libs/store: Remove PKG_CONFIG_REMOVE
PKG_CONFIG_REMOVE doesn't do anything anymore. Commit dd33fd2e81
(tools: split libxenstore into new tools/libs/store directory) had
reintroduced it without saying why.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Fri, 3 Dec 2021 07:30:58 +0000 (08:30 +0100)]
tools/libs/light: set video_mem for PVH guests
The size of the video memory of PVH guests should be set to 0 in case
no value has been specified.
Not doing so leaves it at -1, resulting in an additional 1 kB
of RAM being advertised in the memory map (here the output of a PVH
Mini-OS boot with 16 MB of RAM assigned):
Juergen Gross [Thu, 9 Dec 2021 13:40:54 +0000 (14:40 +0100)]
tools/libs/ctrl: Save errno only once in *PRINTF() and *ERROR()
All *PRINTF() and *ERROR() macros are based on xc_reportv(), which
saves and restores errno so as not to modify it. There is no need
to save and restore it in those macros as well.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
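The pattern the commit relies on, sketched under the assumption that the shared reporting function is the only place that needs to preserve errno. `report_v()`, `report()` and `MY_ERROR()` are hypothetical stand-ins for xc_reportv() and the *ERROR() macros, not libxenctrl's actual code.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for xc_reportv(): the single place that
 * preserves errno across the (possibly errno-clobbering) output call. */
static void report_v(char *buf, size_t len, const char *fmt, va_list args)
{
    int saved_errno = errno;          /* save once, here */

    vsnprintf(buf, len, fmt, args);   /* may modify errno */
    errno = saved_errno;              /* restore before returning */
}

static void report(char *buf, size_t len, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    report_v(buf, len, fmt, args);
    va_end(args);
}

/* Hypothetical wrapper macro: no errno save/restore of its own,
 * because report_v() already guarantees errno is left unchanged. */
#define MY_ERROR(buf, len, ...) report(buf, len, __VA_ARGS__)
```

Keeping the save/restore in one function means every macro layered on top gets the guarantee for free.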
Juergen Gross [Wed, 8 Dec 2021 08:47:45 +0000 (09:47 +0100)]
tools: set event channel HVM parameters in libxenguest
The HVM parameters for pre-allocated event channels should be set in
libxenguest, as is already done for PV guests and for the ring pages
that libxenguest allocates.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Andrew Cooper [Mon, 6 Dec 2021 13:07:08 +0000 (13:07 +0000)]
x86/build: Move exception tables into __ro_after_init
Since c/s 79713ed0a94c ("x86: move both exception tables into .rodata") in
2016, we've been (ab)using the fact that .rodata is read/write during early
boot, so we can sort the two tables.
Now that we have a real __ro_after_init concept, reposition them to better
match reality.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/arm: process pending vPCI map/unmap operations
vPCI may map and unmap PCI device memory (BARs) being passed through, which
may take a lot of time. For this reason, those operations may be deferred and
performed later, so that they can be safely preempted.
Currently this deferred processing happens in common IOREQ code,
which doesn't seem to be the right place for x86 and is even more
doubtful because IOREQ may not be enabled for Arm at all.
So, for Arm, the pending vPCI work may never get a chance to be executed
if the processing is left only in the common IOREQ code.
For that reason, make vPCI processing happen in arch-specific code.
Please be aware that there are a few outstanding TODOs affecting this
code path, see xen/drivers/vpci/header.c:map_range and
xen/drivers/vpci/header.c:vpci_process_pending.
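A rough sketch of preemptible deferred work driven from an arch-specific hook, assuming the work is a range of frames processed in fixed-size batches. `struct vcpu_vpci`, `BATCH`, `vpci_process_pending_sketch()` and `arch_check_pending()` are all hypothetical and only illustrate the shape, not Xen's vpci_process_pending().

```c
#include <stdbool.h>

/* Hypothetical per-vCPU record of pending BAR map/unmap work. */
struct vcpu_vpci {
    unsigned long next;   /* next page frame to map/unmap */
    unsigned long end;    /* one past the last frame */
    unsigned long done;   /* frames processed so far (for illustration) */
};

#define BATCH 4   /* hypothetical preemption granularity */

/* Process at most BATCH frames; return true if work remains. */
static bool vpci_process_pending_sketch(struct vcpu_vpci *v)
{
    unsigned int n = 0;

    while (v->next < v->end && n < BATCH) {
        v->next++;        /* stand-in for mapping/unmapping one frame */
        v->done++;
        n++;
    }
    return v->next < v->end;
}

/* Hypothetical arch-specific hook run before returning to the guest.
 * It does not depend on IOREQ being enabled; returning true means
 * "call again later" (e.g. after a preemption check). */
static bool arch_check_pending(struct vcpu_vpci *v)
{
    return vpci_process_pending_sketch(v);
}
```

For a 10-frame range, the hook returns true twice (after 4 and 8 frames) and false on the third call.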
Jan Beulich [Mon, 6 Dec 2021 13:16:37 +0000 (14:16 +0100)]
EFI: drop copy-in from QueryVariableInfo()'s OUT-only variable bouncing
While be12fcca8b78 ("efi: fix alignment of function parameters in compat
mode") intentionally bounced them both ways to avoid any functional
change so close to the release of 4.16, the bouncing-in shouldn't really
be needed. In exchange the local variables need to gain initializers to
avoid copying back prior stack contents.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
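A minimal sketch of copy-out-only bouncing, assuming the caller's buffers may be misaligned and that results are always copied back. `query_fn`, `query_bounced()`, `fw_fail()` and `fw_ok()` are hypothetical; the real QueryVariableInfo() wrapper lives in Xen's EFI runtime code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical firmware call shape: three OUT-only parameters. */
typedef int (*query_fn)(uint64_t *max_store, uint64_t *remaining,
                        uint64_t *max_var);

/* Bounce OUT-only parameters through naturally aligned locals.
 * No copy-in is done; the initializers ensure that what is copied
 * back is never stale stack contents, even if firmware writes nothing. */
static int query_bounced(query_fn fn, void *out_max_store,
                         void *out_remaining, void *out_max_var)
{
    uint64_t max_store = 0, remaining = 0, max_var = 0;
    int rc = fn(&max_store, &remaining, &max_var);

    /* Caller buffers may be misaligned, so copy bytewise. */
    memcpy(out_max_store, &max_store, sizeof(max_store));
    memcpy(out_remaining, &remaining, sizeof(remaining));
    memcpy(out_max_var, &max_var, sizeof(max_var));
    return rc;
}

/* Hypothetical firmware stubs for illustration. */
static int fw_fail(uint64_t *a, uint64_t *b, uint64_t *c)
{
    (void)a; (void)b; (void)c;
    return -1;                        /* error: OUT parameters untouched */
}

static int fw_ok(uint64_t *a, uint64_t *b, uint64_t *c)
{
    *a = 1000; *b = 600; *c = 64;
    return 0;
}
```

Without the initializers, the `fw_fail()` case would copy whatever happened to be on the stack back to the caller.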
Jan Beulich [Mon, 6 Dec 2021 13:15:54 +0000 (14:15 +0100)]
EFI: move efi-boot.h inclusion point
When it was introduced, it was imo placed way too high up, making it
necessary to forward-declare way too many static functions. Move it down
together with
- the efi_check_dt_boot() stub, which afaict was deliberately placed
immediately ahead of the #include,
- blexit(), because of its use of the efi_arch_blexit() hook.
Move up get_value() and set_color() to before the inclusion so their
forward declarations can also be zapped.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Jan Beulich [Mon, 6 Dec 2021 13:15:05 +0000 (14:15 +0100)]
x86/HVM: fail virt-to-linear conversion for insn fetches from non-code segments
Just like (in protected mode) reads may not go to exec-only segments and
writes may not go to non-writable ones, insn fetches may not access data
segments.
Fixes: 623e83716791 ("hvm: Support hardware task switching") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
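The three protected-mode rules in the message can be expressed as one table-like check. This sketch assumes a simplified descriptor with only code/readable/writable bits; `struct seg`, `enum access` and `seg_access_ok()` are hypothetical and ignore limits, privilege levels and expand-down segments.

```c
#include <stdbool.h>

enum access { ACC_READ, ACC_WRITE, ACC_FETCH };

/* Hypothetical, simplified view of a protected-mode segment descriptor. */
struct seg {
    bool code;       /* code segment (vs. data segment) */
    bool readable;   /* code segments only; data is always readable */
    bool writable;   /* data segments only; code is never writable */
};

/* Returns true if the access type is permitted for this segment. */
static bool seg_access_ok(const struct seg *s, enum access acc)
{
    switch (acc) {
    case ACC_READ:
        /* Reads may not go to execute-only code segments. */
        return !s->code || s->readable;
    case ACC_WRITE:
        /* Writes may only go to writable data segments. */
        return !s->code && s->writable;
    case ACC_FETCH:
        /* Instruction fetches may not access data segments. */
        return s->code;
    }
    return false;
}
```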
paging_mfn_is_dirty() is moderately expensive, so avoid its use unless
its result might actually change anything. This means moving the
surrounding if() down below all other checks that can result in clearing
_PAGE_RW from sflags, in order to then check whether _PAGE_RW is
actually still set there before calling the function.
While moving the block of code, fold two if()s and make a few style
adjustments.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
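The reordering amounts to running the cheap checks that may clear the RW bit first and short-circuiting the expensive one. In this sketch `PAGE_RW`, `mfn_is_dirty()` (a stand-in for paging_mfn_is_dirty()), `read_only_reason` and the call counter are hypothetical; the dirty rule is arbitrary.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_RW (1u << 1)          /* hypothetical stand-in for _PAGE_RW */

static unsigned int dirty_calls;   /* counts calls to the expensive check */

/* Hypothetical stand-in for paging_mfn_is_dirty(). */
static bool mfn_is_dirty(unsigned long mfn)
{
    dirty_calls++;
    return mfn & 1;                /* arbitrary rule for illustration */
}

static uint32_t compute_sflags(uint32_t sflags, unsigned long mfn,
                               bool read_only_reason, bool log_dirty)
{
    /* Cheap checks that may clear RW run first. */
    if (read_only_reason)
        sflags &= ~PAGE_RW;

    /* Only consult the expensive dirty check if RW is still set;
     * && short-circuits otherwise. */
    if (log_dirty && (sflags & PAGE_RW) && !mfn_is_dirty(mfn))
        sflags &= ~PAGE_RW;

    return sflags;
}
```

When an earlier check has already cleared RW, the dirty lookup could not change the result, so it is skipped.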
Andrew Cooper [Tue, 30 Nov 2021 21:28:48 +0000 (21:28 +0000)]
x86/vPMU: Drop supported parameter from the wrmsr path
The supported parameter was added in 2d9b91f1aeaa ("VMX/vPMU: fix DebugCtl MSR
handling"). It unfortunately laid the groundwork for XSA-269, and the fix 2a8a8e99feb9 ("x86/vtx: Fix the checking for unknown/invalid MSR_DEBUGCTL
bits") totally rewrote MSR_DEBUGCTL handling.
Strip out the parameter again.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 1 Dec 2021 10:35:20 +0000 (10:35 +0000)]
xsm: Drop extern of non-existent variable
dummy_xsm_ops was dropped as part of organising XSM to be altcall compatible,
but the extern was accidentally left around.
A later change reintroduced dummy_ops, which is logically the same thing but
is private to xsm/dummy.c.
Fixes: 164a0b9653f4 ("xsm: refactor xsm_ops handling") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Andrew Cooper [Wed, 1 Dec 2021 10:34:00 +0000 (10:34 +0000)]
xsm: Switch xsm_ops to __alt_call_maybe_initdata
This should have been done at the point xsm_ops became fully altcall'd. This
puts the xsm_ops structure in .init on architectures where it is no longer
referenced at runtime.
Fixes: d868feb95a8a ("xen/xsm: Complete altcall conversion of xsm interface") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
xen/arm: do not use void pointer in pci_host_common_probe
There is no reason to use a void pointer when passing ECAM ops to the
pci_host_common_probe function, as it is cast to struct pci_ecam_ops
inside anyway. For that reason, remove the void pointer and pass the
struct pci_ecam_ops pointer as is.
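The benefit of the typed parameter, sketched with a cut-down structure. `struct pci_ecam_ops` here has only a hypothetical `bus_shift` member, and `probe()` and `demo_ops` are illustrative; the real pci_host_common_probe() takes additional arguments.

```c
#include <stddef.h>

/* Hypothetical, cut-down ECAM ops structure for illustration. */
struct pci_ecam_ops {
    unsigned int bus_shift;
};

/* Before (sketch): int probe(const void *data), with a cast inside.
 * After (sketch): the type is explicit, so the compiler checks every
 * caller and no cast is needed. */
static int probe(const struct pci_ecam_ops *ops)
{
    if (ops == NULL)
        return -1;
    return (int)ops->bus_shift;
}

static const struct pci_ecam_ops demo_ops = { .bus_shift = 20 };
```

Passing a pointer to an unrelated type now produces a compiler diagnostic instead of silently converting through `void *`.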
Jan Beulich [Fri, 3 Dec 2021 12:54:28 +0000 (13:54 +0100)]
gnttab: remove guest_physmap_remove_page() call from gnttab_map_frame()
Without holding appropriate locks, attempting to remove a prior mapping
of the underlying page is pointless, as the same (or another) mapping
could be re-established by a parallel request on another vCPU. Move the
code to Arm's gnttab_set_frame_gfn(); it cannot be dropped there since
xenmem_add_to_physmap_one() doesn't call it either (unlike on x86). Of
course this new placement doesn't improve things in any way as far as
the security of grant status frame mappings goes (see XSA-379). Proper
locking would be needed here to allow status frames to be mapped
securely.
This in turn requires replacing the other use in
gnttab_unpopulate_status_frames(), which in turn requires adjusting
x86's gnttab_set_frame_gfn(). Note that with proper locking inside
guest_physmap_remove_page() combined with checking the GFN's mapping
there against the passed in MFN, there then is no issue with the
involved multiple gnttab_set_frame_gfn()-s potentially returning varying
values (due to a racing XENMAPSPACE_grant_table request).
This, as a side effect, does away with gnttab_map_frame() having a local
variable "gfn" which shadows a function parameter of the same name.
Together with XSA-379 this points out that XSA-255's addition to
gnttab_map_frame() was really useless.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
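The "proper locking plus MFN check" idea can be sketched as a compare-then-remove done under one lock, so a racing re-mapping of the GFN is left alone. Everything here is hypothetical: a toy GFN-to-MFN array and a C11 `atomic_flag` spinlock stand in for the p2m and its lock, and `physmap_remove_checked()` is not guest_physmap_remove_page().

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_GFNS     16
#define INVALID_MFN (~0UL)

/* Hypothetical toy p2m: gfn -> mfn table guarded by one spinlock. */
static unsigned long p2m[NR_GFNS];
static atomic_flag p2m_lock_flag = ATOMIC_FLAG_INIT;

static void p2m_lock(void)   { while (atomic_flag_test_and_set(&p2m_lock_flag)) ; }
static void p2m_unlock(void) { atomic_flag_clear(&p2m_lock_flag); }

static void p2m_init(void)
{
    for (unsigned int i = 0; i < NR_GFNS; i++)
        p2m[i] = INVALID_MFN;
}

/* Remove gfn's mapping only if it still points at the expected mfn.
 * Check and removal happen under the same lock, so a stale or racing
 * GFN value cannot tear down someone else's mapping. */
static bool physmap_remove_checked(unsigned long gfn, unsigned long mfn)
{
    bool removed = false;

    if (gfn >= NR_GFNS)
        return false;
    p2m_lock();
    if (p2m[gfn] == mfn) {
        p2m[gfn] = INVALID_MFN;
        removed = true;
    }
    p2m_unlock();
    return removed;
}
```

Because the removal is conditional on the MFN, callers passing a GFN that has since been re-mapped simply get `false` back.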
Michal Orzel [Fri, 3 Dec 2021 09:58:37 +0000 (10:58 +0100)]
arm/vgic: Fix reference to a non-existing function
Commit 68dcdf942326ad90ca527831afbee9cd4a867f84 (xen/arm:
s/gic_set_guest_irq/gic_raise_guest_irq) forgot to update a comment
about the lr_pending list, which refers to a function that has been renamed.