Jan Beulich [Tue, 14 Dec 2021 08:47:31 +0000 (09:47 +0100)]
SUPPORT.md: limit security support for hosts with very much memory
Sufficient and in particular regular testing on very large hosts cannot
currently be guaranteed. Anyone wanting us to support larger hosts is
free to propose so, but will need to supply not only test results, but
also a test plan.
This is a follow-up to XSA-385.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
vpci: fix function attributes for vpci_process_pending
vpci_process_pending is declared with different attributes depending on
configuration, e.g. with __must_check if CONFIG_HAS_VPCI is enabled and
without it otherwise. Fix this by declaring both variants with __must_check.
Fixes: 14583a590783 ("7fbb096bf345 kconfig: don't select VPCI if building a shim-only binary") Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 13 Dec 2021 17:50:48 +0000 (17:50 +0000)]
tools/libfsimage: Fix SONAME
This gets missed on each release. Follow the same example as libs.mk and pick
the version up dynamically.
Fixes: a5706b80f42e ("Set version to 4.17: rerun autogen.sh") Suggested-by: Anthony PERARD <anthony.perard@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Fri, 10 Dec 2021 13:03:56 +0000 (14:03 +0100)]
x86/HVM: permit CLFLUSH{,OPT} on execute-only code segments
Both the Intel SDM and the AMD PM explicitly permit this.
Fixes: 52dba7bd0b36 ("x86emul: generalize wbinvd() hook") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 10 Dec 2021 13:02:59 +0000 (14:02 +0100)]
EFI: constify EFI_LOADED_IMAGE * function parameters
Instead of altering Arm's forward declarations, drop them. As elsewhere,
such declarations should be limited to cases where the first use comes
ahead of the definition.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Fri, 10 Dec 2021 09:27:27 +0000 (10:27 +0100)]
MAINTAINERS: widen Anthony's area
As was briefly discussed on the December Community Call, I'd like to
propose to widen Anthony's maintainership to all of tools/. This then
means that the special LIBXENLIGHT entry can go away.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <iwj@xenproject.org> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Fri, 10 Dec 2021 09:26:52 +0000 (10:26 +0100)]
x86: avoid wrong use of all-but-self IPI shorthand
With "nosmp" I did observe a flood of "APIC error on CPU0: 04(04), Send
accept error" log messages on an AMD system. And rightly so - nothing
excludes the use of the shorthand in send_IPI_mask() in this case. Set
"unaccounted_cpus" to "true" also when command line restrictions are the
cause.
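The decision can be modelled as a small predicate. This is a minimal sketch, not Xen's send_IPI_mask(): CPU masks are simplified to 64-bit bitmaps and the function name is hypothetical; only the `unaccounted_cpus` flag comes from the text above. The shorthand is only considered when no present CPU is outside Xen's accounting (including CPUs left out because of "nosmp" or "maxcpus=") and the requested mask is exactly "all online CPUs but self".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model: bit N of a mask stands for CPU N.
 * unaccounted_cpus is set whenever some present CPUs were not brought up,
 * including when command line options ("nosmp", "maxcpus=") are the cause.
 */
static bool unaccounted_cpus;

static bool can_use_allbut_shorthand(uint64_t target, uint64_t online,
                                     unsigned int self)
{
    uint64_t allbut = online & ~(UINT64_C(1) << self);

    /* The hardware's idea of "all but self" no longer matches Xen's. */
    if ( unaccounted_cpus )
        return false;

    /* Otherwise the shorthand is only equivalent for an exact match. */
    return target == allbut;
}
```

Keeping the flag separate from the mask comparison means a single boolean covers every reason for CPUs being unaccounted, which is what setting it for command line restrictions as well achieves.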
Note that PV-shim mode is unaffected by this change, first and foremost
because "nosmp" and "maxcpus=" are ignored in this case.
Fixes: 5500d265a2a8 ("x86/smp: use APIC ALLBUT destination shorthand when possible") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
hvmloader's last subdir was removed in 73b72736e6 ("acpi: Move
ACPI code to tools/libacpi"), so there is no need to use the "subdirs-*"
targets anymore.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Mon, 6 Dec 2021 17:01:54 +0000 (17:01 +0000)]
libs/store: Remove PKG_CONFIG_REMOVE
PKG_CONFIG_REMOVE doesn't do anything anymore. Commit dd33fd2e81
(tools: split libxenstore into new tools/libs/store directory) had
reintroduced it without saying why.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Fri, 3 Dec 2021 07:30:58 +0000 (08:30 +0100)]
tools/libs/light: set video_mem for PVH guests
The size of the video memory of PVH guests should be set to 0 in case
no value has been specified.
Not doing so leaves it at -1, resulting in an additional 1 kB
of RAM being advertised in the memory map (here the output of a PVH
Mini-OS boot with 16 MB of RAM assigned):
Juergen Gross [Thu, 9 Dec 2021 13:40:54 +0000 (14:40 +0100)]
tools/libs/ctrl: Save errno only once in *PRINTF() and *ERROR()
All *PRINTF() and *ERROR() macros are based on xc_reportv(), which
saves and restores errno so as not to modify it. There is no need to
save and restore it in those macros as well.
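The pattern is to preserve errno once, in the common backend, so the wrapper macros can stay thin. A minimal sketch: `reportv()`, `report()`, `MY_ERROR()` and `MY_PRINTF()` are hypothetical stand-ins for xc_reportv() and the libxenctrl macros, not their actual definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical backend: the single place where errno is preserved. */
static void reportv(const char *prefix, const char *fmt, va_list args)
{
    int saved_errno = errno;

    /* stdio calls may clobber errno (e.g. on a write error). */
    fprintf(stderr, "%s", prefix);
    vfprintf(stderr, fmt, args);
    fputc('\n', stderr);

    errno = saved_errno;
}

static void report(const char *prefix, const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    reportv(prefix, fmt, args);
    va_end(args);
}

/* The macros no longer need their own errno save/restore. */
#define MY_ERROR(...)  report("ERROR: ", __VA_ARGS__)
#define MY_PRINTF(...) report("", __VA_ARGS__)
```

Callers can then report a failure and still inspect the errno value of the operation that failed.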
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Wed, 8 Dec 2021 08:47:45 +0000 (09:47 +0100)]
tools: set event channel HVM parameters in libxenguest
The HVM parameters for pre-allocated event channels should be set in
libxenguest, as is already done for PV guests and for the ring pages
that libxenguest allocates.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Andrew Cooper [Mon, 6 Dec 2021 13:07:08 +0000 (13:07 +0000)]
x86/build: Move exception tables into __ro_after_init
Since c/s 79713ed0a94c ("x86: move both exception tables into .rodata") in
2016, we've been (ab)using the fact that .rodata is read/write during early
boot, so we can sort the two tables.
Now that we have a real __ro_after_init concept, reposition them to better
match reality.
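The tables only need to be writable for a one-time sort during boot; afterwards they are only searched, which is the access pattern __ro_after_init describes. A minimal sketch of that sort-once / search-many pattern, assuming a simplified, hypothetical entry layout with absolute addresses (Xen's real entries may be encoded differently):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical exception table entry: faulting address -> fixup address. */
struct extable_entry {
    uintptr_t addr;
    uintptr_t fixup;
};

static int cmp_entry(const void *a, const void *b)
{
    const struct extable_entry *x = a, *y = b;

    return (x->addr > y->addr) - (x->addr < y->addr);
}

/* Runs once during boot, while the table is still writable. */
static void sort_extable(struct extable_entry *tbl, size_t n)
{
    qsort(tbl, n, sizeof(*tbl), cmp_entry);
}

/* Runtime lookup only reads the sorted table (binary search). */
static uintptr_t search_extable(const struct extable_entry *tbl, size_t n,
                                uintptr_t addr)
{
    size_t lo = 0, hi = n;

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( tbl[mid].addr == addr )
            return tbl[mid].fixup;
        if ( tbl[mid].addr < addr )
            lo = mid + 1;
        else
            hi = mid;
    }

    return 0; /* no fixup registered */
}
```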
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/arm: process pending vPCI map/unmap operations
vPCI may map and unmap the memory (BARs) of PCI devices being passed
through, which may take a lot of time. For this reason those operations
may be deferred and performed later, so that they can be safely preempted.
Currently this deferred processing happens in common IOREQ code, which
doesn't seem to be the right place even for x86, and is even more
doubtful for Arm, where IOREQ may not be enabled at all.
So, on Arm the pending vPCI work may never get a chance to be executed
if the processing is left in the common IOREQ code only.
For that reason make vPCI processing happen in arch specific code.
Please be aware that there are a few outstanding TODOs affecting this
code path, see xen/drivers/vpci/header.c:map_range and
xen/drivers/vpci/header.c:vpci_process_pending.
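Deferred, preemptible work of this kind usually keeps a record of its progress and handles a bounded chunk per call, telling the caller whether anything remains. A minimal sketch under that assumption: `struct pending_map`, `CHUNK` and `process_pending()` are hypothetical and only loosely mirror what vpci_process_pending() does; on Arm the caller would be arch-specific code on the vCPU's path back to the guest rather than the IOREQ completion path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU record of a deferred BAR map/unmap operation. */
struct pending_map {
    uint64_t next_gfn;   /* first frame not yet processed */
    uint64_t end_gfn;    /* one past the last frame */
    bool map;            /* true: map, false: unmap */
};

#define CHUNK 16 /* frames handled per call before yielding */

static uint64_t frames_done; /* stand-in for the real p2m updates */

/*
 * Process at most CHUNK frames.  Returns true if work remains, so the
 * caller can yield (e.g. to run softirqs) and call again later.
 */
static bool process_pending(struct pending_map *p)
{
    unsigned int n = 0;

    while ( p->next_gfn < p->end_gfn && n < CHUNK )
    {
        /* A real implementation would (un)map p->next_gfn here. */
        frames_done++;
        p->next_gfn++;
        n++;
    }

    return p->next_gfn < p->end_gfn;
}
```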
Jan Beulich [Mon, 6 Dec 2021 13:16:37 +0000 (14:16 +0100)]
EFI: drop copy-in from QueryVariableInfo()'s OUT-only variable bouncing
While be12fcca8b78 ("efi: fix alignment of function parameters in compat
mode") intentionally bounced them both ways to avoid any functional
change so close to the release of 4.16, the bouncing-in shouldn't really
be needed. In exchange the local variables need to gain initializers to
avoid copying back prior stack contents.
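For OUT-only parameters, bouncing means: call firmware with properly aligned locals, then copy the results out to the (possibly misaligned) guest buffer. Since nothing is copied in, the locals must be initialised, or anything the firmware leaves untouched would leak stale stack contents. A minimal sketch; `query_info()` is a hypothetical firmware stub and the 8-byte field layout is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical firmware call with three OUT-only values. */
static int query_info(uint64_t *max_storage, uint64_t *remaining,
                      uint64_t *max_size)
{
    *max_storage = 1000;
    *remaining = 400;
    (void)max_size; /* simulate firmware leaving one output untouched */
    return 0;
}

/*
 * guest_buf may be misaligned (as in compat mode), so bounce through
 * aligned locals.  OUT-only: no copy-in, but zero initialisers so that
 * untouched outputs don't copy back prior stack contents.
 */
static int query_info_bounced(unsigned char *guest_buf)
{
    uint64_t max_storage = 0, remaining = 0, max_size = 0;
    int rc = query_info(&max_storage, &remaining, &max_size);

    memcpy(guest_buf, &max_storage, sizeof(max_storage));
    memcpy(guest_buf + 8, &remaining, sizeof(remaining));
    memcpy(guest_buf + 16, &max_size, sizeof(max_size));

    return rc;
}
```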
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Jan Beulich [Mon, 6 Dec 2021 13:15:54 +0000 (14:15 +0100)]
EFI: move efi-boot.h inclusion point
When it was introduced, it was imo placed way too high up, making it
necessary to forward-declare way too many static functions. Move it down
together with
- the efi_check_dt_boot() stub, which afaict was deliberately placed
immediately ahead of the #include,
- blexit(), because of its use of the efi_arch_blexit() hook.
Move up get_value() and set_color() to before the inclusion so their
forward declarations can also be zapped.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Jan Beulich [Mon, 6 Dec 2021 13:15:05 +0000 (14:15 +0100)]
x86/HVM: fail virt-to-linear conversion for insn fetches from non-code segments
Just like (in protected mode) reads may not go to exec-only segments and
writes may not go to non-writable ones, insn fetches may not access data
segments.
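The three protected-mode rules can be summarised as one check on the descriptor type. A minimal sketch: the enum, struct and `access_ok()` are hypothetical and model only the type bits (limits, expand-down segments, privilege checks and special cases are left out).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical access kinds for a virtual-to-linear conversion. */
enum access { ACC_READ, ACC_WRITE, ACC_FETCH };

/*
 * Subset of a segment descriptor's type field: code segments may be
 * readable (otherwise execute-only), data segments may be writable.
 */
struct seg_type {
    bool code;
    bool readable; /* meaningful for code segments only */
    bool writable; /* meaningful for data segments only */
};

static bool access_ok(struct seg_type t, enum access acc)
{
    switch ( acc )
    {
    case ACC_READ:
        return !t.code || t.readable;  /* no reads from exec-only code */
    case ACC_WRITE:
        return !t.code && t.writable;  /* code is never writable */
    case ACC_FETCH:
        return t.code;                 /* no insn fetches from data */
    }

    return false;
}
```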
Fixes: 623e83716791 ("hvm: Support hardware task switching") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
paging_mfn_is_dirty() is moderately expensive, so avoid its use unless
its result might actually change anything. This means moving the
surrounding if() down below all other checks that can result in clearing
_PAGE_RW from sflags, in order to then check whether _PAGE_RW is
actually still set there before calling the function.
While moving the block of code, fold two if()s and make a few style
adjustments.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Tue, 30 Nov 2021 21:28:48 +0000 (21:28 +0000)]
x86/vPMU: Drop supported parameter from the wrmsr path
The supported parameter was added in 2d9b91f1aeaa ("VMX/vPMU: fix DebugCtl MSR
handling"). It unfortunately laid the groundwork for XSA-269, and the fix 2a8a8e99feb9 ("x86/vtx: Fix the checking for unknown/invalid MSR_DEBUGCTL
bits") totally rewrote MSR_DEBUGCTL handling.
Strip out the parameter again.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 1 Dec 2021 10:35:20 +0000 (10:35 +0000)]
xsm: Drop extern of non-existent variable
dummy_xsm_ops was dropped as part of organising XSM to be altcall compatible,
but the extern was accidentally left around.
A later change reintroduced dummy_ops, which is logically the same thing but
is private to xsm/dummy.c.
Fixes: 164a0b9653f4 ("xsm: refactor xsm_ops handling") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Andrew Cooper [Wed, 1 Dec 2021 10:34:00 +0000 (10:34 +0000)]
xsm: Switch xsm_ops to __alt_call_maybe_initdata
This should have been done at the point xsm_ops became fully altcall'd. This
puts the xsm_ops structure in .init on architectures where it is no longer
referenced at runtime.
Fixes: d868feb95a8a ("xen/xsm: Complete altcall conversion of xsm interface") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
xen/arm: do not use void pointer in pci_host_common_probe
There is no reason to use a void pointer when passing ECAM ops to the
pci_host_common_probe function, as it is cast to struct pci_ecam_ops
inside anyway. For that reason remove the void pointer and pass the
struct pci_ecam_ops pointer as is.
Jan Beulich [Fri, 3 Dec 2021 12:54:28 +0000 (13:54 +0100)]
gnttab: remove guest_physmap_remove_page() call from gnttab_map_frame()
Without holding appropriate locks, attempting to remove a prior mapping
of the underlying page is pointless, as the same (or another) mapping
could be re-established by a parallel request on another vCPU. Move the
code to Arm's gnttab_set_frame_gfn(); it cannot be dropped there since
xenmem_add_to_physmap_one() doesn't call it either (unlike on x86). Of
course this new placement doesn't improve things in any way as far as
the security of grant status frame mappings goes (see XSA-379). Proper
locking would be needed here to allow status frames to be mapped
securely.
In turn this then requires replacing the other use in
gnttab_unpopulate_status_frames(), which yet in turn requires adjusting
x86's gnttab_set_frame_gfn(). Note that with proper locking inside
guest_physmap_remove_page() combined with checking the GFN's mapping
there against the passed in MFN, there then is no issue with the
involved multiple gnttab_set_frame_gfn()-s potentially returning varying
values (due to a racing XENMAPSPACE_grant_table request).
This, as a side effect, does away with gnttab_map_frame() having a local
variable "gfn" which shadows a function parameter of the same name.
Together with XSA-379 this points out that XSA-255's addition to
gnttab_map_frame() was really useless.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Michal Orzel [Fri, 3 Dec 2021 09:58:37 +0000 (10:58 +0100)]
arm/vgic: Fix reference to a non-existing function
Commit 68dcdf942326ad90ca527831afbee9cd4a867f84 (xen/arm:
s/gic_set_guest_irq/gic_raise_guest_irq) forgot to modify a comment
about lr_pending list, referring to a function that has been renamed.
Jan Beulich [Fri, 3 Dec 2021 10:37:45 +0000 (11:37 +0100)]
x86/Viridian: fold duplicate vpset retrieval code
hvcall_{flush,ipi}_ex() share more near-identical code than what was
already isolated into hv_vpset_to_vpmask(). Move that code there as well,
so that only one instance of it remains. This way all of HV_GENERIC_SET_SPARSE_4K
processing now happens in a single place.
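In the sparse format, each set bit N in the valid bank mask says that the next bank_contents[] word describes VPs N*64 to N*64+63. A minimal sketch of the conversion, assuming the Hyper-V TLFS layout (format, valid bank mask, bank contents) and format values 0 (sparse 4K) and 1 (all); the struct and function names are hypothetical and this is not Xen's hv_vpset_to_vpmask().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GENERIC_SET_SPARSE_4K 0 /* assumed TLFS value */
#define GENERIC_SET_ALL       1 /* assumed TLFS value */
#define MAX_BANKS             64

/* Hypothetical mirror of the TLFS VP set structure. */
struct vpset {
    uint64_t format;
    uint64_t valid_bank_mask;
    uint64_t bank_contents[MAX_BANKS];
};

/* out[bank] receives the 64-bit VP mask for VPs bank*64 .. bank*64+63. */
static int vpset_to_vpmask(const struct vpset *set, uint64_t out[MAX_BANKS])
{
    unsigned int bank, idx = 0;

    for ( bank = 0; bank < MAX_BANKS; bank++ )
        out[bank] = set->format == GENERIC_SET_ALL ? ~UINT64_C(0) : 0;

    if ( set->format == GENERIC_SET_ALL )
        return 0;
    if ( set->format != GENERIC_SET_SPARSE_4K )
        return -1;

    /* Banks are packed: only banks with a valid bit have an entry. */
    for ( bank = 0; bank < MAX_BANKS; bank++ )
        if ( set->valid_bank_mask & (UINT64_C(1) << bank) )
            out[bank] = set->bank_contents[idx++];

    return 0;
}
```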
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 3 Dec 2021 10:36:46 +0000 (11:36 +0100)]
x86/alternatives: adjust alternative_vcall0()
I'm puzzled about two inconsistencies with other alternative_vcall<N>()
here: There's a check missing that the supplied function pointer is
actually pointing to a function taking no args. And there's a pointless
pair of parentheses. Correct both.
Fixes: 67d01cdb5518 ("x86: infrastructure to allow converting certain indirect calls to direct ones") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Jan Beulich [Fri, 3 Dec 2021 10:36:10 +0000 (11:36 +0100)]
x86/x2APIC: defer probe until after IOMMU ACPI table parsing
While commit 46c4061cd2bf ("x86/IOMMU: mark IOMMU / intremap not in use
when ACPI tables are missing") deals with apic_x2apic_probe() as called
from x2apic_bsp_setup(), the check_x2apic_preenabled() path is similarly
affected: The call needs to occur after acpi_iommu_init(), such that
iommu_intremap getting disabled there can be properly taken into account
by apic_x2apic_probe().
Note that, for the time being (further cleanup patches following),
reversing the order of the calls to generic_apic_probe() and
acpi_boot_init() is not an option:
- acpi_process_madt() calls clustered_apic_check() and hence relies on
genapic to have got filled before,
- generic_bigsmp_probe() (called from acpi_process_madt()) needs to
occur after generic_apic_probe(),
- acpi_parse_madt() (called from acpi_process_madt()) calls
acpi_madt_oem_check(), which wants to be after generic_apic_probe().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 3 Dec 2021 10:34:24 +0000 (11:34 +0100)]
VT-d: introduce helper to convert DID to domid_t
This is in preparation for adding another "translation" method. Combine
the extra validation that both previously open-coded instances were
doing: a bounds check and a bitmap check. But don't propagate the previous
pointless check of whether ->domid_map[] was actually allocated, as
failure there would lead to overall failure of IOMMU initialization
anyway.
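Folding both checks into one helper means callers get either a valid owner or a single "invalid" marker. A minimal sketch: `struct iommu_ids`, `NR_DIDS`, `DOMID_BAD` and `did_to_domid()` are hypothetical names and sizes chosen for illustration, not the VT-d driver's actual definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t domid_t;

#define NR_DIDS     64      /* hypothetical number of supported DIDs */
#define DOMID_BAD   0x7ff4  /* hypothetical "invalid domain" marker */
#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical per-IOMMU state: which DIDs are in use, and by whom. */
struct iommu_ids {
    unsigned long did_bitmap[(NR_DIDS + BITS_PER_UL - 1) / BITS_PER_UL];
    domid_t domid_map[NR_DIDS];
};

static bool did_in_use(const struct iommu_ids *ids, unsigned int did)
{
    return ids->did_bitmap[did / BITS_PER_UL] & (1UL << (did % BITS_PER_UL));
}

/* One helper doing both the bounds check and the bitmap check. */
static domid_t did_to_domid(const struct iommu_ids *ids, unsigned int did)
{
    if ( did >= NR_DIDS || !did_in_use(ids, did) )
        return DOMID_BAD;

    return ids->domid_map[did];
}
```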
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Fri, 3 Dec 2021 10:33:43 +0000 (11:33 +0100)]
VT-d: tidy domid map handling
- Correct struct field type.
- Use unsigned int when that suffices.
- Eliminate a (badly typed) local variable from
context_set_domain_id().
- Don't use -EFAULT inappropriately.
- Move set_bit() such that it won't be done redundantly.
- Constification.
- Reduce scope of some variables.
- Coding style.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Fri, 3 Dec 2021 10:22:03 +0000 (11:22 +0100)]
x86/vPMU: move vpmu_ops to .init.data
Both vendors' code populates all hooks, so there's no need to have any
NULL checks before invoking the hook functions. With that the only
remaining uses of the object are in alternative_{,v}call(), i.e. none
after alternatives patching.
In vpmu_arch_initialise() the check gets replaced by an opt_vpmu_enabled
one, as I couldn't convince myself that the pre-existing checks would be
sufficient to cover all possible cases.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 3 Dec 2021 10:21:14 +0000 (11:21 +0100)]
x86/vPMU: invoke <vendor>_vpmu_initialise() through a hook as well
I see little point in having an open-coded switch() statement to achieve
the same; like other vendor-specific operations the function can be
supplied in the respective ops structure instances.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 3 Dec 2021 10:20:24 +0000 (11:20 +0100)]
x86/vPMU: convert vendor hook invocations to altcall
At least some vPMU functions will be invoked (and hence can further be
speculated into) even in the vPMU-disabled case. Convert vpmu_ops to
the standard single-instance model, which is a prerequisite for engaging the
alternative_call() machinery, and convert all respective calls. Note
that this requires vpmu_init() to become a pre-SMP initcall.
This change then also helps performance.
To replace a few vpmu->arch_vpmu_ops NULL checks, introduce a new
VPMU_INITIALIZED state, such that in the absence of any other suitable
vpmu_is_set() checks this state can be checked for.
While adding the inclusion of xen/err.h, also prune other xen/*.h
inclusions.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Performance analysis has shown that dropping the domctl lock during
domain destruction greatly increases the contention in the heap_lock,
thus making parallel destruction of domains slower.
The following lockperf data shows the difference between the current
code and the reverted one:
Given the current point in the release, revert the commit and
reinstate holding the domctl lock during domain destruction. Further
work should be done in order to re-add more fine grained locking to
the domain destruction path once a proper solution to avoid the
heap_lock contention is found.
Reported-by: Hongyan Xia <hongyxia@amazon.com> Reported-by: Dmitry Isaikin <isaikin-dmitry@yandex.ru> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Juergen Gross [Fri, 3 Dec 2021 10:18:38 +0000 (11:18 +0100)]
x86: limit number of hypercall parameters to 5
Today there is no hypercall with more than 5 parameters, while the ABI
allows up to 6 parameters. Especially in the x86 32-bit case, using
6 parameters would require running without a frame pointer, which is
undesirable. Note that for Arm the limit is 5 parameters already.
So limit the maximum number of parameters to 5 for x86, too.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 3 Dec 2021 10:17:50 +0000 (11:17 +0100)]
x86/HVM: skip offline vCPU-s when dumping VMCBs/VMCSes
There's not really any register state associated with vCPU-s that
haven't been initialized yet, so avoid spamming the log with largely
useless information while still leaving an indication of the fact.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 3 Dec 2021 10:14:24 +0000 (11:14 +0100)]
x86/PV: properly set shadow allocation for Dom0
Leaving shadow setup just to the L1TF tasklet means running Dom0 on a
minimally acceptable shadow memory pool, rather than what normally
would be used (also, for example, for PVH). Populate the pool before
triggering the tasklet (or in preparation for L1TF checking logic to
trigger it), on a best effort basis (again like done for PVH).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Mon, 29 Nov 2021 20:11:01 +0000 (20:11 +0000)]
x86/boot: Support __ro_after_init
For security hardening reasons, it is advantageous to make setup-once data
immutable after boot. Borrow __ro_after_init from Linux.
On x86, place .data.ro_after_init at the start of .rodata, excluding it from
the early permission restrictions. Re-apply RO restrictions to the whole of
.rodata in init_done(), attempting to reform the superpage if possible.
For architectures which don't implement __ro_after_init explicitly, such
variables are merged into .data.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 29 Nov 2021 20:04:11 +0000 (20:04 +0000)]
x86/boot: Adjust .text/.rodata/etc permissions in one place
At the moment, we have two locations selecting restricted permissions, not
very far apart on boot, dependent on opposite answers from using_2M_mapping().
The later location however can shatter superpages if needed, while the former
cannot.
Collect together all the permission adjustments at the slightly later point in
boot, as we likely need to shatter a superpage to support __ro_after_init.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 29 Nov 2021 19:01:50 +0000 (19:01 +0000)]
x86/boot: Drop xen_virt_end
The calculation in __start_xen() for xen_virt_end is an opencoding of
ROUNDUP(_end, 2M). This is __2M_rwdata_end as provided by the linker script.
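As a worked example of that rounding: for a power-of-two alignment, rounding up is "add alignment minus one, then clear the low bits". The macro below is a hypothetical stand-in written in that usual form, not a copy of Xen's ROUNDUP().

```c
#include <assert.h>
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)

/* Round x up to a multiple of a; a must be a power of two. */
#define ROUNDUP_POW2(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))
```

So an image whose end sits one byte past 2 MiB gets an end bound of 4 MiB, while one ending exactly on a 2 MiB boundary is left as-is.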
This corrects the bound calculations in arch_livepatch_init() and
update_xen_mappings() to not enforce 2M alignment when Xen is not compiled
with CONFIG_XEN_ALIGN_2M.
Furthermore, since 52975142d154 ("x86/boot: Create the l2_xenmap[] mappings
dynamically"), there have not been extraneous mappings to delete, meaning that
the call to destroy_xen_mappings() has been a no-op.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 29 Nov 2021 16:09:08 +0000 (16:09 +0000)]
x86/boot: Drop incorrect mapping at l2_xenmap[0]
It has been 4 years since the default load address changed from 1M to 2M, and
_stext ceased residing in l2_xenmap[0]. We should not be inserting an unused
mapping.
To ensure we don't create mappings accidentally, loop from 0 and obey
_PAGE_PRESENT on all entries.
Fixes: 7ed93f3a0dff ("x86: change default load address from 1 MiB to 2 MiB") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
GENMASK(30, 21) should be 0x7fe00000. Fix this in the comment
in bitops.h.
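The value follows from counting bits: 21 through 30 inclusive is ten set bits, i.e. 0x3ff shifted left by 21, which is 0x7fe00000. A definition in the style used by Linux (Xen's own may differ in detail) confirms it on both 32- and 64-bit longs:

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Mask with bits l..h (inclusive) set. */
#define GENMASK(h, l) \
    (((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
```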
Signed-off-by: Ayan Kumar Halder <ayankuma@xilinx.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Tweak text, to put an end to any further bikeshedding] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Wed, 24 Nov 2021 11:24:03 +0000 (12:24 +0100)]
CHANGELOG: add missing entries for work during the 4.16 release cycle
Document some of the relevant changes during the 4.16 release cycle.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Release-Acked-by: Ian Jackson <iwj@xenproject.org>
(cherry picked from commit e2544a28beacd854f295095d102a8773743ac917)
Currently, the code that handles the modules defined in the DT, and
possibly loads them from the filesystem, opens and closes the filesystem
handle for each module to be loaded.
To improve performance, pass the filesystem handle pointer through the
call stack, open the handle only once when it is first needed, and close
it at the end if it was opened.
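The lazy-open pattern passes a pointer to a NULL-initialised handle down the call chain; the first module that needs the filesystem opens it, and the top-level caller closes it only if it was ever opened. A minimal sketch in which `fs_open()`, `fs_close()`, `load_module()` and `load_all()` are hypothetical stand-ins for the EFI boot code, with counters in place of real firmware calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handle and counters standing in for firmware calls. */
struct fs_handle { int dummy; };
static struct fs_handle the_fs;
static int opens, closes;

static struct fs_handle *fs_open(void) { opens++; return &the_fs; }
static void fs_close(struct fs_handle *h) { (void)h; closes++; }

/* Load one module, opening the filesystem only if this module needs it. */
static void load_module(bool from_file, struct fs_handle **fs)
{
    if ( !from_file )
        return;              /* no filesystem access needed */

    if ( !*fs )
        *fs = fs_open();     /* first module that needs the filesystem */

    /* ... read the module via *fs ... */
}

static void load_all(const bool *from_file, size_t n)
{
    struct fs_handle *fs = NULL; /* passed down the call stack */
    size_t i;

    for ( i = 0; i < n; i++ )
        load_module(from_file[i], &fs);

    if ( fs )                    /* close only if it was opened */
        fs_close(fs);
}
```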
Andrew Cooper [Thu, 7 Oct 2021 13:02:10 +0000 (14:02 +0100)]
x86/crash: Drop manual hooking of exception_table[]
NMI hooking in the crash path has undergone several revisions since its
introduction. What we have now is not sufficiently different from the regular
nmi_callback() mechanism to warrant special casing.
Use set_nmi_callback() directly, and do away with patching a read-only data
structure via a read-write alias. This also means that the
vmx_vmexit_handler() can and should call do_nmi() directly, rather than
indirecting through the exception table to pick up the crash path hook.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 8 Oct 2021 12:11:21 +0000 (13:11 +0100)]
x86/traps: Drop dummy_nmi_callback()
The unconditional nmi_callback() call in do_nmi() calls dummy_nmi_callback()
in all cases other than for a few specific and rare tasks (alternative
patching, microcode loading, etc).
Indirect calls are expensive under retpoline, so rearrange the logic to use
NULL as the default, and skip the call entirely in the common case.
While rearranging the code, fold the exit paths.
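With NULL as the default, the common path costs one load and a predictable branch instead of an indirect call. A minimal sketch: the callback type, `set_nmi_callback()` and `do_nmi()` below only loosely model the Xen functions of those names; their signatures here are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type: returns true if the NMI was fully handled. */
typedef bool nmi_callback_t(unsigned int cpu);

/* NULL by default: no indirect call in the common case. */
static nmi_callback_t *nmi_callback;

static unsigned int default_handled; /* stand-in for normal processing */

static nmi_callback_t *set_nmi_callback(nmi_callback_t *cb)
{
    nmi_callback_t *old = nmi_callback;

    nmi_callback = cb;
    return old;
}

static void do_nmi(unsigned int cpu)
{
    /* Only pay for an indirect call when a special handler is installed. */
    if ( nmi_callback && nmi_callback(cpu) )
        return;

    default_handled++;
}

/* Example special-purpose handler, e.g. during microcode loading. */
static bool swallow_nmi(unsigned int cpu) { (void)cpu; return true; }
```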
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 19 Nov 2021 13:16:12 +0000 (13:16 +0000)]
x86/dom0: Fix command line parsing issues with dom0_nodes=
This is a simple comma separated list, so use the normal form.
* Don't cease processing subsequent elements on an error
* Do report -EINVAL for things like `dom0_nodes=4foo`
* Don't opencode the cmdline_strcmp() helper
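The usual comma-separated parsing loop finds each element's end, requires a numeric element to be consumed completely (so `4foo` is rejected), records the error and keeps going. A minimal sketch: `parse_node_list()` and `elem_is()` are hypothetical, `elem_is()` only approximates what a helper like cmdline_strcmp() provides, and the `relaxed` keyword is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Does the element [s, end) exactly equal kw? */
static int elem_is(const char *s, const char *end, const char *kw)
{
    size_t len = end - s;

    return strlen(kw) == len && !strncmp(s, kw, len);
}

/* Parse e.g. "0,2,relaxed"; bad elements yield -EINVAL but don't stop parsing. */
static int parse_node_list(const char *s, uint64_t *nodes, int *relaxed)
{
    const char *ss;
    int rc = 0;

    do {
        char *end;
        unsigned long n;

        ss = strchr(s, ',');
        if ( !ss )
            ss = strchr(s, '\0');

        if ( elem_is(s, ss, "relaxed") )
            *relaxed = 1;
        else
        {
            n = strtoul(s, &end, 0);
            /* The whole element must be a number in range. */
            if ( end != ss || end == s || n >= 64 )
                rc = -EINVAL;
            else
                *nodes |= UINT64_C(1) << n;
        }

        s = ss + 1;
    } while ( *ss );

    return rc;
}
```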
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 17 Nov 2021 17:45:21 +0000 (17:45 +0000)]
x86/hvm: Remove callback from paging->flush_tlb() hook
TLB flushing is a hotpath, and function pointer calls are
expensive (especially under retpoline) for what amounts to an identity
transform on the data. Just pass the vcpu_bitmap bitmap directly.
As NULL means "all" rather than "none", introduce a flush_vcpu() helper to
avoid the risk of logical errors from opencoding the expression. This also
means the viridian callers can avoid writing an all-ones bitmap for the
flushing logic to consume.
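The helper boils down to one expression, and keeping it in one place stops callers from accidentally reading NULL as "flush nothing". A minimal sketch, assuming a plain array of 64-bit words for the vCPU bitmap; the name matches the text but the signature is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A NULL bitmap means "flush every vCPU", not "flush none". */
static bool flush_vcpu(const uint64_t *vcpu_bitmap, unsigned int vcpu_id)
{
    return !vcpu_bitmap ||
           (vcpu_bitmap[vcpu_id / 64] & (UINT64_C(1) << (vcpu_id % 64)));
}
```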
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Sat, 30 Oct 2021 23:03:56 +0000 (00:03 +0100)]
x86/IO-APIC: Drop function pointers from __ioapic_{read,write}_entry()
Function pointers are expensive, and the raw parameter is a constant at the
root of all call trees, meaning that it predicts very well with local branch
history.
Furthermore, the knock-on effects are quite impressive.
Andrew Cooper [Wed, 17 Nov 2021 16:16:23 +0000 (16:16 +0000)]
xen/smp: Support NULL IPI function pointers
There are several cases where the act of interrupting a remote processor has
the required side effect. Explicitly allow NULL function pointers so the
calling code doesn't have to provide a stub implementation.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 5 Nov 2021 12:35:46 +0000 (13:35 +0100)]
x86/ACPI: drop dead interpreter-related code
CONFIG_ACPI_INTERPRETER does not get defined anywhere, the enclosed code
wouldn't build, and the default-to-phys logic works differently anyway
(see genapic/bigsmp.c:probe_bigsmp()).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Nov 2021 12:34:57 +0000 (13:34 +0100)]
x86/APIC: drop probe_default()
The function does nothing but return success. Simply treat absence of a
probe hook to mean just this. This then eliminates the (purely
theoretical at this point) risk of trying to call through
apic_x2apic_{cluster,phys}'s respective NULL pointers.
While doing this also eliminate generic_apic_probe()'s "changed"
variable: apic_probe[]'s default entry will now be used unconditionally
in yet more obvious a way, such that separately setting genapic from
apic_default is (hopefully) no longer justified. Yet that was the main
purpose of the variable.
To help prove that apic_default's probe() hook doesn't get used
elsewhere, further make apic_probe[] static on this occasion.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Nov 2021 12:34:37 +0000 (13:34 +0100)]
x86/APIC: drop {acpi_madt,mps}_oem_check() hooks
The hook functions have been empty for a very long time, if not
(according to git history) forever. Ditch them alongside the then empty
mach_mpparse.h instances and the then unused APICFUNC() macro.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 5 Nov 2021 12:34:12 +0000 (13:34 +0100)]
x86/APIC: drop clustered_apic_check() hook
The hook functions have been empty forever (x2APIC) or issuing merely a
printk() for a long time (xAPIC). Since that printk() is (a) generally
useful (i.e. also in the x2APIC case) and (b) would better only be
issued once the final APIC driver to use was determined, move (and
generalize) it into connect_bsp_APIC().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 12 Nov 2021 16:00:13 +0000 (16:00 +0000)]
x86/cpufreq: Drop opencoded CPUID handling from powernow
Xen already collects CPUID.0x80000007.edx by default, meaning that we can
refer to per-cpu data directly. This also avoids the need to IPI the onlining
CPU to identify whether Core Performance Boost is available.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 12 Nov 2021 16:28:24 +0000 (16:28 +0000)]
x86/cpufreq: Rework APERF/MPERF handling
Currently, each feature_detect() (called on CPU add) hook for both cpufreq
drivers duplicates cpu_has_aperfmperf in a per-cpu data structure, and edits
cpufreq_driver.getavg to point at get_measured_perf().
As all parts of this are vendor-neutral, drop the function pointer and
duplicated boolean, and call get_measured_perf() directly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 12 Nov 2021 15:13:36 +0000 (15:13 +0000)]
x86/cpufreq: Clean up powernow registration
powernow_register_driver() is currently written with a K&R type definition;
I'm surprised that compilers don't object to a mismatch with its declaration,
which is written in an ANSI-C compatible way.
Furthermore, its sole caller is cpufreq_driver_init() which is a pre-smp
initcall. There are no other online CPUs, and even if there were, checking
the BSP's CPUID data $N times is pointless. Simplify registration to only
look at the BSP.
While at it, drop obviously unused includes. Also rewrite the expression in
cpufreq_driver_init() for clarity.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 4 Nov 2021 03:12:49 +0000 (03:12 +0000)]
xen/xsm: Improve fallback handling in xsm_fixup_ops()
The current xsm_fixup_ops() is just shy of a full page when compiled, and very
fragile to NULL function pointer errors.
Address both of these issues with a minor piece of structure (ab)use.
Introduce dummy_ops, and fix up the provided xsm_ops pointer by treating both
as an array of unsigned longs.
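Treating an all-function-pointer structure as an array of words lets one loop replace a long list of per-member "if NULL, use the default" lines. A minimal sketch: `struct ops`, `dummy_ops` and `fixup_ops()` are hypothetical; it walks the structures with memcpy and uintptr_t instead of unsigned long pointers, so the type punning stays well-defined in a standalone program. It assumes a NULL function pointer is all-bits-zero and that function pointers are pointer-sized, as on common ABIs. The -1 return for a missing fallback is one plausible form of a safety check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ops table: every member must be a function pointer. */
struct ops {
    int (*op_a)(int);
    int (*op_b)(int);
    int (*op_c)(int);
};

static int dummy_a(int x) { return x; }
static int dummy_b(int x) { return x + 1; }
static int dummy_c(int x) { return x + 2; }

static const struct ops dummy_ops = { dummy_a, dummy_b, dummy_c };

_Static_assert(sizeof(int (*)(int)) == sizeof(uintptr_t),
               "function pointers must be word-sized");
_Static_assert(sizeof(struct ops) % sizeof(uintptr_t) == 0,
               "ops must be an array of words");

/* Fill every NULL slot of *ops from dummy_ops, word by word. */
static int fixup_ops(struct ops *ops)
{
    size_t i, n = sizeof(*ops) / sizeof(uintptr_t);

    for ( i = 0; i < n; i++ )
    {
        uintptr_t dst, src;

        memcpy(&src, (const char *)&dummy_ops + i * sizeof(src), sizeof(src));
        if ( !src )
            return -1;  /* safety check: a fallback is missing */

        memcpy(&dst, (char *)ops + i * sizeof(dst), sizeof(dst));
        if ( !dst )
            memcpy((char *)ops + i * sizeof(dst), &src, sizeof(src));
    }

    return 0;
}

static int my_b(int x) { return x * 10; }
```

Adding a member to the structure then needs no change to the fixup loop, only a matching entry in the defaults.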
The compiled size improvement speaks for itself:
  $ ../scripts/bloat-o-meter xen-syms-before xen-syms-after
  add/remove: 1/0 grow/shrink: 0/1 up/down: 712/-3897 (-3185)
  Function                                     old     new   delta
  dummy_ops                                      -     712    +712
  xsm_fixup_ops                               3987      90   -3897
and there is an additional safety check that will make it obvious during
development if there is an issue with the fallback handling.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Andrew Cooper [Thu, 4 Nov 2021 19:36:16 +0000 (19:36 +0000)]
xen/xsm: Complete altcall conversion of xsm interface
With alternative_call() capable of handling compound types, the three
remaining hooks can be optimised at boot time too.
Fixes: 164a0b9653f4 ("xsm: refactor xsm_ops handling") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Jan Beulich [Thu, 4 Nov 2021 16:04:05 +0000 (17:04 +0100)]
x86/altcall: allow compound types to be passed
Replace the conditional operator in ALT_CALL_ARG(), which was intended
to limit usable types to scalar ones, by a size check. Some restriction
here is necessary to make sure we don't violate the ABI's calling
conventions, but limiting to scalar types was both too restrictive
(disallowing e.g. guest handles) and too permissive (allowing e.g.
__int128_t).
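A minimal sketch of a size-based argument check, assuming GCC or Clang (it uses GNU statement expressions). `DEMO_ALT_ARG()`, `demo_handle_t` and `demo_use_handle()` are hypothetical and only mimic the idea, not the real ALT_CALL_ARG() macro, which also binds the argument to a specific register.

```c
#include <stdint.h>

/*
 * Hypothetical analogue of the argument check: any type, scalar or
 * compound, is accepted as long as it fits in one register-sized slot.
 */
#define DEMO_ALT_ARG(arg) ({                                        \
    _Static_assert(sizeof(arg) <= sizeof(void *),                  \
                   "argument wider than a register");              \
    (arg);                                                          \
})

/* A guest-handle-like wrapper: a compound type, but only pointer-sized. */
typedef struct { void *p; } demo_handle_t;

static uintptr_t demo_use_handle(demo_handle_t h)
{
    return (uintptr_t)h.p;
}

/*
 * DEMO_ALT_ARG(h) on a demo_handle_t compiles, whereas a scalar-only check
 * would reject it.  Conversely, DEMO_ALT_ARG((__int128)0) fails to build on
 * x86-64 even though __int128 is an integer (scalar) type.
 */
```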
Note that there was some anomaly with that conditional operator anyway:
Something - I don't recall what - made it impossible to omit the middle
operand.
Code-generation-wise this has the effect of removing certain zero- or
sign-extending in some altcall invocations. This ought to be fine as the
ABI doesn't require sub-sizeof(int) values to be extended, except when
passed through an ellipsis. No function subject to altcall patching has
a variable number of arguments, though.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Unfortunately this triggers -Werror=sizeof-array-argument on some versions of
GCC, so alter xsm_{alloc,free}_security_evtchns() to use a pointer rather than
array parameter.
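For context, a hedged sketch of why the parameter spelling matters. GCC's -Wsizeof-array-argument fires when sizeof is applied to a parameter declared as an array, which a size-checking macro applied to every argument will do. `struct demo_evtchn` and `demo_alloc_security_evtchns()` are hypothetical stand-ins.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct evtchn. */
struct demo_evtchn {
    int port;
};

/*
 * Written as "struct demo_evtchn chn[]", the parameter would still be
 * adjusted to a pointer, and sizeof(chn) would trigger
 * -Wsizeof-array-argument.  Declaring it as a pointer makes the adjustment
 * explicit and keeps the same calling convention.
 */
static size_t demo_alloc_security_evtchns(struct demo_evtchn *chn,
                                          unsigned int nr)
{
    size_t arg_size = sizeof(chn);   /* no warning: chn is a pointer */
    unsigned int i;

    for (i = 0; i < nr; i++)
        chn[i].port = (int)i;        /* placeholder work for the sketch */

    return arg_size;
}
```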
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Andrew Cooper [Wed, 24 Nov 2021 19:06:02 +0000 (19:06 +0000)]
Revert "x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents"
OSSTest has identified a 3rd regression caused by this change. Migration
between Xen 4.15 and 4.16 on the nocera pair of machines (AMD Opteron 4133)
fails with:
which is a safety check to prevent resuming the guest when the CPUID data has
been truncated. The problem is caused by shrinking of the max policies, which
form part of the ABI and need to be handled compatibly between different
versions of Xen.
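A deliberately simplified model of the failure, under stated assumptions: a CPUID policy is reduced to a single `max_leaf`, and the check compares the max_leaf the guest was already running with against the destination's max policy. `struct demo_policy` and `demo_can_resume()` are hypothetical and much simpler than the real migration check.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a CPUID policy. */
struct demo_policy {
    unsigned int max_leaf;
};

/*
 * Safety check: the guest has already observed incoming_max_leaf, so
 * resuming is refused if the destination's max policy cannot represent it,
 * i.e. if the CPUID data would be truncated.
 */
static bool demo_can_resume(unsigned int incoming_max_leaf,
                            const struct demo_policy *dst_max)
{
    return incoming_max_leaf <= dst_max->max_leaf;
}
```

In this model, shrinking the destination's max policy below what an older Xen advertised is enough to make an otherwise valid migration fail.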
Furthermore, shrinking of the default policies also breaks things in some
cases, because certain cpuid= settings in a VM config file which used to work
will now be refused. Also, external toolstacks that set the CPUID policy from
a featureset might find some populated leaves unreachable, because the default
domain policy is shrunk before the featureset is applied.
Fixes: 540d911c2813 ("x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents") Fixes: 81da2b544cbb ("x86/cpuid: prevent shrinking migrated policies max leaves") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 24 Nov 2021 10:12:44 +0000 (11:12 +0100)]
VT-d: conditionalize IOTLB register offset check
As of commit 6773b1a7584a ("VT-d: Don't assume register-based
invalidation is always supported") we don't (try to) use register based
invalidation anymore when that's not supported by hardware. Hence
there's also no point in the respective check in that case, and skipping it
avoids a pointless IOMMU initialization failure. After all, the spec (version
3.3 at the time of
writing) doesn't say what the respective Extended Capability Register
field would contain in such a case.
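A minimal sketch of the conditional check, assuming (for illustration only) a 16-byte IOTLB register block and a boolean that says whether register-based invalidation will be used. `demo_check_iotlb_offset()` and `DEMO_IOTLB_REGS_SIZE` are hypothetical names.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumption for this sketch: the IOTLB register block is 16 bytes. */
#define DEMO_IOTLB_REGS_SIZE 16UL

/*
 * Validate the IOTLB register offset only when register-based invalidation
 * is going to be used; otherwise the spec does not say what the field
 * contains, so failing initialization on its value would be pointless.
 */
static int demo_check_iotlb_offset(bool reg_invalidation,
                                   unsigned long iotlb_offset,
                                   unsigned long page_size)
{
    if (reg_invalidation && iotlb_offset + DEMO_IOTLB_REGS_SIZE > page_size)
        return -ENODEV;

    return 0;
}
```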
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Wed, 24 Nov 2021 10:12:03 +0000 (11:12 +0100)]
VT-d: correct off-by-1 in fault register range check
All our present implementation requires is that the range fully fits
in a single page. No need to exclude the case of the last register
extending right to the end of that page.
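The off-by-one can be shown with a small range check, assuming 128-bit (16-byte) fault recording registers. `demo_fault_regs_fit()` is a hypothetical helper, not the actual VT-d code.

```c
#include <stdbool.h>

/* Assumption: each fault recording register is 128 bits (16 bytes) wide. */
#define DEMO_FRCD_SIZE 16UL

/*
 * The register range must fit in one page.  Using "<=" lets the last
 * register end exactly at the page boundary; "<" would wrongly reject that.
 */
static bool demo_fault_regs_fit(unsigned long offset, unsigned int nr_regs,
                                unsigned long page_size)
{
    return offset + nr_regs * DEMO_FRCD_SIZE <= page_size;
}
```

For example, a single register at offset 4080 in a 4096-byte page ends exactly at 4096 and is accepted, while a strict "<" comparison would have failed initialization for it.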
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Wed, 24 Nov 2021 10:11:24 +0000 (11:11 +0100)]
VT-d: prune SAGAW recognition
Bit 0 of SAGAW in the capability register has become reserved at or
before spec version 2.2. Treat it as such. Replace the effective open-
coding of find_first_set_bit(). Adjust local variable types.
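A hedged sketch of the two mechanical changes: masking off the now-reserved bit 0 and using a find-first-set primitive instead of an open-coded loop. Here Xen's find_first_set_bit() is approximated with GCC/Clang's `__builtin_ctz()`, and `demo_first_sagaw()` is a hypothetical helper that assumes the SAGAW field has already been extracted from the capability register.

```c
/* Hypothetical: bit 0 of the extracted SAGAW field is treated as reserved. */
#define DEMO_SAGAW_RESERVED 0x1u

/* Index of the lowest supported width, or -1 if none is usable. */
static int demo_first_sagaw(unsigned int sagaw)
{
    sagaw &= ~DEMO_SAGAW_RESERVED;   /* ignore the reserved bit */
    if (!sagaw)
        return -1;

    return __builtin_ctz(sagaw);     /* find first set bit, no hand-written loop */
}
```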
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Wed, 24 Nov 2021 10:10:36 +0000 (11:10 +0100)]
x86/Viridian: drop dead variable updates
Both hvcall_flush_ex() and hvcall_ipi_ex() update "size" without
subsequently using the value; future compilers may warn about such dead stores.
Alongside dropping the updates, shrink the variables' scopes to
demonstrate that there are no outer scope uses.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Wed, 24 Nov 2021 10:09:56 +0000 (11:09 +0100)]
x86/Viridian: fix error code use
Both the wrong use of HV_STATUS_* and the return type of
hv_vpset_to_vpmask() can lead to viridian_hypercall()'s
ASSERT_UNREACHABLE() triggering when translating error codes from Xen
to Viridian representation.
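To illustrate how mixing the two error domains reaches the unreachable path, here is a minimal translation sketch. The `DEMO_HV_STATUS_*` values are illustrative placeholders rather than the real HV_STATUS_* definitions, only two Xen error codes are modelled, and `demo_translate()` returns a sentinel where viridian_hypercall() would hit ASSERT_UNREACHABLE().

```c
#include <errno.h>

/* Illustrative status values standing in for the HV_STATUS_* constants. */
enum demo_hv_status {
    DEMO_HV_STATUS_SUCCESS = 0,
    DEMO_HV_STATUS_INVALID_PARAMETER = 5,
    DEMO_HV_STATUS_UNEXPECTED = -1,   /* stands in for ASSERT_UNREACHABLE() */
};

/*
 * Translate a Xen-style return code (0 or -errno) into a Viridian status.
 * Helpers must therefore return -errno values: one that already returns an
 * HV status (a positive number) falls through to the "unreachable" default.
 */
static int demo_translate(int rc)
{
    switch (rc) {
    case 0:
        return DEMO_HV_STATUS_SUCCESS;
    case -EINVAL:
        return DEMO_HV_STATUS_INVALID_PARAMETER;
    default:
        return DEMO_HV_STATUS_UNEXPECTED;
    }
}
```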
Fixes: b4124682db6e ("viridian: add ExProcessorMasks variants of the flush hypercalls") Fixes: 9afa867d42ba ("viridian: add ExProcessorMasks variant of the IPI hypercall") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Roger Pau Monné [Wed, 24 Nov 2021 10:07:52 +0000 (11:07 +0100)]
MAINTAINERS: declare REMUS support orphaned
The designated maintainer email address for the remus entry is
bouncing, so remove it and declare the entry as orphaned as there's no
other maintainer.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Wed, 24 Nov 2021 10:07:11 +0000 (11:07 +0100)]
VT-d: don't leak domid mapping on error path
While domain_context_mapping() invokes domain_context_unmap() in a sub-
case of handling DEV_TYPE_PCI when encountering an error, thus avoiding
a leak, individual calls to domain_context_mapping_one() aren't
similarly covered. Such a leak might persist until domain destruction.
Leverage the fact that these cases can be recognized by pdev being non-NULL.
Fixes: dec403cc668f ("VT-d: fix iommu_domid for PCI/PCIx devices assignment") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Wed, 24 Nov 2021 10:05:36 +0000 (11:05 +0100)]
VT-d: properly reserve DID 0 for caching mode IOMMUs
Merely setting bit 0 in the bitmap is insufficient, as then Dom0 will
still have DID 0 allocated to it, because of the zero-filling of
domid_map[]. Set slot 0 to DOMID_INVALID to keep DID 0 from getting
used.
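A small model of the problem, under stated assumptions: a lookup that only considers in-use DIDs and compares their domid_map[] entries, with a bool array standing in for the allocation bitmap. `demo_reserve_did0()`, `demo_get_did()` and `DEMO_NR_DIDS` are hypothetical; 0x7FF4 is the value of Xen's DOMID_INVALID.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NR_DIDS       8        /* hypothetical number of domain IDs */
#define DEMO_DOMID_INVALID 0x7FF4u  /* value of Xen's DOMID_INVALID */

static uint16_t demo_domid_map[DEMO_NR_DIDS]; /* DID -> domain ID */
static bool demo_did_in_use[DEMO_NR_DIDS];    /* stands in for the bitmap */

/* Reserve DID 0, optionally also setting its domid_map[] slot. */
static void demo_reserve_did0(bool set_map_slot)
{
    /* Zero-filled, as after a fresh zeroing allocation. */
    memset(demo_domid_map, 0, sizeof(demo_domid_map));
    memset(demo_did_in_use, 0, sizeof(demo_did_in_use));

    demo_did_in_use[0] = true;                  /* bitmap bit 0 set */
    if (set_map_slot)
        demo_domid_map[0] = DEMO_DOMID_INVALID; /* slot 0 matches no domain */
}

/* Return the DID already used by domid, or allocate the first free one. */
static int demo_get_did(uint16_t domid)
{
    int i;

    for (i = 0; i < DEMO_NR_DIDS; i++)
        if (demo_did_in_use[i] && demo_domid_map[i] == domid)
            return i;

    for (i = 0; i < DEMO_NR_DIDS; i++)
        if (!demo_did_in_use[i]) {
            demo_did_in_use[i] = true;
            demo_domid_map[i] = domid;
            return i;
        }

    return -1;
}
```

With `demo_reserve_did0(false)`, a lookup for domain 0 matches the zero-filled slot 0 and returns DID 0; with `demo_reserve_did0(true)`, the same lookup allocates DID 1 instead.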
Fixes: b9c20c78789f ("VT-d: per-iommu domain-id") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>