xenbits.xensource.com Git - xen.git/log
Edwin Török [Wed, 27 Jul 2022 10:57:10 +0000 (12:57 +0200)]
x86/msr: fix X2APIC_LAST

The latest Intel manual now says the X2APIC reserved range is only
0x800 to 0x8ff (NOT 0xbff).
This changed between SDM 68 (Nov 2018) and SDM 69 (Jan 2019).
The AMD manual documents 0x800-0x8ff too.

There are non-X2APIC MSRs in the 0x900-0xbff range now:
e.g. 0x981 is IA32_TME_CAPABILITY, an architectural MSR.
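To show why the upper bound matters, here is a minimal sketch that checks an MSR index against the old and new limits. The macro and function names are hypothetical (modelled on the subject line, not copied from Xen). The values come from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; values taken from the commit text. */
#define MSR_X2APIC_FIRST        0x800u
#define MSR_X2APIC_LAST_OLD     0xbffu /* old, over-wide end of the range */
#define MSR_X2APIC_LAST         0x8ffu /* end per SDM 69+ and the AMD manual */
#define MSR_IA32_TME_CAPABILITY 0x981u

/* True if msr lies in the x2APIC range [MSR_X2APIC_FIRST, last]. */
static bool in_x2apic_range(uint32_t msr, uint32_t last)
{
    return msr >= MSR_X2APIC_FIRST && msr <= last;
}
```

With the old 0xbff bound, 0x981 (IA32_TME_CAPABILITY) is wrongly treated as part of the x2APIC range. With 0x8ff it is not.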

The new MSR in this range appears to have been introduced in Icelake,
so this commit should be backported to Xen versions supporting Icelake.

Backport: 4.13+

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 26 Jul 2022 13:11:33 +0000 (14:11 +0100)]
x86/vpmu: Fix build following vmfork addition

GCC with IBT extensions complains:

  arch/x86/cpu/vpmu.c:351:15: error: conflicting types for 'vpmu_save_force'; have 'void(void *)' with implied 'nocf_check' attribute
    351 | void cf_check vpmu_save_force(void *arg)
        |               ^~~~~~~~~~~~~~~
  In file included from ./arch/x86/include/asm/domain.h:10,
                   from ./include/xen/domain.h:8,
                   from ./include/xen/sched.h:11,
                   from ./include/xen/event.h:12,
                   from arch/x86/cpu/vpmu.c:23:
  ./arch/x86/include/asm/vpmu.h:117:6: note: previous declaration of 'vpmu_save_force' with type 'void(void *)'
    117 | void vpmu_save_force(void *arg);
        |      ^~~~~~~~~~~~~~~

Adjust the declaration.
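A minimal sketch of the corrected pairing follows. It assumes the fix simply adds the annotation to the header declaration so that both sites agree. Here cf_check is stubbed as an empty macro so the fragment builds with any compiler. In an IBT-enabled build it would instead expand to a control-flow attribute. The function body is a placeholder.

```c
/* Stand-in: empty here, a compiler attribute in IBT-enabled builds. */
#define cf_check

/* vpmu.h: the declaration carries the same annotation as the definition. */
void cf_check vpmu_save_force(void *arg);

/* vpmu.c: definition with a placeholder body. */
void cf_check vpmu_save_force(void *arg)
{
    (void)arg; /* the real function forces a save of the vPMU context */
}
```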

Fixes: 755087eb9b10 ("xen/mem_sharing: support forks with active vPMU state")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 19 Jul 2022 20:37:43 +0000 (21:37 +0100)]
x86/pv: Inject #GP for implicit grant unmaps

This is a debug behaviour to identify buggy kernels.  Crashing the domain is
the most unhelpful thing to do, because it discards the relevant context.

Instead, inject #GP[0] like other permission errors in x86.  In particular,
this lets the kernel provide a backtrace which is more likely to be helpful to
a developer.

As a bugfix, this always injects #GP[0] to current, not l1e_owner.  It is not
l1e_owner's fault if dom0 using superpowers triggers an implicit unmap.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 26 Jul 2022 12:54:34 +0000 (14:54 +0200)]
x86/mm: correct TLB flush condition in _get_page_type()

When this logic was moved, it was moved across the point where nx is
updated to hold the new type for the page. IOW originally it was
equivalent to using x (and perhaps x would better have been used), but
now it isn't anymore. Switch to using x, which then brings things in
line again with the slightly earlier comment there (now) talking about
transitions _from_ writable.

I have to confess though that I cannot make a direct connection between
the reported observed behavior of guests leaving several pages around
with pending general references and the change here. Repeated testing,
nevertheless, confirms the reported issue is no longer there.

This is CVE-2022-33745 / XSA-408.

Reported-by: Charles Arnold <carnold@suse.com>
Fixes: 8cc5036bc385 ("x86/pv: Fix ABAC cmpxchg() race in _get_page_type()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Luca Fancellu [Tue, 26 Jul 2022 06:33:46 +0000 (08:33 +0200)]
common/memory: Fix ifdefs for ptdom_max_order

In common/memory.c, the #ifdef surrounding ptdom_max_order tests
HAS_PASSTHROUGH instead of CONFIG_HAS_PASSTHROUGH. Fix the problem
by using the correct macro.
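The wrong spelling goes unnoticed because Kconfig emits only CONFIG_-prefixed macros. A test on the bare name is therefore always false, and the guarded code is silently compiled out. Here is a self-contained sketch; the functions are illustrative only.

```c
/* What the Kconfig-generated configuration header would provide. */
#define CONFIG_HAS_PASSTHROUGH 1

static int passthrough_block_compiled_buggy(void)
{
#ifdef HAS_PASSTHROUGH          /* never defined: block silently dropped */
    return 1;
#else
    return 0;
#endif
}

static int passthrough_block_compiled_fixed(void)
{
#ifdef CONFIG_HAS_PASSTHROUGH   /* the name Kconfig actually defines */
    return 1;
#else
    return 0;
#endif
}
```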

Fixes: e0d44c1f9461 ("build: convert HAS_PASSTHROUGH use to Kconfig")
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 26 Jul 2022 06:33:10 +0000 (08:33 +0200)]
page-alloc: fix initialization of cross-node regions

Quite obviously, to determine the split condition, the attributes of
successive pages need to be evaluated, not always those of the initial page.
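Here is a hypothetical sketch of the corrected split logic. Pages [0, n) are divided into runs that stay on one NUMA node, and a plain array stands in for the real per-page node lookup. Each page is compared with its predecessor. In this model, comparing every page against the initial page would produce wrong boundaries.

```c
#include <stddef.h>

/*
 * Split pages [0, n) into same-node runs. Returns the number of runs and
 * stores each run's first index in 'starts'. node_of[i] is a hypothetical
 * stand-in for looking up the node of page i.
 */
static size_t split_by_node(const int *node_of, size_t n, size_t *starts)
{
    size_t runs = 0;

    for (size_t i = 0; i < n; i++) {
        /* Evaluate the current page against the previous one. */
        if (i == 0 || node_of[i] != node_of[i - 1])
            starts[runs++] = i;
    }
    return runs;
}
```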

Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Mon, 25 Jul 2022 13:46:21 +0000 (15:46 +0200)]
include: correct re-building conditions around hypercall-defs.h

For a .cmd file to be picked up, the respective target needs to be
listed in $(targets). This wasn't the case for hypercall-defs.i, so it
was rebuilt every time, even on an entirely unchanged tree (because the
command appeared to have changed).

In return, the target no longer needs to be listed in $(clean-files).

Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Mon, 25 Jul 2022 13:45:31 +0000 (15:45 +0200)]
Arm32: restore proper name of .dtb section start symbol

This addresses a build failure when CONFIG_DTB_FILE evaluates to a non-
empty string.

Fixes: d07358f2dccd ("xen/arm32: head.S: Introduce a macro to load the physical address of a symbol")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Mon, 25 Jul 2022 13:44:33 +0000 (15:44 +0200)]
xen/mem_sharing: support forks with active vPMU state

Currently the vPMU state from a parent isn't copied to VM forks. To enable the
vPMU state to be copied to a fork VM, we export certain vPMU functions. First,
the vPMU context needs to be allocated for the fork if the parent has one. For
this we introduce vpmu->allocate_context; previously, the context was only
allocated when the guest enabled the PMU on itself. Furthermore, we export
vpmu_save_force so that the PMU context can be saved on-demand even if no
context switch took place on the parent's CPU yet. Additionally, we make sure
all relevant configuration MSRs are saved in the vPMU context so the copy is
complete and the fork starts with the same PMU config as the parent.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Oleksandr Tyshchenko [Mon, 25 Jul 2022 13:44:17 +0000 (15:44 +0200)]
golang/xenlight: Update generated code

Re-generate golang bindings to reflect changes to libxl_types.idl
from the following commit:
54d8f27d0477 tools/libxl: report trusted backend status to frontends

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Mon, 25 Jul 2022 13:43:35 +0000 (15:43 +0200)]
VT-d: fold dma_pte_clear_one() into its only caller

This way intel_iommu_unmap_page() ends up quite a bit more similar to
intel_iommu_map_page().

No functional change intended.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:42:33 +0000 (15:42 +0200)]
IOMMU/x86: add perf counters for page table splitting / coalescing

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:41:48 +0000 (15:41 +0200)]
VT-d: replace all-contiguous page tables by superpage mappings

When a page table ends up with all contiguous entries (including all
identical attributes), it can be replaced by a superpage entry at the
next higher level. The page table itself can then be scheduled for
freeing.

The adjustment to LEVEL_MASK is merely to avoid leaving a latent trap
for whenever we (and obviously hardware) start supporting 512G mappings.

Note that cache sync-ing is likely more strict than necessary. This is
both to be on the safe side and to maintain the pattern of all
updates of (potentially) live tables being accompanied by a flush (if so
needed).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:41:12 +0000 (15:41 +0200)]
AMD/IOMMU: replace all-contiguous page tables by superpage mappings

When a page table ends up with all contiguous entries (including all
identical attributes), it can be replaced by a superpage entry at the
next higher level. The page table itself can then be scheduled for
freeing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:40:41 +0000 (15:40 +0200)]
VT-d: free all-empty page tables

When a page table ends up with no present entries left, it can be
replaced by a non-present entry at the next higher level. The page table
itself can then be scheduled for freeing.

Note that while its output isn't used there yet,
pt_update_contig_markers() already needs to be called in all places
where entries get updated, not just the one where entries get cleared.

Note further that while pt_update_contig_markers() may update
several PTEs within the table, these are changes to "avail" bits
only, so I do not think cache flushing is needed afterwards. Such
cache flushing (of entire pages, unless yet more logic is added to be
more selective) would be quite noticeable performance-wise (very
prominent during Dom0 boot).

Also note that cache sync-ing is likely more strict than necessary. This
is both to be on the safe side and to maintain the pattern of all
updates of (potentially) live tables being accompanied by a flush (if so
needed).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:40:00 +0000 (15:40 +0200)]
AMD/IOMMU: free all-empty page tables

When a page table ends up with no present entries left, it can be
replaced by a non-present entry at the next higher level. The page table
itself can then be scheduled for freeing.

Note that while its output isn't used there yet,
pt_update_contig_markers() already needs to be called in all places
where entries get updated, not just the one where entries get cleared.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:38:22 +0000 (15:38 +0200)]
IOMMU/x86: prefill newly allocate page tables

Page tables are used for two purposes after allocation: They either
start out all empty, or they are filled to replace a superpage.
Subsequently, to replace all empty or fully contiguous page tables,
contiguous sub-regions will be recorded within individual page tables.
Install the initial set of markers immediately after allocation. Make
sure to retain these markers when further populating a page table in
preparation for it to replace a superpage.

The markers are simply 4-bit fields holding the order value of
contiguous entries. To demonstrate this, if a page table had just 16
entries, this would be the initial (fully contiguous) set of markers:

index  0 1 2 3 4 5 6 7 8 9 A B C D E F
marker 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0

"Contiguous" here means not only present entries with successively
increasing MFNs, each one suitably aligned for its slot, and identical
attributes, but also a respective number of all non-present (zero except
for the markers) entries.
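The initial marker pattern for a fully contiguous table can be generated mechanically. One way to read the example table is that slot 0 holds the order of the whole table, and every other slot i holds the number of trailing zero bits of i. That is the order of the largest naturally aligned block starting at that slot. The sketch below reproduces the 16-entry table. The helper name is hypothetical, and it assumes GCC/Clang for __builtin_ctz.

```c
#include <stdint.h>

/* Fill markers for a fully contiguous table of 2^order entries. */
static void init_contig_markers(uint8_t *markers, unsigned int order)
{
    unsigned int n = 1u << order;

    markers[0] = order;                 /* the whole table is one block */
    for (unsigned int i = 1; i < n; i++)
        markers[i] = __builtin_ctz(i);  /* largest aligned block at slot i */
}
```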

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:37:34 +0000 (15:37 +0200)]
x86: introduce helper for recording degree of contiguity in page tables

This is a re-usable helper (kind of a template) which gets introduced
without users so that the individual subsequent patches introducing such
users can get committed independently of one another.

See the comment at the top of the new file. To demonstrate the effect,
if a page table had just 16 entries, this would be the set of markers
for a page table with fully contiguous mappings:

index  0 1 2 3 4 5 6 7 8 9 A B C D E F
marker 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0

"Contiguous" here means not only present entries with successively
increasing MFNs, each one suitably aligned for its slot, but also a
respective number of all non-present entries.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:36:33 +0000 (15:36 +0200)]
VT-d: allow use of superpage mappings

... depending on feature availability (and absence of quirks).

Also make the page table dumping function aware of superpages.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:35:40 +0000 (15:35 +0200)]
AMD/IOMMU: allow use of superpage mappings

No separate feature flags exist which would control availability of
these; the only restriction is HATS (establishing the maximum number of
page table levels in general), and even that has a lower bound of 4.
Thus we can unconditionally announce 2M and 1G mappings. (Via non-
default page sizes the implementation in principle permits arbitrary
size mappings, but these require multiple identical leaf PTEs to be
written, which isn't all that different from having to write multiple
consecutive PTEs with increasing frame numbers. IMO that's therefore
beneficial only on hardware where suitable TLBs exist; I'm unaware of
such hardware.)

Note that in principle 512G and 256T mappings could also be supported
right away, but the freeing of page tables (to be introduced in
subsequent patches) when replacing a sufficiently populated tree with a
single huge page would need suitable preemption, which will require
extra work.
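The sizes mentioned above follow from 4K base pages and 512 entries per table, so each level translates 9 more address bits. Here is a small sketch; the function name is hypothetical.

```c
#include <stdint.h>

/* Bytes covered by one leaf entry at 'level' (1 = 4K PTE, level >= 1),
 * assuming 4K pages and 512 (2^9) entries per table. */
static uint64_t leaf_size(unsigned int level)
{
    return UINT64_C(1) << (12 + 9 * (level - 1));
}
```

Levels 2 and 3 give the 2M and 1G mappings announced unconditionally. Levels 4 and 5 give the 512G and 256T sizes deferred until preemptible page-table freeing exists.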

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:34:55 +0000 (15:34 +0200)]
IOMMU/x86: new command line option to suppress use of superpage mappings

Before actually enabling their use, provide a means to suppress it in
case of problems. Note that using the option can also affect the sharing
of page tables in the VT-d / EPT combination: If EPT would use large
page mappings but the option is in effect, page table sharing would be
suppressed (to properly fulfill the admin request).

Requested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:33:34 +0000 (15:33 +0200)]
IOMMU/x86: support freeing of pagetables

For vendor specific code to support superpages we need to be able to
deal with a superpage mapping replacing an intermediate page table (or
hierarchy thereof). Consequently an iommu_alloc_pgtable() counterpart is
needed to free individual page tables while a domain is still alive.
Since the freeing needs to be deferred until after a suitable IOTLB
flush was performed, released page tables get queued for processing by a
tasklet.
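The deferral pattern can be sketched as a simple queue. Page tables released while the domain is alive are queued, and a later pass frees them only once the IOTLB flush has happened. This is a single-threaded, hypothetical illustration. All names are invented, and in Xen the later pass is the tasklet described above.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a released page table awaiting freeing. */
struct pt_node {
    struct pt_node *next;
};

static struct pt_node *free_queue;  /* released but not yet freed */
static bool iotlb_flushed;          /* set once a suitable flush completed */

/* Queue instead of freeing: the IOMMU may still hold cached translations
 * that reference this table until the IOTLB has been flushed. */
static void queue_free_pgtable(struct pt_node *pt)
{
    pt->next = free_queue;
    free_queue = pt;
}

/* Deferred worker: frees queued tables only after the flush and returns
 * how many were freed. */
static unsigned int process_free_queue(void)
{
    unsigned int freed = 0;

    if (!iotlb_flushed)
        return 0;

    while (free_queue) {
        struct pt_node *pt = free_queue;

        free_queue = pt->next;
        free(pt);
        freed++;
    }
    return freed;
}
```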

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:32:59 +0000 (15:32 +0200)]
IOMMU/x86: perform PV Dom0 mappings in batches

For large page mappings to be easily usable (i.e. in particular without
un-shattering of smaller page mappings) and for mapping operations to
then also be more efficient, pass batches of Dom0 memory to iommu_map().
In dom0_construct_pv() and its helpers (covering strict mode) this
additionally requires establishing the type of those pages (albeit with
zero type references).

The earlier establishing of PGT_writable_page | PGT_validated requires
the existing places where this gets done (through get_page_and_type())
to be updated: For pages which actually have a mapping, the type
refcount needs to be 1.

There is actually a related bug that gets fixed here as a side effect:
Typically the last L1 table would get marked as such only after
get_page_and_type(..., PGT_writable_page). While this is fine as far as
refcounting goes, the page did remain mapped in the IOMMU in this case
(when "iommu=dom0-strict").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Roger Pau Monné [Mon, 25 Jul 2022 13:31:41 +0000 (15:31 +0200)]
iommu: add preemption support to iommu_{un,}map()

The loop in iommu_{,un}map() can be arbitrarily large, and as such it
needs to handle preemption.  Introduce a new flag that signals whether
the function should do preemption checks, returning the number of pages
that have been processed in case a need for preemption was actually
found.
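This hypothetical sketch shows the calling convention described above. With the preemption flag set, the loop checks for preemption between pages. If preemption is needed, it returns the number of pages processed so far, so that the caller can resume from there. The flag name, callback types and return convention are invented for illustration. They do not reproduce the real iommu_map() signature.

```c
#include <stdbool.h>

#define MAP_FLAG_PREEMPTIBLE 0x1u   /* hypothetical flag name */

/*
 * Map 'count' pages starting at 'first' via map_one(). Returns 0 once all
 * pages are done, map_one()'s (negative) error on failure, or the positive
 * number of pages processed when stopping early for preemption.
 */
static long map_range(unsigned long first, unsigned long count,
                      unsigned int flags, int (*map_one)(unsigned long),
                      bool (*preempt_check)(void))
{
    for (unsigned long i = 0; i < count; i++) {
        int rc = map_one(first + i);

        if (rc)
            return rc;

        /* Check only after making progress, so every call advances, and
         * not after the last page, where nothing is left to defer. */
        if ((flags & MAP_FLAG_PREEMPTIBLE) && i + 1 < count &&
            preempt_check())
            return (long)(i + 1);
    }
    return 0;
}

/* Demo callbacks: count mapped pages; request preemption every 4 pages. */
static unsigned long mapped;

static int demo_map_one(unsigned long idx)
{
    (void)idx;
    mapped++;
    return 0;
}

static bool demo_preempt_every_4(void)
{
    return mapped % 4 == 0;
}
```

A caller resumes by calling again with first + done and count - done until 0 is returned. If it abandons the operation instead, the earlier calls' mappings are its responsibility, as noted below.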

Note that the cleanup done in iommu_map() can now be incomplete if
preemption has happened, and hence callers would need to take care of
unmapping the whole range (i.e. including ranges already mapped by previously
preempted calls).  So far none of the callers care about having those
ranges unmapped, so error handling in arch_iommu_hwdom_init() can be
kept as-is.

Note that iommu_legacy_{un,}map() are left without preemption handling:
callers of those interfaces aren't going to be modified to pass bigger
chunks, and hence the functions won't be modified either; they are
legacy, and uses should be replaced with iommu_{un,}map() instead if
preemption is required.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Anthony PERARD [Thu, 21 Jul 2022 12:46:02 +0000 (13:46 +0100)]
automation: use "needs" instead of "dependencies" for test jobs

As with "dependencies", the jobs get artifacts from the jobs listed in
"needs". But the test jobs can now start as soon as the listed build
jobs have finished, rather than waiting for the whole previous stage.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Thu, 21 Jul 2022 12:46:01 +0000 (13:46 +0100)]
automation: only run test artifact jobs when needed

Share the same "except" as the one used for tests.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Thu, 21 Jul 2022 12:46:00 +0000 (13:46 +0100)]
automation: add a templates for test jobs

Allow common configuration for all test jobs to be set from a single
place.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Thu, 21 Jul 2022 12:45:59 +0000 (13:45 +0100)]
automation: fix typo in .gcc-tmpl

The name of the field doesn't matter because it's used as a YAML anchor,
but it's nicer to have the proper spelling.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Wed, 20 Jul 2022 18:33:01 +0000 (19:33 +0100)]
xen/arm: mm: Add more ASSERT() in {destroy, modify}_xen_mappings()

Both destroy_xen_mappings() and modify_xen_mappings() take as a
parameter a range [start, end[. Both ends should be page aligned.

Add extra ASSERT()s to ensure start and end are page aligned. Take the
opportunity to rename 'v' to 's' to be consistent with the other helper.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Hongyan Xia [Wed, 24 Feb 2021 18:43:13 +0000 (18:43 +0000)]
xen/heap: pass order to free_heap_pages() in heap init

The idea is to split the range into multiple aligned power-of-2 regions,
each of which needs only one call to free_heap_pages(). We check the least
significant set bit of the start address and use its bit index as the
order of this increment. This makes sure that each increment is both
power-of-2 and properly aligned, which can be safely passed to
free_heap_pages(). Of course, the order also needs to be sanity checked
against the upper bound and MAX_ORDER.
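Here is a sketch of the order computation just described, in units of pages. Each chunk's order is the index of the least significant set bit of its start. That order is capped by the size of what remains and by MAX_ORDER. The MAX_ORDER value and helper names are illustrative assumptions, and GCC/Clang builtins are assumed.

```c
#define MAX_ORDER 18u   /* illustrative value only */

/* Order of the next chunk when freeing pages [s, e); requires s < e. */
static unsigned int next_chunk_order(unsigned long s, unsigned long e)
{
    /* Upper bound: largest order that still fits, floor(log2(e - s)). */
    unsigned int order = (unsigned int)(sizeof(unsigned long) * 8 - 1)
                         - (unsigned int)__builtin_clzl(e - s);

    /* Alignment: bit index of the least significant set bit of s
     * (s == 0 is aligned to every order). */
    if (s != 0 && (unsigned int)__builtin_ctzl(s) < order)
        order = (unsigned int)__builtin_ctzl(s);

    return order < MAX_ORDER ? order : MAX_ORDER;
}

/* Number of free_heap_pages() calls needed to cover [s, e). */
static unsigned int count_chunks(unsigned long s, unsigned long e)
{
    unsigned int calls = 0;

    while (s < e) {
        s += 1ul << next_chunk_order(s, e);
        calls++;
    }
    return calls;
}
```

For pages 0x3 to 0xf, this gives three calls (orders 0, 2 and 3) instead of thirteen page-by-page calls.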

Tested in a nested environment on c5.metal with various amounts
of RAM and CONFIG_DEBUG=n. Time for end_boot_allocator() to complete:
            Before         After
    - 90GB: 1445 ms         96 ms
    -  8GB:  126 ms          8 ms
    -  4GB:   62 ms          4 ms

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Wed, 20 Jul 2022 18:25:16 +0000 (19:25 +0100)]
xen/heap: Split init_heap_pages() in two

At the moment, init_heap_pages() will call free_heap_pages() page
by page. To reduce the time to initialize the heap, we will want
to provide multiple pages at the same time.

init_heap_pages() is now split in two parts:
    - init_heap_pages(): will break down the range into multiple sets
      of contiguous pages. For now, the criterion is that the pages
      should belong to the same NUMA node.
    - _init_heap_pages(): will initialize a set of pages belonging to
      the same NUMA node. In a follow-up patch, new requirements will
      be added (e.g. pages should belong to the same zone). For now the
      pages are still passed one by one to free_heap_pages().

Note that the comment before init_heap_pages() is heavily outdated and
does not reflect the current code. So update it.

This patch is a merge/rework of patches from David Woodhouse and
Hongyan Xia.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Wed, 20 Jul 2022 18:22:34 +0000 (19:22 +0100)]
xen: page_alloc: Don't open-code IS_ALIGNED()

init_heap_pages() is using an open-coded version of IS_ALIGNED(). Replace
it to improve the readability of the code.
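For reference, here is a sketch of the idiom. IS_ALIGNED() is typically defined along the following lines; this is an assumption, not Xen's verbatim definition. The alignment must be a power of two.

```c
/* Non-zero if val is a multiple of align (align must be a power of 2). */
#define IS_ALIGNED(val, align) (!((val) & ((align) - 1)))

/* The same test written out by hand, as the open-coded form does. */
static int aligned_open_coded(unsigned long val, unsigned long align)
{
    return (val & (align - 1)) == 0;
}
```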

No functional change intended.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Oleksandr Tyshchenko [Sat, 16 Jul 2022 14:56:58 +0000 (17:56 +0300)]
xen/gnttab: Store frame GFN in struct page_info on Arm

Rework Arm implementation to store grant table frame GFN
in struct page_info directly instead of keeping it in
standalone status/shared arrays. This patch is based on
the assumption that a grant table page is a xenheap page.

To cover 64-bit/40-bit IPA on Arm64/Arm32 we need the space
to hold a 52-bit/28-bit value plus an extra bit, respectively.
In order not to grow the size of struct page_info, borrow the
required number of bits from type_info's count portion, which
the current context can spare (currently only 1 bit is used on
Arm). Please note, to minimize code changes and avoid introducing
extra #ifdef-s in the header, we keep the same number of bits on
both subarches, although the count portion on Arm64 could be
wider, so we waste some bits here.

Introduce corresponding PGT_* constructs and the access macros
page_get_xenheap_gfn()/page_set_xenheap_gfn(). Please note, all
accesses to the GFN portion of the type_info field should always be
protected by the P2M lock. Where it is not feasible to satisfy
that requirement (risk of deadlock, lock inversion, etc.),
it is important to make sure that all non-protected updates
to this field are atomic.
As several non-protected read accesses still exist within
current code (most calls to page_get_xenheap_gfn() are not
protected by the P2M lock), hardening code in p2m_remove_mapping(),
which runs with the P2M lock held, checks for any difference between
what is already mapped and what is requested to be unmapped.
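Here is a hypothetical sketch of the packing idea. The GFN occupies the upper bits of a 64-bit type_info word, above a narrowed count portion, and an all-ones field (which the extra bit makes distinguishable) stands for "no GFN". The field widths and macro names are illustrative, not the actual PGT_* layout.

```c
#include <stdint.h>

/* Illustrative layout: bits 0..10 keep the narrowed count portion,
 * the top GFN_WIDTH bits hold the GFN. */
#define GFN_WIDTH         53                     /* 52-bit GFN + extra bit */
#define GFN_SHIFT         (64 - GFN_WIDTH)
#define GFN_MASK          (((UINT64_C(1) << GFN_WIDTH) - 1) << GFN_SHIFT)
#define INVALID_GFN_FIELD ((UINT64_C(1) << GFN_WIDTH) - 1)  /* all ones */

static uint64_t get_gfn(uint64_t type_info)
{
    return (type_info & GFN_MASK) >> GFN_SHIFT;
}

/* Returns the updated word and leaves the count bits untouched. Real code
 * must store it under the P2M lock or atomically, as required above. */
static uint64_t set_gfn(uint64_t type_info, uint64_t gfn)
{
    return (type_info & ~GFN_MASK) | ((gfn << GFN_SHIFT) & GFN_MASK);
}
```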

Update existing gnttab macros to deal with GFN value according
to new location. Also update the use of count portion of type_info
field on Arm in share_xen_page_with_guest().

While at it, extend this simplified M2P-like approach to any
xenheap pages which are processed in xenmem_add_to_physmap_one()
except foreign ones. Update the code to set the GFN portion after
establishing a new mapping for the xenheap page in said function
and to clear the GFN portion when putting a reference on that page
in p2m_put_l3_page().

And for everything to work correctly introduce arch-specific
initialization pattern PGT_TYPE_INFO_INITIALIZER to be applied
to type_info field during initialization at alloc_heap_pages()
and acquire_staticmem_pages(). The pattern's purpose on Arm
is to clear the GFN portion before use, on x86 it is just
a stub.

This patch is intended to fix a potential issue on Arm
which might happen when remapping a grant-table frame.
A guest (or the toolstack) will unmap the grant-table frame
using XENMEM_remove_physmap. This is a generic hypercall,
so on x86, we are relying on the fact the M2P entry will
be cleared on removal. For architectures without an M2P,
the GFN would still be present in the grant frame/status
array. So on the next call to map the page, we will end up
requesting the P2M to remove whatever mapping was at the given
GFN. This could well be another mapping.

Please note, this patch also changes how the shared_info
page (which is a xenheap RAM page) is mapped in xenmem_add_to_physmap_one().
Now, the shared_info page may only be mapped once. Subsequent
attempts to map it will result in -EBUSY. This mandates that
the caller first unmaps the page before mapping it again. This is
to prevent Xen from creating an unwanted hole in the P2M. For instance,
this could happen if the firmware stole a RAM address for mapping
the shared_info page into but forgot to unmap it afterwards.

Besides that, this patch simplifies arch code on Arm by
removing the arrays and the corresponding management code; as a
result, the gnttab_init_arch/gnttab_destroy_arch helpers
and struct grant_table_arch become useless and can be
dropped globally.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Oleksandr Tyshchenko [Sat, 16 Jul 2022 14:56:57 +0000 (17:56 +0300)]
xen/arm: Harden the P2M code in p2m_remove_mapping()

Borrow the x86's check from p2m_remove_page() which was added
by the following commit: c65ea16dbcafbe4fe21693b18f8c2a3c5d14600e
"x86/p2m: don't assert that the passed in MFN matches for a remove"
and adjust it to the Arm code base.

Basically, this check will be strictly needed for xenheap pages
once a subsequent commit introduces the xenheap-based M2P approach
on Arm. But it is also a good opportunity to harden the P2M code for
*every* RAM page, since it is currently possible to remove any
GFN - MFN mapping on Arm (even with the wrong helpers).
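The check amounts to refusing to remove a GFN mapping unless it still points at the MFN the caller expects. Below is a hypothetical toy model. A flat array stands in for the P2M, there is no locking, and the error values are chosen for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define NR_GFNS     16
#define INVALID_MFN UINT64_MAX

/* Toy P2M: gfn -> mfn, INVALID_MFN when unmapped. */
static uint64_t p2m[NR_GFNS];

static void p2m_init(void)
{
    for (unsigned int i = 0; i < NR_GFNS; i++)
        p2m[i] = INVALID_MFN;
}

/* Remove gfn -> mfn only if that is what is currently mapped. The real
 * code performs this comparison with the P2M lock held. */
static int remove_mapping(uint64_t gfn, uint64_t mfn)
{
    if (gfn >= NR_GFNS)
        return -EINVAL;
    if (p2m[gfn] != mfn)
        return -EILSEQ;     /* stale or wrong request: keep the mapping */
    p2m[gfn] = INVALID_MFN;
    return 0;
}
```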

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Stefano Stabellini [Thu, 5 May 2022 00:16:56 +0000 (17:16 -0700)]
docs: document dom0less + PV drivers

Document how to use the feature and how the implementation works.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Wed, 20 Jul 2022 13:48:49 +0000 (15:48 +0200)]
x86: also suppress use of MMX insns

Passing -mno-sse alone is not enough: The compiler may still find
(questionable) reasons to use MMX insns. In particular with gcc12 use
of MOVD+PUNPCKLDQ+MOVQ was observed in an apparent attempt to auto-
vectorize the storing of two adjacent zeroes, 32 bits each.

Reported-by: ChrisD <chris@dalessio.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 20 Jul 2022 13:46:48 +0000 (15:46 +0200)]
x86emul: add memory operand low bits checks for ENQCMD{,S}

Already ISE rev 044 added text to this effect; rev 045 further dropped
leftover earlier text indicating the contrary:
- ENQCMD requires the low 32 bits of the memory operand to be clear,
- ENQCMDS requires bits 20...30 of the memory operand to be clear.
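Expressed as checks on the low 32 bits of the memory operand, the two rules read as follows. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 20..30 inclusive: 11 bits starting at bit 20 (0x7ff00000). */
#define ENQCMDS_RSVD_MASK (((UINT32_C(1) << 11) - 1) << 20)

/* ENQCMD: the low 32 bits of the memory operand must be clear. */
static bool enqcmd_operand_ok(uint32_t low32)
{
    return low32 == 0;
}

/* ENQCMDS: bits 20..30 of the memory operand must be clear. */
static bool enqcmds_operand_ok(uint32_t low32)
{
    return (low32 & ENQCMDS_RSVD_MASK) == 0;
}
```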

Fixes: d27385968741 ("x86emul: support ENQCMD insns")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 18 Jul 2022 13:15:08 +0000 (14:15 +0100)]
x86/spec-ctrl: Make svm_vmexit_spec_ctrl conditional

The logic was written this way out of an abundance of caution, but the reality
is that AMD parts don't currently have the RAS-flushing side effect, nor do
they intend to gain it.

This removes one WRMSR from the VMExit path by default on Zen2 systems.

Fixes: 614cec7d79d7 ("x86/svm: VMEntry/Exit logic for MSR_SPEC_CTRL")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 30 Jun 2022 21:15:25 +0000 (22:15 +0100)]
x86/spec-ctrl: Consistently halt speculation using int3

The RSB stuffing loop and retpoline thunks date from the very beginning, when
halting speculation was a brand new field.

These days, we've largely settled on int3 for halting speculation in
non-architectural paths.  It's a single byte, and is fully serialising - a
requirement for delivering #BP if it were to execute.

Update the thunks, mostly for consistency across the codebase, though it
also shrinks every entry path in Xen by 6 bytes, which is a marginal win.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agotools/xl: use sparse init for dom_info, remove duplicate vars
Elliott Mitchell [Tue, 19 Jul 2022 09:18:55 +0000 (11:18 +0200)]
tools/xl: use sparse init for dom_info, remove duplicate vars

Rather than having shadow variables for every element of dom_info, it is
better to properly initialize dom_info at the start.  This also removes
the misleading memset() in the middle of main_create().

Remove the dryrun element of domain_create as that has been displaced
by the global "dryrun_only" variable.
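A minimal sketch of the idea, using a hypothetical cut-down struct rather than xl's real struct domain_create (the field names are illustrative only). A C designated ("sparse") initialiser zeroes every member it doesn't name, so one declaration replaces both per-field shadow variables and a later memset().

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for xl's struct domain_create. */
struct dom_info_sketch {
    int debug;
    int daemonize;
    int monitor;
    int paused;
    int quiet;
    const char *config_file;
};

static struct dom_info_sketch default_dom_info(void)
{
    /*
     * Only the non-zero defaults are listed; every member not named here
     * is zero-initialised (NULL for pointers).
     */
    struct dom_info_sketch dom_info = {
        .daemonize = 1,
        .monitor = 1,
    };

    return dom_info;
}
```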

Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
2 years agoRevert "tools/xenstore: add documentation for ..."
Jan Beulich [Tue, 19 Jul 2022 09:16:02 +0000 (11:16 +0200)]
Revert "tools/xenstore: add documentation for ..."

This reverts commits 3db29e8fac3f874aba0198f398e8eeaad9a091b8,
6574f387791f18c85c64399ed83b4391adcb4881, and
1a564e4b3b4fcb9a49fa09ade689ba38c0a890e8. They were committed
by mistake (newer version pending).

2 years agox86: deal with gcc12 release build issues
Jan Beulich [Tue, 19 Jul 2022 06:37:29 +0000 (08:37 +0200)]
x86: deal with gcc12 release build issues

While a number of issues we previously had with pre-release gcc12 were
fixed in the final release, we continue to have one issue (with multiple
instances) when doing release builds (i.e. at higher optimization
levels): The compiler takes issue with subtracting (always 1 in our
case) from artificial labels (expressed as arrays) marking the end of
certain regions. This isn't an unreasonable position to take. Simply
hide the "array-ness" by casting to an integer type. To keep things
looking consistent, apply the same cast also to the respective
expressions dealing with the starting addresses. (Note how
efi_arch_memory_setup()'s l2_table_offset() invocations avoid a similar
issue by already having the necessary casts.) In is_xen_fixed_mfn()
further switch from __pa() to virt_to_maddr() to better match the left
sides of the <= operators.
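The kind of change described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, and in Xen the boundaries are linker-provided symbols declared as arrays (something like `extern char region_end[];`, a hypothetical name), whereas here ordinary pointers stand in for them. Doing the subtraction on an integer instead of on the array pointer is the point being shown.

```c
#include <stdint.h>

/* Start of a region, cast to an integer for consistency with the end. */
static uintptr_t region_first(const char *start)
{
    return (uintptr_t)start;
}

/*
 * Last byte of a region: subtract 1 after the cast, so no pointer ever
 * points before an array object and the compiler has nothing to object to.
 */
static uintptr_t region_last(const char *end)
{
    return (uintptr_t)end - 1;
}
```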

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/spec-ctrl: correct per-guest-type reporting of MD_CLEAR
Jan Beulich [Tue, 19 Jul 2022 06:36:53 +0000 (08:36 +0200)]
x86/spec-ctrl: correct per-guest-type reporting of MD_CLEAR

There are command line controls for this and the default also isn't "always
enable when hardware supports it", which logging should take into account.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86: log non-responding CPUs in fatal_trap()
Jan Beulich [Tue, 19 Jul 2022 06:36:10 +0000 (08:36 +0200)]
x86: log non-responding CPUs in fatal_trap()

This eases recognizing that something odd is going on.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agosched/credit: fix MISRA C 2012 Rule 8.7 violation
Xenia Ragiadakou [Mon, 18 Jul 2022 15:56:41 +0000 (17:56 +0200)]
sched/credit: fix MISRA C 2012 Rule 8.7 violation

The per-cpu variable last_tickle_cpu is referenced only in credit.c.
Change its linkage from external to internal by adding the storage-class
specifier static to its definition.

Also, this patch aims to indirectly resolve a MISRA C 2012 Rule 8.4
violation warning.

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agovm_event: fix MISRA C 2012 Rule 8.7 violation
Xenia Ragiadakou [Mon, 18 Jul 2022 15:55:42 +0000 (17:55 +0200)]
vm_event: fix MISRA C 2012 Rule 8.7 violation

The function vm_event_wake() is referenced only in vm_event.c.
Change the linkage of the function from external to internal by adding
the storage-class specifier static to the function definition.

Also, this patch aims to indirectly resolve a MISRA C 2012 Rule 8.4
violation warning.
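A small sketch of the linkage change, with hypothetical names. Rule 8.4 requires a compatible declaration to be visible wherever an object or function with external linkage is defined. Once the definition is `static`, that requirement no longer applies, which is presumably the "indirect" resolution meant above.

```c
/*
 * File-scope state and helpers with internal linkage ("static"): nothing
 * outside this translation unit can reference them, as Rule 8.7 asks for
 * when only this file uses them.  Names are hypothetical.
 */
static unsigned int wake_count;

static void vm_event_wake_sketch(void)
{
    wake_count++;                    /* stand-in for waking waiting vCPUs */
}

static unsigned int notify_twice_sketch(void)
{
    vm_event_wake_sketch();
    vm_event_wake_sketch();
    return wake_count;
}
```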

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoEFI: strip xen.efi when putting it on the EFI partition
Jan Beulich [Mon, 18 Jul 2022 15:48:40 +0000 (17:48 +0200)]
EFI: strip xen.efi when putting it on the EFI partition

With debug info retained, xen.efi can be quite large. Unlike for xen.gz
there's no intermediate step (mkelf32 there) involved which would strip
debug info kind of as a side effect. While installing xen.efi on
the EFI partition is an optional step (intended as a courtesy to the
developer), adjust it anyway, also for the purpose of documenting what distros
would be expected to do during boot loader configuration (which is what
would normally put xen.efi into the EFI partition).

Model the control over stripping on Linux's module installation,
except that the stripped executable is constructed in the build area
instead of in the destination location. This is to conserve space
there - EFI partitions tend to be only a few hundred MB in size.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Henry Wang <Henry.Wang@arm.com>
Tested-by: Wei Chen <Wei.Chen@arm.com> # arm
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
2 years agoxl: move freemem()'s "credit expired" loop exit
Jan Beulich [Mon, 18 Jul 2022 15:48:18 +0000 (17:48 +0200)]
xl: move freemem()'s "credit expired" loop exit

Move the "credit expired" loop exit to the middle of the loop,
immediately after "return true". This way having reached the goal on the
last iteration would be reported as success to the caller, rather than
as "timed out".
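A hypothetical simulation of the loop ordering described above. xl's real freemem() talks to libxl; here each iteration just frees a fixed amount and consumes a fixed amount of credit. The goal check runs before the "credit expired" check, so reaching the goal on the last iteration still reports success.

```c
#include <stdbool.h>

/* Hypothetical simulation state, not xl's data structures. */
struct freemem_sim {
    long free_kb;   /* memory currently free */
    long need_kb;   /* goal */
    long credit;    /* remaining budget */
    long gain_kb;   /* freed per iteration */
    long cost;      /* budget consumed per iteration (must be > 0) */
};

static bool freemem_sketch(struct freemem_sim *s)
{
    for (;;) {
        if (s->free_kb >= s->need_kb)
            return true;            /* goal check comes first ... */
        if (s->credit <= 0)
            return false;           /* ... so expiry can't mask a late success */
        s->free_kb += s->gain_kb;   /* stand-in for ballooning and waiting */
        s->credit -= s->cost;
    }
}
```

With the expiry check at the bottom of the loop instead, the first test case below would report a timeout even though the goal was met.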

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
2 years agotools/xenstore: add documentation for extended watch command
Juergen Gross [Mon, 18 Jul 2022 15:47:23 +0000 (17:47 +0200)]
tools/xenstore: add documentation for extended watch command

Add documentation for an extension of the WATCH command used to limit
the scope of watched paths. Additionally, it makes it possible to receive
more information in the events related to special watches (@introduceDomain
or @releaseDomain).

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
2 years agotools/xenstore: add documentation for new set/get-quota commands
Juergen Gross [Mon, 18 Jul 2022 15:47:04 +0000 (17:47 +0200)]
tools/xenstore: add documentation for new set/get-quota commands

Add documentation for two new Xenstore wire commands SET_QUOTA and
GET_QUOTA used to set or query the Xenstore quota of a given domain.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
2 years agotools/xenstore: add documentation for new set/get-feature commands
Juergen Gross [Mon, 18 Jul 2022 15:46:47 +0000 (17:46 +0200)]
tools/xenstore: add documentation for new set/get-feature commands

Add documentation for two new Xenstore wire commands SET_FEATURE and
GET_FEATURE used to set or query the Xenstore features visible in the
ring page of a given domain.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
2 years agoxen/wait: Minor asm improvements
Andrew Cooper [Fri, 15 Jul 2022 11:27:08 +0000 (12:27 +0100)]
xen/wait: Minor asm improvements

There is no point preserving all registers.  Instead, preserve an arbitrary 6
registers, and list the rest as clobbered.  This does not alter the register
scheduling at all, but does reduce the amount of state needing saving.

Use a named parameter for page size, instead of requiring the reader to work
out which one is parameter 3.
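The wait.c asm itself isn't reproduced here. As a standalone illustration of named operands, the GCC/Clang extended-asm sketch below refers to `%[val]`, `%[mask]` and `%[notmask]` instead of positional `%0`/`%1`/`%2`. It only assembles on x86-64 and falls back to plain C elsewhere. The 4 KiB page size and the helper name are assumptions.

```c
#define SKETCH_PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Round x up to the next page boundary. */
static unsigned long round_up_to_page(unsigned long x)
{
#if defined(__x86_64__) && defined(__GNUC__)
    /* Named operands make the template readable without counting positions. */
    __asm__ ("add %[mask], %[val]\n\t"
             "and %[notmask], %[val]"
             : [val] "+r" (x)
             : [mask] "i" (SKETCH_PAGE_SIZE - 1),
               [notmask] "i" (-SKETCH_PAGE_SIZE)
             : "cc");            /* flags are modified by add/and */
#else
    x = (x + SKETCH_PAGE_SIZE - 1) & ~(unsigned long)(SKETCH_PAGE_SIZE - 1);
#endif
    return x;
}
```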

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen/wait: Extend the description of how this logic actually works
Andrew Cooper [Fri, 15 Jul 2022 13:16:12 +0000 (14:16 +0100)]
xen/wait: Extend the description of how this logic actually works

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen/wait: Drop vestigial remnants of TRAP_regs_partial
Andrew Cooper [Fri, 15 Jul 2022 12:39:29 +0000 (13:39 +0100)]
xen/wait: Drop vestigial remnants of TRAP_regs_partial

The preservation of entry_vector was introduced with ecf9846a6a20 ("x86:
save/restore only partial register state where possible"), where
TRAP_regs_partial was introduced, but its removal was missed in f9eb74789af7
("x86/entry: Remove support for partial cpu_user_regs frames"), where
TRAP_regs_partial was removed.

Fixes: f9eb74789af7 ("x86/entry: Remove support for partial cpu_user_regs frames")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen: Fix latent check-endbr.sh bug with 32bit build environments
Andrew Cooper [Fri, 15 Jul 2022 11:53:09 +0000 (12:53 +0100)]
xen: Fix latent check-endbr.sh bug with 32bit build environments

While Xen's current VMA means it works, the mawk fix (i.e. using $((0xN)) in
the shell) isn't portable in 32bit shells.  See the code comment for the fix.

The fix found a second latent bug.  Recombining $vma_hi/lo should have used
printf "%s%08x" and only worked previously because $vma_lo had bits set in
its top nibble.  Combined with the main fix, %08x becomes %07x.
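The split-and-recombine step can be modelled in C rather than the script's shell and awk. Two details here are assumptions and are not copied from check-endbr.sh: the 9 + 7 hex-digit split (chosen to match the `%07x` above) and the example address. A 7-digit low part is at most 0x0fffffff, which fits in a signed 32-bit integer. It must be zero-padded when recombined: with plain `%lx` the example below would lose a digit and produce a 15-character string.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Split a 16-hex-digit VMA: 9 digits kept as text, 7 converted to a number. */
static unsigned long split_vma(const char vma[17], char hi[10])
{
    memcpy(hi, vma, 9);
    hi[9] = '\0';
    return strtoul(vma + 9, NULL, 16);
}

/* Recombine: the low part must be zero-padded to exactly 7 digits. */
static void join_vma(const char *hi, unsigned long lo, char out[17])
{
    snprintf(out, 17, "%s%07lx", hi, lo);
}
```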

Fixes: b2ebe879a444 ("xen: Fix check-endbr.sh with mawk")
Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen: Fix check-endbr.sh with mawk
Anthony PERARD [Thu, 14 Jul 2022 14:39:06 +0000 (15:39 +0100)]
xen: Fix check-endbr.sh with mawk

check-endbr.sh works with gawk, but fails with mawk. The produced $ALL
file is smaller as it is missing 0x$vma_lo on every line.  With mawk,
int(0x2A) just produces 0, instead of the expected value.

The use of hexadecimal constants in awk is an optional part of the POSIX
spec, and mawk doesn't seem to implement it.

There is a way to convert a hexadecimal value to a number by putting it in a
string, and awk, as I understand it, is supposed to use strtod() to convert
the string to a number when needed. The expression 'int("0x15") + 21'
would produce the expected value in `mawk`, but then `gawk` won't convert
the string to a number unless we use the option "--non-decimal-data".
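The strtod() point can be checked directly in C. Since C99, strtod() accepts hexadecimal input, so a conversion that goes through strtod() turns "0x15" into 21. A strictly decimal conversion, shown here with strtol() in base 10, stops at the 'x' and yields 0. This is only a model of the behaviour described above, not either awk's source.

```c
#include <stdlib.h>

/* C99 strtod(): accepts "0x..." hexadecimal subject sequences. */
static double string_to_number(const char *s)
{
    return strtod(s, NULL);
}

/* Decimal-only conversion: parsing "0x15" stops after the leading "0". */
static long decimal_only(const char *s)
{
    return strtol(s, NULL, 10);
}
```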

So let's convert the hexadecimal number before using it in the awk
script. The shell has no issue dealing with hexadecimal constants, so
we'll simply use the expression "$(( 0x15 ))" to convert the value
before using it in awk.

Note: This does introduce a latent portability bug, which is fixed in a separate
      change to avoid mixing complexity/explanations.

Fixes: 4d037425dc ("x86: Build check for embedded endbr64 instructions")
Resolves: xen-project/xen#26
Reported-by: Luca Fancellu <Luca.Fancellu@arm.com>
Reported-by: Mathieu Tarral <mathieu.tarral@protonmail.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agoRevert "xen/arm: mm: Add more ASSERT() in {destroy, modify}_xen_mappings()"
Julien Grall [Sun, 17 Jul 2022 13:11:27 +0000 (14:11 +0100)]
Revert "xen/arm: mm: Add more ASSERT() in {destroy, modify}_xen_mappings()"

This reverts commit 9b962e618313109882b6ca78cf1e09f43c9d6e62. This
was committed by mistake (it lacked an x86 ack).

2 years agoxen/arm: mm: Add more ASSERT() in {destroy, modify}_xen_mappings()
Julien Grall [Sat, 16 Jul 2022 14:37:57 +0000 (15:37 +0100)]
xen/arm: mm: Add more ASSERT() in {destroy, modify}_xen_mappings()

Both destroy_xen_mappings() and modify_xen_mappings() take as
parameters a range [start, end). Both ends should be page aligned.

Add extra ASSERT() to ensure start and end are page aligned. Take the
opportunity to rename 'v' to 's' to be consistent with the other helper.
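A minimal sketch of the alignment checks, assuming 4 KiB pages and using plain assert() in place of Xen's ASSERT(). The helper is hypothetical and simply counts the pages in a half-open range [s, e).

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */
#define SKETCH_IS_PAGE_ALIGNED(x) (((x) & (SKETCH_PAGE_SIZE - 1)) == 0)

/* Hypothetical helper over [s, e): both ends must be page aligned. */
static unsigned long nr_pages_in_range(unsigned long s, unsigned long e)
{
    assert(SKETCH_IS_PAGE_ALIGNED(s));
    assert(SKETCH_IS_PAGE_ALIGNED(e));
    assert(s <= e);
    return (e - s) / SKETCH_PAGE_SIZE;
}
```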

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
----
    Changes in v2:
        - Also modify prototype. Note that on x86, the first parameter
          did not match between the declaration and the prototype.
        - Add Bertrand's reviewed-by

2 years agoxen/arm: head: Add missing isb after writing to SCTLR_EL2/HSCTLR
Julien Grall [Sat, 16 Jul 2022 14:34:07 +0000 (15:34 +0100)]
xen/arm: head: Add missing isb after writing to SCTLR_EL2/HSCTLR

A write to SCTLR_EL2/HSCTLR may not be visible until the next context
synchronization event. When initializing the CPU, we want the update to take
effect immediately, so add an isb afterwards.

Spec references:
    - AArch64: D13.1.2 ARM DDI 0406C.d
    - AArch32 v8: G8.1.2 ARM DDI 0406C.d
    - AArch32 v7: B5.6.3 ARM DDI 0406C.d

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
2 years agoxen/arm32: head.S: Introduce a macro to load the physical address of a symbol
Julien Grall [Sat, 16 Jul 2022 14:33:47 +0000 (15:33 +0100)]
xen/arm32: head.S: Introduce a macro to load the physical address of a symbol

Many places in the ARM32 assembly need to load the physical address
of a symbol. Rather than open-coding the translation, introduce a new macro
that will load the physical address of a symbol.

Then, use the new macro to replace all the current open-coded versions.

Note that most of the comments associated with the changed code have been
removed because the code is now self-explanatory.
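The macro itself is ARM32 assembly and isn't reproduced here. The C model below only captures the translation it encapsulates, under the assumption that Xen is linked at one address and loaded at another: the offset is `load - link` (mod 2^32), and a symbol's physical address is its link-time address plus that offset. Unsigned wrap-around makes a "negative" offset work. The function names and addresses are hypothetical.

```c
#include <stdint.h>

/* Offset between where the image was loaded and where it was linked. */
static uint32_t phys_offset_of(uint32_t load_addr, uint32_t link_addr)
{
    return load_addr - link_addr;   /* wraps modulo 2^32 if load < link */
}

/* Physical address of a symbol, given its link-time (virtual) address. */
static uint32_t paddr_of(uint32_t sym_vaddr, uint32_t phys_offset)
{
    return sym_vaddr + phys_offset;
}
```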

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
2 years agoREADME: State POSIX compatibility as a requirement for AWK
Andrew Cooper [Thu, 14 Jul 2022 18:45:36 +0000 (19:45 +0100)]
README: State POSIX compatibility as a requirement for AWK

In particular, we support FreeBSD and NetBSD build environments, and some
Linux build environments use MAWK over GAWK anyway.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen: Introduce $AWK in check-endbr.sh
Anthony PERARD [Thu, 14 Jul 2022 14:39:07 +0000 (15:39 +0100)]
xen: Introduce $AWK in check-endbr.sh

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agoxen/arm: traps: Fix MISRA C 2012 Rule 8.4 violation
Xenia Ragiadakou [Wed, 6 Jul 2022 12:11:56 +0000 (15:11 +0300)]
xen/arm: traps: Fix MISRA C 2012 Rule 8.4 violation

Add the function prototype of show_stack() to the <asm/processor.h> header file
so that it is visible before its definition in traps.c.

Although show_stack() is referenced only in traps.c, it is declared with
external linkage because, during development, it is often also called from
other files for debugging purposes. Declaring it static would increase
development effort. Add an appropriate comment.

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoxen/arm: avoid overflow when setting vtimer in context switch
Jiamei Xie [Wed, 6 Jul 2022 08:25:58 +0000 (16:25 +0800)]
xen/arm: avoid overflow when setting vtimer in context switch

virt_vtimer_save() will calculate the next deadline when the vCPU is
scheduled out. At the moment, Xen will use the following equation:

  virt_timer.cval + virt_time_base.offset - boot_count

The three values are 64-bit and one (cval) is controlled by the domain. In
theory, the domain could have been started a long time after system
boot, so virt_time_base.offset - boot_count may be a large number.

This means a domain may inadvertently set a cval such that the result would
overflow. Consequently, the deadline would be set very far in the
future. This could result in loss of timer interrupts or the vCPU
getting blocked "forever".

One way to solve the problem would be to separately
   1) compute when the domain was created in ns
   2) convert cval to ns
   3) Add 1 and 2 together

The first part of the equation never changes (the value is set/known at
domain creation), so take the opportunity to store it in the domain structure.
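A hypothetical sketch of the three steps above. The struct and function names are illustrative, and the seconds-plus-remainder conversion stands in for Xen's own tick conversion. The creation time is computed once (step 1), cval is converted on its own (step 2), and the two nanosecond values are added (step 3). The sketch does not attempt to reproduce Xen's exact overflow behaviour.

```c
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL

/*
 * Ticks to ns without forming ticks * 1e9 directly.  Assumes freq_hz < 2^32
 * (so the remainder product fits) and that the result fits in 64 bits.
 */
static uint64_t ticks_to_ns_sketch(uint64_t ticks, uint64_t freq_hz)
{
    return (ticks / freq_hz) * NS_PER_SEC +
           (ticks % freq_hz) * NS_PER_SEC / freq_hz;
}

/* Hypothetical per-domain field, filled once at domain creation. */
struct vtimer_domain_sketch {
    uint64_t creation_ns;
};

/* Step 1: convert offset - boot_count to ns once. */
static void vtimer_domain_init(struct vtimer_domain_sketch *d, uint64_t offset,
                               uint64_t boot_count, uint64_t freq_hz)
{
    d->creation_ns = ticks_to_ns_sketch(offset - boot_count, freq_hz);
}

/* Steps 2 and 3: convert cval by itself, then add the stored value. */
static uint64_t vtimer_deadline_ns(const struct vtimer_domain_sketch *d,
                                   uint64_t cval, uint64_t freq_hz)
{
    return d->creation_ns + ticks_to_ns_sketch(cval, freq_hz);
}
```

For instance, with a hypothetical 62.5 MHz counter, an offset 125,000,000 ticks past boot_count and a cval of 62,500,000 ticks give a creation time of 2 s and a deadline of 3 s.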

Signed-off-by: Jiamei Xie <jiamei.xie@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
2 years agox86/spec-ctrl: Mitigate Branch Type Confusion when possible
Andrew Cooper [Mon, 27 Jun 2022 18:29:40 +0000 (19:29 +0100)]
x86/spec-ctrl: Mitigate Branch Type Confusion when possible

Branch Type Confusion affects AMD/Hygon CPUs on Zen2 and earlier.  To
mitigate, we require SMT safety (STIBP on Zen2, no-SMT on Zen1), and an
IBPB on each entry to Xen to flush the BTB.

Due to performance concerns, dom0 (which is trusted in most configurations) is
excluded from protections by default.

Therefore:
 * Use STIBP by default on Zen2 too, which now means we want it on by default
   on all hardware supporting STIBP.
 * Break the current IBPB logic out into a new function, extending it with
   IBPB-at-entry logic.
 * Change the existing IBPB-at-ctxt-switch boolean to be tristate, and disable
   it by default when IBPB-at-entry is providing sufficient safety.

If all PV guests on the system are trusted, then it is recommended to boot
with `spec-ctrl=ibpb-entry=no-pv`, as this will provide an additional marginal
perf improvement.
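A hypothetical sketch of resolving the tristate IBPB-at-ctxt-switch option described above. Here -1 means "not set on the command line" and 0/1 are explicit choices. Whether IBPB-at-entry is "providing sufficient safety" is reduced to a single boolean input, because the exact condition Xen uses isn't spelled out here.

```c
#include <stdbool.h>

/*
 * opt: -1 = default (auto), 0 = forced off, 1 = forced on.
 * entry_ibpb_sufficient: whether IBPB-at-entry already covers the need.
 */
static bool resolve_ibpb_ctxt_switch(int opt, bool entry_ibpb_sufficient)
{
    if (opt >= 0)
        return opt;                      /* explicit user choice wins */

    return !entry_ibpb_sufficient;       /* default: off only when redundant */
}
```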

This is part of XSA-407 / CVE-2022-23825.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/spec-ctrl: Enable Zen2 chickenbit
Andrew Cooper [Tue, 15 Mar 2022 18:30:25 +0000 (18:30 +0000)]
x86/spec-ctrl: Enable Zen2 chickenbit

... as instructed in the Branch Type Confusion whitepaper.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
2 years agox86/cpuid: Enumeration for BTC_NO
Andrew Cooper [Mon, 16 May 2022 14:48:24 +0000 (15:48 +0100)]
x86/cpuid: Enumeration for BTC_NO

BTC_NO indicates that hardware is not susceptible to Branch Type Confusion.

Zen3 CPUs don't suffer BTC.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/spec-ctrl: Support IBPB-on-entry
Andrew Cooper [Thu, 24 Feb 2022 13:44:33 +0000 (13:44 +0000)]
x86/spec-ctrl: Support IBPB-on-entry

We are going to need this to mitigate Branch Type Confusion on AMD/Hygon CPUs,
but as we've talked about using it in other cases too, arrange to support it
generally.  However, this is also very expensive in some cases, so we're going
to want per-domain controls.

Introduce SCF_ist_ibpb and SCF_entry_ibpb controls, adding them to the IST and
DOM masks as appropriate.  Also introduce X86_FEATURE_IBPB_ENTRY_{PV,HVM} to
patch the code blocks.

For SVM, the STGI is serialising enough to protect against Spectre-v1 attacks,
so no "else lfence" is necessary.  VT-x will use the MSR host load list,
so doesn't need any code in the VMExit path.

For the IST path, we can't safely check CPL==0 to skip a flush, as we might
have hit an entry path before its IBPB.  As IST hitting Xen is rare, flush
irrespective of CPL.  A later path, SCF_ist_sc_msr, provides Spectre-v1
safety.

For the PV paths, we know we're interrupting CPL>0, while for the INTR paths,
we can safely check CPL==0.  Only flush when interrupting guest context.
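The per-path decision above can be summarised as a small decision function. This is a hypothetical C model of the logic, not the assembly, and it ignores the Spectre-v1 "else lfence" aspect entirely. The privilege level is taken from the low two bits of the interrupted code segment selector, and the selector values in the test are arbitrary examples.

```c
#include <stdbool.h>

/* Hypothetical classification of the entry paths discussed above. */
enum entry_kind { ENTRY_IST, ENTRY_PV, ENTRY_INTR };

/* Whether the entry path should issue an IBPB; cs is the interrupted selector. */
static bool entry_wants_ibpb(enum entry_kind kind, unsigned int cs)
{
    switch (kind) {
    case ENTRY_IST:
        return true;              /* CPL==0 doesn't prove Xen already flushed */
    case ENTRY_PV:
        return true;              /* known to be interrupting CPL > 0 */
    case ENTRY_INTR:
        return (cs & 3) != 0;     /* flush only when guest context was interrupted */
    }
    return true;                  /* unreachable; err on the side of flushing */
}
```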

An "else lfence" is needed for safety, but we want to be able to skip it on
unaffected CPUs, so the block wants to be an alternative, which means the
lfence has to be inline rather than UNLIKELY() (the replacement block doesn't
have displacements fixed up for anything other than the first instruction).

As with SPEC_CTRL_ENTRY_FROM_INTR_IST, %rdx is 0 on entry so rely on this to
shrink the logic marginally.  Update the comments to specify this new
dependency.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST
Andrew Cooper [Fri, 1 Jul 2022 14:59:40 +0000 (15:59 +0100)]
x86/spec-ctrl: Rework SPEC_CTRL_ENTRY_FROM_INTR_IST

We are shortly going to add a conditional IBPB in this path.

Therefore, we cannot hold spec_ctrl_flags in %eax, and rely on only clobbering
it after we're done with its contents.  %rbx is available for use, and the
more normal register to hold preserved information in.

With %rax freed up, use it instead of %rdx for the RSB tmp register, and for
the adjustment to spec_ctrl_flags.

This leaves no use of %rdx, except as 0 for the upper half of WRMSR.  In
practice, %rdx is 0 from SAVE_ALL on all paths and isn't likely to change in
the foreseeable future, so update the macro entry requirements to state this
dependency.  This marginal optimisation can be revisited if circumstances
change.
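WRMSR takes its 64-bit value split across EDX (high half) and EAX (low half). The plain-C model below shows why a register already known to be 0 can serve as the EDX half. For any value whose upper 32 bits are clear, which the text above relies on for the values written on this path, the high half is simply 0. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* The two 32-bit halves WRMSR consumes. */
struct msr_halves {
    uint32_t edx;   /* bits 63..32 */
    uint32_t eax;   /* bits 31..0  */
};

static struct msr_halves wrmsr_split(uint64_t val)
{
    struct msr_halves h = {
        .edx = (uint32_t)(val >> 32),
        .eax = (uint32_t)val,
    };
    return h;
}
```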

No practical change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch
Andrew Cooper [Mon, 4 Jul 2022 20:32:17 +0000 (21:32 +0100)]
x86/spec-ctrl: Rename opt_ibpb to opt_ibpb_ctxt_switch

We are about to introduce the use of IBPB at different points in Xen, making
opt_ibpb ambiguous.  Rename it to opt_ibpb_ctxt_switch.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr
Andrew Cooper [Tue, 28 Jun 2022 13:36:56 +0000 (14:36 +0100)]
x86/spec-ctrl: Rename SCF_ist_wrmsr to SCF_ist_sc_msr

We are about to introduce SCF_ist_ibpb, at which point SCF_ist_wrmsr becomes
ambiguous.

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/spec-ctrl: Rework spec_ctrl_flags context switching
Andrew Cooper [Fri, 1 Jul 2022 14:59:40 +0000 (15:59 +0100)]
x86/spec-ctrl: Rework spec_ctrl_flags context switching

We are shortly going to need to context switch new bits in both the vcpu and
S3 paths.  Introduce SCF_IST_MASK and SCF_DOM_MASK, and rework d->arch.verw
into d->arch.spec_ctrl_flags to accommodate.
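A hypothetical sketch of mask-based context switching. The flag names and bit values are invented for illustration and are not Xen's SCF_* values. On a switch, only the bits covered by the per-domain mask are replaced with the incoming domain's stored flags, and all other bits are preserved. An IST mask would be applied the same way on its own paths.

```c
#include <stdint.h>

/* Hypothetical flag bits; Xen's real SCF_* values are not reproduced here. */
#define SCF_SKETCH_IST_A    (1u << 0)
#define SCF_SKETCH_DOM_A    (1u << 2)
#define SCF_SKETCH_DOM_B    (1u << 3)
#define SCF_SKETCH_DOM_MASK (SCF_SKETCH_DOM_A | SCF_SKETCH_DOM_B)

/* Replace only the per-domain bits, taking them from the domain's flags. */
static uint8_t switch_dom_flags(uint8_t cpu_flags, uint8_t dom_flags)
{
    return (uint8_t)((cpu_flags & ~SCF_SKETCH_DOM_MASK) |
                     (dom_flags & SCF_SKETCH_DOM_MASK));
}
```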

No functional change.

This is part of XSA-407.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen/build: remove unneeded enumeration in clean-files of xen/include/Makefile
Juergen Gross [Tue, 12 Jul 2022 13:25:35 +0000 (15:25 +0200)]
xen/build: remove unneeded enumeration in clean-files of xen/include/Makefile

Enumerating a file from $(targets) in $(clean-files) isn't needed.

Remove hypercall-defs.h and headers*.chk from $(clean-files) in
xen/include/Makefile.

Reported-by: Jan Beulich <jbeulich@suse.com>
Fixes: eca1f00d0227 ("xen: generate hypercall interface related code")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agotools/init-xenstore-domain: fix memory map for PVH stubdom
Juergen Gross [Tue, 12 Jul 2022 13:25:20 +0000 (15:25 +0200)]
tools/init-xenstore-domain: fix memory map for PVH stubdom

In case of maxmem != memsize the E820 map of the PVH stubdom is wrong,
as it is missing the RAM above memsize.

Additionally the memory map should only specify the Xen special pages
as reserved.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
2 years agoxl: relax freemem()'s retry calculation
Jan Beulich [Tue, 12 Jul 2022 13:25:00 +0000 (15:25 +0200)]
xl: relax freemem()'s retry calculation

While in principle possible also under other conditions as long as other
parallel operations potentially consuming memory aren't "locked out", in
particular with IOMMU large page mappings used in Dom0 (for PV when in
strict mode; for PVH when not sharing page tables with HAP) ballooning
out of individual pages can actually lead to less free memory available
afterwards. This is because to split a large page, one or more page
table pages are necessary (one per level that is split).

When rebooting a guest I've observed freemem() fail: A single page
was required to be ballooned out (presumably because of heap
fragmentation in the hypervisor). This ballooning out of a single page
of course went fast, but freemem() then found that it needed to
balloon out another page. Repeating this just one more time leads the
function to signal failure to the caller - without having come anywhere
near the designated 30s for which the whole process is allowed to make
no progress at all.

Convert from a simple retry count to actually calculating elapsed time,
subtracting from an initial credit of 30s. Don't go as far as limiting
the "wait_secs" value passed to libxl_wait_for_memory_target(), though.
While this leads to the overall process now possibly taking longer (if
the previous iteration ended very close to the intended 30s), this
compensates to some degree for the value passed really meaning "allowed
to run for this long without making progress".
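A hypothetical model of the credit-based budget. Instead of counting retries, each iteration's elapsed time is subtracted from an initial credit (30s in the text above), and iterations continue while credit remains. Elapsed times are passed in as data to keep the sketch deterministic. A real implementation would measure them, for example with a monotonic clock.

```c
/* Returns how many iterations run before the credit is used up. */
static int iterations_within_credit(double credit_s, const double *elapsed_s,
                                    int n)
{
    int i;

    for (i = 0; i < n && credit_s > 0; i++)
        credit_s -= elapsed_s[i];   /* charge this iteration's wall time */

    return i;
}
```

Several fast single-page iterations, as in the scenario described above, now fit comfortably within the budget, where a small fixed retry count could run out.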

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
2 years agoMAINTAINERS: Make Daniel P. Smith sole XSM maintainer
George Dunlap [Tue, 12 Jul 2022 13:24:30 +0000 (15:24 +0200)]
MAINTAINERS: Make Daniel P. Smith sole XSM maintainer

While mail hasn't been bouncing, Daniel De Graaf has not been
responding to patch submissions or otherwise interacting with the
community for several years.  Daniel Smith has at least been working
with the code, and is a regular member of our community; and he has
agreed to step up into the role.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoEFI: preserve the System Resource Table for dom0
Demi Marie Obenour [Tue, 12 Jul 2022 06:39:19 +0000 (08:39 +0200)]
EFI: preserve the System Resource Table for dom0

The EFI System Resource Table (ESRT) is necessary for fwupd to identify
firmware updates to install.  According to the UEFI specification §23.4,
the ESRT shall be stored in memory of type EfiBootServicesData.  However,
memory of type EfiBootServicesData is considered general-purpose memory
by Xen, so the ESRT needs to be moved somewhere where Xen will not
overwrite it.  Copy the ESRT to memory of type EfiRuntimeServicesData,
which Xen will not reuse.  dom0 can use the ESRT if (and only if) it is
in memory of type EfiRuntimeServicesData.

Earlier versions of this patch reserved the memory in which the ESRT was
located.  This created awkward alignment problems, and required either
splitting the E820 table or wasting memory.  It also would have required
a new platform op for dom0 to use to indicate if the ESRT is reserved.
By copying the ESRT into EfiRuntimeServicesData memory, the E820 table
does not need to be modified, and dom0 can just check the type of the
memory region containing the ESRT.  The copy is only done if the ESRT is
not already in EfiRuntimeServicesData memory, avoiding memory leaks on
repeated kexec.

See https://lore.kernel.org/xen-devel/20200818184018.GN1679@mail-itl/T/
for details.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Tested-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agolibxl: check return value of libxl__xs_directory in name2bdf
Anthony PERARD [Tue, 12 Jul 2022 06:38:51 +0000 (08:38 +0200)]
libxl: check return value of libxl__xs_directory in name2bdf

libxl__xs_directory() can potentially return NULL without setting `n`.
As `n` isn't initialised, we need to check libxl__xs_directory()
return value before checking `n`. Otherwise, `n` might be non-zero
with `bdfs` NULL which would lead to a segv.
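The fix pattern, sketched with a hypothetical stand-in for libxl__xs_directory(). The stand-in shares the property described above: on failure it returns NULL and leaves the count untouched, so the pointer must be tested before the count is trusted. The BDF strings are arbitrary examples.

```c
#include <stddef.h>

/* Hypothetical listing helper: on failure returns NULL and does not write *n. */
static const char **list_directory_sketch(const char *path, unsigned int *n)
{
    static const char *entries[] = { "0000:00:02.0", "0000:00:03.0" };

    if (path == NULL || path[0] == '\0')
        return NULL;                 /* *n is left untouched on failure */

    *n = 2;
    return entries;
}

static int count_bdfs_sketch(const char *path)
{
    unsigned int n;                  /* not initialised, as in the original code */
    const char **bdfs = list_directory_sketch(path, &n);

    if (bdfs == NULL)                /* test the pointer before trusting n */
        return -1;

    return (int)n;
}
```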

Fixes: 57bff091f4 ("libxl: add 'name' field to 'libxl_device_pci' in the IDL...")
Reported-by: "G.R." <firemeteor@users.sourceforge.net>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Tested-by: "G.R." <firemeteor@users.sourceforge.net>
2 years agotools/helpers: fix build of xen-init-dom0 with -Werror
Anthony PERARD [Tue, 12 Jul 2022 06:38:35 +0000 (08:38 +0200)]
tools/helpers: fix build of xen-init-dom0 with -Werror

asprintf() has no prototype unless _GNU_SOURCE is defined.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Henry Wang <Henry.Wang@arm.com>
2 years agotools/fuzz/libelf: rework makefile
Anthony PERARD [Tue, 12 Jul 2022 06:35:48 +0000 (08:35 +0200)]
tools/fuzz/libelf: rework makefile

Rename ELF_LIB_OBJS to LIBELF_OBJS so as to have the same name as in
libs/guest/.

Replace "-I" by "-iquote".

Remove the use of "vpath". It will not work once this makefile is
converted to subdirmk. Instead, we create symlinks to the source files.

Since we are creating a new .gitignore for the links, also move the
existing entry to it.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
2 years agox86/spec-ctrl: Add fine-grained cmdline suboptions for primitives
Andrew Cooper [Fri, 8 Jul 2022 15:44:43 +0000 (16:44 +0100)]
x86/spec-ctrl: Add fine-grained cmdline suboptions for primitives

Support controlling the PV/HVM suboption of msr-sc/rsb/md-clear, which
previously wasn't possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen/cmdline: Extend parse_boolean() to signal a name match
Andrew Cooper [Tue, 5 Jul 2022 18:19:01 +0000 (19:19 +0100)]
xen/cmdline: Extend parse_boolean() to signal a name match

This will help with parsing a sub-option which has both boolean and non-boolean
options available.

First, rework 'int val' into 'bool has_neg_prefix'.  This inverts its value,
but the resulting logic is far easier to follow.

Second, reject anything of the form 'no-$FOO=', which excludes ambiguous
constructs such as 'no-$foo=yes' that have never been valid.

This just leaves the case where everything is otherwise fine, but parse_bool()
can't interpret the provided string.
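A standalone sketch of the behaviour described above, not Xen's parse_boolean(). It assumes NUL-terminated strings with no end pointer and a simplified list of boolean words. Returning -2 for "the name matched, but the value isn't a boolean" is this sketch's chosen convention, and it lets a caller go on to parse a non-boolean value such as a PV/HVM selector.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified boolean value parser: 1, 0, or -1 if not a boolean word. */
static int parse_bool_sketch(const char *s)
{
    if (!strcmp(s, "yes") || !strcmp(s, "on") || !strcmp(s, "true") || !strcmp(s, "1"))
        return 1;
    if (!strcmp(s, "no") || !strcmp(s, "off") || !strcmp(s, "false") || !strcmp(s, "0"))
        return 0;
    return -1;
}

/* -1: no match (or invalid "no-" form); 0/1: boolean; -2: name matched, value not boolean. */
static int parse_boolean_sketch(const char *name, const char *s)
{
    bool has_neg_prefix = !strncmp(s, "no-", 3);
    size_t nlen = strlen(name);
    int b;

    if (has_neg_prefix)
        s += 3;

    if (strncmp(s, name, nlen))
        return -1;                      /* a different option */

    if (s[nlen] == '\0')
        return !has_neg_prefix;         /* bare "name" or "no-name" */

    if (has_neg_prefix || s[nlen] != '=')
        return -1;                      /* "no-name=..." or e.g. "namex" */

    b = parse_bool_sketch(&s[nlen + 1]);
    return b >= 0 ? b : -2;
}
```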

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/spec-ctrl: Honour spec-ctrl=0 for unpriv-mmio sub-option
Andrew Cooper [Fri, 8 Jul 2022 15:11:40 +0000 (16:11 +0100)]
x86/spec-ctrl: Honour spec-ctrl=0 for unpriv-mmio sub-option

This was an oversight from when unpriv-mmio was introduced.

Fixes: 8c24b70fedcb ("x86/spec-ctrl: Add spec-ctrl=unpriv-mmio")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agox86/HVM: allow per-domain usage of hardware virtualized APIC
Jane Malalane [Mon, 11 Jul 2022 10:15:05 +0000 (12:15 +0200)]
x86/HVM: allow per-domain usage of hardware virtualized APIC

Introduce a new per-domain creation x86 specific flag to
select whether hardware assisted virtualization should be used for
x{2}APIC.

A per-domain option is added to xl in order to select the usage of
x{2}APIC hardware assisted virtualization, as well as a global
configuration option.

Having all APIC interaction exit to Xen for emulation is slow and can
induce much overhead. Hardware can speed up x{2}APIC by decoding the
APIC access and providing a VM exit with a more specific exit reason
than a regular EPT fault or by altogether avoiding a VM exit.

On the other hand, being able to disable x{2}APIC hardware assisted
virtualization can be useful for testing and debugging purposes.

Note:

- vmx_install_vlapic_mapping doesn't require modifications regardless
of whether the guest has "Virtualize APIC accesses" enabled or not,
i.e., setting the APIC_ACCESS_ADDR VMCS field is fine so long as
virtualize_apic_accesses is supported by the CPU.

- Neither the per-domain nor the global assisted_x{2}apic options are
part of the migration stream unless explicitly set in the respective
configuration files. Default settings of assisted_x{2}apic applied
internally by the toolstack, based on host capabilities at create
time, are not migrated.
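
A minimal C sketch of how a toolstack might resolve these options against host capabilities, assuming that an unset option defaults to what the host supports and that an explicit request for an unsupported feature is rejected. The rejection policy, the CAP_* bit values, struct apic_cfg and the resolve_* helpers are all hypothetical, not libxl's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability bits; the real values live in Xen's headers. */
#define CAP_ASSISTED_XAPIC   (1u << 0)
#define CAP_ASSISTED_X2APIC  (1u << 1)

/* Per option: -1 = not set in the config file, 0 = off, 1 = on. */
struct apic_cfg {
    int assisted_xapic;
    int assisted_x2apic;
};

/* Resolve one option; returns false if the request cannot be met. */
static bool resolve_opt(int requested, bool host_has, bool *out)
{
    if ( requested < 0 )
    {
        *out = host_has;      /* default derived from host capabilities */
        return true;
    }
    if ( requested && !host_has )
        return false;         /* assumed policy: reject unsupported requests */
    *out = requested;
    return true;
}

static bool resolve_apic_flags(const struct apic_cfg *cfg,
                               unsigned int host_caps, unsigned int *flags)
{
    bool xapic, x2apic;

    if ( !resolve_opt(cfg->assisted_xapic,
                      host_caps & CAP_ASSISTED_XAPIC, &xapic) ||
         !resolve_opt(cfg->assisted_x2apic,
                      host_caps & CAP_ASSISTED_X2APIC, &x2apic) )
        return false;

    *flags = (xapic ? CAP_ASSISTED_XAPIC : 0) |
             (x2apic ? CAP_ASSISTED_X2APIC : 0);
    return true;
}
```

This separation matches the note above. Only options the user set explicitly (values other than -1) would need to travel with the domain configuration. The defaults are recomputed from whichever host the domain is created on.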

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
2 years agox86: report Interrupt Controller Virtualization capabilities
Jane Malalane [Mon, 11 Jul 2022 10:13:59 +0000 (12:13 +0200)]
x86: report Interrupt Controller Virtualization capabilities

Add XEN_SYSCTL_PHYSCAP_X86_ASSISTED_XAPIC and
XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X2APIC to report accelerated xAPIC and
x2APIC on x86 hardware, so that xAPIC and x2APIC virtualization can
subsequently be enabled on a per-domain basis.
No such features are currently implemented on AMD hardware.

HW-assisted xAPIC virtualization will be reported if HW, at a
minimum, supports virtualize_apic_accesses, as this feature alone means
that an access to the APIC page will cause an APIC-access VM exit. An
APIC-access VM exit provides a VMM with information about the access
causing the VM exit, unlike a regular EPT fault, thus simplifying some
internal handling.

HW-assisted x2APIC virtualization will be reported if HW supports
virtualize_x2apic_mode and at least one of apic_reg_virt or
virtual_intr_delivery. This way the sysctl follows the same
conditionals as vmx_vlapic_msr_changed().

For that purpose, also add an arch-specific "capabilities" parameter
to struct xen_sysctl_physinfo.
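
The reporting rules above can be sketched in C as follows. The bit values assigned to the two XEN_SYSCTL_PHYSCAP_* names are assumed for illustration. The struct vmx_caps snapshot and arch_physinfo_caps() are hypothetical and do not correspond to Xen's own cpu_has_vmx_* helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Names from the commit; the bit values here are assumed. */
#define XEN_SYSCTL_PHYSCAP_X86_ASSISTED_XAPIC   (1u << 0)
#define XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X2APIC  (1u << 1)

/* Hypothetical snapshot of the relevant VMX secondary controls. */
struct vmx_caps {
    bool virtualize_apic_accesses;
    bool virtualize_x2apic_mode;
    bool apic_reg_virt;
    bool virtual_intr_delivery;
};

static unsigned int arch_physinfo_caps(const struct vmx_caps *c)
{
    unsigned int caps = 0;

    /* xAPIC: APIC-access VM exits alone are sufficient. */
    if ( c->virtualize_apic_accesses )
        caps |= XEN_SYSCTL_PHYSCAP_X86_ASSISTED_XAPIC;

    /* x2APIC: x2APIC mode plus register virt or virtual intr delivery. */
    if ( c->virtualize_x2apic_mode &&
         (c->apic_reg_virt || c->virtual_intr_delivery) )
        caps |= XEN_SYSCTL_PHYSCAP_X86_ASSISTED_X2APIC;

    return caps;
}
```

The two checks are independent. A CPU can therefore report assisted x2APIC without assisted xAPIC, and the reverse is also possible.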

Note that this interface is intended to be compatible with AMD so that
AVIC support can be introduced in a future patch. Unlike Intel, which
has multiple controls for APIC virtualization, AMD has a single global
'AVIC Enable' control bit, so fine-grained APIC virtualization control
cannot be exposed through a common interface.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
2 years agodocs: add reference to release cycle discussion
Juergen Gross [Mon, 11 Jul 2022 10:13:40 +0000 (12:13 +0200)]
docs: add reference to release cycle discussion

As the topic comes up in basically every Xen release cycle, add a
reference in the release management documentation to the discussion
of why the current release scheme was selected.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Henry Wang <Henry.Wang@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agotools/examples: cleanup Makefile
Anthony PERARD [Mon, 11 Jul 2022 10:13:24 +0000 (12:13 +0200)]
tools/examples: cleanup Makefile

Don't check whether a target exists before installing it. For
directories, install doesn't complain, and for files the check would
prevent them from being updated. Also remove the existing loop and
instead install all files with a single call to $(INSTALL_DATA).

Remove XEN_CONFIGS-y, which isn't used.

Remove "build" target.

Add an empty line after the first comment. The comment isn't about
$(XEN_READMES), it is about the makefile as a whole.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
2 years agotools/console: have one Makefile per program/directory
Anthony PERARD [Mon, 11 Jul 2022 10:13:07 +0000 (12:13 +0200)]
tools/console: have one Makefile per program/directory

Sources of both xenconsoled and xenconsole are already separated into
different directories and don't share anything. Having two separate
Makefiles makes it easier to deal with *FLAGS.

Some common changes:
Rename $(BIN) to $(TARGETS), this will be useful later.
Stop removing *.so *.rpm *.a as they aren't created here.
Use $(OBJS-y) to list objects.
Update $(CFLAGS) for the directory rather than a single object.

daemon:
    Remove the need for $(LDLIBS_xenconsoled), use $(LDLIBS) instead.
    Remove the need for $(CONSOLE_CFLAGS-y), use $(CFLAGS-y) instead.

client:
    Remove the unused $(LDLIBS_xenconsole)

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
2 years agoxen/x86: remove cf_check attribute from hypercall handlers
Juergen Gross [Mon, 11 Jul 2022 10:11:17 +0000 (12:11 +0200)]
xen/x86: remove cf_check attribute from hypercall handlers

Now that the hypercall handlers are all being called directly instead
of through a function vector, the "cf_check" attribute can be removed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com> # xsm parts
Acked-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Téo Couprie Diaz <teo.coupriediaz@arm.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
2 years agoxen/arm: call hypercall handlers via generated macro
Juergen Gross [Mon, 11 Jul 2022 10:09:48 +0000 (12:09 +0200)]
xen/arm: call hypercall handlers via generated macro

Instead of using a function table, use the generated macros for calling
the appropriate hypercall handlers.

This makes the calls of the handlers type-safe.

For deprecated hypercalls, define stub functions.
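
A minimal C sketch contrasting a generated, switch-based call macro with an untyped function table. The HC_* numbers, the handlers and call_handlers() are hypothetical. This is not the output of Xen's generator script.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hypercall numbers for illustration. */
#define HC_xen_version 17
#define HC_console_io  18
#define HC_old_op      99   /* stands in for a deprecated hypercall */

static long do_xen_version(int cmd)
{
    return cmd == 0 ? 0 : -EINVAL;
}

static long do_console_io(unsigned int cmd, unsigned long count)
{
    return (long)cmd + (long)count;
}

/* Stub for a deprecated hypercall: always fails. */
static long do_old_op(void)
{
    return -ENOSYS;
}

/*
 * Each case calls the handler through its declared prototype with the
 * exact argument count, so a mismatch is a compile-time error rather
 * than a silent bug hidden behind a generic function pointer.
 */
#define call_handlers(num, ret, a1, a2)                        \
    do {                                                       \
        switch ( num )                                         \
        {                                                      \
        case HC_xen_version:                                   \
            (ret) = do_xen_version((int)(a1));                 \
            break;                                             \
        case HC_console_io:                                    \
            (ret) = do_console_io((unsigned int)(a1), (a2));   \
            break;                                             \
        case HC_old_op:                                        \
            (ret) = do_old_op();                               \
            break;                                             \
        default:                                               \
            (ret) = -ENOSYS;                                   \
            break;                                             \
        }                                                      \
    } while ( 0 )

static long dispatch(unsigned int num, unsigned long a1, unsigned long a2)
{
    long ret;

    call_handlers(num, ret, a1, a2);
    return ret;
}
```

Because every call is direct, the compiler sees each handler's real signature. The stub keeps a deprecated number wired to a well-defined -ENOSYS result.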

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Tested-by: Michal Orzel <michal.orzel@arm.com>
2 years agoxen/x86: call hypercall handlers via generated macro
Juergen Gross [Mon, 11 Jul 2022 10:09:13 +0000 (12:09 +0200)]
xen/x86: call hypercall handlers via generated macro

Instead of using a function table, use the generated macros for calling
the appropriate hypercall handlers.

This is beneficial to performance and avoids the speculation issues of
indirect calls.

Now that the handlers are called with the correct number of parameters,
it is possible to do the parameter register clobbering in the NDEBUG
case after returning from the handler. With the additional generated
data, the hard-coded hypercall_args_table[] can be replaced by tables
using the generated number of parameters.

Note that this change modifies the register clobbering behavior in a
minor way: if a hypercall returns -ENOSYS (or its unsigned equivalent)
for any reason, the parameter registers will no longer be clobbered.
This should be of no real concern, as such cases ought to be extremely
rare and reuse of the registers in those cases seems rather far-fetched.
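
A minimal C sketch of post-call clobbering driven by a per-hypercall parameter count table, including the -ENOSYS exception. The register layout, the POISON value, NR_HC, hc_nargs[] and finish_hypercall() are hypothetical. Xen's actual poison value and entry code differ.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical guest register block: result register plus the five
 * x86-64 PV argument registers (rdi, rsi, rdx, r10, r8). */
struct regs {
    unsigned long rax;
    unsigned long args[5];
};

#define POISON 0xdeadbeefUL   /* illustrative poison value */
#define NR_HC  3              /* hypothetical number of hypercalls */

/* Per-hypercall parameter counts, as generated data might provide them. */
static const unsigned char hc_nargs[NR_HC] = { 2, 1, 0 };

/*
 * Store the handler's result, then clobber exactly the parameter
 * registers that hypercall consumed.  An -ENOSYS result (compared on the
 * unsigned register value) leaves the registers untouched.
 */
static void finish_hypercall(struct regs *r, unsigned int num, long ret)
{
    unsigned int i;

    r->rax = (unsigned long)ret;

    if ( num >= NR_HC || r->rax == (unsigned long)-ENOSYS )
        return;

    for ( i = 0; i < hc_nargs[num]; i++ )
        r->args[i] = POISON;
}
```

Knowing the exact parameter count means only the registers a handler actually consumed are poisoned. This is what lets a generated table replace a hard-coded one.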

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen: use generated prototypes for hypercall handlers
Juergen Gross [Mon, 11 Jul 2022 10:08:03 +0000 (12:08 +0200)]
xen: use generated prototypes for hypercall handlers

Remove the hypercall handlers' prototypes in the related header files
and use the generated ones instead.

Some handlers that were previously static need to be made globally
visible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years agoxen: generate hypercall interface related code
Juergen Gross [Mon, 11 Jul 2022 10:06:19 +0000 (12:06 +0200)]
xen: generate hypercall interface related code

Instead of repeating similar data multiple times, use a single source
file and a generator script for producing prototypes and call sequences
of the hypercalls.

As the script already knows the number of parameters used, have it also
generate a macro for populating an array with the number of parameters
per hypercall.
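
The "single source, many outputs" idea can be sketched with a C X-macro, as shown below. This is only an illustration: Xen uses a separate generator script, not the preprocessor. HYPERCALL_LIST, HC_* and hypercall_nargs[] are hypothetical names, and the numbers and argument counts follow Xen's public ABI for these three calls.

```c
#include <assert.h>

/* Hypothetical single list: name, hypercall number, parameter count. */
#define HYPERCALL_LIST(X)          \
    X(set_trap_table,  0, 1)       \
    X(mmu_update,      1, 4)       \
    X(sched_op,       29, 2)

/* Output 1: hypercall number constants. */
#define X_NUM(name, nr, nargs) enum { HC_##name = nr };
HYPERCALL_LIST(X_NUM)
#undef X_NUM

/* Output 2: an array holding the number of parameters per hypercall. */
#define X_NARGS(name, nr, nargs) [nr] = nargs,
static const unsigned char hypercall_nargs[] = {
    HYPERCALL_LIST(X_NARGS)
};
#undef X_NARGS
```

Designated initializers leave unused slots at 0, and the array is sized by the highest hypercall number. Adding a hypercall means editing one line, and every derived table stays consistent.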

The priorities for the specific hypercalls are based on two benchmarks
performed in guests (PV and PVH):

- make -j 4 of the Xen hypervisor (resulting in CPU load with lots of
  processes created)
- scp of a large file to the guest (network load)

With a small additional debug patch applied, the number of calls of
each hypercall was counted in the guest and in dom0 (to capture
backend-activity-related hypercalls) while the benchmark was running
in domU:

PV-hypercall    PV-guest build   PV-guest scp    dom0 build     dom0 scp
mmu_update           186175729           2865         20936        33725
stack_switch           1273311          62381        108589       270764
multicall              2182803             50           302          524
update_va_mapping       571868             10            60           80
xen_version              73061            850           859         5432
grant_table_op               0              0         35557       139110
iret                  75673006         484132        268157       757958
vcpu_op                 453037          71199        138224       334988
set_segment_base       1650249          62387        108645       270823
mmuext_op             11225681            188          7239         3426
sched_op                280153         134645         70729       137943
event_channel_op        192327          66204         71409       214191
physdev_op                   0              0          7721         4315
(the dom0 values are for the guest running the build or scp test, i.e.
with dom0 acting as backend)

HVM-hypercall   PVH-guest build    PVH-guest scp
vcpu_op                  277684             2324
event_channel_op         350233            57383
(the related dom0 counter values are in the same range as with the test
running in the PV guest)
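
As a rough worked example, the following C sketch ranks the PV hypercalls by summing the four PV columns of the table above and sorting the totals. Summing guest and dom0 counts is an illustrative choice and not necessarily how the priorities were derived. The real priority assignment lives in the generator's input file.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct hc_count {
    const char *name;
    unsigned long long total;   /* sum of the four PV columns */
};

/* Counts from the PV table: guest build + guest scp + dom0 build + dom0 scp. */
static struct hc_count pv_counts[] = {
    { "mmu_update",        186175729ULL + 2865 + 20936 + 33725 },
    { "stack_switch",        1273311ULL + 62381 + 108589 + 270764 },
    { "multicall",           2182803ULL + 50 + 302 + 524 },
    { "update_va_mapping",    571868ULL + 10 + 60 + 80 },
    { "xen_version",           73061ULL + 850 + 859 + 5432 },
    { "grant_table_op",            0ULL + 0 + 35557 + 139110 },
    { "iret",               75673006ULL + 484132 + 268157 + 757958 },
    { "vcpu_op",              453037ULL + 71199 + 138224 + 334988 },
    { "set_segment_base",    1650249ULL + 62387 + 108645 + 270823 },
    { "mmuext_op",          11225681ULL + 188 + 7239 + 3426 },
    { "sched_op",             280153ULL + 134645 + 70729 + 137943 },
    { "event_channel_op",     192327ULL + 66204 + 71409 + 214191 },
    { "physdev_op",                0ULL + 0 + 7721 + 4315 },
};

#define NR_COUNTS (sizeof(pv_counts) / sizeof(pv_counts[0]))

/* qsort() comparator: larger totals sort first. */
static int by_total_desc(const void *a, const void *b)
{
    const struct hc_count *x = a, *y = b;

    return (x->total < y->total) - (x->total > y->total);
}

static void rank_pv_hypercalls(void)
{
    qsort(pv_counts, NR_COUNTS, sizeof(pv_counts[0]), by_total_desc);
}

/* 0-based rank of a hypercall after sorting, or -1 if it is not listed. */
static int rank_of(const char *name)
{
    size_t i;

    for ( i = 0; i < NR_COUNTS; i++ )
        if ( !strcmp(pv_counts[i].name, name) )
            return (int)i;
    return -1;
}
```

With these numbers, mmu_update (186233255 calls) and iret (77183253) dominate, followed by mmuext_op (11236534). physdev_op (12036) comes last.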

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years agoxen: include compat/platform.h from hypercall.h
Juergen Gross [Mon, 11 Jul 2022 09:59:16 +0000 (11:59 +0200)]
xen: include compat/platform.h from hypercall.h

The definition of compat_platform_op_t is in compat/platform.h
already, so include that file from hypercall.h instead of repeating
the typedef.

This allows removing the related include statement from
arch/x86/x86_64/platform_hypercall.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agoxen: harmonize return types of hypercall handlers
Juergen Gross [Mon, 11 Jul 2022 09:58:21 +0000 (11:58 +0200)]
xen: harmonize return types of hypercall handlers

Today most hypercall handlers have a return type of long, while the
compat ones return an int. There are a few exceptions to that rule,
however.

Get rid of the exceptions by letting compat handlers always return int
and others always return long, with the exception of the Arm-specific
physdev_op handler.

For the compat HVM case, use eax instead of rax for the stored result,
as should have been done from the beginning.

Additionally, move some prototypes to include/asm-x86/hypercall.h
as they are x86-specific. Move the compat_platform_op() prototype to
the common header.

Rename paging_domctl_continuation() to do_paging_domctl_cont() and add
a matching define for the associated hypercall.

Make do_callback_op() and compat_callback_op() more similar by adding
the const qualifier to compat_callback_op()'s 2nd parameter.

Change the type of the cmd parameter for [do|compat]_kexec_op() to
unsigned int, as this is more appropriate for the compat case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com> # argo
2 years agoConfig.mk: use newest Mini-OS commit
Juergen Gross [Fri, 8 Jul 2022 07:42:27 +0000 (09:42 +0200)]
Config.mk: use newest Mini-OS commit

Switch to the newest Mini-OS commit in order to get the recent
fixes.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2 years agoupdate SUPPORT.md for static allocation
Penny Zheng [Fri, 8 Jul 2022 07:42:14 +0000 (09:42 +0200)]
update SUPPORT.md for static allocation

SUPPORT.md doesn't seem to explicitly say whether static memory is
supported, so update SUPPORT.md to list static allocation as a Tech
Preview feature for now.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
2 years agoxen/pv_console: Fix MISRA C 2012 Rule 2.1 violation
Xenia Ragiadakou [Fri, 8 Jul 2022 07:41:36 +0000 (09:41 +0200)]
xen/pv_console: Fix MISRA C 2012 Rule 2.1 violation

Remove the definition of the function pv_console_evtchn() when
CONFIG_XEN_GUEST is not set, because the function is not used.

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Jiamei Xie <jiamei.xie@arm.com> # arm64
2 years agoxen/time: fix MISRA C 2012 Rule 8.7 violation
Xenia Ragiadakou [Wed, 6 Jul 2022 11:07:43 +0000 (13:07 +0200)]
xen/time: fix MISRA C 2012 Rule 8.7 violation

The variable __mon_lengths is referenced only in time.c.
Change its linkage from external to internal by adding the storage-class
specifier static to its definition.

This patch also indirectly resolves a MISRA C 2012 Rule 8.4 violation
warning, since Rule 8.4 only applies to objects with external linkage.
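
The following minimal C sketch shows the pattern, using a per-month table like the one the commit describes. The name mon_lengths avoids the reserved double-underscore prefix, and both the table layout and the days_in_month() helper are hypothetical rather than copied from time.c.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Internal linkage: the table is visible only in this translation unit
 * (Rule 8.7), so no separate declaration in a header is required
 * (Rule 8.4 only concerns external linkage).
 */
static const unsigned int mon_lengths[2][12] = {
    { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },   /* common year */
    { 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 },   /* leap year */
};

static bool is_leap(unsigned int y)
{
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

/* Days in month m (1-12) of year y. */
static unsigned int days_in_month(unsigned int y, unsigned int m)
{
    assert(m >= 1 && m <= 12);
    return mon_lengths[is_leap(y)][m - 1];
}
```

Any other file that tried to use mon_lengths would now fail to link. This is the intended effect of restricting the table to the one file that needs it.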

Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Acked-by: Julien Grall <jgrall@amazon.com>
2 years agox86/Kconfig: add option for default x2APIC destination mode
Roger Pau Monné [Wed, 6 Jul 2022 11:06:57 +0000 (13:06 +0200)]
x86/Kconfig: add option for default x2APIC destination mode

Allow setting the default x2APIC destination mode to Physical from
Kconfig.

Note that the default destination mode is still Logical (Cluster) mode.
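
The following minimal C sketch shows a build-time default that a command-line value can still override. CONFIG_X2APIC_PHYSICAL_DEFAULT and parse_x2apic_phys() are hypothetical. The runtime override is an assumption about how such a default would typically interact with a boolean boot option, not a description of Xen's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical Kconfig symbol: when selected, Physical destination mode
 * becomes the default; otherwise Logical (Cluster) mode stays the default.
 */
/* #define CONFIG_X2APIC_PHYSICAL_DEFAULT 1 */

#ifdef CONFIG_X2APIC_PHYSICAL_DEFAULT
static bool x2apic_phys = true;
#else
static bool x2apic_phys = false;
#endif

/* Assumed runtime override: an explicit boot option beats the default. */
static void parse_x2apic_phys(const char *s)
{
    if ( !strcmp(s, "1") || !strcmp(s, "true") )
        x2apic_phys = true;
    else if ( !strcmp(s, "0") || !strcmp(s, "false") )
        x2apic_phys = false;
}
```

Keeping the Kconfig choice as an initializer, rather than a hard-coded mode, lets the Kconfig symbol change only the starting value of x2apic_phys.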

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>