xenbits.xensource.com Git - xen.git/log
xen.git
4 years agox86emul: support RDPKRU/WRPKRU
Jan Beulich [Mon, 3 May 2021 13:31:43 +0000 (15:31 +0200)]
x86emul: support RDPKRU/WRPKRU

Since we support PKU for HVM guests, the respective insns should also be
recognized by the emulator.

In emul_test_read_cr() instead of further extending the comment to
explain the hex numbers, switch to using X86_CR4_* values.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoiommu: move iommu_update_ire_from_msi() to xen/iommu.h
Rahul Singh [Mon, 3 May 2021 13:30:57 +0000 (15:30 +0200)]
iommu: move iommu_update_ire_from_msi() to xen/iommu.h

Move iommu_update_ire_from_msi(..) from passthrough/pci.c to
xen/iommu.h and wrap it in CONFIG_X86, as it is referenced only by x86
code, to avoid compilation errors for other architectures when
HAS_PCI is enabled.

No functional change intended.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/p2m: re-arrange struct p2m_domain
Jan Beulich [Mon, 3 May 2021 13:30:16 +0000 (15:30 +0200)]
x86/p2m: re-arrange struct p2m_domain

Combine two HVM-specific sections in two cases (i.e. going from four of
them to just two). Make defer_nested_flush bool and HVM-only, moving it
next to other nested stuff. Move default_access up into a padding hole.

When moving them anyway, also adjust comment style.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/p2m: write_p2m_entry_{pre,post} hooks are HVM-only
Jan Beulich [Mon, 3 May 2021 13:29:49 +0000 (15:29 +0200)]
x86/p2m: write_p2m_entry_{pre,post} hooks are HVM-only

Move respective shadow code to its HVM-only source file, thus making it
possible to exclude the hooks as well. This then shows that
shadow_p2m_init() also isn't needed in !HVM builds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years agox86/p2m: {get,set}_entry hooks and p2m-pt.c are HVM-only
Jan Beulich [Mon, 3 May 2021 13:29:24 +0000 (15:29 +0200)]
x86/p2m: {get,set}_entry hooks and p2m-pt.c are HVM-only

With the hooks no longer needing setting, p2m_pt_init() degenerates to
(about) nothing when !HVM. As a result, p2m-pt.c doesn't need building
anymore in this case, as long as p2m_pt_init() has proper surrogates put
in place.

Unfortunately this means some new #ifdef-ary in p2m.c, but the mid-term
plan there is to get rid of (almost) all of it by splitting out the then
hopefully very few remaining non-HVM pieces.

While the movement of the paging_mode_translate() check from
p2m_remove_page() to guest_physmap_remove_page() may not be strictly
necessary anymore (it was in an early version of this change), it looks
more logical to live in the latter function, making it possible to avoid
acquiring the lock in the PV case. All other callers of p2m_remove_page()
already have such a check anyway (in the altp2m case up the call stack).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86: make mem-paging configurable and default it to off
Jan Beulich [Mon, 3 May 2021 13:28:53 +0000 (15:28 +0200)]
x86: make mem-paging configurable and default it to off

... for being unsupported.

While doing so, make the option dependent upon HVM, which really is the
main purpose of the change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/p2m: the recalc hook is HVM-only
Jan Beulich [Mon, 3 May 2021 13:28:33 +0000 (15:28 +0200)]
x86/p2m: the recalc hook is HVM-only

Exclude functions involved in its use from !HVM builds, thus making it
possible to exclude the hook as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/p2m: hardware-log-dirty related hooks are HVM-only
Jan Beulich [Mon, 3 May 2021 13:28:16 +0000 (15:28 +0200)]
x86/p2m: hardware-log-dirty related hooks are HVM-only

Exclude functions using them from !HVM builds, thus making it possible
to exclude the hooks as well.

By moving an #endif in p2m.c (instead of introducing yet another one)
p2m_{get,set}_ioreq_server() get excluded for !HVM builds as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/p2m: change_entry_type_* hooks are HVM-only
Jan Beulich [Mon, 3 May 2021 13:27:56 +0000 (15:27 +0200)]
x86/p2m: change_entry_type_* hooks are HVM-only

Exclude functions using them from !HVM builds, thus making it possible
to exclude the hooks as well. Also cover the already unused
memory_type_changed hook while inserting the #ifdef in the struct.

While no respective check was there, I can't see how
XEN_DOMCTL_set_broken_page_p2m could have been meant to work for PV the
way it is implemented. Restrict this operation to acting on HVM guests.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agoAMD/IOMMU: guest IOMMU support is for HVM only
Jan Beulich [Mon, 3 May 2021 13:27:42 +0000 (15:27 +0200)]
AMD/IOMMU: guest IOMMU support is for HVM only

Generally all of this is dead code anyway, but there's a caller of
guest_iommu_add_ppr_log(), and the code itself calls
p2m_change_type_one(), which is about to become HVM-only. Hence this
code, short of deleting it altogether, needs to become properly
HVM-only as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/mm: the gva_to_gfn() hook is HVM-only
Jan Beulich [Mon, 3 May 2021 13:27:21 +0000 (15:27 +0200)]
x86/mm: the gva_to_gfn() hook is HVM-only

As is the adjacent ga_to_gfn() one as well as paging_gva_to_gfn().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years agox86/p2m: {,un}map_mmio_regions() are HVM-only
Jan Beulich [Mon, 3 May 2021 13:26:08 +0000 (15:26 +0200)]
x86/p2m: {,un}map_mmio_regions() are HVM-only

Mirror the "translated" check that the functions perform into
do_domctl(), allowing the calls to be DCEd by the compiler. Add
ASSERT_UNREACHABLE() to the original checks.

Also arrange for {set,clear}_mmio_p2m_entry() and
{set,clear}_identity_p2m_entry() to respectively live next to each
other, such that clear_mmio_p2m_entry() can also be covered by the
#ifdef already covering set_mmio_p2m_entry().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/p2m: set_{foreign,mmio}_p2m_entry() are HVM-only
Jan Beulich [Mon, 3 May 2021 13:17:19 +0000 (15:17 +0200)]
x86/p2m: set_{foreign,mmio}_p2m_entry() are HVM-only

Extend a respective #ifdef from inside set_typed_p2m_entry() to around
all three functions. Add ASSERT_UNREACHABLE() to the latter one's safety
check path.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/svm: Enumeration for CET
Andrew Cooper [Tue, 21 Apr 2020 16:43:56 +0000 (17:43 +0100)]
x86/svm: Enumeration for CET

On CET-capable hardware, VMRUN/EXIT unconditionally swaps S_CET, SSP and
ISST (subject to cleanbits) without further settings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/hvm: Introduce control register handling for CET
Andrew Cooper [Tue, 21 Apr 2020 16:43:56 +0000 (17:43 +0100)]
x86/hvm: Introduce control register handling for CET

Allow CR4.CET to be set, based on the CPUID policy (although these bits
are not selectable yet for VMs).  CR4.CET needs interlocking with CR0.WP
to prohibit CET && !WP as a legal combination.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86: Always have CR4.PKE set in HVM context
Andrew Cooper [Thu, 29 Apr 2021 13:28:43 +0000 (14:28 +0100)]
x86: Always have CR4.PKE set in HVM context

The sole user of read_pkru() is the emulated pagewalk, and guarded behind
guest_pku_enabled() which restricts the path to HVM (hap, even) context only.

The commentary in read_pkru() concerning _PAGE_GNTTAB overlapping with
_PAGE_PKEY_BITS is only applicable to PV guests.

The context switch path, via write_ptbase() unconditionally writes CR4 on any
context switch.

Therefore, we can guarantee to separate CR4.PKE between PV and HVM context at
no extra cost.  Set PKE in mmu_cr4_features on boot, so it becomes set in HVM
context, and clear it in pv_make_cr4().

Rename read_pkru() to rdpkru() now that it is a simple wrapper around the
instruction.  This saves two CR4 writes on every pagewalk, and pagewalks
typically occur more than once per emulation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agotests/cpu-policy: add sorted MSR test
Roger Pau Monne [Tue, 13 Apr 2021 12:51:44 +0000 (14:51 +0200)]
tests/cpu-policy: add sorted MSR test

Further changes will rely on MSR entries being sorted, so add a test
to assert it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agolibs/guest: introduce a helper to apply a cpu policy to a domain
Roger Pau Monne [Wed, 17 Mar 2021 14:30:57 +0000 (15:30 +0100)]
libs/guest: introduce a helper to apply a cpu policy to a domain

This helper is very similar to the existing xc_set_domain_cpu_policy
interface, but takes an opaque xc_cpu_policy_t instead of arrays of
CPUID leaves and MSRs.

No callers of the interface are introduced in this patch.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agotools: switch existing users of xc_get_{system,domain}_cpu_policy
Roger Pau Monne [Mon, 22 Mar 2021 10:59:04 +0000 (11:59 +0100)]
tools: switch existing users of xc_get_{system,domain}_cpu_policy

With the introduction of xc_cpu_policy_get_{system,domain} and
xc_cpu_policy_serialise the current users of
xc_get_{system,domain}_cpu_policy can be switched to the new
interface.

Note that xc_get_{system,domain}_cpu_policy is removed from the public
interface and the functions are made static, since there are still
internal consumers in xg_cpuid_x86.c.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agolibs/guest: introduce helper to serialize a cpu policy
Roger Pau Monne [Wed, 17 Mar 2021 14:31:50 +0000 (15:31 +0100)]
libs/guest: introduce helper to serialize a cpu policy

This helper allows converting a cpu policy into arrays of
xen_cpuid_leaf_t and xen_msr_entry_t elements, which match the
current interface of the CPUID/MSR functions. This is required in
order for the user to be able to parse the CPUID/MSR data.

No user of the interface is introduced in this patch.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agolibs/guest: introduce helper to fetch a domain cpu policy
Roger Pau Monne [Wed, 17 Mar 2021 13:46:11 +0000 (14:46 +0100)]
libs/guest: introduce helper to fetch a domain cpu policy

This helper is based on the existing functions to fetch the CPUID and
MSR policies, but uses the xc_cpu_policy_t type to return the data to
the caller.

No user of the interface is introduced in this patch.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agolibs/guest: introduce helper to fetch a system cpu policy
Roger Pau Monne [Wed, 17 Mar 2021 13:45:41 +0000 (14:45 +0100)]
libs/guest: introduce helper to fetch a system cpu policy

This helper is based on the existing functions to fetch the CPUID and
MSR policies, but uses the xc_cpu_policy_t type to return the data to
the caller.

Note some helper functions are introduced; those are split out of
xc_cpu_policy_get_system because they will also be used by other
functions.

No user of the interface is introduced in this patch.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agolibs/guest: introduce xc_cpu_policy_t
Roger Pau Monne [Tue, 16 Mar 2021 15:39:00 +0000 (16:39 +0100)]
libs/guest: introduce xc_cpu_policy_t

Introduce an opaque type that is used to store the CPUID and MSR
policies of a domain. This type uses the existing {cpuid,msr}_policy
structures to store the data, but doesn't expose those structures to
users of the xenguest library. There are also two arrays to allow for
easier serialization without requiring an allocation each time.

Introduce an allocation (init) and freeing function (destroy) to
manage the type.

Note the type is not yet used anywhere.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agolibs/guest: rename xc_get_cpu_policy_size to xc_cpu_policy_get_size
Roger Pau Monne [Mon, 22 Mar 2021 09:37:35 +0000 (10:37 +0100)]
libs/guest: rename xc_get_cpu_policy_size to xc_cpu_policy_get_size

Preparatory change to introduce a new set of xc_cpu_policy_* functions
that will replace the current CPUID/MSR helpers.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/oprofile: remove compat accessors usage from backtrace
Roger Pau Monné [Thu, 29 Apr 2021 14:05:00 +0000 (16:05 +0200)]
x86/oprofile: remove compat accessors usage from backtrace

Remove the unneeded usage of the compat layer to copy frame pointers
from guest address space. Instead just use raw_copy_from_guest.

While there, change the accessibility check of one frame_head beyond
to be performed as part of the copy, like it's done in the Linux code in
5.11 and earlier versions. Note it's unclear why this is needed.

Also drop the explicit truncation of the head pointer in the 32bit
case, as all callers already pass a zero-extended value: the first
value is rsp from the guest registers, and further calls use
ebp from the frame_head_32bit struct.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86: correct comment about alternatives ordering
Jan Beulich [Thu, 29 Apr 2021 14:04:35 +0000 (16:04 +0200)]
x86: correct comment about alternatives ordering

Unlike Linux, Xen has never (so far) used alternatives patching for
memcpy() or memset(), even less such utilizing multiple alternatives.
Correct the Linux-inherited comment to match reality.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/cpuid: do not expand max leaves on restore
Roger Pau Monné [Thu, 29 Apr 2021 14:04:11 +0000 (16:04 +0200)]
x86/cpuid: do not expand max leaves on restore

When restoring, limit the maximum leaves to the ones supported by Xen
4.12 in order not to expand the maximum leaves a guest sees. Note
this is unlikely to cause real issues.

Guests restored from Xen versions 4.13 or greater will contain CPUID
data on the stream that will override the values set by
xc_cpuid_apply_policy.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/decompress: drop STATIC and INIT
Jan Beulich [Thu, 29 Apr 2021 14:03:38 +0000 (16:03 +0200)]
xen/decompress: drop STATIC and INIT

Except for one last instance, all users have been removed in earlier
changes.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agounzstd: replace INIT and STATIC
Jan Beulich [Thu, 29 Apr 2021 14:02:59 +0000 (16:02 +0200)]
unzstd: replace INIT and STATIC

With xen/common/decompress.h now agreeing in both build modes about
what STATIC expands to, there's no need for these abstractions anymore.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years agox86/shadow: streamline shadow_get_page_from_l1e()
Jan Beulich [Tue, 27 Apr 2021 12:36:13 +0000 (14:36 +0200)]
x86/shadow: streamline shadow_get_page_from_l1e()

Trying get_page_from_l1e() up to three times isn't helpful; in debug
builds it may lead to log messages making things look as if there was a
problem somewhere. And there's no need to have more than one try: The
function can only possibly succeed for one domain passed as 3rd
argument (unless the page is an MMIO one to which both have access,
but MMIO pages should be "got" by specifying the requesting domain
anyway). Re-arrange things so just the one call gets made which has a
chance of succeeding.

The code could in principle be arranged such that there's only a single
call to get_page_from_l1e(), but the conditional would become pretty
complex then and hence hard to follow / understand / adjust.

The redundant (with shadow_mode_refcounts()) shadow_mode_translate()
gets dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years agox86/shadow: re-use variables in shadow_get_page_from_l1e()
Jan Beulich [Tue, 27 Apr 2021 12:35:49 +0000 (14:35 +0200)]
x86/shadow: re-use variables in shadow_get_page_from_l1e()

There's little point in doing multiple mfn_to_page() or page_get_owner()
on all the same MFN. Calculate them once at the start of the function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years agoVMX: use a single, global APIC access page
Jan Beulich [Tue, 27 Apr 2021 12:34:59 +0000 (14:34 +0200)]
VMX: use a single, global APIC access page

The address of this page is used by the CPU only to recognize when to
access the virtual APIC page instead. No accesses would ever go to this
page. It only needs to be present in the (CPU) page tables so that
address translation will produce its address as result for respective
accesses.

By making this page global, we also eliminate the need to refcount it,
or to assign it to any domain in the first place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years agox86/EFI: don't have an overly large image size
Jan Beulich [Mon, 26 Apr 2021 08:26:04 +0000 (10:26 +0200)]
x86/EFI: don't have an overly large image size

While without debug info the difference is benign (so far), since we pad
the image to 16Mb anyway, forcing the .reloc section to a 2Mb boundary
causes subsequent .debug_* sections to go farther beyond 16Mb than
needed. There's no reason to advance . for establishing __2M_rwdata_end,
as all data past _end is of no interest at runtime anymore anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/EFI: keep debug info in xen.efi
Jan Beulich [Mon, 26 Apr 2021 08:25:10 +0000 (10:25 +0200)]
x86/EFI: keep debug info in xen.efi

... provided the linker supports it (which it does as of commit
2dfa8341e079 ["ELF DWARF in PE output"]).

Without mentioning debugging sections, the linker would put them at
VA 0, thus making them unreachable by 32-bit (relative or absolute)
relocations. If relocations were resolvable (or absent) the resulting
binary would have invalid section RVAs (0 - __image_base__, truncated to
32 bits). Mentioning debugging sections without specifying an address
will result in the linker putting them all on the same RVA. A loader is,
afaict, free to reject loading such an image, as sections shouldn't
overlap. (The above describes GNU ld 2.36 behavior, which - if deemed
buggy - could change.)

Make sure our up-to-16Mb padding doesn't unnecessarily further extend
the image.

Take the opportunity and also switch to using $(call ld-option,...).

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/EFI: sections may not live at VA 0 in PE binaries
Jan Beulich [Mon, 26 Apr 2021 08:24:20 +0000 (10:24 +0200)]
x86/EFI: sections may not live at VA 0 in PE binaries

PE binaries specify section addresses by (32-bit) RVA. GNU ld up to at
least 2.36 would silently truncate the (negative) difference when a
section is placed below the image base. Such sections would also be
wrongly placed ahead of all "normal" ones. Since, for the time being,
we build xen.efi with --strip-debug anyway, .stab* can't appear. And
.comment has an entry in /DISCARD/ already anyway in the EFI case.

Because of their unclear origin, keep the directives for the ELF case
though.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/intel: insert Ice Lake-SP and Ice Lake-D model numbers
Igor Druzhinin [Mon, 26 Apr 2021 08:22:48 +0000 (10:22 +0200)]
x86/intel: insert Ice Lake-SP and Ice Lake-D model numbers

LBR, C-state MSRs should correspond to Ice Lake desktop according to
SDM rev. 74 for both models.

Ice Lake-SP is known to expose IF_PSCHANGE_MC_NO in the
IA32_ARCH_CAPABILITIES MSR (as the advisory states and a Whitley SDP
confirms), which means the erratum is fixed in hardware for that model
and it therefore shouldn't be present in the has_if_pschange_mc list.
Provisionally assume the same to be the case for Ice Lake-D.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agox86/vtx: add LBR_SELECT to the list of LBR MSRs
Igor Druzhinin [Mon, 26 Apr 2021 08:22:04 +0000 (10:22 +0200)]
x86/vtx: add LBR_SELECT to the list of LBR MSRs

This MSR exists since Nehalem / Silvermont and is actively used by Linux,
for instance, to improve sampling efficiency.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agox86/vPMU: Extend vPMU support to version 5
Igor Druzhinin [Mon, 26 Apr 2021 08:21:09 +0000 (10:21 +0200)]
x86/vPMU: Extend vPMU support to version 5

Version 5 is backwards compatible with version 3. This allows enabling
vPMU on Ice Lake CPUs.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agoVT-d: Don't assume register-based invalidation is always supported
Chao Gao [Mon, 26 Apr 2021 08:16:50 +0000 (10:16 +0200)]
VT-d: Don't assume register-based invalidation is always supported

According to Intel VT-d SPEC rev3.3 Section 6.5, Register-based Invalidation
isn't supported by Intel VT-d version 6 and beyond.

This hardware change impacts the following two scenarios: an admin can
disable queued invalidation via the 'qinval' cmdline option and use the
register-based interface; and VT-d switches to register-based
invalidation when queued invalidation needs to be disabled, for example
while disabling x2apic, during system suspension, or after enabling
queued invalidation fails.

To deal with this hardware change, if register-based invalidation isn't
supported, queued invalidation cannot be disabled through Xen cmdline; and
if queued invalidation has to be disabled temporarily in some scenarios,
VT-d won't switch to register-based interface but use some dummy functions
to catch errors in case there is any invalidation request issued when queued
invalidation is disabled.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agotools/xenstored: Wire properly the command line option -M/--path-max
Julien Grall [Wed, 21 Apr 2021 13:56:38 +0000 (14:56 +0100)]
tools/xenstored: Wire properly the command line option -M/--path-max

The command line option -M/--path-max was meant to be added by
commit 924bf8c793cb "tools/xenstore: rework path length check" but this
wasn't wired through properly.

Fix it by adding the missing "case 'M':".

Fixes: 924bf8c793cb ("tools/xenstore: rework path length check")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agotools/xenstored: Remove unused prototype
Julien Grall [Tue, 20 Apr 2021 13:46:06 +0000 (14:46 +0100)]
tools/xenstored: Remove unused prototype

A prototype for dump_conn() has been present for quite a long time,
but there is no implementation, not even, AFAICT, in the patch that
introduced it. So drop it.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agox86/shadow: depend on PV || HVM
Jan Beulich [Fri, 16 Apr 2021 12:32:46 +0000 (14:32 +0200)]
x86/shadow: depend on PV || HVM

With the building of guest_?.o now depending on PV or HVM, without
further #ifdef-ary shadow code won't link anymore when !PV && !HVM.
Since this isn't a useful configuration anyway, exclude shadow code from
being built in this case.

Fixes: aff8bf94ce65 ("x86/shadow: only 4-level guest code needs building when !HVM")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/pv: fix clang build without CONFIG_PV32
Roger Pau Monné [Fri, 23 Apr 2021 13:58:37 +0000 (15:58 +0200)]
x86/pv: fix clang build without CONFIG_PV32

Clang reports the following build error without CONFIG_PV32:

hypercall.c:253:10: error: variable 'op' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
    if ( !is_pv_32bit_vcpu(curr) )
         ^~~~~~~~~~~~~~~~~~~~~~~
hypercall.c:282:21: note: uninitialized use occurs here
    return unlikely(op == __HYPERVISOR_iret)
                    ^~
/root/src/xen/xen/include/xen/compiler.h:21:43: note: expanded from macro 'unlikely'
#define unlikely(x)   __builtin_expect(!!(x),0)
                                          ^
hypercall.c:253:5: note: remove the 'if' if its condition is always true
    if ( !is_pv_32bit_vcpu(curr) )
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
hypercall.c:251:21: note: initialize the variable 'op' to silence this warning
    unsigned long op;
                    ^
                     = 0

Rearrange the code in arch_do_multicall_call so that the if guards the
32bit branch and when CONFIG_PV32 is not set there's no conditional at
all.

Fixes: 527922008bc ('x86: slim down hypercall handling when !PV32')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/oprof: fix !HVM && !PV32 build
Jan Beulich [Fri, 23 Apr 2021 13:57:27 +0000 (15:57 +0200)]
x86/oprof: fix !HVM && !PV32 build

clang, at the very least, doesn't like unused inline functions, unless
their definitions live in a header.

Fixes: d23d792478 ("x86: avoid building COMPAT code when !HVM && !PV32")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/cpuid: support LFENCE always serialising CPUID bit
Roger Pau Monné [Thu, 15 Apr 2021 14:47:31 +0000 (16:47 +0200)]
x86/cpuid: support LFENCE always serialising CPUID bit

AMD Milan (Zen3) CPUs have an LFENCE Always Serialising CPUID bit in
leaf 80000021.eax. Previous AMD CPUs instead had a user-settable bit
in the DE_CFG MSR to select whether LFENCE was dispatch serialising,
which Xen always attempts to set. The forcibly always-on setting is
due to the addition of SEV-SNP, so that a VMM cannot break the
confidentiality of a guest.

In order to support this new CPUID bit move the LFENCE_DISPATCH
synthetic CPUID bit to map the hardware bit (leaving a hole in the
synthetic range) and either rely on the bit already being set by the
native CPUID output, or attempt to fake it in Xen by modifying the
DE_CFG MSR. This requires adding one more entry to the featureset to
support leaf 80000021.eax.

The bit is always exposed to guests by default even if the underlying
hardware doesn't support leaf 80000021. Note that Xen doesn't allow
guests to change the DE_CFG value, so once set by Xen LFENCE will always
be serialising.

Note that the access to DE_CFG by guests is left as-is: reads will
unconditionally return LFENCE_SERIALISE bit set, while writes are
silently dropped.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Always expose to guests by default]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/mm: fix wrong unmap call
Hongyan Xia [Thu, 22 Apr 2021 17:42:30 +0000 (18:42 +0100)]
x86/mm: fix wrong unmap call

Commit 'x86/mm: switch to new APIs in modify_xen_mappings' applied the
hunk with the unmap call to map_pages_to_xen(), which was wrong: it
clearly should have gone at the end of modify_xen_mappings(). Fix that.

Fixes: dd68f2e49bea ("x86/mm: switch to new APIs in modify_xen_mappings")
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Tested-by: Julien Grall <jgrall@amazon.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoautomation: build in openSUSE Tumbleweed
Dario Faggioli [Wed, 31 Jul 2019 16:58:51 +0000 (18:58 +0200)]
automation: build in openSUSE Tumbleweed

Mark the tests as non-fatal, as Tumbleweed is a bleeding edge rolling release.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agolib: move strsep()
Jan Beulich [Thu, 22 Apr 2021 12:53:21 +0000 (14:53 +0200)]
lib: move strsep()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strpbrk()
Jan Beulich [Thu, 22 Apr 2021 12:53:10 +0000 (14:53 +0200)]
lib: move strpbrk()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strspn()
Jan Beulich [Thu, 22 Apr 2021 12:52:57 +0000 (14:52 +0200)]
lib: move strspn()

Allow the function to be individually linkable, discardable, and
overridable. In fact the function is unused at present, and hence will
now get omitted from the final binaries.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move/rename strnicmp() to strncasecmp()
Jan Beulich [Thu, 22 Apr 2021 12:51:47 +0000 (14:51 +0200)]
lib: move/rename strnicmp() to strncasecmp()

While moving the implementation, also rename it to match strcasecmp(),
allowing the similar use of a compiler builtin in this case as well.

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strcasecmp()
Jan Beulich [Thu, 22 Apr 2021 12:51:08 +0000 (14:51 +0200)]
lib: move strcasecmp()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strstr()
Jan Beulich [Thu, 22 Apr 2021 12:50:54 +0000 (14:50 +0200)]
lib: move strstr()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strrchr()
Jan Beulich [Thu, 22 Apr 2021 12:50:44 +0000 (14:50 +0200)]
lib: move strrchr()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strchr()
Jan Beulich [Thu, 22 Apr 2021 12:50:25 +0000 (14:50 +0200)]
lib: move strchr()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strlcat()
Jan Beulich [Thu, 22 Apr 2021 12:49:10 +0000 (14:49 +0200)]
lib: move strlcat()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strlcpy()
Jan Beulich [Thu, 22 Apr 2021 12:48:59 +0000 (14:48 +0200)]
lib: move strlcpy()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
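strlcpy()'s defining properties are guaranteed NUL termination (for a non-zero size) and a return value of the full source length, so callers can detect truncation. A sketch of those semantics (illustrative only, hypothetical my_strlcpy name - not Xen's actual code):

```c
#include <stddef.h>
#include <string.h>

/* Illustrative strlcpy() semantics: copy at most size-1 bytes, always
 * NUL-terminate when size > 0, and return strlen(src) so that a return
 * value >= size signals truncation. */
static size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if ( size )
    {
        size_t copy = len >= size ? size - 1 : len;

        memcpy(dst, src, copy);
        dst[copy] = '\0';
    }

    return len;
}
```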
4 years agolib: move strncmp()
Jan Beulich [Thu, 22 Apr 2021 12:48:38 +0000 (14:48 +0200)]
lib: move strncmp()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strcmp()
Jan Beulich [Thu, 22 Apr 2021 12:48:25 +0000 (14:48 +0200)]
lib: move strcmp()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strnlen()
Jan Beulich [Thu, 22 Apr 2021 12:48:14 +0000 (14:48 +0200)]
lib: move strnlen()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move strlen()
Jan Beulich [Thu, 22 Apr 2021 12:48:01 +0000 (14:48 +0200)]
lib: move strlen()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move memchr_inv()
Jan Beulich [Thu, 22 Apr 2021 12:45:33 +0000 (14:45 +0200)]
lib: move memchr_inv()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
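memchr_inv() is the inverse of memchr(): it locates the first byte that does *not* match the given value, which is handy for checking that a buffer is uniformly filled. A sketch of the semantics (illustrative only, hypothetical my_memchr_inv name - not Xen's actual code):

```c
#include <stddef.h>

/* Illustrative memchr_inv() semantics: return a pointer to the first of
 * n bytes differing from c, or NULL if all n bytes equal c. */
static void *my_memchr_inv(const void *p, int c, size_t n)
{
    const unsigned char *s = p;

    while ( n-- )
    {
        if ( *s != (unsigned char)c )
            return (void *)s;
        ++s;
    }

    return NULL;
}
```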
4 years agolib: move memchr()
Jan Beulich [Thu, 22 Apr 2021 12:45:21 +0000 (14:45 +0200)]
lib: move memchr()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move memcmp()
Jan Beulich [Thu, 22 Apr 2021 12:45:06 +0000 (14:45 +0200)]
lib: move memcmp()

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move memmove()
Jan Beulich [Thu, 22 Apr 2021 12:44:53 +0000 (14:44 +0200)]
lib: move memmove()

By moving the function into an archive, x86 no longer needs to announce
that it has its own implementation - symbol resolution by the linker now
guarantees that the generic function remains unused, and the forwarding
to the compiler built-in gets done by the common header anyway.

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
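What distinguishes the generic memmove() from memcpy() is that it must cope with overlapping buffers, choosing the copy direction from the relative positions of source and destination. A sketch of that behaviour (illustrative only, hypothetical my_memmove name - not Xen's actual code):

```c
#include <stddef.h>

/* Illustrative memmove() semantics: copy forward when the destination
 * precedes the source, backward otherwise, so overlap is safe. */
static void *my_memmove(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if ( d < s )
        while ( n-- )
            *d++ = *s++;
    else
        while ( n-- )
            d[n] = s[n];

    return dst;
}
```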
4 years agolib: move memcpy()
Jan Beulich [Thu, 22 Apr 2021 12:44:35 +0000 (14:44 +0200)]
lib: move memcpy()

By moving the function into an archive, x86 no longer needs to announce
that it has its own implementation - symbol resolution by the linker now
guarantees that the generic function remains unused, and the forwarding
to the compiler built-in gets done by the common header anyway.

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agolib: move memset()
Jan Beulich [Thu, 22 Apr 2021 12:42:31 +0000 (14:42 +0200)]
lib: move memset()

By moving the function into an archive, x86 no longer needs to announce
that it has its own implementation - symbol resolution by the linker now
guarantees that the generic function remains unused, and the forwarding
to the compiler built-in gets done by the common header anyway.

Allow the function to be individually linkable, discardable, and
overridable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agox86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents
Jan Beulich [Thu, 22 Apr 2021 12:39:24 +0000 (14:39 +0200)]
x86/CPUID: shrink max_{,sub}leaf fields according to actual leaf contents

Zapping leaf data for out-of-range leaves is just one half of it: to
avoid guests (bogusly or worse) inferring information from mere leaf
presence, also shrink the maximum indicators such that the respective
trailing entry is not all blank (unless of course it's the initial
subleaf of a leaf that's not the final one).

This is also in preparation of bumping the maximum basic leaf we
support, to ensure guests not getting exposed related features won't
observe a change in behavior.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
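The shrinking idea above can be illustrated with a small helper that lowers the maximum-leaf indicator while the trailing leaf is entirely blank. This is a hypothetical sketch (the struct leaf layout and shrink_max_leaf name are invented for illustration; Xen's actual policy code differs):

```c
#include <stdint.h>

/* Hypothetical CPUID leaf: the four output registers of one leaf. */
struct leaf { uint32_t a, b, c, d; };

/* Hypothetical helper: lower max_leaf until the trailing leaf carries
 * data, so guests cannot infer anything from blank trailing leaves. */
static unsigned int shrink_max_leaf(const struct leaf *leaves,
                                    unsigned int max_leaf)
{
    while ( max_leaf &&
            !(leaves[max_leaf].a | leaves[max_leaf].b |
              leaves[max_leaf].c | leaves[max_leaf].d) )
        --max_leaf;

    return max_leaf;
}
```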
4 years agotools/libs/light: Remove unnecessary libxl_list_vm() call
Costin Lupu [Mon, 19 Apr 2021 13:01:42 +0000 (16:01 +0300)]
tools/libs/light: Remove unnecessary libxl_list_vm() call

The removed lines were initially added by commit 314e64084d31, but the
subsequent code using the nb_vm variable was later removed by commit
2ba368d13893, leaving these lines as an overlooked leftover. Moreover,
the call becomes very expensive when a considerable number of VMs
(~1000 instances) is running on the host.

Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro>
Acked-by: Wei Liu <wl@xen.org>
4 years agoCI: Drop TravisCI
Andrew Cooper [Wed, 21 Apr 2021 09:16:13 +0000 (10:16 +0100)]
CI: Drop TravisCI

Travis-ci.org is shutting down shortly.  The arm cross-compile testing has
been broken for a long time now, and all testing has now been superseded by
our Gitlab infrastructure.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agox86/shim: Simplify compat handling in write_start_info()
Andrew Cooper [Mon, 19 Apr 2021 14:33:05 +0000 (15:33 +0100)]
x86/shim: Simplify compat handling in write_start_info()

Factor out a compat boolean to remove the lfence overhead from multiple
is_pv_32bit_domain() calls.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/mm: drop _new suffix for page table APIs
Wei Liu [Thu, 22 Apr 2021 12:14:52 +0000 (14:14 +0200)]
x86/mm: drop _new suffix for page table APIs

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86: switch to use domheap page for page tables
Hongyan Xia [Thu, 22 Apr 2021 12:14:41 +0000 (14:14 +0200)]
x86: switch to use domheap page for page tables

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/mm: drop old page table APIs
Hongyan Xia [Thu, 22 Apr 2021 12:14:22 +0000 (14:14 +0200)]
x86/mm: drop old page table APIs

Two sets of old APIs, alloc/free_xen_pagetable() and lXe_to_lYe(), are
now dropped to avoid the dependency on direct map.

There are two special cases which have not yet been rewritten to use the
new APIs and thus need special treatment:

rpt in smpboot.c cannot use ephemeral mappings yet. The problem is that
rpt is read and written in context switch code, but the mapping
infrastructure is NOT context-switch-safe, meaning we cannot map rpt in
one domain and unmap in another. Before the mapping infrastructure
supports context switches, rpt has to be globally mapped.

Also, lXe_to_lYe() during Xen image relocation cannot be converted into
map/unmap pairs. We cannot hold on to mappings while the mapping
infrastructure is being relocated! It is enough to remove the direct map
in the second e820 pass, so we still use the direct map (<4GiB) in Xen
relocation (which is during the first e820 pass).

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/smpboot: switch clone_mapping() to new APIs
Wei Liu [Thu, 22 Apr 2021 12:14:13 +0000 (14:14 +0200)]
x86/smpboot: switch clone_mapping() to new APIs

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/smpboot: add exit path for clone_mapping()
Wei Liu [Thu, 22 Apr 2021 12:14:03 +0000 (14:14 +0200)]
x86/smpboot: add exit path for clone_mapping()

We will soon need to clean up page table mappings in the exit path.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoefi: switch to new APIs in EFI code
Wei Liu [Thu, 22 Apr 2021 12:13:54 +0000 (14:13 +0200)]
efi: switch to new APIs in EFI code

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoefi: use new page table APIs in copy_mapping
Wei Liu [Thu, 22 Apr 2021 12:13:44 +0000 (14:13 +0200)]
efi: use new page table APIs in copy_mapping

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86_64/mm: switch to new APIs in setup_m2p_table
Wei Liu [Thu, 22 Apr 2021 12:13:34 +0000 (14:13 +0200)]
x86_64/mm: switch to new APIs in setup_m2p_table

While doing so, avoid repetitive mapping of l2_ro_mpt by keeping it
across loops, and only unmap and map it when crossing 1G boundaries.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86_64/mm: switch to new APIs in paging_init
Wei Liu [Thu, 22 Apr 2021 12:13:24 +0000 (14:13 +0200)]
x86_64/mm: switch to new APIs in paging_init

Map and unmap pages instead of relying on the direct map.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86_64/mm: introduce pl2e in paging_init
Wei Liu [Thu, 22 Apr 2021 12:13:13 +0000 (14:13 +0200)]
x86_64/mm: introduce pl2e in paging_init

We will soon map and unmap pages in paging_init(). Introduce pl2e so
that we can use l2_ro_mpt to point to the page table itself.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/mm: switch to new APIs in modify_xen_mappings
Wei Liu [Thu, 22 Apr 2021 12:13:02 +0000 (14:13 +0200)]
x86/mm: switch to new APIs in modify_xen_mappings

Page tables allocated in that function should be mapped and unmapped
now.

Note that pl2e may now be mapped and unmapped in different iterations,
so we need to add clean-ups for that.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/mm: switch to new APIs in map_pages_to_xen
Wei Liu [Thu, 22 Apr 2021 12:12:51 +0000 (14:12 +0200)]
x86/mm: switch to new APIs in map_pages_to_xen

Page tables allocated in that function should be mapped and unmapped
now.

Take the opportunity to avoid a potential double map in
map_pages_to_xen() by initialising pl1e to NULL and only map it if it
was not mapped earlier.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/mm: rewrite virt_to_xen_l*e
Wei Liu [Thu, 22 Apr 2021 12:12:31 +0000 (14:12 +0200)]
x86/mm: rewrite virt_to_xen_l*e

Rewrite those functions to use the new APIs, and modify their callers to
unmap the returned pointer. Since alloc_xen_pagetable_new() is almost never
useful unless accompanied by page clearing and a mapping, introduce a
helper alloc_map_clear_xen_pt() for this sequence.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/EFI: avoid use of GNU ld's --disable-reloc-section when possible
Jan Beulich [Thu, 22 Apr 2021 11:29:49 +0000 (13:29 +0200)]
x86/EFI: avoid use of GNU ld's --disable-reloc-section when possible

As of commit 6fa7408d72b3 ("ld: don't generate base relocations in PE
output for absolute symbols") I'm feeling sufficiently confident in GNU
ld to use its logic for generating base relocations, which was enabled
for executables at some point last year (prior to that this would have
been done only for DLLs).

GNU ld, seeing the original relocations coming from the ELF object files,
generates different relocation types for our page tables (64-bit ones,
while mkreloc produces 32-bit ones). This requires also permitting and
handling that type in efi_arch_relocate_image().

Note that in the case that we leave base relocation generation to ld,
while efi/relocs-dummy.o then won't be linked into any executable
anymore, it still needs generating (and hence dependencies need to be
kept as they are) in order to have VIRT_BASE pulled out of it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86: drop use of prelink-efi.o
Jan Beulich [Thu, 22 Apr 2021 11:28:37 +0000 (13:28 +0200)]
x86: drop use of prelink-efi.o

Now that its contents matches prelink.o, use that one uniformly.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/EFI: redo .reloc section bounds determination
Jan Beulich [Thu, 22 Apr 2021 11:27:47 +0000 (13:27 +0200)]
x86/EFI: redo .reloc section bounds determination

There's no need to link relocs-dummy.o into the ELF binary. The two
symbols needed can as well be provided by the linker script. Then our
mkreloc tool also doesn't need to put them in the generated assembler
source.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/EFI: program headers are an ELF concept
Jan Beulich [Thu, 22 Apr 2021 11:27:06 +0000 (13:27 +0200)]
x86/EFI: program headers are an ELF concept

While they apparently do no harm when building xen.efi, their use is
potentially misleading. Conditionalize their use to be for just the ELF
binary we produce.

No change to the resulting binaries.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/time: yield to hyperthreads after updating TSC during rendezvous
Jan Beulich [Thu, 22 Apr 2021 11:26:26 +0000 (13:26 +0200)]
x86/time: yield to hyperthreads after updating TSC during rendezvous

Since we'd like the updates to be done as synchronously as possible,
make an attempt at yielding immediately after the TSC write.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/time: latch to-be-written TSC value early in rendezvous loop
Jan Beulich [Thu, 22 Apr 2021 11:25:53 +0000 (13:25 +0200)]
x86/time: latch to-be-written TSC value early in rendezvous loop

To reduce latency on time_calibration_tsc_rendezvous()'s last loop
iteration, read the value to be written on the last iteration at the end
of the loop body (i.e. in particular at the end of the second to last
iteration).

On my single-socket 18-core Skylake system this reduces the average loop
exit time on CPU0 (from the TSC write on the last iteration to until
after the main loop) from around 32k cycles to around 29k (albeit the
values measured on separate runs vary quite significantly).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agoautomation: add arm32 cross-build tests for Xen
Stefano Stabellini [Thu, 15 Apr 2021 01:11:33 +0000 (18:11 -0700)]
automation: add arm32 cross-build tests for Xen

Add a debian build container with cross-gcc for arm32 installed.
Add build jobs to cross-compile Xen-only for arm32.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoxen/arm: smmuv1: Revert associating the group pointer with the S2CR
Rahul Singh [Fri, 16 Apr 2021 11:25:02 +0000 (12:25 +0100)]
xen/arm: smmuv1: Revert associating the group pointer with the S2CR

Revert the code that associates the group pointer with the S2CR, as this
code causes an issue when the SMMU device has more than one master
device with the same stream-id. The issue was introduced by commit
0435784cc75d ("xen/arm: smmuv1: Intelligent SMR allocation").

Reverting the code will not impact the use of the SMMU if two devices
use the same stream-id, but each device will be in a separate group.
This is the same behaviour as before the code was merged.

Fixes: 0435784cc75d ("xen/arm: smmuv1: Intelligent SMR allocation")
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
4 years agoxen/arm64: Place a speculation barrier following a RET instruction
Julien Grall [Tue, 16 Jun 2020 15:33:12 +0000 (16:33 +0100)]
xen/arm64: Place a speculation barrier following a RET instruction

Some CPUs can speculate past a RET instruction and potentially perform
speculative accesses to memory before processing the return.

There is no known gadget available after the RET instruction today.
However some of the registers (such as in check_pending_guest_serror())
may contain a value provided by the guest.

In order to harden the code, it would be better to add a speculation
barrier after each RET instruction. The performance impact is meant to
be negligible, as the speculation barrier is not meant to be
architecturally executed.

Rather than manually inserting a speculation barrier, use a macro which
overrides the RET mnemonic and replaces it with RET + SB. We need to use
the opcode for RET to prevent any macro recursion.

This patch is only covering the assembly code. C code would need to be
covered separately using the compiler support.

Note that the definition of the sb macro needs to be moved earlier in
asm-arm/macros.h so it can be used by the new macro.

This is part of the work to mitigate straight-line speculation.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agox86/dpci: remove the dpci EOI timer
Roger Pau Monné [Tue, 20 Apr 2021 09:36:54 +0000 (11:36 +0200)]
x86/dpci: remove the dpci EOI timer

The current interrupt pass-through code sets up a timer for each
interrupt injected to the guest that requires an EOI from the guest.
Such a timer performs two actions if the guest doesn't EOI the
interrupt within a given period of time: the first is deasserting the
virtual line, the second is performing an EOI of the physical
interrupt source if one is required.

The deasserting of the guest virtual line is wrong, since it messes
with the interrupt status of the guest. This seems to have been done
in order to compensate for missing deasserts when certain interrupt
controller actions are performed. The original motivation of the
introduction of the timer was to fix issues when a GSI was shared
between different guests. We believe that other changes in the
interrupt handling code (i.e. proper propagation of EOI-related actions
to dpci) have fixed such errors by now.

Performing an EOI of the physical interrupt source is redundant, since
there's already a timer that takes care of this for all interrupts,
not just the HVM dpci ones, see irq_guest_action_t struct eoi_timer
field.

Since both of the actions performed by the dpci timer are not
required, remove it altogether.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/vpic: issue dpci EOI for cleared pins at ICW1
Roger Pau Monné [Tue, 20 Apr 2021 09:36:09 +0000 (11:36 +0200)]
x86/vpic: issue dpci EOI for cleared pins at ICW1

When pins are cleared from either ISR or IRR as part of the
initialization sequence forward the clearing of those pins to the dpci
EOI handler, as it is equivalent to an EOI. Not doing so can bring the
interrupt controller state out of sync with the dpci handling logic,
that expects a notification when a pin has been EOI'ed.

Fixes: 7b3cb5e5416 ('IRQ injection changes for HVM PCI passthru.')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/vpic: don't trigger unmask event until end of init
Roger Pau Monné [Tue, 20 Apr 2021 09:35:29 +0000 (11:35 +0200)]
x86/vpic: don't trigger unmask event until end of init

Wait until the end of the init sequence to trigger the unmask event.
Note that it will be unconditionally triggered, but that's harmless if
no unmask actually happened.

While there change the variable type to bool.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/vpic: force int output to low when in init mode
Roger Pau Monné [Tue, 20 Apr 2021 09:34:53 +0000 (11:34 +0200)]
x86/vpic: force int output to low when in init mode

While the PIC is in its init sequence, prevent interrupt delivery. The
state of the registers is in the process of being set during the init
phase, so it makes sense to prevent any int line changes during that
process.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/CPUID: add further "fast repeated string ops" feature flags
Jan Beulich [Mon, 19 Apr 2021 13:29:39 +0000 (15:29 +0200)]
x86/CPUID: add further "fast repeated string ops" feature flags

Like ERMS this can always be exposed to guests, but I guess once we
introduce full validation we want to make sure we don't reject incoming
policies with any of these set when in the raw/host policies they're
clear.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86: use is_pv_64bit_domain() to avoid double evaluate_nospec()
Jan Beulich [Mon, 19 Apr 2021 13:29:06 +0000 (15:29 +0200)]
x86: use is_pv_64bit_domain() to avoid double evaluate_nospec()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86: mem-access is HVM-only
Jan Beulich [Mon, 19 Apr 2021 13:28:00 +0000 (15:28 +0200)]
x86: mem-access is HVM-only

By excluding the file from being built for !HVM, #ifdef-ary can be
removed from it.

The new HVM dependency on the Kconfig option is benign for Arm.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>