xenbits.xensource.com Git - xen.git/log
3 years ago x86/spec-ctrl: Print all AMD speculative hints/features
Andrew Cooper [Wed, 8 Sep 2021 17:21:10 +0000 (18:21 +0100)]
x86/spec-ctrl: Print all AMD speculative hints/features

We already print Intel features that aren't yet implemented/used, so be
consistent on AMD too.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/msr: Cleanup of misc constants
Andrew Cooper [Fri, 25 May 2018 15:13:02 +0000 (16:13 +0100)]
x86/msr: Cleanup of misc constants

Move two blocks of MSRs into the cleaned up section, updating the style as
they move.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago x86/msr: Clean up the MSR_EFER constants
Andrew Cooper [Fri, 25 May 2018 15:12:05 +0000 (16:12 +0100)]
x86/msr: Clean up the MSR_EFER constants

There are no remaining users of the bit position constants.  Move the used
constants into the cleaned-up area of msr-index.h and apply appropriate style.

Rename EFER_NX to EFER_NXE to match both the Intel and AMD specs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years ago x86/amd: Use newer SSBD mechanisms if they exist
Andrew Cooper [Fri, 30 Nov 2018 17:17:38 +0000 (17:17 +0000)]
x86/amd: Use newer SSBD mechanisms if they exist

The opencoded legacy Memory Disambiguation logic in init_amd() neglected
Fam19h for the Zen3 microarchitecture.  Furthermore, all Zen2 based systems
have the architectural MSR_SPEC_CTRL and the SSBD bit within it, so shouldn't
be using MSR_AMD64_LS_CFG.

Implement the algorithm given in AMD's SSBD whitepaper, and leave a
printk_once() behind in the case that no controls can be found.

This now means that a user explicitly choosing `spec-ctrl=ssbd` will properly
turn off Memory Disambiguation on Fam19h/Zen3 systems.

This still remains a single system-wide setting (for now), and is not context
switched between vCPUs.  As such, it doesn't interact with Intel's use of
MSR_SPEC_CTRL and default_xen_spec_ctrl (yet).
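
A minimal sketch of the selection order described above, in C.  The struct,
enum and function names are hypothetical and are not Xen's.  The LS_CFG bit
positions per family are an assumption based on my reading of AMD's SSBD
whitepaper and should be checked against it before being relied upon.

```c
#include <stdbool.h>

/* Hypothetical summary of the CPUID facts relevant to SSBD. */
struct demo_cpu {
    unsigned int family; /* e.g. 0x17 for Zen1/Zen2, 0x19 for Zen3 */
    bool ssb_no;         /* hardware reports it is not affected by SSB */
    bool amd_ssbd;       /* MSR_SPEC_CTRL.SSBD is enumerated */
    bool virt_ssbd;      /* MSR_VIRT_SPEC_CTRL.SSBD is enumerated */
};

enum demo_ssbd_mech {
    DEMO_SSBD_NOT_NEEDED,
    DEMO_SSBD_SPEC_CTRL,
    DEMO_SSBD_VIRT_SPEC_CTRL,
    DEMO_SSBD_LS_CFG,
    DEMO_SSBD_NONE,       /* nothing found: the printk_once() case */
};

/* Assumed model-specific LS_CFG bit per family; -1 when unknown. */
int demo_ls_cfg_ssbd_bit(unsigned int family)
{
    switch (family) {
    case 0x15: return 54;
    case 0x16: return 33;
    case 0x17:
    case 0x18: return 10;
    default:   return -1;
    }
}

/* Prefer architectural controls, fall back to the legacy MSR last. */
enum demo_ssbd_mech demo_pick_ssbd(const struct demo_cpu *c)
{
    if (c->ssb_no)
        return DEMO_SSBD_NOT_NEEDED;
    if (c->amd_ssbd)
        return DEMO_SSBD_SPEC_CTRL;
    if (c->virt_ssbd)
        return DEMO_SSBD_VIRT_SPEC_CTRL;
    if (demo_ls_cfg_ssbd_bit(c->family) >= 0)
        return DEMO_SSBD_LS_CFG;
    return DEMO_SSBD_NONE;
}
```

With this ordering a Fam19h part that enumerates MSR_SPEC_CTRL.SSBD never
reaches the family switch, which is the case the old logic missed.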

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/amd: Enumeration for speculative features/hints
Andrew Cooper [Mon, 12 Jul 2021 14:13:32 +0000 (15:13 +0100)]
x86/amd: Enumeration for speculative features/hints

There is a step change in speculation protections between the Zen1 and Zen2
microarchitectures.

Zen1 and older have no special support.  Control bits in non-architectural
MSRs are used to make lfence be dispatch-serialising (Spectre v1), and to
disable Memory Disambiguation (Speculative Store Bypass).  IBPB was
retrofitted in a microcode update, and software methods are required for
Spectre v2 protections.

Because the bit controlling Memory Disambiguation is model specific,
hypervisors are expected to expose a MSR_VIRT_SPEC_CTRL interface which
abstracts the model specific details.

Zen2 and later implement the MSR_SPEC_CTRL interface in hardware, and
virtualise the interface for HVM guests to use.  A number of hint bits are
specified too to help guide OS software to the most efficient mitigation
strategy.

Zen3 introduced a new feature, Predictive Store Forwarding, along with a
control to disable it in sensitive code.

Add CPUID and VMCB details for all the new functionality.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/spec-ctrl: Split the "Hardware features" diagnostic line
Andrew Cooper [Thu, 29 Jul 2021 10:59:22 +0000 (11:59 +0100)]
x86/spec-ctrl: Split the "Hardware features" diagnostic line

Separate the read-only hints from the features requiring active actions on
Xen's behalf.

Also take the opportunity to split the IBRS/IBPB and IBPB mess.  More features
with overlapping enumeration are on the way, and it is not useful to split
them like this.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: set policy filename on make command line
Anthony PERARD [Wed, 8 Sep 2021 12:40:00 +0000 (14:40 +0200)]
build: set policy filename on make command line

In order to avoid flask/Makefile.common calling `make xenversion`, we
override POLICY_FILENAME with the value we are going to use anyway.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/cpuid: detect null segment behaviour on Zen2 CPUs
Jane Malalane [Wed, 8 Sep 2021 12:39:18 +0000 (14:39 +0200)]
x86/cpuid: detect null segment behaviour on Zen2 CPUs

All Zen2 CPUs actually have this behaviour, but the CPUID bit couldn't
be introduced into Zen2 due to a lack of leaves. So, it was added in a
new leaf in Zen3. Nonetheless, hypervisors can synthesize the CPUID
bit in software.

So, Xen probes for NSCB (NullSelectorClearsBase) and
synthesizes the bit, if the behaviour is present.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago domain: try to address Coverity pointing out a missing "break" in domain_teardown()
Jan Beulich [Wed, 8 Sep 2021 12:38:33 +0000 (14:38 +0200)]
domain: try to address Coverity pointing out a missing "break" in domain_teardown()

Commit 806448806264 ("xen/domain: Fix label position in
domain_teardown()") has caused Coverity to report a _new_ supposedly
un-annotated fall-through in a switch(). I find this (once again)
puzzling; I'm having an increasingly hard time figuring out what patterns
the tool is actually after. I would have expected that the tool would
either have spotted an issue also before this change, or not at all. Yet
if it had spotted one before, the statistics report should have included
an eliminated instance alongside the new one (because then the issue
would simply have moved by a few lines).

Hence the only thing I could guess is that the treatment of comments in
macro expansions might be subtly different. Therefore try whether
switching the comments to the still relatively new "fallthrough" pseudo
keyword actually helps.
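
For illustration, a sketch of what such a "fallthrough" pseudo keyword can
look like in C.  The macro definition is an assumption modelled on the common
GCC/Clang statement attribute, not a copy of Xen's header, and
demo_steps_from() is a made-up function.

```c
/* Use the statement attribute where the compiler offers it. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Count how many steps run when starting from a given stage. */
int demo_steps_from(int stage)
{
    int steps = 0;

    switch (stage) {
    case 0:
        steps++;
        fallthrough;
    case 1:
        steps++;
        fallthrough;
    case 2:
        steps++;
        break;
    default:
        break;
    }

    return steps;
}
```

Unlike a comment, the attribute survives preprocessing, so it stays visible
to the compiler and to analysis tools even when it originates from a macro
expansion.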

Coverity-ID: 1490865
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago gnttab: deal with status frame mapping race
Jan Beulich [Wed, 8 Sep 2021 12:37:45 +0000 (14:37 +0200)]
gnttab: deal with status frame mapping race

Once gnttab_map_frame() drops the grant table lock, the MFN it reports
back to its caller is free to other manipulation. In particular
gnttab_unpopulate_status_frames() might free it, by a racing request on
another CPU, thus resulting in a reference to a deallocated page getting
added to a domain's P2M.

Obtain a page reference in gnttab_map_frame() to prevent freeing of the
page until xenmem_add_to_physmap_one() has actually completed its acting
on the page. Do so uniformly, even if only strictly required for v2
status pages, to avoid extra conditionals (which then would all need to
be kept in sync going forward).

This is CVE-2021-28701 / XSA-384.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago x86/p2m-pt: fix p2m_flags_to_access()
Jan Beulich [Tue, 7 Sep 2021 12:24:49 +0000 (14:24 +0200)]
x86/p2m-pt: fix p2m_flags_to_access()

The initial if() was inverted, invalidating all output from this
function. Which in turn means the mirroring of P2M mappings into the
IOMMU didn't always work as intended: Mappings may have got updated when
there was no need to. There would not have been too few (un)mappings;
what saves us is that alongside the flags comparison MFNs also get
compared, with non-present entries always having an MFN of 0 or
INVALID_MFN while present entries always have MFNs different from these
two (0 in the table also meant to cover INVALID_MFN):

          OLD                       NEW
  P W  access  MFN          P W  access  MFN
  0 0    r     0            0 0    n     0
  0 1    rw    0            0 1    n     0
  1 0    n     non-0        1 0    r     non-0
  1 1    n     non-0        1 1    rw    non-0

present <-> non-present transitions are fine because the MFNs differ.
present -> present transitions as well as non-present -> non-present
ones are potentially causing too many map/unmap operations, but never
too few, because in that case old (bogus) and new access differ.
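
The OLD and NEW columns of the table can be reproduced with a small model.
This is only a sketch: the flag values, the enum and both function names are
invented for illustration and do not match Xen's definitions.

```c
/* Illustrative flag bits, not the real page table encoding. */
#define DEMO_PRESENT 0x1u
#define DEMO_RW      0x2u

enum demo_access { DEMO_N, DEMO_R, DEMO_RW_ACCESS };

/* Inverted check: present entries report "n", absent ones r/rw. */
enum demo_access demo_flags_to_access_old(unsigned int flags)
{
    if (flags & DEMO_PRESENT)
        return DEMO_N;
    return (flags & DEMO_RW) ? DEMO_RW_ACCESS : DEMO_R;
}

/* Corrected check: only present entries grant any access. */
enum demo_access demo_flags_to_access_new(unsigned int flags)
{
    if (!(flags & DEMO_PRESENT))
        return DEMO_N;
    return (flags & DEMO_RW) ? DEMO_RW_ACCESS : DEMO_R;
}
```

Feeding the four P/W combinations through both functions yields the two
access columns shown above.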

Fixes: d1bb6c97c31e ("IOMMU: also pass p2m_access_t to p2m_get_iommu_flags()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago x86/cpuid: expose NullSelectorClearsBase CPUID bit to guests
Jane Malalane [Tue, 7 Sep 2021 07:40:25 +0000 (09:40 +0200)]
x86/cpuid: expose NullSelectorClearsBase CPUID bit to guests

AMD Zen3 adds the NullSelectorClearsBase bit to indicate that loading
a NULL segment selector zeroes the base and limit fields, as well as
just attributes.

Expose the bit to all guests.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/P2M: relax guarding of MMIO entries
Jan Beulich [Tue, 7 Sep 2021 07:39:38 +0000 (09:39 +0200)]
x86/P2M: relax guarding of MMIO entries

One of the changes comprising the fixes for XSA-378 disallows replacing
MMIO mappings by code paths not intended for this purpose. At least in
the case of PVH Dom0 hitting an RMRR covered by an E820 ACPI region,
this is too strict. Generally short-circuit requests establishing the
same kind of mapping (mfn, type), but allow permissions to differ.

While there, also add a log message to the other domain_crash()
invocation that did prevent PVH Dom0 from coming up after the XSA-378
changes.

Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago gnttab: maptrack handle shortage is not IOMMU related
Jan Beulich [Tue, 7 Sep 2021 07:38:42 +0000 (09:38 +0200)]
gnttab: maptrack handle shortage is not IOMMU related

Both comment and message string associated with GNTST_no_device_space
suggest a connection to the IOMMU. A lack of maptrack handles has
nothing to do with that; it's unclear to me why commit 6213b696ba65
("Grant-table interface redone") introduced it this way. Introduce a
new error indicator.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago gnttab: adjust unmap checking of dev_bus_addr
Jan Beulich [Tue, 7 Sep 2021 07:37:50 +0000 (09:37 +0200)]
gnttab: adjust unmap checking of dev_bus_addr

There's no point checking ->dev_bus_addr when GNTMAP_device_map isn't
set (and hence the field isn't going to be consumed). And if there is a
mismatch, use the so far unused GNTST_bad_dev_addr error indicator - if
not here, where else would this (so far unused) value be used?

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago ns16550: MMIO r/o ranges are maintained at page granularity
Jan Beulich [Tue, 7 Sep 2021 07:36:59 +0000 (09:36 +0200)]
ns16550: MMIO r/o ranges are maintained at page granularity

Passing byte granular values will not have the intended effect. Address
the immediate issue, but I don't think what we do is actually
sufficient: At least some devices allow access to their registers via
either I/O ports or MMIO. In such aliasing cases we'd need to protect
the MMIO range even when we use I/O port accesses to drive the port.

Note that this way we may write-protect MMIO ranges of unrelated devices
as well. To deal with this, faults resulting from this would need
handling, to emulate the accesses outside of the protected range. (An
alternative would be to relocate the BAR, but I'm afraid this might end
up even more challenging.)
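
To make the granularity point concrete, here is a sketch of converting a
byte range into the page frame range that ends up write-protected.  It
assumes 4k pages; the type and function names are hypothetical and this is
not the code from the patch.

```c
#define DEMO_PAGE_SHIFT 12 /* assumption: 4k pages */

struct demo_pfn_range {
    unsigned long long first; /* first page frame touched */
    unsigned long long last;  /* last page frame touched (inclusive) */
};

/* Convert a non-empty byte range into an inclusive page frame range. */
struct demo_pfn_range demo_bytes_to_pfns(unsigned long long addr,
                                         unsigned long long len)
{
    struct demo_pfn_range r;

    r.first = addr >> DEMO_PAGE_SHIFT;
    r.last = (addr + len - 1) >> DEMO_PAGE_SHIFT;

    return r;
}
```

An 8 byte register block still covers a whole page frame, which is why
registers of unrelated devices sharing that page become read-only too.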

Fixes: c9f8e0aee507 ("ns16550: Add support for UART present in Broadcom TruManage capable NetXtreme chips")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago gnttab: check handle early in gnttab_get_status_frames()
Jan Beulich [Tue, 7 Sep 2021 07:36:20 +0000 (09:36 +0200)]
gnttab: check handle early in gnttab_get_status_frames()

Like done in gnttab_setup_table(), check the handle once early in the
function and use the lighter-weight (for PV) copying function in the
loop.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago gnttab: fold recurring is_iomem_page()
Jan Beulich [Tue, 7 Sep 2021 07:35:38 +0000 (09:35 +0200)]
gnttab: fold recurring is_iomem_page()

In all cases call the function just once instead of up to four times, at
the same time avoiding storing a dangling pointer in a local variable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago gnttab: drop a redundant expression from gnttab_release_mappings()
Jan Beulich [Tue, 7 Sep 2021 07:34:57 +0000 (09:34 +0200)]
gnttab: drop a redundant expression from gnttab_release_mappings()

This gnttab_host_mapping_get_page_type() invocation sits in the "else"
path of a conditional controlled by "map->flags & GNTMAP_readonly".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago build: adjust arch/x86/note.o rule
Anthony PERARD [Tue, 7 Sep 2021 07:32:14 +0000 (09:32 +0200)]
build: adjust arch/x86/note.o rule

Avoid different spellings for the location of "xen-syms", and simply
use the dependency variable. This avoids the assumption about the value
of $(TARGET).

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: move make option changes check earlier
Anthony PERARD [Tue, 7 Sep 2021 07:31:02 +0000 (09:31 +0200)]
build: move make option changes check earlier

And thus avoid checking for those variables over and over again.

Also, add "e.g." in the error messages to hint that "menuconfig"
isn't the only way.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: use subdir-y in test/Makefile
Anthony PERARD [Tue, 7 Sep 2021 07:30:42 +0000 (09:30 +0200)]
build: use subdir-y in test/Makefile

This allows Makefile.clean to recurse into livepatch without help.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: fix clean targets when subdir-y is used
Anthony PERARD [Tue, 7 Sep 2021 07:30:25 +0000 (09:30 +0200)]
build: fix clean targets when subdir-y is used

The make variable $(subdir-y) isn't used yet but will be in a
following patch. Anything in $(subdir-y) doesn't need to have a '/' as
suffix as we already know it's a directory.

Rework the rules so that it doesn't matter whether there is a '/' or
not. It also mimics more closely the way Linux's Kbuild descends into
subdirectories.

The FORCE phony target isn't needed anymore when running clean, so it can
be removed.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago build,include: rework compat-build-header.py
Anthony PERARD [Tue, 7 Sep 2021 07:29:33 +0000 (09:29 +0200)]
build,include: rework compat-build-header.py

Replace a mix of shell and Python scripts with a single Python script.

No change to the final generated headers.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
3 years ago build,include: rework compat-build-source.py
Anthony PERARD [Tue, 7 Sep 2021 07:28:43 +0000 (09:28 +0200)]
build,include: rework compat-build-source.py

Improvements are:
- give the path to xlat.lst as argument
- include `grep -v` in the compat-build-source.py script, we don't need to
  write this in several scripting languages.

No changes in final compat/%.h headers.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: use if_changed_rule with %.o:%.c targets
Anthony PERARD [Tue, 7 Sep 2021 07:16:45 +0000 (09:16 +0200)]
build: use if_changed_rule with %.o:%.c targets

Use $(dot-target) to have the target name prefixed with a dot.

Now, when the CC command has run, it is recorded in a .*.cmd
file, then if_changed_rule will compare it on subsequent runs.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: use if_changed on built_in.o
Anthony PERARD [Tue, 7 Sep 2021 07:14:32 +0000 (09:14 +0200)]
build: use if_changed on built_in.o

In the case where $(obj-y) is empty, we also replace $(c_flags) by
$(XEN_CFLAGS) to avoid generating an .%.d dependency file. This avoids
make trying to include %.h files in the ld command if $(obj-y) isn't
empty anymore on a second run.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: introduce cpp_flags macro
Anthony PERARD [Tue, 7 Sep 2021 07:14:12 +0000 (09:14 +0200)]
build: introduce cpp_flags macro

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago xen/arm64: Remove vreg_emulate_sysreg32
Michal Orzel [Thu, 29 Jul 2021 10:42:58 +0000 (12:42 +0200)]
xen/arm64: Remove vreg_emulate_sysreg32

According to ARMv8A architecture, AArch64 registers are 64bit wide
even though in many cases the upper 32bit is reserved. Therefore there
is no need for function vreg_emulate_sysreg32 on arm64. This means
that we can have just one function vreg_emulate_sysreg using new
function pointer:

typedef bool (*vreg_reg_fn_t)(struct cpu_user_regs *regs,
                              register_t *r, bool read);

Modify vreg_emulate_cp32 to use the new function pointer as well.

This change allows 64bit registers to be used properly in AArch64 state.
In case of AArch32 the documentation (D1.20.2, DDI 0487A.j) states
that "the upper 32 bits either become zero, or hold the value that
the same architectural register held before any AArch32 execution." As
the choice between them is IMPLEMENTATION DEFINED we cannot assume they
are zeroed. Xen should ensure that but currently it does not. This is
not a new bug and must be fixed as agreed during a discussion over this
patch.

Take the opportunity to switch CNTx_CTL_* to use UL to avoid any
surprise with the negation of any bits (as used in vtimer_cntp_ctl)
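
The kind of surprise meant here can be shown with a short sketch.  The macro
and function names are made up, and UINT64_C() stands in for the UL suffix so
that the example behaves the same on any host.

```c
#include <stdint.h>

/* The same bit, once as a 32-bit and once as a 64-bit constant. */
#define DEMO_CTL_BIT_32 (1u << 1)
#define DEMO_CTL_BIT_64 (UINT64_C(1) << 1)

/* ~DEMO_CTL_BIT_32 is only 32 bits wide, so bits 63:32 get cleared too. */
uint64_t demo_clear_bit_32(uint64_t v)
{
    return v & ~DEMO_CTL_BIT_32;
}

/* ~DEMO_CTL_BIT_64 has all upper bits set, so only bit 1 is cleared. */
uint64_t demo_clear_bit_64(uint64_t v)
{
    return v & ~DEMO_CTL_BIT_64;
}
```

The negated 32-bit constant is zero-extended when combined with a 64-bit
value, so the mask silently drops the upper half of the register.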

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago tests/xenstore: link in librt if necessary
Jan Beulich [Fri, 3 Sep 2021 13:10:43 +0000 (15:10 +0200)]
tests/xenstore: link in librt if necessary

Old enough glibc has clock_gettime() in librt.so, hence the library
needs to be specified to the linker. Newer glibc has the symbol
available in both libraries, so make sure that libc.so is preferred (to
avoid an unnecessary dependency on librt.so).

Fixes: 93c9edbef51b ("tests/xenstore: Rework Makefile")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago tools/libs: ROUNDUP() related adjustments
Jan Beulich [Fri, 3 Sep 2021 13:10:24 +0000 (15:10 +0200)]
tools/libs: ROUNDUP() related adjustments

For one, xc_private.h needlessly repeats xen-tools/libs.h's definition.

And then there are two suspicious uses (resulting from the inconsistency
with the respective 2nd parameter of DIV_ROUNDUP()): While the one in
tools/console/daemon/io.c - as per the code comment - intentionally uses
8 as the second argument (meaning to align to a multiple of 256), the
one in alloc_magic_pages_hvm() pretty certainly does not: There the goal
is to align to a uint64_t boundary, for the following module struct to
end up aligned.
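
A sketch of the inconsistency, assuming (as the text above implies) that the
second ROUNDUP() argument is a log2 value while DIV_ROUNDUP() takes a plain
divisor.  The DEMO_ macros are stand-ins written for this example, not the
definitions from xen-tools/libs.h.

```c
/* Round x up to a multiple of 2^w: the second argument is a shift count. */
#define DEMO_ROUNDUP(x, w) \
    (((unsigned long)(x) + (1UL << (w)) - 1) & ~((1UL << (w)) - 1))

/* Divide x by y, rounding up: the second argument is a plain divisor. */
#define DEMO_DIV_ROUNDUP(x, y) \
    (((unsigned long)(x) + (y) - 1) / (y))
```

Under that assumption DEMO_ROUNDUP(n, 8) aligns to 256 bytes, whereas
aligning to a uint64_t boundary needs DEMO_ROUNDUP(n, 3).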

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago libxc: split xc_logdirty_control() from xc_shadow_control()
Jan Beulich [Fri, 3 Sep 2021 13:09:48 +0000 (15:09 +0200)]
libxc: split xc_logdirty_control() from xc_shadow_control()

For log-dirty operations a 64-bit field is being truncated to become an
"int" return value. Seeing the large number of arguments the present
function takes, reduce its set of parameters to that needed for all
operations not involving the log-dirty bitmap, while introducing a new
wrapper for the log-dirty bitmap operations. This new function in turn
doesn't need an "mb" parameter, but has a 64-bit return type. (Using the
return value in favor of a pointer-type parameter is left as is, to
disturb callers as little as possible.)

While altering xc_shadow_control() anyway, also adjust the types of the
last two of the remaining parameters.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago libs/light: fix tv_sec fprintf format
Fabrice Fontaine [Sat, 28 Aug 2021 09:07:09 +0000 (11:07 +0200)]
libs/light: fix tv_sec fprintf format

Don't assume tv_sec is an unsigned long; it is 64 bits wide on 32-bit NetBSD.
Use %jd and cast to (intmax_t) instead.
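
A minimal sketch of the portable formatting, not the libxl code itself:
demo_format_timeval() is a hypothetical helper, and the microsecond field is
cast to long as an additional precaution.

```c
#include <inttypes.h>
#include <stdio.h>
#include <sys/time.h>

/*
 * Format a timeval as "seconds.microseconds".  Casting tv_sec to
 * intmax_t and printing it with %jd works whatever width time_t has.
 */
int demo_format_timeval(char *buf, size_t len, const struct timeval *tv)
{
    return snprintf(buf, len, "%jd.%06ld",
                    (intmax_t)tv->tv_sec, (long)tv->tv_usec);
}
```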

Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago x86/PVH: de-duplicate mappings for first Mb of Dom0 memory
Jan Beulich [Tue, 31 Aug 2021 15:43:36 +0000 (17:43 +0200)]
x86/PVH: de-duplicate mappings for first Mb of Dom0 memory

One of the changes comprising the fixes for XSA-378 disallows replacing
MMIO mappings by code paths not intended for this purpose. This means we
need to be more careful about the mappings put in place in this range -
mappings should be created exactly once:
- iommu_hwdom_init() comes first; it should avoid the first Mb,
- pvh_populate_p2m() should insert identity mappings only into ranges
  not populated as RAM,
- pvh_setup_acpi() should again avoid the first Mb, which was already
  dealt with at that point.

Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago x86/EPT: drop "tm" field of EPT entry
Jan Beulich [Tue, 31 Aug 2021 15:42:28 +0000 (17:42 +0200)]
x86/EPT: drop "tm" field of EPT entry

VT-d spec 3.2 converted this bit (back) to reserved. Since there's no
use of it anywhere in the tree, simply rename it and adjust its comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years ago libxenguest/x86: ensure CPUID[1].EBX[32:16] is non-zero for HVM
Jan Beulich [Mon, 30 Aug 2021 13:19:31 +0000 (15:19 +0200)]
libxenguest/x86: ensure CPUID[1].EBX[32:16] is non-zero for HVM

We unconditionally set HTT, so merely doubling the value read from
hardware isn't going to be correct if that value is zero.
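
One plausible shape for such an adjustment is sketched below.  It is not
taken from the patch: demo_adjust_lppp() is a hypothetical helper, and the
choices to treat zero as one before doubling and to leave values with the
top bit set alone are assumptions.

```c
#include <stdint.h>

/*
 * Adjust the 8-bit logical processor count reported to an HVM guest.
 * A value of zero read from hardware is replaced rather than doubled,
 * and values that would overflow the field are left alone.
 */
uint8_t demo_adjust_lppp(uint8_t hw)
{
    if (hw == 0)
        return 2;
    if (hw & 0x80)
        return hw;
    return (uint8_t)(hw * 2);
}
```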

Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Julien Grall <julien@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago xen/domain: Fix label position in domain_teardown()
Andrew Cooper [Fri, 27 Aug 2021 13:46:52 +0000 (14:46 +0100)]
xen/domain: Fix label position in domain_teardown()

As explained in the comments, a progress label wants to be before the function
it refers to for the higher level logic to make sense.  As it happens, the
effects are benign because gnttab_mappings is immediately adjacent to teardown
in terms of co-routine exit points.

There is and will always be a corner case with 0.  Help alleviate this
visually (at least slightly) with a BUILD_BUG_ON() to ensure the property
which makes this function do anything useful.

There is also a visual corner case when changing from PROGRESS() to
PROGRESS_VCPU().  The important detail is to check that there is a "return
rc;" logically between each PROGRESS*() marker.
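
The co-routine pattern being discussed can be sketched as follows.  This is
a simplified model, not domain_teardown() itself: the struct, the step
names, DEMO_ERESTART and the helper are all invented for the example.

```c
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define DEMO_FALLTHROUGH __attribute__((__fallthrough__))
# endif
#endif
#ifndef DEMO_FALLTHROUGH
# define DEMO_FALLTHROUGH do {} while (0)
#endif

#define DEMO_ERESTART 85 /* illustrative "call me again" error value */

enum { PROG_none, PROG_step_a, PROG_step_b, PROG_done };

/* The property the switch relies on: a fresh domain starts at 0. */
_Static_assert(PROG_none == 0, "progress must start at zero");

struct demo_domain {
    int progress; /* where to resume on the next call */
    int a_left;   /* units of work left for step a */
    int b_left;   /* units of work left for step b */
};

/* Do one unit of work; ask to be called again if more remains. */
static int demo_do_unit(int *left)
{
    if (*left == 0)
        return 0;
    --*left;
    return *left ? -DEMO_ERESTART : 0;
}

/* Record the step first, then provide the label to resume at. */
#define PROGRESS(x) d->progress = PROG_##x; DEMO_FALLTHROUGH; case PROG_##x

int demo_teardown(struct demo_domain *d)
{
    int rc;

    switch (d->progress) {
    case PROG_none:

    PROGRESS(step_a):
        rc = demo_do_unit(&d->a_left);
        if (rc)
            return rc;

    PROGRESS(step_b):
        rc = demo_do_unit(&d->b_left);
        if (rc)
            return rc;

    PROGRESS(done):
        break;

    default:
        return -1;
    }

    return 0;
}
```

Because each marker sits before the call it names, a preempted step is
re-entered at its own label on the next invocation rather than at the
following one.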

Fixes: b1ee10be5625 ("gnttab: add preemption check to gnttab_release_mappings()")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/spec-ctrl: Skip RSB overwriting when safe to do so
Andrew Cooper [Thu, 19 Aug 2021 12:53:15 +0000 (13:53 +0100)]
x86/spec-ctrl: Skip RSB overwriting when safe to do so

In some configurations, it is safe to not overwrite the RSB on entry to Xen.
Both Intel and AMD have guidelines in this area, because of the performance
difference it makes for native kernels.

A simple microperf test, measuring the amount of time a XENVER_version
hypercall takes, shows the following improvements:

  KabyLake:     -13.9175% +/- 6.85387%
  CoffeeLake-R:  -9.1183% +/- 5.04519%
  Milan:        -17.7803% +/- 1.29808%

This is the best case improvement, because no real workloads are making
XENVER_version hypercalls in a tight loop.  However, this is the hypercall
used by PV kernels to force evtchn delivery if one is pending, so it is a
common hypercall to see, especially in dom0.

The avoidance of RSB-overwriting speeds up all interrupts, exceptions and
system calls from PV or Xen context.  RSB-overwriting is still required on
VMExit from HVM guests for now.

In terms of more realistic testing, LMBench in dom0 on an AMD Rome system
shows improvements across the board, with the best improvement at 8% for
simple syscall and simple write.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago gnttab: avoid triggering assertion in radix_tree_ulong_to_ptr()
Jan Beulich [Fri, 27 Aug 2021 08:54:46 +0000 (10:54 +0200)]
gnttab: avoid triggering assertion in radix_tree_ulong_to_ptr()

Relevant quotes from the C11 standard:

"Except where explicitly stated otherwise, for the purposes of this
 subclause unnamed members of objects of structure and union type do not
 participate in initialization. Unnamed members of structure objects
 have indeterminate value even after initialization."

"If there are fewer initializers in a brace-enclosed list than there are
 elements or members of an aggregate, [...], the remainder of the
 aggregate shall be initialized implicitly the same as objects that have
 static storage duration."

"If an object that has static or thread storage duration is not
 initialized explicitly, then:
 [...]
 — if it is an aggregate, every member is initialized (recursively)
   according to these rules, and any padding is initialized to zero
   bits;
 [...]"

"A bit-field declaration with no declarator, but only a colon and a
 width, indicates an unnamed bit-field." Footnote: "An unnamed bit-field
 structure member is useful for padding to conform to externally imposed
 layouts."

"There may be unnamed padding within a structure object, but not at its
 beginning."

Which makes me conclude:
- Whether an unnamed bit-field member is an unnamed member or padding is
  unclear, and hence also whether the last quote above would render the
  big endian case of the structure declaration invalid.
- Whether the number of members of an aggregate includes unnamed ones is
  also not really clear.
- The initializer in map_grant_ref() initializes all fields of the "cnt"
  sub-structure of the union, so assuming the second quote above applies
  here (indirectly), the compiler isn't required to implicitly
  initialize the rest (i.e. in particular any padding) like would happen
  for static storage duration objects.

Gcc 7.4.1 can be observed (apparently in debug builds only) to translate
aforementioned initializer to a read-modify-write operation of a stack
variable, leaving unchanged the top two bits of whatever was previously
in that stack slot. Clearly if either of the two bits were set,
radix_tree_ulong_to_ptr()'s assertion would trigger.

Therefore, to be on the safe side, add an explicit padding field for the
non-big-endian-bitfields case and give a dummy name to both padding
fields.
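
A sketch of the resulting layout idea, for the little-endian bit-field
ordering only.  The union, field and function names are hypothetical, and
bit-fields of type unsigned long rely on compiler support (GCC and Clang
accept them).

```c
#include <limits.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Two counters plus two spare bits, overlaid on one unsigned long. */
union demo_node {
    struct {
        unsigned long rd  : DEMO_BITS_PER_LONG / 2 - 1;
        unsigned long wr  : DEMO_BITS_PER_LONG / 2 - 1;
        unsigned long pad : 2; /* named, so an initializer covers it */
    } cnt;
    unsigned long raw;
};

/* Every member, including the padding, is given a defined value. */
union demo_node demo_make_node(unsigned long rd, unsigned long wr)
{
    union demo_node n = { .cnt = { .rd = rd, .wr = wr, .pad = 0 } };

    return n;
}
```

With a named field the padding bits are ordinary members, so they no longer
depend on how a compiler treats unnamed bit-fields during initialization.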

Fixes: 9781b51efde2 ("gnttab: replace mapkind()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago gnttab: drop GNTMAP_can_fail
Jan Beulich [Fri, 27 Aug 2021 08:53:48 +0000 (10:53 +0200)]
gnttab: drop GNTMAP_can_fail

There's neither documentation of what this flag is supposed to mean, nor
any implementation. Commit 4d45702cf0398 ("paging: Updates to public
grant table header file") suggests there might have been plans to use it
for interaction with mem-paging, but no such functionality has ever
materialized. With this, don't even bother enclosing the #define-s in a
__XEN_INTERFACE_VERSION__ conditional, but drop them altogether.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago AMD/IOMMU: avoid recording each level's MFN when walking page table
Jan Beulich [Fri, 27 Aug 2021 08:53:11 +0000 (10:53 +0200)]
AMD/IOMMU: avoid recording each level's MFN when walking page table

Both callers only care about the target (level 1) MFN. I also cannot
see what we might need higher level MFNs for down the road. And even
modern gcc doesn't recognize the optimization potential.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago VT-d: fix caching mode IOTLB flushing
Jan Beulich [Fri, 27 Aug 2021 08:52:15 +0000 (10:52 +0200)]
VT-d: fix caching mode IOTLB flushing

While for context cache entry flushing use of did 0 is indeed correct
(after all upon reading the context entry the IOMMU wouldn't know any
domain ID if the entry is not present, and hence a surrogate one needs
to be used), for IOTLB entries the normal domain ID (from the [present]
context entry) gets used. See sub-section "IOTLB" of section "Address
Translation Caches" in the VT-d spec.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years ago xen/arm: Restrict the amount of memory that dom0less domU and dom0 can allocate
Julien Grall [Wed, 25 Aug 2021 12:19:31 +0000 (14:19 +0200)]
xen/arm: Restrict the amount of memory that dom0less domU and dom0 can allocate

Currently, both dom0less domUs and dom0 can allocate an "unlimited"
amount of memory because d->max_pages is set to ~0U.

In particular, the former are meant to be unprivileged. Therefore the
memory they could allocate should be bounded. As the domains are not yet
officially aware of Xen (we don't advertise it in the DT, yet
the hypercalls are accessible), they should not need to allocate more
than the initial amount. So set d->max_pages directly to the amount of
memory we are meant to allocate.

Take the opportunity to also restrict the memory for dom0 as the
domain is direct mapped (i.e. MFN == GFN) and therefore cannot
allocate outside of the pre-allocated region.

This is CVE-2021-28700 / XSA-383.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago gnttab: fix array capacity check in gnttab_get_status_frames()
Jan Beulich [Wed, 25 Aug 2021 12:19:09 +0000 (14:19 +0200)]
gnttab: fix array capacity check in gnttab_get_status_frames()

The number of grant frames is of no interest here; converting the passed
in op.nr_frames this way means we allow for 8 times as many GFNs to be
written as actually fit in the array. We would corrupt xlat areas of
higher vCPU-s (after having faulted many times while trying to write to
the guard pages between any two areas) for 32-bit PV guests. For HVM
guests we'd simply crash as soon as we hit the first guard page, as
accesses to the xlat area are simply memcpy() there.

This is CVE-2021-28699 / XSA-382.

Fixes: 18b1be5e324b ("gnttab: make resource limits per domain")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
3 years ago gnttab: replace mapkind()
Jan Beulich [Wed, 25 Aug 2021 12:18:39 +0000 (14:18 +0200)]
gnttab: replace mapkind()

mapkind() doesn't scale very well with larger maptrack entry counts,
using a brute force linear search through all entries, with the only
option of an early loop exit if a matching writable entry was found.
Introduce a radix tree alongside the main maptrack table, thus
allowing much faster MFN-based lookup. To avoid the need to actually
allocate space for the individual nodes, encode the two counters in the
node pointers themselves, thus limiting the number of permitted
simultaneous r/o and r/w mappings of the same MFN to 2³¹-1 (64-bit) /
2¹⁵-1 (32-bit) each.
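
The counter encoding described above can be sketched roughly as follows. This is only an illustration, assuming a 64-bit word whose two low bits are reserved for tagging, which leaves 31 bits per counter. The names pack_counts(), rd_count() and wr_count() and the bit layout are hypothetical and are not taken from the Xen sources.

```c
#include <stdint.h>

/*
 * Hypothetical layout (not Xen's actual one):
 *   bits  0-1   reserved (tag bits)
 *   bits  2-32  count of r/o mappings
 *   bits 33-63  count of r/w mappings
 */
#define CNT_BITS  31
#define CNT_MASK  ((UINT64_C(1) << CNT_BITS) - 1)
#define RD_SHIFT  2
#define WR_SHIFT  (RD_SHIFT + CNT_BITS)

/* Fold both counters into a single pointer-sized value. */
static uint64_t pack_counts(uint64_t rd, uint64_t wr)
{
    return ((rd & CNT_MASK) << RD_SHIFT) | ((wr & CNT_MASK) << WR_SHIFT);
}

/* Extract the r/o counter again. */
static uint64_t rd_count(uint64_t v)
{
    return (v >> RD_SHIFT) & CNT_MASK;
}

/* Extract the r/w counter again. */
static uint64_t wr_count(uint64_t v)
{
    return (v >> WR_SHIFT) & CNT_MASK;
}
```

Because the counters live in the stored value itself, no per-MFN node has to be allocated; the price is the upper bound on each counter.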

To avoid enforcing an unnecessarily low bound on the number of
simultaneous mappings of a single MFN, introduce
radix_tree_{ulong_to_ptr,ptr_to_ulong} paralleling
radix_tree_{int_to_ptr,ptr_to_int}.

As a consequence locking changes are also applicable: With there no
longer being any inspection of the remote domain's active entries,
there's also no need anymore to hold the remote domain's grant table
lock. And since we're no longer iterating over the local domain's map
track table, the lock in map_grant_ref() can also be dropped before the
new maptrack entry actually gets populated.

As a nice side effect this also reduces the number of IOMMU operations
in unmap_common(): Previously we would have "established" a readable
mapping whenever we didn't find a writable entry anymore (yet, of
course, at least one readable one). But we only need to do this if we
actually dropped the last writable entry, not if there were none already
before.

This is part of CVE-2021-28698 / XSA-380.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agognttab: add preemption check to gnttab_release_mappings()
Jan Beulich [Wed, 25 Aug 2021 12:18:18 +0000 (14:18 +0200)]
gnttab: add preemption check to gnttab_release_mappings()

A guest may die with many grant mappings still in place, or simply with
a large maptrack table. Iterating through this may take more time than
is reasonable without intermediate preemption (to run softirqs and
perhaps the scheduler).

Move the invocation of the function to the section where other
restartable functions get invoked, and have the function itself check
for preemption every once in a while. Have it iterate the table
backwards, such that decreasing the maptrack limit is all it takes to
convey restart information.
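
The backwards iteration can be sketched as below. This is a simplified model under stated assumptions: the structure, the batch parameter, the RESTART_NEEDED value and the preemption callback are all hypothetical stand-ins, not the hypervisor's real interfaces.

```c
#include <stdbool.h>

#define RESTART_NEEDED (-1)   /* placeholder for a "call me again" error code */

struct maptrack_sketch {
    unsigned int limit;      /* entries [0, limit) still need releasing */
    unsigned int released;   /* bookkeeping for the illustration only */
};

/*
 * Release entries from the end of the table towards the start.  Lowering
 * the limit after each entry is the only state needed to resume later.
 * 'batch' must be non-zero.
 */
static int release_mappings(struct maptrack_sketch *t, unsigned int batch,
                            bool (*preempt_check)(void))
{
    while ( t->limit )
    {
        unsigned int idx = t->limit - 1;

        /* ... the entry at index idx would be released here ... */
        t->released++;
        t->limit = idx;

        /* Only look for pending work every 'batch' entries. */
        if ( t->limit && !(t->limit % batch) && preempt_check() )
            return RESTART_NEEDED;
    }

    return 0;
}

/* Test helper: pretend preemption is always pending. */
static bool always_preempt(void)
{
    return true;
}
```

A caller would simply re-invoke release_mappings() until it returns 0; no separate cursor has to be stored anywhere.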

In domain_teardown() introduce PROG_none such that inserting at the
front will be easier going forward.

This is part of CVE-2021-28698 / XSA-380.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agox86/mm: widen locked region in xenmem_add_to_physmap_one()
Jan Beulich [Wed, 25 Aug 2021 12:17:56 +0000 (14:17 +0200)]
x86/mm: widen locked region in xenmem_add_to_physmap_one()

For pages which can be made part of the P2M by the guest, but which can
also later be de-allocated (grant table v2 status pages being the
present example), it is imperative that they be mapped at no more than a
single GFN. We therefore need to make sure that of two parallel
XENMAPSPACE_grant_table requests for the same status page one completes
before the second checks at which other GFN the underlying MFN is
presently mapped.

Pull ahead the respective get_gfn() and push down the respective
put_gfn(). This leverages that gfn_lock() really aliases p2m_lock(), but
the function makes this assumption already anyway: In the
XENMAPSPACE_gmfn case lock nesting constraints for both involved GFNs
would otherwise need to be enforced to avoid ABBA deadlocks.

This is CVE-2021-28697 / XSA-379.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agox86/p2m: guard (in particular) identity mapping entries
Jan Beulich [Wed, 25 Aug 2021 12:17:32 +0000 (14:17 +0200)]
x86/p2m: guard (in particular) identity mapping entries

Such entries, created by set_identity_p2m_entry(), should only be
destroyed by clear_identity_p2m_entry(). However, similarly, entries
created by set_mmio_p2m_entry() should only be torn down by
clear_mmio_p2m_entry(), so the logic gets based upon p2m_mmio_direct as
the entry type (separation between "ordinary" and 1:1 mappings would
require a further indicator to tell apart the two).

As to the guest_remove_page() change, commit 48dfb297a20a ("x86/PVH:
allow guest_remove_page to remove p2m_mmio_direct pages"), which
introduced the call to clear_mmio_p2m_entry(), claimed this was done for
hwdom only without this actually having been the case. However, this
code shouldn't be there in the first place, as MMIO entries shouldn't be
dropped this way. Avoid triggering the warning again that 48dfb297a20a
silenced by an adjustment to xenmem_add_to_physmap_one() instead.

Note that guest_physmap_mark_populate_on_demand() gets tightened beyond
the immediate purpose of this change.

Note also that I didn't inspect code which isn't security supported,
e.g. sharing, paging, or altp2m.

This is CVE-2021-28694 / part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agox86/p2m: introduce p2m_is_special()
Jan Beulich [Wed, 25 Aug 2021 12:17:07 +0000 (14:17 +0200)]
x86/p2m: introduce p2m_is_special()

Seeing the similarity of grant, foreign, and (subsequently) direct-MMIO
handling, introduce a new P2M type group named "special" (as in "needing
special accessors to create/destroy").
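
A type group of this kind is commonly expressed as a bit mask over the type enumeration. The sketch below shows the idea with made-up names (the enumerators, to_mask() and is_special() are hypothetical and merely modelled on the description above).

```c
/* Hypothetical subset of P2M-like types, for illustration only. */
enum type_sketch {
    T_RAM_RW,
    T_GRANT_RW,
    T_GRANT_RO,
    T_MAP_FOREIGN,
    T_MMIO_DIRECT,
};

#define to_mask(t) (1UL << (t))

/* Types which need dedicated accessors to be created or destroyed. */
#define SPECIAL_TYPES (to_mask(T_GRANT_RW) | to_mask(T_GRANT_RO) | \
                       to_mask(T_MAP_FOREIGN) | to_mask(T_MMIO_DIRECT))

#define is_special(t) (!!(to_mask(t) & SPECIAL_TYPES))
```

Generic code paths can then reject any entry for which is_special() is true instead of testing each type individually.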

Also use -EPERM instead of other error codes on the two domain_crash()
paths touched.

This is part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: re-arrange exclusion range and unity map recording
Jan Beulich [Wed, 25 Aug 2021 12:16:46 +0000 (14:16 +0200)]
AMD/IOMMU: re-arrange exclusion range and unity map recording

The spec makes no provisions for OS behavior here to depend on the
amount of RAM found on the system. While the spec may not sufficiently
clearly distinguish both kinds of regions, they are surely meant to be
separate things: Only regions with ACPI_IVMD_EXCLUSION_RANGE set should
be candidates for putting in the exclusion range registers. (As there's
only a single such pair of registers per IOMMU, secondary non-adjacent
regions with the flag set already get converted to unity mapped
regions.)

First of all, drop the dependency on max_page. With commit b4f042236ae0
("AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables") the
use of it here was stale anyway; it was bogus already before, as it
didn't account for max_page getting increased later on. Simply try an
exclusion range registration first, and if it fails (for being
unsuitable or non-mergeable), register a unity mapping range.

With this various local variables become unnecessary and hence get
dropped at the same time.

With the max_page boundary dropped for using unity maps, the minimum
page table tree height now needs both recording and enforcing in
amd_iommu_domain_init(). Since we can't predict which devices may get
assigned to a domain, our only option is to uniformly force at least
that height for all domains, now that the height isn't dynamic anymore.

Further don't make use of the exclusion range unless ACPI data says so.

Note that exclusion range registration in
register_range_for_all_devices() is on a best effort basis. Hence unity
map entries also registered are redundant when the former succeeded, but
they also do no harm. Improvements in this area can be done later imo.

Also adjust types where suitable without touching extra lines.

This is part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: re-arrange/complete re-assignment handling
Jan Beulich [Wed, 25 Aug 2021 12:16:26 +0000 (14:16 +0200)]
AMD/IOMMU: re-arrange/complete re-assignment handling

Prior to the assignment step having completed successfully, devices
should not get associated with their new owner. Hand the device to DomIO
(perhaps temporarily), until after the de-assignment step has completed.

De-assignment of a device (from other than Dom0) as well as failure of
reassign_device() during assignment should result in unity mappings
getting torn down. This in turn requires switching to a refcounted
mapping approach, as was already used by VT-d for its RMRRs, to prevent
unmapping a region used by multiple devices.

This is CVE-2021-28696 / part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoIOMMU: generalize VT-d's tracking of mapped RMRR regions
Jan Beulich [Wed, 25 Aug 2021 12:15:57 +0000 (14:15 +0200)]
IOMMU: generalize VT-d's tracking of mapped RMRR regions

In order to re-use it elsewhere, move the logic to vendor independent
code and strip it of RMRR specifics.

Note that the prior "map" parameter gets folded into the new "p2ma" one
(which AMD IOMMU code will want to make use of), assigning alternative
meaning ("unmap") to p2m_access_x. Prepare set_identity_p2m_entry() and
p2m_get_iommu_flags() for getting passed access types other than
p2m_access_rw (in the latter case just for p2m_mmio_direct requests).

Note also that, to be on the safe side, an overlap check gets added to
the main loop of iommu_identity_mapping().
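
The combination of reference counting and an overlap check might look roughly like this. It is a minimal sketch using a fixed-size array; the names id_map_get() and id_map_put() and the data layout are hypothetical and do not reflect the actual iommu_identity_mapping() implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 8

struct id_map {
    uint64_t base, end;      /* inclusive bounds */
    unsigned int count;      /* number of users of this range */
};

struct id_maps {
    struct id_map r[MAX_RANGES];
    size_t nr;
};

/* Take a reference; returns 0 on success, -1 on overlap or lack of space. */
static int id_map_get(struct id_maps *m, uint64_t base, uint64_t end)
{
    for ( size_t i = 0; i < m->nr; i++ )
    {
        struct id_map *e = &m->r[i];

        if ( e->base == base && e->end == end )
        {
            e->count++;          /* same range again: just count it */
            return 0;
        }
        if ( e->base <= end && base <= e->end )
            return -1;           /* partial overlap: refuse */
    }

    if ( m->nr == MAX_RANGES )
        return -1;

    m->r[m->nr++] = (struct id_map){ base, end, 1 };
    return 0;
}

/*
 * Drop a reference.  Returns 1 when the last user went away (the caller
 * would unmap now), 0 when the range is still in use, -1 if not tracked.
 */
static int id_map_put(struct id_maps *m, uint64_t base, uint64_t end)
{
    for ( size_t i = 0; i < m->nr; i++ )
    {
        struct id_map *e = &m->r[i];

        if ( e->base != base || e->end != end )
            continue;
        if ( --e->count )
            return 0;
        *e = m->r[--m->nr];      /* compact the array */
        return 1;
    }

    return -1;
}
```

With this shape, two devices sharing one region each take a reference, and the region is only unmapped once both have been de-assigned.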

This is part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoIOMMU: also pass p2m_access_t to p2m_get_iommu_flags()
Jan Beulich [Wed, 25 Aug 2021 12:15:32 +0000 (14:15 +0200)]
IOMMU: also pass p2m_access_t to p2m_get_iommu_flags()

A subsequent change will want to customize the IOMMU permissions based
on this.

This is part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: correct device unity map handling
Jan Beulich [Wed, 25 Aug 2021 12:15:11 +0000 (14:15 +0200)]
AMD/IOMMU: correct device unity map handling

Blindly assuming all addresses between any two such ranges, specified by
firmware in the ACPI tables, should also be unity-mapped can't be right.
Nor can it be correct to merge ranges with differing permissions. Track
ranges individually; don't merge at all, but check for overlaps instead.
This requires bubbling up error indicators, such that IOMMU init can be
failed when allocation of a new tracking struct wasn't possible, or an
overlap was detected.

On this occasion also stop ignoring
amd_iommu_reserve_domain_unity_map()'s return value.

This is part of XSA-378 / CVE-2021-28695.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: correct global exclusion range extending
Jan Beulich [Wed, 25 Aug 2021 12:12:13 +0000 (14:12 +0200)]
AMD/IOMMU: correct global exclusion range extending

Besides unity mapping regions, the AMD IOMMU spec also provides for
exclusion ranges (areas of memory not to be subject to DMA translation)
to be specified by firmware in the ACPI tables. The spec does not put
any constraints on the number of such regions.

Blindly assuming all addresses between any two such ranges should also
be excluded can't be right. Since hardware has room for just a single
such range (comprised of the Exclusion Base Register and the Exclusion
Range Limit Register), combine only adjacent or overlapping regions (for
now; this may require further adjustment in case table entries aren't
sorted by address) with matching exclusion_allow_all settings. This
requires bubbling up error indicators, such that IOMMU init can be
failed when concatenation wasn't possible.
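
The merging rule (extend the single range only for adjacent or overlapping regions with matching settings, otherwise report an error) can be sketched as follows. The structure and the function excl_range_extend() are hypothetical illustrations, not the driver's real code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the one hardware exclusion range; bounds inclusive. */
struct excl_range {
    uint64_t base, limit;
    bool allow_all;
    bool valid;
};

/* Returns 0 if the region was recorded or merged, -1 if it cannot be. */
static int excl_range_extend(struct excl_range *cur, uint64_t base,
                             uint64_t limit, bool allow_all)
{
    if ( !cur->valid )
    {
        cur->base = base;
        cur->limit = limit;
        cur->allow_all = allow_all;
        cur->valid = true;
        return 0;
    }

    /* Differing settings can't be expressed by a single register pair. */
    if ( cur->allow_all != allow_all )
        return -1;

    /* Refuse when there is a gap on either side (overflow-safe form). */
    if ( (base > cur->limit && base - cur->limit > 1) ||
         (limit < cur->base && cur->base - limit > 1) )
        return -1;

    if ( base < cur->base )
        cur->base = base;
    if ( limit > cur->limit )
        cur->limit = limit;

    return 0;
}
```

The -1 result is what would get bubbled up so that initialization can fail instead of silently excluding unrelated memory.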

Furthermore, since the exclusion range specified in IOMMU registers
implies R/W access, reject requests asking for fewer permissions (this
will be brought closer to the spec by a subsequent change).

This is part of XSA-378 / CVE-2021-28695.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoxen/public: arch-arm: Add mention of argo_op hypercall
Michal Orzel [Fri, 20 Aug 2021 09:39:24 +0000 (11:39 +0200)]
xen/public: arch-arm: Add mention of argo_op hypercall

Commit 1ddc0d43c20cb1c1125d4d6cefc78624b2a9ccb7 introducing
argo_op hypercall forgot to add a mention of it in the
comment listing supported hypercalls. Fix that.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoxen/arm: smmu: Set/clear IOMMU domain for device
Oleksandr Andrushchenko [Wed, 18 Aug 2021 05:22:02 +0000 (08:22 +0300)]
xen/arm: smmu: Set/clear IOMMU domain for device

When a device is assigned/de-assigned it is required to properly set
IOMMU domain used to protect the device. This assignment was missing,
thus it was not possible to de-assign the device:

(XEN) Deassigning device 0000:03:00.0 from dom2
(XEN) smmu: 0000:03:00.0:  not attached to domain 2
(XEN) d2: deassign (0000:03:00.0) failed (-3)

Fix this by assigning IOMMU domain on arm_smmu_assign_dev and reset it
to NULL on arm_smmu_deassign_dev.

Fixes: 06d1f7a278dd ("xen/arm: smmuv1: Keep track of S2CR state")
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agons16550: properly gate Exar PCIe UART cards support
Oleksandr Andrushchenko [Fri, 20 Aug 2021 14:18:12 +0000 (16:18 +0200)]
ns16550: properly gate Exar PCIe UART cards support

Arm is about to get PCI passthrough support which means CONFIG_HAS_PCI
will be enabled, so this code will fail as Arm doesn't have ns16550
PCI support:

ns16550.c:313:5: error: implicit declaration of function 'enable_exar_enhanced_bits' [-Werror=implicit-function-declaration]
  313 |     enable_exar_enhanced_bits(uart);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by gating Exar PCIe UART cards support with the above in mind.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoAMD/IOMMU: don't increase perms when splitting superpage
Jan Beulich [Fri, 20 Aug 2021 10:31:08 +0000 (12:31 +0200)]
AMD/IOMMU: don't increase perms when splitting superpage

The old (super)page's permissions ought to be propagated, rather than
blindly allowing both reads and writes.
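
In other words, each small-page entry created by the split inherits the permissions of the superpage it replaces. A minimal sketch, assuming a made-up PTE structure and 512 entries per table (both hypothetical, chosen only for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

#define PTES_PER_TABLE 512

/* Hypothetical, heavily simplified page table entry. */
struct pte_sketch {
    uint64_t pfn;
    bool r, w;
};

/* Fill a next-level table covering the same frames as the superpage. */
static void split_superpage(const struct pte_sketch *super,
                            struct pte_sketch table[PTES_PER_TABLE])
{
    for ( unsigned int i = 0; i < PTES_PER_TABLE; i++ )
    {
        table[i].pfn = super->pfn + i;
        /* Propagate the old permissions rather than forcing R/W. */
        table[i].r = super->r;
        table[i].w = super->w;
    }
}
```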

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: don't leave page table mapped when unmapping ...
Jan Beulich [Fri, 20 Aug 2021 10:30:35 +0000 (12:30 +0200)]
AMD/IOMMU: don't leave page table mapped when unmapping ...

... an already not mapped page. With all other exit paths doing the
unmap, I have no idea how I managed to miss that aspect at the time.

Fixes: ad591454f069 ("AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agons16550: add Exar PCIe UART cards support
Marek Marczykowski-Górecki [Fri, 20 Aug 2021 10:29:45 +0000 (12:29 +0200)]
ns16550: add Exar PCIe UART cards support

Besides standard UART setup, this device needs enabling
(vendor-specific) "Enhanced Control Bits" - otherwise disabling hardware
control flow (MCR[2]) is ignored. Add appropriate quirk to the
ns16550_setup_preirq(), similar to the handle_dw_usr_busy_quirk(). The
new function acts on Exar 2-, 4-, and 8-port cards only. I have tested
the functionality on a 2-port card but, based on the Linux driver, the same
applies to other models too.

Additionally, Exar cards support a fractional divisor (DLD[3:0] register,
at 0x02). This part is not supported here yet, and does not seem to
be required for working 115200bps at the very least.

The specification for the 2-port card is available at:
https://www.maxlinear.com/product/interface/uarts/pcie-uarts/xr17v352

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agons16550: specify uart param for ns_{read,write}_reg as const
Marek Marczykowski-Górecki [Fri, 20 Aug 2021 10:29:05 +0000 (12:29 +0200)]
ns16550: specify uart param for ns_{read,write}_reg as const

They don't modify it, after all.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/PV: account for 32-bit Dom0 in mark_pv_pt_pages_rdonly()'s ASSERT()s
Jan Beulich [Fri, 20 Aug 2021 10:28:07 +0000 (12:28 +0200)]
x86/PV: account for 32-bit Dom0 in mark_pv_pt_pages_rdonly()'s ASSERT()s

Clearly I neglected the special needs here, and also failed to test the
change with a debug build of Xen.

Fixes: 6b1ca51b1a91 ("x86/PV: assert page state in mark_pv_pt_pages_rdonly()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agolibs/guest: Move the guest ABI check earlier into xc_dom_parse_image()
Jane Malalane [Tue, 17 Aug 2021 15:19:24 +0000 (16:19 +0100)]
libs/guest: Move the guest ABI check earlier into xc_dom_parse_image()

Xen may not support 32-bit PV guest for a number of reasons (lack of
CONFIG_PV32, explicit pv=no-32 command line argument, or implicitly
due to CET being enabled) and advertises this to the toolstack via the
absence of xen-3.0-x86_32p ABI.

Currently, when trying to boot a 32-bit PV guest, the ABI check is too
late and the build explodes in the following manner yielding an
unhelpful error message:

  xc: error: panic: xg_dom_boot.c:121: xc_dom_boot_mem_init: can't allocate low memory for domain: Out of memory
  libxl: error: libxl_dom.c:586:libxl__build_dom: xc_dom_boot_mem_init failed: Operation not supported
  libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
  libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 1:Non-existant domain
  libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 1:Unable to destroy guest
  libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 1:Destruction of domain failed

Move the ABI check earlier into xc_dom_parse_image() along with other
ELF-note feature checks.  With this adjustment, it now looks like
this:

  xc: error: panic: xg_dom_boot.c:88: xc_dom_compat_check: guest type xen-3.0-x86_32p not supported by xen kernel, sorry: Invalid kernel
  libxl: error: libxl_dom.c:571:libxl__build_dom: xc_dom_parse_image failed
  domainbuilder: detail: xc_dom_release: called
  libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 11:cannot (re-)build domain: -3
  libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 11:Non-existant domain
  libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 11:Unable to destroy guest
  libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 11:Destruction of domain failed

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years agoxen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
Juergen Gross [Thu, 19 Aug 2021 11:38:31 +0000 (13:38 +0200)]
xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume

With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.

Fix that by letting get_cpu_idle_time() deal with this case. Drop a
redundant check in exchange.

Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
3 years agoArm: relax iomem_access_permitted() check
Jan Beulich [Thu, 19 Aug 2021 11:37:42 +0000 (13:37 +0200)]
Arm: relax iomem_access_permitted() check

Ranges checked by iomem_access_permitted() are inclusive; to permit a
mapping there's no need for access to also have been granted for the
subsequent page.
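
The off-by-one can be illustrated with a small model of an inclusive range check. The structure and helper names below are hypothetical; only the inclusive [start, end] convention is taken from the description above.

```c
#include <stdbool.h>

/* Hypothetical grant covering frames [first, last], both inclusive. */
struct iomem_grant {
    unsigned long first, last;
};

/* Inclusive check: is every frame in [s, e] covered by the grant? */
static bool range_permitted(const struct iomem_grant *g,
                            unsigned long s, unsigned long e)
{
    return g->first <= s && e <= g->last;
}

/* Too strict: also demands access to the frame after the mapping. */
static bool map_permitted_strict(const struct iomem_grant *g,
                                 unsigned long start, unsigned long nr)
{
    return range_permitted(g, start, start + nr);
}

/* Relaxed: the last frame of an nr-frame mapping is start + nr - 1. */
static bool map_permitted(const struct iomem_grant *g,
                          unsigned long start, unsigned long nr)
{
    return nr && range_permitted(g, start, start + nr - 1);
}
```

For a grant of exactly 16 frames, the strict variant refuses a 16-frame mapping while the relaxed one accepts it.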

Fixes: 80f9c3167084 ("xen/arm: acpi: Map MMIO on fault in stage-2 page table for the hardware domain")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agox86: mark compat hypercall regs clobbering for intended fall-through
Jan Beulich [Thu, 19 Aug 2021 11:36:54 +0000 (13:36 +0200)]
x86: mark compat hypercall regs clobbering for intended fall-through

Oddly enough in the original report Coverity only complained about the
native hypercall related switch() statements. Now that it has seen those
fixed, it complains about (only HVM) compat ones. Hence the CIDs below
are all for the HVM side of things, yet while at it take care of the PV
side as well.

Coverity-ID: 1487105, 1487106, 1487107, 1487108, 1487109.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoVT-d: Tylersburg errata apply to further steppings
Jan Beulich [Wed, 18 Aug 2021 07:44:14 +0000 (09:44 +0200)]
VT-d: Tylersburg errata apply to further steppings

While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
spec update, X58's also mentions B2, and searching the internet suggests
systems with this stepping are actually in use. Even worse, for X58
erratum #69 is marked applicable even to C2. Split the check to cover
all applicable steppings and to also report applicable errata numbers in
the log message. The splitting requires using the DMI port instead of
the System Management Registers device, but that's then in line (also
revision checking wise) with the spec updates.

Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agox86/PV: assert page state in mark_pv_pt_pages_rdonly()
Jan Beulich [Wed, 18 Aug 2021 07:40:08 +0000 (09:40 +0200)]
x86/PV: assert page state in mark_pv_pt_pages_rdonly()

About every time I look at dom0_construct_pv()'s "calculation" of
nr_pt_pages I question (myself) whether the result is precise or merely
an upper bound. I think it is meant to be precise, but I think we would
be better off having some checking in place. Hence add ASSERT()s to
verify that
- all pages have a valid L1...Ln (currently L4) page table type and
- no other bits are set, in particular the type refcount is still zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/PV: suppress unnecessary Dom0 construction output
Jan Beulich [Wed, 18 Aug 2021 07:39:08 +0000 (09:39 +0200)]
x86/PV: suppress unnecessary Dom0 construction output

v{xenstore,console}_{start,end} can only ever be zero in PV shim
configurations. Similarly reporting just zeros for an unmapped (or
absent) initrd is not useful. Particularly in case video is the only
output configured, space is scarce: Split the printk() and omit lines
carrying no information at all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/cet: Fix build on newer versions of GCC
Andrew Cooper [Tue, 17 Aug 2021 10:38:07 +0000 (11:38 +0100)]
x86/cet: Fix build on newer versions of GCC

Some versions of GCC complain with:

  traps.c:405:22: error: 'get_shstk_bottom' defined but not used [-Werror=unused-function]
   static unsigned long get_shstk_bottom(unsigned long sp)
                        ^~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Change #ifdef to if ( IS_ENABLED(...) ) to make the sole user of
get_shstk_bottom() visible to the compiler.
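
The general pattern can be sketched as below. Note the simplification: Xen's IS_ENABLED() also copes with options that are not defined at all, whereas this illustration assumes a hypothetical option CONFIG_SKETCH_FEATURE that is always defined to 0 or 1. The function names are made up as well.

```c
/* Hypothetical option; 0 models "feature compiled out". */
#define CONFIG_SKETCH_FEATURE 0

/* Sole user is inside the conditional below. */
static unsigned long helper_sketch(unsigned long sp)
{
    return sp | 0xfffUL;
}

static unsigned long adjust_sketch(unsigned long sp)
{
    /*
     * Had this call been wrapped in #ifdef, the compiler would see no user
     * of helper_sketch() with the option off and could warn about an
     * unused function.  A plain if() keeps the reference visible, while
     * the dead branch is still dropped by the optimiser.
     */
    if ( CONFIG_SKETCH_FEATURE )
        return helper_sketch(sp);

    return sp;
}
```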

Fixes: 35727551c070 ("x86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Compile-tested-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
3 years agox86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}
Andrew Cooper [Thu, 12 Aug 2021 16:39:16 +0000 (17:39 +0100)]
x86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}

This was a clear oversight in the original CET work.  The BUGFRAME_run_fn and
BUGFRAME_warn paths update regs->rip without an equivalent adjustment to the
shadow stack, causing IRET to suffer #CP because of the mismatch.

One subtle, and therefore fragile, aspect of extable_shstk_fixup() was that it
required regs->rip to have its old value as a cross-check that the right word
in the shadow stack was being edited.

Rework extable_shstk_fixup() into fixup_exception_return() which takes
ownership of the update to both the regular and shadow stacks, ensuring that
the regs->rip update is ordered correctly.

Use the new fixup_exception_return() for BUGFRAME_run_fn and BUGFRAME_warn to
ensure that the shadow stack is updated too.

Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/ACPI: Insert missing newlines into FACS error messages
Andrew Cooper [Mon, 16 Aug 2021 13:24:44 +0000 (14:24 +0100)]
x86/ACPI: Insert missing newlines into FACS error messages

Booting Xen as a PVH guest currently yields:

  (XEN) ACPI: SLEEP INFO: pm1x_cnt[1:b004,1:0], pm1x_evt[1:b000,1:0]
  (XEN) ACPI: FACS is not 64-byte aligned: 0xfc001010<2>ACPI: wakeup_vec[fc00101c], vec_size[20]
  (XEN) ACPI: Local APIC address 0xfee00000

Insert newlines as appropriate.

Fixes: d3faf9badf52 ("[host s3] Retrieve necessary sleep information from plain-text ACPI tables (FADT/FACS), and keep one hypercall remained for sleep notification.")
Fixes: 0f089bbf43ec ("x86/ACPI: fix S3 wakeup vector mapping")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoMAINTAINERS: Fix file path for kexec headers
Andrew Cooper [Thu, 12 Aug 2021 13:49:57 +0000 (14:49 +0100)]
MAINTAINERS: Fix file path for kexec headers

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/ioapic: remove use of TRUE/FALSE/1/0
Kevin Stefanov [Mon, 16 Aug 2021 13:16:56 +0000 (15:16 +0200)]
x86/ioapic: remove use of TRUE/FALSE/1/0

Also fix stray usage in VT-d.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Kevin Stefanov <kevin.stefanov@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/pv: provide more helpful error when CONFIG_PV32 is absent
Jane Malalane [Mon, 16 Aug 2021 13:16:20 +0000 (15:16 +0200)]
x86/pv: provide more helpful error when CONFIG_PV32 is absent

Currently, when booting a 32bit dom0 kernel, the message isn't very
helpful:

  (XEN)  Xen  kernel: 64-bit, lsb
  (XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0x100000 -> 0x112000
  (XEN) Mismatch between Xen and DOM0 kernel
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) Could not construct domain 0
  (XEN) ****************************************

With this adjustment, it now looks like this:

  (XEN)  Xen  kernel: 64-bit, lsb
  (XEN) Found 32-bit PV kernel, but CONFIG_PV32 missing
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) Could not construct domain 0
  (XEN) ****************************************

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/pv: remove unnecessary use of goto out in construct_dom0()
Jane Malalane [Mon, 16 Aug 2021 13:15:43 +0000 (15:15 +0200)]
x86/pv: remove unnecessary use of goto out in construct_dom0()

elf_check_broken() only needs to be invoked after elf_xen_parse() and
after elf_load_binary().

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agons16550: do not override fifo size if explicitly set
Marek Marczykowski-Górecki [Mon, 16 Aug 2021 13:14:37 +0000 (15:14 +0200)]
ns16550: do not override fifo size if explicitly set

If fifo size is already set via uart_params, do not force it to 16 - which
may not match the actual hardware. Specifically Exar cards have fifo of
256 bytes.
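
The intended behaviour amounts to treating the default as a fallback only. A minimal sketch, with a hypothetical structure standing in for the driver's UART state and 0 assumed to mean "not set":

```c
/* Hypothetical stand-in for the driver state; 0 means "not configured". */
struct uart_sketch {
    unsigned int fifo_size;
};

/* Apply the 16-byte default only when nothing was set explicitly. */
static void apply_fifo_default(struct uart_sketch *uart)
{
    if ( uart->fifo_size == 0 )
        uart->fifo_size = 16;
}
```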

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agolibxc: simplify HYPERCALL_BUFFER()
Jan Beulich [Fri, 13 Aug 2021 14:50:09 +0000 (16:50 +0200)]
libxc: simplify HYPERCALL_BUFFER()

_hcbuf_buf1 has been there only for a pointer comparison to validate
type compatibility. The same can be achieved by not using typeof() on
the definition of what so far was _hcbuf_buf2, as the initializer has
to also be type-compatible. Drop _hcbuf_buf1 and the comparison;
rename _hcbuf_buf2.

Since we're already using compiler extensions here, don't be shy and
also omit the middle operand of the involved ?: operator.
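
For readers unfamiliar with the extension: GCC and Clang accept "a ?: b" as shorthand for "a ? a : b", with "a" evaluated only once. The snippet below demonstrates just that operator; it is not the HYPERCALL_BUFFER() macro itself, and name_or_default() is a made-up helper.

```c
#include <stddef.h>

/*
 * GNU extension: with the middle operand omitted, the value of the first
 * operand is used when it is non-zero (here: a non-NULL pointer).
 */
static const char *name_or_default(const char *name)
{
    return name ?: "default";
}
```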

Bring line continuation character placement in line with that of
related macros.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agolibxenguest: fix off-by-1 in colo-secondary-bitmap merging
Jan Beulich [Fri, 13 Aug 2021 14:49:46 +0000 (16:49 +0200)]
libxenguest: fix off-by-1 in colo-secondary-bitmap merging

Valid GFNs (having a representation in the dirty bitmap) need to be
strictly below p2m_size.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agolibxenguest: complete loops in xc_map_domain_meminfo()
Jan Beulich [Fri, 13 Aug 2021 14:49:10 +0000 (16:49 +0200)]
libxenguest: complete loops in xc_map_domain_meminfo()

minfo->p2m_size may have more than 31 significant bits. Change the
induction variable to unsigned long, and (largely for signed-ness
consistency) a helper variable to unsigned int. While there also avoid
open-coding min().
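
The shape of such a loop can be sketched as follows, assuming a hypothetical batch size of 1024 entries and a local MIN_SKETCH() macro in place of the real min() helper. With an int induction variable, sizes with more than 31 significant bits could not be covered.

```c
#define MIN_SKETCH(a, b) ((a) < (b) ? (a) : (b))
#define CHUNK 1024U   /* hypothetical batch size */

/*
 * Walk p2m_size entries in batches.  Returns the number of batches and
 * stores the number of entries visited in *entries_seen.
 */
static unsigned long walk_batches(unsigned long p2m_size,
                                  unsigned long *entries_seen)
{
    unsigned long i, batches = 0;

    *entries_seen = 0;

    for ( i = 0; i < p2m_size; )
    {
        /* The last batch may be shorter; never more than CHUNK. */
        unsigned int n = MIN_SKETCH(CHUNK, p2m_size - i);

        /* ... entries [i, i + n) would be processed here ... */
        *entries_seen += n;
        i += n;
        batches++;
    }

    return batches;
}
```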

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoxen/bitmap: don't open code DIV_ROUND_UP()
Jane Malalane [Thu, 12 Aug 2021 15:14:25 +0000 (17:14 +0200)]
xen/bitmap: don't open code DIV_ROUND_UP()

Also, change bitmap_long_to_byte() and bitmap_byte_to_long() to take
'unsigned int' instead of 'int' number of bits, to match the type of
their callers.
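
For reference, the usual definition of such a rounding-up division is shown below, together with a made-up helper bits_to_bytes() illustrating the kind of open-coded "(n + 7) / 8" it replaces.

```c
/* Integer division rounding up; assumes non-negative n and positive d. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of bytes needed to hold nbits bits. */
static unsigned int bits_to_bytes(unsigned int nbits)
{
    return DIV_ROUND_UP(nbits, 8U);
}
```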

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agokexec: remove use of TRUE/FALSE
Kevin Stefanov [Thu, 12 Aug 2021 15:10:23 +0000 (17:10 +0200)]
kexec: remove use of TRUE/FALSE

Whilst fixing this, also change bool_t to bool, and use __read_mostly.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Kevin Stefanov <kevin.stefanov@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agobitmap: make bitmap_long_to_byte() and bitmap_byte_to_long() static
Jane Malalane [Tue, 10 Aug 2021 07:29:52 +0000 (09:29 +0200)]
bitmap: make bitmap_long_to_byte() and bitmap_byte_to_long() static

Functions made static as there are no external callers.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago credit2: avoid picking a spurious idle unit when caps are used
Dario Faggioli [Tue, 10 Aug 2021 07:29:10 +0000 (09:29 +0200)]
credit2: avoid picking a spurious idle unit when caps are used

Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not completely fix the problem of potentially
selecting a scheduling unit that will then not be able to run.

In fact, when caps are used and the unit currently being examined during
the runqueue scan does not have enough budget to run, we should continue
looking rather than giving up and picking the idle unit.
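
The difference between giving up and continuing the scan can be sketched
as below. This is a simplified model, not the credit2 scheduler: struct
demo_unit and pick_next() are hypothetical, and a NULL return stands in
for "run the idle unit".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified scheduling unit. */
struct demo_unit {
    int id;
    int budget;   /* remaining budget; <= 0 means it cannot run now */
};

/*
 * Return the first unit in the run queue that has budget left, or NULL
 * (meaning "idle") if none has.
 */
static const struct demo_unit *pick_next(const struct demo_unit *runq,
                                         size_t n)
{
    size_t i;

    assert(runq != NULL || n == 0);

    for ( i = 0; i < n; i++ )
    {
        if ( runq[i].budget <= 0 )
            continue;   /* keep scanning; a break here would pick idle */

        return &runq[i];
    }

    return NULL;
}
```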

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: remove unneeded deps of x86_emulate.o
Anthony PERARD [Tue, 10 Aug 2021 07:28:31 +0000 (09:28 +0200)]
build: remove unneeded deps of x86_emulate.o

Those two dependencies already exist so make doesn't need to know
about them. The dependency will be generated by $(CC).

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: rework .banner generation
Anthony PERARD [Tue, 10 Aug 2021 07:27:13 +0000 (09:27 +0200)]
build: rework .banner generation

Avoid depending on Makefile but still allow the banner to be rebuilt when
$(XEN_FULLVERSION) changes.

Also add a dependency on tools/xen.flf, even though it is not expected to
change.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago xen/arm: Do not invalidate the P2M when the PT is shared with the IOMMU
Stefano Stabellini [Wed, 4 Aug 2021 20:57:07 +0000 (13:57 -0700)]
xen/arm: Do not invalidate the P2M when the PT is shared with the IOMMU

Set/Way flushes never work correctly in a virtualized environment.

Our current implementation is based on clearing the valid bit in the p2m
pagetable to track guest memory accesses. This technique doesn't work
when the IOMMU is enabled for the domain and the pagetable is shared
between IOMMU and MMU because it triggers IOMMU faults.

Specifically, p2m_invalidate_root causes IOMMU faults if
iommu_use_hap_pt returns true for the domain.

Add a check in p2m_set_way_flush: if a set/way instruction is used
and iommu_use_hap_pt returns true, rather than failing with obscure
IOMMU faults, inject an undef exception straight away into the guest,
and print a verbose error message to explain the problem.

Also add an ASSERT in p2m_invalidate_root to make sure we don't
inadvertently stumble across this problem again in the future.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago arm,smmu: add support for generic DT bindings. Implement add_device and dt_xlate.
Brian Woods [Tue, 3 Aug 2021 00:24:09 +0000 (17:24 -0700)]
arm,smmu: add support for generic DT bindings. Implement add_device and dt_xlate.

For the legacy path, arm_smmu_dt_add_device_legacy is called by
register_smmu_master scanning mmu-masters (a fwspec entry is also
created). For the generic path, arm_smmu_dt_add_device_generic gets
called instead. arm_smmu_dt_add_device_generic then calls
arm_smmu_dt_add_device_legacy, which is shared with the legacy path.
This way most of the low level implementation is shared between the two
paths.

If both legacy bindings and generic bindings are present in the device tree,
the legacy bindings are the ones that are used. That's because
mmu-masters is parsed by
xen/drivers/passthrough/arm/smmu.c:arm_smmu_device_dt_probe which is
called by arm_smmu_dt_init. It happens very early. iommus is parsed by
xen/drivers/passthrough/device_tree.c:iommu_add_dt_device which is
called by xen/arch/arm/domain_build.c:handle_device and happens
afterwards.

arm_smmu_dt_xlate_generic is a verbatim copy from Linux
(drivers/iommu/arm/arm-smmu/arm-smmu.c:arm_smmu_of_xlate, version
v5.10).

A workaround was introduced by cf4af9d6d6c (xen/arm: boot with device
trees with "mmu-masters" and "iommus") because the SMMU driver only
supported the legacy bindings. Remove it now.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago arm,smmu: restructure code in preparation to new bindings support
Brian Woods [Tue, 3 Aug 2021 00:24:08 +0000 (17:24 -0700)]
arm,smmu: restructure code in preparation to new bindings support

Restructure some of the code and add supporting functions for adding
generic device tree (DT) binding support.  This will allow current
Linux device trees to be used, with only the chosen field modified to
enable Xen.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago arm,smmu: switch to using iommu_fwspec functions
Brian Woods [Tue, 3 Aug 2021 00:24:06 +0000 (17:24 -0700)]
arm,smmu: switch to using iommu_fwspec functions

Modify the smmu driver so that it uses the iommu_fwspec helper
functions.  This means both ARM IOMMU drivers will use the
iommu_fwspec helper functions, making enabling generic device tree
bindings in the SMMU driver much cleaner.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago xen: do not return -EEXIST if iommu_add_dt_device is called twice
Stefano Stabellini [Tue, 3 Aug 2021 00:24:07 +0000 (17:24 -0700)]
xen: do not return -EEXIST if iommu_add_dt_device is called twice

iommu_add_dt_device() returns -EEXIST if the device was already
registered. At the moment, this can only happen if the device was
already assigned to a domain (either dom0 at boot or via
XEN_DOMCTL_assign_device).

In a follow-up patch, we will convert the SMMU driver to use the FW
spec. When the legacy bindings are used, all the devices will be
registered at probe. Therefore, iommu_add_dt_device() will always
return -EEXIST.

Currently, one caller (XEN_DOMCTL_assign_device) will check the return
and ignore -EEXIST. All the others will fail, because it was technically
a programming error.

However, there is no harm in calling iommu_add_dt_device() twice, so we
can simply return 0.

With that in place the caller doesn't need to check -EEXIST anymore, so
remove the check.
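
The resulting behaviour, where a repeated registration is treated as
success, can be sketched as follows. struct demo_dev and
demo_add_device() are hypothetical stand-ins and do not reflect the real
iommu_add_dt_device() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record; not Xen's device tree node structure. */
struct demo_dev {
    bool registered;
};

/* Register a device. A repeated call is harmless and reports success. */
static int demo_add_device(struct demo_dev *dev)
{
    assert(dev != NULL);

    if ( dev->registered )
        return 0;           /* previously this case would be -EEXIST */

    dev->registered = true;

    return 0;
}
```

Callers then need no special case for the "already registered" result.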

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago tools/xenstored: Don't assume errno will not be overwritten in lu_arch()
Julien Grall [Fri, 30 Jul 2021 15:14:14 +0000 (16:14 +0100)]
tools/xenstored: Don't assume errno will not be overwritten in lu_arch()

At the moment, do_control_lu() will set errno to 0 before calling
lu_arch() and then check errno. The expectation is nothing in lu_arch()
will change the value unless there is an error.

However, per errno(3), a function that succeeds is allowed to change
errno. In fact, syslog() will overwrite errno if the logs are rotated
at the time it is called.

To prevent any further issue, errno is now always set before
returning NULL.

Additionally, errno is only checked when returning NULL so the client
can see the error message if there is any.
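
The convention described above can be sketched as below. do_step() and
run_step() are hypothetical names, not the xenstored functions; the
assignment of EAGAIN merely simulates a successful library call that
happens to overwrite errno, as errno(3) permits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Returns NULL with errno set on failure, or a message string otherwise.
 * On the non-NULL path the value of errno is unspecified.
 */
static const char *do_step(int fail)
{
    /* Simulate a successful call that clobbers errno along the way. */
    errno = EAGAIN;

    if ( fail )
    {
        errno = EINVAL;     /* always set right before returning NULL */
        return NULL;
    }

    return "OK";
}

/* Returns 0 on success, or the errno value reported by do_step(). */
static int run_step(int fail)
{
    const char *ret = do_step(fail);

    if ( ret == NULL )
    {
        assert(errno != 0); /* contract: errno is set on the NULL path */
        return errno;       /* errno is only consulted here */
    }

    return 0;
}
```

Had the caller zeroed errno beforehand and tested it afterwards, the
successful run_step(0) case would have been misreported as a failure.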

Reported-by: Michael Kurth <mku@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years ago tools/xenstored: Propagate correctly the error message from lu_start()
Julien Grall [Thu, 29 Jul 2021 11:06:02 +0000 (12:06 +0100)]
tools/xenstored: Propagate correctly the error message from lu_start()

lu_start() will only set errno when it returns NULL. For all the
other cases, the value is unknown.

This means that when lu_start() returns an error message, it may not
be propagated to the client.

The check that errno is a non-zero value is now dropped and instead
the value is returned when no error message is provided. This
relies on errno always being set when ret == NULL.

Fixes: af216a99fb ("tools/xenstore: add the basic framework for doing the live update")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years ago tools/xenstored: Fix off-by-one in dump_state_nodes()
Julien Grall [Thu, 29 Jul 2021 09:34:20 +0000 (10:34 +0100)]
tools/xenstored: Fix off-by-one in dump_state_nodes()

The maximum path length supported by the Xenstored protocol is
XENSTORE_ABS_PATH_MAX (i.e. 3072). This doesn't take into account the
NUL at the end of the path.

However, the code to dump the nodes will allocate a buffer of
XENSTORE_ABS_PATH_MAX characters. As a result it may not be possible to
live-update if there is a node name of XENSTORE_ABS_PATH_MAX characters.

Fix it by allocating a buffer of XENSTORE_ABS_PATH_MAX + 1 characters.

Take the opportunity to pass the max length of the buffer as a
parameter of dump_state_node_tree(). This makes it clearer that the
check in the function is linked to the allocation in dump_state_nodes().
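
A minimal sketch of the sizing rule follows. DEMO_PATH_MAX mirrors the
3072 value quoted above, and copy_path() is a hypothetical helper, not
the xenstored dump code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Maximum path length excluding the terminating NUL (value from above). */
#define DEMO_PATH_MAX 3072

/*
 * Copy `path` into a freshly allocated buffer able to hold any valid
 * path. Returns NULL if the path is too long or allocation fails.
 * The caller frees the result.
 */
static char *copy_path(const char *path)
{
    size_t len;
    char *buf;

    assert(path != NULL);

    len = strlen(path);
    if ( len > DEMO_PATH_MAX )
        return NULL;

    /* One extra byte for the NUL, so a maximum-length path still fits. */
    buf = malloc(DEMO_PATH_MAX + 1);
    if ( buf == NULL )
        return NULL;

    memcpy(buf, path, len + 1);

    return buf;
}
```

With a buffer of only DEMO_PATH_MAX bytes, a path of exactly that length
would leave no room for the terminator.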

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years ago xen/lib: Fix strcmp() and strncmp()
Jane Malalane [Tue, 27 Jul 2021 18:47:15 +0000 (19:47 +0100)]
xen/lib: Fix strcmp() and strncmp()

The C standard requires that each character be compared as unsigned
char. Xen's current behaviour compares as signed char, which changes
the answer when chars with a value greater than 0x7f are used.
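
To make the difference concrete, here is a minimal sketch of a comparison
done on unsigned char values. demo_strcmp() is written for this example
and is not Xen's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two strings byte by byte, treating each byte as unsigned char. */
static int demo_strcmp(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;

    assert(s1 != NULL && s2 != NULL);

    while ( *a != '\0' && *a == *b )
    {
        a++;
        b++;
    }

    /* Yields -1, 0 or 1 without relying on the signedness of char. */
    return (*a > *b) - (*a < *b);
}
```

Compared as signed char, the byte 0x80 is negative and would sort before
0x7f; compared as unsigned char it sorts after, as the standard requires.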

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
3 years ago x86: work around build issue with GNU ld 2.37
Jan Beulich [Thu, 22 Jul 2021 09:20:38 +0000 (11:20 +0200)]
x86: work around build issue with GNU ld 2.37

I suspect it is commit 40726f16a8d7 ("ld script expression parsing")
which broke the hypervisor build, by no longer accepting section names
with a dash in them inside ADDR() (and perhaps other script directives
expecting just a section name, not an expression): .note.gnu.build-id
is such a section.

Quoting all section names passed to ADDR() via DECL_SECTION() works
around the regression.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago tools/libxl: add missing blank in message
Alan Robinson [Tue, 27 Jul 2021 07:47:03 +0000 (09:47 +0200)]
tools/libxl: add missing blank in message

Add missing blank giving "an emulation" instead of "anemulation"
while making the text a single source line.

Signed-off-by: Alan Robinson <alan.robinson@fujitsu.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago tools/firmware/ovmf: Use OvmfXen platform file if exist and update OVMF
Anthony PERARD [Mon, 19 Jul 2021 13:48:45 +0000 (14:48 +0100)]
tools/firmware/ovmf: Use OvmfXen platform file if exist and update OVMF

A platform introduced in EDK II named OvmfXen is now the one to use for
Xen instead of OvmfX64. It comes with PVH support.

Also, the Xen support in OvmfX64 is deprecated,
    "deprecation notice: *dynamic* multi-VMM (QEMU vs. Xen) support in OvmfPkg"
    https://edk2.groups.io/g/devel/message/75498
and has been removed upstream.

We need to also update to a newer version of OVMF as OvmfXen in the
release "edk2-stable202105" doesn't work well with Xen, so we need the
fix b37cfdd28071 ("OvmfPkg/XenPlatformPei: Relocate shared_info page
mapping").

Also, no longer set the number of threads for parallel builds when
building the newer platform; OvmfPkg/build.sh now does a parallel
build by default.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago tools/xl: Add stubdomain_cmdline option to xl.cfg
Scott Davis [Thu, 22 Jul 2021 16:54:30 +0000 (12:54 -0400)]
tools/xl: Add stubdomain_cmdline option to xl.cfg

This adds an option to the xl domain configuration file syntax for specifying
a kernel command line for device-model stubdomains. It is intended for use with
Linux-based stubdomains.

Signed-off-by: Scott Davis <scott.davis@starlab.io>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <iwj@xenproject.org>