xenbits.xensource.com Git - xen.git/log
3 years ago gnttab: fold recurring is_iomem_page()
Jan Beulich [Tue, 7 Sep 2021 07:35:38 +0000 (09:35 +0200)]
gnttab: fold recurring is_iomem_page()

In all cases call the function just once instead of up to four times, at
the same time avoiding storing a dangling pointer in a local variable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago gnttab: drop a redundant expression from gnttab_release_mappings()
Jan Beulich [Tue, 7 Sep 2021 07:34:57 +0000 (09:34 +0200)]
gnttab: drop a redundant expression from gnttab_release_mappings()

This gnttab_host_mapping_get_page_type() invocation sits in the "else"
path of a conditional controlled by "map->flags & GNTMAP_readonly".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years ago build: adjust arch/x86/note.o rule
Anthony PERARD [Tue, 7 Sep 2021 07:32:14 +0000 (09:32 +0200)]
build: adjust arch/x86/note.o rule

Avoid spelling the location of "xen-syms" in different ways, and simply
use the dependency variable. This avoids making assumptions about the
value of $(TARGET).

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: move make option changes check earlier
Anthony PERARD [Tue, 7 Sep 2021 07:31:02 +0000 (09:31 +0200)]
build: move make option changes check earlier

This avoids checking those variables over and over again.

Also, add "e.g." in the error messages to hint that "menuconfig"
isn't the only way.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: use subdir-y in test/Makefile
Anthony PERARD [Tue, 7 Sep 2021 07:30:42 +0000 (09:30 +0200)]
build: use subdir-y in test/Makefile

This allows Makefile.clean to recurse into livepatch without help.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: fix clean targets when subdir-y is used
Anthony PERARD [Tue, 7 Sep 2021 07:30:25 +0000 (09:30 +0200)]
build: fix clean targets when subdir-y is used

The make variable $(subdir-y) isn't used yet but will be in a
following patch. Anything in $(subdir-y) doesn't need to have a '/' as
suffix as we already know it's a directory.

Rework the rules so that it doesn't matter whether there is a '/' or
not. This also more closely mimics the way Linux's Kbuild descends into
subdirectories.

The FORCE phony target isn't needed anymore when running clean, so it
can be removed.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago build,include: rework compat-build-header.py
Anthony PERARD [Tue, 7 Sep 2021 07:29:33 +0000 (09:29 +0200)]
build,include: rework compat-build-header.py

Replace a mix of shell and Python scripts with a single Python script.

No change to the final generated headers.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
3 years ago build,include: rework compat-build-source.py
Anthony PERARD [Tue, 7 Sep 2021 07:28:43 +0000 (09:28 +0200)]
build,include: rework compat-build-source.py

Improvements are:
- give the path to xlat.lst as an argument
- include `grep -v` in the compat-build-source.py script, so we don't
  need to write this in several scripting languages.

No changes in final compat/%.h headers.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: use if_changed_rule with %.o:%.c targets
Anthony PERARD [Tue, 7 Sep 2021 07:16:45 +0000 (09:16 +0200)]
build: use if_changed_rule with %.o:%.c targets

Use $(dot-target) to have the target name prefixed with a dot.

Now, when the CC command has run, it is recorded in a .*.cmd
file; if_changed_rule will then compare it on subsequent runs.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: use if_changed on built_in.o
Anthony PERARD [Tue, 7 Sep 2021 07:14:32 +0000 (09:14 +0200)]
build: use if_changed on built_in.o

In the case where $(obj-y) is empty, we also replace $(c_flags) by
$(XEN_CFLAGS) to avoid generating an .%.d dependency file. This avoids
make trying to include %.h files in the ld command if $(obj-y) isn't
empty anymore on a second run.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago build: introduce cpp_flags macro
Anthony PERARD [Tue, 7 Sep 2021 07:14:12 +0000 (09:14 +0200)]
build: introduce cpp_flags macro

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago xen/arm64: Remove vreg_emulate_sysreg32
Michal Orzel [Thu, 29 Jul 2021 10:42:58 +0000 (12:42 +0200)]
xen/arm64: Remove vreg_emulate_sysreg32

According to the ARMv8-A architecture, AArch64 registers are 64 bits wide
even though in many cases the upper 32 bits are reserved. Therefore there
is no need for the function vreg_emulate_sysreg32 on arm64. This means
that we can have just one function, vreg_emulate_sysreg, using a new
function pointer type:

typedef bool (*vreg_reg_fn_t)(struct cpu_user_regs *regs,
                              register_t *r, bool read);

Modify vreg_emulate_cp32 to use the new function pointer as well.

This change allows 64bit registers to be used properly in AArch64 state.
In case of AArch32 the documentation (D1.20.2, DDI 0487A.j) states
that "the upper 32 bits either become zero, or hold the value that
the same architectural register held before any AArch32 execution." As
the choice between them is IMPLEMENTATION DEFINED we cannot assume they
are zeroed. Xen should ensure that but currently it does not. This is
not a new bug and must be fixed as agreed during a discussion over this
patch.

Take the opportunity to switch CNTx_CTL_* to use UL to avoid any
surprise with the negation of any bits (as used in vtimer_cntp_ctl).
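
As an illustration only (not taken from the patch), the kind of surprise
meant here can be sketched as follows. The macro and function names are
hypothetical stand-ins, and an LP64 target (64-bit unsigned long) is
assumed:

```c
#include <stdint.h>

/* Hypothetical bit definitions for illustration; not the Xen CNTx_CTL_* macros. */
#define CTL_BIT_U   (1u  << 1)   /* unsigned int constant */
#define CTL_BIT_UL  (1ul << 1)   /* unsigned long constant */

/*
 * ~CTL_BIT_U is computed as a 32-bit value and then zero-extended,
 * so the AND also clears bits 63:32 of reg.
 */
static uint64_t clear_bit_u(uint64_t reg)
{
    return reg & ~CTL_BIT_U;
}

/*
 * ~CTL_BIT_UL is computed at unsigned long width, so on an LP64 target
 * only the intended bit is cleared.
 */
static uint64_t clear_bit_ul(uint64_t reg)
{
    return reg & ~CTL_BIT_UL;
}
```

With the 32-bit constant the upper half of a 64-bit register value is lost
silently, which is why a wider constant is the safer choice once registers
are handled as 64-bit quantities.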

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago tests/xenstore: link in librt if necessary
Jan Beulich [Fri, 3 Sep 2021 13:10:43 +0000 (15:10 +0200)]
tests/xenstore: link in librt if necessary

Old enough glibc has clock_gettime() in librt.so, hence the library
needs to be specified to the linker. Newer glibc has the symbol
available in both libraries, so make sure that libc.so is preferred (to
avoid an unnecessary dependency on librt.so).

Fixes: 93c9edbef51b ("tests/xenstore: Rework Makefile")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago tools/libs: ROUNDUP() related adjustments
Jan Beulich [Fri, 3 Sep 2021 13:10:24 +0000 (15:10 +0200)]
tools/libs: ROUNDUP() related adjustments

For one, xc_private.h needlessly repeats xen-tools/libs.h's definition.

And then there are two suspicious uses (resulting from the inconsistency
with the respective 2nd parameter of DIV_ROUNDUP()): While the one in
tools/console/daemon/io.c - as per the code comment - intentionally uses
8 as the second argument (meaning to align to a multiple of 256), the
one in alloc_magic_pages_hvm() pretty certainly does not: There the goal
is to align to a uint64_t boundary, for the following module struct to
end up aligned.
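
To make the inconsistency concrete, here is a minimal sketch of the two
conventions. The macro names are hypothetical and the bodies are simplified
stand-ins rather than the definitions from the tree:

```c
/* Shift-style round-up: the second argument is log2 of the alignment. */
#define ROUNDUP_SHIFT(x, w) \
    (((unsigned long)(x) + (1UL << (w)) - 1) & ~((1UL << (w)) - 1))

/* Divisor-style helper: the second argument is the unit size itself. */
#define DIV_ROUNDUP_UNITS(x, y) (((x) + (y) - 1) / (y))
```

Under these assumptions, passing 8 to the shift-style macro aligns to a
multiple of 256, whereas aligning to a uint64_t (8-byte) boundary needs a
second argument of 3.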

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago libxc: split xc_logdirty_control() from xc_shadow_control()
Jan Beulich [Fri, 3 Sep 2021 13:09:48 +0000 (15:09 +0200)]
libxc: split xc_logdirty_control() from xc_shadow_control()

For log-dirty operations a 64-bit field is being truncated to become an
"int" return value. Seeing the large number of arguments the present
function takes, reduce its set of parameters to that needed for all
operations not involving the log-dirty bitmap, while introducing a new
wrapper for the log-dirty bitmap operations. This new function in turn
doesn't need an "mb" parameter, but has a 64-bit return type. (Using the
return value in favor of a pointer-type parameter is left as is, to
disturb callers as little as possible.)
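
The truncation problem can be illustrated with a small sketch. The function
names are hypothetical and are not the libxc interfaces; they only contrast
a narrow return path with a 64-bit one:

```c
#include <stdint.h>

/* Narrow return path: only the low 32 bits of the count survive. */
static int stats_as_int(uint64_t dirty_count)
{
    return (int)(uint32_t)dirty_count;
}

/* Wide return path: a 64-bit return type can carry the full count. */
static int64_t stats_as_int64(uint64_t dirty_count)
{
    return (int64_t)dirty_count;
}
```

A count just above 2^32 comes back as a small number through the narrow
path, while the wide path returns it unchanged.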

While altering xc_shadow_control() anyway, also adjust the types of the
last two of the remaining parameters.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago libs/light: fix tv_sec fprintf format
Fabrice Fontaine [Sat, 28 Aug 2021 09:07:09 +0000 (11:07 +0200)]
libs/light: fix tv_sec fprintf format

Don't assume tv_sec is an unsigned long; it is 64 bits wide on 32-bit NetBSD.
Use %jd and cast to (intmax_t) instead.
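
A minimal sketch of the portable pattern, using a hypothetical helper name
(the actual call site in libs/light is not reproduced here):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/*
 * Format a time_t without assuming its width: cast to intmax_t and use
 * the matching %jd conversion.
 */
static int format_seconds(char *buf, size_t len, time_t secs)
{
    return snprintf(buf, len, "%jd", (intmax_t)secs);
}
```

Because intmax_t is at least as wide as any other signed integer type, the
cast is safe whether time_t is 32 or 64 bits wide.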

Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years ago x86/PVH: de-duplicate mappings for first Mb of Dom0 memory
Jan Beulich [Tue, 31 Aug 2021 15:43:36 +0000 (17:43 +0200)]
x86/PVH: de-duplicate mappings for first Mb of Dom0 memory

One of the changes comprising the fixes for XSA-378 disallows replacing
MMIO mappings by code paths not intended for this purpose. This means we
need to be more careful about the mappings put in place in this range -
mappings should be created exactly once:
- iommu_hwdom_init() comes first; it should avoid the first Mb,
- pvh_populate_p2m() should insert identity mappings only into ranges
  not populated as RAM,
- pvh_setup_acpi() should again avoid the first Mb, which was already
  dealt with at that point.

Fixes: 753cb68e6530 ("x86/p2m: guard (in particular) identity mapping entries")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago x86/EPT: drop "tm" field of EPT entry
Jan Beulich [Tue, 31 Aug 2021 15:42:28 +0000 (17:42 +0200)]
x86/EPT: drop "tm" field of EPT entry

VT-d spec 3.2 converted this bit (back) to reserved. Since there's no
use of it anywhere in the tree, simply rename it and adjust its comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years ago libxenguest/x86: ensure CPUID[1].EBX[23:16] is non-zero for HVM
Jan Beulich [Mon, 30 Aug 2021 13:19:31 +0000 (15:19 +0200)]
libxenguest/x86: ensure CPUID[1].EBX[23:16] is non-zero for HVM

We unconditionally set HTT, so merely doubling the value read from
hardware isn't going to be correct if that value is zero.
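
A sketch of the problem and of one possible remedy follows. The function
name is hypothetical, and both the fallback value of 2 and the saturation
at 0xff are assumptions made for illustration, not a description of what
the patch does:

```c
#include <stdint.h>

/*
 * Derive a guest-visible logical processor count from the hardware value.
 * Plain doubling maps 0 to 0, which is inconsistent with HTT being set.
 */
static uint8_t adjust_count(uint8_t hw_count)
{
    if (hw_count == 0)
        return 2;        /* assumed fallback so the field is non-zero */
    if (hw_count & 0x80)
        return 0xff;     /* doubling would overflow the 8-bit field */
    return (uint8_t)(hw_count * 2);
}
```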

Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Julien Grall <julien@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago xen/domain: Fix label position in domain_teardown()
Andrew Cooper [Fri, 27 Aug 2021 13:46:52 +0000 (14:46 +0100)]
xen/domain: Fix label position in domain_teardown()

As explained in the comments, a progress label wants to be before the function
it refers to for the higher level logic to make sense.  As it happens, the
effects are benign because gnttab_mappings is immediately adjacent to teardown
in terms of co-routine exit points.

There is and will always be a corner case with 0.  Help alleviate this
visually (at least slightly) with a BUILD_BUG_ON() to ensure the property
which makes this function do anything useful.

There is also a visual corner case when changing from PROGRESS() to
PROGRESS_VCPU().  The important detail is to check that there is a "return
rc;" logically between each PROGRESS*() marker.

Fixes: b1ee10be5625 ("gnttab: add preemption check to gnttab_release_mappings()")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years ago x86/spec-ctrl: Skip RSB overwriting when safe to do so
Andrew Cooper [Thu, 19 Aug 2021 12:53:15 +0000 (13:53 +0100)]
x86/spec-ctrl: Skip RSB overwriting when safe to do so

In some configurations, it is safe to not overwrite the RSB on entry to Xen.
Both Intel and AMD have guidelines in this area, because of the performance
difference it makes for native kernels.

A simple microperf test, measuring the amount of time a XENVER_version
hypercall takes, shows the following improvements:

  KabyLake:     -13.9175% +/- 6.85387%
  CoffeeLake-R:  -9.1183% +/- 5.04519%
  Milan:        -17.7803% +/- 1.29808%

This is a best-case improvement, because no real workloads are making
XENVER_version hypercalls in a tight loop.  However, this is the hypercall
used by PV kernels to force evtchn delivery if one is pending, so it is a
common hypercall to see, especially in dom0.

The avoidance of RSB-overwriting speeds up all interrupts, exceptions and
system calls from PV or Xen context.  RSB-overwriting is still required on
VMExit from HVM guests for now.

In terms of more realistic testing, LMBench in dom0 on an AMD Rome system
shows improvements across the board, with the best improvement at 8% for
simple syscall and simple write.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years ago gnttab: avoid triggering assertion in radix_tree_ulong_to_ptr()
Jan Beulich [Fri, 27 Aug 2021 08:54:46 +0000 (10:54 +0200)]
gnttab: avoid triggering assertion in radix_tree_ulong_to_ptr()

Relevant quotes from the C11 standard:

"Except where explicitly stated otherwise, for the purposes of this
 subclause unnamed members of objects of structure and union type do not
 participate in initialization. Unnamed members of structure objects
 have indeterminate value even after initialization."

"If there are fewer initializers in a brace-enclosed list than there are
 elements or members of an aggregate, [...], the remainder of the
 aggregate shall be initialized implicitly the same as objects that have
 static storage duration."

"If an object that has static or thread storage duration is not
 initialized explicitly, then:
 [...]
 — if it is an aggregate, every member is initialized (recursively)
   according to these rules, and any padding is initialized to zero
   bits;
 [...]"

"A bit-field declaration with no declarator, but only a colon and a
 width, indicates an unnamed bit-field." Footnote: "An unnamed bit-field
 structure member is useful for padding to conform to externally imposed
 layouts."

"There may be unnamed padding within a structure object, but not at its
 beginning."

Which makes me conclude:
- Whether an unnamed bit-field member is an unnamed member or padding is
  unclear, and hence also whether the last quote above would render the
  big endian case of the structure declaration invalid.
- Whether the number of members of an aggregate includes unnamed ones is
  also not really clear.
- The initializer in map_grant_ref() initializes all fields of the "cnt"
  sub-structure of the union, so assuming the second quote above applies
  here (indirectly), the compiler isn't required to implicitly
  initialize the rest (i.e. in particular any padding) like would happen
  for static storage duration objects.

Gcc 7.4.1 can be observed (apparently in debug builds only) to translate
aforementioned initializer to a read-modify-write operation of a stack
variable, leaving unchanged the top two bits of whatever was previously
in that stack slot. Clearly if either of the two bits were set,
radix_tree_ulong_to_ptr()'s assertion would trigger.

Therefore, to be on the safe side, add an explicit padding field for the
non-big-endian-bitfields case and give a dummy name to both padding
fields.

Fixes: 9781b51efde2 ("gnttab: replace mapkind()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago gnttab: drop GNTMAP_can_fail
Jan Beulich [Fri, 27 Aug 2021 08:53:48 +0000 (10:53 +0200)]
gnttab: drop GNTMAP_can_fail

There's neither documentation of what this flag is supposed to mean, nor
any implementation. Commit 4d45702cf0398 ("paging: Updates to public
grant table header file") suggests there might have been plans to use it
for interaction with mem-paging, but no such functionality has ever
materialized. With this, don't even bother enclosing the #define-s in a
__XEN_INTERFACE_VERSION__ conditional, but drop them altogether.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago AMD/IOMMU: avoid recording each level's MFN when walking page table
Jan Beulich [Fri, 27 Aug 2021 08:53:11 +0000 (10:53 +0200)]
AMD/IOMMU: avoid recording each level's MFN when walking page table

Both callers only care about the target (level 1) MFN. I also cannot
see what we might need higher level MFNs for down the road. And even
modern gcc doesn't recognize the optimization potential.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years ago VT-d: fix caching mode IOTLB flushing
Jan Beulich [Fri, 27 Aug 2021 08:52:15 +0000 (10:52 +0200)]
VT-d: fix caching mode IOTLB flushing

While for context cache entry flushing use of did 0 is indeed correct
(after all upon reading the context entry the IOMMU wouldn't know any
domain ID if the entry is not present, and hence a surrogate one needs
to be used), for IOTLB entries the normal domain ID (from the [present]
context entry) gets used. See sub-section "IOTLB" of section "Address
Translation Caches" in the VT-d spec.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years ago xen/arm: Restrict the amount of memory that dom0less domU and dom0 can allocate
Julien Grall [Wed, 25 Aug 2021 12:19:31 +0000 (14:19 +0200)]
xen/arm: Restrict the amount of memory that dom0less domU and dom0 can allocate

Currently, both dom0less domUs and dom0 can allocate an "unlimited"
amount of memory because d->max_pages is set to ~0U.

In particular, the former are meant to be unprivileged. Therefore the
memory they could allocate should be bounded. As the domains are not yet
officially aware of Xen (we don't advertise it in the DT, yet
the hypercalls are accessible), they should not need to allocate more
than the initial amount. So set d->max_pages directly to the amount of
memory we are meant to allocate.

Take the opportunity to also restrict the memory for dom0 as the
domain is direct mapped (i.e. MFN == GFN) and therefore cannot
allocate outside of the pre-allocated region.

This is CVE-2021-28700 / XSA-383.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
3 years ago gnttab: fix array capacity check in gnttab_get_status_frames()
Jan Beulich [Wed, 25 Aug 2021 12:19:09 +0000 (14:19 +0200)]
gnttab: fix array capacity check in gnttab_get_status_frames()

The number of grant frames is of no interest here; converting the passed
in op.nr_frames this way means we allow for 8 times as many GFNs to be
written as actually fit in the array. We would corrupt xlat areas of
higher vCPU-s (after having faulted many times while trying to write to
the guard pages between any two areas) for 32-bit PV guests. For HVM
guests we'd simply crash as soon as we hit the first guard page, as
accesses to the xlat area are simply memcpy() there.
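
The class of bug can be sketched as below. This is a simplified model with
hypothetical names and a hypothetical conversion ratio of 8; it is not the
code from gnttab_get_status_frames():

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UNITS_PER_GROUP 8   /* hypothetical conversion ratio */

/*
 * Flawed check: the requested count is converted to a coarser unit
 * before being compared against the number of array slots.
 */
static bool fits_converted(size_t nr_frames, size_t array_bytes)
{
    size_t slots = array_bytes / sizeof(uint64_t);
    size_t converted = (nr_frames + UNITS_PER_GROUP - 1) / UNITS_PER_GROUP;

    return converted <= slots;
}

/* Direct check: one slot is needed per frame number written. */
static bool fits_direct(size_t nr_frames, size_t array_bytes)
{
    return nr_frames <= array_bytes / sizeof(uint64_t);
}
```

For a 4096-byte array (512 slots) the flawed variant accepts a request for
4096 entries, eight times the real capacity, while the direct comparison
rejects anything above 512.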

This is CVE-2021-28699 / XSA-382.

Fixes: 18b1be5e324b ("gnttab: make resource limits per domain")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
3 years ago gnttab: replace mapkind()
Jan Beulich [Wed, 25 Aug 2021 12:18:39 +0000 (14:18 +0200)]
gnttab: replace mapkind()

mapkind() doesn't scale very well with larger maptrack entry counts,
using a brute force linear search through all entries, with the only
option of an early loop exit if a matching writable entry was found.
Introduce a radix tree alongside the main maptrack table, thus
allowing much faster MFN-based lookup. To avoid the need to actually
allocate space for the individual nodes, encode the two counters in the
node pointers themselves, thus limiting the number of permitted
simultaneous r/o and r/w mappings of the same MFN to 2³¹-1 (64-bit) /
2¹⁵-1 (32-bit) each.
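
The encoding idea can be sketched as follows. The names are hypothetical,
and keeping the two most significant bits of the value clear is an
assumption of this sketch; it is what yields the per-counter limits quoted
above:

```c
#include <limits.h>

/* Bits available per counter once two bits of the word are set aside. */
#define COUNT_BITS ((unsigned int)((sizeof(unsigned long) * CHAR_BIT - 2) / 2))
#define COUNT_MAX  ((1UL << COUNT_BITS) - 1)

/* Pack a read-only and a read-write mapping count into one word. */
static unsigned long pack_counts(unsigned long rd, unsigned long wr)
{
    return (rd & COUNT_MAX) | ((wr & COUNT_MAX) << COUNT_BITS);
}

/* Extract the read-only count. */
static unsigned long count_rd(unsigned long v)
{
    return v & COUNT_MAX;
}

/* Extract the read-write count. */
static unsigned long count_wr(unsigned long v)
{
    return (v >> COUNT_BITS) & COUNT_MAX;
}
```

With a 64-bit unsigned long this gives 31 bits per counter, and 15 bits
with a 32-bit one, so no separate node allocation is needed to hold the
pair.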

To avoid enforcing an unnecessarily low bound on the number of
simultaneous mappings of a single MFN, introduce
radix_tree_{ulong_to_ptr,ptr_to_ulong} paralleling
radix_tree_{int_to_ptr,ptr_to_int}.

As a consequence locking changes are also applicable: With there no
longer being any inspection of the remote domain's active entries,
there's also no need anymore to hold the remote domain's grant table
lock. And since we're no longer iterating over the local domain's map
track table, the lock in map_grant_ref() can also be dropped before the
new maptrack entry actually gets populated.

As a nice side effect this also reduces the number of IOMMU operations
in unmap_common(): Previously we would have "established" a readable
mapping whenever we didn't find a writable entry anymore (yet, of
course, at least one readable one). But we only need to do this if we
actually dropped the last writable entry, not if there were none already
before.

This is part of CVE-2021-28698 / XSA-380.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago gnttab: add preemption check to gnttab_release_mappings()
Jan Beulich [Wed, 25 Aug 2021 12:18:18 +0000 (14:18 +0200)]
gnttab: add preemption check to gnttab_release_mappings()

A guest may die with many grant mappings still in place, or simply with
a large maptrack table. Iterating through this may take more time than
is reasonable without intermediate preemption (to run softirqs and
perhaps the scheduler).

Move the invocation of the function to the section where other
restartable functions get invoked, and have the function itself check
for preemption every once in a while. Have it iterate the table
backwards, such that decreasing the maptrack limit is all it takes to
convey restart information.
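
The restart scheme can be sketched like this. It is a simplified model with
hypothetical names: a fixed work budget stands in for the real preemption
check, and -1 stands in for the restart error code:

```c
/* Simplified stand-in for a maptrack table: only its limit matters here. */
struct maptrack {
    unsigned int limit;   /* entries at index >= limit are already released */
};

#define SKETCH_ERESTART (-1)   /* stand-in for the real restart code */

/*
 * Release entries from the top of the table downwards. Lowering the
 * limit as we go is the only restart state needed: a later call simply
 * continues from the current limit. The budget must be non-zero.
 */
static int release_mappings(struct maptrack *t, unsigned int budget,
                            unsigned int *released)
{
    unsigned int done = 0;

    while (t->limit > 0) {
        if (done == budget)
            return SKETCH_ERESTART;   /* let the caller reschedule us */
        --t->limit;                   /* entry t->limit is released here */
        ++*released;
        ++done;
    }

    return 0;
}
```

Calling the function repeatedly until it returns 0 processes every entry
exactly once, without any cursor besides the shrinking limit.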

In domain_teardown() introduce PROG_none such that inserting at the
front will be easier going forward.

This is part of CVE-2021-28698 / XSA-380.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago x86/mm: widen locked region in xenmem_add_to_physmap_one()
Jan Beulich [Wed, 25 Aug 2021 12:17:56 +0000 (14:17 +0200)]
x86/mm: widen locked region in xenmem_add_to_physmap_one()

For pages which can be made part of the P2M by the guest, but which can
also later be de-allocated (grant table v2 status pages being the
present example), it is imperative that they be mapped at no more than a
single GFN. We therefore need to make sure that of two parallel
XENMAPSPACE_grant_table requests for the same status page one completes
before the second checks at which other GFN the underlying MFN is
presently mapped.

Pull ahead the respective get_gfn() and push down the respective
put_gfn(). This leverages that gfn_lock() really aliases p2m_lock(), but
the function makes this assumption already anyway: In the
XENMAPSPACE_gmfn case lock nesting constraints for both involved GFNs
would otherwise need to be enforced to avoid ABBA deadlocks.

This is CVE-2021-28697 / XSA-379.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years ago x86/p2m: guard (in particular) identity mapping entries
Jan Beulich [Wed, 25 Aug 2021 12:17:32 +0000 (14:17 +0200)]
x86/p2m: guard (in particular) identity mapping entries

Such entries, created by set_identity_p2m_entry(), should only be
destroyed by clear_identity_p2m_entry(). However, similarly, entries
created by set_mmio_p2m_entry() should only be torn down by
clear_mmio_p2m_entry(), so the logic gets based upon p2m_mmio_direct as
the entry type (separation between "ordinary" and 1:1 mappings would
require a further indicator to tell apart the two).

As to the guest_remove_page() change, commit 48dfb297a20a ("x86/PVH:
allow guest_remove_page to remove p2m_mmio_direct pages"), which
introduced the call to clear_mmio_p2m_entry(), claimed this was done for
hwdom only without this actually having been the case. However, this
code shouldn't be there in the first place, as MMIO entries shouldn't be
dropped this way. Avoid triggering the warning again that 48dfb297a20a
silenced by an adjustment to xenmem_add_to_physmap_one() instead.

Note that guest_physmap_mark_populate_on_demand() gets tightened beyond
the immediate purpose of this change.

Note also that I didn't inspect code which isn't security supported,
e.g. sharing, paging, or altp2m.

This is CVE-2021-28694 / part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years ago x86/p2m: introduce p2m_is_special()
Jan Beulich [Wed, 25 Aug 2021 12:17:07 +0000 (14:17 +0200)]
x86/p2m: introduce p2m_is_special()

Seeing the similarity of grant, foreign, and (subsequently) direct-MMIO
handling, introduce a new P2M type group named "special" (as in "needing
special accessors to create/destroy").

Also use -EPERM instead of other error codes on the two domain_crash()
paths touched.

This is part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years ago AMD/IOMMU: re-arrange exclusion range and unity map recording
Jan Beulich [Wed, 25 Aug 2021 12:16:46 +0000 (14:16 +0200)]
AMD/IOMMU: re-arrange exclusion range and unity map recording

The spec makes no provisions for OS behavior here to depend on the
amount of RAM found on the system. While the spec may not sufficiently
clearly distinguish both kinds of regions, they are surely meant to be
separate things: Only regions with ACPI_IVMD_EXCLUSION_RANGE set should
be candidates for putting in the exclusion range registers. (As there's
only a single such pair of registers per IOMMU, secondary non-adjacent
regions with the flag set already get converted to unity mapped
regions.)

First of all, drop the dependency on max_page. With commit b4f042236ae0
("AMD/IOMMU: Cease using a dynamic height for the IOMMU pagetables") the
use of it here was stale anyway; it was bogus already before, as it
didn't account for max_page getting increased later on. Simply try an
exclusion range registration first, and if it fails (for being
unsuitable or non-mergeable), register a unity mapping range.

With this, various local variables become unnecessary and hence get
dropped at the same time.

With the max_page boundary dropped for using unity maps, the minimum
page table tree height now needs both recording and enforcing in
amd_iommu_domain_init(). Since we can't predict which devices may get
assigned to a domain, our only option is to uniformly force at least
that height for all domains, now that the height isn't dynamic anymore.

Further don't make use of the exclusion range unless ACPI data says so.

Note that exclusion range registration in
register_range_for_all_devices() is on a best effort basis. Hence unity
map entries also registered are redundant when the former succeeded, but
they also do no harm. Improvements in this area can be done later imo.

Also adjust types where suitable without touching extra lines.

This is part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years ago AMD/IOMMU: re-arrange/complete re-assignment handling
Jan Beulich [Wed, 25 Aug 2021 12:16:26 +0000 (14:16 +0200)]
AMD/IOMMU: re-arrange/complete re-assignment handling

Prior to the assignment step having completed successfully, devices
should not get associated with their new owner. Hand the device to DomIO
(perhaps temporarily), until after the de-assignment step has completed.

De-assignment of a device (from other than Dom0) as well as failure of
reassign_device() during assignment should result in unity mappings
getting torn down. This in turn requires switching to a refcounted
mapping approach, as was already used by VT-d for its RMRRs, to prevent
unmapping a region used by multiple devices.
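
A minimal sketch of the refcounting idea, with hypothetical names (this is
not the tracking structure used by the IOMMU code):

```c
#include <stdbool.h>
#include <stdint.h>

/* One tracked identity-mapped region, shared by any number of devices. */
struct identity_map {
    uint64_t base, end;
    unsigned int count;   /* number of devices currently using the region */
};

/* Take a reference; returns true if the mapping must be established now. */
static bool map_get(struct identity_map *m)
{
    return m->count++ == 0;
}

/* Drop a reference; returns true if the mapping must be torn down now. */
static bool map_put(struct identity_map *m)
{
    return m->count && --m->count == 0;
}
```

Only the first user creates the mapping and only the last one removes it,
so de-assigning one device cannot unmap a region another device still
needs.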

This is CVE-2021-28696 / part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years ago IOMMU: generalize VT-d's tracking of mapped RMRR regions
Jan Beulich [Wed, 25 Aug 2021 12:15:57 +0000 (14:15 +0200)]
IOMMU: generalize VT-d's tracking of mapped RMRR regions

In order to re-use it elsewhere, move the logic to vendor independent
code and strip it of RMRR specifics.

Note that the prior "map" parameter gets folded into the new "p2ma" one
(which AMD IOMMU code will want to make use of), assigning alternative
meaning ("unmap") to p2m_access_x. Prepare set_identity_p2m_entry() and
p2m_get_iommu_flags() for getting passed access types other than
p2m_access_rw (in the latter case just for p2m_mmio_direct requests).

Note also that, to be on the safe side, an overlap check gets added to
the main loop of iommu_identity_mapping().

This is part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years ago IOMMU: also pass p2m_access_t to p2m_get_iommu_flags()
Jan Beulich [Wed, 25 Aug 2021 12:15:32 +0000 (14:15 +0200)]
IOMMU: also pass p2m_access_t to p2m_get_iommu_flags()

A subsequent change will want to customize the IOMMU permissions based
on this.

This is part of XSA-378.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years ago AMD/IOMMU: correct device unity map handling
Jan Beulich [Wed, 25 Aug 2021 12:15:11 +0000 (14:15 +0200)]
AMD/IOMMU: correct device unity map handling

Blindly assuming all addresses between any two such ranges, specified by
firmware in the ACPI tables, should also be unity-mapped can't be right.
Nor can it be correct to merge ranges with differing permissions. Track
ranges individually; don't merge at all, but check for overlaps instead.
This requires bubbling up error indicators, such that IOMMU init can be
failed when allocation of a new tracking struct wasn't possible, or an
overlap was detected.
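
The individual tracking with overlap detection can be sketched as follows.
The structure and function names are hypothetical, a fixed-size table
replaces dynamic allocation, and -1 stands in for a real error code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One firmware-specified range; both bounds are inclusive. */
struct unity_range {
    uint64_t first, last;
    bool read, write;
};

/* Two inclusive ranges overlap if each starts no later than the other ends. */
static bool ranges_overlap(const struct unity_range *a,
                           const struct unity_range *b)
{
    return a->first <= b->last && b->first <= a->last;
}

/*
 * Record a range without merging it into any other. Returns 0 on success
 * and -1 if the range overlaps an existing one or the table is full, so
 * that the caller can fail initialization.
 */
static int track_range(struct unity_range *tbl, size_t cap, size_t *used,
                       struct unity_range r)
{
    for (size_t i = 0; i < *used; i++)
        if (ranges_overlap(&tbl[i], &r))
            return -1;

    if (*used == cap)
        return -1;

    tbl[(*used)++] = r;
    return 0;
}
```

Since ranges are never merged, each keeps its own permissions and no
address between two ranges is mapped by accident.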

At this occasion also stop ignoring
amd_iommu_reserve_domain_unity_map()'s return value.

This is part of XSA-378 / CVE-2021-28695.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years ago AMD/IOMMU: correct global exclusion range extending
Jan Beulich [Wed, 25 Aug 2021 12:12:13 +0000 (14:12 +0200)]
AMD/IOMMU: correct global exclusion range extending

Besides unity mapping regions, the AMD IOMMU spec also provides for
exclusion ranges (areas of memory not to be subject to DMA translation)
to be specified by firmware in the ACPI tables. The spec does not put
any constraints on the number of such regions.

Blindly assuming all addresses between any two such ranges should also
be excluded can't be right. Since hardware has room for just a single
such range (comprised of the Exclusion Base Register and the Exclusion
Range Limit Register), combine only adjacent or overlapping regions (for
now; this may require further adjustment in case table entries aren't
sorted by address) with matching exclusion_allow_all settings. This
requires bubbling up error indicators, such that IOMMU init can be
failed when concatenation wasn't possible.
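
The combining rule can be sketched as below. The names are hypothetical and
the limit is treated as inclusive; this is a simplified model rather than
the code from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* The single exclusion range the hardware can hold; limit is inclusive. */
struct excl_range {
    uint64_t base, limit;
    bool allow_all;
};

/*
 * Try to fold [base, limit] into the current range. Succeeds only if the
 * settings match and the new region is adjacent to or overlapping the
 * current one; addresses in a gap between regions are never swallowed.
 */
static bool merge_range(struct excl_range *cur, uint64_t base,
                        uint64_t limit, bool allow_all)
{
    if (cur->allow_all != allow_all)
        return false;
    if (cur->limit != UINT64_MAX && base > cur->limit + 1)
        return false;                 /* gap above the current range */
    if (limit != UINT64_MAX && limit + 1 < cur->base)
        return false;                 /* gap below the current range */

    if (base < cur->base)
        cur->base = base;
    if (limit > cur->limit)
        cur->limit = limit;

    return true;
}
```

A false return is the point at which an error would be reported to the
caller instead of silently widening the range.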

Furthermore, since the exclusion range specified in IOMMU registers
implies R/W access, reject requests asking for fewer permissions (this
will be brought closer to the spec by a subsequent change).

This is part of XSA-378 / CVE-2021-28695.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years ago xen/public: arch-arm: Add mention of argo_op hypercall
Michal Orzel [Fri, 20 Aug 2021 09:39:24 +0000 (11:39 +0200)]
xen/public: arch-arm: Add mention of argo_op hypercall

Commit 1ddc0d43c20cb1c1125d4d6cefc78624b2a9ccb7, which introduced the
argo_op hypercall, did not add a mention of it to the
comment listing supported hypercalls. Fix that.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Christopher Clark <christopher.w.clark@gmail.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoxen/arm: smmu: Set/clear IOMMU domain for device
Oleksandr Andrushchenko [Wed, 18 Aug 2021 05:22:02 +0000 (08:22 +0300)]
xen/arm: smmu: Set/clear IOMMU domain for device

When a device is assigned/de-assigned it is required to properly set
IOMMU domain used to protect the device. This assignment was missing,
thus it was not possible to de-assign the device:

(XEN) Deassigning device 0000:03:00.0 from dom2
(XEN) smmu: 0000:03:00.0:  not attached to domain 2
(XEN) d2: deassign (0000:03:00.0) failed (-3)

Fix this by assigning IOMMU domain on arm_smmu_assign_dev and reset it
to NULL on arm_smmu_deassign_dev.

Fixes: 06d1f7a278dd ("xen/arm: smmuv1: Keep track of S2CR state")
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agons16550: properly gate Exar PCIe UART cards support
Oleksandr Andrushchenko [Fri, 20 Aug 2021 14:18:12 +0000 (16:18 +0200)]
ns16550: properly gate Exar PCIe UART cards support

Arm is about to get PCI passthrough support which means CONFIG_HAS_PCI
will be enabled, so this code will fail as Arm doesn't have ns16550
PCI support:

ns16550.c:313:5: error: implicit declaration of function 'enable_exar_enhanced_bits' [-Werror=implicit-function-declaration]
  313 |     enable_exar_enhanced_bits(uart);
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~

Fix this by gating Exar PCIe UART cards support with the above in mind.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoAMD/IOMMU: don't increase perms when splitting superpage
Jan Beulich [Fri, 20 Aug 2021 10:31:08 +0000 (12:31 +0200)]
AMD/IOMMU: don't increase perms when splitting superpage

The old (super)page's permissions ought to be propagated, rather than
blindly allowing both reads and writes.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: don't leave page table mapped when unmapping ...
Jan Beulich [Fri, 20 Aug 2021 10:30:35 +0000 (12:30 +0200)]
AMD/IOMMU: don't leave page table mapped when unmapping ...

... an already not mapped page. With all other exit paths doing the
unmap, I have no idea how I managed to miss that aspect at the time.

Fixes: ad591454f069 ("AMD/IOMMU: don't needlessly trigger errors/crashes when unmapping a page")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agons16550: add Exar PCIe UART cards support
Marek Marczykowski-Górecki [Fri, 20 Aug 2021 10:29:45 +0000 (12:29 +0200)]
ns16550: add Exar PCIe UART cards support

Besides standard UART setup, this device needs enabling
(vendor-specific) "Enhanced Control Bits" - otherwise disabling hardware
control flow (MCR[2]) is ignored. Add appropriate quirk to the
ns16550_setup_preirq(), similar to the handle_dw_usr_busy_quirk(). The
new function acts on Exar 2-, 4-, and 8-port cards only. I have tested
the functionality on a 2-port card but based on the Linux driver, the same
applies to other models too.

Additionally, Exar card supports fractional divisor (DLD[3:0] register,
at 0x02). This part is not supported here yet, and seems to not
be required for working 115200bps at the very least.

The specification for the 2-port card is available at:
https://www.maxlinear.com/product/interface/uarts/pcie-uarts/xr17v352

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agons16550: specify uart param for ns_{read,write}_reg as const
Marek Marczykowski-Górecki [Fri, 20 Aug 2021 10:29:05 +0000 (12:29 +0200)]
ns16550: specify uart param for ns_{read,write}_reg as const

They don't modify it, after all.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/PV: account for 32-bit Dom0 in mark_pv_pt_pages_rdonly()'s ASSERT()s
Jan Beulich [Fri, 20 Aug 2021 10:28:07 +0000 (12:28 +0200)]
x86/PV: account for 32-bit Dom0 in mark_pv_pt_pages_rdonly()'s ASSERT()s

Clearly I neglected the special needs here, and also failed to test the
change with a debug build of Xen.

Fixes: 6b1ca51b1a91 ("x86/PV: assert page state in mark_pv_pt_pages_rdonly()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agolibs/guest: Move the guest ABI check earlier into xc_dom_parse_image()
Jane Malalane [Tue, 17 Aug 2021 15:19:24 +0000 (16:19 +0100)]
libs/guest: Move the guest ABI check earlier into xc_dom_parse_image()

Xen may not support 32-bit PV guest for a number of reasons (lack of
CONFIG_PV32, explicit pv=no-32 command line argument, or implicitly
due to CET being enabled) and advertises this to the toolstack via the
absence of xen-3.0-x86_32p ABI.

Currently, when trying to boot a 32-bit PV guest, the ABI check is too
late and the build explodes in the following manner yielding an
unhelpful error message:

  xc: error: panic: xg_dom_boot.c:121: xc_dom_boot_mem_init: can't allocate low memory for domain: Out of memory
  libxl: error: libxl_dom.c:586:libxl__build_dom: xc_dom_boot_mem_init failed: Operation not supported
  libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
  libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 1:Non-existant domain
  libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 1:Unable to destroy guest
  libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 1:Destruction of domain failed

Move the ABI check earlier into xc_dom_parse_image() along with other
ELF-note feature checks.  With this adjustment, it now looks like
this:

  xc: error: panic: xg_dom_boot.c:88: xc_dom_compat_check: guest type xen-3.0-x86_32p not supported by xen kernel, sorry: Invalid kernel
  libxl: error: libxl_dom.c:571:libxl__build_dom: xc_dom_parse_image failed
  domainbuilder: detail: xc_dom_release: called
  libxl: error: libxl_create.c:1573:domcreate_rebuild_done: Domain 11:cannot (re-)build domain: -3
  libxl: error: libxl_domain.c:1182:libxl__destroy_domid: Domain 11:Non-existant domain
  libxl: error: libxl_domain.c:1136:domain_destroy_callback: Domain 11:Unable to destroy guest
  libxl: error: libxl_domain.c:1063:domain_destroy_cb: Domain 11:Destruction of domain failed

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years agoxen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume
Juergen Gross [Thu, 19 Aug 2021 11:38:31 +0000 (13:38 +0200)]
xen/sched: fix get_cpu_idle_time() for smt=0 suspend/resume

With smt=0 during a suspend/resume cycle of the machine the threads
which have been parked before will briefly come up again. This can
result in problems e.g. with cpufreq driver being active as this will
call into get_cpu_idle_time() for a cpu without initialized scheduler
data.

Fix that by letting get_cpu_idle_time() deal with this case. Drop a
redundant check in exchange.

Fixes: 132cbe8f35632fb2 ("sched: fix get_cpu_idle_time() with core scheduling")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
3 years agoArm: relax iomem_access_permitted() check
Jan Beulich [Thu, 19 Aug 2021 11:37:42 +0000 (13:37 +0200)]
Arm: relax iomem_access_permitted() check

Ranges checked by iomem_access_permitted() are inclusive; to permit a
mapping there's no need for access to also have been granted for the
subsequent page.
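
A minimal sketch of the inclusive-range point, assuming a hypothetical permission table holding one permitted range (struct perm_range, range_permitted() and mapping_permitted() are illustrative names, not Xen's interfaces): for a mapping of nr frames starting at gfn, the last frame to check is gfn + nr - 1, not gfn + nr.

```c
#include <stdbool.h>

/* Hypothetical permission record: one inclusive range of permitted frames. */
struct perm_range {
    unsigned long first;
    unsigned long last;
};

/* True when every frame in the inclusive range [s, e] is permitted. */
bool range_permitted(const struct perm_range *p, unsigned long s,
                     unsigned long e)
{
    return s <= e && s >= p->first && e <= p->last;
}

/* A mapping of nr frames starting at gfn ends at frame gfn + nr - 1. */
bool mapping_permitted(const struct perm_range *p, unsigned long gfn,
                       unsigned long nr)
{
    return nr != 0 && range_permitted(p, gfn, gfn + nr - 1);
}
```

With access granted to frame 0x100 only, a one-frame mapping at 0x100 passes here, whereas checking the range 0x100-0x101 would wrongly demand access to the following frame as well.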

Fixes: 80f9c3167084 ("xen/arm: acpi: Map MMIO on fault in stage-2 page table for the hardware domain")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agox86: mark compat hypercall regs clobbering for intended fall-through
Jan Beulich [Thu, 19 Aug 2021 11:36:54 +0000 (13:36 +0200)]
x86: mark compat hypercall regs clobbering for intended fall-through

Oddly enough in the original report Coverity only complained about the
native hypercall related switch() statements. Now that it has seen those
fixed, it complains about (only HVM) compat ones. Hence the CIDs below
are all for the HVM side of things, yet while at it take care of the PV
side as well.

Coverity-ID: 1487105, 1487106, 1487107, 1487108, 1487109.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoVT-d: Tylersburg errata apply to further steppings
Jan Beulich [Wed, 18 Aug 2021 07:44:14 +0000 (09:44 +0200)]
VT-d: Tylersburg errata apply to further steppings

While for 5500 and 5520 chipsets only B3 and C2 are mentioned in the
spec update, X58's also mentions B2, and searching the internet suggests
systems with this stepping are actually in use. Even worse, for X58
erratum #69 is marked applicable even to C2. Split the check to cover
all applicable steppings and to also report applicable errata numbers in
the log message. The splitting requires using the DMI port instead of
the System Management Registers device, but that's then in line (also
revision checking wise) with the spec updates.

Fixes: 6890cebc6a98 ("VT-d: deal with 5500/5520/X58 errata")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
3 years agox86/PV: assert page state in mark_pv_pt_pages_rdonly()
Jan Beulich [Wed, 18 Aug 2021 07:40:08 +0000 (09:40 +0200)]
x86/PV: assert page state in mark_pv_pt_pages_rdonly()

About every time I look at dom0_construct_pv()'s "calculation" of
nr_pt_pages I question (myself) whether the result is precise or merely
an upper bound. I think it is meant to be precise, but I think we would
be better off having some checking in place. Hence add ASSERT()s to
verify that
- all pages have a valid L1...Ln (currently L4) page table type and
- no other bits are set, in particular the type refcount is still zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/PV: suppress unnecessary Dom0 construction output
Jan Beulich [Wed, 18 Aug 2021 07:39:08 +0000 (09:39 +0200)]
x86/PV: suppress unnecessary Dom0 construction output

v{xenstore,console}_{start,end} can only ever be zero in PV shim
configurations. Similarly reporting just zeros for an unmapped (or
absent) initrd is not useful. Particularly in case video is the only
output configured, space is scarce: Split the printk() and omit lines
carrying no information at all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/cet: Fix build on newer versions of GCC
Andrew Cooper [Tue, 17 Aug 2021 10:38:07 +0000 (11:38 +0100)]
x86/cet: Fix build on newer versions of GCC

Some versions of GCC complain with:

  traps.c:405:22: error: 'get_shstk_bottom' defined but not used [-Werror=unused-function]
   static unsigned long get_shstk_bottom(unsigned long sp)
                        ^~~~~~~~~~~~~~~~
  cc1: all warnings being treated as errors

Change #ifdef to if ( IS_ENABLED(...) ) to make the sole user of
get_shstk_bottom() visible to the compiler.
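
The general technique can be sketched as below. This is a simplified illustration, not the patch itself: FEATURE_SHSTK, stack_bottom() and fixup_target() are hypothetical names, and a plain macro assumed to be defined as 0 or 1 is used in place of Xen's IS_ENABLED().

```c
/* Hypothetical feature switch; assumed to always be defined as 0 or 1. */
#ifndef FEATURE_SHSTK
#define FEATURE_SHSTK 0
#endif

/* Static helper whose only caller sits behind the feature check. */
static unsigned long stack_bottom(unsigned long sp)
{
    return sp & ~0xfffUL;
}

unsigned long fixup_target(unsigned long sp)
{
    /*
     * A plain if() is always parsed, so the compiler sees the call and does
     * not report stack_bottom() as unused, even when the branch is dead and
     * gets optimised away.  Wrapping the call in #if would hide it instead.
     */
    if ( FEATURE_SHSTK )
        return stack_bottom(sp);

    return sp;
}
```

With the switch at 0 the helper is still referenced in the source, which is what keeps -Werror=unused-function quiet.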

Fixes: 35727551c070 ("x86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Compile-tested-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
3 years agox86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}
Andrew Cooper [Thu, 12 Aug 2021 16:39:16 +0000 (17:39 +0100)]
x86/cet: Fix shskt manipulation error with BUGFRAME_{warn,run_fn}

This was a clear oversight in the original CET work.  The BUGFRAME_run_fn and
BUGFRAME_warn paths update regs->rip without an equivalent adjustment to the
shadow stack, causing IRET to suffer #CP because of the mismatch.

One subtle, and therefore fragile, aspect of extable_shstk_fixup() was that it
required regs->rip to have its old value as a cross-check that the right word
in the shadow stack was being edited.

Rework extable_shstk_fixup() into fixup_exception_return() which takes
ownership of the update to both the regular and shadow stacks, ensuring that
the regs->rip update is ordered correctly.

Use the new fixup_exception_return() for BUGFRAME_run_fn and BUGFRAME_warn to
ensure that the shadow stack is updated too.

Fixes: 209fb9919b50 ("x86/extable: Adjust extable handling to be shadow stack compatible")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/ACPI: Insert missing newlines into FACS error messages
Andrew Cooper [Mon, 16 Aug 2021 13:24:44 +0000 (14:24 +0100)]
x86/ACPI: Insert missing newlines into FACS error messages

Booting Xen as a PVH guest currently yields:

  (XEN) ACPI: SLEEP INFO: pm1x_cnt[1:b004,1:0], pm1x_evt[1:b000,1:0]
  (XEN) ACPI: FACS is not 64-byte aligned: 0xfc001010<2>ACPI: wakeup_vec[fc00101c], vec_size[20]
  (XEN) ACPI: Local APIC address 0xfee00000

Insert newlines as appropriate.

Fixes: d3faf9badf52 ("[host s3] Retrieve necessary sleep information from plain-text ACPI tables (FADT/FACS), and keep one hypercall remained for sleep notification.")
Fixes: 0f089bbf43ec ("x86/ACPI: fix S3 wakeup vector mapping")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoMAINTAINERS: Fix file path for kexec headers
Andrew Cooper [Thu, 12 Aug 2021 13:49:57 +0000 (14:49 +0100)]
MAINTAINERS: Fix file path for kexec headers

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/ioapic: remove use of TRUE/FALSE/1/0
Kevin Stefanov [Mon, 16 Aug 2021 13:16:56 +0000 (15:16 +0200)]
x86/ioapic: remove use of TRUE/FALSE/1/0

Also fix stray usage in VT-d.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Kevin Stefanov <kevin.stefanov@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/pv: provide more helpful error when CONFIG_PV32 is absent
Jane Malalane [Mon, 16 Aug 2021 13:16:20 +0000 (15:16 +0200)]
x86/pv: provide more helpful error when CONFIG_PV32 is absent

Currently, when booting a 32bit dom0 kernel, the message isn't very
helpful:

  (XEN)  Xen  kernel: 64-bit, lsb
  (XEN)  Dom0 kernel: 32-bit, PAE, lsb, paddr 0x100000 -> 0x112000
  (XEN) Mismatch between Xen and DOM0 kernel
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) Could not construct domain 0
  (XEN) ****************************************

With this adjustment, it now looks like this:

  (XEN)  Xen  kernel: 64-bit, lsb
  (XEN) Found 32-bit PV kernel, but CONFIG_PV32 missing
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) Could not construct domain 0
  (XEN) ****************************************

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/pv: remove unnecessary use of goto out in construct_dom0()
Jane Malalane [Mon, 16 Aug 2021 13:15:43 +0000 (15:15 +0200)]
x86/pv: remove unnecessary use of goto out in construct_dom0()

elf_check_broken() only needs to be invoked after elf_xen_parse() and
after elf_load_binary().

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agons16550: do not override fifo size if explicitly set
Marek Marczykowski-Górecki [Mon, 16 Aug 2021 13:14:37 +0000 (15:14 +0200)]
ns16550: do not override fifo size if explicitly set

If fifo size is already set via uart_params, do not force it to 16 - which
may not match the actual hardware. Specifically Exar cards have fifo of
256 bytes.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agolibxc: simplify HYPERCALL_BUFFER()
Jan Beulich [Fri, 13 Aug 2021 14:50:09 +0000 (16:50 +0200)]
libxc: simplify HYPERCALL_BUFFER()

_hcbuf_buf1 has been there only for a pointer comparison to validate
type compatibility. The same can be achieved by not using typeof() on
the definition of what so far was _hcbuf_buf2, as the initializer has
to also be type-compatible. Drop _hcbuf_buf1 and the comparison;
rename _hcbuf_buf2.

Since we're already using compiler extensions here, don't be shy and
also omit the middle operand of the involved ?: operator.
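
For readers unfamiliar with the extension, a minimal sketch of the omitted middle operand follows. It assumes a compiler supporting this GNU C extension (GCC or Clang); name_or_default() is a hypothetical helper and unrelated to the actual macro.

```c
#include <stddef.h>

/*
 * GNU C extension: "a ?: b" yields a when a is non-zero and b otherwise,
 * evaluating a only once.  It is shorthand for "a ? a : b".
 */
const char *name_or_default(const char *name)
{
    return name ?: "(unnamed)";
}
```

The single evaluation matters in macros, where the first operand may be an expression with side effects.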

Bring line continuation character placement in line with that of
related macros.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agolibxenguest: fix off-by-1 in colo-secondary-bitmap merging
Jan Beulich [Fri, 13 Aug 2021 14:49:46 +0000 (16:49 +0200)]
libxenguest: fix off-by-1 in colo-secondary-bitmap merging

Valid GFNs (having a representation in the dirty bitmap) need to be
strictly below p2m_size.
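
The bound can be illustrated with a small sketch, assuming a dirty bitmap with one bit per GFN in [0, p2m_size). mark_dirty() and BITS_PER_LONG are defined here for illustration only and are not the library's code.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Set the bit for gfn in a bitmap covering GFNs [0, p2m_size).
 * gfn == p2m_size has no bit in the bitmap, so the check must be ">=",
 * not ">".
 */
bool mark_dirty(unsigned long *bitmap, unsigned long p2m_size,
                unsigned long gfn)
{
    if ( gfn >= p2m_size )
        return false;

    bitmap[gfn / BITS_PER_LONG] |= 1UL << (gfn % BITS_PER_LONG);
    return true;
}
```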

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agolibxenguest: complete loops in xc_map_domain_meminfo()
Jan Beulich [Fri, 13 Aug 2021 14:49:10 +0000 (16:49 +0200)]
libxenguest: complete loops in xc_map_domain_meminfo()

minfo->p2m_size may have more than 31 significant bits. Change the
induction variable to unsigned long, and (largely for signed-ness
consistency) a helper variable to unsigned int. While there also avoid
open-coding min().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoxen/bitmap: don't open code DIV_ROUND_UP()
Jane Malalane [Thu, 12 Aug 2021 15:14:25 +0000 (17:14 +0200)]
xen/bitmap: don't open code DIV_ROUND_UP()

Also, change bitmap_long_to_byte() and bitmap_byte_to_long() to take
'unsigned int' instead of 'int' number of bits, to match the type of
their callers.
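
As a reminder of what the helper does, here is a sketch using the common definition of DIV_ROUND_UP(); bits_to_bytes() is a hypothetical example function, not one of the functions touched by the patch.

```c
/* Round-up integer division, in the form commonly used in C projects. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Number of bytes needed to hold nbits bits. */
unsigned int bits_to_bytes(unsigned int nbits)
{
    return DIV_ROUND_UP(nbits, 8);
}
```

Nine bits need two bytes here, while eight bits still fit in one.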

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agokexec: remove use of TRUE/FALSE
Kevin Stefanov [Thu, 12 Aug 2021 15:10:23 +0000 (17:10 +0200)]
kexec: remove use of TRUE/FALSE

Whilst fixing this, also change bool_t to bool, and use __read_mostly.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Kevin Stefanov <kevin.stefanov@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agobitmap: make bitmap_long_to_byte() and bitmap_byte_to_long() static
Jane Malalane [Tue, 10 Aug 2021 07:29:52 +0000 (09:29 +0200)]
bitmap: make bitmap_long_to_byte() and bitmap_byte_to_long() static

Functions made static as there are no external callers.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agocredit2: avoid picking a spurious idle unit when caps are used
Dario Faggioli [Tue, 10 Aug 2021 07:29:10 +0000 (09:29 +0200)]
credit2: avoid picking a spurious idle unit when caps are used

Commit 07b0eb5d0ef0 ("credit2: make sure we pick a runnable unit from the
runq if there is one") did not completely fix the problem of potentially
selecting a scheduling unit that will then not be able to run.

In fact, in case caps are used and the unit we are currently looking
at, during the runqueue scan, does not have enough budget for being run,
we should continue looking instead of giving up and picking the idle
unit.
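
The intended scan behaviour can be sketched as follows. This is a heavily simplified model, not the scheduler's code: struct unit and pick_unit() are hypothetical, and a return value of -1 stands for falling back to the idle unit.

```c
#include <stddef.h>

/* Hypothetical, much reduced view of a scheduling unit on the runqueue. */
struct unit {
    int runnable;  /* non-zero if the unit could run on this CPU */
    int budget;    /* remaining budget when a cap is in use */
};

/*
 * Return the index of the first unit that is runnable and has budget left,
 * or -1 if the whole runqueue was scanned without finding one.
 */
int pick_unit(const struct unit *runq, size_t n)
{
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        if ( !runq[i].runnable )
            continue;

        /* Out of budget: keep scanning rather than settling for idle. */
        if ( runq[i].budget <= 0 )
            continue;

        return (int)i;
    }

    return -1;
}
```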

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: remove unneeded deps of x86_emulate.o
Anthony PERARD [Tue, 10 Aug 2021 07:28:31 +0000 (09:28 +0200)]
build: remove unneeded deps of x86_emulate.o

Those two dependencies already exist so make doesn't need to know
about them. The dependency will be generated by $(CC).

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: rework .banner generation
Anthony PERARD [Tue, 10 Aug 2021 07:27:13 +0000 (09:27 +0200)]
build: rework .banner generation

Avoid depending on Makefile but still allow to rebuild the banner when
$(XEN_FULLVERSION) changes.

Also add a dependency on tools/xen.flf, even if not expected to
change.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/arm: Do not invalidate the P2M when the PT is shared with the IOMMU
Stefano Stabellini [Wed, 4 Aug 2021 20:57:07 +0000 (13:57 -0700)]
xen/arm: Do not invalidate the P2M when the PT is shared with the IOMMU

Set/Way flushes never work correctly in a virtualized environment.

Our current implementation is based on clearing the valid bit in the p2m
pagetable to track guest memory accesses. This technique doesn't work
when the IOMMU is enabled for the domain and the pagetable is shared
between IOMMU and MMU because it triggers IOMMU faults.

Specifically, p2m_invalidate_root causes IOMMU faults if
iommu_use_hap_pt returns true for the domain.

Add a check in p2m_set_way_flush: if a set/way instruction is used
and iommu_use_hap_pt returns true, rather than failing with obscure
IOMMU faults, inject an undef exception straight away into the guest,
and print a verbose error message to explain the problem.

Also add an ASSERT in p2m_invalidate_root to make sure we don't
inadvertently stumble across this problem again in the future.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agoarm,smmu: add support for generic DT bindings. Implement add_device and dt_xlate.
Brian Woods [Tue, 3 Aug 2021 00:24:09 +0000 (17:24 -0700)]
arm,smmu: add support for generic DT bindings. Implement add_device and dt_xlate.

For the legacy path, arm_smmu_dt_add_device_legacy is called by
register_smmu_master scanning mmu-masters (a fwspec entry is also
created.) For the generic path, arm_smmu_dt_add_device_generic gets
called instead. Then, arm_smmu_dt_add_device_generic calls
arm_smmu_dt_add_device_legacy afterwards, shared with the legacy path.
This way most of the low level implementation is shared between the two
paths.

If both legacy bindings and generic bindings are present in device tree,
the legacy bindings are the ones that are used. That's because
mmu-masters is parsed by
xen/drivers/passthrough/arm/smmu.c:arm_smmu_device_dt_probe which is
called by arm_smmu_dt_init. It happens very early. iommus is parsed by
xen/drivers/passthrough/device_tree.c:iommu_add_dt_device which is
called by xen/arch/arm/domain_build.c:handle_device and happens
afterwards.

arm_smmu_dt_xlate_generic is a verbatim copy from Linux
(drivers/iommu/arm/arm-smmu/arm-smmu.c:arm_smmu_of_xlate, version
v5.10).

A workaround was introduced by cf4af9d6d6c (xen/arm: boot with device
trees with "mmu-masters" and "iommus") because the SMMU driver only
supported the legacy bindings. Remove it now.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoarm,smmu: restructure code in preparation to new bindings support
Brian Woods [Tue, 3 Aug 2021 00:24:08 +0000 (17:24 -0700)]
arm,smmu: restructure code in preparation to new bindings support

Restructure some of the code and add supporting functions for adding
generic device tree (DT) binding support.  This will allow for using
current Linux device trees with just modifying the chosen field to
enable Xen.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoarm,smmu: switch to using iommu_fwspec functions
Brian Woods [Tue, 3 Aug 2021 00:24:06 +0000 (17:24 -0700)]
arm,smmu: switch to using iommu_fwspec functions

Modify the smmu driver so that it uses the iommu_fwspec helper
functions.  This means both ARM IOMMU drivers will use the
iommu_fwspec helper functions, making enabling generic device tree
bindings in the SMMU driver much cleaner.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoxen: do not return -EEXIST if iommu_add_dt_device is called twice
Stefano Stabellini [Tue, 3 Aug 2021 00:24:07 +0000 (17:24 -0700)]
xen: do not return -EEXIST if iommu_add_dt_device is called twice

iommu_add_dt_device() returns -EEXIST if the device was already
registered. At the moment, this can only happen if the device was
already assigned to a domain (either dom0 at boot or via
XEN_DOMCTL_assign_device).

In a follow-up patch, we will convert the SMMU driver to use the FW
spec. When the legacy bindings are used, all the devices will be
registered at probe. Therefore, iommu_add_dt_device() will always
return -EEXIST.

Currently, one caller (XEN_DOMCTL_assign_device) will check the return
and ignore -EEXIST. All the others will fail because it was technically a
programming error.

However, there is no harm in calling iommu_add_dt_device() twice, so we can
simply return 0.

With that in place the caller doesn't need to check -EEXIST anymore, so
remove the check.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agotools/xenstored: Don't assume errno will not be overwritten in lu_arch()
Julien Grall [Fri, 30 Jul 2021 15:14:14 +0000 (16:14 +0100)]
tools/xenstored: Don't assume errno will not be overwritten in lu_arch()

At the moment, do_control_lu() will set errno to 0 before calling
lu_arch() and then check errno. The expectation is nothing in lu_arch()
will change the value unless there is an error.

However, per errno(3), a function that succeeds is allowed to change
errno. In fact, syslog() will overwrite errno if the logs are rotated
at the time it is called.

To prevent any further issue, errno is now always set before
returning NULL.

Additionally, errno is only checked when returning NULL so the client
can see the error message if there is any.
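
The resulting convention can be sketched as below, with hypothetical functions do_step() and run_step() standing in for lu_arch() and its caller. The assignment of EAGAIN on the success path only simulates a library call clobbering errno, as the text says syslog() may do.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical worker: returns a message on success and NULL on failure.
 * errno is meaningful only in the NULL case.
 */
const char *do_step(int fail)
{
    if ( fail )
    {
        errno = EINVAL;  /* always set errno right before returning NULL */
        return NULL;
    }

    errno = EAGAIN;      /* simulates a successful call that clobbered errno */
    return "OK";
}

/* Caller: consult errno only when the result is NULL. */
int run_step(int fail)
{
    const char *ret = do_step(fail);

    return ret ? 0 : errno;
}
```

Zeroing errno before the call and testing it afterwards would report a spurious failure in the success case of this sketch.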

Reported-by: Michael Kurth <mku@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years agotools/xenstored: Propagate correctly the error message from lu_start()
Julien Grall [Thu, 29 Jul 2021 11:06:02 +0000 (12:06 +0100)]
tools/xenstored: Propagate correctly the error message from lu_start()

lu_start() will only set errno when it returns NULL. For all the
other cases, the value is unknown.

This means that when lu_start() returns an error message, it may not
be propagated to the client.

The check that errno is a non-zero value is now dropped and instead
the value is returned when no error message is provided. This
relies on errno to always be set when ret == NULL.

Fixes: af216a99fb ("tools/xenstore: add the basic framework for doing the live update")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years agotools/xenstored: Fix off-by-one in dump_state_nodes()
Julien Grall [Thu, 29 Jul 2021 09:34:20 +0000 (10:34 +0100)]
tools/xenstored: Fix off-by-one in dump_state_nodes()

The maximum path length supported by Xenstored protocol is
XENSTORE_ABS_PATH_MAX (i.e. 3072). This doesn't take into account the
NUL at the end of the path.

However, the code to dump the nodes will allocate a buffer
of XENSTORE_ABS_PATH_MAX. As a result it may not be possible to live-update
if there is a node name of length XENSTORE_ABS_PATH_MAX.

Fix it by allocating a buffer of XENSTORE_ABS_PATH_MAX + 1 characters.

Take the opportunity to pass the max length of the buffer as a
parameter of dump_state_node_tree(). This makes it clearer that the
check in the function is linked to the allocation in dump_state_nodes().
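
A minimal sketch of the sizing rule, using a hypothetical PATH_MAX_LEN in place of XENSTORE_ABS_PATH_MAX and a hypothetical dup_path() helper rather than the xenstored functions:

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for XENSTORE_ABS_PATH_MAX: maximum length, excluding the NUL. */
#define PATH_MAX_LEN 3072

/*
 * Copy path into a freshly allocated buffer.  Returns NULL if the path is
 * longer than PATH_MAX_LEN or if the allocation fails.
 */
char *dup_path(const char *path)
{
    size_t len = strlen(path);
    char *buf;

    if ( len > PATH_MAX_LEN )
        return NULL;

    /* +1 leaves room for the terminating NUL of a maximum-length path. */
    buf = malloc(PATH_MAX_LEN + 1);
    if ( buf )
        memcpy(buf, path, len + 1);

    return buf;
}
```

With a buffer of only PATH_MAX_LEN bytes, a path of exactly the maximum length would not fit together with its NUL.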

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
3 years agoxen/lib: Fix strcmp() and strncmp()
Jane Malalane [Tue, 27 Jul 2021 18:47:15 +0000 (19:47 +0100)]
xen/lib: Fix strcmp() and strncmp()

The C standard requires that each character be compared as unsigned
char. Xen's current behaviour compares as signed char, which changes
the answer when chars with a value greater than 0x7f are used.
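
The difference can be seen in a small sketch; my_strcmp() is an illustrative implementation written for this note, not the code from xen/lib.

```c
/* Compare bytes as unsigned char, as the C standard requires for strcmp(). */
int my_strcmp(const char *a, const char *b)
{
    const unsigned char *s1 = (const unsigned char *)a;
    const unsigned char *s2 = (const unsigned char *)b;

    while ( *s1 && *s1 == *s2 )
    {
        s1++;
        s2++;
    }

    /* Yields a positive, zero or negative result without overflow. */
    return (*s1 > *s2) - (*s1 < *s2);
}
```

Here a string starting with byte 0x80 sorts after one starting with 0x7f. Comparing as signed char on a platform where char is signed would give the opposite order, since 0x80 would be read as -128.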

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
3 years agox86: work around build issue with GNU ld 2.37
Jan Beulich [Thu, 22 Jul 2021 09:20:38 +0000 (11:20 +0200)]
x86: work around build issue with GNU ld 2.37

I suspect it is commit 40726f16a8d7 ("ld script expression parsing")
which broke the hypervisor build, by no longer accepting section names
with a dash in them inside ADDR() (and perhaps other script directives
expecting just a section name, not an expression): .note.gnu.build-id
is such a section.

Quoting all section names passed to ADDR() via DECL_SECTION() works
around the regression.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/libxl: add missing blank in message
Alan Robinson [Tue, 27 Jul 2021 07:47:03 +0000 (09:47 +0200)]
tools/libxl: add missing blank in message

Add missing blank giving "an emulation" instead of "anemulation"
while making the text a single source line.

Signed-off-by: Alan Robinson <alan.robinson@fujitsu.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agotools/firmware/ovmf: Use OvmfXen platform file if exist and update OVMF
Anthony PERARD [Mon, 19 Jul 2021 13:48:45 +0000 (14:48 +0100)]
tools/firmware/ovmf: Use OvmfXen platform file if exist and update OVMF

A platform introduced in EDK II named OvmfXen is now the one to use for
Xen instead of OvmfX64. It comes with PVH support.

Also, the Xen support in OvmfX64 is deprecated,
    "deprecation notice: *dynamic* multi-VMM (QEMU vs. Xen) support in OvmfPkg"
    https://edk2.groups.io/g/devel/message/75498
and has been removed upstream.

We need to also update to a newer version of OVMF as OvmfXen in the
release "edk2-stable202105" doesn't work well with Xen, so we need the
fix b37cfdd28071 ("OvmfPkg/XenPlatformPei: Relocate shared_info page
mapping").

Also, no longer set the number of threads for parallel builds when
building the newer platform; OvmfPkg/build.sh now does parallel
builds by default.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years agotools/xl: Add stubdomain_cmdline option to xl.cfg
Scott Davis [Thu, 22 Jul 2021 16:54:30 +0000 (12:54 -0400)]
tools/xl: Add stubdomain_cmdline option to xl.cfg

This adds an option to the xl domain configuration file syntax for specifying
a kernel command line for device-model stubdomains. It is intended for use with
Linux-based stubdomains.

Signed-off-by: Scott Davis <scott.davis@starlab.io>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years agotools/libxc: use uint32_t for pirq in xc_domain_irq_permission
Igor Druzhinin [Tue, 13 Jul 2021 01:31:41 +0000 (02:31 +0100)]
tools/libxc: use uint32_t for pirq in xc_domain_irq_permission

The current uint8_t type of the pirq argument in this interface is too
restrictive, causing failures on modern hardware with lots of GSIs. That
extends down to the XEN_DOMCTL_irq_permission ABI structure, where it needs
to be fixed up as well.
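
To illustrate the truncation, here is a minimal C sketch. The two helper
names are hypothetical stand-ins, not the libxc interface; they only show
what happens to a pirq above 255 when the parameter type is uint8_t rather
than uint32_t.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old interface: pirq squeezed into 8 bits. */
uint32_t pirq_seen_narrow(uint8_t pirq)
{
    return pirq;
}

/* Hypothetical stand-in for the fixed interface: pirq carried as uint32_t. */
uint32_t pirq_seen_wide(uint32_t pirq)
{
    return pirq;
}
```

With a pirq of 300, the narrow variant sees 44 (300 modulo 256), while the
wide variant sees 300.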

Internal Xen structures appear to be fine. Existing users of the interface
in tree (libxl, ocaml and python bindings) are currently using signed int
for pirq representation which should be wide enough. Converting them to
uint32_t now is desirable to avoid a caller accidentally passing a negative
number (probably denoting an error code) as pirq, but is left for a future
cleanup.

The domctl interface version needs to be bumped with this change, but that
was already done by 918b8842a8 ("arm64: Change type of hsr, cpsr, spsr_el1
to uint64_t") in this release cycle.

Additionally, take the chance to convert the allow_access argument to bool.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
3 years agoxen/arm64: Remove READ/WRITE_SYSREG32 helper macros
Michal Orzel [Mon, 12 Jul 2021 08:53:29 +0000 (10:53 +0200)]
xen/arm64: Remove READ/WRITE_SYSREG32 helper macros

AArch64 system registers are 64bit whereas AArch32 ones
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.

The last place in code making use of READ/WRITE_SYSREG32
on arm64 is in TVM_REG macro defining functions vreg_emulate_<register>.
Implement a macro WRITE_SYSREG_SZ which expands as follows:
-on arm64: WRITE_SYSREG
-on arm32: WRITE_SYSREG{32/64}
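
The expansion described above can be sketched in plain C with mock
accessors. All MOCK_* names, mock_reg and mock_write_sz32() are hypothetical;
this is not the Xen definition, and an ordinary variable stands in for a
system register.

```c
#include <stdint.h>

/* An ordinary variable standing in for a system register. */
uint64_t mock_reg;

/* Mock accessors: the 32-bit one truncates, the 64-bit one does not. */
#define MOCK_WRITE_SYSREG64(v, reg) ((reg) = (uint64_t)(v))
#define MOCK_WRITE_SYSREG32(v, reg) ((reg) = (uint32_t)(v))

#ifdef MOCK_ARM32
/* arm32: the size suffix selects the 32-bit or 64-bit accessor. */
#define MOCK_WRITE_SYSREG_SZ(sz, v, reg) MOCK_WRITE_SYSREG##sz(v, reg)
#else
/* arm64: every register is 64 bits wide, so the size is ignored. */
#define MOCK_WRITE_SYSREG_SZ(sz, v, reg) MOCK_WRITE_SYSREG64(v, reg)
#endif

/* Write through the size-aware macro, asking for "32", and read back. */
uint64_t mock_write_sz32(uint64_t v)
{
    MOCK_WRITE_SYSREG_SZ(32, v, mock_reg);
    return mock_reg;
}
```

Built without MOCK_ARM32 defined, the size argument is ignored and the full
64-bit value is stored; built with -DMOCK_ARM32, the same call goes through
the 32-bit stand-in and truncates.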

As there are no other places in the code using these helpers
on arm64 - remove them.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agox86/hvm: Propagate real error information up through hvm_load()
Andrew Cooper [Mon, 19 Jul 2021 10:44:06 +0000 (11:44 +0100)]
x86/hvm: Propagate real error information up through hvm_load()

hvm_load() is currently a mix of -errno and -1 style error handling, which
aliases -EPERM.  This leads to the following confusing diagnostics:

From userspace:
  xc: info: Restoring domain
  xc: error: Unable to restore HVM context (1 = Operation not permitted): Internal error
  xc: error: Restore failed (1 = Operation not permitted): Internal error
  xc_domain_restore: [1] Restore failed (1 = Operation not permitted)

From Xen:
  (XEN) HVM10.0 restore: inconsistent xsave state (feat=0x2ff accum=0x21f xcr0=0x7 bv=0x3 err=-22)
  (XEN) HVM10 restore: failed to load entry 16/0

The actual error was a bad backport, but the -EINVAL got converted to -EPERM
on the way out of the hypercall.
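
The aliasing is easy to reproduce in isolation. The sketch below uses
hypothetical handler names rather than the real *_load() functions, and
assumes EPERM has the value 1, as it does on Linux.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical handler in the old style: signals failure with a bare -1. */
int load_entry_legacy(void)
{
    return -1;
}

/* Hypothetical handler in the -errno style: says what went wrong. */
int load_entry_errno(void)
{
    return -EINVAL;
}

/* Translate a negative return code the way a -errno consumer would. */
const char *describe(int rc)
{
    return strerror(-rc);
}
```

A caller that treats every negative return as -errno cannot tell the bare -1
apart from -EPERM, so describe() reports a permission problem for what was
really an invalid-data failure.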

The overwhelming majority of *_load() handlers already use -errno consistently.
Fix up the rest to be consistent, and fix a few other errors noticed along the
way.

 * Failures of hvm_load_entry() indicate a truncated record or other bad data
   size.  Use -ENODATA.
 * Don't use {g,}dprintk().  Omitting diagnostics in release builds is rude,
   and almost everything uses unconditional printk()'s.
 * Switch some errors for more appropriate ones.

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/AMD: adjust SYSCFG, TOM, etc exposure to deal with running nested
Jan Beulich [Mon, 19 Jul 2021 10:28:50 +0000 (12:28 +0200)]
x86/AMD: adjust SYSCFG, TOM, etc exposure to deal with running nested

In the original change I neglected to consider the case of us running as
L1 under another Xen. In this case we're not Dom0, so the underlying Xen
wouldn't permit us access to these MSRs. As an immediate workaround use
rdmsr_safe(); I don't view this as the final solution though, as the
original problem the earlier change tried to address also applies when
running nested. Yet it is then unclear to me how to properly address the
issue: We shouldn't generally expose the MSR values, but handing back
zero (or effectively any other static value) doesn't look appropriate
either.

Fixes: bfcdaae9c210 ("x86/AMD: expose SYSCFG, TOM, TOM2, and IORRs to Dom0")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
3 years agolibxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl
Jan Beulich [Mon, 19 Jul 2021 10:28:09 +0000 (12:28 +0200)]
libxl/x86: check return value of SHADOW_OP_SET_ALLOCATION domctl

The hypervisor may not have enough memory to satisfy the request. While
there, make the unit of the value clear by renaming the local variable.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agostubdom: foreignmemory: Fix build after 0dbb4be739c5
Julien Grall [Tue, 13 Jul 2021 09:20:19 +0000 (10:20 +0100)]
stubdom: foreignmemory: Fix build after 0dbb4be739c5

Commit 0dbb4be739c5 added the inclusion of xenctrl.h from private.h and
wrecked the build in an interesting way:

In file included from xen/stubdom/include/xen/domctl.h:39:0,
                 from xen/tools/include/xenctrl.h:36,
                 from private.h:4,
                 from minios.c:29:
xen/include/public/memory.h:407:5: error: expected specifier-qualifier-list before ‘XEN_GUEST_HANDLE_64’
     XEN_GUEST_HANDLE_64(const_uint8) buffer;
     ^~~~~~~~~~~~~~~~~~~

This is happening because xenctrl.h defines __XEN_TOOLS__ and therefore
the public headers will start to expose the non-stable ABI. However,
xen.h has already been included by a mini-OS header beforehand. So
there is a mismatch in the way the headers are included.

For now solve it in a very simple (and gross) way by including
xenctrl.h before the mini-os headers.

Fixes: 0dbb4be739c5 ("tools/libs/foreignmemory: Fix PAGE_SIZE redefinition error")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoCHANGELOG: record changed PCI device quarantining default
Jan Beulich [Tue, 13 Jul 2021 08:17:33 +0000 (10:17 +0200)]
CHANGELOG: record changed PCI device quarantining default

This amends commit 980d6acf1517 ("IOMMU: make DMA containment of
quarantined devices optional").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoIOMMU: correct parsing of "quarantine=scratch-page"
Jan Beulich [Tue, 13 Jul 2021 08:16:18 +0000 (10:16 +0200)]
IOMMU: correct parsing of "quarantine=scratch-page"

During the multiple renames of the sub-option I apparently forgot to
update the left side of the &&, and this pretty consistently.
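
The commit message does not show the code, so the following is only a guess
at the general shape of such a bug, with hypothetical function names and a
made-up stale length: a length check on the left of the && that no longer
agrees with the string compared on the right.

```c
#include <string.h>

/*
 * Buggy pattern: the length on the left of && still matches an older,
 * shorter spelling of the sub-option, so the branch is never taken for
 * the current 12-character spelling.
 */
int match_buggy(const char *s, size_t len)
{
    return len == 10 && !strncmp(s, "scratch-page", 12);
}

/* Fixed pattern: both sides of && agree on the current spelling. */
int match_fixed(const char *s, size_t len)
{
    const char *opt = "scratch-page";

    return len == strlen(opt) && !strncmp(s, opt, len);
}
```

Deriving the length from the option string itself, as match_fixed() does,
keeps the two sides from drifting apart across renames.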

Fixes: 980d6acf1517 ("IOMMU: make DMA containment of quarantined devices optional")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agotests/xenstore: Rework Makefile
Andrew Cooper [Tue, 15 Jun 2021 15:02:29 +0000 (16:02 +0100)]
tests/xenstore: Rework Makefile

In particular, fill in the install/uninstall rules so this test can be
packaged to be automated sensibly.

This causes the code to be noticed by CI, which objects as follows:

  test-xenstore.c: In function 'main':
  test-xenstore.c:486:5: error: ignoring return value of 'asprintf', declared
  with attribute warn_unused_result [-Werror=unused-result]
       asprintf(&path, "%s/%u", TEST_PATH, getpid());
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Address the CI failure by checking the asprintf() return value and exiting.
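
A minimal sketch of the checked call, assuming a glibc-style asprintf(). The
helper name make_test_path() and its parameters are hypothetical; the actual
test exits on failure instead of returning NULL.

```c
#define _GNU_SOURCE /* asprintf() is a GNU/BSD extension */
#include <stdio.h>
#include <stdlib.h>

/*
 * Build "<base>/<id>". Returns a malloc()ed string, or NULL on failure.
 * After a failed asprintf() the contents of path are undefined, so the
 * pointer is never used in that case.
 */
char *make_test_path(const char *base, unsigned int id)
{
    char *path;

    if (asprintf(&path, "%s/%u", base, id) < 0)
        return NULL;

    return path;
}
```

The caller owns the returned string and releases it with free().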

Rename xs-test to test-xenstore to be consistent with other tests.  Honour
APPEND_FLAGS too.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agotests/cpu-policy: Rework Makefile
Andrew Cooper [Tue, 15 Jun 2021 14:37:49 +0000 (15:37 +0100)]
tests/cpu-policy: Rework Makefile

In particular, fill in the install/uninstall rules so this test can be
packaged to be automated sensibly.

Rework TARGET-y to be TARGETS, drop redundant -f's for $(RM), drop the
unconditional -O3 and use the default instead, and drop CFLAGS from the link
line but honour APPEND_LDFLAGS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agotests/resource: Rework Makefile
Andrew Cooper [Tue, 15 Jun 2021 14:22:11 +0000 (15:22 +0100)]
tests/resource: Rework Makefile

In particular, fill in the install/uninstall rules so this test can be
packaged to be automated sensibly.

Make all object files depend on the Makefile, drop redundant -f's for $(RM),
and use $(TARGET) when appropriate.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agotools/tests: Drop obsolete mce-test infrastructure
Andrew Cooper [Tue, 15 Jun 2021 13:19:15 +0000 (14:19 +0100)]
tools/tests: Drop obsolete mce-test infrastructure

mce-test has a test suite, but it depends on xend, needs to run in-tree, and
requires manual setup of at least one guest, and manual parameters to pass
into cases.  Drop the test infrastructure.

Move the one useful remaining item, xen-mceinj, into misc/, fixing some minor
style issues as it goes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agotools/misc/xen-vmtrace: handle more signals and install by default
Tamas K Lengyel [Fri, 7 May 2021 15:28:36 +0000 (11:28 -0400)]
tools/misc/xen-vmtrace: handle more signals and install by default

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoautomation: provide pciutils in opensuse packages
Olaf Hering [Fri, 9 Jul 2021 14:32:48 +0000 (16:32 +0200)]
automation: provide pciutils in opensuse packages

qemu-xen-traditional may make use of pciutils-devel, for PCI passthrough.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoautomation: provide SDL and SDL2 in opensuse images
Olaf Hering [Fri, 9 Jul 2021 14:32:47 +0000 (16:32 +0200)]
automation: provide SDL and SDL2 in opensuse images

qemu-xen-traditional may make use of SDL, qemu-xen may make use of SDL2.
Use pkgconfig() as resolvable instead of an rpm name, as the latter may change.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoautomation: add meson and ninja to tumbleweed container
Olaf Hering [Fri, 9 Jul 2021 14:06:53 +0000 (16:06 +0200)]
automation: add meson and ninja to tumbleweed container

qemu uses meson for configuration, and requires ninja for building.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agotools/ocaml: Fix redefinition errors
Costin Lupu [Tue, 8 Jun 2021 12:35:29 +0000 (15:35 +0300)]
tools/ocaml: Fix redefinition errors

If PAGE_SIZE is already defined in the system (e.g. in /usr/include/limits.h
header) then gcc will trigger a redefinition error because of -Werror. This
patch replaces usage of PAGE_* macros with XC_PAGE_* macros in order to avoid
confusion between control domain page granularity (PAGE_* definitions) and
guest domain page granularity (which is what we are dealing with here).

The same issue applies to redefinitions of the Val_none and Some_val macros,
which may already be defined in the OCaml system headers (e.g.
/usr/lib/ocaml/caml/mlvalues.h).

Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
Tested-by: Dario Faggioli <dfaggioli@suse.com>