xenbits.xensource.com Git - xen.git/log
xen.git
20 months ago x86: Drop struct old_cpu_policy
Andrew Cooper [Wed, 29 Mar 2023 11:01:33 +0000 (12:01 +0100)]
x86: Drop struct old_cpu_policy

With all the complicated callers of x86_cpu_policies_are_compatible() updated
to use a single cpu_policy object, we can drop the final user of struct
old_cpu_policy.

Update x86_cpu_policies_are_compatible() to take (new) cpu_policy pointers,
reducing the amount of internal pointer chasing, and update all callers to
pass their cpu_policy objects directly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 66c5c99656314451ff9520f91cff5bb39fee9fed)

20 months ago x86: Merge xc_cpu_policy's cpuid and msr objects
Andrew Cooper [Wed, 29 Mar 2023 11:37:33 +0000 (12:37 +0100)]
x86: Merge xc_cpu_policy's cpuid and msr objects

Right now, they're the same underlying type, containing disjoint information.

Use a single object instead.  Also take the opportunity to rename 'entries' to
'msrs' which is more descriptive, and more in line with nr_msrs being the
count of MSR entries in the API.

test-tsx uses xg_private.h to access the internals of xc_cpu_policy, so needs
updating at the same time.  Take the opportunity to improve the code clarity
by passing a cpu_policy rather than an xc_cpu_policy into some functions.

No practical change.  This undoes the transient doubling of storage space from
earlier patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c9985233ca663fea20fc8807cf509d2e3fef0dca)

20 months ago x86: Merge a domain's {cpuid,msr} policy objects
Andrew Cooper [Wed, 29 Mar 2023 10:32:25 +0000 (11:32 +0100)]
x86: Merge a domain's {cpuid,msr} policy objects

Right now, they're the same underlying type, containing disjoint information.

Drop the d->arch.msr pointer, and union d->arch.cpuid to give it a second name
of cpu_policy in the interim.
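
The "second name" trick can be sketched with a C11 anonymous union. This is
only an illustration: struct arch_domain and its contents here are
hypothetical stand-ins, not the real Xen definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in; the real Xen structure is far larger. */
struct cpu_policy {
    unsigned int max_leaf;
};

struct arch_domain {
    /* Anonymous union (C11): both names alias the same pointer. */
    union {
        struct cpu_policy *cpuid;      /* old name, kept for the interim */
        struct cpu_policy *cpu_policy; /* new name */
    };
};

/* Returns nonzero when both names refer to the same object. */
static int names_alias(const struct arch_domain *a)
{
    return a->cpuid == a->cpu_policy;
}
```

Code that has not been converted yet keeps using ->cpuid, while new code
can use ->cpu_policy, without any extra storage.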

Merge init_domain_{cpuid,msr}_policy() into a single init_domain_cpu_policy(),
moving the implementation into cpu-policy.c

No practical change.  This undoes the transient doubling of storage space from
earlier patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit bd13dae34809e61e37ba1cd5de893c5c10c46256)

20 months ago x86: Merge the system {cpuid,msr} policy objects
Andrew Cooper [Wed, 29 Mar 2023 06:39:44 +0000 (07:39 +0100)]
x86: Merge the system {cpuid,msr} policy objects

Right now, they're the same underlying type, containing disjoint information.

Introduce a new cpu-policy.{h,c} to be the new location for all policy
handling logic.  Place the combined objects in __ro_after_init, which is new
since the original logic was written.

As we're trying to phase out the use of struct old_cpu_policy entirely, rework
update_domain_cpu_policy() to not pointer-chase through system_policies[].

This in turn allows system_policies[] in sysctl.c to become static and reduced
in scope to XEN_SYSCTL_get_cpu_policy.

No practical change.  This undoes the transient doubling of storage space from
earlier patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 6bc33366795d14a21a3244d0f3b63f7dccea87ef)

20 months ago x86: Merge struct msr_policy into struct cpu_policy
Andrew Cooper [Tue, 28 Mar 2023 20:24:20 +0000 (21:24 +0100)]
x86: Merge struct msr_policy into struct cpu_policy

As with the cpuid side, use a temporary define to make struct msr_policy still
work.

Note, this means that domains now have two separate struct cpu_policy
allocations with disjoint information, and system policies are in a similar
position, as well as xc_cpu_policy objects in libxenguest.  All of these
duplications will be addressed in the following patches.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 03812da3754d550dd8cbee7289469069ea6f0073)

20 months ago x86: Rename struct cpuid_policy to struct cpu_policy
Andrew Cooper [Tue, 28 Mar 2023 17:55:19 +0000 (18:55 +0100)]
x86: Rename struct cpuid_policy to struct cpu_policy

Also merge lib/x86/cpuid.h entirely into lib/x86/cpu-policy.h

Use a temporary define to make struct cpuid_policy still work.
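
A minimal sketch of how such a temporary define could work. The struct
contents and the helper name are assumptions made up for illustration.

```c
/* The merged type carries the new name. */
struct cpu_policy {
    unsigned int max_leaf;
};

/*
 * Temporary alias: any remaining "struct cpuid_policy" in code that has
 * not been converted expands to "struct cpu_policy".  Intended to be
 * deleted once all users have moved to the new name.
 */
#define cpuid_policy cpu_policy

/* Unconverted caller (hypothetical), still spelled with the old name. */
static unsigned int old_style_max_leaf(const struct cpuid_policy *p)
{
    return p->max_leaf;
}
```

A forward declaration that is parsed before the define is in scope still
names a different struct tag, so it is not covered by the alias.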

There's one forward declaration of struct cpuid_policy in
tools/tests/x86_emulator/x86-emulate.h that isn't covered by the define, and
it's easier to rename that now than to rearrange the includes.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 743e530380a007774017df9dc2d8cb0659040ee3)

20 months ago x86: Rename {domctl,sysctl}.cpu_policy.{cpuid,msr}_policy fields
Andrew Cooper [Tue, 28 Mar 2023 19:48:29 +0000 (20:48 +0100)]
x86: Rename {domctl,sysctl}.cpu_policy.{cpuid,msr}_policy fields

These weren't great names to begin with, and using {leaves,msrs} matches up
better with the existing nr_{leaves,msrs} parameters anyway.

Furthermore, by renaming these fields we can get away with using some #define
trickery to avoid the struct {cpuid,msr}_policy merge needing to happen in a
single changeset.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 21e3ef57e0406b6b9a783f721f29df8f91a00f99)

xen: Correct comments after renaming xen_{dom,sys}ctl_cpu_policy fields

Fixes: 21e3ef57e040 ("x86: Rename {domctl,sysctl}.cpu_policy.{cpuid,msr}_policy fields")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit 6e06d229d538ea51b92dc189546c522f5e903511)

20 months ago x86: Rename struct cpu_policy to struct old_cpuid_policy
Andrew Cooper [Tue, 28 Mar 2023 19:31:33 +0000 (20:31 +0100)]
x86: Rename struct cpu_policy to struct old_cpuid_policy

We want to merge struct cpuid_policy and struct msr_policy together, and the
result wants to be called struct cpu_policy.

The current struct cpu_policy, being a pair of pointers, isn't terribly
useful.  Rename the type to struct old_cpu_policy, but it will disappear
entirely once the merge is complete.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c2ec94c370f211d73f336ccfbdb32499f1b05f82)

20 months ago x86/sysctl: Retrofit XEN_SYSCTL_cpu_featureset_{pv,hvm}_max
Andrew Cooper [Fri, 10 Mar 2023 19:37:56 +0000 (19:37 +0000)]
x86/sysctl: Retrofit XEN_SYSCTL_cpu_featureset_{pv,hvm}_max

Featuresets are supposed to be disappearing when the CPU policy infrastructure
is complete, but that has taken longer than expected, and isn't going to be
complete imminently either.

In the meantime, Xen does have proper default/max featuresets, and xen-cpuid
can even get them via the XEN_SYSCTL_cpu_policy_* interface, but only knows
how to render them nicely via the featureset interface.

Differences between default and max are a frequent source of errors,
frequently too in secret leading up to an embargo, so extend the featureset
sysctl to allow xen-cpuid to render them all nicely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
(cherry picked from commit 433d012c6c2737ad5a9aaa994355a4140d601852)

20 months ago tools/xen-cpuid: Rework the handling of dynamic featuresets
Andrew Cooper [Fri, 10 Mar 2023 19:04:22 +0000 (19:04 +0000)]
tools/xen-cpuid: Rework the handling of dynamic featuresets

struct fsinfo is the vestigial remnant of an older internal design which
didn't survive very long.

Simplify things by inlining get_featureset() and having a single memory
allocation that gets reused.  This in turn changes featuresets[] to be a
simple list of names, so rename it to fs_names[].

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit ec3474e1dd42e6f410601f50b6e74fb7c442cfb9)

20 months ago x86/cpuid: Introduce dom0-cpuid command line option
Andrew Cooper [Tue, 14 Dec 2021 16:53:36 +0000 (16:53 +0000)]
x86/cpuid: Introduce dom0-cpuid command line option

Specifically, this lets the user opt in to non-default features.

Collect all dom0 settings together in dom0_{en,dis}able_feat[], and apply it
to dom0's policy when other tweaks are being made.
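
As a rough sketch, applying an enable/disable pair to a featureset could
look like the following. FS_WORDS, the function name, and the choice that
a disable wins over an enable are all assumptions for illustration, not
taken from the Xen sources.

```c
#include <stddef.h>
#include <stdint.h>

#define FS_WORDS 2  /* hypothetical featureset length, in 32-bit words */

/* Set every bit requested in 'en', then clear every bit in 'dis'. */
static void apply_feat_overrides(uint32_t fs[FS_WORDS],
                                 const uint32_t en[FS_WORDS],
                                 const uint32_t dis[FS_WORDS])
{
    for ( size_t i = 0; i < FS_WORDS; i++ )
    {
        fs[i] |= en[i];
        fs[i] &= ~dis[i];
    }
}
```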

As recalculate_cpuid_policy() is an expensive action, and dom0-cpuid= is
likely to only be used by the x86 maintainers for development purposes, forgo
the recalculation in the general case.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 5bd2b82df28cb7390f5ffb00fac635d0b9e36674)

20 months ago x86/cpuid: Factor common parsing out of parse_xen_cpuid()
Andrew Cooper [Wed, 15 Dec 2021 16:30:25 +0000 (16:30 +0000)]
x86/cpuid: Factor common parsing out of parse_xen_cpuid()

dom0-cpuid= is going to want to reuse the common parsing loop, so factor it
out into parse_cpuid().

Irritatingly, despite being static const, the features[] array gets duplicated
each time parse_cpuid() is inlined.  As it is a large (and ever growing with
new CPU features) datastructure, move it to being file scope so all inlines
use the same single object.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 94c3df9188d6deed6fe213754492b11b9d409262)

20 months ago x86/cpuid: Split dom0 handling out of init_domain_cpuid_policy()
Andrew Cooper [Wed, 15 Dec 2021 15:36:59 +0000 (15:36 +0000)]
x86/cpuid: Split dom0 handling out of init_domain_cpuid_policy()

To implement dom0-cpuid= support, the special cases would need extending.
However there is already a problem with late hwdom where the special cases
override toolstack settings, which is unintended and poor behaviour.

Introduce a new init_dom0_cpuid_policy() for the purpose, moving the ITSC and
ARCH_CAPS logic.  The is_hardware_domain() check can be dropped, and for now
there is no need to rerun recalculate_cpuid_policy(); this is a relatively
expensive operation, and will become more so over time.

Rearrange the logic in create_dom0() to make room for a call to
init_dom0_cpuid_policy().  The AMX plans for having variable sized XSAVE
states require that modifications to the policy happen before vCPUs are
created.

Additionally, factor out domid into a variable so we can be slightly more
correct in the case of a failure, and also print the error from
domain_create().  This will at least help distinguish -EINVAL from -ENOMEM.

No practical change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c17072fc164a72583fda8e2b836c71d2e3f8e84d)

20 months ago libs/vchan: Fix -Wsingle-bit-bitfield-constant-conversion
Andrew Cooper [Tue, 8 Aug 2023 13:53:42 +0000 (14:53 +0100)]
libs/vchan: Fix -Wsingle-bit-bitfield-constant-conversion

Gitlab reports:

  init.c:348:18: error: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Werror,-Wsingle-bit-bitfield-constant-conversion]
          ctrl->is_server = 1;
                          ^ ~
  1 error generated.
  make[6]: *** [/builds/xen-project/people/andyhhp/xen/tools/libs/vchan/../../../tools/Rules.mk:188: init.o] Error 1

In Xen 4.18, this was fixed with c/s 99ab02f63ea8 ("tools: convert bitfields
to unsigned type") but this is an ABI change which can't be backported.

Switch 1 for -1 to provide a minimally invasive way to fix the build.

No functional change.
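
The behaviour behind the warning can be reproduced in isolation. The
struct and function names below are hypothetical; only the one-bit
is_server field mirrors the diagnostic above.

```c
struct ctrl_signed {
    signed int is_server:1;   /* one bit wide: can hold only 0 or -1 */
};

struct ctrl_unsigned {
    unsigned int is_server:1; /* one bit wide: can hold 0 or 1 */
};

/* Store -1, the only nonzero value the signed field can represent. */
static int signed_flag_value(void)
{
    struct ctrl_signed c = { 0 };

    c.is_server = -1;
    return c.is_server;
}

/* The unsigned variant stores 1 without any change of value. */
static unsigned int unsigned_flag_value(void)
{
    struct ctrl_unsigned c = { 0 };

    c.is_server = 1;
    return c.is_server;
}
```

Writing -1 keeps the field layout (and hence the ABI) untouched while
still leaving the flag nonzero, which is all a truth test needs.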

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

20 months ago subdom: Fix -Werror=address failure in tmp_emulator
Andrew Cooper [Thu, 3 Aug 2023 19:52:08 +0000 (20:52 +0100)]
subdom: Fix -Werror=address failure in tmp_emulator

The opensuse-tumbleweed build jobs currently fail with:

  /builds/xen-project/xen/stubdom/tpm_emulator-x86_64/crypto/rsa.c: In function 'rsa_private':
  /builds/xen-project/xen/stubdom/tpm_emulator-x86_64/crypto/rsa.c:56:7: error: the comparison will always evaluate as 'true' for the address of 'p' will never be NULL [-Werror=address]
     56 |   if (!key->p || !key->q || !key->u) {
        |       ^
  In file included from /builds/xen-project/xen/stubdom/tpm_emulator-x86_64/crypto/rsa.c:17:
  /builds/xen-project/xen/stubdom/tpm_emulator-x86_64/crypto/rsa.h:28:12: note: 'p' declared here
     28 |   tpm_bn_t p;
        |            ^

This is because all tpm_bn_t's are 1-element arrays (of either a GMP or
OpenSSL BIGNUM flavour).

Adjust it to compile.  No functional change.
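
Why the check can never fire is easiest to see with a cut-down model.
Everything below (bn_elem_t, bn_t, struct rsa_key, the helpers) is a
hypothetical stand-in; the only thing borrowed from the real code is the
1-element-array typedef idiom, which GMP also uses for mpz_t.

```c
#include <stddef.h>

typedef struct { int limb; } bn_elem_t; /* hypothetical bignum element */
typedef bn_elem_t bn_t[1];              /* 1-element array type */

struct rsa_key {
    bn_t p;
    bn_t q;
};

/* Generic pointer check; no warning, as 'ptr' could be anything. */
static int is_null(const void *ptr)
{
    return ptr == NULL;
}

/* key->p decays to &key->p[0], which is non-NULL whenever key is. */
static int member_is_null(const struct rsa_key *key)
{
    return is_null(key->p);
}
```

Because the member is an array embedded in the struct rather than a
pointer, testing it against NULL says nothing about whether the number
has been initialised.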

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
(cherry picked from commit 46c5ef609b09cf51d7535aebbc05816eafca4c8d)

20 months ago tools: drop bogus and obsolete ptyfuncs.m4
Olaf Hering [Fri, 12 May 2023 12:26:14 +0000 (12:26 +0000)]
tools: drop bogus and obsolete ptyfuncs.m4

According to openpty(3) it is required to include <pty.h> to get the
prototypes for openpty() and login_tty(). But this is not what the
function AX_CHECK_PTYFUNCS actually does. It makes no attempt to include
the required header.

The two source files which call openpty() and login_tty() already contain
the conditionals to include the required header.

Remove the bogus m4 file to fix build with clang, which complains about
calls to undeclared functions.

Remove usage of INCLUDE_LIBUTIL_H in libxl_bootloader.c, it is already
covered by inclusion of libxl_osdep.h.

Remove usage of PTYFUNCS_LIBS in libxl/Makefile, it is already covered
by UTIL_LIBS from config/StdGNU.mk.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit 42abf5b9c53eb1b1a902002fcda68708234152c3)

21 months ago arm: Avoid using solaris syntax for .section directive
Khem Raj [Thu, 3 Aug 2023 14:34:42 +0000 (16:34 +0200)]
arm: Avoid using solaris syntax for .section directive

The assembler from binutils 2.41 will reject ([1], [2]) the following
syntax

.section "name", #alloc

for any target other than ELF SPARC. This means we can't use
it in the Arm code.

So switch to the GNU syntax

.section name [, "flags"[, @type]]

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=11601
[2] https://sourceware.org/binutils/docs-2.41/as.html#Section

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
[jgrall: Reword commit message]
Acked-by: Julien Grall <jgrall@amazon.com>
master commit: dfc490a3740bb7d6889939934afadcb58891fbce
master date: 2023-08-02 22:29:52 +0100

21 months ago amd: disable C6 after 1000 days on Zen2
Roger Pau Monné [Thu, 3 Aug 2023 14:34:06 +0000 (16:34 +0200)]
amd: disable C6 after 1000 days on Zen2

As specified in Erratum 1474:

"A core will fail to exit CC6 after about 1044 days after the last
system reset. The time of failure may vary depending on the spread
spectrum and REFCLK frequency."

Detect when running on AMD Zen2 and set up a timer to prevent entering
C6 after 1000 days of uptime.  Take into account the TSC value at boot
in order to account for any time elapsed before Xen has been booted.
Worst case we end up disabling C6 before strictly necessary, but that
would still be safe, and it's better than not taking the TSC value
into account and hanging.
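
The deadline arithmetic can be sketched as below. The function name and
the assumption that the TSC has counted at a constant, known rate since
the last reset are illustrative only, and tsc_hz must be nonzero.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400ULL
#define C6_LIMIT_DAYS   1000ULL

/*
 * Seconds left before the 1000-day limit, given the TSC value read at
 * boot and the TSC frequency in Hz.  Returns 0 if the limit has passed.
 */
static uint64_t c6_seconds_remaining(uint64_t tsc_at_boot, uint64_t tsc_hz)
{
    uint64_t limit = C6_LIMIT_DAYS * SECONDS_PER_DAY;
    uint64_t elapsed = tsc_at_boot / tsc_hz; /* time since last reset */

    return elapsed >= limit ? 0 : limit - elapsed;
}
```

For example, a 2 GHz TSC that already shows ten days of ticks at boot
leaves 990 days on the timer rather than the full 1000.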

Disable C6 by updating the MSR listed in the revision guide, this
avoids applying workarounds in the CPU idle drivers, as the processor
won't be allowed to enter C6 by the hardware itself.

Print a message once C6 is disabled in order to let the user know.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f7065b24f4fb8813a896b883e6ffd03d67f8a8f2
master date: 2023-07-31 15:05:48 +0200

21 months ago tools/xenstore: fix XSA-417 patch
Juergen Gross [Thu, 3 Aug 2023 14:33:43 +0000 (16:33 +0200)]
tools/xenstore: fix XSA-417 patch

The fix for XSA-417 had a bug: domain_alloc_permrefs() will not return
a negative value in case of an error, but a plain errno value.

Note this is not considered to be a security issue, as the only case
where domain_alloc_permrefs() will return an error is a failed memory
allocation. As a guest should not be able to drive Xenstore out of
memory, this is NOT a problem a guest can trigger at will.
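
The class of bug is easy to show in isolation. alloc_refs() below is a
hypothetical stand-in that follows the same convention as described
above (0 on success, a plain positive errno value on failure).

```c
#include <errno.h>

/* Hypothetical helper: returns 0 or a plain (positive) errno value. */
static int alloc_refs(int simulate_failure)
{
    return simulate_failure ? ENOMEM : 0;
}

/* Buggy check: a positive errno never satisfies "< 0". */
static int failed_buggy(int simulate_failure)
{
    return alloc_refs(simulate_failure) < 0;
}

/* Fixed check: any nonzero return is an error. */
static int failed_fixed(int simulate_failure)
{
    return alloc_refs(simulate_failure) != 0;
}
```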

Fixes: ab128218225d ("tools/xenstore: fix checking node permissions")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
master commit: 0c53c638e16278078371ce028c74693841d7738a
master date: 2023-07-21 08:32:43 +0200

21 months ago x86: fix early boot output
Jan Beulich [Thu, 3 Aug 2023 14:32:28 +0000 (16:32 +0200)]
x86: fix early boot output

Loading the VGA base address involves sym_esi(), i.e. %esi still needs
to hold the relocation base address. Therefore the address of the
message to output cannot be "passed" in %esi. Put the message offset in
%ecx instead, adding it into %esi _after_ its last use as base address.

Fixes: b28044226e1c ("x86: make Xen early boot code relocatable")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: b1c16800e52743d9afd9af62c810f03af16dd942
master date: 2023-07-19 10:22:56 +0200

21 months ago xen/arm: Add Cortex-A77 erratum 1508412 handling
Luca Fancellu [Mon, 17 Jul 2023 12:25:46 +0000 (13:25 +0100)]
xen/arm: Add Cortex-A77 erratum 1508412 handling

Cortex-A77 cores (r0p0, r1p0) could deadlock on a sequence of a
store-exclusive or read of PAR_EL1 and a load with device or non-cacheable
memory attributes.
A workaround is available, but it depends on a firmware counterpart.

The proposed workaround from the errata document is to modify the software
running at EL1 and above to include a DMB SY before and after accessing
PAR_EL1.

In conjunction with the above, the firmware needs to use a specific write
sequence to several IMPLEMENTATION DEFINED registers to have the hardware
insert a DMB SY after all load-exclusive and store-exclusive instructions.

Apply the workaround to Xen where PAR_EL1 is read, implementing a helper
function to do that.
Since Xen can be interrupted by IRQs at any moment, add a barrier on
entry/exit when we are running on the affected cores.

A guest without the workaround can deadlock the system, so warn the users
of Xen with the above type of cores to use only trusted guests, by
printing a message on Xen startup.

This is XSA-436 / CVE-2023-34320.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
[stefano: add XSA-436 to commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

21 months ago x86/amd: Fix DE_CFG truncation in amd_check_zenbleed()
Andrew Cooper [Fri, 28 Jul 2023 17:42:12 +0000 (18:42 +0100)]
x86/amd: Fix DE_CFG truncation in amd_check_zenbleed()

This line:

val &= ~chickenbit;

ends up truncating val to 32 bits, and turning off various errata workarounds
in Zen2 systems.
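
A self-contained sketch of the truncation, assuming a 32-bit mask
variable as the description implies. The function names and the bit
position are chosen for illustration only.

```c
#include <stdint.h>

/*
 * Buggy: 'chickenbit' is 32 bits wide, so ~chickenbit is too.  It is
 * zero-extended to 64 bits before the AND, clearing bits 63:32 of val.
 */
static uint64_t clear_bit_buggy(uint64_t val)
{
    uint32_t chickenbit = 1u << 9;

    val &= ~chickenbit;
    return val;
}

/* Fixed: with a 64-bit mask, the complement keeps bits 63:32 set. */
static uint64_t clear_bit_fixed(uint64_t val)
{
    uint64_t chickenbit = 1u << 9;

    val &= ~chickenbit;
    return val;
}
```

With val = 0xFFFFFFFF00000200, the buggy version yields 0, while the
fixed version yields 0xFFFFFFFF00000000.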

Fixes: f91c5ea97067 ("x86/amd: Mitigations for Zenbleed")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c0dd53b8cbd1e47e9c89873a9265a7170bdc6b4c)

21 months ago x86/amd: Mitigations for Zenbleed
Andrew Cooper [Mon, 22 May 2023 22:03:00 +0000 (23:03 +0100)]
x86/amd: Mitigations for Zenbleed

Zenbleed is a malfunction on AMD Zen2 uarch parts which results in corruption
of the vector registers.  An attacker can trigger this bug deliberately in
order to access stale data in the physical vector register file.  This can
include data from sibling threads, or a higher-privilege context.

Microcode is the preferred mitigation but in the case that's not available use
the chickenbit as instructed by AMD.  Re-evaluate the mitigation on late
microcode load too.

This is XSA-433 / CVE-2023-20593.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
(cherry picked from commit f91c5ea970675637721bb7f18adaa189837eb783)

21 months ago update qemuu tag
Jan Beulich [Tue, 18 Jul 2023 08:26:07 +0000 (10:26 +0200)]
update qemuu tag

21 months ago tools: Remove the use of K&R functions
Andrew Cooper [Tue, 18 Jul 2023 08:25:20 +0000 (10:25 +0200)]
tools: Remove the use of K&R functions

Clang-15 (as seen in the FreeBSD 14 tests) complains:

  xg_main.c:1248 error: a function declaration without a
  prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
  xg_init()
         ^
          void

The error message is a bit confusing but appears to be new as part of
-Wdeprecated-non-prototype which is part of supporting C2x which formally
removes K&R syntax.

Either way, fix the identified function.
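
For reference, the shape of such a fix on a made-up function (answer() is
hypothetical and not part of the tree):

```c
/*
 * Before: "static int answer()" declares a function without a prototype,
 * i.e. with unspecified parameters.
 *
 * After: "(void)" turns the declaration into a prototype that takes no
 * arguments, which is what the compiler's suggestion points at.
 */
static int answer(void)
{
    return 42;
}
```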

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: e2312e41f05c0f2e3b714710bd2551a3cd74cedd
master date: 2023-02-17 11:01:54 +0000

21 months ago xen/x86: Remove the use of K&R functions
Andrew Cooper [Mon, 17 Jul 2023 07:41:22 +0000 (09:41 +0200)]
xen/x86: Remove the use of K&R functions

Clang-15 (as seen in the FreeBSD 14 tests) complains:

  arch/x86/time.c:1364:20: error: a function declaration without a
  prototype is deprecated in all versions of C [-Werror,-Wstrict-prototypes]
  s_time_t get_s_time()
                     ^
                      void

The error message is a bit confusing but appears to be new as part of
-Wdeprecated-non-prototype which is part of supporting C2x which formally
removes K&R syntax.

Either way, fix the identified functions.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Backport: Also deal with powernow_register_driver() and flush_all_cache().

master commit: 22b2fa4766728c3057757c00e79da5f7803fff33
master date: 2023-02-17 11:01:54 +0000

21 months ago iommu/amd-vi: fix checking for Invalidate All support in amd_iommu_resume()
Roger Pau Monné [Mon, 17 Jul 2023 06:34:33 +0000 (08:34 +0200)]
iommu/amd-vi: fix checking for Invalidate All support in amd_iommu_resume()

The iommu local variable does not point to a valid amd_iommu element
after the call to for_each_amd_iommu().  Instead check whether any IOMMU
on the system doesn't support Invalidate All in order to perform the
per-domain and per-device flushes.
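
The general pattern, sketched with a hypothetical singly linked list in
place of the real IOMMU list and iterator macro:

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_iommu {
    bool inv_all;             /* supports the "invalidate all" command */
    struct fake_iommu *next;
};

/* True only if every IOMMU in the list supports Invalidate All. */
static bool all_support_inv_all(const struct fake_iommu *head)
{
    bool all = true;

    for ( const struct fake_iommu *i = head; i; i = i->next )
        if ( !i->inv_all )
            all = false;

    /* The iterator is not usable here; only the accumulated flag is. */
    return all;
}
```

The result is computed inside the loop, so nothing depends on what the
iterator happens to point at once the loop has finished.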

Fixes: 9c46139de889 ('amd iommu: Support INVALIDATE_IOMMU_ALL command.')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 5ecbb779748a56495f2c892f0610d57dd623c7cd
master date: 2023-06-13 14:41:32 +0200

21 months ago x86/microcode: Add missing unlock in microcode_update_helper()
Alejandro Vallejo [Mon, 17 Jul 2023 06:34:11 +0000 (08:34 +0200)]
x86/microcode: Add missing unlock in microcode_update_helper()

microcode_update_helper() may return early while holding
cpu_add_remove_lock, hence preventing any writers from taking it again.

Leave through the `put` label instead so it's properly released.
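
A minimal sketch of the single-exit pattern. The counter stands in for a
real lock, and every name except the `put` label is hypothetical.

```c
static int lock_depth; /* stand-in for a real lock: 1 while held */

static void take_lock(void)    { lock_depth++; }
static void release_lock(void) { lock_depth--; }

/* Returns 0 on success, -1 on error; always drops the lock on exit. */
static int update_helper(int bad_input)
{
    int ret = 0;

    take_lock();

    if ( bad_input )
    {
        ret = -1;
        goto put;   /* a bare "return -1" here would leak the lock */
    }

    /* ... the real work would go here ... */

 put:
    release_lock();
    return ret;
}
```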

Fixes: 5ed12565aa32 ("microcode: rendezvous CPUs in NMI handler and load ucode")
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: b35b22acb887f682efe8385b3df165220bc84c86
master date: 2023-06-05 16:11:10 +0100

21 months ago vpci/header: cope with devices not having vpci allocated
Roger Pau Monné [Mon, 17 Jul 2023 06:32:34 +0000 (08:32 +0200)]
vpci/header: cope with devices not having vpci allocated

When traversing the list of pci devices assigned to a domain cope with
some of them not having the vpci struct allocated. It should be
possible for the hardware domain to have read-only devices assigned
that are not handled by vPCI, such support will be added by further
patches.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ee045f3a4a6dddb09f5aa96a50cceaae97d3245f
master date: 2023-05-26 09:18:37 +0200

21 months ago tools: convert bitfields to unsigned type
Olaf Hering [Mon, 17 Jul 2023 06:32:19 +0000 (08:32 +0200)]
tools: convert bitfields to unsigned type

clang complains about the signed type:

implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Backport: Dropped the libxenvchan change, for the original commit saying

"The potential ABI change in libxenvchan is covered by the Xen version
 based SONAME."

which won't hold on stable trees.
master commit: 99ab02f63ea813f2e467a39a7736bf460a3f3495
master date: 2023-05-16 20:03:02 +0100

23 months ago pci: fix pci_get_pdev_by_domain() to always account for the segment
Roger Pau Monné [Tue, 23 May 2023 13:03:41 +0000 (15:03 +0200)]
pci: fix pci_get_pdev_by_domain() to always account for the segment

When a domain parameter is provided to pci_get_pdev_by_domain() the
search function would match against bus and devfn, without taking the
segment into account.

Fix this and also account for the passed segment.
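
In sketch form, a complete match compares all three parts of the address.
struct fake_pdev and pdev_matches() are hypothetical names used only for
this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

struct fake_pdev {
    uint16_t seg;
    uint8_t bus;
    uint8_t devfn;
};

/* A device matches only if segment, bus and devfn all agree. */
static bool pdev_matches(const struct fake_pdev *p,
                         uint16_t seg, uint8_t bus, uint8_t devfn)
{
    return p->seg == seg && p->bus == bus && p->devfn == devfn;
}
```

Without the segment comparison, two devices at the same bus/devfn on
different segments would be indistinguishable.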

Fixes: 8cf6e0738906 ('PCI: simplify (and thus correct) pci_get_pdev{,_by_domain}()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: c7908869ac26961a3919491705e521179ad3fc0e
master date: 2023-05-22 16:11:55 +0200

23 months ago sched/null: avoid crash after failed domU creation
Stewart Hildebrand [Tue, 23 May 2023 13:03:19 +0000 (15:03 +0200)]
sched/null: avoid crash after failed domU creation

When domU creation fails, there is a corner case that may lead to a crash
in the null scheduler when running a debug build of Xen.

(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'npc->unit == unit' failed at common/sched/null.c:379
(XEN) ****************************************

The events leading to the crash are:

* null_unit_insert() was invoked with the unit offline. Since the unit was
  offline, unit_assign() was not called, and null_unit_insert() returned.
* Later during domain creation, the unit was onlined
* Eventually, domain creation failed due to bad configuration
* null_unit_remove() was invoked with the unit still online. Since the unit was
  online, it called unit_deassign() and triggered an ASSERT.

To fix this, only call unit_deassign() when npc->unit is non-NULL in
null_unit_remove().
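
The guard can be modelled as follows. All types and helpers here are
hypothetical simplifications; the real scheduler structures carry much
more state.

```c
#include <stddef.h>

struct fake_unit { int id; };

struct fake_pcpu {
    struct fake_unit *unit;   /* NULL when nothing is assigned */
};

static int deassign_calls;

static void fake_deassign(struct fake_pcpu *npc)
{
    npc->unit = NULL;
    deassign_calls++;
}

/* Only deassign when something was actually assigned. */
static void fake_unit_remove(struct fake_pcpu *npc)
{
    if ( npc->unit != NULL )
        fake_deassign(npc);
}
```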

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
master commit: c2eae2614c8f04e384cd3334c3f06f31a6cb5f41
master date: 2023-05-22 16:11:40 +0200

23 months ago iommu/amd-vi: fix assert comparing boolean to enum
Roger Pau Monné [Tue, 23 May 2023 13:02:50 +0000 (15:02 +0200)]
iommu/amd-vi: fix assert comparing boolean to enum

Or else when iommu_intremap is set to iommu_intremap_full the assert
triggers.
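
Why such a comparison misfires can be shown with a three-valued enum. The
enum and function names are hypothetical stand-ins, and the "fixed" form
is one possible way to write the check, not necessarily the one used in
the patch.

```c
#include <stdbool.h>

enum intremap_mode { INTREMAP_OFF, INTREMAP_RESTRICTED, INTREMAP_FULL };

/* Buggy: 'true' converts to 1, which never equals INTREMAP_FULL (2). */
static bool check_buggy(bool want_remap, enum intremap_mode mode)
{
    return want_remap == mode;
}

/* Fixed: reduce the enum to a boolean before comparing. */
static bool check_fixed(bool want_remap, enum intremap_mode mode)
{
    return want_remap == (mode != INTREMAP_OFF);
}
```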

Fixes: 1ba66a870eba ('AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 4c507d8a6b6e8be90881a335b0a66eb28e0f7737
master date: 2023-05-12 09:35:36 +0200

23 months ago docs/man: fix xenstore-write synopsis
Yann Dirson [Tue, 23 May 2023 13:02:34 +0000 (15:02 +0200)]
docs/man: fix xenstore-write synopsis

Reported-by: zithro <slack@rabbit.lu>
Signed-off-by: Yann Dirson <yann.dirson@vates.fr>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 8b1ac353b4db7c5bb2f82cb6afee9cc641e756a4
master date: 2023-05-09 10:37:29 +0100

23 months ago ns16550: enable memory decoding on MMIO-based PCI console card
Marek Marczykowski-Górecki [Tue, 23 May 2023 13:02:09 +0000 (15:02 +0200)]
ns16550: enable memory decoding on MMIO-based PCI console card

pci_serial_early_init() enables PCI_COMMAND_IO for IO-based UART
devices, add setting PCI_COMMAND_MEMORY for MMIO-based UART devices too.
Note the MMIO-based devices in practice need a "pci" sub-option,
otherwise a few parameters are not initialized (including bar_idx,
reg_shift, reg_width etc). The "pci" is not supposed to be used with
explicit BDF, so do not key setting PCI_COMMAND_MEMORY on explicit BDF
being set. Contrary to the IO-based UART, pci_serial_early_init() will
not attempt to set BAR0 address, even if user provided io_base manually
- in most cases, those are with an offset and the current cmdline syntax
doesn't allow expressing it. Due to this, enable PCI_COMMAND_MEMORY only
if uart->bar is already populated. In similar spirit, this patch does
not support setting BAR0 of the bridge.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: a16fb78515d54be95f81c0d1c0a3a7b954a54d0a
master date: 2023-05-08 14:15:38 +0200

23 months ago tools/libs/guest: assist gcc13's realloc analyzer
Olaf Hering [Tue, 23 May 2023 13:01:53 +0000 (15:01 +0200)]
tools/libs/guest: assist gcc13's realloc analyzer

gcc13 fails to track the allocated memory in backup_ptes:

xg_offline_page.c: In function 'backup_ptes':
xg_offline_page.c:191:13: error: pointer 'orig' may be used after 'realloc' [-Werror=use-after-free]
  191 |             free(orig);

Assist the analyzer by slightly rearranging the code:
In case realloc succeeds, the previous allocation is either extended
or released internally. In case realloc fails, the previous allocation
is left unchanged. Return an error in this case, the caller will
release the currently allocated memory in its error path.
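
The rearranged shape can be sketched like this. struct backup and
backup_grow() are hypothetical, and new_cap is assumed to be nonzero.

```c
#include <stdlib.h>

struct backup {
    int *entries;
    size_t cap;
};

/*
 * Grow the array to new_cap entries.  Returns 0 on success, -1 on
 * failure.  On failure b->entries is untouched, so the caller's error
 * path can free it as usual.
 */
static int backup_grow(struct backup *b, size_t new_cap)
{
    int *tmp = realloc(b->entries, new_cap * sizeof(*tmp));

    if ( tmp == NULL )
        return -1;

    b->entries = tmp;  /* the old pointer value is never used again */
    b->cap = new_cap;
    return 0;
}
```

Keeping the result in a temporary means no code path touches the old
pointer after a successful realloc().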

http://bugzilla.suse.com/show_bug.cgi?id=1210570

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Compile-tested-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 99a9c3d7141063ae3f357892c6181cfa3be8a280
master date: 2023-05-03 15:06:41 +0200

23 months ago x86/mm: replace bogus assertion in paging_log_dirty_op()
Jan Beulich [Tue, 23 May 2023 13:01:24 +0000 (15:01 +0200)]
x86/mm: replace bogus assertion in paging_log_dirty_op()

While I was the one to introduce it, I don't think it is correct: A
bogus continuation call issued by a tool stack domain may find another
continuation in progress. IOW we've been asserting caller controlled
state (which is reachable only via a domctl), and the early (lock-less)
check in paging_domctl() helps in a limited way only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 0956aa2219745a198bb6a0a99e2108a3c09b280e
master date: 2023-05-03 13:38:30 +0200

23 months ago xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM
Juergen Gross [Tue, 23 May 2023 13:00:59 +0000 (15:00 +0200)]
xen/sysctl: fix XEN_SYSCTL_getdomaininfolist handling with XSM

In case XSM is active, the handling of XEN_SYSCTL_getdomaininfolist
can fail if the last domain scanned isn't allowed to be accessed by
the calling domain (i.e. xsm_getdomaininfo(XSM_HOOK, d) is failing).

Fix that by just ignoring scanned domains where xsm_getdomaininfo()
is returning an error, like it is effectively done when such a
situation occurs for a domain not being the last one scanned.
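
The "skip and carry on" behaviour, sketched over a plain array of domain
IDs. may_inspect() is a hypothetical permission hook standing in for the
XSM check, and its odd/even rule exists purely to make the example
testable.

```c
#include <stddef.h>

/* Hypothetical permission hook: 0 means allowed, nonzero means denied. */
static int may_inspect(int domid)
{
    return (domid % 2) ? -1 : 0;   /* deny odd IDs, for the example only */
}

/* Copy the IDs of all permitted domains into out[]; returns how many. */
static size_t list_domains(const int *ids, size_t n, int *out, size_t max)
{
    size_t count = 0;

    for ( size_t i = 0; i < n && count < max; i++ )
    {
        if ( may_inspect(ids[i]) )
            continue;   /* skip, whether or not it is the last one */
        out[count++] = ids[i];
    }

    return count;
}
```

A denied domain at the end of the scan is treated exactly like a denied
domain in the middle: it is left out and the call still succeeds.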

Fixes: d046f361dc93 ("Xen Security Modules: XSM")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b033eddc9779109c06a26936321d27a2ef4e088b
master date: 2023-05-02 12:04:58 +0200

23 months ago x86/msi: clear initial MSI-X state on boot
Marek Marczykowski-Górecki [Tue, 23 May 2023 13:00:30 +0000 (15:00 +0200)]
x86/msi: clear initial MSI-X state on boot

Some firmware/devices are found to not reset MSI-X properly, leaving
MASKALL set. Jason reports on his machine MASKALL persists through a
warm reboot, but is cleared on cold boot. Xen relies on initial state
being MASKALL clear. In particular, pci_reset_msix_state() assumes that if
MASKALL is set, it was Xen setting it due to msix->host_maskall or
msix->guest_maskall. Clearing just MASKALL is risky if ENABLE is set,
so clear them both.
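
A minimal sketch of "clear them both", assuming the standard MSI-X Message Control layout (bit 15 is MSI-X Enable, bit 14 is Function Mask, i.e. MASKALL). The macro and function names are made up for the example and are not the ones used in the patch.

```c
#include <stdint.h>

/* Bit positions per the MSI-X Message Control register layout. */
#define SKETCH_MSIX_ENABLE  0x8000u
#define SKETCH_MSIX_MASKALL 0x4000u

/* Return the control value with both ENABLE and MASKALL cleared. */
static uint16_t msix_initial_control(uint16_t control)
{
    return control & (uint16_t)~(SKETCH_MSIX_ENABLE | SKETCH_MSIX_MASKALL);
}
```

The remaining bits (such as the table size field) are left untouched.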

Reported-by: Jason Andryuk <jandryuk@gmail.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
master commit: 913751d7af6e78d65c1e2adf4887193c827f0c5e
master date: 2023-04-25 12:16:17 +0200

23 months agox86/extable: hide use of negative offset from array start
Jan Beulich [Tue, 23 May 2023 13:00:05 +0000 (15:00 +0200)]
x86/extable: hide use of negative offset from array start

In COVERAGE=y but DEBUG=n builds (observed by randconfig testing) gcc12
takes issue with the subtraction of 1 from __stop___pre_ex_table[],
considering this an out of bounds access. Not being able to know that
the symbol actually marks the end of an array, the compiler is kind of
right with this diagnosis. Move the subtraction into the function.

Reported-by: Anthony Perard <anthony.perard@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 353b8cc56862dd808b75c6c96cd780cfee8f28bc
master date: 2023-02-22 13:50:20 +0100

23 months agoupdate Xen version to 4.16.5-pre
Jan Beulich [Tue, 23 May 2023 12:58:17 +0000 (14:58 +0200)]
update Xen version to 4.16.5-pre

2 years agoupdate Xen version to 4.16.4 RELEASE-4.16.4
Jan Beulich [Thu, 27 Apr 2023 12:54:26 +0000 (14:54 +0200)]
update Xen version to 4.16.4

2 years agoautomation: Remove installation of packages from test scripts
Michal Orzel [Tue, 25 Apr 2023 07:22:59 +0000 (09:22 +0200)]
automation: Remove installation of packages from test scripts

Now that these packages are already installed in the respective
containers, we can remove their installation from the test scripts.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 72cfe1c3ad1fae95f4f0ac51dbdd6838264fdd7f
master date: 2022-12-09 14:55:33 -0800

2 years agoxen/ELF: Fix ELF32 PRI formatters
Andrew Cooper [Mon, 24 Apr 2023 11:06:28 +0000 (13:06 +0200)]
xen/ELF: Fix ELF32 PRI formatters

It is rude to hide width formatting inside a PRI* macro, doubly so when it's
only in one bitness of the macro.

However, it's fully buggy when all the users use %#"PRI, because then it expands
to the common trap of %#08x which does not do what the author intends.

Switch the 32bit ELF PRI formatters to use plain integer PRI's, just like on
the 64bit side already.  No practical change.
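
For reference, the %#08x trap can be shown in isolation. This is a standalone sketch using plain snprintf(), not the ELF PRI macros themselves; the helper names are hypothetical.

```c
#include <stdio.h>

/* "%#08x": the "0x" prefix counts towards the field width of 8. */
static int fmt_trap(char *buf, size_t len, unsigned int v)
{
    return snprintf(buf, len, "%#08x", v);
}

/* Width chosen at the call site: 2 for "0x" plus 8 hex digits. */
static int fmt_intended(char *buf, size_t len, unsigned int v)
{
    return snprintf(buf, len, "%#010x", v);
}
```

For 0x1234 the first form yields only six digits after the prefix, while the second yields the eight digits which were presumably intended.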

Fixes: 7597fabca76e ("livepatch: Include sizes when an mismatch occurs")
Fixes: 380b229634f8 ("xsplice: Implement payload loading")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: cfa2bb82c01f0c656804cedd8f44eb2a99a2b5bc
master date: 2023-04-19 15:55:29 +0100

2 years agox86/livepatch: Fix livepatch application when CET is active
Andrew Cooper [Mon, 24 Apr 2023 11:05:52 +0000 (13:05 +0200)]
x86/livepatch: Fix livepatch application when CET is active

Right now, trying to apply a livepatch on any system with CET shstk (AMD Zen3
or later, Intel Tiger Lake or Sapphire Rapids and later) fails as follows:

  (XEN) livepatch: lp: Verifying enabled expectations for all functions
  (XEN) common/livepatch.c:1591: livepatch: lp: timeout is 30000000ns
  (XEN) common/livepatch.c:1703: livepatch: lp: CPU28 - IPIing the other 127 CPUs
  (XEN) livepatch: lp: Applying 1 functions
  (XEN) hi_func: Hi! (called 1 times)
  (XEN) Hook executing.
  (XEN) Assertion 'local_irq_is_enabled() || cpumask_subset(mask, cpumask_of(cpu))' failed at arch/x86/smp.c:265
  (XEN) *** DOUBLE FAULT ***
  <many double faults>

The assertion failure is from a global (system wide) TLB flush initiated by
modify_xen_mappings().  I'm not entirely sure when this broke, and I'm not
sure exactly what causes the #DF's, but it doesn't really matter either
because they highlight a latent bug that I'd overlooked with the CET-SS vs
patching work in the first place.

While we're careful to arrange for the patching CPU to avoid encountering
non-shstk memory with transient shstk perms, other CPUs can pick these
mappings up too if they need to re-walk for uarch reasons.

Another bug is that for livepatching, we only disable CET if shadow stacks are
in use.  Running on Intel CET systems when Xen is only using CET-IBT will
crash in arch_livepatch_quiesce() when trying to clear CR0.WP with CR4.CET
still active.

Also, we never went and cleared the dirty bits on .rodata.  This would
matter (for the same reason it matters on .text - it becomes a valid target
for WRSS), but we never actually patch .rodata anyway.

Therefore rework how we do patching for both alternatives and livepatches.

Introduce modify_xen_mappings_lite() with a purpose similar to
modify_xen_mappings(), but stripped down to the bare minimum as it's used in
weird contexts.  Leave all complexity to the caller to handle.

Instead of patching by clearing CR0.WP (and having to jump through some
fragile hoops to disable CET in order to do this), just transiently relax the
permissions on .text via l2_identmap[].

Note that neither alternatives nor livepatching edit .rodata, so we don't need
to relax those permissions at this juncture.

The perms are relaxed globally, but this is safe enough.  Alternatives run
before we boot APs, and Livepatching runs in a quiesced state where the other
CPUs are not doing anything interesting.

This approach is far more robust.

Fixes: 48cdc15a424f ("x86/alternatives: Clear CR4.CET when clearing CR0.WP")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
master commit: 8676092a0f16ca6ad188d3fb270784a2caecf542
master date: 2023-04-18 20:20:26 +0100

2 years agox86/hvm: Disallow disabling paging in 64bit mode
Andrew Cooper [Mon, 24 Apr 2023 11:05:24 +0000 (13:05 +0200)]
x86/hvm: Disallow disabling paging in 64bit mode

The Long Mode consistency checks exist to "ensure that the processor does not
enter an undefined mode or state that results in unpredictable behavior".  APM
Vol2 Table 14-5 "Long-Mode Consistency Checks" lists them, but there is no row
preventing the OS from trying to exit Long mode while in 64bit mode.  This
could leave the CPU in Protected Mode with an %rip above the 4G boundary.

Experimentally, AMD CPUs really do permit this state transition.  An OS which
tries it hits an instant SHUTDOWN, even in cases where the truncation I expect
to be going on behind the scenes ought to result in sane continued execution.

Furthermore, right from the very outset, the APM Vol2 14.7 "Leaving Long Mode"
section instructs people to switch to a compatibility mode segment first
before clearing CR0.PG, which does clear out the upper bits in %rip.  This is
further backed up by Vol2 Figure 1-6 "Operating Modes of the AMD64
Architecture".

Either way, this appears to have been a genuine oversight in the AMD64 spec.

Intel, on the other hand, rejects this state transition with #GP.

Between revision 71 (Nov 2019) and 72 (May 2020) of SDM Vol3, a footnote to
4.1.2 "Paging-Mode Enable" was altered from

  If CR4.PCIDE= 1, an attempt to clear CR0.PG causes a general-protection
  exception (#GP); software should clear CR4.PCIDE before attempting to
  disable paging.

to

  If the logical processor is in 64-bit mode or if CR4.PCIDE= 1, an attempt to
  clear CR0.PG causes a general-protection exception (#GP). Software should
  transition to compatibility mode and clear CR4.PCIDE before attempting to
  disable paging.

which acknowledges this corner case, but there doesn't appear to be any other
discussion even in the relevant Long Mode sections.

So it appears that Intel spotted and addressed the corner case in IA-32e mode,
but were 15 years late to document it.

Xen was written to the AMD spec, and misses the check.  Follow the Intel
behaviour, because it is more sensible and avoids hitting a VMEntry failure.
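
The rule being followed can be sketched as below. This is only an illustration of the Intel behaviour described above, with a hypothetical helper name; it is not the check as written in Xen's CR0 handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR0_PG (UINT64_C(1) << 31)

/*
 * Hypothetical helper: should this CR0 write be refused with #GP?
 * Only the "clearing CR0.PG while in 64bit mode" case is modelled.
 */
static bool cr0_write_faults(uint64_t old_cr0, uint64_t new_cr0,
                             bool in_64bit_mode)
{
    bool clearing_pg = (old_cr0 & SKETCH_CR0_PG) &&
                       !(new_cr0 & SKETCH_CR0_PG);

    return clearing_pg && in_64bit_mode;
}
```

A guest which first switches to a compatibility mode segment is not in 64bit mode any more, so the same write is permitted.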

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
master commit: 18c128ba66e6308744850aca96dbffd18f91c29b
master date: 2023-04-14 18:18:20 +0100

2 years agox86emul: pull permission check ahead for REP INS/OUTS
Jan Beulich [Mon, 24 Apr 2023 11:03:59 +0000 (13:03 +0200)]
x86emul: pull permission check ahead for REP INS/OUTS

Based on observations on a fair range of hardware from both primary
vendors even zero-iteration-count instances of these insns perform the
port related permission checking first.

Fixes: fe300600464c ("x86: Fix emulation of REP prefix")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: f41c88a6fca59f99a2eb5e7ed3d90ab7bca08b1b
master date: 2023-03-30 13:07:16 +0200

2 years agotools/xenstore: fix quota check in transaction_fix_domains()
Juergen Gross [Mon, 24 Apr 2023 11:03:36 +0000 (13:03 +0200)]
tools/xenstore: fix quota check in transaction_fix_domains()

Today when finalizing a transaction the node quota is checked to not be
exceeded after the transaction. This check is always done, even if the
transaction is being performed by a privileged connection, or if there
were no nodes created in the transaction.

Correct that by checking quota only if:
- the transaction is being performed by an unprivileged guest, and
- at least one node was created in the transaction
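
The two conditions above amount to the following decision, sketched here with hypothetical parameter names rather than the xenstored data structures:

```c
#include <stdbool.h>

/*
 * Returns true if the transaction may be committed as far as the node
 * quota is concerned.  The quota is only consulted for unprivileged
 * callers which created at least one node.
 */
static bool transaction_within_quota(bool privileged,
                                     unsigned int nodes_created,
                                     unsigned int nodes_after,
                                     unsigned int quota)
{
    if ( privileged || nodes_created == 0 )
        return true;            /* no quota check needed */

    return nodes_after <= quota;
}
```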

Reported-by: Julien Grall <julien@xen.org>
Fixes: f2bebf72c4d5 ("xenstore: rework of transaction handling")
Signed-off-by: Juergen Gross <jgross@suse.com>
master commit: f6b801c36bd5e4ab22a9f80c8d57121b62b139af
master date: 2023-03-29 22:02:36 +0100

2 years agoCI: Remove llvm-8 from the Debian Stretch container
Andrew Cooper [Fri, 24 Mar 2023 17:59:56 +0000 (17:59 +0000)]
CI: Remove llvm-8 from the Debian Stretch container

For similar reasons to c/s a6b1e2b80fe20.  While this container is still
build-able for now, all the other problems with explicitly-versioned compilers
remain.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7a298375721636290a57f31bb0f7c2a5a38956a4)

2 years agoautomation: Remove non-debug x86_32 build jobs
Anthony PERARD [Fri, 24 Feb 2023 17:29:15 +0000 (17:29 +0000)]
automation: Remove non-debug x86_32 build jobs

In the interest of having fewer jobs, we remove the x86_32 build jobs
that do release builds. A debug build is very likely to be enough to find
32bit build issues.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit 7b66792ea7f77fb9e587e1e9c530a7c869eecba1)

2 years agoautomation: Remove CentOS 7.2 containers and builds
Anthony PERARD [Tue, 21 Feb 2023 16:55:36 +0000 (16:55 +0000)]
automation: Remove CentOS 7.2 containers and builds

We already have a container which tracks the latest CentOS 7, no need
for this one as well.

Also, 7.2 has outdated root certificates which prevent connections to
websites which use Let's Encrypt.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit ba512629f76dfddb39ea9133ee51cdd9e392a927)

2 years agoautomation: Switch arm32 cross builds to run on arm64
Michal Orzel [Tue, 14 Feb 2023 15:38:38 +0000 (16:38 +0100)]
automation: Switch arm32 cross builds to run on arm64

Due to the limited x86 CI resources slowing down the whole pipeline,
switch the arm32 cross builds to be executed on arm64 which is much more
capable. For that, rename the existing debian container dockerfile
from unstable-arm32-gcc to unstable-arm64v8-arm32-gcc and use
arm64v8/debian:unstable as an image. Note, that we cannot use the same
container name as we have to keep the backwards compatibility.
Take the opportunity to remove extra empty line at the end of a file.

Modify the tag of .arm32-cross-build-tmpl to arm64 and update the build
jobs accordingly.

Signed-off-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit a35fccc8df93de7154dba87db6e7bcf391e9d51c)

2 years agoCI: Drop automation/configs/
Andrew Cooper [Thu, 29 Dec 2022 15:39:13 +0000 (15:39 +0000)]
CI: Drop automation/configs/

Having 3 extra hypervisor builds on the end of a full build is deeply
confusing to debug if one of them fails, because the .config file presented in
the artefacts is not the one which caused a build failure.  Also, the log
tends to be truncated in the UI.

PV-only is tested as part of PV-Shim in a full build anyway, so doesn't need
repeating.  HVM-only and neither appear frequently in randconfig, so drop all
the logic here to simplify things.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
(cherry picked from commit 7b20009a812f26e74bdbde2ab96165376b3dad34)

2 years agobump default SeaBIOS version to 1.16.0
Jan Beulich [Fri, 6 May 2022 12:46:52 +0000 (14:46 +0200)]
bump default SeaBIOS version to 1.16.0

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 944e389daa133dd310d87c4eebacba9f6da76018)

2 years agons16550: correct name/value pair parsing for PCI port/bridge
Jan Beulich [Fri, 31 Mar 2023 06:42:02 +0000 (08:42 +0200)]
ns16550: correct name/value pair parsing for PCI port/bridge

First of all these were inverted: "bridge=" caused the port coordinates
to be established, while "port=" controlled the bridge coordinates. And
then the error messages being identical also wasn't helpful. While
correcting this also move both case blocks close together.

Fixes: 97fd49a7e074 ("ns16550: add support for UART parameters to be specifed with name-value pairs")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: e692b22230b411d762ac9e278a398e28df474eae
master date: 2023-03-29 14:55:37 +0200

2 years agovpci/msix: handle accesses adjacent to the MSI-X table
Roger Pau Monné [Fri, 31 Mar 2023 06:41:27 +0000 (08:41 +0200)]
vpci/msix: handle accesses adjacent to the MSI-X table

The handling of the MSI-X table accesses by Xen requires that any
pages part of the MSI-X related tables are not mapped into the domain
physmap.  As a result, any device registers in the same pages as the
start or the end of the MSIX or PBA tables are not currently
accessible, as the accesses are just dropped.

Note the spec forbids such placing of registers, as the MSIX and PBA
tables must be 4K isolated from any other registers:

"If a Base Address register that maps address space for the MSI-X
Table or MSI-X PBA also maps other usable address space that is not
associated with MSI-X structures, locations (e.g., for CSRs) used in
the other address space must not share any naturally aligned 4-KB
address range with one where either MSI-X structure resides."

Yet the 'Intel Wi-Fi 6 AX201' device on one of my boxes has registers
in the same page as the MSIX tables, and thus won't work on a PVH dom0
without this fix.

In order to cope with this behavior, pass through any accesses that fall
on the same page as the MSIX tables (but don't fall in between) to the
underlying hardware.  Such forwarding also takes care of the PBA
accesses, so it allows removing the code doing this handling in
msix_{read,write}.  Note that as a result accesses to the PBA array
are no longer limited to 4 and 8 byte sizes; there's no access size
restriction for PBA accesses documented in the specification.
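
A rough sketch of the "same page, but not inside the table" test, assuming 4K pages and a table described by a start address and a size in bytes. The names are hypothetical and the real vPCI code is structured differently.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)

/*
 * True if addr lies on a page shared with the table, but outside the
 * table itself.  Such accesses would be forwarded to the hardware.
 */
static bool adjacent_to_table(uint64_t addr, uint64_t start, uint64_t size)
{
    uint64_t end = start + size;                       /* exclusive */
    uint64_t first = start & ~(SKETCH_PAGE_SIZE - 1);  /* page of start */
    uint64_t last = (end + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
    bool in_pages = addr >= first && addr < last;
    bool in_table = addr >= start && addr < end;

    return in_pages && !in_table;
}
```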

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
vpci/msix: restore PBA access length and alignment restrictions

Accesses to the PBA array have the same length and alignment
limitations as accesses to the MSI-X table:

"For all accesses to MSI-X Table and MSI-X PBA fields, software must
use aligned full DWORD or aligned full QWORD transactions; otherwise,
the result is undefined."

Introduce such length and alignment checks into the handling of PBA
accesses for vPCI.  This was a mistake of mine for not reading the
specification correctly.

Note that accesses must now be aligned, and hence there's no longer a
need to check that the end of the access falls into the PBA region as
both the access and the region addresses must be aligned.
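
The length and alignment rule quoted from the specification can be sketched as a single predicate (hypothetical name, not the vPCI helper itself):

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept only aligned full DWORD (4 byte) or QWORD (8 byte) accesses. */
static bool pba_access_ok(uint64_t addr, unsigned int len)
{
    if ( len != 4 && len != 8 )
        return false;

    /* len is a power of two here, so len - 1 is the alignment mask. */
    return (addr & (len - 1)) == 0;
}
```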

Fixes: b177892d2d ('vpci/msix: handle accesses adjacent to the MSI-X table')
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b177892d2d0e8a31122c218989f43130aeba5282
master date: 2023-03-28 14:20:35 +0200
master commit: 7a502b4fbc339e9d3d3d45fb37f09da06bc3081c
master date: 2023-03-29 14:56:33 +0200

2 years agox86/ucode: Fix error paths control_thread_fn()
Andrew Cooper [Fri, 31 Mar 2023 06:40:56 +0000 (08:40 +0200)]
x86/ucode: Fix error paths control_thread_fn()

These two early exits skipped re-enabling the watchdog, restoring the NMI
callback, and clearing the nmi_patch global pointer.  Always execute the tail
of the function on the way out.
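
The general shape of such a fix is the usual single-exit pattern, sketched here with a hypothetical global and function rather than the real control_thread_fn():

```c
#include <stdbool.h>

static bool watchdog_enabled = true;    /* hypothetical global state */

/* failing_step selects which early error (if any) to simulate. */
static int do_update(int failing_step)
{
    int rc = 0;

    watchdog_enabled = false;           /* setup done on entry */

    if ( failing_step == 1 )
    {
        rc = -1;
        goto out;                       /* previously an early return */
    }

    if ( failing_step == 2 )
    {
        rc = -2;
        goto out;
    }

 out:
    watchdog_enabled = true;            /* tail now runs on every path */
    return rc;
}
```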

Fixes: 8dd4dfa92d62 ("x86/microcode: Synchronize late microcode loading")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: fc2e1f3aad602a66c14b8285a1bd38a82f8fd02d
master date: 2023-03-28 11:57:56 +0100

2 years agox86/vmx: Don't spuriously crash the domain when INIT is received
Andrew Cooper [Fri, 31 Mar 2023 06:40:27 +0000 (08:40 +0200)]
x86/vmx: Don't spuriously crash the domain when INIT is received

In VMX operation, the handling of INIT IPIs is changed.  Instead of the CPU
resetting, the next VMEntry fails with EXIT_REASON_INIT.  From the TXT spec,
the intent of this behaviour is so that an entity which cares can scrub
secrets from RAM before participating in an orderly shutdown.

Right now, Xen's behaviour is that when an INIT arrives, the HVM VM which
schedules next is killed (citing an unknown VMExit), *and* we ignore the INIT
and continue blindly onwards anyway.

This patch addresses only the first of these two problems by ignoring the INIT
and continuing without crashing the VM in question.

The second wants addressing too, just as soon as we've figured out something
better to do...

Discovered as collateral damage from when an AP triple faults on S3 resume on
Intel TigerLake platforms.

Link: https://github.com/QubesOS/qubes-issues/issues/7283
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: b1f11273d5a774cc88a3685c96c2e7cf6385e3b6
master date: 2023-03-24 22:49:58 +0000

2 years agox86/shadow: Fix build with no PG_log_dirty
Andrew Cooper [Fri, 31 Mar 2023 06:39:49 +0000 (08:39 +0200)]
x86/shadow: Fix build with no PG_log_dirty

Gitlab Randconfig found:

  arch/x86/mm/shadow/common.c: In function 'shadow_prealloc':
  arch/x86/mm/shadow/common.c:1023:18: error: implicit declaration of function
      'paging_logdirty_levels'; did you mean 'paging_log_dirty_init'? [-Werror=implicit-function-declaration]
   1023 |         count += paging_logdirty_levels();
        |                  ^~~~~~~~~~~~~~~~~~~~~~
        |                  paging_log_dirty_init
  arch/x86/mm/shadow/common.c:1023:18: error: nested extern declaration of 'paging_logdirty_levels' [-Werror=nested-externs]

The '#if PG_log_dirty' expression is currently SHADOW_PAGING && !HVM &&
PV_SHIM_EXCLUSIVE.  Move the declaration outside.

Fixes: 33fb3a661223 ("x86/shadow: account for log-dirty mode when pre-allocating")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6d14cb105b1c54ad7b4228d858ae85aa8a672bbd
master date: 2023-03-24 12:16:31 +0000

2 years agox86/nospec: Fix evaluate_nospec() code generation under Clang
Andrew Cooper [Fri, 31 Mar 2023 06:39:32 +0000 (08:39 +0200)]
x86/nospec: Fix evaluate_nospec() code generation under Clang

It turns out that evaluate_nospec() code generation is not safe under Clang.
Given:

  void eval_nospec_test(int x)
  {
      if ( evaluate_nospec(x) )
          asm volatile ("nop #true" ::: "memory");
      else
          asm volatile ("nop #false" ::: "memory");
  }

Clang emits:

  <eval_nospec_test>:
         0f ae e8                lfence
         85 ff                   test   %edi,%edi
         74 02                   je     <eval_nospec_test+0x9>
         90                      nop
         c3                      ret
         90                      nop
         c3                      ret

which is not safe because the lfence has been hoisted above the conditional
jump.  Clang concludes that both barrier_nospec_true()'s have identical side
effects and can safely be merged.

Clang can be persuaded that the side effects are different if there are
different comments in the asm blocks.  This is fragile, but no more fragile
than other aspects of this construct.

Introduce barrier_nospec_false() with a separate internal comment to prevent
Clang merging it with barrier_nospec_true() despite the otherwise-identical
content.  The generated code now becomes:

  <eval_nospec_test>:
         85 ff                   test   %edi,%edi
         74 05                   je     <eval_nospec_test+0x9>
         0f ae e8                lfence
         90                      nop
         c3                      ret
         0f ae e8                lfence
         90                      nop
         c3                      ret

which has the correct number of lfence's, and in the correct place.

Link: https://github.com/llvm/llvm-project/issues/55084
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: bc3c133841435829ba5c0a48427e2a77633502ab
master date: 2023-03-24 12:16:31 +0000

2 years agox86/shadow: fix and improve sh_page_has_multiple_shadows()
Jan Beulich [Fri, 31 Mar 2023 06:38:42 +0000 (08:38 +0200)]
x86/shadow: fix and improve sh_page_has_multiple_shadows()

While no caller currently invokes the function without first making sure
there is at least one shadow [1], we'd better eliminate UB here:
find_first_set_bit() requires input to be non-zero to return a well-
defined result.

Further, using find_first_set_bit() isn't very efficient in the first
place for the intended purpose.
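
One well-known way to test for "more than one bit set" without any bit scan is shown below. It is given as an illustration of the idea and is not claimed to be the exact expression used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * mask & (mask - 1) clears the lowest set bit, so a non-zero result
 * means at least two bits were set.  Well-defined for mask == 0 too,
 * since unsigned arithmetic wraps.
 */
static bool has_multiple_bits(uint32_t mask)
{
    return (mask & (mask - 1)) != 0;
}
```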

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[1] The function has exactly two uses, and both are from OOS code, which
    is HVM-only. For HVM (but not for PV) sh_mfn_is_a_page_table(),
    guarding the call to sh_unsync(), guarantees at least one shadow.
    Hence even if sh_page_has_multiple_shadows() returned a bogus value
    when invoked for a PV domain, the subsequent is_hvm_vcpu() and
    oos_active checks (the former being redundant with the latter) will
    compensate. (Arguably that oos_active check should come first, for
    both clarity and efficiency reasons.)
master commit: 2896224a4e294652c33f487b603d20bd30955f21
master date: 2023-03-24 11:07:08 +0100

2 years agoVT-d: fix iommu=no-igfx if the IOMMU scope contains fake device(s)
Marek Marczykowski-Górecki [Fri, 31 Mar 2023 06:38:07 +0000 (08:38 +0200)]
VT-d: fix iommu=no-igfx if the IOMMU scope contains fake device(s)

If the scope for IGD's IOMMU contains an additional device that doesn't
actually exist, iommu=no-igfx would not disable that IOMMU. In this
particular case (Thinkpad x230) it included 00:02.1, but there is no
such device on this platform. Consider only existing devices for the
"gfx only" check as well as the establishing of IGD DRHD address
(underlying is_igd_drhd(), which is used to determine applicability of
two workarounds).

Fixes: 2d7f191b392e ("VT-d: generalize and correct "iommu=no-igfx" handling")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: 49de6749baa8d0addc3048defd4ef3e85cb135e9
master date: 2023-03-23 09:16:41 +0100

2 years agoAMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
Jan Beulich [Fri, 31 Mar 2023 06:36:59 +0000 (08:36 +0200)]
AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

An earlier change with the same title (commit 1ba66a870eba) altered only
the path where x2apic_phys was already set to false (perhaps from the
command line). The same of course needs applying when the variable
wasn't modified yet from its initial value.

Reported-by: Elliott Mitchell <ehem+xen@m5p.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 0d2686f6b66b4b1b3c72c3525083b0ce02830054
master date: 2023-03-21 09:23:25 +0100

2 years agolibacpi: fix PCI hotplug AML
David Woodhouse [Tue, 21 Mar 2023 12:53:25 +0000 (13:53 +0100)]
libacpi: fix PCI hotplug AML

The emulated PIIX3 uses a nybble for the status of each PCI function,
so the status for e.g. slot 0 functions 0 and 1 respectively can be
read as (\_GPE.PH00 & 0x0F), and (\_GPE.PH00 >> 0x04).

The AML that Xen gives to a guest gets the operand order for the odd-
numbered functions the wrong way round, returning (0x04 >> \_GPE.PH00)
instead.

As far as I can tell, this was the wrong way round in Xen from the
moment that PCI hotplug was first introduced in commit 83d82e6f35a8:

+                    ShiftRight (0x4, \_GPE.PH00, Local1)
+                    Return (Local1) /* IN status as the _STA */

Or maybe there's bizarre AML operand ordering going on there, like
Intel's wrong-way-round assembler, and it only broke later when it was
changed to being generated?

Either way, it's definitely wrong now, and instrumenting a Linux guest
shows that it correctly sees _STA being 0x00 in function 0 of an empty
slot, but then the loop in acpiphp_glue.c::get_slot_status() goes on to
look at function 1 and sees that _STA evaluates to 0x04. Thus reporting
an adapter is present in every slot in /sys/bus/pci/slots/*
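
The effect of the swapped operands can be reproduced in C. The function names are hypothetical, and the explicit guard only exists because over-wide shifts are undefined in C.

```c
#include <stdint.h>

/* Intended: status of the odd function is the high nybble. */
static uint8_t sta_func1_fixed(uint8_t ph)
{
    return ph >> 4;
}

/* As generated: the constant 4 shifted right by the register value. */
static uint8_t sta_func1_buggy(uint8_t ph)
{
    return ph >= 32 ? 0 : (uint8_t)(4u >> ph);
}
```

With an empty slot (register value 0x00) the buggy form evaluates to 0x04, matching what the instrumented Linux guest observed.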

Quite why Linux wants to look for function 1 being physically present
when function 0 isn't... I don't want to think about right now.

Fixes: 83d82e6f35a8 ("hvmloader: pass-through: multi-function PCI hot-plug")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b190af7d3e90f58da5f58044b8dea7261b8b483d
master date: 2023-03-20 17:12:34 +0100

2 years agobunzip: work around gcc13 warning
Jan Beulich [Tue, 21 Mar 2023 12:52:58 +0000 (13:52 +0100)]
bunzip: work around gcc13 warning

While provable that length[0] is always initialized (because symCount
cannot be zero), upcoming gcc13 fails to recognize this and warns about
the unconditional use of the value immediately following the loop.

See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511.

Reported-by: Martin Liška <martin.liska@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 402195e56de0aacf97e05c80ed367d464ca6938b
master date: 2023-03-14 10:45:28 +0100

2 years agoVT-d: constrain IGD check
Jan Beulich [Tue, 21 Mar 2023 12:52:20 +0000 (13:52 +0100)]
VT-d: constrain IGD check

Marking a DRHD as controlling an IGD isn't very sensible without
checking that at the very least it's a graphics device that lives at
0000:00:02.0. Re-use the reading of the class-code to control both the
clearing of "gfx_only" and the setting of "igd_drhd_address".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
master commit: f8c4317295fa1cde1a81779b7e362651c084efb8
master date: 2023-03-14 10:44:08 +0100

2 years agox86/altp2m: help gcc13 to avoid it emitting a warning
Jan Beulich [Tue, 21 Mar 2023 12:51:42 +0000 (13:51 +0100)]
x86/altp2m: help gcc13 to avoid it emitting a warning

Switches of altp2m-s always expect a valid altp2m to be in place (and
indeed altp2m_vcpu_initialise() sets the active one to be at index 0).
The compiler, however, cannot know that, and hence it cannot eliminate
p2m_get_altp2m()'s case of returning (literal) NULL. If then the compiler
decides to special case that code path in the caller, the dereference in
instances of

    atomic_dec(&p2m_get_altp2m(v)->active_vcpus);

can, to the code generator, appear to be NULL dereferences, leading to

In function 'atomic_dec',
    inlined from '...' at ...:
./arch/x86/include/asm/atomic.h:182:5: error: array subscript 0 is outside array bounds of 'int[0]' [-Werror=array-bounds=]

Aid the compiler by adding a BUG_ON() checking the return value of the
problematic p2m_get_altp2m(). Since with the use of the local variable
the 2nd p2m_get_altp2m() each will look questionable at first glance
(Why is the local variable not used here?), open-code the only relevant
piece of p2m_get_altp2m() there.

To avoid repeatedly doing these transformations, and also to limit how
"bad" the open-coding really is, convert the entire operation to an
inline helper, used by all three instances (and accepting the redundant
BUG_ON(idx >= MAX_ALTP2M) in two of the three cases).

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: be62b1fc2aa7375d553603fca07299da765a89fe
master date: 2023-03-13 15:16:21 +0100

2 years agocore-parking: fix build with gcc12 and NR_CPUS=1
Jan Beulich [Tue, 21 Mar 2023 12:50:18 +0000 (13:50 +0100)]
core-parking: fix build with gcc12 and NR_CPUS=1

Gcc12 takes issue with core_parking_remove()'s

    for ( ; i < cur_idle_nums; ++i )
        core_parking_cpunum[i] = core_parking_cpunum[i + 1];

complaining that the right hand side array access is past the bounds of
1. Clearly the compiler can't know that cur_idle_nums can only ever be
zero in this case (as the sole CPU cannot be parked).

Arrange for core_parking.c's contents to not be needed altogether, and
then disable its building when NR_CPUS == 1.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 4b0422f70feb4b1cd04598ffde805fc224f3812e
master date: 2023-03-13 15:15:42 +0100

2 years agotools/xenmon: Fix xenmon.py for with python3.x
Bernhard Kaindl [Tue, 21 Mar 2023 12:49:47 +0000 (13:49 +0100)]
tools/xenmon: Fix xenmon.py for with python3.x

Fixes for Py3:
* class Delayed(): file not defined; also an error for pylint -E.  Inherit
  object instead for Py2 compatibility.  Fix DomainInfo() too.
* Inconsistent use of tabs and spaces for indentation (in one block)

Signed-off-by: Bernhard Kaindl <bernhard.kaindl@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 3a59443c1d5ae0677a792c660ccd3796ce036732
master date: 2023-02-06 10:22:12 +0000

2 years agotools/python: change 's#' size type for Python >= 3.10
Marek Marczykowski-Górecki [Tue, 21 Mar 2023 12:49:28 +0000 (13:49 +0100)]
tools/python: change 's#' size type for Python >= 3.10

Python < 3.10 by default uses 'int' type for data+size string types
(s#), unless PY_SSIZE_T_CLEAN is defined - in which case it uses
Py_ssize_t. The former behavior was removed in Python 3.10 and now it's
required to define PY_SSIZE_T_CLEAN before including Python.h, and using
Py_ssize_t for the length argument. The PY_SSIZE_T_CLEAN behavior is
supported since Python 2.5.

Adjust bindings accordingly.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 897257ba49d0a6ddcf084960fd792ccce9c40f94
master date: 2023-02-06 08:50:13 +0100

2 years agox86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path
Andrew Cooper [Fri, 10 Feb 2023 21:11:14 +0000 (21:11 +0000)]
x86/spec-ctrl: Defer CR4_PV32_RESTORE on the cstar_enter path

As stated (correctly) by the comment next to SPEC_CTRL_ENTRY_FROM_PV, between
the two hunks visible in the patch, RET's are not safe prior to this point.

CR4_PV32_RESTORE hides a CALL/RET pair in certain configurations (PV32
compiled in, SMEP or SMAP active), and the RET can be attacked with one of
several known speculative issues.

Furthermore, CR4_PV32_RESTORE also hides a reference to the cr4_pv32_mask
global variable, which is not safe when XPTI is active before restoring Xen's
full pagetables.

This crash has gone unnoticed because it is only AMD CPUs which permit the
SYSCALL instruction in compatibility mode, and these are not vulnerable to
Meltdown so don't activate XPTI by default.

This is XSA-429 / CVE-2022-42331

Fixes: 5e7962901131 ("x86/entry: Organise the use of MSR_SPEC_CTRL at each entry/exit point")
Fixes: 5784de3e2067 ("x86: Meltdown band-aid against malicious 64-bit PV guests")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit df5b055b12116d9e63ced59ae5389e69a2a3de48)

2 years agox86/HVM: serialize pinned cache attribute list manipulation
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: serialize pinned cache attribute list manipulation

While the RCU variants of list insertion and removal allow lockless list
traversal (with RCU just read-locked), insertions and removals still
need serializing amongst themselves. To keep things simple, use the
domain lock for this purpose.

This is CVE-2022-42334 / part of XSA-428.

Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
(cherry picked from commit 829ec245cf66560e3b50d140ccb3168e7fb7c945)

2 years agox86/HVM: bound number of pinned cache attribute regions
Jan Beulich [Tue, 21 Mar 2023 12:01:01 +0000 (12:01 +0000)]
x86/HVM: bound number of pinned cache attribute regions

This is exposed via DMOP, i.e. to potentially not fully privileged
device models. With that we may not permit registration of an (almost)
unbounded amount of such regions.

This is CVE-2022-42333 / part of XSA-428.

Fixes: 642123c5123f ("x86/hvm: provide XEN_DMOP_pin_memory_cacheattr")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit a5e768640f786b681063f4e08af45d0c4e91debf)

2 years agox86/shadow: account for log-dirty mode when pre-allocating
Jan Beulich [Tue, 21 Mar 2023 11:59:44 +0000 (11:59 +0000)]
x86/shadow: account for log-dirty mode when pre-allocating

Pre-allocation is intended to ensure that in the course of constructing
or updating shadows there won't be any risk that just made shadows or
shadows being acted upon can disappear under our feet. The amount of
pages pre-allocated then, however, needs to account for all possible
subsequent allocations. While the use in sh_page_fault() accounts for
all shadows which may need making, so far it didn't account for
allocations coming from log-dirty tracking (which piggybacks onto the
P2M allocation functions).

Since shadow_prealloc() takes a count of shadows (or other data
structures) rather than a count of pages, putting the adjustment at the
call site of this function won't work very well: We simply can't express
the correct count that way in all cases. Instead take care of this in
the function itself, by "snooping" for L1 type requests. (While not
applicable right now, future new request sites of L1 tables would then
also be covered right away.)

It is relevant to note here that pre-allocations like the one done from
shadow_alloc_p2m_page() are benign when they fall in the "scope" of an
earlier pre-alloc which already included that count: The inner call will
simply find enough pages available then; it'll bail right away.

This is CVE-2022-42332 / XSA-427.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
(cherry picked from commit 91767a71061035ae42be93de495cd976f863a41a)

2 years agox86/ucode/AMD: late load the patch on every logical thread
Sergey Dyasli [Fri, 3 Mar 2023 07:17:40 +0000 (08:17 +0100)]
x86/ucode/AMD: late load the patch on every logical thread

Currently late ucode loading is performed only on the first core of CPU
siblings.  But according to the latest recommendation from AMD, late
ucode loading should happen on every logical thread/core on AMD CPUs.

To achieve that, introduce is_cpu_primary() helper which will consider
every logical cpu as "primary" when running on AMD CPUs.  Also include
Hygon in the check for future-proofing.
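
The decision described above might be sketched as follows.  This is an
illustration under assumptions, not the helper added by this patch: the
vendor enum and the explicit first_sibling parameter are hypothetical,
standing in for whatever vendor and sibling-mask lookups the real code
uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vendor identifiers, for illustration only. */
enum demo_vendor { DEMO_VENDOR_INTEL, DEMO_VENDOR_AMD, DEMO_VENDOR_HYGON };

/*
 * Decide whether a logical CPU should load microcode itself.
 * first_sibling is the lowest-numbered CPU among cpu's siblings,
 * so it can never be greater than cpu.
 */
static bool demo_is_cpu_primary(enum demo_vendor vendor, unsigned int cpu,
                                unsigned int first_sibling)
{
    assert(first_sibling <= cpu);

    /* On AMD and Hygon, every logical CPU is treated as primary. */
    if ( vendor == DEMO_VENDOR_AMD || vendor == DEMO_VENDOR_HYGON )
        return true;

    /* Otherwise only the first sibling of each core is primary. */
    return cpu == first_sibling;
}
```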

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f1315e48a03a42f78f9b03c0a384165baf02acae
master date: 2023-02-28 14:51:28 +0100

2 years agolibs/guest: Fix leak on realloc failure in backup_ptes()
Edwin Török [Fri, 3 Mar 2023 07:17:23 +0000 (08:17 +0100)]
libs/guest: Fix leak on realloc failure in backup_ptes()

From `man 2 realloc`:

  If realloc() fails, the original block is left untouched; it is not freed or moved.

Found using GCC -fanalyzer:

  |  184 |         backup->entries = realloc(backup->entries,
  |      |         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  |      |         |               | |
  |      |         |               | (91) when ‘realloc’ fails
  |      |         |               (92) ‘old_ptes.entries’ leaks here; was allocated at (44)
  |      |         (90) ...to here
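
The usual way to avoid this class of leak is to keep the old pointer until
realloc() is known to have succeeded.  A minimal sketch, assuming a
hypothetical demo_backup structure and demo_grow() helper rather than the
actual backup_ptes() code:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure holding the growable array. */
struct demo_backup {
    unsigned long *entries;
    size_t max;
};

/*
 * Grow the array to hold new_max entries.  Returns 0 on success, -1 on
 * failure.  On failure b->entries is left untouched and is still owned
 * by the caller, so it can be freed on the error path.
 */
static int demo_grow(struct demo_backup *b, size_t new_max)
{
    unsigned long *tmp;

    assert(b != NULL && new_max > 0);

    /* Refuse sizes whose byte count would overflow size_t. */
    if ( new_max > SIZE_MAX / sizeof(*tmp) )
        return -1;

    /* Assign to a temporary so the original pointer is not lost. */
    tmp = realloc(b->entries, new_max * sizeof(*tmp));
    if ( tmp == NULL )
        return -1;

    b->entries = tmp;
    b->max = new_max;
    return 0;
}
```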

Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 275d13184cfa52ebe4336ed66526ce93716adbe0
master date: 2023-02-27 15:51:23 +0000

2 years agolibs/guest: Fix resource leaks in xc_core_arch_map_p2m_tree_rw()
Andrew Cooper [Fri, 3 Mar 2023 07:17:04 +0000 (08:17 +0100)]
libs/guest: Fix resource leaks in xc_core_arch_map_p2m_tree_rw()

Edwin, with the help of GCC's -fanalyzer, identified that p2m_frame_list_list
gets leaked.  What fanalyzer can't see is that the live_p2m_frame_list_list
and live_p2m_frame_list foreign mappings are leaked too.

Rework the logic so the out path is executed unconditionally, which cleans up
all the intermediate allocations/mappings appropriately.

Fixes: bd7a29c3d0b9 ("tools/libs/ctrl: fix xc_core_arch_map_p2m() to support linear p2m table")
Reported-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 1868d7f22660c8980bd0a7e53f044467e8b63bb5
master date: 2023-02-27 15:51:23 +0000

2 years agotools: Use PKG_CONFIG_FILE instead of PKG_CONFIG variable
Bertrand Marquis [Fri, 3 Mar 2023 07:16:45 +0000 (08:16 +0100)]
tools: Use PKG_CONFIG_FILE instead of PKG_CONFIG variable

Replace PKG_CONFIG variable name with PKG_CONFIG_FILE for the name of
the pkg-config file.
This is preventing a conflict in some build systems where PKG_CONFIG
actually contains the path to the pkg-config executable to use, as the
default assignment in libs.mk is using a weak assignment (?=).

This problem has been found when trying to build the latest version of
Xen tools using buildroot.

Fixes: d400dc5729e4 ("tools: tweak tools/libs/libs.mk for being able to support libxenctrl")
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: b97e2fe7b9e1f4706693552697239ac2b71efee4
master date: 2023-02-24 17:44:29 +0000

2 years agoxen: Fix Clang -Wunicode diagnostic when building asm-macros
Andrew Cooper [Fri, 3 Mar 2023 07:15:50 +0000 (08:15 +0100)]
xen: Fix Clang -Wunicode diagnostic when building asm-macros

While trying to work around a different Clang-IAS bug (parent changeset), I
stumbled onto:

  In file included from arch/x86/asm-macros.c:3:
  ./arch/x86/include/asm/spec_ctrl_asm.h:144:19: error: \u used with
  no following hex digits; treating as '\' followed by identifier [-Werror,-Wunicode]
  .L\@_fill_rsb_loop\uniq:
                    ^

It turns out that Clang -E is sensitive to the file extension of the source
file it is processing.  Furthermore, C explicitly permits the use of \u
escapes in identifier names, so the diagnostic would be reasonable in
principle if we were trying to compile the result.

asm-macros should really have been .S from the outset, as it is ultimately
generating assembly, not C.  Rename it, which causes Clang not to complain.

We need to introduce rules for generating a .i file from .S, and substituting
c_flags for a_flags lets us drop the now-redundant -D__ASSEMBLY__.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 53f0d02040b1df08f0589f162790ca376e1c2040
master date: 2023-02-24 17:44:29 +0000

2 years agoxen: Work around Clang-IAS macro \@ expansion bug
Andrew Cooper [Fri, 3 Mar 2023 07:14:57 +0000 (08:14 +0100)]
xen: Work around Clang-IAS macro \@ expansion bug

https://github.com/llvm/llvm-project/issues/60792

It turns out that Clang-IAS does not expand \@ uniquely in a translation
unit, and the XSA-426 change tickles this bug:

  <instantiation>:4:1: error: invalid symbol redefinition
  .L1_fill_rsb_loop:
  ^
  make[3]: *** [Rules.mk:247: arch/x86/acpi/cpu_idle.o] Error 1

Extend DO_OVERWRITE_RSB with an optional parameter so C callers can mix %= in
too, which Clang does seem to expand properly.

Fixes: 63305e5392ec ("x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: a2adacff0b91cc7b977abb209dc419a2ef15963f
master date: 2023-02-24 17:44:29 +0000

2 years agox86: perform mem_sharing teardown before paging teardown
Tamas K Lengyel [Fri, 3 Mar 2023 07:14:25 +0000 (08:14 +0100)]
x86: perform mem_sharing teardown before paging teardown

An assert failure has been observed in p2m_teardown when performing vm
forking and then destroying the forked VM (p2m-basic.c:173). The assert
checks whether the domain's shared pages counter is 0. According to the
patch that originally added the assert (7bedbbb5c31) the p2m_teardown
should only happen after mem_sharing already relinquished all shared pages.

In this patch we flip the order in which relinquish ops are called to avoid
tripping the assert. Conceptually, it makes sense for sharing to be torn
down before paging is torn down.

Fixes: e7aa55c0aab3 ("x86/p2m: free the paging memory pool preemptively")
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 2869349f0cb3a89dcbf1f1b30371f58df6309312
master date: 2023-02-23 12:35:48 +0100

2 years agox86/ucode/AMD: apply the patch early on every logical thread
Sergey Dyasli [Fri, 3 Mar 2023 07:14:01 +0000 (08:14 +0100)]
x86/ucode/AMD: apply the patch early on every logical thread

The original issue has been reported on AMD Bulldozer-based CPUs where
ucode loading loses the LWP feature bit in order to gain the IBPB bit.
LWP disabling is per-SMT/CMT core modification and needs to happen on
each sibling thread despite the shared microcode engine. Otherwise,
logical CPUs will end up with different cpuid capabilities.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216211
Guests running under Xen happen to be not affected because of levelling
logic for the feature masking/override MSRs which causes the LWP bit to
fall out and hides the issue. The latest recommendation from AMD, after
discussing this bug, is to load ucode on every logical CPU.

In the Linux kernel this issue has been addressed by e7ad18d1169c
("x86/microcode/AMD: Apply the patch early on every logical thread").
Follow the same approach in Xen.

Introduce SAME_UCODE match result and use it for early AMD ucode
loading. Take this opportunity and move opt_ucode_allow_same out of
compare_revisions() to the relevant callers and also modify the warning
message based on it. Intel's side of things is modified for consistency
but provides no functional change.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f4ef8a41b80831db2136bdaff9f946a1a4b051e7
master date: 2023-02-21 15:08:05 +0100

2 years agocredit2: respect credit2_runqueue=all when arranging runqueues
Marek Marczykowski-Górecki [Fri, 3 Mar 2023 07:13:20 +0000 (08:13 +0100)]
credit2: respect credit2_runqueue=all when arranging runqueues

Documentation for credit2_runqueue=all says it should create one queue
for all pCPUs on the host. But since the introduction of
sched_credit2_max_cpus_runqueue, it has actually created a separate
runqueue per socket, even if the CPU count is below
sched_credit2_max_cpus_runqueue.

Adjust the condition to skip the sibling check in case of
credit2_runqueue=all.

Fixes: 8e2aa76dc167 ("xen: credit2: limit the max number of CPUs in a runqueue")
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
master commit: 1f5747ee929fbbcae58d7234c6c38a77495d0cfe
master date: 2023-02-15 16:12:42 +0100

2 years agox86/shskt: Disable CET-SS on parts susceptible to fractured updates
Andrew Cooper [Fri, 3 Mar 2023 07:12:24 +0000 (08:12 +0100)]
x86/shskt: Disable CET-SS on parts susceptible to fractured updates

Refer to Intel SDM Rev 70 (Dec 2022), Vol3 17.2.3 "Supervisor Shadow Stack
Token".

Architecturally, an event delivery which starts in CPL<3 and switches shadow
stack will first validate the Supervisor Shadow Stack Token (setting the busy
bit), then pushes CS/LIP/SSP.  One example of this is an NMI interrupting Xen.

Some CPUs suffer from an issue called fracturing, whereby a fault/vmexit/etc
between setting the busy bit and completing the event injection renders the
action non-restartable, because when it comes time to restart, the busy bit is
found to be already set.

This is far more easily encountered under virt, yet it is not the fault of the
hypervisor, nor the fault of the guest kernel.  The fault lies somewhere
between the architectural specification, and the uarch behaviour.

Intel have allocated CPUID.7[1].ecx[18] CET_SSS to enumerate that supervisor
shadow stacks are safe to use.  Because of how Xen lays out its shadow stacks,
fracturing is not expected to be a problem on native.

Detect this case on boot and default to not using shstk if virtualised.
Specifying `cet=shstk` on the command line will override this heuristic and
enable shadow stacks irrespective.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 01e7477d1b081cff4288ff9f51ec59ee94c03ee0
master date: 2023-02-09 18:26:17 +0000

2 years agox86/cpuid: Infrastructure for leaves 7:1{ecx,edx}
Andrew Cooper [Fri, 3 Mar 2023 07:06:44 +0000 (08:06 +0100)]
x86/cpuid: Infrastructure for leaves 7:1{ecx,edx}

We don't actually need ecx yet, but adding it in now will reduce the amount to
which leaf 7 is out of order in a featureset.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: b4a23bf6293aadecfd03bf9e83974443e2eac9cb
master date: 2023-02-09 18:26:17 +0000

2 years agolibs/util: Fix parallel build between flex/bison and CC rules
Anthony PERARD [Fri, 3 Mar 2023 07:06:23 +0000 (08:06 +0100)]
libs/util: Fix parallel build between flex/bison and CC rules

flex/bison generate two targets, and when those targets are
prerequisites of other rules they are considered independently by make.

We can have a situation where the .c file is out-of-date but not the
.h, after a git checkout for example. In this case, if a rule only has
the .h file as a prerequisite, make will proceed and start to build the
object. In parallel, another target can have the .c file as a
prerequisite, and make will find out it needs re-generating and do so,
changing the .h at the same time. This parallel task breaks the first one.

To avoid this scenario, we put both the header and the source as
prerequisites for all objects, even if they only need the header.

Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: bf652a50fb3bb3b1b3d93db6fb79bc28f978fe75
master date: 2023-02-09 18:26:17 +0000

2 years agoautomation: Remove clang-8 from Debian unstable container
Anthony PERARD [Tue, 21 Feb 2023 16:55:38 +0000 (16:55 +0000)]
automation: Remove clang-8 from Debian unstable container

First, apt complains that it isn't the right way to add keys anymore,
but hopefully that's just a warning.

Second, we can't install clang-8:
The following packages have unmet dependencies:
 clang-8 : Depends: libstdc++-8-dev but it is not installable
           Depends: libgcc-8-dev but it is not installable
           Depends: libobjc-8-dev but it is not installable
           Recommends: llvm-8-dev but it is not going to be installed
           Recommends: libomp-8-dev but it is not going to be installed
 libllvm8 : Depends: libffi7 (>= 3.3~20180313) but it is not installable
E: Unable to correct problems, you have held broken packages.

clang on Debian unstable is now version 14.0.6.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
(cherry picked from commit a6b1e2b80fe2053b1c9c9843fb086a668513ea36)

2 years agox86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions
Andrew Cooper [Thu, 8 Sep 2022 20:27:58 +0000 (21:27 +0100)]
x86/spec-ctrl: Mitigate Cross-Thread Return Address Predictions

This is XSA-426 / CVE-2022-27672

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 63305e5392ec2d17b85e7996a97462744425db80)

2 years agotools/ocaml/libs: Fix memory/resource leaks with caml_alloc_custom()
Andrew Cooper [Wed, 1 Feb 2023 11:27:42 +0000 (11:27 +0000)]
tools/ocaml/libs: Fix memory/resource leaks with caml_alloc_custom()

All caml_alloc_*() functions can throw exceptions, and longjump out of
context.  If this happens, we leak the xch/xce handle.

Reorder the logic to allocate the Ocaml object first.

Fixes: 8b3c06a3e545 ("tools/ocaml/xenctrl: OCaml 5 support, fix use-after-free")
Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d69ccf52ad467ccc22029172a8e61dc621187889)

2 years agotools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released
Andrew Cooper [Tue, 31 Jan 2023 17:19:30 +0000 (17:19 +0000)]
tools/ocaml/xc: Don't reference Abstract_Tag objects with the GC lock released

The intf->{addr,len} references in the xc_map_foreign_range() call are unsafe.
From the manual:

  https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code

"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

More than what the manual says, the intf pointer is (potentially) invalidated
by caml_enter_blocking_section() if another thread happens to perform garbage
collection at just the right (wrong) moment.

Rewrite the logic.  There's no need to stash data in the Ocaml object until
the success path at the very end.

Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 9e7c74e6f9fd2e44df1212643b80af9032b45b07)

2 years agotools/ocaml/xc: Fix binding for xc_domain_assign_device()
Edwin Török [Thu, 12 Jan 2023 11:38:38 +0000 (11:38 +0000)]
tools/ocaml/xc: Fix binding for xc_domain_assign_device()

The patch adding this binding was plain broken, and unreviewed.  It modified
the C stub to add a 4th parameter without an equivalent adjustment in the
Ocaml side of the bindings.

In 64bit builds, this causes us to dereference whatever dead value is in %rcx
when trying to interpret the rflags parameter.

This has gone unnoticed because Xapi doesn't use this binding (it has its
own), but unbreak the binding by passing RDM_RELAXED unconditionally for
now (matching the libxl default behaviour).

Fixes: 9b34056cb4 ("tools: extend xc_assign_device() to support rdm reservation policy")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 4250683842104f02996428f93927a035c8e19266)

2 years agotools/ocaml/evtchn: Don't reference Custom objects with the GC lock released
Edwin Török [Thu, 12 Jan 2023 17:48:29 +0000 (17:48 +0000)]
tools/ocaml/evtchn: Don't reference Custom objects with the GC lock released

The modification to the _H() macro for Ocaml 5 support introduced a subtle
bug.  From the manual:

  https://ocaml.org/manual/intfc.html#ss:parallel-execution-long-running-c-code

"After caml_release_runtime_system() was called and until
caml_acquire_runtime_system() is called, the C code must not access any OCaml
data, nor call any function of the run-time system, nor call back into OCaml
code."

Previously, the value was a naked C pointer, so dereferencing it wasn't
"accessing any Ocaml data", but the fix to avoid naked C pointers added a
layer of indirection through an Ocaml Custom object, meaning that the common
pattern of using _H() in a blocking section is unsafe.

In order to fix:

 * Drop the _H() macro and replace it with a static inline xce_of_val().
 * Opencode the assignment into Data_custom_val() in the two constructors.
 * Rename "value xce" parameters to "value xce_val" so we can consistently
   have "xenevtchn_handle *xce" on the stack, and obtain the pointer with the
   GC lock still held.

Fixes: 22d5affdf0ce ("tools/ocaml/evtchn: OCaml 5 support, fix potential resource leak")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 2636d8ff7a670c4d2485757dbe966e36c259a960)

2 years agotools/ocaml/libs: Allocate the correct amount of memory for Abstract_tag
Andrew Cooper [Tue, 31 Jan 2023 10:59:42 +0000 (10:59 +0000)]
tools/ocaml/libs: Allocate the correct amount of memory for Abstract_tag

caml_alloc() takes units of Wsize (word size), not bytes.  As a consequence,
we're allocating 4 or 8 times too much memory.

Ocaml has a helper, Wsize_bsize(), but it truncates cases which aren't an
exact multiple.  Use a BUILD_BUG_ON() to cover the potential for truncation,
as there's no rounding-up form of the helper.
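
To illustrate the truncation concern, here is a sketch using hypothetical
DEMO_* macros and a hypothetical demo_intf structure; demo_value stands in
for OCaml's word-sized value type, and none of these names come from the
OCaml headers or from this patch.  A compile-time assertion plays the role
that BUILD_BUG_ON() plays above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t demo_value;  /* stand-in for a word-sized OCaml value */

/* Truncating bytes-to-words conversion: any remainder is silently dropped. */
#define DEMO_WSIZE_BSIZE(sz) ((sz) / sizeof(demo_value))

/* Hypothetical rounding-up variant, shown for comparison only. */
#define DEMO_WSIZE_BSIZE_ROUND_UP(sz) \
    (((sz) + sizeof(demo_value) - 1) / sizeof(demo_value))

/* Hypothetical payload stored in the abstract block. */
struct demo_intf {
    void *addr;
    int len;
};

/* Refuse to build if the truncating conversion would lose bytes. */
_Static_assert(sizeof(struct demo_intf) % sizeof(demo_value) == 0,
               "demo_intf is not a whole number of words");

/* Words needed for a byte count that is known to be word-aligned. */
static size_t demo_words_exact(size_t bytes)
{
    assert(bytes % sizeof(demo_value) == 0);
    return DEMO_WSIZE_BSIZE(bytes);
}
```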

Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Fixes: d3e649277a13 ("ocaml: add mmap bindings implementation.")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 36eb2de31b6ecb8787698fb1a701bd708c8971b2)

2 years agotools/ocaml/libs: Don't declare stubs as taking void
Edwin Török [Thu, 12 Jan 2023 11:28:29 +0000 (11:28 +0000)]
tools/ocaml/libs: Don't declare stubs as taking void

There is no such thing as an Ocaml function (C stub or otherwise) taking no
parameters.  In the absence of any other parameters, unit is still passed.

This doesn't explode with any ABI we care about, but would malfunction for an
ABI environment such as stdcall.

Fixes: c3afd398ba7f ("ocaml: Add XS bindings.")
Fixes: 8b7ce06a2d34 ("ocaml: Add XC bindings.")
Signed-off-by: Edwin Török <edwin.torok@cloud.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit ff8b560be80b9211c303d74df7e4b3921d2bb8ca)

2 years agotools/oxenstored: validate config file before live update
Edwin Török [Tue, 11 May 2021 15:56:50 +0000 (15:56 +0000)]
tools/oxenstored: validate config file before live update

The configuration file can contain typos or various errors that could prevent
live update from succeeding (e.g. a flag only valid on a different version).
Unknown entries in the config file would normally be ignored on startup, so
add a strict --config-test that live-update can use to check that the config file
is valid *for the new binary*.

For compatibility with running old code during live update recognize
--live --help as an equivalent to --config-test.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit e6f07052ce4a0f0b7d4dc522d87465efb2d9ee86)

2 years agotools/ocaml/xb: Drop Xs_ring.write
Edwin Török [Fri, 16 Dec 2022 18:25:20 +0000 (18:25 +0000)]
tools/ocaml/xb: Drop Xs_ring.write

This function is unused (only Xs_ring.write_substring is used), and the
bytes/string conversion here is backwards: the C stub implements the bytes
version and then we use a Bytes.unsafe_of_string to convert a string into
bytes.

However the operation here really is read-only: we read from the string and
write it to the ring, so the C stub should implement the read-only string
version, and if needed we could use Bytes.unsafe_to_string to be able to send
'bytes'. However that is not necessary as the 'bytes' version is dropped above.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 01f139215e678c2dc7d4bb3f9f2777069bb1b091)

2 years agotools/ocaml/xb,mmap: Use Data_abstract_val wrapper
Edwin Török [Fri, 16 Dec 2022 18:25:10 +0000 (18:25 +0000)]
tools/ocaml/xb,mmap: Use Data_abstract_val wrapper

This is not strictly necessary since it is essentially a no-op currently: a
cast to void * and value *, even in OCaml 5.0.

However it does make it clearer that what we have here is not a regular OCaml
value, but one allocated with Abstract_tag or Custom_tag, and follows the
example from the manual more closely:
https://v2.ocaml.org/manual/intfc.html#ss:c-outside-head

It also makes it clearer that these modules have been reviewed for
compat with OCaml 5.0.

We cannot use OCaml finalizers here, because we want exact control over when
to unmap these pages from remote domains.

No functional change.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit d2ccc637111d6dbcf808aaffeec7a46f0b1e1c81)

2 years agotools/ocaml/xenctrl: Use larger chunksize in domain_getinfolist
Edwin Török [Tue, 1 Nov 2022 17:59:17 +0000 (17:59 +0000)]
tools/ocaml/xenctrl: Use larger chunksize in domain_getinfolist

domain_getinfolist() is quadratic with the number of domains, because of the
behaviour of the underlying hypercall.  Nevertheless, getting domain info in
blocks of 1024 is far more efficient than blocks of 2.

In a scalability testing scenario with ~1000 VMs, a combination of this and
the previous change takes xenopsd's wallclock time in domain_getinfolist()
down from 88% to 0.02%
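
The effect of the chunk size can be seen in a toy cost model.  The model
is an assumption made for illustration, not a description of the
hypercall's implementation: each call is taken to walk past every domain
below its starting point before copying up to one chunk, and the caller
stops at the first short batch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: count the domain-list entries visited while fetching
 * nr_domains entries in batches of `chunk`.
 */
static unsigned long demo_total_visits(size_t nr_domains, size_t chunk)
{
    unsigned long visits = 0;
    size_t start = 0, got;

    assert(chunk > 0);

    do {
        got = nr_domains - start;
        if ( got > chunk )
            got = chunk;

        visits += start + got;  /* skip `start` entries, then copy `got` */
        start += got;
    } while ( got == chunk );   /* a short batch means the list is exhausted */

    return visits;
}
```

Under this model the number of calls, and hence the repeated skipping,
shrinks in proportion to the chunk size, so for ~1000 domains a chunk of
1024 needs a single pass where a chunk of 2 needs hundreds.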

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 95db09b1b154fb72fad861815ceae1f3fa49fc4e)

2 years agotools/ocaml/xenctrl: Make domain_getinfolist tail recursive
Edwin Török [Tue, 1 Nov 2022 17:59:16 +0000 (17:59 +0000)]
tools/ocaml/xenctrl: Make domain_getinfolist tail recursive

domain_getinfolist() is quadratic with the number of domains, because of the
behaviour of the underlying hypercall.  xenopsd was further observed to be
wasting excessive quantities of time manipulating the list of already-obtained
domains.

Implement a tail recursive `rev_concat` equivalent to `concat |> rev`, and use
it instead of calling `@` multiple times.

An incidental benefit is that the list of domains will now be in domid order,
instead of having pairs of 2 domains changing direction every time.

In a scalability testing scenario with ~1000 VMs, a combination of this and
the subsequent change takes xenopsd's wallclock time in domain_getinfolist()
down from 88% to 0.02%

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Pau Ruiz Safont <pau.safont@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit c3b6be714c64aa62b56d0bce96f4b6a10b5c2078)

2 years agolibxl: fix guest kexec - skip cpuid policy
Jason Andryuk [Tue, 7 Feb 2023 16:06:47 +0000 (17:06 +0100)]
libxl: fix guest kexec - skip cpuid policy

When a domain performs a kexec (soft reset), libxl__build_pre() is
called with the existing domid.  Calling libxl__cpuid_legacy() on the
existing domain fails since the cpuid policy has already been set, and
the guest isn't rebuilt and doesn't kexec.

xc: error: Failed to set d1's policy (err leaf 0xffffffff, subleaf 0xffffffff, msr 0xffffffff) (17 = File exists): Internal error
libxl: error: libxl_cpuid.c:494:libxl__cpuid_legacy: Domain 1:Failed to apply CPUID policy: File exists
libxl: error: libxl_create.c:1641:domcreate_rebuild_done: Domain 1:cannot (re-)build domain: -3
libxl: error: libxl_xshelp.c:201:libxl__xs_read_mandatory: xenstore read failed: `/libxl/1/type': No such file or directory
libxl: warning: libxl_dom.c:49:libxl__domain_type: unable to get domain type for domid=1, assuming HVM

During a soft_reset, skip calling libxl__cpuid_legacy() to avoid the
issue.  Before commit 34990446ca91, the libxl__cpuid_legacy() failure
would have been ignored, so kexec would continue.

Fixes: 34990446ca91 ("libxl: don't ignore the return value from xc_cpuid_apply_policy")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
master commit: 1e454c2b5b1172e0fc7457e411ebaba61db8fc87
master date: 2023-01-26 10:58:23 +0100