xenbits.xensource.com Git - people/sstabellini/xen-unstable.git/.git/log
4 years agoarm,smmu: add support for generic DT bindings. Implement add_device and dt_xlate. smmu-generic-2
Brian Woods [Tue, 26 Jan 2021 22:58:36 +0000 (14:58 -0800)]
arm,smmu: add support for generic DT bindings. Implement add_device and dt_xlate.

Now that all arm iommu drivers support generic bindings we can remove
the workaround from iommu_add_dt_device().

Note that if both legacy bindings and generic bindings are present in
device tree, the legacy bindings are the ones that are used.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
4 years agoarm,smmu: restructure code in preparation to new bindings support
Brian Woods [Tue, 26 Jan 2021 22:58:35 +0000 (14:58 -0800)]
arm,smmu: restructure code in preparation to new bindings support

Restructure some of the code and add supporting functions in preparation
for generic device tree (DT) binding support.  This will allow current
Linux device trees to be used by modifying only the chosen node to enable
Xen.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
4 years agoarm,smmu: switch to using iommu_fwspec functions
Brian Woods [Tue, 6 Apr 2021 22:52:10 +0000 (15:52 -0700)]
arm,smmu: switch to using iommu_fwspec functions

Modify the smmu driver so that it uses the iommu_fwspec helper
functions.  This means both ARM IOMMU drivers will use the
iommu_fwspec helper functions, which makes enabling generic device tree
bindings in the SMMU driver much cleaner.

Signed-off-by: Brian Woods <brian.woods@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
4 years agoxen/xsm: Improve alloc/free of evtchn buckets
Andrew Cooper [Sat, 16 Jan 2021 16:09:10 +0000 (16:09 +0000)]
xen/xsm: Improve alloc/free of evtchn buckets

Currently, flask_alloc_security_evtchn() is called in loops of
64 (EVTCHNS_PER_BUCKET), which for non-dummy implementations is a function
pointer call even in the no-op case.  The non no-op case only sets a single
constant, and doesn't actually fail.

Spectre v2 protections have made function pointer calls far more expensive, and
64 back-to-back calls are a waste.  Rework the APIs to pass the size of the
bucket instead, and call them once.

No practical change, but {alloc,free}_evtchn_bucket() should be rather more
efficient now.
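
The reworked calling pattern can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the hook table, the hook name and the ssid field are assumptions made for the example. The point is that the bucket size is passed to the hook, so the indirect call happens once per bucket rather than once per event channel.

```c
#include <assert.h>
#include <stddef.h>

#define EVTCHNS_PER_BUCKET 64

struct evtchn {
    int ssid;                       /* illustrative security label field */
};

/* Hypothetical hook table standing in for the XSM operations. */
struct xsm_ops {
    int (*alloc_security_evtchns)(struct evtchn chn[], unsigned int nr);
};

static unsigned int hook_calls;     /* counts indirect calls, for illustration */

static int demo_alloc_security_evtchns(struct evtchn chn[], unsigned int nr)
{
    hook_calls++;
    for (unsigned int i = 0; i < nr; i++)
        chn[i].ssid = 1;            /* the only work: one constant per channel */
    return 0;
}

static const struct xsm_ops demo_ops = {
    .alloc_security_evtchns = demo_alloc_security_evtchns,
};

/* One indirect call per bucket instead of EVTCHNS_PER_BUCKET calls. */
static int alloc_bucket_security(struct evtchn bucket[EVTCHNS_PER_BUCKET])
{
    return demo_ops.alloc_security_evtchns(bucket, EVTCHNS_PER_BUCKET);
}
```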

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
4 years agotools/libs: Simplify internal *.pc files
Andrew Cooper [Wed, 25 Nov 2020 14:37:00 +0000 (14:37 +0000)]
tools/libs: Simplify internal *.pc files

The internal package config file for libxenlight reads (reformatted to avoid
exceeding the SMTP 998-character line length):

  Libs: -L${libdir}
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/evtchn
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/call
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/evtchn
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/gnttab
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/foreignmemory
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/call
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/devicemodel
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/ctrl
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/store
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/call
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/hypfs
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/evtchn
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/call
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/evtchn
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/gnttab
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/foreignmemory
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/call
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/devicemodel
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/ctrl
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/guest
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/light
  -lxenlight

Drop duplicate -rpath-link='s to turn it into the slightly-more-manageable:

  Libs: -L${libdir}
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/call
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/ctrl
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/devicemodel
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/evtchn
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/foreignmemory
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/gnttab
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/guest
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/hypfs
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/light
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/store
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toolcore
  -Wl,-rpath-link=/local/security/xen.git/tools/libs/light/../../../tools/libs/toollog
  -lxenlight

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
4 years agotools: Drop gettext as a build dependency
Andrew Cooper [Fri, 26 Mar 2021 11:25:07 +0000 (11:25 +0000)]
tools: Drop gettext as a build dependency

It has not been a dependency since at least 4.13.  Remove its mandatory check
from ./configure.

Annotate the dependency in the CI dockerfiles, and drop it from CirrusCI and
TravisCI.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agoxen/gunzip: Fix build with clang after 33bc2a8495f7
Julien Grall [Wed, 7 Apr 2021 18:22:10 +0000 (19:22 +0100)]
xen/gunzip: Fix build with clang after 33bc2a8495f7

The compilation will fail when building Xen with clang and
CONFIG_DEBUG=y:

make[4]: Leaving directory '/oss/xen/xen/common/libelf'
  INIT_O  gunzip.init.o
Error: size of gunzip.o:.text is 0x00000019

This is because the function init_allocator() will not be inlined
and is not part of the init section.

Fix it by marking init_allocator() with INIT.

Fixes: 33bc2a8495f7 ("xen/gunzip: Allow perform_gunzip() to be called multiple times")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoRevert "x86: guard against straight-line speculation past RET"
Jan Beulich [Fri, 9 Apr 2021 07:50:40 +0000 (09:50 +0200)]
Revert "x86: guard against straight-line speculation past RET"

This reverts commit 71b0b475d801ebeb83a6ba402425135c314fa2df,
which has no real effect - the most recent version of the patch
had lost the INT3 insn.

4 years agohypfs: avoid effectively open-coding xzalloc_array()
Jan Beulich [Fri, 9 Apr 2021 07:25:42 +0000 (09:25 +0200)]
hypfs: avoid effectively open-coding xzalloc_array()

There is a difference in generated code: xzalloc_bytes() forces
SMP_CACHE_BYTES alignment. I think we not only don't need this here, but
actually don't want it.

To avoid the need to add a cast, do away with the only forward-declared
struct hypfs_dyndata.
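
As a rough illustration of what "open-coding" means here, the sketch below contrasts a hand-computed byte count with a typed array helper. It uses calloc() as a stand-in; the zalloc_array() macro and struct entry are assumptions for the example, not Xen's definitions, and the alignment difference mentioned above is not modelled.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Stand-in for a typed, zeroing array allocator.  calloc() takes the
 * element count and size separately, so it can reject an overflowing
 * product instead of silently allocating too little.
 */
#define zalloc_array(type, n) ((type *)calloc((n), sizeof(type)))

struct entry {
    unsigned int id;
    void *data;
};

/* Open-coded form: byte count computed by hand, result is untyped. */
static struct entry *alloc_entries_open_coded(size_t n)
{
    return calloc(1, n * sizeof(struct entry)); /* product may overflow */
}

/* Helper form: the element type is stated once and the result is typed. */
static struct entry *alloc_entries(size_t n)
{
    return zalloc_array(struct entry, n);
}
```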

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agox86/vPMU: avoid effectively open-coding xzalloc_flex_struct()
Jan Beulich [Fri, 9 Apr 2021 07:25:17 +0000 (09:25 +0200)]
x86/vPMU: avoid effectively open-coding xzalloc_flex_struct()

There is a difference in generated code: xzalloc_bytes() forces
SMP_CACHE_BYTES alignment. I think we not only don't need this here, but
actually don't want it.
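
For readers unfamiliar with the flexible-struct helper, the following sketch shows the general idea using calloc(). The zalloc_flex_struct() macro and struct counters are hypothetical stand-ins written for this example; Xen's own macro may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct counters {
    unsigned int nr;
    unsigned long val[];            /* flexible array member */
};

/*
 * Stand-in macro: size of the fixed header plus nr trailing elements,
 * zero-filled.  sizeof() does not evaluate its operand, so the null
 * pointer is never dereferenced.
 */
#define zalloc_flex_struct(type, field, nr) \
    ((type *)calloc(1, offsetof(type, field) + \
                       (nr) * sizeof(((type *)0)->field[0])))

static struct counters *counters_alloc(unsigned int nr)
{
    struct counters *c = zalloc_flex_struct(struct counters, val, nr);

    if (c)
        c->nr = nr;
    return c;
}
```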

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/HVM: avoid effectively open-coding xzalloc_flex_struct()
Jan Beulich [Fri, 9 Apr 2021 07:24:23 +0000 (09:24 +0200)]
x86/HVM: avoid effectively open-coding xzalloc_flex_struct()

Drop hvm_irq_size(), which exists for just this purpose.

There is a difference in generated code: xzalloc_bytes() forces
SMP_CACHE_BYTES alignment. I think we not only don't need this here, but
actually don't want it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoMAINTAINERS: add myself as hypfs maintainer
Juergen Gross [Fri, 9 Apr 2021 07:23:28 +0000 (09:23 +0200)]
MAINTAINERS: add myself as hypfs maintainer

As I have contributed all the code for hypfs, it would be natural to
be the maintainer.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agopci: move ATS code to common directory
Rahul Singh [Fri, 9 Apr 2021 07:22:26 +0000 (09:22 +0200)]
pci: move ATS code to common directory

PCI ATS code is common to all architectures. Move the code to the common
directory so that other architectures can use it.

No functional change intended.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
4 years agox86/vpt: simplify locking argument to write_{,un}lock
Boris Ostrovsky [Fri, 9 Apr 2021 07:22:04 +0000 (09:22 +0200)]
x86/vpt: simplify locking argument to write_{,un}lock

Make pt_adjust_vcpu() call write_{,un}lock with less indirection, like
create_periodic_time() already does.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/vpt: do not take pt_migrate rwlock in some cases
Boris Ostrovsky [Fri, 9 Apr 2021 07:21:27 +0000 (09:21 +0200)]
x86/vpt: do not take pt_migrate rwlock in some cases

Commit 8e76aef72820 ("x86/vpt: fix race when migrating timers between
vCPUs") addressed XSA-336 by introducing a per-domain rwlock that was
intended to protect periodic timers during VCPU migration. Since such
migration is an infrequent event no performance impact was expected.

Unfortunately this turned out not to be the case: on a fairly large
guest (92 VCPUs) we've observed as much as 40% TPCC performance
regression with some guest kernels. Further investigation pointed to
pt_migrate read lock taken in pt_update_irq() as the largest contributor
to this regression. With a large number of VCPUs and a large number of VMEXITs
(from where pt_update_irq() is always called) the update of an atomic in
read_lock() is thought to be the main cause.

Stephen Brennan analyzed the locking pattern and classified lock users as
follows:

1. Functions which read (maybe write) all periodic_time instances attached
to a particular vCPU. These are functions which use pt_vcpu_lock() such
as pt_restore_timer(), pt_save_timer(), etc.
2. Functions which want to modify a particular periodic_time object.
These functions lock whichever vCPU the periodic_time is attached to, but
since the vCPU could be modified without holding any lock, they are
vulnerable to XSA-336. Functions in this group use pt_lock(), such as
pt_timer_fn() or destroy_periodic_time().
3. Functions which not only want to modify the periodic_time, but also
would like to modify the =vcpu= fields. These are create_periodic_time()
or pt_adjust_vcpu(). They create XSA-336 conditions for group 2, but we
can't simply hold 2 vcpu locks due to the deadlock risk.

Roger then pointed out that group 1 functions don't really need to hold
the pt_migrate rwlock and that instead groups 2 and 3 should hold per-vcpu
lock whenever they modify per-vcpu timer lists.

Suggested-by: Stephen Brennan <stephen.s.brennan@oracle.com>
Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stephen Brennan <stephen.s.brennan@oracle.com>
4 years agox86/irq: simplify loop in unmap_domain_pirq
Roger Pau Monné [Fri, 9 Apr 2021 07:20:57 +0000 (09:20 +0200)]
x86/irq: simplify loop in unmap_domain_pirq

The for loop in unmap_domain_pirq is unnecessarily complicated, with
several places where the index is incremented, and also different
exit conditions spread throughout the loop body.

Simplify it by looping over each possible PIRQ using the for loop
syntax, and remove all possible in-loop exit points.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/shadow: encode full GFN in magic MMIO entries
Jan Beulich [Fri, 9 Apr 2021 07:20:15 +0000 (09:20 +0200)]
x86/shadow: encode full GFN in magic MMIO entries

Since we don't need to encode all of the PTE flags, we have enough bits
in the shadow entry to store the full GFN. Limit use of literal numbers
a little and instead derive some of the involved values. Sanity-check
the result via BUILD_BUG_ON()s.

This then allows dropping from sh_l1e_mmio() again the guarding against
too large GFNs. It needs replacing by an L1TF safety check though, which
in turn requires exposing cpu_has_bug_l1tf.
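
The general technique of deriving field widths and checking them at build time can be sketched as follows. The bit layout here is purely hypothetical (a 4-bit marker in the low bits, the GFN above it) and is not the layout Xen uses; _Static_assert plays the role of BUILD_BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

#define PADDR_BITS       52
#define PAGE_SHIFT       12
#define GFN_BITS         (PADDR_BITS - PAGE_SHIFT)   /* derived, not literal */

/* Hypothetical layout: a marker in the low bits, the GFN above it. */
#define MMIO_MAGIC       UINT64_C(0x1)
#define MMIO_MAGIC_BITS  4
#define MMIO_MAGIC_MASK  ((UINT64_C(1) << MMIO_MAGIC_BITS) - 1)
#define MMIO_GFN_SHIFT   MMIO_MAGIC_BITS
#define MMIO_GFN_MASK    ((UINT64_C(1) << GFN_BITS) - 1)

/* Build-time sanity checks on the derived values. */
_Static_assert(MMIO_GFN_SHIFT + GFN_BITS <= 64, "GFN does not fit in entry");
_Static_assert(MMIO_MAGIC <= MMIO_MAGIC_MASK, "marker overlaps GFN field");

static uint64_t mmio_entry(uint64_t gfn)
{
    return ((gfn & MMIO_GFN_MASK) << MMIO_GFN_SHIFT) | MMIO_MAGIC;
}

static int is_mmio_entry(uint64_t e)
{
    return (e & MMIO_MAGIC_MASK) == MMIO_MAGIC;
}

static uint64_t mmio_entry_gfn(uint64_t e)
{
    return (e >> MMIO_GFN_SHIFT) & MMIO_GFN_MASK;
}
```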

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years agox86/PV32: avoid TLB flushing after mod_l3_entry()
Jan Beulich [Fri, 9 Apr 2021 07:19:18 +0000 (09:19 +0200)]
x86/PV32: avoid TLB flushing after mod_l3_entry()

32-bit guests may not depend upon the side effect of using ordinary
4-level paging when running on a 64-bit hypervisor. For L3 entry updates
to take effect, they have to use a CR3 reload. Therefore there's no need
to issue a paging structure invalidating TLB flush in this case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/PV: restrict TLB flushing after mod_l[234]_entry()
Jan Beulich [Fri, 9 Apr 2021 07:18:51 +0000 (09:18 +0200)]
x86/PV: restrict TLB flushing after mod_l[234]_entry()

Just like we avoid invoking remote root pt flushes when all uses of an
L4 table can be accounted for locally, the same can be done for all of
L[234] for the linear pt flush when the table is a "free floating" one,
i.e. it is pinned but not hooked up anywhere. While this situation
doesn't occur very often, it can be observed.

Since this breaks one of the implications of the XSA-286 fix, drop the
flush_root_pt_local variable again and set ->root_pgt_changed directly,
just like it was before that change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/PV: _PAGE_RW changes may take fast path of mod_l[234]_entry()
Jan Beulich [Fri, 9 Apr 2021 07:18:17 +0000 (09:18 +0200)]
x86/PV: _PAGE_RW changes may take fast path of mod_l[234]_entry()

The only time _PAGE_RW matters when validating an L2 or higher entry is
when an attempt is made to install a linear page table (see the comment ahead
of define_get_linear_pagetable()). Therefore when we disallow such at
build time, we can allow _PAGE_RW changes to take the fast paths there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86: limit amount of INT3 in IND_THUNK_*
Jan Beulich [Fri, 9 Apr 2021 07:17:04 +0000 (09:17 +0200)]
x86: limit amount of INT3 in IND_THUNK_*

There's no point in having every replacement variant also specify the
INT3 - just have it once in the base macro. When patching, NOPs will get
inserted, which are fine to speculate through (until reaching the INT3).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86: guard against straight-line speculation past RET
Jan Beulich [Fri, 9 Apr 2021 07:16:22 +0000 (09:16 +0200)]
x86: guard against straight-line speculation past RET

Under certain conditions CPUs can speculate into the instruction stream
past a RET instruction. Guard against this just like 3b7dab93f240
("x86/spec-ctrl: Protect against CALL/JMP straight-line speculation")
did - by inserting an "INT $3" insn. It's merely the mechanics of how to
achieve this that differ: A set of macros gets introduced to post-
process RET insns issued by the compiler (or living in assembly files).

Unfortunately for clang this requires further features their built-in
assembler doesn't support: We need to be able to override insn mnemonics
produced by the compiler (which may be impossible, if internally
assembly mnemonics never get generated).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/PV: make post-migration page state consistent
Jan Beulich [Fri, 9 Apr 2021 07:15:38 +0000 (09:15 +0200)]
x86/PV: make post-migration page state consistent

When a page table page gets de-validated, its type reference count drops
to zero (and PGT_validated gets cleared), but its type remains intact.
XEN_DOMCTL_getpageframeinfo3, therefore, so far reported prior usage for
such pages. An intermediate write to such a page via e.g.
MMU_NORMAL_PT_UPDATE, however, would transition the page's type to
PGT_writable_page, thus altering what XEN_DOMCTL_getpageframeinfo3 would
return. In libxc the decision which pages to normalize / localize
depends solely on the type returned from the domctl. As a result without
further precautions the guest won't be able to tell whether such a page
has had its (apparent) PTE entries transitioned to the new MFNs.

Add a check of PGT_validated, thus consistently avoiding normalization /
localization in the tool stack.

Also use XEN_DOMCTL_PFINFO_NOTAB in the variable's initializer instead
of open coding it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agolibxg: don't use max policy in xc_cpuid_xend_policy()
Jan Beulich [Fri, 9 Apr 2021 07:14:58 +0000 (09:14 +0200)]
libxg: don't use max policy in xc_cpuid_xend_policy()

Using max undermines the separation between default and max. For
example, turning off AVX512F on an MPX-capable system silently turns on
MPX, despite this not being part of the default policy anymore. Since
the information is used only for determining what to convert 'x' to (but
not to e.g. validate '1' settings), the effect of this change is
identical for guests with (suitable) "cpuid=" settings to that of the
changes separating default from max and then converting (e.g.) MPX from
being part of default to only being part of max for guests without
(affected) "cpuid=" settings.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/CPUID: move some static masks into .init
Jan Beulich [Fri, 9 Apr 2021 07:14:25 +0000 (09:14 +0200)]
x86/CPUID: move some static masks into .init

Except for hvm_shadow_max_featuremask and deep_features they're
referenced by __init functions only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86: refine guest_mode()
Jan Beulich [Fri, 9 Apr 2021 07:12:51 +0000 (09:12 +0200)]
x86: refine guest_mode()

The 2nd of the assertions as well as the macro's return value have been
assuming we're on the primary stack. While for most IST exceptions we
switch back to the main one when user mode was interrupted, for #DF we
intentionally never do, and hence a #DF actually triggering on a user
mode insn (which then is still a Xen bug) would in turn trigger this
assertion, rather than cleanly logging state.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agoxen/page_alloc: Don't hold the heap_lock when clearing PGC_need_scrub
Julien Grall [Thu, 21 Jan 2021 11:12:00 +0000 (11:12 +0000)]
xen/page_alloc: Don't hold the heap_lock when clearing PGC_need_scrub

Currently, the heap_lock is held when clearing PGC_need_scrub in
alloc_heap_pages(). However, this is unnecessary because the only caller
(mark_page_offline()) that can concurrently modify the count_info is
using cmpxchg() in a loop.

Therefore, rework the code to avoid holding the heap_lock and use
test_and_clear_bit() instead.
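
The lock-free pattern can be illustrated with C11 atomics, as in the sketch below. The bit positions, struct layout and function names are assumptions for the example; atomic_fetch_and() stands in for test_and_clear_bit() and atomic_compare_exchange_weak() for the cmpxchg() loop of the concurrent writer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define PGC_NEED_SCRUB (1UL << 0)   /* hypothetical bit positions */
#define PGC_OFFLINING  (1UL << 1)

struct page_info {
    _Atomic unsigned long count_info;
};

/* Atomically clear the flag and report whether it was set; no lock held. */
static bool test_and_clear_need_scrub(struct page_info *pg)
{
    unsigned long old = atomic_fetch_and(&pg->count_info, ~PGC_NEED_SCRUB);

    return (old & PGC_NEED_SCRUB) != 0;
}

/*
 * Concurrent writer using a compare-and-exchange loop.  Because it
 * retries on any change, an atomic bit clear by another CPU is never lost.
 */
static void mark_offlining(struct page_info *pg)
{
    unsigned long old = atomic_load(&pg->count_info);

    while (!atomic_compare_exchange_weak(&pg->count_info, &old,
                                         old | PGC_OFFLINING))
        ;                           /* old is refreshed on failure; retry */
}
```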

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agofix for_each_cpu() again for NR_CPUS=1
Jan Beulich [Wed, 7 Apr 2021 10:24:45 +0000 (12:24 +0200)]
fix for_each_cpu() again for NR_CPUS=1

Unfortunately aa50f45332f1 ("xen: fix for_each_cpu when NR_CPUS=1") has
caused quite a bit of fallout with gcc10, e.g. (there are at least two
more similar ones, and I didn't bother trying to find them all):

In file included from .../xen/include/xen/config.h:13,
                 from <command-line>:
core_parking.c: In function ‘core_parking_power’:
.../xen/include/asm/percpu.h:12:51: error: array subscript 1 is above array bounds of ‘long unsigned int[1]’ [-Werror=array-bounds]
   12 |     (*RELOC_HIDE(&per_cpu__##var, __per_cpu_offset[cpu]))
.../xen/include/xen/compiler.h:141:29: note: in definition of macro ‘RELOC_HIDE’
  141 |     (typeof(ptr)) (__ptr + (off)); })
      |                             ^~~
core_parking.c:133:39: note: in expansion of macro ‘per_cpu’
  133 |             core_tmp = cpumask_weight(per_cpu(cpu_core_mask, cpu));
      |                                       ^~~~~~~
In file included from .../xen/include/xen/percpu.h:4,
                 from .../xen/include/asm/msr.h:7,
                 from .../xen/include/asm/time.h:5,
                 from .../xen/include/xen/time.h:76,
                 from .../xen/include/xen/spinlock.h:4,
                 from .../xen/include/xen/cpu.h:5,
                 from core_parking.c:19:
.../xen/include/asm/percpu.h:6:22: note: while referencing ‘__per_cpu_offset’
    6 | extern unsigned long __per_cpu_offset[NR_CPUS];
      |                      ^~~~~~~~~~~~~~~~

One of the further errors even went as far as claiming that an array
index (range) of [0, 0] was outside the bounds of a [1] array, so
something fishy is pretty clearly going on there.

The compiler apparently wants to be able to see that the loop isn't
really a loop in order to avoid triggering such warnings, yet what
exactly makes it consider the loop exit condition constant and within
the [0, 1] range isn't obvious - using ((mask)->bits[0] & 1) instead of
cpumask_test_cpu() for example did _not_ help.

Re-instate a special form of for_each_cpu(), experimentally "proven" to
avoid the diagnostics.
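
One plausible shape for such a special form is sketched below, with a minimal cpumask stand-in so that the example is self-contained. The types and helper here are simplified assumptions; only the idea is taken from the message: with a single CPU, the loop bound is the result of testing bit 0, so the body runs at most once and only ever with cpu == 0.

```c
#include <assert.h>

#define NR_CPUS 1

/* Minimal stand-in for the real cpumask type. */
typedef struct {
    unsigned long bits[1];
} cpumask_t;

static int cpumask_test_cpu(unsigned int cpu, const cpumask_t *m)
{
    return (int)((m->bits[0] >> cpu) & 1);
}

/*
 * Single-CPU form: the bound is 0 or 1, so the index never reaches 1
 * inside the body and cannot be used as an out-of-range subscript.
 */
#define for_each_cpu(cpu, mask) \
    for ((cpu) = 0; (cpu) < (unsigned int)cpumask_test_cpu(0, mask); ++(cpu))

static unsigned int count_cpus(const cpumask_t *mask)
{
    unsigned int cpu, n = 0;

    for_each_cpu(cpu, mask)
        n++;
    return n;
}
```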

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
4 years agotools/firmware: hvmloader: Use const in __bug() and __assert_failed()
Julien Grall [Tue, 6 Apr 2021 19:01:18 +0000 (20:01 +0100)]
tools/firmware: hvmloader: Use const in __bug() and __assert_failed()

__bug() and __assert_failed() are not meant to modify the string
parameters. So mark them as const.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agotools/xentrace: Use const whenever we point to literal strings
Julien Grall [Tue, 6 Apr 2021 19:00:25 +0000 (20:00 +0100)]
tools/xentrace: Use const whenever we point to literal strings

Literal strings are not meant to be modified. So we should use const
char * rather than char * when we want to store a pointer to them.
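
A small sketch of the convention, using an invented lookup function rather than anything from xentrace: both the table and the return type are const-qualified, so an accidental write through the pointer is rejected at compile time.

```c
#include <assert.h>
#include <string.h>

/* Returns a pointer to a string literal, hence const char *. */
static const char *state_name(int state)
{
    /* Neither the pointers nor the characters may be modified. */
    static const char *const names[] = { "idle", "running", "blocked" };

    if (state < 0 || state >= (int)(sizeof(names) / sizeof(names[0])))
        return "unknown";
    return names[state];
}
```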

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
4 years agotools/kdd: Use const whenever we point to literal strings
Julien Grall [Tue, 6 Apr 2021 18:59:25 +0000 (19:59 +0100)]
tools/kdd: Use const whenever we point to literal strings

Literal strings are not meant to be modified. So we should use const
char * rather than char * when we want to store a pointer to them.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years agoxen/x86: shadow: The return type of sh_audit_flags() should be const
Julien Grall [Tue, 6 Apr 2021 18:58:05 +0000 (19:58 +0100)]
xen/x86: shadow: The return type of sh_audit_flags() should be const

The function sh_audit_flags() returns pointers to literal strings.
They should not be modified, so the return is now const and this is
propagated to the callers.

Take the opportunity to fix the coding style in the declaration of
sh_audit_flags.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years agoxen/sched: Constify name and opt_name in struct scheduler
Julien Grall [Tue, 6 Apr 2021 18:34:08 +0000 (19:34 +0100)]
xen/sched: Constify name and opt_name in struct scheduler

Both name and opt_name point to literal strings. So mark both of
the fields as const.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
4 years agoxen: Constify the second parameter of rangeset_new()
Julien Grall [Tue, 6 Apr 2021 18:03:49 +0000 (19:03 +0100)]
xen: Constify the second parameter of rangeset_new()

The string 'name' will never get modified by the function, so mark it
as const.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/gunzip: Allow perform_gunzip() to be called multiple times
Julien Grall [Wed, 3 Mar 2021 19:27:56 +0000 (19:27 +0000)]
xen/gunzip: Allow perform_gunzip() to be called multiple times

Currently perform_gunzip() can only be called once because the
internal state (e.g. allocate) is not fully re-initialized.

This works fine if you are only booting dom0. But this will break when
booting multiple domains using dom0less with compressed kernel images.

This can be resolved by re-initializing bytes_out, malloc_ptr,
malloc_count every time perform_gunzip() is called.

Note the latter is only re-initialized for hardening purposes as there is
no guarantee that every malloc() is followed by a free() (it should be, in
theory!).

Take the opportunity to check the return value of alloc_heap_pages() to return
an error rather than dereferencing a NULL pointer later on failure.
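
The re-initialization pattern might look roughly like the sketch below. The variable names bytes_out, malloc_ptr and malloc_count come from the message; the tiny bump allocator, its size and the decompress() body are invented stand-ins so that the example can run on its own.

```c
#include <assert.h>
#include <stddef.h>

/* File-scope state that survives between calls. */
static unsigned long bytes_out;
static unsigned char arena[256];    /* stand-in for the allocated heap */
static size_t malloc_ptr;
static unsigned int malloc_count;

static void *arena_malloc(size_t size)
{
    if (size > sizeof(arena) - malloc_ptr)
        return NULL;
    void *p = &arena[malloc_ptr];
    malloc_ptr += size;
    malloc_count++;
    return p;
}

static void arena_free(void *p)
{
    (void)p;
    if (malloc_count && --malloc_count == 0)
        malloc_ptr = 0;             /* arena is reusable once all is freed */
}

static int decompress(const unsigned char *in, size_t len)
{
    /* Reset all state up front so that every call starts from scratch. */
    bytes_out = 0;
    malloc_ptr = 0;
    malloc_count = 0;               /* hardening against unbalanced frees */

    unsigned char *window = arena_malloc(200);
    if (!window)
        return -1;                  /* report failure instead of using NULL */

    for (size_t i = 0; i < len; i++) {
        window[i % 200] = in[i];    /* placeholder for the real work */
        bytes_out++;
    }

    arena_free(window);
    return 0;
}
```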

Reported-by: Charles Chiou <cchiou@ambarella.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoCHANGELOG.md: irq-max-guests
George Dunlap [Thu, 1 Apr 2021 13:34:04 +0000 (14:34 +0100)]
CHANGELOG.md: irq-max-guests

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
CC: Igor Druzhinin <igor.druzhinin@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Ian Jackson <ian.jackson@citrix.com>
4 years agoCHANGELOG.md: Various entries, mostly xenstore
George Dunlap [Thu, 1 Apr 2021 13:30:55 +0000 (14:30 +0100)]
CHANGELOG.md: Various entries, mostly xenstore

...grouped by submitters / maintainers

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
---
CC: Juergen Gross <jgross@suse.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Ian Jackson <ian.jackson@citrix.com>
4 years agoCHANGELOG.md: Various new entries, mostly x86
George Dunlap [Thu, 1 Apr 2021 13:18:36 +0000 (14:18 +0100)]
CHANGELOG.md: Various new entries, mostly x86

...Grouped mostly by submitter / maintainer

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Roger Pau Monne <roger.pau@citrix.com>
4 years agoCHANGELOG.md: Mention various ARM errata
George Dunlap [Thu, 1 Apr 2021 13:08:04 +0000 (14:08 +0100)]
CHANGELOG.md: Mention various ARM errata

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
---
v2:
 - Tweaked wording
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
4 years agoCHANGELOG.md: Some additional affordances in various xl subcommands
George Dunlap [Thu, 1 Apr 2021 13:06:36 +0000 (14:06 +0100)]
CHANGELOG.md: Some additional affordances in various xl subcommands

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
---
CC: Ian Jackson <ian.jackson@citrix.com>
4 years agoCHANGELOG.md: xl PCI configuration doc, xenstore MTU entries
George Dunlap [Thu, 1 Apr 2021 12:54:10 +0000 (13:54 +0100)]
CHANGELOG.md: xl PCI configuration doc, xenstore MTU entries

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Ian Jackson <ian.jackson@citrix.com>
---
CC: Paul Durrant <paul@xen.org>
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wl@xen.org>
4 years agoCHANGELOG.md: Mention XEN_SCRIPT_DIR
George Dunlap [Thu, 1 Apr 2021 12:51:48 +0000 (13:51 +0100)]
CHANGELOG.md: Mention XEN_SCRIPT_DIR

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-acked-by: Ian Jackson <iwj@xenproject.org>
---

CC: Olaf Hering <olaf@aepfle.de>
CC: Ian Jackson <iwj@xenproject.org>
4 years agorangeset: no need to use snprintf()
Jan Beulich [Tue, 6 Apr 2021 14:18:41 +0000 (16:18 +0200)]
rangeset: no need to use snprintf()

As of the conversion to safe_strcpy() years ago there has been no need
anymore to use snprintf() to prevent storing a not-nul-terminated string.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
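
The point about NUL termination can be illustrated with a small sketch. The helper and macro names below are stand-ins invented for this example, not Xen's actual definitions; the assumption is only that a safe_strcpy()-style wrapper always terminates the destination and reports truncation, which makes a separate snprintf() redundant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a bounded copy: always NUL-terminates when size > 0 and
 * returns the length of src, so callers can detect truncation. */
static size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Hypothetical equivalent of a safe_strcpy()-style macro: 0 on success,
 * -1 on truncation. Only valid when dst is a real array, because it
 * relies on sizeof(dst). */
#define SAFE_STRCPY(dst, src) \
    (bounded_copy((dst), (src), sizeof(dst)) >= sizeof(dst) ? -1 : 0)

struct named_set {
    char name[8];
};

/* Store a name; the result is NUL-terminated whether or not it fits. */
static int set_name(struct named_set *s, const char *name)
{
    assert(s != NULL && name != NULL);
    return SAFE_STRCPY(s->name, name);
}
```

Because the bounded copy already guarantees termination, the caller only has to decide what to do about a truncated name.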
4 years agocommon: map_vcpu_info() cosmetics
Jan Beulich [Tue, 6 Apr 2021 14:17:42 +0000 (16:17 +0200)]
common: map_vcpu_info() cosmetics

Use ENXIO instead of EINVAL to cover the two cases of the address not
satisfying the requirements. This will make an issue here better stand
out at the call site.

Also add a missing compat-mode related size check: If the sizes
differed, other code in the function would need changing. Accompany this
by a change to the initial sizeof() expression, tying it to the type of
the variable we're actually after (matching e.g. the alignof() added by
XSA-327).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
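
A minimal sketch of the two address checks described above, assuming a 4 KiB page and a made-up structure layout. The names and sizes are illustrative only and are not taken from the Xen sources; the point is that both the size check and the alignment check are tied to the same type, and both fail with ENXIO.

```c
#include <assert.h>
#include <errno.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical stand-in for the per-vCPU info structure (64 bytes). */
struct sketch_vcpu_info {
    uint64_t pending;
    uint8_t  pad[56];
};

/* Return 0 if a structure placed at 'offset' within a page is acceptable,
 * -ENXIO if it would cross the page end or be misaligned. */
static int check_info_offset(size_t offset)
{
    assert(sizeof(struct sketch_vcpu_info) <= SKETCH_PAGE_SIZE);

    if (offset > SKETCH_PAGE_SIZE - sizeof(struct sketch_vcpu_info))
        return -ENXIO;
    if (offset % alignof(struct sketch_vcpu_info))
        return -ENXIO;
    return 0;
}
```

Returning a distinct error code for these two cases lets a caller tell an unsuitable address apart from other invalid arguments.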
4 years agoxen/arm: smmuv1: Intelligent SMR allocation
Rahul Singh [Mon, 22 Mar 2021 16:11:39 +0000 (16:11 +0000)]
xen/arm: smmuv1: Intelligent SMR allocation

Backport 588888a7399db352d2b1a41c9d5b3bf0fd482390
"iommu/arm-smmu: Intelligent SMR allocation" from the Linux kernel

This patch fixes the stream match conflict issue when two devices have the
same stream-id.

The only differences from the Linux patch when applying this patch are as
follows:
1. A spinlock is used in place of the mutex when attaching a device to the
   SMMU via the arm_smmu_master_alloc_smes(..) function call. Replacing the
   mutex with a spinlock is fine here as we are only configuring the
   hardware via registers, which is very fast.

2. The iommu_group_alloc(..) function call in arm_smmu_add_device(..) is
   moved from the start of the function to the end.

Original commit message:
    iommu/arm-smmu: Intelligent SMR allocation

    Stream Match Registers are one of the more awkward parts of the SMMUv2
    architecture; there are typically never enough to assign one to each
    stream ID in the system, and configuring them such that a single ID
    matches multiple entries is catastrophically bad - at best, every
    transaction raises a global fault; at worst, they go *somewhere*.

    To address the former issue, we can mask ID bits such that a single
    register may be used to match multiple IDs belonging to the same device
    or group, but doing so also heightens the risk of the latter problem
    (which can be nasty to debug).

    Tackle both problems at once by replacing the simple bitmap allocator
    with something much cleverer. Now that we have convenient in-memory
    representations of the stream mapping table, it becomes straightforward
    to properly validate new SMR entries against the current state, opening
    the door to arbitrary masking and SMR sharing.

    Another feature which falls out of this is that with IDs shared by
    separate devices being automatically accounted for, simply associating a
    group pointer with the S2CR offers appropriate group allocation almost
    for free, so hook that up in the process.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agoxen/arm: smmuv1: Add a stream map entry iterator
Rahul Singh [Mon, 22 Mar 2021 16:11:38 +0000 (16:11 +0000)]
xen/arm: smmuv1: Add a stream map entry iterator

Backport commit d3097e39302083d58922a3d1032d7d59a63d263d
"iommu/arm-smmu: Add a stream map entry iterator" from the Linux kernel.

This patch is the preparatory work to fix the stream match conflict
when two devices have the same stream-id.

Original commit message:
    iommu/arm-smmu: Add a stream map entry iterator

    We iterate over the SMEs associated with a master config quite a lot in
    various places, and are about to do so even more. Let's wrap the idiom
    in a handy iterator macro before the repetition gets out of hand.

Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
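
As an illustration of wrapping the iteration idiom in a macro, here is a small sketch. The structure and macro names are hypothetical and much simpler than the driver's master config; only the shape of the iterator is the point.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_IDS 8

/* Hypothetical master config: one stream map entry index per stream ID,
 * with a negative index meaning "not allocated yet". */
struct sketch_master_cfg {
    size_t num_ids;
    int    smendx[SKETCH_MAX_IDS];
};

/* Visit each stream ID slot 'i' of 'cfg', loading its entry index into
 * 'idx'. The index is only read while 'i' is in range. */
#define for_each_cfg_sme_sketch(cfg, i, idx) \
    for ((i) = 0; \
         (i) < (cfg)->num_ids && ((idx) = (cfg)->smendx[i], 1); \
         ++(i))

/* Example user: count how many stream IDs already have an entry. */
static size_t count_allocated(const struct sketch_master_cfg *cfg)
{
    size_t i, n = 0;
    int idx;

    assert(cfg->num_ids <= SKETCH_MAX_IDS);
    for_each_cfg_sme_sketch(cfg, i, idx)
        if (idx >= 0)
            n++;
    return n;
}
```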
4 years agoxen/arm: smmuv1: Keep track of S2CR state
Rahul Singh [Mon, 22 Mar 2021 16:11:37 +0000 (16:11 +0000)]
xen/arm: smmuv1: Keep track of S2CR state

Backport commit 8e8b203eabd8b9e96d02d6339e4abce3e5a7ea4b
"iommu/arm-smmu: Keep track of S2CR state" from the Linux kernel.

This patch is the preparatory work to fix the stream match conflict
when two devices have the same stream-id.

Original commit message:
    iommu/arm-smmu: Keep track of S2CR state

    Making S2CRs first-class citizens within the driver with a high-level
    representation of their state offers a neat solution to a few problems:

    Firstly, the information about which context a device's stream IDs are
    associated with is already present by necessity in the S2CR. With that
    state easily accessible we can refer directly to it and obviate the need
    to track an IOMMU domain in each device's archdata (its earlier purpose
    of enforcing correct attachment of multi-device groups now being handled
    by the IOMMU core itself).

    Secondly, the core API now deprecates explicit domain detach and expects
    domain attach to move devices smoothly from one domain to another; for
    SMMUv2, this notion maps directly to simply rewriting the S2CRs assigned
    to the device. By giving the driver a suitable abstraction of those
    S2CRs to work with, we can massively reduce the overhead of the current
    heavy-handed "detach, free resources, reallocate resources, attach"
    approach.

    Thirdly, making the software state hardware-shaped and attached to the
    SMMU instance once again makes suspend/resume of this register group
    that much simpler to implement in future.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agoxen/arm: smmuv1: Consolidate stream map entry state
Rahul Singh [Mon, 22 Mar 2021 16:11:36 +0000 (16:11 +0000)]
xen/arm: smmuv1: Consolidate stream map entry state

Backport commit 1f3d5ca43019bff1105838712d55be087d93c0da
"iommu/arm-smmu: Consolidate stream map entry state" from the Linux
kernel.

This patch is the preparatory work to fix the stream match conflict
when two devices have the same stream-id.

Original commit message:
    iommu/arm-smmu: Consolidate stream map entry state

    In order to consider SMR masking, we really want to be able to validate
    ID/mask pairs against existing SMR contents to prevent stream match
    conflicts, which at best would cause transactions to fault unexpectedly,
    and at worst lead to silent unpredictable behaviour. With our SMMU
    instance data holding only an allocator bitmap, and the SMR values
    themselves scattered across master configs hanging off devices which we
    may have no way of finding, there's essentially no way short of digging
    everything back out of the hardware. Similarly, the thought of power
    management ops to support suspend/resume faces the exact same problem.

    By massaging the software state into a closer shape to the underlying
    hardware, everything comes together quite nicely; the allocator and the
    high-level view of the data become a single centralised state which we
    can easily keep track of, and to which any updates can be validated in
    full before being synchronised to the hardware itself.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agoxen/arm: smmuv1: Handle stream IDs more dynamically
Rahul Singh [Mon, 22 Mar 2021 16:11:35 +0000 (16:11 +0000)]
xen/arm: smmuv1: Handle stream IDs more dynamically

Backport commit 21174240e4f4439bb8ed6c116cdbdc03eba2126e
"iommu/arm-smmu: Handle stream IDs more dynamically" from the Linux
kernel.

This patch is the preparatory work to fix the stream match conflict
when two devices have the same stream-id.

Original commit message:
    iommu/arm-smmu: Handle stream IDs more dynamically

    Rather than assuming fixed worst-case values for stream IDs and SMR
    masks, keep track of whatever implemented bits the hardware actually
    reports. This also obviates the slightly questionable validation of SMR
    fields in isolation - rather than aborting the whole SMMU probe for a
    hardware configuration which is still architecturally valid, we can
    simply refuse masters later if they try to claim an unrepresentable ID
    or mask (which almost certainly implies a DT error anyway).

Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agoarm: Add Kconfig entry to select CONFIG_DTB_FILE
Michal Orzel [Mon, 22 Mar 2021 08:17:15 +0000 (09:17 +0100)]
arm: Add Kconfig entry to select CONFIG_DTB_FILE

Currently, in order to link an existing DTB into the Xen image,
we need to either specify the option CONFIG_DTB_FILE on the
command line or manually add it into .config.
Add a Kconfig entry, CONFIG_DTB_FILE,
to be able to provide the path to the DTB we want to embed
into the Xen image. If no path is provided, the DTB will not
be embedded.

Remove the line: AFLAGS-y += -DCONFIG_DTB_FILE=\"$(CONFIG_DTB_FILE)\"
as it is not needed since Kconfig will define it in a header
with all the other config options.

Move definition of _sdtb into dtb.S to prevent defining it
if there is no reference to it or if someone protects
_sdtb with #ifdef rather than with .ifnes. If the latter,
we will get a compiler error.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years agoxen: introduce XENFEAT_direct_mapped and XENFEAT_not_direct_mapped
Stefano Stabellini [Fri, 12 Mar 2021 23:16:32 +0000 (15:16 -0800)]
xen: introduce XENFEAT_direct_mapped and XENFEAT_not_direct_mapped

Introduce two feature flags to tell the domain whether it is
direct-mapped or not. It allows the guest kernel to make informed
decisions on things such as swiotlb-xen enablement.

The introduction of both flags (XENFEAT_direct_mapped and
XENFEAT_not_direct_mapped) allows the guest kernel to avoid any
guesswork if one of the two is present, or fallback to the current
checks if neither of them is present.

XENFEAT_direct_mapped is always set for not auto-translated guests.

For auto-translated guests, only Dom0 on ARM is direct-mapped. Also,
see is_domain_direct_mapped() which refers to auto-translated guests:
xen/include/asm-arm/domain.h:is_domain_direct_mapped
xen/include/asm-x86/domain.h:is_domain_direct_mapped

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
CC: jbeulich@suse.com
CC: andrew.cooper3@citrix.com
CC: julien@xen.org
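
The decision a guest kernel can make from the two flags could be sketched as below. The bit positions and function names are placeholders chosen for this example, and the fallback argument stands for whatever checks the guest used before the flags existed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions, for illustration only. */
#define SKETCH_FEAT_NOT_DIRECT_MAPPED 16
#define SKETCH_FEAT_DIRECT_MAPPED     17

static bool has_feat(uint32_t features, unsigned int bit)
{
    assert(bit < 32);
    return (features >> bit) & 1u;
}

/* If either flag is advertised, trust it; otherwise fall back to the
 * result of the pre-existing heuristic ('legacy_guess'). */
static bool domain_is_direct_mapped(uint32_t features, bool legacy_guess)
{
    if (has_feat(features, SKETCH_FEAT_DIRECT_MAPPED))
        return true;
    if (has_feat(features, SKETCH_FEAT_NOT_DIRECT_MAPPED))
        return false;
    return legacy_guess;
}
```

Having both a positive and a negative flag is what lets the guest distinguish "not direct-mapped" from "hypervisor too old to say".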
4 years agoxen/arm: Use register_t type of cpuinfo entries
Bertrand Marquis [Mon, 15 Mar 2021 10:38:30 +0000 (10:38 +0000)]
xen/arm: Use register_t type of cpuinfo entries

All cpu identification registers that we store in the cpuinfo structure
are 64bit on arm64 and 32bit on arm32 so storing the values in 32bit on
arm64 is removing the higher bits which might contain information in the
future.

This patch is changing the types in cpuinfo to register_t (which is
32bit on arm32 and 64bit on arm64) and adding the necessary paddings
inside the unions.
For consistency uint64_t entries are also changed to register_t on 64bit
systems.

It also fixes all prints that directly use the bit values from cpuinfo
so that they use PRIregister and print all bits available on the
architecture.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
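
To make the truncation problem concrete, here is a sketch that models only the arm64 case. The typedef and format macro are stand-ins named for this example (the real register_t and PRIregister definitions are per-architecture and are not reproduced here).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins modelling a 64-bit architecture only. */
typedef uint64_t sketch_register_t;
#define SKETCH_PRIregister "016" PRIx64

/* What storing an ID register in a 32-bit field does: upper bits vanish. */
static uint32_t store_narrow(sketch_register_t v)
{
    return (uint32_t)v;
}

/* Print every bit of the register, zero-padded to the full width. */
static int format_reg(char *buf, size_t len, sketch_register_t v)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%" SKETCH_PRIregister, v);
}
```

A value with bit 32 set survives in the register-sized type and shows up in the formatted output, whereas the 32-bit copy silently drops it.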
4 years agoxenstore: handle daemon creation errors
Norbert Manthey [Fri, 26 Feb 2021 14:41:39 +0000 (15:41 +0100)]
xenstore: handle daemon creation errors

In rare cases, the path to the daemon socket cannot be created as it is
longer than PATH_MAX. Instead of failing with a NULL pointer dereference,
terminate the application with an error message.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
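
The failure mode described here (a path longer than PATH_MAX leading to a NULL pointer) can be sketched as follows. The function name and path layout are assumptions made for illustration; the xenstore code itself is not reproduced.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* fallback when the platform header omits it */
#endif

/* Build "<dir>/<name>" in a freshly allocated buffer. Returns NULL if the
 * allocation fails or the result would not fit in PATH_MAX bytes, so the
 * caller must check before using the result. */
static char *build_socket_path(const char *dir, const char *name)
{
    char *path;
    int len;

    assert(dir != NULL && name != NULL);

    path = malloc(PATH_MAX);
    if (!path)
        return NULL;

    len = snprintf(path, PATH_MAX, "%s/%s", dir, name);
    if (len < 0 || len >= PATH_MAX) {
        free(path);
        return NULL;
    }
    return path;
}
```

A caller in the spirit of the fix would test the result for NULL, print an error message and exit, rather than passing the pointer on.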
4 years agoxenstore_client: handle memory on error
Norbert Manthey [Fri, 26 Feb 2021 14:41:38 +0000 (15:41 +0100)]
xenstore_client: handle memory on error

In case a command fails, also free the memory. As this is for the CLI
client, currently the leaked memory is freed right after receiving the
error, as the application terminates next.

Similarly, if the allocation fails, do not use the NULL pointer
afterwards, but instead error out.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Thomas Friebel <friebelt@amazon.de>
Reviewed-by: Julien Grall <jgrall@amazon.co.uk>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agoxen/arm: mm: flush_page_to_ram() only need to clean to PoC
Julien Grall [Sat, 20 Feb 2021 17:54:13 +0000 (17:54 +0000)]
xen/arm: mm: flush_page_to_ram() only need to clean to PoC

At the moment, flush_page_to_ram() both cleans and invalidates the page
to PoC.

The goal of flush_page_to_ram() is to prevent corruption when the guest
has disabled the cache (the cache line may be dirty) and to prevent the
guest from reading previous content.

Per this definition, invalidating the line is not necessary. In fact, it
may be counter-productive as the line may be (speculatively) accessed
shortly afterwards, which would incur an expensive access to memory.

More generally, we should avoid interfering too much with the cache.
Therefore, flush_page_to_ram() is updated to only clean the page to PoC.

The performance impact of this change will depend on your
workload/processor.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agox86/EFI: drop stale section special casing when generating base relocs
Jan Beulich [Thu, 1 Apr 2021 14:44:24 +0000 (16:44 +0200)]
x86/EFI: drop stale section special casing when generating base relocs

As of commit a6066af5b142 ("xen/init: Annotate all command line
parameter infrastructure as const") .init.setup has been part of .init.
As of commit 544ad7f5caf5 ("xen/init: Move initcall infrastructure into
.init.data") .initcall* have been part of .init. Hence neither can be
encountered as a stand-alone section in the final binaries anymore.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/ucode: log blob date also for AMD
Jan Beulich [Thu, 1 Apr 2021 14:43:50 +0000 (16:43 +0200)]
x86/ucode: log blob date also for AMD

Like Intel, AMD also records the date in their blobs. The field was
merely misnamed as "data_code" so far; this was perhaps meant to be
"date_code". Split it into individual fields, just like we did for Intel
some time ago, and extend the message logged after a successful update.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/vioapic: issue EOI to dpci when switching pin to edge trigger mode
Roger Pau Monné [Thu, 1 Apr 2021 14:42:54 +0000 (16:42 +0200)]
x86/vioapic: issue EOI to dpci when switching pin to edge trigger mode

When an IO-APIC pin is switched from level to edge trigger mode the
IRR bit is cleared, so it can be used as a way to EOI an interrupt at
the IO-APIC level.

Such EOI however does not get forwarded to the dpci code like it's
done for the local APIC initiated EOI. This change adds the code in
order to notify dpci of such EOI, so that dpci and the interrupt
controller are in sync.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/vioapic: top word redir entry writes don't trigger interrupts
Roger Pau Monné [Thu, 1 Apr 2021 14:41:48 +0000 (16:41 +0200)]
x86/vioapic: top word redir entry writes don't trigger interrupts

Top word writes just update the destination of the interrupt, but
since there's no change to the masking or the triggering mode, no
guest interrupt injection can result from such a write. Add an assert to
that effect.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoCHANGELOG.md: Make PV shim smaller by factoring out HVM-specific shadow code
George Dunlap [Wed, 24 Mar 2021 17:24:31 +0000 (17:24 +0000)]
CHANGELOG.md: Make PV shim smaller by factoring out HVM-specific shadow code

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoCHANGELOG.md: Add entries for emulation
George Dunlap [Wed, 24 Mar 2021 13:24:45 +0000 (13:24 +0000)]
CHANGELOG.md: Add entries for emulation

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoCHANGELOG.md: Add entries for CI loop
George Dunlap [Wed, 24 Mar 2021 16:20:28 +0000 (16:20 +0000)]
CHANGELOG.md: Add entries for CI loop

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoCHANGELOG.md: NetBSD lib/gnttab support
George Dunlap [Tue, 23 Mar 2021 17:06:20 +0000 (17:06 +0000)]
CHANGELOG.md: NetBSD lib/gnttab support

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoCHANGELOG.md: Add dom0/domU zstd compression support
George Dunlap [Tue, 23 Mar 2021 16:58:42 +0000 (16:58 +0000)]
CHANGELOG.md: Add dom0/domU zstd compression support

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoCHANGELOG.md: Add named PCI devices
George Dunlap [Tue, 23 Mar 2021 16:52:25 +0000 (16:52 +0000)]
CHANGELOG.md: Add named PCI devices

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Ian Jackson <ian.jackson@citrix.com>
4 years agoIntel Processor Trace Support: Add CHANGELOG.md and SUPPORT.md entries
George Dunlap [Tue, 23 Mar 2021 13:55:57 +0000 (13:55 +0000)]
Intel Processor Trace Support: Add CHANGELOG.md and SUPPORT.md entries

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Ian Jackson <ian.jackson@citrix.com>
4 years agox86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()
Jan Beulich [Tue, 30 Mar 2021 13:32:59 +0000 (15:32 +0200)]
x86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()

15 apparently once used to be the last valid type to request a callback
for, and the dimension of the respective array. The arrays meanwhile are
larger than this (in a benign way, i.e. no caller ever sets a mask bit
higher than 15), dimensioned by SH_type_unused. Have the ASSERT()s
follow suit and add build time checks at the call sites.

Sadly at least some Clang versions aren't as flexible with
_Static_assert() as gcc is - they demand a truly integer constant
expression, while gcc also permits constant variables.

Also adjust a comment naming the wrong of the two functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
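
A build-time check of this kind might be sketched as below. The enum values and mask are invented for the example. The mask is built from enum constants and macros, not a const variable, so that it is a true integer constant expression, which is what the commit message says some Clang versions demand of _Static_assert().

```c
#include <assert.h>

/* Hypothetical type numbering; the last enumerator doubles as the count. */
enum sketch_sh_type {
    SKETCH_SH_type_none,
    SKETCH_SH_type_l1,
    SKETCH_SH_type_l2,
    SKETCH_SH_type_unused
};

#define SKETCH_SHF(t) (1u << (t))

/* True when no bit at or above the type count is set. */
#define SKETCH_MASK_OK(mask) (((mask) >> SKETCH_SH_type_unused) == 0)

#define SKETCH_CB_MASK \
    (SKETCH_SHF(SKETCH_SH_type_l1) | SKETCH_SHF(SKETCH_SH_type_l2))

/* Build-time check at the "call site": fails to compile if the mask ever
 * names a type outside the callback array. */
_Static_assert(SKETCH_MASK_OK(SKETCH_CB_MASK),
               "callback mask names an invalid type");

/* Run-time counterpart: the ASSERT()-style check follows the same bound
 * instead of a literal number. */
static unsigned int count_callbacks(unsigned int mask)
{
    unsigned int n = 0;

    assert(SKETCH_MASK_OK(mask));
    for (unsigned int t = 0; t < SKETCH_SH_type_unused; t++)
        if (mask & SKETCH_SHF(t))
            n++;
    return n;
}
```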
4 years agoRevert "x86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()"
Jan Beulich [Tue, 30 Mar 2021 13:31:25 +0000 (15:31 +0200)]
Revert "x86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()"

This reverts commit c201d303e801a949b10f9e0f36cdc1938ddd399e - a
stale version (not working with clang) ended up getting committed.

4 years agoVT-d: restore flush hooks when disabling qinval
Jan Beulich [Tue, 30 Mar 2021 12:40:24 +0000 (14:40 +0200)]
VT-d: restore flush hooks when disabling qinval

Leaving the hooks untouched is at best a latent risk: There may well be
cases where some flush is needed, which then needs carrying out the
"register" way.

Switch from u<N> to uint<N>_t while needing to touch the function
headers anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agoVT-d: re-order register restoring in vtd_resume()
Jan Beulich [Tue, 30 Mar 2021 12:39:54 +0000 (14:39 +0200)]
VT-d: re-order register restoring in vtd_resume()

For one FECTL must be written last - the interrupt shouldn't be unmasked
without first having written the data and address needed to actually
deliver it. In the common case (when dma_msi_set_affinity() doesn't end
up bailing early) this happens from init_vtd_hw(), but for this to
actually have the intended effect we shouldn't subsequently overwrite
what was written there - this is only benign when old and new settings
match. Instead we should restore the registers ahead of calling
init_vtd_hw(), just for the unlikely case of dma_msi_set_affinity()
bailing early.

In the moved code drop some stray casts as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agoVT-d: leave FECTL write to vtd_resume()
Jan Beulich [Tue, 30 Mar 2021 12:39:23 +0000 (14:39 +0200)]
VT-d: leave FECTL write to vtd_resume()

We shouldn't blindly unmask the interrupt when resuming. vtd_resume()
will restore the intended state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agox86: fix build when NR_CPUS == 1
Jan Beulich [Tue, 30 Mar 2021 12:38:45 +0000 (14:38 +0200)]
x86: fix build when NR_CPUS == 1

In this case the compiler is recognizing that no valid array indexes
remain (in x2apic_cluster()'s access to per_cpu(cpu_2_logical_apicid,
...)), but oddly enough isn't really consistent about the checking it
does (see the code comment).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agohvm/mtrr: remove unnecessary is_hvm_domain check
Roger Pau Monné [Tue, 30 Mar 2021 12:37:53 +0000 (14:37 +0200)]
hvm/mtrr: remove unnecessary is_hvm_domain check

epte_get_entry_emt will only be called for HVM domains, so the
is_hvm_domain check is unnecessary. It's a remnant of PVHv1.

Shouldn't result in any functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agopublic: add RING_COPY_RESPONSE()
Marek Marczykowski-Górecki [Tue, 30 Mar 2021 12:37:38 +0000 (14:37 +0200)]
public: add RING_COPY_RESPONSE()

Using RING_GET_RESPONSE() on a shared ring is easy to use incorrectly
(i.e., by not considering that the other end may alter the data in the
shared ring while it is being inspected). Safe usage of a response
generally requires taking a local copy.

Provide a RING_COPY_RESPONSE() macro to use instead of
RING_GET_RESPONSE() and an open-coded memcpy().  This takes care of
ensuring that the copy is done correctly regardless of any possible
compiler optimizations.

Use a volatile source to prevent the compiler from reordering or
omitting the copy.

This generalizes similar RING_COPY_REQUEST() macro added in 3f20b8def0.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agoxen/decompress: make helper symbols static
Jan Beulich [Tue, 30 Mar 2021 12:33:48 +0000 (14:33 +0200)]
xen/decompress: make helper symbols static

The individual decompression CUs need to only surface their top level
functions to other code. Arrange for everything else to be static, to
make sure no undue uses of that code exist or will appear without
explicitly noticing. (In some cases this also results in code size
reduction, but since this is all init-only code this probably doesn't
matter very much.)

In the LZO case also take the opportunity and convert u8 where lines
get touched anyway.

The downside is that the top level functions will now be non-static
in stubdom builds of libxenguest, but I think that's acceptable. This
does require declaring them first, though, as the compiler warns about
the lack of declarations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agox86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()
Jan Beulich [Tue, 30 Mar 2021 12:32:44 +0000 (14:32 +0200)]
x86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()

15 apparently once used to be the last valid type to request a callback
for, and the dimension of the respective array. The arrays meanwhile are
larger than this (in a benign way, i.e. no caller ever sets a mask bit
higher than 15), dimensioned by SH_type_unused. Have the ASSERT()s
follow suit and add build time checks at the call sites.

Also adjust a comment naming the wrong of the two functions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
4 years agotools/xenstored: Remove unnecessary define XC_WANT_COMPAT_MAP_FOREIGN_API
Julien Grall [Thu, 25 Mar 2021 11:39:23 +0000 (11:39 +0000)]
tools/xenstored: Remove unnecessary define XC_WANT_COMPAT_MAP_FOREIGN_API

The last use of the compat foreign API was dropped in commit
38eeb3864de4 "tools/xenstored: Drop mapping of the ring via foreign
map".

Therefore, we don't need to define XC_WANT_COMPAT_MAP_FOREIGN_API.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
4 years agox86/HPET: don't enable legacy replacement mode unconditionally
Jan Beulich [Wed, 24 Mar 2021 10:34:32 +0000 (11:34 +0100)]
x86/HPET: don't enable legacy replacement mode unconditionally

Commit e1de4c196a2e ("x86/timer: Fix boot on Intel systems using ITSSPRC
static PIT clock gating") was reported to cause boot failures on certain
AMD Ryzen systems. Until we can figure out what the actual issue there
is, skip this new part of HPET setup by default. Introduce a "hpet"
command line option to allow enabling this on hardware where it's really
needed for Xen to boot successfully (i.e. where the PIT doesn't drive
the timer interrupt).

Since it makes little sense to introduce just "hpet=legacy-replacement",
also allow for a boolean argument as well as "broadcast" to replace the
separate "hpetbroadcast" option.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoCHANGELOG: Correct sub-section headings
Ian Jackson [Wed, 24 Mar 2021 16:30:30 +0000 (16:30 +0000)]
CHANGELOG: Correct sub-section headings

Signed-off-by: Ian Jackson <iwj@xenproject.org>
(cherry picked from commit 0f93d79a97121c55d3f3e26304d437ddb38de6a7)

4 years agotools/libfsimage: Bump SONAME to 4.16
Andrew Cooper [Thu, 25 Mar 2021 19:40:58 +0000 (19:40 +0000)]
tools/libfsimage: Bump SONAME to 4.16

Fixes: a04509d34d ("Branching: Update version files etc. for newly unstable")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/mem_sharing: copy parent VM's hostp2m's max_mapped_pfn during forking
Tamas K Lengyel [Fri, 26 Mar 2021 15:17:07 +0000 (16:17 +0100)]
x86/mem_sharing: copy parent VM's hostp2m's max_mapped_pfn during forking

When creating a VM fork, copy the parent VM's hostp2m max_mapped_pfn value. Some
toolstacks rely on the XENMEM_maximum_gpfn value to establish the maximum
addressable physical memory in the VM, and for forks that have not yet been
unpaused that value is not going to reflect the correct max gpfn that's
possible to populate into the p2m. This patch fixes the issue.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agochangelog: note MSR access change
Roger Pau Monné [Fri, 26 Mar 2021 15:16:48 +0000 (16:16 +0100)]
changelog: note MSR access change

The change to deny all accesses to MSR indexes not explicitly handled
prevents leaking unwanted data into guests.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoRevert "x86/msr: drop compatibility #GP handling in guest_{rd,wr}msr()"
Andrew Cooper [Fri, 26 Mar 2021 15:08:39 +0000 (16:08 +0100)]
Revert "x86/msr: drop compatibility #GP handling in guest_{rd,wr}msr()"

In hindsight, this was a poor move.  Some of these MSRs require probing for,
cause unhelpful spew into xl dmesg, or cause spew from unit tests explicitly
checking behaviour.

This restores behaviour close to that of Xen 4.14, meaning in particular
that for all of the MSRs getting re-added explicitly a #GP fault will get
raised irrespective of the new "msr_relaxed" setting.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agodocs/misc: xenstored: Re-instate and tweak the documentation for XS_RESUME
Julien Grall [Thu, 25 Mar 2021 17:46:30 +0000 (17:46 +0000)]
docs/misc: xenstored: Re-instate and tweak the documentation for XS_RESUME

Commit 13dd372834a4 removed the documentation for XS_RESUME, however
this command is still implemented (at least in C Xenstored) and used by
libxl when resuming a domain.

So re-instate the documentation for XS_RESUME. Take the opportunity
to update it as there is a user of the command.

Fixes: 13dd372834a4 ("docs/designs: re-work the xenstore migration document...")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agodocs/design: Update xenstore-migration.md
Julien Grall [Thu, 25 Mar 2021 10:42:46 +0000 (10:42 +0000)]
docs/design: Update xenstore-migration.md

It is not very clear that the shared page address is not contained in the
connection record. Additionally, it is misleading to say the grant
will always point to the shared page, as a domain is free to revoke the
permission. The restore code would need to make sure it doesn't
fail/crash if this happens.

The sentence is now replaced with a paragraph explaining why the GFN is
not preserved and that the grant is not guaranteed to exist during
restore.

Take the opportunity to replace "code" with "node" when describing the
permission.

Reported-by: Raphael Ning <raphning@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen: Drop "-rc" suffix from XEN_EXTRAVERSION
Andrew Cooper [Thu, 25 Mar 2021 15:05:55 +0000 (15:05 +0000)]
xen: Drop "-rc" suffix from XEN_EXTRAVERSION

Fixes: a04509d34d ("Branching: Update version files etc. for newly unstable")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoBranching: Rerun autoconf to put version right
Ian Jackson [Wed, 24 Mar 2021 16:27:45 +0000 (16:27 +0000)]
Branching: Rerun autoconf to put version right

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agoBranching: Update version files etc. for newly unstable
Ian Jackson [Wed, 24 Mar 2021 16:26:36 +0000 (16:26 +0000)]
Branching: Update version files etc. for newly unstable

Signed-off-by: Ian Jackson <iwj@xenproject.org>
4 years agoVT-d: correct off-by-1 in number-of-IOMMUs check
Jan Beulich [Tue, 23 Mar 2021 16:01:30 +0000 (17:01 +0100)]
VT-d: correct off-by-1 in number-of-IOMMUs check

Otherwise, if we really run on a system with this many IOMMUs,
entering/leaving S3 would overrun iommu_state[].

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
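
To illustrate the kind of off-by-one described in the VT-d commit above, here is a minimal sketch in C. It is not the Xen code: MAX_UNITS and both function names are hypothetical, and the array being protected is only assumed to be sized like iommu_state[].

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capacity of a per-unit state array such as iommu_state[]. */
#define MAX_UNITS 4

/*
 * 'registered' units already occupy slots 0 .. registered-1, so the next
 * unit would be stored in slot 'registered'.
 */

/* Off-by-one: accepts registered == MAX_UNITS, i.e. slot MAX_UNITS. */
bool may_register_buggy(unsigned int registered)
{
    return !(registered > MAX_UNITS);
}

/* Corrected: refuses as soon as every slot is taken. */
bool may_register_fixed(unsigned int registered)
{
    return !(registered >= MAX_UNITS);
}

/* Small self-check of the boundary case. */
void demo(void)
{
    assert(may_register_buggy(MAX_UNITS));    /* would overrun the array */
    assert(!may_register_fixed(MAX_UNITS));   /* rejected */
    assert(may_register_fixed(MAX_UNITS - 1));
}
```

The two checks only differ when the count equals the capacity, which is why such a bug stays hidden until a system really has that many units.
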
4 years agoSUPPORT.MD: Mark LiveUpdate of C/OCaml xenstored daemon as Tech Preview
Julien Grall [Sat, 13 Mar 2021 13:50:44 +0000 (13:50 +0000)]
SUPPORT.MD: Mark LiveUpdate of C/OCaml xenstored daemon as Tech Preview

Support for LiveUpdate of the C/OCaml xenstored daemons was added during the
4.15 development cycle. Add two new sections in SUPPORT.MD to explain
what the support state is.

For now, it is a tech preview.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: Fix domain soft reset state handling
Anthony PERARD [Wed, 24 Feb 2021 18:39:20 +0000 (18:39 +0000)]
libxl: Fix domain soft reset state handling

In do_domain_soft_reset(), a `libxl__domain_suspend_state' is used
without being properly initialised and disposed of. This leads to an
abort() in libxl due to the `dsps.qmp' state being used before being
initialised:
    libxl__ev_qmp_send: Assertion `ev->state == qmp_state_disconnected || ev->state == qmp_state_connected' failed.

Once initialised, `dsps' also needs to be disposed of as the `qmp'
state might still be in the `Connected' state in the callback for
libxl__domain_suspend_device_model(). So this patch adds
libxl__domain_suspend_dispose() which can be called from the two
places where we need to dispose of `dsps'.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Tested-by: Olaf Hering <olaf@aepfle.de>
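
The init/use/dispose discipline described in the libxl commit above can be sketched as follows. This is an illustration only: the types and function names are hypothetical stand-ins, not the libxl API, and the failed assertion is modelled as a false return value so the behaviour can be checked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the QMP event state. */
typedef enum { ST_DISCONNECTED = 1, ST_CONNECTED = 2 } conn_state;

/* Hypothetical stand-in for a suspend state that embeds a QMP event. */
typedef struct { conn_state qmp; } suspend_state;

/* Put the embedded state into a known, valid starting state. */
void suspend_state_init(suspend_state *s)
{
    s->qmp = ST_DISCONNECTED;
}

/* Refuses to run on a state that was never initialised. */
bool suspend_state_send(suspend_state *s)
{
    if (s->qmp != ST_DISCONNECTED && s->qmp != ST_CONNECTED)
        return false;               /* libxl would hit an assertion here */
    s->qmp = ST_CONNECTED;
    return true;
}

/* Release the connection so nothing is left in the connected state. */
void suspend_state_dispose(suspend_state *s)
{
    s->qmp = ST_DISCONNECTED;
}

void demo(void)
{
    suspend_state s = { .qmp = 0 };      /* models an uninitialised struct */

    assert(!suspend_state_send(&s));     /* use before init is rejected */
    suspend_state_init(&s);
    assert(suspend_state_send(&s));
    suspend_state_dispose(&s);
    assert(s.qmp == ST_DISCONNECTED);
}
```
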
4 years agoxen: Bump the minimum version of GCC supported to 4.9 for arm32 and 5.1 on arm64
Julien Grall [Sat, 6 Mar 2021 21:41:48 +0000 (21:41 +0000)]
xen: Bump the minimum version of GCC supported to 4.9 for arm32 and 5.1 on arm64

Compilers older than 4.8 have known codegen issues which can lead to
silent miscompilation:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145

Furthermore, pre-4.9 GCC versions have known bugs (including things like
internal compiler errors on Arm) which would require workarounds (I
haven't checked if we have any in Xen).

The minimum version of GCC to build the hypervisor on arm is now
raised to 4.9.

In addition to that, on arm64, GCC version >= 4.9 and < 5.1 have been
shown to emit memory references beyond the stack pointer, resulting in
memory corruption if an interrupt is taken after the stack pointer has
been adjusted but before the reference has been executed.

Therefore, the minimum for arm64 is raised to 5.1.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
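
The version floors stated in the GCC commit above (4.9 for arm32, 5.1 for arm64) can be expressed as a small predicate. This is a sketch for illustration, not the check Xen's build system performs; arch_t and gcc_supported() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ARCH_ARM32, ARCH_ARM64 } arch_t;

/* Returns true if GCC major.minor meets the minimum for the given arch. */
bool gcc_supported(arch_t arch, int major, int minor)
{
    int min_major = (arch == ARCH_ARM64) ? 5 : 4;
    int min_minor = (arch == ARCH_ARM64) ? 1 : 9;

    return major > min_major ||
           (major == min_major && minor >= min_minor);
}

void demo(void)
{
    assert(!gcc_supported(ARCH_ARM32, 4, 8));
    assert(gcc_supported(ARCH_ARM32, 4, 9));
    assert(!gcc_supported(ARCH_ARM64, 4, 9));
    assert(!gcc_supported(ARCH_ARM64, 5, 0));
    assert(gcc_supported(ARCH_ARM64, 5, 1));
}
```
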
4 years agotools/x86: don't rebuild cpuid-autogen.h every time
Jan Beulich [Mon, 15 Mar 2021 07:33:53 +0000 (08:33 +0100)]
tools/x86: don't rebuild cpuid-autogen.h every time

The first thing the "xen-dir" rule does is delete the entire xen/
subtree. Obviously this includes deleting xen/lib/x86/*autogen.h. As a
result there's no original version for $(move-if-changed ...) to compare
against, and hence the file and all its consumers would get rebuilt
every time. Instead only find and delete all the symlinks.

Fixes: eddf9559c977 ("libx86: generate cpuid-autogen.h in the libx86 include dir")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/AMD: expose HWCR.TscFreqSel to guests
Jan Beulich [Fri, 12 Mar 2021 11:03:06 +0000 (12:03 +0100)]
x86/AMD: expose HWCR.TscFreqSel to guests

Linux has been warning ("firmware bug") about this bit being clear for a
long time. While writable in older hardware it has been readonly on more
than just the most recent hardware. For simplicity report it always set (if
anything we may want to log the issue ourselves if it turns out to be
clear on older hardware) on CPU families 10h and up (in family 0fh the
bit is part of a larger field of different purpose).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
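
A minimal sketch of the policy in the HWCR.TscFreqSel commit above: report the bit as set for family 10h and later, and leave it clear for family 0fh. The bit position (24) is an assumption taken from AMD's public documentation, and guest_hwcr() is a hypothetical helper rather than a Xen function.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed position of TscFreqSel within the HWCR MSR. */
#define HWCR_TSC_FREQ_SEL (UINT64_C(1) << 24)

/* Value a guest would read from HWCR under the policy described above. */
uint64_t guest_hwcr(unsigned int family)
{
    uint64_t val = 0;

    if (family >= 0x10)
        val |= HWCR_TSC_FREQ_SEL;

    return val;
}

void demo(void)
{
    assert(guest_hwcr(0x0f) == 0);
    assert(guest_hwcr(0x10) == HWCR_TSC_FREQ_SEL);
    assert(guest_hwcr(0x17) == HWCR_TSC_FREQ_SEL);
}
```
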
4 years agox86/PV: conditionally avoid raising #GP for early guest MSR reads
Jan Beulich [Fri, 12 Mar 2021 11:02:42 +0000 (12:02 +0100)]
x86/PV: conditionally avoid raising #GP for early guest MSR reads

Prior to 4.15 Linux, when running in PV mode, did not install a #GP
handler early enough to cover for example the rdmsrl_safe() of
MSR_K8_TSEG_ADDR in bsp_init_amd() (not to speak of the unguarded read
of MSR_K7_HWCR later in the same function). The respective change
(42b3a4cb5609 "x86/xen: Support early interrupts in xen pv guests") was
backported to 4.14, but no further - presumably since it wasn't really
easy because of other dependencies.

Therefore, to prevent our change in the handling of guest MSR accesses
from rendering PV Linux 4.13 and older unusable on at least AMD systems, make
the raising of #GP on these paths conditional upon the guest having
installed a handler, provided of course the MSR can be read in the first
place (we would have raised #GP in that case even before). Producing
zero for reads isn't necessarily correct and may trip code trying to
detect presence of MSRs early, but since such detection logic won't work
without a #GP handler anyway, this ought to be a fair workaround.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Acked-by: Ian Jackson <iwj@xenproject.org>
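
The decision described in the x86/PV commit above can be sketched as a small function. This is an illustration under stated assumptions: msr_result and pv_unhandled_rdmsr() are hypothetical names, and the two boolean inputs stand in for state that Xen derives elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { MSR_OK, MSR_GP } msr_result;

/*
 * Read of an MSR that is not explicitly handled, by a PV guest.
 *   hw_read_faults:       reading the MSR on hardware would raise #GP
 *   guest_has_gp_handler: the guest has already installed a #GP handler
 */
msr_result pv_unhandled_rdmsr(bool hw_read_faults, bool guest_has_gp_handler,
                              uint64_t *val)
{
    if (hw_read_faults)
        return MSR_GP;      /* raised even before the behaviour change */

    if (guest_has_gp_handler)
        return MSR_GP;      /* guest can cope with the fault */

    *val = 0;               /* early boot: produce zero instead of #GP */
    return MSR_OK;
}

void demo(void)
{
    uint64_t val = 0xdeadbeef;

    assert(pv_unhandled_rdmsr(true, false, &val) == MSR_GP);
    assert(pv_unhandled_rdmsr(false, true, &val) == MSR_GP);
    assert(val == 0xdeadbeef);      /* untouched when faulting */
    assert(pv_unhandled_rdmsr(false, false, &val) == MSR_OK);
    assert(val == 0);
}
```
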
4 years agognttab: work around "may be used uninitialized" warning
Jan Beulich [Fri, 12 Mar 2021 16:35:54 +0000 (17:35 +0100)]
gnttab: work around "may be used uninitialized" warning

Sadly I was wrong to suggest dropping vaddrs' initializer during review
of v2 of the patch introducing this code. gcc 4.3 can't cope.

Fixes: 52531c734ea1 ("xen/gnttab: Rework resource acquisition")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agoxen: fix for_each_cpu when NR_CPUS=1
Dario Faggioli [Fri, 12 Mar 2021 16:02:47 +0000 (17:02 +0100)]
xen: fix for_each_cpu when NR_CPUS=1

When running a hypervisor built with NR_CPUS=1, for_each_cpu does not
take into account whether the bit of the CPU is set or not in the
provided mask.

This means that whatever we have in the bodies of these loops is always
done once, even if the mask was empty and it should never be done. This
is clearly a bug and was in fact causing an assert to trigger in credit2
code.

Removing the special casing of NR_CPUS == 1 makes things work again.

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
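
The for_each_cpu problem described in the commit above can be reproduced with a simplified model. This is a sketch, not the Xen implementation: the mask is a plain unsigned long instead of a cpumask_t, and next_cpu() and the two macro names are hypothetical.

```c
#include <assert.h>

#define NR_CPUS 1

/* First CPU >= start whose bit is set in mask, or NR_CPUS if none. */
static unsigned int next_cpu(unsigned int start, unsigned long mask)
{
    for (unsigned int cpu = start; cpu < NR_CPUS; cpu++)
        if (mask & (1UL << cpu))
            return cpu;
    return NR_CPUS;
}

/* Special case for a single CPU: the mask is evaluated but never tested. */
#define for_each_cpu_buggy(cpu, mask) \
    for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

/* General form: only visits CPUs whose bit is set in the mask. */
#define for_each_cpu_fixed(cpu, mask)                          \
    for ((cpu) = next_cpu(0, (mask)); (cpu) < NR_CPUS;         \
         (cpu) = next_cpu((cpu) + 1, (mask)))

/* Count how many times each loop body runs for a given mask. */
unsigned int count_buggy(unsigned long mask)
{
    unsigned int cpu, n = 0;

    for_each_cpu_buggy(cpu, mask)
        n++;
    return n;
}

unsigned int count_fixed(unsigned long mask)
{
    unsigned int cpu, n = 0;

    for_each_cpu_fixed(cpu, mask)
        n++;
    return n;
}

void demo(void)
{
    assert(count_buggy(0UL) == 1);  /* body runs despite an empty mask */
    assert(count_fixed(0UL) == 0);  /* empty mask: body never runs */
    assert(count_fixed(1UL) == 1);
}
```
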
4 years agovtd: make sure QI/IR are disabled before initialisation
Igor Druzhinin [Fri, 12 Mar 2021 16:01:52 +0000 (17:01 +0100)]
vtd: make sure QI/IR are disabled before initialisation

BIOS might pass control to Xen leaving QI and/or IR in an enabled and/or
partially configured state. In the x2APIC code path, where EIM is
enabled early in boot, those are correctly disabled by Xen before any
attempt to configure them. But for xAPIC that step is missing, which was
proven to cause QI initialization failures on some ICX based platforms
where QI is left pre-enabled and partially configured by BIOS. That
problem becomes hard to avoid since those platforms are shipped with
x2APIC opt out being advertised by default at the same time by firmware.

Unify the behaviour between x2APIC and xAPIC code paths keeping that in
line with what Linux does.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
4 years agox86/msr: introduce an option for compatible MSR behavior selection
Roger Pau Monné [Fri, 12 Mar 2021 07:59:56 +0000 (08:59 +0100)]
x86/msr: introduce an option for compatible MSR behavior selection

Introduce an option to allow selecting a behavior similar to the pre
Xen 4.15 one for accesses to MSRs not explicitly handled. Since commit
84e848fd7a162f669 and 322ec7c89f6640e accesses to MSRs not explicitly
handled by Xen result in the injection of a #GP to the guest. This
is a behavior change since previously a #GP was only injected if
accessing the MSR on the real hardware would also trigger a #GP, or if
the bits being set wouldn't match the hardware values (for
PV). The reasons for not leaking hardware MSR values and injecting a
#GP are fully valid, so the solution proposed here should be
considered a temporary workaround until all the required MSRs are
properly handled.

This seems to be problematic for some guests, so introduce an option
to fall back to this kind of legacy behavior without leaking the
underlying MSR values to the guest.

When the option is set, for both PV and HVM don't inject a #GP to the
guest on MSR read if reading the underlying MSR doesn't result in a
#GP. Do the same for writes, and simply discard the value to be written
in that case.

Note that for guests restored or migrated from previous Xen versions
the option is enabled by default, in order to keep a compatible
MSR behavior. Such compatibility is done at the libxl layer, to spare
higher-level toolstacks from having to know the details about this flag.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
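
The behaviour selected by the option in the x86/msr commit above can be sketched as two small functions. This is an illustration, not the Xen code: the names are hypothetical, and returning zero for relaxed reads is an assumption that is merely consistent with the stated goal of not leaking the underlying MSR values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { MSR_OK, MSR_GP } msr_result;

/*
 * Access to an MSR that is not explicitly handled.
 *   relaxed:        the compatibility option is set for this guest
 *   hw_read_faults: reading the underlying MSR would raise #GP
 */
msr_result unhandled_rdmsr(bool relaxed, bool hw_read_faults, uint64_t *val)
{
    if (!relaxed || hw_read_faults)
        return MSR_GP;

    *val = 0;           /* assumed: never expose the hardware value */
    return MSR_OK;
}

msr_result unhandled_wrmsr(bool relaxed, bool hw_read_faults, uint64_t val)
{
    if (!relaxed || hw_read_faults)
        return MSR_GP;

    (void)val;          /* the written value is discarded */
    return MSR_OK;
}

void demo(void)
{
    uint64_t val = 0x1234;

    assert(unhandled_rdmsr(false, false, &val) == MSR_GP);  /* strict mode */
    assert(unhandled_rdmsr(true, true, &val) == MSR_GP);    /* MSR absent */
    assert(val == 0x1234);
    assert(unhandled_rdmsr(true, false, &val) == MSR_OK);
    assert(val == 0);
    assert(unhandled_wrmsr(true, false, 0xff) == MSR_OK);
    assert(unhandled_wrmsr(false, false, 0xff) == MSR_GP);
}
```
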
4 years agotools/libs: Fix headers.chk logic
Andrew Cooper [Thu, 4 Mar 2021 22:30:00 +0000 (22:30 +0000)]
tools/libs: Fix headers.chk logic

c/s 4664034cd dropped the $(LIBHEADERSGLOB) dependency for the headers.chk
rule, without replacing it.

As headers.chk uses $^, a typical build looks like:

  andrewcoop@andrewcoop:/local/xen.git$ make -C tools/libs/devicemodel/
  make: Entering directory '/local/xen.git/tools/libs/devicemodel'
  for i in ; do \
      gcc -x c -ansi -Wall -Werror -I/local/xen.git/tools/libs/devicemodel/../../../tools/include \
            -S -o /dev/null $i || exit 1; \
      echo $i; \
  done >headers.chk.new
  mv headers.chk.new headers.chk

i.e. with an empty for loop.

Reinsert a $(LIBHEADERS) dependency, so more than just the $(AUTOINCS) get
checked.

Fixes: 4664034cd ("tools/libs: move official headers to common directory")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>