Andrew Cooper [Tue, 11 Aug 2020 15:05:06 +0000 (16:05 +0100)]
x86/pv: Rewrite segment context switching from scratch
There are multiple bugs with the existing implementation.
On AMD CPUs prior to Zen2, loading a NUL segment selector doesn't clear the
segment base, which is a problem for 64bit code which typically expects to use
a NUL %fs/%gs selector.
On a context switch from any PV vcpu, to a 64bit PV vcpu with an %fs/%gs
selector which faults, the fixup logic loads NUL, and the guest is entered at
the failsafe callback with the stale base.
Alternatively, a PV context switch sequence of 64 (NUL, non-zero base) =>
32 (NUL) => 64 (NUL, zero base) will similarly cause Xen to enter the guest
with a stale base.
Both of these corner cases manifest as state corruption in the final vcpu.
However, damage is limited to to 64bit code expecting to use Thread Local
Storage with a base pointer of 0, which doesn't occur by default.
The context switch logic is extremely complicated, and is attempting to
optimise away loading a NUL selector (which is fast), or writing a 64bit base
of 0 (which is rare). Furthermore, it fails to respect Linux's ABI with
userspace, which manifests as userspace state corruption as far as Linux is
concerned.
Always restore all selector and base state, in all cases.
Leave a large comment explaining hardware behaviour, and the new ABI
expectations. Update the comments in the public headers.
Drop all "segment preloading" to handle the AMD corner case. It was never
anything but a waste of time for %ds/%es, and isn't needed now that %fs/%gs
bases are unconditionally written for 64bit PV guests. In load_segments(),
store the result of is_pv_32bit_vcpu() as it is an expensive predicate now,
and not used in a way which impacts speculative safety.
Reported-by: Andy Lutomirski <luto@kernel.org> Reported-by: Sarah Newman <srn@prgmr.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 11 Aug 2020 15:05:06 +0000 (16:05 +0100)]
x86/pv: Fix consistency of 64bit segment bases
The comments in save_segments(), _toggle_guest_pt() and write_cr() are false.
The %fs and %gs bases can be updated at any time by the guest.
As a consequence, Xen's fs_base/etc tracking state is always stale when the
vcpu is in context, and must not be used to complete MSR_{FS,GS}_BASE reads, etc.
In particular, a sequence such as:
wrmsr(MSR_FS_BASE, 0x1ull << 32);
write_fs(__USER_DS);
base = rdmsr(MSR_FS_BASE);
will return the stale base, not the new base. This may cause guest a guest
kernel's context switching of userspace to malfunction.
Therefore:
* Update save_segments(), _toggle_guest_pt() and read_msr() to always read
the segment bases from hardware.
* Update write_cr(), write_msr() and do_set_segment_base() to not not waste
time caching data which is instantly going to become stale again.
* Provide comments explaining when the tracking state is and isn't stale.
This bug has been present for 14 years, but several bugfixes since have built
on and extended the original flawed logic.
Fixes: ba9adb737ba ("Apply stricter checking to RDMSR/WRMSR emulations.") Fixes: c42494acb2f ("x86: fix FS/GS base handling when using the fsgsbase feature")
Fixed: eccc170053e ("x86/pv: Don't have %cr4.fsgsbase active behind a guest kernels back") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/svm: silently drop writes to SYSCFG and related MSRs
The SYSCFG, TOP_MEM1 and TOP_MEM2 MSRs are currently exposed to guests
and writes are silently discarded. Make this explicit in the SVM code
now, and just return default constant values when attempting to read
any of the MSRs, while continuing to silently drop writes.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Such handling consist in checking that no bits have been changed from
the read value, if that's the case silently drop the write, otherwise
inject a fault.
At least Windows guests will expect to write to the MISC_ENABLE MSR
with the same value that's been read from it.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
I was based on the flase assumption that padding fields need no
copying: If native code checks such fields, they of course need
copying in. And if the ABI specifies them to be zero on
completion, we also need to copy them out.
Wei Chen [Fri, 28 Aug 2020 02:34:04 +0000 (02:34 +0000)]
xen/arm: Throw messages for unknown FP/SIMD implement ID
Arm ID_AA64PFR0_EL1 register provides two fields to describe CPU
FP/SIMD implementations. Currently, we exactly know the meaning of
0x0, 0x1 and 0xf of these fields. Xen treats value < 8 as FP/SIMD
features presented. If there is a value 0x2 bumped in the future,
Xen behaviors for value <= 0x1 can also take effect. But what Xen
done for value <= 0x1 may not always cover new value 0x2 required.
We throw these messages to break the silence when Xen detected
unknown FP/SIMD IDs to notice user to check.
Wei Chen [Fri, 28 Aug 2020 02:34:03 +0000 (02:34 +0000)]
xen/arm: Missing N1/A76/A75 FP registers in vCPU context switch
Xen has cpu_has_fp/cpu_has_simd to detect whether the CPU supports
FP/SIMD or not. But currently, these two MACROs only consider value 0
of ID_AA64PFR0_EL1.FP/SIMD as FP/SIMD features enabled. But for CPUs
that support FP/SIMD and half-precision floating-point arithmetic, the
ID_AA64PFR0_EL1.FP/SIMD are 1 (see Arm ARM DDI0487F.b, D13.2.64).
For these CPUs, xen will treat them as no FP/SIMD support, the
vfp_save/restore_state will not take effect.
From the TRM documents of Cortex-A75/A76/N1, we know these CPUs support
basic Advanced SIMD/FP and half-precision floating-point arithmetic. In
this case, on N1/A76/A75 platforms, Xen will always miss the floating
pointer registers save/restore. If different vCPUs are running on the
same pCPU, the floating pointer registers will be corrupted randomly.
Ian Jackson [Fri, 4 Sep 2020 15:42:01 +0000 (16:42 +0100)]
minios: Revert recent change and revert to working minios
Currently, xen.git#staging does not build in many environments because
of issues with minios master. This regression was introduced in an
uncontrolled manner by an update to mini-os.git#master.
This is because in e013e8514389 "config: use mini-os master for
unstable" we switched to tracking minios master in an uncontrolled
manner. At the time we thought it was unlikely that minios changes
would break the Xen build. This turns out to have been overly
optimistic.
Xen currently uses unstable internal interfaces of minios. Until this
can be sorted out, internal changes to minios can require lockstep
changes in Xen.
All this means that "config: use mini-os master for unstable" was
wrong. We should undo it. Instead, we go back to the previous
situation: xen.git names a specific minios commit.
This scheme is the model used for qemu-xen-traditional.
That nailed commit must be updated manually, to have xen.git pick up
changes from minios. If the minios changes require changes in xen.git
too, to avoid breaking the Xen build, they can be made freely in
minios without adverse consequences. When the minios commitid is
updated in xen.git, the corresponding changes to the actual source
files in xen.git should be bundled together.
For example, when minios is fixed, 8d990807ec2c "stubdom/grub: update
init_netfront() call for mini-os" will need to be reapplied, folded
into the same commit as updates MINIOS_UPSTREAM_REVISION. For now
that commit must be reverted as we are going back to a previous
version of minios.
CC: Jan Beulich <jbeulich@suse.com> CC: Costin Lupu <costin.lupu@cs.pub.ro> CC: Wei Liu <wl@xen.org> CC: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Ian Jackson <iwj@xenproject.org>
make 3.81 doesn't support multiline variables defined with
define var =
...
endef
Dropping the "=" in the first line will fix the issue.
Fixes: ded08cdfa72bb ("tools: generate most contents of library make variables") Fixes: ddb2934a914df ("stubdom: add correct dependencies for Xen libraries") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:15:21 +0000 (11:15 +0200)]
x86: only generate compat headers actually needed
As was already the case for XSM/Flask, avoid generating compat headers
when they're not going to be needed. To address resulting build issues
- move compat/hvm/dm_op.h inclusion to the only source file needing it,
- add a little bit of #ifdef-ary.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:14:28 +0000 (11:14 +0200)]
flask: drop dead compat translation code
Translation macros aren't used (and hence needed) at all (or else a
devicetree_label entry would have been missing), and userlist has been
removed quite some time ago.
No functional change.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:13:01 +0000 (11:13 +0200)]
x86: generalize padding field handling
The original intention was to ignore padding fields, but the pattern
matched only ones whose names started with an underscore. Also match
fields whose names are in line with the C spec by not having a leading
underscore. (Note that the leading ^ in the sed regexps was pointless
and hence get dropped.)
This requires adjusting some vNUMA macros, to avoid triggering
"enumeration value ... not handled in switch" warnings, which - due to
-Werror - would cause the build to fail. (I have to admit that I find
these padding fields odd, when translation of the containing structure
is needed anyway.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:11:35 +0000 (11:11 +0200)]
evtchn: add compat struct checking for newer sub-ops
Various additions to the interface did not get mirrored into the compat
handling machinery. Luckily all additions were done in ways not making
any form of translation necessary.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:08:40 +0000 (11:08 +0200)]
x86/dmop: add compat struct checking for XEN_DMOP_map_mem_type_to_ioreq_server
This was forgotten when the subop was added.
Also take the opportunity and move the dm_op_relocate_memory entry in
xlat.lst to its designated place.
No change in the resulting generated code.
Fixes: ca2b511d3ff4 ("x86/ioreq server: add DMOP to map guest ram with p2m_ioreq_server to an ioreq server") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:06:56 +0000 (11:06 +0200)]
x86/mce: add compat struct checking for XEN_MC_inject_v2
84e364f2eda2 ("x86: add CMCI software injection interface") merely made
sure things would build, without any concern about things actually
working:
- despite the addition of xenctl_bitmap to xlat.lst, the resulting macro
wasn't invoked anywhere (which would have lead to recognizing that the
structure appeared to have no fully compatible layout, despite the use
of a 64-bit handle),
- the interface struct itself was neither added to xlat.lst (and the
resulting macro then invoked) nor was any manual checking of
individual fields added.
Adjust compat header generation logic to retain XEN_GUEST_HANDLE_64(),
which is intentionally layed out to be compatible between different size
guests. Invoke the missing checking (implicitly through CHECK_mc).
No change in the resulting generated code.
Fixes: 84e364f2eda2 ("x86: add CMCI software injection interface") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:05:46 +0000 (11:05 +0200)]
x86: fix compat header generation
As was pointed out by 0e2e54966af5 ("mm: fix public declaration of
struct xen_mem_acquire_resource"), we're not currently handling structs
correctly that have uint64_aligned_t fields. #pragma pack(4) suppresses
the necessary alignment even if the type did properly survive (which
it also didn't) in the process of generating the headers. Overall,
with the above mentioned change applied, there's only a latent issue
here afaict, i.e. no other of our interface structs is currently
affected.
As a result it is clear that using #pragma pack(4) is not an option.
Drop all uses from compat header generation. Make sure
{,u}int64_aligned_t actually survives, such that explicitly aligned
fields will remain aligned. Arrange for {,u}int64_t to be transformed
into a type that's 64 bits wide and 4-byte aligned, by utilizing that
in typedef-s the "aligned" attribute can be used to reduce alignment.
Additionally, for the cases where native structures get re-used,
enforce suitable alignment via typedef-s (which allow alignment to be
reduced).
This use of typedef-s makes necessary changes to CHECK_*() macro
generation: Previously get-fields.sh relied on finding struct/union
keywords when other compound types were used. We need to use the
typedef-s (guaranteeing suitable alignment) now, and hence the script
has to recognize those cases, too. (Unfortunately there are a few
special cases to be dealt with, but this is really not much different
from e.g. the pre-existing compat_domain_handle_t special case.)
This need to use typedef-s is certainly somewhat fragile going forward,
as in similar future cases it is imperative to also use typedef-s, or
else the CHECK_*() macros won't check what they're supposed to check. I
don't currently see any means to avoid this fragility, though.
There's one change to generated code according to my observations: In
arch_compat_vcpu_op() the runstate area "area" variable would previously
have been put in a just 4-byte aligned stack slot (despite being 8 bytes
in size), whereas now it gets put in an 8-byte aligned location.
There also results some curious inconsistency in struct xen_mc from
these changes - I intend to clean this up later on. Otherwise unrelated
code would also need adjustment right here.
Additionally a note about the apparently superfluous () in
compat-build-header.py: The simpler form
Michael Kurth [Fri, 4 Sep 2020 09:01:45 +0000 (11:01 +0200)]
add additional symbols to xen-syms.map
Add "all_symbols" to all /tools/symbols calls so that
xen-syms.map lists all symbols and not only .text section
symbols. This change enhances debugging and livepatch
capabilities.
Signed-off-by: Michael Kurth <mku@amazon.de> Reviewed-by: Eslam Elnikety <elnikety@amazon.de> Reviewed-by: Julien Grall <jgrall@amazon.co.uk> Reviewed-by: Robert Stonehouse <rjstone@amazon.co.uk> Reviewed-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Acked-by: Jan Beulich <jbeulich@suse.com>
Report LFENCE_SERIALISE unconditionally for DE_CFG on AMD hardware and
silently drop writes.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 08:59:33 +0000 (10:59 +0200)]
x86: simplify is_guest_l2_slot()
is_pv_32bit_domain() has become expensive, and its use here is
redundant: Only 32-bit guests would ever get PGT_pae_xen_l2 set on
their L2 page table pages anyway. (If some other error does lead to
PGT_pae_xen_l2 ending up anywhere else, we still don't want to allow a
guest to control the entries.)
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 31 Aug 2020 11:18:42 +0000 (12:18 +0100)]
x86/pv: Fix multiple bugs with SEGBASE_GS_USER_SEL
The logic takes the segment selector unmodified from guest context. This
allowed the guest to load DPL0 descriptors into %gs. Fix up the RPL for
non-NUL selectors to be 3.
Xen's context switch logic skips saving the inactive %gs base, as it cannot be
modified by the guest behind Xen's back. This depends on Xen caching updates
to the inactive base, which is was missing from this path.
The consequence is that, following SEGBASE_GS_USER_SEL, the next context
switch will restore the stale inactive %gs base, and corrupt vcpu state.
Rework the hypercall to update the cached idea of gs_base_user, and fix the
behaviour in the case of the AMD NUL selector bug to always zero the segment
base.
Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 10 Apr 2018 15:25:40 +0000 (16:25 +0100)]
x86/intel: Expose MSR_ARCH_CAPS to dom0
The overhead of (the lack of) MDS_NO alone has been measured at 30% on some
workloads. While we're not in a position yet to offer MSR_ARCH_CAPS generally
to guests, dom0 doesn't migrate, so we can pass a subset of hardware values
straight through.
This will cause PVH dom0's not to use KPTI by default, and all dom0's not to
use VERW flushing by default, and to use eIBRS in preference to retpoline on
recent Intel CPUs.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Sun, 23 Aug 2020 08:00:12 +0000 (10:00 +0200)]
tools: tweak tools/libs/libs.mk for being able to support libxenctrl
tools/libs/libs.mk needs to be modified for being able to support
building libxenctrl, as the pkg-config file of that library is not
following the same conventions as those of the other libraries.
So add support for specifying PKG_CONFIG before including libs.mk.
In order to make life easier for unstable libraries like libxenctrl
set MAJOR and MINOR automatically to the Xen-version and 0 when not
specified. This removes the need to bump the versions of unstable
libraries when switching to a new Xen version.
As all libraries built via libs.mk require a map file generate a dummy
one in case there is none existing. This again will help avoiding the
need to bump the libarary version in the map file of an unstable
library in case it is exporting all symbols.
The clean target is missing the removal of _paths.h.
Finally drop the foreach loop when setting PKG_CONFIG_LOCAL, as there
is always only one element in PKG_CONFIG.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Costin Lupu [Fri, 28 Aug 2020 07:17:44 +0000 (09:17 +0200)]
stubdom/grub: update init_netfront() call for mini-os
This patch updates the call of init_netfront() function according to its
recently updated declaration which can also include parameters for gateway
and netmask addresses. While we are here, the patch also removes passing
the ip parameter because (a) it is not used anywhere and (b) it wastes
memory since it would reference a dynamically allocated string.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:12 +0000 (10:00 +0200)]
stubdom: simplify building xen libraries for stubdoms
The pattern for building a Xen library with sources under tools/libs
is always the same. Simplify stubdom/Makefile by defining a callable
make program for those libraries.
Even if not needed right now add the possibility for defining
additional dependencies for a library.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: generate most contents of library make variables
Library related make variables (CFLAGS_lib*, SHDEPS_lib*, LDLIBS_lib*
and SHLIB_lib*) mostly have a common pattern for their values. Generate
most of this content automatically by adding a new per-library variable
defining on which other libraries a lib is depending. Those definitions
are put into an own file in order to make it possible to include it
from various Makefiles, especially for stubdom.
This in turn makes it possible to drop the USELIB variable from each
library Makefile.
The LIBNAME variable can be dropped, too, as it can be derived from the
directory name the library is residing in.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: add a copy of library headers in tools/include
The headers.chk target in tools/Rules.mk tries to compile all headers
stand alone for testing them not to include any internal header.
Unfortunately the headers tested against are not complete, as any
header for a Xen library is not included in the include path of the
test compile run, resulting in a failure in case any of the tested
headers in including an official Xen library header.
Fix that by copying the official headers located in
tools/libs/*/include to tools/include.
In order to support libraries with header name other than xen<lib>.h
or with multiple headers add a LIBHEADER make variable a lib specific
Makefile can set in that case.
Move the headers.chk target from Rules.mk to libs.mk as it is used
for libraries in tools/libs only.
Add NO_HEADERS_CHK variable to skip checking headers as this will be
needed e.g. for libxenctrl.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: switch XEN_LIBXEN* make variables to lower case (XEN_libxen*)
In order to harmonize names of library related make variables switch
XEN_LIBXEN* names to XEN_libxen*, as all other related variables (e.g.
CFLAGS_libxen*, SHDEPS_libxen*, ...) already use this pattern.
Rename XEN_LIBXC to XEN_libxenctrl, XEN_XENSTORE to XEN_libxenstore,
XEN_XENLIGHT to XEN_libxenlight, XEN_XLUTIL to XEN_libxlutil, and
XEN_LIBVCHAN to XEN_libxenvchan for the same reason.
Introduce XEN_libxenguest with the same value as XEN_libxenctrl.
No functional change.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 10 Apr 2018 15:25:40 +0000 (16:25 +0100)]
x86: Begin to introduce support for MSR_ARCH_CAPS
... including serialisation/deserialisation logic and unit tests.
There is no current way to configure this MSR correctly for guests.
The toolstack side this logic needs building, which is far easier to
do with it in place.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 27 Aug 2020 10:48:38 +0000 (10:48 +0000)]
gitignore: ignore ebmalloc.c soft link
A previous commit split ebmalloc to its own translation unit but forgot
to modify gitignore.
Fixes: 8856a914bffd ("build: also check for empty .bss.* in .o -> .init.o conversion") Signed-off-by: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 11 Aug 2020 08:02:01 +0000 (09:02 +0100)]
xl: add 'mtu' option to network configuration
This patch adds code to parse a value for MTU from the network configuration
if it is present. The documentation in xl-network-configuration.5.pod is
also modified accordingly.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:02:00 +0000 (09:02 +0100)]
tools/hotplug/Linux: modify set_mtu() to optionally use a configured value...
...and also inform the frontend.
The set_mtu() function in xen-network-common.sh currently sets the backend
vif MTU to match that of the bridge.
A prior patch added code into libxl such that a tools-configured 'mtu' value
may be present in the xenstore backend area. If the node is present in
xenstore then it should be authoritative. Hence set_mtu() is modified to only
read the MTU of the bridge if it is not present.
The function is also modified to write whatever value it applies to the
backend vif into the xenstore frontend area where is may then be used to
configure the frontend network stack.
NOTE: There is also a small modification replacing '$mtu' with '${mtu}'
for style consistency.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:01:59 +0000 (09:01 +0100)]
libxl: wire the libxl_device_nic 'mtu' value into xenstore
Currently the 'mtu' field of libxl_device_nic objects is effectively ignored:
It is set by libxl__device_nic_setdefault() to a slightly odd default value of
1492 but otherwise ignored.
This patch changes the default value to a more conventional 1500 and modifies
libxl__set_xenstore_nic() to write the value into an 'mtu' node in the
xenstore backend area (if it is a non-default value), as well as a read-only
node of the same name in the frontend area.
The backend node is used to set the value of 'mtu' in
libxl__nic_from_xenstore(), when retrieving the configuration.
NOTE: There is currently no way to set a non-default value of 'mtu', hence
the backend node is never written. This, however, will be addressed
by a subsequent patch.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:01:58 +0000 (09:01 +0100)]
tools/hotplug/Linux: remove code duplication in vif-bridge
The 'add' and 'online' cases do exactly the same thing so have 'add' simply
fall through to 'online'.
NOTE: This patch also adds in the missing 'remove' case, which falls though
to 'offline'. (The former is passed for 'tap' devices, the latter for
'vif' devices).
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:01:57 +0000 (09:01 +0100)]
tools/hotplug/Linux: add remove_from_bridge()
This patch adds a remove_from_bridge() function into xen-network-common.sh
to partner with the existing add_to_bridge() function. The vif-bridge
script is then modified to use it.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
Paul Durrant [Tue, 11 Aug 2020 08:01:55 +0000 (09:01 +0100)]
public/io/netif: specify MTU override node
There is currently no documentation to state what MTU a frontend should
adertise to its network stack. It has however long been assumed that the
default value of 1500 is correct.
This patch specifies a mechanism to allow the tools to set the MTU via a
xenstore node in the frontend area and states that the absence of that node
means the frontend should assume an MTU of 1500 octets.
NOTE: The Windows PV frontend has used an MTU sampled from the xenstore
node specified in this patch.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wl@xen.org>
libxl: fix -Werror=stringop-truncation in libxl__prepare_sockaddr_un
In file included from /usr/include/string.h:495,
from libxl_internal.h:38,
from libxl_utils.c:20:
In function 'strncpy',
inlined from 'libxl__prepare_sockaddr_un' at libxl_utils.c:1262:5:
/usr/include/bits/string_fortified.h:106:10: error: '__builtin_strncpy' specified bound 108 equals destination size [-Werror=stringop-truncation]
106 | return __builtin___strncpy_chk (__dest, __src, __len, __bos (__dest));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
It seems xlu_pci_parse_bdf has a state machine that is too complex for
gcc to understand. The build fails with:
libxlu_pci.c: In function 'xlu_pci_parse_bdf':
libxlu_pci.c:32:18: error: 'func' may be used uninitialized in this function [-Werror=maybe-uninitialized]
32 | pcidev->func = func;
| ~~~~~~~~~~~~~^~~~~~
libxlu_pci.c:51:29: note: 'func' was declared here
51 | unsigned dom, bus, dev, func, vslot = 0;
| ^~~~
libxlu_pci.c:31:17: error: 'dev' may be used uninitialized in this function [-Werror=maybe-uninitialized]
31 | pcidev->dev = dev;
| ~~~~~~~~~~~~^~~~~
libxlu_pci.c:51:24: note: 'dev' was declared here
51 | unsigned dom, bus, dev, func, vslot = 0;
| ^~~
libxlu_pci.c:30:17: error: 'bus' may be used uninitialized in this function [-Werror=maybe-uninitialized]
30 | pcidev->bus = bus;
| ~~~~~~~~~~~~^~~~~
libxlu_pci.c:51:19: note: 'bus' was declared here
51 | unsigned dom, bus, dev, func, vslot = 0;
| ^~~
libxlu_pci.c:29:20: error: 'dom' may be used uninitialized in this function [-Werror=maybe-uninitialized]
29 | pcidev->domain = domain;
| ~~~~~~~~~~~~~~~^~~~~~~~
libxlu_pci.c:51:14: note: 'dom' was declared here
51 | unsigned dom, bus, dev, func, vslot = 0;
| ^~~
cc1: all warnings being treated as errors
Workaround it by setting the initial value to invalid value (0xffffffff)
and then assert on each value being set. This way we mute the gcc
warning, while still detecting bugs in the parse code.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Wed, 26 Aug 2020 14:47:19 +0000 (15:47 +0100)]
MAINTAINERS: Update my email address
I am changing my email address. (My affiliation to Citrix remains
unchanged.) See
https://xenbits.xen.org/people/iwj/2020/email-transition.txt
for a signed confirmation with full details.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Thu, 27 Aug 2020 07:51:07 +0000 (09:51 +0200)]
x86: don't build with EFI support in shim-exclusive mode
There's no need for xen.efi at all, and there's also no need for EFI
support in xen.gz since the shim runs in PVH mode, i.e. without any
firmware (and hence by implication also without EFI one).
The slightly odd looking use of $(space) is to ensure the new ifneq()
evaluates consistently between "build" and "install" invocations of
make.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 27 Aug 2020 07:46:55 +0000 (09:46 +0200)]
build: also check for empty .bss.* in .o -> .init.o conversion
We're gaining such sections, and like .text.* and .data.* they shouldn't
be present in objects subject to automatic to-init conversion. Oddly
enough for quite some time we did have an instance breaking this rule,
which gets fixed at this occasion, by breaking out the EFI boot
allocator functions into its own translation unit.
Fixes: c5b9805bc1f7 ("efi: create new early memory allocator") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Tue, 25 Aug 2020 15:47:27 +0000 (17:47 +0200)]
make better use of mfn local variable in free_heap_pages()
Besides the one use that there is in the function (of the value
calculated at function entry), there are two more places where the
redundant page-to-address conversion can be avoided.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Wei Liu <wl@xen.org>
Jan Beulich [Tue, 25 Aug 2020 15:46:27 +0000 (17:46 +0200)]
x86: don't maintain compat M2P when !PV32
It's effectively unused in this case (as well as when "pv=no-32").
While touching their definitions anyway, also adjust section placement
of m2p_compat_vstart and compat_idle_pg_table_l2. Similarly, while
putting init_xen_pae_l2_slots() inside #ifdef, also move it to a PV-only
source file.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 25 Aug 2020 15:43:52 +0000 (17:43 +0200)]
x86/EFI: sanitize build logic
With changes done over time and as far as linking goes, the only special
things about building with EFI support enabled are
- the need for the dummy relocations object (for xen.gz uniformly in all
build stages, for xen.efi in stage 1),
- the special efi/buildid.o file, which can't be made part of
efi/built_in.o, due to the extra linker options required for it.
All other efi/*.o can be consumed from the built_in*.o files.
In efi/Makefile, besides moving relocs-dummy.o to "extra", also properly
split between obj-y and obj-bin-y.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 24 Aug 2020 13:38:48 +0000 (15:38 +0200)]
x86/PV: also check kernel endianness when building Dom0
While big endian x86 images are pretty unlikely to appear, merely
logging endianness isn't of much use.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 24 Aug 2020 13:38:03 +0000 (15:38 +0200)]
x86: convert set_gpfn_from_mfn() to a function
It is already a little too heavy for a macro, and more logic is about to
get added to it.
This also allows reducing the scope of compat_machine_to_phys_mapping.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Mon, 24 Aug 2020 13:36:44 +0000 (15:36 +0200)]
x86/vpic: rename irq to pin in vpic_ioport_write
The irq variable is wrongly named, as it's used to store the pin on
the 8259 chip, but not the global irq value. While renaming reduce
it's scope and make it unsigned.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
vpic_elcr_mask wasn't using the v parameter, and instead worked
because in the context of the callers v would be vpic. Fix this by
correctly using the parameter. While there also remove the unneeded
casts to uint8_t and the ending semicolon.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
George Dunlap [Fri, 21 Aug 2020 14:32:01 +0000 (15:32 +0100)]
MAINTAINERS: Add Roger Pau Monné as x86 maintainer
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Enable CPU erratum of Speculative AT on the Neoverse N1 processor
versions r0p0 to r2p0.
Also Fix Cortex A76 Erratum string which had a wrong errata number.
Roger Pau Monne [Mon, 17 Aug 2020 15:57:52 +0000 (17:57 +0200)]
x86/pv: handle writes to the EFER MSR
Silently drop writes to the EFER MSR for PV guests if the value is not
changed from what it's being reported. Current PV Linux will attempt
to write to the MSR with the same value that's been read, and raising
a fault will result in a guest crash.
As part of this work introduce a helper to easily get the EFER value
reported to guests.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Mon, 17 Aug 2020 18:45:47 +0000 (19:45 +0100)]
tools/ocaml/xenstored: drop select based socket watching
Poll has been the default since 2014, I think we can safely say by now
that poll() works and we don't need to fall back to select().
This will allow fixing up the way we call poll to be more efficient
(and pave the way for introducing epoll support):
currently poll wraps the select API, which is inefficient.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
xen/arm: cmpxchg: Add missing memory barriers in __cmpxchg_mb_timeout()
The function __cmpxchg_mb_timeout() was intended to have the same
semantics as __cmpxchg_mb(). Unfortunately, the memory barriers were
not added when first implemented.
There is no known issue with the existing callers, but the barriers are
added given this is the expected semantics in Xen.
The issue was introduced by XSA-295.
Backport: 4.8+ Fixes: 86b0bc958373 ("xen/arm: cmpxchg: Provide a new helper that can timeout") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Roger Pau Monné [Wed, 12 Aug 2020 12:47:05 +0000 (14:47 +0200)]
x86/hvm: change EOI exit bitmap helper parameter
Change the last parameter of the update_eoi_exit_bitmap helper to be a
set/clear boolean instead of a triggering field. This is already
inline with how the function is implemented, and will allow deciding
whether an exit is required by the higher layers that call into
update_eoi_exit_bitmap. Note that the current behavior is not changed
by this patch.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Don Slutz [Sun, 9 Aug 2020 18:22:34 +0000 (14:22 -0400)]
rpmball: Adjust to new rpm, do not require --force
Also prevent warning: directory /boot: remove failed
Before:
[root@TestCloud1 xen]# rpm -hiv dist/xen*rpm
Preparing... ################################# [100%]
file /boot from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/bin from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/lib from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/lib64 from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/sbin from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
[root@TestCloud1 xen]# rpm -e xen
warning: directory /boot: remove failed: Device or resource busy
Paul Durrant [Tue, 4 Aug 2020 13:41:59 +0000 (14:41 +0100)]
x86/iommu: convert AMD IOMMU code to use new page table allocator
This patch converts the AMD IOMMU code to use the new page table allocator
function. This allows all the free-ing code to be removed (since it is now
handled by the general x86 code) which reduces TLB and cache thrashing as well
as shortening the code.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 4 Aug 2020 13:41:57 +0000 (14:41 +0100)]
x86/iommu: add common page-table allocator
Instead of having separate page table allocation functions in VT-d and AMD
IOMMU code, we could use a common allocation function in the general x86 code.
This patch adds a new allocation function, iommu_alloc_pgtable(), for this
purpose. The function adds the page table pages to a list. The pages in this
list are then freed by iommu_free_pgtables(), which is called by
domain_relinquish_resources() after PCI devices have been de-assigned.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 4 Aug 2020 13:41:56 +0000 (14:41 +0100)]
x86/iommu: re-arrange arch_iommu to separate common fields...
... from those specific to VT-d or AMD IOMMU, and put the latter in a union.
There is no functional change in this patch, although the initialization of
the 'mapped_rmrrs' list occurs slightly later in iommu_domain_init() since
it is now done (correctly) in VT-d specific code rather than in general x86
code.
NOTE: I have not combined the AMD IOMMU 'root_table' and VT-d 'pgd_maddr'
fields even though they perform essentially the same function. The
concept of 'root table' in the VT-d code is different from that in the
AMD code so attempting to use a common name will probably only serve
to confuse the reader.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
David Woodhouse [Thu, 19 Mar 2020 20:40:24 +0000 (20:40 +0000)]
tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
The do_ls() function has somewhat inconsistent handling of errors.
If reading the node's contents with xs_read() fails, then do_ls() will
just quietly not display the contents.
If reading the node's permissions with xs_get_permissions() fails, then
do_ls() will print a warning, continue, and ultimately won't exit with
an error code (unless another error happens).
If recursing into the node with xs_directory() fails, then do_ls() will
abort immediately, not printing any further nodes.
For persistent failure modes — such as ENOENT because a node has been
removed, or EACCES because it has had its permisions changed since the
xs_directory() on the parent directory returned its name — it's
obviously quite likely that if either of the first two errors occur for
a given node, then so will the third and thus xenstore-ls will abort.
The ENOENT one is actually a fairly common case, and has caused tools to
fail to clean up a network device because it *apparently* already
doesn't exist in xenstore.
There is a school of thought that says, "Well, xenstore-ls returned an
error. So the tools should not trust its output."
The natural corollary of this would surely be that the tools must re-run
xenstore-ls as many times as is necessary until its manages to exit
without hitting the race condition. I am not keen on that conclusion.
For the specific case of ENOENT it seems reasonable to declare that,
but for the timing, we might as well just not have seen that node at
all when calling xs_directory() for the parent. By ignoring the error,
we give acceptable output.
The issue can be reproduced as follows:
(dom0) # for a in `seq 1 1000` ; do
xenstore-write /local/domain/2/foo/$a $a ;
done
Now simultaneously:
(dom0) # for a in `seq 1 999` ; do
xenstore-rm /local/domain/2/foo/$a ;
done
(dom2) # while true ; do
./xenstore-ls -p /local/domain/2/foo | grep -c 1000 ;
done
We should expect to see node 1000 in the output, every time.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Thu, 13 Aug 2020 10:35:53 +0000 (11:35 +0100)]
x86/viridian: remove the viridian_vcpu msg_pending bit mask
The mask does not actually serve a useful purpose as we only use the SynIC
for timer messages. Dropping the mask means that the EOM MSR handler
essentially becomes a no-op. This means we can avoid setting 'message_pending'
for timer messages and hence avoid a VMEXIT for the EOM.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Trammell Hudson [Wed, 12 Aug 2020 17:42:48 +0000 (17:42 +0000)]
x86/setup: Ignore early boot parameters like no-real-mode
There are parameters in xen/arch/x86/boot/cmdline.c that
are only used early in the boot process, so handlers are
necessary to avoid an "Unknown command line option" in
dmesg.
This also updates ignore_param() to generate a temporary
variable name so that the macro can be used more than once
per file.
Signed-off-by: Trammell hudson <hudson@trmm.net> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Leave note to stop TEMP_NAME() finding more general use] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 7 Aug 2020 11:32:11 +0000 (13:32 +0200)]
x86/EFI: sanitize build logic
With changes done over time and as far as linking goes, the only special
thing about building with EFI support enabled is the need for the dummy
relocations object for xen.gz uniformly in all build stages. All other
efi/*.o can be consumed from the built_in*.o files.
In efi/Makefile, besides moving relocs-dummy.o to "extra", also properly
split between obj-y and obj-bin-y.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 7 Aug 2020 11:14:02 +0000 (13:14 +0200)]
x86: slightly re-arrange 32-bit handling in dom0_construct_pv()
Add #ifdef-s (the 2nd one will be needed in particular, to guard the
uses of m2p_compat_vstart and HYPERVISOR_COMPAT_VIRT_START()) and fold
duplicate uses of elf_32bit().
Also adjust what gets logged: Avoid "compat32" when support isn't built
in, and don't assume ELF class <> ELFCLASS64 means ELFCLASS32.
While doing this, in code getting touched anyway:
- use ROUNDUP() instead of open-coding it,
- drop a stale (dead) BUG_ON(),
- replace panic() by printk() plus error return, for being consistent
with other code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
The originally used sed expression converted not just multiple leading
zeroes (as intended), but also trailing ones, rendering the error
message somewhat confusing. Collapse zeroes in just the one place where
we need them collapsed, and leave objdump's output as is for all other
purposes.
Fixes: 48115d14743e ("Move more kernel decompression bits to .init.* sections") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 7 Aug 2020 11:12:00 +0000 (13:12 +0200)]
build: work around bash issue
Older bash (observed with 3.2.57(2)) fails to honor "set -e" for certain
built-in commands ("while" here), despite the command's status correctly
being non-zero. The subsequent objcopy invocation now being separated by
a semicolon results in no failure. Insert an explicit "exit" (replacing
; by && ought to be another possible workaround).
Fixes: e321576f4047 ("xen/build: start using if_changed") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
during boot. The units on the first line are Hz, not MHz, so correct that and
add a space for clarity.
Also, for the min/max line, use three dots instead of two and add more spaces
so that the line can't be mistaken for being a double decimal point typo.
Andrew Cooper [Wed, 5 Aug 2020 11:05:27 +0000 (12:05 +0100)]
x86/ioapic: Fix fixmap error path logic in ioapic_init_mappings()
In the case that bad_ioapic_register() fails, the current position of idx++
means that clear_fixmap(idx) will be called with the wrong index, and not
clean up the mapping just created.
Increment idx as part of the loop, rather than midway through the loop body.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 5 Aug 2020 08:30:18 +0000 (10:30 +0200)]
x86emul: correct AVX512_BF16 insn names in EVEX Disp8 test
The leading 'v' ought to be omitted from the table entries.
Fixes: 7ff66809ccd5 ("x86emul: support AVX512_BF16 insns") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>