Andrew Cooper [Tue, 15 Sep 2020 10:43:32 +0000 (12:43 +0200)]
x86/hvm: disallow access to unknown MSRs
Change the catch-all behavior for MSRs not explicitly handled. Instead
of allowing full read access to the MSR space and silently dropping
writes, return an exception when the MSR is not explicitly handled.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
[remove rdmsr_safe from default case in svm_msr_read_intercept] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
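The change in catch-all behavior can be sketched with a small Python model. This is not Xen's C code: the `handled` and `hardware` dictionaries, the function names and the `GeneralProtectionFault` class are all hypothetical stand-ins used only to contrast the old and new defaults.

```python
class GeneralProtectionFault(Exception):
    """Stands in for a #GP injected into the guest."""


def rdmsr_old(handled, hardware, msr):
    """Old catch-all: unhandled MSRs are read straight from the MSR space."""
    if msr in handled:
        return handled[msr]
    return hardware[msr]


def wrmsr_old(handled, msr, value):
    """Old catch-all: writes to unhandled MSRs are silently dropped."""
    if msr in handled:
        handled[msr] = value


def rdmsr_new(handled, msr):
    """New catch-all: reading an unhandled MSR raises an exception."""
    if msr not in handled:
        raise GeneralProtectionFault(hex(msr))
    return handled[msr]


def wrmsr_new(handled, msr, value):
    """New catch-all: writing an unhandled MSR raises an exception."""
    if msr not in handled:
        raise GeneralProtectionFault(hex(msr))
    handled[msr] = value


def faults(fn, *args):
    """Return True if calling fn(*args) raises the modelled #GP."""
    try:
        fn(*args)
    except GeneralProtectionFault:
        return True
    return False
```

With `handled = {0x10: 5}`, the old helpers let a guest read or write MSR 0x123 without any error, while `faults(rdmsr_new, handled, 0x123)` is true.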
Change the catch-all behavior for MSRs not explicitly handled. Instead
of allowing full read access to the MSR space and silently dropping
writes, return an exception when the MSR is not explicitly handled.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Move the special handling of reads to its own switch case, and also
add support for BU_CFG2. On the write side ignore writes if the MSR is
readable, otherwise return a #GP.
This is in preparation for changing the default MSR read/write
behavior, which will instead return #GP on not explicitly handled
cases.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Linux PV guests will attempt to read the FEATURE_CONTROL MSR, so move
the handling done in VMX code into guest_rdmsr as it can be shared
between PV and HVM guests that way.
Note that there's a slight behavior change and attempting to read the
MSR when no features are available will result in a fault.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 15 Sep 2020 08:20:37 +0000 (10:20 +0200)]
EFI: free unused boot mem in at least some cases
Address at least the primary reason why 52bba67f8b87 ("efi/boot: Don't
free ebmalloc area at all") was put in place: Make xen_in_range() aware
of the freed range. This is in particular relevant for EFI-enabled
builds not actually running on EFI, as the entire range will be unused
in this case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 15 Sep 2020 08:19:33 +0000 (10:19 +0200)]
x86/HVM: more consistently set I/O completion
Doing this just in hvm_emulate_one_insn() is not enough.
hvm_ud_intercept() and hvm_emulate_one_vm_event() can get invoked for
insns requiring one or more continuations, and at least in principle
hvm_emulate_one_mmio() could, too. Without proper setting of the field,
handle_hvm_io_completion() will do nothing completion-wise, and in
particular the missing re-invocation of the insn emulation paths will
lead to emulation caching not getting disabled in due course, causing
the ASSERT() in {svm,vmx}_vmenter_helper() to trigger.
Reported-by: Don Slutz <don.slutz@gmail.com>
Similar considerations go for the clearing of vio->mmio_access, which
gets moved as well.
Additionally all updating of vio->mmio_* now gets done dependent upon
the new completion value, rather than hvm_ioreq_needs_completion()'s
return value. This is because it is the completion chosen which controls
what path will be taken when handling the completion, not the simple
boolean return value. In particular, PIO completion doesn't involve
going through the insn emulator, and hence emulator state ought to get
cleared early (or it won't get cleared at all).
The new logic, besides allowing for a caller override for the
continuation type to be set (for VMX real mode emulation), will also
avoid setting an MMIO completion when a simpler PIO one will do. This
is a minor optimization only as a side effect - the behavior is strictly
needed at least for hvm_ud_intercept(), as only memory accesses can
successfully complete through handle_mmio(). Care of course needs to be
taken to correctly deal with "mixed" insns (doing both MMIO and PIO at
the same time, i.e. INS/OUTS). For this, hvmemul_validate() now latches
whether the insn being emulated is a memory access, as this information
is no longer easily available at the point where we want to consume it.
Note that the presence of non-NULL .validate fields in the two ops
structures in hvm_emulate_one_mmio() was really necessary even before
the changes here: Without this, passing non-NULL as middle argument to
hvm_emulate_init_once() is meaningless.
The restrictions on when the #UD intercept gets actually enabled are why
it was decided that this is not a security issue:
- the "hvm_fep" option to enable its use is a debugging option only,
- the cross-vendor case is considered experimental, even if
unfortunately SUPPORT.md doesn't have an explicit statement about
this.
The other two affected functions are
- hvm_emulate_one_vm_event(), used for introspection,
- hvm_emulate_one_mmio(), used for Dom0 only,
which aren't qualifying this as needing an XSA either.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Don Slutz <don.slutz@gmail.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Mon, 14 Sep 2020 09:24:20 +0000 (10:24 +0100)]
tools/Makefile: Drop the use of $(file ...)
It is only available in make 4.0 and later, and not for example in CentOS 7.
Rewrite the logic to use echo and shell redirection, using a single capture
group to avoid having 12 different processes in quick succession each
appending one line to the file.
Fixes: 52dbd6f07cea7a ("tools: generate pkg-config files from make variables") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Mon, 14 Sep 2020 09:24:19 +0000 (10:24 +0100)]
tools/libs/vchan: Don't run the headers check
There was never a headers check previously, and CentOS 6 can't cope with the
anonymous union in struct libxenvchan.
cc1: warnings being treated as errors
... tools/include/libxenvchan.h:75: error: declaration does not declare anything
make[6]: *** [headers.chk] Error 1
Fixes: 8ab2429f12 ("tools: split libxenvchan into new tools/libs/vchan directory") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Commit 7c273ffdd0e91 ("tools/python: drop libxenguest from setup.py")
was just wrong: there is one function from libxenguest used in the
bindings, so re-add the library.
While at it remove the unused PATH_LIBXL setting.
Fixes: 7c273ffdd0e91 ("tools/python: drop libxenguest from setup.py") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Andrew Cooper [Mon, 24 Feb 2020 17:15:22 +0000 (17:15 +0000)]
tools/cpuid: Untangle Invariant TSC handling
ITSC being visible to the guest is currently implicit with the toolstack
unconditionally asking for it, and Xen clipping it based on the vTSC and/or
XEN_DOMCTL_disable_migrate settings.
This is problematic for several reasons.
First, the implicit vTSC behaviour manifests as a real bug on migration to a
host with a different frequency, with ITSC but without TSC scaling
capabilities, whereby the ITSC feature becomes advertised to the guest. ITSC
will disappear again if the guest migrates to server with the same frequency
as the original, or to one with TSC scaling support.
Secondly, disallowing ITSC unless the guest doesn't migrate is conceptually
wrong. It is common to have migration pools of identical hardware, at which
point the TSC frequency is nominally the same, and more modern hardware has
TSC scaling support anyway. In both cases, it is safe to advertise ITSC and
migrate the guest.
Remove all implicit logic in Xen, and make ITSC part of the max CPUID policies
for guests. Plumb an itsc parameter into xc_cpuid_apply_policy() and have
libxl__cpuid_legacy() fill in the two cases where it can reasonably expect
ITSC to be safe for the guest to see. This retains the current side effect of
enabling ITSC if the guest is marked as nomigrate.
This is a behaviour change for TSC_MODE_NATIVE, where the ITSC will now
reliably not appear, and for the case where the user explicitly requests ITSC,
in which case it will appear even if the guest isn't marked as nomigrate.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <Ian.Jackson@citrix.com>
Jan Beulich [Fri, 11 Sep 2020 10:45:33 +0000 (12:45 +0200)]
xen/arm64: force gcc 10+ to always inline generic atomics helpers
Recent versions of gcc (at least 10.x) will not inline generic atomics
helpers by default. Instead they will expect the software to either link
with libatomic.so or implement the helpers, which would result in
undefined reference to `__aarch64_ldadd4_acq_rel'
for us (not having any local implementation).
To keep the previous behavior, force gcc to always inline the generic
atomics helpers.
Long term we probably want to avoid relying on gcc atomics helpers as
this doesn't allow us to switch between LSE and LL/SC atomics.
Today the maximum allowed data length for writing a hypfs node is
tested in the generic hypfs_write() function. For custom runtime
parameters this might be wrong, as the maximum allowed size is derived
from the buffer holding the current setting, while there might be ways
to set the parameter needing more characters than the minimal
representation of that value.
One example for this is the "ept" parameter. Its value buffer is sized
to be able to hold the string "exec-sp=0" or "exec-sp=1", while it is
allowed to use e.g. "no-exec-sp" or "exec-sp=yes" for setting it.
Fix that by moving the length check one level down to the type
specific write function.
In order to avoid allocation of arbitrary sized buffers use a new
MAX_PARAM_SIZE macro as an upper limit for custom writes. The value
of MAX_PARAM_SIZE is the same as the limit in parse_params() for a
single parameter.
Fixes: a659d7cab9af ("xen: add runtime parameter access support to hypfs") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
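The length problem with the "ept" parameter can be illustrated with a short Python sketch. It is a model only: the function names are hypothetical, the value 128 for MAX_PARAM_SIZE is an assumption (the commit message only says it matches the parse_params() limit), and counting one byte for the terminating NUL is likewise assumed.

```python
# Assumed value; the commit message does not state the actual number.
MAX_PARAM_SIZE = 128

# Size of a buffer able to hold "exec-sp=0" or "exec-sp=1" plus a NUL.
EPT_VALUE_BUF_SIZE = len("exec-sp=0") + 1


def generic_length_ok(data, buf_size):
    """Old check: input (plus NUL) must fit the current-value buffer."""
    return len(data) + 1 <= buf_size


def custom_length_ok(data):
    """New check for custom parameters: bounded by MAX_PARAM_SIZE only."""
    return len(data) + 1 <= MAX_PARAM_SIZE
```

Under these assumptions the old check rejects "no-exec-sp" and "exec-sp=yes" because they are longer than the canonical representation, while the per-type check accepts both.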
arch_init_memory will treat all the gaps on the physical memory map
between RAM regions as MMIO and use share_xen_page_with_guest in order
to assign them to dom_io. This has the side effect of setting the Xen
heap flag on such pages, and thus is_special_page would then return
true which is an issue in epte_get_entry_emt because such pages will
be forced to use write-back cache attributes.
Fix this by introducing a new helper to assign the MMIO regions to
dom_io without setting the Xen heap flag on the pages, so that
is_special_page will return false and the pages won't be forced to use
write-back cache attributes.
Fixes: 81fd0d3ca4b2cd ('x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()') Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 11 Sep 2020 12:14:43 +0000 (14:14 +0200)]
x86: don't override INVALID_M2P_ENTRY with SHARED_M2P_ENTRY
While in most cases code ahead of the invocation of set_gpfn_from_mfn()
deals with shared pages, at least in set_typed_p2m_entry() I can't spot
such handling (it's entirely possible there's code missing there). Let's
try to play safe and add an extra check.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
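A minimal Python sketch of the kind of extra check described above. The helper name `m2p_value` and the numeric values of the two constants are assumptions made for illustration; only the rule that an invalid entry must not be turned into a shared one comes from the commit title.

```python
# Assumed encodings of the two special M2P values (64-bit).
INVALID_M2P_ENTRY = (1 << 64) - 1
SHARED_M2P_ENTRY = (1 << 64) - 2


def m2p_value(pfn, page_is_shared):
    """Pick the value to store in the M2P for a page.

    Shared pages normally get SHARED_M2P_ENTRY, but a request to
    invalidate the entry must be stored as-is.
    """
    if page_is_shared and pfn != INVALID_M2P_ENTRY:
        return SHARED_M2P_ENTRY
    return pfn
```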
Jan Beulich [Fri, 11 Sep 2020 12:13:46 +0000 (14:13 +0200)]
x86: guard against port I/O overlapping the RTC/CMOS range
Since we intercept RTC/CMOS port accesses, let's do so consistently in
all cases, i.e. also for e.g. a dword access to [006E,0071]. To avoid
the risk of unintended impact on Dom0 code actually doing so (despite
the belief that none ought to exist), also extend
guest_io_{read,write}() to decompose accesses where some ports are
allowed to be directly accessed and some aren't.
While splitting out the new _guest_io_write() also
- add ASSERT_UNREACHABLE(),
- drop stray casts,
- add blank lines.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
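The overlap test and the decomposition can be modelled in a few lines of Python. This is a sketch under assumptions: the RTC/CMOS range is taken to be ports 0x70-0x71, the function names are hypothetical, and splitting into runs of consecutive ports with the same permission is just one possible decomposition, not necessarily the one used by guest_io_{read,write}().

```python
RTC_FIRST, RTC_LAST = 0x70, 0x71  # assumed RTC/CMOS port range


def overlaps_rtc(port, size):
    """True if the access [port, port + size) touches an RTC/CMOS port."""
    return port <= RTC_LAST and port + size > RTC_FIRST


def split_access(port, size, is_direct):
    """Split an access into runs of ports sharing the same permission.

    Returns a list of (first_port, length, direct) tuples, where direct
    says whether the run may be accessed directly.
    """
    runs = []
    for p in range(port, port + size):
        direct = is_direct(p)
        if runs and runs[-1][2] == direct:
            first, length, _ = runs[-1]
            runs[-1] = (first, length + 1, direct)
        else:
            runs.append((p, 1, direct))
    return runs
```

For the dword access to [006E,0071] mentioned above, this yields one directly accessible run for 0x6E-0x6F and one intercepted run for 0x70-0x71.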
When renaming the libxenguest sources to xg_*.c there was an omission
in the Makefile when setting the zlib related define for the related
sources. Fix that.
Fixes: e3dd624e487c ("tools/libxc: move libxenguest to tools/libs/guest") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Igor Druzhinin [Wed, 9 Sep 2020 15:56:13 +0000 (17:56 +0200)]
hvmloader: indicate ACPI tables with "ACPI data" type in e820
Guest kernel does need to know in some cases where the tables are located
to treat these regions properly. One example is kexec process where
the first kernel needs to pass ACPI region locations to the second
kernel which is now a requirement in Linux after 02a3e3cdb7f12 ("x86/boot:
Parse SRAT table and count immovable memory regions") in order for kexec
transition to actually work.
That commit introduced accesses to XSDT and SRAT while the second kernel
is still using kexec transition tables. The transition tables do not have
e820 "reserved" regions mapped where those tables are located currently
in a Xen guest. Instead, "ACPI data" regions are mapped with the transition
tables, as introduced by commit 6bbeb276b7 ("x86/kexec:
Add the EFI system tables and ACPI tables to the ident map").
Reserve 1MB (out of 16MB currently available) right after ACPI info page for
ACPI tables exclusively but populate this region on demand and only indicate
populated memory as "ACPI data" since according to ACPI spec that memory is
reclaimable by the guest if necessary. That is close to how we treat
the same ACPI data in PVH guests. 1MB should be enough for now but could be
later extended if required.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 9 Sep 2020 15:55:41 +0000 (17:55 +0200)]
lib: correct __moddi3() description
The remainder of a division, when non-zero, is specified to always be of
the same sign as the dividend. Bring a comment in line with the code it
describes.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
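The sign rule is easy to get wrong, so here is a small Python illustration of truncating division, where the remainder takes the sign of the dividend. The helper name `trunc_divmod` is hypothetical; note that Python's own `%` operator follows the sign of the divisor instead, which is why the helper is needed.

```python
def trunc_divmod(dividend, divisor):
    """Quotient rounded towards zero, remainder with the dividend's sign."""
    quotient = abs(dividend) // abs(divisor)
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    return quotient, dividend - quotient * divisor
```

For example, `trunc_divmod(-7, 2)` gives a remainder of -1, whereas `-7 % 2` in Python evaluates to 1.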
Diego Sueiro [Wed, 9 Sep 2020 12:35:56 +0000 (13:35 +0100)]
tools/hotplug: Fix dhcpd symlink removal in vif-nat
Copy temp files used to add/remove dhcpd configurations to avoid
replacing potential symlinks.
If dhcp.conf is a symlink pointing to dhcp.conf.real, 'mv' replaces the
symlink with a new regular file dhcp.conf, whereas 'cp' writes through
the symlink and modifies dhcp.conf.real.
Using 'cp' avoids the mistake where the user keeps modifying
dhcp.conf.real even though it is no longer the file in use.
Signed-off-by: Diego Sueiro <diego.sueiro@arm.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Wei Liu <wl@xen.org>
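The difference between the two approaches can be reproduced with a short Python sketch on a POSIX system. It assumes that `os.replace()` behaves like 'mv' within one filesystem and that `shutil.copyfile()` behaves like 'cp' onto an existing symlink; the function name and the temporary file name are hypothetical.

```python
import os
import shutil
import tempfile


def update_conf(method):
    """Update dhcp.conf (a symlink) from a temp file using 'mv' or 'cp'.

    Returns (dhcp.conf is still a symlink, contents of dhcp.conf.real).
    """
    with tempfile.TemporaryDirectory() as d:
        real = os.path.join(d, "dhcp.conf.real")
        link = os.path.join(d, "dhcp.conf")
        tmp = os.path.join(d, "dhcp.conf.tmp")
        with open(real, "w") as f:
            f.write("old\n")
        os.symlink("dhcp.conf.real", link)
        with open(tmp, "w") as f:
            f.write("new\n")
        if method == "mv":
            os.replace(tmp, link)       # renames over the symlink itself
        else:
            shutil.copyfile(tmp, link)  # writes through the symlink
        with open(real) as f:
            return os.path.islink(link), f.read()
```

With "mv" the symlink is gone and dhcp.conf.real keeps its old contents; with "cp" the symlink survives and dhcp.conf.real holds the new contents.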
Diego Sueiro [Thu, 20 Aug 2020 11:01:11 +0000 (12:01 +0100)]
tools/hotplug: Extend dhcpd conf, init and arg files search
Newer versions of the ISC dhcp server expect the dhcpd.conf file
to be located at /etc/dhcp directory.
Also, some distributions and Yocto based ones have these installation
paths by default: /etc/init.d/{isc-dhcp-server,dhcp-server} and
/etc/default/dhcp-server.
Signed-off-by: Diego Sueiro <diego.sueiro@arm.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Wei Liu <wl@xen.org>
Olaf Hering [Wed, 9 Sep 2020 11:06:37 +0000 (11:06 +0000)]
libxenguest: use bitmap_alloc
Use existing helper to allocate a bitmap.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[ wei: rebase to staging ] Signed-off-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:39 +0000 (17:07 +0200)]
tools/libxl: fix dependencies of libxl tests
Today building the libxl internal tests depends on libxlutil having
been built, in spite of the tests not using any functionality of
libxlutil. Fix this by dropping the dependency.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:38 +0000 (17:07 +0200)]
tools: split libxenstat into new tools/libs/stat directory
There is no reason why libxenstat is not placed in the tools/libs
directory.
At the same time move xenstat.h to a dedicated include directory
in tools/libs/stat in order to follow the same pattern as the other
libraries in tools/libs.
As xentop is now the only directory left in xenstat, move it directly
under tools and get rid of tools/xenstat.
Fix some missing prototype errors (add one prototype and make two
functions static).
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:37 +0000 (17:07 +0200)]
tools: split libxenvchan into new tools/libs/vchan directory
There is no reason why libvchan is not placed in the tools/libs
directory.
At the same time move libxenvchan.h to a dedicated include directory
in tools/libs/vchan in order to follow the same pattern as the other
libraries in tools/libs.
As tools/libvchan now contains no library any longer rename it to
tools/vchan.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:36 +0000 (17:07 +0200)]
tools: split libxenstore into new tools/libs/store directory
There is no reason why libxenstore is not placed in the tools/libs
directory.
The common files between libxenstore and xenstored are kept in the
tools/xenstore directory to be easily accessible by xenstore-stubdom
which needs the xenstored files to be built.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:35 +0000 (17:07 +0200)]
tools/libxc: move libxenguest to tools/libs/guest
tools/libxc now contains libxenguest only. Move it to tools/libs/guest.
When generating the pkg-config file for libxenguest a filter is now
required for replacing "xenctrl" by "xencontrol" in the
"Requires.private:" entry. Add this filter to tools/libs/libs.mk.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> (stubdom parts)
Juergen Gross [Fri, 28 Aug 2020 15:07:34 +0000 (17:07 +0200)]
tools: move libxenctrl below tools/libs
Today tools/libxc needs to be built after tools/libs as libxenctrl is
depending on some libraries in tools/libs. This in turn blocks moving
other libraries depending on libxenctrl below tools/libs.
So carve out libxenctrl from tools/libxc and move it into
tools/libs/ctrl.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> (stubdom parts) Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> (python parts) Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:31 +0000 (17:07 +0200)]
tools/libxc: untangle libxenctrl from libxenguest
Sources of libxenctrl and libxenguest are completely entangled. In
practice libxenguest is a user of libxenctrl, so don't let any
libxenctrl source include xg_private.h.
This can be achieved by moving all definitions used by libxenctrl from
xg_private.h to xc_private.h.
Export xenctrl_dom.h as it will now be included by other public
headers.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:25 +0000 (17:07 +0200)]
tools/misc: drop all libxc internals from xen-mfndump.c
The last libxc internal used by xen-mfndump.c is the ERROR() macro.
Add a simple definition for that macro to xen-mfndump.c and replace
the libxc private header includes by official ones.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:24 +0000 (17:07 +0200)]
tools/misc: replace PAGE_SIZE with XC_PAGE_SIZE in xen-mfndump.c
The definition of PAGE_SIZE comes from xc_private.h, which shouldn't be
used by xen-mfndump.c. Replace PAGE_SIZE by XC_PAGE_SIZE, as
xc_private.h contains:
#define PAGE_SIZE XC_PAGE_SIZE
For the same reason PAGE_SHIFT_X86 needs to be replaced with
XC_PAGE_SHIFT.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:23 +0000 (17:07 +0200)]
tools/misc: don't include xg_save_restore.h from xen-mfndump.c
xen-mfndump.c is including the libxc private header xg_save_restore.h.
Avoid that by moving the definition of is_mapped() to xen-mfndump.c
(it is used there only) and by duplicating the definition of
M2P_SIZE() in xen-mfndump.c.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:18 +0000 (17:07 +0200)]
tools: drop explicit path specifications for qemu build
For more than three years now qemu has been able to set the needed
include and library paths for the Xen libraries via pkg-config.
So drop the specification of those paths in tools/Makefile. This will
make it possible to move libxenctrl away from tools/libxc, as qemu's
configure script has special treatment of this path.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 28 Aug 2020 15:07:17 +0000 (17:07 +0200)]
tools: generate pkg-config files from make variables
For each library built two variants of pkg-config files are created
from a per-library template: an "official" one for installation on
the user's system, and one used for building internal tools, like e.g.
qemu.
Instead of the template which is looking very similar for all libraries
generate the pkg-config files directly from make variables.
This will reduce the need to specify some pkg-config file entries in
the templates, as the contents can easily be generated from available
data (e.g. "Version:" and "Requires.private:" entries).
Especially the variant used for building internal tools needs to gain
additional runtime link parameters for the internally used libraries,
as otherwise those won't be found by the users (e.g. qemu).
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 8 Sep 2020 09:47:57 +0000 (10:47 +0100)]
x86/pv: Fix assertions in svm_load_segs()
OSSTest has shown an assertion failure:
http://logs.test-lab.xenproject.org/osstest/logs/153906/test-xtf-amd64-amd64-1/serial-rimava1.log
This is because we pass a non-NUL selector into svm_load_segs(), which is
something we must not do, as this path does not load the attributes/limits
from the GDT/LDT.
Drop the {fs,gs}_sel parameters from svm_load_segs() and use 0 instead. This
is acceptable even for non-zero NUL segments, as it is how the IRET
instruction behaves in all CPUs.
Only use the svm_load_segs() path when both FS and GS are NUL, which is the
common case when scheduling a 64bit vcpu with 64bit userspace in context.
Fixes: ad0fd291c5 ("x86/pv: Rewrite segment context switching from scratch") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
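The condition for taking the svm_load_segs() path can be sketched in Python. This is an illustrative model with hypothetical function names, relying on the x86 selector layout (bits 0-1 RPL, bit 2 table indicator, remaining bits index), under which the values 0 to 3 are all NUL selectors.

```python
def is_nul_selector(sel):
    """True for selectors with index 0 in the GDT, whatever the RPL bits."""
    return (sel & 0xFFFC) == 0


def can_use_svm_load_segs(fs_sel, gs_sel):
    """The fast path is only taken when both FS and GS are NUL."""
    return is_nul_selector(fs_sel) and is_nul_selector(gs_sel)
```

A selector such as 3 is a non-zero NUL selector, which is why loading 0 in its place is described above as acceptable.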
Jan Beulich [Wed, 2 Sep 2020 06:09:28 +0000 (08:09 +0200)]
tools/hotplug/Linux: don't needlessly use non-standard features in vif-{bridge,route}
We're not after any "fall-through" behavior here. Replace the constructs
with ones understood by all conforming shells, including older bash
(problem observed with 3.1.51(1)).
Fixes: b51715f02bf9 ("tools/hotplug/Linux: remove code duplication in vif-bridge") Fixes: 3683290fc0b0 ("tools/hotplug: only attempt to call 'ip route' if there is valid command") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Wei Liu <wl@xen.org>
minios: use more recent commit for building xen unstable
Commit 82c3d15c903aa43 ("minios: Revert recent change and revert to
working minios") switched the used commit for the build of Xen unstable
from master to a rather old commit (the one used for Xen 4.13 instead
of the last one without a known problem).
Switch to Mini-OS commit 051b87bb9c196 instead, which doesn't contain
the problematic modification that was the reason for switching away
from master.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 11 Aug 2020 15:05:06 +0000 (16:05 +0100)]
x86/pv: Rewrite segment context switching from scratch
There are multiple bugs with the existing implementation.
On AMD CPUs prior to Zen2, loading a NUL segment selector doesn't clear the
segment base, which is a problem for 64bit code which typically expects to use
a NUL %fs/%gs selector.
On a context switch from any PV vcpu, to a 64bit PV vcpu with an %fs/%gs
selector which faults, the fixup logic loads NUL, and the guest is entered at
the failsafe callback with the stale base.
Alternatively, a PV context switch sequence of 64 (NUL, non-zero base) =>
32 (NUL) => 64 (NUL, zero base) will similarly cause Xen to enter the guest
with a stale base.
Both of these corner cases manifest as state corruption in the final vcpu.
However, damage is limited to 64bit code expecting to use Thread Local
Storage with a base pointer of 0, which doesn't occur by default.
The context switch logic is extremely complicated, and is attempting to
optimise away loading a NUL selector (which is fast), or writing a 64bit base
of 0 (which is rare). Furthermore, it fails to respect Linux's ABI with
userspace, which manifests as userspace state corruption as far as Linux is
concerned.
Always restore all selector and base state, in all cases.
Leave a large comment explaining hardware behaviour, and the new ABI
expectations. Update the comments in the public headers.
Drop all "segment preloading" to handle the AMD corner case. It was never
anything but a waste of time for %ds/%es, and isn't needed now that %fs/%gs
bases are unconditionally written for 64bit PV guests. In load_segments(),
store the result of is_pv_32bit_vcpu() as it is an expensive predicate now,
and not used in a way which impacts speculative safety.
Reported-by: Andy Lutomirski <luto@kernel.org> Reported-by: Sarah Newman <srn@prgmr.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 11 Aug 2020 15:05:06 +0000 (16:05 +0100)]
x86/pv: Fix consistency of 64bit segment bases
The comments in save_segments(), _toggle_guest_pt() and write_cr() are false.
The %fs and %gs bases can be updated at any time by the guest.
As a consequence, Xen's fs_base/etc tracking state is always stale when the
vcpu is in context, and must not be used to complete MSR_{FS,GS}_BASE reads, etc.
In particular, a sequence such as:
wrmsr(MSR_FS_BASE, 0x1ull << 32);
write_fs(__USER_DS);
base = rdmsr(MSR_FS_BASE);
will return the stale base, not the new base. This may cause a guest
kernel's context switching of userspace to malfunction.
Therefore:
* Update save_segments(), _toggle_guest_pt() and read_msr() to always read
the segment bases from hardware.
* Update write_cr(), write_msr() and do_set_segment_base() to not waste
time caching data which is instantly going to become stale again.
* Provide comments explaining when the tracking state is and isn't stale.
This bug has been present for 14 years, but several bugfixes since have built
on and extended the original flawed logic.
Fixes: ba9adb737ba ("Apply stricter checking to RDMSR/WRMSR emulations.") Fixes: c42494acb2f ("x86: fix FS/GS base handling when using the fsgsbase feature")
Fixes: eccc170053e ("x86/pv: Don't have %cr4.fsgsbase active behind a guest kernels back") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
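The staleness problem in the wrmsr/write_fs/rdmsr sequence above can be modelled with a tiny Python class. It is a sketch, not Xen code: the class and attribute names are hypothetical, and loading __USER_DS is assumed to reload the base from a flat descriptor with base 0.

```python
class VcpuModel:
    """Toy model of a vcpu's %fs base: hardware state plus Xen's tracking."""

    def __init__(self):
        self.hw_fs_base = 0        # what the CPU actually holds
        self.tracked_fs_base = 0   # the hypervisor's cached copy

    def wrmsr_fs_base(self, base):
        # The write is emulated, so both copies get updated.
        self.hw_fs_base = base
        self.tracked_fs_base = base

    def write_fs(self, descriptor_base):
        # A selector load happens in the guest, invisibly to the
        # hypervisor; only the hardware base changes.
        self.hw_fs_base = descriptor_base

    def rdmsr_from_tracking(self):
        return self.tracked_fs_base

    def rdmsr_from_hardware(self):
        return self.hw_fs_base
```

After `wrmsr_fs_base(1 << 32)` followed by `write_fs(0)`, the tracked value still reads 1 << 32 while the hardware base is 0, which is the stale read described above.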
x86/svm: silently drop writes to SYSCFG and related MSRs
The SYSCFG, TOP_MEM1 and TOP_MEM2 MSRs are currently exposed to guests
and writes are silently discarded. Make this explicit in the SVM code
now, and just return default constant values when attempting to read
any of the MSRs, while continuing to silently drop writes.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Such handling consists of checking that no bits have been changed from
the read value: if none have, silently drop the write, otherwise
inject a fault.
At least Windows guests will expect to write to the MISC_ENABLE MSR
with the same value that's been read from it.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
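A minimal Python sketch of this write handling, with hypothetical names and an exception class standing in for the injected fault:

```python
class GeneralProtectionFault(Exception):
    """Stands in for the fault injected into the guest."""


def wrmsr_misc_enable(read_value, written_value):
    """Drop a write that changes no bits, fault on any other write.

    Returns the value subsequent reads will see, which never changes.
    """
    if written_value != read_value:
        raise GeneralProtectionFault("MISC_ENABLE bits changed")
    return read_value
```

A guest that writes back exactly what it read, as Windows is said to do, therefore sees no fault.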
It was based on the false assumption that padding fields need no
copying: If native code checks such fields, they of course need
copying in. And if the ABI specifies them to be zero on
completion, we also need to copy them out.
Wei Chen [Fri, 28 Aug 2020 02:34:04 +0000 (02:34 +0000)]
xen/arm: Throw messages for unknown FP/SIMD implement ID
The Arm ID_AA64PFR0_EL1 register provides two fields to describe the
CPU's FP/SIMD implementations. Currently, we only know the exact meaning
of the values 0x0, 0x1 and 0xf of these fields. Xen treats any value < 8
as the FP/SIMD feature being present. If a value 0x2 gets introduced in
the future, Xen's behavior for values <= 0x1 will also apply to it, but
what Xen does for values <= 0x1 may not cover everything the new value
0x2 requires.
Emit these messages to break the silence when Xen detects unknown
FP/SIMD IDs, so the user is prompted to check.
Wei Chen [Fri, 28 Aug 2020 02:34:03 +0000 (02:34 +0000)]
xen/arm: Missing N1/A76/A75 FP registers in vCPU context switch
Xen has cpu_has_fp/cpu_has_simd to detect whether the CPU supports
FP/SIMD or not. But currently, these two macros only treat value 0
of ID_AA64PFR0_EL1.FP/SIMD as the FP/SIMD features being enabled. For
CPUs that support FP/SIMD and half-precision floating-point arithmetic,
however, the ID_AA64PFR0_EL1.FP/SIMD fields are 1 (see Arm ARM
DDI0487F.b, D13.2.64). Xen treats these CPUs as having no FP/SIMD
support, so vfp_save/restore_state will not take effect.
From the TRM documents of Cortex-A75/A76/N1, we know these CPUs support
basic Advanced SIMD/FP and half-precision floating-point arithmetic. As
a result, on N1/A76/A75 platforms, Xen will always skip the floating
point register save/restore. If different vCPUs are running on the
same pCPU, the floating point registers will be corrupted randomly.
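The field decoding behind these two Arm changes can be sketched in Python. The function names are hypothetical and this is not the Xen macro code; the bit positions (FP in bits 19:16, AdvSIMD in bits 23:20 of ID_AA64PFR0_EL1) follow the Arm ARM, and the "< 8" rule and the set of known values are taken from the commit messages above.

```python
KNOWN_FIELD_VALUES = {0x0, 0x1, 0xF}


def fp_field(pfr0):
    """ID_AA64PFR0_EL1.FP, bits [19:16]."""
    return (pfr0 >> 16) & 0xF


def simd_field(pfr0):
    """ID_AA64PFR0_EL1.AdvSIMD, bits [23:20]."""
    return (pfr0 >> 20) & 0xF


def has_fp_old(pfr0):
    """Old predicate: only value 0 counts as FP being implemented."""
    return fp_field(pfr0) == 0


def has_fp_new(pfr0):
    """New predicate: any value below 8 counts as FP being implemented."""
    return fp_field(pfr0) < 8


def is_unknown(field_value):
    """True for values whose meaning is not known, which warrant a message."""
    return field_value not in KNOWN_FIELD_VALUES
```

A register value with both fields set to 1, as on the CPUs named above, fails the old predicate but passes the new one.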
Ian Jackson [Fri, 4 Sep 2020 15:42:01 +0000 (16:42 +0100)]
minios: Revert recent change and revert to working minios
Currently, xen.git#staging does not build in many environments because
of issues with minios master. This regression was introduced in an
uncontrolled manner by an update to mini-os.git#master.
This is because in e013e8514389 "config: use mini-os master for
unstable" we switched to tracking minios master in an uncontrolled
manner. At the time we thought it was unlikely that minios changes
would break the Xen build. This turns out to have been overly
optimistic.
Xen currently uses unstable internal interfaces of minios. Until this
can be sorted out, internal changes to minios can require lockstep
changes in Xen.
All this means that "config: use mini-os master for unstable" was
wrong. We should undo it. Instead, we go back to the previous
situation: xen.git names a specific minios commit.
This scheme is the model used for qemu-xen-traditional.
That nailed commit must be updated manually, to have xen.git pick up
changes from minios. If the minios changes require changes in xen.git
too, to avoid breaking the Xen build, they can be made freely in
minios without adverse consequences. When the minios commitid is
updated in xen.git, the corresponding changes to the actual source
files in xen.git should be bundled together.
For example, when minios is fixed, 8d990807ec2c "stubdom/grub: update
init_netfront() call for mini-os" will need to be reapplied, folded
into the same commit as updates MINIOS_UPSTREAM_REVISION. For now
that commit must be reverted as we are going back to a previous
version of minios.
CC: Jan Beulich <jbeulich@suse.com> CC: Costin Lupu <costin.lupu@cs.pub.ro> CC: Wei Liu <wl@xen.org> CC: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Ian Jackson <iwj@xenproject.org>
make 3.81 doesn't support multiline variables defined with
define var =
...
endef
Dropping the "=" in the first line will fix the issue.
Fixes: ded08cdfa72bb ("tools: generate most contents of library make variables") Fixes: ddb2934a914df ("stubdom: add correct dependencies for Xen libraries") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:15:21 +0000 (11:15 +0200)]
x86: only generate compat headers actually needed
As was already the case for XSM/Flask, avoid generating compat headers
when they're not going to be needed. To address resulting build issues
- move compat/hvm/dm_op.h inclusion to the only source file needing it,
- add a little bit of #ifdef-ary.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:14:28 +0000 (11:14 +0200)]
flask: drop dead compat translation code
Translation macros aren't used (and hence needed) at all (or else a
devicetree_label entry would have been missing), and userlist has been
removed quite some time ago.
No functional change.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:13:01 +0000 (11:13 +0200)]
x86: generalize padding field handling
The original intention was to ignore padding fields, but the pattern
matched only ones whose names started with an underscore. Also match
fields whose names are in line with the C spec by not having a leading
underscore. (Note that the leading ^ in the sed regexps was pointless
and hence gets dropped.)
This requires adjusting some vNUMA macros, to avoid triggering
"enumeration value ... not handled in switch" warnings, which - due to
-Werror - would cause the build to fail. (I have to admit that I find
these padding fields odd, when translation of the containing structure
is needed anyway.)
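The widened padding-field match described in this commit can be illustrated with a minimal Python sketch. The two regular expressions and the sample field names below are hypothetical stand-ins chosen for illustration; they are not the actual sed expressions used by the header generation scripts.

```python
import re

# Hypothetical patterns, for illustration only.
OLD_PAD = re.compile(r"_pad\d*")    # requires a leading underscore
NEW_PAD = re.compile(r"_?pad\d*")   # leading underscore is optional


def is_padding(name, pattern):
    """Return True if the whole field name matches the padding pattern."""
    return pattern.fullmatch(name) is not None


fields = ["_pad0", "pad", "pad1", "payload"]
print([f for f in fields if is_padding(f, OLD_PAD)])
print([f for f in fields if is_padding(f, NEW_PAD)])
```

With the old pattern only the underscore-prefixed name is treated as padding; with the new one the plain "pad" names are recognised as well, while an ordinary field such as "payload" is still left alone.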
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:11:35 +0000 (11:11 +0200)]
evtchn: add compat struct checking for newer sub-ops
Various additions to the interface did not get mirrored into the compat
handling machinery. Luckily all additions were done in ways not making
any form of translation necessary.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:08:40 +0000 (11:08 +0200)]
x86/dmop: add compat struct checking for XEN_DMOP_map_mem_type_to_ioreq_server
This was forgotten when the subop was added.
Also take the opportunity and move the dm_op_relocate_memory entry in
xlat.lst to its designated place.
No change in the resulting generated code.
Fixes: ca2b511d3ff4 ("x86/ioreq server: add DMOP to map guest ram with p2m_ioreq_server to an ioreq server") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:06:56 +0000 (11:06 +0200)]
x86/mce: add compat struct checking for XEN_MC_inject_v2
84e364f2eda2 ("x86: add CMCI software injection interface") merely made
sure things would build, without any concern about things actually
working:
- despite the addition of xenctl_bitmap to xlat.lst, the resulting macro
wasn't invoked anywhere (which would have led to recognizing that the
structure appeared to have no fully compatible layout, despite the use
of a 64-bit handle),
- the interface struct itself was neither added to xlat.lst (and the
resulting macro then invoked) nor was any manual checking of
individual fields added.
Adjust compat header generation logic to retain XEN_GUEST_HANDLE_64(),
which is intentionally laid out to be compatible between different size
guests. Invoke the missing checking (implicitly through CHECK_mc).
No change in the resulting generated code.
Fixes: 84e364f2eda2 ("x86: add CMCI software injection interface") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Sep 2020 09:05:46 +0000 (11:05 +0200)]
x86: fix compat header generation
As was pointed out by 0e2e54966af5 ("mm: fix public declaration of
struct xen_mem_acquire_resource"), we're not currently handling structs
correctly that have uint64_aligned_t fields. #pragma pack(4) suppresses
the necessary alignment even if the type did properly survive (which
it also didn't) in the process of generating the headers. Overall,
with the above mentioned change applied, there's only a latent issue
here afaict, i.e. no other of our interface structs is currently
affected.
As a result it is clear that using #pragma pack(4) is not an option.
Drop all uses from compat header generation. Make sure
{,u}int64_aligned_t actually survives, such that explicitly aligned
fields will remain aligned. Arrange for {,u}int64_t to be transformed
into a type that's 64 bits wide and 4-byte aligned, by utilizing that
in typedef-s the "aligned" attribute can be used to reduce alignment.
Additionally, for the cases where native structures get re-used,
enforce suitable alignment via typedef-s (which allow alignment to be
reduced).
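To see why #pragma pack(4) defeats explicitly aligned fields, here is a rough Python sketch that computes member offsets the way a C compiler lays out a struct. The layout() helper and the two-member struct are hypothetical and exist only for this illustration; they are not part of the Xen build scripts.

```python
def layout(fields, pack=None):
    """Return ({name: offset}, total_size) for a C-like struct.

    fields: list of (name, size, alignment) tuples.
    pack:   optional cap on every member's alignment, like #pragma pack(n).
    """
    offsets = {}
    offset = 0
    struct_align = 1
    for name, size, align in fields:
        if pack is not None:
            align = min(align, pack)
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    total = (offset + struct_align - 1) // struct_align * struct_align
    return offsets, total


# Hypothetical struct: uint32_t id; uint64_aligned_t addr;
fields_aligned = [("id", 4, 4), ("addr", 8, 8)]
# Same struct with a plain uint64_t as a 32-bit guest sees it (4-byte aligned).
fields_plain = [("id", 4, 4), ("addr", 8, 4)]

print(layout(fields_aligned))           # explicit 8-byte alignment honoured
print(layout(fields_aligned, pack=4))   # pack(4) silently drops it
print(layout(fields_plain))             # what a 4-byte aligned 64-bit type gives
```

In this sketch the packed variant places "addr" at offset 4 instead of 8, so it is indistinguishable from the plain uint64_t case. That is the loss of alignment the commit avoids by giving each type its own alignment through typedef-s instead of packing everything.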
This use of typedef-s makes necessary changes to CHECK_*() macro
generation: Previously get-fields.sh relied on finding struct/union
keywords when other compound types were used. We need to use the
typedef-s (guaranteeing suitable alignment) now, and hence the script
has to recognize those cases, too. (Unfortunately there are a few
special cases to be dealt with, but this is really not much different
from e.g. the pre-existing compat_domain_handle_t special case.)
This need to use typedef-s is certainly somewhat fragile going forward,
as in similar future cases it is imperative to also use typedef-s, or
else the CHECK_*() macros won't check what they're supposed to check. I
don't currently see any means to avoid this fragility, though.
There's one change to generated code according to my observations: In
arch_compat_vcpu_op() the runstate area "area" variable would previously
have been put in a just 4-byte aligned stack slot (despite being 8 bytes
in size), whereas now it gets put in an 8-byte aligned location.
There also results some curious inconsistency in struct xen_mc from
these changes - I intend to clean this up later on. Otherwise unrelated
code would also need adjustment right here.
Additionally a note about the apparently superfluous () in
compat-build-header.py: The simpler form
Michael Kurth [Fri, 4 Sep 2020 09:01:45 +0000 (11:01 +0200)]
add additional symbols to xen-syms.map
Add "all_symbols" to all /tools/symbols calls so that
xen-syms.map lists all symbols and not only .text section
symbols. This change enhances debugging and livepatch
capabilities.
Signed-off-by: Michael Kurth <mku@amazon.de> Reviewed-by: Eslam Elnikety <elnikety@amazon.de> Reviewed-by: Julien Grall <jgrall@amazon.co.uk> Reviewed-by: Robert Stonehouse <rjstone@amazon.co.uk> Reviewed-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Acked-by: Jan Beulich <jbeulich@suse.com>
Report LFENCE_SERIALISE unconditionally for DE_CFG on AMD hardware and
silently drop writes.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
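The DE_CFG behaviour described above (always report LFENCE_SERIALISE on reads, silently drop writes) can be sketched in Python as follows. The MSR index, the bit position, and the function names are assumptions made for this illustration and do not reproduce Xen's actual handlers; other MSRs are deliberately left out of scope.

```python
MSR_AMD64_DE_CFG = 0xC0011029       # assumed MSR index
DE_CFG_LFENCE_SERIALISE = 1 << 1    # assumed bit position


def sketch_rdmsr(msr):
    """Read path: DE_CFG always reports LFENCE_SERIALISE."""
    if msr == MSR_AMD64_DE_CFG:
        return DE_CFG_LFENCE_SERIALISE
    raise NotImplementedError("other MSRs are outside this sketch")


def sketch_wrmsr(msr, value):
    """Write path: writes to DE_CFG are accepted but have no effect."""
    if msr == MSR_AMD64_DE_CFG:
        return  # silently dropped
    raise NotImplementedError("other MSRs are outside this sketch")


sketch_wrmsr(MSR_AMD64_DE_CFG, 0)   # guest tries to clear the bit
print(hex(sketch_rdmsr(MSR_AMD64_DE_CFG)))
```

The point of the sketch is that the value seen by the guest does not depend on anything the guest wrote earlier.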
Jan Beulich [Fri, 4 Sep 2020 08:59:33 +0000 (10:59 +0200)]
x86: simplify is_guest_l2_slot()
is_pv_32bit_domain() has become expensive, and its use here is
redundant: Only 32-bit guests would ever get PGT_pae_xen_l2 set on
their L2 page table pages anyway. (If some other error does lead to
PGT_pae_xen_l2 ending up anywhere else, we still don't want to allow a
guest to control the entries.)
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 31 Aug 2020 11:18:42 +0000 (12:18 +0100)]
x86/pv: Fix multiple bugs with SEGBASE_GS_USER_SEL
The logic takes the segment selector unmodified from guest context. This
allowed the guest to load DPL0 descriptors into %gs. Fix up the RPL for
non-NUL selectors to be 3.
Xen's context switch logic skips saving the inactive %gs base, as it cannot be
modified by the guest behind Xen's back. This depends on Xen caching updates
to the inactive base, which was missing from this path.
The consequence is that, following SEGBASE_GS_USER_SEL, the next context
switch will restore the stale inactive %gs base, and corrupt vcpu state.
Rework the hypercall to update the cached idea of gs_base_user, and fix the
behaviour in the case of the AMD NUL selector bug to always zero the segment
base.
Reported-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
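The RPL fix-up described in this commit can be sketched as below. This is a minimal Python illustration, assuming that selector values 0 to 3 are the NUL selectors and that the low two bits of a selector hold the RPL; the function name is hypothetical, and the 16-bit mask is an addition of this sketch.

```python
def fixup_gs_selector(sel):
    """Force RPL 3 on any non-NUL selector; leave NUL selectors (0..3) alone."""
    sel &= 0xFFFF        # selectors are 16 bits wide (mask added for the sketch)
    if sel > 3:          # anything above 3 is a non-NUL selector
        sel |= 3         # set both RPL bits
    return sel


for sel in (0x0000, 0x0003, 0x0008, 0x002B):
    print(hex(sel), "->", hex(fixup_gs_selector(sel)))
```

A selector such as 0x0008, which would otherwise request RPL 0, comes out as 0x000b, while NUL selectors pass through unchanged.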
Andrew Cooper [Tue, 10 Apr 2018 15:25:40 +0000 (16:25 +0100)]
x86/intel: Expose MSR_ARCH_CAPS to dom0
The overhead of (the lack of) MDS_NO alone has been measured at 30% on some
workloads. While we're not in a position yet to offer MSR_ARCH_CAPS generally
to guests, dom0 doesn't migrate, so we can pass a subset of hardware values
straight through.
This will cause PVH dom0's not to use KPTI by default, and all dom0's not to
use VERW flushing by default, and to use eIBRS in preference to retpoline on
recent Intel CPUs.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Sun, 23 Aug 2020 08:00:12 +0000 (10:00 +0200)]
tools: tweak tools/libs/libs.mk for being able to support libxenctrl
tools/libs/libs.mk needs to be modified for being able to support
building libxenctrl, as the pkg-config file of that library is not
following the same conventions as those of the other libraries.
So add support for specifying PKG_CONFIG before including libs.mk.
In order to make life easier for unstable libraries like libxenctrl
set MAJOR and MINOR automatically to the Xen-version and 0 when not
specified. This removes the need to bump the versions of unstable
libraries when switching to a new Xen version.
As all libraries built via libs.mk require a map file, generate a dummy
one in case there is none existing. This again will help avoid the
need to bump the library version in the map file of an unstable
library in case it is exporting all symbols.
The clean target is missing the removal of _paths.h.
Finally drop the foreach loop when setting PKG_CONFIG_LOCAL, as there
is always only one element in PKG_CONFIG.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Costin Lupu [Fri, 28 Aug 2020 07:17:44 +0000 (09:17 +0200)]
stubdom/grub: update init_netfront() call for mini-os
This patch updates the call of init_netfront() function according to its
recently updated declaration which can also include parameters for gateway
and netmask addresses. While we are here, the patch also removes passing
the ip parameter because (a) it is not used anywhere and (b) it wastes
memory since it would reference a dynamically allocated string.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Costin Lupu <costin.lupu@cs.pub.ro> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:12 +0000 (10:00 +0200)]
stubdom: simplify building xen libraries for stubdoms
The pattern for building a Xen library with sources under tools/libs
is always the same. Simplify stubdom/Makefile by defining a callable
make program for those libraries.
Even if not needed right now add the possibility for defining
additional dependencies for a library.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: generate most contents of library make variables
Library related make variables (CFLAGS_lib*, SHDEPS_lib*, LDLIBS_lib*
and SHLIB_lib*) mostly have a common pattern for their values. Generate
most of this content automatically by adding a new per-library variable
defining on which other libraries a lib is depending. Those definitions
are put into an own file in order to make it possible to include it
from various Makefiles, especially for stubdom.
This in turn makes it possible to drop the USELIB variable from each
library Makefile.
The LIBNAME variable can be dropped, too, as it can be derived from the
directory name the library is residing in.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: add a copy of library headers in tools/include
The headers.chk target in tools/Rules.mk tries to compile all headers
stand-alone to check that they do not include any internal header.
Unfortunately the headers tested against are not complete, as any
header for a Xen library is not included in the include path of the
test compile run, resulting in a failure in case any of the tested
headers is including an official Xen library header.
Fix that by copying the official headers located in
tools/libs/*/include to tools/include.
In order to support libraries with header name other than xen<lib>.h
or with multiple headers add a LIBHEADER make variable a lib specific
Makefile can set in that case.
Move the headers.chk target from Rules.mk to libs.mk as it is used
for libraries in tools/libs only.
Add NO_HEADERS_CHK variable to skip checking headers as this will be
needed e.g. for libxenctrl.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Sun, 23 Aug 2020 08:00:11 +0000 (10:00 +0200)]
tools: switch XEN_LIBXEN* make variables to lower case (XEN_libxen*)
In order to harmonize names of library related make variables switch
XEN_LIBXEN* names to XEN_libxen*, as all other related variables (e.g.
CFLAGS_libxen*, SHDEPS_libxen*, ...) already use this pattern.
Rename XEN_LIBXC to XEN_libxenctrl, XEN_XENSTORE to XEN_libxenstore,
XEN_XENLIGHT to XEN_libxenlight, XEN_XLUTIL to XEN_libxlutil, and
XEN_LIBVCHAN to XEN_libxenvchan for the same reason.
Introduce XEN_libxenguest with the same value as XEN_libxenctrl.
No functional change.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 10 Apr 2018 15:25:40 +0000 (16:25 +0100)]
x86: Begin to introduce support for MSR_ARCH_CAPS
... including serialisation/deserialisation logic and unit tests.
There is no current way to configure this MSR correctly for guests.
The toolstack side of this logic needs building, which is far easier to
do with it in place.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 27 Aug 2020 10:48:38 +0000 (10:48 +0000)]
gitignore: ignore ebmalloc.c soft link
A previous commit split ebmalloc to its own translation unit but forgot
to modify gitignore.
Fixes: 8856a914bffd ("build: also check for empty .bss.* in .o -> .init.o conversion") Signed-off-by: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 11 Aug 2020 08:02:01 +0000 (09:02 +0100)]
xl: add 'mtu' option to network configuration
This patch adds code to parse a value for MTU from the network configuration
if it is present. The documentation in xl-network-configuration.5.pod is
also modified accordingly.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wl@xen.org>
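As a rough illustration of parsing an optional MTU value from a network device specification, the following Python sketch handles a comma-separated list of key=value items. The parse_vif() helper, the sample specification string, and the choice to convert the value to an integer are assumptions for this example; the sketch does not reproduce xl's actual parser.

```python
def parse_vif(spec):
    """Parse 'key=value,key=value' into a dict; convert mtu to int if present."""
    conf = {}
    for item in spec.split(","):
        key, sep, value = item.strip().partition("=")
        if not sep:
            raise ValueError(f"malformed item: {item!r}")
        conf[key] = value
    if "mtu" in conf:
        conf["mtu"] = int(conf["mtu"])   # only touched when the key is present
    return conf


print(parse_vif("bridge=xenbr0,mtu=9000"))
print(parse_vif("bridge=xenbr0"))
```

When the key is absent the resulting configuration simply has no "mtu" entry, mirroring the "if it is present" wording of the commit message.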