Dario Faggioli [Thu, 27 Feb 2020 18:10:18 +0000 (19:10 +0100)]
xen: credit2: limit the max number of CPUs in a runqueue
In Credit2 CPUs (can) share runqueues, depending on the topology. For
instance, with per-socket runqueues (the default) all the CPUs that are
part of the same socket share a runqueue.
On platforms with a huge number of CPUs per socket, that could be a
problem. An example is AMD EPYC2 servers, where we can have up to 128
CPUs in a socket.
It is of course possible to define other, still topology-based, runqueue
arrangements (e.g., per-LLC, per-DIE, etc). But that may still result in
runqueues with too many CPUs on other/future platforms.
Therefore, let's set a limit to the max number of CPUs that can share a
Credit2 runqueue. The actual value is configurable (at boot time), the
default being 16. If, for instance, there are more than 16 CPUs in a
socket, they'll be split among two (or more) runqueues.
Note: with core scheduling enabled, this parameter sets the max number
of *scheduling resources* that can share a runqueue. Therefore, with
granularity set to core (and assuming 2 threads per core), we will have
at most 16 cores per runqueue, which corresponds to 32 threads. But that
is fine, considering how core scheduling works.
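The splitting arithmetic described above can be sketched as follows. This is only an illustration, not the scheduler's code: the helper names runqueues_needed() and max_threads_per_runqueue() are hypothetical, and the sketch only counts runqueues, it does not model how CPUs are actually distributed among them.

```c
#include <assert.h>

/* Hypothetical helper: number of runqueues needed so that no runqueue
 * holds more than max_per_rq scheduling resources. */
static unsigned int runqueues_needed(unsigned int nr_resources,
                                     unsigned int max_per_rq)
{
    assert(max_per_rq > 0);
    return (nr_resources + max_per_rq - 1) / max_per_rq; /* round up */
}

/* With core scheduling the limit counts scheduling resources (cores),
 * so the number of threads per runqueue scales with threads per core. */
static unsigned int max_threads_per_runqueue(unsigned int max_per_rq,
                                             unsigned int threads_per_core)
{
    return max_per_rq * threads_per_core;
}
```

For a 128-CPU socket and the default limit of 16, runqueues_needed(128, 16) yields 8 runqueues; with core granularity and 2 threads per core, max_threads_per_runqueue(16, 2) yields the 32 threads mentioned in the note.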
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
--- Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <george.dunlap@citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Juergen Gross <jgross@suse.com>
---
Changes from v1:
- always try to add a CPU to the runqueue with the least CPUs already in
it. This should guarantee a more even distribution of CPUs among
runqueues, as requested during review;
- add a comment about why we also count siblings that are currently
outside of our cpupool, as suggested during review;
- deal with the case where the user is trying to set fewer CPUs per
runqueue than there are siblings per core, as requested during review;
- fix typos in comments;
Anthony PERARD [Wed, 20 May 2020 16:39:42 +0000 (17:39 +0100)]
tools/xenstore: mark variable in header as extern
This patch fixes the "multiple definition of `xprintf'" (or xgt_handle)
build error with GCC 10.1.0.
These are the errors reported:
gcc xs_tdb_dump.o utils.o tdb.o talloc.o -o xs_tdb_dump
/usr/bin/ld: utils.o:./utils.h:27: multiple definition of `xprintf'; xs_tdb_dump.o:./utils.h:27: first defined here
[...]
gcc xenstored_core.o xenstored_watch.o xenstored_domain.o xenstored_transaction.o xenstored_control.o xs_lib.o talloc.o utils.o tdb.o hashtable.o xenstored_posix.o -lsystemd -Wl,-rpath-link=... ../libxc/libxenctrl.so -lrt -o xenstored
/usr/bin/ld: xenstored_watch.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
/usr/bin/ld: xenstored_domain.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
/usr/bin/ld: xenstored_transaction.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
/usr/bin/ld: xenstored_control.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
/usr/bin/ld: xenstored_posix.o:./xenstored_core.h:207: multiple definition of `xgt_handle'; xenstored_core.o:./xenstored_core.h:207: first defined here
A difference that I noticed with an earlier version of the build chain is
that before, I had:
$ nm xs_tdb_dump.o | grep xprintf
0000000000000008 C xprintf
And now, it's:
0000000000000000 B xprintf
With the patch applied, the symbol isn't in xs_tdb_dump.o anymore.
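The general pattern behind this fix, as a minimal single-file sketch. GCC 10 defaults to -fno-common, so a variable defined without "extern" in a header ends up defined in every object that includes it (the "C" to "B" change in the nm output). The names demo_handle and demo_get_handle() are hypothetical and are not the xenstore symbols; the comments mark which part would live in the header and which in a single .c file.

```c
#include <assert.h>

/* Would live in the header: a declaration only, no storage allocated. */
extern int demo_handle;

/* Would live in exactly one .c file: the single definition. */
int demo_handle = 42;

/* Any file that includes the header can then use the variable. */
static int demo_get_handle(void)
{
    return demo_handle;
}
```

With this split, objects that merely include the header carry an undefined reference to the symbol instead of their own copy, so the linker sees exactly one definition.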
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Wed, 20 May 2020 10:49:28 +0000 (12:49 +0200)]
x86/mem-paging: further adjustments to p2m_mem_paging_prep()'s error handling
Address late comments on ecb913be4aaa ("x86/mem-paging: correct
p2m_mem_paging_prep()'s error handling"):
- insert a gprintk() ahead of domain_crash(),
- add a comment.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Wed, 20 May 2020 10:48:37 +0000 (12:48 +0200)]
x86/idle: rework C6 EOI workaround
Change the C6 EOI workaround (errata AAJ72) to use x86_match_cpu. Also
call the workaround from mwait_idle, previously it was only used by
the ACPI idle driver. Finally make sure the routine is called for all
states equal or greater than ACPI_STATE_C3, note that the ACPI driver
doesn't currently handle them, but the errata condition shouldn't be
limited by that.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
David Woodhouse [Wed, 20 May 2020 10:47:48 +0000 (12:47 +0200)]
x86/setup: lift dom0 creation out into create_dom0() function
The creation of dom0 can be relatively self-contained. Shift it into
a separate function and simplify __start_xen() a little bit.
This is a cleanup in its own right, but will be even more desirable
when live update provides an alternative path through __start_xen()
that doesn't involve creating a new dom0 at all.
Move the calculation of the 'initrd' parameter for create_dom0()
down past the cosmetic printk about NX support, because in the fullness
of time the whole initrd and create_dom0() part will be under the same
"not live update" conditional. And in the meantime it's just neater.
Also drop the explicit check for initrd to be module #0 since that would
be the dom0 kernel and the corresponding bit is always clear in
module_map.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Acked-by: Jan Beulich <jbeulich@suse.com>
Jason Andryuk [Tue, 19 May 2020 01:55:03 +0000 (21:55 -0400)]
libxl: Check stubdomain kernel & ramdisk presence
Just out of context is the following comment for libxl__domain_make:
/* fixme: this function can leak the stubdom if it fails */
When the stubdomain kernel or ramdisk is not present, the domid and
stubdomain name will indeed be leaked. Avoid the leak by checking the
file presence and erroring out when absent. It doesn't fix all cases,
but it avoids a big one when using a linux device model stubdomain.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: consider also qemu in stubdomain in libxl__dm_active check
Since qemu-xen can now run in stubdomain too, handle this case when
checking its state too.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: ignore emulated IDE disks beyond the first 4
Qemu supports only 4 emulated IDE disks; when given more (or with higher
indexes), it will fail to start. Since the disks can still be accessed
using the PV interface, just ignore the emulated path and log a warning,
instead of rejecting the configuration altogether.
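A rough sketch of the "warn and skip" decision described above. It is not the libxl code: DEMO_MAX_IDE_DISKS and demo_use_emulated_ide() are hypothetical names, and the warning text is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define DEMO_MAX_IDE_DISKS 4 /* limit stated in the commit message */

/* Returns true if the disk should also get an emulated IDE path.
 * Disks at index 4 and above stay reachable through PV only. */
static bool demo_use_emulated_ide(int disk_index)
{
    assert(disk_index >= 0);

    if (disk_index >= DEMO_MAX_IDE_DISKS) {
        fprintf(stderr,
                "disk %d: no emulated IDE path, PV interface only\n",
                disk_index);
        return false;
    }

    return true;
}
```

The point of the design is that a fifth disk produces a warning rather than a failed domain start.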
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: require qemu in dom0 for multiple stubdomain consoles
Device model stubdomains (both Mini-OS + qemu-trad and linux + qemu-xen)
are always started with at least 3 consoles: log, save, and restore.
Until xenconsoled learns how to handle multiple consoles, this is needed
for save/restore support.
For Mini-OS stubdoms, this is a bug. In practice, it works in most
cases because there is something else that triggers qemu in dom0 too:
vfb/vkb added if vnc/sdl/spice is enabled.
Additionally, Linux-based stubdomain waits for all the backends to
initialize during boot. Lack of some console backends results in
stubdomain startup timeout.
This is a temporary patch until xenconsoled is improved.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
[Updated commit message with Marek's explanation from mailing list.] Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: use vchan for QMP access with Linux stubdomain
Access to QMP of QEMU in Linux stubdomain is possible over vchan
connection. Handle the actual vchan connection in a separate process
(vchan-socket-proxy). This simplifies integration with QMP (already
quite complex), but also allows preliminary filtering of (potentially
malicious) QMP input.
Since only one client can be connected to vchan server at the same time
and it is not enforced by the libxenvchan itself, additional client-side
locking is needed. It is implicitly implemented by vchan-socket-proxy,
as it handles only one connection at a time. Note that qemu supports only
one simultaneous client on a control socket anyway (but in UNIX socket
case, it enforces it server-side), so it doesn't add any extra
limitation.
libxl qmp client code already has locking to handle concurrent access
attempts to the same qemu qmp interface.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Squash in changes of regenerated autotools files.
Kill the vchan-socket-proxy so we don't leak the daemonized processes.
libxl__stubdomain_is_linux_running() works against the guest_domid, but
the xenstore path is beneath the stubdomain. This leads to the use of
libxl_is_stubdom in addition to libxl__stubdomain_is_linux_running() so
that the stubdomain calls kill for the qmp-proxy.
Also call libxl__qmp_cleanup() to remove the unix sockets used by
vchan-socket-proxy. vchan-socket-proxy only creates qmp-libxl-$domid,
and libxl__qmp_cleanup removes that as well as qmp-libxenstat-$domid.
However, it tolerates ENOENT, and a stray qmp-libxenstat-$domid should
not exist.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jason Andryuk [Tue, 19 May 2020 01:54:57 +0000 (21:54 -0400)]
libxl: Refactor kill_device_model to libxl__kill_xs_path
Move kill_device_model to libxl__kill_xs_path so we have a helper to
kill a process from a pid stored in xenstore. We'll be using it to kill
vchan-qmp-proxy.
libxl__kill_xs_path takes a "what" string for use in printing error
messages. kill_device_model is retained in libxl_dm.c to provide the
string.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Add a simple proxy for tunneling socket connection over vchan. This is
based on existing vchan-node* applications, but extended with socket
support. vchan-socket-proxy serves both as a client and as a server,
depending on parameters. It can be used to transparently communicate
with an application in another domain that normally exposes a UNIX socket
interface. Specifically, it's written to communicate with qemu running
within stubdom.
Server mode listens for vchan connections and when one is opened,
connects to a pointed UNIX socket. Client mode listens on UNIX
socket and when someone connects, opens a vchan connection. Only
a single connection at a time is supported.
Additionally, socket can be provided as a number - in which case it's
interpreted as already open FD (in case of UNIX listening socket -
listen() needs to be already called). Or "-" meaning stdin/stdout - in
which case it is reduced to vchan-node2 functionality.
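The three ways of interpreting the socket argument can be sketched as below. This is an illustration under assumptions, not the proxy's actual parser: the type and function names (demo_sock_kind, demo_sock_arg, demo_parse_socket_arg) are hypothetical, and treating any fully numeric, non-negative string as an FD is a guess at the rule.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

enum demo_sock_kind { DEMO_SOCK_STDIO, DEMO_SOCK_FD, DEMO_SOCK_PATH };

struct demo_sock_arg {
    enum demo_sock_kind kind;
    int fd;           /* valid only for DEMO_SOCK_FD */
    const char *path; /* valid only for DEMO_SOCK_PATH */
};

static struct demo_sock_arg demo_parse_socket_arg(const char *arg)
{
    struct demo_sock_arg out = { DEMO_SOCK_PATH, -1, NULL };
    char *end;
    long val;

    assert(arg != NULL);

    /* "-" selects stdin/stdout. */
    if (strcmp(arg, "-") == 0) {
        out.kind = DEMO_SOCK_STDIO;
        return out;
    }

    /* A plain number is taken as an already open file descriptor. */
    errno = 0;
    val = strtol(arg, &end, 10);
    if (errno == 0 && end != arg && *end == '\0' &&
        val >= 0 && val <= INT_MAX) {
        out.kind = DEMO_SOCK_FD;
        out.fd = (int)val;
        return out;
    }

    /* Anything else is a UNIX socket path. */
    out.path = arg;
    return out;
}
```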
Example usage:
1. (in dom0) vchan-socket-proxy --mode=client <DOMID>
/local/domain/<DOMID>/data/vchan/1234 /run/qemu.(DOMID)
2. (in DOMID) vchan-socket-proxy --mode=server 0
/local/domain/<DOMID>/data/vchan/1234 /run/qemu.(DOMID)
This will listen on /run/qemu.(DOMID) in dom0 and whenever connection is
made, it will connect to DOMID, where server process will connect to
/run/qemu.(DOMID) there. When client disconnects, vchan connection is
terminated and server vchan-socket-proxy process also disconnects from
qemu.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxenvchan.h includes xenevtchn.h and xengnttab.h, so applications built
with it need the applicable -I in CFLAGS too.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: add save/restore support for qemu-xen in stubdomain
Rely on a wrapper script in stubdomain to attach relevant consoles to
qemu. The save console (1) must be attached to fdset/1. When
performing a restore, $STUBDOM_RESTORE_INCOMING_ARG must be replaced on
the qemu command line by "fd:$FD", where $FD is an open file descriptor
number to the restore console (2).
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Address TODO in dm_state_save_to_fdset: Only remove savefile for
non-stubdom.
Use $STUBDOM_RESTORE_INCOMING_ARG instead of fd:3 and update commit
message.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
tools/libvchan: notify server when client is connected
Let the server know when the client is connected. Otherwise the server will
notice only when the client sends some data.
This change does not break existing clients, as libvchan user should
handle spurious notifications anyway (for example acknowledge of remote
side reading the data).
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Replace spaces with tabs to match the file's whitespace. Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xl: add stubdomain related options to xl config parser
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
libxl: write qemu arguments into separate xenstore keys
This allows using arguments with spaces, like -append, without
nominating any special "separator" character.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Write arguments in dm-argv directory instead of overloading mini-os's
dmargs string.
Make libxl__write_stub_dmargs vary behaviour based on the
is_linux_stubdom flag.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Anthony PERARD [Wed, 11 Mar 2020 17:59:33 +0000 (17:59 +0000)]
tools: Use INSTALL_PYTHON_PROG
Whenever python scripts are installed, have the shebang be modified to use
whatever PYTHON_PATH is. This is useful for systems where python isn't available, or
where the package build tools prevent unversioned shebangs.
INSTALL_PYTHON_PROG only looks for "#!/usr/bin/env python".
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Eric Shelton [Tue, 19 May 2020 01:54:49 +0000 (21:54 -0400)]
libxl: Handle Linux stubdomain specific QEMU options.
This patch creates an appropriate command line for the QEMU instance
running in a Linux-based stubdomain.
NOTE: a number of items are not currently implemented for Linux-based
stubdomains, such as:
- save/restore
- QMP socket
- graphics output (e.g., VNC)
Signed-off-by: Eric Shelton <eshelton@pobox.com>
Simon:
* fix disk path
* fix cdrom path and "format"
Signed-off-by: Simon Gaiser <simon@invisiblethingslab.com>
[drop Qubes-specific parts] Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Allow setting stubdomain_ramdisk independently from stubdomain_kernel
Add a qemu- prefix for qemu-stubdom-linux-{kernel,rootfs} since stubdom
doesn't convey device-model. Use qemu- since this code is qemu specific.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
No longer prohibit using a stubdomain with qemu-xen.
To help distinguish MiniOS and Linux stubdomains, add helper inline
functions libxl__stubdomain_is_linux() and
libxl__stubdomain_is_linux_running(). Those should be used where really
the difference is about MiniOS/Linux, not qemu-xen/qemu-xen-traditional.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
When qemu is running in stubdomain, any attempt to initialize vnc/sdl
there will crash it (on failed attempt to load a keymap from a file). If
vfb is present, all those cases are skipped. But since b053f0c4c9e533f3d97837cf897eb920b8355ed3 "libxl: do not start dom0 qemu
for stubdomain when not needed" it is possible to create a stubdomain
without vfb and contrary to the comment -vnc none does trigger VNC
initialization code (just skips exposing it externally).
Change the implicit SDL avoiding method to -nographics option, used when
none of SDL or VNC is enabled.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Add documentation for upcoming Linux stubdomain for qemu-upstream.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Add documentation based on reverse-engineered toolstack-ioemu stubdomain
protocol.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Mon, 18 May 2020 15:18:56 +0000 (17:18 +0200)]
x86: determine MXCSR mask in all cases
For its use(s) by the emulator to be correct in all cases, the filling
of the variable needs to be independent of XSAVE availability. As
there's no suitable function in i387.c to put the logic in, keep it in
xstate_init(), arrange for the function to be called unconditionally,
and pull the logic ahead of all return paths there.
Fixes: 9a4496a35b20 ("x86emul: support {,V}{LD,ST}MXCSR") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 18 May 2020 15:15:46 +0000 (17:15 +0200)]
x86/mem-paging: use guest handle for XENMEM_paging_op_prep
While it should have been this way from the beginning, not doing so will
become an actual problem with PVH Dom0. The interface change is binary
compatible, but requires tools side producers to be re-built.
Drop the bogus/unnecessary page alignment restriction on the input
buffer at the same time.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Mon, 18 May 2020 15:13:38 +0000 (17:13 +0200)]
x86/mm: no-one passes a NULL domain to init_xen_l4_slots()
Drop the NULL checks - they've been introduced by commit 8d7b633ada
("x86/mm: Consolidate all Xen L4 slot writing into
init_xen_l4_slots()") without giving a reason; I'm told this was done
in anticipation of the function potentially getting called with a NULL
argument down the road.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Sat, 16 May 2020 12:10:07 +0000 (13:10 +0100)]
x86/hvm: Fix memory leaks in hvm_copy_context_and_params()
Any error from hvm_save() or hvm_set_param() leaks the c.data allocation.
Spotted by Coverity.
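The usual shape of such a fix is a single exit path that frees the buffer. The sketch below shows that pattern only; it is not the Xen function. All demo_* names are hypothetical, demo_step() stands in for the calls that may fail, and the allocation counter exists just to make the absence of a leak observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Counts live allocations so a leak would be visible. */
static int demo_live_allocs;

static void *demo_alloc(size_t n)
{
    void *p = malloc(n);

    if (p)
        demo_live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    if (p) {
        demo_live_allocs--;
        free(p);
    }
}

/* Hypothetical stand-in for a step that may fail. */
static int demo_step(int fail)
{
    return fail ? -1 : 0;
}

static int demo_copy(int fail_first, int fail_second)
{
    int rc;
    void *data = demo_alloc(64);

    if (!data)
        return -1;

    rc = demo_step(fail_first);
    if (rc)
        goto out;

    rc = demo_step(fail_second);

 out:
    demo_free(data); /* reached on success and on every error path */
    return rc;
}
```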
Fixes: 353744830 "x86/hvm: introduce hvm_copy_context_and_params" Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
The dirty_cpu field of struct vcpu denotes which cpu still holds data
of a vcpu. All accesses to this field should be atomic in case the
vcpu could just be running, as it is accessed without any lock held
in most cases. Especially sync_local_execstate() and context_switch()
for the same vcpu running concurrently have a risk for failing.
There are some instances where accesses are not atomically done, and
even worse where multiple accesses are done when a single one would
be mandated.
Correct that in order to avoid potential problems.
Add some assertions to verify dirty_cpu is handled properly.
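Why a single atomic read matters can be shown with a small sketch. C11 <stdatomic.h> is used here only to keep the example standalone; it stands in for whatever accessors the hypervisor uses. struct demo_vcpu, DEMO_CPU_CLEAN and demo_dirty_elsewhere() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_CPU_CLEAN (~0u) /* hypothetical "no CPU holds state" marker */

struct demo_vcpu {
    _Atomic unsigned int dirty_cpu;
};

/* Take one snapshot of the field and make both comparisons on it.
 * Reading the field twice could see two different values if another
 * CPU updates it in between. */
static bool demo_dirty_elsewhere(struct demo_vcpu *v, unsigned int this_cpu)
{
    unsigned int dirty = atomic_load_explicit(&v->dirty_cpu,
                                              memory_order_relaxed);

    return dirty != DEMO_CPU_CLEAN && dirty != this_cpu;
}
```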
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Thu, 14 May 2020 15:36:13 +0000 (17:36 +0200)]
xen/sched: don't call sync_vcpu_execstate() in sched_unit_migrate_finish()
With support of core scheduling sched_unit_migrate_finish() gained a
call of sync_vcpu_execstate() as it was believed to be called as a
result of vcpu migration in any case.
In case of migrating a vcpu away from a physical cpu for a short period
of time but without ever being scheduled on the selected new cpu, this
might not be true so drop the call and let the lazy state syncing do its
job.
Roger Pau Monne [Mon, 11 May 2020 10:31:45 +0000 (12:31 +0200)]
changelog: add relevant changes during 4.14 development window
Add entries for the relevant changes I've been working on during the
4.14 development time frame. Mostly performance improvements related
to pvshim scalability issues when running with high number of vCPUs.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Paul Durrant <paul@xen.org>
This resolves the following observed error during config merge:
/bin/sh /path/to/xen/xen/../xen/tools/kconfig/merge_config.sh -m .config /path/to/xen/xen/../xen/arch/arm/configs/custom.config
Using .config as base
Merging /path/to/xen/xen/../xen/arch/arm/configs/custom.config
#
# merged configuration written to .config (needs make)
#
make -f /path/to/xen/xen/../xen/Makefile olddefconfig
make[2]: Entering directory '/path/to/xen/xen'
make[2]: *** No rule to make target 'olddefconfig'. Stop.
make[2]: Leaving directory '/path/to/xen/xen'
tools/kconfig/Makefile:95: recipe for target 'custom.config' failed
The build was invoked by first doing a defconfig (which succeeded):
$ make -C xen XEN_TARGET_ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- defconfig
Followed by the config fragment merge command (which failed before this patch)
Hongyan Xia [Wed, 13 May 2020 15:43:33 +0000 (16:43 +0100)]
domain_page: handle NULL within unmap_domain_page() itself
The macro version UNMAP_DOMAIN_PAGE() does both NULL checking and
variable clearing. Move NULL checking into the function itself so that
the semantics is consistent with other similar constructs like XFREE().
This also eases the use of unmap_domain_page() in error handling paths,
where we only care about NULL checking but not about variable clearing.
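The resulting split of responsibilities, sketched with hypothetical names (demo_unmap(), DEMO_UNMAP(), demo_unmap_calls); this is not the Xen implementation, and the counter merely stands in for the real unmapping work.

```c
#include <assert.h>
#include <stddef.h>

static int demo_unmap_calls; /* counts unmap operations actually done */

/* Function form: tolerates NULL, in the way free() does. */
static void demo_unmap(const void *ptr)
{
    if (ptr == NULL)
        return;

    demo_unmap_calls++; /* the real unmapping work would go here */
}

/* Macro form: same NULL tolerance, and additionally clears the variable. */
#define DEMO_UNMAP(p) do { \
    demo_unmap(p);         \
    (p) = NULL;            \
} while (0)
```

An error path can then call demo_unmap() on a pointer that may never have been mapped, without a NULL check at the call site.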
Signed-off-by: Hongyan Xia <hongyxia@amazon.com> Reviewed-by: Wei Liu <wl@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Fri, 15 May 2020 14:16:29 +0000 (16:16 +0200)]
x86: retrieve and log CPU frequency information
While from just a single Skylake system it is already clear that we
can't base any of our logic on CPUID leaf 15 [1] (leaf 16 is
documented to be used for display purposes only anyway), logging this
information may still give us some reference in case of problems as well
as for future work. Additionally on the AMD side it is unclear whether
the deviation between reported and measured frequencies is because of us
not doing well, or because of nominal and actual frequencies being quite
far apart.
The chosen variable naming in amd_log_freq() has pointed out a naming
problem in rdmsr_safe(), which is being taken care of at the same time.
Symmetrically wrmsr_safe(), being an inline function, also gets an
unnecessary underscore dropped from one of its local variables.
[1] With a core crystal clock of 24MHz and a ratio of 216/2, the
reported frequency nevertheless is 2600MHz, rather than the to be
expected (and calibrated by both us and Linux) 2592MHz.
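The arithmetic in footnote [1] can be checked with a short sketch. CPUID leaf 0x15 reports the ratio as a numerator and a denominator plus the crystal frequency in Hz; demo_leaf15_freq_hz() is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Frequency implied by CPUID leaf 0x15:
 * crystal frequency times numerator, divided by denominator. */
static uint64_t demo_leaf15_freq_hz(uint32_t crystal_hz,
                                    uint32_t numerator,
                                    uint32_t denominator)
{
    assert(denominator != 0);
    /* Widen before multiplying so the product cannot overflow 32 bits. */
    return (uint64_t)crystal_hz * numerator / denominator;
}
```

With a 24MHz crystal and a ratio of 216/2, demo_leaf15_freq_hz(24000000, 216, 2) gives 2592000000 Hz, i.e. the 2592MHz that calibration finds.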
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Juergen Gross [Fri, 15 May 2020 14:04:00 +0000 (16:04 +0200)]
sched: allow rcu work to happen when syncing cpus in core scheduling
With RCU barriers moved from tasklets to normal RCU processing cpu
offlining in core scheduling might deadlock due to cpu synchronization
required by RCU processing and core scheduling concurrently.
Fix that by bailing out from core scheduling synchronization in case
of pending RCU work. Additionally the RCU softirq is now required to
be of higher priority than the scheduling softirqs in order to do
RCU processing before entering the scheduler again, as bailing out from
the core scheduling synchronization requires raising another softirq
SCHED_SLAVE, which would bypass RCU processing again.
Communicating errors from p2m_set_entry() to the caller is not enough:
Neither the M2P nor the stats updates should occur in such a case.
Instead the allocated page needs to be freed again; for cleanliness
reasons also properly take into account _PGC_allocated there.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 15 May 2020 13:57:56 +0000 (15:57 +0200)]
x86/mem-paging: fold p2m_mem_paging_prep()'s main if()-s
The condition of the second can be true only if the condition of the
first was met; the second half of the condition of the second then also
is redundant with an earlier check. Combine them, drop a pointless
local variable, and take the liberty to drop the affected gdprintk()
altogether, as we don't normally log anything on -EFAULT paths.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 14 May 2020 13:04:32 +0000 (15:04 +0200)]
x86/APIC: restrict certain messages to BSP
All CPUs get an equal setting of EOI broadcast suppression; no need to
log one message per CPU, even if it's only in verbose APIC mode.
Only the BSP is eligible to possibly get ExtINT enabled; no need to log
that it gets disabled on all APs, even if - again - it's only in verbose
APIC mode.
Take the opportunity and introduce a "bsp" parameter to the function, to
stop using smp_processor_id() to tell BSP from APs. No functional change
from this.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
xen: Allow EXPERT mode to be selected from the menuconfig directly
EXPERT mode is currently used to gate any options that are in technical
preview or not security supported. At the moment, this is selected by
adding XEN_CONFIG_EXPERT=y on the make command line, or to the
(currently undocumented) top-level .config file.
This makes the option very unintuitive to use: If the user forgets to
add the option when (re)building or when using menuconfig, then
xen/.config will be silently rewritten, leading to behavior which is
very difficult to diagnose. Adding XEN_CONFIG_EXPERT=y to the
top-level .config is not obvious behavior, particularly as the file is
undocumented.
A lot of the options behind EXPERT would benefit from being more
accessible so users can experiment with them and voice any concerns
before they are fully supported.
To make this option more discoverable and consistent to use, make it
possible to select it from the menuconfig.
This doesn't change the fact a Xen with EXPERT mode selected will not
be security supported.
Signed-off-by: Julien Grall <jgrall@amazon.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/Kconfig: define EXPERT a bool rather than a string
Since commit f80fe2b34f08 "xen: Update Kconfig to Linux v5.4" EXPERT
can only have two values (enabled or disabled). So switch from a string
to a bool.
Take the opportunity to replace all "EXPERT = y" with "EXPERT" and
squash the bool and prompt lines together in the modified places.
tools/xenstore: don't store domU's mfn of ring page in xenstored
The XS_INTRODUCE command has two parameters: the mfn (or better: gfn)
of the domain's xenstore ring page and the event channel of the
domain for communicating with Xenstore.
The gfn is not really needed. It is stored in the per-domain struct
in xenstored and in case of another XS_INTRODUCE for the domain it
is tested to match the original value. If it doesn't match the
command is aborted via EINVAL, otherwise the event channel to the
domain is recreated.
As XS_INTRODUCE is limited to dom0 and there is no real downside of
recreating the event channel just omit the test for the gfn to
match and don't return EINVAL for multiple XS_INTRODUCE calls.
Andrew Cooper [Tue, 12 May 2020 18:18:43 +0000 (19:18 +0100)]
x86/build: Unilaterally disable -fcf-protection
Xen doesn't support CET-IBT yet. At a minimum, logic is required to enable it
for supervisor use, but the livepatch functionality needs to learn not to
overwrite ENDBR64 instructions.
Furthermore, Ubuntu enables -fcf-protection by default, along with a buggy
version of GCC-9 which objects to it in combination with
-mindirect-branch=thunk-extern (Fixed in GCC 10, 9.4).
Various objects (Xen boot path, Rombios 32 stubs) require .text to be at the
beginning of the object. These paths explode when .note.gnu.properties gets
put ahead of .text and we end up executing the notes data.
Disable -fcf-protection for all embedded objects.
Reported-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 13 May 2020 12:06:28 +0000 (13:06 +0100)]
x86/build: move -fno-asynchronous-unwind-tables into EMBEDDED_EXTRA_CFLAGS
Users of EMBEDDED_EXTRA_CFLAGS already use -fno-asynchronous-unwind-tables, or
ought to. This shrinks the size of the rombios 32bit stubs in guest memory.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 12 May 2020 18:18:37 +0000 (19:18 +0100)]
x86/build32: Discard all orphaned sections
Linkers may put orphaned sections ahead of .text, which breaks the calling
requirements. A concrete example is Ubuntu's GCC-9 default of enabling
-fcf-protection which causes us to try and execute .note.gnu.properties during
Xen's boot.
Put .got.plt in its own section as it specifically needs preserving from the
linker's point of view, and discard everything else. This will hopefully be
more robust to other unexpected toolchain properties.
Fixes boot from an Ubuntu build of Xen.
Reported-by: Jason Andryuk <jandryuk@gmail.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 12 May 2020 16:21:33 +0000 (17:21 +0100)]
x86/guest: Fix assembler warnings with newer binutils
GAS of at least version 2.34 complains:
hypercall_page.S: Assembler messages:
hypercall_page.S:24: Warning: symbol 'HYPERCALL_set_trap_table' already has its type set
...
hypercall_page.S:71: Warning: symbol 'HYPERCALL_arch_7' already has its type set
which is because the whole page is declared as STT_OBJECT already. Rearrange
.set with respect to .type in DECLARE_HYPERCALL() so STT_FUNC is already in
place.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 2 Mar 2020 14:36:03 +0000 (14:36 +0000)]
tools/libxc: Reduce feature handling complexity in xc_cpuid_apply_policy()
xc_cpuid_apply_policy() is gaining extra parameters to untangle CPUID
complexity in Xen. While an improvement in general, it does have the
unfortunate side effect of duplicating some settings across multiple
parameters.
Rearrange the logic to only consider 'pae' if no explicit featureset is
provided. This reduces the complexity for callers who have already provided a
pae setting in the featureset.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <pdurrant@amzn.com> Acked-by: Wei Liu <wl@xen.org>
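A minimal sketch of the rule described above, assuming a single CPUID.1:EDX word stands in for the whole featureset. The function name and parameters are hypothetical and this is not the libxc code; only the position of the PAE bit (CPUID.1:EDX bit 6) is architectural.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID1_EDX_PAE (1u << 6) /* CPUID.1:EDX bit 6 is PAE */

/*
 * Hypothetical helper: an explicit featureset wins outright, so the
 * separate 'pae' setting is only consulted when no featureset is given.
 */
static uint32_t effective_leaf1_edx(const uint32_t *featureset_edx,
                                    uint32_t default_edx, bool pae)
{
    if ( featureset_edx != NULL )
        return *featureset_edx;

    return pae ? (default_edx | CPUID1_EDX_PAE)
               : (default_edx & ~CPUID1_EDX_PAE);
}
```

With this shape, a caller that already encoded its PAE choice in the featureset does not have to repeat it in the boolean.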
Add a README and package comment giving a brief overview of the package.
These also help pkg.go.dev generate better documentation.
Also, add a copy of the LGPL (the same license used by libxl) to
tools/golang/xenlight. This is required for the package to be shown
on pkg.go.dev and added to the default module proxy, proxy.golang.org.
Finally, add an entry for the xenlight package to SUPPORT.md.
Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Nick Rosbrook [Wed, 13 May 2020 00:55:50 +0000 (20:55 -0400)]
golang/xenlight: add NameToDomid and DomidToName util functions
Many exported functions in xenlight require a domid as an argument. Make
it easier for package users to use these functions by adding wrappers
for the libxl utility functions libxl_name_to_domid and
libxl_domid_to_name.
Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Paul Durrant [Tue, 28 Apr 2020 15:06:24 +0000 (16:06 +0100)]
docs/designs: re-work the xenstore migration document...
... to specify a separate migration stream that will also be suitable for
live update.
The original scope of the document was to support non-cooperative migration
of guests [1] but, since then, live update of xenstored has been brought into
scope. Thus it makes more sense to define a separate image format for
serializing xenstore state that is suitable for both purposes.
The document has been limited to specifying a new image format. The mechanism
for acquiring the image for live update or migration is not covered as that
is more appropriately dealt with by a patch to docs/misc/xenstore.txt. It is
also expected that, when the first implementation of live update or migration
making use of this specification is committed, the document will be moved from
docs/designs into docs/specs.
NOTE: It will only be necessary to save and restore state for active xenstore
connections, but the documentation for 'RESUME' in xenstore.txt implies
otherwise. That command is unused so this patch deletes it from the
specification.
[1] See https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/designs/non-cooperative-migration.md
Julien Grall [Sat, 2 May 2020 14:26:10 +0000 (15:26 +0100)]
xen/x86: atomic: Don't allow to write atomically in a pointer to const
At the moment, write_atomic() will happily write to a pointer to const.
While there are no such uses in Xen, it would be best to catch them at
compilation time.
Signed-off-by: Julien Grall <jgrall@amazon.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Sat, 2 May 2020 15:39:58 +0000 (16:39 +0100)]
xen/arm: atomic: Rewrite write_atomic()
The current implementation of write_atomic has two issues:
1) It cannot be used to write a pointer value, because the switch
contains casts to sizes other than the size of the pointer.
2) It will happily allow writing to a pointer to const.
Additionally, the Arm implementation is returning a value when the x86
implementation does not anymore. This was introduced in commit 2934148a0773 "x86: simplify a few macros / inline functions". There are
no users of the return value, so it is fine to drop it.
The switch is now moved into a static inline helper, allowing the compiler
to reject a pointer to const and also allowing pointer values to be written.
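The helper-based shape described above can be sketched as follows. This is an illustration under assumptions, not the Arm implementation: it uses plain volatile stores instead of inline assembly, the helper name is hypothetical, and it relies on the GNU `__typeof__` extension. For pointer-typed targets it also assumes a build with -fno-strict-aliasing.

```c
#include <stdint.h>

/*
 * Hypothetical helper holding the size switch. Its destination parameter is
 * not const-qualified, so passing a pointer to const is diagnosed by the
 * compiler (a hard error when building with -Werror).
 */
static inline void write_atomic_size(volatile void *p, const void *val,
                                     unsigned int size)
{
    switch ( size )
    {
    case 1: *(volatile uint8_t  *)p = *(const uint8_t  *)val; break;
    case 2: *(volatile uint16_t *)p = *(const uint16_t *)val; break;
    case 4: *(volatile uint32_t *)p = *(const uint32_t *)val; break;
    case 8: *(volatile uint64_t *)p = *(const uint64_t *)val; break;
    default: break; /* real code would refuse to build for other sizes */
    }
}

/*
 * The macro only copies the value into a temporary of the target type and
 * forwards its address, so no cast to a differently sized integer happens
 * at the call site. It is a statement and yields no value.
 */
#define write_atomic(p, x)                                  \
    do {                                                    \
        __typeof__(*(p)) x_ = (x);                          \
        write_atomic_size(p, &x_, sizeof(*(p)));            \
    } while ( 0 )
```

Because the macro is a do/while statement, it cannot be used as an expression, which matches dropping the return value.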
Julien Grall [Sat, 2 May 2020 14:06:22 +0000 (15:06 +0100)]
xen/arm: atomic: Allow read_atomic() to be used in more cases
The current implementation of read_atomic() on Arm does not allow to:
1) Read a value from a pointer to const, because the temporary
variable will be const and therefore it is not possible to assign
any value to it. This can be solved by using a union between the type
and a char[0].
2) Read a pointer value (e.g. void *), because the switch contains
casts to types whose size differs from that of a pointer. This can be
solved by introducing a static inline for the switch and using void *
for the pointer.
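The two fixes listed above can be sketched as follows. This is an illustration under assumptions rather than the Arm code: it uses plain volatile loads instead of inline assembly, the helper name is hypothetical, and it depends on GNU extensions (`__typeof__`, statement expressions, zero-length arrays). For pointer-typed sources it also assumes a build with -fno-strict-aliasing.

```c
#include <stdint.h>

/*
 * Hypothetical helper holding the size switch. The result is written through
 * a void *, so the caller's type (including pointer types) never has to be
 * cast to an integer of a different size.
 */
static inline void read_atomic_size(const volatile void *p, void *res,
                                    unsigned int size)
{
    switch ( size )
    {
    case 1: *(uint8_t  *)res = *(const volatile uint8_t  *)p; break;
    case 2: *(uint16_t *)res = *(const volatile uint16_t *)p; break;
    case 4: *(uint32_t *)res = *(const volatile uint32_t *)p; break;
    case 8: *(uint64_t *)res = *(const volatile uint64_t *)p; break;
    default: break; /* real code would refuse to build for other sizes */
    }
}

/*
 * The union gives a writable view (c) of storage whose typed view (val) may
 * be const-qualified when p is a pointer to const.
 */
#define read_atomic(p) ({                                   \
    union { __typeof__(*(p)) val; char c[0]; } x_;          \
    read_atomic_size(p, x_.c, sizeof(*(p)));                \
    x_.val;                                                 \
})
```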
tools/xl: vcpu-pin: Skip global affinity when the hard affinity is not changed
After XSA-273, it is not possible to modify the vCPU soft affinity using
xl vcpu-pin without modifying the hard affinity. Instead the command
will crash.
42sh> gdb /usr/local/sbin/xl
(gdb) r vcpu-pin 0 0 - 10
[...]
Program received signal SIGSEGV, Segmentation fault.
[...]
(gdb) bt
This is happening because 'xl' will use NULL when an affinity doesn't
need to be modified. However, we will still try to apply the global
affinity in this case.
As the hard affinity is not changed, then we don't need to apply the
global affinity. So skip it when hard is NULL.
Backport: 4.6+ # Any release with XSA-273 Fixes: aa67b97ed342 ("xl.conf: Add global affinity masks") Reported-by: Pawel Wieczorkiewicz <wipawel@amazon.de> Signed-off-by: Julien Grall <jgrall@amazon.com> Acked-by: Wei Liu <wl@xen.org>
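A minimal sketch of the guard described above, assuming a 64-bit word as a stand-in for a CPU bitmap (xl itself works on libxl bitmaps). The type and function names are hypothetical, and treating the global affinity as a mask ANDed into the hard affinity is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPU bitmap. */
typedef uint64_t cpumask_sketch_t;

/*
 * Restrict the requested hard affinity by the global mask, but only when a
 * hard affinity was actually requested. NULL means "leave it unchanged",
 * so there is nothing to restrict and nothing to dereference.
 * Returns 1 if the mask was applied, 0 if the step was skipped.
 */
static int apply_global_affinity(cpumask_sketch_t *hard,
                                 cpumask_sketch_t global_mask)
{
    if ( hard == NULL )
        return 0;

    *hard &= global_mask;
    return 1;
}
```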
Roger Pau Monne [Tue, 5 May 2020 09:24:54 +0000 (11:24 +0200)]
tools/libxl: disable clang indentation check for the disk parser
Clang 10 complains with:
13: error: misleading indentation; statement is not part of the previous 'if'
[-Werror,-Wmisleading-indentation]
if ( ! yyg->yy_state_buf )
^
libxlu_disk_l.c:1259:9: note: previous statement is here
if ( ! yyg->yy_state_buf )
^
Due to the missing braces in single line statements and the wrong
indentation. Fix this by disabling the warning for that specific file.
I haven't found a way to force flex to add braces around single line
statements in conditional blocks.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
[ wei: regenerate output files ] Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Fri, 8 May 2020 08:47:38 +0000 (10:47 +0200)]
sched: always modify vcpu pause flags atomically
credit2 is currently modifying the pause flags of vcpus non-atomically
via sched_set_pause_flags() and sched_clear_pause_flags(). This is
dangerous as there are cases where the pause flags are modified without
any lock held.
So drop the non-atomic pause flag modification functions and rename the
atomic ones dropping the _atomic suffix.
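As an illustration of why the atomic variants are the safe ones, here is a minimal sketch using C11 atomics. It is not the Xen code, which uses its own bit operations; the struct is a hypothetical stand-in, and only the two function names are taken from the text above.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the part of a vCPU that holds the flags. */
struct vcpu_flags_sketch {
    atomic_ulong pause_flags;
};

/*
 * Each update is a single atomic read-modify-write, so two CPUs changing
 * different bits concurrently cannot lose each other's update even when
 * no lock is held.
 */
static inline void sched_set_pause_flags(struct vcpu_flags_sketch *v,
                                         unsigned int bit)
{
    atomic_fetch_or(&v->pause_flags, 1UL << bit);
}

static inline void sched_clear_pause_flags(struct vcpu_flags_sketch *v,
                                           unsigned int bit)
{
    atomic_fetch_and(&v->pause_flags, ~(1UL << bit));
}
```

A non-atomic version would read the word, modify it and write it back as separate steps, which is where a concurrent update to another bit could be overwritten.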
Juergen Gross [Fri, 8 May 2020 08:44:22 +0000 (10:44 +0200)]
cpupool: fix removing cpu from a cpupool
Commit cb563d7665f2 ("xen/sched: support core scheduling for moving
cpus to/from cpupools") introduced a regression when trying to remove
an offline cpu from a cpupool, as the system would crash in this
situation.
Fix that by testing the cpu to be online.
Fixes: cb563d7665f2 ("xen/sched: support core scheduling for moving cpus to/from cpupools") Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Dario Faggioli <dfaggioli@suse.com>
Sergey Dyasli [Wed, 6 May 2020 10:00:24 +0000 (11:00 +0100)]
sched: print information about scheduling granularity
Currently it might not be obvious which scheduling mode (e.g.
core-scheduling) is being used by the scheduler. Alleviate this by printing
additional information about the selected granularity per-cpupool.
Note: per-cpupool granularity selection is not implemented yet. Every
cpupool gets its granularity from the single global value.
Take this opportunity to introduce struct sched_gran_name array and
refactor sched_select_granularity().
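A name table of the kind mentioned above could look like the sketch below. The enum values, array contents and helper names are assumptions made for illustration; only `struct sched_gran_name` is named in the text.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical granularity values. */
enum sched_gran {
    SCHED_GRAN_cpu,
    SCHED_GRAN_core,
    SCHED_GRAN_socket,
};

struct sched_gran_name {
    enum sched_gran mode;
    char name[8];
};

/* One table serves both directions: parsing an option and printing it. */
static const struct sched_gran_name sg_name[] = {
    { SCHED_GRAN_cpu,    "cpu"    },
    { SCHED_GRAN_core,   "core"   },
    { SCHED_GRAN_socket, "socket" },
};

#define SG_NAME_COUNT (sizeof(sg_name) / sizeof(sg_name[0]))

/* Map a granularity to its printable name; "" if it is unknown. */
static const char *sched_gran_get_name(enum sched_gran mode)
{
    for ( size_t i = 0; i < SG_NAME_COUNT; i++ )
        if ( sg_name[i].mode == mode )
            return sg_name[i].name;

    return "";
}

/* Parse a name into a granularity; returns 0 on success, -1 otherwise. */
static int sched_gran_get(const char *str, enum sched_gran *mode)
{
    for ( size_t i = 0; i < SG_NAME_COUNT; i++ )
        if ( strcmp(sg_name[i].name, str) == 0 )
        {
            *mode = sg_name[i].mode;
            return 0;
        }

    return -1;
}
```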
Andrew Cooper [Tue, 12 Feb 2019 18:37:04 +0000 (18:37 +0000)]
x86/svm: Use flush-by-asid when available
AMD Fam15h processors introduced the flush-by-asid feature, for finer
grained flushing.
Flushing everything including ASID 0 (i.e. Xen context) is an unnecessarily
large hammer, and never necessary in the context of guest TLBs needing
invalidating.
When available, use TLB_CTRL_FLUSH_ASID in preference to TLB_CTRL_FLUSH_ALL.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
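The selection described above amounts to something like the sketch below. The helper name and its boolean parameter are hypothetical; the numeric values follow the TLB_CONTROL encoding in the AMD architecture manual as best understood here and should be checked against it.

```c
#include <stdbool.h>
#include <stdint.h>

/* VMCB TLB_CONTROL encodings (assumed from the AMD manual). */
#define TLB_CTRL_NO_FLUSH    0 /* do nothing */
#define TLB_CTRL_FLUSH_ALL   1 /* flush the entire TLB, all ASIDs */
#define TLB_CTRL_FLUSH_ASID  3 /* flush only this guest's ASID */

/*
 * Hypothetical helper: pick the narrowest flush the hardware supports.
 * Flushing by ASID leaves ASID 0 (Xen's own context) untouched.
 */
static inline uint8_t svm_tlb_flush_ctrl(bool has_flush_by_asid)
{
    return has_flush_by_asid ? TLB_CTRL_FLUSH_ASID : TLB_CTRL_FLUSH_ALL;
}
```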
Andrew Cooper [Tue, 21 Apr 2020 17:18:08 +0000 (18:18 +0100)]
x86/svm: Clean up vmcbcleanbits_t handling
Rework the vmcbcleanbits_t definitions to use bool, drop 'fields' from the
namespace, position the comments in an unambiguous position, and include the
bit position.
In svm_vmexit_handler(), don't bother conditionally writing ~0 or 0 based on
hardware support. The field was entirely unused and ignored on older
hardware (and we're already setting reserved cleanbits anyway).
In nsvm_vmcb_prepare4vmrun(), simplify the logic massively by dropping the
vcleanbit_set() macro using a vmcbcleanbits_t local variable which only gets
filled in the case that clean bits were valid previously. Fix up the style on
impacted lines.
No practical change in behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 7 May 2020 11:18:24 +0000 (13:18 +0200)]
Arm: fix build with CONFIG_DTB_FILE set
Recent changes no longer allow modification of AFLAGS. The needed
conversion was apparently missed in 2740d96efdd3 ("xen/build: have the
root Makefile generates the CFLAGS").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Thu, 7 May 2020 11:15:13 +0000 (13:15 +0200)]
x86: adjustments to guest handle treatment
First of all avoid excessive conversions. copy_{from,to}_guest(), for
example, work fine with all of XEN_GUEST_HANDLE{,_64,_PARAM}().
Further
- do_physdev_op_compat() didn't use the param form for its parameter,
- {hap,shadow}_track_dirty_vram() wrongly used the param form,
- compat processor Px logic failed to check compatibility of native and
compat structures not further converted.
As this eliminates all users of guest_handle_from_param() and as there's
no real need to allow for conversions in both directions, drop the
macros as well.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Roger Pau Monne [Tue, 5 May 2020 09:24:53 +0000 (11:24 +0200)]
configure: also add EXTRA_PREFIX to {CPP/LD}FLAGS
The path provided by EXTRA_PREFIX should be added to the search path
of the configure script, like it's done in Config.mk. Not doing so
makes the search path for configure differ from the search path used
by the build.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
[ wei: run autogen.sh ] Acked-by: Wei Liu <wl@xen.org>
Wei Liu [Wed, 29 Apr 2020 10:41:44 +0000 (11:41 +0100)]
x86/hyperv: stash and use the configured max VP index
The value returned from CPUID is the maximum number for virtual
processors supported by Hyper-V. It could be larger than the maximum
number of virtual processors configured.
Stash the configured number into a variable and use it in calculations.
Signed-off-by: Wei Liu <liuwe@microsoft.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Fri, 24 Jan 2020 17:52:52 +0000 (17:52 +0000)]
tools/libxl: Plumb a restore boolean into libxl__domain_build_state
To fix CPUID handling, libxl__build_pre() is going to have to distinguish
between a brand new VM vs one which is being migrated-in/resumed.
Transcribe dcs->restore_fd into dbs->restore in initiate_domain_create()
only (specifically avoiding the stubdom state in libxl__spawn_stub_dm()).
While tweaking initiate_domain_create(), make a new dbs alias and simplify
later code, and drop the local restore_fd alias as the new dbs->restore is
more intuitive in context.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ashok Raj [Wed, 28 Feb 2018 10:28:42 +0000 (10:28 +0000)]
x86/ucode/intel: Writeback and invalidate caches before updating microcode
Updating microcode is less error prone when caches have been flushed and
depending on what exactly the microcode is updating. For example, some of the
issues around certain Broadwell parts can be addressed by doing a full cache
flush.
Signed-off-by: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
[Linux commit 91df9fdf51492aec9fed6b4cbd33160886740f47, ported to Xen] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 24 Apr 2020 13:38:02 +0000 (14:38 +0100)]
x86/smpboot: Write the top-of-stack block in cpu_smpboot_alloc()
This allows the AP boot assembly to use per-cpu variables, and brings the
semantics closer to that of the BSP, which can use per-cpu variables from the
start of day.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 5 May 2020 13:03:35 +0000 (14:03 +0100)]
x86/pv: Fix Clang build with !CONFIG_PV32
Clang 3.5 doesn't do enough dead-code-elimination to drop the compat_gdt
reference, resulting in a linker failure:
hidden symbol `per_cpu__compat_gdt' isn't defined
Drop the local variable, and move the evaluation of this_cpu(compat_gdt) to
within the guarded region.
Reported-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Tue, 5 May 2020 10:27:22 +0000 (11:27 +0100)]
x86/pv: Prune include lists
Several of these in particular haven't been pruned since the logic was all
part of arch/x86/traps.c
Some adjustments to header files are required to avoid compile errors:
* emulate.h needs xen/sched.h because gdt_ldt_desc_ptr() uses v->vcpu_id.
* mmconfig.h needs to forward declare acpi_table_header.
* shadow.h and trace.h need to have uint*_t in scope before including the Xen
public headers. For shadow.h, reorder the includes. For trace.h, include
types.h
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 5 May 2020 10:17:32 +0000 (11:17 +0100)]
x86/pv: Compile out emul-gate-op in !CONFIG_PV32 builds
The caller is already guarded by is_pv_32bit_vcpu().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 5 May 2020 07:52:28 +0000 (09:52 +0200)]
x86/hvm: simplify hvm_physdev_op allowance control
PVHv1 dom0 was given access to all PHYSDEVOP hypercalls, and such
restriction was not removed when PVHv1 code was removed. As a result
the switch in hvm_physdev_op was more complicated than required, and
relied on PVHv2 dom0 not having PIRQ support in order to prevent
access to some PV specific PHYSDEVOPs.
Fix this by moving the default case to the bottom of the switch, since
there's no need for any fall through now. Also remove the hardware
domain check, as all the not explicitly listed PHYSDEVOPs are
forbidden for HVM domains.
Finally tighten the condition to allow usage of
PHYSDEVOP_pci_mmcfg_reserved: apart from having vPCI enabled it should
only be used by the hardware domain. Note that the code in
do_physdev_op is already restricting the call to privileged domains
only, but it can be further restricted to the hardware domain only, as
other privileged domains don't have access to MMCFG regions anyway.
Overall no functional change should arise from this change.
Reported-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
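The resulting control flow can be sketched as below. The command names, the domain structure and the choice of -ENOSYS are hypothetical stand-ins; the point is only the shape: every permitted operation is listed explicitly and the default case at the bottom rejects the rest.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical subset of commands, for illustration only. */
enum physdev_cmd_sketch {
    CMD_MAP_PIRQ,
    CMD_UNMAP_PIRQ,
    CMD_PCI_MMCFG_RESERVED,
    CMD_SOMETHING_ELSE,
};

/* Hypothetical view of the properties the check depends on. */
struct domain_sketch {
    bool has_pirq;
    bool has_vpci;
    bool is_hardware_domain;
};

/* Returns 0 if the command may proceed, a negative errno value otherwise. */
static int hvm_physdev_op_allowed(const struct domain_sketch *d,
                                  enum physdev_cmd_sketch cmd)
{
    switch ( cmd )
    {
    case CMD_MAP_PIRQ:
    case CMD_UNMAP_PIRQ:
        if ( !d->has_pirq )
            return -ENOSYS;
        break;

    case CMD_PCI_MMCFG_RESERVED:
        /* Needs vPCI and is further limited to the hardware domain. */
        if ( !d->has_vpci || !d->is_hardware_domain )
            return -ENOSYS;
        break;

    default:
        /* Anything not explicitly listed is forbidden; no fall through. */
        return -ENOSYS;
    }

    return 0;
}
```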
Jan Beulich [Tue, 5 May 2020 07:50:54 +0000 (09:50 +0200)]
x86emul: extend x86_insn_is_mem_write() coverage
Several insns were missed when this function was first added. As far as
insns already supported by the emulator go - SMSW and {,V}STMXCSR were
wrongly considered r/o insns so far.
Insns like the VMX, SVM, or CET-SS ones, PTWRITE, or AMD's new SNP ones
are intentionally not covered just yet. VMPTRST is put there just to
complete the respective group.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 30 Apr 2020 09:47:14 +0000 (10:47 +0100)]
x86/amd: Initial support for Fam19h processors
Fam19h is very similar to Fam17h in these regards.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 4 May 2020 09:54:35 +0000 (11:54 +0200)]
x86/HyperV: correct hv_hcall_page for xen.efi build
Along the lines of what the not reverted part of 3c4b2eef4941 ("x86:
refine link time stub area related assertion") did, we need to transform
the absolute HV_HCALL_PAGE into the image base relative hv_hcall_page
(or else there'd be no need for two distinct symbols). Otherwise
mkreloc, as used for generating the base relocations of xen.efi, will
spit out warnings like "Difference at .text:0009b74f is 0xc0000000
(expected 0x40000000)". As long as the offending relocations are PC
relative ones, the generated binary is correct afaict, but if there ever
was the absolute address stored, xen.efi would miss a fixup for it.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org>
Roger Pau Monné [Mon, 4 May 2020 09:53:01 +0000 (11:53 +0200)]
x86/hap: be more selective with assisted TLB flush
When doing an assisted flush on HAP the purpose of the
on_selected_cpus is just to trigger a vmexit on remote CPUs that are
in guest context, and hence just using is_vcpu_dirty_cpu is too lax;
also check that the vCPU is running. Due to the lazy context switching
done by Xen dirty_cpu won't always be cleared when the guest vCPU is
not running, and hence relying on is_running allows more fine grained
control of whether the vCPU is actually running.
I've measured the time of the non-local branch of flush_area_mask
inside the shim running with 32vCPUs over 100000 executions and
averaged the result on a large Westmere system (80 ways total). The
figures were fetched during the boot of a SLES 11 PV guest. The
results are as follows (lower is better):
Non assisted flush with x2APIC: 112406ns
Assisted flush without this patch: 820450ns
Assisted flush with this patch: 8330ns
While there, also pass NULL as the data parameter of on_selected_cpus,
since the dummy handler doesn't consume the data in any way.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
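The tightened selection of target CPUs can be sketched as follows. The structure, the sentinel value and the 64-bit mask are hypothetical stand-ins for Xen's vCPU and cpumask types; only the condition itself (dirty on a remote CPU and currently running) comes from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "no CPU holds this vCPU's state". */
#define SKETCH_CPU_CLEAN (~0u)

/* Hypothetical view of the two vCPU fields the check relies on. */
struct flush_vcpu_sketch {
    bool is_running;
    unsigned int dirty_cpu;
};

/*
 * Build the set of CPUs that need an IPI to force a vmexit. With lazy
 * context switching dirty_cpu can remain set after the vCPU stopped
 * running, so dirty_cpu alone would select CPUs that are not in guest
 * context. Requiring is_running filters those out.
 */
static uint64_t assisted_flush_mask(const struct flush_vcpu_sketch *vcpus,
                                    unsigned int nr, unsigned int this_cpu)
{
    uint64_t mask = 0;

    for ( unsigned int i = 0; i < nr; i++ )
    {
        unsigned int cpu = vcpus[i].dirty_cpu;

        if ( cpu == SKETCH_CPU_CLEAN || cpu >= 64 || cpu == this_cpu )
            continue;
        if ( !vcpus[i].is_running )
            continue;

        mask |= UINT64_C(1) << cpu;
    }

    return mask;
}
```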
Jan Beulich [Mon, 4 May 2020 09:51:47 +0000 (11:51 +0200)]
xenoprof: limit scope of types and #define-s
Quite a few of the items are used by xenoprof.c only, so move them there
to limit their visibility as well as the amount of re-building needed in
case of changes. Also drop the inclusion of the public header there.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Mon, 4 May 2020 09:51:18 +0000 (11:51 +0200)]
xenoprof: drop unused struct xenoprof fields
Both is_primary and domain_ready are only ever written to. Drop both
fields and restrict structure visibility to just the one involved CU.
While doing so (and just for starters) make "is_compat" properly bool.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
Jan Beulich [Mon, 4 May 2020 09:48:13 +0000 (11:48 +0200)]
xenoprof: adjust ordering of page sharing vs domain type setting
Buffer pages should be shared with "ignored" or "active" guests only
(besides, obviously, the primary profiling domain). Hence domain type
should be set to "ignored" before unsharing from the primary domain
(which implies even a previously "passive" domain may then access its
buffers, albeit that's not very useful unless it gets promoted to
"active" subsequently), i.e. such that no further writes of records to
the buffer would occur, and (at least for consistency) also before
sharing it (with the calling domain) from the XENOPROF_get_buffer path.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org>
Jan Beulich [Thu, 30 Apr 2020 08:45:09 +0000 (10:45 +0200)]
x86/CPUID: correct error indicator for max extended leaf
With the max base leaf using 0, this one should be using the extended
leaf counterpart thereof, rather than some arbitrary extended leaf.
Fixes: 588a966a572e ("libx86: Introduce x86_cpu_policies_are_compatible()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
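A minimal sketch of the convention described above: a failure on the max base leaf is reported against leaf 0, so a failure on the max extended leaf is reported against its counterpart, 0x80000000. The structure, function and "no error" marker are hypothetical and much simpler than the real policy comparison.

```c
#include <stdint.h>

/* Hypothetical marker meaning "no incompatibility found". */
#define SKETCH_NO_ERROR (~0u)

/* Hypothetical reduced policy holding only the two maximum leaves. */
struct cpuid_policy_sketch {
    uint32_t basic_max_leaf; /* as reported by leaf 0 */
    uint32_t extd_max_leaf;  /* as reported by leaf 0x80000000 */
};

/*
 * Return the leaf that identifies the first incompatibility, i.e. the leaf
 * which itself reports the offending maximum.
 */
static uint32_t check_max_leaves(const struct cpuid_policy_sketch *host,
                                 const struct cpuid_policy_sketch *guest)
{
    if ( guest->basic_max_leaf > host->basic_max_leaf )
        return 0;

    if ( guest->extd_max_leaf > host->extd_max_leaf )
        return 0x80000000u;

    return SKETCH_NO_ERROR;
}
```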