Roger Pau Monne [Mon, 20 May 2019 10:24:11 +0000 (12:24 +0200)]
pci: switch PCI capabilities related functions to use pci_sbdf_t
Since pci_dev already has a pci_sbdf_t field, switch the SBDF parameters of
the capability-related functions to a single pci_sbdf_t parameter.
No functional change expected.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
--- Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tim Deegan <tim@xen.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Brian Woods <brian.woods@amd.com> Cc: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monne [Mon, 20 May 2019 10:24:11 +0000 (12:24 +0200)]
print: introduce a format specifier for pci_sbdf_t
The new format specifier is '%pp', and prints a pci_sbdf_t using the
seg:bus:dev.func format. Convert all SBDFs printed using
'%04x:%02x:%02x.%u' to the new format specifier.
No functional change expected.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
--- Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tim Deegan <tim@xen.org> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Brian Woods <brian.woods@amd.com> Cc: Kevin Tian <kevin.tian@intel.com>
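As a rough illustration of what '%pp' prints, the sketch below formats a packed SBDF value in the seg:bus:dev.func layout, using the same '%04x:%02x:%02x.%u' conversions the specifier replaces. The helper name format_sbdf() is hypothetical. The bit layout (segment in bits 31-16, bus in 15-8, device in 7-3, function in 2-0) is an assumption based on the conventional PCI SBDF encoding. Xen's real implementation lives in its own vsprintf code.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: print a packed SBDF as seg:bus:dev.func, the
 * layout the '%pp' specifier is described as producing.
 * Assumed encoding: seg[31:16] bus[15:8] dev[7:3] func[2:0].
 */
static int format_sbdf(char *buf, size_t len, uint32_t sbdf)
{
    unsigned int seg  = sbdf >> 16;
    unsigned int bus  = (sbdf >> 8) & 0xff;
    unsigned int dev  = (sbdf >> 3) & 0x1f;
    unsigned int func = sbdf & 0x7;

    return snprintf(buf, len, "%04x:%02x:%02x.%u", seg, bus, dev, func);
}
```

For example, format_sbdf(buf, sizeof(buf), 0x000003fa) writes "0000:03:1f.2".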
Roger Pau Monne [Mon, 20 May 2019 10:24:11 +0000 (12:24 +0200)]
pci: switch pci_conf_{read/write} to use pci_sbdf_t
pci_dev already uses pci_sbdf_t, so propagate the usage of the type to
pci_conf functions in order to shorten the calls when made from a
pci_dev struct.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
--- Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tim Deegan <tim@xen.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Brian Woods <brian.woods@amd.com> Cc: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monne [Mon, 20 May 2019 10:24:10 +0000 (12:24 +0200)]
pci: use function generation macros for pci_config_{write,read}<size>
This avoids code duplication between the helpers.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
--- Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
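To illustrate the kind of deduplication described here, the sketch below generates the 8-, 16- and 32-bit read/write helpers from a single macro. The macro name PCI_CONF_ACCESSORS and the in-memory cfg_space array are hypothetical stand-ins. Xen's real helpers access hardware configuration space, not a buffer.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical backing store standing in for one device's 256-byte
 * configuration space. No bounds checking: callers must keep
 * reg + access size within the array.
 */
static uint8_t cfg_space[256];

/* Generate pci_conf_read<size>() and pci_conf_write<size>() from one template. */
#define PCI_CONF_ACCESSORS(size)                                            \
    static uint##size##_t pci_conf_read##size(unsigned int reg)             \
    {                                                                       \
        uint##size##_t val;                                                 \
        memcpy(&val, &cfg_space[reg], sizeof(val));                         \
        return val;                                                         \
    }                                                                       \
    static void pci_conf_write##size(unsigned int reg, uint##size##_t val)  \
    {                                                                       \
        memcpy(&cfg_space[reg], &val, sizeof(val));                         \
    }

PCI_CONF_ACCESSORS(8)
PCI_CONF_ACCESSORS(16)
PCI_CONF_ACCESSORS(32)
```

With this approach, a change to the access logic is made once in the template, not three times.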
Roger Pau Monne [Mon, 20 May 2019 10:24:10 +0000 (12:24 +0200)]
pci: use pci_sbdf_t in pci_dev
This patch replaces the seg, bus and devfn fields of the struct with a
pci_sbdf_t field and fixes the callers. While there, instances of
u<size> have also been replaced with uint<size>_t.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
--- Cc: Jan Beulich <jbeulich@suse.com> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Julien Grall <julien.grall@arm.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Tim Deegan <tim@xen.org> Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: Brian Woods <brian.woods@amd.com> Cc: Kevin Tian <kevin.tian@intel.com>
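Here is a minimal sketch of why a single SBDF field can replace separate seg, bus and devfn members: a union exposes every view of the same 32-bit value. The type and helper names are hypothetical, in the spirit of pci_sbdf_t but not copied from Xen. The layout relies on the assumptions stated in the comment.

```c
#include <stdint.h>

/*
 * Sketch of an SBDF union: one 32-bit value viewed as seg/bus/devfn or as
 * individual dev/func. Assumes a little-endian target and GCC/Clang
 * bitfield allocation (first field in the least significant bits);
 * needs C11 anonymous structs/unions.
 */
typedef union {
    uint32_t sbdf;
    struct {
        union {
            uint16_t bdf;
            struct {
                union {
                    struct {
                        uint8_t func : 3,
                                dev  : 5;
                    };
                    uint8_t devfn;
                };
                uint8_t bus;
            };
        };
        uint16_t seg;
    };
} sbdf_sketch_t;

/* Hypothetical constructor packing seg/bus/devfn into one value. */
static sbdf_sketch_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t devfn)
{
    sbdf_sketch_t s = {
        .sbdf = ((uint32_t)seg << 16) | ((uint32_t)bus << 8) | devfn
    };
    return s;
}
```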
Andrii Anisov [Wed, 8 May 2019 09:59:38 +0000 (12:59 +0300)]
xen:arm: we never get into schedule_tail() with prev==current
ARM's schedule_tail() is called from two places: context_switch() and
continue_new_vcpu(). Both functions are always called with
prev != current. So replace the corresponding check in schedule_tail()
with an ASSERT(), which is only checked in development (debug) builds.
xen/arm: Extend SCIF early printk code to handle other interfaces
Extend the early printk code to handle other SCIF(X)-compatible
interfaces as well. These interfaces have a lot in common, but differ
mostly in the offsets and bits of some registers.
Introduce an "EARLY_PRINTK_VERSION" config option to choose which
interface version should be used (so that the correct register offsets
are applied).
Please note that nothing has technically changed for the Renesas "Lager"
and other supported boards (SCIF).
The "EARLY_PRINTK_VERSION" option for that board should be empty:
CONFIG_EARLY_PRINTK=scif,0xe6e60000
Jan Beulich [Thu, 16 May 2019 11:43:54 +0000 (13:43 +0200)]
page-alloc: accompany BUG() with printk()
Log information likely relevant for understanding why the BUG()s were
triggering.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citirx.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 16 May 2019 11:43:17 +0000 (13:43 +0200)]
x86emul: add support for missing {,V}PMADDWD insns
Their pre-AVX512 incarnations were clearly overlooked during much
earlier work. Their memory access pattern is entirely standard, so no
specific tests are added to the harness.
Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Alexandru Isaila <aisaila@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
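The commit only names the instructions, so here is a small reference model of what PMADDWD computes. It may help when reading the emulator changes. This is a plain-C illustration written for this note, not Xen's x86emul implementation. The function name and the element-count parameter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Reference model of what PMADDWD computes (not x86emul code): multiply
 * packed signed 16-bit words element-wise, then add each adjacent pair of
 * 32-bit products into one 32-bit result. n is the number of 16-bit
 * elements (8 for a 128-bit operand) and must be even.
 */
static void pmaddwd_model(const int16_t *a, const int16_t *b,
                          uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n / 2; i++) {
        /* Sum in 64 bits: (-32768 * -32768) twice is 2^31, which overflows int32_t. */
        int64_t sum = (int64_t)a[2 * i] * b[2 * i] +
                      (int64_t)a[2 * i + 1] * b[2 * i + 1];
        /* Keep the low 32 bits, as the instruction does (0x80000000 in that case). */
        dst[i] = (uint32_t)sum;
    }
}
```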
Wei Liu [Mon, 15 Aug 2016 10:32:56 +0000 (11:32 +0100)]
tools: remove blktap2 related code and documentation
Blktap2 has effectively been dead for a few years.
Notable changes in this patch:
0. Unhook blktap2 from build system
1. libxl no longer supports the TAP disk backend; appropriate assertions are
added and some code paths now return ERROR_FAIL
2. Tap is no longer a supported backend
3. Remove blktap2 entry from MAINTAINERS
A patch to remove blktap2 directory will come later.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Roger Pau Monne [Tue, 14 May 2019 13:59:22 +0000 (15:59 +0200)]
pvshim: make PV shim build selectable from configure
Allow a user to decide whether to compile a PV shim as part of the tools
build. Note that the default behavior is preserved, which is to build
a PV shim when the target architecture (or the host architecture, if the
target is unset) is 64-bit x86.
Requested-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: run autogen.sh ] Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Eslam Elnikety [Tue, 14 May 2019 08:43:25 +0000 (08:43 +0000)]
libxl: make vkbd tunable for HVM guests
Each HVM guest currently gets a vkbd frontend/backend pair (c/s ebbd2561b4c).
This consumes host resources unnecessarily for guests that have no use for
vkbd. Make this behaviour tunable so that an administrator can choose. The
commit retains the current behaviour: HVM guests still get vkbd unless
specified otherwise.
Signed-off-by: Eslam Elnikety <elnikety@amazon.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
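One common way to make a setting like this tunable without changing the existing default is a tri-state option that distinguishes "not specified" from an explicit false. The sketch below uses hypothetical names, not libxl's actual types or the real xl.cfg key.

```c
#include <stdbool.h>

/* Hypothetical tri-state option: unset until the toolstack applies a default. */
typedef enum { OPT_DEFAULT, OPT_FALSE, OPT_TRUE } tri_bool;

struct hvm_kbd_config {
    tri_bool vkbd;   /* hypothetical field: whether to create a vkbd pair */
};

/* Preserve the old behaviour: only an explicit setting overrides "true". */
static void kbd_config_setdefault(struct hvm_kbd_config *cfg)
{
    if (cfg->vkbd == OPT_DEFAULT)
        cfg->vkbd = OPT_TRUE;
}

static bool want_vkbd(const struct hvm_kbd_config *cfg)
{
    return cfg->vkbd == OPT_TRUE;
}
```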
Olaf Hering [Tue, 14 May 2019 08:05:58 +0000 (10:05 +0200)]
libxl: fix migration of PV and PVH domUs with and without qemu
If a domU has a qemu-xen instance attached, qemu's "xen-save-devices-state"
method must be called. Without it, the receiving side of a PV or
PVH migration may be unable to lock the image:
xen be: qdisk-51712: xen be: qdisk-51712: error: Failed to get "write" lock
error: Failed to get "write" lock
xen be: qdisk-51712: xen be: qdisk-51712: initialise() failed
initialise() failed
To fix this bug, libxl__domain_suspend_device_model() and
libxl__domain_resume_device_model() have to be called not only for HVM,
but also if the active device_model is QEMU_XEN.
Unfortunately, libxl__domain_build_info_setdefault() used to hardcode
b_info->device_model_version to QEMU_XEN if it did not know any
better. As a result, libxl__device_model_version_running() returns
incorrect values. This breaks domUs without a device_model.
libxl__qmp_stop() would wait 10 seconds in qmp_open() for a qemu that
will never appear. During this long timeframe the domU remains paused
on the sending side, and as a result network connections may be
dropped. Once this bug is fixed as well, by simply removing the assumption
that every domU has a QEMU_XEN, there is no code left that actually
initialises b_info->device_model_version.
There is a helper function, libxl__need_xenpv_qemu(), which is used in
various places to decide whether a device_model has to be spawned. This
function cannot be used as is just to fill in device_model_version,
because store_libxl_entry() was already called earlier.
Introduce LIBXL_DEVICE_MODEL_VERSION_NONE for PV and PVH domUs that have
no need for a device_model, to make the state explicit. Indicate this new
state via a LIBXL_HAVE macro in libxl.h.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Roger Pau Monné <roger.pau@citrix.com> Cc: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
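Here is a minimal sketch of the decision described above. It assumes the device model version has been filled in explicitly, with a NONE value for PV/PVH guests that have no qemu. The enum and function names are hypothetical simplifications, not libxl's API.

```c
#include <stdbool.h>

/* Hypothetical mirror of the states discussed above; not libxl's real enum. */
enum dm_version { DM_UNKNOWN, DM_QEMU_XEN_TRADITIONAL, DM_QEMU_XEN, DM_NONE };
enum guest_type { GUEST_PV, GUEST_PVH, GUEST_HVM };

/*
 * Suspend/resume the device model for HVM guests, and also for PV/PVH
 * guests that actually run a qemu-xen backend. With an explicit DM_NONE,
 * PV/PVH guests without qemu are skipped instead of waiting on a qemu
 * that never appears.
 */
static bool need_dm_suspend(enum guest_type type, enum dm_version dm)
{
    return type == GUEST_HVM || dm == DM_QEMU_XEN;
}
```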
Olaf Hering [Tue, 14 May 2019 07:27:41 +0000 (09:27 +0200)]
libxl: add helper function to set device_model_version
An upcoming change will set the value of device_model_version properly
for the non-HVM case as well.
Move the existing code to a new function, libxl__domain_set_device_model.
Also move the initialization of device_model_stubdomain to that function.
Make sure libxl__domain_build_info_setdefault is called with
device_model_version set.
Update libxl__spawn_stub_dm() and initiate_domain_create() to call the
new function prior to libxl__domain_build_info_setdefault(), because
device_model_version is expected to be initialized.
libxl_domain_need_memory() needs no update because it does not have a
d_config available anyway, and its callers provide a populated b_info.
The upcoming change needs a full libxl_domain_config, whereas the existing
libxl__domain_build_info_setdefault has only a libxl_domain_build_info
to work with.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 12 Dec 2018 19:22:15 +0000 (19:22 +0000)]
x86/spec-ctrl: Introduce options to control VERW flushing
The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:
MLPDS - Microarchitectural Load Port Data Sampling
MSBDS - Microarchitectural Store Buffer Data Sampling
MFBDS - Microarchitectural Fill Buffer Data Sampling
MDSUM - Microarchitectural Data Sampling Uncacheable Memory
MDSUM is a special case of the other three, and isn't distinguished further.
These issues pertain to three microarchitectural buffers: the Load Ports, the
Store Buffers and the Fill Buffers. Each of these structures is flushed by
the new enhanced VERW functionality, but the conditions under which flushing
is necessary vary.
For this concise overview of the issues and default logic, the abbreviations
SP (Store Port), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading) are
used for brevity:
* Vulnerable hardware is divided into two categories - parts which suffer
from SP only, and parts with any other combination of vulnerabilities.
* SP only has an HT interaction when the thread goes idle, due to the static
partitioning of resources. LP and FB have HT interactions at all points,
due to the competitive sharing of resources. All issues potentially leak
data across the return-to-guest transition.
* The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
we don't need to do both on the HVM return-to-guest path. However, some
parts are not vulnerable to L1TF (and therefore have no MSR_FLUSH_CMD), but are
vulnerable to MDS, so they do require VERW on the HVM path.
Note that we deliberately support mds=1 even without MD_CLEAR in case the
microcode has been updated but the feature bit not exposed.
This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
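The default logic in the VERW-flushing commit above can be summarised as a small decision function. This is one possible reading of the bullet points only. All struct, field and function names are hypothetical. Xen's actual spec-ctrl code considers more inputs, such as command line options and CPUID/MSR bits.

```c
#include <stdbool.h>

/* Hypothetical inputs; names are illustrative, not Xen's. */
struct mds_cpu {
    bool vulnerable;        /* affected by any of MLPDS/MSBDS/MFBDS */
    bool sp_only;           /* store-buffer (SP) issue only */
    bool ht_active;         /* sibling hyperthreads in use */
    bool l1d_flush_on_hvm;  /* MSR_FLUSH_CMD already used on HVM entry */
};

struct verw_policy {
    bool on_pv_exit;        /* VERW on return to PV guests */
    bool on_hvm_exit;       /* VERW on return to HVM guests */
    bool on_idle;           /* VERW before a thread goes idle */
    bool ht_leak_remains;   /* LP/FB sibling leakage VERW cannot cover */
};

static struct verw_policy verw_defaults(const struct mds_cpu *c)
{
    struct verw_policy p = { false, false, false, false };

    if (!c->vulnerable)
        return p;

    /* All issues can leak data across the return-to-guest transition. */
    p.on_pv_exit = true;
    /* Updated microcode makes MSR_FLUSH_CMD flush these buffers too. */
    p.on_hvm_exit = !c->l1d_flush_on_hvm;
    /* For SP-only parts, going idle is the only HT interaction. */
    p.on_idle = c->ht_active;
    /* LP/FB are competitively shared, so siblings can leak at any time. */
    p.ht_leak_remains = c->ht_active && !c->sp_only;

    return p;
}
```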
Andrew Cooper [Wed, 12 Dec 2018 19:22:15 +0000 (19:22 +0000)]
x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers
Three synthetic features are introduced, as we need individual control of
each, depending on circumstances. A later change will enable them at
appropriate points.
The verw_sel field doesn't strictly need to live in struct cpu_info. It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.
This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 12 Sep 2018 13:36:00 +0000 (14:36 +0100)]
x86/spec-ctrl: Misc non-functional cleanup
* Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
mitigations will shortly appear.
* Use alternative_input() and cover the lack of a memory clobber with a further
barrier.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 14 May 2019 14:21:33 +0000 (16:21 +0200)]
x86/mm: subsume set_gpfn_from_mfn() into guest_physmap_add_page()
The two callers in common/memory.c currently call set_gpfn_from_mfn()
themselves, so moving the call into guest_physmap_add_page() helps
tidy their code.
The two callers in common/grant_table.c fail to make that call alongside
the one to guest_physmap_add_page(), so they actually get fixed by this
change.
Other (x86) callers are HVM only and are hence unaffected by a change
to the function's !paging_mode_translate() part.
Sadly this isn't enough yet to drop Arm's dummy macro, as there's one
more use in page_alloc.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
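The resulting shape is roughly as follows. For non-translated (PV) guests, guest_physmap_add_page() itself updates the machine-to-phys table, so callers no longer need to. Only translated guests reach the p2m path. The sketch uses simplified, hypothetical types: plain unsigned long instead of gfn_t/mfn_t, a tiny m2p array, and a flag standing in for paging_mode_translate(). It is not Xen's code.

```c
#include <stdbool.h>

#define M2P_SIZE 64

/* Hypothetical, heavily simplified domain. */
struct domain_sketch {
    bool translated;          /* stands in for paging_mode_translate(d) */
    unsigned int p2m_calls;   /* counts calls into the HVM-only path */
};

/* Toy machine-to-phys table. */
static unsigned long m2p[M2P_SIZE];

static void set_gpfn_from_mfn(unsigned long mfn, unsigned long gpfn)
{
    m2p[mfn] = gpfn;
}

/* Stand-in for the HVM-only guest_physmap_add_entry(). */
static int guest_physmap_add_entry(struct domain_sketch *d, unsigned long gfn,
                                   unsigned long mfn, unsigned int order)
{
    (void)gfn; (void)mfn; (void)order;
    d->p2m_calls++;
    return 0;
}

static int guest_physmap_add_page(struct domain_sketch *d, unsigned long gfn,
                                  unsigned long mfn, unsigned int order)
{
    if (!d->translated) {
        /* PV: only the M2P needs updating, done here for every caller. */
        for (unsigned long i = 0; i < (1UL << order); i++)
            set_gpfn_from_mfn(mfn + i, gfn + i);
        return 0;
    }
    return guest_physmap_add_entry(d, gfn, mfn, order);
}
```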
Jan Beulich [Tue, 14 May 2019 14:20:06 +0000 (16:20 +0200)]
x86/mm: make guest_physmap_add_entry() HVM-only
Lift its !paging_mode_translate() part into guest_physmap_add_page()
(which is what common code calls), eliminating the dummy use of a
(HVM-only really) P2M type in the PV case.
Suggested-by: George Dunlap <George.Dunlap@eu.citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 14 May 2019 14:18:58 +0000 (16:18 +0200)]
x86/mm: short-circuit HVM-only mode flags when !HVM
#define-ing them to zero allows better code generation in this case,
and paves the way for more DCE, allowing certain functions to be left
declared but not defined.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
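The effect of defining an HVM-only mode flag to zero can be sketched as follows. The flag name, bit position and structures are hypothetical. In this sketch, CONFIG_HVM is simply left undefined to model a PV-only build.

```c
#include <stdbool.h>

/* Leave CONFIG_HVM undefined to model a PV-only build. */
#ifdef CONFIG_HVM
# define PG_translate (1u << 2)   /* hypothetical bit position */
#else
# define PG_translate 0u          /* flag can never be set */
#endif

struct dom { unsigned int paging_mode; };

#define paging_mode_translate(d) (((d)->paging_mode & PG_translate) != 0)

static unsigned int hvm_only_calls;

/*
 * With PG_translate == 0 the condition in common_path() is a compile-time
 * constant, so an optimising compiler can drop the call entirely; that is
 * what lets such HVM-only functions stay merely declared in a !HVM build.
 * Here it is defined so the sketch also links without optimisation.
 */
static void hvm_only_path(void) { hvm_only_calls++; }

static void common_path(struct dom *d)
{
    if (paging_mode_translate(d))
        hvm_only_path();
}
```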
Paul Durrant [Mon, 13 May 2019 15:50:46 +0000 (17:50 +0200)]
iommu: trivial re-organisation to avoid unnecessary test
An 'if ( !iommu_enabled )' followed by an 'if ( iommu_enabled )' with
only a printk() in between seems a little silly. Move the printk() and
use 'else' instead.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 13 May 2019 15:49:39 +0000 (17:49 +0200)]
memory: restrict XENMEM_remove_from_physmap to translated guests
The commit re-introducing it (14eb3b41d0 ["xen: reinstate previously
unused XENMEM_remove_from_physmap hypercall"]) as well as the one having
originally introduced it (d818f3cb7c ["hvm: Use main memory for video
memory"]) and the one then purging it again (78c3097e4f ["Remove unused
XENMEM_remove_from_physmap"]) make clear that this operation is intended
for use on HVM (i.e. translated) guests only. Restrict it accordingly, not
least because for PV guests the documentation (in the public header) does
not even match the implementation: it talks about a GPFN as input, but
get_page_from_gfn() assumes a GMFN in the non-translated case (and hands
back the value passed in).
Also lift the check in XENMEM_add_to_physmap{,_batch} handling up
directly into top level hypercall handling, and clarify things in the
public header accordingly.
Take the liberty of also replacing a pointless use of "current" with a
more efficient use of an existing local variable (or, to be precise, a
function parameter).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Mon, 13 May 2019 14:42:34 +0000 (16:42 +0200)]
x86/mm: free_page_type() is PV-only
While it already has CONFIG_PV wrapped around its entire body, it is
still uselessly invoking mfn_to_gmfn(), which is about to be replaced.
Rather than morphing this code into an even more suspicious shape, remove
the effectively dead code: translated mode was made impossible for PV
quite some time ago.
Adjust and extend the assertions at the same time: The original
ASSERT(!shadow_mode_refcounts(owner)) really means
ASSERT(!shadow_mode_enabled(owner) || !paging_mode_refcounts(owner)),
which isn't what we want here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Mon, 13 May 2019 14:41:03 +0000 (16:41 +0200)]
x86/IRQ: avoid UB (or worse) in trace_irq_mask()
Dynamically allocated CPU mask objects may be smaller than cpumask_t, so
copying has to be restricted to the actual allocation size. This is
particularly important since the function doesn't bail early when tracing
is not active, so even production builds would be affected by potential
misbehavior here.
Take the opportunity and also
- use initializers instead of assignment + memset(),
- constify the cpumask_t input pointer,
- u32 -> uint32_t.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
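Here is a minimal sketch of the bounded copy. It assumes a hypothetical trace record with a fixed 128-bit mask field and a source mask allocated with just enough longs for nr_cpus CPUs. BITS_TO_LONGS is defined locally, and the record layout is not Xen's. The copy length is the smaller of the two sizes, and the initializer zeroes the rest without a memset().

```c
#include <limits.h>
#include <stdint.h>
#include <string.h>

#define TRACE_MASK_WORDS 4   /* hypothetical size of the record's mask field */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical trace record. */
struct irq_trace {
    uint32_t irq, vec;
    uint32_t mask[TRACE_MASK_WORDS];
};

static struct irq_trace make_irq_trace(unsigned int irq, unsigned int vec,
                                       const unsigned long *mask,
                                       unsigned int nr_cpus)
{
    /* The initializer zeroes every member not named, replacing a memset(). */
    struct irq_trace t = { .irq = irq, .vec = vec };
    /* The allocation may hold fewer bytes than a full cpumask_t. */
    size_t avail = BITS_TO_LONGS(nr_cpus) * sizeof(unsigned long);
    size_t n = avail < sizeof(t.mask) ? avail : sizeof(t.mask);

    memcpy(t.mask, mask, n);
    return t;
}
```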
Olaf Hering [Mon, 25 Mar 2019 16:00:10 +0000 (17:00 +0100)]
install pkgconfig files into libdir
Most pkgconfig files contain a Libs: variable, which points to either /usr/lib
or /usr/lib64. If both 32-bit and 64-bit variants of the Xen libraries are
installed, the last one installed wins. As a result, compiling for the
other bitness will fail.
Use libdir instead of sharedir as the install target. This matches both the
documentation and the expected result.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Mon, 8 Apr 2019 11:09:43 +0000 (12:09 +0100)]
docs/xl: Clarify documentation for mem-max and mem-set
mem-set is the primary command that users will need to use and
understand. Move it first, and clarify the wording; also specify that
you can't set the target higher than maxmem from the domain config.
mem-max is actually a pretty useless command at the moment. Clarify
that users are not expected to use it; and document all of its quirky
behavior.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: Lars Kurth <lars.kurth@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Thu, 2 May 2019 16:25:50 +0000 (17:25 +0100)]
tools/Makefile: Fix build of QEMU, remove --source-path
Following QEMU's commit 79d77bcd36 (configure: Remove --source-path
option), Xen's build system fails to build qemu-xen. The --source-path
option gives redundant information about the location of the sources
so simply remove it. (configure already looks at its $0 to find the
source-path.)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
python: Adjust xc_physinfo wrapper for updated virt_caps bits
Commit f089fddd94 "xen: report PV capability in sysctl and use it in
toolstack" changed the meaning of virt_caps bit 1: previously it was
"directio", but it was changed to "pv" and "directio" was moved to bit 2.
Adjust the python wrapper to use #defines for the bit values, and add
reporting of both "pv_directio" and "hvm_directio".
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
To the Makefile that generates the cpuid policy. Without this fix, if
the tools' python interpreter is different from the default 'python', it
won't be correctly propagated.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Tue, 27 Nov 2018 18:12:00 +0000 (18:12 +0000)]
docs: remove tmem related text
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Wed, 28 Nov 2018 12:13:15 +0000 (12:13 +0000)]
xen: remove tmem from hypervisor
This patch removes all tmem related code and CONFIG_TMEM from the
hypervisor. Also remove tmem hypercalls from the default XSM policy.
The remaining code is written as if tmem were disabled and the number of
tmem freeable pages were 0.
We will need to keep public/tmem.h around forever to avoid breaking
guests. Remove the hypervisor-only part and put the guest-visible part
under a Xen version check. Take the chance to remove trailing
whitespace.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Tue, 27 Nov 2018 17:53:00 +0000 (17:53 +0000)]
tools: remove tmem code and commands
Remove all tmem related code in libxc.
Leave some stubs in libxl in case anyone has linked to those functions
before the removal.
Remove all tmem-related commands from xl, and all tmem-related code from
the other utilities we ship.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/arm: drivers: scif: Add support for SCIFA compatible UARTs
To allow the driver to handle the SCIFA interface as well,
this patch adds the following:
- SCIFA related macros
- New element in "port_params" array to keep SCIFA specific things
- SCIFA compatible string
This patch makes it possible to use the existing driver for the Renesas
"Stout" board, which is based on the R-Car H2 SoC (SCIFA).
xen/arm: drivers: scif: Extend driver to handle other interfaces
Extend the driver to handle other SCIF(X)-compatible
interfaces as well. These interfaces have a lot in common,
but differ mostly in the offsets and bits of some registers.
For example, the main differences between the SCIF and SCIFA interfaces
from the "scif-uart" driver's point of view are:
- Register offsets: the serial status and receive/transmit FIFO data
registers have different offsets
- Internal FIFO size: 64 bytes for SCIFA and 16 bytes for SCIF
- Overrun bit location: the serial status register for SCIFA and a
dedicated line status register for SCIF
Introduce a "port_params" array to keep interface-specific things.
The "data" field in struct dt_device_match is used to recognize
which interface is present on a target board.
Please note that nothing has technically changed for the Renesas "Lager"
and other supported boards (SCIF).
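The per-interface table and the match lookup described for the SCIF driver can be sketched as follows. Only the differences quoted above (FIFO size and overrun bit location) are modelled, and register offsets are deliberately omitted. The match struct is a simplified stand-in for struct dt_device_match. The compatible strings "renesas,scif" and "renesas,scifa" are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum port_types { SCIF_PORT, SCIFA_PORT, NR_PORTS };

/* Only the differences quoted above; register offsets are omitted here. */
struct port_params {
    unsigned int fifo_size;        /* bytes in the internal FIFO */
    bool overrun_in_status_reg;    /* SCIFA: status reg; SCIF: line status reg */
};

static const struct port_params port_params[NR_PORTS] = {
    [SCIF_PORT]  = { .fifo_size = 16, .overrun_in_status_reg = false },
    [SCIFA_PORT] = { .fifo_size = 64, .overrun_in_status_reg = true  },
};

/* Simplified stand-in for struct dt_device_match and its "data" field. */
struct match_sketch {
    const char *compatible;
    const void *data;
};

static const struct match_sketch scif_matches[] = {
    { "renesas,scif",  &port_params[SCIF_PORT]  },   /* assumed strings */
    { "renesas,scifa", &port_params[SCIFA_PORT] },
    { NULL, NULL },
};

static const struct port_params *scif_lookup(const char *compatible)
{
    for (const struct match_sketch *m = scif_matches; m->compatible; m++)
        if (strcmp(m->compatible, compatible) == 0)
            return m->data;
    return NULL;
}
```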
Andrew Cooper [Wed, 24 Apr 2019 18:10:58 +0000 (19:10 +0100)]
xen/arm: Misc improvements to do_common_cpu_on()
* Use domain_vcpu() rather than opencoding the lookup. Amongst other things,
domain_vcpu() is spectre-v1-safe.
* Unlock the domain immediately after arch_set_info_guest() completes. There
is no need for free_vcpu_guest_context() to be within the critical region,
and moving the call simplifies the error case.
No practical change in functionality.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Julien Grall <julien.grall@arm.com>
Julien Grall [Wed, 27 Mar 2019 18:45:31 +0000 (18:45 +0000)]
xen/arm64: __cmpxchg and __cmpxchg_mb should always be inline
Currently __cmpxchg_mb and __cmpxchg are only marked inline. The
compiler is free not to honour the inline hint. If they are not inlined,
the default case of the size switch is not eliminated, so the generated
code references __bad_cmpxchg, leading to a link failure.
Julien Grall [Wed, 27 Mar 2019 18:45:28 +0000 (18:45 +0000)]
xen/arm: guest_walk: Avoid theoretical uninitialized value in get_top_bit
Clang 8.0 throws an error in the get_top_bit function:
guest_walk.c:328:15: error: variable 'topbit' is used uninitialized
whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized]
else if ( is_64bit_domain(d) )
^~~~~~~~~~~~~~~~~~
This is happening because clang cannot tell that is_32bit_domain(d) is
the exact inverse of is_64bit_domain(d), so it expects an else case to
handle the case where the latter call is false.
In other parts of the code dealing with the difference between 32-bit and
64-bit domains, we usually use if ( is_XXbit_domain ) ... else ...
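The if/else shape that keeps -Wsometimes-uninitialized quiet can be sketched as follows, because every path assigns topbit. The function name is hypothetical, and the 64-bit branch is simplified to a single "top byte ignored" flag. In practice this depends on the VA and the TBI bits in TCR_EL1.

```c
#include <stdbool.h>

/*
 * Every path assigns topbit, so the compiler cannot see an
 * uninitialised use. The 64-bit branch is a simplification.
 */
static unsigned int get_top_bit_sketch(bool is_32bit_domain, bool tbi)
{
    unsigned int topbit;

    if (is_32bit_domain)
        topbit = 31;
    else
        topbit = tbi ? 55 : 63;   /* top byte ignored => bit 55 is the top */

    return topbit;
}
```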
Julien Grall [Wed, 27 Mar 2019 18:45:25 +0000 (18:45 +0000)]
xen/arm64: sysreg: Implement the 32-bit helpers using the 64-bit helpers
Clang is pickier than GCC about the register size in asm statements. It
expects the register size to match the value size.
The msr/mrs instructions expect a 64-bit register. This means the
implementation of the 32-bit helpers is not correct. The easiest
solution is to implement the 32-bit helpers using the 64-bit helpers.
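The layering can be sketched as follows: the 32-bit helpers only truncate or widen the value, so any asm underneath only ever sees a 64-bit operand. To keep the sketch portable, the system register is simulated with a variable. All macro names here are hypothetical. On arm64, the 64-bit accessors would wrap mrs/msr with an x register.

```c
#include <stdint.h>

/* Simulated system register so the sketch runs on any host. */
static uint64_t fake_sysreg;

#define READ_SYSREG64_SKETCH()        (fake_sysreg)
#define WRITE_SYSREG64_SKETCH(v)      (fake_sysreg = (uint64_t)(v))

/* 32-bit helpers built on the 64-bit ones: only 64-bit values reach the "register". */
#define READ_SYSREG32_SKETCH()        ((uint32_t)READ_SYSREG64_SKETCH())
#define WRITE_SYSREG32_SKETCH(v)      WRITE_SYSREG64_SKETCH((uint32_t)(v))
```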
Julien Grall [Tue, 19 Mar 2019 23:27:53 +0000 (23:27 +0000)]
xen/arm64: head: Combine lsl and str instructions in a single one
We can optimize the assembly code a bit by combining the 2 instructions
into a single one. This is unlikely to make the code faster, but
likely makes the assembly easier to read.
xen/arm: Clarify usage of earlyprintk for Lager board
The current sentence is not entirely correct: the SCIF0 interface is
applicable to the Lager board, but not to all R-Car H2 based boards.
For example, the Stout board uses the SCIFA0 interface.
Andrew Cooper [Wed, 27 Mar 2019 18:50:46 +0000 (18:50 +0000)]
x86/vvmx: Simplify per-CPU memory allocations
* Use XFREE() instead of opencoding it in nvmx_cpu_dead()
* Avoid redundant evaluations of per_cpu()
* Don't allocate vvmcs_buf at all if it isn't going to be used. It is never
touched on hardware lacking the VMCS Shadowing feature.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Eslam Elnikety [Fri, 3 May 2019 19:43:49 +0000 (19:43 +0000)]
sched/credit: avoid priority boost for capped domains when unpark
When unpausing a capped domain, the scheduler currently clears the
CSCHED_FLAG_VCPU_PARKED flag before vcpu_wake(). This, in turn, causes
vcpu_wake() to set CSCHED_PRI_TS_BOOST, resulting in an unfair credit boost. The
comment around the changed lines already states that clearing the flag should
happen AFTER the unpause. This bug was introduced in commit be650750945
"credit1: Use atomic bit operations for the flags structure".
Original patch author credit: Xi Xiong while at Amazon.
Andrew Cooper [Wed, 1 May 2019 17:14:03 +0000 (18:14 +0100)]
x86/boot: Annotate the Real Mode entry points
... because it's already hard enough to follow. Cross-reference the locations
in C which set the entrypoints up, and state the alignment requirements and
entry conditions.
Drop a redundant .align 16, and panic() in do_boot_cpu() if the AP trampoline
isn't set up properly rather than blindly continuing and letting the APs
execute junk, or shifting part of the address into unrelated fields in ICR.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 1 May 2019 17:14:03 +0000 (18:14 +0100)]
x86/boot: Fix latent memory corruption with early_boot_opts_t
c/s ebb26b509f "xen/x86: make VGA support selectable" added an #ifdef
CONFIG_VIDEO into the middle of the backing space for early_boot_opts_t,
but didn't adjust the structure definition in cmdline.c.
This only functions correctly because the affected fields are at the end
of the structure, and cmdline.c doesn't write to them in this case.
To retain the slimming effect of compiling out CONFIG_VIDEO, adjust
cmdline.c with enough #ifdef-ary to make C's idea of the structure match
the declaration in asm. This requires adding __maybe_unused annotations
to two helper functions.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
David Woodhouse [Sun, 28 Apr 2019 14:13:37 +0000 (17:13 +0300)]
x86/wakeup: Stop using %fs for lidt/lgdt
The wakeup code is now relocated alongside the trampoline code, so
as long as we move idt_48 and gdt_48 up a little bit so that they're
visible in the real-mode segment that the wakeup code runs in, using
%ds is just fine here.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 26 Apr 2019 10:19:07 +0000 (11:19 +0100)]
x86/cpu: Use cpu_has_sep for configuring the SYSENTER MSRs
Currently, configuration of the SYSENTER MSRs is behind a vendor check for
Intel and Centaur, but this misses Zhaoxin.
Use the feature bit, rather than a vendor check. cpu_has_sep is cleared early
for AMD processors, which can't use SYSENTER/SYSEXIT when operating in long
mode.
Suggested-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Thu, 25 Apr 2019 15:32:50 +0000 (09:32 -0600)]
x86/mem_sharing: acquire extra references for pages with correct domain
Patch 0502e0adae2 "x86: correct instances of PGC_allocated clearing" introduced
grabbing extra references for pages that drop references tied to PGC_allocated.
However, these pages are actually owned by dom_cow, so both sharing and
unsharing break.
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 23 Apr 2019 15:18:29 +0000 (16:18 +0100)]
xen/timers: Fix memory leak with cpu unplug/plug (take 2)
Previous attempts to fix this leak didn't identify the root cause, and
ultimately failed. The cause is actually the CPU_UP_PREPARE case
(re)initialising ts->heap back to dummy_heap, which leaks the previous
allocation.
Rearrange the logic to only initialise ts once. This also avoids the
redundant (but benign, due to ts->inactive always being empty) initialising of
the other ts fields.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
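The "initialise once" fix can be sketched as follows. This sketch assumes a real heap replaces the shared dummy heap lazily when the first timer is added. The structure, function names and heap size are hypothetical simplifications of Xen's timer code. The point is that a repeated CPU_UP_PREPARE leaves an already allocated heap alone, so the heap is no longer orphaned.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down per-CPU timer state. */
struct timers_sketch {
    int *heap;            /* NULL until first bring-up, then dummy or real */
};

static int dummy_heap;    /* shared placeholder, never freed */
static unsigned int live_allocs;

/* Simplified CPU_UP_PREPARE handling. */
static void cpu_up_prepare(struct timers_sketch *ts)
{
    /*
     * Initialise only once: resetting to dummy_heap on every bring-up
     * would orphan a heap allocated while the CPU was previously online.
     */
    if (ts->heap == NULL)
        ts->heap = &dummy_heap;
}

/* Adding the first timer replaces the dummy heap with a real one. */
static int add_timer(struct timers_sketch *ts)
{
    if (ts->heap == &dummy_heap) {
        int *h = calloc(16, sizeof(*h));

        if (!h)
            return -1;
        live_allocs++;
        ts->heap = h;
    }
    return 0;
}
```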
Andrew Cooper [Wed, 24 Apr 2019 17:53:15 +0000 (18:53 +0100)]
xen/domain: Block more speculative out-of-bound accesses
c/s f8303458 restricted speculative access for do_vcpu_op(), but neglected its
compat counterpart, which is reachable by guests using the 32bit ABI.
Make an identical adjustment.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Norbert Manthey <nmanthey@amazon.de> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 26 May 2016 16:37:30 +0000 (17:37 +0100)]
x86/shadow: Drop incorrect diagnostic when shadowing TSS.RSP0
During development of the XTF pagewalk tests, I reliably encountered this
message exactly once per run. It occurs when the first action to touch
TSS.RSP0 is an interrupt/exception taken in userspace, and the processor tries
to push the IRET frame.
Subsequently, OSSTest has demonstrated that it triggers frequently for a
KPTI-enabled kernel.
(XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646687f38, mfn=0x2415a1
[ 1411.949155] systemd-logind[2683]: New session 73 of user root.
(XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad264671ff38, mfn=0x240a41
(XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646837f38, mfn=0x2415c5
(XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad26468a7f38, mfn=0x2414e7
[ 1442.207473] systemd-logind[2683]: New session 74 of user root.
[ 1471.452206] systemd-logind[2683]: New session 75 of user root.
(XEN) multi.c:3324:d1v1 write to pagetable during event injection: cr2=0xffffad2646d17f08, mfn=0x2417c5
[ 1501.698971] systemd-logind[2683]: New session 76 of user root.
The actions performed by the shadow code are correct, and the guest continues
without error, but the emitted error is misleading. Tweak the comment to more
clearly identify why the condition exists, but drop the message.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Fri, 1 Feb 2019 14:48:48 +0000 (14:48 +0000)]
x86/svm: Fix handling of ICEBP intercepts
c/s 9338a37d "x86/svm: implement debug events" added support for introspecting
ICEBP debug exceptions, but didn't account for the fact that
svm_get_insn_len() (previously __get_instruction_length) can fail and may
already have raised #GP with the guest.
If svm_get_insn_len() fails, return to guest context rather than
continuing and mistaking a trap-style VMExit for a fault-style one.
Spotted by Coverity.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Brian Woods <brian.woods@amd.com>
xen/sched: we never get into context_switch() with prev==next
In schedule(), if the vcpu picked to run next (next) is the same one that
is already running (prev), we never get to call context_switch().
We can therefore get rid of all the `if`s testing whether prev and next
differ, replacing them with an ASSERT() (on ARM, the ASSERT() was even
already there!)
Suggested-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Andrew Cooper [Fri, 5 Apr 2019 12:26:30 +0000 (13:26 +0100)]
x86/boot: Detect the firmware SMT setting correctly on Intel hardware
While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware. As a result, a user who has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.
Make use of the largely-undocumented MSR_INTEL_CORE_THREAD_COUNT, which in
practice has existed since Nehalem, when booting on real hardware. Fall back to
using the ACPI table APIC IDs.
While adjusting this logic, fix a latent bug in amd_get_topology(). The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
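To illustrate the amd_get_topology() fix, here is a sketch decoding the thread-count field. It assumes the field occupies bits 15:8 of EBX and encodes "threads per core minus one"; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decode threads per core from CPUID 0x8000001e EBX using the 8-bit field. */
static unsigned int threads_per_core(uint32_t ebx)
{
    return ((ebx >> 8) & 0xff) + 1;
}

/* The old 2-bit decode, kept only to show how larger values get truncated. */
static unsigned int threads_per_core_2bit(uint32_t ebx)
{
    return ((ebx >> 8) & 0x3) + 1;
}
```

With a 2-bit mask, any encoded value above 3 is silently truncated.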
Andrew Cooper [Fri, 5 Apr 2019 12:26:30 +0000 (12:26 +0000)]
x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT
This is a model specific register which details the current configuration of
cores and threads in the package. Because of how Hyperthreading and core
configuration work in firmware, the MSR is de facto constant and will remain
unchanged until the next system reset.
It is a read-only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties. Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
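A sketch of decoding this MSR, assuming the commonly described layout (index 0x35, thread count in bits 15:0, core count in bits 31:16). The mask and helper names are illustrative rather than Xen's definitions. Comparing the two counts is one way to tell whether Hyperthreading has been disabled in firmware.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_INTEL_CORE_THREAD_COUNT 0x00000035
#define CTC_THREAD_MASK             0x0000ffffu   /* bits 15:0: threads */
#define CTC_CORE_MASK               0xffff0000u   /* bits 31:16: cores */

struct core_thread_count {
    unsigned int threads;
    unsigned int cores;
};

/* Split a raw MSR value into its thread and core counts. */
static struct core_thread_count decode_ctc(uint64_t val)
{
    struct core_thread_count ctc = {
        .threads = (unsigned int)(val & CTC_THREAD_MASK),
        .cores   = (unsigned int)((val & CTC_CORE_MASK) >> 16),
    };

    return ctc;
}

/* SMT is active if the package exposes more threads than cores. */
static int smt_enabled(uint64_t val)
{
    struct core_thread_count ctc = decode_ctc(val);

    return ctc.threads > ctc.cores;
}
```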
Andrew Cooper [Fri, 5 Apr 2019 15:58:44 +0000 (15:58 +0000)]
x86/boot: Don't leak the module_map allocation in __start_xen()
Ever since its introduction in c/s 436fb462 "x86/microcode: enable boot
time (pre-Dom0) loading", the allocation has gone un-freed, and has its final
use as part of constructing dom0.
Xen already considers it an error to have more than a single unaccounted-for
module (again, logic from the same change), and will only pass the first one
to dom0 as the initrd.
Instead of having an 8 byte pointer to a bitmap which won't exceed 4 bits wide
in any production scenario (dom0 kernel, initrd, XSM blob and microcode blob),
allocate module_map[] on the stack and add a sanity bound for mbi->mods_count.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
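A sketch of the stack-allocated alternative. `build_module_map()` and `MAX_BOOT_MODULES` are hypothetical, and the single-word bound is an assumption chosen for illustration; the commit message does not state the actual limit.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
/* Illustrative bound: every module must fit in a single-word bitmap. */
#define MAX_BOOT_MODULES BITS_PER_LONG

/*
 * Fill *map with the modules not yet consumed (e.g. by microcode or XSM).
 * Returns -1 if mods_count is implausibly large. No heap allocation occurs.
 */
static int build_module_map(unsigned int mods_count, unsigned long consumed,
                            unsigned long *map)
{
    if ( mods_count > MAX_BOOT_MODULES )
        return -1;                          /* sanity bound */

    *map = mods_count == BITS_PER_LONG ? ~0UL : (1UL << mods_count) - 1;
    *map &= ~consumed;
    return 0;
}
```

The caller keeps the bitmap in a local variable, so there is nothing to free afterwards.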
The 1900x1200 limits do not match real-world devices (1900 looks like a
typo and should be 1920). But in practice the limits are arbitrary and do
not serve any real purpose. As discussed in the "Increase framebuffer size
to todays standards" thread, drop them completely.
This fixes the graphical console on a device with 3840x2160 native resolution.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
MAX_BPP, MAX_FONT_W, MAX_FONT_H are not used in the code at all.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
When bitmap_fill(..., 0) is called, do not try to write anything. Before
this patch, it tried to write almost LONG_MAX, surely overwriting
something.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
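The failure mode can be reproduced with a bitmap_fill() written in the usual Linux/Xen style (this sketch is not Xen's exact code). The length passed to memset() for all but the last word is `(nlongs - 1) * sizeof(long)`, which wraps around to a huge value when nbits, and therefore nlongs, is 0. The early return avoids that.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define BITS_PER_LONG     (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n)  (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
/* Valid bits of the last word; all ones if n is a multiple of the word size. */
#define LAST_WORD_MASK(n) (((n) % BITS_PER_LONG) ? \
                           (1UL << ((n) % BITS_PER_LONG)) - 1 : ~0UL)

static void bitmap_fill(unsigned long *dst, unsigned int nbits)
{
    size_t nlongs = BITS_TO_LONGS(nbits);

    /* nbits == 0 gives nlongs == 0, and nlongs - 1 below would wrap around. */
    if ( !nlongs )
        return;

    memset(dst, 0xff, (nlongs - 1) * sizeof(unsigned long));
    dst[nlongs - 1] = LAST_WORD_MASK(nbits);
}
```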
Jan Beulich [Mon, 13 May 2019 07:58:57 +0000 (09:58 +0200)]
page-alloc: detect double free earlier
Right now this goes unnoticed until some subsequent page allocator
operation stumbles across the thus corrupted list. We can do better:
Only PGC_state_inuse and PGC_state_offlining pages can legitimately be
passed to free_heap_pages().
Take the opportunity and also restrict the PGC_broken check to the
PGC_state_offlining case, as only pages of that type or
PGC_state_offlined may have this flag set on them. Similarly, since
PGC_state_offlined is not a valid input state, the setting of "tainted"
can be restricted to just this case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
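A simplified model of the earlier check, with hypothetical state and action names mirroring the PGC_state_* values above. Only in-use and offlining pages are accepted; anything else is flagged as an invalid (typically double) free at the point of the call, rather than later when the corrupted list is walked.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified page states, modelled on the PGC_state_* names in the text. */
enum page_state { STATE_INUSE, STATE_OFFLINING, STATE_OFFLINED, STATE_FREE };

struct page {
    enum page_state state;
    bool broken;    /* only meaningful for offlining/offlined pages */
};

/* What a free_heap_pages()-like function should do with one page. */
enum free_action { FREE_TO_HEAP, MAKE_OFFLINED, MAKE_BROKEN, INVALID_FREE };

static enum free_action classify_free(const struct page *pg, bool *tainted)
{
    switch ( pg->state )
    {
    case STATE_INUSE:
        return FREE_TO_HEAP;

    case STATE_OFFLINING:
        /* Only this input state needs "tainted" set and "broken" checked. */
        *tainted = true;
        return pg->broken ? MAKE_BROKEN : MAKE_OFFLINED;

    default:
        /* Free or offlined pages must never be passed in: catch it now. */
        return INVALID_FREE;
    }
}
```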
Eslam Elnikety [Mon, 13 May 2019 07:58:08 +0000 (09:58 +0200)]
mm: option to _always_ scrub freed domheap pages
Give the administrator further control on when to scrub domheap pages by adding
an option to always scrub. This is a safety feature that, when enabled,
prevents a (buggy) domain from leaking secrets if it accidentally frees a page
without proper scrubbing.
Signed-off-by: Eslam Elnikety <elnikety@amazon.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Tamas K Lengyel [Mon, 13 May 2019 07:55:59 +0000 (09:55 +0200)]
x86/vmx: correctly gather gs_shadow value for current vCPU
Currently the gs_shadow value is only cached when the vCPU is being scheduled
out by Xen. Reporting this (usually) stale value through vm_event is incorrect,
since it doesn't represent the actual state of the vCPU at the time the event
was recorded. This prevents vm_event subscribers from correctly finding kernel
structures in the guest when it is trapped while in ring3.
Refresh the shadow_gs value when the context being saved is for the current vCPU.
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Alexandru Isaila [Mon, 13 May 2019 07:55:24 +0000 (09:55 +0200)]
x86/altp2m: aggregate get entry and populate into common funcs
The code for getting the entry and then populating was repeated in
p2m_change_altp2m_gfn() and in p2m_set_altp2m_mem_access().
The code is now in one place, with a bool parameter that lets the caller
choose whether it populates after get_entry().
If remapping is being done, then both the old and new gfns should be
unshared in the hostp2m to keep things consistent. The page type of
old_gfn was already checked to be p2m_ram_rw (bailing out if it wasn't),
so functionality-wise this just simplifies things, as a user doesn't
have to request unsharing manually before remapping.
Now, if new_gfn is invalid, it shouldn't query the hostp2m, as that is
effectively a request to remove the entry from the altp2m. But provided
that scenario is used only when removing entries that were previously
remapped/copied to the altp2m, those entries already went through
P2M_ALLOC | P2M_UNSHARE before, so repeating this has no effect. Hence
the core function get_altp2m_entry() calls __get_gfn_type_access() with
P2M_ALLOC | P2M_UNSHARE.
altp2m_get_entry_direct() is also called in p2m_set_suppress_ve()
because, on a new altp2m view, the function will fail with an invalid
mfn if p2m->set_entry() was not called before.
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Igor Druzhinin [Mon, 13 May 2019 07:54:45 +0000 (09:54 +0200)]
x86/mtrr: recalculate P2M type for domains with iocaps
This change reflects the logic in epte_get_entry_emt() and allows
changes in guest MTRRs to be reflected in EPT for domains having
direct access to certain hardware memory regions but without an IOMMU
context assigned (e.g. XenGT).
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 13 May 2019 07:52:43 +0000 (09:52 +0200)]
AMD/IOMMU: disable previously enabled IOMMUs upon init failure
If any IOMMUs were successfully initialized before encountering failure,
the successfully enabled ones should be disabled again before cleaning
up their resources.
Move disable_iommu() next to enable_iommu() to avoid a forward
declaration, and take the opportunity to remove stray blank lines ahead
of both functions' final closing braces.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Brian Woods <brian.woods@amd.com>
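The error-path pattern can be sketched as follows. The `struct iommu` and helpers here are hypothetical (`fail` exists only so the example can inject an error). On failure, every IOMMU that was already enabled is disabled again before its resources would be released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IOMMU descriptor; 'fail' lets the example inject an error. */
struct iommu {
    bool enabled;
    bool fail;
};

static int enable_iommu(struct iommu *iommu)
{
    if ( iommu->fail )
        return -1;
    iommu->enabled = true;
    return 0;
}

/* Placed right after enable_iommu(), so no forward declaration is needed. */
static void disable_iommu(struct iommu *iommu)
{
    iommu->enabled = false;
}

/* Enable all IOMMUs; on failure, disable the ones already enabled. */
static int init_iommus(struct iommu *iommus, size_t n)
{
    size_t i;

    for ( i = 0; i < n; i++ )
        if ( enable_iommu(&iommus[i]) )
            goto error;

    return 0;

 error:
    /* Undo in reverse order, before any resources are torn down. */
    while ( i-- > 0 )
        disable_iommu(&iommus[i]);
    return -1;
}
```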
Jan Beulich [Mon, 13 May 2019 07:51:23 +0000 (09:51 +0200)]
trace: fix build with gcc9
While I've not observed this myself, gcc 9 (imo validly) reportedly may
complain
trace.c: In function '__trace_hypercall':
trace.c:826:19: error: taking address of packed member of 'struct <anonymous>' may result in an unaligned pointer value [-Werror=address-of-packed-member]
826 | uint32_t *a = d.args;
and the fix is rather simple: remove the __packed attribute. Introduce
a BUILD_BUG_ON() as a replacement, for the unlikely case that Xen might
get ported to an architecture where an array's alignment is higher than
that of its elements.
Reported-by: Martin Liška <martin.liska@suse.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
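A sketch of the replacement pattern, using a C11 `_Static_assert` in place of Xen's BUILD_BUG_ON(). The struct name and the size of `args` are assumptions for illustration. With the packed attribute gone, `args` is naturally aligned, and the static assertion catches any padding the compiler might insert before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical trace record: no __packed, so args[] is naturally aligned. */
struct hypercall_record {
    uint32_t op;
    uint32_t args[6];
};

/* Build-time check that no padding was inserted between op and args[]. */
_Static_assert(offsetof(struct hypercall_record, args) == sizeof(uint32_t),
               "unexpected padding before args[]");

/* Taking the address of args no longer risks an unaligned pointer. */
static uint32_t *record_args(struct hypercall_record *r)
{
    return r->args;
}
```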
Ian Jackson [Mon, 15 Apr 2019 16:13:09 +0000 (17:13 +0100)]
build system: make install-stubdom depend on install-tools again
In d290e325179ccee966cd679d0fed48be6f4cc1b7
"build system: don't let install-stubdom depend on install-tools"
the dependency of install-stubdom on install-tools was removed.
However, this was not correct. stubdom/Makefile contains this:
With recursive make, it is necessary for the overall structure of the
makefiles to sequence things so that each directory is entered exactly
once, before its dependent directories are entered. (It is possible
to violate this rule without creating races but it is tricky and
inadvisable.)
Since d290e325179c, it can happen that the command for the
qemu-xen-traditional-dir-find rule is run twice simultaneously - once
as a result of $(MAKE) -C tools install, and once as a result of
$(MAKE) -C stubdom install. If you get unlucky, this causes lossage.
(This just happened to me in an osstest flight.)
In principle we could alternatively fix this by lifting the commands
in the qemu-xen-traditional-dir-find target (and perhaps other things
too) into the toplevel Makefile, as was done for mini-os.
But that seems overkill given how bad the stubdom build system is, and
the fact that we think at some point this qemu-trad will go away
entirely. Adding the tools dependency back to the stubdom build is
by and large good enough.
(Someone who really wants to build stubdom without tools is welcome to
do this separation if they really want to.)
CC: Juergen Gross <jgross@suse.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
* Fix the shim build by providing a !CONFIG_HVM declaration for
hvm_get_guest_bndcfgs(), and removing the introduced
ASSERT(is_hvm_domain(d))'s. They are needed for DCE to keep the build
working. Furthermore, in this way, the risk of runtime type confusion is
removed.
* Revert the de-const'ing of the vcpu pointer in vmx_get_guest_bndcfgs().
vmx_vmcs_enter() really does mutate the vcpu, and may cause it to undergo a
full de/reschedule, which is contrary to the programmer's expectation of
hvm_get_guest_bndcfgs(). guest_rdmsr() was always going to need to lose
its const parameter, and this was the correct time for it to happen.
* The MSRs in vcpu_msrs are in numeric order. Re-position XSS to match.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 11 Apr 2019 10:45:41 +0000 (04:45 -0600)]
timers: move back migrate_timers_from_cpu() invocation
Commit 597fbb8be6 ("xen/timers: Fix memory leak with cpu unplug/plug")
went a little too far: Migrating timers away from a CPU being offlined
needs to happen independently of whether it gets parked or fully offlined.
Jan Beulich [Thu, 11 Apr 2019 08:25:05 +0000 (10:25 +0200)]
x86: fix build race when generating temporary object files
The rules to generate xen-syms and xen.efi may run in parallel, but both
recursively invoke $(MAKE) to build symbol/relocation table temporary
object files. These recursive builds would both re-generate the .*.d2
files (where needed). Both would in turn invoke the same rule, thus
allowing for a race on the .*.d2.tmp intermediate files.
The dependency files of the temporary .xen*.o files live in xen/ rather
than xen/arch/x86/ anyway, so won't be included no matter what. Take the
opportunity and delete them, as the just re-generated .xen*.S files will
trigger a proper re-build of the .xen*.o ones anyway.
Empty the DEPS variable in case the set of goals consists of just those
temporary object files, thus eliminating the race.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/mm: Clean up p2m_finish_type_change return value
In the case of any errors, finish_type_change() passes values returned
from p2m->recalc() up the stack (with some exceptions in the case where
an error is expected); this eventually ends up being returned to the
XEN_DMOP_map_mem_type_to_ioreq_server hypercall.
However, on Intel processors (but not on AMD processors), p2m->recalc()
can also return '1' as well as '0'. This case is handled very
inconsistently: finish_type_change() will return the value of the final
entry it attempts, discarding results for other entries;
p2m_finish_type_change() will attempt to accumulate '1's, so that it
returns '1' if any of the calls to finish_type_change() returns '1'; and
dm_op() will again return '1' only if the very last call to
p2m_finish_type_change() returns '1'. The result is that the
XEN_DMOP_map_mem_type_to_ioreq_server() hypercall will sometimes return
0 and sometimes return 1 on success, in an unpredictable manner.
The hypercall documentation doesn't mention return values; but it's not
clear what the caller could do with the information about whether
entries had been changed or not. At the moment it's always 0 on AMD
boxes, and *usually* 1 on Intel boxes; so nothing can be relying on a
'1' return value for correctness (or if it is, it's broken).
Make the return value on success consistently '0' by only returning
0/-ERROR from finish_type_change(). Also remove the accumulation code
from p2m_finish_type_change().
Suggested-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
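A sketch of the normalised behaviour. `recalc_fn`, `finish_type_change_range()` and `example_recalc()` are hypothetical. The recalc callback returns a negative errno on error and 0 or 1 on success, like the Intel path described above; the loop reports only 0 or a negative errno, so success looks the same on every vendor and for every range.

```c
#include <assert.h>
#include <errno.h>

/* Per-entry recalc: <0 on error, 0 or 1 ("entry changed") on success. */
typedef int (*recalc_fn)(unsigned long gfn);

/* Process [start, end); the 0/1 distinction is dropped, not accumulated. */
static int finish_type_change_range(recalc_fn recalc, unsigned long start,
                                    unsigned long end)
{
    unsigned long gfn;

    for ( gfn = start; gfn < end; gfn++ )
    {
        int rc = recalc(gfn);

        if ( rc < 0 )
            return rc;
    }

    return 0;
}

/* Example callback: "changes" even gfns and fails on gfn 13. */
static int example_recalc(unsigned long gfn)
{
    if ( gfn == 13 )
        return -EINVAL;
    return gfn % 2 == 0;
}
```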
Andrew Cooper [Fri, 5 Apr 2019 14:59:27 +0000 (15:59 +0100)]
x86/hvm: Fix altp2m_op hypercall continuations
c/s 9383de210 "x86/altp2m: support for setting restrictions for an array of
pages" introduced this logic, but do_hvm_op() was already capable of handling
-ERESTART correctly.
More problematic however is a continuation from compat_altp2m_op(). The arg
written back into register state points into the hypercall XLAT area, not at
the original parameter passed by the guest. It may be truncated by the
vmentry, but definitely won't be correct on the next invocation.
Delete the hypercall_create_continuation() call, and return -ERESTART, which
will cause the compat case to start working correctly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 28 Mar 2019 14:37:00 +0000 (14:37 +0000)]
x86/smt: Support for enabling/disabling SMT at runtime
Currently, a user can in principle combine the output of `xl info -n`, the
ACPI tables, and some manual CPUID data to figure out which CPU numbers to
feed into `xen-hptool cpu-offline` to effectively disable SMT at runtime.
A more convenient option is to teach Xen how to perform this action.
Extend XEN_SYSCTL_cpu_hotplug with two new operations. Introduce a new
smt_up_down_helper() which wraps the cpu_{up,down}_helper() helpers with logic
which understands siblings based on their APIC_ID.
Add libxc stubs, and extend xen-hptool with smt-{enable,disable} options.
These are intended to be shorthands for a loop over cpu-{online,offline}.
To simplify the implementation, they will strictly enable/disable secondary
siblings (those with a non-zero thread id). This functionality is intended
for use in production scenarios where debugging options such as `maxcpus=` or
other manual plug/unplug configuration has not been used.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
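A sketch of the secondary-sibling loop behind smt-enable/smt-disable. `struct cpu_info`, `cpu_set_online()` and `smt_up_down()` are hypothetical stand-ins for the real helpers, and thread ids are assumed to have been derived from the APIC IDs beforehand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU topology record. */
struct cpu_info {
    unsigned int thread_id;   /* 0 for the primary sibling of each core */
    bool online;
};

/* Stand-in for cpu_{up,down}_helper(); returns 0 on success. */
static int cpu_set_online(struct cpu_info *cpu, bool up)
{
    cpu->online = up;
    return 0;
}

/*
 * Bring every secondary sibling (non-zero thread id) up or down, like a
 * loop over cpu-online/cpu-offline. Primary threads are never touched.
 */
static int smt_up_down(struct cpu_info *cpus, size_t n, bool up)
{
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        int rc;

        if ( cpus[i].thread_id == 0 || cpus[i].online == up )
            continue;

        rc = cpu_set_online(&cpus[i], up);
        if ( rc )
            return rc;
    }

    return 0;
}
```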