Paul Durrant [Fri, 24 Aug 2018 12:16:26 +0000 (13:16 +0100)]
xenforeignmemory: work around bug in older privcmd
Versions of linux privcmd prior to commit dc9eab6fd94d ("return -ENOTTY
for unimplemented IOCTLs") will return -EINVAL rather than the conventional
-ENOTTY for unimplemented codes. This breaks the error path in
libxenforeignmemory resource mapping, which only translates ENOTTY into
EOPNOTSUPP to inform callers of the need to use an alternative (legacy)
mechanism.
This patch adds a new 'unimplemented' [1] ioctl code into the local
privcmd header which is then used to probe for the appropriate errno to
translate in the resource mapping error path
[1] this is a code that has, so far, never been used in any version of
privcmd and will be added to future versions of the header in the
linux source, to make sure it stays unimplemented.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/arm: p2m: Limit call to mem access code use in get_page_from_gva
Mem access has only an impact on the hardware translation between a
guest virtual address and the machine physical address. So it is not
necessary to fallback to memaccess for all the other case (e.g when it
is not possible to acquire the page behind the MFN).
xen/arm: p2m: Reduce the locking section in get_page_from_gva
The p2m lock is only necessary to prevent gvirt_to_maddr failing when
break-before-make sequence is used in the P2M update concurrently on
another pCPU. So reduce the locking section.
xen/arm: cpregs: Allow HSR_CPREG* to receive more than 1 parameter
At the moment, HSR_CPREG is expected to receive only the co-processor
register name in parameter. Because the name is actually a define, this
may have been expanded by a previous macro.
Rather than imposing the use of _HSR_CPREG* in such cases, allow
HSR_CPREG to receive more than 1 parameter.
xen/arm: move a few DT related defines to public/device_tree_defs.h
Move a few constants defined by libxl_arm.c to
xen/include/public/device_tree_defs.h, so that they can be used from Xen
and libxl. Prepend GUEST_ to avoid conflicts.
Move the DT_IRQ_TYPE* definitions from libxl_arm.c to
public/device_tree_defs.h. Use them in Xen where appropriate.
Re-define the existing Xen internal IRQ_TYPEs as DT_IRQ_TYPEs: they
already happen to be the same, let make it clear.
xen/arm: do not pass dt_host to make_memory_node and make_hypervisor_node
In order to make make_memory_node and make_hypervisor_node more
reusable, do not pass them dt_host. As they only use it to calculate
addrcells and sizecells, pass addrcells and sizecells directly.
Wei Liu [Fri, 17 Aug 2018 15:12:41 +0000 (16:12 +0100)]
x86/oprofile: put SVM only code under CONFIG_HVM
The code snippet in question is to detect NMI held by SVM until STGI
is called. When Xen doesn't even support HVM guests there is no need
to check svm_stgi_label.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Fri, 17 Aug 2018 15:12:32 +0000 (16:12 +0100)]
x86/pt: add HVM check to XEN_DOMCTL_unbind_pt_irq
Its counterpart is HVM only. Add the check to help dead code
elimination to figure out the call to pt_irq_destroy_bind is not
needed when HVM is not enabled.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Fri, 17 Aug 2018 15:12:22 +0000 (16:12 +0100)]
x86/mm: don't reference hvm_funcs directly
It is generally not a good idea to reference the internal data
structure of the another subsystem directly. Introduce a wrapper
function for the invlpg hook.
No functional change.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Fri, 25 May 2018 15:18:45 +0000 (16:18 +0100)]
libxl_qmp: Simplify qmp_response_type() prototype
Remove the libxl__qmp_handler* argument so the function can be reused
later in a different context.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Fri, 25 May 2018 14:07:14 +0000 (15:07 +0100)]
libxl_json: libxl__json_object_to_json
Allow to generate a JSON string from a libxl__json_object,
useful for debugging.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Thu, 31 May 2018 10:50:03 +0000 (11:50 +0100)]
libxl_json: Enable yajl_allow_trailing_garbage
This allows to parse a string that is not NUL-terminated. With that
option disabled, YAJL v2 would look ahead on completion to find out if
there is more to parse.
YAJL v1 doesn't have this behavior.
Any function that allocates a yajl_handle via this function either parse
a NUL-terminated string, or do provide proper length. So change the
default and allow garbage (like a different JSON document) after the end
of the data to parse.
This is important for the QMP client, as there could be more than one
message to parse, and YAJL would consider the next message to be garbage
and throw an error.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Anthony PERARD [Tue, 24 Jul 2018 11:31:58 +0000 (12:31 +0100)]
libxl: Add libxl__prepare_sockaddr_un() helper
There is going to be a few more users that want to use UNIX socket, this
helper is to prepare the `struct sockaddr_un` and check that the path
isn't too long.
Also start to use it in libxl_qmp.c.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
tools: fix uninstall: tests/x86_emulator, Linux hotplug
Fixing top-level "make uninstall":
tools/tests/x86_emulator is missing an uninstall target, which causes
failure. Trivial to add one since it installs nothing, so do that.
Linux hotplug uninstall returns success but doesn't actually remove what
it installed. The Makefile variables are obfuscating incorrect logic, so
strip them out and match existing code for xen-watchdog which does work.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Add support for Linux grant device driver extension which allows
converting existing dma-buf's into an array of grant references
and vise versa. This is only implemented for Linux as other OSes
have no Linux dma-buf support.
Bump gnttab library minor version to 3.
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Tim Deegan <tim@xen.org> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Tested-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Thu, 26 Jul 2018 14:58:54 +0000 (15:58 +0100)]
xenpmd: make 32 bit gcc 8.1 non-debug build work
32 bit gcc 8.1 non-debug build yields:
xenpmd.c:354:23: error: '%02x' directive output may be truncated writing between 2 and 8 bytes into a region of size 3 [-Werror=format-truncation=]
snprintf(val, 3, "%02x",
^~~~
xenpmd.c:354:22: note: directive argument in the range [40, 2147483778]
snprintf(val, 3, "%02x",
^~~~~~
xenpmd.c:354:5: note: 'snprintf' output between 3 and 9 bytes into a destination of size 3
snprintf(val, 3, "%02x",
^~~~~~~~~~~~~~~~~~~~~~~~
(unsigned int)(9*4 +
~~~~~~~~~~~~~~~~~~~~
strlen(info->model_number) +
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
strlen(info->serial_number) +
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
strlen(info->battery_type) +
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
strlen(info->oem_info) + 4));
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All info->* used in calculation are 32 bytes long, and the parsing
code makes sure they are null-terminated, so the end result of the
expression won't exceed 255, which should be able to be fit into 3
bytes in hexadecimal format.
Add an assertion to make gcc happy.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Add zero-padding to #defined ACPI table strings that are copied.
Provides sufficient characters to satisfy the length required to
fully populate the destination and prevent array-bounds warnings.
Add BUILD_BUG_ON sizeof checks for compile-time length checking.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 17 Aug 2018 11:54:40 +0000 (13:54 +0200)]
rangeset: make inquiry functions tolerate NULL inputs
Rather than special casing the ->iomem_caps check in x86's
get_page_from_l1e() for the dom_xen case, let's be more tolerant in
general, along the lines of rangeset_is_empty(): A never allocated
rangeset can't possibly contain or overlap any range.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Fri, 17 Aug 2018 11:52:20 +0000 (13:52 +0200)]
x86/HVM: correct an inverted check in hvm_load()
Clearly we want to put a vCPU to sleep if it is _not_ already down.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 17 Aug 2018 11:51:27 +0000 (13:51 +0200)]
x86: make arch_set_info_guest() match comments in load_segments()
For both fs_base and gs_base_user, there are comments saying "This can
only be non-zero if selector is NULL." While save_segments() ensures
this, so far arch_set_info_guest() didn't. Make behavior consistent
(attaching comments identical to those in save_segments()).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 16 Aug 2018 15:26:22 +0000 (16:26 +0100)]
x86/setup: Avoid OoB E820 lookup when calculating the L1TF safe address
A number of corner cases (most obviously, no-real-mode and no Multiboot memory
map) can end up with e820_raw.nr_map being 0, at which point the L1TF
calculation will underflow.
Spotted by Coverity.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Commit "tools: Rework xc_domain_create() to take a full
xen_domctl_createdomain" failed to replace one further instance of
xc_config in libxl__arch_domain_save_config().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Thu, 16 Aug 2018 07:27:30 +0000 (09:27 +0200)]
x86/hvm/emulate: make sure rep I/O emulation does not cross GFN boundaries
When emulating a rep I/O operation it is possible that the ioreq will
describe a single operation that spans multiple GFNs. This is fine as long
as all those GFNs fall within an MMIO region covered by a single device
model, but unfortunately the higher levels of the emulation code do not
guarantee that. This is something that should almost certainly be fixed,
but in the meantime this patch makes sure that MMIO is truncated at GFN
boundaries and hence the appropriate device model is re-evaluated for each
target GFN.
NOTE: This patch does not deal with the case of a single MMIO operation
spanning a GFN boundary. That is more complex to deal with and is
deferred to a subsequent patch.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Convert calculations to be 32-bit only.
Andrew Cooper [Fri, 16 Mar 2018 18:27:24 +0000 (18:27 +0000)]
xen/evtchn: Pass max_evtchn_port into evtchn_init()
... rather than setting it up once domain_create() has completed. This
involves constructing a default value for dom0.
No practical change in functionality.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Tue, 27 Feb 2018 17:39:37 +0000 (17:39 +0000)]
xen/domctl: Merge set_max_evtchn into createdomain
set_max_evtchn is somewhat weird. It was introduced with the event_fifo work,
but has never been used. Still, it is a bounding on resources consumed by the
event channel infrastructure, and should be part of createdomain, rather than
editable after the fact.
Drop XEN_DOMCTL_set_max_evtchn completely (including XSM hooks and libxc
wrappers), and retain the functionality in XEN_DOMCTL_createdomain.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Fri, 9 Mar 2018 14:38:35 +0000 (14:38 +0000)]
tools: Rework xc_domain_create() to take a full xen_domctl_createdomain
In future patches, the structure will be extended with further information,
and this is far cleaner than adding extra parameters.
The python stubs are the only user which passes NULL for the existing config
option (which is actually the arch substructure). Therefore, the #ifdefary
moves to compensate.
For libxl, pass the full config object down into
libxl__arch_domain_{prepare,save}_config(), as there are in practice arch
specific settings in the common part of the structure (flags s3_integrity and
oos_off specifically).
No practical change in behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 10 Aug 2018 17:05:24 +0000 (18:05 +0100)]
x86/ptwr: Misc cleanup to ptwr_emulated_update()
All but one user wants mfn as mfn_t, so switch its type. offset is only ever
used when multipled by 8, so fold that into its initial calculation. Fold all
the pointer arithmic on pl1e together, to avoid needless casts.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Wed, 15 Aug 2018 12:14:06 +0000 (14:14 +0200)]
x86/hvm/ioreq: MMIO range checking completely ignores direction flag
hvm_select_ioreq_server() is used to route an ioreq to the appropriate
ioreq server. For MMIO this is done by comparing the range of the ioreq
to the ranges registered by the device models of each ioreq server.
Unfortunately the calculation of the range if the ioreq completely ignores
the direction flag and thus may calculate the wrong range for comparison.
Thus the ioreq may either be routed to the wrong server or erroneously
terminated by null_ops.
NOTE: The patch also fixes whitespace in the switch statement to make it
style compliant.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Wei Liu [Tue, 7 Aug 2018 14:35:34 +0000 (15:35 +0100)]
xl.conf: Add global affinity masks
XSA-273 involves one hyperthread being able to use Spectre-like
techniques to "spy" on another thread. The details are somewhat
complicated, but the upshot is that after all Xen-based mitigations
have been applied:
* PV guests cannot spy on sibling threads
* HVM guests can spy on sibling threads
(NB that for purposes of this vulnerability, PVH and HVM guests are
identical. Whenever this comment refers to 'HVM', this includes PVH.)
There are many possible mitigations to this, including disabling
hyperthreading entirely. But another solution would be:
* Specify some cores as PV-only, others as PV or HVM
* Allow HVM guests to only run on thread 0 of the "HVM-or-PV" cores
* Allow PV guests to run on the above cores, as well as any thread of the PV-only cores.
For example, suppose you had 16 threads across 8 cores (0-7). You
could specify 0-3 as PV-only, and 4-7 as HVM-or-PV. Then you'd set
the affinity of the HVM guests as follows (binary representation):
In order to make this easy, this patches introduces three "global affinity
masks", placed in xl.conf:
vm.cpumask
vm.hvm.cpumask
vm.pv.cpumask
These are parsed just like the 'cpus' and 'cpus_soft' options in the
per-domain xl configuration files. The resulting mask is AND-ed with
whatever mask results at the end of the xl configuration file.
`vm.cpumask` would be applied to all guest types, `vm.hvm.cpumask`
would be applied to HVM and PVH guest types, and `vm.pv.cpumask`
would be applied to PV guest types.
The idea would be that to implement the above mask across all your
VMs, you'd simply add the following two lines to the configuration
file:
Jan Beulich [Mon, 13 Aug 2018 11:07:23 +0000 (05:07 -0600)]
x86: Make "spec-ctrl=no" a global disable of all mitigations
In order to have a simple and easy to remember means to suppress all the
more or less recent workarounds for hardware vulnerabilities, force
settings not controlled by "spec-ctrl=" also to their original defaults,
unless they've been forced to specific values already by earlier command
line options.
This is part of XSA-273.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 29 May 2018 17:44:16 +0000 (18:44 +0100)]
x86/spec-ctrl: Introduce an option to control L1D_FLUSH for HVM HAP guests
This mitigation requires up-to-date microcode, and is enabled by default on
affected hardware if available, and is used for HVM guests
The default for SMT/Hyperthreading is far more complicated to reason about,
not least because we don't know if the user is going to want to run any HVM
guests to begin with. If a explicit default isn't given, nag the user to
perform a risk assessment and choose an explicit default, and leave other
configuration to the toolstack.
This is part of XSA-273 / CVE-2018-3620.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 13 Apr 2018 15:34:01 +0000 (15:34 +0000)]
x86/msr: Virtualise MSR_FLUSH_CMD for guests
Guests (outside of the nested virt case, which isn't supported yet) don't need
L1D_FLUSH for their L1TF mitigations, but offering/emulating MSR_FLUSH_CMD is
easy and doesn't pose an issue for Xen.
The MSR is offered to HVM guests only. PV guests attempting to use it would
trap for emulation, and the L1D cache would fill long before the return to
guest context. As such, PV guests can't make any use of the L1D_FLUSH
functionality.
This is part of XSA-273 / CVE-2018-3646.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/pv: Force a guest into shadow mode when it writes an L1TF-vulnerable PTE
See the comment in shadow.h for an explanation of L1TF and the safety
consideration of the PTEs.
In the case that CONFIG_SHADOW_PAGING isn't compiled in, crash the domain
instead. This allows well-behaved PV guests to function, while preventing
L1TF from being exploited. (Note: PV guest kernels which haven't been updated
with L1TF mitigations will likely be crashed as soon as they try paging a
piece of userspace out to disk.)
This is part of XSA-273 / CVE-2018-3620.
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 23 Jul 2018 06:11:40 +0000 (08:11 +0200)]
x86/mm: Plumbing to allow any PTE update to fail with -ERESTART
Switching to shadow mode is performed in tasklet context. To facilitate this,
we schedule the tasklet, then create a hypercall continuation to allow the
switch to take place.
As a consequence, the x86 mm code needs to cope with an L1e operation being
continuable. do_mmu{,ext}_op() may no longer assert that a continuation
doesn't happen on the final iteration.
To handle the arguments correctly on continuation, compat_update_va_mapping*()
may no longer call into their non-compat counterparts. Move the compat
functions into mm.c rather than exporting __do_update_va_mapping() and
{get,put}_pg_owner(), and fix an unsigned long/int inconsistency with
compat_update_va_mapping_otherdomain().
This is part of XSA-273 / CVE-2018-3620.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/shadow: Infrastructure to force a PV guest into shadow mode
To mitigate L1TF, we cannot alter an architecturally-legitimate PTE a PV guest
chooses to write, but we can force the PV domain into shadow mode so Xen
controls the PTEs which are reachable by the CPU pagewalk.
Introduce new shadow mode, PG_SH_forced, and a tasklet to perform the
transition. Later patches will introduce the logic to enable this mode at the
appropriate time.
To simplify vcpu cleanup, make tasklet_kill() idempotent with respect to
tasklet_init(), which involves adding a helper to check for an uninitialised
list head.
This is part of XSA-273 / CVE-2018-3620.
Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 23 Jul 2018 13:46:10 +0000 (13:46 +0000)]
x86/spec-ctrl: Introduce an option to control L1TF mitigation for PV guests
Shadowing a PV guest is only available when shadow paging is compiled in.
When shadow paging isn't available, guests can be crashed instead as
mitigation from Xen's point of view.
Ideally, dom0 would also be potentially-shadowed-by-default, but dom0 has
never been shadowed before, and there are some stability issues under
investigation.
This is part of XSA-273 / CVE-2018-3620.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 25 Jul 2018 12:10:19 +0000 (12:10 +0000)]
x86/spec-ctrl: Calculate safe PTE addresses for L1TF mitigations
Safe PTE addresses for L1TF mitigations are ones which are within the L1D
address width (may be wider than reported in CPUID), and above the highest
cacheable RAM/NVDIMM/BAR/etc.
All logic here is best-effort heuristics, which should in practice be fine for
most hardware. Future work will see about disentangling the SRAT handling
further, as well as having L0 pass this information down to lower levels when
virtualised.
This is part of XSA-273 / CVE-2018-3620.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com>
Christian Lindig [Mon, 13 Aug 2018 16:26:56 +0000 (17:26 +0100)]
tools/oxenstored: Make evaluation order explicit
In Store.path_write(), Path.apply_modify() updates the node_created
reference and both the value of apply_modify() and node_created are
returned by path_write().
At least with OCaml 4.06.1 this leads to the value of node_created being
returned *before* it is updated by apply_modify(). This in turn leads
to the quota for a domain not being updated in Store.write(). Hence, a
guest can create an unlimited number of entries in xenstore.
The fix is to make evaluation order explicit.
This is XSA-272.
Signed-off-by: Christian Lindig <christian.lindig@citrix.com> Reviewed-by: Rob Hoes <rob.hoes@citrix.com>
Andrew Cooper [Mon, 13 Aug 2018 16:26:21 +0000 (17:26 +0100)]
x86/vtx: Fix the checking for unknown/invalid MSR_DEBUGCTL bits
The VPMU_MODE_OFF early-exit in vpmu_do_wrmsr() introduced by c/s 11fe998e56 bypasses all reserved bit checking in the general case. As a
result, a guest can enable BTS when it shouldn't be permitted to, and
lock up the entire host.
With vPMU active (not a security supported configuration, but useful for
debugging), the reserved bit checking in broken, caused by the original
BTS changeset 1a8aa75ed.
From a correctness standpoint, it is not possible to have two different
pieces of code responsible for different parts of value checking, if
there isn't an accumulation of bits which have been checked. A
practical upshot of this is that a guest can set any value it
wishes (usually resulting in a vmentry failure for bad guest state).
Therefore, fix this by implementing all the reserved bit checking in the
main MSR_DEBUGCTL block, and removing all handling of DEBUGCTL from the
vPMU MSR logic.
This is XSA-269.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 9 Aug 2018 16:22:17 +0000 (17:22 +0100)]
x86/spec-ctrl: Yet more fixes for xpti= parsing
As it currently stands, 'xpti=dom0' is indistinguishable from the default
value, which means it will be overridden by ARCH_CAPABILITIES_RDCL_NO on fixed
hardware.
Switch opt_xpti to use -1 as a default like all our other related options, and
clobber it as soon as we have a string to parse.
In addition, 'xpti' alone should be interpreted in its positive boolean form,
rather than resulting in a parse error.
(XEN) parameter "xpti" has invalid value "", rc=-22!
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Thu, 9 Aug 2018 09:59:41 +0000 (10:59 +0100)]
tools/libxenctrl: use new xenforeignmemory API to seed grant table
A previous patch added support for priv-mapping guest resources directly
(rather than having to foreign-map, which requires P2M modification for
HVM guests).
This patch makes use of the new API to seed the guest grant table unless
the underlying infrastructure (i.e. privcmd) doesn't support it, in which
case the old scheme is used.
NOTE: The call to xc_dom_gnttab_hvm_seed() in hvm_build_set_params() was
actually unnecessary, as the grant table has already been seeded
by a prior call to xc_dom_gnttab_init() made by libxl__build_dom().
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Thu, 9 Aug 2018 09:59:40 +0000 (10:59 +0100)]
common: add a new mappable resource type: XENMEM_resource_grant_table
This patch allows grant table frames to be mapped using the
XENMEM_acquire_resource memory op.
NOTE: This patch expands the on-stack mfn_list array in acquire_resource()
but it is still small enough to remain on-stack.
NOTE: This patch also removes a bogus comment above the
grant_to_status_frames() function.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
[Rebase over "Explicitly default to gnttab v1 during domain creation"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 8 Aug 2018 14:54:30 +0000 (15:54 +0100)]
common/gnttab: Explicitly default to gnttab v1 during domain creation
For reasons which appear to be exclusively down to poor review of the grant
table v2 code, a grant table's version field was wasn't initialised during
creation.
A number of problems (including XSAs) have occurred in the past trying trying
to use a grant table which hasn't been properly set up, and various areas of
the code cope with v0 by defaulting to v1.
In particular, the toolstack using GNTTABOP_setup_table to be able to fill in
the store/console grants has a side effect of switching to v1.
In hindsight however, this "fixup if we see 0" is a very poor, with a
substantial degree of risk. Explicitly default to grant table v1 during
domain create, and let the rest of the code work safely in the knowledge that
the version is sensibly set.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 6 Aug 2018 09:11:00 +0000 (09:11 +0000)]
x86/vlapic: Bugfixes and improvements to vlapic_{read,write}()
Firstly, there is no 'offset' boundary check on the non-32-bit write path
before the call to vlapic_read_aligned(), which allows an attacker to read
beyond the end of vlapic->regs->data[], which is only 1024 bytes long.
However, as the backing memory is a domheap page, and misaligned accesses get
chunked down to single bytes across page boundaries, I can't spot any
XSA-worthy problems which occur from the overrun.
On real hardware, bad accesses don't instantly crash the machine. Their
behaviour is undefined, but the domain_crash() prohibits sensible testing.
Behave more like other x86 MMIO and terminate bad accesses with appropriate
defaults.
While making these changes, clean up and simplify the the smaller-access
handling. In particular, avoid pointer based mechansims for 1/2-byte reads so
as to avoid forcing the value to be spilled to the stack.
add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-175 (-175)
function old new delta
vlapic_read 211 142 -69
vlapic_write 304 198 -106
Finally, there are a plethora of read/write functions in the vlapic namespace,
so rename these to vlapic_mmio_{read,write}() to make their purpose more
clear.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 3 Aug 2018 15:40:31 +0000 (17:40 +0200)]
drop {,acpi_}reserve_bootmem()
Both are entirely unused (to be fair, reserve_bootmem() has a use inside
an "#if 0" section in x86's mpparse.c, but if we were to re-enable that
code, it would need doing differently anyway).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
This initializer is flawed and only sets .name of array entry 0
to a non-NULL string.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Fri, 3 Aug 2018 14:46:49 +0000 (09:46 -0500)]
automation: ensure created are not owned as root
By default the container runs as the root user and since the source tree
is bind mounted into the container, any file is created and owned by the
root user which harms ergonomics when working outside of the container
environment. This maps the root user within the container to the uid of
the user outside of the container so files are not owned by root.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Doug Goldstein [Fri, 3 Aug 2018 14:46:47 +0000 (09:46 -0500)]
automation: drop container name from containerize
This was something that existed for some scripting support for a totally
unrelated project and when I copied this script I failed to remove it so
this removes it. Build containers for Xen are best as ephemeral
environments and should just utilizes Docker's default container naming
behavior.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
xen: specify support for EXPERT and DEBUG Kconfig options
Add a clear statement about them, reflecting the current security
support status of Kconfig options (no changes to current policies).
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Jan Beulich <jbeulich@suse.com> CC: George.Dunlap@eu.citrix.com CC: Ian.Jackson@eu.citrix.com CC: jbeulich@suse.com CC: andrew.cooper3@citrix.com CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Tim Deegan <tim@xen.org> CC: Wei Liu <wei.liu2@citrix.com>
---
Changes in v7:
- talk about EXPERT and DEBUG rather than CONFIG_EXPERT and CONFIG_DEBUG
Add a Xen build target to count the lines of code of the source files
built. Uses `cloc' to do the job.
With Xen on ARM taking off in embedded, IoT, and automotive, we are
seeing more and more uses of Xen in constrained environments. Users and
system integrators want the smallest Xen and Dom0 configurations. Some
of these deployments require certifications, where you definitely want
the smallest lines of code count. I provided this patch to give us the
lines of code count for that purpose.
Use the .o.d files to account for all the built source files. Generate a
list for the `cloc' utility and invoke `cloc'.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Jan Beulich <jbeulich@suse.com> CC: jbeulich@suse.com CC: andrew.cooper3@citrix.com
---
Changes in v4:
- use grep regex to get multiple source files from .d files
Changes in v3:
- remove build as dependecy for the cloc target
Changes in v2:
- change implementation to use .o.d to find built source files
Add specific per-platform defaults for NR_CPUS. Note that the order of
the defaults matter: they need to go first, otherwise the generic
defaults will be applied.
This is done so that Xen builds customized for a specific hardware
platform can have the right NR_CPUS number.
Add a "Platform Support" choice with four kconfig options: QEMU, RCAR3,
MPSOC and ALL_PLAT. They enable the required options for their hardware
platform. ALL_PLAT enables all available platforms and it's the default.
It doesn't automatically select any of the related drivers, otherwise
they cannot be disabled. ALL_PLAT is implemented by using hidden options
with default values depending on ALL_PLAT.
In the case of the MPSOC that has a platform file under
arch/arm/platforms/, build the file if MPSOC.
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrii Anisov <andrii_anisov@epam.com> CC: artem_mygaiev@epam.com CC: volodymyr_babchuk@epam.com
---
Changes in v8:
- remove QEMU_PLATFORM and RCAR3_PLATFORM that are currently unused
- remove selects from ALL
- rename ALL to ALL_PLAT
- introduce ALL64_PLAT and ALL32_PLAT
Changes in v5:
- turn platform support into a choice
- add ALL
Changes in v4:
- fix GICv3/GICV3
- default y to all options
- build xilinx-zynqmp if MPSOC