Juergen Gross [Fri, 31 Aug 2018 15:22:04 +0000 (17:22 +0200)]
tools/libxl: correct vcpu affinity output with sparse physical cpu map
With not all physical cpus online (e.g. with smt=0) the output of the
vcpu affinities is wrong, as the affinity bitmaps are capped after
nr_cpus bits, instead of using max_cpu_id.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
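The failure mode can be illustrated with a standalone sketch (this is not the libxl code; `count_visible_cpus` and its parameters are hypothetical). With a sparse map, the number of online cpus is smaller than the highest cpu id, so a loop bounded by the count never reaches the upper bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the cpus set in `map` whose id is below `limit`.
 * Hypothetical helper; libxl has its own bitmap type and accessors.
 */
static unsigned int count_visible_cpus(const uint8_t *map, size_t map_bytes,
                                       unsigned int limit)
{
    unsigned int cpu, count = 0;

    assert(limit <= map_bytes * 8);

    for (cpu = 0; cpu < limit; cpu++)
        if (map[cpu / 8] & (1u << (cpu % 8)))
            count++;

    return count;
}
```

With cpus 0, 2, 4 and 6 online, bounding the loop by the online count (4) sees only two of them, while bounding it by the highest id plus one (7) sees all four.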
'xl sysrq' command doesn't work with modern Linux guests with the following
message in guest's log:
xen:manage: sysrq_handler: Error -13 writing sysrq in control/sysrq
xenstore trace confirms:
IN 0x24bd9a0 20180904 04:36:32 WRITE (control/sysrq )
OUT 0x24bd9a0 20180904 04:36:32 ERROR (EACCES )
The problem seems to be in the fact that we don't pre-create control/sysrq
xenstore node and libxl_send_sysrq() doing libxl__xs_printf() creates it as
read-only. As we want to allow guests to clean 'control/sysrq' after the
requested action is performed, we need to make this node writable.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/xl: fix output of xl vcpu-pin dry run with smt=0
Fix another smt=0 fallout: xl -N vcpu-pin prints only parts of the
affinities as it is using the number of online cpus instead of the
maximum cpu number.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Tue, 4 Sep 2018 16:15:18 +0000 (17:15 +0100)]
x86: change name of parameter for various invlpg functions
They all incorrectly named a parameter virtual address while it should
have been linear address.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Mon, 3 Sep 2018 11:48:13 +0000 (12:48 +0100)]
xen/domain: Fold xsm_free_security_domain() paths together
xsm_free_security_domain() is idempotent (both the dummy handler, and the
flask handler). Move it into the shared __domain_destroy() path, and drop the
INIT_xsm flag from domain_create()
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
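As a rough illustration of what "idempotent" buys here (a sketch with hypothetical names, not the XSM code): a free routine that clears the pointer after releasing it can be called from a shared destroy path regardless of how far construction got.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_domain {
    void *ssid;   /* stand-in for a per-domain security blob */
};

/* Safe to call any number of times: frees once, then is a no-op. */
static void demo_free_security(struct demo_domain *d)
{
    assert(d != NULL);
    free(d->ssid);      /* free(NULL) is defined to do nothing */
    d->ssid = NULL;
}
```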
Andrew Cooper [Mon, 3 Sep 2018 11:10:48 +0000 (12:10 +0100)]
xen/domain: Call lock_profile_deregister_struct() from common code
lock_profile_register_struct() is called from common code, but the matching
deregister was previously only called from x86 code.
The practical upshot of this is that, when using CONFIG_LOCK_PROFILE, destroyed
domains on ARM (and in particular, the freed page behind struct domain) remain
on the lockprofile linked list, which will become corrupt when the page is reused.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 3 Sep 2018 10:52:17 +0000 (11:52 +0100)]
xen/domain: Break _domain_destroy() out of domain_create() and complete_domain_destroy()
This is the first step in making the destroy path idempotent, and using it in
place of the ad-hoc cleanup paths in the create path.
To begin with, the trivial free operations are broken out. The rest of the
cleanup code will be moved as it is demonstrated (or made) to be idempotent.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 3 Sep 2018 13:22:16 +0000 (14:22 +0100)]
xen/domain: Prepare data for is_{pv,hvm}_domain() as early as possible
Given two subtle failures from getting this wrong before, and more cleanup on
the way, move the setting of d->guest_type as early as possible.
Note that despite moving the assignment of d->guest_type outside of the
is_idle_domain(d) check, it still behaves the same. Previously, system
domains had no direct assignment of d->guest_type and behaved as PV guests
because guest_type_pv has the value 0.
While tidying up the predicate, leave a comment referring to
is_system_domain(), and move the associated ASSERT() to be beside the
assignment.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
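The "behaved as PV guests because guest_type_pv has the value 0" remark can be shown with a small sketch (hypothetical `demo_` names; only the zero-valued first enumerator is taken from the text above): a zero-initialised structure satisfies the PV predicate without any explicit assignment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the first enumerator has the value 0. */
enum demo_guest_type { demo_guest_type_pv, demo_guest_type_hvm };

struct demo_domain {
    enum demo_guest_type guest_type;
};

static bool demo_is_pv_domain(const struct demo_domain *d)
{
    assert(d != NULL);
    return d->guest_type == demo_guest_type_pv;
}

static bool demo_is_hvm_domain(const struct demo_domain *d)
{
    assert(d != NULL);
    return d->guest_type == demo_guest_type_hvm;
}
```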
Jan Beulich [Tue, 4 Sep 2018 09:30:29 +0000 (11:30 +0200)]
x86emul: clean up AVX2 insn use in test harness
Drop the pretty pointless conditionals from code testing AVX insns and
properly use AVX2 mnemonics in code testing AVX2 insns (the test harness
is already requiring sufficiently new a compiler/assembler).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 4 Sep 2018 09:29:22 +0000 (11:29 +0200)]
x86emul: extend MASKMOV{Q,DQU} tests
While deriving the first AVX512 pieces from existing code I've got the
(in the end wrong) impression that the emulation of these insns would be
broken. Besides testing that the instructions act as no-ops when the
controlling mask bits are all zero, add ones to also check that the data
merging actually works.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 4 Sep 2018 09:28:30 +0000 (11:28 +0200)]
x86emul: fix FMA scalar operand sizes
FMA insns, unlike the earlier AVX additions, don't use the low opcode
bit to distinguish between single and double vector elements. While the
difference is benign for packed flavors, the scalar ones need to use
VEX.W here. Oddly enough the table entries didn't even use
simd_scalar_fp, but uniformly used simd_packed_fp (implying the
distinction was by [VEX-encoded] opcode prefix).
Split simd_scalar_fp into simd_scalar_opc and simd_scalar_vexw, and
correct FMA scalar table entries to use the latter.
Also correct the scalar insn comments (they only ever use XMM registers
as operands).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Or else it defaults to using 0x100000 as the entry point, which might
or might not point to _start. This is a fix for 09b3907f93.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 1 Aug 2018 13:48:33 +0000 (13:48 +0000)]
x86/hvm: Fix mapping corner case during task switching
hvm_map_entry() can fail for a number of reasons, including for a misaligned
LDT/GDT access which crosses a 4K boundary. Architecturally speaking, this
should be fixed, but Long Mode doesn't support task switches, and no 32bit OS
is going to misalign its LDT/GDT base, which is why this task isn't very high
on the TODO list.
However, the hvm_map_fail error label returns failure without raising an
exception, which interferes with hvm_task_switch()'s exception tracking, and
can cause it to finish and return to guest context as if the task switch had
completed successfully.
Resolve this corner case by folding all the failure paths together, which
causes an hvm_map_entry() failure to result in #TS[SEL]. hvm_unmap_entry()
copes fine with a NULL pointer so can be called unconditionally.
In practice, this is just a latent corner case as all hvm_map_entry() failures
crash the domain, but it should be fixed nevertheless.
Finally, rename hvm_load_segment_selector() to task_switch_load_seg() to avoid
giving the impression that it is usable for general segment loading.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
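The shape of the fix, folding every failure into one exit path and relying on a NULL-tolerant unmap, might look roughly like the following. This is a sketch only: all names are hypothetical, and the `raised` flag merely stands in for the exception bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

static int demo_unmap_calls;

/* Copes with NULL, so callers need not check before calling. */
static void demo_unmap(void *p)
{
    if (p)
        demo_unmap_calls++;
}

/*
 * Returns 0 on success, -1 on failure. *raised is set on every failure,
 * so no failure path can return without the caller noticing.
 */
static int demo_task_switch(void *first, void *second, int *raised)
{
    int rc = -1;

    assert(raised != NULL);
    *raised = 0;

    if (!first || !second)
        goto out;       /* a mapping failure joins the common path */

    rc = 0;

 out:
    if (rc)
        *raised = 1;
    demo_unmap(second); /* unconditional: NULL is tolerated */
    demo_unmap(first);
    return rc;
}
```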
Andrew Cooper [Wed, 24 Jan 2018 16:43:55 +0000 (16:43 +0000)]
x86/mm: Drop {HAP,SHADOW}_ERROR() wrappers
Unlike the PRINTK/DEBUG wrappers, these go straight out to the console, rather
than ending up in the debugtrace buffer.
A number of these users are followed by domain_crash(), and future changes
will want to combine the printk() into the domain_crash() call. Expand these
wrappers in place, using XENLOG_ERR before a BUG(), and XENLOG_G_ERR before a
domain_crash().
Perform some %pv/PRI_mfn/etc cleanup while modifying the invocations, and
explicitly drop some calls which are unnecessary (bad shadow op, and the empty
stubs for incorrect sh_map_and_validate_gl?e() calls).
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
The hvmloader binary generated when using LLVM LD doesn't work
properly and seems to get stuck while trying to generate and load the
ACPI tables. This is caused by the layout of the binary when linked
with LLVM LD.
LLVM LD has a different default linker script than GNU LD, and the
resulting hvmloader binary is slightly different:
There's however the PHDR which is not present when using GNU LD.
Fix this by using a very simple linker script that generates the same
binary regardless of whether LLVM or GNU LD is used. By using a linker
script the usage of -Ttext can also be avoided by placing the desired
.text load address directly in the linker script.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 3 Sep 2018 15:51:40 +0000 (17:51 +0200)]
x86/boot: silence MADT table entry logging
Logging disabled LAPIC / x2APIC entries with invalid local APIC IDs
(ones having "broadcast" meaning when used) isn't very useful, and can
be quite noisy on larger systems. Suppress their logging unless
opt_cpu_info is true.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 3 Sep 2018 15:50:10 +0000 (17:50 +0200)]
x86: assorted array_index_nospec() insertions
Don't chance having Spectre v1 (including BCBS) gadgets. In some of the
cases the insertions are more of precautionary nature rather than there
provably being a gadget, but I think we should err on the safe (secure)
side here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
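For readers unfamiliar with the idea, the clamping arithmetic behind this kind of helper can be sketched in portable C. This is not Xen's array_index_nospec() implementation, and `demo_index_nospec` is a hypothetical name. Note the caveat: plain C cannot guarantee that the compiler emits branch-free code, so a real implementation needs a carefully audited expression or inline assembly.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return `index` if it is below `size`, otherwise 0, by ANDing with a
 * mask that is all ones for in-bounds values and all zeros otherwise.
 */
static size_t demo_index_nospec(size_t index, size_t size)
{
    size_t mask;

    assert(size > 0);

    /* (index < size) is 1 or 0; 0 - 1 wraps to all ones for size_t. */
    mask = (size_t)0 - (size_t)(index < size);

    return index & mask;
}
```

An out-of-bounds index is forced to element 0, which is always a valid slot of a non-empty array.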
c/s 580c45869 "Call arch_domain_create() as early as possible in
domain_create()" overlooked the fact that ARM uses is_hardware_domain() in at
least two places during arch_domain_create().
when dom0 tries to use the vuart. Judging by other uses of
is_hardware_domain(), I expect the x86 PVH dom0 boot is similarly broken.
Reposition the code which sets up hardware_domain so that the
is_hardware_domain() predicate works correctly all the way through domain
creation.
While moving it, leave a related comment explaining the positioning of the
is_priv assignment, which in hindsight should have been part of c/s ef765ec98
when exactly the same problem was discovered for the is_control_domain()
predicate.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Tested-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Tue, 28 Aug 2018 16:00:36 +0000 (16:00 +0000)]
x86/hvm: Drop hvm_{vmx,svm} shorthands
By making {vmx,svm} in hvm_vcpu into an anonymous union (consistent with
domain side of things), the hvm_{vmx,svm} defines can be dropped, and all code
refer to the correctly-named fields. This means that the data hierarchy is no
longer obscured from grep/cscope/tags/etc.
Reformat one comment and switch one bool_t to bool while making changes.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Tue, 28 Aug 2018 15:59:28 +0000 (15:59 +0000)]
x86/svm: Rename arch_svm_struct to svm_vcpu
The suffix and prefix are redundant, and the name is curiously odd. Rename it
to svm_vcpu to be consistent with all the other similar structures. In
addition, rename the arch_svm local variables to svm for further
consistency.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Tue, 28 Aug 2018 15:53:06 +0000 (15:53 +0000)]
x86/vmx: Rename arch_vmx_struct to vmx_vcpu
The suffix and prefix are redundant, and the name is curiously odd. Rename it
to vmx_vcpu to be consistent with all the other similar structures. In
addition, rename the arch_vmx local variables to vmx for further
consistency.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
--- CC: Roger Pau Monné <roger.pau@citrix.com>
Some of the local pointers are named arch_vmx. I'm open to renaming them to
just vmx (like all the other local pointers) if people are happy with the
additional patch delta.
Andrew Cooper [Tue, 28 Aug 2018 15:52:34 +0000 (15:52 +0000)]
x86/hvm: Rename v->arch.hvm_vcpu to v->arch.hvm
The trailing _vcpu suffix is redundant, but adds to code volume. Drop it.
Reflow lines as appropriate. No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Tue, 28 Aug 2018 15:50:41 +0000 (15:50 +0000)]
xen/hvm: Rename d->arch.hvm_domain to d->arch.hvm
The trailing _domain suffix is redundant, but adds to code volume. Drop it.
Reflow lines as appropriate, and switch to using the new XFREE/etc wrappers
where applicable.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Mon, 19 Mar 2018 17:07:50 +0000 (17:07 +0000)]
xen/domain: Allocate d->vcpu[] in domain_create()
For ARM, the call to arch_domain_create() needs to have completed before
domain_max_vcpus() will return the correct upper bound.
For each arch's dom0's, drop the temporary max_vcpus parameter, and allocation
of dom0->vcpu.
With d->max_vcpus now correctly configured before evtchn_init(), the poll mask
can be constructed suitably for the domain, rather than for the worst-case
setting.
Due to the evtchn_init() fixes, it no longer calls domain_max_vcpus(), and
ARM's two implementations of vgic_max_vcpus() no longer need to work around the
out-of-order call.
From this point on, d->max_vcpus and d->vcpu[] are valid for any domain which
can be looked up by domid.
The XEN_DOMCTL_max_vcpus hypercall is modified to reject any call attempt with
max != d->max_vcpus, which does match the older semantics (not that it is
obvious from the code). The logic to allocate d->vcpu[] is dropped, but at
this point the hypercall still needs making to allocate each vcpu.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
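The new hypercall semantics described above amount to a consistency check rather than an allocation. A minimal sketch, assuming a hypothetical `demo_dom` structure and using -EINVAL as an illustrative error value:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_dom {
    unsigned int max_vcpus;   /* fixed when the domain was created */
};

/* Accept only the value already chosen at creation time. */
static int demo_set_max_vcpus(const struct demo_dom *d, unsigned int max)
{
    assert(d != NULL);
    return max == d->max_vcpus ? 0 : -EINVAL;
}
```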
Andrew Cooper [Mon, 19 Mar 2018 17:28:50 +0000 (17:28 +0000)]
xen/dom0: Arrange for dom0_cfg to contain the real max_vcpus value
Make dom0_max_vcpus() a common interface, and implement it on ARM by splitting
the existing alloc_dom0_vcpu0() function in half.
As domain_create() doesn't yet set up the vcpu array, the max value is also
passed into alloc_dom0_vcpu0(). This is temporary for bisectability and
removed in the following patch.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 27 Feb 2018 17:39:37 +0000 (17:39 +0000)]
tools: Pass max_vcpus to XEN_DOMCTL_createdomain
XEN_DOMCTL_max_vcpus is a mandatory hypercall, but nothing actually prevents a
toolstack from unpausing a domain with no vcpus.
Originally, d->vcpu[] was an embedded array in struct domain, but c/s fb442e217 "x86_64: allow more vCPU-s per guest" in Xen 4.0 altered it to being
dynamically allocated. A side effect of this is that d->vcpu[] is NULL until
XEN_DOMCTL_max_vcpus has completed, but a lot of hypercalls blindly
dereference it.
Even today, the behaviour of XEN_DOMCTL_max_vcpus is a mandatory singleton
call which can't change the number of vcpus once a value has been chosen.
In preparation to remove the hypercall, extend xen_domctl_createdomain with
a max_vcpus field and arrange for all callers to pass the appropriate
value. There is no change in construction behaviour yet, but later patches
will rearrange the hypervisor internals.
For the python stubs, extend the domain_create keyword list to take a
max_vcpus parameter, in lieu of deleting the pyxc_domain_max_vcpus function.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Mon, 19 Mar 2018 16:50:46 +0000 (16:50 +0000)]
xen/domain: Call arch_domain_create() as early as possible in domain_create()
This is in preparation to set up d->max_vcpus and d->vcpu[] in domain_create(),
and allow later parts of domain construction to have access to the values.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 19 Mar 2018 16:06:24 +0000 (16:06 +0000)]
xen/gnttab: Fold grant_table_{create,set_limits}() into grant_table_init()
Now that the max_{grant,maptrack}_frames are specified from the very beginning
of grant table construction, the various initialisation functions can be
folded together and simplified as a result.
Leave grant_table_init() as the public interface, which is more consistent
with other subsystems.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 27 Feb 2018 17:39:37 +0000 (17:39 +0000)]
xen/domctl: Remove XEN_DOMCTL_set_gnttab_limits
Now that XEN_DOMCTL_createdomain handles the grant table limits, remove
XEN_DOMCTL_set_gnttab_limits (including XSM hooks and libxc wrappers).
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 19 Mar 2018 11:19:52 +0000 (11:19 +0000)]
xen/gnttab: Pass max_{grant,maptrack}_frames into grant_table_create()
... rather than setting the limits up after domain_create() has completed.
This removes the common gnttab infrastructure for calculating the number of
dom0 grant frames (as the common grant table code is not an appropriate place
for it to live), opting instead to require the dom0 construction code to pass
a sane value in via the configuration.
In practice, this now means that there is never a partially constructed grant
table for a reference-able domain.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Tue, 27 Feb 2018 17:39:37 +0000 (17:39 +0000)]
tools: Pass grant table limits to XEN_DOMCTL_set_gnttab_limits
XEN_DOMCTL_set_gnttab_limits is a fairly new hypercall, and is strictly
mandatory. As it pertains to domain limits, it should be provided at
createdomain time.
In preparation to remove the hypercall, extend xen_domctl_createdomain with
the fields and arrange for all callers to pass appropriate details. There is
no change in construction behaviour yet, but later patches will rearrange the
hypervisor internals, then delete the hypercall.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Tue, 3 Oct 2017 10:18:37 +0000 (11:18 +0100)]
x86/pv: Deprecate support for paging out the LDT
This code is believed to be a vestigial remnant of the PV Windows XP port. It
is not used by Linux, NetBSD, Solaris or MiniOS. Furthermore the
implementation is incomplete; it only functions for a present => not-present
transition, rather than a present => read/write transition.
The for_each_vcpu() is one scalability limitation for PV guests, which can't
reasonably be altered to be continuable. Most importantly, however, this is
the only codepath which plays with descriptor frames of a remote vcpu.
A side effect of dropping support for paging the LDT out is that the LDT no
longer automatically cleans itself up on domain destruction. Cover this by
explicitly releasing the LDT frames at the same time as the GDT frames.
Finally, leave some asserts around to confirm the expected behaviour of all
the functions playing with PGT_seg_desc_page references.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Zhenzhong Duan [Thu, 30 Aug 2018 09:05:01 +0000 (11:05 +0200)]
x86/grant: mute gcc 4.1.x warning in steal_linear_address()
Move the reference to ol1e earlier, or else we see the warning below.
cc1: warnings being treated as errors
grant_table.c: In function 'replace_grant_pv_mapping':
grant_table.c:142: warning: 'ol1e.l1' may be used uninitialized in this function
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 30 Aug 2018 09:03:47 +0000 (11:03 +0200)]
x86/alternatives: allow using assembler macros in favor of C ones
As was validly pointed out as motivation for similar Linux side changes
(https://lkml.org/lkml/2018/6/22/677), using long sequences of
directives and auxiliary instructions, like is commonly the case when
setting up an alternative patch site, gcc can be misled into believing
an asm() to be more heavy weight than it really is. By presenting it
with an assembler macro invocation instead, this can be avoided.
Initially I wanted to outright change the C macros ALTERNATIVE() and
ALTERNATIVE_2() to invoke the respective assembler ones, but doing so
would require quite a bit of cleanup of some use sites, because of the
extra necessary quoting combined with the need that each assembler macro
argument must consist of just a single string literal. We can consider
working towards that subsequently.
For now, set the stage of using the assembler macros here by providing a
new generated header, being the slightly massaged pre-processor output
of (for now just) alternative-asm.h. The massaging is primarily to be
able to properly track the build dependency: For this, we need the C
compiler to see the inclusion, which means we shouldn't directly use an
asm(".include ...") directive.
The dependency added to asm-offsets.s is not a true one; it's just the
easiest approach I could think of to make sure the new header gets
generated early on, without having to fiddle with xen/Makefile (and
introducing some x86-specific construct there).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 30 Aug 2018 09:02:09 +0000 (11:02 +0200)]
VMX: reduce number of posted-interrupt hooks
Three of the four hooks are not exposed outside of vmx.c, and all of
them have only a single possible non-NULL value. So there's no reason to
use hooks here - a simple set of flag indicators is sufficient (and we
don't even need a flag for the VM entry one, as it's always
(de-)activated together with the vCPU blocking hook, which needs to
remain an actual function pointer). This is all the more relevant now that,
with the Spectre v2 workarounds, indirect calls have become more expensive.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
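The hook-to-flag conversion can be sketched as follows (hypothetical names, not the VMX code). Where a function pointer only ever holds NULL or one specific function, a boolean plus a direct call expresses the same thing without an indirect branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int demo_sync_calls;

static void demo_pi_sync(void)
{
    demo_sync_calls++;
}

/* Before: an indirect call through a hook with one possible target. */
struct demo_hooks {
    void (*sync)(void);
};

static void demo_run_hook(const struct demo_hooks *h)
{
    assert(h != NULL);
    if (h->sync)
        h->sync();
}

/* After: a flag selects whether the single target is called directly. */
struct demo_flags {
    bool sync_enabled;
};

static void demo_run_flag(const struct demo_flags *f)
{
    assert(f != NULL);
    if (f->sync_enabled)
        demo_pi_sync();
}
```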
Jan Beulich [Thu, 30 Aug 2018 09:01:02 +0000 (11:01 +0200)]
x86/mm: re-arrange get_page_from_l<N>e() vs pv_l1tf_check_l<N>e()
Restore symmetry between get_page_from_l<N>e(): pv_l1tf_check_l<N>e() is
now uniformly invoked from outside of them. They're no longer getting
called for non-present PTEs. This way the slightly odd three-way return
value meaning of the higher level ones can also be got rid of.
Leave an assertion in get_page_from_l1e() as the only non-static one of
the four siblings, to ensure that no new unguarded calls go unnoticed.
Introduce local variables holding the page table entries processed, and
use them throughout the loop bodies instead of re-reading them from the
page table several times.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Add missing "CONFIG_". This build regression was introduced by commit 277aa3523d "arm: make it possible to disable the SMMU driver".
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
[julieng: Add the commit where the regression was introduced] Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Wed, 29 Aug 2018 14:32:17 +0000 (16:32 +0200)]
x86: reduce "visibility" of spec_ctrl_asm.h
Other than indirect_thunk_asm.h, spec_ctrl_asm.h is a header generally
needed by assembly source files only. Avoid having all C sources have a
dependency on that header (the set of assembly sources now gaining a
dependency on the C header is much smaller and hence more acceptable).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 29 Aug 2018 14:31:32 +0000 (16:31 +0200)]
x86: move quoting of __ASM_{STAC,CLAC}
Both consumers want them quoted, so quote them right away instead of
using __stringify() upon use. In the spirit of other recent additions
also make the assembly forms assembler macros, allowing the helper
#define-s to be #undef-ed subsequently.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
As of commit 4008c71d7a ("x86/alt: Support for automatic padding
calculations") there's no point having explicit ASM_NOPn instances in
alternatives anymore - drop them. As a result also drop the asm/nops.h
inclusion from alternative.h, adding explicit inclusions in the two
remaining C files needing them.
While touching it also move the CR4_PV32_RESTORE definition out of the
SMAP-specific conditional into a more general one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 29 Aug 2018 14:28:52 +0000 (16:28 +0200)]
x86/spec-ctrl: split reporting for PV and HVM guests
Putting them on separate lines was suggested before, and is going to
become necessary eventually anyway as things get added here. Split them
now, and put the respective pieces in CONFIG_* conditionals at the same
time.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Wed, 29 Aug 2018 10:55:32 +0000 (11:55 +0100)]
x86/alt: Fix build when CONFIG_LIVEPATCH is disabled
c/s b28cd21c3628 "x86/build: Use new .nops directive when available"
introduced a __read_mostly boolean which is included if the toolchain supports
the .nops directive.
When CONFIG_LIVEPATCH is compiled out, alternative.o is expected to be a fully
init module, and toolchain_nops_are_ideal trips the build system check:
Error: size of alternative.o:.data.read_mostly is 0x01
/local/xen.git/xen/Rules.mk:206: recipe for target 'alternative.init.o' failed
make[3]: *** [alternative.init.o] Error 12
Introduce init_or_livepatch_read_mostly and switch the annotation for
toolchain_nops_are_ideal.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Fri, 9 Feb 2018 12:47:58 +0000 (12:47 +0000)]
x86/build: Use new .nops directive when available
Newer versions of binutils are capable of emitting an exact number of bytes'
worth of optimised nops, which are P6 nops. Use this in preference to .skip
when available.
Check at boot time whether the toolchain nops are correct for the running
hardware, and skip optimising nops entirely when possible.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 20 Jul 2018 17:50:28 +0000 (17:50 +0000)]
x86/shadow: Use mfn_t in shadow_track_dirty_vram()
... as the only user of sl1mfn would prefer it that way.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Fri, 20 Jul 2018 14:28:20 +0000 (15:28 +0100)]
x86/shadow: Clean up the MMIO fastpath helpers
Use bool when appropriate, remove extraneous brackets and fix up comment
style.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Andrew Cooper [Fri, 20 Jul 2018 14:21:51 +0000 (15:21 +0100)]
x86/shadow: Use MASK_* helpers for the MMIO fastpath PTE manipulation
Drop the now-unused SH_L1E_MMIO_GFN_SHIFT definition.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
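Mask-based field helpers of this kind derive the shift from the mask itself, which is why a separate *_SHIFT constant becomes unnecessary. The sketch below is modelled on that idea with hypothetical `demo_` names; the field mask in the usage is an arbitrary example, not the shadow code's MMIO layout.

```c
#include <assert.h>
#include <stdint.h>

/* Lowest set bit of the mask, i.e. 1 << (shift of the field). */
static uint64_t demo_mask_lsb(uint64_t mask)
{
    assert(mask != 0);
    return mask & (~mask + 1);
}

/* Extract the field selected by `mask` from `value`. */
static uint64_t demo_mask_extr(uint64_t value, uint64_t mask)
{
    return (value & mask) / demo_mask_lsb(mask);
}

/* Position `field` into the bits selected by `mask`. */
static uint64_t demo_mask_insr(uint64_t field, uint64_t mask)
{
    return (field * demo_mask_lsb(mask)) & mask;
}
```

For a mask of 0x0ff0 the lowest set bit is 0x10, so inserting 0xab yields 0x0ab0 and extracting from 0x1ab3 yields 0xab.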
Andrew Cooper [Fri, 20 Jul 2018 16:57:24 +0000 (16:57 +0000)]
x86/shadow: Use more appropriate conversion functions
Replace pfn_to_paddr(mfn_x(...)) with mfn_to_maddr(), and replace an opencoded
gfn_to_gaddr().
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
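The benefit of a dedicated conversion over an unwrap-then-shift idiom is easier to see in a toy version. The sketch below uses hypothetical `demo_` types and assumes 4K pages; it is not Xen's typesafe mfn_t machinery.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumption: 4K pages */

/* Wrapping the frame number in a struct stops it mixing with plain integers. */
typedef struct { uint64_t m; } demo_mfn_t;

static demo_mfn_t demo_mfn(uint64_t m)
{
    return (demo_mfn_t){ m };
}

static uint64_t demo_mfn_x(demo_mfn_t mfn)
{
    return mfn.m;
}

/* One call instead of unwrapping and then converting a raw pfn. */
static uint64_t demo_mfn_to_maddr(demo_mfn_t mfn)
{
    assert(demo_mfn_x(mfn) <= (UINT64_MAX >> DEMO_PAGE_SHIFT));
    return demo_mfn_x(mfn) << DEMO_PAGE_SHIFT;
}
```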
Andrew Cooper [Fri, 1 Jun 2018 11:56:09 +0000 (12:56 +0100)]
x86/mm: Use mfn_eq()/mfn_add() rather than opencoded variations
Use l1e_get_mfn() in place of l1e_get_pfn() when applicable, and fix up style
on affected lines.
For sh_remove_shadow_via_pointer(), map_domain_page() is guaranteed to succeed
so there is no need to ASSERT() its success. This allows the pointer
arithmetic to be folded into the previous expression, and for vaddr to be
properly typed as l1_pgentry_t, avoiding the cast in l1e_get_mfn().
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Zhenzhong Duan [Tue, 28 Aug 2018 15:13:42 +0000 (17:13 +0200)]
x86/mmcfg/drhd: Move acpi_mmcfg_init() call before calling acpi_parse_dmar()
pci_conf_read8() needs the pci mmcfg mapping to work on systems with
multiple pci segments, such as HPE Superdome-Flex.
Move acpi_mmcfg_init() call in acpi_boot_init() before calling
acpi_parse_dmar() so that when pci_conf_read8() is called in
acpi_parse_dev_scope(), we already have the mapping set up.
mmio_ro_ranges initialization is also moved ahead as it's the only
dependency of pci_mmcfg_arch_enable() that needs to be moved. The code
between the old and new call sites was also checked to ensure we
don't break anything.
Furthermore MMCFG will continue to not work this early (or
more precisely not at all until Dom0 boot has progressed far
enough) if the range(s) isn't/aren't marked reserved in E820.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Tested-by: Gopalasetty, Manoj <manoj.gopalasetty@hpe.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 28 Aug 2018 15:12:05 +0000 (17:12 +0200)]
VMX: make vmx_read_guest_msr() cope with callers not checking its return value
It took until the 4.5 backports of the L1TF prereqs for gcc 8.2 to finally
notice that the vPMU callers, not checking the function's return value,
may consume uninitialized data. Guard against this by storing zero on
the error path.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
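The guard described above can be sketched as follows. This is an illustration only: struct msr_entry, read_guest_msr() and the -ESRCH return value are assumptions made for the example, not the actual VMX code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSR table entry; not Xen's actual layout. */
struct msr_entry { uint32_t index; uint64_t data; };

/*
 * Look up 'msr' and store its value in *val.
 * On failure *val is set to zero, so a caller that ignores the return
 * value still consumes defined data.
 */
static int read_guest_msr(const struct msr_entry *tbl, size_t n,
                          uint32_t msr, uint64_t *val)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].index == msr) {
            *val = tbl[i].data;
            return 0;
        }
    }

    *val = 0;        /* error path: never leave *val uninitialized */
    return -ESRCH;
}
```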
Wei Liu [Tue, 28 Aug 2018 14:19:55 +0000 (15:19 +0100)]
xenforeignmemory: fix fd leakage in error path
b49ef5d3 (xenforeignmemory: work around bug in older privcmd) added an
error path but forgot to close fd there.
Spotted by Coverity.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
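A minimal sketch of this class of fix, assuming a POSIX environment. The names open_and_probe(), probe_fail() and probe_ok() are hypothetical and do not come from libxenforeignmemory; the point is only that every error exit taken after a successful open() must close the descriptor.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Last descriptor seen by the failing probe, kept so that a caller can
 * check afterwards that it really was closed. */
static int last_probed_fd = -1;

/* Hypothetical post-open checks: one always fails, one always succeeds. */
static int probe_fail(int fd) { last_probed_fd = fd; return -1; }
static int probe_ok(int fd) { (void)fd; return 0; }

/* Open 'path' and run 'probe' on it; returns the fd, or -1 on error. */
static int open_and_probe(const char *path, int (*probe)(int))
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    if (probe(fd) < 0) {
        close(fd);   /* the step that is easy to forget on a new error path */
        return -1;
    }

    return fd;
}
```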
Wei Liu [Tue, 28 Aug 2018 13:56:38 +0000 (14:56 +0100)]
rombios: remove packed attribute for pushad_regs_t
The structure already has explicit padding.
Removing the attribute silences a clang 6 warning:
tcgbios.c:1519:34: error: taking address of packed member 'u' of class or structure 'pushad_regs_t' may result in an unaligned pointer value [-Werror,-Waddress-of-packed-member]
&regs->u.r32.edx);
^~~~~~~~~~~~~~~
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Sun, 26 Aug 2018 12:19:35 +0000 (13:19 +0100)]
xen: is_hvm_{domain,vcpu} should evaluate to false when !CONFIG_HVM
Turn them into static inline functions which evaluate to false when
CONFIG_HVM is not set. ARM won't be broken because ARM guests are set
to PV type in the hypervisor.
But ARM plans to switch to the HVM guest type inside the hypervisor, so
preemptively introduce CONFIG_HVM for ARM here.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
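The shape of such a predicate might look like the sketch below. The enum and struct are simplified stand-ins chosen for the example, not the hypervisor's real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified guest type and domain structure. */
enum guest_type { guest_type_pv, guest_type_hvm };
struct domain { enum guest_type guest_type; };

static inline bool is_hvm_domain(const struct domain *d)
{
#ifdef CONFIG_HVM
    return d->guest_type == guest_type_hvm;
#else
    (void)d;
    return false;   /* compile-time constant when HVM support is absent */
#endif
}
```

Because the result is a constant when CONFIG_HVM is not defined, a compiler is free to drop code guarded by the predicate in such builds.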
Andrew Cooper [Tue, 26 Jun 2018 09:59:10 +0000 (10:59 +0100)]
xen/xsm: Rename CONFIG_XSM_POLICY to CONFIG_XSM_FLASK_POLICY
The embedded policy is specifically a flask policy, so update the
infrastructure to reflect this.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Andrew Cooper [Tue, 26 Jun 2018 09:56:50 +0000 (10:56 +0100)]
xen/xsm: Rename CONFIG_FLASK_* to CONFIG_XSM_FLASK_*
Flask is one single XSM module, and another is about to be introduced.
Properly namespace the symbols for clarity.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Andrew Cooper [Tue, 27 Feb 2018 17:22:40 +0000 (17:22 +0000)]
x86/svm: Fixes to OS Visible Workaround handling
OSVW data is technically per-cpu, but it is the firmware's responsibility to
make it equivalent on each cpu. A guest's OSVW data is sourced from global
data in Xen, clearly making it per-domain data rather than per-vcpu data.
Move the data from struct arch_svm_struct to struct svm_domain, and call
svm_guest_osvw_init() from svm_domain_initialise() instead of
svm_vcpu_initialise().
In svm_guest_osvw_init(), reading osvw_length and osvw_status must be done
under the osvw_lock to avoid observing mismatched values. The guest's view of
osvw_length also needs clipping at 64 as we only offer one status register (To
date, 5 is the maximum index defined AFAICT). Avoid opencoding max().
Drop svm_handle_osvw() as it is shorter and simpler to implement the
functionality inline in svm_msr_{read,write}_intercept(). As the OSVW MSRs
are a contiguous block, we can access them as an array for simplicity.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
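A sketch of the clipping and array-style access described above. The union layout and function names are assumptions for illustration, the locking is omitted, and real initialisation involves more than is shown here. The two MSR indices are the architectural OSVW ones.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_AMD_OSVW_ID_LENGTH 0xc0010140u
#define MSR_AMD_OSVW_STATUS    0xc0010141u

/* Hypothetical per-domain storage: the same two values are reachable
 * both by name and by index. */
union osvw {
    uint64_t raw[2];
    struct { uint64_t length, status; } f;
};

static void osvw_init(union osvw *o, uint64_t host_length,
                      uint64_t host_status)
{
    /* Only one 64-bit status register is offered, so clip the length. */
    o->f.length = host_length < 64 ? host_length : 64;
    o->f.status = host_status;
}

/* The MSRs are contiguous, so the offset from the first one is the index. */
static int osvw_msr_read(const union osvw *o, uint32_t msr, uint64_t *val)
{
    if (msr < MSR_AMD_OSVW_ID_LENGTH || msr > MSR_AMD_OSVW_STATUS)
        return -1;

    *val = o->raw[msr - MSR_AMD_OSVW_ID_LENGTH];
    return 0;
}
```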
Wei Liu [Sun, 26 Aug 2018 12:19:38 +0000 (13:19 +0100)]
x86: provide stub for memory_type_changed
Jan indicated that for PV guests the memory type is not changed, for
HVM guests memory_type_changed is needed for EPT's effective memory
type calculation. This means memory_type_changed is HVM only.
Provide a stub to minimise code churn.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Sun, 26 Aug 2018 12:19:37 +0000 (13:19 +0100)]
x86/hvm: provide hvm_hap_supported
And replace direct accesses in non-HVM subsystems to
hvm_funcs.hap_supported with the new function, to avoid accessing an
internal data structure of another subsystem directly.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Sun, 26 Aug 2018 12:19:36 +0000 (13:19 +0100)]
x86: enclose hvm_op and dm_op in CONFIG_HVM in relevant tables
A PV guest (Dom0) needs to be able to use these two hypercalls in order to
serve HVM guests. But if xen doesn't support HVM at all there is no
point in exposing them to PV guests.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Doug Goldstein [Mon, 27 Aug 2018 09:37:01 +0000 (11:37 +0200)]
build: remove tboot make targets
The tboot targets are woefully out of date. These should really be
retired because setting up tboot is more complex than the build process
for it.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Christopher Clark <christopher.clark6@baesystems.com>
Paul Durrant [Mon, 27 Aug 2018 09:30:18 +0000 (11:30 +0200)]
x86/hvm: remove default ioreq server
My recent patch [1] to qemu-xen-traditional removes the last use of the
'default' ioreq server in Xen. (This is a catch-all ioreq server that is
used if no explicitly registered I/O range is targeted).
This patch can be applied once that patch is committed, to remove the
(>100 lines of) redundant code in Xen.
NOTE: The removal of the special case for HVM_PARAM_DM_DOMAIN in
hvm_allow_set_param() is not directly related to removal of
default ioreq servers. It could have been cleaned up at any time
after commit 9a422c03 "x86/hvm: stop passing explicit domid to
hvm_create_ioreq_server()". It is now added to the new
deprecated sets introduced by this patch.
Wei Liu [Mon, 13 Aug 2018 14:02:32 +0000 (15:02 +0100)]
x86/nestedhvm: provide some stubs for p2m code
Make two functions static inline so that they can be referenced in p2m
code. Check nestedhvm is enabled before calling
nestedhvm_vmcx_flushtlb (which also has a side effect of not issuing
unnecessary IPIs for non-nested case).
While moving, reformat code and use proper boolean.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Fri, 10 Aug 2018 17:08:00 +0000 (18:08 +0100)]
x86: guard HAS_VPCI with CONFIG_HVM
VPCI is only useful for PVH / HVM guests. Ideally CONFIG_HVM should
imply !PV_SHIM_EXCLUSIVE, but we still want to build PV_SHIM_EXCLUSIVE
with CONFIG_HVM at this stage because a lot of things are still
entangled.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Fri, 24 Aug 2018 12:16:26 +0000 (13:16 +0100)]
xenforeignmemory: work around bug in older privcmd
Versions of linux privcmd prior to commit dc9eab6fd94d ("return -ENOTTY
for unimplemented IOCTLs") will return -EINVAL rather than the conventional
-ENOTTY for unimplemented codes. This breaks the error path in
libxenforeignmemory resource mapping, which only translates ENOTTY into
EOPNOTSUPP to inform callers of the need to use an alternative (legacy)
mechanism.
This patch adds a new 'unimplemented' [1] ioctl code into the local
privcmd header which is then used to probe for the appropriate errno to
translate in the resource mapping error path.
[1] this is a code that has, so far, never been used in any version of
privcmd and will be added to future versions of the header in the
linux source, to make sure it stays unimplemented.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
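The probe-then-translate idea can be sketched as follows. Both function names are hypothetical, and the ioctl code passed to the probe is left to the caller; nothing here is taken from the actual library source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/*
 * Issue an ioctl that is expected to be unimplemented and report the errno
 * the driver uses for that case (ENOTTY on fixed kernels, EINVAL on older
 * ones). Returns 0 if the call unexpectedly succeeds.
 */
static inline int probe_unimpl_errno(int fd, unsigned long unimpl_code)
{
    if (ioctl(fd, unimpl_code, NULL) == 0)
        return 0;

    return errno;
}

/* Map the driver's "unimplemented" errno to EOPNOTSUPP; pass others through. */
static inline int translate_map_errno(int err, int unimpl_errno)
{
    return (unimpl_errno != 0 && err == unimpl_errno) ? EOPNOTSUPP : err;
}
```

In this sketch, when the probed value is EINVAL, a genuine EINVAL from the mapping call is indistinguishable from "unimplemented" and is also reported as EOPNOTSUPP.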
xen/arm: p2m: Limit call to mem access code use in get_page_from_gva
Mem access has only an impact on the hardware translation between a
guest virtual address and the machine physical address. So it is not
necessary to fall back to memaccess in all the other cases (e.g. when it
is not possible to acquire the page behind the MFN).
xen/arm: p2m: Reduce the locking section in get_page_from_gva
The p2m lock is only necessary to prevent gvirt_to_maddr failing when
break-before-make sequence is used in the P2M update concurrently on
another pCPU. So reduce the locking section.
xen/arm: cpregs: Allow HSR_CPREG* to receive more than 1 parameter
At the moment, HSR_CPREG is expected to receive only the co-processor
register name as its parameter. Because the name is actually a define, it
may already have been expanded by a previous macro.
Rather than imposing the use of _HSR_CPREG* in such cases, allow
HSR_CPREG to receive more than 1 parameter.
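The preprocessor behaviour involved can be shown with a small standalone sketch. REG_FOO, ENCODE, ENCODE_ and WRAP are made-up names for illustration and are unrelated to the real HSR_CPREG encodings.

```c
#include <assert.h>

/* Hypothetical register name that is itself a define with three fields. */
#define REG_FOO 1, 2, 3

/* Inner macro: needs its three arguments already separated. */
#define ENCODE_(a, b, c) (((a) << 16) | ((b) << 8) | (c))

/*
 * Outer macro: variadic, so it accepts ENCODE(REG_FOO) as well as
 * ENCODE(1, 2, 3). Its arguments are fully expanded before ENCODE_
 * is invoked.
 */
#define ENCODE(...) ENCODE_(__VA_ARGS__)

/*
 * A wrapper that expands the name before ENCODE sees it. With a
 * single-parameter ENCODE(r) this would fail with too many arguments.
 */
#define WRAP(r) ENCODE(r)
```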
xen/arm: move a few DT related defines to public/device_tree_defs.h
Move a few constants defined by libxl_arm.c to
xen/include/public/device_tree_defs.h, so that they can be used from Xen
and libxl. Prepend GUEST_ to avoid conflicts.
Move the DT_IRQ_TYPE* definitions from libxl_arm.c to
public/device_tree_defs.h. Use them in Xen where appropriate.
Re-define the existing Xen internal IRQ_TYPEs as DT_IRQ_TYPEs: they
already happen to be the same, let's make it clear.
xen/arm: do not pass dt_host to make_memory_node and make_hypervisor_node
In order to make make_memory_node and make_hypervisor_node more
reusable, do not pass them dt_host. As they only use it to calculate
addrcells and sizecells, pass addrcells and sizecells directly.
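To illustrate why the cell counts alone are sufficient, here is a sketch of building one (address, size) "reg" entry from addrcells and sizecells. put_cells() and make_reg() are hypothetical helpers, not the Xen functions named above. Device tree cells are 32-bit big-endian values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write 'val' as 'ncells' big-endian 32-bit cells; returns the next byte. */
static uint8_t *put_cells(uint8_t *p, int ncells, uint64_t val)
{
    for (int i = ncells - 1; i >= 0; i--) {
        /* Cells beyond the two that a 64-bit value can fill are zero. */
        uint32_t cell = (i < 2) ? (uint32_t)(val >> (32 * i)) : 0;

        *p++ = (uint8_t)(cell >> 24);
        *p++ = (uint8_t)(cell >> 16);
        *p++ = (uint8_t)(cell >> 8);
        *p++ = (uint8_t)cell;
    }
    return p;
}

/* Build one "reg" entry; the caller supplies the cell counts directly
 * instead of a tree from which to look them up. Returns bytes written. */
static size_t make_reg(uint8_t *buf, int addrcells, int sizecells,
                       uint64_t base, uint64_t size)
{
    uint8_t *p = put_cells(buf, addrcells, base);

    p = put_cells(p, sizecells, size);
    return (size_t)(p - buf);
}
```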