Add a "xen-dmabuf" device node for every shared region, compatible
"xen,dmabuf". Each of these nodes refers to the corresponding
reserved-memory node using a phandle.
These device nodes can be used to bind drivers that export the region to
userspace, or do other operations based on the reserved memory region.
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---
Changes in v7:
- new patch
xen/arm: export shared memory regions as reserved-memory on device tree
Shared memory regions need to be advertised to the guest. Fortunately, a
device tree binding for special memory regions already exist:
reserved-memory.
Add a reserved-memory node for each shared memory region, for both
masters and slaves.
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---
Changes in v7:
- change node name to xen-shmem
- add compatible property
- add id property
Zhongze Liu [Fri, 3 Aug 2018 20:21:07 +0000 (13:21 -0700)]
libxl:xl: add parsing code to parse "libxl_static_sshm" from xl config files
Author: Zhongze Liu <blackskygg@gmail.com>
Add the parsing utils for the newly introduced libxl_static_sshm struct
to the libxl/libxlu_* family. And add realated parsing code in xl to
parse the struct from xl config files. This is for the proposal "Allow
setting up shared memory areas between VMs from xl config file" (see [1]).
Signed-off-by: Zhongze Liu <blackskygg@gmail.com> Signed-off-by: Stefano Stabellini <stefanos@xilinx.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Julien Grall <julien.grall@arm.com> Cc: xen-devel@lists.xen.org
---
Changes in v5:
- remove alignment checks, they were moved to libxl
Zhongze Liu [Fri, 3 Aug 2018 20:21:07 +0000 (13:21 -0700)]
libxl: support unmapping static shared memory areas during domain destruction
Author: Zhongze Liu <blackskygg@gmail.com>
Add libxl__sshm_del to unmap static shared memory areas mapped by
libxl__sshm_add during domain creation. The unmapping process is:
* For a master: decrease the refcount of the sshm region, if the refcount
reaches 0, cleanup the whole sshm path.
* For a slave:
1) unmap the shared pages, and cleanup related xs entries. If the
system works normally, all the shared pages will be unmapped, so there
won't be page leaks. In case of errors, the unmapping process will go
on and unmap all the other pages that can be unmapped, so the other
pages won't be leaked, either.
2) Decrease the refcount of the sshm region, if the refcount reaches
0, cleanup the whole sshm path.
This is for the proposal "Allow setting up shared memory areas between VMs
from xl config file" (see [1]).
Zhongze Liu [Fri, 3 Aug 2018 20:21:07 +0000 (13:21 -0700)]
libxl: support mapping static shared memory areas during domain creation
Author: Zhongze Liu <blackskygg@gmail.com>
Add libxl__sshm_add to map shared pages from one DomU to another, The mapping
process involves the following steps:
* Set defaults and check for further errors in the static_shm configs:
overlapping areas, invalid ranges, duplicated master domain,
not page aligned, no master domain etc.
* Use xc_domain_add_to_physmap_batch to map the shared pages to slaves
* When some of the pages can't be successfully mapped, roll back any
successfully mapped pages so that the system stays in a consistent state.
* Write information about static shared memory areas into the appropriate
xenstore paths and set the refcount of the shared region accordingly.
Temporarily mark this as unsupported on x86 because calling p2m_add_foreign on
two domU's is currently not allowd on x86 (see the comments in
x86/mm/p2m.c:p2m_add_foreign for more details).
This is for the proposal "Allow setting up shared memory areas between VMs
from xl config file" (see [1]).
Zhongze Liu [Fri, 3 Aug 2018 20:21:07 +0000 (13:21 -0700)]
libxl: introduce a new structure to represent static shared memory regions
Author: Zhongze Liu <blackskygg@gmail.com>
Add a new structure to the IDL family to represent static shared memory regions
as proposed in the proposal "Allow setting up shared memory areas between VMs
from xl config file" (see [1]).
Zhongze Liu [Fri, 3 Aug 2018 20:21:06 +0000 (13:21 -0700)]
xen: xsm: flask: introduce XENMAPSPACE_gmfn_share for memory sharing
Author: Zhongze Liu <blackskygg@gmail.com>
The existing XENMAPSPACE_gmfn_foreign subop of XENMEM_add_to_physmap forbids
a Dom0 to map memory pages from one DomU to another, which restricts some useful
yet not dangerous use cases -- such as sharing pages among DomU's so that they
can do shm-based communication.
This patch introduces XENMAPSPACE_gmfn_share to address this inconvenience,
which is mostly the same as XENMAPSPACE_gmfn_foreign but has its own xsm check.
Specifically, the patch:
* Introduces a new av permission MMU__SHARE_MEM to denote if two domains can
share memory by using the new subop;
* Introduces xsm_map_gmfn_share() to check if (current) has proper permission
over (t) AND MMU__SHARE_MEM is allowed between (d) and (t);
* Modify the default xen.te to allow MMU__SHARE_MEM for normal domains that
allow grant mapping/event channels.
The new subop is marked unsupported for x86 because calling p2m_add_foregin
on two DomU's is currently not supported on x86.
This is for the proposal "Allow setting up shared memory areas between VMs
from xl config file" (see [1]).
Signed-off-by: Zhongze Liu <blackskygg@gmail.com> Signed-off-by: Stefano Stabellini <stefanos@xilinx.com> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Julien Grall <julien.grall@arm.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Tim Deegan <tim@xen.org> Cc: xen-devel@lists.xen.org
---
Changes in v7:
- add additional checks
- update comments to reflect that
Changes in v5:
- fix coding style
- remove useless x86 hypervisor message for the unimplemented op
- code style
- improve/add comments
George Dunlap [Tue, 31 Jul 2018 14:17:21 +0000 (15:17 +0100)]
hvm/altp2m: Clarify the proper way to extend the altp2m interface
The altp2m functionality was originally envisioned to be used in
several different configurations, one of which was a single in-guest
agent that had full operational control of altp2m. This required the
single hypercall to be an HVMOP rather than a DOMCTL, since HVM guests
are not allowed to make DOMCTLs. Access to this HVMOP is controlled
by a per-domain HVM_PARAM, and defaults to 'off'.
Exposing the altp2m functionality to the guest was controversial at
the time, but was ultimately accepted. The fact that altp2m is an
HVMOP rather than a DOMCTL has caused some problems, however, for
those moving forward trying to extend the interface: Extending the
interface even for the 'external' use case now means extending an
HVMOP, which implicitly extends the surface of attack for the
'internal' use case as well. The result has been that every addition
to this interface has also been controversial.
Settle the controversy once and for all by documenting 1) the purpose
of the altp2m interface, and 2) how to extend it. In particular:
* Specify that the fully in-guest agent is a target use case
* Specify that all extensions to altp2m functionality should be subops
of the HVMOP hypercall
* Specify that new subops should be enabled in ALTP2M_mixed mode by
default, but that this mode has not been evaluated for safety.
Hopefully this will allow the altp2m interface to be developed further
without unnecessary controversy.
Further discussion:
As far as I can tell there are three possible solutions to this
controversy.
A. Remove the 'internal' functionality as a target by converting the
current HVMOP into a DOMCTL.
B. Have two hypercalls -- an HVMOP which contains functionality
expected to be used by the 'internal' agent, and a DOMCTL for
functionality which is expected to be used only be the 'external'
agent.
C. Agree to add all new subops to the current hypercall (HVMOP), even
if we're not sure if they should be exposed to the guest.
I think A is a terrible idea. Having a single in-guest agent is a
reasonable design choice, and apparently it was even implemented at
some point; we should make it straightforward for someone in the
future to pick up the work if they want to.
I think B is also a bad idea. The people extending it at the moment
are primarily concerned with the 'external' use case. There is nobody
around to represent whether new functionality should end up in the
HVMOP or the DOMCTL, which means that by default it will end up in the
DOMCTL. If it is discovered, afterwards, that the new operations
*would* be safe and useful for the 'internal' use case, then we will
either have to duplicate them inside the HVMOP (which would be
terrible) or move the operation from the DOMCTL to the HVMOP (which
would make coding an agent against several versions a mess).
It just makes more sense to have all the altp2m operations in a single
place, and a simple way to control whether they're available to the
'internal' use case or not. As such, I am proposing 'C'.
Even within that, we have several options as far as what to do with
the current interface:
C1: Audit the current subops and make a blacklist of subops not
suitable for exposure to the guest. Future subops should be on the
blacklist unless they have been evaluated as safe for exposure.
C2: Don't blacklist the current subops, but require that all future
subops be blacklisted unless they have been evaluated as safe for
exposure.
C3: Don't blacklist current or future subops for the present; just
document that they need to be evaluated (and some potentially
blacklisted) before being exposed to a guest in a safety-critical
environment.
C1 would be ideal, but there's nobody at present to do the work.
Given that, C3 has been seen as the best solution in discussion.
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com>
When compiling this file with gcc, the compiler happily accepts the
sequence of a label followed by an attribute. However, this sequence does
not follow the gcc documentation. Hence, other compilers might stumble
upon this statement.
To be able to compile Xen with goto-cc (the compiler of the CPROVER tool
suite), the missing semicolon is added in this commit.
Reported-by: Elizabeth Polgreen <polgreen@amazon.de> Signed-off-by: Norbert Manthey <nmanthey@amazon.de> Reviewed-by: Jan Beulich <jbeulich@suse.com>
So that an ELF binary with support for EFI services will be built when
the compiler supports the MS ABI, regardless of the linker support for
PE.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
x86/efi: move the logic to detect PE build support
So that it can be used by other components apart from the efi specific
code. By moving the detection code creating a dummy efi/disabled file
can be avoided.
This is required so that the conditional used to define the efi symbol
in the linker script can be removed and instead the definition of the
efi symbol can be guarded using the preprocessor.
The motivation behind this change is to be able to build Xen using lld
(the LLVM linker), that at least on version 6.0.0 doesn't work
properly with a DEFINED being used in a conditional expression:
ld -melf_x86_64_fbsd -T xen.lds -N prelink.o --build-id=sha1 \
/root/src/xen/xen/common/symbols-dummy.o -o /root/src/xen/xen/.xen-syms.0
ld: error: xen.lds:233: symbol not found: efi
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Daniel Kiper <daniel.kiper@oracle.com>
And replace the open-coded versions already in tree. No functional
change.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reivewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Anthony PERARD [Fri, 27 Jul 2018 14:05:47 +0000 (15:05 +0100)]
libxl_qmp: Move the buffer realloc to the same scope level as read
In qmp_next(), the inner loop should only try to parse messages from
QMP, if there is more than one.
The handling of the receive buffer ('incomplete'), should be done at the
same scope level as read(). It doesn't need to be handle more that once
after a read.
Before this patch, when on message what handled, the inner loop would
restart by adding the 'buffer' into 'incomplete' (after reallocation).
Since 'rd' was not reset, the buffer would be strcat a second time.
After that, the stream from the QMP server would have syntax error, and
the parsor would throw errors.
This is unlikely to happen as the receive buffer is very large. And
receiving two messages in a row is unlikely. In the current case, this
could be an event and a response to a command.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
tools/helpers: don't hardcode domain type for dom0 and xenstore domain
Today when setting up a minimal domain configuration file for dom0 and
eventually xenstore-domain the domain type is harcoded as PV. Change
that by asking the hypervisor for the correct type.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
New field backend_type is added to vkb device in order to have QEMU and user
space backend simultaneously. Each vkb backend shall read appropriate XS entry
and service only own frontends. Id is a string field which used by the backend
to indentify the frontend.
Signed-off-by: Oleksandr Grytsov <oleksandr_grytsov@epam.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
x86/pvh: change the order of the iommu initialization for Dom0
The iommu initialization will also create MMIO mappings in the Dom0
p2m, so the paging memory pool needs to be allocated or else iommu
initialization will fail.
Move the call to init the iommu after the Dom0 p2m has been setup in
order to solve this.
Note that issues caused by this wrong ordering have only been seen
when using shadow paging.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
idle_vcpu[0] is still poisoned with INVALID_VCPU, so write_ptbase faults
dereferencing the pointer. This fault calls panic and recurses through
the same code path.
If tboot_shutdown is called while idle_vcpu[0] == INVALID_VCPU, then we
are still operating with the initial page tables. Therefore changing
page tables with write_ptbase is unnecessary.
An easy way to reproduce this is to use tboot to launch an XSM-enabled
Xen without an XSM policy.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/vhpet: add support for level triggered interrupts
Level triggered interrupts are not an optional feature of HPET, and
must be implemented in order to comply with the HPET specification.
Implement them by adding a callback to the timer which sets the
interrupt bit in the general interrupt status register. Further
interrupts (in case of periodic mode) will not be injected until the
bit is cleared.
In order to reset the interrupts when the status bit is clear Xen must
also detect accesses to such register.
While there convert tn and i in hpet_write to unsigned.
Reported-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Level trigger interrupts will be asserted regardless of whether the
interrupt is masked, and thus the callback will also be executed.
Add a new 'level' parameter to create_periodic_time in order to create
level triggered timers. None of the current users of vpt are switched
to use level triggered interrupts yet.
Note that periodic level triggered interrupts are not supported. This
is because level triggered interrupts always require a deassert of the
IO-APIC pin, which should be done by the caller of vpt at which point
the caller should also reset the timer if required.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 23 Jul 2018 13:29:27 +0000 (14:29 +0100)]
x86/spec-ctrl: Fix the parsing of xpti= on fixed Intel hardware
The calls to xpti_init_default() in parse_xpti() are buggy. The CPUID data
hasn't been fetched that early, and boot_cpu_has(X86_FEATURE_ARCH_CAPS) will
always evaluate false.
As a result, the default case won't disable XPTI on Intel hardware which
advertises ARCH_CAPABILITIES_RDCL_NO.
Simplify parse_xpti() to solely the setting of opt_xpti according to the
passed string, and have init_speculation_mitigations() call
xpti_init_default() if appropiate. Drop the force parameter, and pass caps
instead, to avoid redundant re-reading of MSR_ARCH_CAPS.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 20 Jul 2018 15:43:49 +0000 (15:43 +0000)]
x86/svm: Drop the suggestion of Long Mode Segment Limit support
Because of a bug in 2010, LMSL support isn't available to guests.
c/s f2c608444 noticed but avoided fixing the issue for migration reasons. In
addition to migration problems, changes to the segmentation logic for
emulation would be needed before the feature could be enabled.
This feature is entirely unused by operating systems (probably owing to its
semantics which only cover half the segment registers), and no one has
commented on its absence from Xen. As supporting it would involve a large
amount of effort, it seems better to remove the code entirely.
If someone finds a valid usecase, we can resurrecting the code and
implementing the remaining parts, but I doubt anyone will.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Fri, 20 Jul 2018 15:42:04 +0000 (15:42 +0000)]
x86/hvm: Disallow unknown MSR_EFER bits
It turns out that nothing ever prevented HVM guests from trying to set unknown
EFER bits. Generally, this results in a vmentry failure.
For Intel hardware, all implemented bits are covered by the checks.
For AMD hardware, the only EFER bit which isn't covered by the checks is TCE
(which AFAICT is specific to AMD Fam15/16 hardware). We never advertise TCE
in CPUID, but it isn't a security problem to have TCE unexpected enabled in
guest context.
Disallow the setting of bits outside of the EFER_KNOWN_MASK, which prevents
any vmentry failures for guests, yielding #GP instead.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Mon, 23 Jul 2018 10:26:49 +0000 (11:26 +0100)]
ocaml: remove undefined behaviour in systemd_stubs.c
Clang complains:
systemd_stubs.c:51:8: error: shifting a negative signed value is undefined [-Werror,-Wshift-negative-value]
ret = Val_int(-1U);
^~~~~~~~~~~~
Since sd_notify_fd has a signature of unit -> unit, we simply change
the return value to Val_unit.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Razvan Cojocaru [Thu, 28 Jun 2018 07:54:01 +0000 (10:54 +0300)]
xen/altp2m: set access_required properly for all altp2ms
For the hostp2m, access_required starts off as 0, then it can be
set with xc_domain_set_access_required(). However, all the altp2ms
set it to 1 on init, and ignore both the hostp2m and the hypercall.
This patch sets access_required to the value from the hostp2m
on altp2m init, and propagates the values received via hypercall
to all the active altp2ms, when applicable.
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com>
This is to facilitate the values being passed in via domain_create(), at which
point the dom0 construction code needs to know them.
While cleaning up, drop the DEFAULT_* defines, which are only used immediately
adjacent in a context which makes it obvious that they are the defaults, and
drop the (unused) logic to allow a per-arch override of
DEFAULT_MAX_NR_GRANT_FRAMES.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
x86/physdev: Remove redundant assignment in allocate_and_map_msi_pirq()
No functional change.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
scripts: add helper script to use Docker containers
This adds a script that can be used to do builds easily within the
defined containers under the automation directory. These containers live
in the public GitLab registry under the xen-project namespace. The
script can be executed a number of ways but the default is to drop you
at a bash shell within a Debian Stretch container at the top level of
the source tree.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Remove local definition of MIN and instead include the kernel.h header
for the hypervisor build. Fixes the following error on the tools build:
In file included from xc_dom_decompress_unsafe_lzma.c:8:0:
../../xen/common/unlzma.c:33:0: error: "MIN" redefined [-Werror]
#define MIN(a, b) (((a) < (b)) ? (a) : (b))
^
In file included from xc_private.h:43:0,
from xg_private.h:29,
from xc_dom_decompress_unsafe_lzma.c:5:
/home/osstest/build.125458.build-amd64/xen/stubdom/libxc-x86_64/../../tools/include/xen-tools/libs.h:21:0: note: this is the location of the previous definition
#define MIN(x, y) ((x) < (y) ? (x) : (y))
^
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 18 Jul 2018 11:22:55 +0000 (12:22 +0100)]
x86/xstate: Make errors in xstate calculations more obvious by crashing the domain
If xcr0_max exceeds xfeature_mask, then something is broken with the CPUID
policy derivation or auditing logic. If hardware rejects new_bv, then
something is broken with Xen's xstate logic.
In both cases, crash the domain with an obvious error message, to help
highlight the issues.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 18 Jul 2018 10:56:44 +0000 (11:56 +0100)]
x86/xstate: Use a guests CPUID policy, rather than allowing all features
It turns out that Xen has never enforced that a domain remain within the
xstate features advertised in CPUID.
The check of new_bv against xfeature_mask ensures that a domain stays within
the set of features that Xen has enabled in hardware (and therefore isn't a
security problem), but this does means that attempts to level a guest for
migration safety might not be effective if the guest ignores CPUID.
Check the CPUID policy in validate_xstate() (for incoming migration) and in
handle_xsetbv() (for guest XSETBV instructions). This subsumes the PKRU check
for PV guests in handle_xsetbv() (and also demonstrates that I should have
spotted this problem while reviewing c/s fbf9971241f).
For migration, this is correct despite the current (mis)ordering of data
because d->arch.cpuid is the applicable max policy.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 29 Jun 2018 13:05:52 +0000 (13:05 +0000)]
libx86: Introduce lib/x86/msr.h and share msr_policy with userspace
To facilitate the shared Xen and toolstack code in libx86, struct msr_policy
needs to be available in the same way as struct cpuid_policy.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 21 Jun 2018 14:35:48 +0000 (15:35 +0100)]
libx86: introduce a libx86 shared library
Move x86_cpuid_lookup_deep_deps() into the shared library, removing the
individual copies from the hypervisor and libxc respectively.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 21 Jun 2018 14:35:46 +0000 (16:35 +0200)]
libx86: Share struct cpuid_policy with userspace
Both Xen and the toolstack have need of the same logic when it comes to
manipulation and checking of the CPUID and MSR values offered to guests. To
that end, libx86 is being introduced to allow Xen and the toolstack to share a
single implementation, rather than duplicating the logic.
No functional change.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Thu, 21 Jun 2018 14:35:46 +0000 (16:35 +0200)]
libx86: generate cpuid-autogen.h in the libx86 include dir
This avoids all users needing to opencode local generation of the file.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 28 Jun 2018 11:00:44 +0000 (11:00 +0000)]
libx86: Introduce lib/x86/cpuid.h
Begin to untangle the header dependency tangle by moving definition of
struct cpuid_leaf out of x86_emulate.h into the new cpuid.h.
Additionally, plumb the header through to libxc. This is technically a
redundant include at this point, but it helps build-test the later changes,
and will be used eventually.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 28 May 2018 14:19:05 +0000 (14:19 +0000)]
x86/vmx: Don't clobber %dr6 while debugging state is lazy
c/s 4f36452b63 introduced a write to %dr6 in the #DB intercept case, but the
guests debug registers may be lazy at this point, at which point the guests
later attempt to read %dr6 will discard this value and use the older stale
value.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 19 Jul 2018 10:33:38 +0000 (04:33 -0600)]
cpumask: tidy {,z}alloc_cpumask_var()
Drop unnecessary casts and use bool in favor of bool_t.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 10:32:43 +0000 (04:32 -0600)]
x86: command line option to avoid use of secondary hyper-threads
Shared resources (L1 cache and TLB in particular) present a risk of
information leak via side channels. Provide a means to avoid use of
hyperthreads in such cases.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 10:32:06 +0000 (04:32 -0600)]
x86: possibly bring up all CPUs even if not all are supposed to be used
Reportedly Intel CPUs which can't broadcast #MC to all targeted
cores/threads because some have CR4.MCE clear will shut down. Therefore
we want to keep CR4.MCE enabled when offlining a CPU, and we need to
bring up all CPUs in order to be able to set CR4.MCE in the first place.
The use of clear_in_cr4() in cpu_mcheck_disable() was ill advised
anyway, and to avoid future similar mistakes I'm removing clear_in_cr4()
altogether right here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 10:31:07 +0000 (04:31 -0600)]
x86: distinguish CPU offlining from CPU removal
In order to be able to service #MC on offlined CPUs, the GDT, IDT,
stack, and per-CPU data (which includes the TSS) need to be kept
allocated. They should only be freed upon CPU removal (which we
currently don't support, so some code is becoming effectively dead for
the moment).
Note that for now park_offline_cpus doesn't get set to true anywhere -
this is going to be the subject of a subsequent patch.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 09:54:45 +0000 (11:54 +0200)]
VMX: fix vmx_{find,del}_msr() build
Older gcc at -O2 (and perhaps higher) does not recognize that apparently
uninitialized variables aren't really uninitialized. Pull out the
assignments used by two of the three case blocks and make them
initializers of the variables, as I think I had suggested during review.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 19 Jul 2018 07:42:42 +0000 (09:42 +0200)]
x86/AMD: distinguish compute units from hyper-threads
Fam17 replaces CUs by HTs, which we should reflect accordingly, even if
the difference is not very big. The most relevant change (requiring some
code restructuring) is that the topoext feature no longer means there is
a valid CU ID.
Take the opportunity and convert wrongly plain int variables in
set_cpu_sibling_map() to unsigned int.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Brian Woods <brian.woods@amd.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 19 Jul 2018 07:41:55 +0000 (09:41 +0200)]
cpupools: fix state when downing a CPU failed
While I've run into the issue with further patches in place which no
longer guarantee the per-CPU area to start out as all zeros, the
CPU_DOWN_FAILED processing looks to have the same issue: By not zapping
the per-CPU cpupool pointer, cpupool_cpu_add()'s (indirect) invocation
of schedule_cpu_switch() will trigger the "c != old_pool" assertion
there.
Clearing the field during CPU_DOWN_PREPARE is too early (afaict this
should not happen before cpu_disable_scheduler()). Clearing it in
CPU_DEAD and CPU_DOWN_FAILED would be an option, but would take the same
piece of code twice. Since the field's value shouldn't matter while the
CPU is offline, simply clear it (implicitly) for CPU_ONLINE and
CPU_DOWN_FAILED, but only for other than the suspend/resume case (which
gets specially handled in cpupool_cpu_remove()).
By adjusting the conditional in cpupool_cpu_add() CPU_DOWN_FAILED
handling in the suspend case should now also be handled better.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com>
x86/HVM: switch virtual_intr_delivery_enabled() hook to simple boolean
This patch modifies the hvm_funcs.virtual_intr_delivery_enabled()
to become a bool variable as VMX does and SVM will simply return a
static value.
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 19 Jul 2018 07:35:43 +0000 (09:35 +0200)]
VMX: don't unconditionally set the tsc_scaling.setup hook
Instead of checking hvm_tsc_scaling_supported inside the hook function,
install the hook only when setting state such that said predicate
becomes true.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Olaf Hering [Wed, 18 Jul 2018 11:02:38 +0000 (13:02 +0200)]
tools/firmware: reproducible seabios build
The build system of seabios always includes the current time and the
hostname into the resulting binary. To avoid that, it is required to
have a file '.version' in the toplevel directory of seabios-dir-remote.
And it is required to pass EXTRAVERSION= to make because its toplevel
Makefile does not take EXTRAVERSION from environment.
Adjust the code to create a '.version' file with fixed content.
Adjust the code to pass EXTRAVERSION down to make.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Wed, 18 Jul 2018 16:02:12 +0000 (17:02 +0100)]
tools: fix dependency for ipxe and rombios
It appears that the test in 01d631028 for ipxe's dependency on rombios
is not good enough. Configuring with --disable-rombios doesn't disable
ipxe.
Fix it by testing the dependency after AC_ARG_ENABLE and AC_ARG_WITH
have taken effect.
At the same time, regularise options for ipxe:
--enable-ipxe enable building in-tree ipxe
--disable-ipxe disable building in-tree ipxe
--with-system-ipxe specify a path to be baked into code, disable
building in-tree ipxe, this trumps --{en,dis}able-ipxe
--without-system-ipxe error
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Mon, 16 Jul 2018 09:21:54 +0000 (10:21 +0100)]
x86/cpuid: Adjust the policies based on the boot time vPMU setting
The vPMU logic isn't integrated into the CPUID policy logic (and still
requires a fair amount of work before it can be).
The ARCH_PERFMON leaf was previously copied into all policies, unilaterally
overridden (to the same value in the general case) by the toolstack using
DOMCTL_set_cpuid, then unilaterally overridden again by Xen's runtime
logic (based on the boot time settings).
The policy retrieved with DOMCTL_get_cpu_policy needs to be accurate, so take
the boot time settings into account when creating and clipping the toolstack
policy. The runtime logic is still required for now, to clip the maximum
reported version when necessary.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Wed, 18 Jul 2018 08:39:23 +0000 (10:39 +0200)]
grant_table: use term 'mfn' for machine frame numbers...
...rather than more ambiguous term 'frame'.
There are many places in the grant table code that use a variable or
field name '.*frame' to refer to a quantity that is strictly an MFN, and
even has type mfn_t.
This patch is a purely cosmetic patch that substitutes 'frame' with 'mfn'
in those places to make the purpose of the variable or field name more
obvious.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: George Dunlap <George.Dunlap@eu.citrix.com>
Jan Beulich [Wed, 18 Jul 2018 08:38:03 +0000 (10:38 +0200)]
x86/HPET: adjustments to constants and their use
Drop HPET_TN_ROUTE_SHIFT as redundant with HPET_TN_ROUTE.
Introduce HPET_TN_INT_ROUTE_CAP paralleling the other HPET_TN_*_CAP
constants, making it necessary to rename the such named constant in
hvm/hpet.c. Use MASK_EXTR() / MASK_INSR() instead of kind of open-
coding them.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Wed, 18 Jul 2018 08:37:21 +0000 (10:37 +0200)]
x86/vHPET: replace literal numbers
Also drop the unused HPET_TN_CFG_BITS_READONLY_OR_RESERVED.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 18 Jul 2018 08:36:40 +0000 (10:36 +0200)]
x86/HVM: avoid memory_type_changed() invocations when possible
They're expensive, and nothing changes if MTRRs are disabled and any of
the ranges gets changed, or if fixed range MTRRs are disabled and any of
them gets changed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Wed, 18 Jul 2018 08:35:39 +0000 (10:35 +0200)]
x86/HVM: improve a few state load checks
Using plain int for instance numbers looks quite dangerous without
being aware that hvm_load_instance() returns an unsigned quantity. Make
this more explicit. Also replace uint16_t uses by unsigned int.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Tue, 10 Jul 2018 13:01:29 +0000 (14:01 +0100)]
x86/msr: Drop stale comment for vcpu_msrs.spec_ctrl
More than the bottom two bits are now defined, and the MSR policy work has
shown that using non-architectural representations turns out to be problematic
for more than just asm code. As the architectural representation is the
expected default, we don't need to justify why we are using it.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 28 May 2018 14:19:23 +0000 (14:19 +0000)]
x86/svm Fixes and cleanup to svm_inject_event()
* State adjustments (and debug tracing) for #DB/#BP/#PF should not be done
for `int $n` instructions. Updates to %cr2 occur even if the exception
combines to #DF.
* Don't opencode DR_STEP when updating %dr6.
* Simplify the logic for calling svm_emul_swint_injection() as in the common
case, every condition needs checking.
* Fix comments which have become stale as code has moved between components.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Andrew Cooper [Wed, 14 Mar 2018 15:20:05 +0000 (15:20 +0000)]
x86/pv: Avoid locked bit manipulation in register_guest_callback()
Changes to arch.vgc_flags are made to current in syncrhonous context only, and
don't need to be locked. (The only other changes are via
arch_set_info_guest(), which operates on descheduled vcpus only).
Replace the {set,clear}_bit() calls with compiler-visible bitwise operations.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Wei Liu [Mon, 16 Jul 2018 14:02:55 +0000 (15:02 +0100)]
tools: --with-system-{ovmf,seabios,ipxe} should provide absolute paths
The paths shouldn't be set to "yes". We ask the user to set absolute
paths because Xen's build system doesn't know where to search, and the
build machine doesn't necessarily have those binaries present in the
first place.
Reported-by: Anthony Perard <anthony.perard@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 16 Jul 2018 14:02:54 +0000 (15:02 +0100)]
tools: provide --with-system-ipxe
This option lets user specify which binary is to be used as ipxe. If
it is specified, the in-tree ipxe will not be built. This option is in
line with other --with-system-* options we provide.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Mon, 16 Jul 2018 14:02:52 +0000 (15:02 +0100)]
libxc: allow HVM guest to have modules
Lift the loading code out of PVH specific branch. Take the chance to
make the debug message more useful.
Now the code needs to handle virt_base being UNSET_ADDR, which it is
for HVM guest. In case virt_base is not set, it should be treated as
zero. In case PVH and PV, virt_base is set by the respective loader
by parsing the binary.
IPXE will be loaded as a module of Rombios.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Mon, 16 Jul 2018 13:15:12 +0000 (15:15 +0200)]
allow cpu_down() to be called earlier
The function's use of the stop-machine logic has so far prevented its
use ahead of the processing of the "ordinary" initcalls. Since at this
early time we're in a controlled environment anyway, there's no need for
such a heavy tool. Additionally this ought to have less of a performance
impact especially on large systems, compared to the alternative of
making stop-machine functionality available earlier.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 16 Jul 2018 13:12:19 +0000 (15:12 +0200)]
console: avoid printing no or null time stamps
During early boot timestamps aren't very useful, as they're all zero
(in "boot" mode) or absent altogether (in "date" and "datems" modes).
Log "boot" format timestamps when the date formats aren't available yet,
and log raw timestamps when boot ones are still all zero. Also add a
"raw" mode.
For the ARM side get_cycles() to produce a meaningful value, ARM's
cycle_t gets changed to uint64_t.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com>
And enable MTRR. This allows to provide a sane initial MTRR state for
PVH DomUs. This will have to be expanded when pci-passthrough support
is added to PVH guests, so that MMIO regions of devices are set as
UC.
Note that initial MTRR setup is done by hvmloader for HVM guests,
that's not used by PVH guests.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Copy the state found on the hardware when creating a PVH Dom0. Since
the memory map provided to a PVH Dom0 is based on the native one using
the same set of MTRR ranges should provide Dom0 with a sane MTRR state
without having to manually build it in Xen.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
hvm/mtrr: use the hardware number of variable ranges for Dom0
Expand the size of the variable ranges array to match the size of the
underlying hardware, this is a preparatory change for copying the
hardware MTRR state for Dom0.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 16 Jul 2018 13:08:02 +0000 (15:08 +0200)]
x86/HVM: improve MTRR load checks
We should not assume that the incoming set of values contains exactly
MTRR_VCNT variable range MSRs. Permit a smaller amount and reject a
bigger one. As a result the save path then also needs to no longer use
a fixed upper bound, in turn requiring unused space in the save record
to be zeroed up front.
Also slightly refine types where appropriate.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
[switch to use MASK_EXTR to get VCNT] Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>