Wei Liu [Fri, 22 Feb 2019 17:11:03 +0000 (17:11 +0000)]
xen: implement xmalloc_whole_pages with vmalloc
The old code has an optimisation which returned unused pages to the
allocator. It will not fare well in the new world in which xenheap is
implemented with vmap because vmap doesn't allow breaking up a chunk
of allocated address space.
To move forward we should under allocate then use vmap to create a
contiguous mapping. vmalloc does exactly that. Use it.
Note that now we have two code paths for xmalloc -- one goes through
xenheap, the other goes through vmap. They will become one once
xenheap is implemented with vmap.
Andrew Cooper [Thu, 14 Feb 2019 11:10:09 +0000 (11:10 +0000)]
x86/pv: Fix construction of 32bit dom0's
dom0_construct_pv() has logic to transition dom0 into a compat domain when
booting an ELF32 image.
One aspect which is missing is the CPUID policy recalculation, meaning that a
32bit dom0 sees a 64bit policy, which differs in the Long Mode feature flag in
particular. Another missing item is the x87_fip_width initialisation.
Update dom0_construct_pv() to use switch_compat(), rather than retaining the
opencoding. Position the call to switch_compat() such that the compat32 local
variable can disappear entirely.
The 32bit monitor table is now created by setup_compat_l4(), avoiding the need
for manual creation later.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 5 Feb 2019 12:02:00 +0000 (12:02 +0000)]
x86: switch root_pgt to mfn_t and use new APIs
This then requires moving the declaration of the root page table mfn into mm.h
and modifying setup_cpu_root_pgt to have a single exit path.
We also need to force map_domain_page to use direct map when switching
per-domain mappings. This is contrary to our end goal of removing
direct map, but this will be removed once we make map_domain_page
context-switch safe in another (large) patch series.
Wei Liu [Tue, 29 Jan 2019 12:59:55 +0000 (12:59 +0000)]
x86/mm: change pl3e to l3t in virt_to_xen_l3e
We will need to have a variable named pl3e when we rewrite
virt_to_xen_l3e. Change pl3e to l3t to better reflect its purpose.
This will make reviewing later patches easier.
Wei Liu [Tue, 29 Jan 2019 12:57:35 +0000 (12:57 +0000)]
x86/mm: change pl1e to l1t in virt_to_xen_l1e
We will need to have a variable named pl1e when we rewrite
virt_to_xen_l1e. Change pl1e to l1t to better reflect its purpose.
This will make reviewing later patches easier.
Wei Liu [Tue, 29 Jan 2019 12:54:48 +0000 (12:54 +0000)]
x86/mm: change pl2e to l2t in virt_to_xen_l2e
We will need to have a variable named pl2e when we rewrite
virt_to_xen_l2e. Change pl2e to l2t to better reflect its purpose.
This will make reviewing later patches easier.
Wei Liu [Mon, 28 Jan 2019 18:10:10 +0000 (18:10 +0000)]
x86/mm: introduce l{1,2}t local variables to modify_xen_mappings
The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped so
there is no need to track the lifetime of each variable.
We will soon have the requirement to map and unmap page tables. We
need to track the lifetime of each variable to avoid leakage.
Introduce some l{1,2}t variables with limited scope so that we can
track the lifetime of pointers to Xen page tables more easily.
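The scoping idea can be illustrated outside Xen. The following is only a sketch under stated assumptions: map_table(), unmap_table() and the mapping counter are hypothetical stand-ins, not Xen functions. It shows how a pointer declared in a narrow block makes the matching unmap easy to audit.

```c
#include <stddef.h>

#define TABLE_ENTRIES 512

/* Hypothetical stand-ins: a fake table plus a counter of live mappings. */
static unsigned long fake_table[TABLE_ENTRIES];
static int live_mappings;

unsigned long *map_table(void)
{
    live_mappings++;
    return fake_table;
}

void unmap_table(unsigned long *t)
{
    if (t != NULL)
        live_mappings--;
}

int mappings_outstanding(void)
{
    return live_mappings;
}

/* Clear one entry; the mapping never outlives the inner block. */
int clear_entry(unsigned int idx)
{
    int rc = 0;

    {
        /* l1t exists only in this block, so map and unmap sit side by side. */
        unsigned long *l1t = map_table();

        if (idx >= TABLE_ENTRIES)
            rc = -1;
        else
            l1t[idx] = 0;

        unmap_table(l1t);
    }

    return rc;
}
```

Because every exit from the inner block passes through unmap_table(), the counter returns to zero on both the success and the error path.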
Wei Liu [Mon, 28 Jan 2019 17:54:24 +0000 (17:54 +0000)]
x86/mm: introduce l{1,2}t local variables to map_pages_to_xen
The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped so
there is no need to track the lifetime of each variable.
We will soon have the requirement to map and unmap page tables. We
need to track the lifetime of each variable to avoid leakage.
Introduce some l{1,2}t variables with limited scope so that we can
track the lifetime of pointers to Xen page tables more easily.
Wei Liu [Wed, 23 Jan 2019 15:33:07 +0000 (15:33 +0000)]
x86: introduce a new set of APIs to manage Xen page tables
We are going to switch to using domheap pages for page tables.
A new set of APIs is introduced to allocate, map, unmap and free pages
for page tables.
The allocation and deallocation functions work on mfn_t rather than page_info,
because they are required to work even before the frame table is set up.
Implement the old functions with the new ones. We will rewrite, site
by site, other mm functions that manipulate page tables to use the new
APIs.
Note these new APIs still use xenheap pages underneath and no actual
map and unmap is done so that we don't break Xen halfway. They will
be switched to use domheap and dynamic mappings when usage of old APIs
is eliminated.
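The shape of such an API can be sketched in user space. Everything below is an assumption made for illustration: the pt_* function names, the static frame pool and INVALID_FRAME are hypothetical, and only the idea of a frame-number wrapper type called mfn_t comes from the text above. As in the commit, map and unmap do no real work yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POOL_FRAMES 8
#define FRAME_SIZE  4096

/* Wrapping the frame number in a struct stops it mixing with plain integers. */
typedef struct { unsigned long m; } mfn_t;
#define INVALID_FRAME ((mfn_t){ ~0UL })

/* Hypothetical backing store standing in for page-table memory. */
static unsigned char pool[POOL_FRAMES][FRAME_SIZE];
static bool in_use[POOL_FRAMES];

bool frame_valid(mfn_t mfn)
{
    return mfn.m < POOL_FRAMES;
}

/* Allocate a zeroed frame and return its number, not a pointer. */
mfn_t pt_alloc(void)
{
    for (unsigned long i = 0; i < POOL_FRAMES; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            memset(pool[i], 0, FRAME_SIZE);
            return (mfn_t){ i };
        }
    }
    return INVALID_FRAME;
}

/* Map: here just a lookup, since the pool is always addressable. */
void *pt_map(mfn_t mfn)
{
    return frame_valid(mfn) ? pool[mfn.m] : NULL;
}

/* Unmap: a no-op in this sketch, kept so callers already pair it with map. */
void pt_unmap(void *p)
{
    (void)p;
}

void pt_free(mfn_t mfn)
{
    if (frame_valid(mfn))
        in_use[mfn.m] = false;
}
```

Callers written against the four calls keep working if pt_map() later starts creating real mappings, which is the point of introducing the interface before changing what sits underneath it.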
Jan Beulich [Tue, 12 Feb 2019 10:54:57 +0000 (11:54 +0100)]
VMX: don't ignore P2M setup error
set_mmio_p2m_entry() may fail, in particular with -ENOMEM. Don't ignore
such an error, but instead cause domain creation to fail in such a case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Juergen Gross [Tue, 12 Feb 2019 10:54:07 +0000 (11:54 +0100)]
iommu: fix iommu_ops initialization
Commit 32a5ea00ec75ef53e ("IOMMU/x86: remove indirection from certain
IOMMU hook accesses") introduced iommu_ops initialized at boot time
with data declared as __initconstrel.
On Intel systems there is another path where iommu_ops is initialized
and this path is relevant on resume after returning from system suspend.
As the initialization data is no longer accessible in this case that
second initialization must be dropped in case the system isn't just
booting.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Norbert Manthey [Wed, 6 Feb 2019 14:09:33 +0000 (15:09 +0100)]
asm: handle comments when creating header file
In the early steps of compilation, the asm header files are created, such
as include/asm-$(TARGET_ARCH)/asm-offsets.h. These files depend on the
assembly file arch/$(TARGET_ARCH)/asm-offsets.s, which is generated
before. Depending on the toolchain used, there might be comments in the
assembly files. In particular, the goto-gcc compiler of the bounded model
checker CBMC adds comments that start with a '#' symbol at the beginning
of the line.
This commit adds handling of assembler comments during the creation of the
asm header files, in particular ignoring lines that start with '#', which
indicate comments for both ARM and x86 assembler. The goto-as tool
produces exactly this kind of comment.
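The filtering rule itself is small. The sketch below expresses it in C purely for illustration; it is not how the build performs the step, and the function names and sample input are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* A line counts as a comment only if '#' is its very first character. */
bool is_comment_line(const char *line)
{
    return line[0] == '#';
}

/*
 * Copy every non-comment line of `in` to `out` (NUL-terminated).
 * Returns the number of lines kept; stops early if `out` would overflow.
 */
size_t strip_comment_lines(const char *in, char *out, size_t out_size)
{
    size_t kept = 0, used = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    while (*in != '\0') {
        const char *eol = strchr(in, '\n');
        /* Line length including its newline, if it has one. */
        size_t len = eol ? (size_t)(eol - in) + 1 : strlen(in);

        if (!is_comment_line(in)) {
            if (used + len >= out_size)
                break;
            memcpy(out + used, in, len);
            used += len;
            out[used] = '\0';
            kept++;
        }
        in += len;
    }

    return kept;
}
```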
Signed-off-by: Norbert Manthey <nmanthey@amazon.de> Signed-off-by: Michael Tautschnig <tautschn@amazon.co.uk> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>
A previously bad situation has become worse with the early setting of
->max_vcpus: The value returned by shadow_min_acceptable_pages() has
further grown, and hence now holds back even more memory from use for
the p2m.
Make sh_min_allocation() account for all p2m memory needed for
shadow_enable() to succeed during domain creation (at which point the
domain has no memory at all allocated to it yet, and hence use of
d->tot_pages is meaningless).
Also make shadow_min_acceptable_pages() no longer needlessly add 1 to
the vCPU count.
Finally make the debugging printk() in shadow_alloc_p2m_page() a little
more useful by logging some of the relevant domain settings.
Reported-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Release-acked-by: Juergen Gross <jgross@suse.com>
George Dunlap [Thu, 7 Feb 2019 12:41:17 +0000 (12:41 +0000)]
docs: features/qemu-depriv formatting fixes
Need a space between the paragraph and the list so pandoc knows it's a
list.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Ian Jackson [Thu, 7 Feb 2019 15:02:27 +0000 (15:02 +0000)]
tools: init scripts: make XEN_RUN_DIR and XEN_LOCK_DIR mode 700
These directories ought not to be even world-readable. If this script
for some reason runs with a lax umask they might be created
overly-writeable. Avoid any such bug by setting the mode explicitly.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Ian Jackson [Thu, 7 Feb 2019 15:02:25 +0000 (15:02 +0000)]
tools: init scripts: xencommons: Provides `xen'
It is useful to have a single `xen' facility (in the LSB Provides
namespace). That allows other facilities to specify that they should
go after `xen' without needing to know the implementation details.
This service name is already Provide'd by the (fairly different) init
scripts used in Debian.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Document provides a brief introduction to the Argo interdomain
communication mechanism and a detailed description of the granular
locking used within the Argo implementation.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
MAINTAINERS: add new section for Argo and self as maintainer
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
xsm, argo: notify: don't describe rings that cannot be sent to
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Release-acked-by: Juergen Gross <jgross@suse.com>
xsm, argo: XSM control for any access to argo by a domain
Will inhibit initialization of the domain's argo data structure to
prevent receiving any messages or notifications and access to any of
the argo hypercall operations.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Release-acked-by: Juergen Gross <jgross@suse.com>
xsm, argo: XSM control for argo message send operation
Default policy: allow.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Release-acked-by: Juergen Gross <jgross@suse.com>
XSM controls for argo ring registration with two distinct cases, where
the ring being registered is:
1) Single source: registering a ring for communication to receive messages
from a specified single other domain.
Default policy: allow.
2) Any source: registering a ring for communication to receive messages
from any, or all, other domains (ie. wildcard).
Default policy: deny, with runtime policy configuration via bootparam.
This commit modifies the signature of core XSM hook functions in order to
apply 'const' to arguments, needed in order for 'const' to be accepted in
signature of functions that invoke them.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Queries for data about space availability in registered rings and
causes notification to be sent when space has become available.
The hypercall op populates a supplied data structure with information about
ring state and if insufficient space is currently available in a given ring,
the hypervisor will record the domain's expressed interest and notify it
when it observes that space has become available.
Checks for free space occur when this notify op is invoked, so it may be
intentionally invoked with no data structure to populate
(ie. a NULL argument) to trigger such a check and consequent notifications.
Limit the maximum number of notify requests in a single operation to a
simple fixed limit of 256.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
argo: implement the sendv op; evtchn: expose send_guest_global_virq
sendv operation is invoked to perform a synchronous send of buffers
contained in iovs to a remote domain's registered ring.
It takes:
* A destination address (domid, port) for the ring to send to.
It performs a most-specific match lookup, to allow for wildcard.
* A source address, used to inform the destination of where to reply.
* The address of an array of iovs containing the data to send
* .. and the length of that array of iovs
* and a 32-bit message type, available to communicate message context
data (eg. kernel-to-kernel, separate from the application data).
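A most-specific match lookup can be sketched as follows. This is an illustration under assumptions: the struct layout, the PARTNER_ANY marker and the linear scan are hypothetical, whereas the text elsewhere mentions a hashtable of registered rings.

```c
#include <stddef.h>
#include <stdint.h>

#define PARTNER_ANY 0xFFFFu  /* hypothetical wildcard marker */

struct ring {
    uint32_t port;     /* port the ring was registered on */
    uint16_t partner;  /* permitted sender, or PARTNER_ANY */
};

/*
 * Prefer a ring registered for exactly this sender; fall back to a
 * wildcard ring on the same port; return NULL if neither exists.
 */
const struct ring *find_ring(const struct ring *rings, size_t n,
                             uint32_t port, uint16_t sender)
{
    const struct ring *wildcard = NULL;

    for (size_t i = 0; i < n; i++) {
        if (rings[i].port != port)
            continue;
        if (rings[i].partner == sender)
            return &rings[i];          /* most specific match wins */
        if (rings[i].partner == PARTNER_ANY)
            wildcard = &rings[i];      /* remember as a fallback */
    }

    return wildcard;
}
```

The wildcard entry is only remembered during the scan, so a specific registration takes priority regardless of the order in which the rings were registered.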
If insufficient space exists in the destination ring, it will return
-EAGAIN and Xen will notify the caller when sufficient space becomes
available.
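The -EAGAIN path can be sketched with a simple byte ring. The layout below (one producer index, one consumer index, one byte kept free) and the notify_pending flag are assumptions for illustration and do not reflect the actual Argo ring format.

```c
#include <errno.h>
#include <stdint.h>

struct ring_state {
    uint32_t len;  /* ring size in bytes */
    uint32_t rx;   /* consumer index, advanced by the receiver */
    uint32_t tx;   /* producer index, advanced on send */
};

/* One byte is always left unused so that tx == rx means "empty". */
uint32_t ring_free_space(const struct ring_state *r)
{
    uint32_t used = (r->tx >= r->rx) ? r->tx - r->rx
                                     : r->len - (r->rx - r->tx);

    return r->len - used - 1;
}

/*
 * Reserve msg_len bytes. If they do not fit, record that the sender
 * wants a space-available notification and report -EAGAIN.
 */
int try_send(struct ring_state *r, uint32_t msg_len, int *notify_pending)
{
    if (msg_len > ring_free_space(r)) {
        *notify_pending = 1;
        return -EAGAIN;
    }

    r->tx = (r->tx + msg_len) % r->len;
    return 0;
}
```

Once the receiver advances rx, the same request succeeds, which is the moment at which a pending notification would be delivered.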
Accesses to the ring indices are appropriately atomic. The rings are
mapped into Xen's private address space to write as needed and the
mappings are retained for later use.
Notifications are sent to guests via VIRQ and send_guest_global_virq is
exposed in the change to enable argo to call it. VIRQ_ARGO is claimed
from the VIRQ previously reserved for this purpose (#11).
The VIRQ notification method is used rather than sending events using
evtchn functions directly because:
* no current event channel type is an exact fit for the intended
behaviour. ECS_IPI is closest, but it disallows migration to
other VCPUs which is not necessarily a requirement for Argo.
* at the point of argo_init, allocation of an event channel is
complicated by none of the guest VCPUs being initialized yet
and the event channel logic expects that a valid event channel
has a present VCPU.
* at the point of signalling a notification, the VIRQ logic is already
defensive: if d->vcpu[0] is NULL, the notification is just silently
dropped, whereas the evtchn_send logic is not so defensive: vcpu[0]
must not be NULL, otherwise a null pointer dereference occurs.
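The defensive behaviour described in the last point can be sketched like this. The structures and the function name are simplified, hypothetical stand-ins, not Xen's definitions.

```c
#include <stddef.h>

struct vcpu {
    int pending_virq;  /* last VIRQ raised, 0 if none */
};

struct domain {
    struct vcpu *vcpu[4];
};

/*
 * Deliver a VIRQ to vcpu[0]. If the domain has no vCPU yet, drop the
 * notification instead of dereferencing a NULL pointer.
 * Returns 1 if delivered, 0 if dropped.
 */
int notify_via_virq(struct domain *d, int virq)
{
    struct vcpu *v = d->vcpu[0];

    if (v == NULL)
        return 0;

    v->pending_virq = virq;
    return 1;
}
```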
Using a VIRQ removes the need for the guest to query to determine which
event channel notifications will be delivered on. This is also likely to
simplify establishing future L0/L1 nested hypervisor argo communication.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Takes a single argument: a handle to the ring unregistration struct,
which specifies the port and partner domain id or wildcard.
The ring's entry is removed from the hashtable of registered rings;
any entries for pending notifications are removed; and the ring is
unmapped from Xen's address space.
If the ring had been registered to communicate with a single specified
domain (ie. a non-wildcard ring) then the partner domain state is removed
from the partner domain's argo send_info hash table.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Release-acked-by: Juergen Gross <jgross@suse.com>
The register op is used by a domain to register a region of memory for
receiving messages from either a specified other domain, or, if specifying a
wildcard, any domain.
This operation creates a mapping within Xen's private address space that
will remain resident for the lifetime of the ring. In subsequent commits,
the hypervisor will use this mapping to copy data from a sending domain into
this registered ring, making it accessible to the domain that registered the
ring to receive data.
Wildcard any-sender rings are disabled by default and registration will be
refused with EPERM unless they have been specifically enabled with the
new mac-permissive flag that is added to the argo boot option here. The
reason why the default for wildcard rings is 'deny' is that there is
currently no means to protect the ring from DoS by a noisy domain
spamming the ring, affecting other domains' ability to send to it. This
will be addressed with XSM policy controls in subsequent work.
Since denying access to any-sender rings is a significant functional
constraint, the new option "mac-permissive" for the argo bootparam
enables overriding this. eg: "argo=1,mac-permissive=1"
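How the option string and the wildcard check fit together can be sketched as below. This is a simplified illustration, not the hypervisor's parser: the struct and function names are hypothetical, and only the literal values 0 and 1 are accepted here.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

struct argo_opts {
    bool enabled;
    bool mac_permissive;
};

/* Accept only a single '0' or '1' character in this sketch. */
static bool parse_bool(const char *s, size_t len, bool *out)
{
    if (len != 1 || (s[0] != '0' && s[0] != '1'))
        return false;
    *out = (s[0] == '1');
    return true;
}

/* Parse the value of "argo=", e.g. "1,mac-permissive=1". */
int parse_argo_param(const char *s, struct argo_opts *o)
{
    static const char key[] = "mac-permissive=";
    const size_t klen = sizeof(key) - 1;

    while (*s != '\0') {
        const char *end = strchr(s, ',');
        size_t len = end ? (size_t)(end - s) : strlen(s);

        if (len > klen && strncmp(s, key, klen) == 0) {
            if (!parse_bool(s + klen, len - klen, &o->mac_permissive))
                return -EINVAL;
        } else if (!parse_bool(s, len, &o->enabled)) {
            return -EINVAL;
        }

        s += len;
        if (*s == ',')
            s++;
    }

    return 0;
}

/* Wildcard (any-sender) registration needs mac-permissive. */
int check_register(const struct argo_opts *o, bool wildcard)
{
    return (wildcard && !o->mac_permissive) ? -EPERM : 0;
}
```

With "argo=1" alone, single-source registration passes the check and wildcard registration is refused with -EPERM.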
The p2m type of the memory supplied by the guest for the ring must be
p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
is registered.
This hypercall op and its interface currently only supports 4K-sized pages.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
EMSGSIZE: Argo's sendv operation will return EMSGSIZE when an excess amount
of data, across all iovs, has been supplied, exceeding either the statically
configured maximum size of a transmittable message, or the (variable) size
of the ring registered by the destination domain.
ECONNREFUSED: Argo's register operation will return ECONNREFUSED if a ring
is being registered to communicate with a specific remote domain that does
exist but is not argo-enabled.
These codes are described by POSIX here:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
EMSGSIZE : "Message too large"
ECONNREFUSED : "Connection refused".
The numeric values assigned to each are taken from Linux, as is the case
for the existing error codes.
EMSGSIZE : 90
ECONNREFUSED : 111
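The two conditions can be sketched with the numeric values listed above. The constant names, the MAX_MSG_SIZE value, the iov struct and the helper functions are assumptions chosen for this illustration.

```c
#include <stddef.h>

/* Numeric values as listed above; the macro names are illustrative. */
#define XEN_EMSGSIZE      90
#define XEN_ECONNREFUSED 111

#define MAX_MSG_SIZE 4096u  /* hypothetical static limit */

struct iov {
    const void *base;
    size_t len;
};

/* Reject a message whose total size exceeds the static limit or the ring. */
int check_message_size(const struct iov *iovs, size_t n, size_t ring_len)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Comparing against the remaining budget also avoids overflow. */
        if (iovs[i].len > MAX_MSG_SIZE - total)
            return -XEN_EMSGSIZE;
        total += iovs[i].len;
    }

    if (total > ring_len)
        return -XEN_EMSGSIZE;

    return 0;
}

/* Refuse registration against a partner that exists but lacks argo. */
int check_partner(int partner_exists, int partner_argo_enabled)
{
    return (partner_exists && !partner_argo_enabled) ? -XEN_ECONNREFUSED : 0;
}
```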
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>
argo: init, destroy and soft-reset, with enable command line opt
Initialises basic data structures and performs teardown of argo state
for domain shutdown.
Inclusion of the Argo implementation is dependent on CONFIG_ARGO.
Introduces a new Xen command line parameter 'argo': bool to enable/disable
the argo hypercall. Defaults to disabled.
New headers:
public/argo.h: with definitions of addresses and ring structure, including
indexes for atomic update for communication between domain and hypervisor.
xen/argo.h: to expose the hooks for integration into domain lifecycle:
argo_init: per-domain init of argo data structures for domain_create.
argo_destroy: teardown for domain_destroy and the error exit
path of domain_create.
argo_soft_reset: reset of domain state for domain_soft_reset.
Adds a new field to struct domain: struct argo_domain *argo;
In accordance with recent work on _domain_destroy, argo_destroy is
idempotent. It will tear down: all rings registered by this domain, all
rings where this domain is the single sender (ie. specified partner,
non-wildcard rings), and all pending notifications where this domain is
awaiting signal about available space in the rings of other domains.
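Idempotent teardown usually comes down to checking and clearing the state pointer. A minimal sketch, with hypothetical struct and function names and with the per-ring cleanup omitted:

```c
#include <stdlib.h>

struct argo_state {
    int nr_rings;  /* rings registered by this domain */
};

struct dom {
    struct argo_state *argo;  /* NULL when argo state is absent */
};

int dom_argo_init(struct dom *d)
{
    d->argo = calloc(1, sizeof(*d->argo));
    return d->argo ? 0 : -1;
}

/* Safe to call any number of times, including before init. */
void dom_argo_destroy(struct dom *d)
{
    if (d->argo == NULL)
        return;  /* never initialised or already torn down */

    free(d->argo);
    d->argo = NULL;  /* makes a repeated call a no-op */
}
```

This property is what allows the same destroy function to serve both domain_destroy and the error exit path of domain_create.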
A count will be maintained of the number of rings that a domain has
registered in order to limit it below the fixed maximum limit defined here.
Macros are defined to verify the internal locking state within the argo
implementation. The macros are ASSERTed on entry to functions to validate
and document the required lock state prior to calling.
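The pattern of asserting lock state on function entry can be sketched as follows. The tracked lock, the macro name and the ring structure are hypothetical: a real lock would not be a plain flag, and the text above refers to Xen's ASSERT where this sketch uses the C library's assert.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only records whether it is held. */
struct tracked_lock {
    bool held;
};

void lock_acquire(struct tracked_lock *l)
{
    assert(!l->held);
    l->held = true;
}

void lock_release(struct tracked_lock *l)
{
    assert(l->held);
    l->held = false;
}

struct ring {
    struct tracked_lock lock;
    unsigned int tx;
};

/* Predicate macro: true when the caller holds this ring's lock. */
#define LOCKING_ring_held(r) ((r)->lock.held)

void ring_advance(struct ring *r, unsigned int n)
{
    /* Documents and enforces the precondition in one line. */
    assert(LOCKING_ring_held(r));
    r->tx += n;
}
```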
The hash function for the hashtables that hold ring state is derived from
the string hashing function djb2 (http://www.cse.yorku.ca/~oz/hash.html)
by Daniel J. Bernstein. Basic testing with a limited number of domains and
ports has shown reasonable distribution for the table size.
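For reference, the original djb2 string hash is shown below. This is the well-known string version, not the derived function used for the ring hashtables, which is not reproduced here. The bucket_for() helper is a hypothetical addition.

```c
/* Classic djb2: start at 5381, then hash = hash * 33 + c for each byte. */
unsigned long djb2(const unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    while ((c = *str++) != 0)
        hash = ((hash << 5) + hash) + (unsigned long)c;  /* hash * 33 + c */

    return hash;
}

/* Hypothetical helper: reduce the hash to a bucket index. */
unsigned int bucket_for(const char *key, unsigned int nbuckets)
{
    return (unsigned int)(djb2((const unsigned char *)key) % nbuckets);
}
```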
The software license on the public header is the BSD license, standard
procedure for the public Xen headers. The public header was originally
posted under a GPL license at: [1]:
https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html
The following ACK by Lars Kurth is to confirm that only people being
employees of Citrix contributed to the header files in the series posted at
[1] and that thus the copyright of the files in question is fully owned by
Citrix. The ACK also confirms that Citrix is happy for the header files to
be published under a BSD license in this series (which is based on [1]).
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Lars Kurth <lars.kurth@citrix.com> Reviewed-by: Ross Philipson <ross.philipson@oracle.com> Tested-by: Chris Patterson <pattersonc@ainfosec.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
A convenience for working on development of the argo subsystem:
setting a #define variable enables additional debug messages.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Registers the hypercall previously reserved for this.
Takes 5 arguments, does nothing and returns -ENOSYS.
Implementation will provide a compat ABI so COMPAT_CALL is the selected
macro for the hypercall tables.
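A stub of this shape might look like the sketch below. The function name and parameter types are hypothetical; only the five-argument signature and the -ENOSYS return value come from the description above.

```c
#include <errno.h>
#include <stddef.h>

/* Placeholder handler: accepts five arguments and implements nothing yet. */
long do_argo_op_stub(unsigned int cmd, void *arg1, void *arg2,
                     unsigned long arg3, unsigned long arg4)
{
    (void)cmd;
    (void)arg1;
    (void)arg2;
    (void)arg3;
    (void)arg4;

    return -ENOSYS;
}
```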
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>
argo: Introduce the Kconfig option to govern inclusion of Argo
Defines CONFIG_ARGO when enabled. Default: disabled.
When the Kconfig option is enabled, the Argo hypercall implementation
will be included, allowing use of the hypervisor-mediated interdomain
communication mechanism.
Argo is implemented for x86 and ARM hardware platforms.
Availability of the option depends on EXPERT and Argo is currently an
experimental feature.
Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Juergen Gross <jgross@suse.com>