xenbits.xensource.com Git - people/liuw/xen.git/log
6 years ago docs: features/qemu-depriv formatting fixes
George Dunlap [Thu, 7 Feb 2019 12:41:17 +0000 (12:41 +0000)]
docs: features/qemu-depriv formatting fixes

Need a space between the paragraph and the list so pandoc knows it's a
list.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago docs: Update credit/credit2 feature docs reflecting new default scheduler
George Dunlap [Thu, 7 Feb 2019 12:05:43 +0000 (12:05 +0000)]
docs: Update credit/credit2 feature docs reflecting new default scheduler

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago tools: init scripts: make XEN_RUN_DIR and XEN_LOCK_DIR mode 700
Ian Jackson [Thu, 7 Feb 2019 15:02:27 +0000 (15:02 +0000)]
tools: init scripts: make XEN_RUN_DIR and XEN_LOCK_DIR mode 700

These directories ought not to be even world-readable.  If this script
for some reason runs with a lax umask they might be created
overly-writeable.  Avoid any such bug by setting the mode explicitly.
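
As an illustration of the hazard, here is a minimal sketch in Python (not the init script itself; the helper name `make_private_dir` is hypothetical). A directory created under a lax umask can end up with wider permissions than intended, whereas an explicit chmod afterwards pins the mode regardless of the umask.

```python
import os
import stat
import tempfile


def make_private_dir(path):
    """Create path if needed, then force mode 0700 regardless of umask."""
    os.makedirs(path, exist_ok=True)
    # Set the mode explicitly: the mode passed at creation time is filtered
    # by the umask and is not applied to a directory that already exists.
    os.chmod(path, 0o700)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    old_umask = os.umask(0)  # simulate a lax umask
    try:
        with tempfile.TemporaryDirectory() as base:
            print(oct(make_private_dir(os.path.join(base, "run"))))
    finally:
        os.umask(old_umask)
```

The explicit chmod also tightens a directory that was left over from an earlier run with a wider mode.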

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago tools: init scripts: xencommons: Fixes to Description
Ian Jackson [Thu, 7 Feb 2019 15:02:26 +0000 (15:02 +0000)]
tools: init scripts: xencommons: Fixes to Description

`neeeded' is a typo.  And xend is long gone.

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago tools: init scripts: xencommons: Provides `xen'
Ian Jackson [Thu, 7 Feb 2019 15:02:25 +0000 (15:02 +0000)]
tools: init scripts: xencommons: Provides `xen'

It is useful to have a single `xen' facility (in the LSB Provides
namespace).  That allows other facilities to specify that they should
go after `xen' without needing to know the implementation details.

This service name is already Provide'd by the (fairly different) init
scripts used in Debian.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xen/arm: gic-v2: deactivate interrupts during initialization
Stefano Stabellini [Tue, 5 Feb 2019 21:38:53 +0000 (13:38 -0800)]
xen/arm: gic-v2: deactivate interrupts during initialization

Interrupts could be ACTIVE at boot. Make sure to deactivate them during
initialization.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
CC: julien.grall@arm.com
CC: peng.fan@nxp.com
CC: jgross@suse.com
6 years ago docs, argo: add design document for Argo
Christopher Clark [Wed, 6 Feb 2019 08:56:00 +0000 (09:56 +0100)]
docs, argo: add design document for Argo

This document provides a brief introduction to the Argo interdomain
communication mechanism and a detailed description of the granular
locking used within the Argo implementation.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago SUPPORT.md : add new entry for the Argo feature
Christopher Clark [Wed, 6 Feb 2019 09:04:00 +0000 (10:04 +0100)]
SUPPORT.md : add new entry for the Argo feature

Status: Experimental

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago MAINTAINERS: add new section for Argo and self as maintainer
Christopher Clark [Wed, 6 Feb 2019 08:56:00 +0000 (09:56 +0100)]
MAINTAINERS: add new section for Argo and self as maintainer

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xsm, argo: notify: don't describe rings that cannot be sent to
Christopher Clark [Wed, 6 Feb 2019 08:56:00 +0000 (09:56 +0100)]
xsm, argo: notify: don't describe rings that cannot be sent to

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xsm, argo: XSM control for any access to argo by a domain
Christopher Clark [Wed, 6 Feb 2019 08:56:00 +0000 (09:56 +0100)]
xsm, argo: XSM control for any access to argo by a domain

The control can inhibit initialization of the domain's argo data structure,
preventing the domain from receiving any messages or notifications and from
accessing any of the argo hypercall operations.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xsm, argo: XSM control for argo message send operation
Christopher Clark [Wed, 6 Feb 2019 09:02:00 +0000 (10:02 +0100)]
xsm, argo: XSM control for argo message send operation

Default policy: allow.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xsm, argo: XSM control for argo register
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
xsm, argo: XSM control for argo register

XSM controls for argo ring registration with two distinct cases, where
the ring being registered is:

1) Single source:  registering a ring for communication to receive messages
                   from a specified single other domain.
   Default policy: allow.

2) Any source:     registering a ring for communication to receive messages
                   from any, or all, other domains (ie. wildcard).
   Default policy: deny, with runtime policy configuration via bootparam.

This commit modifies the signature of core XSM hook functions in order to
apply 'const' to arguments, needed in order for 'const' to be accepted in
signature of functions that invoke them.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: implement the notify op
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: implement the notify op

Queries for data about space availability in registered rings and
causes notification to be sent when space has become available.

The hypercall op populates a supplied data structure with information about
ring state and if insufficient space is currently available in a given ring,
the hypervisor will record the domain's expressed interest and notify it
when it observes that space has become available.

Checks for free space occur when this notify op is invoked, so it may be
intentionally invoked with no data structure to populate
(ie. a NULL argument) to trigger such a check and consequent notifications.

Limit the maximum number of notify requests in a single operation to a
simple fixed limit of 256.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: implement the sendv op; evtchn: expose send_guest_global_virq
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: implement the sendv op; evtchn: expose send_guest_global_virq

The sendv operation is invoked to perform a synchronous send of buffers
contained in iovs to a remote domain's registered ring.

It takes:
 * A destination address (domid, port) for the ring to send to.
   It performs a most-specific match lookup, to allow for wildcard.
 * A source address, used to inform the destination of where to reply.
 * The address of an array of iovs containing the data to send.
 * The length of that array of iovs.
 * A 32-bit message type, available to communicate message context
   data (eg. kernel-to-kernel, separate from the application data).
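
The most-specific match lookup mentioned in the list above can be sketched as follows. This is a simplified Python model, not the hypervisor code: the dictionary keyed by (port, partner), the `WILDCARD` marker and the function name `find_ring` are all assumptions made for illustration.

```python
WILDCARD = None  # hypothetical marker for an any-sender ring


def find_ring(rings, port, sender_domid):
    """Look up the destination ring for a message from sender_domid.

    rings models the rings registered by the destination domain and maps
    (port, partner) -> ring, where partner is a domid or WILDCARD.
    A ring registered for this exact sender wins over a wildcard ring.
    """
    exact = rings.get((port, sender_domid))
    if exact is not None:
        return exact
    return rings.get((port, WILDCARD))


if __name__ == "__main__":
    rings = {(80, 5): "single-sender ring", (80, WILDCARD): "wildcard ring"}
    print(find_ring(rings, 80, 5))
    print(find_ring(rings, 80, 7))
```

Trying the exact partner first means a wildcard ring only receives traffic that no single-sender ring on the same port claims.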

If insufficient space exists in the destination ring, it will return
-EAGAIN and Xen will notify the caller when sufficient space becomes
available.

Accesses to the ring indices are appropriately atomic. The rings are
mapped into Xen's private address space to write as needed and the
mappings are retained for later use.

Notifications are sent to guests via VIRQ and send_guest_global_virq is
exposed in the change to enable argo to call it. VIRQ_ARGO is claimed
from the VIRQ previously reserved for this purpose (#11).

The VIRQ notification method is used rather than sending events using
evtchn functions directly because:

* no current event channel type is an exact fit for the intended
  behaviour. ECS_IPI is closest, but it disallows migration to
  other VCPUs which is not necessarily a requirement for Argo.

* at the point of argo_init, allocation of an event channel is
  complicated by none of the guest VCPUs being initialized yet
  and the event channel logic expects that a valid event channel
  has a present VCPU.

* at the point of signalling a notification, the VIRQ logic is already
  defensive: if d->vcpu[0] is NULL, the notification is just silently
  dropped, whereas the evtchn_send logic is not so defensive: vcpu[0]
  must not be NULL, otherwise a null pointer dereference occurs.

Using a VIRQ removes the need for the guest to query to determine which
event channel notifications will be delivered on. This is also likely to
simplify establishing future L0/L1 nested hypervisor argo communication.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: implement the unregister op
Christopher Clark [Wed, 6 Feb 2019 09:04:00 +0000 (10:04 +0100)]
argo: implement the unregister op

Takes a single argument: a handle to the ring unregistration struct,
which specifies the port and partner domain id or wildcard.

The ring's entry is removed from the hashtable of registered rings;
any entries for pending notifications are removed; and the ring is
unmapped from Xen's address space.

If the ring had been registered to communicate with a single specified
domain (ie. a non-wildcard ring) then the partner domain state is removed
from the partner domain's argo send_info hash table.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: implement the register op
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: implement the register op

The register op is used by a domain to register a region of memory for
receiving messages from either a specified other domain, or, if specifying a
wildcard, any domain.

This operation creates a mapping within Xen's private address space that
will remain resident for the lifetime of the ring. In subsequent commits,
the hypervisor will use this mapping to copy data from a sending domain into
this registered ring, making it accessible to the domain that registered the
ring to receive data.

Wildcard any-sender rings are default disabled and registration will be
refused with EPERM unless they have been specifically enabled with the
new mac-permissive flag that is added to the argo boot option here. The
reason why the default for wildcard rings is 'deny' is that there is
currently no means to protect the ring from DoS by a noisy domain
spamming the ring, affecting other domains' ability to send to it. This
will be addressed with XSM policy controls in subsequent work.

Since denying access to any-sender rings is a significant functional
constraint, the new option "mac-permissive" for the argo bootparam
enables overriding this. eg: "argo=1,mac-permissive=1"
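
The interaction between the boot option and wildcard registration can be modelled roughly as below. This is a hedged sketch in Python: the parser only handles strings shaped like the example above, the function names are hypothetical, and it is not the hypervisor's actual option parser.

```python
import errno


def parse_argo_param(value):
    """Parse the text after "argo=", e.g. "1,mac-permissive=1"."""
    opts = {"enabled": False, "mac_permissive": False}
    for item in value.split(","):
        if item in ("0", "1"):
            opts["enabled"] = item == "1"
        elif item.startswith("mac-permissive="):
            opts["mac_permissive"] = item.split("=", 1)[1] == "1"
        else:
            raise ValueError("unrecognised argo option: " + item)
    return opts


def check_register(opts, wildcard_ring):
    """Return 0 if registration may proceed, else a negative errno.

    Only the wildcard rule from the commit message is modelled here;
    the "enabled" flag is parsed but not consulted.
    """
    if wildcard_ring and not opts["mac_permissive"]:
        return -errno.EPERM
    return 0


if __name__ == "__main__":
    opts = parse_argo_param("1,mac-permissive=1")
    print(opts, check_register(opts, wildcard_ring=True))
```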

The p2m type of the memory supplied by the guest for the ring must be
p2m_ram_rw and the memory will be pinned as PGT_writable_page while the ring
is registered.

This hypercall op and its interface currently only support 4K-sized pages.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xen/arm: introduce guest_handle_for_field()
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
xen/arm: introduce guest_handle_for_field()

ARM port of c/s bb544585: "introduce guest_handle_for_field()"

This helper turns a field of a GUEST_HANDLE into a GUEST_HANDLE.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
errno: add POSIX error codes EMSGSIZE, ECONNREFUSED to the ABI

EMSGSIZE: Argo's sendv operation will return EMSGSIZE when an excess amount
of data, across all iovs, has been supplied, exceeding either the statically
configured maximum size of a transmittable message, or the (variable) size
of the ring registered by the destination domain.

ECONNREFUSED: Argo's register operation will return ECONNREFUSED if a ring
is being registered to communicate with a specific remote domain that does
exist but is not argo-enabled.

These codes are described by POSIX here:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/errno.h.html
    EMSGSIZE     : "Message too large"
    ECONNREFUSED : "Connection refused".

The numeric values assigned to each are taken from Linux, as is the case
for the existing error codes.
    EMSGSIZE     : 90
    ECONNREFUSED : 111
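
A quick way to cross-check the two values against a host is sketched below in Python. The dictionary name `XEN_ERRNO` and the helper are hypothetical, and the comparison is only meaningful on Linux architectures that use these numbers (x86 and Arm do; some other architectures number errno values differently).

```python
import errno
import sys

# Values quoted in the commit message above.
XEN_ERRNO = {"EMSGSIZE": 90, "ECONNREFUSED": 111}


def mismatches_with_host():
    """List the names whose value differs from the host's errno module."""
    return [name for name, value in XEN_ERRNO.items()
            if getattr(errno, name, None) != value]


if __name__ == "__main__":
    if sys.platform.startswith("linux"):
        print("differs from host:", mismatches_with_host())
    else:
        print("host is not Linux; values may legitimately differ")
```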

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: init, destroy and soft-reset, with enable command line opt
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: init, destroy and soft-reset, with enable command line opt

Initialises basic data structures and performs teardown of argo state
for domain shutdown.

Inclusion of the Argo implementation is dependent on CONFIG_ARGO.

Introduces a new Xen command line parameter 'argo': bool to enable/disable
the argo hypercall. Defaults to disabled.

New headers:
  public/argo.h: with definitions of addresses and ring structure, including
  indexes for atomic update for communication between domain and hypervisor.

  xen/argo.h: to expose the hooks for integration into domain lifecycle:
    argo_init: per-domain init of argo data structures for domain_create.
    argo_destroy: teardown for domain_destroy and the error exit
                  path of domain_create.
    argo_soft_reset: reset of domain state for domain_soft_reset.

Adds a new field to struct domain: struct argo_domain *argo;

In accordance with recent work on _domain_destroy, argo_destroy is
idempotent. It will tear down: all rings registered by this domain, all
rings where this domain is the single sender (ie. specified partner,
non-wildcard rings), and all pending notifications where this domain is
awaiting signal about available space in the rings of other domains.

A count will be maintained of the number of rings that a domain has
registered in order to limit it below the fixed maximum limit defined here.

Macros are defined to verify the internal locking state within the argo
implementation. The macros are ASSERTed on entry to functions to validate
and document the required lock state prior to calling.

The hash function for the hashtables that hold ring state is derived from
the string hashing function djb2 (http://www.cse.yorku.ca/~oz/hash.html)
by Daniel J. Bernstein. Basic testing with a limited number of domains and
ports has shown reasonable distribution for the table size.
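
For reference, the classic djb2 string hash that the commit cites looks like this in Python. Note that this is plain djb2 over a byte string, truncated to 32 bits. The Argo hash is only described as derived from it, so the key layout and the `bucket` helper here are assumptions for illustration.

```python
def djb2(data: bytes) -> int:
    """Classic djb2: start at 5381, then hash = hash * 33 + byte."""
    h = 5381
    for byte in data:
        h = (h * 33 + byte) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def bucket(key: bytes, table_size: int) -> int:
    """Map a key to a hashtable bucket index (hypothetical helper)."""
    return djb2(key) % table_size


if __name__ == "__main__":
    for key in (b"", b"a", b"ab"):
        print(key, djb2(key), bucket(key, 32))
```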

The software license on the public header is the BSD license, standard
procedure for the public Xen headers. The public header was originally
posted under a GPL license at: [1]:
https://lists.xenproject.org/archives/html/xen-devel/2013-05/msg02710.html

The following ACK by Lars Kurth is to confirm that only people being
employees of Citrix contributed to the header files in the series posted at
[1] and that thus the copyright of the files in question is fully owned by
Citrix. The ACK also confirms that Citrix is happy for the header files to
be published under a BSD license in this series (which is based on [1]).

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
Reviewed-by: Ross Philipson <ross.philipson@oracle.com>
Tested-by: Chris Patterson <pattersonc@ainfosec.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: define argo_dprintk for subsystem debugging
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: define argo_dprintk for subsystem debugging

A convenience for working on development of the argo subsystem:
setting a #define variable enables additional debug messages.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
6 years ago argo: introduce the argo_op hypercall boilerplate
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: introduce the argo_op hypercall boilerplate

Presence is gated upon CONFIG_ARGO.

Registers the hypercall previously reserved for this.
Takes 5 arguments, does nothing and returns -ENOSYS.

Implementation will provide a compat ABI so COMPAT_CALL is the selected
macro for the hypercall tables.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago argo: Introduce the Kconfig option to govern inclusion of Argo
Christopher Clark [Wed, 6 Feb 2019 08:55:00 +0000 (09:55 +0100)]
argo: Introduce the Kconfig option to govern inclusion of Argo

Defines CONFIG_ARGO when enabled. Default: disabled.

When the Kconfig option is enabled, the Argo hypercall implementation
will be included, allowing use of the hypervisor-mediated interdomain
communication mechanism.

Argo is implemented for x86 and ARM hardware platforms.

Availability of the option depends on EXPERT and Argo is currently an
experimental feature.

Signed-off-by: Christopher Clark <christopher.clark6@baesystems.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago arm: gic-v3: deactivate interrupts during initialization
Peng Fan [Tue, 5 Feb 2019 05:55:35 +0000 (05:55 +0000)]
arm: gic-v3: deactivate interrupts during initialization

On i.MX8, we implemented partition reboot which means Cortex-A reboot
will not impact M4 cores and System Control Unit core. However GICv3 is
not reset because we also need to support A72 Cluster reboot without
affecting A53 Cluster.

The gic-v3 controller is configured with EOImode set to 1. During Xen
reboot there is a call to "smp_call_function(halt_this_cpu, NULL, 0);",
but halt_this_cpu never returns, so the other CPUs have no chance to
deactivate the SGI interrupt, because the deactivate_irq operation is at
the end of do_sgi. During the next boot of Xen, CPU0 will issue
GIC_SGI_CALL_FUNCTION to other CPUs. As the Active state for SGI is left
untouched during the reboot, the GIC_SGI_CALL_FUNCTION will still be active
on the non-boot CPUs. This means the interrupt cannot be triggered again
until it gets deactivated.

And according to IHI0069D_gic_architecture_specification, chapter
"8.11.3 GICR_ICACTIVER0, Interrupt Clear-Active Register 0", the RW
field of GICR_ICACTIVER0 resets to a value that is architecturally UNKNOWN.
So make sure all interrupts are deactivated during initialization by
clearing the state.

Signed-off-by: Peng Fan <peng.fan@nxp.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago tools: drop obsolete xen-ringwatch
Wei Liu [Mon, 4 Feb 2019 13:58:24 +0000 (13:58 +0000)]
tools: drop obsolete xen-ringwatch

This utility can't possibly work with a modern Xen setup: none of the
sysfs paths used (under /sys/devices/xen-backend) is documented as
stable ABI in the upstream Linux kernel.

Archaeology shows that the path used could have been part of the
xenolinux fork which never got upstreamed.

Its utility is zero nowadays. Drop it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago xen/arm: irq: End cleanly spurious interrupt
Julien Grall [Mon, 28 Jan 2019 16:00:23 +0000 (16:00 +0000)]
xen/arm: irq: End cleanly spurious interrupt

no_irq_type handlers are used when an IRQ does not have an action attached.
This is useful to detect misconfiguration between the interrupt
controller and the software.

Currently, all the handlers do nothing on a spurious interrupt. This
means that if such an interrupt is received, the priority of the interrupt
will not be dropped and the processor will lose the ability to receive any
interrupt of lower or equal priority.

A spurious interrupt can happen while releasing an interrupt assigned to a
guest (which happens during domain destruction). The interaction is roughly:

CPU0                                CPU1
release_guest_irq(A)
spin_lock(&desc->lock)
gic_remove_irq_from_guest
                                    receive IRQ A
                                    spin_lock(&desc->lock)
    desc->handler->shutdown()
      set_bit(IRQ_DISABLED)
    desc->handler = &no_irq_type
spin_unlock(&desc->lock)
                                    desc->handler->end();
                                    spin_unlock(&desc->lock)

Because the no_irq_type.end callback is implemented as a NOP, CPU1 will
not drop the priority of the interrupt. So the CPU will not be able to
receive any interrupt routed to any guest afterwards.

The problem can be prevented by dropping the priority and deactivating
the interrupt via gic_hw_ops->gic_host_irq->end().

Note that, for now, interrupts used by Xen are safe because Xen does not
use no_irq_type on release.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years ago tools/misc: Remove obsolete xen-bugtool
Hans van Kranenburg [Sun, 3 Feb 2019 20:35:18 +0000 (21:35 +0100)]
tools/misc: Remove obsolete xen-bugtool

xen-bugtool relies on code that has been removed in commit 9e8672f1c3
"tools: remove xend and associated python modules", more than 5 years
ago. Remove it, since it confuses users.

    -$ /usr/sbin/xen-bugtool
    Traceback (most recent call last):
      File "/usr/sbin/xen-bugtool", line 9, in <module>
        from xen.util import bugtool
    ImportError: No module named xen.util

Signed-off-by: Hans van Kranenburg <hans.van.kranenburg@mendix.com>
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=866380
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago automation: introduce a QEMU smoke test for PVH Dom0
Wei Liu [Thu, 24 Jan 2019 14:03:48 +0000 (14:03 +0000)]
automation: introduce a QEMU smoke test for PVH Dom0

Make qemu-smoke-x86-64.sh take a variant argument. Make two new tests
in test.yaml.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago libxl: When restricted, start QEMU paused
Anthony PERARD [Thu, 31 Jan 2019 10:57:48 +0000 (10:57 +0000)]
libxl: When restricted, start QEMU paused

libxl runs the command "cont" later during guest creation; i.e. it
is expecting that QEMU would not do any emulation.  Use the "-S"
command option to achieve this.

Unfortunately, when QEMU is started with "-S", it won't write QEMU's
readiness into xenstore. So only activate this option when we have a
QEMU startup notification via QMP available, i.e. when dm_restrict
is activated.

The -S option has the side-effect of suppressing the startup
notification via xenstore: libxl will only get the notification via
QMP.

It is important to rely only on QMP for notification when we have
QMP available, as (due to a qemu bug) not waiting for that QMP
notification may result in the QMP socket becoming blocked, so that
QEMU stops responding to new connections even if no existing ones
are active.

When the QEMU bug happens, the actions taken by both libxl and QEMU
are roughly as follows:
- libxl connects and handshakes with QEMU, then sends the
  cmd "query-status".
- QEMU prepares and maybe tries to send the response,
  while also writing "running" into xenstore.
- libxl sees via xenstore that QEMU is running and disconnects from the
  QMP socket before receiving the response from the cmd.
=> The QMP socket (monitor) is thereby blocked and will never reply
  to commands on new connections.

This is due to QEMU only responding to one command at a time, and
suspending its monitor (QMP) until the command has been processed and
sent. Disconnecting from the socket doesn't unsuspend the monitor. The
race described here is very likely to happen with QEMU 3.1.50 (during
3.2 development), but can be reproduced with QEMU 3.1.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago x86/svm: Improve diagnostics when svm_get_insn_len() fails
Andrew Cooper [Fri, 30 Nov 2018 13:50:54 +0000 (13:50 +0000)]
x86/svm: Improve diagnostics when svm_get_insn_len() fails

Sadly, a lone:

  (XEN) emulate.c:156:d2v0 svm_get_insn_len: Mismatch between expected and actual instruction: eip = fffff804564139c0

on the console is of no use when trying to identify what went wrong.  Dump as
much state as we can to help diagnose the failure.

  (XEN) Insn mismatch: Expected opcode 0xf0031, modrm 0, got nrip_len 3, emul_len 3
  (XEN) SVM Insn len emulation failed (1): d1v0 64bit @ 0008:0010475f -> 0f 01 f9 0f 31 5b 31 ff 31 c0 e9 c2 db ff ff 00

Drop the debug-only early exit if the sources of length disagree, because the
only effect it has is to avoid the more detailed analysis of what went wrong.

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago x86/svm: Drop enum instruction_index and simplify svm_get_insn_len()
Andrew Cooper [Thu, 13 Dec 2018 17:01:24 +0000 (17:01 +0000)]
x86/svm: Drop enum instruction_index and simplify svm_get_insn_len()

Passing a 32-bit integer index into an array with entries containing less than
32 bits of data is wasteful, and creates an unnecessary error condition of
passing an out-of-range index.

The width of the X86EMUL_OPC() encoding is currently 20 bits for the
instructions used, which leaves room for a modrm byte.  Drop opc_tab[]
entirely, and encode the expected opcode/modrm information directly.
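
The idea of packing the expected opcode and modrm byte into one value can be sketched as follows. This is a Python model under stated assumptions: the 20-bit width comes from the paragraph above, while placing the modrm byte in the low 8 bits and the names `encode` and `decode` are hypothetical choices rather than the layout the patch necessarily uses. The sample value 0xf0031 is the opcode shown in the "Insn mismatch" diagnostic quoted elsewhere in this log.

```python
OPC_BITS = 20    # opcode width stated in the commit message
MODRM_BITS = 8   # one modrm byte


def encode(opc, modrm):
    """Pack a 20-bit opcode value and a modrm byte into a single integer."""
    if not 0 <= opc < (1 << OPC_BITS):
        raise ValueError("opcode does not fit in 20 bits")
    if not 0 <= modrm < (1 << MODRM_BITS):
        raise ValueError("modrm does not fit in 8 bits")
    return (opc << MODRM_BITS) | modrm


def decode(enc):
    """Split a packed value back into (opcode, modrm)."""
    return enc >> MODRM_BITS, enc & ((1 << MODRM_BITS) - 1)


if __name__ == "__main__":
    packed = encode(0xF0031, 0)
    print(hex(packed), [hex(v) for v in decode(packed)])
```

Since 20 + 8 bits fits in a 32-bit integer, the packed value can be passed directly where an array index was passed before, with no out-of-range index to check.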

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago x86/svm: Remove list functionality from __get_instruction_length_* infrastructure
Andrew Cooper [Thu, 13 Dec 2018 17:01:24 +0000 (09:01 -0800)]
x86/svm: Remove list functionality from __get_instruction_length_* infrastructure

The existing __get_instruction_length_from_list() has a single user
which uses the list functionality.  That user however should be looking
specifically for INVD or WBINVD, as reported by the vmexit exit reason.

Modify svm_vmexit_do_invalidate_cache() to ask for the correct
instruction, and drop all list functionality from the helper.

Take the opportunity to rename it to svm_get_insn_len(), and drop the
IOIO length handling which has never been used.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago x86emul: correct AVX512BW write masking checks
Jan Beulich [Thu, 31 Jan 2019 10:38:24 +0000 (11:38 +0100)]
x86emul: correct AVX512BW write masking checks

For VPSADBW this likely was a result of bad copy-and-paste.

For VPS{L,R}LDQ comment and code were not in line, but then again the
comment also wasn't fully updated from the AVX2 original it got cloned
from.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago tools: fix build dependency upon generated header(s)
Jan Beulich [Thu, 31 Jan 2019 10:37:56 +0000 (11:37 +0100)]
tools: fix build dependency upon generated header(s)

Commit fd35f32b4b ("tools/x86emul: Use struct cpuid_policy in the
userspace test harnesses") didn't account for the dependencies of
cpuid-autogen.h to potentially change between incremental builds.
Putting the make invocation to produce the header together with the
directory tree creation therefore does not work. Introduce a separate
goal.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xen/cmdline: Work around some specific command line warnings
Andrew Cooper [Tue, 29 Jan 2019 19:07:40 +0000 (19:07 +0000)]
xen/cmdline: Work around some specific command line warnings

Xen will warn when an unknown parameter is found in the command line.  e.g.

  (d8) [ 1556.334664] (XEN) parameter "pv-shim" unknown!

One case where this goes wrong is a workaround for an old grub bug, which
resulted in "placeholder" being prepended to the command line.

Another case is when booting a CONFIG_PV_SHIM_EXCLUSIVE build, in which the
parsing for the "pv-shim" parameter is discarded.

Introduce ignore_param() and OPT_IGNORE to cope with known cases, where
issuing a warning is the wrong course of action to take.
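
The effect of an ignore list on command line parsing can be illustrated with the following Python sketch. It is a simplified model, not Xen's parser: the function name, the whitespace-only tokenising and the set-based lookup are assumptions, and only the warning text mirrors the console message quoted above.

```python
def parse_cmdline(cmdline, known, ignored):
    """Return (settings, warnings) for a space-separated command line.

    known:   parameter names that have a handler
    ignored: names that are accepted silently (the ignore_param idea)
    """
    settings, warnings = {}, []
    for token in cmdline.split():
        name, _, value = token.partition("=")
        if name in ignored:
            continue  # known-harmless: neither handled nor warned about
        if name in known:
            settings[name] = value
        else:
            warnings.append('parameter "%s" unknown!' % name)
    return settings, warnings


if __name__ == "__main__":
    print(parse_cmdline("placeholder pv-shim console=vga bogus=1",
                        known={"console"},
                        ignored={"placeholder", "pv-shim"}))
```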

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago x86/pvh-boot: don't mandate validity of RSDP pointer
Wei Liu [Wed, 30 Jan 2019 13:55:55 +0000 (13:55 +0000)]
x86/pvh-boot: don't mandate validity of RSDP pointer

RSDP is not mandatory according to PVH spec. Remove the BUG_ON. The
guest (xen) will fall back to scanning if necessary.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xen/arm: gic-vgic: Fix the assert condition in vgic_connect_hw_irq
Andrii Anisov [Fri, 25 Jan 2019 17:06:02 +0000 (19:06 +0200)]
xen/arm: gic-vgic: Fix the assert condition in vgic_connect_hw_irq

Currently, the assert condition in vgic_connect_hw_irq does not
correspond to the comment above it, which results in hitting the assertion
on HW IRQ disconnection.

Fix the condition so it corresponds to the comment and allows IRQ
disconnection on debug builds.

Fixes: ec2a2f1 ("ARM: VGIC: factor out vgic_connect_hw_irq()")
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Suggested-by: Stefan Nuernberger <snu@amazon.de>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
[julieng: Reword the commit message]
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agolibxl: correctly dispose of dominfo list in libxl_name_to_domid
Wei Liu [Tue, 29 Jan 2019 11:37:59 +0000 (11:37 +0000)]
libxl: correctly dispose of dominfo list in libxl_name_to_domid

Tamas reported ssid_label was leaked. Use the designated function to
free dominfo list to fix the leakage.

Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/hvm: Fix bit checking for CR4 and MSR_EFER
Andrew Cooper [Fri, 25 Jan 2019 16:23:46 +0000 (16:23 +0000)]
x86/hvm: Fix bit checking for CR4 and MSR_EFER

Before the cpuid_policy logic came along, %cr4/EFER auditing on migrate-in was
complicated, because at that point no CPUID information had been set for the
guest.  Auditing against the host CPUID was better than nothing, but not
ideal.

Similarly at the time, PVHv1 lacked the "CPUID passed through from hardware"
behaviour which PV guests had, and PVH dom0 had to be special-cased to be able
to boot.

Order of information in the migration stream is still an issue (hence we still
need to keep the restore parameter to cope with a nested virt corner case for
%cr4), but since Xen 4.9, all domains start with a suitable CPUID policy,
which is a more appropriate upper bound than host_cpuid_policy.

Finally, reposition the UMIP logic as it is the only row out of order.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/p2m: Drop erroneous #VE-enabled check in ept_set_entry()
Andrew Cooper [Tue, 22 Jan 2019 18:58:56 +0000 (18:58 +0000)]
x86/p2m: Drop erroneous #VE-enabled check in ept_set_entry()

Code clearing the "Suppress VE" bit in an EPT entry isn't necessarily running
in current context.  In ALTP2M_external mode, it definitely is not, and in PV
context, vcpu_altp2m(current) acts upon the HVM union.

Even if we could sensibly resolve the target vCPU, it may legitimately not be
fully set up at this point, so rejecting the EPT modification would be buggy.

There is a path in hvm_hap_nested_page_fault() which explicitly emulates #VE
in the cpu_has_vmx_virt_exceptions case, so the -EOPNOTSUPP part of this
condition is also wrong.

Drop the !sve check entirely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agopvh/dom0: fix deadlock in GSI mapping
Roger Pau Monne [Mon, 28 Jan 2019 14:22:45 +0000 (15:22 +0100)]
pvh/dom0: fix deadlock in GSI mapping

The current GSI mapping code can cause the following deadlock:

(XEN) *** Dumping CPU0 host state: ***
(XEN) ----[ Xen-4.12.0-rc  x86_64  debug=y   Tainted:  C   ]----
[...]
(XEN) Xen call trace:
(XEN)    [<ffff82d080239852>] vmac.c#_spin_lock_cb+0x32/0x70
(XEN)    [<ffff82d0802ed40f>] vmac.c#hvm_gsi_assert+0x2f/0x60 <- pick hvm.irq_lock
(XEN)    [<ffff82d080255cc9>] io.c#hvm_dirq_assist+0xd9/0x130 <- pick event_lock
(XEN)    [<ffff82d080255b4b>] io.c#dpci_softirq+0xdb/0x120
(XEN)    [<ffff82d080238ce6>] softirq.c#__do_softirq+0x46/0xa0
(XEN)    [<ffff82d08026f955>] domain.c#idle_loop+0x35/0x90
(XEN)
[...]
(XEN) *** Dumping CPU3 host state: ***
(XEN) ----[ Xen-4.12.0-rc  x86_64  debug=y   Tainted:  C   ]----
[...]
(XEN) Xen call trace:
(XEN)    [<ffff82d08023985d>] vmac.c#_spin_lock_cb+0x3d/0x70
(XEN)    [<ffff82d080281fc8>] vmac.c#allocate_and_map_gsi_pirq+0xc8/0x130 <- pick event_lock
(XEN)    [<ffff82d0802f44c0>] vioapic.c#vioapic_hwdom_map_gsi+0x80/0x130
(XEN)    [<ffff82d0802f4399>] vioapic.c#vioapic_write_redirent+0x119/0x1c0 <- pick hvm.irq_lock
(XEN)    [<ffff82d0802f4075>] vioapic.c#vioapic_write+0x35/0x40
(XEN)    [<ffff82d0802e96a2>] vmac.c#hvm_process_io_intercept+0xd2/0x230
(XEN)    [<ffff82d0802e9842>] vmac.c#hvm_io_intercept+0x22/0x50
(XEN)    [<ffff82d0802dbe9b>] emulate.c#hvmemul_do_io+0x21b/0x3c0
(XEN)    [<ffff82d0802db302>] emulate.c#hvmemul_do_io_buffer+0x32/0x70
(XEN)    [<ffff82d0802dcd29>] emulate.c#hvmemul_do_mmio_buffer+0x29/0x30
(XEN)    [<ffff82d0802dcc19>] emulate.c#hvmemul_phys_mmio_access+0xf9/0x1b0
(XEN)    [<ffff82d0802dc6d0>] emulate.c#hvmemul_linear_mmio_access+0xf0/0x180
(XEN)    [<ffff82d0802de971>] emulate.c#hvmemul_linear_mmio_write+0x21/0x30
(XEN)    [<ffff82d0802de742>] emulate.c#linear_write+0xa2/0x100
(XEN)    [<ffff82d0802dce15>] emulate.c#hvmemul_write+0xb5/0x120
(XEN)    [<ffff82d0802babba>] vmac.c#x86_emulate+0x132aa/0x149a0
(XEN)    [<ffff82d0802c04f9>] vmac.c#x86_emulate_wrapper+0x29/0x70
(XEN)    [<ffff82d0802db570>] emulate.c#_hvm_emulate_one+0x50/0x140
(XEN)    [<ffff82d0802e9e31>] vmac.c#hvm_emulate_one_insn+0x41/0x100
(XEN)    [<ffff82d080345066>] guest_4.o#sh_page_fault__guest_4+0x976/0xd30
(XEN)    [<ffff82d08030cc69>] vmac.c#vmx_vmexit_handler+0x949/0xea0
(XEN)    [<ffff82d08031411a>] vmac.c#vmx_asm_vmexit_handler+0xfa/0x270

To solve it, move the vioapic_hwdom_map_gsi call outside of the
locked region in vioapic_write_redirent. vioapic_hwdom_map_gsi does
not access any of the vioapic fields, so there's no need to call the
function while holding hvm.irq_lock.
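
Illustration only (not the Xen code): a user-space sketch of the lock
ordering problem and the shape of the fix, using pthread mutexes in place
of Xen spinlocks. All names below are stand-ins invented for the example.
The delivery path takes event_lock and then irq_lock; the write path must
therefore never take event_lock while still holding irq_lock. Whether the
real code makes the call before or after the locked region is not stated
above; this sketch makes it after.

```c
#include <pthread.h>

/* Stand-ins for the event lock and hvm.irq_lock. */
pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t irq_lock = PTHREAD_MUTEX_INITIALIZER;

int redirent;          /* state guarded by irq_lock */
int mapped_gsi = -1;   /* state guarded by event_lock */
int pending;           /* state guarded by irq_lock */

/* Takes event_lock only, and touches nothing guarded by irq_lock. */
void map_gsi(int gsi)
{
    pthread_mutex_lock(&event_lock);
    mapped_gsi = gsi;
    pthread_mutex_unlock(&event_lock);
}

/* Delivery path: event_lock first, then irq_lock. */
void deliver_irq(void)
{
    pthread_mutex_lock(&event_lock);
    pthread_mutex_lock(&irq_lock);
    pending++;
    pthread_mutex_unlock(&irq_lock);
    pthread_mutex_unlock(&event_lock);
}

/*
 * Write path after the fix: update the entry under irq_lock, drop the
 * lock, and only then call map_gsi().  Calling map_gsi() inside the
 * locked region would acquire the locks in the opposite order to
 * deliver_irq(), which is the ABBA pattern seen in the traces.
 */
void write_redirent(int gsi, int val)
{
    pthread_mutex_lock(&irq_lock);
    redirent = val;
    pthread_mutex_unlock(&irq_lock);

    map_gsi(gsi);
}
```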

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoxen/arm: Implement workaround for Cortex-A76 erratum 1165522
Julien Grall [Mon, 28 Jan 2019 11:50:25 +0000 (11:50 +0000)]
xen/arm: Implement workaround for Cortex-A76 erratum 1165522

Early versions of Cortex-A76 can end up with corrupt TLBs if they
speculate an AT instruction while the S1/S2 system registers are in an
inconsistent state.

This can happen during guest context switch and when invalidating the
TLBs for a VMID other than the current one.

The workaround implemented in Xen will:
    - Use an empty stage-2 with a reserved VMID while context switching
    between 2 guests
    - Use an empty stage-2 with the VMID where TLBs need to be flushed

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agoxen/arm: p2m: Only use isb() when it is necessary
Julien Grall [Mon, 28 Jan 2019 11:50:24 +0000 (11:50 +0000)]
xen/arm: p2m: Only use isb() when it is necessary

The EL1 translation regime is out-of-context when running at EL2. This
means the processor cannot speculate memory accesses using the registers
associated with that regime.

An isb() is only needed if Xen is going to use the translation regime
before returning to the guest (exception returns will synchronize the
context).

Remove unnecessary isb() and document the ones left.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agoxen/arm: domain_build: Don't switch to the guest P2M when copying data
Julien Grall [Mon, 28 Jan 2019 11:50:23 +0000 (11:50 +0000)]
xen/arm: domain_build: Don't switch to the guest P2M when copying data

Until recently, kernel/initrd/dtb were loaded using guest VAs, which
required temporarily restoring the P2M. This was reworked
in a series of commits (up to 9292086 "xen/arm: domain_build: Use
copy_to_guest_phys_flush_dcache in dtb_load") to use a guest PA.

This will also help a follow-up patch which will require
p2m_{save,restore}_state to work as a pair to work around an erratum.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agoxen/arm: p2m: Introduce an helper to allocate the root page-table
Julien Grall [Mon, 28 Jan 2019 11:50:22 +0000 (11:50 +0000)]
xen/arm: p2m: Introduce an helper to allocate the root page-table

A follow-up patch will need to allocate the root page-table without
having a domain to hand.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agoxen/arm: p2m: Provide an helper to generate the VTTBR
Julien Grall [Mon, 28 Jan 2019 11:50:21 +0000 (11:50 +0000)]
xen/arm: p2m: Provide an helper to generate the VTTBR

A follow-up patch will need to generate the VTTBR in a few places.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agoxen/arm: Only set necessary flags when initializing HCR_EL2
Julien Grall [Mon, 28 Jan 2019 11:50:20 +0000 (11:50 +0000)]
xen/arm: Only set necessary flags when initializing HCR_EL2

Only {A,F,I}MO are necessary to receive interrupts until a guest vCPU is
loaded.

The rest have no effect on Xen and it is better to avoid setting them.
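
Illustration only: the three routing bits mentioned above sit at bits 3
to 5 of HCR_EL2 in the Arm architecture (FMO, IMO, AMO respectively). The
macro and function names in this sketch are chosen for the example and are
not necessarily the ones Xen uses.

```c
#include <stdint.h>

/* HCR_EL2 routing bits as defined by the Arm architecture. */
#define EX_HCR_FMO (UINT64_C(1) << 3)  /* route physical FIQ to EL2 */
#define EX_HCR_IMO (UINT64_C(1) << 4)  /* route physical IRQ to EL2 */
#define EX_HCR_AMO (UINT64_C(1) << 5)  /* route SError/abort to EL2 */

/* Value to program before any guest vCPU has been loaded. */
uint64_t initial_hcr_el2(void)
{
    return EX_HCR_AMO | EX_HCR_FMO | EX_HCR_IMO;
}
```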

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
6 years agox86/AMD: flush TLB after ucode update
Jan Beulich [Mon, 28 Jan 2019 16:40:39 +0000 (17:40 +0100)]
x86/AMD: flush TLB after ucode update

The increased number of messages (spec_ctrl.c:print_details()) within a
certain time window made me notice some slowness of boot time screen
output. Experimentally I've narrowed the time window to be from
immediately after the early ucode update on the BSP to the PAT write in
cpu_init(), which upon further investigation has an effect because of
the full TLB flush that's implied by that write.

For that reason, as a workaround, flush the TLB of the mapping of the
page that holds the blob. Note that flushing just a single page is
sufficient: As per verify_patch_size() patch size can't exceed 4k, and
the way xmalloc() works means the blob can't cross a page boundary.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Brian Woods <brian.woods@amd.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/CPUID: block speculative out-of-bound accesses
Norbert Manthey [Mon, 28 Jan 2019 16:38:29 +0000 (17:38 +0100)]
x86/CPUID: block speculative out-of-bound accesses

During instruction emulation, the cpuid instruction is emulated with
data that is controlled by the guest. As speculation might pass bound
checks, we have to ensure that no out-of-bound loads are possible.

To not rely on the compiler to perform value propagation, instead of
using the array_index_nospec macro, we replace the variable with the
constant to be propagated instead.

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/hvm/hpet: block speculative out-of-bound accesses
Norbert Manthey [Mon, 28 Jan 2019 16:37:20 +0000 (17:37 +0100)]
x86/hvm/hpet: block speculative out-of-bound accesses

When interacting with hpet, read and write operations can be executed
during instruction emulation, where the guest controls the data that
is used. As it is hard to predict the number of instructions that are
executed speculatively, we prevent out-of-bound accesses by using the
array_index_nospec function for guest specified addresses that should
be used for hpet operations.

We introduce another macro that uses the ARRAY_SIZE macro to block
speculative accesses. For arrays that are statically accessed, this macro
can be used instead of the usual macro. Using this macro results in more
readable code, and allows the way this case is handled to be modified in a
single place.
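
Illustration only (not Xen's implementation): a sketch of the general
technique, namely a branch-free index mask plus a convenience macro that
derives the bound from ARRAY_SIZE for statically sized arrays. The names
index_mask_nospec() and array_access_clamped() are invented here. The
sketch assumes sizes no larger than LONG_MAX and a compiler that performs
an arithmetic right shift on negative signed values (as GCC and Clang do);
a production version would also need to stop the optimiser from
reintroducing a branch.

```c
#include <limits.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Returns all-ones when idx < size, and zero otherwise, without a
 * conditional branch.  If idx >= size, either idx itself or the wrapped
 * (size - 1 - idx) has its top bit set, so the inverted value is
 * non-negative and shifts down to zero.
 */
unsigned long index_mask_nospec(unsigned long idx, unsigned long size)
{
    return (unsigned long)(~(long)(idx | (size - 1 - idx)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/*
 * Access a statically sized array with the index forced to 0 when it is
 * out of bounds.  Note that idx is evaluated twice.
 */
#define array_access_clamped(array, idx) \
    ((array)[(idx) & index_mask_nospec((idx), ARRAY_SIZE(array))])
```

With the bound taken from the array itself, a call site cannot pass a
size that disagrees with the array declaration.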

This commit is part of the SpectreV1+L1TF mitigation patch series.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs: Fix dm_restrict documentation
George Dunlap [Thu, 24 Jan 2019 17:48:27 +0000 (17:48 +0000)]
docs: Fix dm_restrict documentation

Remove "chatty" and redundant information from the xl man page;
restrict it to functional descriptions only, and point instead to
qemu-depriv.pandoc and SUPPORT.md as locations for "canonical"
information.

Add a man page entry for device_model_user.

Update qemu-deprivilege.pandoc:

Changes in missing feature list:
- Migration is functional
- But qdisk backends are not

Add a missing restriction list.

The following statements from the man page are dropped:
- Mentioning PV; PV guests never have a device model.
- Drop the confusing statement about stdvga and cirrus vga options.
- Re-used domain IDs are now handled.
- Device models should no longer be able to create world-readable
  files on dom0's filesystem.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoiommu: fix order of arguments in iommu_map call at iommu_hwdom_init
Roger Pau Monné [Fri, 25 Jan 2019 08:49:50 +0000 (09:49 +0100)]
iommu: fix order of arguments in iommu_map call at iommu_hwdom_init

The order of the page_order and the flags parameters is inverted in
the call to iommu_map made in iommu_hwdom_init.

Fixes: e8afe1124cc1 ("iommu: elide flushing for higher order map/unmap operations")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoamd/iommu: fix present bit checking when clearing PTE
Roger Pau Monné [Fri, 25 Jan 2019 08:48:38 +0000 (09:48 +0100)]
amd/iommu: fix present bit checking when clearing PTE

The current check for the present bit is wrong, since the present bit
is located in the low part of the entry.

Fixes: e8afe1124cc1 ("iommu: elide flushing for higher order map/unmap operations")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Brian Woods <brian.woods@amd.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoxen/sched: Introduce domain_vcpu() helper
Andrew Cooper [Thu, 24 Jul 2014 10:06:39 +0000 (11:06 +0100)]
xen/sched: Introduce domain_vcpu() helper

The progression of multi-vcpu support in Xen (originally a single pointer,
then an embedded d->vcpu[] array, then a dynamically allocated array) has
resulted in a large quantity of ad-hoc code for looking a vcpu up by id, and a
large number of ways that the toolstack can cause Xen to trip over a NULL
pointer.  Some of this has been addressed in Xen 4.12, and work is ongoing.

Another property of looking a vcpu up by id is that it is frequently done in
unprivileged hypercall context, making it an attractive target for speculative
sidechannel attacks.

Introduce a helper to do the lookup correctly, and without speculative
interference.  For performance reasons, it is useful not to have an smp_rmb()
in this helper on ARM, and luckily this is safe to do, because of the
serialisation offered by the global domlist lock.
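
Illustration only: a sketch of what such a lookup helper could look like,
combining an architectural bounds check with a branch-free clamp of the
index. The structures are cut down to the two fields needed, and
lookup_vcpu() and idx_mask() are names invented for the example; the
clamp assumes an arithmetic right shift of negative signed values.

```c
#include <limits.h>
#include <stddef.h>

struct vcpu {
    unsigned int vcpu_id;
};

struct domain {
    unsigned int max_vcpus;
    struct vcpu **vcpu;   /* dynamically allocated; entries may be NULL */
};

/* All-ones when idx < size, zero otherwise (size <= LONG_MAX assumed). */
unsigned long idx_mask(unsigned long idx, unsigned long size)
{
    return (unsigned long)(~(long)(idx | (size - 1 - idx)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/* Look a vcpu up by id; NULL for out-of-range ids or unallocated slots. */
struct vcpu *lookup_vcpu(const struct domain *d, unsigned int id)
{
    if ( id >= d->max_vcpus )
        return NULL;

    /* Keep the index in bounds even if the check above is mispredicted. */
    id &= idx_mask(id, d->max_vcpus);

    return d->vcpu[id];
}
```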

As a minor change noticed when checking the safety of this construct, sanity
check during boot that idle->max_vcpus is a suitable upper bound for
idle->vcpu[].

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/pvh-dom0: Remove unnecessary function pointer call from modify_identity_mmio()
Andrew Cooper [Fri, 21 Dec 2018 17:23:32 +0000 (17:23 +0000)]
x86/pvh-dom0: Remove unnecessary function pointer call from modify_identity_mmio()

Function pointer calls are far more expensive in a post-Spectre world, and
this one doesn't need to be.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoxen/dom0: Add a dom0-iommu=none option
Andrew Cooper [Fri, 7 Dec 2018 13:43:27 +0000 (13:43 +0000)]
xen/dom0: Add a dom0-iommu=none option

For development purposes, it is very convenient to boot Xen as a PVH guest,
with an XTF PV or PVH "dom0".  The edit-compile-go cycle is a matter of
seconds, and you can reasonably insert printk() debugging in places which
would be completely infeasible when booting fully-fledged guests.

However, the PVH dom0 path insists on having a working IOMMU, which doesn't
exist when virtualised as a PVH guest, and isn't necessary for XTF anyway.

Introduce a developer mode to skip the IOMMU requirement.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/dom0: Deprecate iommu_hwdom_inclusive and leave it disabled by default
Andrew Cooper [Mon, 31 Dec 2018 14:06:52 +0000 (14:06 +0000)]
xen/dom0: Deprecate iommu_hwdom_inclusive and leave it disabled by default

This option is unique to x86 PV dom0's, but it is not sensible to have a
catch-all which blindly maps all non-RAM regions into the IOMMU.

The map-reserved option remains, and covers all the buggy firmware issues that
I am aware of.  The two common cases are legacy USB keyboard emulation, and
the BMC mailbox used by vendor firmware in NICs/HBAs to report information
back to the iLO/iDRAC/etc for remote management purposes.

A specific advantage of this change is that x86 dom0's IOMMU setup is now
consistent between PV and PVH.

This change is not expected to have any impact, due to map-reserved remaining.
In the unlikely case that it does cause an issue, we should introduce other
map-$SPECIFIC options rather than re-introducing this catch-all.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agodocs: Improve documentation and parsing for efi=
Andrew Cooper [Mon, 10 Dec 2018 21:29:10 +0000 (21:29 +0000)]
docs: Improve documentation and parsing for efi=

Update parse_efi_param() to use parse_boolean() for "rs", so it behaves
like other Xen booleans.

However, change "attr=uc" to not be a boolean.  "no-attr=uc" is ambiguous and
shouldn't be accepted, but accept "attr=no" as an alternative.

Update the command line documentation for consistency.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen/arm: gic: Make sure the number of interrupt lines is valid before using it
Julien Grall [Fri, 30 Nov 2018 17:15:33 +0000 (17:15 +0000)]
xen/arm: gic: Make sure the number of interrupt lines is valid before using it

GICv2 and GICv3 support up to 1020 interrupts. However, the value computed
from GICD_TYPER.ITLinesNumber can be up to 1024. On GICv3, we will end up
writing to reserved registers that are right after the IROUTER ones, as the
value is not capped early enough.

Cap the number of interrupts as soon as we compute it so we know we can
safely use it afterwards.
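
Illustration only: the arithmetic behind the cap. ITLinesNumber is the low
five bits of GICD_TYPER and encodes 32 * (N + 1) interrupt lines, so the
maximum encoding yields 1024, which then has to be clamped to 1020. The
macro and function names are chosen for this sketch.

```c
#include <stdint.h>

#define EX_GICD_TYPER_LINES_MASK 0x1fU   /* ITLinesNumber, bits [4:0] */
#define EX_GIC_MAX_LINES         1020U   /* IDs 1020-1023 are reserved */

/* Number of usable interrupt lines, capped at the point of computation. */
unsigned int gic_nr_lines(uint32_t typer)
{
    unsigned int nr = 32 * ((typer & EX_GICD_TYPER_LINES_MASK) + 1);

    return nr > EX_GIC_MAX_LINES ? EX_GIC_MAX_LINES : nr;
}
```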

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reported-by: Jan-Peter Larsson <Jan-Peter.Larsson@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-Acked-by: Juergen Gross <jgross@suse.com>
6 years agoarm/p2m: call iommu iotlb flush if iommu exists and enabled
Andrii Anisov [Wed, 23 Jan 2019 12:50:07 +0000 (14:50 +0200)]
arm/p2m: call iommu iotlb flush if iommu exists and enabled

Making the decision based on `need_iommu_pt_sync()` means we never call
`iommu_iotlb_flush()` for IOMMUs which share the P2M with the CPU.
So check `has_iommu_pt()` instead.

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
6 years agodocs: Fix all links to Xen man pages in html
Anthony PERARD [Wed, 16 Jan 2019 16:16:56 +0000 (16:16 +0000)]
docs: Fix all links to Xen man pages in html

Second try; this time it also works for all links to xen-vbd-interface(7).

We no longer try to have pod2html generate relative links; instead
we do it ourselves.

First, we modify all links to man pages to have what looks like an
absolute URL, and pod2html will just write it in the html output.
Absolute URLs in POD have the form L<text|scheme:...>, so let's just use
a scheme that isn't real, but is easy to find in the resulting html output:
"relative:".

Then we fix the output by removing the bogus "relative:" scheme, and
end up with nice relative links.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoman: Highlight reference in xl-disk-configuration(5)
Anthony PERARD [Wed, 16 Jan 2019 16:16:57 +0000 (16:16 +0000)]
man: Highlight reference in xl-disk-configuration(5)

Provide a better way to see the link to a different manpage, with simple
words.

Suggested-by: Ian Jackson <ian.jackson@citrix.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agox86/dom0: Improve dom0= useability
Andrew Cooper [Fri, 7 Dec 2018 13:43:27 +0000 (13:43 +0000)]
x86/dom0: Improve dom0= useability

Having a pvh boolean isn't ideal.  If we gain a 3rd virtualisation mode,
what does `dom0=no-pvh` mean?

Change the syntax to be "dom0 = pv | pvh" which offers an option to more
obviously select PV mode.  Hide both options behind the relevant
CONFIG_* settings, and default to PVH mode when CONFIG_PV is compiled
out.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs: Improve documentation and parsing for pci=
Andrew Cooper [Thu, 27 Dec 2018 18:40:19 +0000 (18:40 +0000)]
docs: Improve documentation and parsing for pci=

Alter parse_pci_param() to use parse_boolean(), so the sub options
behave like other Xen booleans.

Update the command line documentation for consistency.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs: Improve documentation and parsing for iommu=
Andrew Cooper [Thu, 27 Dec 2018 18:40:19 +0000 (18:40 +0000)]
docs: Improve documentation and parsing for iommu=

Update parse_iommu_param() to uniformly use parse_boolean(), so the sub
booleans behave like other Xen boolean options.  Reposition the
custom_param() to avoid a forward declaration of parse_iommu_param().

Rewrite the command line documentation almost from scratch, including
far more detail.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs: Improve documentation for dom0= and dom0-iommu=
Andrew Cooper [Fri, 7 Dec 2018 13:43:27 +0000 (13:43 +0000)]
docs: Improve documentation for dom0= and dom0-iommu=

Update to the latest metadata style, and discuss the options more
completely where appropriate.

Drop the redundant comment beside parse_dom0_param() - it is already out
of sync with the main documentation.  Also drop the individual
documentation for deprecated options which refer to their newer
versions, for the same reason.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agomaintainers: always use hard tabs
Roger Pau Monné [Mon, 21 Jan 2019 11:14:29 +0000 (12:14 +0100)]
maintainers: always use hard tabs

As that seems to be the prevailing style.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
6 years agox86/vm_event: block interrupt injection for sync vm_events
Razvan Cojocaru [Mon, 21 Jan 2019 11:13:22 +0000 (12:13 +0100)]
x86/vm_event: block interrupt injection for sync vm_events

Block interrupts (in vmx_intr_assist()) for the duration of
processing a sync vm_event (similarly to the strategy
currently used for single-stepping). Otherwise, attempting
to emulate an instruction when requested by a vm_event
reply may legitimately need to call e.g.
hvm_inject_page_fault(), which then overwrites the active
interrupt in the VMCS.

The sync vm_event handling path on x86/VMX is (roughly):
monitor_traps() -> process vm_event -> vmx_intr_assist()
(possibly writing VM_ENTRY_INTR_INFO) ->
hvm_vm_event_do_resume() -> hvm_emulate_one_vm_event()
(possibly overwriting the VM_ENTRY_INTR_INFO value).

This patch may also be helpful for the future removal
of may_defer in hvm_set_cr{0,3,4} and hvm_set_msr().

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agolibxl: fix error message for unsharing namespaces
Wei Liu [Fri, 18 Jan 2019 12:47:45 +0000 (12:47 +0000)]
libxl: fix error message for unsharing namespaces

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agolibxl: fix build (missing CLONE_NEWIPC) on astonishingly old systems
Ian Jackson [Mon, 14 Jan 2019 14:59:37 +0000 (14:59 +0000)]
libxl: fix build (missing CLONE_NEWIPC) on astonishingly old systems

CLONE_NEWIPC was introduced in Linux 2.6.19, on the 29th of November
2006, which was 12 years, 1 month, and 14 days ago.

Nevertheless apparently some people are trying to build Xen on systems
whose kernel headers are that old.  Placate these people by providing
a fallback #define for CLONE_NEWIPC.

The actual binary value will of course remain constant, because of the
kernel API promise, so this is and will be correct on all platforms
where CLONE_NEWIPC is supported.  (Even if for some reason we miss
the right #includes.)
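
Illustration only: the general shape of such a fallback. The value
0x08000000 is the one Linux assigns to CLONE_NEWIPC in its kernel headers;
the function dm_unshare_flags() is invented for the example and is not
part of libxl.

```c
#include <sched.h>

/*
 * Fallback for kernel headers that predate CLONE_NEWIPC.  The numeric
 * value is fixed by the kernel ABI, so defining it by hand is safe on
 * any kernel that supports the flag at all.
 */
#ifndef CLONE_NEWIPC
#define CLONE_NEWIPC 0x08000000
#endif

/* Flags a caller would pass to unshare() for the IPC namespace. */
int dm_unshare_flags(void)
{
    return CLONE_NEWIPC;
}
```

The #ifndef guard means the hand-written definition only takes effect
when the system headers do not supply one.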

Of course at runtime this value will not work on older kernels.  It
will be rejected as unknown by anything except some pre-2.6.18
kernels.  On those kernels we do not want to support dm_restrict, and
an attempt to use it will fail.  It is OK for the failure to be a
messy EINVAL syscall failure.  (The IPC namespace unshare is necessary
to prevent a suborned deprivileged qemu from causing trouble with shm,
sem, etc.)

On the very old kernels, the feature is totally out of scope.
(We are only interested, here, in making the build work, to avoid
blocking people who aren't using this feature.)

CC: Wei Liu <wei.liu2@citrix.com>
CC: Juergen Gross <jgross@suse.com>
CC: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoRevert "libxl: fix build on rather old systems"
Ian Jackson [Mon, 14 Jan 2019 14:59:36 +0000 (14:59 +0000)]
Revert "libxl: fix build on rather old systems"

This reverts commit 1bce5f9baf0f4a4e50722f32b44afe4fdefc6b35.

This situation should be handled by disabling the dm restrict
feature, not silently falling back to lower protection.

Also this #ifdeffery is bad style.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs/features/qemu-deprivilege.pandoc: No support with Linux <2.6.18
Ian Jackson [Mon, 14 Jan 2019 14:59:35 +0000 (14:59 +0000)]
docs/features/qemu-deprivilege.pandoc: No support with Linux <2.6.18

Some early kernels are known not to reject unknown flags to
unshare().  There may be other problems.

CC: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoMerge tag '4.12.0-rc1' into staging
Ian Jackson [Wed, 16 Jan 2019 16:29:22 +0000 (16:29 +0000)]
Merge tag '4.12.0-rc1' into staging

Xen 4.12.0-rc1

6 years agoPrep for 4.12-rc1: Change external trees to refer to rc1 tags
Ian Jackson [Wed, 16 Jan 2019 16:13:49 +0000 (16:13 +0000)]
Prep for 4.12-rc1: Change external trees to refer to rc1 tags

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agoPrep for 4.12-rc1: Change versions from -unstable to -rc
Ian Jackson [Wed, 16 Jan 2019 16:13:21 +0000 (16:13 +0000)]
Prep for 4.12-rc1: Change versions from -unstable to -rc

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years agotools: only call git when necessary in OVMF Makefile
Wei Liu [Tue, 15 Jan 2019 11:09:40 +0000 (11:09 +0000)]
tools: only call git when necessary in OVMF Makefile

Users may choose to export a snapshot of OVMF and build it
with xen.git supplied ovmf-makefile. In that case we don't
need to call `git submodule`.

Fixes b16281870e.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agodocs: Fix links in html generation of man pages
Anthony PERARD [Tue, 15 Jan 2019 15:48:37 +0000 (15:48 +0000)]
docs: Fix links in html generation of man pages

Currently, all links to other man pages are sent to
http://man.he.net/man$mansection/$manpage, but that site doesn't have
Xen man pages, so all links to other Xen man pages are broken.

Fixing that is going to be a bit complex.

First, we need to teach pod2html where other .pod files can be found,
otherwise it isn't going to make any links to our pages. This is done with
--podpath.

Second, pod2html doesn't actually understand our format
"$manpage.$mansection.pod". But instead of teaching it (which is
probably impossible) we are going to modify our .pod files in order to
tell pod2html which file to look for. This is done with the sed command
by transforming for example: "L<xl.conf(5)>" to "L<xl.conf(5)|xl.conf.5>".

Last but not least, in order to have relative links to the other
generated man pages, we are going against the rules: we are going to use
"--htmlroot=." so that pod2html doesn't prepend "/" to all "relative"
links. We are also going to `cd` into the "man" dir and set podpath to
"." so that pod2html is going to generate relative links to other pod
files in the form "./$man" instead of "man/$man" or "../$man" with other
combinations of options. The result of --podpath + --podroot can be checked
in pod2html's cache file "pod2html.tmp".

All of this is going to generate links in the form "./$html_manpage".

But all of this doesn't work for xen-vbd-interface(7), because it's not
a pod file... maybe we could generate pod2html's cache (pod2html.tmp)
file to add an entry.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years agoman: Fix links in xl(1)
Anthony PERARD [Tue, 15 Jan 2019 15:48:36 +0000 (15:48 +0000)]
man: Fix links in xl(1)

All links to other manpages should contain the man section number.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct
Andrew Cooper [Fri, 7 Dec 2018 13:43:27 +0000 (13:43 +0000)]
xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct

When the command line parsing was updated to use const strings and no longer
tokenise with NUL characters, string matches could no longer be made with
strcmp().

Unfortunately, the replacement was buggy.  strncmp(s, "opt", ss - s) matches
"o", "op" and "opt" on the command line, as ss - s may be shorter than the
passed literal.  Furthermore, parse_bool() is affected by this, so substrings
such as "d", "e" and "o" are considered valid, with the latter being ambiguous
between "on" and "off".

Introduce a new strcmp-like function for the task, which looks for exact
string matches, but declares success when the NUL of the literal matches a
comma, colon or semicolon in the command line fragment.
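
A minimal sketch of such a comparison, next to the buggy construct, might
look like this. The names frag_strcmp and buggy_match are illustrative
only and are not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: compare a command line fragment against a literal
 * option name.  Returns 0 only if `frag` spells `name` exactly and is
 * then followed by NUL, ',', ':' or ';'.
 */
int frag_strcmp(const char *frag, const char *name)
{
    assert(frag && name);

    for ( ; ; frag++, name++ )
    {
        unsigned char f = *frag, n = *name;
        int res = f - n;

        if ( res || n == '\0' )
        {
            /* End of the literal: a delimiter in the fragment is a match. */
            if ( n == '\0' && (f == ',' || f == ':' || f == ';') )
                res = 0;

            return res;
        }
    }
}

/*
 * The buggy construct, for comparison: only the first `len` bytes are
 * compared, so a fragment shorter than the literal still "matches".
 */
int buggy_match(const char *s, size_t len, const char *literal)
{
    return strncmp(s, literal, len) == 0;
}
```

Here buggy_match("o,other", 1, "opt") reports a match, whereas
frag_strcmp("o,other", "opt") does not.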

No change to the intended parsing functionality, but fixes cases where a
partial string on the command line will inadvertently trigger options.

A few areas were more than just a trivial change:

 * parse_irq_vector_map_param() gained some style corrections.
 * parse_vpmu_params() was rewritten to use the normal list-of-options form,
   rather than just fixing up parse_vpmu_param() and leaving the parsing
   hard to follow.
 * Instead of making the trivial fix of adding an explicit length check in
   parse_bool(), use the length to select which token we search for, which
   is more efficient than the previous linear search over all possible tokens.
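
The length-based selection mentioned for parse_bool() can be sketched as
below. The function name parse_bool_sketch, its signature and the exact
token set are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of length-based token selection.  `s` points at the token and
 * `len` is its length (the token need not be NUL-terminated).
 * Returns 1 for true, 0 for false, -1 if the token is not recognised.
 */
int parse_bool_sketch(const char *s, size_t len)
{
    assert(s);

    switch ( len )
    {
    case 1:
        if ( *s == '1' ) return 1;
        if ( *s == '0' ) return 0;
        break;

    case 2:
        if ( !memcmp(s, "on", 2) ) return 1;
        if ( !memcmp(s, "no", 2) ) return 0;
        break;

    case 3:
        if ( !memcmp(s, "yes", 3) ) return 1;
        if ( !memcmp(s, "off", 3) ) return 0;
        break;

    case 4:
        if ( !memcmp(s, "true", 4) ) return 1;
        break;

    case 5:
        if ( !memcmp(s, "false", 5) ) return 0;
        break;

    case 6:
        if ( !memcmp(s, "enable", 6) ) return 1;
        break;

    case 7:
        if ( !memcmp(s, "disable", 7) ) return 0;
        break;
    }

    return -1;
}
```

Because the length picks the candidates, a lone "o" falls into the
one-character case and is rejected instead of being ambiguous between
"on" and "off".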

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago libxl: get_reaper_lock_and_uid: Document fd handling
Ian Jackson [Wed, 2 Jan 2019 11:59:46 +0000 (11:59 +0000)]
libxl: get_reaper_lock_and_uid: Document fd handling

Coverity understandably complains that get_reaper_lock_and_uid leaks
the fd and hence open-file.  But this is intentional: the lock becomes
owned by the child process as a whole, which is entirely the property
of libxl.

(The coding style here in this subprocess is a bit anomalous but it's
probably not worth it to convert get_reaper_lock_and_uid to `goto out'
style and have it explicitly return the fd number.)

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
6 years ago libxl: fix build on rather old systems
Jan Beulich [Fri, 11 Jan 2019 10:09:35 +0000 (03:09 -0700)]
libxl: fix build on rather old systems

CLONE_NEWIPC has been introduced in Linux 2.6.19 only (and into glibc
at around that time as well). Cope with it being undefined as well as
with the underlying kernel not knowing of it.
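
One plausible shape for such a fix is sketched below. It is an assumption,
not the patch itself: the helper name flags_after_unshare_failure and the
policy of dropping the flag when the kernel reports EINVAL are
illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Old headers may lack the definition; this is the Linux value. */
#ifndef CLONE_NEWIPC
#define CLONE_NEWIPC 0x08000000
#endif

/*
 * Hypothetical helper: given the flags passed to a failed unshare() call
 * and the errno it produced, return the flags worth retrying with.  A
 * kernel that does not know CLONE_NEWIPC rejects it with EINVAL, so in
 * that case the flag is dropped; any other error leaves flags unchanged.
 */
unsigned long flags_after_unshare_failure(unsigned long flags, int err)
{
    assert(err != 0);

    if (err == EINVAL && (flags & CLONE_NEWIPC))
        return flags & ~(unsigned long)CLONE_NEWIPC;

    return flags;
}
```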

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago libxl: Add comments to libxl__json_*get* functions
Anthony PERARD [Thu, 3 Jan 2019 18:24:56 +0000 (18:24 +0000)]
libxl: Add comments to libxl__json_*get* functions

This documents that the libxl__json_object_get_* and libxl__json_*_get
functions accept a NULL libxl__json_object parameter.

libxl__json_object_to_json also works with NULL.

This also moves the libxl__json_object_alloc declaration closer to similar
functions, and closer to libxl__json_object_free.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl_json: Remove libxl__json_object_append_to from header
Anthony PERARD [Fri, 4 Jan 2019 13:53:21 +0000 (13:53 +0000)]
libxl_json: Remove libxl__json_object_append_to from header

It isn't possible to use libxl__json_object_append_to() outside of
libxl_json.c as there is no way to allocate a struct libxl__yajl_ctx.
So also remove libxl__yajl_ctx typedef from the internal header.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl: Remove unused arg from libxl__sendmsg_fds
Anthony PERARD [Wed, 2 Jan 2019 15:55:44 +0000 (16:55 +0100)]
libxl: Remove unused arg from libxl__sendmsg_fds

Now that `datalen' needs to be 1, we can remove it. Also change the `data'
parameter to be a single byte.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl: Re-implement domain_suspend_device_model using libxl__ev_qmp
Anthony PERARD [Wed, 25 Jul 2018 15:16:32 +0000 (16:16 +0100)]
libxl: Re-implement domain_suspend_device_model using libxl__ev_qmp

The re-implementation is done because we want to be able to send the
file descriptor that QEMU can use to save its state. When QEMU is
restricted, it would not be able to write to a path.

This replaces both libxl__qmp_stop() and libxl__qmp_save().

qmp_qemu_check_version() was only used by libxl__qmp_save(), so it is
replaced by a version using libxl__ev_qmp instead.

Coding style fixed in libxl__domain_suspend_device_model() for the
return value.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl: Change libxl__domain_suspend_device_model() to be async
Anthony PERARD [Wed, 25 Jul 2018 15:03:09 +0000 (16:03 +0100)]
libxl: Change libxl__domain_suspend_device_model() to be async

This creates an extra step for the two call sites of the function.

libxl__domain_suspend_device_model() in this patch gets an extra error
variable (there is ret and rc), but ret goes away in the next patch.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
---
libxl_domain_soft_reset() hasn't been tested, as it doesn't appear to
be possible to call the function from xl.

6 years ago libxl_qmp: Store advertised QEMU version in libxl__ev_qmp
Anthony PERARD [Tue, 24 Jul 2018 17:26:33 +0000 (18:26 +0100)]
libxl_qmp: Store advertised QEMU version in libxl__ev_qmp

This will be used in a later patch.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl: QEMU startup sync based on QMP
Anthony PERARD [Thu, 31 May 2018 13:45:12 +0000 (14:45 +0100)]
libxl: QEMU startup sync based on QMP

This is only activated when dm_restrict=1, as explained in a previous
patch, "libxl_dm: Pre-open QMP socket for QEMU".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl: Add dmss_init/dispose for libxl__dm_spawn_state
Anthony PERARD [Thu, 22 Nov 2018 12:09:37 +0000 (12:09 +0000)]
libxl: Add dmss_init/dispose for libxl__dm_spawn_state

These two functions, dmss_init and dmss_dispose, need to be called to
initialise the private parts of a libxl__dm_spawn_state (dmss) as well
as dispose of them before giving back control to a caller.

There are 3 functions that can start using a dmss: the classic
libxl__spawn_local_dm, the one for stubdom, libxl__spawn_stub_dm, and
libxl__spawn_qdisk_backend. But there are only 2 exit paths, as
libxl__spawn_qdisk_backend uses the libxl__spawn_local_dm functions.

These two new functions are empty but will be used shortly.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl_dm: Pre-open QMP socket for QEMU
Anthony PERARD [Thu, 31 May 2018 13:43:20 +0000 (14:43 +0100)]
libxl_dm: Pre-open QMP socket for QEMU

This patch moves the creation of the QMP unix socket from QEMU to libxl.
But libxl doesn't rely on this yet.

When starting QEMU with dm_restrict=1, pre-open the QMP socket before
exec'ing QEMU. That socket will be useful to find out if QEMU is ready,
and pre-opening it means that libxl can connect to it without waiting
for QEMU to create it.
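
As a rough sketch of the idea (not libxl's code), a parent process can
create and listen on the unix socket itself and hand the fd number to the
child. The helper names below and the bare "socket,fd=N" argument are
assumptions; a real QEMU command line needs further chardev properties
that are not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a listening unix socket at `path` before
 * the child is exec'ed.  Returns the fd, or -1 on error.
 */
int preopen_unix_listener(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    assert(path);
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path); /* length checked above */

    unlink(path); /* remove a stale socket left by an earlier run */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        return -1;
    }

    /* Clients can connect from now on, even before the child runs. */
    return fd;
}

/* Format the fd-passing part of a -chardev argument into `buf`. */
int format_chardev_fd(char *buf, size_t len, int fd)
{
    return snprintf(buf, len, "socket,fd=%d", fd);
}
```

Since the socket is already listening when the child starts, a connect()
from the parent succeeds immediately and the connection is simply served
once the child begins accepting.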

The pre-opening is conditional, based on the use of dm_restrict
because it is using a new command line option of QEMU, and dm_restrict
support in QEMU is newer.

-chardev socket,fd=X is available with QEMU 2.12, since commit:
> char: allow passing pre-opened socket file descriptor at startup
0935700f8544033ebbd41e1f13cd528f8a58d24d

dm_restrict is available in QEMU 3.0.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl: Add init/dispose of for libxl__domain_build_state
Anthony PERARD [Thu, 22 Nov 2018 18:10:45 +0000 (18:10 +0000)]
libxl: Add init/dispose of for libxl__domain_build_state

These two new functions, libxl__domain_build_state_{init,dispose}, should
be called every time a new libxl__domain_build_state comes into existence.

There seem to be two of them, one in the domain creation machinery,
and one in the stub_dm_spawn.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl_exec: Add libxl__spawn_initiate_failure
Anthony PERARD [Thu, 26 Jul 2018 16:12:52 +0000 (17:12 +0100)]
libxl_exec: Add libxl__spawn_initiate_failure

This function can be used by users of libxl__spawn_* when they set up a
notification other than xenstore. The parent can already report success
via libxl__spawn_initiate_detach(); this new function can be used to
report failure instead of waiting for the timeout.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl_qmp: Implementation of libxl__ev_qmp_*
Anthony PERARD [Thu, 8 Nov 2018 17:38:19 +0000 (17:38 +0000)]
libxl_qmp: Implementation of libxl__ev_qmp_*

This patch implements the API libxl__ev_qmp documented in the previous
patch, "libxl: Design of an async API to issue QMP commands to QEMU".

Since this API is to interact with QEMU via the QMP protocol, it also
implements a QMP client. The specification for the QEMU Machine Protocol
(QMP) can be found in the QEMU repository at:
https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/interop/qmp-spec.txt

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ wei: fix build ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago libxl: Design of an async API to issue QMP commands to QEMU
Anthony PERARD [Tue, 3 Jul 2018 09:29:17 +0000 (10:29 +0100)]
libxl: Design of an async API to issue QMP commands to QEMU

All the functions will be implemented in later patches.

This patch includes the API that libxl can use to send QMP commands to
QEMU.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ wei: fix build ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago libxl: Add wrapper around libxl__json_object_to_json JSON
Anthony PERARD [Thu, 22 Nov 2018 18:38:39 +0000 (18:38 +0000)]
libxl: Add wrapper around libxl__json_object_to_json JSON

That wrapper is going to be used to safely log a json_object, as
libxl__json_object_to_json returns NULL on error. In the error case,
JSON() will return an invalid json string.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl_qmp: Change qmp_qemu_check_version to compare version
Anthony PERARD [Fri, 9 Nov 2018 17:45:54 +0000 (17:45 +0000)]
libxl_qmp: Change qmp_qemu_check_version to compare version

This patch makes the function simpler to read. It also adds the ability
for a caller to tell if QEMU is newer or has the exact version.
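
A three-way comparison of this kind can be sketched as follows. The
struct and function names are hypothetical, and the -1/0/1 return
convention is an assumption rather than something stated in the patch.

```c
#include <assert.h>

/* Hypothetical container for a QEMU version triple. */
struct qemu_version {
    int major, minor, micro;
};

/*
 * Returns 1 if `have` is newer than `want`, 0 if both are identical and
 * -1 if `have` is older.
 */
int compare_qemu_version(const struct qemu_version *have,
                         const struct qemu_version *want)
{
    assert(have && want);

    if (have->major != want->major)
        return have->major > want->major ? 1 : -1;
    if (have->minor != want->minor)
        return have->minor > want->minor ? 1 : -1;
    if (have->micro != want->micro)
        return have->micro > want->micro ? 1 : -1;

    return 0;
}
```

A caller that only needs "at least version X" tests for a result >= 0,
while a caller interested in the exact version tests for 0.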

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl_qmp: Separate QMP message generation from qmp_send_prepare
Anthony PERARD [Fri, 25 May 2018 16:00:28 +0000 (17:00 +0100)]
libxl_qmp: Separate QMP message generation from qmp_send_prepare

.. to be able to re-use qmp_prepare_cmd with libxl__ev_qmp.

This patch also adds the QMP end of command '\r\n' to the generated
string, as every caller will need this.
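
For illustration, a generator that appends the terminator itself might
look like the sketch below. It is not qmp_prepare_cmd: the name
format_qmp_cmd is hypothetical, arguments are left out, and no JSON
escaping is done, so `cmd` is assumed to be a plain command name.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: format {"execute":"<cmd>","id":<id>} followed by
 * "\r\n" into `buf`.  Returns the message length, or -1 if `buf` is too
 * small.
 */
int format_qmp_cmd(char *buf, size_t len, const char *cmd, int id)
{
    int n;

    assert(buf && cmd);

    n = snprintf(buf, len, "{\"execute\":\"%s\",\"id\":%d}\r\n", cmd, id);
    if (n < 0 || (size_t)n >= len)
        return -1;

    return n;
}
```

Every caller then gets a complete, ready-to-send message and does not
have to remember the terminator.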

There should be no functional change.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
6 years ago libxl: Enhance libxl__sendmsg_fds to deal with EINTR and EWOULDBLOCK
Anthony PERARD [Wed, 31 Oct 2018 16:31:49 +0000 (16:31 +0000)]
libxl: Enhance libxl__sendmsg_fds to deal with EINTR and EWOULDBLOCK

This patch changes the behavior of libxl__sendmsg_fds to retry sendmsg on
EINTR and to return an error on short writes.

This patch allows a caller of libxl__sendmsg_fds to deal with EWOULDBLOCK
and short writes. The function now requires that only 1 byte of data be
sent, so that when dealing with non-blocking fds an EWOULDBLOCK error
means that the fds haven't been sent yet. Current callers already send
only 1 byte.
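
The behaviour described above can be sketched with plain POSIX calls as
follows. This is not libxl's implementation: the function name
send_fds_one_byte, the SKETCH_MAX_FDS limit and the use of EIO for a
short write are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define SKETCH_MAX_FDS 8 /* arbitrary limit for this sketch */

/*
 * Hypothetical helper: send one byte of data plus `nfds` file descriptors
 * over the unix socket `carrier`.  Returns 0 on success, -1 with errno
 * set otherwise.  EINTR is retried; EWOULDBLOCK is passed to the caller,
 * and since the payload is a single byte it means nothing was sent.
 */
int send_fds_one_byte(int carrier, char data, int nfds, const int fds[])
{
    union {
        struct cmsghdr align; /* forces suitable alignment */
        char buf[CMSG_SPACE(SKETCH_MAX_FDS * sizeof(int))];
    } control;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    size_t fdbytes;

    assert(fds);
    if (nfds < 1 || nfds > SKETCH_MAX_FDS) {
        errno = EINVAL;
        return -1;
    }
    fdbytes = (size_t)nfds * sizeof(int);

    memset(&msg, 0, sizeof(msg));
    memset(&control, 0, sizeof(control));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = CMSG_SPACE(fdbytes);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(fdbytes);
    memcpy(CMSG_DATA(cmsg), fds, fdbytes);

    for (;;) {
        ssize_t r = sendmsg(carrier, &msg, 0);

        if (r == 1)
            return 0;
        if (r < 0 && errno == EINTR)
            continue; /* interrupted before anything was sent: retry */
        if (r >= 0)
            errno = EIO; /* short write of a 1-byte message */
        return -1;
    }
}
```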

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ wei: fix build ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
6 years ago tmem: default to off
Jan Beulich [Fri, 11 Jan 2019 11:30:29 +0000 (12:30 +0100)]
tmem: default to off

As a short term alternative to deleting the code, default its building
to off (overridable in EXPERT mode only). Additionally make sure other
related baggage (LZO code) won't be carried when the option is off (with
TMEM scheduled to be deleted anyway, I didn't want to introduce a
separate Kconfig option to control the LZO compression code, and hence
CONFIG_TMEM is used directly there). Similarly I couldn't be bothered to
add actual content to the command line option doc for the two affected
options.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
6 years ago x86/p2m: fix p2m_finish_type_change()
Razvan Cojocaru [Fri, 11 Jan 2019 11:28:49 +0000 (12:28 +0100)]
x86/p2m: fix p2m_finish_type_change()

finish_type_change() returns a negative int on error, but the
current code checks if ( !rc ). We also need to treat
finish_type_change()'s return codes cumulatively in the
success case (don't overwrite a 1 returned while processing
the hostp2m if processing an altp2m returns 0).
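
The return-code handling described above can be illustrated with the
following sketch. It works on an array of already collected per-view
results rather than on real p2m structures, and the name
combine_finish_results is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch: results[0] stands for the hostp2m result and any further
 * entries for altp2m results.  Each entry is a negative error code, 0
 * or 1.  The first error is returned as is; otherwise a 1 from any view
 * is preserved even if a later view returns 0.
 */
int combine_finish_results(const int *results, size_t n)
{
    int rc = 0;
    size_t i;

    assert(results || n == 0);

    for (i = 0; i < n; i++) {
        /* Errors are negative, so testing !rc would not detect them. */
        if (results[i] < 0)
            return results[i];

        rc |= results[i]; /* accumulate instead of overwriting */
    }

    return rc;
}
```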

The breakage was introduced by commit 0fb4b58c8b
("x86/altp2m: fix display frozen when switching to a new view
early").

Properly indent the out: label while at it.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>