Roger Pau Monné [Fri, 22 Jan 2016 15:18:29 +0000 (16:18 +0100)]
x86/PV: allow PV guests to have an emulated PIT
This fixes the fallout from the HVMlite series, which removed the emulated
PIT from PV(H) guests. Also, this patch forces the hardware domain to
always have an emulated PIT, regardless of whether the toolstack specified
one or not.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Malcolm Crossley [Fri, 22 Jan 2016 15:17:13 +0000 (16:17 +0100)]
p2m: convert p2m rwlock to percpu rwlock
The per domain p2m read lock suffers from significant contention when
performing multi-queue block or network IO due to the parallel
grant map/unmaps/copies occurring on the DomU's p2m.
On multi-socket systems, the contention results in the locked compare swap
operation failing frequently which results in a tight loop of retries of the
compare swap operation. Because the coherency fabric can only support a
specific rate of compare swap operations for a particular data location,
taking the read lock itself becomes a bottleneck for p2m operations.
Percpu rwlock p2m performance with the same configuration is approximately
64 gbit/s vs the 48 gbit/s with grant table percpu rwlocks only.
Oprofile was used to determine the initial overhead of the read-write locks
and to confirm the overhead was dramatically reduced by the percpu rwlocks.
Note: altp2m users will not achieve a gain if they take an altp2m read lock
simultaneously with the main p2m lock.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Malcolm Crossley [Fri, 22 Jan 2016 15:16:05 +0000 (16:16 +0100)]
grant_table: convert grant table rwlock to percpu rwlock
The per domain grant table read lock suffers from significant contention when
performing multi-queue block or network IO due to the parallel
grant map/unmaps/copies occurring on the DomU's grant table.
On multi-socket systems, the contention results in the locked compare swap
operation failing frequently which results in a tight loop of retries of the
compare swap operation. Because the coherency fabric can only support a
specific rate of compare swap operations for a particular data location,
taking the read lock itself becomes a bottleneck for grant operations.
Standard rwlock performance of a single VIF VM-VM transfer with 16 queues
configured was limited to approximately 15 gbit/s on a 2 socket Haswell-EP
host.
Percpu rwlock performance with the same configuration is approximately
48 gbit/s.
Oprofile was used to determine the initial overhead of the read-write locks
and to confirm the overhead was dramatically reduced by the percpu rwlocks.
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Per-cpu read-write locks allow for the fast path read case to have
low overhead by only setting/clearing a per-cpu variable for using
the read lock. The per-cpu read fast path also avoids locked
compare swap operations which can be particularly slow on coherent
multi-socket systems, particularly if there is heavy usage of the
read lock itself.
The per-cpu reader-writer lock uses a local variable to control
the read lock fast path. This allows a writer to disable the fast
path and ensures the readers switch to using the underlying
read-write lock implementation instead of the per-cpu variable.
Once the writer has taken the write lock and disabled the fast path,
it must poll the per-cpu variable for all CPUs which have entered
the critical section for the specific read-write lock the writer is
attempting to take. This design allows for a single per-cpu variable
to be used for read/write locks belonging to separate data structures.
If two or more different per-cpu read locks are taken
simultaneously then the per-cpu data structure is not used and the
implementation takes the read lock of the underlying read-write lock;
this behaviour is equivalent to the slow path in terms of performance.
The per-cpu rwlock is not recursion safe for taking the per-cpu
read lock because there is no recursion count variable; this is
functionally equivalent to standard spin locks.
Slow path readers which are unblocked set the per-cpu variable and
drop the read lock. This simplifies the implementation and allows
for fairness in the underlying read-write lock to be taken
advantage of.
There is more overhead on the per-cpu write lock path due to checking
each CPU's fast path per-cpu variable but this overhead is likely to be
hidden by the required delay of waiting for readers to exit the
critical section. The loop is optimised to only iterate over
the per-cpu data of active readers of the rwlock. The cpumask_t for
tracking the active readers is stored in a single per-cpu data
location and thus the write lock is not pre-emption safe. Therefore
the per-cpu write lock can only be used with interrupts disabled.
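The reader-side bookkeeping described above can be modelled in a few lines. This is a single-threaded sketch only, not the Xen implementation: every demo_ name is hypothetical, the underlying read-write lock is reduced to a reader counter, and memory barriers, blocking and the writer's fast-path disable flag are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* Hypothetical stand-in for one per-cpu rwlock instance. */
struct demo_percpu_rwlock {
    int slow_readers; /* models the reader count of the underlying rwlock */
};

/* One slot per CPU, shared by all locks: which lock this CPU read-holds. */
static struct demo_percpu_rwlock *demo_slot[DEMO_NR_CPUS];

void demo_read_lock(unsigned int cpu, struct demo_percpu_rwlock *l)
{
    assert(cpu < DEMO_NR_CPUS);
    /* There is no recursion count, so re-taking the same lock is a bug. */
    assert(demo_slot[cpu] != l);

    if (demo_slot[cpu] != NULL) {
        /* The slot is in use by a different lock: take the slow path. */
        l->slow_readers++;
        return;
    }
    demo_slot[cpu] = l; /* fast path: a single per-cpu store */
}

void demo_read_unlock(unsigned int cpu, struct demo_percpu_rwlock *l)
{
    assert(cpu < DEMO_NR_CPUS);
    if (demo_slot[cpu] != l) {
        l->slow_readers--; /* this CPU had taken the slow path for l */
        return;
    }
    demo_slot[cpu] = NULL;
}

/* What a writer polls for: is any CPU inside a fast-path read section? */
bool demo_fast_reader_present(const struct demo_percpu_rwlock *l)
{
    unsigned int cpu;

    for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        if (demo_slot[cpu] == l)
            return true;
    return false;
}
```

In this model a CPU that already read-holds lock A and then takes lock B only bumps B's slow_readers counter, which mirrors the fallback to the underlying read-write lock.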
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Tue, 22 Sep 2015 14:16:05 +0000 (15:16 +0100)]
tools: Update CFLAGS for qemu-xen to allow it to use new libraries
This means adding -L for libxen{evtchn,gnttab,foreignmemory} so that
it can link them directly (rather than using the libxenctrl compat
layer exposed via -rpath-link). Also add -I for libxenforeignmemory.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Wed, 2 Dec 2015 16:21:41 +0000 (16:21 +0000)]
tools/libs/*: Use O_CLOEXEC on Linux and FreeBSD
In some cases this replaces an FD_CLOEXEC dance, in others it is new.
Linux has had O_CLOEXEC since 2.6.23 (October 2007), so we can rely on
it from Xen 4.7 I think. Some libc headers may still lack the
definition, so we take care of that if need be by defining to 0 (on
the premise that such an old glibc might barf on O_CLOEXEC even if the
kernel may or may not be so old).
All stable versions of FreeBSD support O_CLOEXEC (10.2, 9.3 and 8.4),
and we assume the libc there does too.
Remove various comments about having to take responsibility for this
(since really it is just hygiene, politeness, not a requirement) and
the reasons for using O_CLOEXEC seem pretty straightforward.
Backends for other OSes are untouched.
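For comparison, a minimal sketch of the two approaches, using only POSIX open() and fcntl(). The demo_ function names are hypothetical and the fallback define simply mirrors the "define to 0" approach described above; this is not the code from the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Fallback for libc headers that lack the definition. */
#ifndef O_CLOEXEC
#define O_CLOEXEC 0
#endif

/* Single step: close-on-exec is requested atomically at open() time. */
int demo_open_cloexec(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* The older two-step "FD_CLOEXEC dance" on an already open descriptor. */
int demo_set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

The single-step form leaves no window between open() and fcntl() in which another thread could fork and exec with the descriptor still inheritable.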
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Roger.Pau@citrix.com Cc: jbeulich@suse.com
Ian Campbell [Fri, 11 Dec 2015 17:31:26 +0000 (17:31 +0000)]
tools/libs/{call,evtchn}: Document requirements around forking.
Much like for gnttab and foreignmemory xencall hypercall buffers need
care.
Evtchn is a bit simpler (no magic mappings) but may not work from
parent + child simultaneously, document "parent only" since it is
consistent with the others.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Fri, 27 Nov 2015 12:08:32 +0000 (12:08 +0000)]
tools/libs/call: Describe return values and error semantics for xencall*
This behaviour has been confirmed by inspection on:
- Linux
- NetBSD & FreeBSD (NB: hcall->retval is the hypercall return value
only for values >= 0. For negative values the underlying privcmd
driver translates the value from Xen to {Net,Free}BSD errno space
and returns it as the result of the ioctl, which becomes
ret=-1/errno=EFOO in userspace)
- MiniOS (which takes care of errno in this library)
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Roger Pau Monné <roger.pau@citrix.com>
Ian Campbell [Tue, 22 Sep 2015 11:12:55 +0000 (12:12 +0100)]
tools/libs/gnttab: Extensive updates to API documentation.
In particular around error handling, behaviour on fork and the unmap
notification mechanism.
Behaviour of xengnttab_map_*grant_refs and xengntshr_share_pages on
partial failure has been confirmed/inferred (by inspection) on Linux
and Mini-os (the only two known implementations). Likewise the
behaviour of the notification mechanism has been confirmed/inferred
(by inspection) of the Linux implementation (currently the only
implementation) and libvchan (primary known user).
These updates are not folded into "tools: Refactor
/dev/xen/gnt{dev,shr} wrappers into libxengnttab." to try and reduce
the amount of non-movement changes in that patch.
While I'm not convinced by javadoc/doxygen, make the existing comments
which appear to use that syntax carry the appropriate /** marker.
Also fix a typo in a code comment.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Ian Campbell [Mon, 30 Nov 2015 10:32:28 +0000 (10:32 +0000)]
tools/libs/foreignmemory: pull array length argument to map forward
By having the "num" argument before the page and error arrays we can
potentially use a variable-length-array argument ("int pages[num]") in
the function prototype.
However VLAs are a C99 feature and we are currently targeting C89 and
later, so we don't actually make use of this here, merely arrange that
we can switch to VLAs in the future without changing the function ABI.
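To illustrate the argument ordering, here is a small sketch with hypothetical demo_ functions (they are not part of libxenforeignmemory). Both declarations describe functions with the same calling convention; only the C99 one documents the array length in the prototype.

```c
#include <assert.h>
#include <stddef.h>

/* C89-compatible declaration: the array length is implicit. */
int demo_first_failed(size_t num, const int pages[], const int err[]);

/* C99 declaration: because "num" comes first, the array parameters can
 * name it as their length. The function ABI is unchanged. */
int demo_first_failed_vla(size_t num, const int pages[num],
                          const int err[num]);

/* Return the first page whose error slot is non-zero, or -1 if none. */
int demo_first_failed(size_t num, const int pages[], const int err[])
{
    size_t i;

    assert(pages != NULL && err != NULL);
    for (i = 0; i < num; i++)
        if (err[i] != 0)
            return pages[i];
    return -1;
}

int demo_first_failed_vla(size_t num, const int pages[num],
                          const int err[num])
{
    return demo_first_failed(num, pages, err);
}
```

Had "num" followed the arrays, the C99 form could not be written, since a parameter must be declared before it is used in a later parameter's type.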
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 28 Jul 2015 13:20:01 +0000 (14:20 +0100)]
tools/libs/foreignmemory: provide xenforeignmemory_unmap.
And require it be used instead of direct munmap.
This will allow e.g. Valgrind hooks to help track incorrect use of
foreign mappings.
Switch all uses of xenforeignmemory_map to use
xenforeignmemory_unmap. Note that foreign mappings via the libxc compat
xc_map_foreign_* interface will not take advantage of this and will
need converting.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Thu, 18 Jun 2015 15:30:19 +0000 (16:30 +0100)]
tools: Refactor foreign memory mapping into libxenforeignmemory
libxenforeignmemory will provide a stable API and ABI for mapping
foreign domain memory (subject to appropriate privileges).
The new library exposes an interface equivalent to
xc_map_foreign_memory_bulk, which all the other
xc_map_foreign_memory_* functions (which remain in libxc) are
implemented in terms of.
Upon request (via #define XC_WANT_COMPAT_MAP_FOREIGN_API) libxenctrl
will provide a compat API for the old names. This is used by qemu-xen
and qemu-trad as well as various in tree things (which required
de-dupping various #includes in some too to get the #define before the
first).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- updated MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION ]
Ian Campbell [Thu, 18 Jun 2015 10:19:09 +0000 (11:19 +0100)]
tools: Implement xc_map_foreign_range(s) in terms of common helper
Both Linux and FreeBSD already implemented these functions using
identical helpers based on xc_map_foreign_pages. Make one copy of
these common helpers and switch all OSes to use them, even those which
previously had a specific lower level implementation of this
functionality.
This makes two fewer low level interfaces to think about.
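A rough sketch of how a range mapping can be expressed in terms of a page-list mapping: expand the contiguous range into an array of frame numbers and hand that array to the page-based interface. The demo_ names and the 4k page size are assumptions for illustration; the actual mapping call is left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  ((size_t)1 << DEMO_PAGE_SHIFT)

/*
 * Fill arr[] with the consecutive frame numbers covering "size" bytes
 * starting at first_mfn. Returns the number of frames written, or 0 if
 * arr[] is too small. The result is what a page-list based mapping
 * function would then be given.
 */
size_t demo_range_to_frames(unsigned long first_mfn, size_t size,
                            unsigned long *arr, size_t arr_len)
{
    size_t num = (size + DEMO_PAGE_SIZE - 1) >> DEMO_PAGE_SHIFT;
    size_t i;

    assert(arr != NULL);
    if (num > arr_len)
        return 0;
    for (i = 0; i < num; i++)
        arr[i] = first_mfn + i;
    return num;
}
```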
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Thu, 18 Jun 2015 09:52:30 +0000 (10:52 +0100)]
tools: Remove xc_map_foreign_batch
It can trivially be replaced by xc_map_foreign_pages which is the
interface I want to move to going forward (by standardising on _bulk
but handling err=NULL as _pages does).
The callers of _batch are checking a mixture of a NULL return or
looking to see if the top nibble of the (usually sole) mfn they pass
has been modified to be non-zero to detect errors. _pages never
modifies the mfn it was given (it's const) and returns NULL on
failure, so adjust the error handling where necessary. Some callers
use a copy of the mfn array, for reuse on failure with _batch, which
is no longer necessary as _pages doesn't modify the array, however I
haven't cleaned that up here.
This reduces the twisty maze of xc_map_foreign_* by one, which will be
useful when trying to come up with an underlying stable interface.
NetBSD and Solaris implemented xc_map_foreign_bulk in terms of
xc_map_foreign_batch via a compat layer, so xc_map_foreign_batch
becomes an internal osdep for them.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com>
Ian Campbell [Thu, 18 Jun 2015 09:35:06 +0000 (10:35 +0100)]
tools/libxc: drop xc_map_foreign_bulk_compat wrappers
On Solaris and NetBSD xc_map_foreign_bulk is implemented by calling
xc_map_foreign_bulk_compat and xc_map_foreign_bulk_compat is exposed
as a symbol by libxenctrl.so.
Remove these wrappers and turn the compat function into the real thing
surrounded by the appropriate ifdef.
As this is a compat function all new ports should instead implement
xc_map_foreign_bulk properly, hence the ifdef should never be
expanded.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Mon, 1 Jun 2015 15:20:09 +0000 (16:20 +0100)]
tools: Refactor hypercall calling wrappers into libxencall.
libxencall will provide a stable API and ABI for calling hypercalls
(although those hypercalls themselves may not have a stable API), as
well as the hypercall buffer infrastructure needed in order to safely
provide pointer arguments to hypercalls.
libxenctrl encapsulates an instance of this interface, so users of that
library are not currently subjected to any actual changes. However all
hypercalls made internally by libxc now use the correct interface. It
is expected that most users of this library will be other libraries
providing a higher level interface, rather than applications directly.
Only the basic functionality to allocate hypercall safe memory is
moved, the type safe stuff and bounce buffers remain in libxc.
Note that the functionality to map foreign pages using privcmd is not
yet moved, meaning that an xc_interface will now contain two open
privcmd file descriptors. Foreign memory mapping is logically separate
functionality and will be moved into its own library.
The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- updated MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION ]
Ian Campbell [Thu, 11 Jun 2015 16:39:00 +0000 (17:39 +0100)]
tools/libxc: Remove osdep indirection for privcmd
The alternative backend (a xen-api/xapi shim) is no longer around and
so this stuff is now just baggage which is getting in the way of
refactoring libxenctrl.
Nested virt probably suffices for this use case now.
This was the last component of the osdep infrastructure, so all the
dynamic loading etc stuff all falls away too.
As part of this I was forced to investigate the twisty
xc_map_foreign_* maze, which I have added to the
toolstack-library-apis doc in the hopes of doing something sensible.
NetBSD and Solaris now call xc_map_foreign_bulk_compat directly from
their xc_map_foreign_bulk, which could have been achieved by using
some ifdefs around a renamed function. This will fall out in the wash
when these functions move to their own library.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: David Scott <dave.scott@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Mon, 1 Jun 2015 15:20:09 +0000 (16:20 +0100)]
tools: Refactor /dev/xen/gnt{dev,shr} wrappers into libxengnttab.
libxengnttab will provide a stable API and ABI for accessing the
grant table devices.
The functions are moved into the xengnt{tab,shr} namespace to make a
clean break from libxc and avoid ambiguity regarding which interfaces
are stable.
All in-tree users are updated to use the new names.
Upon request (via #define XC_WANT_COMPAT_GNTTAB_API) libxenctrl will
provide a compat API for the old names. This is used by qemu-xen for
the time being. qemu-xen-traditional is updated in lockstep.
This leaves a few grant table related functions which go via privcmd
(GNTTABOP) rather than ioctls on the /dev/xen/gnt* devices in
libxenctrl. Specifically:
These functions do not appear to be needed by qemu-dm, qemu-pv
(provision of device model to HVM guests and PV backends respectively)
or by libvchan suggesting they are not needed by non-toolstack uses of
event channels.
The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.
After this change libxenvchan no longer needs to link against
libxenctrl. It still needs xenctrl.h in one file for xen_mb and
friends.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- updated MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION ]
Ian Campbell [Thu, 11 Jun 2015 16:39:00 +0000 (17:39 +0100)]
tools/libxc: Remove osdep indirection for xc_gnt{shr,tab}
The alternative backend (a xen-api/xapi shim) is no longer around and
so this stuff is now just baggage which is getting in the way of
refactoring libxenctrl.
Nested virt probably suffices for this use case now.
It is now necessary to provide explicit versions of things for
platforms which do not implement this functionality, since the osdep
dispatcher cannot fulfil this need any more. These are provided by
appropriate xc_nognt???.c files which are compiled and linked on the
appropriate platforms. In them open and close return failure and
everything else aborts, since if open fails they should never be
called.
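The shape of such a stub file might look roughly like the following. All demo_ names are hypothetical, and setting errno to ENOSYS is an assumption; the text above only says that open and close return failure and that everything else aborts.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_gnttab; /* opaque handle, never instantiated on this platform */

/* Open always fails: the functionality is not implemented here. */
struct demo_gnttab *demo_gnttab_open(void)
{
    errno = ENOSYS; /* assumed errno value */
    return NULL;
}

/* Close also reports failure; no valid handle can exist. */
int demo_gnttab_close(struct demo_gnttab *h)
{
    assert(h == NULL);
    (void)h;
    errno = ENOSYS; /* assumed errno value */
    return -1;
}

/* Any other entry point aborts: it is unreachable if open failed. */
void *demo_gnttab_map(struct demo_gnttab *h, unsigned int ref)
{
    (void)h;
    (void)ref;
    abort();
}
```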
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Mon, 1 Jun 2015 15:20:09 +0000 (16:20 +0100)]
tools: Refactor /dev/xen/evtchn wrappers into libxenevtchn.
libxenevtchn will provide a stable API and ABI for accessing the
evtchn device.
The functions are moved into the xenevtchn namespace to make a clean
break from libxc and avoid ambiguity regarding which interfaces are
stable.
All in-tree users are updated to use the new names.
Upon request (via #define XC_WANT_COMPAT_EVTCHN_API) libxenctrl will
provide a compat API for the old names. This is used by qemu-xen for
the time being. qemu-xen-traditional is updated in lockstep.
This leaves a few event channel related functions which go via privcmd
(EVTCHNOP) rather than ioctls on the /dev/xen/evtchn device in
libxenctrl. Specifically:
Note that xc_evtchn_alloc_unbound's functionality is also provided by
xenevtchn_bind_unbound_port() (née xc_evtchn_bind_unbound_port) and is
probably redundant.
These functions do not appear to be needed by qemu-dm, qemu-pv
(provision of device model to HVM guests and PV backends respectively)
or by libvchan suggesting they are not needed by non-toolstack uses of
event channels. QEMU does use these in hw/xenpv/xen_domainbuild.c but
that is a "toolstack use".
The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- updated MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION ]
Ian Campbell [Tue, 9 Jun 2015 12:54:09 +0000 (13:54 +0100)]
tools/libxc: Remove osdep indirection for xc_evtchn
The alternative backend (a xen-api/xapi shim) is no longer around and
so this stuff is now just baggage which is getting in the way of
refactoring libxenctrl.
Note that the intention is to move this into a separate library
shortly.
Nested virt probably suffices for this use case now.
One incorrect instance of using xc_interface in place of xc_evtchn (in
the ocaml stubs) is removed; this used to work because they were typedefs
to the same struct, but is no longer permitted.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Jan Beulich [Thu, 21 Jan 2016 15:11:04 +0000 (16:11 +0100)]
x86: fix (and simplify) MTRR overlap checking
Obtaining one individual range per variable range register (via
get_mtrr_range()) was bogus from the beginning, as these registers may
cover multiple disjoint ranges. Do away with that, in favor of simply
comparing masked addresses.
Also, for is_var_mtrr_overlapped()'s result to be correct when called
from mtrr_wrmsr(), generic_set_mtrr() must update saved state first.
As minor cleanup changes, constify is_var_mtrr_overlapped()'s parameter
and make mtrr_wrmsr() static.
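The masked-address comparison can be sketched as follows. An address hits a variable range when (addr & mask) == (base & mask), so two ranges share at least one address exactly when their bases agree on every bit that both masks check. The struct and function names are hypothetical and the layout is simplified; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of one variable range register pair. */
struct demo_var_range {
    uint64_t base;  /* base address (page frame granularity) */
    uint64_t mask;  /* address mask (same granularity) */
    bool valid;     /* the range is enabled */
};

/* Two valid ranges overlap iff their bases match under both masks. */
bool demo_ranges_overlap(const struct demo_var_range *a,
                         const struct demo_var_range *b)
{
    uint64_t common;

    assert(a != NULL && b != NULL);
    common = a->mask & b->mask;
    return a->valid && b->valid &&
           (a->base & common) == (b->base & common);
}

/* Pairwise check over all ranges. */
bool demo_any_overlap(const struct demo_var_range *r, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (demo_ranges_overlap(&r[i], &r[j]))
                return true;
    return false;
}
```

Because only masked values are compared, a mask with holes (covering several disjoint ranges) is handled without ever computing a single start/end pair.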
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 21 Jan 2016 15:10:42 +0000 (16:10 +0100)]
x86: constrain MFN range Dom0 may access
... to that covered by the physical address width supported by the
processor. This implicitly avoids Dom0 (accidentally or due to some
kind of abuse) passing out of range addresses to a guest, which in
turn eliminates the only possibility for PV guests to create PTEs
with one or more reserved bits set.
Note that this is not a security issue due to XSA-77.
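The bound itself is simple arithmetic: with paddr_bits physical address bits and 4k pages, valid MFNs are those below 2^(paddr_bits - 12). A sketch with hypothetical names (not the hypervisor's own helpers):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12

/* True if the MFN is addressable with the given physical address width. */
bool demo_mfn_in_range(uint64_t mfn, unsigned int paddr_bits)
{
    assert(paddr_bits > DEMO_PAGE_SHIFT && paddr_bits <= 52);
    return mfn < ((uint64_t)1 << (paddr_bits - DEMO_PAGE_SHIFT));
}
```

For example, with a 36-bit physical address width the first out-of-range MFN is 0x1000000.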
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 21 Jan 2016 15:10:19 +0000 (16:10 +0100)]
x86/paging: invlpg() hook returns boolean
... so make its return type reflect this.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: George Dunlap <george.dunlap@citrix.com>
Instead of having a manually-curated list of schedulers, use the array
that was auto-generated simply by compiling in the scheduler files as
the sole source of truth of the available schedulers.
Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com> Acked-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Adds a simple macro to place a pointer to a scheduler into an array
section at compile time. Also, goes ahead and generates the array
entries with each of the schedulers.
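The general technique can be sketched in user space as below. This is an illustration under assumptions, not the hypervisor code: the demo_ names are hypothetical, and instead of a section declared in a linker script it relies on the __start_/__stop_ symbols that GNU ld generates for sections whose name is a valid C identifier (so it needs GCC or Clang on an ELF target).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct demo_scheduler {
    const char *name;
};

/* Place a pointer to a scheduler into the "demo_sched_array" section. */
#define DEMO_REGISTER_SCHEDULER(x)                                   \
    static const struct demo_scheduler *const x##_entry              \
    __attribute__((used, section("demo_sched_array"))) = &(x)

static const struct demo_scheduler demo_sched_credit = { "credit" };
static const struct demo_scheduler demo_sched_rtds = { "rtds" };

DEMO_REGISTER_SCHEDULER(demo_sched_credit);
DEMO_REGISTER_SCHEDULER(demo_sched_rtds);

/* Section bounds, provided by the linker. */
extern const struct demo_scheduler *const __start_demo_sched_array[];
extern const struct demo_scheduler *const __stop_demo_sched_array[];

size_t demo_scheduler_count(void)
{
    return (size_t)(__stop_demo_sched_array - __start_demo_sched_array);
}

/* Walk the gathered array; no hand-maintained list is involved. */
const struct demo_scheduler *demo_find_scheduler(const char *name)
{
    const struct demo_scheduler *const *p;

    assert(name != NULL);
    for (p = __start_demo_sched_array; p < __stop_demo_sched_array; p++)
        if (strcmp((*p)->name, name) == 0)
            return *p;
    return NULL;
}
```

Adding a scheduler then only requires one registration line next to its definition; the lookup code does not change.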
Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com> Acked-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
build: alloc space for sched list in the link file
Creates a section to contain scheduler entry pointers that are gathered
together into an array. This will allow, in a follow-on patch, scheduler
entries to be automatically gathered together into the array for
automatic parsing.
Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Allow the schedulers to be independently enabled or disabled at
compile-time. To match existing behavior, all four schedulers are
compiled in by default, although the Credit2, RTDS, and ARINC653
schedulers are marked EXPERIMENTAL to match their not currently supported
status.
Wen Congyang [Tue, 19 Jan 2016 07:17:41 +0000 (15:17 +0800)]
tools/libxc: error handling for the postcopy() callback
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wen Congyang [Tue, 19 Jan 2016 07:17:40 +0000 (15:17 +0800)]
tools/libxc: don't send end record if remus fails
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wen Congyang [Tue, 19 Jan 2016 07:17:39 +0000 (15:17 +0800)]
remus: resume immediately if libxl__xc_domain_save_done() completes
For example, if the secondary host is down and we fail to send the data to
the secondary host, xc_domain_save() returns 0. So in the function
libxl__xc_domain_save_done(), rc is 0 (the helper program exits normally),
and retval is 0 (it is xc_domain_save()'s return value). In such a case, we
just need to complete the stream.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- updated wording of comment as discussed ]
Wen Congyang [Tue, 19 Jan 2016 07:17:38 +0000 (15:17 +0800)]
remus: don't call stream_continue() when doing failover
stream_continue() is used for migration to read emulator
xenstore data and emulator context. For remus, if we do
failover, we have read it in the checkpoint cycle, and
we only need to complete the stream.
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wen Congyang [Tue, 19 Jan 2016 07:17:37 +0000 (15:17 +0800)]
remus: don't do failover if we don't have a consistent state
We will have a consistent state when a CHECKPOINT_END record
is received. After the first CHECKPOINT_END record is received,
we will buffer all records until the next CHECKPOINT_END record
is received. So if the checkpoint() callback returns XGR_CHECKPOINT_FAILOVER,
we can only do failover if ctx->restore.buffer_all_records is
true.
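The rule can be written down as a tiny state model. The demo_ types and functions are hypothetical; the sketch assumes that buffer_all_records becomes true when the first CHECKPOINT_END record arrives, as the description above implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical results of the checkpoint() callback. */
enum demo_checkpoint_rc {
    DEMO_CHECKPOINT_SUCCESS,
    DEMO_CHECKPOINT_FAILOVER,
    DEMO_CHECKPOINT_ERROR
};

struct demo_restore_state {
    bool buffer_all_records; /* set once a consistent state exists */
};

/* Assumed: the first CHECKPOINT_END record switches buffering on. */
void demo_on_checkpoint_end(struct demo_restore_state *s)
{
    assert(s != NULL);
    s->buffer_all_records = true;
}

/* Failover is only acceptable once a consistent state was received. */
bool demo_may_failover(const struct demo_restore_state *s,
                       enum demo_checkpoint_rc rc)
{
    assert(s != NULL);
    return rc == DEMO_CHECKPOINT_FAILOVER && s->buffer_all_records;
}
```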
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 20 Jan 2016 12:50:10 +0000 (13:50 +0100)]
x86/VMX: prevent INVVPID failure due to non-canonical guest address
While INVLPG (and on SVM INVLPGA) don't fault on non-canonical
addresses, INVVPID fails (in the "individual address" case) when passed
such an address.
Since such intercepted INVLPG are effectively no-ops anyway, don't fix
this in vmx_invlpg_intercept(), but instead have paging_invlpg() never
return true in such a case.
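A sketch of the check, assuming 48-bit virtual addresses: an address is canonical when bits 63..47 are all zeros or all ones. The demo_ names are hypothetical, and demo_paging_invlpg() only models the idea that the paging layer never asks for a flush of a non-canonical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_VADDR_BITS 48 /* assumed virtual address width */

/* Canonical: bits 63..47 are a sign extension of bit 47. */
bool demo_is_canonical(uint64_t addr)
{
    uint64_t top = addr >> (DEMO_VADDR_BITS - 1);

    return top == 0 || top == (UINT64_MAX >> (DEMO_VADDR_BITS - 1));
}

/*
 * Model of the hook's contract: "true" means the caller must flush the
 * address. For a non-canonical address the answer is always "false",
 * so the per-address invalidation is never attempted.
 */
bool demo_paging_invlpg(uint64_t addr, bool mode_wants_flush)
{
    return demo_is_canonical(addr) && mode_wants_flush;
}
```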
This is CVE-2016-1571 / XSA-168.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Roger Pau Monne [Tue, 19 Jan 2016 17:47:19 +0000 (17:47 +0000)]
x86/HVM: memset CPU context save area
This prevents leaking data in the padding field. Also remove the
memset done to the fpu_regs in case of no FPU context present, since it's
already taken care of by the memset of the whole CPU context structure. The
same applies to setting ctxt.flags to 0 in case there's no FPU context.
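The pattern, sketched with a hypothetical and much smaller structure than the real CPU context record: clear the whole structure once, then fill in only what is present, so neither padding nor unused fields carry stale stack contents into the saved image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical save record; the compiler pads after "flags". */
struct demo_cpu_ctxt {
    uint8_t  flags;
    uint64_t rip;
    uint8_t  fpu_regs[16];
};

/* Serialise a context into out[], which holds sizeof(struct) bytes. */
void demo_save_ctxt(uint8_t *out, uint64_t rip, const uint8_t *fpu)
{
    struct demo_cpu_ctxt ctxt;

    assert(out != NULL);
    /* Zero everything up front: fields and padding alike. */
    memset(&ctxt, 0, sizeof(ctxt));

    ctxt.rip = rip;
    if (fpu != NULL) {
        memcpy(ctxt.fpu_regs, fpu, sizeof(ctxt.fpu_regs));
        ctxt.flags = 1;
    }
    /* No else branch: flags and fpu_regs are already zero. */

    memcpy(out, &ctxt, sizeof(ctxt));
}
```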
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Ian Campbell [Tue, 19 Jan 2016 11:56:50 +0000 (11:56 +0000)]
docs: correct descriptions of gnttab_max_{, maptrack}_frames
gnttab_max_frames incorrectly referred to numbers of grant tab
operations and gnttab_max_maptrack_frames was confusingly worded.
Add the default for gnttab_max_frames while here (it's currently the
same on all arches since no arch uses the available arch override) and
adjust the default for gnttab_max_maptrack_frames to match the normal
form.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Boris Ostrovsky [Thu, 7 Jan 2016 22:19:07 +0000 (17:19 -0500)]
libxc: Defer initialization of start_page for HVM guests
With commit 8c45adec18e0 ("libxc: create unmapped initrd in domain
builder if supported") location of ramdisk may not be available to
HVMlite guests by the time alloc_magic_pages_hvm() is invoked if the
guest supports unmapped initrd.
So let's move ramdisk info initialization (along with a few other
operations that are not directly related to allocating magic/special
pages) from alloc_magic_pages_hvm() to bootlate_hvm().
Since we now split allocation and mapping of the start_info segment
let's stash it, along with cmdline length, in xc_dom_image so that we
can check whether we are mapping a correctly-sized range.
We can also stop using xc_dom_image.start_info_pfn and leave it for
PV(H) guests only.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Haozhong Zhang [Tue, 19 Jan 2016 15:07:59 +0000 (16:07 +0100)]
svm: remove redundant TSC scaling in svm_set_tsc_offset()
Now every caller passes an already scaled offset to
svm_set_tsc_offset(), so it's not necessary to recalculate a scaled TSC
offset in svm_set_tsc_offset().
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Haozhong Zhang [Tue, 19 Jan 2016 15:07:39 +0000 (16:07 +0100)]
x86/time: scale host TSC in pvclock properly
This patch makes the pvclock return the scaled host TSC and
corresponding scaling parameters to HVM domains if guest TSC is not
emulated and TSC scaling is enabled.
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Haozhong Zhang [Tue, 19 Jan 2016 15:07:15 +0000 (16:07 +0100)]
x86/hvm: scale host TSC when setting/getting guest TSC
The existing hvm_[set|get]_guest_tsc_fixed() calculate the guest TSC by
adding the TSC offset to the host TSC. When the TSC scaling is enabled,
the host TSC should be scaled first. This patch adds the scaling logic
to those two functions.
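In other words, guest TSC = scale(host TSC) + offset when scaling is enabled, and the offset to program for a desired guest TSC is computed against the scaled host value. The sketch below uses hypothetical demo_ names and an assumed 32.32 fixed-point ratio; real hardware uses vendor-specific ratio formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Multiply tsc by a 32.32 fixed-point ratio without a 128-bit type:
 * tsc * ratio / 2^32
 *   = tsc * int_part + tsc_hi * frac_part + (tsc_lo * frac_part) / 2^32
 * The result wraps modulo 2^64, like the TSC itself.
 */
uint64_t demo_scale_tsc(uint64_t tsc, uint64_t ratio)
{
    uint64_t int_part = ratio >> 32, frac_part = ratio & 0xffffffffu;
    uint64_t tsc_hi = tsc >> 32, tsc_lo = tsc & 0xffffffffu;

    return tsc * int_part + tsc_hi * frac_part +
           ((tsc_lo * frac_part) >> 32);
}

/* "get": scale the host TSC first, then add the offset. */
uint64_t demo_get_guest_tsc(uint64_t host_tsc, uint64_t ratio,
                            uint64_t offset, bool scaling)
{
    uint64_t tsc = scaling ? demo_scale_tsc(host_tsc, ratio) : host_tsc;

    return tsc + offset;
}

/* "set": the offset that makes the guest see guest_tsc right now. */
uint64_t demo_offset_for_guest_tsc(uint64_t host_tsc, uint64_t ratio,
                                   uint64_t guest_tsc, bool scaling)
{
    uint64_t tsc = scaling ? demo_scale_tsc(host_tsc, ratio) : host_tsc;

    return guest_tsc - tsc;
}
```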
Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Paul Durrant [Tue, 12 Jan 2016 09:58:56 +0000 (09:58 +0000)]
public/io/netif.h: document control ring and toeplitz hashing
This patch documents a new shared ring between frontend and backend that
can be used to pass bulk out-of-band data, such as that required to
implement toeplitz hashing in the backend such that it is configurable by
the frontend (which is needed to support NDIS RSS for Windows guests).
The patch then goes on to document the messages passed over the control
ring that can be used to configure toeplitz hashing and a new extra info
fragment that can be used to pass hash values between frontend and
backend for both transmit and receive packets.
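For readers unfamiliar with the hash, a generic Toeplitz hash can be sketched as below: for every set bit of the input, the 32-bit window of the key starting at that bit position is XORed into the result. This is a plain illustration with a hypothetical function name; how the key and input are laid out on the control ring is defined by netif.h, not by this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic Toeplitz hash; key bits beyond key_len are treated as zero. */
uint32_t demo_toeplitz_hash(const uint8_t *key, size_t key_len,
                            const uint8_t *data, size_t data_len)
{
    uint32_t hash = 0;
    uint32_t window; /* the 32 key bits aligned with the current bit */
    size_t next = 4; /* index of the next key byte to shift in */
    size_t i;
    int bit;

    assert(key != NULL && key_len >= 4);
    assert(data != NULL || data_len == 0);

    window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
             ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    for (i = 0; i < data_len; i++) {
        uint8_t kbyte = next < key_len ? key[next] : 0;

        next++;
        for (bit = 7; bit >= 0; bit--) {
            if (data[i] & (1u << bit))
                hash ^= window;
            /* Slide the window one key bit to the right. */
            window = (window << 1) | ((uint32_t)(kbyte >> bit) & 1u);
        }
    }
    return hash;
}
```

The hash is linear over XOR: hashing the XOR of two equal-length inputs gives the XOR of their hashes.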
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Keir Fraser <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Paul Durrant [Tue, 12 Jan 2016 09:58:55 +0000 (09:58 +0000)]
public/io/netif.h: clarifications to wire formats
My previous patch 03809ae7 "document transmit and receive wire formats
separately" improved documentation of the receive and transmit wire
formats but further clarifications were requested.
This patch adds those clarifications.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Keir Fraser <keir@xen.org> Cc: Tim Deegan <tim@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Doug Goldstein [Wed, 16 Dec 2015 14:41:56 +0000 (08:41 -0600)]
MAINTAINERS: add myself for kconfig
Added myself as the maintainer of kconfig.
CC: Ian Campbell <ian.campbell@citrix.com> CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Jan Beulich <jbeulich@suse.com> CC: Keir Fraser <keir@xen.org> CC: Tim Deegan <tim@xen.org> Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Juergen Gross [Mon, 18 Jan 2016 08:04:03 +0000 (09:04 +0100)]
tools: don't stop xenstore domain when stopping dom0
When restarting or shutting down dom0 the xendomains script tries to
stop all other domains. Don't do this for the xenstore domain, as it
might survive a dom0 reboot in the future.
The same applies to xl shutdown --all. Here the xenstore domain is
flagged as "never stop".
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Juergen Gross [Mon, 18 Jan 2016 08:04:02 +0000 (09:04 +0100)]
xenstore: write xenstore domain data to xenstore
After starting the xenstore domain write the basic data (domid, name
and memory values) to the xenstore. This makes the domain appear
correctly in xl list. Create a stub json object in order to make e.g.
xl list -l happy.
Add a new option to init-xenstore-domain to be able to specify the
domain's name.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Juergen Gross [Mon, 18 Jan 2016 08:03:59 +0000 (09:03 +0100)]
xenstore: make use of the "xenstore domain" flag
Create the xenstore domain with the xenstore flag specified. This
enables us to test whether such a domain is already running before
we create it. As there ought to be only one xenstore in the system
we don't need to start another one.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
init-xenstore-domain takes only positional parameters today. Change
this to a more flexible parameter syntax that allows additional
options to be specified and some to be omitted.
Juergen Gross [Mon, 18 Jan 2016 08:03:52 +0000 (09:03 +0100)]
libxl: provide a flag in dominfo to avoid stopping it
Add a "never_stop" flag to dominfo as an indicator for the toolstack that
this domain is to be kept running. For now it is set only for the xenstore
domain, but there might be other domains in the future.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
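A minimal sketch of how a toolstack might honour such a flag when asked to stop every domain. The struct and function names below are hypothetical stand-ins, not the libxl types, and skipping domain 0 is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-domain info the toolstack reads. */
struct dom_info {
    uint32_t domid;
    bool never_stop;   /* set for e.g. the xenstore domain */
};

/*
 * Collect the domids that a "stop everything" request may stop.
 * Domain 0 (assumption) and domains flagged never_stop are skipped.
 * Returns the number of domids written to out[].
 */
static size_t select_domains_to_stop(const struct dom_info *info, size_t n,
                                     uint32_t *out, size_t out_cap)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (info[i].domid == 0 || info[i].never_stop)
            continue;
        assert(count < out_cap);
        out[count++] = info[i].domid;
    }
    return count;
}
```

Keeping the decision in a flag on the domain info, rather than testing for the xenstore domain by name, means further domains can be exempted later without touching the callers.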
Ashwin Chaugule [Mon, 18 Jan 2016 13:54:08 +0000 (14:54 +0100)]
ACPI / table: add new function to get table entries
The acpi_table_parse() function has a callback that
passes a pointer to a table_header. Add a new function
which takes this pointer and parses its entries. This
eliminates the need to re-traverse all the tables for
each call. e.g. as in acpi_table_parse_madt() which is
normally called after acpi_table_parse().
Acked-by: Grant Likely <grant.likely@linaro.org> Signed-off-by: Ashwin Chaugule <ashwin.chaugule@linaro.org> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit f08bb472bff3c0397fb7d6f47bc5cec41dad76e3] Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
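The idea of parsing entries directly from an already-located table header can be sketched as follows. The layouts are deliberately simplified and every name here (table_header, subtable_header, parse_entries, sum_entry_lengths) is hypothetical; this is not the signature of the function added by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical layouts; real ACPI headers carry more fields. */
struct table_header {
    char     signature[4];
    uint32_t length;      /* total table length in bytes, header included */
};

struct subtable_header {
    uint8_t type;
    uint8_t length;       /* entry length in bytes, header included */
};

typedef int (*entry_handler)(const struct subtable_header *entry, void *ctx);

/*
 * Walk the entries of a table the caller already holds a pointer to.
 * fixed_size is the size of the fixed part preceding the first entry.
 * Returns the number of matching entries, or -1 on a malformed entry
 * or a handler failure.
 */
static int parse_entries(const struct table_header *table, size_t fixed_size,
                         uint8_t wanted_type, entry_handler handler, void *ctx)
{
    const uint8_t *base = (const uint8_t *)table;
    size_t off = fixed_size;
    int count = 0;

    assert(table != NULL && handler != NULL);

    while (off + sizeof(struct subtable_header) <= table->length) {
        struct subtable_header hdr;

        memcpy(&hdr, base + off, sizeof(hdr));
        /* A zero or overlong entry would loop forever or run off the end. */
        if (hdr.length < sizeof(hdr) || off + hdr.length > table->length)
            return -1;
        if (hdr.type == wanted_type) {
            if (handler((const struct subtable_header *)(base + off), ctx))
                return -1;
            count++;
        }
        off += hdr.length;
    }
    return count;
}

/* Example handler: accumulates the length of each visited entry. */
static int sum_entry_lengths(const struct subtable_header *entry, void *ctx)
{
    *(unsigned int *)ctx += entry->length;
    return 0;
}
```

Since the walk starts from the pointer handed to the callback, no second lookup of the table by signature is needed.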
Len Brown [Mon, 18 Jan 2016 13:53:41 +0000 (14:53 +0100)]
ACPI: check acpi_disabled in acpi_table_parse() and acpi_table_parse_entries()
Allow consumers of the acpi_table_parse()/acpi_table_parse_entries() API
to gracefully handle the acpi_disabled=1 case via return value
rather than checking the global flag themselves.
Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
[Linux commit e5b8fc6ac158f65598f58dba2c0d52ba3b412f52] Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
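A minimal sketch of the pattern described: the parse entry point tests the global flag itself and reports it through the return value. The names table_parse and accept_table are hypothetical, and the use of -ENODEV as the error value is an assumption for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical global mirroring the flag named in the commit message. */
static int acpi_disabled;

typedef int (*table_handler)(const void *table);

/*
 * Hypothetical parse entry point: refuses to run when ACPI is disabled,
 * so callers only need to check the return value.
 */
static int table_parse(const void *table, table_handler handler)
{
    assert(handler != NULL);

    if (acpi_disabled)
        return -ENODEV;   /* assumed error value */
    return handler(table);
}

/* Example handler that accepts any table. */
static int accept_table(const void *table)
{
    (void)table;
    return 0;
}
```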
Add an additional environment variable, defaulting to disabled,
that enables the CONFIG_EXPERT configuration option. The purpose
of the CONFIG_EXPERT configuration option is to make non-standard
Kconfig options visible during the configuration process. The
CONFIG_EXPERT option is not, itself, visible during the Kconfig
configuration process, so typical users will never see it nor
any of the non-standard configuration options.
Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Roger Pau Monné [Mon, 18 Jan 2016 13:52:31 +0000 (14:52 +0100)]
x86/hvm: don't set the BSP as initialised in hvm_vcpu_initialise
The BSP will be marked as initialised after hvm_load_cpu_ctxt has loaded the
initial state, which is called from the toolstack during domain creation.
Prior to my HVMlite series, HVM guests were started without setting any
explicit CPU state (in fact we placed that horrible jmp at 0x0, because the
IP was by default set to 0x0). This is no longer true, and now HVM guests
require that a proper CPU context is loaded before starting. This change
helps enforce this policy.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 18 Jan 2016 13:51:06 +0000 (14:51 +0100)]
Kconfig: fix environment variable handling
With xen/Makefile including include/config/auto.conf.cmd, environment
variables checked in the latter must be available at the time of
inclusion of that file, and hence must be populated in xen/Makefile
rather than by passing to or inside xen/tools/kconfig/Makefile.kconfig.
Otherwise incremental re-builds will always be full re-builds, which is
not only annoying but actively problematic when building as non-root
and only running "install-xen" as root.
Also take the opportunity and remove stray $(Q) uses.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Doug Goldstein [Fri, 15 Jan 2016 17:39:40 +0000 (11:39 -0600)]
tools: make FLASK utils build unconditional
The flask utilities only have dependencies on libxc so there's no
downside to always building it. Distros and projects based on Xen can
put these in a different package to not install them for all users.
Prior to this change FLASK_ENABLE needed to be set at the top level to
build the utilities and the tools/configure script would build the FLASK
policy by default, but only if the utilities were built.
This change makes item 3 from
http://lists.xenproject.org/archives/html/xen-devel/2016-01/msg01796.html
happen by default.
CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Paul Durrant [Fri, 15 Jan 2016 10:00:10 +0000 (10:00 +0000)]
libxl: create 'drivers', 'feature' and 'attr' xenstore paths
My recent patch series 'docs: Document xenstore paths' included 3
patches documenting new xenstore paths to allow PV drivers/agents in
guests to advertise version information, significant features and
attributes (such as assigned IP addresses).
This patch adds the necessary code to libxl to create these paths in
xenstore when a domain is created.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
xen/arm: Add r1p12 to the list of supported Cadence UARTs
Add r1p12 to the list of supported Cadence UARTs. Xen only
uses the subset of features available in r1p8, so we don't
need to differentiate between r1p8 and r1p12 yet.
Signed-off-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Juergen Gross [Mon, 4 Jan 2016 14:55:52 +0000 (15:55 +0100)]
libxl: base libxl_list_vm() on libxl_list_domain()
libxl_list_vm() currently calls xc_domain_getinfolist() with a limit
of 1024 domains. Rather than open coding a loop around
xc_domain_getinfolist() to get past the 1024 domain limit, just use
libxl_list_domain() instead.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monne [Thu, 14 Jan 2016 15:06:50 +0000 (16:06 +0100)]
libxl: fix _SC_GETPW_R_SIZE_MAX usage
If sysconf(_SC_GETPW_R_SIZE_MAX) fails for any reason just use an initial
buffer size of 2048. This is not a critical failure, and the code that
makes use of this buffer is able to expand it later if required.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
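The fallback described above can be sketched as follows. The helper names are hypothetical and this is not the libxl code; only sysconf(), _SC_GETPW_R_SIZE_MAX and the 2048-byte default come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

#define PW_BUF_FALLBACK 2048   /* initial size named in the commit message */

/* Map a sysconf() result to an initial buffer size (hypothetical helper). */
static size_t initial_pw_buf_size_from(long reported)
{
    /* sysconf() returns -1 on error or when the limit is indeterminate. */
    return reported > 0 ? (size_t)reported : PW_BUF_FALLBACK;
}

/* Initial buffer size for getpwnam_r()-style calls; never zero. */
static size_t initial_pw_buf_size(void)
{
    size_t size = initial_pw_buf_size_from(sysconf(_SC_GETPW_R_SIZE_MAX));

    assert(size > 0);
    return size;
}
```

The value is only a starting point: a caller that receives ERANGE from the lookup is expected to enlarge the buffer and retry.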
Since commit 5d3dc8671521ea4a4f753e77d3e7fb3a3a6f5f80
"tools: Refactor "xentoollog" into its own library",
the build fails with older Python versions (2.6.4)
if it is attempted twice (which
happens due to pygrub dependencies).
make -C python DESTDIR=/tmp
make -C python DESTDIR=/tmp
The second one will fail with:
error: -Wl, -rpath-link=../../tools/libs/toollog: No such file or directory
even though the directory is there (with the libs).
Andrew pointed out that the linker additions should be in
the "extra_link_args" rather than "depends".
And true enough - with that modification it builds.
CC: Ian Campbell <ian.campbell@citrix.com> CC: Ian Jackson <ian.jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Boris Ostrovsky <boris.ostrovsky@oracle.com> Suggested-by: Andrew Cooper <andrew.cooper3@citirx.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- typo in commit message ]
Tamas K Lengyel [Thu, 14 Jan 2016 09:49:50 +0000 (10:49 +0100)]
vm_event: add altp2m info to HVM events as well
Add altp2m information to HVM events as well when altp2m is active.
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 14 Jan 2016 09:42:53 +0000 (10:42 +0100)]
x86/xsave: simplify xcomp_bv initialization
This simplifies a number of pointless conditionals: Bits 0 and 1 of
xcomp_bv don't matter anyway, and as long as none of bits 2..62 are
set, setting bit 63 is pointless too unless XSAVES is in use.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
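The reasoning above can be sketched as a single conditional. This is an illustration under stated assumptions, not the patch itself: the function name is hypothetical, and the macro is defined locally to denote bit 63 of xcomp_bv, the bit that marks the compacted format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 63 of xcomp_bv marks the compacted format (defined locally here). */
#define XSTATE_COMPACTION_ENABLED (UINT64_C(1) << 63)

/*
 * Hypothetical helper: initial xcomp_bv for a fresh save area in which no
 * component bits (2..62) are set. Bit 63 is only worth setting when
 * XSAVES is in use; bits 0 and 1 are left clear.
 */
static uint64_t initial_xcomp_bv(bool xsaves_in_use)
{
    uint64_t bv = xsaves_in_use ? XSTATE_COMPACTION_ENABLED : 0;

    /* Nothing other than the format bit is ever set by this helper. */
    assert((bv & ~XSTATE_COMPACTION_ENABLED) == 0);
    return bv;
}
```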