xenbits.xensource.com Git - xen.git/log
Juergen Gross [Mon, 5 Dec 2016 07:48:52 +0000 (08:48 +0100)]
xenstore: add small default data buffer to internal struct

Instead of always allocating a data buffer for incoming or outgoing
xenstore wire data, add a small buffer to the buffered_data structure
of xenstored. This has the advantage that sending simple response
messages like errors or "OK" no longer requires allocating a data
buffer. This requires adding a memory context where the allocated
buffer was used for that purpose.

In order to avoid allocating a new buffered_data structure for each
response, reuse the structure of the original request. This in turn
avoids any new memory allocation when sending e.g. an ENOMEM
response, making it possible to send it at all. To do this, the
buffered_data structure for the incoming request must be allocated
when a new request is recognized instead of when a new connection
is accepted.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
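
The idea of the commit above can be sketched roughly as follows. This is
only an illustration: the struct name, the field names and the buffer size
DEFAULT_BUF_SIZE are assumptions, not the actual xenstored definitions.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical size; the commit message does not state the real value. */
#define DEFAULT_BUF_SIZE 16

/* Illustrative stand-in for xenstored's buffered_data, not its real layout. */
struct buffered_data_sketch {
    size_t len;                             /* payload length in bytes */
    char *buffer;                           /* default_buffer or heap memory */
    char default_buffer[DEFAULT_BUF_SIZE];  /* used for small payloads */
};

/* Store len bytes in bd; returns 0 on success, -1 if allocation fails. */
static int bd_set_payload(struct buffered_data_sketch *bd,
                          const void *data, size_t len)
{
    if (len <= sizeof(bd->default_buffer)) {
        bd->buffer = bd->default_buffer;    /* no allocation needed */
    } else {
        bd->buffer = malloc(len);
        if (!bd->buffer)
            return -1;
    }
    memcpy(bd->buffer, data, len);
    bd->len = len;
    return 0;
}

/* True if the payload lives in separately allocated memory. */
static int bd_uses_heap(const struct buffered_data_sketch *bd)
{
    return bd->buffer != bd->default_buffer;
}

/* Free heap storage, if any, and reset the structure. */
static void bd_release(struct buffered_data_sketch *bd)
{
    if (bd_uses_heap(bd))
        free(bd->buffer);
    bd->buffer = NULL;
    bd->len = 0;
}
```

With such a layout a short reply like "OK" or an error string fits into the
embedded buffer, so sending it cannot fail for lack of memory.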
Juergen Gross [Mon, 5 Dec 2016 07:48:51 +0000 (08:48 +0100)]
xenstore: add helper functions for wire argument parsing

The xenstore wire command argument parsing of the different commands
is repeating some patterns multiple times. Add some helper functions
to avoid the duplicated code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
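
As an illustration of the kind of helper meant here, the sketch below
splits a payload of NUL-terminated strings into an argument vector. The
function name split_args and its signature are hypothetical and are not
taken from the xenstored sources.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split a payload of len bytes holding NUL-terminated strings into at
 * most max pointers. Returns the number of complete strings found.
 * (Hypothetical helper for illustration only.)
 */
static unsigned int split_args(char *payload, size_t len,
                               char *vec[], unsigned int max)
{
    unsigned int n = 0;
    size_t off = 0;

    while (off < len && n < max) {
        char *end = memchr(payload + off, '\0', len - off);

        if (!end)
            break;              /* trailing unterminated data is ignored */
        vec[n++] = payload + off;
        off = (size_t)(end - payload) + 1;
    }
    return n;
}
```

A command handler could then check the returned count against the number of
arguments it expects instead of repeating the scanning loop itself.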
Juergen Gross [Mon, 5 Dec 2016 07:48:50 +0000 (08:48 +0100)]
xenstore: make functions static

Move functions used in only one source to the file where they are used
and make them static.

Remove some prototypes from headers which are no longer in use.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 5 Dec 2016 07:48:49 +0000 (08:48 +0100)]
xenstore: let command functions return error or success

Add a return value to all wire command functions of xenstored. If such
a function returns an error send the error message in
process_message().

Only code refactoring, no change in behavior expected.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 5 Dec 2016 07:48:48 +0000 (08:48 +0100)]
xenstore: use array for xenstore wire command handling

Instead of switch() statements for selecting wire command actions use
an array for this purpose.

While doing this add the XS_RESTRICT type for diagnostic prints and
correct the printed string for XS_IS_DOMAIN_INTRODUCED.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
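
A table-driven dispatcher of the kind described above might look like the
following sketch. The command numbers, handler names and the ENOSYS
fallback are assumptions made for the example; they are not the real
xs_wire.h values or xenstored functions.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers for illustration only. */
enum cmd_type { CMD_DEBUG, CMD_DIRECTORY, CMD_READ, CMD_TYPE_COUNT };

/* Each handler returns 0 on success or an errno value. */
typedef int (*cmd_func)(const char *payload);

static int do_debug(const char *payload)     { (void)payload; return 0; }
static int do_directory(const char *payload) { return payload ? 0 : EINVAL; }
static int do_read(const char *payload)      { return payload ? 0 : EINVAL; }

struct cmd_entry {
    const char *name;   /* used for diagnostic prints */
    cmd_func func;      /* NULL would mean "no handler" */
};

/* One table replaces both the dispatch switch and the name switch. */
static const struct cmd_entry cmd_table[CMD_TYPE_COUNT] = {
    [CMD_DEBUG]     = { "DEBUG",     do_debug },
    [CMD_DIRECTORY] = { "DIRECTORY", do_directory },
    [CMD_READ]      = { "READ",      do_read },
};

static const char *cmd_name(unsigned int type)
{
    if (type >= CMD_TYPE_COUNT || !cmd_table[type].name)
        return "**UNKNOWN**";
    return cmd_table[type].name;
}

/* Returns 0 or the errno value the caller would report to the client. */
static int dispatch(unsigned int type, const char *payload)
{
    if (type >= CMD_TYPE_COUNT || !cmd_table[type].func)
        return ENOSYS;
    return cmd_table[type].func(payload);
}
```

Because every handler returns an error code, the caller can send the error
reply in one place rather than in each handler.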
Juergen Gross [Mon, 5 Dec 2016 07:48:47 +0000 (08:48 +0100)]
xenstore: support XS_DIRECTORY_PART in libxenstore

This will enable all users of libxenstore to handle xenstore nodes
with a huge amount of children.

In order to not depend completely on the XS_DIRECTORY_PART
functionality use it only in case of E2BIG returned by XS_DIRECTORY.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 5 Dec 2016 07:48:46 +0000 (08:48 +0100)]
xenstore: add support for reading directory with many children

As the payload size for one xenstore wire command is limited to 4096
bytes it is impossible to read the children names of a node with a
large number of children (e.g. /local/domain in case of a host with
more than about 2000 domains). This effectively limits the maximum
number of domains a host can support.

In order to support such long directory outputs add a new wire command
XS_DIRECTORY_PART which will return only some entries in each call and
can be called in a loop to get all entries.

Input data are the path of the node and the byte offset into the child
list where returned data should start.

Output is the generation count of the node (which changes each time
the node is modified) and a list of child names starting at the
specified offset. The end of the list is indicated by an empty
child name. It is the responsibility of the caller to check for data
consistency by verifying that the generation counts of all returned
data sets for one node are the same.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
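
The calling convention described above can be sketched with a simulated
server and a client loop. Everything here is an assumption made for
illustration: fake_node, read_part() and read_all_children() are
hypothetical names, and the in-memory node replaces the real wire exchange.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory node standing in for the xenstored side. */
struct fake_node {
    uint64_t generation;     /* changes whenever the node is modified */
    const char *children;    /* NUL-terminated child names, back to back */
    size_t children_len;     /* total bytes including each name's NUL */
};

/*
 * Simulated server: copy whole child names starting at byte offset off
 * into out (capacity cap, at least 2). When the end of the list is
 * reached, append an empty name. Returns the number of bytes written.
 */
static size_t read_part(const struct fake_node *n, size_t off,
                        char *out, size_t cap, uint64_t *gen)
{
    size_t used = 0;

    *gen = n->generation;
    while (off < n->children_len) {
        size_t len = strlen(n->children + off) + 1;

        if (used + len > cap - 1)    /* keep room for the end marker */
            break;
        memcpy(out + used, n->children + off, len);
        used += len;
        off += len;
    }
    if (off >= n->children_len)
        out[used++] = '\0';          /* empty name marks the end */
    return used;
}

/*
 * Client loop: fetch all chunks and restart from offset 0 if the
 * generation changes between chunks. Returns a malloc'ed list of
 * NUL-terminated names and its length in *total, or NULL on failure.
 */
static char *read_all_children(const struct fake_node *n, size_t chunk,
                               size_t *total)
{
    char *list = NULL, *part;
    size_t off = 0;
    uint64_t first_gen = 0, gen;

    if (chunk < 2 || !(part = malloc(chunk)))
        return NULL;
    for (;;) {
        size_t got = read_part(n, off, part, chunk, &gen);
        size_t names;
        int done;
        char *tmp;

        if (got == 0)
            goto fail;               /* a single name exceeds the chunk */
        if (off == 0) {
            first_gen = gen;
        } else if (gen != first_gen) {
            off = 0;                 /* node changed: start over */
            continue;
        }
        /* A lone NUL, or two NULs in a row, is the empty end marker. */
        done = got == 1 || part[got - 2] == '\0';
        names = done ? got - 1 : got;
        tmp = realloc(list, off + names + 1);
        if (!tmp)
            goto fail;
        list = tmp;
        memcpy(list + off, part, names);
        off += names;
        if (done)
            break;
    }
    free(part);
    *total = off;
    return list;
fail:
    free(part);
    free(list);
    return NULL;
}
```

The restart on a changed generation count is one possible way for a caller
to meet the consistency requirement; a real client may choose differently.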
Juergen Gross [Mon, 5 Dec 2016 07:48:45 +0000 (08:48 +0100)]
xenstore: add per-node generation counter

In order to be able to support reading the list of a node's children in
multiple chunks (needed for list sizes > 4096 bytes) without having to
allocate a temporary buffer we need some kind of generation counter for
each node. This will help to recognize a node has changed between
reading two chunks.

As removing a node and reintroducing it must result in different
generation counts each generation value has to be globally unique. This
can be ensured only by using a global 64 bit counter.

For handling of transactions there is already such a counter available,
it just has to be expanded to 64 bits and must be stored in each
modified node.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
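
A minimal sketch of the uniqueness argument, using hypothetical names
(node_sketch, node_mark_modified) rather than the real xenstored ones:
every modification takes the next value of one global 64 bit counter, so
a node that is removed and created again can never repeat an old value.

```c
#include <stdint.h>

/* Hypothetical node type; only the field relevant here is shown. */
struct node_sketch {
    uint64_t generation;
};

/* One global counter so that no two modifications share a value. */
static uint64_t global_generation;

/* Stamp the node with a fresh, globally unique generation value. */
static void node_mark_modified(struct node_sketch *node)
{
    node->generation = ++global_generation;
}
```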
Juergen Gross [Mon, 5 Dec 2016 07:48:44 +0000 (08:48 +0100)]
xenstore: use common tdb record header in xenstore

The layout of the tdb record of xenstored is defined at multiple
places: read_node(), write_node() and in xs_tdb_dump.c

Use a common structure instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 5 Dec 2016 07:48:43 +0000 (08:48 +0100)]
xenstore: call add_change_node() directly when writing node

Instead of calling add_change_node() at places where write_node() is
called, do that inside write_node().

Note that there is now one case where add_change_node() is called
although a later failure prevents the changed node from being written:
when write_node() fails due to an error in tdb_store(). The only
visible change of behavior is a stale event fired for the node, and a
failing tdb_store() signals a corrupted xenstore database, so the
stale event is the least of the problems in this case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Mon, 5 Dec 2016 07:48:42 +0000 (08:48 +0100)]
xenstore: modify add_change_node() parameter types

In order to prepare adding a generation count to each node modify
add_change_node() to take the connection pointer and a node pointer
instead of the transaction pointer and node name as parameters. This
requires moving the call of add_change_node() from do_rm() to
delete_node_single().

While at it correct the comment for the prototype: there is no
longjmp() involved.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:41 +0000 (16:08 +0100)]
libxl/libxl_xshelp.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:40 +0000 (16:08 +0100)]
libxl/libxl_x86.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:39 +0000 (16:08 +0100)]
libxl/libxl_vtpm.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:38 +0000 (16:08 +0100)]
libxl/libxl_vnuma.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:37 +0000 (16:08 +0100)]
libxl/libxl_stream_write.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:36 +0000 (16:08 +0100)]
libxl/libxl_save_callout.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:35 +0000 (16:08 +0100)]
libxl/libxl_remus.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:34 +0000 (16:08 +0100)]
libxl/libxl_qmp.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:33 +0000 (16:08 +0100)]
libxl/libxl_pvusb.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:32 +0000 (16:08 +0100)]
libxl/libxl_psr.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:31 +0000 (16:08 +0100)]
libxl/libxl_pci.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:30 +0000 (16:08 +0100)]
libxl/libxl_no_colo.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:29 +0000 (16:08 +0100)]
libxl/libxl_nic.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:28 +0000 (16:08 +0100)]
libxl/libxl_netbuffer.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:27 +0000 (16:08 +0100)]
libxl/libxl_netbsd.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:26 +0000 (16:08 +0100)]
libxl/libxl_linux.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:25 +0000 (16:08 +0100)]
libxl/libxl_internal.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:24 +0000 (16:08 +0100)]
libxl/libxl_freebsd.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:23 +0000 (16:08 +0100)]
libxl/libxl_dom_suspend.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:22 +0000 (16:08 +0100)]
libxl/libxl_dom_save.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:21 +0000 (16:08 +0100)]
libxl/libxl_dm.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:20 +0000 (16:08 +0100)]
libxl/libxl_device.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:19 +0000 (16:08 +0100)]
libxl/libxl_create.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:18 +0000 (16:08 +0100)]
libxl/libxl_colo_save.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:17 +0000 (16:08 +0100)]
libxl/libxl_colo_restore.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:16 +0000 (16:08 +0100)]
libxl/libxl_colo_qdisk.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:15 +0000 (16:08 +0100)]
libxl/libxl_colo_proxy.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:14 +0000 (16:08 +0100)]
libxl/libxl_colo_nic.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:13 +0000 (16:08 +0100)]
libxl/libxl_colo.h: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:12 +0000 (16:08 +0100)]
libxl/libxl_checkpoint_device.c: used LOG*D functions

Use LOG*D logging functions where possible instead of the LOG* ones.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:11 +0000 (16:08 +0100)]
libxl/libxl_bootloader.c: used LOG*D functions

Use LOG*D functions to output the domain ID in logs as much as
possible. This will help consumer code sorting the logs by
domain.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:10 +0000 (16:08 +0100)]
libxl.c: switch to LOG*D use

Use LOG*D functions to output the domain ID in logs as much as
possible. This will help consumer code sorting the logs by
domain.

This commit includes all LOG* to LOG*D changes where the domain
ID is not just a domid variable.

We want the domain ID provided to the LOG*D functions to be the
one of the publicly known domain, not a stubdom one.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:09 +0000 (16:08 +0100)]
libxl.c: switch to LOG*D use (refactored messages)

Use LOG*D functions to output the domain ID in logs as much as
possible. This will help consumer code sorting the logs by
domain.

This commit only changes LOG*() into LOG*D() and adds a domid
parameter. The messages of these LOG* calls have been altered to
remove the domain ID, since it is already contained in the
output log string.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:08 +0000 (16:08 +0100)]
libxl.c: switch to LOG*D use

Use LOG*D functions to output the domain ID in logs as much as
possible. This will help consumer code sorting the logs by
domain.

libxl.c changes have been split into 3 commits to help review
them and isolate more instances that could be problematic.

This commit only changes LOG*() into LOG*D() and adds a domid
parameter.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cedric Bosdonnat [Fri, 2 Dec 2016 15:08:07 +0000 (16:08 +0100)]
libxl: add LIBXL_LOGD_* and LOG*D function families.

These functions should be used to log messages when the domain
ID is known. libxl__log will now prefix the log message with
"Domain %PRIu32:" if the domain ID is a valid one.

This aims at helping consumers filter logs on domain IDs.

Signed-off-by: Cédric Bosdonnat <cbosdonnat@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
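
The prefixing behaviour can be illustrated with a small formatter. This is
not the libxl__log implementation: format_log() and the sentinel
SKETCH_INVALID_DOMID are hypothetical, and only the "Domain %PRIu32:"
prefix is taken from the commit message.

```c
#include <inttypes.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sentinel for "no valid domain"; libxl's real check differs. */
#define SKETCH_INVALID_DOMID UINT32_MAX

/*
 * Format a log line into buf, prefixing "Domain <id>:" for valid domids.
 * Returns the length of the full line, or -1 on error.
 */
static int format_log(char *buf, size_t size, uint32_t domid,
                      const char *fmt, ...)
{
    va_list ap;
    int off = 0, n;

    if (domid != SKETCH_INVALID_DOMID) {
        off = snprintf(buf, size, "Domain %" PRIu32 ":", domid);
        if (off < 0 || (size_t)off >= size)
            return -1;               /* prefix did not fit */
    }
    va_start(ap, fmt);
    n = vsnprintf(buf + off, size - (size_t)off, fmt, ap);
    va_end(ap);
    return n < 0 ? -1 : off + n;
}
```

A log consumer can then filter on the fixed prefix instead of parsing the
free-form message text.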
Juergen Gross [Tue, 8 Nov 2016 09:09:41 +0000 (10:09 +0100)]
stubdom: remove EXTRA_CFLAGS meant for building tools

When building stubdoms EXTRA_CFLAGS_XEN_TOOLS and
EXTRA_CFLAGS_QEMU_TRADITIONAL should be cleared as they might contain
flags not suitable for all stubdom builds (e.g. "-m64" often to be
found in $RPM_OPT_FLAGS will break building 32 bit stubdoms).

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Juergen Gross [Tue, 8 Nov 2016 08:29:11 +0000 (09:29 +0100)]
build system: don't let install-stubdom depend on install-tools

There is no reason for the install-stubdom target to depend on
install-tools. It is absolutely reasonable to install new stubdoms
only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Juergen Gross [Fri, 4 Nov 2016 09:53:29 +0000 (10:53 +0100)]
stubdom: simplify and fix Makefile

The stubdom Makefile sets up links for various libraries. This is
done only once, when the qemu links are created, and each library's
links are updated/created only if the link for the library's Makefile
doesn't already exist. If a source file is added to a library after
the first make of stubdom, the new source won't be linked by a
subsequent call of make.

Instead of testing the existence of the Makefile link use a make
dependency which will catch changes of the linked Makefile, too.

At the same time don't repeat the same link pattern 7 times but use a
make macro to do the linking.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
[ wei: move "touch $@" to correct location in do_links ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Thu, 13 Oct 2016 14:33:15 +0000 (15:33 +0100)]
flask: add gcov_op check

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Wei Liu [Wed, 5 Oct 2016 14:29:59 +0000 (15:29 +0100)]
gcov: provide the capability to select gcov format automatically

And make it the default in Kconfig.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Wed, 5 Oct 2016 14:25:42 +0000 (15:25 +0100)]
Config.mk: introduce cc-ifversion

It returns a different string depending on the compiler version.

No user yet.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Wed, 5 Oct 2016 13:48:58 +0000 (14:48 +0100)]
Config.mk: expand cc-ver a bit

... so that we can do other comparisons as well.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Mon, 3 Oct 2016 13:38:13 +0000 (14:38 +0100)]
gcov: userspace tools to extract and split gcov data

Provide two tools: a small C program to extract data from hypervisor and
a python script to split data into multiple files.

The file xencov.c is salvaged and modified from the original xencov.c.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Thu, 29 Sep 2016 20:10:53 +0000 (21:10 +0100)]
gcov: add new interface and new formats support

Add a new sysctl interface for passing gcov data back to userspace. The
new interface uses a customised record file format. The new sysctl reuses
the original sysctl number but renames the op to gcov_op.

Formats starting from gcc version 3.4 are supported. The code is
rewritten so that a new format can be easily added in the future.
Version specific code is grouped into different files. The format one
needs to use can be picked via Kconfig. The default format is the newest
one.

Userspace programs to handle extracted data will come in a later patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 29 Sep 2016 17:38:30 +0000 (18:38 +0100)]
xen, tools: rip out old gcov implementation

The internal data structure and code are tied to an old gcov format.
It's easier to just redo everything from scratch.

Salvage the reusable parts: leave xen/common/gcov and an empty Makefile
there, leave gcov support in Kconfig but mark that as broken. Also
reserve the sysctl number for later use (but delete relevant sysctl
structures).

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Mon, 3 Oct 2016 17:33:16 +0000 (18:33 +0100)]
xen: delete gcno files in clean target

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Thu, 29 Sep 2016 17:40:52 +0000 (18:40 +0100)]
Kconfig: add BROKEN config

Used to hide a feature that is completely broken.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 1 Jul 2016 17:29:46 +0000 (18:29 +0100)]
x86/emul: Use system-segment relative memory accesses

With hvm_virtual_to_linear_addr() capable of doing proper system-segment
relative memory accesses, avoid open-coding the address and limit calculations
locally.

When a table spans the 4GB boundary (32bit) or non-canonical boundary (64bit),
segmentation errors are now raised.  Previously, the use of x86_seg_none
resulted in segmentation being skipped, and the linear address being truncated
through the pagewalk, and possibly coming out valid on the far side.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
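
The 32bit wrap-around case can be illustrated with a simplified check. This
is a sketch only and not the logic of hvm_virtual_to_linear_addr(): the
names seg_sketch and seg_to_linear32 are hypothetical, and access rights,
expand-down segments and the 64bit canonical check are all left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified system-segment view: base and byte-granular limit only. */
struct seg_sketch {
    uint32_t base;
    uint32_t limit;   /* highest valid offset */
};

/*
 * Compute the linear address for an access of "bytes" bytes at "offset".
 * Fails if the access exceeds the limit, or if its last byte would lie
 * beyond the 4GB boundary instead of being silently truncated.
 */
static bool seg_to_linear32(const struct seg_sketch *seg, uint32_t offset,
                            uint32_t bytes, uint32_t *linear)
{
    /* 64 bit arithmetic so that the sums below cannot wrap. */
    uint64_t last = (uint64_t)offset + bytes - 1;

    if (bytes == 0 || last > seg->limit)
        return false;                   /* outside the segment limit */
    if ((uint64_t)seg->base + last > UINT32_MAX)
        return false;                   /* spans the 4GB boundary */
    *linear = seg->base + offset;
    return true;
}
```

With a table based just below 4GB, an entry near the end of the table fails
the check instead of being read from a low address after truncation.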
Andrew Cooper [Thu, 30 Jun 2016 22:55:33 +0000 (23:55 +0100)]
x86/emul: Prepare to allow use of system segments for memory references

All system segments (GDT/IDT/LDT and TR) describe a linear address and limit,
and act similarly to user segments.  However, all current uses of these tables
in the emulator open-code the address calculations and limit checks.  In
particular, no care is taken for accesses which wrap around the 4GB or
non-canonical boundaries.

Alter hvm_virtual_to_linear_addr() to cope with performing segmentation checks
on system segments.  This involves restricting access checks in the 32bit case
to user segments only, and adding presence/limit checks in the 64bit case.

When suffering a segmentation fault for a system segment, return
X86EMUL_EXCEPTION but leave the fault injection to the caller.  The fault type
depends on the higher level action being performed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Tue, 1 Nov 2016 20:02:35 +0000 (20:02 +0000)]
x86/hvm: Avoid __hvm_copy() raising #PF behind the emulators back

Drop the call to hvm_inject_page_fault() in __hvm_copy(), and require callers
to inject the pagefault themselves.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Wed, 23 Nov 2016 11:11:23 +0000 (11:11 +0000)]
x86/hvm: Rename hvm_copy_*_guest_virt() to hvm_copy_*_guest_linear()

The functions use linear addresses, not virtual addresses, as no segmentation
is used.  (Lots of other code in Xen makes this mistake.)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Wed, 2 Nov 2016 11:49:25 +0000 (11:49 +0000)]
x86/hvm: Reimplement hvm_copy_*_nofault() in terms of no pagefault_info

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Andrew Cooper [Tue, 1 Nov 2016 20:49:25 +0000 (20:49 +0000)]
x86/hvm: Extend the hvm_copy_*() API with a pagefault_info pointer

which is filled with pagefault information should one occur.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
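
The shape of such an API can be sketched as below. All names here
(pf_info_sketch, copy_from_guest_linear, the toy guest_mem array and the
result codes) are assumptions made for illustration and do not reproduce
the Xen definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum copy_result { COPY_OKAY, COPY_BAD_LINEAR };

/* Hypothetical fault record; the field names are assumptions. */
struct pf_info_sketch {
    uint64_t linear;   /* faulting linear address */
    uint32_t ec;       /* page fault error code */
};

/* Toy "guest memory": linear addresses [0, sizeof(guest_mem)) are mapped. */
static unsigned char guest_mem[64];

/*
 * Copy len bytes from guest linear address addr. On failure, describe
 * the fault in *pfinfo (if non-NULL) instead of injecting it; the caller
 * decides what to do with it.
 */
static enum copy_result copy_from_guest_linear(void *dst, uint64_t addr,
                                               size_t len,
                                               struct pf_info_sketch *pfinfo)
{
    if (addr > sizeof(guest_mem) || len > sizeof(guest_mem) - addr) {
        if (pfinfo) {
            /* First unmapped byte touched by the access. */
            pfinfo->linear = addr < sizeof(guest_mem)
                             ? sizeof(guest_mem) : addr;
            pfinfo->ec = 0;
        }
        return COPY_BAD_LINEAR;
    }
    memcpy(dst, guest_mem + addr, len);
    return COPY_OKAY;
}
```

Passing NULL for pfinfo gives a variant in which fault details are simply
discarded, which is one way to read the "nofault" helpers mentioned in the
neighbouring commit.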
Andrew Cooper [Fri, 25 Nov 2016 15:20:44 +0000 (15:20 +0000)]
x86/shadow: Avoid raising faults behind the emulators back

Use x86_emul_{hw_exception,pagefault}() rather than
{pv,hvm}_inject_page_fault() and hvm_inject_hw_exception() to cause raised
faults to be known to the emulator.  This requires altering the callers of
x86_emulate() to properly re-inject the event.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 24 Nov 2016 18:18:36 +0000 (18:18 +0000)]
x86/pv: Avoid raising faults behind the emulators back

Use x86_emul_pagefault() rather than pv_inject_page_fault() to cause raised
pagefaults to be known to the emulator.  This requires altering the callers of
x86_emulate() to properly re-inject the event.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 1 Nov 2016 19:50:47 +0000 (19:50 +0000)]
x86/emul: Avoid raising faults behind the emulators back

Introduce a new x86_emul_pagefault() similar to x86_emul_hw_exception(), and
use this instead of hvm_inject_page_fault() from emulation codepaths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 26 Sep 2016 16:13:14 +0000 (16:13 +0000)]
x86/hvm: Reposition the modification of raw segment data from the VMCB/VMCS

Intel VT-x and AMD SVM provide access to the full segment descriptor cache via
fields in the VMCB/VMCS.  However, the bits which are actually checked by
hardware and preserved across vmentry/exit are inconsistent, and the vendor
accessor functions perform inconsistent modification to the raw values.

Convert {svm,vmx}_{get,set}_segment_register() into raw accessors, and alter
hvm_{get,set}_segment_register() to cook the values consistently.  This allows
the common emulation code to better rely on finding architecturally-expected
values.

While moving the code performing the cooking, fix the %ss.db quirk.  A NULL
selector is indicated by .p being clear, not the value of the .type field.

This does cause some functional changes because of the modifications being
applied uniformly.  A side effect of this fixes latent bugs where
vmx_set_segment_register() didn't correctly fix up .G for segments, and
inconsistent fixing up of the GDTR/IDTR limits.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
8 years agox86/vmx: Use hvm_{get,set}_segment_register() rather than vmx_{get,set}_segment_regis...
Andrew Cooper [Tue, 27 Sep 2016 17:21:20 +0000 (18:21 +0100)]
x86/vmx: Use hvm_{get,set}_segment_register() rather than vmx_{get,set}_segment_register()

No functional change at this point, but this is a prerequisite for forthcoming
functional changes.

Make vmx_get_segment_register() private to vmx.c like all the other Vendor
get/set functions.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years agox86/emul: Rework emulator event injection
Andrew Cooper [Mon, 7 Nov 2016 13:14:03 +0000 (13:14 +0000)]
x86/emul: Rework emulator event injection

The emulator needs to gain an understanding of interrupts and exceptions
generated by its actions.

Move hvm_emulate_ctxt.{exn_pending,trap} into struct x86_emulate_ctxt so they
are visible to the emulator.  This removes the need for the
inject_{hw_exception,sw_interrupt}() hooks, which are dropped and replaced
with x86_emul_{hw_exception,software_event,reset_event}() instead.

For exceptions raised by x86_emulate() itself (rather than its callbacks), the
shadow pagetable and PV uses of x86_emulate() previously failed with
X86EMUL_UNHANDLEABLE due to the lack of inject_*() hooks.

This behaviour has changed, and such cases will now return X86EMUL_EXCEPTION
with event_pending set.  Until the callers of x86_emulate() have been updated
to inject events back into the guest, divert the event_pending case back into
the X86EMUL_UNHANDLEABLE path to maintain the same guest-visible behaviour.

No overall functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Remove opencoded exception generation
Andrew Cooper [Wed, 2 Nov 2016 15:59:49 +0000 (15:59 +0000)]
x86/emul: Remove opencoded exception generation

Introduce generate_exception() for unconditional exception generation, and
replace existing uses.  Both generate_exception() and generate_exception_if()
are updated to make their error code parameters optional, which removes the
use of the -1 sentinel.
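
As an illustration only, one portable way to make a trailing error code
optional in a C macro is to append a default and keep just the first two
arguments.  The names raise_exception(), record_exception(), FIRST_TWO() and
NO_ERROR_CODE are hypothetical and are not the emulator's actual macros.

```c
/* Hypothetical default meaning "this exception carries no error code". */
#define NO_ERROR_CODE (-1)

struct pending_exception {
    int vector;
    int error_code;
};

static struct pending_exception last_exception;

/* Hypothetical sink: remembers the most recently raised exception. */
static void record_exception(int vector, int error_code)
{
    last_exception.vector = vector;
    last_exception.error_code = error_code;
}

/* Keep the first two arguments and discard whatever follows. */
#define FIRST_TWO(a, b, ...) (a), (b)

/*
 * raise_exception(vec) or raise_exception(vec, ec).
 * The default is appended after the caller's arguments, so it is only
 * picked up as the second argument when the caller omitted the error code.
 * The extra trailing 0 keeps the variadic part non-empty in both cases.
 */
#define raise_exception(...) \
    record_exception(FIRST_TWO(__VA_ARGS__, NO_ERROR_CODE, 0))
```

Callers then write raise_exception(6) or raise_exception(13, 0) and never
spell out the default themselves.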

The ioport_access_check() check loses the presence check for %tr, as the x86
architecture has no concept of a non-usable task register.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <JBeulich@suse.com>
8 years agox86/emul: Implement singlestep as a retire flag
Andrew Cooper [Tue, 29 Nov 2016 17:56:17 +0000 (17:56 +0000)]
x86/emul: Implement singlestep as a retire flag

The behaviour of singlestep is to raise #DB after the instruction has been
completed, but implementing it with inject_hw_exception() causes x86_emulate()
to return X86EMUL_EXCEPTION, despite successfully completing execution of the
instruction, including register writeback.

Instead, use a retire flag to indicate singlestep, which causes x86_emulate()
to return X86EMUL_OKAY.

Update all callers of x86_emulate() to use the new retire flag.  This fixes
the behaviour of singlestep for shadow pagetable updates and mmcfg/mmio_ro
intercepts, which previously discarded the exception.

With this change, all uses of X86EMUL_EXCEPTION from x86_emulate() are
believed to have strictly fault semantics.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Always use fault semantics for software events
Andrew Cooper [Tue, 29 Nov 2016 11:45:41 +0000 (11:45 +0000)]
x86/emul: Always use fault semantics for software events

The common case is already using fault semantics out of x86_emulate(), as that
is how VT-x/SVM expects to inject the event (given suitable hardware support).

However, x86_emulate() returning X86EMUL_EXCEPTION and also completing a
register writeback is problematic for callers.

Switch the logic to always using fault semantics, and leave svm_inject_trap()
to fix up %eip if necessary.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Provide a wrapper to x86_emulate() to ASSERT() certain behaviour
Andrew Cooper [Tue, 29 Nov 2016 18:46:56 +0000 (18:46 +0000)]
x86/emul: Provide a wrapper to x86_emulate() to ASSERT() certain behaviour

In debug builds, confirm that some properties of x86_emulate()'s behaviour
actually hold.  The first property, fixed in a previous change, is that retire
flags are only ever set in the X86EMUL_OKAY case.
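
A minimal sketch of such a checking wrapper, assuming a simplified context
with a single raw retire value.  The types, return codes and function names
below are hypothetical stand-ins, not the real x86_emulate() interface.

```c
#include <assert.h>

/* Hypothetical return codes standing in for X86EMUL_OKAY / X86EMUL_EXCEPTION. */
enum { EMUL_OKAY, EMUL_EXCEPTION };

struct emul_ctxt {
    unsigned char retire_raw;   /* all retire flags viewed as one value */
};

/* The emulator proper is passed in, so the wrapper can be exercised alone. */
typedef int (*emulate_fn)(struct emul_ctxt *ctxt);

static int checked_emulate(struct emul_ctxt *ctxt, emulate_fn emulate)
{
    int rc = emulate(ctxt);

    /* Compiled out with NDEBUG, i.e. only checked in debug builds. */
    if (rc != EMUL_OKAY)
        assert(ctxt->retire_raw == 0);

    return rc;
}

/* Toy emulator: the instruction completes and sets a retire flag. */
static int emulate_ok_with_flag(struct emul_ctxt *ctxt)
{
    ctxt->retire_raw = 1;
    return EMUL_OKAY;
}

/* Toy emulator: the instruction faults, so no retire flag is set. */
static int emulate_fault(struct emul_ctxt *ctxt)
{
    ctxt->retire_raw = 0;
    return EMUL_EXCEPTION;
}
```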

While adjusting the userspace test harness to cope with ASSERT() in
x86_emulate.h, fix a build problem introduced in c/s 122dd9575c7 "x86emul:
in_longmode() should not ignore ->read_msr() errors" by providing an
implementation of likely()/unlikely().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Correct the behaviour of pop %ss and interrupt shadowing
Andrew Cooper [Tue, 29 Nov 2016 18:35:46 +0000 (18:35 +0000)]
x86/emul: Correct the behaviour of pop %ss and interrupt shadowing

The mov_ss retire flag should only be set once load_seg() has returned
success.  In particular, it should not be set if an exception occurred when
trying to load %ss.

_hvm_emulate_one(), currently the sole user of mov_ss, only considers it in
the case that x86_emulate() returns X86EMUL_OKAY, so this bug isn't actually
exposed to guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Clean up the naming of the retire union
Andrew Cooper [Tue, 29 Nov 2016 17:55:21 +0000 (17:55 +0000)]
x86/emul: Clean up the naming of the retire union

Rename byte to raw, as the field being a single byte long is an implementation
detail.  Make the bitfields part of an anonymous struct to remove the .flags
qualifier.  Change the types of the flags to being booleans, to match their
use.
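
The resulting shape can be sketched roughly as follows.  The type name and
the exact set and order of flags are assumptions made for illustration; only
mov_ss and singlestep are named in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: boolean bitfields in an anonymous struct (C11). */
union retire_flags {
    struct {
        bool mov_ss:1;      /* interrupt shadow from a load of %ss */
        bool singlestep:1;  /* #DB due after the instruction retires */
    };
    uint8_t raw;            /* every flag at once, e.g. for clearing */
};
```

With the anonymous struct, callers write r.mov_ss rather than
r.flags.mov_ss, and r.raw = 0 clears every flag in one assignment.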

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/pv: Implement pv_inject_{event,page_fault,hw_exception}()
Andrew Cooper [Thu, 24 Nov 2016 18:18:36 +0000 (18:18 +0000)]
x86/pv: Implement pv_inject_{event,page_fault,hw_exception}()

To help with event injection improvements for the PV uses of x86_emulate(),
implement an event injection API which matches its hvm counterpart.

This starts by taking do_guest_trap() and changing its calling API to
pv_inject_event(), subsequently implementing the former in terms of the
latter.

The existing propagate_page_fault() is fairly similar to
pv_inject_page_fault(), although it has a return value.  Only a single caller
makes use of the return value, and non-NULL is only returned if the passed cr2
is non-canonical.  Opencode this single case in
handle_gdt_ldt_mapping_fault(), allowing propagate_page_fault() to become
void.
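
For reference, a canonical-address test can be sketched as below.  This
assumes 48-bit virtual addresses (four-level paging); the helper name is
hypothetical and is not the check used in handle_gdt_ldt_mapping_fault().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * With 48-bit virtual addresses, bits 63..47 must all be copies of one
 * another: either all clear (lower half) or all set (upper half).
 */
static bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47;   /* the upper 17 bits */

    return top == 0 || top == 0x1ffff;
}
```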

The call to reserved_bit_page_fault() in propagate_page_fault() was
conceptually wrong to start with.  Complaining about reserved bits should be
part of handling the pagefault itself, not part of injecting a pagefault into
the guest.  It is therefore moved ahead of the injection call in
do_page_fault() to compensate.

The remaining #PF specific bits are moved into pv_inject_event(), and
pv_inject_page_fault() is implemented as a static inline wrapper.

No practical change from a guest's point of view.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Rename HVM_DELIVER_NO_ERROR_CODE to X86_EVENT_NO_EC
Andrew Cooper [Mon, 7 Nov 2016 13:14:03 +0000 (13:14 +0000)]
x86/emul: Rename HVM_DELIVER_NO_ERROR_CODE to X86_EVENT_NO_EC

and move it to live with the other x86_event infrastructure in x86_emulate.h.
Switch it and x86_event.error_code to being signed, matching the rest of the
code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Rename hvm_trap to x86_event and move it into the emulation infrastructure
Andrew Cooper [Mon, 7 Nov 2016 13:14:03 +0000 (13:14 +0000)]
x86/emul: Rename hvm_trap to x86_event and move it into the emulation infrastructure

The x86 emulator needs to gain an understanding of interrupts and exceptions
generated by its actions.  The naming choice is to match both the Intel and
AMD terms, and to avoid 'trap' specifically as it has an architectural meaning
different to its current usage.

While making this change, make other changes for consistency

 * Rename *_trap() infrastructure to *_event()
 * Rename trapnr/trap parameters to vector
 * Convert hvm_inject_hw_exception() and hvm_inject_page_fault() to being
   static inlines, as they are only thin wrappers around hvm_inject_event()

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/emul: Simplify emulation state setup
Andrew Cooper [Wed, 23 Nov 2016 13:34:52 +0000 (13:34 +0000)]
x86/emul: Simplify emulation state setup

The current code to set up emulation state is ad-hoc and error prone.

 * Consistently zero all emulation state structures.
 * Avoid explicitly initialising some state to 0.
 * Explicitly identify all input and output state in x86_emulate_ctxt.  This
   involves rearranging some fields.
 * Have x86_decode() explicitly initialise all output state at its start.

While making the above changes, two minor tweaks:

 * Move the calculation of hvmemul_ctxt->ctxt.swint_emulate from
   _hvm_emulate_one() to hvm_emulate_init_once().  It doesn't need
   recalculating for each instruction.
 * Change force_writeback to being a boolean, to match its use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
8 years agox86/emul: Drop X86EMUL_CMPXCHG_FAILED
Andrew Cooper [Thu, 24 Nov 2016 18:31:34 +0000 (18:31 +0000)]
x86/emul: Drop X86EMUL_CMPXCHG_FAILED

X86EMUL_CMPXCHG_FAILED was introduced in c/s d430aae25 in 2005.  Even at the
time it aliased what is now X86EMUL_RETRY (as well as what is now
X86EMUL_EXCEPTION).  I am not sure why the distinction was considered useful
at the time.

It is only used twice; there is no need to call it out differently from other
uses of X86EMUL_RETRY.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/shadow: Fix #PFs from emulated writes crossing a page boundary
Andrew Cooper [Fri, 25 Nov 2016 17:23:04 +0000 (17:23 +0000)]
x86/shadow: Fix #PFs from emulated writes crossing a page boundary

When translating the second frame of a write crossing a page boundary, mask
the linear address down to the page boundary.

This causes the correct %cr2 to be reported to the guest in the case that the
second frame suffers a pagefault during translation.
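
The masking can be sketched as follows, assuming 4 KiB pages and a write of
at least one byte.  The helper names are hypothetical and this is not the
shadow code itself.

```c
#include <stdint.h>

#define PAGE_SIZE 4096ULL
#define PAGE_MASK (~(PAGE_SIZE - 1))

/* Does a write of `bytes` bytes starting at `addr` touch two pages? */
static int crosses_page(uint64_t addr, unsigned int bytes)
{
    return ((addr & ~PAGE_MASK) + bytes) > PAGE_SIZE;
}

/*
 * Linear address to translate for the second frame.  Masking down to the
 * page boundary means a fault on that frame reports the first byte of the
 * second page, rather than an address part-way into it.
 */
static uint64_t second_frame_addr(uint64_t addr, unsigned int bytes)
{
    return (addr + bytes - 1) & PAGE_MASK;
}
```

For a 4-byte write at 0x1ffe, the second frame is translated at 0x2000
instead of 0x2001.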

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agovtd: refuse to enable IOMMU if the PCI scan fails
Roger Pau Monné [Fri, 2 Dec 2016 17:09:11 +0000 (18:09 +0100)]
vtd: refuse to enable IOMMU if the PCI scan fails

This provides uniform behavior between Intel and AMD IOMMU initialization, and
is a requirement for PVHv2 Dom0, that depends on a working IOMMU plus the PCI
bus being scanned for devices.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years agox86: split the setup of Dom0 permissions to a function
Roger Pau Monné [Fri, 2 Dec 2016 17:08:51 +0000 (18:08 +0100)]
x86: split the setup of Dom0 permissions to a function

So that it can also be used by the PVH-specific domain builder. This is just
code motion, it should not introduce any functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/paging: introduce paging_set_allocation
Roger Pau Monné [Fri, 2 Dec 2016 17:08:26 +0000 (18:08 +0100)]
x86/paging: introduce paging_set_allocation

... and remove hap_set_alloc_for_pvh_dom0. While there also change the last
parameter of the {hap/shadow}_set_allocation functions to be a boolean.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86: allow calling {shadow/hap}_set_allocation with the idle domain
Roger Pau Monné [Fri, 2 Dec 2016 17:07:58 +0000 (18:07 +0100)]
x86: allow calling {shadow/hap}_set_allocation with the idle domain

... and using the "preempted" parameter. Introduce a new helper that can
be used from both hypercall or idle vcpu context (ie: during Dom0
creation) in order to check if preemption is needed. If such preemption
happens, the caller should then call process_pending_softirqs in order to
drain the pending softirqs, and then call *_set_allocation again to continue
with its execution.

This allows us to call *_set_allocation() when building domain 0.
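
The calling pattern described above can be sketched with toy stand-ins.
Everything prefixed toy_ and the counters are hypothetical; the real
*_set_allocation() and process_pending_softirqs() are only mimicked here.

```c
#include <stdbool.h>

static unsigned int pages_allocated;   /* toy allocator state */
static unsigned int softirq_drains;    /* how often the caller yielded */

/* Toy stand-in: works in batches of 4 pages, then reports preemption. */
static int toy_set_allocation(unsigned int target, bool *preempted)
{
    unsigned int batch = 0;

    *preempted = false;
    while (pages_allocated < target) {
        if (batch == 4) {
            *preempted = true;   /* ask the caller to yield and retry */
            return 0;
        }
        pages_allocated++;
        batch++;
    }
    return 0;
}

/* Toy stand-in for draining pending softirqs. */
static void toy_process_pending_softirqs(void)
{
    softirq_drains++;
}

/* Keep calling until the allocation finishes without being preempted. */
static int set_allocation_until_done(unsigned int target)
{
    bool preempted;
    int rc;

    do {
        rc = toy_set_allocation(target, &preempted);
        if (rc)
            return rc;           /* a real error: give up */
        if (preempted)
            toy_process_pending_softirqs();
    } while (preempted);

    return 0;
}
```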

While there, also document hypercall_preempt_check and add an assert to
local_events_need_delivery to be sure it is not called by the idle domain,
which doesn't receive any events (which in turn ensures that
hypercall_preempt_check is not called by the idle domain either).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
8 years agox86: fix return value of *_set_allocation functions
Roger Pau Monné [Fri, 2 Dec 2016 17:07:12 +0000 (18:07 +0100)]
x86: fix return value of *_set_allocation functions

Return should be an int.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
8 years agoacpi: PVH guests need _E02 method
Boris Ostrovsky [Fri, 2 Dec 2016 17:06:40 +0000 (18:06 +0100)]
acpi: PVH guests need _E02 method

This is the method that will get invoked on an SCI.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoacpi: power and sleep ACPI buttons are not emulated for PVH guests
Boris Ostrovsky [Fri, 2 Dec 2016 17:06:25 +0000 (18:06 +0100)]
acpi: power and sleep ACPI buttons are not emulated for PVH guests

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoacpi: make pmtimer optional in FADT
Boris Ostrovsky [Fri, 2 Dec 2016 17:06:06 +0000 (18:06 +0100)]
acpi: make pmtimer optional in FADT

PM timer is not supported by PVH guests.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/cpuid: Add AVX512_4VNNIW and AVX512_4FMAPS support
He Chen [Mon, 21 Nov 2016 06:01:14 +0000 (14:01 +0800)]
x86/cpuid: Add AVX512_4VNNIW and AVX512_4FMAPS support

Add support for two new AVX512 subfeatures for guests.

AVX512_4VNNIW:
Vector instructions for deep learning enhanced word variable precision.

AVX512_4FMAPS:
Vector instructions for deep learning floating-point single precision.
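
Both features are enumerated in CPUID leaf 7, subleaf 0, register EDX
(bit 2 for AVX512_4VNNIW, bit 3 for AVX512_4FMAPS).  A minimal sketch of
testing them on an already-fetched EDX value follows; the macro and
function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0):EDX feature bits. */
#define CPUID7_EDX_AVX512_4VNNIW (1u << 2)
#define CPUID7_EDX_AVX512_4FMAPS (1u << 3)

/* `leaf7_edx` is the EDX value returned for leaf 7, subleaf 0. */
static bool has_avx512_4vnniw(uint32_t leaf7_edx)
{
    return (leaf7_edx & CPUID7_EDX_AVX512_4VNNIW) != 0;
}

static bool has_avx512_4fmaps(uint32_t leaf7_edx)
{
    return (leaf7_edx & CPUID7_EDX_AVX512_4FMAPS) != 0;
}
```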

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Signed-off-by: He Chen <he.chen@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/vmx: Shorten vmx_{get,set}_segment_register() for user segments
Andrew Cooper [Fri, 23 Sep 2016 14:03:08 +0000 (15:03 +0100)]
x86/vmx: Shorten vmx_{get,set}_segment_register() for user segments

The x86_segment enumeration matches hardware SReg encoding, which can be used
to calculate the appropriate VMCS fields, rather than open coding every
instance.

This reduces the size of the switch statement, and the number of embedded BUG
frames from the __vm{read,write}() calls.  In the unlikely case that a call
does fault, the field can unambiguously be retrieved from the GPR state
printed.
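
The calculation can be sketched as below.  The VMCS encodings for %es are
the architectural ones, and the fields for the following user segments sit
at a stride of 2; the enum and helper names are hypothetical, not the ones
used in vmx.c.

```c
#include <stdint.h>

/* User segments in hardware SReg encoding order. */
enum user_segment { seg_es, seg_cs, seg_ss, seg_ds, seg_fs, seg_gs };

/* VMCS field encodings for %es. */
#define GUEST_ES_SELECTOR 0x0800u
#define GUEST_ES_LIMIT    0x4800u
#define GUEST_ES_AR_BYTES 0x4814u
#define GUEST_ES_BASE     0x6806u

/* Each helper derives the field for `seg` from the %es field. */
static uint32_t sel_field(enum user_segment seg)
{
    return GUEST_ES_SELECTOR + (uint32_t)seg * 2;
}

static uint32_t limit_field(enum user_segment seg)
{
    return GUEST_ES_LIMIT + (uint32_t)seg * 2;
}

static uint32_t attr_field(enum user_segment seg)
{
    return GUEST_ES_AR_BYTES + (uint32_t)seg * 2;
}

static uint32_t base_field(enum user_segment seg)
{
    return GUEST_ES_BASE + (uint32_t)seg * 2;
}
```

One computed read or write per attribute then replaces six near-identical
switch cases.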

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years agox86/svm: Improve segment register printing in svm_vmcb_dump()
Andrew Cooper [Thu, 27 Oct 2016 13:07:21 +0000 (13:07 +0000)]
x86/svm: Improve segment register printing in svm_vmcb_dump()

This makes it more succinct and easier to read.

Before:
  (XEN) H_CR3 = 0x000000042a0ec000 CleanBits = 0
  (XEN) CS: sel=0x0008, attr=0x029b, limit=0xffffffff, base=0x0000000000000000
  (XEN) DS: sel=0x0033, attr=0x0cf3, limit=0xffffffff, base=0x0000000000000000
  (XEN) SS: sel=0x0018, attr=0x0c93, limit=0xffffffff, base=0x0000000000000000
  (XEN) ES: sel=0x0033, attr=0x0cf3, limit=0xffffffff, base=0x0000000000000000
  (XEN) FS: sel=0x0033, attr=0x0cf3, limit=0xffffffff, base=0x0000000000000000
  (XEN) GS: sel=0x0033, attr=0x0cf3, limit=0xffffffff, base=0x0000000000000000
  (XEN) GDTR: sel=0x0000, attr=0x0000, limit=0x00000067, base=0x000000000010d100
  (XEN) LDTR: sel=0x0000, attr=0x0000, limit=0x00000000, base=0x0000000000000000
  (XEN) IDTR: sel=0x0000, attr=0x0000, limit=0x00000fff, base=0x0000000000110900
  (XEN) TR: sel=0x0038, attr=0x0089, limit=0x00000067, base=0x000000000010d020

After:
  (XEN) H_CR3 = 0x000000042a0ec000 CleanBits = 0
  (XEN)        sel attr  limit   base
  (XEN)   CS: 0008 029b ffffffff 0000000000000000
  (XEN)   DS: 0033 0cf3 ffffffff 0000000000000000
  (XEN)   SS: 0018 0c93 ffffffff 0000000000000000
  (XEN)   ES: 0033 0cf3 ffffffff 0000000000000000
  (XEN)   FS: 0033 0cf3 ffffffff 0000000000000000
  (XEN)   GS: 0033 0cf3 ffffffff 0000000000000000
  (XEN) GDTR: 0000 0000 00000067 000000000010d100
  (XEN) LDTR: 0000 0000 00000000 0000000000000000
  (XEN) IDTR: 0000 0000 00000fff 0000000000110900
  (XEN)   TR: 0038 0089 00000067 000000000010d020
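
A format string that would produce rows in the "After" layout might look
like the sketch below.  The struct and function are hypothetical and this is
not the code in svm_vmcb_dump().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical view of one segment register's state. */
struct seg_state {
    uint16_t sel;
    uint16_t attr;
    uint32_t limit;
    uint64_t base;
};

/* Right-align the name in 4 columns so all the values line up. */
static int format_seg(char *buf, size_t len, const char *name,
                      const struct seg_state *s)
{
    return snprintf(buf, len, "%4s: %04x %04x %08" PRIx32 " %016" PRIx64,
                    name, (unsigned int)s->sel, (unsigned int)s->attr,
                    s->limit, s->base);
}
```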

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostorvsky@oracle.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
8 years agoRe-enable hypervisor debug as part of opening 4.9
Ian Jackson [Fri, 2 Dec 2016 12:16:35 +0000 (12:16 +0000)]
Re-enable hypervisor debug as part of opening 4.9

AFAICT following bacbf0cb7349 "build: convert debug to Kconfig"
hypervisor debug enablement is controlled here, rather than in
Config.mk.

The release checklist says that when branching, the new staging should
have debug enabled.  It seems to me that I should be changing this
here, therefore.

As additional evidence, I offer e1d1c68ea8a3 "xen: disable debug
build" which went in between 4.8.0 RC5 and RC6.  It does not explain
why this was done, but it does seem to me that reverting that change is right.

CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Tim Deegan <tim@xen.org>
CC: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoOpen Xen 4.9-unstable
Ian Jackson [Fri, 2 Dec 2016 12:11:43 +0000 (12:11 +0000)]
Open Xen 4.9-unstable

* Change version number in README and xen/Makefile to `4.9-unstable'.

* Set `debug ?= y' in Config.mk.

* Set QEMU_UPSTREAM_REVISION to track qemu-xen.git `master'.

* Set MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION back to
  commit hashes, rather than 4.8 tags.

Hypervisor debug enablement is left aside for now.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
8 years agoConfig.mk: Drop stale QEMU_TRADITIONAL_REVISION commentary 4.8.0-rc8
Ian Jackson [Tue, 29 Nov 2016 18:05:48 +0000 (18:05 +0000)]
Config.mk: Drop stale QEMU_TRADITIONAL_REVISION commentary

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
8 years agoMerge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging
Ian Jackson [Tue, 29 Nov 2016 16:54:30 +0000 (16:54 +0000)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

8 years agoUpdate QEMU_TRADITIONAL_REVISION and QEMU_UPSTREAM_REVISION to -rc7 4.8.0-rc7
Ian Jackson [Tue, 29 Nov 2016 16:41:32 +0000 (16:41 +0000)]
Update QEMU_TRADITIONAL_REVISION and QEMU_UPSTREAM_REVISION to -rc7

These commits include the fix for XSA-197.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
8 years agocredit2: make runqueues be per-socket by default
Dario Faggioli [Tue, 29 Nov 2016 15:01:03 +0000 (16:01 +0100)]
credit2: make runqueues be per-socket by default

Benchmarks have shown that per-socket runqueues arrangement
behaves better (e.g., we achieve better load balancing)
than the current per-core default.

Here's an example (coming from
https://lists.xen.org/archives/html/xen-devel/2016-06/msg02287.html ):

|=======================================|
| XEN BUILD TIME, LOW LOAD, NO NOISE    |
|---------------------------------------|
|       runq=core   runq=socket         |
|         35.200       33.433           |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, HIGH LOAD, NO NOISE   | IPERF, HIGH LOAD, NO NOISE   |
|---------------------------------------|------------------------------|
|       runq=core   runq=socket         |     runq=core runq=socket    |
|         18.013       18.530           |       23.200     23.466      |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, LOW LOAD, WITH NOISE  |
|---------------------------------------|
|       runq=core   runq=socket         |
|         45.866       39.493           |
|---------------------------------------|------------------------------|
| XEN BUILD TIME, HIGH LOAD, WITH NOISE | IPERF, HIGH LOAD, WITH NOISE |
|---------------------------------------|------------------------------|
|       runq=core   runq=socket         |     runq=core runq=socket    |
|         36.840       29.080           |       19.967     21.000      |
|=======================================|==============================|

The only reason why we went for per-core, initially, was to
introduce some form of hyperthreading support. Now we have
hyperthreading support, independently from how runqueues
are organized (9bb9c7388 "xen: credit2: implement true SMT
support"), and thus we can switch to per-socket.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibacpi: fix compilation when cross building the tools
Julien Grall [Tue, 29 Nov 2016 15:00:48 +0000 (16:00 +0100)]
libacpi: fix compilation when cross building the tools

The tools (such as mk_dsdt) can be cross-built when it may not be
desirable to build them on the target.

The commit c4ac1077 "libxl/arm: Generate static ACPI DSDT table"
introduced support for ARM64 in mk_dsdt but also broke cross-building
of the tools, because the generated ACPI tables are not correct.

While mk_dsdt should generate ACPI tables for the target architecture, it
currently generates the ones for the host. This is because the source
code contains references to the host architecture (__aarch64__,
__x86_64__, __i386__) when it should refer to the target architecture.

Replace all __aarch64__, __x86_64__, __i386__ by the corresponding
CONFIG_*.
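
The difference can be sketched as follows.  The macro names CONFIG_ARM_64
and CONFIG_X86 are assumptions made for illustration, as is the helper
function.  A compiler-provided macro such as __aarch64__ describes the
machine the tool is compiled for, while a CONFIG_* macro passed in by the
build system describes the target the tables are generated for.

```c
/*
 * Normally the build system would pass exactly one of these on the
 * command line (e.g. -DCONFIG_ARM_64).  One is defined here only so
 * that the sketch compiles on its own.
 */
#if !defined(CONFIG_X86) && !defined(CONFIG_ARM_64)
#define CONFIG_ARM_64 1
#endif

/* Select behaviour by the configured target, not by the build host. */
static const char *target_arch(void)
{
#if defined(CONFIG_ARM_64)
    return "arm64";
#elif defined(CONFIG_X86)
    return "x86";
#else
#error "no target architecture configured"
#endif
}
```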

Also expose the CONFIG_* to the source code, as they are currently only
exposed to the Makefile.

Reported-by: Andrii Anisov <andrii.anisov@gmail.com>
Suggested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>