]> xenbits.xensource.com Git - people/pauldu/xen.git/log
people/pauldu/xen.git
4 years agolibxl / libxlu: support 'xl pci-attach/detach' by name libxl-pci12
Paul Durrant [Fri, 16 Oct 2020 16:26:07 +0000 (16:26 +0000)]
libxl / libxlu: support 'xl pci-attach/detach' by name

This patch adds a 'name' field into the idl for 'libxl_device_pci' and
libxlu_pci_parse_spec_string() is modified to parse the new 'name'
parameter of PCI_SPEC_STRING detailed in the updated documention in
xl-pci-configuration(5).

If the 'name' field is non-NULL then both libxl_device_pci_add() and
libxl_device_pci_remove() will use it to look up the device BDF in
the list of assignable devices.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
v6:
 - Re-base
 - Slight modification to the patch name
 - Kept Wei's A-b since modifications are small

4 years agodocs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING
Paul Durrant [Fri, 23 Oct 2020 14:03:51 +0000 (14:03 +0000)]
docs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING

Since assignable devices can be named, a subsequent patch will support use
of a PCI_SPEC_STRING containing a 'name' parameter instead of a 'bdf'. In
this case the name will be used to look up the 'bdf' in the list of assignable
(or assigned) devices.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agoxl: support naming of assignable devices
Paul Durrant [Tue, 8 Dec 2020 11:03:27 +0000 (11:03 +0000)]
xl: support naming of assignable devices

This patch converts libxl to use libxl_pci_bdf_assignable_add/remove/list/
list_free() rather than libxl_device_pci_assignable_add/remove/list/
list_free(), which then allows naming of assignable devices to be supported.

With this patch applied 'xl pci-assignable-add' will take an optional '--name'
parameter, 'xl pci-assignable-remove' can be passed either a BDF or a name and
'xl pci-assignable-list' will take a optional '--show-names' flag which
determines whether names are displayed in its output.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Wei Liu <wl@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
v6:
 - New in v6 (split out from "xl / libxl: support naming of assignable
   devices")

4 years agolibxl: introduce libxl_pci_bdf_assignable_add/remove/list/list_free(), ...
Paul Durrant [Tue, 8 Dec 2020 12:25:48 +0000 (12:25 +0000)]
libxl: introduce libxl_pci_bdf_assignable_add/remove/list/list_free(), ...

which support naming and use 'libxl_pci_bdf' rather than 'libxl_device_pci',
as replacements for libxl_device_pci_assignable_add/remove/list/list_free().

libxl_pci_bdf_assignable_add() takes a 'name' parameter which is stored in
xenstore and facilitates two addtional functions added by this patch:
libxl_pci_bdf_assignable_name2bdf() and libxl_pci_bdf_assignable_bdf2name().
Currently there are no callers of these two functions. They will be added in
a subsequent patch.

libxl_device_pci_assignable_add/remove/list/list_free() are left in place
for compatibility but are re-implemented in terms of the newly introduced
functions.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Wei Liu <wl@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
v6:
 - New in v6 (replacing remaining code from "libxl: modify
   libxl_device_pci_assignable_add/remove/list/list_free()...")

4 years agolibxl: convert internal functions in libxl_pci.c...
Paul Durrant [Thu, 15 Oct 2020 15:13:49 +0000 (15:13 +0000)]
libxl: convert internal functions in libxl_pci.c...

... to use 'libx_pci_bdf' where appropriate.

No API change.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Wei Liu <wl@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
v6:
 - New in v6 (split out from "libxl: modify libxl_device_pci_assignable_add/
   remove/list/list_free()..."

4 years agodocs/man: modify xl(1) in preparation for naming of assignable devices
Paul Durrant [Fri, 23 Oct 2020 13:21:03 +0000 (13:21 +0000)]
docs/man: modify xl(1) in preparation for naming of assignable devices

A subsequent patch will introduce code to allow a name to be specified to
'xl pci-assignable-add' such that the assignable device may be referred to
by than name in subsequent operations.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxlu: introduce xlu_pci_parse_spec_string()
Paul Durrant [Thu, 15 Oct 2020 11:57:40 +0000 (11:57 +0000)]
libxlu: introduce xlu_pci_parse_spec_string()

This patch largely re-writes the code to parse a PCI_SPEC_STRING and enters
it via the newly introduced function. The new parser also deals with 'bdf'
and 'vslot' as non-positional paramaters, as per the documentation in
xl-pci-configuration(5).

The existing xlu_pci_parse_bdf() function remains, but now strictly parses
BDF values. Some existing callers of xlu_pci_parse_bdf() are
modified to call xlu_pci_parse_spec_string() as per the documentation in xl(1).

NOTE: Usage text in xl_cmdtable.c and error messages are also modified
      appropriately.
      As a side-effect this patch also fixes a bug where using '*' to specify
      all functions would lead to an assertion failure at the end of
      xlu_pci_parse_bdf().

Fixes: d25cc3ec93eb ("libxl: workaround gcc 10.2 maybe-uninitialized warning")
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
4 years agolibxl: introduce 'libxl_pci_bdf' in the idl...
Paul Durrant [Wed, 21 Oct 2020 18:04:45 +0000 (18:04 +0000)]
libxl: introduce 'libxl_pci_bdf' in the idl...

... and use in 'libxl_device_pci'

This patch is preparatory work for restricting the type passed to functions
that only require BDF information, rather than passing a 'libxl_device_pci'
structure which is only partially filled. In this patch only the minimal
mechanical changes necessary to deal with the structural changes are made.
Subsequent patches will adjust the code to make better use of the new type.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Nick Rosbrook <rosbrookn@ainfosec.com>
---
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
4 years agodocs/man: fix xl(1) documentation for 'pci' operations
Paul Durrant [Thu, 15 Oct 2020 10:07:53 +0000 (10:07 +0000)]
docs/man: fix xl(1) documentation for 'pci' operations

Currently the documentation completely fails to mention the existence of
PCI_SPEC_STRING. This patch tidies things up, specifically clarifying that
'pci-assignable-add/remove' take <BDF> arguments where as 'pci-attach/detach'
take <PCI_SPEC_STRING> arguments (which will be enforced in a subsequent
patch).

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agodocs/man: improve documentation of PCI_SPEC_STRING...
Paul Durrant [Tue, 13 Oct 2020 07:43:35 +0000 (07:43 +0000)]
docs/man: improve documentation of PCI_SPEC_STRING...

... and prepare for adding support for non-positional parsing of 'bdf' and
'vslot' in a subsequent patch.

Also document 'BDF' as a first-class parameter type and fix the documentation
to state that the default value of 'rdm_policy' is actually 'strict', not
'relaxed', as can be seen in libxl__device_pci_setdefault().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agodocs/man: extract documentation of PCI_SPEC_STRING from the xl.cfg manpage...
Paul Durrant [Mon, 12 Oct 2020 16:01:55 +0000 (16:01 +0000)]
docs/man: extract documentation of PCI_SPEC_STRING from the xl.cfg manpage...

... and put it into a new xl-pci-configuration(5) manpage, akin to the
xl-network-configration(5) and xl-disk-configuration(5) manpages.

This patch moves the content of the section verbatim. A subsequent patch
will improve the documentation, once it is in its new location.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: use COMPARE_PCI() macro is_pci_in_array()...
Paul Durrant [Tue, 20 Oct 2020 16:53:35 +0000 (16:53 +0000)]
libxl: use COMPARE_PCI() macro is_pci_in_array()...

... rather than an open-coded equivalent.

This patch tidies up the is_pci_in_array() function, making it take a single
'libxl_device_pci' argument rather than separate domain, bus, device and
function arguments. The already-available COMPARE_PCI() macro can then be
used and it is also modified to return 'bool' rather than 'int'.

The patch also modifies libxl_pci_assignable() to use is_pci_in_array() rather
than a separate open-coded equivalent, and also modifies it to return a
'bool' rather than an 'int'.

NOTE: The COMPARE_PCI() macro is also fixed to include the 'domain' in its
      comparison, which should always have been the case.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: add libxl_device_pci_assignable_list_free()...
Paul Durrant [Wed, 21 Oct 2020 16:41:02 +0000 (16:41 +0000)]
libxl: add libxl_device_pci_assignable_list_free()...

... to be used by callers of libxl_device_pci_assignable_list().

Currently there is no API for callers of libxl_device_pci_assignable_list()
to free the list. The xl function pciassignable_list() calls
libxl_device_pci_dispose() on each element of the returned list, but
libxl_pci_assignable() in libxl_pci.c does not. Neither does the implementation
of libxl_device_pci_assignable_list() call libxl_device_pci_init().

This patch adds the new API function, makes sure it is used everywhere and
also modifies libxl_device_pci_assignable_list() to initialize list
entries rather than just zeroing them.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: David Scott <dave@recoil.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
4 years agolibxl: make sure callers of libxl_device_pci_list() free the list after use
Paul Durrant [Tue, 20 Oct 2020 16:08:21 +0000 (16:08 +0000)]
libxl: make sure callers of libxl_device_pci_list() free the list after use

A previous patch introduced libxl_device_pci_list_free() which should be used
by callers of libxl_device_pci_list() to properly dispose of the exported
'libxl_device_pci' types and the free the memory holding them. Whilst all
current callers do ensure the memory is freed, only the code in xl's
pcilist() function actually calls libxl_device_pci_dispose(). As it stands
this laxity does not lead to any memory leaks, but the simple addition of
.e.g. a 'string' into the idl definition of 'libxl_device_pci' would lead
to leaks.

This patch makes sure all callers of libxl_device_pci_list() can call
libxl_device_pci_list_free() by keeping copies of 'libxl_device_pci'
structures inline in 'pci_add_state' and 'pci_remove_state' (and also making
sure these are properly disposed at the end of the operations) rather
than keeping pointers to the structures returned by libxl_device_pci_list().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
4 years agolibxl: remove get_all_assigned_devices() from libxl_pci.c
Paul Durrant [Fri, 23 Oct 2020 08:46:09 +0000 (08:46 +0000)]
libxl: remove get_all_assigned_devices() from libxl_pci.c

Use of this function is a very inefficient way to check whether a device
has already been assigned.

This patch adds code that saves the domain id in xenstore at the point of
assignment, and removes it again when the device id de-assigned (or the
domain is destroyed). It is then straightforward to check whether a device
has been assigned by checking whether a device has a saved domain id.

NOTE: To facilitate the xenstore check it is necessary to move the
      pci_info_xs_read() earlier in libxl_pci.c. To keep related functions
      together, the rest of the pci_info_xs_XXX() functions are moved too.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: remove unnecessary check from libxl__device_pci_add()
Paul Durrant [Fri, 23 Oct 2020 08:04:33 +0000 (08:04 +0000)]
libxl: remove unnecessary check from libxl__device_pci_add()

The code currently checks explicitly whether the device is already assigned,
but this is actually unnecessary as assigned devices do not form part of
the list returned by libxl_device_pci_assignable_list() and hence the
libxl_pci_assignable() test would have already failed.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: generalise 'driver_path' xenstore access functions in libxl_pci.c
Paul Durrant [Fri, 16 Oct 2020 08:43:00 +0000 (08:43 +0000)]
libxl: generalise 'driver_path' xenstore access functions in libxl_pci.c

For the purposes of re-binding a device to its previous driver
libxl__device_pci_assignable_add() writes the driver path into xenstore.
This path is then read back in libxl__device_pci_assignable_remove().

The functions that support this writing to and reading from xenstore are
currently dedicated for this purpose and hence the node name 'driver_path'
is hard-coded. This patch generalizes these utility functions and passes
'driver_path' as an argument. Subsequent patches will invoke them to
access other nodes.

NOTE: Because functions will have a broader use (other than storing a
      driver path in lieu of pciback) the base xenstore path is also
      changed from '/libxl/pciback' to '/libxl/pci'.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: stop using aodev->device_config in libxl__device_pci_add()...
Paul Durrant [Wed, 21 Oct 2020 11:14:45 +0000 (11:14 +0000)]
libxl: stop using aodev->device_config in libxl__device_pci_add()...

... to hold a pointer to the device.

There is already a 'pci' field in 'pci_add_state' so simply use that from
the start. This also allows the 'pci' (#3) argument to be dropped from
do_pci_add().

NOTE: This patch also changes the type of the 'pci_domid' field in
      'pci_add_state' from 'int' to 'libxl_domid' which is more appropriate
      given what the field is used for.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: remove extraneous arguments to do_pci_remove() in libxl_pci.c
Paul Durrant [Wed, 21 Oct 2020 10:58:33 +0000 (10:58 +0000)]
libxl: remove extraneous arguments to do_pci_remove() in libxl_pci.c

Both 'domid' and 'pci' are available in 'pci_remove_state' so there is no
need to also pass them as separate arguments.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: s/detatched/detached in libxl_pci.c
Paul Durrant [Fri, 23 Oct 2020 06:22:19 +0000 (06:22 +0000)]
libxl: s/detatched/detached in libxl_pci.c

Simply spelling correction. Purely cosmetic fix.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: add/recover 'rdm_policy' to/from PCI backend in xenstore
Paul Durrant [Wed, 21 Oct 2020 13:50:12 +0000 (13:50 +0000)]
libxl: add/recover 'rdm_policy' to/from PCI backend in xenstore

Other parameters, such as 'msitranslate' and 'permissive' are dealt with
but 'rdm_policy' appears to be have been completely missed.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
4 years agolibxl: Make sure devices added by pci-attach are reflected in the config
Paul Durrant [Mon, 19 Oct 2020 15:19:56 +0000 (15:19 +0000)]
libxl: Make sure devices added by pci-attach are reflected in the config

Currently libxl__device_pci_add_xenstore() is broken in that does not
update the domain's configuration for the first device added (which causes
creation of the overall backend area in xenstore). This can be easily observed
by running 'xl list -l' after adding a single device: the device will be
missing.

This patch fixes the problem and adds a DEBUG log line to allow easy
verification that the domain configuration is being modified. Also, the use
of libxl__device_generic_add() is dropped as it leads to a confusing situation
where only partial backend information is written under the xenstore
'/libxl' path. For LIBXL__DEVICE_KIND_PCI devices the only definitive
information in xenstore is under '/local/domain/0/backend' (the '0' being
hard-coded).

NOTE: This patch includes a whitespace in add_pcis_done().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
v3:
 - Revert some changes form v2 as there is confusion over use of the libxl
   and backend xenstore paths which needs to be fixed

v2:
 - Avoid having two completely different ways of adding devices into xenstore

4 years agolibxl: make libxl__device_list() work correctly for LIBXL__DEVICE_KIND_PCI...
Paul Durrant [Mon, 23 Nov 2020 15:02:36 +0000 (15:02 +0000)]
libxl: make libxl__device_list() work correctly for LIBXL__DEVICE_KIND_PCI...

... devices.

Currently there is an assumption built into libxl__device_list() that device
backends are fully enumarated under the '/libxl' path in xenstore. This is
not the case for PCI backend devices, which are only properly enumerated
under '/local/domain/0/backend'.

This patch adds a new get_path() method to libxl__device_type to allow a
backend implementation (such as PCI) to specify the xenstore path where
devices are enumerated and modifies libxl__device_list() to use this method
if it is available. Also, if the get_num() method is defined then the
from_xenstore() method expects to be passed the backend path without the device
number concatenated, so this issue is also rectified.

Having made libxl__device_list() work correctly, this patch removes the
open-coded libxl_pci_device_pci_list() in favour of an evaluation of the
LIBXL_DEFINE_DEVICE_LIST() macro. This has the side-effect of also defining
libxl_pci_device_pci_list_free() which will be used in subsequent patches.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
v3:
 - New in v3 (replacing "libxl: use LIBXL_DEFINE_DEVICE_LIST for pci devices")

4 years agoxl: s/pcidev/pci where possible
Paul Durrant [Mon, 7 Dec 2020 18:44:00 +0000 (18:44 +0000)]
xl: s/pcidev/pci where possible

To improve naming consistency, replaces occurrences of 'pcidev' with 'pci'.
The only remaining use of the term should be in relation to
'libxl_domain_config' where there are fields named 'pcidevs' and 'num_pcidevs'.

Purely cosmetic. No functional change.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Wei Liu <wl@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
v6:
 - New in v6 (split out from "xl / libxl: s/pcidev/pci and remove
   DEFINE_DEVICE_TYPE_STRUCT_X")

4 years agolibxl: s/pcidev/pci and remove DEFINE_DEVICE_TYPE_STRUCT_X
Paul Durrant [Tue, 20 Oct 2020 13:28:39 +0000 (13:28 +0000)]
libxl: s/pcidev/pci and remove DEFINE_DEVICE_TYPE_STRUCT_X

The seemingly arbitrary use of 'pci' and 'pcidev' in the code in libxl_pci.c
is confusing and also compromises use of some macros used for other device
types. Indeed it seems that DEFINE_DEVICE_TYPE_STRUCT_X exists solely because
of this duality.

This patch purges use of 'pcidev' from the libxl internal code, but
unfortunately the 'pcidevs' and 'num_pcidevs' fields in 'libxl_domain_config'
are part of the API and need to be retained to avoid breaking callers,
particularly libvirt.

DEFINE_DEVICE_TYPE_STRUCT_X is still removed to avoid the special case in
libxl_pci.c but DEFINE_DEVICE_TYPE_STRUCT is given an extra 'array' argument
which is used to identify the fields in 'libxl_domain_config' relating to
the device type.

NOTE: Some of the more gross formatting errors (such as lack of spaces after
      keywords) that came into context have been fixed in libxl_pci.c.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Wei Liu <wl@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
v6:
 - Avoid name changes in 'libxl_domain_config'
 - Defer xl changes to a subsequent patch

v5:
 - Minor cosmetic fix

4 years agotools/libs/ctrl: fix dumping of ballooned guest
Juergen Gross [Wed, 11 Nov 2020 10:01:43 +0000 (11:01 +0100)]
tools/libs/ctrl: fix dumping of ballooned guest

A guest with memory < maxmem often can't be dumped via xl dump-core
without an error message today:

xc: info: exceeded nr_pages (262144) losing pages

In case the last page of the guest isn't allocated the loop in
xc_domain_dumpcore_via_callback() will always spit out this message,
as the number of already dumped pages is tested before the next page
is checked to be valid.

The guest's p2m_size might be lower than expected, so this should be
tested in order to avoid reading past the end of it.

The guest might use high bits in p2m entries to flag special cases like
foreign mappings. Entries with an MFN larger than the highest MFN of
the host should be skipped.

Signed-off-by: Juergen Gross <jgross@suse.com>
4 years agox86/IRQ: reduce casting involved in guest action retrieval
Jan Beulich [Fri, 4 Dec 2020 12:17:24 +0000 (13:17 +0100)]
x86/IRQ: reduce casting involved in guest action retrieval

Introduce a helper function covering both the IRQ_GUEST check and the
cast involved in obtaining the (correctly typed) pointer. Where possible
add const and/or reduce variable scope.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
4 years agox86/IRQ: drop three unused variables
Jan Beulich [Fri, 4 Dec 2020 12:16:49 +0000 (13:16 +0100)]
x86/IRQ: drop three unused variables

I didn't bother figuring which commit(s) should have deleted them while
removing their last uses.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
4 years agoxl / libxl: add 'ex_processor_mask' into 'libxl_viridian_enlightenment'
Paul Durrant [Fri, 4 Dec 2020 12:16:22 +0000 (13:16 +0100)]
xl / libxl: add 'ex_processor_mask' into 'libxl_viridian_enlightenment'

Adding the new value into the enumeration makes it immediately available
to xl, so this patch adjusts the xl.cfg(5) documentation accordingly.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: add a new '_HVMPV_ex_processor_masks' bit into HVM_PARAM_VIRIDIAN...
Paul Durrant [Fri, 4 Dec 2020 12:15:57 +0000 (13:15 +0100)]
viridian: add a new '_HVMPV_ex_processor_masks' bit into HVM_PARAM_VIRIDIAN...

... and advertise ExProcessorMasks support if it is set.

Support is advertised by setting bit 11 in CPUID:40000004:EAX.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: log initial invocation of each type of hypercall
Paul Durrant [Fri, 4 Dec 2020 12:15:38 +0000 (13:15 +0100)]
viridian: log initial invocation of each type of hypercall

To make is simpler to observe which viridian hypercalls are issued by a
particular Windows guest, this patch adds a per-domain mask to track them.
Each type of hypercall causes a different bit to be set in the mask and
when the bit transitions from clear to set, a log line is emitted showing
the name of the hypercall and the domain that issued it.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: add ExProcessorMasks variant of the IPI hypercall
Paul Durrant [Fri, 4 Dec 2020 12:15:21 +0000 (13:15 +0100)]
viridian: add ExProcessorMasks variant of the IPI hypercall

A previous patch introduced variants of the flush hypercalls that take a
'Virtual Processor Set' as an argument rather than a simple 64-bit mask.
This patch introduces a similar variant of the HVCALL_SEND_IPI hypercall
(HVCALL_SEND_IPI_EX).

NOTE: As with HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST_EX, a guest should
      not yet issue the HVCALL_SEND_IPI_EX hypercall as support for
      'ExProcessorMasks' is not yet advertised via CPUID.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: add ExProcessorMasks variants of the flush hypercalls
Paul Durrant [Fri, 4 Dec 2020 12:14:59 +0000 (13:14 +0100)]
viridian: add ExProcessorMasks variants of the flush hypercalls

The Microsoft Hypervisor TLFS specifies variants of the already implemented
HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST hypercalls that take a 'Virtual
Processor Set' as an argument rather than a simple 64-bit mask.

This patch adds a new hvcall_flush_ex() function to implement these
(HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST_EX) hypercalls. This makes use of
new helper functions, hv_vpset_nr_banks() and hv_vpset_to_vpmask(), to
determine the size of the Virtual Processor Set (so it can be copied from
guest memory) and parse it into hypercall_vpmask (respectively).

NOTE: A guest should not yet issue these hypercalls as 'ExProcessorMasks'
      support needs to be advertised via CPUID. This will be done in a
      subsequent patch.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: use softirq batching in hvcall_ipi()
Paul Durrant [Fri, 4 Dec 2020 12:14:42 +0000 (13:14 +0100)]
viridian: use softirq batching in hvcall_ipi()

vlapic_ipi() uses a softirq batching mechanism to improve the efficiency of
sending a IPIs to large number of processors. This patch modifies send_ipi()
(the worker function called by hvcall_ipi()) to also make use of the
mechanism when there multiple bits set the hypercall_vpmask.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: use hypercall_vpmask in hvcall_ipi()
Paul Durrant [Fri, 4 Dec 2020 12:14:25 +0000 (13:14 +0100)]
viridian: use hypercall_vpmask in hvcall_ipi()

A subsequent patch will need to IPI a mask of virtual processors potentially
wider than 64 bits. A previous patch introduced per-cpu hypercall_vpmask
to allow hvcall_flush() to deal with such wide masks. This patch modifies
the implementation of hvcall_ipi() to make use of the same mask structures,
introducing a for_each_vp() macro to facilitate traversing a mask.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: introduce a per-cpu hypercall_vpmask and accessor functions...
Paul Durrant [Fri, 4 Dec 2020 12:14:03 +0000 (13:14 +0100)]
viridian: introduce a per-cpu hypercall_vpmask and accessor functions...

... and make use of them in hvcall_flush()/need_flush().

Subsequent patches will need to deal with virtual processor masks potentially
wider than 64 bits. Thus, to avoid using too much stack, this patch
introduces global per-cpu virtual processor masks and converts the
implementation of hvcall_flush() to use them.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: move IPI hypercall implementation into separate function
Paul Durrant [Fri, 4 Dec 2020 12:13:41 +0000 (13:13 +0100)]
viridian: move IPI hypercall implementation into separate function

This patch moves the implementation of HVCALL_SEND_IPI that is currently
inline in viridian_hypercall() into a new hvcall_ipi() function.

The new function returns Xen errno values similarly to hvcall_flush(). Hence
the errno translation code in viridial_hypercall() is generalized, removing
the need for the local 'status' variable.

NOTE: The formatting of the switch statement at the top of
      viridial_hypercall() is also adjusted as per CODING_STYLE.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: move flush hypercall implementation into separate function
Paul Durrant [Fri, 4 Dec 2020 12:13:22 +0000 (13:13 +0100)]
viridian: move flush hypercall implementation into separate function

This patch moves the implementation of HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE/LIST
that is currently inline in viridian_hypercall() into a new hvcall_flush()
function.

The new function returns Xen erro values which are then dealt with
appropriately. A return value of -ERESTART translates to viridian_hypercall()
returning HVM_HCALL_preempted. Other return values translate to status codes
and viridian_hypercall() returning HVM_HCALL_completed. Currently the only
values, other than -ERESTART, returned by hvcall_flush() are 0 (indicating
success) or -EINVAL.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agoviridian: don't blindly write to 32-bit registers if 'mode' is invalid
Paul Durrant [Fri, 4 Dec 2020 12:12:54 +0000 (13:12 +0100)]
viridian: don't blindly write to 32-bit registers if 'mode' is invalid

If hvm_guest_x86_mode() returns something other than 8 or 4 then
viridian_hypercall() will return immediately but, on the way out, will write
back status as if 'mode' was 4. This patch simply makes it leave the registers
alone.

NOTE: The formatting of the 'out' label and the switch statement are also
      adjusted as per CODING_STYLE.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years agotools/hotplug: allow tuning of xenwatchdogd arguments
Olaf Hering [Thu, 3 Dec 2020 06:34:36 +0000 (07:34 +0100)]
tools/hotplug: allow tuning of xenwatchdogd arguments

Currently the arguments for xenwatchdogd are hardcoded with 15s
keep-alive interval and 30s timeout.

It is not possible to tweak these values via
/etc/systemd/system/xen-watchdog.service.d/*.conf because ExecStart
can not be replaced. The only option would be a private copy
/etc/systemd/system/xen-watchdog.service, which may get out of sync
with the Xen provided xen-watchdog.service.

Adjust the service file to recognize XENWATCHDOGD_ARGS= in a
private unit configuration file.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
4 years agoxen/hypfs: add getsize() and findentry() callbacks to hypfs_funcs
Juergen Gross [Fri, 4 Dec 2020 07:31:25 +0000 (08:31 +0100)]
xen/hypfs: add getsize() and findentry() callbacks to hypfs_funcs

Add a getsize() function pointer to struct hypfs_funcs for being able
to have dynamically filled entries without the need to take the hypfs
lock each time the contents are being generated.

For directories add a findentry callback to the vector and modify
hypfs_get_entry_rel() to use it. For its non-directory node counterpart
introduce the so far unused and hence missing ENOTDIR error code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/hypfs: pass real failure reason up from hypfs_get_entry()
Juergen Gross [Fri, 4 Dec 2020 07:30:17 +0000 (08:30 +0100)]
xen/hypfs: pass real failure reason up from hypfs_get_entry()

Instead of handling all errors from hypfs_get_entry() as ENOENT pass
up the real error value via ERR_PTR().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/hypfs: move per-node function pointers into a dedicated struct
Juergen Gross [Fri, 4 Dec 2020 07:29:41 +0000 (08:29 +0100)]
xen/hypfs: move per-node function pointers into a dedicated struct

Move the function pointers currently stored in each hypfs node into a
dedicated structure in order to save some space for each node. This
will save even more space with additional callbacks added in future.

Provide some standard function vectors.

Instead of testing the write pointer to be not NULL provide a dummy
function just returning -EACCESS. ASSERT() all vector entries being
populated when adding a node. This avoids any potential problem (e.g.
pv domain privilege escalations) in case of calling a non populated
vector entry.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agofix spelling errors
Diederik de Haas [Fri, 4 Dec 2020 07:28:21 +0000 (08:28 +0100)]
fix spelling errors

Only spelling errors; no functional changes.

In docs/misc/dump-core-format.txt there are a few more instances of
'informations'. I'll leave that up to someone who can properly determine
how those sentences should be constructed.

Signed-off-by: Diederik de Haas <didi.debian@cknow.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoevtchn: avoid access tearing for ->virq_to_evtchn[] accesses
Jan Beulich [Fri, 4 Dec 2020 07:27:26 +0000 (08:27 +0100)]
evtchn: avoid access tearing for ->virq_to_evtchn[] accesses

Use {read,write}_atomic() to exclude any eventualities, in particular
observing that accesses aren't all happening under a consistent lock.

Requested-by: Julien Grall <julien@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years agoxen/iommu: vtd: Fix undefined behavior pci_vtd_quirks()
Julien Grall [Tue, 17 Nov 2020 13:45:47 +0000 (13:45 +0000)]
xen/iommu: vtd: Fix undefined behavior pci_vtd_quirks()

When booting Xen with CONFIG_USBAN=y on Sandy Bridge, UBSAN will throw
the following splat:

(XEN) ================================================================================
(XEN) UBSAN: Undefined behaviour in quirks.c:449:63
(XEN) left shift of 1 by 31 places cannot be represented in type 'int'
(XEN) ----[ Xen-4.11.4  x86_64  debug=y   Not tainted ]----

[...]

(XEN) Xen call trace:
(XEN)    [<ffff82d0802c0ccc>] ubsan.c#ubsan_epilogue+0xa/0xad
(XEN)    [<ffff82d0802c16c9>] __ubsan_handle_shift_out_of_bounds+0xb4/0x145
(XEN)    [<ffff82d0802eeecd>] pci_vtd_quirk+0x3d3/0x74f
(XEN)    [<ffff82d0802e508b>] iommu.c#domain_context_mapping+0x45b/0x46f
(XEN)    [<ffff82d08053f39e>] iommu.c#setup_hwdom_device+0x22/0x3a
(XEN)    [<ffff82d08053dfbc>] pci.c#setup_one_hwdom_device+0x8c/0x124
(XEN)    [<ffff82d08053e302>] pci.c#_setup_hwdom_pci_devices+0xbb/0x2f7
(XEN)    [<ffff82d0802da5b7>] pci.c#pci_segments_iterate+0x4c/0x8c
(XEN)    [<ffff82d08053e8bd>] setup_hwdom_pci_devices+0x25/0x2c
(XEN)    [<ffff82d08053e916>] iommu.c#intel_iommu_hwdom_init+0x52/0x2f3
(XEN)    [<ffff82d08053d6da>] iommu_hwdom_init+0x4e/0xa4
(XEN)    [<ffff82d080577f32>] dom0_construct_pv+0x23c8/0x2476
(XEN)    [<ffff82d08057cb50>] construct_dom0+0x6c/0xa3
(XEN)    [<ffff82d080564822>] __start_xen+0x4651/0x4b55
(XEN)    [<ffff82d0802000f3>] __high_start+0x53/0x55

Note that splat is from 4.11.4 and not staging. Although, the problem is
still present.

This can be solved by making the first operand unsigned int.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agodocs: fix hypfs path documentation
Juergen Gross [Wed, 2 Dec 2020 09:13:52 +0000 (10:13 +0100)]
docs: fix hypfs path documentation

The /params/* entry is missing a writable tag.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/cpupool: sort included headers in cpupool.c
Juergen Gross [Wed, 2 Dec 2020 09:13:09 +0000 (10:13 +0100)]
xen/cpupool: sort included headers in cpupool.c

Common style is to include header files in alphabetical order. Sort the
#include statements in cpupool.c accordingly.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
4 years agoxen/cpupool: add missing bits for per-cpupool scheduling granularity
Juergen Gross [Wed, 2 Dec 2020 09:12:37 +0000 (10:12 +0100)]
xen/cpupool: add missing bits for per-cpupool scheduling granularity

Even with storing the scheduling granularity in struct cpupool there
are still a few bits missing for being able to have cpupools with
different granularity (apart from the missing interface for setting
the individual granularities): the number of cpus in a scheduling
unit is always taken from the global sched_granularity variable.

So store the value in struct cpupool and use that instead of
sched_granularity.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
4 years agoxen/cpupool: add cpu to sched_res_mask when removing it from cpupool
Juergen Gross [Wed, 2 Dec 2020 09:12:04 +0000 (10:12 +0100)]
xen/cpupool: add cpu to sched_res_mask when removing it from cpupool

When a cpu is removed from a cpupool and added to the free cpus it
should be added to sched_res_mask, too.

The related removal from sched_res_mask in case of core scheduling
is already done in schedule_cpu_add().

As long as all cpupools share the same scheduling granularity there
is nothing going wrong with the missing addition, but this will change
when per-cpupool granularity is fully supported.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
4 years agons16550: gate all PCI code with CONFIG_X86
Rahul Singh [Wed, 2 Dec 2020 09:09:27 +0000 (10:09 +0100)]
ns16550: gate all PCI code with CONFIG_X86

The NS16550 driver is assuming that NS16550 PCI card are usable if the
architecture supports PCI (i.e. CONFIG_HAS_PCI=y). However, the code is
very x86 focus and will fail to build on Arm (/!\ it is not all the
errors):

ns16550.c: In function ‘ns16550_init_irq’:
ns16550.c:726:21: error: implicit declaration of function ‘create_irq’;
did you mean ‘release_irq’? [-Werror=implicit-function-declaration]
          uart->irq = create_irq(0, false);
                      ^~~~~~~~~~
                      release_irq
ns16550.c:726:21: error: nested extern declaration of ‘create_irq’
[-Werror=nested-externs]
ns16550.c: In function ‘ns16550_init_postirq’:
ns16550.c:768:33: error: ‘mmio_ro_ranges’ undeclared (first use in this
function); did you mean ‘mmio_handler’?
               rangeset_add_range(mmio_ro_ranges, uart->io_base,
                                  ^~~~~~~~~~~~~~
                                  mmio_handler
ns16550.c:768:33: note: each undeclared identifier is reported only once
for each function it appears in
ns16550.c:780:20: error: variable ‘msi’ has initializer but incomplete
type
              struct msi_info msi = {
                     ^~~~~~~~

Enabling support for NS16550 PCI card on Arm would require more plumbing
in addition to fixing the compilation error.

Arm systems tend to have platform UART available such as NS16550, PL011.
So there are limited reasons to get NS16550 PCI support for now on Arm.

Guard all remaining PCI code that is not under x86 flag with CONFIG_X86.

No functional change intended.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
4 years agox86/vioapic: fix usage of index in place of GSI in vioapic_write_redirent
Roger Pau Monné [Mon, 30 Nov 2020 13:06:38 +0000 (14:06 +0100)]
x86/vioapic: fix usage of index in place of GSI in vioapic_write_redirent

The usage of idx instead of the GSI in vioapic_write_redirent when
accessing gsi_assert_count can cause a PVH dom0 with multiple
vIO-APICs to lose interrupts in case a pin of a IO-APIC different than
the first one is unmasked with pending interrupts.

Switch to use gsi instead to fix the issue.

Fixes: 9f44b08f7d0e4 ('x86/vioapic: introduce support for multiple vIO APICS')
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Roger Pau Monné <roge.rpau@citrix.com>
Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/events: rework fifo queue locking
Juergen Gross [Mon, 30 Nov 2020 13:05:39 +0000 (14:05 +0100)]
xen/events: rework fifo queue locking

Two cpus entering evtchn_fifo_set_pending() for the same event channel
can race in case the first one gets interrupted after setting
EVTCHN_FIFO_PENDING and when the other one manages to set
EVTCHN_FIFO_LINKED before the first one is testing that bit. This can
lead to evtchn_check_pollers() being called before the event is put
properly into the queue, resulting eventually in the guest not seeing
the event pending and thus blocking forever afterwards.

Note that commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel
lock") made the race just more obvious, while the fifo event channel
implementation had this race forever since the introduction and use of
per-channel locks, when an unmask operation was running in parallel with
an event channel send operation.

Using a spinlock for the per event channel lock had turned out
problematic due to some paths needing to take the lock are called with
interrupts off, so the lock would need to disable interrupts, which in
turn broke some use cases related to vm events.

For avoiding this race the queue locking in evtchn_fifo_set_pending()
needs to be reworked to cover the test of EVTCHN_FIFO_PENDING,
EVTCHN_FIFO_MASKED and EVTCHN_FIFO_LINKED, too. Additionally when an
event channel needs to change queues both queues need to be locked
initially, in order to avoid having a window with no lock held at all.

Reported-by: Jan Beulich <jbeulich@suse.com>
Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
Fixes: de6acb78bf0e137c ("evtchn: use a per-event channel lock for sending events")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/events: modify struct evtchn layout
Juergen Gross [Mon, 30 Nov 2020 13:04:34 +0000 (14:04 +0100)]
xen/events: modify struct evtchn layout

In order to avoid latent races when updating an event channel put
xen_consumer and pending fields in different bytes. This is no problem
right now, but especially the pending indicator isn't used only when
initializing an event channel (unlike xen_consumer), so any future
addition to this byte would need to be done with a potential race kept
in mind.

At the same time move some other fields around to have less implicit
paddings and to keep related fields more closely together.

Finally switch struct evtchn to no longer use fixed sized types where
not needed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/pci: solve compilation error on ARM with HAS_PCI enabled
Rahul Singh [Fri, 27 Nov 2020 17:07:50 +0000 (18:07 +0100)]
xen/pci: solve compilation error on ARM with HAS_PCI enabled

If mem-sharing, mem-paging, or log-dirty functionality is not enabled
for architecture when HAS_PCI is enabled, the compiler will throw an
error.

Move code to x86 specific file to fix compilation error.

Also, modify the code to use likely() in place of unlikley() for each
condition to make code more optimized.

No functional change intended.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/pci: move x86 specific code to x86 directory
Rahul Singh [Fri, 27 Nov 2020 17:06:04 +0000 (18:06 +0100)]
xen/pci: move x86 specific code to x86 directory

passthrough/pci.c file is common for all architecture, but there is x86
specific code in this file.

Move x86 specific code to the drivers/passthrough/io.c file to avoid
compilation error for other architecture.

As drivers/passthrough/io.c is compiled only for x86 move it to
x86 directory and rename it to hvm.c.

No functional change intended.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoiommu: stop calling IOMMU page tables 'p2m tables'
Paul Durrant [Fri, 27 Nov 2020 17:04:47 +0000 (18:04 +0100)]
iommu: stop calling IOMMU page tables 'p2m tables'

It's confusing and not consistent with the terminology introduced with 'dfn_t'.
Just call them IOMMU page tables.

Also remove a pointless check of the 'acpi_drhd_units' list in
vtd_dump_page_table_level(). If the list is empty then IOMMU mappings would
not have been enabled for the domain in the first place.

NOTE: All calls to printk() have also been removed from
      iommu_dump_page_tables(); the implementation specific code is now
      responsible for all output.
      The check for the global 'iommu_enabled' has also been replaced by an
      ASSERT since iommu_dump_page_tables() is not registered as a key handler
      unless IOMMU mappings are enabled.
      Error messages are now prefixed with the name of the function.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agoiommu: remove the share_p2m operation
Paul Durrant [Fri, 27 Nov 2020 17:03:42 +0000 (18:03 +0100)]
iommu: remove the share_p2m operation

Sharing of HAP tables is now VT-d specific so the operation is never defined
for AMD IOMMU any more. There's also no need to pro-actively set vtd.pgd_maddr
when using shared EPT as it is straightforward to simply define a helper
function to return the appropriate value in the shared and non-shared cases.

NOTE: This patch also modifies unmap_vtd_domain_page() to take a const
      pointer since the only thing it calls, unmap_domain_page(), also takes
      a const pointer.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agoevtchn: double per-channel locking can't hit identical channels
Jan Beulich [Wed, 25 Nov 2020 13:08:14 +0000 (14:08 +0100)]
evtchn: double per-channel locking can't hit identical channels

Inter-domain channels can't possibly be bound to themselves, there's
always a 2nd channel involved, even when this is a loopback into the
same domain. As a result we can drop one conditional each from the two
involved functions.

With this, the number of evtchn_write_lock() invocations can also be
shrunk by half, swapping the two incoming function arguments instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agomm: check for truncation in vmalloc_type()
Jan Beulich [Wed, 25 Nov 2020 13:07:36 +0000 (14:07 +0100)]
mm: check for truncation in vmalloc_type()

While it's currently implied from the checking xmalloc_array() does,
let's make this more explicit in the function itself. As a result both
involved local variables don't need to have size_t type anymore. This
brings them in line with the rest of the code in this file.

Requested-by: Julien Grall <julien@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agox86: replace open-coded occurrences of sizeof_field()...
Paul Durrant [Wed, 25 Nov 2020 13:06:42 +0000 (14:06 +0100)]
x86: replace open-coded occurrences of sizeof_field()...

... with macro evaluations, now that it is available.

A recent patch imported the sizeof_field() macro from Linux. This patch makes
use of it in places where the construct is currently open-coded.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/include: import sizeof_field() macro from Linux stddef.h
Paul Durrant [Wed, 25 Nov 2020 13:06:27 +0000 (14:06 +0100)]
xen/include: import sizeof_field() macro from Linux stddef.h

Co-locate it with the definition of offsetof() (since this is also in stddef.h
in the Linux kernel source). This macro will be needed in a subsequent patch.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agotools/libs: fix uninstall rule for header files
Jan Beulich [Wed, 25 Nov 2020 13:05:52 +0000 (14:05 +0100)]
tools/libs: fix uninstall rule for header files

This again was working right only as long as $(LIBHEADER) consisted of
just one entry.

Fixes: bc44e2fb3199 ("tools: add a copy of library headers in tools/include")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
4 years agoxen/arm: Add workaround for Cortex-A55 erratum #1530923
Bertrand Marquis [Tue, 24 Nov 2020 11:12:15 +0000 (11:12 +0000)]
xen/arm: Add workaround for Cortex-A55 erratum #1530923

On the Cortex A55, TLB entries can be allocated by a speculative AT
instruction. If this is happening during a guest context switch with an
inconsistent page table state in the guest, TLBs with wrong values might
be allocated.
The ARM64_WORKAROUND_AT_SPECULATE workaround is used as for erratum
1165522 on Cortex A76 or Neoverse N1.

This change is also introducing the MIDR identifier for the Cortex-A55.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agomemory: fix off-by-one in XSA-346 change
Jan Beulich [Tue, 24 Nov 2020 13:01:31 +0000 (14:01 +0100)]
memory: fix off-by-one in XSA-346 change

The comparison against ARRAY_SIZE() needs to be >= in order to avoid
overrunning the pages[] array.

This is XSA-355.

Fixes: 5777a3742d88 ("IOMMU: hold page ref until after deferred TLB flush")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years agons16550: drop stray "#ifdef CONFIG_HAS_PCI"
Jan Beulich [Tue, 24 Nov 2020 10:28:41 +0000 (11:28 +0100)]
ns16550: drop stray "#ifdef CONFIG_HAS_PCI"

There's no point wrapping the function invocation when
- the function body is already suitably wrapped,
- the function itself is unconditionally available.

Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
4 years agons16550: "com<N>=" command line options are x86-specific
Jan Beulich [Tue, 24 Nov 2020 10:28:15 +0000 (11:28 +0100)]
ns16550: "com<N>=" command line options are x86-specific

Pure code motion (plus the addition of "#ifdef CONFIG_X86); no
functional change intended.

Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
4 years agons16550: move PCI arrays next to the function using them
Jan Beulich [Tue, 24 Nov 2020 10:27:49 +0000 (11:27 +0100)]
ns16550: move PCI arrays next to the function using them

Pure code motion; no functional change intended.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
4 years agox86/DMI: fix table mapping when one lives above 1Mb
Jan Beulich [Tue, 24 Nov 2020 10:26:34 +0000 (11:26 +0100)]
x86/DMI: fix table mapping when one lives above 1Mb

Use of __acpi_map_table() is kind of an abuse here, and doesn't work
anymore for the majority of cases if any of the tables lives outside the
low first Mb. Keep this (ab)use only prior to reaching SYS_STATE_boot,
primarily to avoid needing to audit whether any of the calls here can
happen this early in the first place; quite likely this isn't necessary
at all - at least dmi_scan_machine() gets called late enough.

For the "normal" case, call __vmap() directly, despite effectively
duplicating acpi_os_map_memory(). There's one difference though: We
shouldn't need to establish UC- mappings, WP or r/o WB mappings ought to
be fine, as the tables are going to live in either RAM or ROM. Short of
having PAGE_HYPERVISOR_WP and wanting to map the tables r/o anyway, use
the latter of the two options. The r/o mapping implies some
constification of code elsewhere in the file. For code touched anyway
also switch to void (where possible) or uint8_t.

Fixes: 1c4aa69ca1e1 ("xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/ACPI: fix mapping of FACS
Jan Beulich [Tue, 24 Nov 2020 10:26:02 +0000 (11:26 +0100)]
x86/ACPI: fix mapping of FACS

acpi_fadt_parse_sleep_info() runs when the system is already in
SYS_STATE_boot. Hence its direct call to __acpi_map_table() won't work
anymore. This call should probably have been replaced long ago already,
as the layering violation hasn't been necessary for quite some time.

Fixes: 1c4aa69ca1e1 ("xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/DMI: fix SMBIOS pointer range check
Jan Beulich [Tue, 24 Nov 2020 10:25:29 +0000 (11:25 +0100)]
x86/DMI: fix SMBIOS pointer range check

Forever since its introduction this has been using an inverted relation
operator.

Fixes: 54057a28f22b ("x86: support SMBIOS v3")
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoxen/events: access last_priority and last_vcpu_id together
Juergen Gross [Tue, 24 Nov 2020 10:23:42 +0000 (11:23 +0100)]
xen/events: access last_priority and last_vcpu_id together

The queue for a fifo event is depending on the vcpu_id and the
priority of the event. When sending an event it might happen the
event needs to change queues and the old queue needs to be kept for
keeping the links between queue elements intact. For this purpose
the event channel contains last_priority and last_vcpu_id values
elements for being able to identify the old queue.

In order to avoid races always access last_priority and last_vcpu_id
with a single atomic operation avoiding any inconsistencies.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years agoamd-iommu: Fix Guest CR3 Table following c/s 3a7947b6901
Andrew Cooper [Wed, 3 Apr 2019 16:53:15 +0000 (17:53 +0100)]
amd-iommu: Fix Guest CR3 Table following c/s 3a7947b6901

"amd-iommu: use a bitfield for DTE" renamed iommu_dte_set_guest_cr3()'s gcr3
parameter to gcr3_mfn but ended up with an off-by-PAGE_SIZE error when
extracting bits from the address.

get_guest_cr3_from_dte() and iommu_dte_set_guest_cr3() are (almost) getters
and setters for the same field, so should live together.

Rename them to dte_{get,set}_gcr3_table() to specifically avoid 'guest_cr3' in
the name.  This field actually points to a table in memory containing an array
of guest CR3 values.  As these functions are used for different logical
indirections, they shouldn't use gfn/mfn terminology for their parameters.
Switch them to use straight uint64_t full addresses.

Fixes: 3a7947b6901 ("amd-iommu: use a bitfield for DTE")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoAMD/IOMMU: avoid UB in guest CR3 retrieval
Jan Beulich [Fri, 20 Nov 2020 07:28:58 +0000 (08:28 +0100)]
AMD/IOMMU: avoid UB in guest CR3 retrieval

Found by looking for patterns similar to the one Julien did spot in
pci_vtd_quirks(). (Not that it matters much here, considering the code
is dead right now.)

Fixes: 3a7947b69011 ("amd-iommu: use a bitfield for DTE")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoSVM: avoid UB in intercept mask definitions
Jan Beulich [Fri, 20 Nov 2020 07:28:27 +0000 (08:28 +0100)]
SVM: avoid UB in intercept mask definitions

Found by looking for patterns similar to the one Julien did spot in
pci_vtd_quirks().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/nmi: avoid UB for P4-era watchdogs
Jan Beulich [Fri, 20 Nov 2020 07:28:11 +0000 (08:28 +0100)]
x86/nmi: avoid UB for P4-era watchdogs

Found by looking for patterns similar to the one Julien did spot in
pci_vtd_quirks().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agolib: split _ctype[] into its own object, under lib/
Jan Beulich [Fri, 20 Nov 2020 07:25:17 +0000 (08:25 +0100)]
lib: split _ctype[] into its own object, under lib/

This is, besides for tidying, in preparation of then starting to use an
archive rather than an object file for generic library code which
arch-es (or even specific configurations within a single arch) may or
may not need.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years agoxen/arm: acpi: Allow Xen to boot with ACPI 5.1
Julien Grall [Thu, 19 Nov 2020 17:08:29 +0000 (17:08 +0000)]
xen/arm: acpi: Allow Xen to boot with ACPI 5.1

At the moment Xen requires the FADT ACPI table to be at least version
6.0, apparently because of some reliance on other ACPI v6.0 features.

But actually this is overzealous, and Xen works now fine with ACPI v5.1.

Let's relax the version check for the FADT table to allow QEMU to
run the hypervisor with ACPI.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
4 years agoxen/arm: gic: acpi: Use the correct length for the GICC structure
Julien Grall [Thu, 19 Nov 2020 17:08:28 +0000 (17:08 +0000)]
xen/arm: gic: acpi: Use the correct length for the GICC structure

The length of the GICC structure in the MADT ACPI table differs between
version 5.1 and 6.0, although there are no other relevant differences.

Use the BAD_MADT_GICC_ENTRY macro, which was specifically designed to
overcome this issue.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
4 years agoxen/arm: gic: acpi: Guard helpers to build the MADT with CONFIG_ACPI
Julien Grall [Thu, 19 Nov 2020 17:08:27 +0000 (17:08 +0000)]
xen/arm: gic: acpi: Guard helpers to build the MADT with CONFIG_ACPI

gic_make_hwdom_madt() and gic_get_hwdom_madt_size() are ACPI specific.

While they build fine today, this will change in a follow-up patch.
Rather than trying to fix the build on ACPI, it is best to avoid
compiling the helpers and the associated callbacks when CONFIG_ACPI=n.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agoautomation/scripts/containerize: fix DOCKER_CMD=podman
Edwin Török [Tue, 17 Nov 2020 18:24:09 +0000 (18:24 +0000)]
automation/scripts/containerize: fix DOCKER_CMD=podman

On CentOS 8 with SELinux containerize doesn't work at all:

Make sure that the source code and SSH agent directories are passed on
with SELinux relabeling enabled.
(`-security-opt label=disabled` would be another option)

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
4 years agoci: drop CentOS 6
Doug Goldstein [Wed, 18 Nov 2020 16:27:06 +0000 (10:27 -0600)]
ci: drop CentOS 6

CentOS 6 is no longer supported by upstream so we cannot test against it
for future Xen releases.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoxen/arm: Add workaround for Cortex-A76/Neoverse-N1 erratum #1286807
Michal Orzel [Mon, 16 Nov 2020 12:11:40 +0000 (13:11 +0100)]
xen/arm: Add workaround for Cortex-A76/Neoverse-N1 erratum #1286807

On the affected Cortex-A76/Neoverse-N1 cores (r0p0 to r3p0),
if a virtual address for a cacheable mapping of a location is being
accessed by a core while another core is remapping the virtual
address to a new physical page using the recommended break-before-make
sequence, then under very rare circumstances TLBI+DSB completes before
a read using the translation being invalidated has been observed by
other observers. The workaround repeats the TLBI+DSB operation for all
the TLB flush operations. While this is stricly not necessary, we don't
want to take any risk.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years agoxen/x86: issue pci_serr error message via NMI continuation
Juergen Gross [Wed, 18 Nov 2020 11:39:21 +0000 (12:39 +0100)]
xen/x86: issue pci_serr error message via NMI continuation

Instead of using a softirq pci_serr_error() can use NMI continuation
for issuing an error message.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/oprofile: use NMI continuation for sending virq to guest
Juergen Gross [Wed, 18 Nov 2020 11:38:53 +0000 (12:38 +0100)]
xen/oprofile: use NMI continuation for sending virq to guest

Instead of calling send_guest_vcpu_virq() from NMI context use the
NMI continuation framework for that purpose. This avoids taking locks
in NMI mode.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/x86: add nmi continuation framework
Juergen Gross [Wed, 18 Nov 2020 11:38:29 +0000 (12:38 +0100)]
xen/x86: add nmi continuation framework

Actions in NMI context are rather limited as e.g. locking is rather
fragile.

Add a framework to continue processing in normal interrupt context
after leaving NMI processing.

This is done by a high priority interrupt vector triggered via a
self IPI from NMI context, which will then call the continuation
function specified during NMI handling.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/vpt: fix build with old gcc
Jan Beulich [Wed, 18 Nov 2020 11:38:01 +0000 (12:38 +0100)]
x86/vpt: fix build with old gcc

I believe it was the XSA-336 fix (42fcdd42328f "x86/vpt: fix race when
migrating timers between vCPUs") which has unmasked a bogus
uninitialized variable warning. This is observable with gcc 4.3.4, but
only on 4.13 and older; it's hidden on newer versions apparently due to
the addition to _read_unlock() done by 12509bbeb9e3 ("rwlocks: call
preempt_disable() when taking a rwlock").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/p2m: split write_p2m_entry() hook
Jan Beulich [Wed, 18 Nov 2020 11:37:24 +0000 (12:37 +0100)]
x86/p2m: split write_p2m_entry() hook

Fair parts of the present handlers are identical; in fact
nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
common parts right into write_p2m_entry(), splitting the hooks into a
"pre" one (needed just by shadow code) and a "post" one.

For the common parts moved I think that the p2m_flush_nestedp2m() is,
at least from an abstract perspective, also applicable in the shadow
case. Hence it doesn't get a 3rd hook put in place.

The initial comment that was in hap_write_p2m_entry() gets dropped: Its
placement was bogus, and looking back the the commit introducing it
(dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
of a p2m it was meant to be associated with.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/HAP: move nested-P2M flush calculations out of locked region
Jan Beulich [Wed, 18 Nov 2020 11:34:54 +0000 (12:34 +0100)]
x86/HAP: move nested-P2M flush calculations out of locked region

By latching the old MFN into a local variable, these calculations don't
depend on anything but local variables anymore. Hence the point in time
when they get performed doesn't matter anymore, so they can be moved
past the locked region.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/p2m: suppress audit_p2m hook when possible
Jan Beulich [Wed, 18 Nov 2020 11:34:14 +0000 (12:34 +0100)]
x86/p2m: suppress audit_p2m hook when possible

When P2M_AUDIT is false, it's unused, so instead of having a dangling
NULL pointer sit there, omit the field altogether.

Instead of adding "#if P2M_AUDIT && defined(CONFIG_HVM)" in even more
places, fold the latter part right into the definition of P2M_AUDIT.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/p2m: collapse the two ->write_p2m_entry() hooks
Jan Beulich [Wed, 18 Nov 2020 11:33:18 +0000 (12:33 +0100)]
x86/p2m: collapse the two ->write_p2m_entry() hooks

The struct paging_mode instances get set to the same functions
regardless of mode by both HAP and shadow code, hence there's no point
having this hook there. The hook also doesn't need moving elsewhere - we
can directly use struct p2m_domain's. This merely requires (from a
strictly formal pov; in practice this may not even be needed) making
sure we don't end up using safe_write_pte() for nested P2Ms.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agoxen/arm: Add Cortex-A73 erratum 858921 workaround
Penny Zheng [Mon, 9 Nov 2020 08:21:10 +0000 (16:21 +0800)]
xen/arm: Add Cortex-A73 erratum 858921 workaround

CNTVCT_EL0 or CNTPCT_EL0 counter read in Cortex-A73 (all versions)
might return a wrong value when the counter crosses a 32bit boundary.

Until now, there is no case for Xen itself to access CNTVCT_EL0,
and it also should be the Guest OS's responsibility to deal with
this part.

But for CNTPCT, there exists several cases in Xen involving reading
CNTPCT, so a possible workaround is that performing the read twice,
and to return one or the other depending on whether a transition has
taken place.

Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Reviewed-by: Wei Chen <Wei.Chen@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agox86/p2m: paging_write_p2m_entry() is a private function
Jan Beulich [Wed, 11 Nov 2020 07:57:32 +0000 (08:57 +0100)]
x86/p2m: paging_write_p2m_entry() is a private function

As it gets installed by p2m_pt_init(), it doesn't need to live in
paging.c. The function working in terms of l1_pgentry_t even further
indicates its non-paging-generic nature. Move it and drop its
paging_ prefix, not adding any new one now that it's static.

This then also makes more obvious that in the EPT case we wouldn't
risk mistakenly calling through the NULL hook pointer.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agoxen/events: fix build
Juergen Gross [Wed, 11 Nov 2020 07:56:21 +0000 (08:56 +0100)]
xen/events: fix build

Commit 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
introduced a build failure for NDEBUG builds.

Fixes: 5f2df45ead7c1195 ("xen/evtchn: rework per event channel lock")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/arm: Always trap AMU system registers
Julien Grall [Mon, 9 Nov 2020 20:28:59 +0000 (20:28 +0000)]
xen/arm: Always trap AMU system registers

The Activity Monitors Unit (AMU) has been introduced by ARMv8.4. It is
considered to be unsafe to be expose to guests as they might expose
information about code executed by other guests or the host.

Arm provided a way to trap all the AMU system registers by setting
CPTR_EL2.TAM to 1.

Unfortunately, on older revision of the specification, the bit 30 (now
CPTR_EL1.TAM) was RES0. Because of that, Xen is setting it to 0 and
therefore the system registers would be exposed to the guest when it is
run on processors with AMU.

As the bit is mark as UNKNOWN at boot in Armv8.4, the only safe solution
for us is to always set CPTR_EL1.TAM to 1.

Guest trying to access the AMU system registers will now receive an
undefined instruction. Unfortunately, this means that even well-behaved
guest may fail to boot because we don't sanitize the ID registers.

This is a known issues with other Armv8.0+ features (e.g. SVE, Pointer
Auth). This will taken care separately.

This is part of XSA-351 (or XSA-93 re-born).

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
4 years agox86/CPUID: also check leaf 7 max subleaf to be compatible
Jan Beulich [Tue, 10 Nov 2020 13:40:09 +0000 (14:40 +0100)]
x86/CPUID: also check leaf 7 max subleaf to be compatible

Just like is done for basic and extended major leaves.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/CPUID: suppress IOMMU related hypervisor leaf data
Jan Beulich [Tue, 10 Nov 2020 13:39:30 +0000 (14:39 +0100)]
x86/CPUID: suppress IOMMU related hypervisor leaf data

Now that the IOMMU for guests can't be enabled "on demand" anymore,
there's also no reason to expose the related CPUID bit "just in case".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agox86/CPUID: don't use UB shift when library is built as 32-bit
Jan Beulich [Tue, 10 Nov 2020 13:39:03 +0000 (14:39 +0100)]
x86/CPUID: don't use UB shift when library is built as 32-bit

At least the insn emulator test harness will continue to be buildable
(and ought to continue to be usable) also as a 32-bit binary. (Right now
the CPU policy test harness is, too, but there it may be less relevant
to keep it functional, just like e.g. we don't support fuzzing the insn
emulator in 32-bit mode.) Hence the library code needs to cope with
this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoxen/evtchn: revert 52e1fc47abc3a0123
Juergen Gross [Tue, 10 Nov 2020 13:37:15 +0000 (14:37 +0100)]
xen/evtchn: revert 52e1fc47abc3a0123

With the event channel lock no longer disabling interrupts commit
52e1fc47abc3a0123 ("evtchn/Flask: pre-allocate node on send path") can
be reverted again.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agoxen/evtchn: rework per event channel lock
Juergen Gross [Tue, 10 Nov 2020 13:36:15 +0000 (14:36 +0100)]
xen/evtchn: rework per event channel lock

Currently the lock for a single event channel needs to be taken with
interrupts off, which causes deadlocks in some cases.

Rework the per event channel lock to be non-blocking for the case of
sending an event and removing the need for disabling interrupts for
taking the lock.

The lock is needed for avoiding races between event channel state
changes (creation, closing, binding) against normal operations (set
pending, [un]masking, priority changes).

Use a rwlock, but with some restrictions:

- Changing the state of an event channel (creation, closing, binding)
  needs to use write_lock(), with ASSERT()ing that the lock is taken as
  writer only when the state of the event channel is either before or
  after the locked region appropriate (either free or unbound).

- Sending an event needs to use read_trylock() mostly, in case of not
  obtaining the lock the operation is omitted. This is needed as
  sending an event can happen with interrupts off (at least in some
  cases).

- Dumping the event channel state for debug purposes is using
  read_trylock(), too, in order to avoid blocking in case the lock is
  taken as writer for a long time.

- All other cases can use read_lock().

Fixes: e045199c7c9c54 ("evtchn: address races with evtchn_reset()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>