xenbits.xensource.com Git - people/dwmw2/xen.git/log
5 years ago  libxl: add initializers for libxl__domid_history
Paul Durrant [Wed, 26 Feb 2020 13:12:13 +0000 (13:12 +0000)]
libxl: add initializers for libxl__domid_history

This patch fixes Coverity issue CID 1459006 (Insecure data handling
(INTEGER_OVERFLOW)).

The problem is that the error paths for libxl__mark_domid_recent() and
libxl__is_domid_recent() check the 'f' field in struct libxl__domid_history
when it may not have been initialized.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
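
A minimal sketch of the pattern behind this fix, assuming a simplified stand-in for struct libxl__domid_history (the struct layout and helper names below are hypothetical, not libxl's). Zero-initializing the struct at its declaration keeps the check of 'f' on the error path well defined even when the file was never opened.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for struct libxl__domid_history. */
struct domid_history {
    FILE *f;        /* history file, NULL until opened */
    long timeout;   /* re-use timeout in seconds */
};

/* Cleanup that may be reached from any error path.
 * Returns 1 if a file was closed, 0 if there was nothing to close. */
static int history_close(struct domid_history *h)
{
    if (h->f == NULL)
        return 0;
    fclose(h->f);
    h->f = NULL;
    return 1;
}

/* Opening can fail; the caller then jumps to its cleanup code. */
static int history_open(struct domid_history *h, const char *path)
{
    h->f = fopen(path, "r");
    return h->f ? 0 : -1;
}
```

Declaring the object as `struct domid_history h = { 0 };` is what makes calling history_close() safe before history_open() has run.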
5 years ago  domctl: fix typo in comment
Olaf Hering [Wed, 26 Feb 2020 16:13:39 +0000 (17:13 +0100)]
domctl: fix typo in comment

Add missing 'a' to sharing.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
5 years ago  build: remove use of AFLAGS-y
Anthony PERARD [Wed, 26 Feb 2020 16:41:53 +0000 (17:41 +0100)]
build: remove use of AFLAGS-y

And simply add directly to AFLAGS.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago  build: remove confusing comment on the %.s:%.S rule
Anthony PERARD [Wed, 26 Feb 2020 16:41:37 +0000 (17:41 +0100)]
build: remove confusing comment on the %.s:%.S rule

That comment was introduced by 3943db776371 ("[XEN] Can be built
-std=gnu99 (except for .S files).") to explain why CFLAGS was removed
from the command line. The comment is already written where the
-std=gnu flag gets removed from AFLAGS, so there is no need to repeat it.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago  Makefile: fix install-tests
Anthony PERARD [Wed, 26 Feb 2020 16:41:02 +0000 (17:41 +0100)]
Makefile: fix install-tests

The top-level Makefile makes use of internal implementation details of
the Xen build system. Avoid that by creating a new target
"install-tests" in xen/Makefile, and by fixing the top-level Makefile
to not call xen/Rules.mk anymore.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  xen/include: remove include of Config.mk
Anthony PERARD [Wed, 26 Feb 2020 16:40:06 +0000 (17:40 +0100)]
xen/include: remove include of Config.mk

It isn't necessary to include Config.mk here because this Makefile is
only used by xen/Rules.mk which already includes Config.mk.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  x86/smp: do not use scratch_cpumask when in interrupt or exception context
Roger Pau Monné [Wed, 26 Feb 2020 16:38:58 +0000 (17:38 +0100)]
x86/smp: do not use scratch_cpumask when in interrupt or exception context

Using scratch_cpumask in send_IPI_mask is not safe in IRQ or exception
context because it can nest, and hence send_IPI_mask could be
overwriting another user's scratch cpumask data when used in such
contexts.

Fall back to not using the scratch cpumask (and hence not attempting to
optimize IPI sending by using a shorthand) when in IRQ or exception
context. Note that the scratch cpumask cannot be used when
non-maskable interrupts are being serviced (NMI or #MC) and hence
fall back to not using the shorthand in that case, like it was done
previously.

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
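
The decision described above can be sketched as follows. This is an illustration only: the struct, its fields, and the function name are hypothetical and do not mirror Xen's per-CPU accounting.

```c
#include <stdbool.h>

/* Hypothetical per-CPU nesting counters; Xen tracks these differently. */
struct cpu_ctx {
    unsigned int irq_depth;   /* nested IRQ/exception handlers */
    unsigned int nmi_depth;   /* NMI handlers in service */
    unsigned int mce_depth;   /* #MC handlers in service */
};

/* The shared scratch mask is only safe when no handler can nest on top
 * of the current user and overwrite its contents. */
static bool may_use_scratch_mask(const struct cpu_ctx *c)
{
    return c->irq_depth == 0 && c->nmi_depth == 0 && c->mce_depth == 0;
}
```

When the function returns false, a caller would skip the shorthand optimisation and send the IPI using the mask it was given.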
5 years ago  x86: track when in #MC context
Roger Pau Monné [Wed, 26 Feb 2020 16:38:11 +0000 (17:38 +0100)]
x86: track when in #MC context

Add helpers to track when executing in #MC handler context. This is
modeled after the in_irq helpers.

Note that there are no users of in_mce_handler() introduced by the
change; further users will be added by follow-up changes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  x86: track when in NMI context
Roger Pau Monné [Wed, 26 Feb 2020 16:37:22 +0000 (17:37 +0100)]
x86: track when in NMI context

Add helpers to track when running in NMI handler context. This is
modeled after the in_irq helpers.

The SDM states that no NMI can be delivered while handling an NMI
until the processor has executed an iret instruction. It's possible
however that another fault is received while handling the NMI (a #MC
for example), and thus the iret from that fault would allow further
NMIs to be injected while still processing the previous one, and
hence an integer is needed in order to keep track of in service NMIs.
The added macros only track when the execution context is in the NMI
handler, but that doesn't mean NMIs are blocked for the reasons listed
above.

Note that there are no users of in_nmi_handler() introduced by the
change; further users will be added by follow-up changes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
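
A rough sketch of why an integer rather than a flag is needed, assuming a single CPU and hypothetical helper names (the real helpers are per-CPU macros modeled after the in_irq ones):

```c
#include <stdbool.h>

/* Hypothetical counter of NMI handlers currently in service on this CPU. */
static unsigned int nmi_in_service;

static void nmi_enter_sketch(void) { nmi_in_service++; }
static void nmi_exit_sketch(void)  { nmi_in_service--; }

/* True while at least one NMI handler invocation has not yet returned. */
static bool in_nmi_handler_sketch(void)
{
    return nmi_in_service != 0;
}
```

With a plain boolean, the exit of a nested NMI (one delivered after the iret of an intervening fault) would clear the flag while the outer handler is still running; the counter avoids that.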
5 years ago  x86: introduce a nmi_count tracking variable
Roger Pau Monné [Wed, 26 Feb 2020 16:36:30 +0000 (17:36 +0100)]
x86: introduce a nmi_count tracking variable

This is modeled after the irq_count variable, and is used to account
for all the NMIs handled by the system.

This will allow repurposing the nmi_count() helper so it can be used
in a similar manner as local_irq_count(): account for the NMIs
currently in service.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  x86/vPMU: don't blindly assume IA32_PERF_CAPABILITIES MSR exists
Jan Beulich [Wed, 26 Feb 2020 16:35:48 +0000 (17:35 +0100)]
x86/vPMU: don't blindly assume IA32_PERF_CAPABILITIES MSR exists

Just like VMX's lbr_tsx_fixup_check(), the respective CPUID bit should
be consulted first.

Reported-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  x86/mm: drop p2mt parameter from map_domain_gfn()
Jan Beulich [Wed, 26 Feb 2020 16:35:07 +0000 (17:35 +0100)]
x86/mm: drop p2mt parameter from map_domain_gfn()

No caller actually consumes it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  SVM: drop asm/hvm/emulate.h inclusion from vmcb.h
Jan Beulich [Wed, 26 Feb 2020 16:33:57 +0000 (17:33 +0100)]
SVM: drop asm/hvm/emulate.h inclusion from vmcb.h

It's not needed there and introduces a needless, almost global
dependency. Include the file (or in some cases just xen/err.h) where
actually needed, or - in one case - simply forward-declare a struct. In
microcode*.c take the opportunity and also re-order a few other
#include-s.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years ago  x86/sysctl: Don't return cpu policy data for compiled-out support
Andrew Cooper [Tue, 25 Feb 2020 16:57:03 +0000 (16:57 +0000)]
x86/sysctl: Don't return cpu policy data for compiled-out support

Policy objects aren't tiny, and the derivation logic isn't trivial.  We are
about to increase the number of policy objects, so will have the opportunity
to drop logic and storage space based on CONFIG_{PV,HVM}.

Start by causing XEN_SYSCTL_get_cpu_policy to fail with -EOPNOTSUPP when
requesting data for a compiled-out subsystem.  Update xen-cpuid to cope and
continue to further system policies, seeing as the indices are interleaved.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  x86/gen-cpuid: Fix Py2/3 compatibility
Andrew Cooper [Tue, 25 Feb 2020 15:43:55 +0000 (15:43 +0000)]
x86/gen-cpuid: Fix Py2/3 compatibility

There is a fencepost error on the sys.version_info check which will break on
Python 3.0.  Reverse the logic to make py2 compatible with py3 (rather than
py3 compatible with py2) which will be more natural to follow as py2 usage
reduces.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago  ns16550: Re-order the serial port address checking
Wei Xu [Wed, 26 Feb 2020 09:56:23 +0000 (10:56 +0100)]
ns16550: Re-order the serial port address checking

The serial port address space ID qualifies the address. Whether a value
of zero for the serial port address can sensibly mean "disabled" depends
on the address space ID. Hence check the address space ID before
checking the address.

Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  smp: convert cpu_hotplug_begin into a blocking lock acquisition
Roger Pau Monné [Wed, 26 Feb 2020 09:55:22 +0000 (10:55 +0100)]
smp: convert cpu_hotplug_begin into a blocking lock acquisition

Don't allow cpu_hotplug_begin to fail by converting the trylock into a
blocking lock acquisition. Write users of the cpu_add_remove_lock are
limited to CPU plug/unplug operations, and cannot deadlock between
themselves or other users taking the lock in read mode as
cpu_add_remove_lock is always locked with interrupts enabled. There
are also no other locks taken during the plug/unplug operations.

The exclusive lock usage in register_cpu_notifier is also converted
into a blocking lock acquisition, as it was previously not allowed to
fail anyway.

This is meaningful when running Xen in shim mode, since VCPU_{up/down}
hypercalls use cpu hotplug/unplug operations in the background, and
hence failing to take the lock results in VCPU_{up/down} failing with
-EBUSY, which most users are not prepared to handle.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  rwlock: allow recursive read locking when already locked in write mode
Roger Pau Monné [Wed, 26 Feb 2020 09:53:03 +0000 (10:53 +0100)]
rwlock: allow recursive read locking when already locked in write mode

Allow a CPU already holding the lock in write mode to also lock it in
read mode. There's no harm in allowing read locking a rwlock that's
already owned by the caller (ie: CPU) in write mode. Allowing such
accesses is required at least for the CPU maps use-case.

In order to do this, reserve 12 bits of the lock, which allows supporting
up to 4096 CPUs. Also reduce the write lock mask to 2 bits: one to
signal there are pending writers waiting on the lock and the other to
signal the lock is owned in write mode.

This reduces the maximum number of concurrent readers from 16777216 to
262144. I think this should still be enough, or else the lock field
can be expanded from 32 to 64 bits if all architectures support atomic
operations on 64-bit integers.

Fixes: 5872c83b42c608 ('smp: convert the cpu maps lock into a rw lock')
Reported-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Jürgen Groß <jgross@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
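
The figures quoted in this message follow from the bit budget of a 32-bit lock word. The arithmetic can be checked with a short sketch; the macro and function names are made up for illustration and are not the identifiers used in Xen's rwlock code.

```c
#include <stdint.h>

#define LOCK_BITS         32u  /* width of the lock word */
#define WRITER_CPU_BITS   12u  /* ID of the CPU holding the write lock */
#define WRITER_FLAG_BITS   2u  /* "writer waiting" and "write locked" */
#define READER_SHIFT      (WRITER_CPU_BITS + WRITER_FLAG_BITS)

/* Number of distinct CPU IDs that fit in the writer field: 2^12. */
static uint32_t max_cpus(void)
{
    return 1u << WRITER_CPU_BITS;
}

/* Reader count values that fit in the remaining 18 bits: 2^18. */
static uint32_t max_readers(void)
{
    return 1u << (LOCK_BITS - READER_SHIFT);
}
```

The previous figure of 16777216 is 2^24, i.e. what is left when only 8 of the 32 bits are taken by the writer field.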
5 years ago  atomic: add atomic_and operations
Roger Pau Monné [Wed, 26 Feb 2020 09:51:31 +0000 (10:51 +0100)]
atomic: add atomic_and operations

To x86 and Arm. This performs an atomic AND operation against an
atomic_t variable with the provided mask.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien@xen.org>
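
For readers unfamiliar with the operation, an equivalent can be sketched with C11 atomics. This is not the Xen implementation (which is written per architecture against atomic_t); the function name and argument order here are assumptions for illustration.

```c
#include <stdatomic.h>

/* Atomically perform *v &= mask on a C11 atomic unsigned int. */
static inline void atomic_and_sketch(unsigned int mask, atomic_uint *v)
{
    atomic_fetch_and(v, mask);
}
```

A typical use is clearing flag bits in a shared word without taking a lock, for example atomic_and_sketch(~FLAG, &word) for some flag constant FLAG.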
5 years ago  sched: rework credit2 run-queue allocation
Juergen Gross [Wed, 26 Feb 2020 09:50:26 +0000 (10:50 +0100)]
sched: rework credit2 run-queue allocation

Currently the memory for each run-queue of the credit2 scheduler is
allocated at the scheduler's init function: for each cpu in the system
a struct csched2_runqueue_data is being allocated, even if the
current scheduler only handles one physical cpu or is configured to
work with a single run-queue. As each struct contains 4 cpumasks this
sums up to rather large memory sizes pretty fast.

Rework the memory allocation for run-queues to be done only when
needed, i.e. when adding a physical cpu to the scheduler requiring a
new run-queue.

In fact this fixes a bug in credit2 related to run-queue handling:
cpu_to_runqueue() will return the first free or matching run-queue,
whichever is found first. So in case a cpu is removed from credit2
this could result in e.g. run-queue 0 becoming free, so when another
cpu is added it will in any case be assigned to that free run-queue,
even if it would have found another run-queue matching later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
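
The lookup-order bug described above can be illustrated with a small sketch. The data structure and both functions are hypothetical and much simpler than credit2's; the actual patch allocates run-queues on demand rather than patching the search, so this only shows why "first free or matching" picks the wrong slot.

```c
#include <stdbool.h>

#define NR_RQ 4

/* Hypothetical run-queue slot. */
struct rq_sketch {
    bool in_use;
    int  socket;   /* assumed grouping key for "matching" */
};

/* Buggy order: returns the first slot that is free OR matches. */
static int pick_first_free_or_match(const struct rq_sketch *rq, int socket)
{
    for (int i = 0; i < NR_RQ; i++)
        if (!rq[i].in_use || rq[i].socket == socket)
            return i;
    return -1;
}

/* Prefer an existing match; fall back to the first free slot only
 * when no in-use slot matches. */
static int pick_match_then_free(const struct rq_sketch *rq, int socket)
{
    int free_slot = -1;

    for (int i = 0; i < NR_RQ; i++) {
        if (rq[i].in_use && rq[i].socket == socket)
            return i;
        if (!rq[i].in_use && free_slot < 0)
            free_slot = i;
    }
    return free_slot;
}
```

With slot 0 freed and slot 1 in use for socket 1, the first function hands a new socket-1 cpu to slot 0 while the second finds slot 1.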
5 years ago  x86/pvh: drop v2 suffix from pvh.pandoc
Wei Liu [Tue, 25 Feb 2020 14:22:32 +0000 (14:22 +0000)]
x86/pvh: drop v2 suffix from pvh.pandoc

There is now only one version of PVH implementation in Xen. Drop "v2" to
avoid confusion.

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago  libxl: fix build with older glibc
Paul Durrant [Tue, 25 Feb 2020 12:33:43 +0000 (12:33 +0000)]
libxl: fix build with older glibc

Commit 2b02882ebbbc "libxl: add infrastructure to track and query
'recent' domids" added a call to clock_gettime() into libxl. The man
page for this states:

"Link with -lrt (only for glibc versions before 2.17)."

Unfortunately CentOS 6 does have a glibc prior to that version, and the
libxl Makefile was not updated to add '-lrt' so the build will fail in
that environment.

This patch simply adds '-lrt' to LIBXL_LIBS unconditionally, as it does
no harm in newer environments.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Fixes: 2b02882ebbbc ("libxl: add infrastructure to track and query 'recent' domids")
Acked-by: Wei Liu <wl@xen.org>
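
For context, a minimal sketch of the kind of call that triggers the link requirement. The helper name is hypothetical; clock_gettime() and CLOCK_MONOTONIC are the standard POSIX interfaces.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <time.h>

/* Returns the monotonic clock in whole seconds, or -1 on failure. */
static long monotonic_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1;
    return (long)ts.tv_sec;
}
```

On glibc versions before 2.17 a program using this must be linked with -lrt; on newer versions the extra library is harmless.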
5 years ago  x86/dom0_build: PVH ABI is now in pvh.pandoc
Wei Liu [Sun, 23 Feb 2020 21:13:30 +0000 (21:13 +0000)]
x86/dom0_build: PVH ABI is now in pvh.pandoc

Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  xl: allow domid to be preserved on save/restore or migrate
Paul Durrant [Wed, 8 Jan 2020 15:40:55 +0000 (15:40 +0000)]
xl: allow domid to be preserved on save/restore or migrate

This patch adds a '-D' command line option to save and migrate to allow
the domain id to be incorporated into the saved domain configuration and
hence be preserved.

NOTE: Logically it may seem as though preservation of domid should be
      dealt with by libxl, but the libxl migration stream has no record
      in which to transfer domid and remote domain creation occurs before
      the migration stream is parsed. Hence this patch modifies xl rather
      than libxl.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago  xl.conf: introduce 'domid_policy'
Paul Durrant [Wed, 8 Jan 2020 12:32:14 +0000 (12:32 +0000)]
xl.conf: introduce 'domid_policy'

This patch adds a new global 'domid_policy' configuration option to decide
how domain id values are allocated for new domains. It may be set to one of
two values:

"xen", the default value, will cause an invalid domid value to be passed
to do_domain_create() preserving the existing behaviour of having Xen
choose the domid value during domain_create().

"random" will cause the special RANDOM_DOMID value to be passed to
do_domain_create() such that libxl__domain_make() will select a random
domid value.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago  libxl: allow creation of domains with a specified or random domid
Paul Durrant [Mon, 23 Dec 2019 17:16:20 +0000 (17:16 +0000)]
libxl: allow creation of domains with a specified or random domid

This patch adds a 'domid' field to libxl_domain_create_info and then
modifies libxl__domain_make() to have Xen use that value if it is valid.
If the domid value is invalid then Xen will choose the domid, as before,
unless the value is the new special RANDOM_DOMID value added to the API.
This value instructs libxl__domain_make() to choose a random domid value
for Xen to use.

If Xen determines that a domid specified to or chosen by
libxl__domain_make() coincides with an existing domain then the create
operation will fail. In this case, if RANDOM_DOMID was specified to
libxl__domain_make() then a new random value will be chosen and the create
operation will be re-tried, otherwise libxl__domain_make() will fail.

After Xen has successfully created a new domain, libxl__domain_make() will
check whether its domid matches any recently used domid values. If it does
then the domain will be destroyed. If the domid used in creation was
specified to libxl__domain_make() then it will fail at this point,
otherwise the create operation will be re-tried with either a new random
or Xen-selected domid value.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago  public/xen.h: add a definition for a 'valid domid' mask
Paul Durrant [Wed, 19 Feb 2020 08:53:31 +0000 (08:53 +0000)]
public/xen.h: add a definition for a 'valid domid' mask

A subsequent patch will modify libxl to allow selection of a random domid
value when creating domains. Valid values are limited to a width of 15 bits,
so add an appropriate mask definition to the public header.

NOTE: It is reasonable for this mask definition to be in a Xen public header
      rather than in, say, a libxenctrl header since it relates to the
      validity of a value passed to XEN_DOMCTL_createdomain. This new
      definition is placed in xen.h rather than domctl.h only to co-locate
      it with other domid-related definitions.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Julien Grall <julien@xen.org>
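
How a 15-bit mask bounds a random domid can be sketched as below. The macro name is a placeholder rather than the identifier added to xen.h, and rand() merely stands in for whatever random source a real caller would use.

```c
#include <stdint.h>
#include <stdlib.h>

/* 15-bit mask; hypothetical name for the 'valid domid' mask. */
#define DOMID_VALID_MASK_SKETCH 0x7fffu

/* Reduce an arbitrary random number to the 15-bit domid range. */
static uint16_t random_domid_candidate(void)
{
    return (uint16_t)((unsigned int)rand() & DOMID_VALID_MASK_SKETCH);
}
```

A candidate produced this way is only in range. It may still collide with an existing domain or a value Xen does not accept, in which case the caller has to pick again.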
5 years ago  libxl: add infrastructure to track and query 'recent' domids
Paul Durrant [Tue, 7 Jan 2020 13:46:45 +0000 (13:46 +0000)]
libxl: add infrastructure to track and query 'recent' domids

A domid is considered recent if the domain it represents was destroyed
less than a specified number of seconds ago. For debugging and/or testing
purposes the number can be set using the environment variable
LIBXL_DOMID_REUSE_TIMEOUT. If the variable does not exist then a default
value of 60s is used.

Whenever a domain is destroyed, a time-stamped record will be written into
a history file (/var/run/xen/domid-history). To avoid the history file
growing too large, any records with time-stamps that indicate that the
age of a domid has exceeded the re-use timeout will also be purged.

A new utility function, libxl__is_domid_recent(), has been added. This
function reads the same history file checking whether a specified domid
has a record that does not exceed the re-use timeout. Since this utility
function does not write to the file, no records are actually purged by it.

NOTE: The history file is purged on boot so it is safe to use
      CLOCK_MONOTONIC as a time source.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
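
The "recent" test described above amounts to comparing a record's age with the timeout. A minimal sketch over an in-memory array follows; the record layout and function name are assumptions, whereas the real code reads time-stamped records from the history file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one history record. */
struct domid_record {
    long     when;    /* destruction time, seconds on a monotonic clock */
    uint32_t domid;
};

/* True if 'domid' has a record whose age is within 'timeout' seconds.
 * How an age exactly equal to the timeout is treated is an assumption. */
static bool is_recent(const struct domid_record *r, size_t n,
                      uint32_t domid, long now, long timeout)
{
    for (size_t i = 0; i < n; i++)
        if (r[i].domid == domid && now - r[i].when <= timeout)
            return true;
    return false;
}
```

Like the utility function in the commit, this only reads the records; purging expired entries is left to the code path that writes them.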
5 years ago  x86/msr: Drop {pv,hvm}_max_vcpu_msrs objects
Andrew Cooper [Mon, 24 Feb 2020 13:52:24 +0000 (13:52 +0000)]
x86/msr: Drop {pv,hvm}_max_vcpu_msrs objects

It turns out that these are unused, and we dup a type-dependent block of
zeros.  Use xzalloc() instead.

Read/write MSRs typically default 0, and non-zero defaults would need dealing
with at suitable INIT/RESET points (e.g. arch_vcpu_regs_init).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago  x86/msr: Start cleaning up msr-index.h
Andrew Cooper [Fri, 25 May 2018 15:12:05 +0000 (16:12 +0100)]
x86/msr: Start cleaning up msr-index.h

Make a start on cleaning up the constants in msr-index.h.

No functional change - only formatting changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  ns16550: add ACPI support for ARM only
Wei Xu [Fri, 21 Feb 2020 16:20:22 +0000 (17:20 +0100)]
ns16550: add ACPI support for ARM only

Parse the ACPI SPCR table and initialize the 16550 compatible serial port
for ARM only. Currently we only support one UART on ARM. Some fields
which we do not care about yet on ARM are ignored.

Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Acked-by: Julien Grall <julien@xen.org>
5 years ago  x86/p2m: drop p2m_access_t parameter from set_mmio_p2m_entry()
Jan Beulich [Fri, 21 Feb 2020 16:19:16 +0000 (17:19 +0100)]
x86/p2m: drop p2m_access_t parameter from set_mmio_p2m_entry()

Both callers request the host P2M's default access, which can as well be
done inside the function. While touching this anyway, make the "gfn"
parameter type-safe as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  x86/p2m: p2m_flags_to_type() deals only with "unsigned int"
Jan Beulich [Fri, 21 Feb 2020 16:16:25 +0000 (17:16 +0100)]
x86/p2m: p2m_flags_to_type() deals only with "unsigned int"

PTE flags, for now at least, get stored in "unsigned int". Hence there's
no need to widen the values to "unsigned long" before processing them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  x86/p2m: adjust non-PoD accounting in p2m_pod_decrease_reservation()
Jan Beulich [Fri, 21 Feb 2020 16:15:22 +0000 (17:15 +0100)]
x86/p2m: adjust non-PoD accounting in p2m_pod_decrease_reservation()

Throughout the function the equation

pod + nonpod == (1UL << order)

should hold. This has been violated by the final loop of the function:
* changing a range from a type other than p2m_populate_on_demand to
  p2m_invalid doesn't alter the amount of non-PoD pages in the region,
* changing a range from p2m_populate_on_demand to p2m_invalid does
  increase the amount of non-PoD pages in the region along with
  decreasing the amount of PoD pages there.
Fortunately the variable isn't used anymore after the loop. Instead of
correcting the updating of the "nonpod" variable, however, drop it
altogether, to avoid getting the above equation to not hold again by a
future change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  x86/p2m: fix PoD accounting in guest_physmap_add_entry()
Jan Beulich [Fri, 21 Feb 2020 16:09:28 +0000 (17:09 +0100)]
x86/p2m: fix PoD accounting in guest_physmap_add_entry()

The initial observation was that the mfn_valid() check comes too late:
Neither mfn_add() nor mfn_to_page() (let alone de-referencing the
result of the latter) are valid for MFNs failing this check. Move it up
and - noticing that there's no caller doing so - also add an assertion
that this should never produce "false" here.

In turn this would have meant that the "else" to that if() could now go
away, which didn't seem right at all. And indeed, considering callers
like memory_exchange() or various grant table functions, the PoD
accounting should have been outside of that if() from the very
beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  xen/public: Obsolete HVM_PARAM_PAE_ENABLED
Andrew Cooper [Wed, 5 Feb 2020 14:33:00 +0000 (14:33 +0000)]
xen/public: Obsolete HVM_PARAM_PAE_ENABLED

Xen has never acted upon the value of HVM_PARAM_PAE_ENABLED, contrary perhaps
to expectations based on how other boolean fields work.

It was only ever used as a non-standard calling convention for
xc_cpuid_apply_policy() but that has been fixed now.

Purge its use, and any possible confusion over its behaviour, by having Xen
reject any attempts to use it.  Forgo setting it up in libxl's
hvm_set_conf_params().  The only backwards compatibility necessary is to have
the HVM restore stream discard it if found.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago  xen/hvm: Fix handling of obsolete HVM_PARAMs
Andrew Cooper [Thu, 6 Feb 2020 12:40:50 +0000 (12:40 +0000)]
xen/hvm: Fix handling of obsolete HVM_PARAMs

The local xc_hvm_param_deprecated_check() in libxc tries to guess Xen's
behaviour for the MEMORY_EVENT params, but is wrong for the get side, where
Xen would return 0 (which is also a bug).  Delete the helper.

In Xen, perform the checks in hvm_allow_set_param(), rather than
hvm_set_param(), and actually implement checks on the get side so the
hypercall doesn't return successfully with 0 as an answer.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago  x86/splitlock: CPUID and MSR details
Andrew Cooper [Mon, 23 Dec 2019 14:10:29 +0000 (14:10 +0000)]
x86/splitlock: CPUID and MSR details

A splitlock is an atomic operation which crosses a cache line boundary.  It
serialises operations in the cache coherency fabric and comes with a
multi-thousand cycle stall.

Intel Tremont CPUs introduce MSR_CORE_CAPS to enumerate various core-specific
features, and MSR_TEST_CTRL to adjust the behaviour in the case of a
splitlock.

Virtualising this for guests is distinctly tricky owing to the fact that
MSR_TEST_CTRL has core rather than thread scope.  In the meantime however,
prevent the MSR values leaking into guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago  x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
Tamas K Lengyel [Mon, 10 Feb 2020 19:21:25 +0000 (11:21 -0800)]
x86/p2m: Allow p2m_get_page_from_gfn to return shared entries

The owner domain of shared pages is dom_cow, use that for get_page
otherwise the function fails to return the correct page under some
situations. The check whether dom_cow should be used was only performed in
a subset of use-cases. Fix the error and simplify the existing check,
since we can't have any shared entries with dom_cow being NULL.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago  xen/arm: Restrict access to most HVM_PARAM's
Andrew Cooper [Wed, 5 Sep 2018 13:38:42 +0000 (14:38 +0100)]
xen/arm: Restrict access to most HVM_PARAM's

ARM currently has no restrictions on toolstack and guest access to the entire
HVM_PARAM block.  As the monitor feature isn't under security support, this
doesn't need an XSA.

The CALLBACK_IRQ and {STORE,CONSOLE}_{PFN,EVTCHN} details are only exposed
read-only to the guest, while MONITOR_RING_PFN is restricted to only toolstack
access.  No other parameters are used.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>
5 years ago  MAINTAINERS: Step back to designated reviewer for mm/
George Dunlap [Thu, 20 Feb 2020 18:09:17 +0000 (18:09 +0000)]
MAINTAINERS: Step back to designated reviewer for mm/

With having to take over Lars' role as community manager, I don't have
the necessary time to review the mm/ subsystem.  Step back to being only
a designated reviewer, reverting maintainership to the x86 maintainers.

While here, fix my e-mail address in other places.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago  libxl: modify libxl__logv() to only log valid domid values
Paul Durrant [Fri, 21 Feb 2020 11:20:45 +0000 (11:20 +0000)]
libxl: modify libxl__logv() to only log valid domid values

Some code-paths use values other than INVALID_DOMID to indicate an invalid
domain id. Specifically, xl will pass a value of 0 when creating/restoring
a domain. Therefore modify libxl__logv() to use libxl_domid_valid_guest()
as a validity test.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago  xen: Move async_exception_* infrastructure into x86
Andrew Cooper [Thu, 13 Feb 2020 12:58:35 +0000 (12:58 +0000)]
xen: Move async_exception_* infrastructure into x86

The async_exception_{state,mask} infrastructure is implemented in common code,
but is limited to x86 because of the VCPU_TRAP_LAST ifdef-ary.

The internals are very x86 specific (and even then, in need of correction),
and won't be of interest to other architectures.  Move it all into x86
specific code.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  x86/nmi: Corrections and improvements to do_nmi_stats()
Andrew Cooper [Thu, 13 Feb 2020 14:06:50 +0000 (14:06 +0000)]
x86/nmi: Corrections and improvements to do_nmi_stats()

The hardware domain doesn't necessarily have the domid 0.  Render v instead,
adjusting the strings to avoid printing trailing whitespace.

Rename i to cpu, and use separate booleans for pending/masked.  Drop the
unnecessary domain local variable.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  x86/msr: Virtualise MSR_PLATFORM_ID properly
Andrew Cooper [Tue, 30 Apr 2019 11:07:04 +0000 (12:07 +0100)]
x86/msr: Virtualise MSR_PLATFORM_ID properly

This is an Intel-only, read-only MSR related to microcode loading.  Expose it
in similar circumstances as the PATCHLEVEL MSR.

This should have been alongside c/s 013896cb8b2 "x86/msr: Fix handling of
MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV"

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago  MAINTAINERS: make Roger VPCI maintainer
Wei Liu [Thu, 20 Feb 2020 15:58:43 +0000 (15:58 +0000)]
MAINTAINERS: make Roger VPCI maintainer

Roger has kindly agreed to take on the burden.

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago  x86: introduce a new set of APIs to manage Xen page tables
Wei Liu [Tue, 28 Jan 2020 13:50:05 +0000 (13:50 +0000)]
x86: introduce a new set of APIs to manage Xen page tables

We are going to switch to using domheap page for page tables.
A new set of APIs is introduced to allocate and free pages of page
tables based on mfn instead of the xenheap direct map address. The
allocation and deallocation work on mfn_t but not page_info, because
they are required to work even before frame table is set up.

Implement the old functions with the new ones. We will rewrite, site
by site, other mm functions that manipulate page tables to use the new
APIs.

After the allocation, one needs to map and unmap via map_domain_page to
access the PTEs. This does not break Xen halfway, since the new APIs
still use xenheap pages underneath, and map_domain_page will just use
the directmap for mappings. They will be switched to use domheap and
dynamic mappings when usage of old APIs is eliminated.

No functional change intended in this patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago  tools/xentop: Cleanup some trailing whitespace
Sander Eikelenboom [Wed, 19 Feb 2020 20:31:32 +0000 (21:31 +0100)]
tools/xentop: Cleanup some trailing whitespace

Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Wei Liu <wl@xen.org>
5 years ago  tools/xentop: Remove dead code
Sander Eikelenboom [Wed, 19 Feb 2020 20:31:31 +0000 (21:31 +0100)]
tools/xentop: Remove dead code

The freeable_mb variable was made to always be zero when purging tmem
from tools. We can in fact just delete it and the code associated with
it.

Fixes: c588c002cc1 ("tools: remove tmem code and commands")
Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Wei Liu <wl@xen.org>
5 years agotools/xentop: Fix calculation of used memory
Sander Eikelenboom [Wed, 19 Feb 2020 20:31:30 +0000 (21:31 +0100)]
tools/xentop: Fix calculation of used memory

Used memory should be calculated by subtracting free memory from total
memory.
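
A minimal sketch of the calculation, assuming kilobyte units and a
hypothetical helper name (this is not xentop's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: used memory is total memory minus free memory. */
static uint64_t used_mem_kb(uint64_t total_kb, uint64_t free_kb)
{
    /* Guard against an inconsistent snapshot where free exceeds total. */
    return free_kb > total_kb ? 0 : total_kb - free_kb;
}
```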

Fixes: c588c002cc1 ("tools: remove tmem code and commands")
Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Wei Liu <wl@xen.org>
5 years agosched: don't disable interrupts all the time when dumping run-queues
Juergen Gross [Thu, 20 Feb 2020 10:38:31 +0000 (11:38 +0100)]
sched: don't disable interrupts all the time when dumping run-queues

Having interrupts disabled all the time when running dump_runq() is
not necessary. All the called functions are doing proper locking
and disable interrupts if needed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoAMD/IOMMU: drop unused PCI-generic #define-s
Jan Beulich [Thu, 20 Feb 2020 10:38:00 +0000 (11:38 +0100)]
AMD/IOMMU: drop unused PCI-generic #define-s

Quite possibly they had been in use when some of the PCI interfacing was
done in an ad hoc way rather than using the PCI functions we have. Right
now these have no users (left).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: "spec-ctrl=no-xen" should also disable branch hardening
Jan Beulich [Thu, 20 Feb 2020 10:37:01 +0000 (11:37 +0100)]
x86: "spec-ctrl=no-xen" should also disable branch hardening

This is controlling Xen behavior alone, after all.

Reported-by: Jin Nan Wang <jnwang@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agosched: add some diagnostic info in the run queue keyhandler
Juergen Gross [Thu, 20 Feb 2020 10:36:16 +0000 (11:36 +0100)]
sched: add some diagnostic info in the run queue keyhandler

When dumping the run queue information add some more data regarding
current and (if known) previous vcpu for each physical cpu.

With core scheduling activated the printed data will be e.g.:

(XEN) CPUs info:
(XEN) CPU[00] current=d[IDLE]v0, curr=d[IDLE]v0, prev=NULL
(XEN) CPU[01] current=d[IDLE]v1
(XEN) CPU[02] current=d[IDLE]v2, curr=d[IDLE]v2, prev=NULL
(XEN) CPU[03] current=d[IDLE]v3

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agosmp: convert the cpu maps lock into a rw lock
Roger Pau Monné [Wed, 19 Feb 2020 15:09:03 +0000 (16:09 +0100)]
smp: convert the cpu maps lock into a rw lock

Most users of the cpu maps just care about the maps not changing while
the lock is being held, but don't actually modify the maps.

Convert the lock into a rw lock, and take the lock in read mode in
get_cpu_maps and in write mode in cpu_hotplug_begin. This will lower
the contention around the lock, since plug and unplug operations that
take the lock in write mode are not that common.

Note that the read lock can be taken recursively (as it's a shared
lock), and hence will keep the same behavior as the previously used
recursive lock. As for the write lock, it's only used by CPU
plug/unplug operations, and the lock is never taken recursively in
that case.

While there, also change the get_cpu_maps return type to bool.
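
The locking rules described above can be modelled in a few lines. This is a
single-threaded, non-thread-safe model of the lock's states, not a real lock;
the model_* names are hypothetical, and making the write side a trylock that
returns bool is a simplification of the model only.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a read/write lock's state (not thread-safe). */
struct rw_model {
    unsigned int readers; /* number of read holds, may nest */
    bool writer;          /* true while held in write mode */
};

static struct rw_model cpu_maps_lock_model;

/* Models a read-mode trylock that reports success as bool. */
static bool model_get_cpu_maps(void)
{
    if (cpu_maps_lock_model.writer)
        return false;
    cpu_maps_lock_model.readers++;
    return true;
}

/* Drops one read hold. */
static void model_put_cpu_maps(void)
{
    assert(cpu_maps_lock_model.readers > 0);
    cpu_maps_lock_model.readers--;
}

/* Models the hotplug path: write mode needs no readers and no writer. */
static bool model_cpu_hotplug_begin(void)
{
    if (cpu_maps_lock_model.writer || cpu_maps_lock_model.readers)
        return false;
    cpu_maps_lock_model.writer = true;
    return true;
}

/* Leaves write mode. */
static void model_cpu_hotplug_done(void)
{
    cpu_maps_lock_model.writer = false;
}
```

Nested read holds succeed, while the write side is excluded until every read
hold has been dropped.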

Reported-by: Julien Grall <julien@xen.org>
Suggested-also-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agosched: fix get_cpu_idle_time() with core scheduling
Juergen Gross [Wed, 19 Feb 2020 15:08:11 +0000 (16:08 +0100)]
sched: fix get_cpu_idle_time() with core scheduling

get_cpu_idle_time() is calling vcpu_runstate_get() for an idle vcpu.
With core scheduling active this is fragile, as idle vcpus are assigned
to other scheduling units temporarily, and that assignment is changed
in some cases without holding the scheduling lock, and
vcpu_runstate_get() is using v->sched_unit as parameter for
unit_schedule_[un]lock_irq(), resulting in an ASSERT() triggering in
unlock in case v->sched_unit has changed meanwhile.

Fix that by using a local unit variable holding the correct unit.
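
The general idea of locking and unlocking through one local pointer can be
sketched as follows. The types and helpers are hypothetical stand-ins, and
snapshotting the pointer is an assumption made for illustration; the actual
fix picks the correct unit for the cpu.

```c
#include <assert.h>

/* Hypothetical stand-ins for a scheduling unit and a vcpu. */
struct unit { int lock_depth; };
struct vcpu { struct unit *sched_unit; };

static void unit_lock(struct unit *u)
{
    u->lock_depth++;
}

static void unit_unlock(struct unit *u)
{
    assert(u->lock_depth > 0); /* unlocking an unlocked unit is a bug */
    u->lock_depth--;
}

/*
 * Lock and unlock through one local pointer, so that a change of
 * v->sched_unit in between cannot make us unlock a different unit.
 */
static void runstate_get_sketch(struct vcpu *v, struct unit *reassigned_to)
{
    struct unit *unit = v->sched_unit;  /* read the pointer exactly once */

    unit_lock(unit);
    if (reassigned_to)
        v->sched_unit = reassigned_to;  /* simulates a concurrent reassignment */
    unit_unlock(unit);                  /* same unit that was locked */
}
```

Had the unlock used v->sched_unit instead of the local variable, the
assertion in unit_unlock() would fire after the simulated reassignment.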

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agosysctl: use xmalloc_array() for XEN_SYSCTL_page_offline_op
Jan Beulich [Tue, 18 Feb 2020 16:52:10 +0000 (17:52 +0100)]
sysctl: use xmalloc_array() for XEN_SYSCTL_page_offline_op

This is more robust than the raw xmalloc_bytes().

Also add a sanity check on the input page range, to avoid returning
the less applicable -ENOMEM in such cases (and trying the allocation in
the first place).
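
As a rough illustration of why an array allocator is more robust than a raw
byte count, the sketch below refuses requests whose size computation would
overflow. Both helpers are hypothetical stand-ins built on malloc(), and the
exact form of the range check is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical array allocator: rejects requests whose byte size would
 * overflow size_t instead of silently wrapping around.
 */
static void *alloc_array_sketch(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;
    return malloc(nmemb * size);
}

/* Hypothetical range check: reject an inverted range before allocating. */
static int page_range_valid_sketch(unsigned long start, unsigned long end)
{
    return end >= start;
}
```

Checking the range first lets the caller report an invalid argument rather
than an allocation failure.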

Signed-off-by: Jan Beulich <jbeulich@suse.com>
5 years agonvmx: always trap accesses to x2APIC MSRs
Roger Pau Monne [Wed, 19 Feb 2020 10:22:56 +0000 (11:22 +0100)]
nvmx: always trap accesses to x2APIC MSRs

Nested VMX doesn't expose support for
SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE,
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT, and hence the x2APIC MSRs should
always be trapped in the nested guest MSR bitmap, or else a nested
guest could access the hardware x2APIC MSRs given certain conditions.

Accessing the hardware MSRs could be achieved by forcing the L0 Xen to
use SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE and
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT (if supported), and then creating a
L2 guest with a MSR bitmap that doesn't trap accesses to the x2APIC
MSR range. Then OR'ing both L0 and L1 MSR bitmaps would result in a
bitmap that doesn't trap certain x2APIC MSRs and a VMCS that doesn't
have SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE and
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT set either.

Fix this by making sure x2APIC MSRs are always trapped in the nested
MSR bitmap.
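
A sketch of forcing a range of MSRs to trap in a VMX-style MSR bitmap. It
assumes the usual 4KiB layout (read bitmap for low MSRs at offset 0, write
bitmap for low MSRs at offset 2048, a set bit meaning "trap") and takes
0x800-0x8ff as the x2APIC MSR range; the helper names are hypothetical and
this is not the code from this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_SIZE   4096
#define WRITE_LOW_OFFSET  2048    /* write bitmap for MSRs 0x0..0x1fff */
#define X2APIC_MSR_FIRST  0x800u
#define X2APIC_MSR_LAST   0x8ffu

/* Set the read and write trap bits for one low MSR (below 0x2000). */
static void trap_low_msr(uint8_t *bitmap, uint32_t msr)
{
    bitmap[msr / 8] |= 1u << (msr % 8);
    bitmap[WRITE_LOW_OFFSET + msr / 8] |= 1u << (msr % 8);
}

/* Force every MSR in the assumed x2APIC range to trap. */
static void trap_x2apic_range(uint8_t *bitmap)
{
    for (uint32_t msr = X2APIC_MSR_FIRST; msr <= X2APIC_MSR_LAST; msr++)
        trap_low_msr(bitmap, msr);
}

/* Report whether reads of a low MSR are trapped. */
static int low_msr_read_trapped(const uint8_t *bitmap, uint32_t msr)
{
    return (bitmap[msr / 8] >> (msr % 8)) & 1;
}
```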

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agobitmap: import bitmap_{set/clear} from Linux 5.5
Roger Pau Monne [Wed, 19 Feb 2020 10:22:55 +0000 (11:22 +0100)]
bitmap: import bitmap_{set/clear} from Linux 5.5

Import the functions and their dependencies. Based on Linux 5.5, commit
id d5226fa6dbae0569ee43ecfc08bdcd6770fc4755.
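
For readers unfamiliar with these helpers, the following is a deliberately
simple bit-at-a-time sketch of what setting and clearing a run of bits means.
The sketch_* names are hypothetical and this is not the imported Linux
implementation, which works a word at a time.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Set 'len' consecutive bits starting at bit 'start'. */
static void sketch_bitmap_set(unsigned long *map, unsigned int start,
                              unsigned int len)
{
    for (unsigned int bit = start; bit < start + len; bit++)
        map[bit / SKETCH_BITS_PER_LONG] |= 1UL << (bit % SKETCH_BITS_PER_LONG);
}

/* Clear 'len' consecutive bits starting at bit 'start'. */
static void sketch_bitmap_clear(unsigned long *map, unsigned int start,
                                unsigned int len)
{
    for (unsigned int bit = start; bit < start + len; bit++)
        map[bit / SKETCH_BITS_PER_LONG] &= ~(1UL << (bit % SKETCH_BITS_PER_LONG));
}
```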

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoarm: rename BIT_{WORD/MASK/PER_WORD} to BITOP_*
Roger Pau Monne [Wed, 19 Feb 2020 10:22:54 +0000 (11:22 +0100)]
arm: rename BIT_{WORD/MASK/PER_WORD} to BITOP_*

So BIT_WORD can be imported from Linux. The difference from the current
Linux implementation of BIT_WORD is that there the size of the word unit
is a long integer, while the Xen one is hardcoded to 32 bits.

Current users of BITOP_WORD on Arm (which considers a word a long
integer) are switched to use the generic BIT_WORD which also operates
on long integers.

No functional change intended.
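
The difference between the two word sizes can be illustrated as below. The
SKETCH_* macros are simplified stand-ins written for this note, not the
definitions from the tree.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)

/* Word index when a word is a long integer (the Linux-style definition). */
#define SKETCH_BIT_WORD(nr)    ((nr) / SKETCH_BITS_PER_LONG)

/* Word index when a word is hardcoded to 32 bits. */
#define SKETCH_BITOP_WORD(nr)  ((nr) / 32)
```

On a platform with 64-bit longs, bit 40 lives in word 0 under the first
definition but in word 1 under the second, which is why the two cannot share
one name.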

Suggested-by: Julien Grall <julien@xen.org>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <julien@xen.org>
5 years agoamd/iommu: fix missing unlock in iommu_read_log
Roger Pau Monné [Wed, 19 Feb 2020 11:19:04 +0000 (12:19 +0100)]
amd/iommu: fix missing unlock in iommu_read_log

Coverity-ID: 1458632
Fixes: 709d3ddea2d5e ('AMD/IOMMU: Common the #732/#733 errata handling in iommu_read_log()')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agonvmx: implement support for MSR bitmaps
Roger Pau Monné [Tue, 18 Feb 2020 15:27:07 +0000 (16:27 +0100)]
nvmx: implement support for MSR bitmaps

Current implementation of nested VMX has a half baked handling of MSR
bitmaps for the L1 VMM: it maps the L1 VMM provided MSR bitmap, but
doesn't actually load it into the nested vmcs, and thus the nested
guest vmcs ends up using the same MSR bitmap as the L1 VMM.

This is wrong as there's no assurance that the set of features enabled
for the L1 vmcs are the same that L1 itself is going to use in the
nested vmcs, and thus can lead to misconfigurations.

For example L1 vmcs can use x2APIC virtualization and virtual
interrupt delivery, and thus some x2APIC MSRs won't be trapped so that
they can be handled directly by the hardware using virtualization
extensions. On the other hand, the nested vmcs created by L1 VMM might
not use any of such features, so using a MSR bitmap that doesn't trap
accesses to the x2APIC MSRs will be leaking them to the underlying
hardware.

Fix this by crafting a merged MSR bitmap between the one used by L1
and the nested guest.
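
Since a set bit in an MSR bitmap means "trap", merging amounts to OR'ing the
two bitmaps so that an access traps if either level wants it to. A minimal
sketch, with a hypothetical function name and byte-wise granularity chosen
for simplicity:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Merge two MSR bitmaps: an access traps if either input asks for a trap. */
static void merge_msr_bitmaps(uint8_t *dst, const uint8_t *l0,
                              const uint8_t *l1, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = l0[i] | l1[i];
}
```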

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agoadd using domlist_read_lock in keyhandlers
Juergen Gross [Tue, 18 Feb 2020 15:26:33 +0000 (16:26 +0100)]
add using domlist_read_lock in keyhandlers

Using for_each_domain() without holding the domlist_read_lock is
fragile, so add the lock in the keyhandlers where it is missing.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agomake rangeset_printk() static
Juergen Gross [Tue, 18 Feb 2020 15:25:42 +0000 (16:25 +0100)]
make rangeset_printk() static

rangeset_printk() is only used locally, so it can be made static.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agosched: remove sched_init_pdata()
Juergen Gross [Tue, 18 Feb 2020 15:25:02 +0000 (16:25 +0100)]
sched: remove sched_init_pdata()

sched_init_pdata() is used nowhere, it can be removed. Same applies to
the .init_pdata hook of the per-scheduler interface. The last caller
has been removed with commit f855dd962523b6cb47a92037bdd28b1485141abe
("sched: add minimalistic idle scheduler for free cpus").

With the idle scheduler introduction the switch_sched hook became the
only place where new cpus get added to a normal scheduler, so the
init_pdata functionality is performed inside that hook.

Adjust some comments as well to reflect reality. While at it correct a
typo in a comment next to a modified comment.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agox86/MCFG: fix off-by-one in E820 check
Jan Beulich [Tue, 18 Feb 2020 15:24:24 +0000 (16:24 +0100)]
x86/MCFG: fix off-by-one in E820 check

Also adjust the comment ahead of e820_all_mapped() to clarify that the
range is not inclusive at its end.
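
To make the exclusive-end convention concrete, here is a small sketch of an
"is the whole range covered" check over a sorted memory map. The struct, the
function name and the sortedness requirement are assumptions for
illustration, not the real e820 code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One map entry covering [addr, addr + size). */
struct sketch_region {
    uint64_t addr;
    uint64_t size;
};

/*
 * True if [start, end) is fully covered by the map. 'end' is exclusive.
 * The map is assumed to be sorted by address.
 */
static bool sketch_all_mapped(const struct sketch_region *map, size_t n,
                              uint64_t start, uint64_t end)
{
    for (size_t i = 0; i < n && start < end; i++) {
        uint64_t rend = map[i].addr + map[i].size;

        /* Skip entries that do not cover the current start address. */
        if (map[i].addr > start || rend <= start)
            continue;
        start = rend;  /* everything up to this entry's end is covered */
    }
    return start >= end;
}
```

A caller holding an inclusive last address must therefore pass last + 1 as
'end'; mixing the two conventions is exactly the kind of off-by-one fixed
here.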

Reported-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoVT-d: drop stray "list" field from struct user_rmrr
Jan Beulich [Tue, 18 Feb 2020 15:23:41 +0000 (16:23 +0100)]
VT-d: drop stray "list" field from struct user_rmrr

The field looks to have been bogusly added by the patch introducing the
struct (431685e8deb6 "VT-d: add command line option for extra rmrrs").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agoVT-d: adjust logging of RMRRs
Jan Beulich [Tue, 18 Feb 2020 15:22:50 +0000 (16:22 +0100)]
VT-d: adjust logging of RMRRs

Consistently use [,] range representation, shrink leading double blanks
to a single one, and slightly adjust text in some cases.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agoVT-d: check all of an RMRR for being E820-reserved
Jan Beulich [Tue, 18 Feb 2020 15:21:19 +0000 (16:21 +0100)]
VT-d: check all of an RMRR for being E820-reserved

Checking just the first and last page is not sufficient (and redundant
for single-page regions). As we don't need to care about IA64 anymore,
use an x86-specific function to get this done without looping over each
individual page.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/smp: unify header includes in smp.h
Roger Pau Monne [Mon, 17 Feb 2020 18:43:19 +0000 (19:43 +0100)]
x86/smp: unify header includes in smp.h

Unify the two adjacent header includes that are both gated with ifndef
__ASSEMBLY__.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/x86: p2m: Don't initialize slot 0 of the P2M
Julien Grall [Mon, 3 Feb 2020 16:26:16 +0000 (16:26 +0000)]
xen/x86: p2m: Don't initialize slot 0 of the P2M

It is not entirely clear why the slot 0 of each p2m should be populated
with empty page-tables. The commit introducing it, 759af8e3800 "[HVM]
Fix 64-bit HVM domain creation.", does not contain a meaningful
explanation except that it was necessary for shadow.

As we don't seem to have a good explanation why this is there, drop the
code completely.

This was tested by successfully booting a HVM with shadow enabled.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agotools/libx[cl]: Don't use HVM_PARAM_PAE_ENABLED as a function parameter
Andrew Cooper [Wed, 5 Feb 2020 13:30:18 +0000 (13:30 +0000)]
tools/libx[cl]: Don't use HVM_PARAM_PAE_ENABLED as a function parameter

HVM_PARAM_PAE_ENABLED is set and consumed by the toolstack only.  It is in
practice a complicated and non-standard way of passing a boolean parameter
into xc_cpuid_apply_policy().

This is silly.  Pass PAE as a regular parameter instead.

In libxl__cpuid_legacy(), leave a rather better explanation of why only HVM
guests have a choice in PAE setting.

No change in how a guest is constructed.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoAMD/IOMMU: Common the #732/#733 errata handling in iommu_read_log()
Andrew Cooper [Thu, 20 Sep 2018 17:30:34 +0000 (18:30 +0100)]
AMD/IOMMU: Common the #732/#733 errata handling in iommu_read_log()

There is no need to have both helpers implement the same workaround.  The size
and layout of the Event and PPR logs (and others for that matter) share a
lot of commonality.

Use MASK_EXTR() to locate the code field, and use ACCESS_ONCE() rather than
barrier() to prevent hoisting of the repeated read.

Avoid unnecessary zeroing by only clobbering the 'code' field - this alone is
sufficient to spot the errata when the rings wrap.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/setup: Don't skip 2MiB underneath relocated Xen image
David Woodhouse [Sat, 1 Feb 2020 00:32:58 +0000 (00:32 +0000)]
x86/setup: Don't skip 2MiB underneath relocated Xen image

Set 'e' correctly to reflect the location that Xen is actually relocated
to from its default 2MiB location. Not 2MiB below that.

This is only vaguely a bug fix. The "missing" 2MiB would have been used
in the end, and fed to the allocator. It's just that other things don't
get to sit right up *next* to the Xen image, and it isn't very tidy.

For live update, I'd quite like a single contiguous region for the
reserved bootmem and Xen, allowing the 'slack' in the former to be used
when Xen itself grows larger. Let's not allow 2MiB of random heap pages
to get in the way...

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/smp: reset x2apic_enabled in smp_send_stop()
David Woodhouse [Sat, 1 Feb 2020 00:32:56 +0000 (00:32 +0000)]
x86/smp: reset x2apic_enabled in smp_send_stop()

Just before smp_send_stop() re-enables interrupts when shutting down
for reboot or kexec, it calls __stop_this_cpu() which in turn calls
disable_local_APIC(), which puts the APIC back into the mode Xen found
it in at boot.

If that means turning x2APIC off and going back into xAPIC mode, then
a timer interrupt occurring just after interrupts come back on will
lead to a #GP when apic_timer_interrupt() attempts to ack the IRQ
through the EOI register in x2APIC MSR 0x80b:

  (XEN) Executing kexec image on cpu0
  (XEN) ----[ Xen-4.14-unstable  x86_64  debug=n   Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e008:[<ffff82d08026c139>] apic_timer_interrupt+0x29/0x40
  (XEN) RFLAGS: 0000000000010046   CONTEXT: hypervisor
  (XEN) rax: 0000000000000000   rbx: 00000000000000fa   rcx: 000000000000080b
  ...
  (XEN) Xen code around <ffff82d08026c139> (apic_timer_interrupt+0x29/0x40):
  (XEN)  c0 b9 0b 08 00 00 89 c2 <0f> 30 31 ff e9 0e c9 fb ff 0f 1f 40 00 66 2e 0f
  ...
  (XEN) Xen call trace:
  (XEN)    [<ffff82d08026c139>] R apic_timer_interrupt+0x29/0x40
  (XEN)    [<ffff82d080283825>] S do_IRQ+0x95/0x750
  ...
  (XEN)    [<ffff82d0802a0ad2>] S smp_send_stop+0x42/0xd0

We can't clear the global x2apic_enabled variable in disable_local_APIC()
itself because that runs on each CPU. Instead, correct it (by using
current_local_apic_mode()) in smp_send_stop() while interrupts are still
disabled immediately after calling __stop_this_cpu() for the boot CPU,
after all other CPUs have been stopped.

cf: d639bdd9bbe ("x86/apic: Disable the LAPIC later in smp_send_stop()")
    ... which didn't quite fix it completely.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86/time: report correct frequency of Xen PV clocksource
Igor Druzhinin [Tue, 4 Feb 2020 21:49:37 +0000 (21:49 +0000)]
x86/time: report correct frequency of Xen PV clocksource

The value of the counter represents the number of nanoseconds
since host boot. That means the correct frequency is always 1GHz.
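
The effect of the reported frequency on timekeeping can be seen from the
usual tick-to-nanosecond conversion. The helper below is a hypothetical
sketch, not the clocksource code, and the 2GHz figure in the note after it is
only an example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000ULL

/* Convert a counter delta to nanoseconds for a counter running at freq_hz. */
static uint64_t sketch_ticks_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    /* Note: the multiplication can overflow for very large deltas. */
    return ticks * SKETCH_NSEC_PER_SEC / freq_hz;
}
```

With a counter that already counts nanoseconds, only a frequency of 1GHz
makes the conversion an identity; if, say, 2GHz were reported instead, 5000
counter units would be accounted as 2500ns and time would appear to run slow.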

This inconsistency caused time to go slower in PV shim on most
platforms.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86/shim: suspend and resume platform time correctly
Igor Druzhinin [Tue, 4 Feb 2020 21:49:36 +0000 (21:49 +0000)]
x86/shim: suspend and resume platform time correctly

Similarly to S3, platform time needs to be saved on guest suspend
and restored on resume respectively. This should account for expected
jumps in PV clock counter value after resume. time_suspend/resume()
are safe to use in PVH setting as is since any existing operations
with PIT/HPET that they do would simply be ignored if PIT/HPET is
not present.

Additionally, add resume callback for Xen PV clocksource to avoid
its breakage on migration.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86/amd: Avoid cpu_has_hypervisor evaluating true on native hardware
Andrew Cooper [Tue, 11 Feb 2020 15:02:31 +0000 (15:02 +0000)]
x86/amd: Avoid cpu_has_hypervisor evaluating true on native hardware

Currently when booting native on AMD hardware, cpuidmask_defaults._1cd gets
configured with the HYPERVISOR bit before native CPUID is scanned for feature
bits.

This results in cpu_has_hypervisor becoming set as part of identify_cpu(), and
ends up appearing in the raw and host CPU policies.

A combination of this bug, and c/s bb502a8ca59 "x86: check feature flags after
resume" which checks that feature bits don't go missing, results in broken S3
on AMD hardware.

Alter amd_init_levelling() to exclude the HYPERVISOR bit from
cpuidmask_defaults, and update domain_cpu_policy_changed() to allow it to be
explicitly forwarded.

This also fixes a bug on kexec, where the hypervisor bit is left enabled for
the new kernel to find.

These changes highlight a further bug - dom0 construction is asymmetric with
domU construction, by not having any calls to domain_cpu_policy_changed().
Extend arch_domain_create() to always call domain_cpu_policy_changed().

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: Clean up the allocation helpers
Andrew Cooper [Thu, 20 Sep 2018 15:37:42 +0000 (16:37 +0100)]
AMD/IOMMU: Clean up the allocation helpers

Conform to style, drop unnecessary local variables, and avoid opencoding
clear_domain_page().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: Remove unused iommu_get_addr_{lo,hi}_from_cmd() helpers
Andrew Cooper [Tue, 11 Feb 2020 14:59:41 +0000 (14:59 +0000)]
AMD/IOMMU: Remove unused iommu_get_addr_{lo,hi}_from_cmd() helpers

These were introduced in 262bb227a4 in 2012, and have never had any users.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/vmx: use a MEMF_no_refcount domheap page for APIC_DEFAULT_PHYS_BASE
Paul Durrant [Fri, 24 Jan 2020 14:49:35 +0000 (14:49 +0000)]
x86/vmx: use a MEMF_no_refcount domheap page for APIC_DEFAULT_PHYS_BASE

vmx_alloc_vlapic_mapping() currently contains some very odd looking code
that allocates a MEMF_no_owner domheap page and then shares with the guest
as if it were a xenheap page. This then requires vmx_free_vlapic_mapping()
to call a special function in the mm code: free_shared_domheap_page().

By using a MEMF_no_refcount domheap page instead, the odd looking code in
vmx_alloc_vlapic_mapping() can simply use get_page_and_type() to set up a
writable mapping before insertion in the P2M and vmx_free_vlapic_mapping()
can simply release the page using put_page_alloc_ref() followed by
put_page_and_type(). This then allows free_shared_domheap_page() to be
purged.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agomm: make pages allocated with MEMF_no_refcount safe to assign
Paul Durrant [Thu, 30 Jan 2020 12:56:42 +0000 (12:56 +0000)]
mm: make pages allocated with MEMF_no_refcount safe to assign

Currently it is unsafe to assign a domheap page allocated with
MEMF_no_refcount to a domain because the domain's 'tot_pages' will not
be incremented, but will be decremented when the page is freed (since
free_domheap_pages() has no way of telling that the increment was skipped).

This patch allocates a new 'count_info' bit for a PGC_extra flag
which is then used to mark pages when alloc_domheap_pages() is called
with MEMF_no_refcount. assign_pages() still needs to call
domain_adjust_tot_pages() to make sure the domain is appropriately
referenced, so it is modified to do that for PGC_extra pages even if it
is passed MEMF_no_refcount.

The number of PGC_extra pages assigned to a domain is tracked in a new
'extra_pages' counter, which is then subtracted from 'tot_pages' in
the domain_tot_pages() helper. Thus 'normal' page assignments will still
be appropriately checked against 'max_pages'.
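
The accounting can be sketched as follows. The struct is a cut-down,
hypothetical model of the relevant counters, and the limit check is only an
illustration of how 'normal' assignments would be compared with 'max_pages'.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-domain page counters. */
struct domain_model {
    unsigned int tot_pages;   /* all pages assigned, including extra ones */
    unsigned int extra_pages; /* pages marked as extra */
    unsigned int max_pages;   /* limit for 'normal' pages */
};

/* 'Normal' pages are the total minus the extra pages. */
static unsigned int model_domain_tot_pages(const struct domain_model *d)
{
    assert(d->extra_pages <= d->tot_pages);
    return d->tot_pages - d->extra_pages;
}

/* Would assigning 'nr' more normal pages stay within the limit? */
static bool model_can_assign(const struct domain_model *d, unsigned int nr)
{
    return model_domain_tot_pages(d) + nr <= d->max_pages;
}
```

Extra pages therefore do not eat into the domain's normal allowance.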

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Acked-by: George Dunlap <George.Dunlap@eu.citrix.com>
5 years agoadd a domain_tot_pages() helper function
Paul Durrant [Thu, 30 Jan 2020 11:55:35 +0000 (11:55 +0000)]
add a domain_tot_pages() helper function

This patch adds a new domain_tot_pages() inline helper function into
sched.h, which will be needed by a subsequent patch.

No functional change.

NOTE: While modifying the comment for 'tot_pages' in sched.h this patch
      makes some cosmetic fixes to surrounding comments.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: George Dunlap <George.Dunlap@eu.citrix.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agolibxl: mark parameters in stub functions as unused
Wei Liu [Thu, 13 Feb 2020 21:40:27 +0000 (21:40 +0000)]
libxl: mark parameters in stub functions as unused

Hopefully this can fix issues like:

In file included from ../../src/libxl/xen_xl.c:24:0:
/home/osstest/build.147035.build-amd64-libvirt/xendist/usr/local/include/libxl.h: In function 'libxl_cpuid_apply_policy':
/home/osstest/build.147035.build-amd64-libvirt/xendist/usr/local/include/libxl.h:2345:56: error: unused parameter 'ctx' [-Werror=unused-parameter]
 static inline void libxl_cpuid_apply_policy(libxl_ctx *ctx, uint32_t domid) {}
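
One portable way to keep an empty stub quiet under -Werror=unused-parameter is
to reference each parameter in a (void) cast, as sketched below. The names
are hypothetical, and this is not necessarily the approach taken by the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical opaque context type, standing in for the real one. */
typedef struct sketch_ctx sketch_ctx;

/* Empty stub whose parameters are explicitly marked as unused. */
static inline void sketch_cpuid_apply_policy(sketch_ctx *ctx, uint32_t domid)
{
    (void)ctx;
    (void)domid;
}
```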

Fixes: dacb80f9 ("tools/libxl: Remove libxl_cpuid_{set,apply_policy}() from the API")
Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoxen/arm: Use asm/ rather than asm-arm/ when including header
Julien Grall [Thu, 13 Feb 2020 12:39:06 +0000 (12:39 +0000)]
xen/arm: Use asm/ rather than asm-arm/ when including header

All the arch headers (i.e. under asm-arm) are included using "asm/*.h".

To stay consistent, remove the only instance where "asm-arm/*.h" is
used.

Take the opportunity to move the inclusion with the rest of the asm/
include.

Signed-off-by: Julien Grall <julien@xen.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agolibxl: fix libxl__cpuid_legacy in libxl_nocpuid.c
Wei Liu [Thu, 13 Feb 2020 15:27:51 +0000 (15:27 +0000)]
libxl: fix libxl__cpuid_legacy in libxl_nocpuid.c

Its last parameter should be libxl_domain_build_info.

Fixes: 1b3cec69 ("tools/libxl: Combine legacy CPUID handling logic")
Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
5 years agoautomation: update debian:unstable-arm64v8 to have python3-config
Anthony PERARD [Thu, 13 Feb 2020 13:42:00 +0000 (13:42 +0000)]
automation: update debian:unstable-arm64v8 to have python3-config

The Arm container wasn't updated in the original patch.

Fixes: 1a3673da6482 ("automation: updating container to have python3-config binary")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoxenstore: remove not applicable control commands in stubdom
Juergen Gross [Wed, 12 Feb 2020 07:41:54 +0000 (08:41 +0100)]
xenstore: remove not applicable control commands in stubdom

When run in a stubdom environment Xenstore can't select a logfile or
emit memory statistics to a specific file.

So remove or modify those control commands accordingly.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoxenstore: add console xenstore entries for xenstore stubdom
Juergen Gross [Wed, 12 Feb 2020 07:41:53 +0000 (08:41 +0100)]
xenstore: add console xenstore entries for xenstore stubdom

In order to be able to connect to the console of Xenstore stubdom we
need to create the appropriate entries in Xenstore.

For the moment we don't support xenconsoled living in another domain
than dom0, as this information isn't available other than via
Xenstore which we are just setting up.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoxenstore: setup xenstore stubdom console interface properly
Juergen Gross [Wed, 12 Feb 2020 07:41:52 +0000 (08:41 +0100)]
xenstore: setup xenstore stubdom console interface properly

In order to be able to get access to the console of Xenstore stubdom
we need an appropriate granttab entry. So call xc_dom_gnttab_init()
when constructing the domain and preset some information needed
for that function in the dom structure.

We need to create the event channel for the console, too. Do that and
store all necessary data locally.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoxen: remove empty softirq_init()
Juergen Gross [Wed, 12 Feb 2020 09:55:06 +0000 (10:55 +0100)]
xen: remove empty softirq_init()

softirq_init() is empty since Xen 4.1. Remove it together with its call
sites.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoAMD/IOMMU: drop redundant code
Jan Beulich [Wed, 12 Feb 2020 09:54:08 +0000 (10:54 +0100)]
AMD/IOMMU: drop redundant code

The level 1 special exit path is unnecessary in iommu_pde_from_dfn() -
the subsequent code takes care of this case quite fine.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agodom0-build: fix build with clang5
Jan Beulich [Wed, 12 Feb 2020 09:52:20 +0000 (10:52 +0100)]
dom0-build: fix build with clang5

With non-empty CONFIG_DOM0_MEM clang5 produces

dom0_build.c:344:24: error: use of logical '&&' with constant operand [-Werror,-Wconstant-logical-operand]
    if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
                       ^  ~~~~~~~~~~~~~~~~~~
dom0_build.c:344:24: note: use '&' for a bitwise operation
    if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
                       ^~
                       &
dom0_build.c:344:24: note: remove constant to silence this warning
    if ( !dom0_mem_set && CONFIG_DOM0_MEM[0] )
                      ~^~~~~~~~~~~~~~~~~~~~~
1 error generated.

Obviously neither of the two suggestions is an option here. Oddly
enough swapping the operands of the && helps, while e.g. casting or
parenthesizing doesn't. Another workable variant looks to be the use of
!! on the constant.
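
The two variants mentioned above can be sketched like this. The macro value
and function names are hypothetical, and whether a given compiler version
stays quiet for each form is as reported in this message, not something the
sketch demonstrates.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_DOM0_MEM "512M"  /* hypothetical non-empty build-time setting */

/* Variant 1: put the constant operand first. */
static bool sketch_use_default_swapped(bool dom0_mem_set)
{
    return SKETCH_DOM0_MEM[0] && !dom0_mem_set;
}

/* Variant 2: normalise the constant with !!. */
static bool sketch_use_default_bangbang(bool dom0_mem_set)
{
    return !dom0_mem_set && !!SKETCH_DOM0_MEM[0];
}
```

Both forms evaluate to the same result as the original condition.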

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agotools/libxl: Combine legacy CPUID handling logic
Andrew Cooper [Wed, 5 Feb 2020 15:25:21 +0000 (15:25 +0000)]
tools/libxl: Combine legacy CPUID handling logic

While we are in the process of overhauling boot time CPUID/MSR handling, the
existing logic is going to have to remain in roughly this form for backwards
compatibility.

Fold libxl__cpuid_apply_policy() and libxl__cpuid_set() together into a single
libxl__cpuid_legacy() to reduce the complexity for callers.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agotools/libxl: Remove libxl_cpuid_{set,apply_policy}() from the API
Andrew Cooper [Wed, 8 Jan 2020 12:53:49 +0000 (12:53 +0000)]
tools/libxl: Remove libxl_cpuid_{set,apply_policy}() from the API

These functions should never have been exposed.  They don't have external
users, and can't usefully be used for several reasons.

Move libxl_cpuid_{set,apply_policy}() to being internal functions, and leave
an equivalent of the nop stubs in the API for caller compatibility.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoMAINTAINERS: Add explicit check-in policy section
George Dunlap [Thu, 5 Dec 2019 17:19:47 +0000 (17:19 +0000)]
MAINTAINERS: Add explicit check-in policy section

The "nesting" section in the MAINTAINERS file was not initially
intended to describe the check-in policy for patches, but only how
nesting worked; but since there was no check-in policy, it has been
acting as a de-facto policy.

One problem with this is that the policy is not complete: It doesn't
cover open objections, time to check-in, or so on.  The other problem
with the policy is that, as written, it doesn't account for
maintainers submitting patches to files which they themselves
maintain.  This is fine for situations where there are multiple
maintainers, but not for situations where there is only one
maintainer.

Add an explicit "Check-in policy" section to the MAINTAINERS document
to serve as the canonical reference for the check-in policy.  Move
paragraphs not explicitly related to nesting into it.

While here, "promote" the "The meaning of nesting" section title.

DISCUSSION

This seems to be a change from people's understanding of the current
policy.  Most people's understanding of the current policy seems to be:

1.  In order to get a change to a given file committed, it must have
an Ack or Review from at least one *maintainer* of that file other
than the submitter.

2. In the case where a file has only one maintainer, it must have an
Ack or Review from a "nested" maintainer.

I.e., if I submitted something to x86/mm, it would require an Ack from
Jan or Andy, or (in exceptional circumstances) The Rest; but an Ack from
(say) Roger or Juergen wouldn't suffice.

Let's call this the "maintainer-ack" approach (because it must have an
ack or r-b from a maintainer to be checked in), and the proposal in
this patch the "maintainer-approval" (since SoB from a maintainer
indicates approval).

The core issue I have with "maintainer-ack" is that it makes the
maintainer less privileged with regard to writing code than
non-maintainers.  If component X has maintainers A and B, then a
non-maintainer can have code checked in if reviewed either by A or B.
If A or B wants code checked in, they have to wait for exactly one
person to review it.

In fact, if B is quite busy, the easiest way for A really to get their
code checked in might be to hand it to a non-maintainer N, and ask N
to submit it as their own.  Then A can Ack the patches and check them
in.

The current system, therefore, either sets up a perverse incentive (if
you think the behavior described above is unacceptable) or unnecessary
bureaucracy (if you think it's acceptable).  Either way I think we
should set up our system to avoid it.

Other variations on "maintainer-ack" have been proposed:

- Allow maintainer's patches to go in with an R-b from "designated
  reviewers"

- Allow maintainer's patches to go in with an Ack from a more general
  maintainer

Both fundamentally make it harder for maintainers to get their code in
and/or reviewed effectively than non-maintainers, setting up the
perverse incentive / unnecessary bureaucracy.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/vmx: Shrink TASK_SWITCH's hvm_task_switch_reason reasons[]
Andrew Cooper [Wed, 4 Dec 2019 18:21:04 +0000 (18:21 +0000)]
x86/vmx: Shrink TASK_SWITCH's hvm_task_switch_reason reasons[]

No need to use 4-byte integers to store two bits of information.
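
As a rough sketch of the idea, a small lookup table whose values fit in two bits can use a one-byte element type instead of an int-sized enum. The enum, its constants, and the helper names below are hypothetical stand-ins, not the identifiers used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an enum with only a few values. */
enum switch_reason { REASON_CALL_OR_INT, REASON_IRET, REASON_JMP };

/* Wide form: each element has the size of the enum (commonly 4 bytes). */
static const enum switch_reason reasons_wide[4] = {
    REASON_CALL_OR_INT, REASON_IRET, REASON_JMP, REASON_CALL_OR_INT,
};

/* Narrow form: the same values stored in one byte each. */
static const uint8_t reasons_narrow[4] = {
    REASON_CALL_OR_INT, REASON_IRET, REASON_JMP, REASON_CALL_OR_INT,
};

/* Look up a reason from a 2-bit source field. */
static enum switch_reason lookup_reason(unsigned int source)
{
    return (enum switch_reason)reasons_narrow[source & 3];
}

/* Bytes saved by the narrow table relative to the wide one. */
static size_t table_bytes_saved(void)
{
    return sizeof(reasons_wide) - sizeof(reasons_narrow);
}
```

The saving is small in absolute terms; the point is only that the element type need not be wider than the data it holds.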

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agotools: Default to python3
Anthony PERARD [Mon, 20 Jan 2020 11:50:53 +0000 (11:50 +0000)]
tools: Default to python3

Main reason: newer versions of QEMU no longer support Python 2.x.
Second reason: Python 2 is EOL.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/pvh: Adjust dom0's starting state
Andrew Cooper [Mon, 10 Feb 2020 18:33:26 +0000 (18:33 +0000)]
x86/pvh: Adjust dom0's starting state

Fixes: b25fb1a04e "xen/pvh: Fix segment selector ABI"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: Treat head/tail pointers as byte offsets
Andrew Cooper [Sun, 2 Feb 2020 18:23:47 +0000 (18:23 +0000)]
AMD/IOMMU: Treat head/tail pointers as byte offsets

The MMIO registers are already byte offsets.  Using them in this form removes
the need to shift their values for use.

It is also inefficient to store both entries and alloc_size (which only differ
by entry_size).  Rename alloc_size to size, and drop entries entirely, which
simplifies the allocation/deallocation helpers slightly.
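
A minimal sketch of a ring buffer that keeps head and tail as byte offsets, with a single size field in bytes, might look like this. It is a generic illustration under assumed names (struct ring, ENTRY_SIZE, ring_push, ring_pop) and is not the AMD IOMMU driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_SIZE 16u  /* hypothetical entry size in bytes */

struct ring {
    uint8_t *buffer;
    uint32_t size;  /* total size in bytes, a multiple of ENTRY_SIZE */
    uint32_t head;  /* consumer position, as a byte offset */
    uint32_t tail;  /* producer position, as a byte offset */
};

/* Advance a byte offset by one entry, wrapping at the end of the buffer. */
static uint32_t ring_advance(const struct ring *r, uint32_t off)
{
    off += ENTRY_SIZE;
    return off == r->size ? 0 : off;
}

/* Copy one entry in at tail; returns -1 if the ring is full. */
static int ring_push(struct ring *r, const void *entry)
{
    uint32_t next = ring_advance(r, r->tail);

    if (next == r->head)
        return -1;

    /* The offset indexes the buffer directly; no shift is needed. */
    memcpy(r->buffer + r->tail, entry, ENTRY_SIZE);
    r->tail = next;
    return 0;
}

/* Copy one entry out from head; returns -1 if the ring is empty. */
static int ring_pop(struct ring *r, void *entry)
{
    if (r->head == r->tail)
        return -1;

    memcpy(entry, r->buffer + r->head, ENTRY_SIZE);
    r->head = ring_advance(r, r->head);
    return 0;
}
```

Because the offsets are already in bytes, the entry count never has to be stored separately: it can be derived as size / ENTRY_SIZE when needed.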

Mark send_iommu_command() and invalidate_iommu_all() as static, as they have
no external declaration or callers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>