xenbits.xensource.com Git - xen.git/log
Jan Beulich [Tue, 5 Jan 2021 12:18:26 +0000 (13:18 +0100)]
x86/vPCI: check address in vpci_msi_update()

If the upper address bits don't match the interrupt delivery address
space window, entirely different behavior would need to be implemented.
Refuse such requests for the time being.

Replace adjacent hard tabs while introducing MSI_ADDR_BASE_MASK.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
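The check described above can be sketched as follows. This is only an illustration: the constant values assume the usual x86 MSI window at 0xFEExxxxx, and msi_addr_in_window() is a hypothetical helper, not the function touched by the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: x86 MSI writes target the 0xFEExxxxx window. */
#define MSI_ADDR_BASE_LO   UINT64_C(0xfee00000)
#define MSI_ADDR_BASE_MASK (~UINT64_C(0xfffff))

/*
 * Hypothetical helper: true if the full 64-bit address, including its
 * upper 32 bits, lies inside the interrupt delivery window.
 */
static bool msi_addr_in_window(uint64_t addr)
{
    return (addr & MSI_ADDR_BASE_MASK) == MSI_ADDR_BASE_LO;
}
```

A caller would refuse the update (for example with an error code) whenever the helper returns false.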
Jan Beulich [Tue, 5 Jan 2021 12:17:54 +0000 (13:17 +0100)]
x86/vPCI: tolerate (un)masking a disabled MSI-X entry

None of the four reasons causing vpci_msix_arch_mask_entry() to get
called (there's just a single call site) are impossible or illegal prior
to an entry actually having got set up:
- the entry may remain masked (in this case, however, a prior masked ->
  unmasked transition would already not have worked),
- MSI-X may not be enabled,
- the global mask bit may be set,
- the entry may not otherwise have been updated.
Hence the function asserting that the entry was previously set up was
simply wrong. Since the caller tracks the masked state (and setting up
of an entry would only be effected when that software bit is clear),
it's okay to skip both masking and unmasking requests in this case.

Fixes: d6281be9d0145 ('vpci/msix: add MSI-X handlers')
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
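A minimal sketch of the "skip instead of assert" behaviour, assuming a simplified entry structure. The type, the INVALID_PIRQ marker and arch_mask_entry() are all hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define INVALID_PIRQ (-1)      /* hypothetical "entry not set up" marker */

struct msix_entry_sketch {     /* hypothetical, heavily simplified */
    int  pirq;                 /* INVALID_PIRQ until the entry is set up */
    bool hw_masked;            /* state that would be programmed into hardware */
};

/*
 * Returns true if the (modelled) hardware state was touched.  Requests
 * for an entry that was never set up are skipped rather than asserted
 * on; the caller is assumed to track the software mask bit itself.
 */
static bool arch_mask_entry(struct msix_entry_sketch *e, bool mask)
{
    if ( e->pirq == INVALID_PIRQ )
        return false;

    e->hw_masked = mask;
    return true;
}
```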
Jan Beulich [Tue, 5 Jan 2021 12:17:02 +0000 (13:17 +0100)]
x86: hypercall vector is unused when !PV32

This vector can be used as an ordinary interrupt handling one in this
case. To be sure no references are left, make the #define itself
conditional.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:13:18 +0000 (13:13 +0100)]
x86/build: restrict contents of asm-offsets.h when !HVM / !PV

This file has a long dependencies list (through asm-offsets.[cs]) and a
long list of dependents. IOW if any of the former changes, all of the
latter will be rebuilt, even if there's no actual change to the
generated file. Therefore avoid producing symbols we don't actually
need, depending on configuration.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:12:37 +0000 (13:12 +0100)]
x86/build: limit #include-ing by asm-offsets.c

This file has a long dependencies list and asm-offsets.h, generated from
it, has a long list of dependents. IOW if any of the former changes, all
of the latter will be rebuilt, even if there's no actual change to the
generated file. Therefore avoid including headers we don't actually need
(generally or configuration dependent).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:12:15 +0000 (13:12 +0100)]
x86/build: limit rebuilding of asm-offsets.h

This file has a long dependencies list (through asm-offsets.[cs]) and a
long list of dependents. IOW if any of the former changes, all of the
latter will be rebuilt, even if there's no actual change to the
generated file. This is the primary scenario we have the move-if-changed
macro for.

Since debug information may easily cause the file contents to change in
benign ways, also avoid emitting this into the output file.

Finally, even before this change, *.new files should have been included
in what gets removed by the "clean" target.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:11:04 +0000 (13:11 +0100)]
x86/ACPI: don't invalidate S5 data when S3 wakeup vector cannot be determined

We can be more tolerant as long as the data collected from FACS is only
needed to enter S3. A prior change already added suitable checking to
acpi_enter_sleep().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Jan 2021 12:09:55 +0000 (13:09 +0100)]
x86/ACPI: fix S3 wakeup vector mapping

Use of __acpi_map_table() here was at least close to an abuse already
before, but it will now consistently return NULL here. Drop the layering
violation and use set_fixmap() directly. Re-use of the ACPI fixmap area
is hopefully going to remain "fine" for the time being.

Add checks to acpi_enter_sleep(): The vector now needs to be contained
within a single page, but the ACPI spec requires 64-byte alignment of
FACS anyway. Also bail if no wakeup vector was determined in the first
place, in part as preparation for a subsequent relaxation change.

Fixes: 1c4aa69ca1e1 ("xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
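The "contained within a single page" condition can be illustrated like this. The 4 KiB page size and the helper name are assumptions; the field offset used in the usage note is likewise only illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed 4 KiB pages */

/* True if the byte range [addr, addr + len) does not cross a page boundary. */
static bool within_one_page(uint64_t addr, size_t len)
{
    assert(len > 0 && len <= PAGE_SIZE);

    return (addr & (PAGE_SIZE - 1)) + len <= PAGE_SIZE;
}
```

With a 64-byte aligned table, any field lying entirely within its first 64 bytes passes this check, since 64 divides the page size; an unaligned address such as 0xffc with an 8-byte field does not.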
Bertrand Marquis [Thu, 17 Dec 2020 15:38:08 +0000 (15:38 +0000)]
xen/arm: Activate TID3 in HCR_EL2

Activate TID3 bit in HCR register when starting a guest.
This will trap all coprocessor ID registers so that we can give guests
values corresponding to what they can actually use, and mask some
features from guests even though they would be supported by the
underlying hardware (like SVE or MPAM).

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Bertrand Marquis [Thu, 17 Dec 2020 15:38:07 +0000 (15:38 +0000)]
xen/arm: Add CP10 exception support to handle MVFR

Add support for decoding cp10 exceptions, to be able to emulate the
values for MVFR0, MVFR1 and MVFR2 when the TID3 bit of HCR is activated.
This is required for aarch32 guests accessing MVFR registers using
vmrs and vmsr instructions.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Bertrand Marquis [Thu, 17 Dec 2020 15:38:06 +0000 (15:38 +0000)]
xen/arm: Add handler for cp15 ID registers

Add support for emulation of cp15 based ID registers (on arm32 or when
running a 32bit guest on arm64).
The handlers return the values stored in the guest_cpuinfo
structure for known registers and RAZ for all reserved registers.
Currently the MVFR registers are not supported.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
[Stefano: fix code style]
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Bertrand Marquis [Thu, 17 Dec 2020 15:38:05 +0000 (15:38 +0000)]
xen/arm: Add handler for ID registers on arm64

Add vsysreg emulation for registers trapped when TID3 bit is activated
in HCR.
The emulation returns the value stored in the cpuinfo_guest structure
for known registers and handles reserved registers as RAZ.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
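The "known registers return the stored value, reserved ones read as zero" behaviour could look roughly like this. The register enumeration and structure are hypothetical simplifications, not the actual Xen definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register identifiers, for illustration only. */
enum id_reg_sketch { ID_PFR0, ID_PFR1, ID_RESERVED_A };

/* Hypothetical, reduced stand-in for the guest-visible cpuinfo. */
struct guest_cpuinfo_sketch {
    uint64_t pfr[2];
};

/* Emulate a trapped read of an ID register. */
static uint64_t emulate_id_read(const struct guest_cpuinfo_sketch *info,
                                enum id_reg_sketch reg)
{
    switch ( reg )
    {
    case ID_PFR0:
        return info->pfr[0];

    case ID_PFR1:
        return info->pfr[1];

    default:
        return 0;   /* reserved encodings are handled as RAZ */
    }
}
```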
Bertrand Marquis [Thu, 17 Dec 2020 15:38:04 +0000 (15:38 +0000)]
xen/arm: create a cpuinfo structure for guest

Create a cpuinfo structure for guests and mask out of it the features
that we do not support in Xen or that we do not want to publish to
guests.

Modify some values in the guest cpuinfo structure to hide some
processor features:
- SVE, as this is not supported by Xen and guests are not allowed to use
  this feature (ZEN is set to 0 in CPTR_EL2).
- AMU, as HCPTR_TAM is set in CPTR_EL2 so AMU cannot be used by guests.
- RAS, as this is not supported by Xen.
All other bits are left untouched.

The code tries to group together the register modifications for the
same feature, so that in the long term a feature can easily be
enabled/disabled depending on user parameters, or other register
modifications can be added in the same place (like enabling/disabling
HCR bits).

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
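As a rough sketch of the masking, assuming the ID_AA64PFR0_EL1 layout from the Arm Architecture Reference Manual (RAS in bits [31:28], SVE in [35:32], AMU in [47:44]). The helper names are hypothetical and the real code works on bitfields of the cpuinfo structure rather than on a raw value.

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64PFR0_EL1 field positions (each field is 4 bits wide). */
#define PFR0_RAS_SHIFT 28
#define PFR0_SVE_SHIFT 32
#define PFR0_AMU_SHIFT 44

/* Zero one 4-bit ID field, i.e. report the feature as not implemented. */
static uint64_t clear_field(uint64_t reg, unsigned int shift)
{
    return reg & ~(UINT64_C(0xf) << shift);
}

/* Derive the guest view from the host value; other bits stay untouched. */
static uint64_t guest_view_of_pfr0(uint64_t host)
{
    uint64_t v = host;

    v = clear_field(v, PFR0_RAS_SHIFT);
    v = clear_field(v, PFR0_SVE_SHIFT);
    v = clear_field(v, PFR0_AMU_SHIFT);

    return v;
}
```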
Bertrand Marquis [Thu, 17 Dec 2020 15:38:03 +0000 (15:38 +0000)]
xen/arm: Add arm64 ID registers definitions

Add coprocessor register definitions for all ID registers trapped
through the TID3 bit of HCR.
These are the ones that will be emulated in Xen, to only publish to
guests the features that are supported by Xen and that are accessible
to guests.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Bertrand Marquis [Thu, 17 Dec 2020 15:38:02 +0000 (15:38 +0000)]
xen/arm: Add ID registers and complete cpuinfo

Add definitions and entries in cpuinfo for ID registers introduced in
newer versions of the Arm Architecture Reference Manual:
- ID_PFR2: processor feature register 2
- ID_DFR1: debug feature register 1
- ID_MMFR4 and ID_MMFR5: Memory model feature registers 4 and 5
- ID_ISA6: ISA Feature register 6
Add more bitfield definitions in PFR fields of cpuinfo.
Add MVFR2 register definition for aarch32.
Add MVFRx_EL1 defines for aarch32.
Add mvfr values in cpuinfo.
Add some register definitions for arm64 in sysregs as some are not
always known by compilers.
Initialize the new values added in cpuinfo in identify_cpu during init.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Bertrand Marquis [Thu, 17 Dec 2020 15:38:01 +0000 (15:38 +0000)]
xen/arm: Use READ_SYSREG instead of 32/64 versions

Modify identify_cpu function to use READ_SYSREG instead of READ_SYSREG32
or READ_SYSREG64.

All aarch32 specific registers (for example ID_PFR0_EL1) are 64bit when
accessed from aarch64 with upper bits read as 0, so it is right to
access them as 64bit registers on a 64bit platform.

Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andrew Cooper [Thu, 31 Dec 2020 16:55:20 +0000 (16:55 +0000)]
x86/p2m: Fix paging_gva_to_gfn() for nested virt

nestedhap_walk_L1_p2m() takes guest physical addresses, not frame numbers.
This means the l2 input is off-by-PAGE_SHIFT, as is the l1 value eventually
returned to the caller.

Delete the misleading comment as well.

Fixes: bab2bd8e222de ("xen/nested_p2m: Don't walk EPT tables with a regular PT walker")
Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
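The unit mismatch can be illustrated with a small sketch. walk_takes_gaddr() is a made-up stand-in for a walker that consumes and produces addresses (it simply adds a fixed offset), and PAGE_SHIFT of 12 is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumed 4 KiB pages */

static uint64_t gfn_to_gaddr(uint64_t gfn)
{
    return gfn << PAGE_SHIFT;
}

static uint64_t gaddr_to_gfn(uint64_t gaddr)
{
    return gaddr >> PAGE_SHIFT;
}

/* Hypothetical walker: takes an L2 address, returns an L1 address. */
static uint64_t walk_takes_gaddr(uint64_t l2_gaddr)
{
    return l2_gaddr + 0x100000;
}

/* A frame-number based caller has to convert on the way in and out. */
static uint64_t translate_gfn(uint64_t l2_gfn)
{
    uint64_t l1_gaddr = walk_takes_gaddr(gfn_to_gaddr(l2_gfn));

    return gaddr_to_gfn(l1_gaddr);
}
```

Passing the frame number straight into the walker, or returning its result unshifted, gives values that are off by PAGE_SHIFT bits.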
Roger Pau Monné [Mon, 4 Jan 2021 09:03:23 +0000 (10:03 +0100)]
x86/p2m: fix p2m_add_foreign error path

One of the error paths in p2m_add_foreign could call put_page with a
NULL page, thus triggering a fault.

Split the checks into two different if statements, so the appropriate
error path can be taken.

Fixes: 173ae325026bd ('x86/p2m: tidy p2m_add_foreign() a little')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
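A simplified model of splitting the checks so that each failure takes its own error path. The structure, the put_page() stand-in and add_foreign_sketch() are hypothetical and only mirror the shape of the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct page_sketch {          /* hypothetical, simplified */
    int  refcount;
    bool is_ram;
};

/* Stand-in for dropping a reference; must never see a NULL page. */
static void put_page(struct page_sketch *pg)
{
    assert(pg != NULL);
    pg->refcount--;
}

/* "pg" models the result of a lookup that may fail and yield NULL. */
static int add_foreign_sketch(struct page_sketch *pg)
{
    if ( !pg )
        return -EINVAL;       /* no reference was taken: nothing to put */

    if ( !pg->is_ram )
    {
        put_page(pg);         /* reference held: drop it before failing */
        return -EINVAL;
    }

    /* ... the actual mapping work would go here ... */
    put_page(pg);

    return 0;
}
```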
Roger Pau Monne [Wed, 30 Dec 2020 17:34:46 +0000 (18:34 +0100)]
xen: remove the usage of the P ar option

It's not part of the POSIX standard [0] and as such non-GNU ar
implementations don't usually have it.

It's not relevant for the use case here anyway, as the archive file is
recreated every time due to the rm invocation before the ar call. No
file name matching should happen so matching using the full path name
or a relative one should yield the same result.

This fixes the build on FreeBSD.

While there also drop the s option, as ar will already generate a
symbol table by default when creating the archive.

[0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ar.html

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Wed, 30 Dec 2020 19:26:14 +0000 (19:26 +0000)]
x86/svm: Clean up MSR_K8_VM_CR definitions

Drop the unused shift number, and reposition the constants into the cleaned-up
section.  Rename VM_CR_SVM_DISABLE to be closer to its APM definition.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Tue, 29 Dec 2020 17:51:23 +0000 (17:51 +0000)]
x86/hpet: Fix return value of hpet_setup()

hpet_setup() is idempotent if the rate has already been calculated, and
returns the cached value.  However, this only works correctly when the return
statements are identical.

Use a sensibly named local variable, rather than a dead one with a bad name.

Fixes: a60bb68219 ("x86/time: reduce rounding errors in calculations")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
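The idempotency requirement can be sketched as below. The names and the placeholder rate are hypothetical; the point is only that the first call and every later call return the same cached variable.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t cached_rate;        /* 0 means "not calculated yet" */
static unsigned int calibrations;   /* counts how often we measured */

/* Hypothetical measurement; the value is a placeholder. */
static uint64_t measure_rate(void)
{
    calibrations++;
    return 14318180;
}

static uint64_t setup_sketch(void)
{
    if ( cached_rate == 0 )
        cached_rate = measure_rate();

    /* Single return statement: initial and repeated calls agree. */
    return cached_rate;
}
```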
Andrew Cooper [Mon, 28 Sep 2020 17:14:53 +0000 (18:14 +0100)]
xen/domain: Introduce domain_teardown()

There is no common equivalent of domain_relinquish_resources(), which has
caused various pieces of common cleanup to live in inappropriate
places.

Perhaps most obviously, evtchn_destroy() is called for every continuation of
domain_relinquish_resources(), which can easily be thousands of times.

Create domain_teardown() to be a new top level facility, and call it from the
appropriate positions in domain_kill() and domain_create()'s error path.  The
intention is for this to supersede domain_relinquish_resources() in due course.

No change in behaviour yet.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 28 Sep 2020 15:47:58 +0000 (16:47 +0100)]
xen/domain: Reorder trivial initialisation in early domain_create()

This improves the robustness of the error paths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Maximilian Engelhardt [Fri, 18 Dec 2020 20:42:34 +0000 (21:42 +0100)]
docs: use predictable ordering in generated documentation

When the seq number is equal, sort by the title to get predictable
output ordering. This is useful for reproducible builds.

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
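The tie-breaking rule, expressed as a C comparator purely for illustration (the documentation tooling itself is not written in C, and the structure here is hypothetical):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct doc_sketch {           /* hypothetical index entry */
    int         seq;
    const char *title;
};

/* Order by seq first; equal seq values fall back to the title. */
static int doc_cmp(const void *a, const void *b)
{
    const struct doc_sketch *x = a, *y = b;

    if ( x->seq != y->seq )
        return x->seq < y->seq ? -1 : 1;

    return strcmp(x->title, y->title);
}
```

Because no two distinct entries compare equal unless both keys match, the output order no longer depends on the input order or on the sort implementation.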
Jan Beulich [Tue, 22 Dec 2020 11:01:12 +0000 (12:01 +0100)]
x86/mm: p2m_add_foreign() is HVM-only

This is also the case for xenmem_add_to_physmap_one(), as it is the only
caller of the function. Move xenmem_add_to_physmap_one() next to
p2m_add_foreign(), allowing p2m_add_foreign() to become static at the
same time. While moving, adjust indentation of the body of the main
switch().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 22 Dec 2020 08:00:03 +0000 (09:00 +0100)]
x86/Intel: insert Tiger Lake model numbers

Both match prior generation processors as far as LBR and C-state MSRs
go (SDM rev 073). The if_pschange_mc erratum, according to the spec
update, is not applicable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Maximilian Engelhardt [Tue, 22 Dec 2020 07:59:14 +0000 (08:59 +0100)]
x86/EFI: don't insert timestamp when SOURCE_DATE_EPOCH is defined

By default a timestamp gets added to the xen efi binary. Unfortunately
ld doesn't seem to provide a way to set a custom date, like from
SOURCE_DATE_EPOCH, so set a zero value for the timestamp (option
--no-insert-timestamp) if SOURCE_DATE_EPOCH is defined. This makes
reproducible builds possible.

This is an alternative to the patch suggested in [1]. This patch only
omits the timestamp when SOURCE_DATE_EPOCH is defined.

[1] https://lists.xenproject.org/archives/html/xen-devel/2020-10/msg02161.html

Signed-off-by: Maximilian Engelhardt <maxi@daemonizer.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 22 Dec 2020 07:57:19 +0000 (08:57 +0100)]
x86: verify function type (and maybe attribute) in switch_stack_and_jump()

It is imperative that the functions passed here are taking no arguments,
return no values, and don't return in the first place. While the type
can be checked uniformly, the attribute check is limited to gcc 9 and
newer (no clang support for this so far afaict).

Note that I didn't want to have the "true" fallback "implementation" of
__builtin_has_attribute(..., __noreturn__) generally available, as
"true" may not be a suitable fallback in other cases.

Note further that the noreturn addition to startup_cpu_idle_loop()'s
declaration requires adding unreachable() to Arm's
switch_stack_and_jump(), or else the build would break. I suppose this
should have been there already.

For vmx_asm_do_vmentry() along with adding the attribute, also restrict
its scope.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
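A sketch of the two checks, under the assumption of a GNU C compiler (statement expressions are used). checked_entry() and idle_loop_sketch() are hypothetical names; the macro only performs the checks and yields the function pointer, it does not switch stacks.

```c
#include <assert.h>
#include <stdlib.h>

/* The attribute check needs gcc 9 or newer; otherwise assume "fine". */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 9
# define has_attr_noreturn(fn) __builtin_has_attribute(fn, __noreturn__)
#else
# define has_attr_noreturn(fn) 1
#endif

#define checked_entry(fn) ({                                          \
    /* Type check: must be convertible to void (*)(void). */          \
    void (*fn_)(void) = (fn);                                         \
    /* Attribute check, where the compiler supports it. */            \
    _Static_assert(has_attr_noreturn(fn), "function must not return"); \
    fn_;                                                              \
})

/* Example target: takes nothing, returns nothing, never returns. */
static void __attribute__((__noreturn__)) idle_loop_sketch(void)
{
    abort();
}
```

Passing a function with a different signature is diagnosed at the pointer initialisation, and (with gcc 9+) a function lacking the noreturn attribute fails the static assertion.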
Julien Grall [Fri, 18 Dec 2020 13:30:54 +0000 (13:30 +0000)]
xen: Rework WARN_ON() to return whether a warning was triggered

So far, our implementation of WARN_ON() cannot be used in the following
situation:

if ( WARN_ON() )
    ...

This is because WARN_ON() doesn't return whether a warning has been
triggered. Such a construction can be handy if you want to print more
information and also dump the stack trace.

Therefore, rework the WARN_ON() implementation to return whether a
warning was triggered. The idea was borrowed from Linux.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
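A minimal sketch of a value-returning warning macro, assuming GNU C statement expressions. WARN_ON_SKETCH(), the counter and checked_div() are hypothetical; the real macro also dumps a stack trace, which is replaced by a simple message here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static unsigned int warnings;   /* number of warnings triggered so far */

#define WARN_ON_SKETCH(cond) ({                                     \
    bool triggered_ = !!(cond);                                     \
    if ( triggered_ )                                               \
    {                                                               \
        warnings++;                                                 \
        fprintf(stderr, "warning at %s:%d\n", __FILE__, __LINE__);  \
    }                                                               \
    triggered_;   /* value of the whole expression */               \
})

/* Usage: warn and bail out in one construct. */
static int checked_div(int a, int b)
{
    if ( WARN_ON_SKETCH(b == 0) )
        return 0;

    return a / b;
}
```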
Andrew Cooper [Mon, 21 Dec 2020 14:52:26 +0000 (14:52 +0000)]
x86/shadow: Fix build with !CONFIG_SHADOW_PAGING

Implement a stub for shadow_vcpu_teardown()

Fixes: d162f36848c4 ("xen/x86: Fix memory leak in vcpu_create() error path")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 28 Sep 2020 14:25:44 +0000 (15:25 +0100)]
xen/x86: Fix memory leak in vcpu_create() error path

Various paths in vcpu_create() end up calling paging_update_paging_modes(),
which eventually allocates a monitor pagetable if one doesn't exist.

However, an error in vcpu_create() results in the vcpu being cleaned up
locally, and not put onto the domain's vcpu list.  Therefore, the monitor
table is not freed by {hap,shadow}_teardown()'s loop.  This is caught by
assertions later that we've successfully freed the entire hap/shadow memory
pool.

The per-vcpu loops in domain teardown logic are conceptually wrong, but exist
due to insufficient structure in the existing logic.

Break paging_vcpu_teardown() out of paging_teardown(), with mirrored breakouts
in the hap/shadow code, and use it from arch_vcpu_create()'s error path.  This
fixes the memory leak.

The new {hap,shadow}_vcpu_teardown() must be idempotent, and are written to be
as tolerant as possible, with the minimum number of safety checks possible.
In particular, drop the mfn_valid() check - if these fields are junk, then Xen
is going to explode anyway.

Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
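The idempotency property asked of the per-vcpu teardown can be sketched as follows. The structure and function are hypothetical simplifications using plain heap memory in place of the paging pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct vcpu_sketch {            /* hypothetical, simplified */
    void *monitor_table;        /* NULL when nothing is allocated */
};

static unsigned int frees;      /* counts real frees, for illustration */

/* Safe to call any number of times, including on a half-built vcpu. */
static void vcpu_teardown_sketch(struct vcpu_sketch *v)
{
    if ( !v->monitor_table )
        return;

    free(v->monitor_table);
    v->monitor_table = NULL;    /* makes a repeated call a no-op */
    frees++;
}
```

Such a function can be used both from the creation error path and from the regular domain teardown without either needing to know whether the other already ran.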
Andrew Cooper [Fri, 18 Dec 2020 23:30:04 +0000 (23:30 +0000)]
xen/Kconfig: Correct the NR_CPUS description

The description "physical CPUs" is especially wrong, as it implies the number
of sockets, which tops out at 8 on all but the very biggest servers.

NR_CPUS is the number of logical entities the scheduler can use.

Reported-by: hanetzer@startmail.com
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 18 Dec 2020 17:53:13 +0000 (17:53 +0000)]
Revert "x86/mm: p2m_add_foreign() is HVM-only"

This reverts commit 8009c33b5179536e2ecce54462fe4cd069060f77.  It breaks the
PV-Shim build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 18 Dec 2020 12:29:14 +0000 (13:29 +0100)]
x86/mm: p2m_add_foreign() is HVM-only

This is also the case for xenmem_add_to_physmap_one(), as it is the only
caller of the function. Move xenmem_add_to_physmap_one() next to
p2m_add_foreign(), allowing p2m_add_foreign() to become static at the
same time. While moving, adjust indentation of the body of the main
switch().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 18 Dec 2020 12:28:30 +0000 (13:28 +0100)]
x86/p2m: tidy p2m_add_foreign() a little

Drop a bogus ASSERT() - we don't typically assert incoming domain
pointers to be non-NULL, and there's no particular reason to do so here.

Replace the open-coded DOMID_SELF check by use of
rcu_lock_remote_domain_by_id(), at the same time covering the request
being made with the current domain's actual ID.

Move the "both domains same" check into just the path where it really
is meaningful.

Swap the order of the two puts, such that
- the p2m lock isn't needlessly held across put_page(),
- a separate put_page() on an error path can be avoided,
- they're inverse to the order of the respective gets.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 18 Dec 2020 12:25:40 +0000 (13:25 +0100)]
lib: move sort code

Build this code into an archive, partly paralleling bsearch().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:23:42 +0000 (13:23 +0100)]
lib: move bsearch code

Convert this code to an inline function (backed by an instance in an
archive in case the compiler decides against inlining), which results
in not having it in x86 final binaries. This saves a little bit of dead
code.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:22:54 +0000 (13:22 +0100)]
lib: move rbtree code

Build this code into an archive, which results in not linking it into
x86 final binaries. This saves about 1.5k of dead code.

While moving the source file, take the opportunity and drop the
pointless EXPORT_SYMBOL() and an instance of trailing whitespace.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:22:10 +0000 (13:22 +0100)]
lib: move init_constructors()

... into its own CU, for being unrelated to other things in
common/lib.c.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:21:25 +0000 (13:21 +0100)]
lib: move parse_size_and_unit()

... into its own CU, to build it into an archive.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jan Beulich [Fri, 18 Dec 2020 12:20:42 +0000 (13:20 +0100)]
lib: move list sorting code

Build the source file always, as by putting it into an archive it still
won't be linked into final binaries when not needed. This way possible
build breakage will be easier to notice, and it's more consistent with
us unconditionally building other library kind of code (e.g. sort() or
bsearch()).

While moving the source file, take the opportunity and drop the
pointless EXPORT_SYMBOL() and an unnecessary #include.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Fri, 18 Dec 2020 12:17:57 +0000 (13:17 +0100)]
lib: collect library files in an archive

In order to (subsequently) drop odd things like CONFIG_NEEDS_LIST_SORT
just to avoid bloating binaries when only some arch-es and/or
configurations need generic library routines, combine objects under lib/
into an archive, which the linker then can pick the necessary objects
out of.

Note that we can't use thin archives just yet, until we've raised the
minimum required binutils version suitably.

Note further that --start-group / --end-group get put in place right
away to allow for symbol resolution across all archives, once we gain
multiple ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Stefano Stabellini [Tue, 24 Nov 2020 21:33:14 +0000 (13:33 -0800)]
automation: add domU creation to dom0 alpine linux test

Add a trivial Busybox based domU.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Tue, 24 Nov 2020 21:22:17 +0000 (13:22 -0800)]
automation: use the tests-artifacts kernel for qemu-smoke-arm64-gcc

Use the tests-artifacts kernel, instead of the Debian kernel, for the
qemu-smoke-arm64-gcc job.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Tue, 24 Nov 2020 21:15:51 +0000 (13:15 -0800)]
automation: create an alpine linux arm64 test job

Create a test job that starts Xen and Dom0 on QEMU based on the alpine
linux rootfs. Use the Linux kernel and rootfs from the tests-artifacts
containers. Add the Xen tools binaries from the Alpine Linux build job.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Tue, 24 Nov 2020 21:13:50 +0000 (13:13 -0800)]
automation: make available the tests artifacts to the pipeline

In order to make available the pre-built binaries of the
automation/tests-artifacts containers to the gitlab-ci pipeline we need
to export them as gitlab artifacts.

To do that, we create two "fake" jobs that simply export the required
binaries as artifacts and do nothing else.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Tue, 24 Nov 2020 21:08:20 +0000 (13:08 -0800)]
automation: add tests artifacts

Some tests (soon to come) will require pre-built binaries to run, such
as the Linux kernel binary. We don't want to rebuild the Linux kernel
for each gitlab-ci run: these builds should not be added to the current
list of build jobs.

Instead, create additional containers that today are built and uploaded
manually, but could be re-built automatically. The containers build the
required binaries during the "docker build" step and store them inside
the container itself.

gitlab-ci will be able to fetch these pre-built binaries during the
regular test runs, saving cycles.

Add two tests artifacts containers:
- one to build the Linux kernel ARM64
- one to create an Alpine Linux ARM64 rootfs for Dom0

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 20 Nov 2020 17:56:25 +0000 (09:56 -0800)]
automation: add alpine linux x86 build jobs

Allow failure for these jobs. Currently they fail because hvmloader
doesn't build with musl. The failures don't block the pipeline.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 20 Nov 2020 17:54:01 +0000 (09:54 -0800)]
automation: add alpine linux 3.12 x86 build container

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Wed, 18 Nov 2020 01:07:43 +0000 (17:07 -0800)]
automation: add alpine linux arm64 build test

Based on the arm64 3.12 build container

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Wed, 18 Nov 2020 01:03:55 +0000 (17:03 -0800)]
automation: add alpine linux 3.12 arm64 build container

The build container will be used for a new Alpine Linux 3.12 arm64 build
test.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 20 Nov 2020 03:20:15 +0000 (19:20 -0800)]
automation: special configure flags for musl-based systems

QEMU upstream builds with warnings when libc is musl:

  #warning redirecting incorrect #include <sys/signal.h> to <signal.h>

Disable -Werror by passing --disable-werror to the QEMUU config script
if libc is musl.

hvmloader doesn't build on musl systems today. Disable any guest
firmware build.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
Stefano Stabellini [Fri, 13 Nov 2020 23:22:41 +0000 (15:22 -0800)]
automation: add dom0less to the QEMU aarch64 smoke test

Add a trivial dom0less test:
- fetch the Debian arm64 kernel and use it as dom0/U kernel
- use busybox-static to create a trivial dom0/U ramdisk
- use ImageBuilder to generate the uboot boot script automatically
- install and use u-boot from the Debian package to start the test
- binaries are loaded from uboot via tftp

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago automation: add a QEMU aarch64 smoke test
Stefano Stabellini [Fri, 13 Nov 2020 02:30:33 +0000 (18:30 -0800)]
automation: add a QEMU aarch64 smoke test

Use QEMU to start Xen (just the hypervisor) up until it stops because
there is no dom0 kernel to boot.

It is based on the existing build job unstable-arm64v8.

Also use make -j$(nproc) to build Xen.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago xen/hypfs: add new enter() and exit() per node callbacks
Juergen Gross [Thu, 17 Dec 2020 15:50:21 +0000 (16:50 +0100)]
xen/hypfs: add new enter() and exit() per node callbacks

In order to better support resource allocation and locking for dynamic
hypfs nodes add enter() and exit() callbacks to struct hypfs_funcs.

The enter() callback is called when entering a node during hypfs user
actions (traversing, reading or writing it), while the exit() callback
is called when leaving a node (accessing another node at the same or a
higher directory level, or when returning to the user).

To avoid recursion this requires a parent pointer in each node.
Let the enter() callback return the entry address which is stored as
the last accessed node in order to be able to use a template entry for
that purpose in case of dynamic entries.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago xen/hypfs: switch write function handles to const
Juergen Gross [Thu, 17 Dec 2020 15:49:49 +0000 (16:49 +0100)]
xen/hypfs: switch write function handles to const

The node specific write functions take a void user address handle as
parameter. As a write won't change the user memory, use a const_void
handle instead.

This requires a new macro for casting a guest handle to a const type.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago xen/cpupool: support moving domain between cpupools with different granularity
Juergen Gross [Thu, 17 Dec 2020 15:49:11 +0000 (16:49 +0100)]
xen/cpupool: support moving domain between cpupools with different granularity

When moving a domain between cpupools with different scheduling
granularity the sched_units of the domain need to be adjusted.

Do that by allocating new sched_units and throwing away the old ones
in sched_move_domain().
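
As a toy illustration of why the units have to be rebuilt (Python for
brevity, not the scheduler's actual code): assuming a unit simply groups
"granularity" consecutive vCPU ids, the grouping for the same domain differs
between pools. The helper name build_sched_units() is hypothetical.

```python
def build_sched_units(nr_vcpus, granularity):
    """Group vCPU ids 0..nr_vcpus-1 into units of 'granularity' vCPUs.

    Hypothetical model: the last unit may be partially filled.
    """
    return [
        list(range(first, min(first + granularity, nr_vcpus)))
        for first in range(0, nr_vcpus, granularity)
    ]


# Moving a 4-vCPU domain from a granularity-1 pool to a granularity-2 pool
# means the old units cannot be reused; new ones are built instead.
old_units = build_sched_units(4, 1)
new_units = build_sched_units(4, 2)
```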

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
4 years ago tools/xenstore: remove unused cruft from xenstored_domain.c
Juergen Gross [Tue, 15 Dec 2020 16:35:41 +0000 (17:35 +0100)]
tools/xenstore: remove unused cruft from xenstored_domain.c

domain->remote_port and restore_existing_connections() are useless and
can be removed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years ago tools/xenstore: make set_tdb_key() non-static
Juergen Gross [Tue, 15 Dec 2020 16:35:40 +0000 (17:35 +0100)]
tools/xenstore: make set_tdb_key() non-static

set_tdb_key() can be used by destroy_node(), too. So remove the static
attribute and move it to xenstored_core.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
4 years ago tools/xenstore: switch barf[_perror]() to use syslog()
Juergen Gross [Tue, 15 Dec 2020 16:35:39 +0000 (17:35 +0100)]
tools/xenstore: switch barf[_perror]() to use syslog()

When xenstored crashes due to an unrecoverable condition it calls
either barf() or barf_perror() to issue a message and then exit().

Make sure the message is visible somewhere by using syslog()
in addition to xprintf(), as the latter is visible only with
tracing active.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years ago xen/arm: Add workaround for Cortex-A53 erratum #843419
Luca Fancellu [Thu, 10 Dec 2020 10:42:58 +0000 (10:42 +0000)]
xen/arm: Add workaround for Cortex-A53 erratum #843419

On the Cortex-A53, when executing in AArch64 state, a load or store instruction
which uses the result of an ADRP instruction as a base register, or which uses
a base register written by an instruction immediately after an ADRP to the
same register, might access an incorrect address.

The workaround is to enable the linker flag --fix-cortex-a53-843419
if present, to check and fix the affected sequence. Otherwise print a warning
that Xen may be susceptible to this erratum.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
4 years ago Revert patches that break libxl API
Wei Liu [Wed, 16 Dec 2020 17:48:04 +0000 (17:48 +0000)]
Revert patches that break libxl API

This patch reverts eight patches from staging.

The offending patch is the one that introduced libxl_pci_bdf (last one
in the list). The rest depend on that patch so they are also reverted.

8bf0fab14256 "libxl / libxlu: support 'xl pci-attach/detach' by name"
e1141654c374 "docs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING"
93c16ae47baf "xl: support naming of assignable devices"
5ab684cb3e4d "libxl: introduce libxl_pci_bdf_assignable_add/remove/list/list_free(), ..."
66c2fbc6e82b "libxl: convert internal functions in libxl_pci.c..."
f73c5dd56d78 "docs/man: modify xl(1) in preparation for naming of assignable devices"
96ed6ff29741 "libxlu: introduce xlu_pci_parse_spec_string()"
929f23114061 "libxl: introduce 'libxl_pci_bdf' in the idl..."

Signed-off-by: Wei Liu <wl@xen.org>
4 years ago x86/p2m: set_shared_p2m_entry() is MEM_SHARING-only
Jan Beulich [Wed, 16 Dec 2020 15:44:18 +0000 (16:44 +0100)]
x86/p2m: set_shared_p2m_entry() is MEM_SHARING-only

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
4 years ago livepatch: adjust a stale comment
Jan Beulich [Wed, 16 Dec 2020 15:43:32 +0000 (16:43 +0100)]
livepatch: adjust a stale comment

As of 005de45c887e ("xen: do live patching only from main idle loop")
the comment ahead of livepatch_do_action() has been stale.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
4 years ago x86/PV: avoid double stack reset during schedule tail handling
Jan Beulich [Wed, 16 Dec 2020 15:42:50 +0000 (16:42 +0100)]
x86/PV: avoid double stack reset during schedule tail handling

Invoking check_wakeup_from_wait() from assembly allows the new
continue_pv_domain() to replace the prior continue_nonidle_domain() as
the tail hook, eliminating an extra reset_stack_and_jump().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
4 years ago x86: clobber registers in switch_stack_and_jump() when !LIVEPATCH
Jan Beulich [Wed, 16 Dec 2020 15:41:46 +0000 (16:41 +0100)]
x86: clobber registers in switch_stack_and_jump() when !LIVEPATCH

In order to have the same effect on registers as a call to
check_for_livepatch_work() may have, clobber all call-clobbered
registers in debug builds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
4 years ago libxl / libxlu: support 'xl pci-attach/detach' by name
Paul Durrant [Tue, 8 Dec 2020 19:30:33 +0000 (19:30 +0000)]
libxl / libxlu: support 'xl pci-attach/detach' by name

This patch adds a 'name' field into the idl for 'libxl_device_pci' and
libxlu_pci_parse_spec_string() is modified to parse the new 'name'
parameter of PCI_SPEC_STRING detailed in the updated documentation in
xl-pci-configuration(5).

If the 'name' field is non-NULL then both libxl_device_pci_add() and
libxl_device_pci_remove() will use it to look up the device BDF in
the list of assignable devices.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago docs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING
Paul Durrant [Tue, 8 Dec 2020 19:30:32 +0000 (19:30 +0000)]
docs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING

Since assignable devices can be named, a subsequent patch will support use
of a PCI_SPEC_STRING containing a 'name' parameter instead of a 'bdf'. In
this case the name will be used to look up the 'bdf' in the list of assignable
(or assigned) devices.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago xl: support naming of assignable devices
Paul Durrant [Tue, 8 Dec 2020 19:30:31 +0000 (19:30 +0000)]
xl: support naming of assignable devices

This patch converts libxl to use libxl_pci_bdf_assignable_add/remove/list/
list_free() rather than libxl_device_pci_assignable_add/remove/list/
list_free(), which then allows naming of assignable devices to be supported.

With this patch applied 'xl pci-assignable-add' will take an optional '--name'
parameter, 'xl pci-assignable-remove' can be passed either a BDF or a name and
'xl pci-assignable-list' will take an optional '--show-names' flag which
determines whether names are displayed in its output.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: introduce libxl_pci_bdf_assignable_add/remove/list/list_free(), ...
Paul Durrant [Tue, 8 Dec 2020 19:30:30 +0000 (19:30 +0000)]
libxl: introduce libxl_pci_bdf_assignable_add/remove/list/list_free(), ...

which support naming and use 'libxl_pci_bdf' rather than 'libxl_device_pci',
as replacements for libxl_device_pci_assignable_add/remove/list/list_free().

libxl_pci_bdf_assignable_add() takes a 'name' parameter which is stored in
xenstore and facilitates two additional functions added by this patch:
libxl_pci_bdf_assignable_name2bdf() and libxl_pci_bdf_assignable_bdf2name().
Currently there are no callers of these two functions. They will be added in
a subsequent patch.

libxl_device_pci_assignable_add/remove/list/list_free() are left in place
for compatibility but are re-implemented in terms of the newly introduced
functions.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: convert internal functions in libxl_pci.c...
Paul Durrant [Tue, 8 Dec 2020 19:30:29 +0000 (19:30 +0000)]
libxl: convert internal functions in libxl_pci.c...

... to use 'libxl_pci_bdf' where appropriate.

No API change.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago docs/man: modify xl(1) in preparation for naming of assignable devices
Paul Durrant [Tue, 8 Dec 2020 19:30:28 +0000 (19:30 +0000)]
docs/man: modify xl(1) in preparation for naming of assignable devices

A subsequent patch will introduce code to allow a name to be specified to
'xl pci-assignable-add' such that the assignable device may be referred to
by that name in subsequent operations.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxlu: introduce xlu_pci_parse_spec_string()
Paul Durrant [Tue, 8 Dec 2020 19:30:27 +0000 (19:30 +0000)]
libxlu: introduce xlu_pci_parse_spec_string()

This patch largely re-writes the code to parse a PCI_SPEC_STRING and enters
it via the newly introduced function. The new parser also deals with 'bdf'
and 'vslot' as non-positional parameters, as per the documentation in
xl-pci-configuration(5).

The existing xlu_pci_parse_bdf() function remains, but now strictly parses
BDF values. Some existing callers of xlu_pci_parse_bdf() are
modified to call xlu_pci_parse_spec_string() as per the documentation in xl(1).

NOTE: Usage text in xl_cmdtable.c and error messages are also modified
      appropriately.
      As a side-effect this patch also fixes a bug where using '*' to specify
      all functions would lead to an assertion failure at the end of
      xlu_pci_parse_bdf().
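
For illustration only, a strict BDF parser that accepts '*' for the function
can be sketched as below. This is Python rather than the libxlu C code, and
the accepted grammar ([DDDD:]BB:DD.F in hexadecimal) as well as the name
parse_bdf() are assumptions of the sketch; it does not reproduce the exact
rules of xlu_pci_parse_bdf(). A function of None stands for "all functions".

```python
import re

# Assumed textual form: [DDDD:]BB:DD.F in hexadecimal, where F may be '*'.
_BDF = re.compile(
    r"(?:(?P<domain>[0-9a-fA-F]{1,4}):)?"
    r"(?P<bus>[0-9a-fA-F]{1,2}):"
    r"(?P<dev>[0-9a-fA-F]{1,2})\."
    r"(?P<func>[0-7]|\*)"
)


def parse_bdf(text):
    """Return (domain, bus, dev, func); func is None when '*' was given."""
    m = _BDF.fullmatch(text)
    if m is None:
        raise ValueError(f"not a BDF: {text!r}")
    domain = int(m.group("domain") or "0", 16)
    bus = int(m.group("bus"), 16)
    dev = int(m.group("dev"), 16)
    if dev > 0x1F:  # PCI device numbers are 5 bits wide
        raise ValueError(f"device number out of range: {text!r}")
    func = None if m.group("func") == "*" else int(m.group("func"))
    return domain, bus, dev, func
```

Handling the wildcard explicitly, rather than letting it fall through to a
final sanity check, is the kind of path on which the assertion failure
mentioned above would otherwise be hit.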

Fixes: d25cc3ec93eb ("libxl: workaround gcc 10.2 maybe-uninitialized warning")
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: introduce 'libxl_pci_bdf' in the idl...
Paul Durrant [Tue, 8 Dec 2020 19:30:26 +0000 (19:30 +0000)]
libxl: introduce 'libxl_pci_bdf' in the idl...

... and use in 'libxl_device_pci'

This patch is preparatory work for restricting the type passed to functions
that only require BDF information, rather than passing a 'libxl_device_pci'
structure which is only partially filled. In this patch only the minimal
mechanical changes necessary to deal with the structural changes are made.
Subsequent patches will adjust the code to make better use of the new type.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Nick Rosbrook <rosbrookn@ainfosec.com>
4 years ago docs/man: fix xl(1) documentation for 'pci' operations
Paul Durrant [Tue, 8 Dec 2020 19:30:25 +0000 (19:30 +0000)]
docs/man: fix xl(1) documentation for 'pci' operations

Currently the documentation completely fails to mention the existence of
PCI_SPEC_STRING. This patch tidies things up, specifically clarifying that
'pci-assignable-add/remove' take <BDF> arguments whereas 'pci-attach/detach'
take <PCI_SPEC_STRING> arguments (which will be enforced in a subsequent
patch).

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago docs/man: improve documentation of PCI_SPEC_STRING...
Paul Durrant [Tue, 8 Dec 2020 19:30:24 +0000 (19:30 +0000)]
docs/man: improve documentation of PCI_SPEC_STRING...

... and prepare for adding support for non-positional parsing of 'bdf' and
'vslot' in a subsequent patch.

Also document 'BDF' as a first-class parameter type and fix the documentation
to state that the default value of 'rdm_policy' is actually 'strict', not
'relaxed', as can be seen in libxl__device_pci_setdefault().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago docs/man: extract documentation of PCI_SPEC_STRING from the xl.cfg manpage...
Paul Durrant [Tue, 8 Dec 2020 19:30:23 +0000 (19:30 +0000)]
docs/man: extract documentation of PCI_SPEC_STRING from the xl.cfg manpage...

... and put it into a new xl-pci-configuration(5) manpage, akin to the
xl-network-configuration(5) and xl-disk-configuration(5) manpages.

This patch moves the content of the section verbatim. A subsequent patch
will improve the documentation, once it is in its new location.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: use COMPARE_PCI() macro is_pci_in_array()...
Paul Durrant [Tue, 8 Dec 2020 19:30:22 +0000 (19:30 +0000)]
libxl: use COMPARE_PCI() macro is_pci_in_array()...

... rather than an open-coded equivalent.

This patch tidies up the is_pci_in_array() function, making it take a single
'libxl_device_pci' argument rather than separate domain, bus, device and
function arguments. The already-available COMPARE_PCI() macro can then be
used and it is also modified to return 'bool' rather than 'int'.

The patch also modifies libxl_pci_assignable() to use is_pci_in_array() rather
than a separate open-coded equivalent, and also modifies it to return a
'bool' rather than an 'int'.

NOTE: The COMPARE_PCI() macro is also fixed to include the 'domain' in its
      comparison, which should always have been the case.
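
The effect of including 'domain' in the comparison can be sketched as follows.
This is a Python model, not the libxl macro: PciAddr is a hypothetical
stand-in for the address fields of 'libxl_device_pci', and compare_pci() and
is_pci_in_array() only mirror the names used above.

```python
from typing import NamedTuple, Sequence


class PciAddr(NamedTuple):
    """Hypothetical stand-in for the address fields of 'libxl_device_pci'."""
    domain: int
    bus: int
    dev: int
    func: int


def compare_pci(a: PciAddr, b: PciAddr) -> bool:
    """Equality over all four address fields, including 'domain'."""
    return (a.domain, a.bus, a.dev, a.func) == (b.domain, b.bus, b.dev, b.func)


def is_pci_in_array(pci: PciAddr, pcis: Sequence[PciAddr]) -> bool:
    """True if any element of 'pcis' has the same address as 'pci'."""
    return any(compare_pci(pci, other) for other in pcis)
```

Without the 'domain' term, 0000:03:00.0 and 0001:03:00.0 would wrongly compare
as equal.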

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: add libxl_device_pci_assignable_list_free()...
Paul Durrant [Tue, 8 Dec 2020 19:30:21 +0000 (19:30 +0000)]
libxl: add libxl_device_pci_assignable_list_free()...

... to be used by callers of libxl_device_pci_assignable_list().

Currently there is no API for callers of libxl_device_pci_assignable_list()
to free the list. The xl function pciassignable_list() calls
libxl_device_pci_dispose() on each element of the returned list, but
libxl_pci_assignable() in libxl_pci.c does not. Neither does the implementation
of libxl_device_pci_assignable_list() call libxl_device_pci_init().

This patch adds the new API function, makes sure it is used everywhere and
also modifies libxl_device_pci_assignable_list() to initialize list
entries rather than just zeroing them.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: make sure callers of libxl_device_pci_list() free the list after use
Paul Durrant [Tue, 8 Dec 2020 19:30:20 +0000 (19:30 +0000)]
libxl: make sure callers of libxl_device_pci_list() free the list after use

A previous patch introduced libxl_device_pci_list_free() which should be used
by callers of libxl_device_pci_list() to properly dispose of the exported
'libxl_device_pci' types and then free the memory holding them. Whilst all
current callers do ensure the memory is freed, only the code in xl's
pcilist() function actually calls libxl_device_pci_dispose(). As it stands
this laxity does not lead to any memory leaks, but the simple addition of
e.g. a 'string' into the idl definition of 'libxl_device_pci' would lead
to leaks.

This patch makes sure all callers of libxl_device_pci_list() can call
libxl_device_pci_list_free() by keeping copies of 'libxl_device_pci'
structures inline in 'pci_add_state' and 'pci_remove_state' (and also making
sure these are properly disposed at the end of the operations) rather
than keeping pointers to the structures returned by libxl_device_pci_list().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: remove get_all_assigned_devices() from libxl_pci.c
Paul Durrant [Tue, 8 Dec 2020 19:30:19 +0000 (19:30 +0000)]
libxl: remove get_all_assigned_devices() from libxl_pci.c

Use of this function is a very inefficient way to check whether a device
has already been assigned.

This patch adds code that saves the domain id in xenstore at the point of
assignment, and removes it again when the device is de-assigned (or the
domain is destroyed). It is then straightforward to check whether a device
has been assigned by checking whether a device has a saved domain id.

NOTE: To facilitate the xenstore check it is necessary to move the
      pci_info_xs_read() earlier in libxl_pci.c. To keep related functions
      together, the rest of the pci_info_xs_XXX() functions are moved too.
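
The idea can be modelled in a few lines. In this Python sketch a plain dict
stands in for xenstore; the per-device path layout and the node name 'domid'
are assumptions made for illustration (only the '/libxl/pci' base path is
taken from this series), and the helper names are hypothetical.

```python
# A plain dict stands in for xenstore; the real code goes through the
# pci_info_xs_XXX() helpers in libxl_pci.c.
xenstore = {}


def _node(bdf, name):
    # Assumed layout: one directory per device under '/libxl/pci'.
    return f"/libxl/pci/{bdf}/{name}"


def assign(bdf, domid):
    """Record the owning domain id at the point of assignment."""
    xenstore[_node(bdf, "domid")] = str(domid)


def deassign(bdf):
    """Drop the record on de-assignment or domain destruction."""
    xenstore.pop(_node(bdf, "domid"), None)


def is_assigned(bdf):
    """A device is assigned exactly when a domain id is saved for it."""
    return _node(bdf, "domid") in xenstore
```

The check becomes a single lookup per device instead of a walk over every
domain's device list.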

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: remove unnecessary check from libxl__device_pci_add()
Paul Durrant [Tue, 8 Dec 2020 19:30:18 +0000 (19:30 +0000)]
libxl: remove unnecessary check from libxl__device_pci_add()

The code currently checks explicitly whether the device is already assigned,
but this is actually unnecessary as assigned devices do not form part of
the list returned by libxl_device_pci_assignable_list() and hence the
libxl_pci_assignable() test would have already failed.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: generalise 'driver_path' xenstore access functions in libxl_pci.c
Paul Durrant [Tue, 8 Dec 2020 19:30:17 +0000 (19:30 +0000)]
libxl: generalise 'driver_path' xenstore access functions in libxl_pci.c

For the purposes of re-binding a device to its previous driver
libxl__device_pci_assignable_add() writes the driver path into xenstore.
This path is then read back in libxl__device_pci_assignable_remove().

The functions that support this writing to and reading from xenstore are
currently dedicated for this purpose and hence the node name 'driver_path'
is hard-coded. This patch generalizes these utility functions and passes
'driver_path' as an argument. Subsequent patches will invoke them to
access other nodes.

NOTE: Because the functions will have a broader use (other than storing a
      driver path in lieu of pciback) the base xenstore path is also
      changed from '/libxl/pciback' to '/libxl/pci'.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: stop using aodev->device_config in libxl__device_pci_add()...
Paul Durrant [Tue, 8 Dec 2020 19:30:16 +0000 (19:30 +0000)]
libxl: stop using aodev->device_config in libxl__device_pci_add()...

... to hold a pointer to the device.

There is already a 'pci' field in 'pci_add_state' so simply use that from
the start. This also allows the 'pci' (#3) argument to be dropped from
do_pci_add().

NOTE: This patch also changes the type of the 'pci_domid' field in
      'pci_add_state' from 'int' to 'libxl_domid' which is more appropriate
      given what the field is used for.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: remove extraneous arguments to do_pci_remove() in libxl_pci.c
Paul Durrant [Tue, 8 Dec 2020 19:30:15 +0000 (19:30 +0000)]
libxl: remove extraneous arguments to do_pci_remove() in libxl_pci.c

Both 'domid' and 'pci' are available in 'pci_remove_state' so there is no
need to also pass them as separate arguments.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: s/detatched/detached in libxl_pci.c
Paul Durrant [Tue, 8 Dec 2020 19:30:14 +0000 (19:30 +0000)]
libxl: s/detatched/detached in libxl_pci.c

Simple spelling correction. Purely cosmetic fix.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: add/recover 'rdm_policy' to/from PCI backend in xenstore
Paul Durrant [Tue, 8 Dec 2020 19:30:13 +0000 (19:30 +0000)]
libxl: add/recover 'rdm_policy' to/from PCI backend in xenstore

Other parameters, such as 'msitranslate' and 'permissive' are dealt with
but 'rdm_policy' appears to have been completely missed.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: Make sure devices added by pci-attach are reflected in the config
Paul Durrant [Tue, 8 Dec 2020 19:30:12 +0000 (19:30 +0000)]
libxl: Make sure devices added by pci-attach are reflected in the config

Currently libxl__device_pci_add_xenstore() is broken in that it does not
update the domain's configuration for the first device added (which causes
creation of the overall backend area in xenstore). This can be easily observed
by running 'xl list -l' after adding a single device: the device will be
missing.

This patch fixes the problem and adds a DEBUG log line to allow easy
verification that the domain configuration is being modified. Also, the use
of libxl__device_generic_add() is dropped as it leads to a confusing situation
where only partial backend information is written under the xenstore
'/libxl' path. For LIBXL__DEVICE_KIND_PCI devices the only definitive
information in xenstore is under '/local/domain/0/backend' (the '0' being
hard-coded).

NOTE: This patch includes a whitespace fix in add_pcis_done().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: make libxl__device_list() work correctly for LIBXL__DEVICE_KIND_PCI...
Paul Durrant [Tue, 8 Dec 2020 19:30:11 +0000 (19:30 +0000)]
libxl: make libxl__device_list() work correctly for LIBXL__DEVICE_KIND_PCI...

... devices.

Currently there is an assumption built into libxl__device_list() that device
backends are fully enumerated under the '/libxl' path in xenstore. This is
not the case for PCI backend devices, which are only properly enumerated
under '/local/domain/0/backend'.

This patch adds a new get_path() method to libxl__device_type to allow a
backend implementation (such as PCI) to specify the xenstore path where
devices are enumerated and modifies libxl__device_list() to use this method
if it is available. Also, if the get_num() method is defined then the
from_xenstore() method expects to be passed the backend path without the device
number concatenated, so this issue is also rectified.

Having made libxl__device_list() work correctly, this patch removes the
open-coded libxl_device_pci_list() in favour of an evaluation of the
LIBXL_DEFINE_DEVICE_LIST() macro. This has the side-effect of also defining
libxl_device_pci_list_free() which will be used in subsequent patches.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago xl: s/pcidev/pci where possible
Paul Durrant [Tue, 8 Dec 2020 19:30:10 +0000 (19:30 +0000)]
xl: s/pcidev/pci where possible

To improve naming consistency, replace occurrences of 'pcidev' with 'pci'.
The only remaining use of the term should be in relation to
'libxl_domain_config' where there are fields named 'pcidevs' and 'num_pcidevs'.

Purely cosmetic. No functional change.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago libxl: s/pcidev/pci and remove DEFINE_DEVICE_TYPE_STRUCT_X
Paul Durrant [Tue, 8 Dec 2020 19:30:09 +0000 (19:30 +0000)]
libxl: s/pcidev/pci and remove DEFINE_DEVICE_TYPE_STRUCT_X

The seemingly arbitrary use of 'pci' and 'pcidev' in the code in libxl_pci.c
is confusing and also compromises use of some macros used for other device
types. Indeed it seems that DEFINE_DEVICE_TYPE_STRUCT_X exists solely because
of this duality.

This patch purges use of 'pcidev' from the libxl internal code, but
unfortunately the 'pcidevs' and 'num_pcidevs' fields in 'libxl_domain_config'
are part of the API and need to be retained to avoid breaking callers,
particularly libvirt.

DEFINE_DEVICE_TYPE_STRUCT_X is still removed to avoid the special case in
libxl_pci.c but DEFINE_DEVICE_TYPE_STRUCT is given an extra 'array' argument
which is used to identify the fields in 'libxl_domain_config' relating to
the device type.

NOTE: Some of the more gross formatting errors (such as lack of spaces after
      keywords) that came into context have been fixed in libxl_pci.c.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago tools: remove unused ORDER_LONG
Olaf Hering [Wed, 9 Dec 2020 15:54:50 +0000 (16:54 +0100)]
tools: remove unused ORDER_LONG

There are no users left, xenpaging has its own variant.
The last user was removed with commit 11d0044a168994de85b9b328452292852aedc871

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
4 years ago tools: allocate bitmaps in units of unsigned long
Olaf Hering [Wed, 9 Dec 2020 15:54:49 +0000 (16:54 +0100)]
tools: allocate bitmaps in units of unsigned long

Allocate enough memory so that the returned pointer can be safely
accessed as an array of unsigned long.

The actual bitmap size in units of bytes, as returned by bitmap_size,
remains unchanged.
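
The arithmetic can be illustrated as follows. This is a Python sketch in which
ctypes is used only to obtain the platform's sizeof(unsigned long);
bitmap_alloc_bytes() is a hypothetical name, and neither function is a copy of
the tools code.

```python
import ctypes

BYTES_PER_LONG = ctypes.sizeof(ctypes.c_ulong)
BITS_PER_LONG = BYTES_PER_LONG * 8


def bitmap_size(nr_bits):
    """Bitmap size in bytes, rounded up to whole bytes (left unchanged)."""
    return (nr_bits + 7) // 8


def bitmap_alloc_bytes(nr_bits):
    """Bytes to allocate: rounded up to a whole number of unsigned longs."""
    longs = (nr_bits + BITS_PER_LONG - 1) // BITS_PER_LONG
    return longs * BYTES_PER_LONG
```

For a 9-bit bitmap the reported size stays at 2 bytes, while the allocation
grows to one full unsigned long, so accessing word 0 of the buffer stays in
bounds.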

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
4 years ago tools/xenstore: rework path length check
Juergen Gross [Tue, 15 Dec 2020 15:04:11 +0000 (16:04 +0100)]
tools/xenstore: rework path length check

The different fixed limits for absolute and relative path lengths of
Xenstore nodes make it possible to create per-domain nodes via
absolute paths which are not accessible using relative paths, as the
two limits differ by 1024 characters.

Instead of these weird limits use only one limit, which applies to the
relative path length of per-domain nodes and to the absolute path
length of all other nodes. This means the path length check is
applied to the path after removing a possible start of
"/local/domain/<n>/" with <n> being a domain id.

There has been the request to be able to limit the path lengths even
more, so an additional quota is added which can be applied to path
lengths. It is XENSTORE_REL_PATH_MAX (2048) by default, but can be
set to lower values. This is done via the new "-M" or "--path-max"
option when invoking xenstored.
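
A rough sketch of the check described above, written in Python for brevity
(xenstored itself is C). The function names and the regular expression are
illustrative assumptions; only XENSTORE_REL_PATH_MAX (2048) and the
"/local/domain/<n>/" prefix come from the description. Whether the real limit
is inclusive is not stated here; the sketch treats it as inclusive.

```python
import re

XENSTORE_REL_PATH_MAX = 2048  # default quota named in the description

# Hypothetical helper: a leading "/local/domain/<n>/" with <n> a domain id.
_DOMAIN_PREFIX = re.compile(r"^/local/domain/\d+/")


def checked_length(path):
    """Length the quota applies to: the path minus any per-domain prefix."""
    return len(_DOMAIN_PREFIX.sub("", path, count=1))


def path_is_valid(path, path_max=XENSTORE_REL_PATH_MAX):
    """True if the (possibly prefix-stripped) path fits within the quota."""
    return checked_length(path) <= path_max
```

With a single limit, a node created through its absolute per-domain path is
always reachable through the corresponding relative path as well, since both
are measured after the prefix has been removed.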

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago MAINTAINERS: add me as maintainer for tools/xenstore/
Juergen Gross [Tue, 8 Dec 2020 10:30:26 +0000 (11:30 +0100)]
MAINTAINERS: add me as maintainer for tools/xenstore/

I have been the major contributor for C Xenstore over the past few years.

Add me as a maintainer for tools/xenstore/.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago examples: Add PVH example to config example list
Elliott Mitchell [Tue, 15 Dec 2020 02:35:32 +0000 (18:35 -0800)]
examples: Add PVH example to config example list

Somewhat helpful to actually install the example configurations.

Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables
Jan Beulich [Tue, 15 Dec 2020 12:47:45 +0000 (13:47 +0100)]
x86/PV: guest_get_eff_kern_l1e() may still need to switch page tables

While indeed unnecessary for pv_ro_page_fault(), pv_map_ldt_shadow_page()
may run when guest user mode is active, and hence may need to switch to
the kernel page tables in order to retrieve an LDT page mapping.

Fixes: 9ff970564764 ("x86/mm: drop guest_get_eff_l1e()")
Reported-by: Manuel Bouyer <bouyer@antioche.eu.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Manuel Bouyer <bouyer@antioche.eu.org>
4 years ago evtchn/FIFO: re-order and synchronize (with) map_control_block()
Jan Beulich [Tue, 15 Dec 2020 12:46:37 +0000 (13:46 +0100)]
evtchn/FIFO: re-order and synchronize (with) map_control_block()

For evtchn_fifo_set_pending()'s check of the control block having been
set to be effective, ordering of respective reads and writes needs to be
ensured: The control block pointer needs to be recorded strictly after
the setting of all the queue heads, and it needs checking strictly
before any uses of them (this latter aspect was already guaranteed).

This is XSA-358 / CVE-2020-29570.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
4 years ago evtchn/FIFO: add 2nd smp_rmb() to evtchn_fifo_word_from_port()
Jan Beulich [Tue, 15 Dec 2020 12:42:51 +0000 (13:42 +0100)]
evtchn/FIFO: add 2nd smp_rmb() to evtchn_fifo_word_from_port()

Besides with add_page_to_event_array() the function also needs to
synchronize with evtchn_fifo_init_control() setting both d->evtchn_fifo
and (subsequently) d->evtchn_port_ops.

This is XSA-359 / CVE-2020-29571.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years ago x86/irq: fix infinite loop in irq_move_cleanup_interrupt
Roger Pau Monné [Tue, 15 Dec 2020 12:42:16 +0000 (13:42 +0100)]
x86/irq: fix infinite loop in irq_move_cleanup_interrupt

If Xen enters irq_move_cleanup_interrupt with a dynamic vector below
IRQ_MOVE_CLEANUP_VECTOR pending in IRR (0x20 or 0x21) that's also
designated for a cleanup it will enter a loop where
irq_move_cleanup_interrupt continuously sends a cleanup IPI (vector
0x22) to itself while waiting for the vector with lower priority to be
injected - which will never happen because IRQ_MOVE_CLEANUP_VECTOR
takes precedence and it's always injected first.

Fix this by making sure vectors below IRQ_MOVE_CLEANUP_VECTOR are
marked as used and thus not available for APs. Also add some logic to
assert and prevent irq_move_cleanup_interrupt from entering such an
infinite loop, albeit that should never happen given the current code.
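
The reservation can be pictured with a toy allocator. This Python sketch is
not Xen's vector allocation code: the constant values are taken from the
description above (0x20 and 0x21 being the dynamic vectors below the cleanup
vector 0x22), and reserved_vectors() and allocate_vector() are hypothetical
names.

```python
FIRST_DYNAMIC_VECTOR = 0x20     # assumed from "0x20 or 0x21" above
IRQ_MOVE_CLEANUP_VECTOR = 0x22  # cleanup IPI vector named above
NR_VECTORS = 256


def reserved_vectors():
    """Vectors pre-marked as used: everything up to the cleanup vector."""
    return set(range(FIRST_DYNAMIC_VECTOR, IRQ_MOVE_CLEANUP_VECTOR + 1))


def allocate_vector(used):
    """Return the lowest free dynamic vector, or None if none is left."""
    for vector in range(FIRST_DYNAMIC_VECTOR, NR_VECTORS):
        if vector not in used:
            used.add(vector)
            return vector
    return None
```

Because 0x20 and 0x21 are never handed out, no vector of lower priority than
the cleanup IPI can be waiting for cleanup, which removes the condition for
the loop.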

This is XSA-356 / CVE-2020-29567.

Fixes: 3fba06ba9f8 ('x86/IRQ: re-use legacy vector ranges on APs')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>