xenbits.xensource.com Git - people/sstabellini/xen-unstable.git/.git/log
5 years ago xen/arm: call iomem_permit_access for passthrough devices [direct-map-1]
Stefano Stabellini [Wed, 15 Apr 2020 00:42:21 +0000 (17:42 -0700)]
xen/arm: call iomem_permit_access for passthrough devices

iomem_permit_access should be called for MMIO regions of devices
assigned to a domain. Currently it is not called for MMIO regions of
passthrough devices of Dom0less guests. This patch fixes it.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
5 years ago xen/arm: if xen_force don't try to setup the IOMMU
Stefano Stabellini [Wed, 15 Apr 2020 00:42:21 +0000 (17:42 -0700)]
xen/arm: if xen_force don't try to setup the IOMMU

If xen_force is set (which means xen,force-assign-without-iommu was
requested), don't try to add the device to the IOMMU. Return early instead.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
5 years ago xen/arm: if is_domain_direct_mapped use native UART address for vPL011
Stefano Stabellini [Wed, 15 Apr 2020 00:42:21 +0000 (17:42 -0700)]
xen/arm: if is_domain_direct_mapped use native UART address for vPL011

We always use a fixed address to map the vPL011 to domains. The address
could be a problem for domains that are directly mapped.

Instead, for domains that are directly mapped, reuse the address of the
physical UART on the platform to avoid potential clashes.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
5 years ago xen/arm: if is_domain_direct_mapped use native addresses for GICv3
Stefano Stabellini [Wed, 15 Apr 2020 00:42:21 +0000 (17:42 -0700)]
xen/arm: if is_domain_direct_mapped use native addresses for GICv3

Today we use native addresses to map the GICv3 for Dom0 and fixed
addresses for DomUs.

This patch changes the behavior so that native addresses are used for
any domain that is_domain_direct_mapped. The patch has to introduce one
#ifndef CONFIG_NEW_VGIC because the new vgic doesn't support GICv3.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
5 years ago xen/arm: if is_domain_direct_mapped use native addresses for GICv2
Stefano Stabellini [Wed, 15 Apr 2020 00:42:21 +0000 (17:42 -0700)]
xen/arm: if is_domain_direct_mapped use native addresses for GICv2

Today we use native addresses to map the GICv2 for Dom0 and fixed
addresses for DomUs.

This patch changes the behavior so that native addresses are used for
any domain that is_domain_direct_mapped.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
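
The address selection described in the GICv2 and GICv3 patches above can be
pictured with a minimal sketch. This is not the Xen code: struct toy_domain,
TOY_GUEST_GICD_BASE and pick_gic_dbase() are hypothetical names, and the fixed
address is an arbitrary placeholder. It only illustrates the rule "direct-mapped
domains get the hardware address, other domains get the fixed guest address".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed guest address for the virtual GIC distributor. */
#define TOY_GUEST_GICD_BASE 0x08000000ULL

/* Toy stand-in for a domain; the real struct domain is far larger. */
struct toy_domain {
    bool direct_map;   /* guest physical == host physical */
};

/*
 * Pick the distributor base for a domain: direct-mapped domains reuse the
 * hardware (native) address, every other domain gets the fixed guest one.
 */
static uint64_t pick_gic_dbase(const struct toy_domain *d, uint64_t hw_dbase)
{
    return d->direct_map ? hw_dbase : TOY_GUEST_GICD_BASE;
}
```

Reusing the native address for a direct-mapped domain avoids a fixed virtual
device address landing on top of a physical region the domain already sees 1:1.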
5 years ago xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase
Stefano Stabellini [Wed, 15 Apr 2020 00:41:50 +0000 (17:41 -0700)]
xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase

To be uniform with the old vgic. Name uniformity will become immediately
useful in the following patch.

In vgic_v2_map_resources, use the fields in struct vgic_dist rather than
local variables.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
5 years ago xen/arm: reserve 1:1 memory for direct_map domUs
Stefano Stabellini [Wed, 15 Apr 2020 00:40:50 +0000 (17:40 -0700)]
xen/arm: reserve 1:1 memory for direct_map domUs

Use reserve_domheap_pages to implement the direct-map ranges allocation
for DomUs.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
5 years ago xen: introduce reserve_heap_pages
Stefano Stabellini [Wed, 15 Apr 2020 00:39:50 +0000 (17:39 -0700)]
xen: introduce reserve_heap_pages

Introduce a function named reserve_heap_pages (similar to
alloc_heap_pages) that allocates a requested memory range. Call
__alloc_heap_pages for the implementation.

Change __alloc_heap_pages so that the original page doesn't get
modified, giving back unneeded memory top to bottom rather than bottom
to top.

Also introduce a function named reserve_domheap_pages, similar to
alloc_domheap_pages, that checks memflags before calling
reserve_heap_pages. On success, it also calls assign_pages to hand the
pages to the domain.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: andrew.cooper3@citrix.com
CC: jbeulich@suse.com
CC: George Dunlap <george.dunlap@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
5 years ago xen: split alloc_heap_pages in two halves for reusability
Stefano Stabellini [Wed, 15 Apr 2020 00:37:56 +0000 (17:37 -0700)]
xen: split alloc_heap_pages in two halves for reusability

This patch splits the implementation of alloc_heap_pages into two halves
so that the second half can be reused by the next patch.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: andrew.cooper3@citrix.com
CC: jbeulich@suse.com
CC: George Dunlap <george.dunlap@citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
---
Comments are welcome. I am not convinced that this is the right way to
split it. Please let me know if you have any suggestions.

5 years ago xen/arm: introduce 1:1 mapping for domUs
Stefano Stabellini [Wed, 15 Apr 2020 00:37:09 +0000 (17:37 -0700)]
xen/arm: introduce 1:1 mapping for domUs

In some cases it is desirable to map domU memory 1:1 (guest physical ==
physical.) For instance, because we want to assign a device to the domU
but the IOMMU is not present or cannot be used. In these cases, other
mechanisms should be used for DMA protection, e.g. an MPU.

This patch introduces a new device tree option for dom0less guests to
request a domain to be directly mapped. It also specifies the memory
ranges. This patch documents the new attribute and parses it at boot
time. (However, the implementation of 1:1 mapping is still missing and
the code just calls BUG() at the moment.) Finally, the patch sets the
new direct_map flag for DomU domains.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
5 years ago xen/arm: introduce arch_xen_dom_flags and direct_map
Stefano Stabellini [Wed, 15 Apr 2020 00:35:01 +0000 (17:35 -0700)]
xen/arm: introduce arch_xen_dom_flags and direct_map

Introduce a new field in struct xen_dom_flags to store arch-specific
flags.

Add an ARM-specific flag to specify that the domain should be directly
mapped (guest physical addresses == physical addresses).

Also, add a direct_map flag under struct arch_domain and use it to
implement is_domain_direct_mapped.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: andrew.cooper3@citrix.com
CC: jbeulich@suse.com
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
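
A minimal sketch of how a flags bundle with an arch-specific part might feed a
predicate such as is_domain_direct_mapped. Every type and function name below
(the toy_ prefix) is hypothetical and the field layout is an assumption; the
real struct xen_dom_flags and struct arch_domain are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical arch-specific flags; on Arm this would carry direct_map. */
struct toy_arch_dom_flags {
    bool is_direct_map;
};

/* Hypothetical shape of the flags bundle passed at domain creation. */
struct toy_dom_flags {
    bool is_priv;                    /* previously a lone boolean argument */
    struct toy_arch_dom_flags arch;  /* arch-specific flags */
};

struct toy_arch_domain {
    bool direct_map;
};

struct toy_domain {
    bool is_privileged;
    struct toy_arch_domain arch;
};

/* Copy the creation-time flags into the domain, as creation code might. */
static void toy_domain_init(struct toy_domain *d, const struct toy_dom_flags *f)
{
    d->is_privileged = f->is_priv;
    d->arch.direct_map = f->arch.is_direct_map;
}

/* Predicate in the spirit of is_domain_direct_mapped. */
static bool toy_is_domain_direct_mapped(const struct toy_domain *d)
{
    return d->arch.direct_map;
}
```

Grouping the booleans in one struct means a new flag does not change the
signature of the domain creation function again.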
5 years ago xen: introduce xen_dom_flags
Stefano Stabellini [Wed, 15 Apr 2020 00:33:20 +0000 (17:33 -0700)]
xen: introduce xen_dom_flags

We are passing an extra special boolean flag at domain creation to
specify whether we want the domain to be privileged (i.e. dom0) or
not. Another flag will be introduced later in this series.

Introduce a new struct xen_dom_flags and move the privileged flag to it.
Other flags will be added to struct xen_dom_flags.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
CC: andrew.cooper3@citrix.com
CC: jbeulich@suse.com
CC: George Dunlap <George.Dunlap@eu.citrix.com>
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wl@xen.org>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: George Dunlap <george.dunlap@eu.citrix.com>
CC: Dario Faggioli <dfaggioli@suse.com>
5 years ago x86/mem_sharing: Fix build with !CONFIG_XSM
Andrew Cooper [Thu, 9 Apr 2020 20:44:11 +0000 (21:44 +0100)]
x86/mem_sharing: Fix build with !CONFIG_XSM

A build fails with:

  mem_sharing.c: In function ‘copy_special_pages’:
  mem_sharing.c:1649:9: error: ‘HVM_PARAM_STORE_PFN’ undeclared (first use in this function)
           HVM_PARAM_STORE_PFN,
           ^~~~~~~~~~~~~~~~~~~
  ...

This is because xsm/xsm.h includes xsm/dummy.h for the !CONFIG_XSM case, which
brings public/hvm/params.h in.

Fixes: 41548c5472a "mem_sharing: VM forking"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
5 years ago xen/x86: ioapic: Simplify ioapic_init()
Julien Grall [Fri, 27 Mar 2020 18:36:20 +0000 (18:36 +0000)]
xen/x86: ioapic: Simplify ioapic_init()

Since commit 9facd54a45 "x86/ioapic: Add register level checks to detect
bogus io-apic entries", Xen is able to cope with IO APICs not mapped in
the fixmap.

Therefore the whole logic to allocate a fake page for some IO APICs is
unnecessary.

With the logic removed, the code can be simplified a lot as we don't
need to go through all the IO APICs if SMP has not been detected or a
bogus zero IO-APIC address has been detected.

To avoid another level of indentation, the simplified logic is now moved
into its own function.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/x86: ioapic: Rename init_ioapic_mappings() to ioapic_init()
Julien Grall [Fri, 27 Mar 2020 18:23:21 +0000 (18:23 +0000)]
xen/x86: ioapic: Rename init_ioapic_mappings() to ioapic_init()

The function init_ioapic_mappings() is doing more than initializing
mappings. It is also initializing the number of IRQs/GSIs supported.

So rename the function to ioapic_init().

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/x86: ioapic: Use true/false in bad_ioapic_register()
Julien Grall [Fri, 27 Mar 2020 18:16:22 +0000 (18:16 +0000)]
xen/x86: ioapic: Use true/false in bad_ioapic_register()

bad_ioapic_register() is returning a bool, so we should switch to
true/false.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago tools/xl: Remove the filelock when building VM if autoballooning is off
Dmitry Isaykin [Thu, 9 Apr 2020 14:55:50 +0000 (15:55 +0100)]
tools/xl: Remove the filelock when building VM if autoballooning is off

The presence of this filelock does not allow building several VMs at the same
time. This filelock was added to prevent other xl instances from using memory
freed for the currently building VM in autoballoon mode.

Signed-off-by: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago libxc/migration: Abort migration on precopy policy request
Andrew Panyakin [Tue, 7 Apr 2020 14:52:22 +0000 (14:52 +0000)]
libxc/migration: Abort migration on precopy policy request

libxc defines XGS_POLICY_ABORT for precopy policy to signal that migration
should be aborted (e.g. if the estimated pause time is too long for the
instance). The default simple precopy policy never returns that, but it could
be overridden with a custom one.

Signed-off-by: Andrew Panyakin <apanyaki@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
[wei: fix coding style issue]
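
As an illustration of a custom policy that can request an abort, here is a
minimal sketch. The enum, the statistics struct, the thresholds and the
function name are all hypothetical stand-ins chosen for this example; they are
not the libxc definitions, and the pause-time estimate is a deliberately crude
assumption (dirty pages divided by an assumed transfer rate).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the policy verdicts. */
enum toy_policy {
    TOY_POLICY_ABORT = -1,
    TOY_POLICY_CONTINUE_PRECOPY = 0,
    TOY_POLICY_STOP_AND_COPY = 1,
};

/* Hypothetical statistics handed to the policy after each iteration. */
struct toy_precopy_stats {
    unsigned int iteration;
    uint64_t dirty_pages;
    uint64_t pages_per_ms;   /* assumed transfer rate, must be non-zero */
};

static enum toy_policy toy_precopy_policy(const struct toy_precopy_stats *s,
                                          uint64_t max_pause_ms,
                                          unsigned int max_iterations)
{
    uint64_t est_pause_ms;

    assert(s->pages_per_ms != 0);
    est_pause_ms = s->dirty_pages / s->pages_per_ms;

    if (est_pause_ms <= max_pause_ms)
        return TOY_POLICY_STOP_AND_COPY;   /* pause is short enough */
    if (s->iteration >= max_iterations)
        return TOY_POLICY_ABORT;           /* not converging, give up */
    return TOY_POLICY_CONTINUE_PRECOPY;    /* try another round */
}
```

The point of the commit is the caller side: an abort verdict from such a policy
has to end the migration instead of being treated like any other value.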

5 years ago x86/PoD: correct ordering of checks in p2m_pod_zero_check()
Jan Beulich [Wed, 8 Apr 2020 11:12:28 +0000 (13:12 +0200)]
x86/PoD: correct ordering of checks in p2m_pod_zero_check()

Commit 0537d246f8db ("mm: add 'is_special_page' inline function...")
moved the is_special_page() checks first in its respective changes to
PoD code. While this is fine for p2m_pod_zero_check_superpage(), the
validity of the MFN is inferred in both cases from the p2m_is_ram()
check, which therefore also needs to come first in this 2nd instance.

Take the opportunity and address latent UB here as well - transform
the MFN into struct page_info * only after having established that
this is a valid page.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/HVM: __hvm_copy()'s size parameter is an unsigned quantity
Jan Beulich [Wed, 8 Apr 2020 11:11:24 +0000 (13:11 +0200)]
x86/HVM: __hvm_copy()'s size parameter is an unsigned quantity

There are no negative sizes. Make the function's parameter as well as
that of its derivatives "unsigned int". Similarly make its local "count"
variable "unsigned int", and drop "todo" altogether. Don't use min_t()
anymore to calculate "count". Restrict its scope as well as that of
other local variables of the function.

While at it I've also noticed that {copy_{from,to},clear}_user_hvm()
have been returning "unsigned long" for no apparent reason, as their
respective "size" parameters have already been "unsigned int". Adjust
this as well as a slightly wrong comment there at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <pdurrant@amzn.com>
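
A minimal sketch of a page-chunked copy that uses only unsigned quantities and
needs neither a "todo" counter nor a min_t()-style cast. It copies from a flat
buffer, not from guest memory; toy_copy() and TOY_PAGE_SIZE are hypothetical
names, and this is not the __hvm_copy() implementation.

```c
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/*
 * Copy 'size' bytes starting at byte offset 'addr' of a flat source buffer,
 * one page-bounded chunk at a time. Returns the number of chunks used.
 */
static unsigned int toy_copy(void *dst, const unsigned char *src_base,
                             unsigned long addr, unsigned int size)
{
    unsigned char *out = dst;
    unsigned int chunks = 0;

    while (size > 0) {
        /* Bytes left in the current page: always in 1..TOY_PAGE_SIZE. */
        unsigned int count = TOY_PAGE_SIZE - (addr & (TOY_PAGE_SIZE - 1));

        /* Both operands are unsigned int, so a plain comparison suffices. */
        if (count > size)
            count = size;

        memcpy(out, src_base + addr, count);
        addr += count;
        out += count;
        size -= count;   /* 'size' itself counts down; no separate "todo" */
        chunks++;
    }

    return chunks;
}
```

With matching unsigned types on both sides, the clamp cannot silently compare a
negative value against a size.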
5 years ago mem_sharing: reset a fork
Tamas K Lengyel [Wed, 8 Apr 2020 11:03:50 +0000 (13:03 +0200)]
mem_sharing: reset a fork

Implement a hypercall that allows a fork to shed all memory that got allocated
for it during its execution and re-load its vCPU context from the parent VM.
This allows the forked VM to reset into the same state the parent VM is in,
faster than creating a new fork would be. Measurements show about a 2x
speedup during normal fuzzing operations. Performance may vary depending on how
much memory got allocated for the forked VM. If it has been completely
deduplicated from the parent VM then creating a new fork would likely be more
performant.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago mem_sharing: VM forking
Tamas K Lengyel [Wed, 8 Apr 2020 10:59:58 +0000 (12:59 +0200)]
mem_sharing: VM forking

VM forking is the process of creating a domain with an empty memory space and a
parent domain specified from which to populate the memory when necessary. For
the new domain to be functional the VM state is copied over as part of the fork
operation (HVM params, hap allocation, etc).

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years ago config: use mini-os master for unstable
Juergen Gross [Tue, 7 Apr 2020 13:48:31 +0000 (15:48 +0200)]
config: use mini-os master for unstable

We haven't used mini-os master for about 2 years now due to a stubdom
test failing [1]. Booting a guest with mini-os master used for building
stubdom didn't reveal any problem, so use master for unstable in order
to let OSStest find any problems not showing up in the local test.

[1]: https://lists.xen.org/archives/html/minios-devel/2018-04/msg00015.html

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago x86/ucode: Simplify the ops->collect_cpu_info() API
Andrew Cooper [Wed, 1 Apr 2020 15:18:32 +0000 (16:18 +0100)]
x86/ucode: Simplify the ops->collect_cpu_info() API

All callers pass &this_cpu(cpu_sig) for the cpu_sig parameter, and all
implementations unconditionally return 0.  Simplify it to be void.

Drop the long-stale comment on the AMD side, whose counterpart in
start_update() used to be "collect_cpu_info() doesn't fail so we're fine".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode: Drop ops->free_patch()
Andrew Cooper [Wed, 1 Apr 2020 15:32:16 +0000 (16:32 +0100)]
x86/ucode: Drop ops->free_patch()

With the newly cleaned up vendor logic, each struct microcode_patch is a
trivial object in memory with no dependent allocations.

This is unlikely to change moving forwards, and function pointers are
expensive in the days of retpoline.  Move the responsibility to xfree() back
to common code.  If the need does arise in the future, we can consider
reintroducing the hook.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode: Don't try to cope with NULL pointers in apply_microcode()
Andrew Cooper [Wed, 1 Apr 2020 21:45:22 +0000 (22:45 +0100)]
x86/ucode: Don't try to cope with NULL pointers in apply_microcode()

No paths to apply_microcode() pass a NULL pointer, and other hooks don't
tolerate one in the first place.  We can expect the core logic not to pass us
junk, so drop the checks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode: Drop ops->match_cpu()
Andrew Cooper [Wed, 1 Apr 2020 16:45:52 +0000 (17:45 +0100)]
x86/ucode: Drop ops->match_cpu()

It turns out there are no callers of the hook.  The only callers are
local, which can easily be rearranged to use the appropriate internal helper.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode/intel: Remove one CPUID from collect_cpu_info()
Andrew Cooper [Wed, 1 Apr 2020 14:52:43 +0000 (15:52 +0100)]
x86/ucode/intel: Remove one CPUID from collect_cpu_info()

The CPUID instruction is expensive.  No point executing it twice when once
will do fine.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago docs: Render .md files using pandoc
Andrew Cooper [Fri, 3 Apr 2020 13:12:12 +0000 (14:12 +0100)]
docs: Render .md files using pandoc

This fixes the fact that qemu-deprivilege.md, non-cooperative-migration.md and
xenstore-migration.md don't currently get rendered at all, and are therefore
missing from xenbits.xen.org/docs

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Backport: 4.12

5 years ago tools/xenstore: fix a use after free problem in xenstored
Juergen Gross [Fri, 3 Apr 2020 12:03:40 +0000 (13:03 +0100)]
tools/xenstore: fix a use after free problem in xenstored

Commit 562a1c0f7ef3fb ("tools/xenstore: dont unlink connection object
twice") introduced a potential use after free problem in
domain_cleanup(): after calling talloc_unlink() for domain->conn,
domain->conn is set to NULL. The problem is that domain is registered
as talloc child of domain->conn, so it might be freed by the
talloc_unlink() call.

With Xenstore being single threaded there are normally no concurrent
memory allocations running and freeing a virtual memory area normally
doesn't result in that area no longer being accessible. A problem
could occur only in case either a signal received results in some
memory allocation done in the signal handler (SIGHUP is a primary
candidate leading to reopening the log file), or in case the talloc
framework would do some internal memory allocation during freeing of
the memory (which would lead to clobbering of the freed domain
structure).

Fixes: 562a1c0f7ef3fb ("tools/xenstore: dont unlink connection object twice")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years ago x86/p2m: make p2m_remove_page()'s parameters type-safe
Jan Beulich [Fri, 3 Apr 2020 15:19:11 +0000 (17:19 +0200)]
x86/p2m: make p2m_remove_page()'s parameters type-safe

Also add a couple of blank lines.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/p2m: use available local variable in guest_physmap_add_entry()
Jan Beulich [Fri, 3 Apr 2020 15:17:29 +0000 (17:17 +0200)]
x86/p2m: use available local variable in guest_physmap_add_entry()

The domain is being passed in - no need to obtain it from p2m->domain.
Also drop a pointless cast and simplify expressions while touching this
code anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/cpuidle: correct Cannon Lake residency MSRs
Jan Beulich [Fri, 3 Apr 2020 15:15:58 +0000 (17:15 +0200)]
x86/cpuidle: correct Cannon Lake residency MSRs

As per SDM rev 071 Cannon Lake has
- no CC3 residency MSR at 3FC,
- a CC1 residency MSR at 660 (like various Atoms),
- a useless (always zero) CC3 residency MSR at 662.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86: use macro DIV_ROUND_UP
Simran Singhal [Fri, 3 Apr 2020 08:58:18 +0000 (10:58 +0200)]
x86: use macro DIV_ROUND_UP

Use the DIV_ROUND_UP macro to replace open-coded divisor calculation
(((n) + (d) - 1) / (d)) to improve readability.

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
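
For reference, the open-coded expression quoted in the commit message and the
macro form are the same computation. The helper pages_for() and the 4 KiB page
size below are hypothetical, added only to show a typical use.

```c
/*
 * Same shape as the expression in the commit message. Note that the
 * arguments are evaluated more than once, so avoid side effects in them.
 */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical helper: number of 4 KiB pages needed to hold 'bytes'. */
static unsigned long pages_for(unsigned long bytes)
{
    return DIV_ROUND_UP(bytes, 4096UL);
}
```

A call such as pages_for(4097) rounds up to 2, where plain integer division
would truncate to 1.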
5 years ago x86/p2m: drop pointless nested variable from guest_physmap_add_entry()
Jan Beulich [Fri, 3 Apr 2020 08:57:41 +0000 (10:57 +0200)]
x86/p2m: drop pointless nested variable from guest_physmap_add_entry()

There's an outer scope rc already, and its use for the mem-sharing logic
does not conflict with its use elsewhere in the function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/p2m: don't assert that the passed in MFN matches for a remove
Jan Beulich [Fri, 3 Apr 2020 08:56:55 +0000 (10:56 +0200)]
x86/p2m: don't assert that the passed in MFN matches for a remove

guest_physmap_remove_page() gets handed an MFN from the outside, yet
takes the necessary lock to prevent further changes to the GFN <-> MFN
mapping itself. While some callers, in particular guest_remove_page()
(by way of having called get_gfn_query()), hold the GFN lock already,
various others (most notably perhaps the 2nd instance in
xenmem_add_to_physmap_one()) don't. While it also is an option to fix
all the callers, deal with the issue in p2m_remove_page() instead:
Replace the ASSERT() by a conditional and split the loop into two, such
that all checking gets done before any modification would occur.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
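
The "check everything first, then modify" structure described above can be
sketched on a toy table. This is an assumption-laden illustration, not
p2m_remove_page(): the table is a plain array indexed by GFN, TOY_INVALID marks
an empty slot, and toy_remove() is a hypothetical name.

```c
#include <stddef.h>

#define TOY_INVALID (~0UL)

/*
 * Remove 'n' entries starting at index 'gfn' from a toy gfn->mfn table, but
 * only if every populated entry matches the expected MFN. All checks run
 * before any entry is modified, so a failure leaves the table untouched.
 * Returns 0 on success, -1 on a mismatch or an out-of-range request.
 */
static int toy_remove(unsigned long *table, size_t table_len,
                      size_t gfn, unsigned long mfn, size_t n)
{
    size_t i;

    if (gfn > table_len || n > table_len - gfn)
        return -1;

    /* Pass 1: a conditional where an assertion used to be; no writes. */
    for (i = 0; i < n; i++)
        if (table[gfn + i] != TOY_INVALID && table[gfn + i] != mfn + i)
            return -1;

    /* Pass 2: every entry has been validated, now modify. */
    for (i = 0; i < n; i++)
        table[gfn + i] = TOY_INVALID;

    return 0;
}
```

A single combined loop would have cleared the first entries before discovering
a mismatch further along, which is what the split avoids.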
5 years ago x86/p2m: don't ignore p2m_remove_page()'s return value
Jan Beulich [Fri, 3 Apr 2020 08:56:10 +0000 (10:56 +0200)]
x86/p2m: don't ignore p2m_remove_page()'s return value

It's not very nice to return from guest_physmap_add_entry() after
perhaps already having made some changes to the P2M, but this is pre-
existing practice in the function, and imo better than ignoring errors.

Take the liberty and replace an mfn_add() instance with a local variable
already holding the result (as proven by the check immediately ahead).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86emul: inherit HOSTCC when building 32-bit harness on 64-bit host
Jan Beulich [Fri, 3 Apr 2020 08:55:12 +0000 (10:55 +0200)]
x86emul: inherit HOSTCC when building 32-bit harness on 64-bit host

We're deliberately bringing XEN_COMPILE_ARCH and XEN_TARGET_ARCH out of
sync in this case, and hence HOSTCC won't get set from CC. Therefore
without this addition HOSTCC would not match a possible make command
line override of CC, but default to "gcc", likely causing the build to
fail for test_x86_emulator.c on systems with too old a gcc.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86emul: suppress "not built" warning for test harness'es run targets
Jan Beulich [Fri, 3 Apr 2020 08:48:39 +0000 (10:48 +0200)]
x86emul: suppress "not built" warning for test harness'es run targets

The run* targets can be used to test whatever the tool chain is capable
of building, as long as at least the main harness source file builds.
Don't probe the tool chain, in particular to avoid issuing the warning,
in this case. While looking into this I also noticed the wording of the
respective comment isn't quite right, which therefore gets altered at
the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago credit2: fix credit reset happening too few times
Dario Faggioli [Fri, 3 Apr 2020 08:46:53 +0000 (10:46 +0200)]
credit2: fix credit reset happening too few times

There is a bug in commit 5e4b4199667b9 ("xen: credit2: only reset
credit on reset condition"). In fact, the aim of that commit was to
make sure that we do not perform too many credit reset operations
(which are not super cheap, and in a hot path). But the check used
to determine whether a reset is necessary was the wrong one.

In fact, knowing just that some vCPUs have been skipped, while
traversing the runqueue (in runq_candidate()), is not enough. We
need to check explicitly whether the first vCPU in the runqueue
has a negative amount of credit.

Since a trace record is changed, this patch updates the xentrace format
file and xenalyze as well.

This should be backported.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
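
The corrected condition can be pictured as a check on the head of the runqueue
alone. The sketch below assumes the runqueue is kept sorted by credit with the
highest first; struct toy_unit and toy_needs_reset() are hypothetical names and
this is not the Credit2 code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runqueue entry; assumed sorted by credit, highest first. */
struct toy_unit {
    int credit;
};

/*
 * A reset is needed only when even the first (best) entry in the runqueue
 * has negative credit. Merely having skipped some entries while picking a
 * candidate says nothing about that.
 */
static bool toy_needs_reset(const struct toy_unit *runq, size_t len)
{
    return len > 0 && runq[0].credit < 0;
}
```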
5 years ago credit2: avoid vCPUs to ever reach lower credits than idle
Dario Faggioli [Fri, 3 Apr 2020 08:45:43 +0000 (10:45 +0200)]
credit2: avoid vCPUs to ever reach lower credits than idle

There have been reports of stalls of guest vCPUs, when Credit2 was used.
It seemed like these vCPUs were not getting scheduled for a very long
time, even under light load conditions (e.g., during dom0 boot).

Investigations led to the discovery that --although rarely-- it can
happen that a vCPU manages to run for very long timeslices. In Credit2,
this means that, when runtime accounting happens, the vCPU will lose a
large quantity of credits. This in turn may lead to the vCPU having fewer
credits than the idle vCPUs (-2^30). At this point, the scheduler will
pick the idle vCPU, instead of the ready to run vCPU, for a few
"epochs", which oftentimes is enough for the guest kernel to think the
vCPU is not responding and crash.

An example of this situation is shown here. In fact, we can see d0v1
sitting in the runqueue while all the CPUs are idle, as it has
-1254238270 credits, which is smaller than -2^30 = -1073741824:

    (XEN) Runqueue 0:
    (XEN)   ncpus              = 28
    (XEN)   cpus               = 0-27
    (XEN)   max_weight         = 256
    (XEN)   pick_bias          = 22
    (XEN)   instload           = 1
    (XEN)   aveload            = 293391 (~111%)
    (XEN)   idlers: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
    (XEN)   tickled: 00,00000000,00000000,00000000,00000000,00000000,00000000
    (XEN)   fully idle cores: 00,00000000,00000000,00000000,00000000,00000000,0fffffff
    [...]
    (XEN) Runqueue 0:
    (XEN) CPU[00] runq=0, sibling=00,..., core=00,...
    (XEN) CPU[01] runq=0, sibling=00,..., core=00,...
    [...]
    (XEN) CPU[26] runq=0, sibling=00,..., core=00,...
    (XEN) CPU[27] runq=0, sibling=00,..., core=00,...
    (XEN) RUNQ:
    (XEN)     0: [0.1] flags=0 cpu=5 credit=-1254238270 [w=256] load=262144 (~100%)

We certainly don't want, under any circumstance, this to happen.
Let's, therefore, define a minimum amount of credits a vCPU can have.
During accounting, we make sure that, for however long the vCPU has
run, it will never get to have less than such minimum amount of
credits. Then, we set the credits of the idle vCPU to an even
smaller value.

NOTE: investigations have been done about _how_ it is possible for a
vCPU to execute for so much time that its credits become so low. While
still not completely clear, there is evidence that:
- it only happens very rarely,
- it appears to be both machine and workload specific,
- it does not look to be a Credit2 issue (e.g., it happens when
  running with Credit1 as well), or a scheduler issue.

This patch makes Credit2 more robust to events like this, whatever
the cause is, and should hence be backported (as far as possible).

Reported-by: Glen <glenbarney@gmail.com>
Reported-by: Tomas Mozes <hydrapolic@gmail.com>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
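
The floor described in this commit amounts to a clamp during accounting, with
the idle credit placed strictly below it. The constants and the function name
below are hypothetical and chosen only for illustration; the commit message
does not state the actual values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values: any floor works as long as idle sits below it. */
#define TOY_CREDIT_MIN  (-(1 << 20))
#define TOY_IDLE_CREDIT (TOY_CREDIT_MIN - 1)

/*
 * Charge 'burned' credits for a run, never letting the result drop below
 * the floor, however long the vCPU ran. 'burned' must be non-negative.
 */
static int32_t toy_burn_credits(int32_t credit, int64_t burned)
{
    assert(burned >= 0);

    /* Compare before subtracting so the arithmetic cannot overflow. */
    if (burned >= (int64_t)credit - TOY_CREDIT_MIN)
        return TOY_CREDIT_MIN;

    return (int32_t)(credit - burned);
}
```

Because no runnable vCPU can go below TOY_CREDIT_MIN and the idle value is one
lower, a ready vCPU always compares as preferable to idle.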
5 years ago x86/ucode/amd: Rework parsing logic in cpu_request_microcode()
Andrew Cooper [Mon, 30 Mar 2020 18:39:08 +0000 (19:39 +0100)]
x86/ucode/amd: Rework parsing logic in cpu_request_microcode()

cpu_request_microcode() is still a confusing mess to follow, with sub
functions responsible for maintaining offset.  Rewrite it so all container
structure handling is in this one function.

Rewrite struct mpbhdr as struct container_equiv_table to aid parsing.  Drop
container_fast_forward() entirely, and shrink scan_equiv_cpu_table() to just
its searching/caching logic.

container_fast_forward() gets logically folded into the microcode blob
scanning loop, except that a skip path is inserted, which is conditional on
whether scan_equiv_cpu_table() thinks there is appropriate microcode to find.

With this change, we now scan to the end of all provided microcode containers,
and no longer give up at the first applicable one.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode/amd: Fold structures together
Andrew Cooper [Mon, 30 Mar 2020 16:58:29 +0000 (17:58 +0100)]
x86/ucode/amd: Fold structures together

With all the necessary cleanup now in place, fold struct microcode_header_amd
into struct microcode_patch and drop the struct microcode_amd temporary
ifdef-ary.

This removes the memory allocation of struct microcode_amd which is a single
pointer to a separately allocated object, and therefore a waste.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode/amd: Remove gratuitous memory allocations from cpu_request_microcode()
Andrew Cooper [Mon, 30 Mar 2020 17:50:25 +0000 (18:50 +0100)]
x86/ucode/amd: Remove gratuitous memory allocations from cpu_request_microcode()

Just as on the Intel side, there is no point having
get_ucode_from_buffer_amd() make $N memory allocations and free $N-1 of them.

Delete get_ucode_from_buffer_amd() and rewrite the loop in
cpu_request_microcode() to have 'saved' point into 'buf' until we finally
decide to duplicate that blob and return it to our caller.

Introduce a new struct container_microcode to simplify interpreting the
container format.  Doubly indent the logic to substantially reduce the churn
in a later change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
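
The idea of letting 'saved' point into 'buf' and duplicating only once can be
sketched as follows. The record header is hypothetical and is not the AMD
container format; toy_pick_blob() and its selection rule (newest revision for a
given CPU ID) are assumptions made for the example.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record header; not the real AMD container layout. */
struct toy_blob_hdr {
    uint32_t cpu_id;   /* which CPU the blob targets */
    uint32_t rev;      /* blob revision, higher is newer */
    uint32_t len;      /* payload bytes following the header */
};

/*
 * Scan every record in 'buf', remember the newest one for 'cpu_id' as a
 * pointer into 'buf', and allocate exactly once at the end. Returns a
 * malloc()ed copy of the chosen record (header plus payload), or NULL.
 */
static void *toy_pick_blob(const unsigned char *buf, size_t size,
                           uint32_t cpu_id, size_t *out_len)
{
    const unsigned char *saved = NULL;
    size_t saved_len = 0;
    uint32_t saved_rev = 0;
    void *copy;

    while (size >= sizeof(struct toy_blob_hdr)) {
        struct toy_blob_hdr h;
        size_t total;

        memcpy(&h, buf, sizeof(h));        /* avoid unaligned access */
        if (h.len > size - sizeof(h))
            break;                         /* truncated record: stop */
        total = sizeof(h) + h.len;

        if (h.cpu_id == cpu_id && (!saved || h.rev > saved_rev)) {
            saved = buf;                   /* just a pointer, no allocation */
            saved_len = total;
            saved_rev = h.rev;
        }

        buf += total;
        size -= total;
    }

    if (!saved)
        return NULL;

    copy = malloc(saved_len);              /* the only allocation */
    if (copy) {
        memcpy(copy, saved, saved_len);
        *out_len = saved_len;
    }
    return copy;
}
```

Compared with allocating a copy per record and freeing all but one, this makes
a single allocation regardless of how many records the buffer holds.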
5 years ago x86/ucode/amd: Rename bufsize to size in cpu_request_microcode()
Andrew Cooper [Mon, 30 Mar 2020 18:56:36 +0000 (19:56 +0100)]
x86/ucode/amd: Rename bufsize to size in cpu_request_microcode()

To simplify future cleanup, rename this variable.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode/amd: Alter API for microcode_fits()
Andrew Cooper [Mon, 30 Mar 2020 16:44:17 +0000 (17:44 +0100)]
x86/ucode/amd: Alter API for microcode_fits()

Although it is logically a step in the wrong direction overall, it simplifies
the rearranging of cpu_request_microcode() substantially for microcode_fits()
to take struct microcode_header_amd directly, and not require an intermediate
struct microcode_amd pointing at it.

Make this change (taking time to rename 'mc_amd' to its eventual 'patch' to
reduce the churn in the series), and a later cleanup will make it uniformly
take a struct microcode_patch.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode/amd: Move verify_patch_size() into get_ucode_from_buffer_amd()
Andrew Cooper [Mon, 30 Mar 2020 17:10:50 +0000 (18:10 +0100)]
x86/ucode/amd: Move verify_patch_size() into get_ucode_from_buffer_amd()

We only stash the microcode blob size so it can be audited in
microcode_fits().  However, the patch size check depends only on the CPU
family.

Move the check earlier to when we are parsing the container, which avoids
caching bad microcode in the first place, and allows us to avoid storing the
size at all.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/ucode/amd: Overhaul the equivalent cpu table handling completely
Andrew Cooper [Fri, 27 Mar 2020 16:48:16 +0000 (16:48 +0000)]
x86/ucode/amd: Overhaul the equivalent cpu table handling completely

We currently copy the entire equivalency table, and the single correct
microcode.  This is not safe in heterogeneous scenarios, and as Xen doesn't
support such situations to begin with, can be used to simplify things further.

The CPUID.1.EAX => processor_rev_id mapping is fixed for an individual part.
We can cache the single appropriate entry on first discovery, and forgo
duplicating the entire table.

Alter install_equiv_cpu_table() to be scan_equiv_cpu_table() which is
responsible for checking the equivalency table and caching appropriate
details.  It now has a check for finding a different mapping (which indicates
that one of the tables we've seen is definitely wrong).

A return value of -ESRCH is now used to signify "everything fine, but nothing
applicable for the current CPU", which is used to select the
container_fast_forward() path.

Drop the printk(), as each applicable error path in scan_equiv_cpu_table()
already prints diagnostics.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/amd: Collect CPUID.1.EAX in collect_cpu_info()
Andrew Cooper [Fri, 27 Mar 2020 13:20:12 +0000 (13:20 +0000)]
x86/ucode/amd: Collect CPUID.1.EAX in collect_cpu_info()

... rather than collecting it repeatedly in microcode_fits().  This brings the
behaviour in line with the Intel side.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/amd: Don't use void * for microcode_patch->mpb
Andrew Cooper [Fri, 27 Mar 2020 12:48:08 +0000 (12:48 +0000)]
x86/ucode/amd: Don't use void * for microcode_patch->mpb

All code works fine with it having its correct type, and it even allows us to
drop two casts in a printk().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/amd: Move check_final_patch_levels() to apply_microcode()
Andrew Cooper [Mon, 30 Mar 2020 12:14:01 +0000 (13:14 +0100)]
x86/ucode/amd: Move check_final_patch_levels() to apply_microcode()

The microcode revision of whichever CPU runs cpu_request_microcode() is not
necessarily applicable to other CPUs.

If the BIOS left us with asymmetric microcode, rejecting updates in
cpu_request_microcode() would prevent us levelling the system even if only up
to the final level.  Also, failing to cache microcode misses an opportunity to
get beyond the final level via the S3 path.

Move check_final_patch_levels() earlier and use it in apply_microcode().
Reword the error message to be more informative, and use -ENXIO as this corner
case has nothing to do with permissions.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/amd: Fix more potential buffer overruns with microcode parsing
Andrew Cooper [Fri, 27 Mar 2020 16:50:13 +0000 (16:50 +0000)]
x86/ucode/amd: Fix more potential buffer overruns with microcode parsing

cpu_request_microcode() doesn't know the buffer is at least 4 bytes long
before inspecting UCODE_MAGIC.

install_equiv_cpu_table() doesn't know the boundary of the buffer it is
interpreting as an equivalency table.  This case was clearly observed at one
point in the past, given the subsequent overrun detection, but without
comprehending that the damage was already done.

Make the logic consistent with container_fast_forward() and pass size_left in
to install_equiv_cpu_table().
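
As a rough illustration of the first point (not the actual Xen code), a
parser can check the remaining length before it reads the magic at all.
The names sketch_has_magic() and SKETCH_MAGIC are hypothetical and exist
only for this sketch:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical magic value for this sketch; not taken from the Xen source. */
#define SKETCH_MAGIC 0x00414d44u

/*
 * Return 1 if buf holds at least a 32-bit magic and that magic matches,
 * 0 otherwise.  The length is checked before any byte of buf is read.
 */
static int sketch_has_magic(const void *buf, size_t size)
{
    uint32_t magic;

    if ( size < sizeof(magic) )
        return 0;

    /* memcpy() avoids assuming that buf is suitably aligned. */
    memcpy(&magic, buf, sizeof(magic));

    return magic == SKETCH_MAGIC;
}
```

The same idea applies to the equivalency table: the callee is told how
many bytes are left, so it never has to guess where the buffer ends.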

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/intel: Fold structures together
Andrew Cooper [Fri, 20 Mar 2020 18:32:31 +0000 (18:32 +0000)]
x86/ucode/intel: Fold structures together

With all the necessary cleanup now in place, fold struct
microcode_header_intel into struct microcode_patch and drop the struct
microcode_intel temporary ifdef-ary.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/intel: Clean up microcode_sanity_check()
Andrew Cooper [Fri, 20 Mar 2020 17:41:39 +0000 (17:41 +0000)]
x86/ucode/intel: Clean up microcode_sanity_check()

Rewrite the size checks in a way which doesn't depend on Xen being compiled as
64bit.

Introduce a check missing from the old code, that total_size is a multiple of
1024 bytes, and drop unnecessary defines/macros/structures.
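
A minimal sketch of size checks written so that no intermediate sum can
overflow, whatever the width of size_t.  This is an illustration under
assumptions, not the code from the patch: sketch_sizes_ok() and
SKETCH_HDR_SIZE are hypothetical names, and the 48-byte header size is
assumed for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed header size for this sketch. */
#define SKETCH_HDR_SIZE 48u

/*
 * Return 1 if the sizes are mutually consistent, 0 otherwise.  Every
 * comparison is phrased as a subtraction from a value already known to be
 * large enough, so nothing is added together and nothing can wrap.
 */
static int sketch_sizes_ok(size_t buf_len, uint32_t data_size,
                           uint32_t total_size)
{
    if ( total_size % 1024 != 0 )       /* must be a multiple of 1024 */
        return 0;

    if ( total_size > buf_len )         /* must fit in the buffer */
        return 0;

    if ( total_size < SKETCH_HDR_SIZE ) /* must at least hold the header */
        return 0;

    if ( data_size > total_size - SKETCH_HDR_SIZE )
        return 0;

    return 1;
}
```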

No practical change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/intel: Clean up microcode_update_match()
Andrew Cooper [Fri, 20 Mar 2020 18:11:52 +0000 (18:11 +0000)]
x86/ucode/intel: Clean up microcode_update_match()

Implement a new get_ext_sigtable() helper to abstract the logic for
identifying whether an extended signature table exists.  As part of this,
rename microcode_intel.bits to data and change its type so it can be usefully
used in combination with the datasize header field.

Also, replace the sigmatch() macro with a static inline with a more useful
API, and an explanation of why it is safe to drop one of the previous
conditionals.

No practical change in behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/intel: Reimplement get_{data,total}size() helpers
Andrew Cooper [Thu, 19 Mar 2020 15:47:31 +0000 (15:47 +0000)]
x86/ucode/intel: Reimplement get_{data,total}size() helpers

Every caller actually passes a struct microcode_header_intel *, but it is more
helpful to us longterm to take struct microcode_patch *.  Implement the
helpers with proper types, and leave a comment explaining the Pentium Pro/II
behaviour with empty {data,total}size fields.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/intel: Remove gratuitous memory allocations from cpu_request_microcode()
Andrew Cooper [Fri, 20 Mar 2020 17:01:33 +0000 (17:01 +0000)]
x86/ucode/intel: Remove gratuitous memory allocations from cpu_request_microcode()

cpu_request_microcode() needs to scan its container and duplicate one blob,
but the get_next_ucode_from_buffer() helper duplicates every blob in turn.
Furthermore, the length checking is only safe from overflow in 64bit builds.

Delete get_next_ucode_from_buffer() and alter the purpose of the saved
variable to simply point somewhere in buf until we're ready to return.

This is only a modest reduction in absolute code size, but avoids making
memory allocations for every blob in the container.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/intel: Adjust microcode_sanity_check() to not take void *
Andrew Cooper [Fri, 20 Mar 2020 16:03:22 +0000 (16:03 +0000)]
x86/ucode/intel: Adjust microcode_sanity_check() to not take void *

microcode_sanity_check()'s callers actually call it with a mixture of
microcode_intel(/patch) and microcode_header_intel pointers, which is fragile.

Rework it to take struct microcode_patch *, which in turn requires
microcode_update_match()'s type to be altered.

No functional change - compiled binary is identical.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode: Remove unnecessary indirection in struct microcode_patch
Jan Beulich [Fri, 27 Mar 2020 00:29:55 +0000 (00:29 +0000)]
x86/ucode: Remove unnecessary indirection in struct microcode_patch

Currently, each cpu_request_microcode() allocates a struct microcode_patch,
which is a single pointer to a separate allocated structure.  This is
wasteful.

Fixing this is complicated because the common microcode_free_patch() code is
responsible for freeing struct microcode_patch, despite this being asymmetric
with how it is allocated.

Make struct microcode_patch fully opaque to the common logic.  This involves
moving the responsibility for freeing struct microcode_patch fully into the
free_patch() hook.

In each vendor logic, use some temporary ifdef-ary (cleaned up in subsequent
changes) to reduce the churn as much as possible, and forgo allocating the
intermediate pointer in cpu_request_microcode().

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/dom0: fix copy of low 1MB data for PVH
Roger Pau Monné [Wed, 1 Apr 2020 10:36:57 +0000 (12:36 +0200)]
x86/dom0: fix copy of low 1MB data for PVH

The order of start and end is inverted when calculating the size of
the copy operation.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86emul: support SYSRET
Jan Beulich [Wed, 1 Apr 2020 10:34:33 +0000 (12:34 +0200)]
x86emul: support SYSRET

This is to augment SYSCALL, which we've been supporting for quite some
time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: vendor specific SYSCALL behavior
Jan Beulich [Wed, 1 Apr 2020 10:32:17 +0000 (12:32 +0200)]
x86emul: vendor specific SYSCALL behavior

AMD CPUs permit the insn everywhere (even outside of protected mode),
while Intel ones restrict it to 64-bit mode. While at it also comment
about the apparently missing CPUID bit check.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/HVM: fix AMD ECS handling for Fam10
Jan Beulich [Wed, 1 Apr 2020 10:28:30 +0000 (12:28 +0200)]
x86/HVM: fix AMD ECS handling for Fam10

The involved comparison was, very likely inadvertently, converted from
>= to > when making changes unrelated to the actual family range.

Fixes: 9841eb71ea87 ("x86/cpuid: Drop a guests cached x86 family and model information")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agotools/libxc: misc: Mark const the parameter 'params' of xc_set_parameters()
Julien Grall [Mon, 30 Mar 2020 19:21:53 +0000 (20:21 +0100)]
tools/libxc: misc: Mark const the parameter 'params' of xc_set_parameters()

The parameter 'params' of xc_set_parameters() should never be modified.
So mark it as const.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agotools/libxc: misc: Mark const the parameter 'keys' of xc_send_debug_keys()
Julien Grall [Mon, 30 Mar 2020 19:21:52 +0000 (20:21 +0100)]
tools/libxc: misc: Mark const the parameter 'keys' of xc_send_debug_keys()

OCaml is using a string to describe the parameter 'keys' of
xc_send_debug_keys(). Since OCaml 4.06.01, String_val() will return a
const char * when using -safe-string. This will result in a build
failure because xc_send_debug_keys() expects a char *.

The function should never modify the parameter 'keys' and therefore the
parameter should be const. Unfortunately, this is not directly possible
because DECLARE_HYPERCALL_BOUNCE() is expecting a non-const variable.

A new macro DECLARE_HYPERCALL_BOUNCE_IN() is introduced and will take
care of const parameters. The first user will be xc_send_debug_keys() but
this can be used in more places in the future.

Reported-by: Dario Faggioli <dfaggioli@suse.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoxen/public: sysctl: set_parameter.params and debug.keys should be const
Julien Grall [Mon, 30 Mar 2020 19:21:51 +0000 (20:21 +0100)]
xen/public: sysctl: set_parameter.params and debug.keys should be const

The fields set_parameter.params and debug.keys should never be modified
by the hypervisor. So mark them as const.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agobuild,arm: Fix deps check of head.o
Anthony PERARD [Tue, 31 Mar 2020 10:30:47 +0000 (11:30 +0100)]
build,arm: Fix deps check of head.o

arm*/head.o isn't in obj-y or extra-y, so make doesn't load the
associated .*.d file (or .*.cmd file when if_changed will be used).
There is a workaround where .*.d file is added manually into DEPS.

Changing DEPS isn't needed, we can simply add head.o into extra-y and
the dependency files will be loaded.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agoxen/arm: Configure early printk via Kconfig
Anthony PERARD [Tue, 31 Mar 2020 10:30:46 +0000 (11:30 +0100)]
xen/arm: Configure early printk via Kconfig

At the moment, early printk can only be configured on the make command
line. It is not very handy because a user has to remove the option
every time they run a command other than compiling the
hypervisor.

Furthermore, early printk is one of the few odd ones that are not
using Kconfig.

So it is about time to move it to Kconfig.

The new Kconfig options allow a user to either select a UART driver
to use at boot time, and set the parameters, or it is still possible
to select a platform which will set the parameters.

If CONFIG_EARLY_PRINTK is present in the environment or on the make
command line, make will return an error.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years agoxen/arm: Rename all early printk macro
Anthony PERARD [Tue, 31 Mar 2020 10:30:45 +0000 (11:30 +0100)]
xen/arm: Rename all early printk macro

We are going to move the generation of the early printk macros into
Kconfig. This means all macros will be prefixed with CONFIG_. We do that
ahead of the change.

We also take the opportunity to better name some variables, which are
used by only one driver and wouldn't make sense for other UART drivers.
Thus,
    - EARLY_UART_REG_SHIFT became CONFIG_EARLY_UART_8250_REG_SHIFT
    - EARLY_PRINTK_VERSION_* became CONFIG_EARLY_UART_SCIF_VERSION_*

The other variables are changed to have the prefix CONFIG_EARLY_UART_
when they change a parameter of the driver. So we now have:
    - CONFIG_EARLY_UART_BAUD_RATE
    - CONFIG_EARLY_UART_BASE_ADDRESS
    - CONFIG_EARLY_UART_INIT

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agox86: compress lines for immediate return
Simran Singhal [Tue, 31 Mar 2020 06:51:21 +0000 (08:51 +0200)]
x86: compress lines for immediate return

Compress two lines into a single line if an immediate return statement is found.
It also removes the variables retval, freq, effective, vector, ovf and now
as they are no longer needed.

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86: remove unnecessary cast on void pointer
Simran Singhal [Tue, 31 Mar 2020 06:50:25 +0000 (08:50 +0200)]
x86: remove unnecessary cast on void pointer

Assignment to a typed pointer is sufficient in C.
No cast is needed.

Also, changed some u64/u32 to uint64_t/uint32_t.

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoSVM: split _np_enable VMCB field
Jan Beulich [Tue, 31 Mar 2020 06:46:44 +0000 (08:46 +0200)]
SVM: split _np_enable VMCB field

The nested paging enable is actually just a single bit within the 64-bit
VMCB field, which is particularly relevant for uses like the one in
nsvm_vcpu_vmentry(). Split the field, adding definitions for a few other
bits at the same time. To be able to generate accessors for bitfields,
VMCB_ACCESSORS() needs the type part broken out, as typeof() can't be
applied to bitfields. Unfortunately this means specification of the same
type in two distinct places.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agodocs/README: Fix a broken url
Ian Jackson [Mon, 30 Mar 2020 13:52:12 +0000 (14:52 +0100)]
docs/README: Fix a broken url

There was a / missing here.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years agodocs etc.: https: Fix references to other Xen pages
Ian Jackson [Mon, 30 Mar 2020 13:51:51 +0000 (14:51 +0100)]
docs etc.: https: Fix references to other Xen pages

Change the url scheme to https.  This is all in-tree references to
xenbits and the main website except for those in Config.mk.

We leave Config.mk alone for now because those urls are used by CI
systems and we need to check that nothing breaks when we change the
download method.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years agodocs etc.: https: Fix references to wiki.xen[project].org
Ian Jackson [Mon, 30 Mar 2020 13:43:06 +0000 (14:43 +0100)]
docs etc.: https: Fix references to wiki.xen[project].org

Change the url scheme to https.  This is all in-tree references to the
Xen wiki.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
5 years agoscripts: Use stat to check lock claim
Jason Andryuk [Thu, 12 Mar 2020 14:54:17 +0000 (10:54 -0400)]
scripts: Use stat to check lock claim

Replace the perl locking check with stat(1).  Stat is able to fstat
stdin (file descriptor 0) when passed '-' as an argument.  This is now
used to check $_lockfd.  stat(1) support for '-' was introduced to
coreutils in 2009.

After A releases its lock, script B will return from flock and execute
stat.  Since the lockfile has been removed by A, stat prints an error to
stderr and exits non-zero.  Redirect stderr to /dev/null to avoid
filling /var/log/xen/xen-hotplug.log with "No such file or directory"
messages.

Placing the stat call inside the "if" condition ensures we only check
the stat output when the command completed successfully.

This change removes the only runtime dependency of the xen toolstack on
perl.

Suggested-by: Ian Jackson <ian.jackson@citrix.com>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoxen/x86: Remove parentheses from return arguments
Simran Singhal [Sun, 29 Mar 2020 06:37:47 +0000 (12:07 +0530)]
xen/x86: Remove parentheses from return arguments

This patch removes unnecessary parentheses from return arguments.

Signed-off-by: Simran Singhal <singhalsimran0@gmail.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agotools/python: mismatch between pyxc_methods flags and PyObject definitions
YOUNG, MICHAEL A [Tue, 17 Mar 2020 23:01:43 +0000 (23:01 +0000)]
tools/python: mismatch between pyxc_methods flags and PyObject definitions

pygrub in xen-4.13.0 with python 3.8.2 fails with the error

Traceback (most recent call last):
  File "/usr/libexec/xen/bin/pygrub", line 21, in <module>
    import xen.lowlevel.xc
SystemError: bad call flags

This patch fixes mismatches in tools/python/xen/lowlevel/xc/xc.c
between the flag bits defined in pyxc_methods and the parameters passed
to the corresponding PyObject definitions.

With this patch applied pygrub works as expected.

Signed-off-by: Michael Young <m.a.young@durham.ac.uk>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
5 years agodocs/designs: Add a design document for migration of xenstore data
Paul Durrant [Fri, 27 Mar 2020 13:46:10 +0000 (13:46 +0000)]
docs/designs: Add a design document for migration of xenstore data

This patch details proposed extra migration data and xenstore protocol
extensions to support non-cooperative live migration of guests.

NOTE: doc/misc/xenstore.txt is also amended to replace the <mfn> term
      for the INTRODUCE operation with the <gfn>, since this is what
      it actually is.

Signed-off-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agodocs/designs: Add a design document for non-cooperative live migration
Paul Durrant [Fri, 27 Mar 2020 13:46:09 +0000 (13:46 +0000)]
docs/designs: Add a design document for non-cooperative live migration

It has become apparent to some large cloud providers that the current
model of cooperative migration of guests under Xen is not usable as it
relies on software running inside the guest, which is likely beyond the
provider's control.
This patch introduces a proposal for non-cooperative live migration,
designed not to rely on any guest-side software.

Signed-off-by: Paul Durrant <paul@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agoautomation/gitlab: add https transport support to Debian images
Roger Pau Monne [Fri, 27 Mar 2020 11:49:47 +0000 (12:49 +0100)]
automation/gitlab: add https transport support to Debian images

The LLVM repos have switched from http to https, and trying to access
using http will get redirected to https. Add the apt-transport-https
package to the x86 Debian containers that use the LLVM repos, in order
to support the https transport method.

Note that on Arm we only test with gcc, so don't add the package for
the Debian Arm container.

This fixes the following error seen on the QEMU smoke tests:

E: The method driver /usr/lib/apt/methods/https could not be found.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/nvmx: update exit bitmap when using virtual interrupt delivery
Roger Pau Monne [Fri, 27 Mar 2020 12:45:59 +0000 (13:45 +0100)]
x86/nvmx: update exit bitmap when using virtual interrupt delivery

Force an update of the EOI exit bitmap in nvmx_update_apicv, because
the one performed in vmx_intr_assist might not be reached if the
interrupt is intercepted by nvmx_intr_intercept returning true.

Extract the code to update the exit bitmap from vmx_intr_assist into a
helper and use it in nvmx_update_apicv.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/nvmx: split updating RVI from SVI in nvmx_update_apicv
Roger Pau Monne [Fri, 27 Mar 2020 12:45:58 +0000 (13:45 +0100)]
x86/nvmx: split updating RVI from SVI in nvmx_update_apicv

Updating SVI is required when an interrupt has been injected using the
Ack on exit VMEXIT feature, so that the in service interrupt in the
GUEST_INTR_STATUS matches the vector that is signaled in
VM_EXIT_INTR_INFO.

Updating RVI however is not tied to the Ack on exit feature, as it
signals the next vector to be injected, and hence should always be
updated to the next pending vector, regardless of whether Ack on exit
is enabled.

When not using the Ack on exit feature preserve the previous vector in
SVI, so that it's not lost when RVI is updated to contain the pending
vector to inject.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/ucode: Drop the sanity check for interrupts being disabled
Andrew Cooper [Fri, 27 Mar 2020 12:02:09 +0000 (12:02 +0000)]
x86/ucode: Drop the sanity check for interrupts being disabled

Of the substantial number of things which can go wrong during microcode load,
this is not one.  Loading occurs entirely within the boundary of a single
WRMSR instruction.  It's certainly not a BUG()-worthy condition.

Xen has legitimate reasons to not want interrupts enabled at this point, but
that is to do with organising the system rendezvous.  As these are private low
level helpers invoked only from the microcode core logic, forgo the check
entirely.

While dropping system.h, clean up the processor.h include which was an
oversight in the previous header cleanup.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ucode/amd: Fix potential buffer overrun with equiv table handling
Andrew Cooper [Fri, 27 Mar 2020 11:59:02 +0000 (11:59 +0000)]
x86/ucode/amd: Fix potential buffer overrun with equiv table handling

find_equiv_cpu_id() loops until it finds a 0 installed_cpu entry.  Well formed
AMD microcode containers have this property.

Extend the checking in install_equiv_cpu_table() to reject tables which don't
have a sentinel at the end.
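
The kind of check described above can be sketched as follows.  This is
only an illustration: struct sketch_equiv_entry and
sketch_table_terminated() are hypothetical, and the real table layout has
more fields than the two shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical table entry for this sketch. */
struct sketch_equiv_entry {
    uint32_t installed_cpu;
    uint16_t equiv_cpu;
};

/*
 * Return 1 if the table is a whole number of entries, has at least one
 * entry, and its final entry is the all-zero installed_cpu sentinel.
 * A scan that stops at the first zero installed_cpu is then guaranteed
 * to stay within the table.
 */
static int sketch_table_terminated(const struct sketch_equiv_entry *tbl,
                                   size_t bytes)
{
    size_t nr = bytes / sizeof(*tbl);

    if ( nr == 0 || bytes % sizeof(*tbl) != 0 )
        return 0;

    return tbl[nr - 1].installed_cpu == 0;
}
```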

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen: Introduce a xmemdup_bytes() helper
Andrew Cooper [Fri, 20 Mar 2020 20:53:58 +0000 (20:53 +0000)]
xen: Introduce a xmemdup_bytes() helper

Use it to simplify the x86 microcode logic, taking the opportunity to drop the
-ENOMEM printks.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agosoftirq: adjust comment placement
Juergen Gross [Fri, 27 Mar 2020 10:44:09 +0000 (11:44 +0100)]
softirq: adjust comment placement

With commit cef21210fb133 ("rcu: don't process callbacks when holding
a rcu_read_lock()") the comment in process_pending_softirqs() about
not entering the scheduler should have been moved.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agolibx86/CPUID: fix (not just) leaf 7 processing
Jan Beulich [Fri, 27 Mar 2020 10:40:59 +0000 (11:40 +0100)]
libx86/CPUID: fix (not just) leaf 7 processing

For one, subleaves within the respective union shouldn't live in
separate sub-structures. And then x86_cpuid_policy_fill_native() should,
as it did originally, iterate over all subleaves here as well as over
all main leaves. Switch to using a "<= MIN()"-based approach similar to
that used in x86_cpuid_copy_to_buffer(). Also follow this for the
extended main leaves then.
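
For illustration, a "<= MIN()" bounded loop looks roughly like the sketch
below.  struct sketch_policy and sketch_fill() are hypothetical stand-ins;
the point is only that the loop visits every leaf up to and including
max_leaf, while never indexing past the end of the array.

```c
#include <stddef.h>
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical policy object with room for four leaves. */
struct sketch_policy {
    uint32_t leaf[4];
};

/*
 * Fill leaves 0..max_leaf inclusive, clamped to the array bounds.
 * Returns the number of leaves written.
 */
static unsigned int sketch_fill(struct sketch_policy *p, uint32_t max_leaf)
{
    unsigned int i, n = 0;

    for ( i = 0; i <= MIN(max_leaf, ARRAY_SIZE(p->leaf) - 1); ++i )
    {
        p->leaf[i] = i; /* placeholder for reading the real leaf */
        ++n;
    }

    return n;
}
```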

Fixes: 1bd2b750537b ("libx86: Fix 32bit stubdom build of x86_cpuid_policy_fill_native()")
Fixes: 97e4ebdcd765 ("x86/CPUID: support leaf 7 subleaf 1 / AVX512_BF16")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen: x86: make init_intel_cacheinfo() void
Dario Faggioli [Thu, 26 Mar 2020 17:17:32 +0000 (18:17 +0100)]
xen: x86: make init_intel_cacheinfo() void

It seems that we took this code from Linux, back when the function was
'unsigned int' and the return value was used.

But we are currently not doing anything with such value, so let's get
rid of it and make the function void. As an anecdote, that's pretty much
the same that happened in Linux as, since commit 807e9bc8e2fe6 ("x86/CPU:
Move cpu_detect_cache_sizes() into init_intel_cacheinfo()") the function
is void there too.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoSVM: Add union intstat_t for offset 68h in vmcb struct
Pu Wen [Thu, 26 Mar 2020 13:44:30 +0000 (21:44 +0800)]
SVM: Add union intstat_t for offset 68h in vmcb struct

According to chapter "Appendix B Layout of VMCB" in the new version
(v3.32) AMD64 APM[1], bit 1 of the VMCB offset 68h is defined as
GUEST_INTERRUPT_MASK.

In the current Xen code, the whole u64 interrupt_shadow field is used to
set up the interrupt shadow, which misuses the other bits in VMCB offset 68h
as part of interrupt_shadow, causing svm_get_interrupt_shadow() to
mistake a guest having interrupts enabled for one being in an interrupt
shadow.  This has been observed to cause SeaBIOS to hang on boot.

Add union intstat_t for VMCB offset 68h and fix the code to only use
bit 0 as intr_shadow according to the new APM description.
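
The difference between the two readings of offset 68h can be sketched as
below.  The type and function names are hypothetical, and the bit-field
layout shown assumes GCC or Clang on a little-endian target (bit-field
ordering is implementation-defined in C).

```c
#include <stdint.h>

/* Hypothetical stand-in for the union described above. */
typedef union {
    uint64_t raw;
    struct {
        uint64_t intr_shadow:     1; /* bit 0 */
        uint64_t guest_intr_mask: 1; /* bit 1 */
        uint64_t resv:           62;
    };
} sketch_intstat_t;

/* Old reading: any non-zero bit in the field counts as "in a shadow". */
static int sketch_in_shadow_old(uint64_t raw)
{
    return raw != 0;
}

/* New reading: only bit 0 is the interrupt shadow. */
static int sketch_in_shadow_new(uint64_t raw)
{
    sketch_intstat_t s = { .raw = raw };

    return s.intr_shadow;
}
```

With only bit 1 set (raw == 2), the old reading reports an interrupt
shadow while the new one does not.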

Reference:
[1] https://www.amd.com/system/files/TechDocs/24593.pdf

Signed-off-by: Pu Wen <puwen@hygon.cn>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/ucode: Document the behaviour of the microcode_ops hooks
Andrew Cooper [Fri, 20 Mar 2020 15:37:28 +0000 (15:37 +0000)]
x86/ucode: Document the behaviour of the microcode_ops hooks

... and struct cpu_signature for good measure.

No comment is passed on the suitability of the behaviour...

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen: Drop raw_smp_processor_id()
Andrew Cooper [Thu, 19 Mar 2020 18:29:06 +0000 (18:29 +0000)]
xen: Drop raw_smp_processor_id()

There is only a single user of raw_smp_processor_id() left in the tree (and it
is unconditionally compiled out).  Drop the alias from all architectures.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years agox86/ucode: Fix error paths in apply_microcode()
Andrew Cooper [Fri, 20 Mar 2020 20:03:32 +0000 (20:03 +0000)]
x86/ucode: Fix error paths in apply_microcode()

In the unlikely case that patch application completes, but the resulting
revision isn't expected, sig->rev doesn't get updated to match reality.

It will get adjusted the next time collect_cpu_info() gets called, but in the
meantime Xen might operate on a stale value.  Nothing good will come of this.

Rewrite the logic to always update the stashed revision, before worrying about
whether the attempt was a success or failure.
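
The reordering can be sketched as below.  This is a simplified
illustration rather than the real hook: struct sketch_sig and
sketch_apply() are hypothetical, and hw_rev_after stands in for the
revision read back from the CPU after the load attempt.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical cached signature for this sketch. */
struct sketch_sig {
    uint32_t rev;
};

/*
 * Record what the hardware actually reports first, and only then decide
 * whether the attempt succeeded.  The cached revision is therefore never
 * stale, even on the error path.
 */
static int sketch_apply(struct sketch_sig *sig, uint32_t expected,
                        uint32_t hw_rev_after)
{
    sig->rev = hw_rev_after;

    if ( hw_rev_after != expected )
        return -EIO;

    return 0;
}
```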

Take the opportunity to make the printk() messages as consistent as possible.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years agox86/ucode/amd: Fix assertion in compare_patch()
Andrew Cooper [Thu, 19 Mar 2020 15:55:26 +0000 (15:55 +0000)]
x86/ucode/amd: Fix assertion in compare_patch()

This is clearly a typo.

Fixes: 9da23943ccd "microcode: introduce a global cache of ucode patch"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years agocpu: sync any remaining RCU callbacks before CPU up/down
Igor Druzhinin [Thu, 26 Mar 2020 11:49:42 +0000 (12:49 +0100)]
cpu: sync any remaining RCU callbacks before CPU up/down

During CPU down operation RCU callbacks are scheduled to finish
off some actions later as soon as CPU is fully dead (the same applies
to CPU up operation in case error path is taken). If in the same grace
period another CPU up operation is performed on the same CPU, RCU callback
will be called later on a CPU in a potentially wrong (already up again
instead of still being down) state leading to eventual state inconsistency
and/or crash.

In order to avoid it - flush RCU callbacks explicitly before starting the
next CPU up/down operation.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agorcu: add assertions to debug build
Juergen Gross [Thu, 26 Mar 2020 11:46:48 +0000 (12:46 +0100)]
rcu: add assertions to debug build

Xen's RCU implementation relies on no softirq handling taking place
while being in a RCU critical section. Add ASSERT()s in debug builds
in order to catch any violations.

For that purpose modify rcu_read_[un]lock() to use a dedicated percpu
counter in addition to preempt_[en|dis]able(), as this makes it possible
to test that condition in __do_softirq() (ASSERT_NOT_IN_ATOMIC() is not
usable there due to __cpu_up() calling process_pending_softirqs()
while holding the cpu hotplug lock).

While at it switch the rcu_read_[un]lock() implementation to static
inline functions instead of macros.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agorcu: don't process callbacks when holding a rcu_read_lock()
Juergen Gross [Thu, 26 Mar 2020 11:46:11 +0000 (12:46 +0100)]
rcu: don't process callbacks when holding a rcu_read_lock()

Some keyhandlers are calling process_pending_softirqs() while holding
a rcu_read_lock(). This is wrong, as process_pending_softirqs() might
activate rcu calls which should not happen inside a rcu_read_lock().

For that purpose modify process_pending_softirqs() to not allow rcu
callback processing when a rcu_read_lock() is being held.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agorcu: don't use stop_machine_run() for rcu_barrier()
Juergen Gross [Thu, 26 Mar 2020 11:43:23 +0000 (12:43 +0100)]
rcu: don't use stop_machine_run() for rcu_barrier()

Today rcu_barrier() calls stop_machine_run() to synchronize all
physical cpus in order to ensure all pending rcu calls have finished
when returning.

As stop_machine_run() uses tasklets, this requires scheduling of idle
vcpus on all cpus. With core scheduling active, rcu_barrier() must
therefore be called on idle cpus only, as otherwise a scheduling
deadlock would occur.

There is no need at all to do the syncing of the cpus in tasklets, as
rcu activity is started in __do_softirq(), which is called whenever
softirq activity is allowed. So rcu_barrier() can easily be modified to
use softirq for synchronization of the cpus, no longer requiring any
scheduling activity.

As there already is an rcu softirq, reuse that for the synchronization.

Remove the barrier element from struct rcu_data as it isn't used.

Finally, switch rcu_barrier() to return void, as it can now never fail.
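
The softirq-based approach can be approximated by a counter that every
cpu decrements from its rcu softirq once its earlier rcu calls have
run. The following is a simplified, single-threaded model: the bar_*
names, the fixed cpu count and the loop standing in for each cpu
reaching __do_softirq() are all assumptions and not the code from this
patch.

```c
#include <stdatomic.h>

/* Hypothetical model of a softirq-driven barrier; not Xen's code. */
#define BAR_NR_CPUS 4

static atomic_int bar_cpu_count;              /* cpus still to check in */
static int bar_pending_calls[BAR_NR_CPUS];    /* rcu calls queued per cpu */
static int bar_softirq_raised[BAR_NR_CPUS];
static int bar_completed_calls;

void bar_queue_call(unsigned int cpu)
{
    bar_pending_calls[cpu]++;
}

/* Models the rcu softirq handler running on one cpu. */
static void bar_rcu_softirq(unsigned int cpu)
{
    if (!bar_softirq_raised[cpu])
        return;
    bar_softirq_raised[cpu] = 0;

    /* Earlier calls run first, then this cpu checks in at the barrier. */
    bar_completed_calls += bar_pending_calls[cpu];
    bar_pending_calls[cpu] = 0;
    atomic_fetch_sub(&bar_cpu_count, 1);
}

/* Returns nothing: in this model there is no failure path. */
void bar_rcu_barrier(void)
{
    atomic_store(&bar_cpu_count, BAR_NR_CPUS);
    for (unsigned int cpu = 0; cpu < BAR_NR_CPUS; cpu++)
        bar_softirq_raised[cpu] = 1;

    /* Each pass stands in for every cpu reaching its softirq handling. */
    while (atomic_load(&bar_cpu_count) > 0)
        for (unsigned int cpu = 0; cpu < BAR_NR_CPUS; cpu++)
            bar_rcu_softirq(cpu);
}

int bar_completed(void)   { return bar_completed_calls; }
int bar_outstanding(void) { return atomic_load(&bar_cpu_count); }
```

No tasklet and no scheduling decision appears anywhere in the model,
which is the property the commit message relies on.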

Partially-based-on-patch-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoatomics: introduce smp_mb__[after|before]_atomic() barriers
Juergen Gross [Thu, 26 Mar 2020 11:42:19 +0000 (12:42 +0100)]
atomics: introduce smp_mb__[after|before]_atomic() barriers

When using atomic variables for synchronization, barriers are needed
to ensure proper data serialization. Introduce smp_mb__before_atomic()
and smp_mb__after_atomic() as in the Linux kernel for that purpose.

Use the same definitions as in the Linux kernel.
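
The intended usage pattern can be sketched with C11 atomics. The macros
below are portable stand-ins that always issue a full fence; they are
an assumption for illustration and are not the definitions added by
this patch. The mb_* names are likewise hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins: always a full fence, for illustration only. */
#define mb_before_atomic() atomic_thread_fence(memory_order_seq_cst)
#define mb_after_atomic()  atomic_thread_fence(memory_order_seq_cst)

static int mb_payload;
static atomic_int mb_ready;

/* Writer: make the payload visible before the counter update. */
void mb_publish(int value)
{
    mb_payload = value;
    mb_before_atomic();  /* order the payload store before the atomic op */
    atomic_fetch_add_explicit(&mb_ready, 1, memory_order_relaxed);
    mb_after_atomic();   /* order the atomic op before later accesses */
}

/* Reader: returns 1 and stores the payload in *out once it is published. */
int mb_consume(int *out)
{
    if (atomic_load_explicit(&mb_ready, memory_order_relaxed) == 0)
        return 0;
    atomic_thread_fence(memory_order_seq_cst);  /* pairs with the writer */
    *out = mb_payload;
    return 1;
}
```

The atomic operation itself is relaxed here, so all ordering comes from
the explicit barriers placed around it.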

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agox86emul: support AVX512_BF16 insns
Jan Beulich [Thu, 26 Mar 2020 11:39:08 +0000 (12:39 +0100)]
x86emul: support AVX512_BF16 insns

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>