xenbits.xensource.com Git - people/sstabellini/xen-unstable.git/.git/log
5 years ago xen/arm: add reserved-memory regions to the dom0 memory node reserved-mem-7
Stefano Stabellini [Mon, 19 Aug 2019 17:02:59 +0000 (10:02 -0700)]
xen/arm: add reserved-memory regions to the dom0 memory node

Reserved memory regions are automatically remapped to dom0. Their device
tree nodes are also added to dom0 device tree. However, the dom0 memory
node is not currently extended to cover the reserved memory regions
ranges as required by the spec.  This commit fixes it.

Change make_memory_node to take a struct meminfo * instead of a
kernel_info. Call it twice for dom0, once to create the first regular
memory node, and the second time to create a second memory node with the
ranges covering reserved-memory regions.
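
The double call can be pictured with a toy model. The sketch below is not the Xen code: `struct memory_node`, `make_memory_node_sketch` and the value of `NR_MEM_BANKS` are hypothetical stand-ins, and the "node" is just a flat list of (start, size) pairs rather than a real device tree node.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MEM_BANKS 8   /* hypothetical limit for this sketch */

struct membank { uint64_t start, size; };
struct meminfo { unsigned int nr_banks; struct membank bank[NR_MEM_BANKS]; };

/* Hypothetical stand-in for a memory node: "reg" as (start, size) pairs. */
struct memory_node { unsigned int nr_cells; uint64_t reg[2 * NR_MEM_BANKS]; };

/* Describe every bank of 'mem' in 'node'. Returns -1 if there is nothing to describe. */
static int make_memory_node_sketch(const struct meminfo *mem,
                                   struct memory_node *node)
{
    unsigned int i;

    assert(mem->nr_banks <= NR_MEM_BANKS);

    if ( mem->nr_banks == 0 )
        return -1;

    node->nr_cells = 0;
    for ( i = 0; i < mem->nr_banks; i++ )
    {
        node->reg[node->nr_cells++] = mem->bank[i].start;
        node->reg[node->nr_cells++] = mem->bank[i].size;
    }

    return 0;
}
```

Because the function only sees a `struct meminfo *`, the same code can be called once with the regular memory banks and once with the reserved-memory banks.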

Also, make a small code style fix in make_memory_node.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v5:
- add acked-by

Changes in v4:
- pass struct meminfo * to make_memory_node
- call make_memory_node twice for dom0, once for normal memory, once for
  reserved-memory regions

5 years ago xen/arm: don't iomem_permit_access for reserved-memory regions
Stefano Stabellini [Mon, 19 Aug 2019 17:02:59 +0000 (10:02 -0700)]
xen/arm: don't iomem_permit_access for reserved-memory regions

Don't allow reserved-memory regions to be remapped into any unprivileged
guests, until reserved-memory regions are properly supported in Xen. For
now, do not call iomem_permit_access on them, because giving
iomem_permit_access to dom0 means that the toolstack will be able to
assign the region to a domU.
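
As a rough illustration of the path check mentioned in the changelog (comparing against "/reserved-memory/"), the sketch below uses POSIX `strncasecmp` in place of Xen's `strnicmp`. The helper name `is_reserved_memory_child` is hypothetical and the actual check in Xen may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX), standing in for Xen's strnicmp */

/* True if 'path' names a node below /reserved-memory (case-insensitive). */
static bool is_reserved_memory_child(const char *path)
{
    static const char prefix[] = "/reserved-memory/";

    assert(path != NULL);

    return strncasecmp(path, prefix, strlen(prefix)) == 0;
}
```

A caller would skip the access grant for any node where this returns true. Note that the trailing slash in the prefix means the /reserved-memory node itself does not match, only its children.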

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---

Changes in v7:
- update in-code comment

Changes in v6:
- compare against "/reserved-memory/"

Changes in v5:
- fix check condition
- use strnicmp
- return error
- improve commit message

Changes in v4:
- compare the parent name with reserved-memory
- use dt_node_cmp

Changes in v3:
- new patch

5 years ago xen/arm: handle reserved-memory in consider_modules and dt_unreserved_regions
Stefano Stabellini [Mon, 19 Aug 2019 17:02:59 +0000 (10:02 -0700)]
xen/arm: handle reserved-memory in consider_modules and dt_unreserved_regions

reserved-memory regions overlap with memory nodes. The overlapping
memory is reserved-memory and should be handled accordingly:
consider_modules and dt_unreserved_regions should skip these regions the
same way they are already skipping mem-reserve regions.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
---

Changes in v4:
- code style
- add acked-by

Changes in v3:
- coding style
- in-code comments

Changes in v2:
- fix commit message: full overlap
- remove check_reserved_memory
- extend consider_modules and dt_unreserved_regions

5 years ago xen/arm: early_print_info print reserved_mem
Stefano Stabellini [Mon, 19 Aug 2019 17:02:59 +0000 (10:02 -0700)]
xen/arm: early_print_info print reserved_mem

Improve early_print_info to also print the banks saved in
bootinfo.reserved_mem. Print them right after RESVD, increasing the same
index.

Since we are at it, also switch the existing RESVD print to use unsigned
int.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v6:
- add acked-by and reviewed-by
- fix indentation

Changes in v5:
- switch to unsigned

Changes in v4:
- new patch

5 years ago xen/arm: fix indentation in early_print_info
Stefano Stabellini [Mon, 19 Aug 2019 17:02:59 +0000 (10:02 -0700)]
xen/arm: fix indentation in early_print_info

No functional changes.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm: keep track of reserved-memory regions
Stefano Stabellini [Mon, 19 Aug 2019 17:02:59 +0000 (10:02 -0700)]
xen/arm: keep track of reserved-memory regions

As we parse the device tree in Xen, keep track of the reserved-memory
regions as they need special treatment (follow-up patches will make use
of the stored information.)

Reuse process_memory_node to add reserved-memory regions to the
bootinfo.reserved_mem array.

Refuse to continue once we reach the max number of reserved memory
regions to avoid accidentally mapping any portions of them into a VM.
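
The "refuse to continue" behaviour can be sketched as a fixed-size array that reports an error once it is full, so the caller can stop instead of silently dropping a region. Everything here is a simplified assumption: `add_bank`, the small `NR_MEM_BANKS` and the choice of `-ENOSPC` are illustrative and not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_MEM_BANKS 4   /* hypothetical, deliberately small */

struct membank { uint64_t start, size; };
struct meminfo { unsigned int nr_banks; struct membank bank[NR_MEM_BANKS]; };

/* Record one region. Fails on an empty region or when the array is full. */
static int add_bank(struct meminfo *mem, uint64_t start, uint64_t size)
{
    assert(mem != NULL);

    if ( size == 0 )
        return -EINVAL;

    if ( mem->nr_banks >= NR_MEM_BANKS )
        return -ENOSPC;   /* assumed error code: caller must not continue */

    mem->bank[mem->nr_banks].start = start;
    mem->bank[mem->nr_banks].size = size;
    mem->nr_banks++;

    return 0;
}
```

A caller handling reserved memory would treat any non-zero return as fatal, since an unrecorded region could otherwise be handed out as ordinary memory.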

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
---
Changes in v6:
- use dt_node_cmp
- add acked-by

Changes in v5:
- remove unneeded cast
- remove unneeded strlen check
- don't pass address_cells, size_cells, depth to device_tree_for_each_node

Changes in v4:
- depth + 1 in process_reserved_memory_node
- pass address_cells and size_cells to device_tree_for_each_node
- pass struct meminfo * instead of a boolean to process_memory_node
- improve in-code comment
- use a separate process_reserved_memory_node (separate from
  process_memory_node) function wrapper to have different error handling

Changes in v3:
- match only /reserved-memory
- put the warning back in place for reg not present on a normal memory
  region
- refuse to continue once we reach the max number of reserved memory
  regions

Changes in v2:
- call process_memory_node from process_reserved_memory_node to avoid
  duplication

5 years ago xen/arm: make process_memory_node a device_tree_node_func
Stefano Stabellini [Mon, 19 Aug 2019 17:02:59 +0000 (10:02 -0700)]
xen/arm: make process_memory_node a device_tree_node_func

Change the signature of process_memory_node to match
device_tree_node_func. Thanks to this change, the next patch will be
able to use device_tree_for_each_node to call process_memory_node on all
the children of a provided node.

Return error if there is no reg property or if nr_banks is reached. Let
the caller deal with the error.

Add a printk when device tree parsing fails.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---
Changes in v7:
- use -EINVAL as return in case size is 0

Changes in v6:
- fix out of space check
- bring back printk when address_cells or size_cells are not properly set
- return -EINVAL in that case (different from reg missing)
- add printk when parsing fails
- return -ENOENT when memory size is 0

Changes in v5:
- return -ENOENT if address_cells or size_cells are not properly set

Changes in v4:
- return error if there is no reg property, remove printk
- return error if nr_banks is reached

Changes in v3:
- improve commit message
- check return value of process_memory_node

Changes in v2:
- new

5 years ago xen/arm: pass node to device_tree_for_each_node
Stefano Stabellini [Mon, 19 Aug 2019 17:02:28 +0000 (10:02 -0700)]
xen/arm: pass node to device_tree_for_each_node

Add a new parameter to device_tree_for_each_node: node, the node to
start the search from.

To avoid scanning the device tree, and given that we only care about
relative increments of depth compared to the depth of the initial node,
we set the initial depth to 0. Then, we call func() for every node with
depth > 0.

Don't call func() on the parent node passed as an argument. Clarify the
change in the comment on top of the function. The current callers pass
the root node as argument: it is OK to skip the root node because no
relevant properties are in it, only subnodes.
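
The relative-depth idea can be sketched on a toy pre-order node list. This is not libfdt or the Xen implementation: `struct flat_node`, `for_each_node` and `count_node` are hypothetical names, and the tree is a plain array with an absolute depth per entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened tree: nodes in pre-order with their absolute depth. */
struct flat_node { const char *name; int depth; };

typedef int (*node_func)(const struct flat_node *node, int rel_depth, void *data);

/*
 * Call func() on every descendant of nodes[start], but not on nodes[start]
 * itself. Depth is counted relative to the start node, which is depth 0.
 * Stops early and returns func()'s value if it is non-zero.
 */
static int for_each_node(const struct flat_node *nodes, size_t nr, size_t start,
                         node_func func, void *data)
{
    size_t i;

    assert(start < nr);

    for ( i = start + 1; i < nr; i++ )
    {
        int rel_depth = nodes[i].depth - nodes[start].depth;
        int ret;

        if ( rel_depth <= 0 )   /* sibling or ancestor of the start node: done */
            break;

        ret = func(&nodes[i], rel_depth, data);
        if ( ret )
            return ret;
    }

    return 0;
}

/* Example callback: count the nodes visited. */
static int count_node(const struct flat_node *node, int rel_depth, void *data)
{
    (void)node;
    (void)rel_depth;
    ++*(unsigned int *)data;
    return 0;
}
```

Starting from the root visits every other node, while starting from an inner node visits only its subtree, which is what lets a caller walk just the children of one node.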

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
---
Changes in v7:
- fix commit message
- use const
- init depth and min_depth to 0

Changes in v6:
- fix code style
- don't call func() on the first node

Changes in v5:
- go back to v3
- code style improvement in acpi/boot.c
- improve comments and commit message
- increase min_depth to avoid parsing siblings
- replace for with do/while loop and increase min_depth to avoid
  scanning siblings of the initial node
- pass only node, calculate depth

Changes in v3:
- improve commit message
- improve in-code comments
- improve code style

Changes in v2:
- new

5 years ago xen/page_alloc: Keep away MFN 0 from the buddy allocator
Julien Grall [Fri, 9 Aug 2019 12:14:40 +0000 (13:14 +0100)]
xen/page_alloc: Keep away MFN 0 from the buddy allocator

Combining of buddies happens only such that the resulting larger buddy
is still order-aligned. To cross a zone boundary while merging, the
implication is that both the buddy [0, 2^n-1] and the buddy
[2^n, 2^(n+1)-1] are free.

Ideally we want to fix the allocator, but for now we can just prevent
adding the MFN 0 in the allocator to avoid merging across zone
boundaries.

On x86, MFN 0 is already kept away from the buddy allocator. So the
bug can only happen on Arm platforms where the first memory bank
starts at 0.

As this is specific to the allocator, MFN 0 is removed in the common code
to cater for all architectures (current and future).
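
The buddy arithmetic behind this argument can be sketched as follows. The helpers are hypothetical and only model order-aligned blocks identified by their first MFN. The link to zones relies on the assumption that zone boundaries sit at powers of two, in which case an order-aligned block can only straddle a boundary if it starts at MFN 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MFN of the buddy of the order-'order' block starting at 'mfn'. */
static uint64_t buddy_of(uint64_t mfn, unsigned int order)
{
    assert(order < 64);

    return mfn ^ (UINT64_C(1) << order);
}

/* Start of the merged, order+1 block: the lower of the two buddies. */
static uint64_t merged_start(uint64_t mfn, unsigned int order)
{
    assert(order < 64);

    return mfn & ~(UINT64_C(1) << order);
}

/*
 * True if merging the block at 'mfn' with its buddy yields a block starting
 * at MFN 0, i.e. the pair is [0, 2^n-1] and [2^n, 2^(n+1)-1].
 */
static bool merge_involves_mfn0(uint64_t mfn, unsigned int order)
{
    return merged_start(mfn, order) == 0;
}
```

If MFN 0 is never handed to the allocator, the block [0, 2^n-1] is never free for any n, so no merge of this kind can take place.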

[Stefano: improve commit message]

Reported-by: Jeff Kubascik <jeff.kubascik@dornerworks.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
5 years ago xen/link: Introduce .bss.percpu.page_aligned
Andrew Cooper [Fri, 9 Aug 2019 14:36:58 +0000 (16:36 +0200)]
xen/link: Introduce .bss.percpu.page_aligned

Future changes are going to need to page align some percpu data.

Shuffle the exact link order of items within the BSS to give
.bss.percpu.page_aligned appropriate alignment, even on CPU0, which uses
.bss.percpu itself.

Insert explicit alignment such that there won't be a gap between
__per_cpu_start and the first actual per-CPU object.  The POINTER_ALIGN
for __bss_end is to cover the lack of SMP_CACHE_BYTES alignment, as the
loops which zero the BSS use pointer-sized stores on all architectures.

Rework __DEFINE_PER_CPU() so the caller passes in all attributes, and
adjust DEFINE_PER_CPU{,_READ_MOSTLY}() to match.  This has the added bonus
that it is now possible to grep for .bss.percpu and find all the users.

Finally, introduce DEFINE_PER_CPU_PAGE_ALIGNED() which specifies the
section attribute and verifies the type's alignment.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Make DEFINE_PER_CPU_PAGE_ALIGNED() verify the alignment rather than
specifying it. It is the underlying type which should be suitably aligned.
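
A minimal sketch of "verify rather than specify", using standard C11 facilities. The macro `DEFINE_PAGE_ALIGNED_SKETCH`, `SKETCH_PAGE_SIZE` and `struct scratch_page` are hypothetical. Unlike the real macro, this sketch does not set a section attribute, and the per-CPU machinery is left out entirely.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas, alignof */
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size */

/*
 * Hypothetical macro: it does not add any alignment itself, it only refuses
 * to compile if the type is not already page aligned.
 */
#define DEFINE_PAGE_ALIGNED_SKETCH(type, name)                      \
    static_assert(alignof(type) % SKETCH_PAGE_SIZE == 0,            \
                  "type must be page aligned");                     \
    type name

/* The alignment lives in the type, as the commit message suggests. */
struct scratch_page {
    alignas(SKETCH_PAGE_SIZE) unsigned char data[SKETCH_PAGE_SIZE];
};

DEFINE_PAGE_ALIGNED_SKETCH(struct scratch_page, demo_scratch);

/* Address of the demo object, for checking its placement at run time. */
static uintptr_t demo_scratch_addr(void)
{
    return (uintptr_t)&demo_scratch;
}
```

Using the macro with a type that lacks the alignment, for example a plain `int`, would fail at compile time instead of producing a misaligned object.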

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86: define a few selector values
Jan Beulich [Fri, 9 Aug 2019 14:35:42 +0000 (16:35 +0200)]
x86: define a few selector values

TSS, LDT, and per-CPU entries all can benefit a little from also having
their selector values defined.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago python: fix -Wsign-compare warnings
Marek Marczykowski-Górecki [Fri, 9 Aug 2019 02:01:36 +0000 (03:01 +0100)]
python: fix -Wsign-compare warnings

Specifically:
xen/lowlevel/xc/xc.c: In function ‘pyxc_domain_create’:
xen/lowlevel/xc/xc.c:147:24: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
  147 |         for ( i = 0; i < sizeof(xen_domain_handle_t); i++ )
      |                        ^
xen/lowlevel/xc/xc.c: In function ‘pyxc_domain_sethandle’:
xen/lowlevel/xc/xc.c:312:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
  312 |     for ( i = 0; i < sizeof(xen_domain_handle_t); i++ )
      |                    ^
xen/lowlevel/xc/xc.c: In function ‘pyxc_domain_getinfo’:
xen/lowlevel/xc/xc.c:391:24: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
  391 |         for ( j = 0; j < sizeof(xen_domain_handle_t); j++ )
      |                        ^
xen/lowlevel/xc/xc.c: In function ‘pyxc_get_device_group’:
xen/lowlevel/xc/xc.c:677:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘uint32_t’ {aka ‘unsigned int’} [-Werror=sign-compare]
  677 |     for ( i = 0; i < num_sdevs; i++ )
      |                    ^
xen/lowlevel/xc/xc.c: In function ‘pyxc_physinfo’:
xen/lowlevel/xc/xc.c:988:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
  988 |     for ( i = 0; i < sizeof(pinfo.hw_cap)/4; i++ )
      |                    ^
xen/lowlevel/xc/xc.c:994:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
  994 |     for ( i = 0; i < ARRAY_SIZE(virtcaps_bits); i++ )
      |                    ^
xen/lowlevel/xc/xc.c:998:24: error: comparison of integer expressions of different signedness: ‘int’ and ‘long unsigned int’ [-Werror=sign-compare]
  998 |         for ( i = 0; i < ARRAY_SIZE(virtcaps_bits); i++ )
      |                        ^
xen/lowlevel/xs/xs.c: In function ‘xspy_ls’:
xen/lowlevel/xs/xs.c:191:23: error: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Werror=sign-compare]
  191 |         for (i = 0; i < xsval_n; i++)
      |                       ^
xen/lowlevel/xs/xs.c: In function ‘xspy_get_permissions’:
xen/lowlevel/xs/xs.c:297:23: error: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Werror=sign-compare]
  297 |         for (i = 0; i < perms_n; i++) {
      |                       ^
cc1: all warnings being treated as errors

Use size_t for loop iterators where they are compared with sizeof() or
a similar construct.
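
For illustration, a loop of the flagged shape written with a size_t iterator. `handle_t` and `handle_checksum` are hypothetical stand-ins and not code from xc.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t handle_t[16];   /* hypothetical stand-in for xen_domain_handle_t */

/* Sum the bytes of a handle, indexing with size_t to match sizeof(). */
static unsigned int handle_checksum(const handle_t h)
{
    unsigned int sum = 0;
    size_t i;   /* an 'int i' here is what -Wsign-compare complains about */

    assert(h != NULL);

    for ( i = 0; i < sizeof(handle_t); i++ )
        sum += h[i];

    return sum;
}
```

With `size_t i` both sides of the comparison are unsigned, so the warning does not fire and no cast is needed.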

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago xen/arm: unbreak arm64 build for older toolchains
Stefano Stabellini [Wed, 7 Aug 2019 16:49:15 +0000 (09:49 -0700)]
xen/arm: unbreak arm64 build for older toolchains

Commit 4941bfb "xen/arm64: macros: Introduce an assembly macro to alias
x30" moved

  lr      .req    x30

to macros.h. A later patch (1396dab "xen/arm64: head: Don't clobber
x30/lr in the macro PRINT") started to use "lr" in head.S, however, it
didn't add an #include macros.h to head.S. This commit fixes it.

The lack of alias breaks the build with
gcc-linaro-5.3.1-2016.05-x86_64_aarch64-linux-gnu. The alias was added
later to binutils 2.29 in 2017.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/sched: fix memory leak in credit2
Juergen Gross [Wed, 7 Aug 2019 11:04:49 +0000 (13:04 +0200)]
xen/sched: fix memory leak in credit2

csched2_deinit() is leaking the run-queue memory.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years ago xen/percpu: Drop unused asm/percpu.h includes
Andrew Cooper [Fri, 26 Jul 2019 18:48:48 +0000 (19:48 +0100)]
xen/percpu: Drop unused asm/percpu.h includes

These files either don't use any PER_CPU() infrastructure at all, or use
DEFINE_PER_CPU_*().  This is declared in xen/percpu.h, not asm/percpu.h, which
means that xen/percpu.h is included via a different path.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/percpu: Drop unused xen/percpu.h includes
Andrew Cooper [Fri, 26 Jul 2019 19:26:24 +0000 (20:26 +0100)]
xen/percpu: Drop unused xen/percpu.h includes

None of these headers use any PER_CPU() infrastructure.

xen/rwlock.h however does, and picked it up transitively via xen/spinlock.h,
so include it properly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago arm/percpu: Move {get,set}_processor_id() into current.h
Andrew Cooper [Fri, 26 Jul 2019 19:41:03 +0000 (20:41 +0100)]
arm/percpu: Move {get,set}_processor_id() into current.h

For cleanup purposes, it is necessary for asm/percpu.h to not use
DECLARE_PER_CPU() itself.  asm/current.h is arguably a better place for this
functionality to live anyway.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years ago x86/desc: Shorten boot_{,compat_}gdt[] variable names
Andrew Cooper [Mon, 5 Aug 2019 10:17:46 +0000 (11:17 +0100)]
x86/desc: Shorten boot_{,compat_}gdt[] variable names

The current names, boot_cpu_{,compat_}gdt_table, have a table suffix which is
redundant with the T of GDT, and the cpu infix doesn't provide any meaningful
context.  Drop them both.

Likewise, shorten the {,compat_}gdt{,_l1e} variables.

Finally, rename gdt_descr to boot_gdtr to more clearly identify its purpose.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/boot: Set Accessed bits in boot_cpu_{,compat_}gdt_table[]
Andrew Cooper [Wed, 7 Aug 2019 11:29:01 +0000 (12:29 +0100)]
x86/boot: Set Accessed bits in boot_cpu_{,compat_}gdt_table[]

There is no point causing the CPU to perform a locked update of the
descriptors on first use.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/asm: Include msr-index.h rather than msr.h
Andrew Cooper [Fri, 2 Aug 2019 12:35:14 +0000 (13:35 +0100)]
x86/asm: Include msr-index.h rather than msr.h

There is nothing interesting for assembly code in msr.h.  Include msr-index.h
instead, and drop the __ASSEMBLY__ guards in msr.h.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago libxl: 9pfs has a QEMU backend
Stefano Stabellini [Tue, 6 Aug 2019 17:25:00 +0000 (18:25 +0100)]
libxl: 9pfs has a QEMU backend

Add 9pfs to the kinds of PV drivers that have a QEMU backend, specifically
to the macro QEMU_BACKEND.

This is needed otherwise upon domain destroy we get a timeout error:

libxl: error: libxl_device.c:1132:device_backend_callback: Domain 1:unable to remove device with path /local/domain/0/backend/9pfs/1/0
libxl: error: libxl_domain.c:1129:devices_destroy_cb: Domain 1:libxl__devices_destroy failed

This change should have been part of b53b4037cef6 "libxl/xl: add support
for Xen 9pfs".

Also add a comment in libxl_types_internal.idl as a reminder to update
QEMU_BACKEND going forward.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
5 years ago AMD/IOMMU: drop stray "else"
Jan Beulich [Wed, 7 Aug 2019 10:12:00 +0000 (12:12 +0200)]
AMD/IOMMU: drop stray "else"

The blank line between it and the prior if() clearly indicates that this
was meant to be a standalone if().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago AMD/IOMMU: miscellaneous DTE handling adjustments
Jan Beulich [Wed, 7 Aug 2019 10:11:22 +0000 (12:11 +0200)]
AMD/IOMMU: miscellaneous DTE handling adjustments

First and foremost switch boolean fields to bool. Adjust a few related
function parameters as well. Then
- in amd_iommu_set_intremap_table() don't use literal numbers,
- in iommu_dte_add_device_entry() use a compound literal instead of many
  assignments,
- in amd_iommu_setup_domain_device()
  - eliminate a pointless local variable,
  - use || instead of && when deciding whether to clear an entry,
  - clear the I field without any checking of ATS / IOTLB state,
- leave reserved fields unnamed.
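
The compound-literal point can be illustrated with a generic C sketch. `struct dte_sketch` is a hypothetical, heavily simplified entry and not the real AMD DTE layout. The function name is likewise invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified entry; not the real AMD DTE layout. */
struct dte_sketch {
    bool valid;
    bool translation_valid;
    bool interrupt_remap;
    unsigned int paging_mode;
    uint64_t root;
};

/*
 * One compound-literal assignment instead of a run of field assignments.
 * Members that are not named in the initializer are set to zero/false.
 */
static void dte_sketch_set(struct dte_sketch *dte, uint64_t root,
                           unsigned int mode)
{
    assert(dte != NULL);

    *dte = (struct dte_sketch){
        .valid = true,
        .translation_valid = true,
        .paging_mode = mode,
        .root = root,
    };
}
```

Besides being shorter, this form cannot leave a stale value in a field that someone forgot to assign, because the whole structure is rewritten in one go.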

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years ago x86/apic: enable x2APIC mode before doing any setup
Roger Pau Monné [Wed, 7 Aug 2019 10:09:51 +0000 (12:09 +0200)]
x86/apic: enable x2APIC mode before doing any setup

Current code calls apic_x2apic_probe which does some initialization
and setup before having enabled x2APIC mode (if it's not already
enabled by the firmware).

This can lead to issues if the APIC ID doesn't match the x2APIC ID, as
apic_x2apic_probe calls init_apic_ldr_x2apic_cluster which depending
on the APIC mode might set cpu_2_logical_apicid using the APIC ID
instead of the x2APIC ID (because x2APIC might not be enabled yet).

Fix this by enabling x2APIC before calling apic_x2apic_probe.

As a remark, this was discovered while I was trying to figure out why
one of my test boxes didn't report any iommu faults. The root cause
was that the iommu MSI address field was set using the stale value in
cpu_2_logical_apicid, and thus the iommu fault interrupt would get
lost. Even if the MSI address field gets set to a correct value
afterwards, as soon as a single iommu fault is pending no further
interrupts would get injected, so losing a single iommu fault
interrupt is fatal.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago Intel TXT: add reviewer, move to Odd Fixes state
Lukasz Hawrylko [Wed, 7 Aug 2019 10:09:31 +0000 (12:09 +0200)]
Intel TXT: add reviewer, move to Odd Fixes state

Support for Intel TXT has orphaned status right now because
no active maintainer is listed. Adding myself as reviewer
and moving it to Odd Fixes state.

Signed-off-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago CODING_STYLE: document intended usage of types
Jan Beulich [Wed, 7 Aug 2019 10:08:38 +0000 (12:08 +0200)]
CODING_STYLE: document intended usage of types

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years ago passthrough/amd: Drop "IOMMU not found" message
Andrew Cooper [Mon, 5 Aug 2019 16:40:36 +0000 (17:40 +0100)]
passthrough/amd: Drop "IOMMU not found" message

Since c/s 9fa94e10585 "x86/ACPI: also parse AMD IOMMU tables early", this
function is unconditionally called in all cases where a DMAR ACPI table
doesn't exist.

As a consequence, "AMD-Vi: IOMMU not found!" is printed in all cases where an
IOMMU isn't present, even on non-AMD systems.  Drop the message - it isn't
terribly interesting anyway, and is now misleading in a number of common
cases.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years ago mm: Safe to clear PGC_allocated on xenheap pages without an extra reference
George Dunlap [Tue, 6 Aug 2019 11:19:55 +0000 (12:19 +0100)]
mm: Safe to clear PGC_allocated on xenheap pages without an extra reference

Commits ec83f825627 "mm.h: add helper function to test-and-clear
_PGC_allocated" (and subsequent fix-up 44a887d021d "mm.h: fix BUG_ON()
condition in put_page_alloc_ref()") introduced a BUG_ON() to detect
unsafe behavior of callers.

Unfortunately this condition still turns out to be too strict.
xenheap pages are somewhat "magic": calling free_domheap_pages() on
them will not cause free_heap_pages() to be called: whichever part of
Xen allocated them specially must call free_xenheap_pages()
specifically.  (They'll also be handled appropriately at domain
destruction time.)

Only crash Xen when put_page_alloc_ref() finds only a single refcount
if the page is not a xenheap page.
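
The relaxed condition can be written as a small predicate. This is only a sketch of the rule stated above: `alloc_ref_drop_is_fatal` is a hypothetical name, and the real check operates on the page's count_info rather than on plain arguments.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical predicate: 'refcount' is the reference count seen just before
 * the allocation reference is dropped.
 */
static bool alloc_ref_drop_is_fatal(unsigned long refcount, bool is_xenheap)
{
    assert(refcount >= 1);   /* the allocation reference itself must exist */

    /* A lone reference is only a bug for pages that are not xenheap pages. */
    return refcount == 1 && !is_xenheap;
}
```

Before this change the equivalent predicate would have ignored `is_xenheap`, which is what made the check too strict.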

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago tests/x86emul: Annotate test blobs as executable code
Andrew Cooper [Fri, 24 May 2019 15:14:53 +0000 (16:14 +0100)]
tests/x86emul: Annotate test blobs as executable code

This causes objdump to disassemble them, rather than rendering them as
straight hex data.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/shim: Fix parallel build following c/s 32b1d62887d0
Andrew Cooper [Mon, 5 Aug 2019 13:48:21 +0000 (14:48 +0100)]
x86/shim: Fix parallel build following c/s 32b1d62887d0

Unfortunately, a parallel build from clean can fail in the following manner:

  xen.git$ make -j4 -C tools/firmware/xen-dir/
  make: Entering directory '/local/xen.git/tools/firmware/xen-dir'
  mkdir -p xen-root
  make: *** No rule to make target 'xen-root/xen/arch/x86/configs/pvshim_defconfig', needed by 'xen-root/xen/.config'.  Stop.
  make: *** Waiting for unfinished jobs....

The rule for pvshim_defconfig needs to depend on the linkfarm, rather than
$(D)/xen/.config specifically.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/shim: Refresh pvshim_defconfig
Andrew Cooper [Fri, 26 Jul 2019 09:54:41 +0000 (10:54 +0100)]
x86/shim: Refresh pvshim_defconfig

* Add a dependency so the shim gets rebuilt when pvshim_defconfig changes.
* Default to the NULL scheduler now that it works with vcpu online/offline.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen: sched: refactor the ASSERTs around vcpu_deassign()
Dario Faggioli [Mon, 5 Aug 2019 10:50:57 +0000 (11:50 +0100)]
xen: sched: refactor the ASSERTs around vcpu_deassign()

Every time we call vcpu_deassign(), the vcpu _must_ be assigned to a
pCPU, and hence that pCPU can't be free.

Therefore, move the ASSERT-s which check for these properties into that
function, where they belong.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Message-Id: <156412236781.2385.9110155201477198899.stgit@Palanthas>

5 years ago xen: sched: reassign vCPUs to pCPUs, when they come back online
Dario Faggioli [Mon, 5 Aug 2019 10:50:56 +0000 (11:50 +0100)]
xen: sched: reassign vCPUs to pCPUs, when they come back online

When a vcpu that was offline comes back online, we want it to either
be assigned to a pCPU or go into the wait list.

Detecting that a vcpu is coming back online is a bit tricky. Basically,
if the vcpu is waking up, and is neither assigned to a pCPU, nor in the
wait list, it must be coming back from offline.

When this happens, we put it in the waitqueue, and we "tickle" an idle
pCPU (if any), to go pick it up.

Looking at the patch, it seems that the vcpu wakeup code is getting
complex, and hence that it could potentially introduce latencies.
However, all this new logic is triggered only by the case of a vcpu
coming online, so, basically, the overhead during normal operations is
just an additional 'if()'.
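
The detection rule described above boils down to one extra test in the wakeup path. The sketch below models it with two flags. `struct vcpu_state`, `classify_wakeup` and `vcpu_wake_sketch` are hypothetical and do not reflect the data structures of the null scheduler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vcpu_state {
    bool assigned_to_pcpu;   /* currently bound to a pCPU */
    bool in_waitqueue;       /* parked in the wait list */
};

enum wake_action { WAKE_NORMAL, WAKE_FROM_OFFLINE };

/* A waking vcpu that is neither assigned nor waiting must have been offline. */
static enum wake_action classify_wakeup(const struct vcpu_state *v)
{
    assert(v != NULL);

    if ( !v->assigned_to_pcpu && !v->in_waitqueue )
        return WAKE_FROM_OFFLINE;

    return WAKE_NORMAL;
}

/* Queue a vcpu coming back online and ask for an idle pCPU to be tickled. */
static enum wake_action vcpu_wake_sketch(struct vcpu_state *v, bool *tickle_idle)
{
    enum wake_action action = classify_wakeup(v);

    *tickle_idle = false;
    if ( action == WAKE_FROM_OFFLINE )
    {
        v->in_waitqueue = true;
        *tickle_idle = true;
    }

    return action;
}
```

In the common case the function only evaluates the condition and returns, which matches the claim that normal operation pays for a single additional branch.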

Signed-off-by: Dario Faggioli <dario.faggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Message-Id: <156412236222.2385.236340632846050170.stgit@Palanthas>

5 years ago xen: sched: deal with vCPUs being or becoming online or offline
Dario Faggioli [Mon, 5 Aug 2019 10:50:55 +0000 (11:50 +0100)]
xen: sched: deal with vCPUs being or becoming online or offline

If a vCPU is, or is going, offline we want it to be neither
assigned to a pCPU, nor in the wait list, so:
- if an offline vcpu is inserted (or migrated) it must not
  go on a pCPU, nor in the wait list;
- if an offline vcpu is removed, we are sure that it is
  neither on a pCPU nor in the wait list already, so we
  should just bail, avoiding doing any further action;
- if a vCPU goes offline we need to remove it either from
  its pCPU or from the wait list.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Message-Id: <156412235656.2385.13861979113936528474.stgit@Palanthas>

5 years ago xen: sched: refactor code around vcpu_deassign() in null scheduler
Dario Faggioli [Mon, 5 Aug 2019 10:50:54 +0000 (11:50 +0100)]
xen: sched: refactor code around vcpu_deassign() in null scheduler

vcpu_deassign() is called only once (in _vcpu_remove()).

Let's consolidate the two functions into one.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Message-Id: <156412235104.2385.3911161728130674771.stgit@Palanthas>

5 years ago automation: add openSUSE Tumbleweed CI image
Dario Faggioli [Wed, 31 Jul 2019 16:58:46 +0000 (18:58 +0200)]
automation: add openSUSE Tumbleweed CI image

openSUSE comes in two flavours: Leap, which is non-rolling and released
annually, and Tumbleweed, which is rolling.

Reasons why it makes sense to have both (despite both being openSUSE,
package lists in dockerfiles being quite similar, etc) are:
- Leap shares a lot with SUSE Linux Enterprise. So, testing on Leap not
  only catches regressions for all openSUSE Leap users, but also helps
  prevent/catch regressions on SLE;
- Tumbleweed often has the most bleeding-edge software, so it will help
  us prevent/catch regressions with newly released versions of
  libraries, compilers, etc (e.g., at the time of writing this commit,
  some build issues with GCC9 were discovered while trying to build
  in a Tumbleweed image).

Note that, considering the rolling nature of Tumbleweed, the container
would need to be rebuilt (e.g., periodically), even if the docker file
does not change.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years ago automation: try to keep openSUSE Leap image a little smaller
Dario Faggioli [Wed, 31 Jul 2019 16:58:40 +0000 (18:58 +0200)]
automation: try to keep openSUSE Leap image a little smaller

Using `--no-recommends` when updating or installing packages should
prevent packages that are not strictly necessary from being installed.

Doing a `clean -a` after installing all the packages should, in
theory, free more space (as opposed to using just `clean`).

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years ago automation: add info about container pushes
Doug Goldstein [Sat, 3 Aug 2019 14:44:17 +0000 (09:44 -0500)]
automation: add info about container pushes

To be able to push a container, users must have access and have logged
into the container registry. The docs did not explain this fully so this
documents the steps better.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago ci: install C++ in opensuse-leap CI container
Dario Faggioli [Fri, 26 Jul 2019 10:03:25 +0000 (12:03 +0200)]
ci: install C++ in opensuse-leap CI container

The openSUSE Leap container image, built from
opensuse-leap.dockerfile, was missing the gcc-c++ package,
which is necessary, e.g., for building OVMF.

Add it.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years ago x86/microcode: always collect_cpu_info() during boot
Sergey Dyasli [Thu, 1 Aug 2019 10:22:37 +0000 (18:22 +0800)]
x86/microcode: always collect_cpu_info() during boot

Currently the cpu_sig struct is not updated during boot if no microcode blob
is specified by "ucode=[<integer>| scan]".

This results in cpu_sig.rev being 0, which affects APIC's
check_deadline_errata() and retpoline_safe() functions.

Fix this by getting ucode revision early during boot and SMP bring up.
While at it, protect early_microcode_update_cpu() for cases when
microcode_ops is NULL.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago tools/xen-ucode: Upload a microcode blob to the hypervisor
Chao Gao [Thu, 1 Aug 2019 10:22:36 +0000 (18:22 +0800)]
tools/xen-ucode: Upload a microcode blob to the hypervisor

This patch provides a tool for late microcode update.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Use consistent style.  Add to gitignore.]
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen/arm64: head: Don't setup the fixmap on secondary CPUs
Julien Grall [Sun, 9 Jun 2019 17:16:38 +0000 (18:16 +0100)]
xen/arm64: head: Don't setup the fixmap on secondary CPUs

setup_fixmap() will set up the fixmap in the boot page tables in order to
use earlyprintk and also update the register x23 holding the address of
the UART.

However, secondary CPUs are not using earlyprintk between turning the
MMU on and switching to the runtime page table. So setting up the
fixmap in the boot page tables is pointless.

This means most of setup_fixmap() is not necessary for the secondary
CPUs. The update of the UART address is now moved out of setup_fixmap()
and duplicated in the boot CPU and secondary CPUs boot paths.
Additionally, the call to setup_fixmap() is removed from the secondary
CPUs boot path.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Move assembly switch to the runtime PT in secondary CPUs path
Julien Grall [Mon, 15 Apr 2019 11:14:38 +0000 (12:14 +0100)]
xen/arm64: head: Move assembly switch to the runtime PT in secondary CPUs path

The assembly switch to the runtime PT is only necessary for the
secondary CPUs. So move the code into the secondary CPUs path.

This is definitely not compliant with the Arm Arm, as we are switching
between two different sets of page-tables without turning off the MMU.
However, turning off the MMU is impossible here as the ID map may clash
with other mappings in the runtime page-tables. This will require more
rework to avoid the problem. So for now add a TODO in the code.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Document enable_mmu()
Julien Grall [Fri, 7 Jun 2019 21:07:19 +0000 (22:07 +0100)]
xen/arm64: head: Document enable_mmu()

Document the behavior and the main registers usage within enable_mmu().

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Improve coding style and document create_pages_tables()
Julien Grall [Fri, 7 Jun 2019 20:53:37 +0000 (21:53 +0100)]
xen/arm64: head: Improve coding style and document create_pages_tables()

Adjust the coding style used in the comments within create_pages_tables().

Lastly, document the behavior and the main registers usage within the
function. Note that x25 is now only used within the function, so it does
not need to be part of the common register.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Improve coding style and document cpu_init()
Julien Grall [Fri, 7 Jun 2019 19:03:46 +0000 (20:03 +0100)]
xen/arm64: head: Improve coding style and document cpu_init()

Adjust the coding style used in the comments within cpu_init(). Take the
opportunity to alter the early print to match the function name.

Lastly, document the behavior and the main registers usage within the
function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Rework and document zero_bss()
Julien Grall [Fri, 7 Jun 2019 18:59:15 +0000 (19:59 +0100)]
xen/arm64: head: Rework and document zero_bss()

On secondary CPUs, zero_bss() will be a NOP because BSS only needs to be
zeroed once at boot. So the call in the secondary CPUs path can be
removed. It also means that x26 does not need to be set for secondary
CPUs.

Note that we will need to keep x26 around for the boot CPU as BSS should
not be reset when booting via UEFI.

Lastly, document the behavior and the main registers usage within the
function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Rework and document check_cpu_mode()
Julien Grall [Fri, 7 Jun 2019 18:29:03 +0000 (19:29 +0100)]
xen/arm64: head: Rework and document check_cpu_mode()

A branch in the success case can be avoided by inverting the branch
condition. At the same time, remove a pointless comment as Xen can only
run at EL2.

Lastly, document the behavior and the main registers usage within the
function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Introduce distinct paths for the boot CPU and secondary CPUs
Julien Grall [Fri, 7 Jun 2019 18:28:51 +0000 (19:28 +0100)]
xen/arm64: head: Introduce distinct paths for the boot CPU and secondary CPUs

The boot code is currently quite difficult to go through because of the
lack of documentation and a number of indirections to avoid executing
some paths on either the boot CPU or secondary CPUs.

In an attempt to make the boot code easier to follow, each part of the
boot is now in a separate function. Furthermore, the paths for the boot
CPU and secondary CPUs are now distinct and for now will call each
function.

Follow-ups will remove unnecessary calls and do further improvement
(such as adding documentation and reshuffling).

Note that the switch from using the 1:1 mapping to the runtime mapping
is duplicated for each path. This is because in the future we will need
to stay longer in the 1:1 mapping for the boot CPU.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: platform: Add Raspberry Pi platform
Stewart Hildebrand [Mon, 29 Jul 2019 13:19:20 +0000 (09:19 -0400)]
xen/arm: platform: Add Raspberry Pi platform

The aux peripherals (uart1, spi1, and spi2) share an IRQ and a page of
memory. For debugging, it is helpful to use the aux UART in Xen. In
this case, Xen would try to assign spi1 and spi2 to dom0, but this
results in an error since the shared IRQ was already assigned to Xen.
Blacklist aux devices other than the UART to prevent mapping the shared
IRQ and memory range to dom0.

Blacklisting spi1 and spi2 unfortunately makes those peripherals
unavailable for use in the system. Future work could include forwarding
the IRQ for spi1 and spi2, and trapping and mediating access to the
memory range for spi1 and spi2.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: types: Specify the zero padding in the definition of PRIregister
Julien Grall [Thu, 16 May 2019 22:39:36 +0000 (23:39 +0100)]
xen/arm: types: Specify the zero padding in the definition of PRIregister

The definition of PRIregister varies between Arm32 and Arm64 (32-bit vs
64-bit). However, some of the users use the wrong padding and others
are not using padding at all.

For more consistency, the padding is now moved into the PRIregister and
varies depending on the architecture.

Signed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: vsmc: The function identifier is always 32-bit
Julien Grall [Thu, 16 May 2019 22:31:46 +0000 (23:31 +0100)]
xen/arm: vsmc: The function identifier is always 32-bit

On Arm64, the SMCCC function identifier is always stored in the lower 32
bits of the x0 register. The rest of the bits are not defined and should
be ignored.

This means the variable funcid should be a uint32_t rather than
register_t.
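
A minimal sketch of the effect, assuming a 64-bit register type; the
names xreg_t and smccc_funcid() are made up for illustration and do not
appear in the patch:

```c
#include <stdint.h>

/* Hypothetical alias for a 64-bit Arm64 general-purpose register. */
typedef uint64_t xreg_t;

/*
 * Keep only the lower 32 bits of x0. Whatever the guest left in the
 * upper half is dropped by the narrowing conversion.
 */
static inline uint32_t smccc_funcid(xreg_t x0)
{
    return (uint32_t)x0;
}
```

Two x0 values that differ only in bits 63:32 then compare equal, which
is what storing funcid in a uint32_t achieves.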

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: traps: Avoid BUG_ON() in do_trap_brk()
Julien Grall [Wed, 15 May 2019 16:48:04 +0000 (17:48 +0100)]
xen/arm: traps: Avoid BUG_ON() in do_trap_brk()

At the moment, do_trap_brk() is using a BUG_ON() to check the hardware
has been correctly configured during boot.

Any error when configuring the hardware could result in a guest 'brk'
trapping into the hypervisor and crashing it.

It is pretty harsh to kill Xen when actually killing the guest would
be enough, as misconfiguring this trap would not lead to exposing
sensitive data. Replace the BUG_ON() with crashing the guest.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: traps: Avoid using BUG_ON() in _show_registers()
Julien Grall [Wed, 15 May 2019 16:16:13 +0000 (17:16 +0100)]
xen/arm: traps: Avoid using BUG_ON() in _show_registers()

At the moment, _show_registers() is using a BUG_ON() to assert only
userspace will run 32-bit code in a 64-bit domain.

Such extra precaution is not necessary and could be avoided by only
checking the CPU mode to decide whether show_registers_64() or
show_registers_32() should be called.

This also has the nice advantage of avoiding nested ifs in the code.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: Rework psr_mode_is_32bit()
Julien Grall [Wed, 15 May 2019 13:34:55 +0000 (14:34 +0100)]
xen/arm: Rework psr_mode_is_32bit()

psr_mode_is_32bit() prototype does not match the rest of the helpers for
the process state. Looking at the callers, most of them will access
struct cpu_user_regs just for calling psr_mode_is_32bit().

The macro is now reworked to take a struct cpu_user_regs as parameter.
At the same time take the opportunity to switch to a static inline
helper.

Lastly, when compiled for 32-bit, Xen will only support 32-bit guests. So
it is pointless to check whether the register state corresponds to 64-bit
or not.
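
The reworked helper might look roughly like the sketch below. The
one-field struct cpu_user_regs and the value of PSR_MODE_BIT are
assumptions made to keep the example self-contained, and CONFIG_ARM_32
stands in for whatever symbol selects the 32-bit build:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed: set in the saved program status when the mode is AArch32. */
#define PSR_MODE_BIT 0x10u

/* Reduced stand-in for the real register frame. */
struct cpu_user_regs {
    uint32_t cpsr;
};

static inline bool psr_mode_is_32bit(const struct cpu_user_regs *regs)
{
#ifdef CONFIG_ARM_32
    /* A 32-bit hypervisor only runs 32-bit guests: nothing to check. */
    (void)regs;
    return true;
#else
    return regs->cpsr & PSR_MODE_BIT;
#endif
}
```

Taking the register frame directly spares callers from dereferencing it
just to pass one field along.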

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/doc: Improve Dom0-less documentation
Viktor Mitin [Wed, 31 Jul 2019 08:10:41 +0000 (11:10 +0300)]
xen/doc: Improve Dom0-less documentation

- Replaced unprintable characters with %s/\%xA0/ /g
  So all the spaces are 0x20 now.

- Added address-cells and size-cells to configuration example.
  This resolves the dom0less boot issue in case of arm64.

- Added some notes about xl tools usage in case of dom0less.

Signed-off-by: Viktor Mitin <viktor_mitin@epam.com>
[julien: Remove newline at the end of the file]
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agons16550: Add compatible string for Raspberry Pi 4
Stewart Hildebrand [Mon, 29 Jul 2019 13:19:19 +0000 (09:19 -0400)]
ns16550: Add compatible string for Raspberry Pi 4

Per the BCM2835 peripherals datasheet [1] page 10:
"The UART core is build to emulate 16550 behaviour ... The implemented
UART is not a 16650 compatible UART However as far as possible the
first 8 control and status registers are laid out like a 16550 UART. Al
16550 register bits which are not supported can be written but will be
ignored and read back as 0. All control bits for simple UART receive/
transmit operations are available."

Additionally, Linux uses the 8250/16550 driver for the aux UART [2].

Unfortunately the brcm,bcm2835-aux-uart device tree binding doesn't
have the reg-shift and reg-io-width properties [3]. Thus, the reg-shift
and reg-io-width properties are inherent properties of this UART.

Thanks to Andre Przywara for contributing the reg-shift and
reg-io-width setting snippet.

In my testing, I have relied on enable_uart=1 being set in config.txt,
a configuration file read by the Raspberry Pi's firmware. With
enable_uart=1, the firmware performs UART initialization.

[1] https://www.raspberrypi.org/documentation/hardware/raspberrypi/bcm2835/BCM2835-ARM-Peripherals.pdf
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/tty/serial/8250/8250_bcm2835aux.c
[3] https://www.kernel.org/doc/Documentation/devicetree/bindings/serial/brcm,bcm2835-aux-uart.txt

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Tested-by: Andre Przywara <andre.przywara@arm.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/spec-ctrl: Speculative mitigation facilities report wrong status
Jin Nan Wang [Wed, 31 Jul 2019 13:33:44 +0000 (13:33 +0000)]
xen/spec-ctrl: Speculative mitigation facilities report wrong status

Booting with spec-ctrl=0 results in Xen printing "None MD_CLEAR".

  (XEN)   Support for HVM VMs: None MD_CLEAR
  (XEN)   Support for PV VMs: None MD_CLEAR

Add a check for X86_FEATURE_MD_CLEAR to avoid printing "None".

Signed-off-by: James Wang <jnwang@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/ubsan: Don't perform alignment checking on supporting compilers
Andrew Cooper [Mon, 24 Jun 2019 09:43:34 +0000 (10:43 +0100)]
x86/ubsan: Don't perform alignment checking on supporting compilers

GCC 5 introduced -fsanitize=alignment which is enabled by default by
CONFIG_UBSAN.  This trips a load of wont-fix cases in the ACPI tables and the
hypercall page and stubs writing logic.

It also causes the native Xen boot to crash before the console is set up, for
an as-yet unidentified reason (most likely a wont-fix case earlier on boot).

Disable alignment sanitisation on compilers which would try using it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/nodemask: Introduce a NODEMASK_PR() wrapper for printing
Andrew Cooper [Tue, 25 Jun 2019 09:48:22 +0000 (10:48 +0100)]
xen/nodemask: Introduce a NODEMASK_PR() wrapper for printing

Rework nodes_addr() into nodemask_bits() and change the indirection to match
its cpumask_bits() counterpart, and update the caller.

Use NODEMASK_PR() to fix up one opencoded access into nodemask.bits in
dump_domains().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/cpumask: Introduce a CPUMASK_PR() wrapper for printing
Andrew Cooper [Tue, 25 Jun 2019 09:48:22 +0000 (10:48 +0100)]
xen/cpumask: Introduce a CPUMASK_PR() wrapper for printing

Having to specify 'nr_cpu_ids, cpumask_bits(foo)' for all printing operations
is quite repetitive.  Introduce a wrapper to help.
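
The idea of a wrapper that expands to both arguments can be sketched as
below. The one-word cpumask_t, NR_CPUS value and format_bitmap() helper
are hypothetical; the real code pairs the macro with the hypervisor's
own bitmap format specifier, which is not reproduced here.

```c
#include <limits.h>
#include <stdio.h>

#define NR_CPUS 8
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Simplified mask: a single word is enough for this sketch. */
typedef struct {
    unsigned long bits[1];
} cpumask_t;

static unsigned int nr_cpu_ids = NR_CPUS;

#define cpumask_bits(m) ((m)->bits)

/* Expands to the (width, bitmap) pair a bitmap formatter expects. */
#define CPUMASK_PR(m) nr_cpu_ids, cpumask_bits(m)

/* Hypothetical formatter: writes the low nbits as '0'/'1', MSB first. */
static int format_bitmap(char *buf, size_t len,
                         unsigned int nbits, const unsigned long *bits)
{
    unsigned int i;

    if ( len < nbits + 1 )
        return -1;

    for ( i = 0; i < nbits; i++ )
    {
        unsigned int bit = nbits - 1 - i;
        unsigned long word = bits[bit / BITS_PER_LONG];

        buf[i] = ((word >> (bit % BITS_PER_LONG)) & 1) ? '1' : '0';
    }
    buf[nbits] = '\0';

    return (int)nbits;
}
```

A call then reads format_bitmap(buf, sizeof(buf), CPUMASK_PR(&mask))
instead of spelling out both arguments each time.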

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/mask: Convert {cpu,node}mask_test() to be static inline
Andrew Cooper [Tue, 25 Jun 2019 09:48:22 +0000 (10:48 +0100)]
xen/mask: Convert {cpu,node}mask_test() to be static inline

The buggy version of GCC isn't supported by Xen, so reimplement the helpers
with type checking, using Xen's latest type expectations.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/nodemask: Drop any_online_node() and first_unset_node()
Andrew Cooper [Tue, 25 Jun 2019 09:48:22 +0000 (10:48 +0100)]
xen/nodemask: Drop any_online_node() and first_unset_node()

These have never been used in Xen, and it is unlikely that they would be
useful in the future.

any_online_cpu() was dropped by c/s 22bdce1c048 "eliminate first_cpu() etc"
but the API comment was left in place.  Drop that too.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/bitmap: Drop {bitmap,cpumask,nodes}_shift_{left,right}()
Andrew Cooper [Tue, 25 Jun 2019 09:48:22 +0000 (10:48 +0100)]
xen/bitmap: Drop {bitmap,cpumask,nodes}_shift_{left,right}()

These operations have never been used in Xen since their introduction, and it
doesn't seem likely that they will in the future.

Bloat-o-meter reports that they aren't the smallest when compiled, either.

  add/remove: 0/2 grow/shrink: 0/0 up/down: 0/-814 (-814)
  Function                                     old     new   delta
  __bitmap_shift_left                          366       -    -366
  __bitmap_shift_right                         448       -    -448
  Total: Before=3323730, After=3322916, chg -0.02%

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/page-alloc: Clamp get_free_buddy() to online nodes
Andrew Cooper [Mon, 24 Jun 2019 15:38:36 +0000 (16:38 +0100)]
xen/page-alloc: Clamp get_free_buddy() to online nodes

d->node_affinity defaults to NODE_MASK_ALL which has bits set outside of
node_online_map.  This in turn causes the loop in get_free_buddy() to waste
effort iterating over offline nodes.

Always clamp d->node_affinity to node_online_map.

This in turn requires ensuring that d->node_affinity intersects with
node_online_map, and there is one case via XEN_DOMCTL_setnodeaffinity where a
disjoint mask can end up being specified.

Tighten up the hypercall check, because there is no plausible reason to select
a node affinity which is disjoint with the system, and leave get_free_buddy()
with an assertion to the same effect, but with a runtime-safe fallback to the
full online node map.
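
The clamp-with-fallback logic can be sketched as follows, using a plain
64-bit integer as a simplified nodemask. clamp_node_affinity() is a
made-up name, and the assertion mentioned above is left out so that the
fallback path can be exercised:

```c
#include <stdint.h>

/* Simplified nodemask: one bit per NUMA node. */
typedef uint64_t nodemask_t;

static nodemask_t clamp_node_affinity(nodemask_t affinity,
                                      nodemask_t online)
{
    /* Never iterate over nodes that are not online. */
    nodemask_t mask = affinity & online;

    /*
     * A disjoint mask should already have been rejected by the
     * hypercall check. Fall back to all online nodes rather than
     * ending up with nothing to allocate from.
     */
    if ( mask == 0 )
        mask = online;

    return mask;
}
```

With an affinity of all ones and two online nodes, only those two nodes
are considered.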

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agodomain: remove 'guest_type' field (and enum guest_type)
Paul Durrant [Wed, 31 Jul 2019 11:29:31 +0000 (13:29 +0200)]
domain: remove 'guest_type' field (and enum guest_type)

The enum guest_type was introduced in commit 6c6492780ea "pvh prep:
introduce pv guest type and has_hvm_container macros" to allow a new guest
type, distinct from either PV or HVM guest types, to be added in commit
8271d6522c6 "pvh: introduce PVH guest type". Subsequently, commit
33e5c32559e "x86: remove PVHv1 code" removed this third guest type.

This patch removes the struct domain field and enumeration as the guest
type can now be trivially determined from the 'options' field.
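
A minimal sketch of deriving the guest type from the creation flags.
The flag name CDF_HVM_GUEST, its bit position, and the one-field struct
domain are placeholders chosen for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical creation flag; the real name and value may differ. */
#define CDF_HVM_GUEST (1u << 0)

/* Reduced stand-in: only the stashed creation flags are modelled. */
struct domain {
    uint32_t options;
};

static inline bool is_hvm_domain(const struct domain *d)
{
    return d->options & CDF_HVM_GUEST;
}

/* With only two guest types left, PV is simply "not HVM". */
static inline bool is_pv_domain(const struct domain *d)
{
    return !is_hvm_domain(d);
}
```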

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com>
Acked-by: George Dunlap <George.Dunlap@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoAMD/IOMMU: don't needlessly log headers when dumping IRTs
Jan Beulich [Wed, 31 Jul 2019 11:28:42 +0000 (13:28 +0200)]
AMD/IOMMU: don't needlessly log headers when dumping IRTs

Log SBDF headers only when there are actual IRTEs to log. This is
particularly important for the total volume of output when the ACPI
tables describe far more than just the existing devices. On my Rome
system so far there was one line for every function of every device on
all 256 buses of segment 0, with extremely few exceptions (like the
IOMMUs themselves).

Also only log one of the "per-device" or "shared" overall headers.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: correct IRTE updating
Jan Beulich [Wed, 31 Jul 2019 11:27:52 +0000 (13:27 +0200)]
AMD/IOMMU: correct IRTE updating

Flushing didn't get done along the lines of what the specification says.
Mark entries to be updated as not remapped (which will result in
interrupt requests getting target aborted, but the interrupts should be
masked anyway at that point in time), issue the flush, and only then
write the new entry.
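
The ordering described above, sketched with a simplified entry. struct
irte, its fields and flush_intremap() are hypothetical; real code would
also need the appropriate volatile accesses and an actual IOMMU
invalidation command rather than a counter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified entry; the field names are illustrative only. */
struct irte {
    bool remap_en;
    uint8_t vector;
    uint8_t dest;
};

static unsigned int flush_count;

/* Placeholder for issuing the IOMMU invalidation and waiting for it. */
static void flush_intremap(void)
{
    flush_count++;
}

static void update_irte(struct irte *entry, uint8_t vector, uint8_t dest)
{
    /* 1. Mark the live entry as not remapped. */
    entry->remap_en = false;

    /* 2. Flush, so the IOMMU no longer uses the old contents. */
    flush_intremap();

    /* 3. Only then write the complete new entry in one go. */
    *entry = (struct irte){
        .remap_en = true,
        .vector = vector,
        .dest = dest,
    };
}
```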

In update_intremap_entry_from_msi_msg() also fold the duplicate initial
lock determination and acquire into just a single instance.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: enable x2APIC mode when available
Jan Beulich [Wed, 31 Jul 2019 11:25:42 +0000 (13:25 +0200)]
AMD/IOMMU: enable x2APIC mode when available

In order for the CPUs to use x2APIC mode, the IOMMU(s) first need to be
switched into suitable state.

The post-AP-bringup IRQ affinity adjustment is done also for the non-
x2APIC case, matching what VT-d does.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: adjust setup of internal interrupt for x2APIC mode
Jan Beulich [Wed, 31 Jul 2019 11:23:02 +0000 (13:23 +0200)]
AMD/IOMMU: adjust setup of internal interrupt for x2APIC mode

In order to be able to express all possible destinations we need to make
use of this non-MSI-capability based mechanism. The new IRQ controller
structure can re-use certain MSI functions, though.

For now general and PPR interrupts still share a single vector, IRQ, and
hence handler.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: allow enabling with IRQ not yet set up
Jan Beulich [Wed, 31 Jul 2019 11:19:27 +0000 (13:19 +0200)]
AMD/IOMMU: allow enabling with IRQ not yet set up

Early enabling (to enter x2APIC mode) requires deferring of the IRQ
setup. Code to actually do that setup in the x2APIC case will get added
subsequently.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: split amd_iommu_init_one()
Jan Beulich [Wed, 31 Jul 2019 11:18:20 +0000 (13:18 +0200)]
AMD/IOMMU: split amd_iommu_init_one()

Mapping the MMIO space and obtaining feature information needs to happen
slightly earlier, such that for x2APIC support we can set XTEn prior to
calling amd_iommu_update_ivrs_mapping_acpi() and
amd_iommu_setup_ioapic_remapping().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format
Jan Beulich [Wed, 31 Jul 2019 11:17:01 +0000 (13:17 +0200)]
AMD/IOMMU: introduce 128-bit IRTE non-guest-APIC IRTE format

This is in preparation of actually enabling x2APIC mode, which requires
this wider IRTE format to be used.

A specific remark regarding the first hunk changing
amd_iommu_ioapic_update_ire(): This bypass was introduced for XSA-36,
i.e. by 94d4a1119d ("AMD,IOMMU: Clean up old entries in remapping
tables when creating new one"). Other code introduced by that change has
meanwhile disappeared or further changed, and I wonder if - rather than
adding an x2apic_enabled check to the conditional - the bypass couldn't
be deleted altogether. For now the goal is to affect the non-x2APIC
paths as little as possible.

Take the liberty and use the new "fresh" flag to suppress an unneeded
flush in update_intremap_entry_from_ioapic().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: pass IOMMU to {get,free,update}_intremap_entry()
Jan Beulich [Wed, 31 Jul 2019 11:16:14 +0000 (13:16 +0200)]
AMD/IOMMU: pass IOMMU to {get,free,update}_intremap_entry()

The functions will want to know IOMMU properties (specifically the IRTE
size) subsequently.

Rather than introducing a second error path bogusly returning -E... from
amd_iommu_read_ioapic_from_ire(), also change the existing one to follow
VT-d in returning the raw (untranslated) IO-APIC RTE.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: use bit field for IRTE
Jan Beulich [Wed, 31 Jul 2019 11:15:39 +0000 (13:15 +0200)]
AMD/IOMMU: use bit field for IRTE

At the same time restrict its scope to just the single source file
actually using it, and abstract accesses by introducing a union of
pointers. (A union of the actual table entries is not used to make it
impossible to [wrongly, once the 128-bit form gets added] perform
pointer arithmetic / array accesses on derived types.)

Also move away from updating the entries piecemeal: Construct a full new
entry, and write it out.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: use bit field for control register
Jan Beulich [Wed, 31 Jul 2019 11:15:04 +0000 (13:15 +0200)]
AMD/IOMMU: use bit field for control register

Also introduce a field in struct amd_iommu caching the most recently
written control register. All writes should now happen exclusively from
that cached value, such that it is guaranteed to be up to date.
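
A rough sketch of the cached-register pattern. The bit positions in
union ctrl_reg are illustrative and not the hardware layout, the
mmio_ctrl member stands in for the real MMIO register, and the function
names are made up. Bit fields of type uint64_t rely on compiler support
(GCC and Clang accept them).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout only; not the real control register. */
union ctrl_reg {
    uint64_t raw;
    struct {
        uint64_t iommu_en:1;
        uint64_t event_log_en:1;
        uint64_t reserved:62;
    };
};

struct iommu {
    uint64_t mmio_ctrl;   /* stand-in for the 64-bit MMIO register */
    union ctrl_reg ctrl;  /* most recently written value */
};

/* Every hardware write is made from the cached copy. */
static void iommu_write_ctrl(struct iommu *iommu)
{
    iommu->mmio_ctrl = iommu->ctrl.raw;  /* writeq() in real code */
}

static void iommu_set_enabled(struct iommu *iommu, bool enable)
{
    iommu->ctrl.iommu_en = enable;
    iommu_write_ctrl(iommu);
}
```

Because updates modify the cache first and then write it out whole, no
read-modify-write of the hardware register is needed and previously set
bits are preserved.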

Take the opportunity and add further fields. Also convert a few boolean
function parameters to bool, such that use of !! can be avoided.

Because of there now being definitions beyond bit 31, writel() also gets
replaced by writeq() when updating hardware.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agoAMD/IOMMU: use bit field for extended feature register
Jan Beulich [Wed, 31 Jul 2019 11:14:27 +0000 (13:14 +0200)]
AMD/IOMMU: use bit field for extended feature register

This also takes care of several of the shift values wrongly having been
specified as hex rather than dec.

Take the opportunity and
- replace a readl() pair by a single readq(),
- add further fields.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
5 years agogrant_table: harden version dependent accesses
Norbert Manthey [Wed, 31 Jul 2019 11:13:09 +0000 (13:13 +0200)]
grant_table: harden version dependent accesses

Guests can issue grant table operations and provide guest controlled
data to them. This data is used as index for memory loads after bound
checks have been done. Depending on the grant table version, the
size of elements in containers differ. As the base data structure is
a page, the number of elements per page also differs. Consequently,
bound checks are version dependent, so that speculative execution can
happen in several stages, the bound check as well as the version check.
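
To illustrate why the bound depends on the version, here is a small
sketch. The entry layouts, their sizes (8 and 16 bytes) and the 4 KiB
page size are assumptions made for the example, and the function names
are hypothetical:

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Assumed entry layouts: 8 bytes for v1, 16 bytes for v2. */
struct entry_v1 {
    uint16_t flags;
    uint16_t domid;
    uint32_t frame;
};

struct entry_v2 {
    uint8_t bytes[16];
};

static unsigned int entries_per_page(unsigned int version)
{
    return version == 1 ? PAGE_SIZE / sizeof(struct entry_v1)
                        : PAGE_SIZE / sizeof(struct entry_v2);
}

/* The same reference can be in bounds for one version only. */
static int ref_in_bounds(unsigned int version, unsigned int nr_frames,
                         unsigned int ref)
{
    return ref < nr_frames * entries_per_page(version);
}
```

If the version check is mispredicted, the v1 limit may be applied
speculatively to a v2 table, so a reference that is architecturally out
of bounds can pass the bound check.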

This commit mitigates cases where out-of-bound accesses could happen
due to the version comparison. In cases where no different memory
locations are accessed on the code path that follows an if statement,
no protection is required. No different memory locations are accessed
in the following functions after a version check:

 * gnttab_setup_table: only calculated numbers are used, and then
        function gnttab_grow_table is called, which is version protected

 * gnttab_transfer: the case that depends on the version check just gets
        into copying a page or not

 * acquire_grant_for_copy: the not fixed comparison is on the abort path
        and does not access other structures, and on the else branch
        accesses only structures that have been validated before

 * gnttab_set_version: all accessible data is allocated for both versions
        Furthermore, the functions gnttab_populate_status_frames and
        gnttab_unpopulate_status_frames received a block_speculation
        macro. Hence, this code will only be executed once the correct
        version is visible in the architectural state.

 * gnttab_release_mappings: this function is called only during domain
       destruction and control is not returned to the guest

 * mem_sharing_gref_to_gfn: speculation will be stopped by the second if
       statement, as that places a barrier on any path to be executed.

 * gnttab_get_status_frame_mfn: no version dependent check, because all
       accesses, except the gt->status[idx], do not perform index-based
       accesses, or speculative out-of-bound accesses in the
       gnttab_grow_table function call.

 * gnttab_usage_print: cannot be triggered by the guest

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agogrant_table: harden bound accesses
Norbert Manthey [Wed, 31 Jul 2019 11:12:12 +0000 (13:12 +0200)]
grant_table: harden bound accesses

Guests can issue grant table operations and provide guest controlled
data to them. This data is used as index for memory loads after bound
checks have been done. To avoid speculative out-of-bound accesses, we
use the array_index_nospec macro where applicable, or the macro
block_speculation. Note, the block_speculation macro is used on all
paths in shared_entry_header and nr_grant_entries. This way, after a
call to such a function, all bound checks that happened before become
architecturally visible, so that no additional protection is required
for corresponding array accesses. As the way we introduce an lfence
instruction might allow the compiler to reload certain values from
memory multiple times, we try to avoid speculatively continuing
execution with stale register data by moving relevant data into
function local variables.
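
The index-clamping idea can be sketched with a branch-free mask. This
is a generic illustration, not the macro's actual definition: the
function names are made up, the arithmetic right shift of a negative
value is implementation-defined in C (arithmetic on common compilers),
and real implementations also hide the value from the optimiser.

```c
#include <limits.h>

/*
 * All ones when index < size, zero otherwise, computed without a
 * branch. Assumes size <= LONG_MAX.
 */
static inline unsigned long index_mask_nospec(unsigned long index,
                                              unsigned long size)
{
    return ~(long)(index | (size - 1 - index)) >>
           (sizeof(long) * CHAR_BIT - 1);
}

static int table_read(const int *table, unsigned long size,
                      unsigned long idx)
{
    if ( idx >= size )
        return -1;

    /*
     * Even if the check above is mispredicted, the mask forces an
     * out-of-range idx to 0 before it is used for the load.
     */
    idx &= index_mask_nospec(idx, size);

    return table[idx];
}
```

The index is copied into a local before masking, in line with the note
above about keeping relevant data in function-local variables.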

Speculative execution is not blocked in case one of the following
properties is true:
 - path cannot be triggered by the guest
 - path does not return to the guest
 - path does not result in an out-of-bound access
 - path is unlikely to be executed repeatedly in rapid succession
Only when none of the above properties holds can continuous chunks of
memory actually be leaked. Therefore, we only add the penalty of
protective mechanisms in case none of the above properties applies to a
potential speculative out-of-bound access.

This commit addresses only out-of-bound accesses whose index is
directly controlled by the guest, and the index is checked before.
Potential out-of-bound accesses that are caused by speculatively
evaluating the version of the current table are not addressed in this
commit. Hence, speculative out-of-bound accesses might still be
possible, for example in gnttab_get_status_frame_mfn, when calling
gnttab_grow_table, the assertion that the grant table version equals
two might not hold under speculation.

This is part of the speculative hardening effort.

Signed-off-by: Norbert Manthey <nmanthey@amazon.de>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/boot: Fix build dependencies for reloc.c
Andrew Cooper [Tue, 30 Jul 2019 16:40:33 +0000 (17:40 +0100)]
x86/boot: Fix build dependencies for reloc.c

c/s 201f852eaf added start_info.h and kconfig.h to reloc.c, but only updated
start_info.h in RELOC_DEPS.

This causes reloc.c to not be regenerated when Kconfig changes.  It is most
noticeable when enabling CONFIG_PVH and finding the resulting binary crash
early with:

  (d9) (XEN)
  (d9) (XEN) ****************************************
  (d9) (XEN) Panic on CPU 0:
  (d9) (XEN) Magic value is wrong: c2c2c2c2
  (d9) (XEN) ****************************************
  (d9) (XEN)
  (d9) (XEN) Reboot in five seconds...
  (XEN) d9v0 Triple fault - invoking HVM shutdown action 1

Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoxen: credit2: avoid using cpumask_weight() in hot-paths
Dario Faggioli [Mon, 29 Jul 2019 10:49:09 +0000 (12:49 +0200)]
xen: credit2: avoid using cpumask_weight() in hot-paths

cpumask_weight() is known to be expensive. In Credit2, we use it in
load-balancing, but only for knowing how many CPUs are active in a
runqueue.

Keeping such count in an integer field of the per-runqueue data
structure we have, completely avoids the need for cpumask_weight().
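
A minimal sketch of keeping the count next to the mask. struct runqueue
is reduced to two fields and a single-word mask, and the function and
field names are hypothetical:

```c
/* Reduced per-runqueue data: an active mask plus its cached weight. */
struct runqueue {
    unsigned long active_mask;
    unsigned int nr_cpus;
};

static void rq_add_cpu(struct runqueue *rq, unsigned int cpu)
{
    unsigned long bit = 1UL << cpu;

    /* Only count the CPU if it was not already in the mask. */
    if ( !(rq->active_mask & bit) )
    {
        rq->active_mask |= bit;
        rq->nr_cpus++;
    }
}

static void rq_remove_cpu(struct runqueue *rq, unsigned int cpu)
{
    unsigned long bit = 1UL << cpu;

    if ( rq->active_mask & bit )
    {
        rq->active_mask &= ~bit;
        rq->nr_cpus--;
    }
}
```

Load-balancing code can then read rq->nr_cpus directly instead of
counting the set bits on every pass.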

While there, remove as many other uses of it as we can, even if not in
hot-paths.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agox86: don't include {amd-,}iommu.h from fixmap.h
Jan Beulich [Tue, 30 Jul 2019 10:00:05 +0000 (12:00 +0200)]
x86: don't include {amd-,}iommu.h from fixmap.h

The #include was added by 0700c962ac ("Add AMD IOMMU support into
hypervisor") and I then didn't drop it again in d7f913b8de ("AMD IOMMU:
use ioremap()"); similarly for xen/iommu.h in 99321e0e6c ("VT-d: use
ioremap()"). Avoid needlessly re-building unrelated files when only
IOMMU definitions have changed.

Two #include-s of xen/init.h turn out necessary as replacement.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agodomain: stash xen_domctl_createdomain flags in struct domain
Paul Durrant [Tue, 30 Jul 2019 09:59:01 +0000 (11:59 +0200)]
domain: stash xen_domctl_createdomain flags in struct domain

These are the canonical source of data used to set various other flags.
If they are available directly in struct domain then the other flags are
no longer needed.

This patch simply copies the flags into a new 'options' field in
struct domain. Subsequent patches will do the related clean-up work.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoxen/arm64: head: Introduce print_reg
Julien Grall [Mon, 22 Jul 2019 21:39:28 +0000 (22:39 +0100)]
xen/arm64: head: Introduce print_reg

At the moment, the user should save x30/lr if it cares about it.

Follow-up patches will introduce more uses of putn in places where lr
should be preserved.

Furthermore, any user of putn should also move the value to register x0
if it was stored in a different register.

For convenience, a new macro is introduced to print a given register.
The macro takes care of moving the value to x0 and of preserving lr.

Lastly, the new macro is used to replace all the call sites of putn.
This will simplify rework/review later on.

Note that CurrentEL is now stored in x5 instead of x4 because the latter
will be clobbered by the macro print_reg.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Rework UART initialization on boot CPU
Julien Grall [Mon, 22 Jul 2019 21:39:27 +0000 (22:39 +0100)]
xen/arm64: head: Rework UART initialization on boot CPU

Anything executed after the label common_start can be executed on all
CPUs. However most of the instructions executed between the label
common_start and init_uart are not executed on the boot CPU.

The only instructions executed are those that look up the CPUID so it
can be printed on the console (if earlyprintk is enabled). Printing the
CPUID is not particularly useful on the boot CPU and requires a
conditional branch to bypass unused instructions.

Furthermore, the function init_uart is only called for the boot CPU,
requiring another conditional branch. This makes the code a bit tricky
to follow.

The UART initialization is now moved before the label common_start. This
requires a slightly altered print for the boot CPU and setting the early
UART base address in each of the two paths (boot CPU and secondary
CPUs).

This has the nice effect of removing a couple of conditional branches
from the code.

After this rework, the CPUID is only used at the very beginning of the
secondary CPUs boot path. So there is no need to "reserve" x24 for the
CPUID.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Don't clobber x30/lr in the macro PRINT
Julien Grall [Mon, 22 Jul 2019 21:39:26 +0000 (22:39 +0100)]
xen/arm64: head: Don't clobber x30/lr in the macro PRINT

The current implementation of the macro PRINT will clobber x30/lr. This
means the user should save lr if it cares about it.

Follow-up patches will introduce more uses of PRINT in places where lr
should be preserved. Rather than requiring all the users to preserve
lr, the macro PRINT is modified to save and restore it.

While the comment states x3 will be clobbered, this is not the case. So
PRINT will use x3 to preserve lr.

Lastly, take the opportunity to move the comment on top of PRINT and use
PRINT in init_uart. Both changes will be helpful in a follow-up patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: head: Mark the end of subroutines with ENDPROC
Julien Grall [Mon, 22 Jul 2019 21:39:25 +0000 (22:39 +0100)]
xen/arm64: head: Mark the end of subroutines with ENDPROC

putn() and puts() are two subroutines. Add ENDPROC for the benefit of
static analysis tools and the reader.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: macros: Introduce an assembly macro to alias x30
Julien Grall [Mon, 22 Jul 2019 21:39:24 +0000 (22:39 +0100)]
xen/arm64: macros: Introduce an assembly macro to alias x30

The return address of a function is always stored in x30. For convenience,
introduce a register alias so "lr" can be used in assembly.

This is defined in asm-arm/arm64/macros.h to allow all assembly files
to use it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: SCTLR_EL1 is a 64-bit register on Arm64
Julien Grall [Tue, 23 Jul 2019 21:35:48 +0000 (22:35 +0100)]
xen/arm: SCTLR_EL1 is a 64-bit register on Arm64

On Arm64, system registers are always 64-bit, including SCTLR_EL1.
However, Xen is assuming this is 32-bit because earlier revisions of
Armv8 had the top 32 bits RES0 (see ARM DDI0595.b).

From Armv8.5, some bits in [63:32] will be defined and allowed to be
modified by the guest. So we would effectively reset those bits to 0
after each context switch. This means the guest may not function
correctly afterwards.

Rather than resetting to 0 the bits [63:32], preserve them across
context switch.
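
The effect can be shown with a small sketch. The two helpers below are
hypothetical models of a save/restore pair, not the Xen context switch
code: one keeps the value in a 32-bit field and one in a 64-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Model of saving the register through a 32-bit field: bits [63:32] are lost. */
static uint64_t demo_save_restore_32(uint64_t sctlr)
{
    uint32_t saved = (uint32_t)sctlr; /* truncating store */

    return saved;                     /* restored value has a zero top half */
}

/* Model of saving the register through a 64-bit field: nothing is lost. */
static uint64_t demo_save_restore_64(uint64_t sctlr)
{
    uint64_t saved = sctlr;

    return saved;
}

/* Usage: a value with one bit set in [63:32], as a guest might program. */
static void demo_sctlr_example(void)
{
    const uint64_t guest_value = (UINT64_C(1) << 40) | UINT64_C(0x30d00800);

    assert(demo_save_restore_32(guest_value) == UINT64_C(0x30d00800));
    assert(demo_save_restore_64(guest_value) == guest_value);
}
```

The bit position and the low 32-bit value are arbitrary choices for the
example and do not correspond to specific architectural fields.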

Note that the corresponding register on Arm32 (i.e. SCTLR) is always
32-bit. So we need to use register_t anywhere we deal with
SCTLR{,_EL1}.

The external interface is switched to use 64-bit to allow ABI
compatibility between 32-bit and 64-bit.

[Stefano: fix typo in commit message]

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Volodymyr Babchuk <volodymyr.babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/public: arch-arm: Restrict the visibility of struct vcpu_guest_core_regs
Julien Grall [Tue, 23 Jul 2019 21:35:47 +0000 (22:35 +0100)]
xen/public: arch-arm: Restrict the visibility of struct vcpu_guest_core_regs

Currently, the structure vcpu_guest_core_regs is part of the public API.
This implies that any change in the structure should be backward
compatible.

However, the structure is only needed by the tools and Xen. It is also
not expected to be ever used outside of that context. So we could save
ourselves some headache by only declaring the structure for Xen and
tools.

[Stefano: improve comment code style]

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: irq: Don't use _IRQ_PENDING when handling host interrupt
Julien Grall [Sun, 2 Jun 2019 10:26:14 +0000 (11:26 +0100)]
xen/arm: irq: Don't use _IRQ_PENDING when handling host interrupt

While SPIs are shared between CPUs, it is not possible to receive the
same interrupt on a different CPU while the interrupt is in the active
state.

For host interrupts (i.e. routed to Xen), the deactivation of the
interrupt is done at the end of the handling. This can alternatively be
done outside of the handler by calling gic_set_active_state().

At the moment, gic_set_active_state() is only called by the vGIC for
interrupts routed to the guest. It is hard to find a reason for Xen to
directly play with the active state for interrupts routed to Xen.

To simplify the handling of host interrupts, gic_set_active_state() is
now restricted to interrupts routed to the guest.

This means the _IRQ_PENDING logic is now unnecessary on Arm as the same
interrupt can never come up while in the loop and nobody should play
with the flag behind our back.

[Stefano: improve in-code comment]

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/public: arch-arm: Use xen_mk_ullong instead of suffixing value with ULL
Julien Grall [Mon, 3 Jun 2019 16:08:29 +0000 (17:08 +0100)]
xen/public: arch-arm: Use xen_mk_ullong instead of suffixing value with ULL

There are a few places in include/public/arch-arm.h that are still
suffixing immediates with ULL instead of using xen_mk_ullong.

The latter allows a consumer to easily tweak the header if ULL is not
supported.

So switch the remaining users of ULL to xen_mk_ullong.
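
To illustrate why a wrapper macro helps, here is a sketch using a
hypothetical demo_mk_ullong macro and made-up constants; it is not the
definition from the Xen public headers. The #ifndef guard models a
consumer that supplies its own spelling for 64-bit constants.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical macro in the spirit of xen_mk_ullong: one place decides
 * how a 64-bit constant is spelled. A consumer whose toolchain lacks the
 * ULL suffix could predefine this macro instead of editing each constant.
 */
#ifndef demo_mk_ullong
#define demo_mk_ullong(x) x ## ULL
#endif

/* Made-up layout constants written through the macro. */
#define DEMO_RAM_BASE demo_mk_ullong(0x40000000)
#define DEMO_RAM_SIZE demo_mk_ullong(0xc0000000)

/* End of the made-up RAM bank; needs 64-bit arithmetic to avoid wrapping. */
static uint64_t demo_ram_end(void)
{
    assert(DEMO_RAM_BASE <= UINT64_MAX - DEMO_RAM_SIZE);
    return DEMO_RAM_BASE + DEMO_RAM_SIZE;
}
```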

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen: don't longjmp() after domain_crash() in check_wakeup_from_wait()
Juergen Gross [Mon, 29 Jul 2019 04:36:24 +0000 (06:36 +0200)]
xen: don't longjmp() after domain_crash() in check_wakeup_from_wait()

Continuing on the stack saved by __prepare_to_wait() on the wrong cpu
is rather dangerous.

Instead of doing so, just call the scheduler again, as already happens
in the similar case in __prepare_to_wait() when doing the setjmp() would
be wrong.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/arm: cpuerrata: Align a virtual address before unmap
Andrii Anisov [Thu, 18 Jul 2019 13:22:20 +0000 (16:22 +0300)]
xen/arm: cpuerrata: Align a virtual address before unmap

After changes introduced by 9cc0618eb0 "xen/arm: mm: Sanity check any
update of Xen page tables" we are able to vmap/vunmap page aligned
addresses only.

So if we add a page address remainder to the mapped virtual address,
we have to mask it out before unmapping.
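
The masking step can be sketched as follows, assuming 4 KiB pages. The
DEMO_* macros and both helpers are hypothetical and only model the
address arithmetic, not the vmap/vunmap calls themselves.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE ((uintptr_t)4096)       /* assumed 4 KiB pages */
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1)) /* clears the in-page offset */

/* Pointer the caller uses: page-aligned mapping plus the in-page offset. */
static uintptr_t demo_add_offset(uintptr_t mapped_va, uintptr_t paddr)
{
    assert((mapped_va & ~DEMO_PAGE_MASK) == 0); /* mapping is page aligned */
    return mapped_va + (paddr & ~DEMO_PAGE_MASK);
}

/* Address to hand back when unmapping: the offset is masked out again. */
static uintptr_t demo_unmap_addr(uintptr_t va)
{
    return va & DEMO_PAGE_MASK;
}
```

For example, a mapping at 0x200000 for a physical address ending in
0x234 yields the working pointer 0x200234, and masking gives back
0x200000 for the unmap.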

Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agotools: ipxe: update for fixing build with GCC9
Dario Faggioli [Fri, 26 Jul 2019 22:13:49 +0000 (00:13 +0200)]
tools: ipxe: update for fixing build with GCC9

Building with GCC9 (on openSUSE Tumbleweed) generates a lot of errors of
the "taking address of packed member of ... may result in an unaligned
pointer value" kind.

Updating to upstream commit 1dd56dbd11082 ("[build] Workaround compilation
error with gcc 9.1") seems to fix the problem.

For more info, see:

https://git.ipxe.org/ipxe.git/commit/1dd56dbd11082fb622c2ed21cfaced4f47d798a6

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agotools/libxl: Add iothread support for COLO
Zhang Chen [Fri, 26 Jul 2019 16:27:23 +0000 (00:27 +0800)]
tools/libxl: Add iothread support for COLO

Xen COLO and KVM COLO share a lot of code in QEMU.
The colo-compare object in QEMU has required an 'iothread' property
since QEMU 2.11.

Details:
https://wiki.qemu.org/Features/COLO

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
5 years agoRe-instate "xen/arm: fix mask calculation in pdx_init_mask"
Stefano Stabellini [Fri, 21 Jun 2019 20:20:25 +0000 (13:20 -0700)]
Re-instate "xen/arm: fix mask calculation in pdx_init_mask"

The commit 11911563610786615c2b3a01cdcaaf09a6f9e38d "xen/arm: fix mask
calculation in pdx_init_mask" was correct, but exposed a bug in
maddr_to_virt(). The bug in maddr_to_virt() was fixed by
612d476e74a314be514ee6a9744eea8db09d32e5 "xen/arm64: Correctly compute
the virtual address in maddr_to_virt()", so we can re-instate the
first commit now.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
5 years agoxen/arm64: Correctly compute the virtual address in maddr_to_virt()
Julien Grall [Thu, 18 Jul 2019 11:57:14 +0000 (12:57 +0100)]
xen/arm64: Correctly compute the virtual address in maddr_to_virt()

The helper maddr_to_virt() is used to translate a machine address to a
virtual address. To save some valuable address space, some part of the
machine address may be compressed.

In theory the PDX code is free to compress any bits, so there is no
guarantee that the machine index computed will always be greater than
xenheap_mfn_start. This would result in returning a virtual address that
is not part of the direct map and trigger a crash, at least on debug
builds, later on because of the check in virt_to_page().

A recently reverted patch (see 1191156361 "xen/arm: fix mask calculation
in pdx_init_mask") allowed the PDX to compress more bits and triggered a
crash on the AMD Seattle platform.

Avoid the crash by keeping track of the base PDX for the xenheap and
using it for computing the virtual address.

Note that virt_to_maddr() does not need to have similar modification as
it is using the hardware to translate the virtual address to a machine
address.
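
The arithmetic behind the fix can be sketched as below. Everything here
is an assumption made for illustration: the compression parameters, the
direct map base, and the demo_* helpers are hypothetical and much
simpler than the real PDX code. The point is that the offset into the
direct map is computed from a compressed index minus a compressed base,
never from a compressed index minus an uncompressed frame number.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT     12
#define DEMO_DIRECTMAP_BASE UINT64_C(0x0000800000000000) /* hypothetical */

/* Hypothetical compression: frame number bits [27:20] are always zero. */
#define DEMO_BOTTOM_BITS 20
#define DEMO_HOLE_SHIFT  8

/* Squeeze the always-zero hole out of a frame number. */
static uint64_t demo_pfn_to_pdx(uint64_t pfn)
{
    const uint64_t bottom_mask = (UINT64_C(1) << DEMO_BOTTOM_BITS) - 1;
    const uint64_t top_mask =
        ~((UINT64_C(1) << (DEMO_BOTTOM_BITS + DEMO_HOLE_SHIFT)) - 1);

    return (pfn & bottom_mask) | ((pfn & top_mask) >> DEMO_HOLE_SHIFT);
}

/* Translate a machine address using the compressed base of the heap. */
static uint64_t demo_maddr_to_virt(uint64_t ma, uint64_t heap_base_pdx)
{
    const uint64_t offset_mask = (UINT64_C(1) << DEMO_PAGE_SHIFT) - 1;
    uint64_t pdx = demo_pfn_to_pdx(ma >> DEMO_PAGE_SHIFT);

    assert(pdx >= heap_base_pdx); /* address must lie inside the heap */
    return DEMO_DIRECTMAP_BASE +
           ((pdx - heap_base_pdx) << DEMO_PAGE_SHIFT) +
           (ma & offset_mask);
}
```

In this model, RAM starting at machine address 1 << 40 has frame number
0x10000000 but compressed index 0x100000, so subtracting the frame
number from a compressed index would underflow, while subtracting the
compressed base gives the expected offset.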

Take the opportunity to fix the ASSERT() as the direct map base address
corresponds to the start of the RAM (this is not always 0).

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agosched: refactor code around vcpu_deassign() in null scheduler
Dario Faggioli [Fri, 26 Jul 2019 08:46:38 +0000 (10:46 +0200)]
sched: refactor code around vcpu_deassign() in null scheduler

vcpu_deassign() is called only once (in _vcpu_remove()).

Let's consolidate the two functions into one.

No functional change intended.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agoxen: merge temporary vcpu pinning scenarios
Juergen Gross [Fri, 26 Jul 2019 08:45:49 +0000 (10:45 +0200)]
xen: merge temporary vcpu pinning scenarios

Today there are two scenarios which are pinning vcpus temporarily to
a single physical cpu:

- wait_event() handling
- SCHEDOP_pin_override handling

Each of those cases is handled independently today using its own
temporary cpumask to save the old affinity settings.

The two cases can be combined as the first case will only pin a vcpu to
the physical cpu it is already running on, while SCHEDOP_pin_override is
allowed to fail.

So merge the two temporary pinning scenarios by only using one cpumask
and a per-vcpu bitmask for specifying which of the scenarios is
currently active (they are both allowed to be active for the same vcpu).
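
A minimal sketch of the merged scheme, with hypothetical names
throughout (demo_vcpu, DEMO_PIN_OVERRIDE, DEMO_PIN_WAIT, demo_pin,
demo_unpin) and the affinity simplified to a 64-bit mask: one saved
affinity is shared, and a small bitmask records which scenarios are
currently active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reason bits, one per temporary pinning scenario. */
#define DEMO_PIN_OVERRIDE 0x1u /* pin override request */
#define DEMO_PIN_WAIT     0x2u /* wait_event() handling */

struct demo_vcpu {
    uint64_t affinity;        /* current affinity as a CPU bitmask */
    uint64_t saved_affinity;  /* single saved copy shared by both reasons */
    unsigned int pin_reasons; /* scenarios currently active */
    unsigned int pinned_cpu;  /* valid while pin_reasons is non-zero */
};

/* Pin for 'reason'; fails if that would need a second, different CPU. */
static bool demo_pin(struct demo_vcpu *v, unsigned int cpu,
                     unsigned int reason)
{
    assert(cpu < 64);

    if ( v->pin_reasons & reason )
        return false;            /* this scenario is already active */

    if ( v->pin_reasons )
    {
        if ( v->pinned_cpu != cpu )
            return false;        /* other scenario pinned elsewhere */
    }
    else
    {
        v->saved_affinity = v->affinity; /* save once, on the first pin */
        v->affinity = UINT64_C(1) << cpu;
        v->pinned_cpu = cpu;
    }

    v->pin_reasons |= reason;
    return true;
}

/* Drop 'reason'; restore the old affinity when no scenario remains. */
static void demo_unpin(struct demo_vcpu *v, unsigned int reason)
{
    if ( !(v->pin_reasons & reason) )
        return;

    v->pin_reasons &= ~reason;
    if ( !v->pin_reasons )
        v->affinity = v->saved_affinity;
}
```

The old affinity is restored only when the last active scenario is
dropped, which is what lets a single saved mask serve both cases.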

Note that we don't need to call domain_update_node_affinity() as we
are only pinning for a brief period of time.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>