]> xenbits.xensource.com Git - people/larsk/xen.git/log
people/larsk/xen.git
5 years agoxen/sched: let credit scheduler control its timer all alone staging
Juergen Gross [Mon, 7 Oct 2019 06:35:19 +0000 (08:35 +0200)]
xen/sched: let credit scheduler control its timer all alone

The credit scheduler is the only scheduler with tick_suspend and
tick_resume callbacks. Today those callbacks are invoked without being
guarded by the scheduler lock which is critical when at the same the
cpu those callbacks are active is being moved to or from a cpupool.

Crashes like the following are possible due to that race:

(XEN) ----[ Xen-4.13.0-8.0.12-d  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    79
(XEN) RIP:    e008:[<ffff82d0802467dc>] set_timer+0x39/0x1f7
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
<snip>
(XEN) Xen call trace:
(XEN)    [<ffff82d0802467dc>] set_timer+0x39/0x1f7
(XEN)    [<ffff82d08022c1f4>]
sched_credit.c#csched_tick_resume+0x54/0x59
(XEN)    [<ffff82d080241dfe>] sched_tick_resume+0x67/0x86
(XEN)    [<ffff82d0802eda52>] mwait-idle.c#mwait_idle+0x32b/0x357
(XEN)    [<ffff82d08027939e>] domain.c#idle_loop+0xa6/0xc2
(XEN)
(XEN) Pagetable walk from 0000000000000048:
(XEN)  L4[0x000] = 00000082cfb9c063 ffffffffffffffff
(XEN)  L3[0x000] = 00000082cfb9b063 ffffffffffffffff
(XEN)  L2[0x000] = 00000082cfb9a063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 79:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000048
(XEN) ****************************************

The callbacks are used when the cpu is going to or coming from idle in
order to allow higher C-states.

The credit scheduler knows when it is going to schedule an idle
scheduling unit or another one after idle, so it can easily stop or
resume the timer itself removing the need to do so via the callback.
As this timer handling is done in the main scheduling function the
scheduler lock is still held, so the race with cpupool operations can
no longer occur. Note that calling the callbacks from schedule_cpu_rm()
and schedule_cpu_add() is no longer needed, as the transitions to and
from idle in the cpupool with credit active will automatically occur
and do the right thing.

With the last user of the callbacks gone those can be removed.

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/xsm: flask: Check xmalloc_array() return in security_sid_to_context()
Julien Grall [Fri, 4 Oct 2019 16:53:26 +0000 (17:53 +0100)]
xen/xsm: flask: Check xmalloc_array() return in security_sid_to_context()

xmalloc_array() may return NULL if there are memory. Rather than trying
to deference it directly, we should check the return value first.

Coverity-ID: 1381852
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/xsm: flask: Prevent NULL deference in flask_assign_{, dt}device()
Julien Grall [Fri, 4 Oct 2019 16:32:49 +0000 (17:32 +0100)]
xen/xsm: flask: Prevent NULL deference in flask_assign_{, dt}device()

flask_assign_{, dt}device() may be used to check whether you can test if
a device is assigned. In this case, the domain will be NULL.

However, flask_iommu_resource_use_perm() will be called and may end up
to deference a NULL pointer. This can be prevented by moving the call
after we check the validity for the domain pointer.

Coverity-ID: 1486741
Fixes: 71e617a6b8 ('use is_iommu_enabled() where appropriate...')
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/Kconfig: Invert the defaults for CONFIG_{PVH_GUEST,PV_SHIM}
Andrew Cooper [Tue, 1 Oct 2019 16:27:49 +0000 (17:27 +0100)]
x86/Kconfig: Invert the defaults for CONFIG_{PVH_GUEST,PV_SHIM}

This is a minor UI change, but users which have elected to enable
XEN_GUEST (which still defaults to no) will definitely need one of these
options, and will typically want both.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_ARRAY
Andrew Cooper [Thu, 31 Jan 2019 18:01:16 +0000 (18:01 +0000)]
xen/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_ARRAY

There are legitimate circumstance where array hardening is not wanted or
needed.  Allow it to be turned off.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/spec-ctrl: Annotate remaining model names
Andrew Cooper [Thu, 3 Oct 2019 14:04:03 +0000 (15:04 +0100)]
x86/spec-ctrl: Annotate remaining model names

The names in retpoline_safe() are copied from should_use_eager_fpu().  The
names in mds_calculations() come partly from Linux's intel-family.h, and
partly from conversations with Intel.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/arm: add dom0-less device assignment info to docs
Stefano Stabellini [Thu, 3 Oct 2019 17:35:41 +0000 (10:35 -0700)]
xen/arm: add dom0-less device assignment info to docs

Add info about the SPI used for the virtual pl011.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: introduce nr_spis
Stefano Stabellini [Thu, 3 Oct 2019 17:34:15 +0000 (10:34 -0700)]
xen/arm: introduce nr_spis

We don't have a clear way to know how many virtual SPIs we need for the
dom0-less domains. Introduce a new option under xen,domain to specify
the number of SPIs to allocate for a domain.

The property is optional. When absent, we'll use the physical number of
GIC lines for dom0-less domains, or GUEST_VPL011_SPI+1 if vpl011 is
requested, whichever is greater.

Remove the old setting of nr_spis based on the presence of the vpl011.

The implication of this change is that without nr_spis dom0less domains
get the same amount of SPI allocated as dom0, regardless of how many
physical devices they have assigned, and regardless of whether they have
a virtual pl011 (which also needs an emulated SPI). This is done because
the SPIs allocation needs to be done before parsing any passthrough
information, so we have to account for any potential physical SPI
assigned to the domain.

When nr_spis is present, the domain gets exactly nr_spis allocated SPIs.
If the number is too low, it might not be enough for the devices
assigned it to it. If the number is less than GUEST_VPL011_SPI, the
virtual pl011 won't work.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: handle "multiboot,device-tree" compatible nodes
Stefano Stabellini [Thu, 3 Oct 2019 17:34:15 +0000 (10:34 -0700)]
xen/arm: handle "multiboot,device-tree" compatible nodes

Detect "multiboot,device-tree" compatible nodes. Add them to the bootmod
array as BOOTMOD_GUEST_DTB.  In kernel_probe, find the right
BOOTMOD_GUEST_DTB and store a pointer to it in dtb_bootmodule.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: assign devices to boot domains
Stefano Stabellini [Thu, 3 Oct 2019 17:34:05 +0000 (10:34 -0700)]
xen/arm: assign devices to boot domains

Scan the user provided dtb fragment at boot. For each device node, map
memory to guests, and route interrupts and setup the iommu.

The memory region to remap is specified by the "xen,reg" property.

The iommu is setup by passing the node of the device to assign on the
host device tree. The path is specified in the device tree fragment as
the "xen,path" string property.

The interrupts are remapped based on the information from the
corresponding node on the host device tree. Call
handle_device_interrupts to remap interrupts. Interrupts related device
tree properties are copied from the device tree fragment, same as all
the other properties.

Require both xen,reg and xen,path to be present, unless
xen,force-assign-without-iommu is also set. In that case, tolerate a
missing xen,path, also tolerate iommu setup failure for the passthrough
device.

Also set add the new flag XEN_DOMCTL_CDF_iommu so that dom0less domU
can use the IOMMU if a partial dtb is specified.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: copy dtb fragment to guest dtb
Stefano Stabellini [Mon, 30 Sep 2019 23:13:37 +0000 (16:13 -0700)]
xen/arm: copy dtb fragment to guest dtb

Read the dtb fragment corresponding to a passthrough device from memory
at the location referred to by the "multiboot,device-tree" compatible
node.

Add a new field named dtb_bootmodule to struct kernel_info to keep track
of the dtb fragment location.

Copy the fragment to the guest dtb (only /aliases and /passthrough).

Set kinfo->phandle_gic based on the phandle of the special "/gic"
node in the device tree fragment. "/gic" is a dummy node in the dtb
fragment that represents the gic interrupt controller. Other properties
in the dtb fragment might refer to it (for instance interrupt-parent of
a device node). We reuse the phandle of "/gic" from the dtb fragment as
the phandle of the full GIC node that will be created for the guest
device tree. That way, when we copy properties from the device tree
fragment to the domU device tree the links remain unbroken.

scan_passthrough_prop is introduced here and not used in this patch but
it will be used by later patches.

Some of the code below is taken from tools/libxl/libxl_arm.c. Note that
it is OK to take LGPL 2.1 code and including it into a GPLv2 code base.
The result is GPLv2 code.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
----
Changes in v6:
- code style
- in-code comment
- commit message improvements

Changes in v5:
- code style
- in-code comment
- remove depth parameter from scan_pfdt_node
- for instead of loop in domain_handle_dtb_bootmodule
- move "gic" check to domain_handle_dtb_bootmodule
- add check_partial_fdt
- use DT_ROOT_NODE_ADDR/SIZE_CELLS_DEFAULT
- add scan_passthrough_prop parameter, set it to false for "/aliases"

Changes in v4:
- use recursion in the implementation
- rename handle_properties to handle_prop_pfdt
- rename scan_pt_node to scan_pfdt_node
- pass kinfo to handle_properties
- use uint32_t instead of u32
- rename r to res
- add "passthrough" and "aliases" check
- add a name == NULL check
- code style
- move DTB fragment scanning earlier, before DomU GIC node creation
- set guest_phandle_gic based on "/gic"
- in-code comment

Changes in v3:
- switch to using device_tree_for_each_node for the copy

Changes in v2:
- add a note about the code coming from libxl in the commit message
- copy /aliases
- code style

5 years agoxen/arm: introduce kinfo->phandle_gic
Stefano Stabellini [Mon, 30 Sep 2019 23:13:37 +0000 (16:13 -0700)]
xen/arm: introduce kinfo->phandle_gic

Instead of always hard-coding the GIC phandle (GUEST_PHANDLE_GIC), store
it in a variable under kinfo. This way it can be dynamically chosen per
domain. Remove the fdt pointer argument to the make_*_domU_node
functions and oass a struct kernel_info * instead. The fdt pointer can
be accessed from kinfo->fdt. Remove the struct domain *d parameter to
the make_*_domU_node functions because it becomes unused.

Initialize phandle_gic to GUEST_PHANDLE_GIC at the beginning of
prepare_dtb_domU for DomUs. Later patches will change the value of
phandle_gic depending on user provided information.

For Dom0, initialize phandle_gic to dt_interrupt_controller->phandle
(current value) at the beginning of prepare_dtb.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: export device_tree_get_reg and device_tree_get_u32
Stefano Stabellini [Mon, 30 Sep 2019 23:13:37 +0000 (16:13 -0700)]
xen/arm: export device_tree_get_reg and device_tree_get_u32

They'll be used in later patches.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: introduce handle_device_interrupts
Stefano Stabellini [Mon, 30 Sep 2019 23:13:37 +0000 (16:13 -0700)]
xen/arm: introduce handle_device_interrupts

Move the interrupt handling code out of handle_device to a new function
so that it can be reused for dom0less VMs (it will be used in later
patches).

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: boot with device trees with "mmu-masters" and "iommus"
Stefano Stabellini [Mon, 30 Sep 2019 20:56:18 +0000 (13:56 -0700)]
xen/arm: boot with device trees with "mmu-masters" and "iommus"

Some Device Trees may expose both legacy SMMU and generic IOMMU bindings
together. However, the SMMU driver in Xen is only supporting the legacy
SMMU bindings, leading to fatal initialization errors at boot time.

This patch fixes the booting problem by adding a check to
iommu_add_dt_device: if the Xen driver doesn't support the new generic
bindings, and the device is behind an IOMMU, do not return error. The
following iommu_assign_dt_device should succeed.

This check will become superfluous, hence removable, once the Xen SMMU
driver gets support for the generic IOMMU bindings.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: don't try to manipulate json config for stubdomain
Marek Marczykowski-Górecki [Sat, 28 Sep 2019 14:20:37 +0000 (15:20 +0100)]
libxl: don't try to manipulate json config for stubdomain

Stubdomain do not have it's own config file - its configuration is
derived from target domains. Do not try to manipulate it when attaching
PCI device.

This bug prevented starting HVM with stubdomain and PCI passthrough
device attached.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: attach PCI device to qemu only after setting pciback/pcifront
Marek Marczykowski-Górecki [Tue, 1 Oct 2019 04:24:19 +0000 (05:24 +0100)]
libxl: attach PCI device to qemu only after setting pciback/pcifront

When qemu is running in stubdomain, handling "pci-ins" command will fail
if pcifront is not initialized already. Fix this by sending such command
only after confirming that pciback/front is running.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
Marek Marczykowski-Górecki [Sat, 28 Sep 2019 14:20:35 +0000 (15:20 +0100)]
libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

HVM domains use IOMMU and device model assistance for communicating with
PCI devices, xen-pcifront/pciback isn't directly needed by HVM domain.
But pciback serve also second function - it reset the device when it is
deassigned from the guest and for this reason pciback needs to be used
with HVM domain too.
When HVM domain has device model in stubdomain, attaching xen-pciback to
the target domain itself may prevent attaching xen-pciback to the
(PV) stubdomain, effectively breaking PCI passthrough.

Fix this by attaching pciback only to one domain: if PV stubdomain is in
use, let it be stubdomain (the commit prevents attaching device to target
HVM in this case); otherwise, attach it to the target domain.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: fix cold plugged PCI device with stubdomain
Marek Marczykowski-Górecki [Sat, 28 Sep 2019 14:20:34 +0000 (15:20 +0100)]
libxl: fix cold plugged PCI device with stubdomain

When libxl__device_pci_add() is called, stubdomain is already running,
even when still constructing the target domain. Previously, do_pci_add()
was called with 'starting' hardcoded to false, but now do_pci_add() shares
'starting' flag in pci_add_state for both target domain and stubdomain.

Fix this by resetting (local) 'starting' to false in pci_add_dm_done()
(previously part of do_pci_add()) when handling stubdomain, regardless
of pas->starting value.

Fixes: 11db56f9a6 (libxl_pci: Use libxl__ao_device with libxl__device_pci_add)
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86emul: adjust MOVSXD source operand handling
Jan Beulich [Fri, 4 Oct 2019 15:57:03 +0000 (17:57 +0200)]
x86emul: adjust MOVSXD source operand handling

XED commit 1b2fd94425 ("Update MOVSXD to modern behavior") points out
that as of SDM rev 064 MOVSXD is specified to read only 16 bits from
memory (or register) when used without REX.W and with operand size
override. Since the upper 16 bits of the value read won't be used
anyway in this case, make the emulation uniformly follow this more
compatible behavior when not emulating an AMD-like CPU, at the risk
of missing an exception when emulating on/for older hardware (the
boundary at SandyBridge noted in said commit looks questionable - I've
observed the "new" behavior also on Westmere, and a discussion there
lead to Mark finding that even Merom has this behavior already).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl_pci: Fix guest shutdown with PCI PT attached
Anthony PERARD [Mon, 30 Sep 2019 16:39:40 +0000 (17:39 +0100)]
libxl_pci: Fix guest shutdown with PCI PT attached

Before the problematic commit, libxl used to ignore error when
destroying (force == true) a passthrough device. If the DM failed to
detach the pci device within the allowed time, the timed out error
raised skip part of pci_remove_*, but also raise the error up to the
caller of libxl__device_pci_destroy_all, libxl__destroy_domid, and
thus the destruction of the domain fails.

When a *pci_destroy* function is called (so we have force=true), error
should mostly be ignored. If the DM didn't confirmed that the device
is removed, we will print a warning and keep going if force=true.
The patch reorder the functions so that pci_remove_timeout() calls
pci_remove_detatched() like it's done when DM calls are successful.

We also clean the QMP states and associated timeouts earlier, as soon
as they are not needed anymore.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Fixes: fae4880c45fe015e567afa223f78bf17a6d98e1b
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl_pci: Don't ignore PCI PT error at guest creation
Anthony PERARD [Mon, 30 Sep 2019 15:35:52 +0000 (16:35 +0100)]
libxl_pci: Don't ignore PCI PT error at guest creation

Fixes: 11db56f9a6291
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agodocs: add "sched-gran" boot parameter documentation
Juergen Gross [Wed, 2 Oct 2019 07:27:45 +0000 (09:27 +0200)]
docs: add "sched-gran" boot parameter documentation

Add documentation for the new "sched-gran" hypervisor boot parameter.

Signed-off-by: Juergen Gross <jgross@suse.com>
5 years agoxen/sched: add scheduling granularity enum
Juergen Gross [Wed, 2 Oct 2019 07:27:44 +0000 (09:27 +0200)]
xen/sched: add scheduling granularity enum

Add a scheduling granularity enum ("cpu", "core", "socket") for
specification of the scheduling granularity. Initially it is set to
"cpu", this can be modified by the new boot parameter (x86 only)
"sched-gran".

According to the selected granularity sched_granularity is set after
all cpus are online.

A test is added for all sched resources holding the same number of
cpus. Fall back to core- or cpu-scheduling in that case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: disable scheduling when entering ACPI deep sleep states
Juergen Gross [Wed, 2 Oct 2019 07:27:43 +0000 (09:27 +0200)]
xen/sched: disable scheduling when entering ACPI deep sleep states

When entering deep sleep states all domains are paused resulting in
all cpus only running idle vcpus. This enables us to stop scheduling
completely in order to avoid synchronization problems with core
scheduling when individual cpus are offlined.

Disabling the scheduler is done by replacing the softirq handler
with a dummy scheduling routine only enabling tasklets to run.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: support core scheduling for moving cpus to/from cpupools
Juergen Gross [Wed, 2 Oct 2019 07:27:42 +0000 (09:27 +0200)]
xen/sched: support core scheduling for moving cpus to/from cpupools

With core scheduling active it is necessary to move multiple cpus at
the same time to or from a cpupool in order to avoid split scheduling
resources in between.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: support differing granularity in schedule_cpu_[add/rm]()
Juergen Gross [Wed, 2 Oct 2019 07:27:41 +0000 (09:27 +0200)]
xen/sched: support differing granularity in schedule_cpu_[add/rm]()

With core scheduling active schedule_cpu_[add/rm]() has to cope with
different scheduling granularity: a cpu not in any cpupool is subject
to granularity 1 (cpu scheduling), while a cpu in a cpupool might be
in a scheduling resource with more than one cpu.

Handle that by having arrays of old/new pdata and vdata and loop over
those where appropriate.

Additionally the scheduling resource(s) must either be merged or
split.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: support multiple cpus per scheduling resource
Juergen Gross [Wed, 2 Oct 2019 07:27:40 +0000 (09:27 +0200)]
xen/sched: support multiple cpus per scheduling resource

Prepare supporting multiple cpus per scheduling resource by allocating
the cpumask per resource dynamically.

Modify sched_res_mask to have only one bit per scheduling resource set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: protect scheduling resource via rcu
Juergen Gross [Wed, 2 Oct 2019 07:27:39 +0000 (09:27 +0200)]
xen/sched: protect scheduling resource via rcu

In order to be able to move cpus to cpupools with core scheduling
active it is mandatory to merge multiple cpus into one scheduling
resource or to split a scheduling resource with multiple cpus in it
into multiple scheduling resources. This in turn requires to modify
the cpu <-> scheduling resource relation. In order to be able to free
unused resources protect struct sched_resource via RCU. This ensures
there are no users left when freeing such a resource.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: split schedule_cpu_switch()
Juergen Gross [Wed, 2 Oct 2019 07:27:38 +0000 (09:27 +0200)]
xen/sched: split schedule_cpu_switch()

Instead of letting schedule_cpu_switch() handle moving cpus from and
to cpupools, split it into schedule_cpu_add() and schedule_cpu_rm().

This will allow us to drop allocating/freeing scheduler data for free
cpus as the idle scheduler doesn't need such data.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: prepare per-cpupool scheduling granularity
Juergen Gross [Wed, 2 Oct 2019 07:27:37 +0000 (09:27 +0200)]
xen/sched: prepare per-cpupool scheduling granularity

On- and offlining cpus with core scheduling is rather complicated as
the cpus are taken on- or offline one by one, but scheduling wants them
rather to be handled per core.

As the future plan is to be able to select scheduling granularity per
cpupool prepare that by storing the granularity in struct
sched_resource (we need it there for free cpus which are not
associated to any cpupool). Free cpus will always use granularity 1.

Store the selected granularity option (cpu, core or socket) in the
cpupool , as we will need it to select the appropriate cpu mask when
populating the cpupool with cpus.

This will make on- and offlining of cpus much easier and avoids
writing code which would needed to be thrown away later.

Move the granularity related variables to cpupool.c as they are now
used form there only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: reject switching smt on/off with core scheduling active
Juergen Gross [Wed, 2 Oct 2019 07:27:36 +0000 (09:27 +0200)]
xen/sched: reject switching smt on/off with core scheduling active

When core or socket scheduling are active enabling or disabling smt is
not possible as that would require a major host reconfiguration.

Add a bool sched_disable_smt_switching which will be set for core or
socket scheduling.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: move per-cpu variable cpupool to struct sched_resource
Juergen Gross [Wed, 2 Oct 2019 07:27:35 +0000 (09:27 +0200)]
xen/sched: move per-cpu variable cpupool to struct sched_resource

Having a pointer to struct cpupool in struct sched_resource instead
of per cpu is enough.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: move per-cpu variable scheduler to struct sched_resource
Juergen Gross [Wed, 2 Oct 2019 07:27:34 +0000 (09:27 +0200)]
xen/sched: move per-cpu variable scheduler to struct sched_resource

Having a pointer to struct scheduler in struct sched_resource instead
of per cpu is enough.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware
Juergen Gross [Wed, 2 Oct 2019 14:43:30 +0000 (16:43 +0200)]
xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware

vcpu_wake() and vcpu_sleep() need to be made core scheduling aware:
they might need to switch a single vcpu of an already scheduled unit
between running and not running.

Especially when vcpu_sleep() for a vcpu is being called by a vcpu of
the same scheduling unit special care must be taken in order to avoid
a deadlock: the vcpu to be put asleep must be forced through a
context switch without doing so for the calling vcpu. For this
purpose add a vcpu flag handled in sched_slave() and in
sched_wait_rendezvous_in() allowing a vcpu of the currently running
unit to switch state at a higher priority than a normal schedule
event.

Use the same mechanism when waking up a vcpu of a currently active
unit.

While at it make vcpu_sleep_nosync_locked() static as it is used in
schedule.c only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/sched: add fall back to idle vcpu when scheduling unit
Juergen Gross [Wed, 2 Oct 2019 07:27:32 +0000 (09:27 +0200)]
xen/sched: add fall back to idle vcpu when scheduling unit

When scheduling an unit with multiple vcpus there is no guarantee all
vcpus are available (e.g. above maxvcpus or vcpu offline). Fall back to
idle vcpu of the current cpu in that case. This requires to store the
correct schedule_unit pointer in the idle vcpu as long as it used as
fallback vcpu.

In order to modify the runstates of the correct vcpus when switching
schedule units merge sched_unit_runstate_change() into
sched_switch_units() and loop over the affected physical cpus instead
of the unit's vcpus. This in turn requires an access function to the
current variable of other cpus.

Today context_saved() is called in case previous and next vcpus differ
when doing a context switch. With an idle vcpu being capable to be a
substitute for an offline vcpu this is problematic when switching to
an idle scheduling unit. An idle previous vcpu leaves us in doubt which
schedule unit was active previously, so save the previous unit pointer
in the per-schedule resource area. If it is NULL the unit has not
changed and we don't have to set the previous unit to be not running.

When running an idle vcpu in a non-idle scheduling unit use a specific
guest idle loop not performing any non-softirq tasklets and
livepatching in order to avoid populating the cpu caches with memory
used by other domains (as far as possible). Softirqs are considered to
be save.

In order to avoid livepatching when going to guest idle another
variant of reset_stack_and_jump() not calling check_for_livepatch_work
is needed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/sched: add a percpu resource index
Juergen Gross [Wed, 2 Oct 2019 07:27:31 +0000 (09:27 +0200)]
xen/sched: add a percpu resource index

Add a percpu variable holding the index of the cpu in the current
sched_resource structure. This index is used to get the correct vcpu
of a sched_unit on a specific cpu.

For now this index will be zero for all cpus, but with core scheduling
it will be possible to have higher values, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: support allocating multiple vcpus into one sched unit
Juergen Gross [Wed, 2 Oct 2019 07:27:30 +0000 (09:27 +0200)]
xen/sched: support allocating multiple vcpus into one sched unit

With a scheduling granularity greater than 1 multiple vcpus share the
same struct sched_unit. Support that.

Setting the initial processor must be done carefully: we can't use
sched_set_res() as that relies on for_each_sched_unit_vcpu() which in
turn needs the vcpu already as a member of the domain's vcpu linked
list, which isn't the case.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: modify cpupool_domain_cpumask() to be an unit mask
Juergen Gross [Wed, 2 Oct 2019 07:27:29 +0000 (09:27 +0200)]
xen/sched: modify cpupool_domain_cpumask() to be an unit mask

cpupool_domain_cpumask() is used by scheduling to select cpus or to
iterate over cpus. In order to support scheduling units spanning
multiple cpus rename cpupool_domain_cpumask() to
cpupool_domain_master_cpumask() and let it return a cpumask with only
one bit set per scheduling resource.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: add support for multiple vcpus per sched unit where missing
Juergen Gross [Wed, 2 Oct 2019 07:27:28 +0000 (09:27 +0200)]
xen/sched: add support for multiple vcpus per sched unit where missing

In several places there is support for multiple vcpus per sched unit
missing. Add that missing support (with the exception of initial
allocation) and missing helpers for that.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/sched: introduce unit_runnable_state()
Juergen Gross [Wed, 2 Oct 2019 07:27:27 +0000 (09:27 +0200)]
xen/sched: introduce unit_runnable_state()

Today the vcpu runstate of a new scheduled vcpu is always set to
"running" even if at that time vcpu_runnable() is already returning
false due to a race (e.g. with pausing the vcpu).

With core scheduling this can no longer work as not all vcpus of a
schedule unit have to be "running" when being scheduled. So the vcpu's
new runstate has to be selected at the same time as the runnability of
the related schedule unit is probed.

For this purpose introduce a new helper unit_runnable_state() which
will save the new runstate of all tested vcpus in a new field of the
vcpu struct.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: add code to sync scheduling of all vcpus of a sched unit
Juergen Gross [Wed, 2 Oct 2019 07:27:26 +0000 (09:27 +0200)]
xen/sched: add code to sync scheduling of all vcpus of a sched unit

When switching sched units synchronize all vcpus of the new unit to be
scheduled at the same time.

A variable sched_granularity is added which holds the number of vcpus
per schedule unit.

As tasklets require to schedule the idle unit it is required to set the
tasklet_work_scheduled parameter of do_schedule() to true if any cpu
covered by the current schedule() call has any pending tasklet work.

For joining other vcpus of the schedule unit we need to add a new
softirq SCHED_SLAVE_SOFTIRQ in order to have a way to initiate a
context switch without calling the generic schedule() function
selecting the vcpu to switch to, as we already know which vcpu we
want to run. This has the other advantage not to loose any other
concurrent SCHEDULE_SOFTIRQ events.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agodocs: remove tmem references from man pages
Juergen Gross [Wed, 2 Oct 2019 13:41:56 +0000 (15:41 +0200)]
docs: remove tmem references from man pages

The "TO BE DOCUMENTED" section of the xl man page still references
tmem. So does the xl.conf man page. Remove the references.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/arm: traps: Mark check_stack_alignment_constraints as unused
Julien Grall [Tue, 26 Mar 2019 21:31:16 +0000 (21:31 +0000)]
xen/arm: traps: Mark check_stack_alignment_constraints as unused

Clang will throw an error if a function is unused unless you tell
to ignore it. This can be done using __maybe_unused.

While modifying the declaration, update it to match prototype of similar
functions (see build_assertions). This helps to understand that the sole
purpose of the function is to hold BUILD_BUG_ON().

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: mm: Mark check_memory_layout_alignment_constraints as unused
Julien Grall [Wed, 27 Mar 2019 18:23:11 +0000 (18:23 +0000)]
xen/arm: mm: Mark check_memory_layout_alignment_constraints as unused

Clang will throw an error if a function is unused unless you tell
to ignore it. This can be done using __maybe_unused.

While modifying the declaration, update it to match prototype of similar
functions (see build_assertions). This helps to understand that the sole
purpose of the function is to hold BUILD_BUG_ON().

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: cpufeature: Match register size with value size in cpus_have_const_cap
Julien Grall [Tue, 26 Mar 2019 21:26:57 +0000 (21:26 +0000)]
xen/arm: cpufeature: Match register size with value size in cpus_have_const_cap

Clang is pickier than GCC for the register size in asm statement. It
expects the register size to match the value size.

The asm statement expects a 32-bit (resp. 64-bit) value on Arm32
(resp. Arm64) whereas the value is a boolean (Clang consider to be
32-bit).

It would be possible to impose 32-bit register for both architecture
but this require the code to use __OP32. However, it does no really
improve the assembly generated. Instead, replace switch the variable to
use register_t.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: cpuerrata: Match register size with value size in check_workaround_*
Julien Grall [Tue, 26 Mar 2019 20:53:09 +0000 (20:53 +0000)]
xen/arm: cpuerrata: Match register size with value size in check_workaround_*

Clang is pickier than GCC for the register size in asm statement. It
expects the register size to match the value size.

The asm statement expects a 32-bit (resp. 64-bit) value on Arm32
(resp. Arm64) whereas the value is a boolean (Clang consider to be
32-bit).

It would be possible to impose 32-bit register for both architecture
but this require the code to use __OP32. However, it does not really
improve the assembly generated. Instead, replace switch the variable
to use register_t.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm64: bitops: Match the register size with the value size in flsl
Julien Grall [Tue, 26 Mar 2019 20:30:05 +0000 (20:30 +0000)]
xen/arm64: bitops: Match the register size with the value size in flsl

Clang is pickier than GCC for the register size in asm statement. It expects
the register size to match the value size.

The instruction clz is expecting the two operands to be the same size
(i.e 32-bit or 64-bit). As the flsl function is dealing with 64-bit
value, we need to make the destination variable 64-bit as well.

While at it, add a newline before the return statement.

Note that the return type of flsl is not updated because the result will
always be smaller than 64 and therefore fit in 32-bit.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: fix get_cpu_info() when built with clang
Julien Grall [Tue, 26 Mar 2019 21:52:20 +0000 (21:52 +0000)]
xen/arm: fix get_cpu_info() when built with clang

Clang understands the GCCism in use here, but still complains that sp is
unitialised. In such cases, resort to the older versions of this code,
which directly read sp into the temporary variable.

Note that GCCism is still kept in default because other compilers (e.g.
clang) may also define __GNUC__, so AFAIK there are no proper way to
detect properly GCC.

This means that in the event Xen is ported to a new compiler, the code
will need to be updated. But that likely not going to be the only place
where Xen will need to be adapted...

This is based on the x86 counterpart.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agolibxl: create: style: Add a pair of missing { ]
Ian Jackson [Wed, 2 Oct 2019 15:55:47 +0000 (16:55 +0100)]
libxl: create: style: Add a pair of missing { ]

From CODING_STYLE:

  Every indented statement is braced, but blocks that contain just one
  statement may have the braces omitted.  To avoid confusion, either all
  the blocks in an if...else chain have braces, or none of them do.

CC: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxl: choose an appropriate default for passthrough...
Paul Durrant [Tue, 1 Oct 2019 14:57:14 +0000 (15:57 +0100)]
libxl: choose an appropriate default for passthrough...

...if there is no IOMMU or it is globally disabled.

Without this patch, the following assertion may be hit:

xl: libxl_create.c:589: libxl__domain_make: Assertion `info->passthrough != LIBXL_PASSTHROUGH_UNKNOWN' failed.

This is because libxl__domain_create_info_setdefault() currently only sets
an appropriate value for 'passthrough' in the case that 'cap_hvm_directio'
is true, which is not the case unless an IOMMU is present and enabled in
the system. This is normally masked by xl choosing a default value, but
that will not happen if xl is not used (e.g. when using libvirt) or when
a stub domain is being created.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxl: replace 'enabled' with 'unknown' in libxl_passthrough enumeration
Paul Durrant [Tue, 1 Oct 2019 14:57:13 +0000 (15:57 +0100)]
libxl: replace 'enabled' with 'unknown' in libxl_passthrough enumeration

This is mostly a cosmetic patch to avoid the default enumeration value
being 'enabled'. The only non-cosmetic parts are in xl_parse.c where it now
becomes necessary to explicitly parse the 'enabled' value for xl.cfg
'passthrough' option, and error on the value 'unknown', because there is no
longer a direct mapping between valid xl.cfg values and the enumeration.

Suggested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxl: wait for the ack when issuing power control requests
Roger Pau Monne [Tue, 1 Oct 2019 15:22:33 +0000 (17:22 +0200)]
libxl: wait for the ack when issuing power control requests

Currently only suspend power control requests wait for an ack from the
domain, while power off or reboot requests simply write the command to
xenstore and exit.

Introduce a 1 minute wait for the domain to acknowledge the request, or
else return an error. The suspend code is slightly modified to use the
new infrastructure added, but shouldn't have any functional change.

Fix the ocaml bindings and also provide a backwards compatible
interface for the reboot and poweroff libxl API functions.

Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
[ wei: change ret to rc to fix build ]
Signed-off-by: Wei Liu <wl@xen.org>
5 years agotools/xen-cpuid: avoid producing bogus output
Jan Beulich [Wed, 2 Oct 2019 11:38:02 +0000 (13:38 +0200)]
tools/xen-cpuid: avoid producing bogus output

I was (mistakenly, as - looking at the code - it's clearly not intended
to work) passing the tool "Raw" and "Host" as command line arguments.
Avoid printing just "Raw       " with not even a newline at the end in
such a case. Instead report what wasn't understood by the parsing logic.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoMAINTAINERS: add tools/misc/xen-cpuid to "X86 ARCHITECTURE"
Jan Beulich [Wed, 2 Oct 2019 11:37:43 +0000 (13:37 +0200)]
MAINTAINERS: add tools/misc/xen-cpuid to "X86 ARCHITECTURE"

Along the lines of other x86-specific pieces under tools/.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoIOMMU: add missing HVM check
Jan Beulich [Wed, 2 Oct 2019 11:36:59 +0000 (13:36 +0200)]
IOMMU: add missing HVM check

Fix an unguarded d->arch.hvm access in assign_device().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agomicrocode: rendezvous CPUs in NMI handler and load ucode
Sergey Dyasli [Wed, 2 Oct 2019 11:35:44 +0000 (13:35 +0200)]
microcode: rendezvous CPUs in NMI handler and load ucode

When one core is loading ucode, handling NMI on sibling threads or
on other cores in the system might be problematic. By rendezvousing
all CPUs in NMI handler, it prevents NMI acceptance during ucode
loading.

Basically, some work previously done in stop_machine context is
moved to NMI handler. Primary threads call in and load ucode in
NMI handler. Secondary threads wait for the completion of ucode
loading on all CPU cores. An option is introduced to disable this
behavior.

Control thread doesn't rendezvous in NMI handler by calling self_nmi()
(in case of unknown_nmi_error() being triggered). The side effect is
control thread might be handling an NMI while other threads are loading
ucode. If an ucode is to update something shared by a whole socket,
control thread may be accessing things that are being updating by the
ucode loading on other cores. It is not safe. Update ucode on the
control thread first to mitigate this issue.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/crash: force unlock console before printing on kexec crash
Igor Druzhinin [Tue, 1 Oct 2019 19:15:57 +0000 (20:15 +0100)]
x86/crash: force unlock console before printing on kexec crash

There is a small window where shootdown NMI might come to a CPU
(e.g. in serial interrupt handler) where console lock is taken. In order
not to leave following console prints waiting infinitely for shot down
CPUs to free the lock - force unlock the console.

The race has been frequently observed while crashing nested Xen in
an HVM domain.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoSUPPORT.md: Describe Renesas IPMMU-VMSA support (Arm)
Oleksandr Tyshchenko [Thu, 26 Sep 2019 14:22:02 +0000 (17:22 +0300)]
SUPPORT.md: Describe Renesas IPMMU-VMSA support (Arm)

Renesas IPMMU-VMSA support (Arm) can be considered
as Technological Preview feature.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: Implement workaround for Cortex A-57 and Cortex A72 AT speculate
Julien Grall [Tue, 24 Sep 2019 10:39:10 +0000 (11:39 +0100)]
xen/arm: Implement workaround for Cortex A-57 and Cortex A72 AT speculate

Both Cortex-A57 (erratum 1319537) and Cortex-A72 (erratum 1319367) can
end with corrupted TLBs if they speculate an AT instruction while S1/S2
system registers in inconsistent state.

The workaround is the same as for Cortex A-76 implemented by commit
a18be06aca "xen/arm: Implement workaround for Cortex-A76 erratum 1165522",
so it is only necessary to plumb in the cpuerrata framework.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: domain_build: Don't continue if unable to allocate all dom0 banks
Julien Grall [Wed, 21 Aug 2019 21:42:31 +0000 (22:42 +0100)]
xen/arm: domain_build: Don't continue if unable to allocate all dom0 banks

Xen will only print a warning if there are memory unallocated when using
1:1 mapping (only used by dom0). This also includes the case where no
memory has been allocated.

It will bring to all sort of issues that can be hard to diagnostic for
users (the warning can be difficult to spot or disregard).

If the users request 1GB of memory, then most likely they want the exact
amount and not 512MB. So panic if all the memory has not been allocated.

After this change, the behavior is the same as for non-1:1 memory
allocation (used by domU).

At the same time, reflow the message to have the format on a single
line.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: domain_build: Print the correct domain in initrd_load()
Julien Grall [Thu, 15 Aug 2019 17:30:42 +0000 (18:30 +0100)]
xen/arm: domain_build: Print the correct domain in initrd_load()

initrd_load() can be called by other domain than dom0. To avoid
confusion in the log, print the correct domain.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: domain_build: Remove redundant check in make_vpl011_uart_node()
Julien Grall [Thu, 15 Aug 2019 19:46:24 +0000 (20:46 +0100)]
xen/arm: domain_build: Remove redundant check in make_vpl011_uart_node()

None of the code since the last check of res modify the value. So the
check can be removed.

Coverity-ID: 1476824
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: domain_build: Print the correct domain in construct_domain()
Julien Grall [Thu, 15 Aug 2019 17:34:21 +0000 (18:34 +0100)]
xen/arm: domain_build: Print the correct domain in construct_domain()

construct_domain() can be called by other domain than dom0. To avoid
confusion in the log, print the correct domain.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
5 years agoxen/arm: p2m: Free the p2m entry after flushing the IOMMU TLBs
Julien Grall [Fri, 9 Aug 2019 12:59:15 +0000 (13:59 +0100)]
xen/arm: p2m: Free the p2m entry after flushing the IOMMU TLBs

When freeing a p2m entry, all the sub-tree behind it will also be freed.
This may include intermediate page-tables or any l3 entry requiring to
drop a reference (e.g for foreign pages). As soon as pages are freed,
they may be re-used by Xen or another domain. Therefore it is necessary
to flush *all* the TLBs beforehand.

While CPU TLBs will be flushed before freeing the pages, this is not
the case for IOMMU TLBs. This can be solved by moving the IOMMU TLBs
flush earlier in the code.

This wasn't considered as a security issue as device passthrough on Arm
is not security supported.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Tested-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/arm: p2m: Fix typo in the comment on top of P2M_ROOT_LEVEL
Julien Grall [Sun, 29 Sep 2019 16:35:10 +0000 (17:35 +0100)]
xen/arm: p2m: Fix typo in the comment on top of P2M_ROOT_LEVEL

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: domain_build: Avoid implicit conversion from ULL to UL
Julien Grall [Sun, 29 Sep 2019 15:56:26 +0000 (16:56 +0100)]
xen/arm: domain_build: Avoid implicit conversion from ULL to UL

Clang 8.0 will fail to build domain_build.c on Arm32 because of the
following error:

domain_build.c:448:21: error: implicit conversion from 'unsigned long long' to 'unsigned long' changes value from 1090921693184 to 0
[-Werror,-Wconstant-conversion]
    bank_size = MIN(GUEST_RAM1_SIZE, kinfo->unassigned_mem);

Arm32 is able to support more than 4GB of physical memory, so it would
be theorically possible to create domain with more the 4GB of RAM.
Therefore, the size of a bank may not fit in 32-bit.

This can be resolved by switch the variable bank_size and the parameter
tot_size to "paddr_t".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Released-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm32: head: Fix build when using GAS 2.25.0
Julien Grall [Mon, 30 Sep 2019 18:44:25 +0000 (19:44 +0100)]
xen/arm32: head: Fix build when using GAS 2.25.0

GAS 2.25.0 throws multiple errors when building arm32/head.S:

arm32/head.S: Assembler messages:
arm32/head.S:452: Error: invalid constant (f7f) after fixup
arm32/head.S:453: Error: invalid constant (f7f) after fixup
arm32/head.S:495: Error: invalid constant (f7f) after fixup
arm32/head.S:510: Error: invalid constant (f7f) after fixup
arm32/head.S:514: Error: invalid constant (f7f) after fixup
arm32/head.S:516: Error: invalid constant (f7f) after fixup
arm32/head.S:633: Error: invalid constant (f7f) after fixup

This makes sense because the instruction mov is only able to deal with a
specific set of immediate (see "modified immediate constants in ARM
instructions"). For any 16-bit immediate, the instruction movw should be
used.

It looks like newer version of GAS will seemly switch to movw if the
immediate does not fit in the immediate encoding for mov. But we should
not rely on this. So switch to movw.

Fixes: 23dfe48d10 ("xen/arm32: head: Introduce macros to create table and mapping entry")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoMAINTAINERS: Update e-mail address
Julien Grall [Mon, 30 Sep 2019 16:53:09 +0000 (17:53 +0100)]
MAINTAINERS: Update e-mail address

I will soon lose access to my Arm e-mail address. Update it to
julien@xen.org

Signed-off-by: Julien Grall <julien.grall@arm.com>
Cc: julien@xen.org
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agox86/iommu: fix hwdom iommu requirements check
Roger Pau Monné [Mon, 30 Sep 2019 13:46:57 +0000 (15:46 +0200)]
x86/iommu: fix hwdom iommu requirements check

Both a shadow and a HAP hwdom require an iommu and must be run in
strict mode. Change the HAP check into a hvm domain check.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86: correct bogus error indicator of cpu_add()
Jan Beulich [Mon, 30 Sep 2019 13:46:24 +0000 (15:46 +0200)]
x86: correct bogus error indicator of cpu_add()

Commit 54ce2db8b8 ("x86/numa: adjust datatypes for node and pxm")
changed this from the -1 (i.e. -EPERM, which was already bogus) that
comes back from setup_node() to NUMA_NO_NODE (0xff). Use a proper error
indicator instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86emul: move ARPL #UD check
Jan Beulich [Mon, 30 Sep 2019 13:45:16 +0000 (15:45 +0200)]
x86emul: move ARPL #UD check

The #UD for being outside of protected mode gets raised for ARPL only
after having read the memory operand - correct this by moving up the
respective construct.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agons16550: make PCI device hiding uniform
Jan Beulich [Tue, 3 Sep 2019 13:58:08 +0000 (15:58 +0200)]
ns16550: make PCI device hiding uniform

The difference between pci_hide_device() and pci_ro_device() is that
the former only prevents a device from getting assigned to a guest,
while the latter additionally arranges for Dom0 write attempts to the
device's config space to be ignored/discarded. Whether we want one or
the other certainly doesn't depend on whether the device is in our set
of known devices. All that matters is whether we use a PCI device: Call
pci_ro_device() in any such case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/sched: move struct task_slice into struct sched_unit
Juergen Gross [Fri, 27 Sep 2019 07:00:31 +0000 (09:00 +0200)]
xen/sched: move struct task_slice into struct sched_unit

In order to prepare for multiple vcpus per schedule unit move struct
task_slice in schedule() from the local stack into struct sched_unit
of the currently running unit. To make access easier for the single
schedulers add the pointer of the currently running unit as a parameter
of do_schedule().

While at it switch the tasklet_work_scheduled parameter of
do_schedule() from bool_t to bool.

As struct task_slice is only ever modified with the local schedule
lock held it is safe to directly set the different units in struct
sched_unit instead of using an on-stack copy for returning the data.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: Change vcpu_migrate_*() to operate on schedule unit
Juergen Gross [Fri, 27 Sep 2019 07:00:30 +0000 (09:00 +0200)]
xen/sched: Change vcpu_migrate_*() to operate on schedule unit

vcpu_migrate_start() and vcpu_migrate_finish() are used only to ensure
a vcpu is running on a suitable processor, so they can be switched to
operate on schedule units instead of vcpus.

While doing that rename them accordingly.

Call vcpu_sync_execstate() for each vcpu of the unit when changing
processors in order to make that an explicit action (otherwise this
would happen later when either the vcpu is scheduled on the new
processor or another non-idle vcpu is scheduled on the old processor).

vcpu_move_locked() is switched to schedule unit, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: add runstate counters to struct sched_unit
Juergen Gross [Fri, 27 Sep 2019 07:00:29 +0000 (09:00 +0200)]
xen/sched: add runstate counters to struct sched_unit

Add counters to struct sched_unit summing up runstates of associated
vcpus. This allows doing quick checks whether a unit has any vcpu
running or whether only a single vcpu of a unit is running.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen: switch from for_each_vcpu() to for_each_sched_unit()
Juergen Gross [Fri, 27 Sep 2019 07:00:28 +0000 (09:00 +0200)]
xen: switch from for_each_vcpu() to for_each_sched_unit()

Where appropriate switch from for_each_vcpu() to for_each_sched_unit()
in order to prepare core scheduling.

As it is beneficial once here and for sure in future add a
unit_scheduler() helper and let vcpu_scheduler() use it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/sched: switch sched_move_irqs() to take sched_unit as parameter
Juergen Gross [Fri, 27 Sep 2019 07:00:27 +0000 (09:00 +0200)]
xen/sched: switch sched_move_irqs() to take sched_unit as parameter

sched_move_irqs() should work on a sched_unit as that is the unit
moved between cpus.

Rename the current function to vcpu_move_irqs() as it is still needed
in schedule().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: switch schedule() from vcpus to sched_units
Juergen Gross [Fri, 27 Sep 2019 07:00:26 +0000 (09:00 +0200)]
xen/sched: switch schedule() from vcpus to sched_units

Use sched_units instead of vcpus in schedule(). This includes the
introduction of sched_unit_runstate_change() as a replacement of
vcpu_runstate_change() in schedule().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: use sched_resource cpu instead smp_processor_id in schedulers
Juergen Gross [Fri, 27 Sep 2019 07:00:25 +0000 (09:00 +0200)]
xen/sched: use sched_resource cpu instead smp_processor_id in schedulers

Especially in the do_schedule() functions of the different schedulers
using smp_processor_id() for the local cpu number is correct only if
the sched_unit is a single vcpu. As soon as larger sched_units are
used most uses should be replaced by the master_cpu number of the local
sched_resource instead.

Add a helper to get that sched_resource master_cpu and modify the
schedulers to use it in a correct way.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen: let vcpu_create() select processor
Juergen Gross [Fri, 27 Sep 2019 07:00:24 +0000 (09:00 +0200)]
xen: let vcpu_create() select processor

Today there are two distinct scenarios for vcpu_create(): either for
creation of idle-domain vcpus (vcpuid == processor) or for creation of
"normal" domain vcpus (including dom0), where the caller selects the
initial processor on a round-robin scheme of the allowed processors
(allowed being based on cpupool and affinities).

Instead of passing the initial processor to vcpu_create() and passing
on to sched_init_vcpu() let sched_init_vcpu() do the processor
selection. For supporting dom0 vcpu creation use the node_affinity of
the domain as a base for selecting the processors. User domains will
have initially all nodes set, so this is no different behavior compared
to today. In theory this is not guaranteed as vcpus are created only
with XEN_DOMCTL_max_vcpus being called, but this call is going to be
removed in future and the toolstack doesn't call
XEN_DOMCTL_setnodeaffinity before calling XEN_DOMCTL_max_vcpus.

To be able to use const struct domain * make cpupool_domain_cpumask()
take a const domain pointer, too.

A further simplification is possible by having a single function for
creating the dom0 vcpus with vcpu_id > 0 and doing the required pinning
for all vcpus after that. This allows to make sched_set_affinity()
private to schedule.c and switch it to sched_units easily. Note that
this functionality is x86 only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com> [x86]
5 years agoxen: add sched_unit_pause_nosync() and sched_unit_unpause()
Juergen Gross [Fri, 27 Sep 2019 07:00:23 +0000 (09:00 +0200)]
xen: add sched_unit_pause_nosync() and sched_unit_unpause()

The credit scheduler calls vcpu_pause_nosync() and vcpu_unpause()
today. Add sched_unit_pause_nosync() and sched_unit_unpause() to
perform the same operations on scheduler units instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: make arinc653 scheduler vcpu agnostic.
Juergen Gross [Fri, 27 Sep 2019 07:00:22 +0000 (09:00 +0200)]
xen/sched: make arinc653 scheduler vcpu agnostic.

Switch arinc653 scheduler completely from vcpu to sched_unit usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: make credit2 scheduler vcpu agnostic.
Juergen Gross [Fri, 27 Sep 2019 07:00:21 +0000 (09:00 +0200)]
xen/sched: make credit2 scheduler vcpu agnostic.

Switch credit2 scheduler completely from vcpu to sched_unit usage.

As we are touching lots of lines remove some white space at the end of
the line, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: make credit scheduler vcpu agnostic.
Juergen Gross [Fri, 27 Sep 2019 07:00:20 +0000 (09:00 +0200)]
xen/sched: make credit scheduler vcpu agnostic.

Switch credit scheduler completely from vcpu to sched_unit usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: make rt scheduler vcpu agnostic.
Juergen Gross [Fri, 27 Sep 2019 07:00:19 +0000 (09:00 +0200)]
xen/sched: make rt scheduler vcpu agnostic.

Switch rt scheduler completely from vcpu to sched_unit usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: make null scheduler vcpu agnostic.
Juergen Gross [Fri, 27 Sep 2019 07:00:18 +0000 (09:00 +0200)]
xen/sched: make null scheduler vcpu agnostic.

Switch null scheduler completely from vcpu to sched_unit usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: add is_running indicator to struct sched_unit
Juergen Gross [Fri, 27 Sep 2019 07:00:17 +0000 (09:00 +0200)]
xen/sched: add is_running indicator to struct sched_unit

Add an is_running indicator to struct sched_unit which will be set
whenever the unit is being scheduled. Switch scheduler code to use
unit->is_running instead of vcpu->is_running for scheduling decisions.

At the same time introduce a state_entry_time field in struct
sched_unit being updated whenever the is_running indicator is changed.
Use that new field in the schedulers instead of the similar vcpu field.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: switch struct task_slice from vcpu to sched_unit
Juergen Gross [Fri, 27 Sep 2019 07:00:16 +0000 (09:00 +0200)]
xen/sched: switch struct task_slice from vcpu to sched_unit

Let the schedulers put a sched_unit pointer into struct task_slice
instead of a vcpu pointer.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: rename scheduler related perf counters
Juergen Gross [Fri, 27 Sep 2019 07:00:15 +0000 (09:00 +0200)]
xen/sched: rename scheduler related perf counters

Rename the scheduler related perf counters from vcpu* to unit* where
appropriate.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: add scheduler helpers hiding vcpu
Juergen Gross [Fri, 27 Sep 2019 07:00:14 +0000 (09:00 +0200)]
xen/sched: add scheduler helpers hiding vcpu

Add the following helpers using a sched_unit as input instead of a
vcpu:

- is_idle_unit() similar to is_idle_vcpu()
- is_unit_online() similar to is_vcpu_online() (returns true when any
  of its vcpus is online)
- unit_runnable() like vcpu_runnable() (returns true if any of its
  vcpus is runnable)
- sched_set_res() to set the current scheduling resource of a unit
- sched_unit_master() to get the current processor of a unit (returns
  the master_cpu of the scheduling resource of a unit)
- sched_{set|clear}_pause_flags[_atomic]() to modify pause_flags of the
  associated vcpu(s) (modifies the pause_flags of all vcpus of the
  unit)
- sched_idle_unit() to get the sched_unit pointer of the idle vcpu of a
  specific physical cpu

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: move some per-vcpu items to struct sched_unit
Juergen Gross [Fri, 27 Sep 2019 07:00:13 +0000 (09:00 +0200)]
xen/sched: move some per-vcpu items to struct sched_unit

Affinities are scheduler specific attributes, they should be per
scheduling unit. So move all affinity related fields in struct vcpu
to struct sched_unit. While at it switch affinity related functions in
sched-if.h to use a pointer to sched_unit instead to vcpu as parameter.

The affinity_broken flag must be kept per vcpu as it is related to
guest actions on specific vcpus. When support of multiple vcpus per
sched_unit is being added, a unit is regarded as being subject to
"broken affinity" when any of its vcpus has the affinity_broken flag
set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: switch vcpu_schedule_lock to unit_schedule_lock
Juergen Gross [Fri, 27 Sep 2019 07:00:12 +0000 (09:00 +0200)]
xen/sched: switch vcpu_schedule_lock to unit_schedule_lock

Rename vcpu_schedule_[un]lock[_irq]() to unit_schedule_[un]lock[_irq]()
and let it take a sched_unit pointer instead of a vcpu pointer as
parameter.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: move per cpu scheduler private data into struct sched_resource
Juergen Gross [Fri, 27 Sep 2019 07:00:11 +0000 (09:00 +0200)]
xen/sched: move per cpu scheduler private data into struct sched_resource

This prepares support of larger scheduling granularities, e.g. core
scheduling.

While at it move sched_has_urgent_vcpu() from include/asm-x86/cpuidle.h
into sched.h removing the need for including sched-if.h in cpuidle.h.
For that purpose remobe urgent_count from the scheduler private data
and make it a plain percpu variable.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: switch schedule_data.curr to point at sched_unit
Juergen Gross [Fri, 27 Sep 2019 07:00:10 +0000 (09:00 +0200)]
xen/sched: switch schedule_data.curr to point at sched_unit

In preparation of core scheduling let the percpu pointer
schedule_data.curr point to a strct sched_unit instead of the related
vcpu. At the same time rename the per-vcpu scheduler specific structs
to per-unit ones.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: let pick_cpu return a scheduler resource
Juergen Gross [Fri, 27 Sep 2019 07:00:09 +0000 (09:00 +0200)]
xen/sched: let pick_cpu return a scheduler resource

Instead of returning a physical cpu number let pick_cpu() return a
scheduler resource instead. Rename pick_cpu() to pick_resource() to
reflect that change.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: introduce struct sched_resource
Juergen Gross [Fri, 27 Sep 2019 07:00:08 +0000 (09:00 +0200)]
xen/sched: introduce struct sched_resource

Add a scheduling abstraction layer between physical processors and the
schedulers by introducing a struct sched_resource. Each scheduler unit
running is active on such a scheduler resource. For the time being
there is one struct sched_resource per cpu, but in future there might
be one for each core or socket only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: build a linked list of struct sched_unit
Juergen Gross [Fri, 27 Sep 2019 07:00:07 +0000 (09:00 +0200)]
xen/sched: build a linked list of struct sched_unit

In order to make it easy to iterate over sched_unit elements of a
domain, build a single linked list and add an iterator for it. The new
list is guarded by the same mechanisms as the vcpu linked list as it
is modified only via vcpu_create() or vcpu_destroy().

For completeness add another iterator for_each_sched_unit_vcpu() which
will iterate over all vcpus of a sched_unit (right now only one). This
will be needed later for larger scheduling granularity (e.g. cores).

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: move per-vcpu scheduler private data pointer to sched_unit
Juergen Gross [Fri, 27 Sep 2019 07:00:06 +0000 (09:00 +0200)]
xen/sched: move per-vcpu scheduler private data pointer to sched_unit

This prepares making the different schedulers vcpu agnostic.

Note that some scheduler specific accessor function are misnamed after
this patch. This will be corrected in later patches.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: use new sched_unit instead of vcpu in scheduler interfaces
Juergen Gross [Fri, 27 Sep 2019 07:00:05 +0000 (09:00 +0200)]
xen/sched: use new sched_unit instead of vcpu in scheduler interfaces

In order to prepare core- and socket-scheduling use a new struct
sched_unit instead of struct vcpu for interfaces of the different
schedulers.

Rename the per-scheduler functions insert_vcpu and remove_vcpu to
insert_unit and remove_unit to reflect the change of the parameter.
In the schedulers rename local functions switched to sched_unit, too.

Rename alloc_vdata and free_vdata functions to alloc_udata and
free_udata.

For now this new struct will contain a domain, a vcpu pointer and a
unit_id only and is allocated at vcpu creation time.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>