xenbits.xensource.com Git - people/liuw/xen.git/log
5 years agox86: implement Hyper-V clock source hyperv-ref-tsc-1
Wei Liu [Thu, 24 Oct 2019 14:54:15 +0000 (15:54 +0100)]
x86: implement Hyper-V clock source

Implement a clock source using Hyper-V's reference TSC page.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
Relevant spec:

https://github.com/MicrosoftDocs/Virtualization-Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v5.0C.pdf

Section 12.6.
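
For orientation, the conversion the spec describes for the reference TSC
page can be sketched as below. This is not the code from this patch: the
struct and function names are illustrative, the caller supplies the raw
TSC value, and a real reader would also need compiler/memory barriers
around the sequence checks and a fallback clock when the page is invalid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative layout of the fields used from the reference TSC page. */
struct ref_tsc_page {
    uint32_t sequence;   /* 0 means the page is not valid */
    uint32_t reserved;
    uint64_t scale;      /* fixed-point multiplier, product is shifted by 64 */
    int64_t  offset;     /* added after scaling */
};

/*
 * Convert a raw TSC value to reference time (100ns units):
 *   time = ((tsc * scale) >> 64) + offset
 * Returns false if the page is invalid or changed while being read.
 */
bool ref_tsc_to_time(const volatile struct ref_tsc_page *page,
                     uint64_t tsc, uint64_t *time)
{
    uint32_t seq = page->sequence;
    uint64_t scale, hi;
    int64_t offset;

    if (seq == 0)
        return false;

    scale = page->scale;
    offset = page->offset;

    /* unsigned __int128 is a GCC/Clang extension. */
    hi = (uint64_t)(((unsigned __int128)tsc * scale) >> 64);

    /* A changed sequence number means the values above may be torn. */
    if (page->sequence != seq)
        return false;

    *time = hi + (uint64_t)offset;
    return true;
}
```

A caller would typically retry when the function returns false because
the sequence changed, and switch to another time source when the
sequence reads as 0.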

5 years agox86/hyperv: provide hyperv_guest variable
Wei Liu [Thu, 24 Oct 2019 14:21:05 +0000 (15:21 +0100)]
x86/hyperv: provide hyperv_guest variable

It will be used to gate Hyper-V related code outside of the guest
directory.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
5 years agox86: use running_on_hypervisor to gate hypervisor_setup
Wei Liu [Thu, 24 Oct 2019 13:34:29 +0000 (14:34 +0100)]
x86: use running_on_hypervisor to gate hypervisor_setup

The hypervisor_setup method is not unique to Xen guest.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
5 years agox86: add a comment regarding the location of hypervisor_probe
Wei Liu [Thu, 24 Oct 2019 13:39:25 +0000 (14:39 +0100)]
x86: add a comment regarding the location of hypervisor_probe

Signed-off-by: Wei Liu <liuwe@microsoft.com>
5 years agox86/hyperv: extract more information from Hyper-V
Wei Liu [Thu, 24 Oct 2019 13:22:53 +0000 (14:22 +0100)]
x86/hyperv: extract more information from Hyper-V

Provide a structure to store that information. The structure will be
accessed from other places later so make it public.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
5 years agox86: fix up hyperv-tlfs.h
Wei Liu [Thu, 24 Oct 2019 11:41:57 +0000 (12:41 +0100)]
x86: fix up hyperv-tlfs.h

Do the following:
1. include xen/types.h and xen/bitops.h
2. fix up invocations of BIT macro

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
This can be squashed into previous patch if preferred.

5 years agox86: import hyperv-tlfs.h from Linux
Wei Liu [Thu, 24 Oct 2019 11:17:03 +0000 (12:17 +0100)]
x86: import hyperv-tlfs.h from Linux

Taken from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649.

This is a pristine copy from Linux. It is not used yet and probably
doesn't compile. Changes to make it work will come later.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
5 years agox86: introduce CONFIG_HYPERV and detection code
Wei Liu [Mon, 30 Sep 2019 13:34:50 +0000 (14:34 +0100)]
x86: introduce CONFIG_HYPERV and detection code

We use the same code structure as we did for Xen.

As a start, detect Hyper-V in the probe routine. More complex
functionality will be added later.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
V4:
1. Add Paul's Rb
2. Add comment

V3:
1. Remove some unused code
2. Rename structure
3. Also detect HV#1 signature
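
A rough sketch of what such a detection check can look like, assuming it
compares the vendor string from CPUID leaf 0x40000000 ("Microsoft Hv")
and the interface signature from leaf 0x40000001 ("Hv#1"). The function
names are made up for illustration, and the register values are passed
in rather than read with CPUID so the logic can be exercised anywhere.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Unpack a 32-bit register into 4 bytes, least significant byte first. */
void reg_to_bytes(uint32_t reg, char *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((reg >> (8 * i)) & 0xff);
}

/*
 * Decide whether CPUID output identifies Hyper-V.
 * ebx/ecx/edx come from leaf 0x40000000 (vendor ID),
 * iface_eax is EAX from leaf 0x40000001 (interface signature).
 */
bool looks_like_hyperv(uint32_t ebx, uint32_t ecx, uint32_t edx,
                       uint32_t iface_eax)
{
    char vendor[12], iface[4];

    reg_to_bytes(ebx, vendor);
    reg_to_bytes(ecx, vendor + 4);
    reg_to_bytes(edx, vendor + 8);
    reg_to_bytes(iface_eax, iface);

    return memcmp(vendor, "Microsoft Hv", 12) == 0 &&
           memcmp(iface, "Hv#1", 4) == 0;
}
```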

5 years agox86: be more verbose when running on a hypervisor
Wei Liu [Mon, 30 Sep 2019 13:23:27 +0000 (14:23 +0100)]
x86: be more verbose when running on a hypervisor

Signed-off-by: Wei Liu <liuwe@microsoft.com>
---
V3: Address Roger's comment, add ASSERTs

5 years agox86: switch xen implementation to use hypervisor framework
Wei Liu [Mon, 30 Sep 2019 13:05:09 +0000 (14:05 +0100)]
x86: switch xen implementation to use hypervisor framework

Take the chance to change probe_hypervisor to hypervisor_probe.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
---
V3:
1. Address Roger's comments
2. Change xen_hypervisor_ops to xen_ops

5 years agox86: rename hypervisor_{alloc,free}_unused_page
Wei Liu [Mon, 30 Sep 2019 12:53:16 +0000 (13:53 +0100)]
x86: rename hypervisor_{alloc,free}_unused_page

They are used in Xen code only.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86: introduce hypervisor framework
Wei Liu [Mon, 30 Sep 2019 10:06:39 +0000 (11:06 +0100)]
x86: introduce hypervisor framework

We will soon implement Hyper-V support for Xen. Add a framework for
that.

This requires moving some of the hypervisor_* functions from xen.h to
hypervisor.h.
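
As an illustration of the kind of framework meant here, a table of
per-hypervisor operations plus a probe loop could look like the sketch
below. All names (struct layout, function names, stub backends) are
hypothetical and are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-hypervisor operations table. */
struct hypervisor_ops {
    const char *name;
    bool (*probe)(void);   /* true if we are running on this hypervisor */
    void (*setup)(void);   /* one-time initialisation, may be NULL */
};

/* Return the first backend whose probe succeeds, or NULL on bare metal. */
const struct hypervisor_ops *hypervisor_probe_first(
    const struct hypervisor_ops *const *backends, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (backends[i]->probe())
            return backends[i];
    return NULL;
}

/* Call setup only when a hypervisor was detected and provides the hook. */
void hypervisor_setup_if_present(const struct hypervisor_ops *ops)
{
    if (ops && ops->setup)
        ops->setup();
}

/* Stub backends for illustration only. */
bool stub_probe_no(void)  { return false; }
bool stub_probe_yes(void) { return true; }
int stub_setup_calls;
void stub_setup(void) { stub_setup_calls++; }
const struct hypervisor_ops stub_a = { "stub-a", stub_probe_no, NULL };
const struct hypervisor_ops stub_b = { "stub-b", stub_probe_yes, stub_setup };
```

Keeping the backend-specific code behind function pointers lets generic
boot code call one entry point without knowing which hypervisor, if any,
was detected.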

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agox86: include xen/lib.h in guest/hypercall.h
Wei Liu [Mon, 30 Sep 2019 14:42:07 +0000 (15:42 +0100)]
x86: include xen/lib.h in guest/hypercall.h

We need ASSERT_UNREACHABLE.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86: drop hypervisor_cpuid_base
Wei Liu [Thu, 19 Sep 2019 14:04:25 +0000 (15:04 +0100)]
x86: drop hypervisor_cpuid_base

The only user is Xen specific code in PV shim. We can therefore export
the variable directly.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86: include asm_defns.h directly in hypercall.h
Wei Liu [Thu, 19 Sep 2019 13:04:00 +0000 (14:04 +0100)]
x86: include asm_defns.h directly in hypercall.h

ASM_CALL_CONSTRAINT is defined there.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86: introduce CONFIG_GUEST and move code
Wei Liu [Thu, 19 Sep 2019 12:22:05 +0000 (13:22 +0100)]
x86: introduce CONFIG_GUEST and move code

Xen is able to run as a guest on Xen. We plan to make it able to run
on Hyper-V as well.

Introduce CONFIG_GUEST which is set to true if either running on Xen
or Hyper-V is desired. Restructure code hierarchy for new code to
come.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoMAINTAINERS: correct description of M:
Jan Beulich [Fri, 25 Oct 2019 08:40:12 +0000 (10:40 +0200)]
MAINTAINERS: correct description of M:

Let's reflect reality, its use by add_maintainers.pl / get_maintainer.pl,
as well as what
https://wiki.xenproject.org/wiki/Submitting_Xen_Project_Patches says.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86: fix off-by-one in is_xen_fixed_mfn()
Jan Beulich [Fri, 25 Oct 2019 08:38:58 +0000 (10:38 +0200)]
x86: fix off-by-one in is_xen_fixed_mfn()

__2M_rwdata_end marks the first byte after the Xen image, not its last
byte. Subtract 1 to obtain the upper bound to compare against. (Note
that instead switching from <= to < is less desirable, as in principle
__pa() might return rubbish for addresses outside of the Xen image.)

Since the & needs to be dropped from the line in question, also drop it
from the adjacent one.
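
The bound computation can be illustrated in isolation. The helper below
is a standalone sketch with made-up names, not the Xen macro: it treats
'end' as the first address after the image and derives the inclusive
upper frame from end - 1.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Inclusive range check for an image occupying [start, end), where
 * 'end' is the first address after the image. The upper bound is
 * computed from end - 1, the last byte that belongs to the image.
 */
bool frame_in_image(uint64_t frame, uint64_t start, uint64_t end,
                    unsigned int page_shift)
{
    uint64_t first = start >> page_shift;
    uint64_t last = (end - 1) >> page_shift;

    return frame >= first && frame <= last;
}
```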

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: On ARM, reject future new passthrough modes too
Ian Jackson [Wed, 23 Oct 2019 12:55:54 +0000 (13:55 +0100)]
libxl: On ARM, reject future new passthrough modes too

This is most pleasantly done by also changing the if to a switch.

Suggested-by: Julien Grall <julien@xen.org>
CC: Julien Grall <julien@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
5 years agolibxl/xl: Overhaul passthrough setting logic
Ian Jackson [Mon, 7 Oct 2019 16:59:15 +0000 (17:59 +0100)]
libxl/xl: Overhaul passthrough setting logic

LIBXL_PASSTHROUGH_UNKNOWN (aka "ENABLED" in an earlier uncommitted
version of this code) is doing double duty.  We actually need all of
the following to be specifiable:
  * "default": enable PT iff we have devices to
    pass through specified in the initial config file.
  * "enabled" (and fail if the platform doesn't support it).
  * "disabled" (and reject future PT hotplug).
  * "share_pt"/"sync_pt": enable PT and set a specific PT mode.

Defaulting and error checking should be done in libxl.  So, we make
several changes here.

We introduce "enabled", and rename "unknown" to "default".

We move all of the error checking and defaulting code from xl into
libxl.  Now, libxl__domain_config_setdefault has all of the necessary
information to get this right.  So we can do it all there.  Choosing
the specific mode is arch-specific.

We can also arrange to have only one place each which calculates
(i) whether passthrough needs to be enabled because pt devices were
specified (ii) whether pt_share can be used (for each arch).

xl now only has to parse the enum in the same way as it parses all
other enums.

This change fixes a regression from earlier 4.13-pre: until recent
changes, passthrough was only enabled by default if passthrough
devices were specified.  We restore this behaviour.
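
The defaulting rules described above can be sketched roughly as follows.
This is an illustration only: the enum, the function and its parameters
are hypothetical stand-ins, and the real mode selection is arch-specific
and lives in libxl.

```c
#include <stdbool.h>

/* Hypothetical mirror of the settings listed above. */
enum pt_mode { PT_DEFAULT, PT_DISABLED, PT_ENABLED, PT_SYNC_PT, PT_SHARE_PT };

/*
 * Resolve the requested setting to a concrete mode.
 * Returns 0 on success and -1 if the request cannot be satisfied.
 */
int pt_resolve(enum pt_mode requested, bool have_pt_devices,
               bool platform_has_iommu, bool can_share_pt,
               enum pt_mode *out)
{
    enum pt_mode m = requested;

    /* "default": enable only if devices were given in the config. */
    if (m == PT_DEFAULT)
        m = have_pt_devices ? PT_ENABLED : PT_DISABLED;

    if (m == PT_DISABLED) {
        *out = PT_DISABLED;
        return 0;
    }

    /* Anything else needs platform support. */
    if (!platform_has_iommu)
        return -1;

    /* "enabled": pick a specific mode (arch-specific in practice). */
    if (m == PT_ENABLED)
        m = can_share_pt ? PT_SHARE_PT : PT_SYNC_PT;

    /* An explicit share_pt request fails if sharing is unavailable. */
    if (m == PT_SHARE_PT && !can_share_pt)
        return -1;

    *out = m;
    return 0;
}
```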

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien@xen.org>
CC: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
CC: Andrew Cooper <Andrew.Cooper3@citrix.com>
CC: Paul Durrant <pdurrant@gmail.com>
CC: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
5 years agolibxl: Move domain_create_info_setdefault earlier
Ian Jackson [Fri, 11 Oct 2019 16:16:44 +0000 (17:16 +0100)]
libxl: Move domain_create_info_setdefault earlier

We need this before we start to figure out the passthrough mode.

I have checked that nothing in libxl__domain_create_info_setdefault
nor the two implementations of ..._arch_... accesses anything else,
other than (i) the domain type (which this function is responsible for
setting and nothing before it looks at) (ii) c_info->ssidref (which is
defaulted by flask code near the top of
libxl__domain_config_setdefault and not accessed afterwards).

So no functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: create: setdefault: Move physinfo into config_setdefault
Ian Jackson [Mon, 7 Oct 2019 16:50:06 +0000 (17:50 +0100)]
libxl: create: setdefault: Move physinfo into config_setdefault

No functional change.  This will let us refer to it in code we are
about to add to this function.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: create: setdefault: Make libxl_physinfo info[1]
Ian Jackson [Mon, 7 Oct 2019 16:47:46 +0000 (17:47 +0100)]
libxl: create: setdefault: Make libxl_physinfo info[1]

No functional change.  This will let us make it into a pointer without
textual change other than to the definition.

While we are here, fix some style errors (missing { }).

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: Remove/deprecate libxl_get_required_*_memory from the API
Ian Jackson [Fri, 4 Oct 2019 14:36:59 +0000 (15:36 +0100)]
libxl: Remove/deprecate libxl_get_required_*_memory from the API

These are now redundant because shadow_memkb and iommu_memkb are now
defaulted automatically by libxl_domain_need_memory and
libxl_domain_create etc.  Callers should not now call these; instead,
they should just let libxl take care of it.

libxl_get_required_iommu_memory was introduced in f89f555827a6
  "remove late (on-demand) construction of IOMMU page tables"
We can freely remove it because it was never in any release.

libxl_get_required_shadow_memory has been in libxl approximately
forever.  It should probably not have survived the creation of
libxl_domain_create, but it seems the API awkwardnesses we see in
recent commits prevented this.  So we have to keep it.  It remains
functional but we can deprecate it.  Hopefully we can get rid of it
completely before we find the need to change the calculation to use
additional information which its arguments do not currently supply.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: Move shadow_memkb and iommu_memkb defaulting into libxl
Ian Jackson [Fri, 4 Oct 2019 10:45:59 +0000 (11:45 +0100)]
libxl: Move shadow_memkb and iommu_memkb defaulting into libxl

Defaulting is supposed to be done by libxl.  So these calculations
should be here in libxl.  libxl__domain_config_setdefault has all the
necessary information including the values of max_memkb and max_vcpus.

The overall functional effect depends on the caller:

For xl, no change.  The code moves from xl to libxl.

For callers who set one or both shadow_memkb and iommu_memkb (whether
from libxl_get_required_shadow_memory or otherwise) before calling
libxl_domain_need_memory (any version): the new code will leave their
setting(s) unchanged.
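
The "leave explicit settings alone" behaviour is the usual
default-only-if-unset pattern. A minimal sketch, assuming a sentinel
value marks an unset field; the struct, the sentinel name and the
estimate below are placeholders, not the libxl definitions or the real
calculation.

```c
#include <stdint.h>

/* Hypothetical sentinel meaning "caller did not set this field". */
#define MEMKB_UNSET (~UINT64_C(0))

struct build_info {
    uint64_t max_memkb;
    unsigned int max_vcpus;
    uint64_t shadow_memkb;
};

/* Placeholder estimate; the real calculation lives in libxl. */
uint64_t estimate_shadow_memkb(uint64_t max_memkb, unsigned int max_vcpus)
{
    return 1024 * (uint64_t)max_vcpus + max_memkb / 512;
}

/* Fill in the default only when the caller left the field unset. */
void shadow_memkb_setdefault(struct build_info *b)
{
    if (b->shadow_memkb == MEMKB_UNSET)
        b->shadow_memkb = estimate_shadow_memkb(b->max_memkb, b->max_vcpus);
}
```

Because a set value is never overwritten, calling the defaulting
function more than once gives the same result as calling it once.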

For callers who do not call libxl_domain_need_memory at all, and who
fail to set one of these memory values: now they are both properly
set.  The shadow and iommu memory are now properly accounted for as
intended.

For callers which call libxl_domain_need_memory and request the
current API (4.13) or which track libxl, the default values are also
now right and everything works as intended.

For callers which call libxl_domain_need_memory, and request an old
pre-4.13 libxl API, and which leave one of these memkb settings unset,
we take special measures to preserve the old behaviour.

This means that they don't get the additional iommu memory and are at
risk of the domain running out of memory as a result of f89f555827a6
"remove late (on-demand) construction of IOMMU page tables".  But this
is no worse than the state just after f89f555827a6, which already
broke such callers in that way.  This is perhaps justifiable because
of the API stability warning next to libxl_domain_need_memory.

An alternative would be to drop the special-casing of these callers.
That would cause a discrepancy between libxl_domain_need_memory and
libxl_domain_create: the former would not include the iommu memory and
the latter would.  That seems worse, but it's debatable.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: libxl_domain_need_memory: Make it take a domain_config
Ian Jackson [Thu, 3 Oct 2019 15:58:32 +0000 (16:58 +0100)]
libxl: libxl_domain_need_memory: Make it take a domain_config

This should calculate the extra memory needed for shadow and iommu,
the defaults for which depend on values in c_info.  So we need this to
have the complete domain config available.

And the defaults should actually be updated and stored.  So make it
non-const.

We provide the usual kind of compatibility function for callers
expecting 4.12 and earlier.  This function becomes responsible for the
clone-and-modify of the b_info.

No overall functional change for external libxl callers which use the
API version system to request a particular API version.

Other external libxl callers will need to update their calling code,
and will then find that the new version of this function fills in most
of the defaults in d_config.  Because libxl__domain_config_setdefault
doesn't quite do all of the defaults, that's only partial.  For
present purposes that doesn't matter because none of the missing
settings are used by the memory calculations.  It does mean we need to
document in the API spec that the defaulting is only partial.

This lack of functional change is despite the fact that
numa_place_domain now no longer calls
libxl__domain_build_info_setdefault (via libxl_domain_need_memory).
That is OK because it's idempotent and numa_place_domain's one call
site is libxl__build_pre which is called from libxl__domain_build
which is called from domcreate_bootloader_done, well after the
defaults are set by initiate_domain_create.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: libxl__domain_config_setdefault: New function
Ian Jackson [Thu, 3 Oct 2019 16:31:15 +0000 (17:31 +0100)]
libxl: libxl__domain_config_setdefault: New function

Break out this into a new function.  We are going to want to call it
from a new call site.

Unfortunately not all of the defaults can be moved into the new
function without changing the order in which things are done.  That
does not seem wise at this stage of the release.  The effect is that
additional calls to libxl__domain_config_setdefault (which are going
to be introduced) do not quite set everything.  But they will do what
is needed.  After Xen 4.13 is done, we should move those settings into
the right order.

No functional change.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxl: Pass libxl_domain_config to freemem(), instead of b_info
Ian Jackson [Thu, 3 Oct 2019 16:06:43 +0000 (17:06 +0100)]
xl: Pass libxl_domain_config to freemem(), instead of b_info

We are going to change the libxl API in a moment and this change will
make it simpler.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: Offer API versions 0x040700 and 0x040800
Ian Jackson [Fri, 4 Oct 2019 14:30:22 +0000 (15:30 +0100)]
libxl: Offer API versions 0x040700 and 0x040800

According to git log -G:

0x040700 was introduced in 304400459ef0 (aka 4.7.0-rc1~481)
  "tools/libxl: rename remus device to checkpoint device"

0x040800 was introduced in 57f8b13c7240 (aka 4.8.0-rc1~437)
  "libxl: memory size in kb requires 64 bit variable"

It is surprising that no-one noticed this.

Anyway, in the meantime, we should fix it.  Backporting this is
probably a good idea: it won't change the behaviour for existing
callers but it will avoid errors for some older correct uses.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoMAINTAINERS: Switch SVM maintainership to x86
Andrew Cooper [Fri, 23 Aug 2019 14:19:14 +0000 (15:19 +0100)]
MAINTAINERS: Switch SVM maintainership to x86

We are now down to 0 SVM maintainers who are active and wish to hold the
position.  In agreement with AMD, Jan and I will take over maintainership in
the short term.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/arm: domain_build: Don't expose IOMMU specific properties to hwdom
Oleksandr Tyshchenko [Wed, 16 Oct 2019 10:08:07 +0000 (13:08 +0300)]
xen/arm: domain_build: Don't expose IOMMU specific properties to hwdom

We always skip the IOMMU device when creating DT for hwdom if there is
an appropriate driver for it in Xen (device_get_class(iommu_node)
returns DEVICE_IOMMU). So, even if it is not used by Xen it will be skipped.

We should also skip the IOMMU specific properties of the master device
behind that IOMMU in order to avoid exposing half-complete IOMMU
bindings to hwdom.

According to Linux's docs:
1. Documentation/devicetree/bindings/iommu/iommu.txt
2. Documentation/devicetree/bindings/pci/pci-iommu.txt

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen: Fix strange byte in common/Kconfig
Anthony PERARD [Wed, 23 Oct 2019 16:48:15 +0000 (17:48 +0100)]
xen: Fix strange byte in common/Kconfig

Current description of the file by `file`:
    common/Kconfig: Non-ISO extended-ASCII text

Change that byte to an ascii quote so the file can become properly
encoded, and all ASCII.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/tsc: update vcpu time info on guest TSC adjustments
Roger Pau Monné [Wed, 23 Oct 2019 08:57:39 +0000 (10:57 +0200)]
x86/tsc: update vcpu time info on guest TSC adjustments

If a HVM/PVH guest writes to MSR_IA32_TSC{_ADJUST} and thus changes
the value of the time stamp counter the vcpu time info must also be
updated, or the time calculated by the guest using the Xen PV clock
interface will be skewed.

Update the vcpu time info when the guest writes to either MSR_IA32_TSC
or MSR_IA32_TSC_ADJUST. This fixes lockups seen when running the
pv-shim on AMD hardware, since the shim will aggressively try to keep
TSCs in sync by periodically writing to MSR_IA32_TSC if the TSC is not
reliable.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/pvhsim: fix cpu onlining
Juergen Gross [Wed, 23 Oct 2019 15:53:52 +0000 (16:53 +0100)]
xen/pvhsim: fix cpu onlining

Since commit 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
the initial processor for all pv-shim vcpus will be 0, as no other cpus
are online when the vcpus are created. Before that commit the vcpus
would have been assigned processors that were not yet online, which
worked just by chance.

When the pv-shim vcpu becomes active it will have a hard affinity
not matching its initial processor assignment leading to failing
ASSERT()s or other problems depending on the selected scheduler.

Fix that by doing the affinity setting after onlining the cpu but
before taking the vcpu up. For vcpu 0 this is still in
sched_setup_dom0_vcpus(), for the other vcpus setting the affinity
there can be dropped.

Fixes: 8d3c326f6756d1 ("xen: let vcpu_create() select processor")
Reported-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agox86/vvmx: Fix the use of RDTSCP when it is intercepted at L0
Andrew Cooper [Wed, 2 Oct 2019 17:44:42 +0000 (18:44 +0100)]
x86/vvmx: Fix the use of RDTSCP when it is intercepted at L0

Linux has started using RDTSCP as of v5.1.  This has highlighted a bug in Xen,
where virtual vmexit simply gives up.

  (XEN) d1v1 Unhandled nested vmexit: reason 51
  (XEN) domain_crash called from vvmx.c:2671
  (XEN) Domain 1 (vcpu#1) crashed on cpu#2:

Handle RDTSCP in the virtual vmexit handler in the same way as RDTSC
intercepts.

Reported-by: Sarah Newman <srn@prgmr.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Chris Brannon <cmb@prgmr.com>
Reviewed-by: Wei Liu <wl@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/VT-d: Drop unhelpful information in diagnostics
Andrew Cooper [Fri, 11 Oct 2019 14:56:51 +0000 (15:56 +0100)]
x86/VT-d: Drop unhelpful information in diagnostics

The virtual address of the base of the IOMMU's registers is not useful for
diagnostic purposes, and is quite voluminous.  The PCI coordinates are by far
the most useful piece of identifying information.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agodocs: Extend with details about runtime microcode loading
Andrew Cooper [Sat, 12 Oct 2019 18:05:09 +0000 (19:05 +0100)]
docs: Extend with details about runtime microcode loading

The xen-ucode utility is new with the late loading improvements in 4.13.
Update the documentation suitably.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/arm: domain_build: Indent correctly parameters of alloc_bank_memory()
Julien Grall [Sun, 29 Sep 2019 15:56:27 +0000 (16:56 +0100)]
xen/arm: domain_build: Indent correctly parameters of alloc_bank_memory()

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/arm: mm: Allow generic xen page-tables helpers to be called early
Julien Grall [Tue, 15 Oct 2019 19:16:10 +0000 (20:16 +0100)]
xen/arm: mm: Allow generic xen page-tables helpers to be called early

The current implementations of xen_{map, unmap}_table() expect
{map, unmap}_domain_page() to be usable. Those helpers are used to
map/unmap page tables while updating Xen page-tables.

Since commit 022387ee1a "xen/arm: mm: Don't open-code Xen PT update in
{set, clear}_fixmap()", setup_fixmap() will make use of the helpers
mentioned above. When booting Xen using GRUB, setup_fixmap() may be used
before map_domain_page() can be called. This will result in a data abort:

(XEN) Data Abort Trap. Syndrome=0x5
(XEN) CPU0: Unexpected Trap: Data Abort

[...]

(XEN) Xen call trace:
(XEN)    [<000000000025ab6c>] mm.c#xen_pt_update+0x2b4/0x59c (PC)
(XEN)    [<000000000025ab20>] mm.c#xen_pt_update+0x268/0x59c (LR)
(XEN)    [<000000000025ae70>] set_fixmap+0x1c/0x2c
(XEN)    [<00000000002a9c98>] copy_from_paddr+0x7c/0xdc
(XEN)    [<00000000002a4ae0>] has_xsm_magic+0x18/0x34
(XEN)    [<00000000002a5b5c>] bootfdt.c#early_scan_node+0x398/0x560
(XEN)    [<00000000002a5de0>] device_tree_for_each_node+0xbc/0x144
(XEN)    [<00000000002a5ed4>] boot_fdt_info+0x6c/0x260
(XEN)    [<00000000002ac0d0>] start_xen+0x108/0xc74
(XEN)    [<000000000020044c>] arm64/head.o#paging+0x60/0x88

During early boot, the page tables are either statically allocated in
Xen binary or allocated via alloc_boot_pages().

For statically allocated page-tables, they will already be mapped as
part of Xen binary. So we can easily find the virtual address.

For dynamically allocated page-tables, we need to rely on
map_domain_page() being functional.

For arm32, the call will be usable well before pages can be dynamically
allocated (see setup_pagetables()). For arm64, the call will be usable
after setup_xenheap_mappings().

In both cases, memory is given to the boot allocator afterwards. So we
can rely on map_domain_page() for mapping page tables allocated
dynamically.

The helpers xen_{map, unmap}_table() are now updated to take into
account the case where page-tables are part of Xen binary.

Fixes: 022387ee1a ('xen/arm: mm: Don't open-code Xen PT update in {set, clear}_fixmap()')
Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: setup: Calculate correctly the size of Xen
Julien Grall [Wed, 16 Oct 2019 11:12:51 +0000 (12:12 +0100)]
xen/arm: setup: Calculate correctly the size of Xen

The current size of Xen is computed using _end - _start + 1. However,
_end is pointing one past the end of Xen, so the size of Xen is
off-by-one.
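
The two conventions are easy to mix up, so here is a small standalone
sketch of both. The helper names are invented for illustration; the
first one matches the situation described above, where the end symbol
points one past the last byte.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of a region [start, end) where 'end' is one past the last byte. */
size_t region_size_exclusive(uintptr_t start, uintptr_t end)
{
    return (size_t)(end - start);
}

/* Size of a region [start, last] where 'last' is the final byte. */
size_t region_size_inclusive(uintptr_t start, uintptr_t last)
{
    return (size_t)(last - start + 1);
}
```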

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoxen/arm: Don't use _end in is_xen_fixed_mfn()
Julien Grall [Wed, 16 Oct 2019 10:53:03 +0000 (11:53 +0100)]
xen/arm: Don't use _end in is_xen_fixed_mfn()

virt_to_maddr() uses the hardware page-table walk instructions to
translate a virtual address to a physical address. The function should
only be called on mapped virtual addresses.

_end points past the end of Xen binary and may not be mapped when the
binary size is page-aligned. This means virt_to_maddr() will not be able
to do the translation and therefore crash Xen.

Note there is also an off-by-one issue in this code, but the panic will
trump that.

Both issues can be fixed by using _end - 1 in the check.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agogolang/xenlight: fix calls to libxl_domain_unpause/pause
Nick Rosbrook [Tue, 22 Oct 2019 14:06:59 +0000 (15:06 +0100)]
golang/xenlight: fix calls to libxl_domain_unpause/pause

These functions require a third argument of type const *libxl_asyncop_how.

Pass nil to fix compilation errors. This will have the effect of
performing these operations synchronously.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agodocs/sphinx: Introduction
Andrew Cooper [Fri, 19 Jul 2019 07:57:50 +0000 (08:57 +0100)]
docs/sphinx: Introduction

Put together an introduction page for the Sphinx/RST docs, along with a
glossary which will accumulate over time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Lars Kurth <lars.kurth@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoMAINTAINERS: drop Tim Deegan from 'The Rest'
Tim Deegan [Thu, 17 Oct 2019 06:18:16 +0000 (07:18 +0100)]
MAINTAINERS: drop Tim Deegan from 'The Rest'

I have not been active in this role for a while now.

Signed-off-by: Tim Deegan <tim@xen.org>
5 years agoxen/arm: mm: Clear boot pagetables before bringing-up each secondary CPU
Julien Grall [Thu, 13 Jun 2019 17:11:45 +0000 (18:11 +0100)]
xen/arm: mm: Clear boot pagetables before bringing-up each secondary CPU

At the moment, boot pagetables are only cleared once at boot. This means
that when booting CPU2 (and onwards), the boot pagetables will not be
cleared.

To keep the interface exactly the same for all secondary CPUs, the boot
pagetables are now cleared before bringing-up each secondary CPU.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/arm: domain_build: Print the correct domain in dtb_load()
Julien Grall [Tue, 13 Aug 2019 18:11:28 +0000 (19:11 +0100)]
xen/arm: domain_build: Print the correct domain in dtb_load()

dtb_load() can be called for domains other than dom0. To avoid confusion
in the log, print the correct domain.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agostubdom/vtpm: include stdio.h for declaration of printf
Olaf Hering [Wed, 2 Oct 2019 17:05:36 +0000 (19:05 +0200)]
stubdom/vtpm: include stdio.h for declaration of printf

The function read_vtpmblk uses printf(3), but stdio.h is not included
in this file. This results in a warning from gcc-7:

vtpmblk.c: In function 'read_vtpmblk':
vtpmblk.c:322:7: warning: implicit declaration of function 'printf' [-Wimplicit-function-declaration]
       printf("Expected: ");
vtpmblk.c:322:7: warning: incompatible implicit declaration of built-in function 'printf'
vtpmblk.c:322:7: note: include '<stdio.h>' or provide a declaration of 'printf'

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agodocs/sphinx: Indent cleanup
Andrew Cooper [Fri, 19 Jul 2019 07:57:50 +0000 (08:57 +0100)]
docs/sphinx: Indent cleanup

Sphinx, its linters, and RST modes in common editors, expect 3 spaces of
indentation.  Some bits already conform to this expectation.  Update the
rest to match.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Lars Kurth <lars.kurth@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/microcode: Drop trailing whitespace in printk()
Andrew Cooper [Tue, 8 Oct 2019 19:23:26 +0000 (20:23 +0100)]
x86/microcode: Drop trailing whitespace in printk()

This has actually been present since c/s bd7c09c0 in 2008, and survived
through all of the recent microcode refactoring.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoPrep for 4.13.0-rc1: Set version to -rc
Ian Jackson [Mon, 14 Oct 2019 10:31:31 +0000 (11:31 +0100)]
Prep for 4.13.0-rc1: Set version to -rc

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoPrep for 4.13.0-rc1: Pin QEMU_* and MINIOS to tags
Ian Jackson [Mon, 14 Oct 2019 10:30:52 +0000 (11:30 +0100)]
Prep for 4.13.0-rc1: Pin QEMU_* and MINIOS to tags

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoxen/arm: domain_build: harden make_cpus_node()
Stefano Stabellini [Thu, 10 Oct 2019 00:42:11 +0000 (17:42 -0700)]
xen/arm: domain_build: harden make_cpus_node()

make_cpus_node() is using a static buffer to generate the FDT node name.
While mpidr_aff is a 64-bit integer, we only ever use the bits [23:0] as
only AFF{0, 1, 2} are supported for now.

To avoid any potential issues in the future, check that mpidr_aff has
only bits [23:0] set.

Take the opportunity to reduce the size of the buffer. Indeed, only 8
characters are needed to print a 32-bit hexadecimal number. So
strlen("cpu@") + 8 + 1 (for '\0') = 13 characters is sufficient.
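
A minimal sketch of the check and buffer sizing described above, not the
code from the patch. The names format_cpu_node_name, AFF_BITS_23_0 and
CPU_NODE_NAME_LEN are hypothetical. Note that sizeof("cpu@") counts the
terminating NUL, so the sketch subtracts one to arrive at 13:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mask: only AFF0..AFF2, i.e. bits [23:0], may be set. */
#define AFF_BITS_23_0 UINT64_C(0x00ffffff)

/* 4 characters for "cpu@" + 8 hex digits + 1 for '\0' = 13. */
#define CPU_NODE_NAME_LEN (sizeof("cpu@") - 1 + 8 + 1)

/*
 * Write "cpu@<hex>" into buf, which must hold CPU_NODE_NAME_LEN bytes.
 * Returns 0 on success, -1 if mpidr_aff has bits outside [23:0] set.
 */
static int format_cpu_node_name(char *buf, uint64_t mpidr_aff)
{
    assert(buf != NULL);

    if ( mpidr_aff & ~AFF_BITS_23_0 )
        return -1;

    snprintf(buf, CPU_NODE_NAME_LEN, "cpu@%" PRIx64, mpidr_aff);

    return 0;
}
```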

Fixes: c81a791d34 (xen/arm: Set 'reg' of cpu node for dom0 to match MPIDR's affinity)
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/mm: don't needlessly veto migration
Paul Durrant [Thu, 10 Oct 2019 15:45:15 +0000 (17:45 +0200)]
x86/mm: don't needlessly veto migration

Now that xl.cfg has an option to explicitly enable IOMMU mappings for a
domain, migration may be needlessly vetoed due to the check of
is_iommu_enabled() in paging_log_dirty_enable().
There is actually no need to prevent logdirty from being enabled unless
devices are assigned to a domain.

NOTE: While in the neighbourhood, the bool_t parameter type in
      paging_log_dirty_enable() is replaced with a bool and the format
      of the comment in assign_device() is fixed.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/efi: properly handle 0 in pixel reserved bitmask
Igor Druzhinin [Thu, 10 Oct 2019 14:50:50 +0000 (16:50 +0200)]
x86/efi: properly handle 0 in pixel reserved bitmask

In some graphics modes firmware is allowed to return 0 in pixel reserved
bitmask which doesn't go against UEFI Spec 2.8 (12.9 Graphics Output Protocol).

Without this change non-TrueColor modes won't work which will cause
GOP init to fail - observed while trying to boot EFI Xen with Cirrus VGA.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/docs: arm: Update dom0less binding and example
Julien Grall [Tue, 13 Aug 2019 21:11:15 +0000 (22:11 +0100)]
xen/docs: arm: Update dom0less binding and example

The binding for the dom0less module does not match Xen implementation.
Any module should contain the compatible "multiboot,module" to be
recognized.

This was clearly an oversight as other examples within the Xen code base
provide the compatible "multiboot,module".

So fix the binding and the example associated to it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/hvm: Fix the use of "hap=0" following c/s c0902a9a143a
Andrew Cooper [Wed, 9 Oct 2019 18:21:14 +0000 (19:21 +0100)]
x86/hvm: Fix the use of "hap=0" following c/s c0902a9a143a

c/s c0902a9a143a refactored hvm_enable() a little, but dropped the logic which
cleared hap_supported in the case that the user had asked for it off.

This results in Xen booting up, claiming:

  (XEN) HVM: Hardware Assisted Paging (HAP) detected but disabled

but with HAP advertised via sysctl, and XEN_DOMCTL_CDF_hap being accepted in
domain_create().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agopci: clear {host/guest}_maskall field on assign
Roger Pau Monné [Thu, 10 Oct 2019 08:59:27 +0000 (10:59 +0200)]
pci: clear {host/guest}_maskall field on assign

The current implementation of host_maskall makes it sticky across
assign and deassign calls, which means that once a guest forces Xen to
set host_maskall the maskall bit is not going to be cleared until a
call to PHYSDEVOP_prepare_msix is performed. Such call however
shouldn't be part of the normal flow when doing PCI passthrough, and
hence the flag needs to be cleared when assigning in order to prevent
host_maskall being carried over from previous assignments.

Note that the entry maskbit is reset when the msix capability is
initialized, and the guest_maskall field is also cleared so that the
hardware value matches Xen's internal state (hardware maskall =
host_maskall | guest_maskall).

Also note that doing the reset of host_maskall there would allow the
guest to reset such field by enabling and disabling MSIX, which is not
intended.
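
The relationship between the three values can be sketched as follows.
This is an illustration only: struct msix_state and both helpers are
hypothetical stand-ins, not Xen's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the per-device MSI-X state. */
struct msix_state {
    bool host_maskall;
    bool guest_maskall;
};

/* hardware maskall = host_maskall | guest_maskall */
static bool hw_maskall(const struct msix_state *msix)
{
    assert(msix != NULL);

    return msix->host_maskall || msix->guest_maskall;
}

/* On assign, clear both fields so nothing is carried over. */
static void msix_reset_on_assign(struct msix_state *msix)
{
    assert(msix != NULL);

    msix->host_maskall = false;
    msix->guest_maskall = false;
}
```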

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Chao Gao <chao.gao@intel.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoefi/boot: make sure graphics mode is set while booting through MB2
Igor Druzhinin [Thu, 10 Oct 2019 08:58:45 +0000 (10:58 +0200)]
efi/boot: make sure graphics mode is set while booting through MB2

If a bootloader is using a native driver instead of EFI GOP it might
reset the graphics mode to something different from what was originally
set by firmware. While booting through MB2 Xen either needs to parse the
video settings passed by MB2 and use them instead of what GOP reports or
reset the mode to synchronise it with firmware - prefer the latter.

Observed while booting Xen using MB2 with EFI GRUB2 compiled with
all possible video drivers where native drivers take priority over firmware.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoefi/boot: add missing pointer dereference in set_color
Igor Druzhinin [Thu, 10 Oct 2019 08:58:09 +0000 (10:58 +0200)]
efi/boot: add missing pointer dereference in set_color

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoAMD/IOMMU: pre-fill all DTEs right after table allocation
Jan Beulich [Thu, 10 Oct 2019 07:51:46 +0000 (09:51 +0200)]
AMD/IOMMU: pre-fill all DTEs right after table allocation

Make sure we don't leave any DTEs through which unexpected requests
would be passed untranslated. Set V and IV right away (with
all other fields left as zero), relying on the V and/or IV bits
getting cleared only by amd_iommu_set_root_page_table() and
amd_iommu_set_intremap_table() under special pass-through circumstances.
Switch back to initial settings in amd_iommu_disable_domain_device().

Take the liberty to also make the latter function static on this
occasion, constifying its first parameter at the same time.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoAMD/IOMMU: allow callers to request allocate_buffer() to skip its memset()
Jan Beulich [Thu, 10 Oct 2019 07:51:12 +0000 (09:51 +0200)]
AMD/IOMMU: allow callers to request allocate_buffer() to skip its memset()

The command ring buffer doesn't need clearing up front in any event.
Subsequently we'll also want to avoid clearing the device tables.

While playing with function signatures replace undue use of fixed width
types at the same time, and extend this to deallocate_buffer() as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoAMD/IOMMU: allocate one device table per PCI segment
Jan Beulich [Thu, 10 Oct 2019 07:50:00 +0000 (09:50 +0200)]
AMD/IOMMU: allocate one device table per PCI segment

Having a single device table for all segments can't possibly be right.
(Even worse, the symbol wasn't static despite being used in just one
source file.) Attach the device tables to their respective IVRS mapping
ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoiommu/arm: Remove arch_iommu_populate_page_table() completely
Oleksandr Tyshchenko [Mon, 30 Sep 2019 10:34:31 +0000 (13:34 +0300)]
iommu/arm: Remove arch_iommu_populate_page_table() completely

The Arm implementation should have been removed in the following commit
as redundant:
f89f555 remove late (on-demand) construction of IOMMU page tables

So, remove the unused function completely.

Fixes: f89f555 ('remove late (on-demand) construction of IOMMU page tables')
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/arm: fix duplicate memory node in DT
Stefano Stabellini [Tue, 8 Oct 2019 01:15:01 +0000 (18:15 -0700)]
xen/arm: fix duplicate memory node in DT

When reserved-memory regions are present in the host device tree, dom0
is started with multiple memory nodes. Each memory node should have a
unique name, but today they are all called "memory" leading to Linux
printing the following warning at boot:

  OF: Duplicate name in base, renamed to "memory#1"

This patch fixes the problem by appending a "@<unit-address>" to the
name, as per the Device Tree specification, where <unit-address> matches
the base address of the first region.
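
A minimal sketch of the naming scheme, assuming the unit address is
printed in lowercase hex without a "0x" prefix. memory_node_name is a
hypothetical helper, not the function changed by the patch:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "memory@<unit-address>", where <unit-address> is the base
 * address of the first region. Returns 0 on success, -1 if the name
 * does not fit in buf.
 */
static int memory_node_name(char *buf, size_t len, uint64_t first_base)
{
    int n;

    assert(buf != NULL);

    n = snprintf(buf, len, "memory@%" PRIx64, first_base);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a first region based at 0x40000000 this yields "memory@40000000",
so two memory nodes with different bases no longer share a name.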

Fixes: 248faa637d2 (xen/arm: add reserved-memory regions to the dom0 memory node)
Reported-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/arm: make_memory_node return error on nr_banks == 0
Stefano Stabellini [Tue, 8 Oct 2019 01:15:00 +0000 (18:15 -0700)]
xen/arm: make_memory_node return error on nr_banks == 0

Call make_memory_node for reserved_memory only if we actually have any
reserved_memory regions to handle.

Add a check in make_memory_node to return an error if it has been called
with no memory banks as argument.
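
The two changes can be sketched as below. struct banks and both function
names are hypothetical simplifications, and the choice of -ENOENT as the
error value is an assumption made for the example:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a list of memory banks. */
struct banks {
    unsigned int nr_banks;
};

/* Refuse to build a node when called with no memory banks. */
static int make_memory_node_sketch(const struct banks *mem)
{
    assert(mem != NULL);

    if ( mem->nr_banks == 0 )
        return -ENOENT;

    /* ... the node would be generated here ... */
    return 0;
}

/* Caller side: only handle reserved memory if there is any. */
static int build_memory_nodes(const struct banks *ram,
                              const struct banks *reserved)
{
    int res;

    assert(reserved != NULL);

    res = make_memory_node_sketch(ram);
    if ( res )
        return res;

    if ( reserved->nr_banks > 0 )
        res = make_memory_node_sketch(reserved);

    return res;
}
```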

Fixes: 248faa637d2 (xen/arm: add reserved-memory regions to the dom0 memory node)
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agodocs: update all URLs in man pages
Lars Kurth [Thu, 3 Oct 2019 15:47:05 +0000 (08:47 -0700)]
docs: update all URLs in man pages

Specifically
* xen.org to xenproject.org
* http to https
* Replaced pages where page has moved
  (including on xen pages as well as external pages)
* Removed some URLs (e.g. downloads for Linux PV drivers)

Tested-by: Lars Kurth <lars.kurth@citrix.com>
Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/sched: let credit scheduler control its timer all alone
Juergen Gross [Mon, 7 Oct 2019 06:35:19 +0000 (08:35 +0200)]
xen/sched: let credit scheduler control its timer all alone

The credit scheduler is the only scheduler with tick_suspend and
tick_resume callbacks. Today those callbacks are invoked without being
guarded by the scheduler lock, which is critical when at the same time
the cpu on which those callbacks are active is being moved to or from a
cpupool.

Crashes like the following are possible due to that race:

(XEN) ----[ Xen-4.13.0-8.0.12-d  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    79
(XEN) RIP:    e008:[<ffff82d0802467dc>] set_timer+0x39/0x1f7
(XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
<snip>
(XEN) Xen call trace:
(XEN)    [<ffff82d0802467dc>] set_timer+0x39/0x1f7
(XEN)    [<ffff82d08022c1f4>] sched_credit.c#csched_tick_resume+0x54/0x59
(XEN)    [<ffff82d080241dfe>] sched_tick_resume+0x67/0x86
(XEN)    [<ffff82d0802eda52>] mwait-idle.c#mwait_idle+0x32b/0x357
(XEN)    [<ffff82d08027939e>] domain.c#idle_loop+0xa6/0xc2
(XEN)
(XEN) Pagetable walk from 0000000000000048:
(XEN)  L4[0x000] = 00000082cfb9c063 ffffffffffffffff
(XEN)  L3[0x000] = 00000082cfb9b063 ffffffffffffffff
(XEN)  L2[0x000] = 00000082cfb9a063 ffffffffffffffff
(XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 79:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: 0000000000000048
(XEN) ****************************************

The callbacks are used when the cpu is going to or coming from idle in
order to allow higher C-states.

The credit scheduler knows when it is going to schedule an idle
scheduling unit or another one after idle, so it can easily stop or
resume the timer itself removing the need to do so via the callback.
As this timer handling is done in the main scheduling function the
scheduler lock is still held, so the race with cpupool operations can
no longer occur. Note that calling the callbacks from schedule_cpu_rm()
and schedule_cpu_add() is no longer needed, as the transitions to and
from idle in the cpupool with credit active will automatically occur
and do the right thing.

With the last user of the callbacks gone those can be removed.

Suggested-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/xsm: flask: Check xmalloc_array() return in security_sid_to_context()
Julien Grall [Fri, 4 Oct 2019 16:53:26 +0000 (17:53 +0100)]
xen/xsm: flask: Check xmalloc_array() return in security_sid_to_context()

xmalloc_array() may return NULL if there is no memory available. Rather
than trying to dereference it directly, we should check the return value
first.
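
The pattern can be sketched in plain C, with malloc() standing in for
xmalloc_array() (both may return NULL). dup_context is a hypothetical
helper used only for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy a context string into a freshly allocated buffer.
 * Returns 0 on success, -ENOMEM if the allocation failed.
 */
static int dup_context(const char *src, char **out, size_t *out_len)
{
    size_t len;
    char *buf;

    assert(src != NULL && out != NULL && out_len != NULL);

    len = strlen(src) + 1;
    buf = malloc(len);
    if ( !buf )
        return -ENOMEM; /* check before the first dereference */

    memcpy(buf, src, len);
    *out = buf;
    *out_len = len;

    return 0;
}
```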

Coverity-ID: 1381852
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/xsm: flask: Prevent NULL deference in flask_assign_{, dt}device()
Julien Grall [Fri, 4 Oct 2019 16:32:49 +0000 (17:32 +0100)]
xen/xsm: flask: Prevent NULL deference in flask_assign_{, dt}device()

flask_assign_{, dt}device() may be used to test whether a device is
assigned. In this case, the domain will be NULL.

However, flask_iommu_resource_use_perm() will be called and may end up
dereferencing a NULL pointer. This can be prevented by moving the call
after we check the validity of the domain pointer.
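
A simplified sketch of the reordering. struct domain here is a dummy and
both functions are hypothetical stand-ins; the real flask hooks take
different arguments and perform more checks:

```c
#include <assert.h>
#include <stddef.h>

/* Dummy type for the example. */
struct domain {
    int domain_id;
};

/* Stand-in for a helper that dereferences the domain pointer. */
static int resource_use_perm(const struct domain *d)
{
    assert(d != NULL);

    return d->domain_id >= 0 ? 0 : -1;
}

static int assign_device_check(const struct domain *d)
{
    /* A NULL domain means "only test whether the device is assigned". */
    if ( !d )
        return 0;

    /* Only reached once the domain pointer is known to be valid. */
    return resource_use_perm(d);
}
```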

Coverity-ID: 1486741
Fixes: 71e617a6b8 ('use is_iommu_enabled() where appropriate...')
Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/Kconfig: Invert the defaults for CONFIG_{PVH_GUEST,PV_SHIM}
Andrew Cooper [Tue, 1 Oct 2019 16:27:49 +0000 (17:27 +0100)]
x86/Kconfig: Invert the defaults for CONFIG_{PVH_GUEST,PV_SHIM}

This is a minor UI change, but users who have elected to enable
XEN_GUEST (which still defaults to no) will definitely need one of these
options, and will typically want both.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_ARRAY
Andrew Cooper [Thu, 31 Jan 2019 18:01:16 +0000 (18:01 +0000)]
xen/nospec: Introduce CONFIG_SPECULATIVE_HARDEN_ARRAY

There are legitimate circumstances where array hardening is not wanted or
needed.  Allow it to be turned off.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86/spec-ctrl: Annotate remaining model names
Andrew Cooper [Thu, 3 Oct 2019 14:04:03 +0000 (15:04 +0100)]
x86/spec-ctrl: Annotate remaining model names

The names in retpoline_safe() are copied from should_use_eager_fpu().  The
names in mds_calculations() come partly from Linux's intel-family.h, and
partly from conversations with Intel.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agoxen/arm: add dom0-less device assignment info to docs
Stefano Stabellini [Thu, 3 Oct 2019 17:35:41 +0000 (10:35 -0700)]
xen/arm: add dom0-less device assignment info to docs

Add info about the SPI used for the virtual pl011.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: introduce nr_spis
Stefano Stabellini [Thu, 3 Oct 2019 17:34:15 +0000 (10:34 -0700)]
xen/arm: introduce nr_spis

We don't have a clear way to know how many virtual SPIs we need for the
dom0-less domains. Introduce a new option under xen,domain to specify
the number of SPIs to allocate for a domain.

The property is optional. When absent, we'll use the physical number of
GIC lines for dom0-less domains, or GUEST_VPL011_SPI+1 if vpl011 is
requested, whichever is greater.

Remove the old setting of nr_spis based on the presence of the vpl011.

The implication of this change is that without nr_spis dom0less domains
get the same number of SPIs allocated as dom0, regardless of how many
physical devices they have assigned, and regardless of whether they have
a virtual pl011 (which also needs an emulated SPI). This is done because
the SPIs allocation needs to be done before parsing any passthrough
information, so we have to account for any potential physical SPI
assigned to the domain.

When nr_spis is present, the domain gets exactly nr_spis allocated SPIs.
If the number is too low, it might not be enough for the devices
assigned to it. If the number is less than GUEST_VPL011_SPI, the
virtual pl011 won't work.
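
The selection rule can be sketched as follows. struct domu_spi_cfg and
effective_nr_spis are hypothetical, and the value given to
GUEST_VPL011_SPI is an assumption for the example only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value, for illustration only. */
#define GUEST_VPL011_SPI 32u

/* Hypothetical summary of the relevant xen,domain settings. */
struct domu_spi_cfg {
    bool has_nr_spis;     /* is the optional nr_spis property present? */
    unsigned int nr_spis; /* its value, if present */
    bool vpl011;          /* is a virtual pl011 requested? */
};

static unsigned int effective_nr_spis(const struct domu_spi_cfg *cfg,
                                      unsigned int gic_lines)
{
    unsigned int nr;

    assert(cfg != NULL);

    /* When present, the domain gets exactly nr_spis. */
    if ( cfg->has_nr_spis )
        return cfg->nr_spis;

    /* Otherwise: physical GIC lines, or enough for the vpl011 SPI. */
    nr = gic_lines;
    if ( cfg->vpl011 && nr < GUEST_VPL011_SPI + 1 )
        nr = GUEST_VPL011_SPI + 1;

    return nr;
}
```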

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: handle "multiboot,device-tree" compatible nodes
Stefano Stabellini [Thu, 3 Oct 2019 17:34:15 +0000 (10:34 -0700)]
xen/arm: handle "multiboot,device-tree" compatible nodes

Detect "multiboot,device-tree" compatible nodes. Add them to the bootmod
array as BOOTMOD_GUEST_DTB.  In kernel_probe, find the right
BOOTMOD_GUEST_DTB and store a pointer to it in dtb_bootmodule.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: assign devices to boot domains
Stefano Stabellini [Thu, 3 Oct 2019 17:34:05 +0000 (10:34 -0700)]
xen/arm: assign devices to boot domains

Scan the user provided dtb fragment at boot. For each device node, map
memory to guests, route interrupts, and set up the iommu.

The memory region to remap is specified by the "xen,reg" property.

The iommu is set up by passing the node of the device to assign on the
host device tree. The path is specified in the device tree fragment as
the "xen,path" string property.

The interrupts are remapped based on the information from the
corresponding node on the host device tree. Call
handle_device_interrupts to remap interrupts. Interrupts related device
tree properties are copied from the device tree fragment, same as all
the other properties.

Require both xen,reg and xen,path to be present, unless
xen,force-assign-without-iommu is also set. In that case, tolerate a
missing xen,path, also tolerate iommu setup failure for the passthrough
device.
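
One reading of the rule above, as a sketch: xen,reg is always needed, and
xen,path is needed unless xen,force-assign-without-iommu is set. The
struct and function are hypothetical and only model property presence:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of which properties a passthrough node carries. */
struct pt_node_props {
    bool has_xen_reg;
    bool has_xen_path;
    bool force_assign_without_iommu;
};

/* Returns 0 if the combination is acceptable, -EINVAL otherwise. */
static int check_passthrough_props(const struct pt_node_props *p)
{
    assert(p != NULL);

    if ( p->has_xen_reg &&
         (p->has_xen_path || p->force_assign_without_iommu) )
        return 0;

    return -EINVAL;
}
```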

Also add the new flag XEN_DOMCTL_CDF_iommu so that dom0less domU
can use the IOMMU if a partial dtb is specified.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: copy dtb fragment to guest dtb
Stefano Stabellini [Mon, 30 Sep 2019 23:13:37 +0000 (16:13 -0700)]
xen/arm: copy dtb fragment to guest dtb

Read the dtb fragment corresponding to a passthrough device from memory
at the location referred to by the "multiboot,device-tree" compatible
node.

Add a new field named dtb_bootmodule to struct kernel_info to keep track
of the dtb fragment location.

Copy the fragment to the guest dtb (only /aliases and /passthrough).

Set kinfo->phandle_gic based on the phandle of the special "/gic"
node in the device tree fragment. "/gic" is a dummy node in the dtb
fragment that represents the gic interrupt controller. Other properties
in the dtb fragment might refer to it (for instance interrupt-parent of
a device node). We reuse the phandle of "/gic" from the dtb fragment as
the phandle of the full GIC node that will be created for the guest
device tree. That way, when we copy properties from the device tree
fragment to the domU device tree the links remain unbroken.

scan_passthrough_prop is introduced here and not used in this patch but
it will be used by later patches.

Some of the code below is taken from tools/libxl/libxl_arm.c. Note that
it is OK to take LGPL 2.1 code and include it in a GPLv2 code base.
The result is GPLv2 code.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
----
Changes in v6:
- code style
- in-code comment
- commit message improvements

Changes in v5:
- code style
- in-code comment
- remove depth parameter from scan_pfdt_node
- for instead of loop in domain_handle_dtb_bootmodule
- move "gic" check to domain_handle_dtb_bootmodule
- add check_partial_fdt
- use DT_ROOT_NODE_ADDR/SIZE_CELLS_DEFAULT
- add scan_passthrough_prop parameter, set it to false for "/aliases"

Changes in v4:
- use recursion in the implementation
- rename handle_properties to handle_prop_pfdt
- rename scan_pt_node to scan_pfdt_node
- pass kinfo to handle_properties
- use uint32_t instead of u32
- rename r to res
- add "passthrough" and "aliases" check
- add a name == NULL check
- code style
- move DTB fragment scanning earlier, before DomU GIC node creation
- set guest_phandle_gic based on "/gic"
- in-code comment

Changes in v3:
- switch to using device_tree_for_each_node for the copy

Changes in v2:
- add a note about the code coming from libxl in the commit message
- copy /aliases
- code style

5 years agoxen/arm: introduce kinfo->phandle_gic
Stefano Stabellini [Mon, 30 Sep 2019 23:13:37 +0000 (16:13 -0700)]
xen/arm: introduce kinfo->phandle_gic

Instead of always hard-coding the GIC phandle (GUEST_PHANDLE_GIC), store
it in a variable under kinfo. This way it can be dynamically chosen per
domain. Remove the fdt pointer argument to the make_*_domU_node
functions and pass a struct kernel_info * instead. The fdt pointer can
be accessed from kinfo->fdt. Remove the struct domain *d parameter to
the make_*_domU_node functions because it becomes unused.

Initialize phandle_gic to GUEST_PHANDLE_GIC at the beginning of
prepare_dtb_domU for DomUs. Later patches will change the value of
phandle_gic depending on user provided information.

For Dom0, initialize phandle_gic to dt_interrupt_controller->phandle
(current value) at the beginning of prepare_dtb.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: export device_tree_get_reg and device_tree_get_u32
Stefano Stabellini [Mon, 30 Sep 2019 23:13:37 +0000 (16:13 -0700)]
xen/arm: export device_tree_get_reg and device_tree_get_u32

They'll be used in later patches.

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: introduce handle_device_interrupts
Stefano Stabellini [Mon, 30 Sep 2019 23:13:37 +0000 (16:13 -0700)]
xen/arm: introduce handle_device_interrupts

Move the interrupt handling code out of handle_device to a new function
so that it can be reused for dom0less VMs (it will be used in later
patches).

Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Acked-by: Julien Grall <julien.grall@arm.com>
5 years agoxen/arm: boot with device trees with "mmu-masters" and "iommus"
Stefano Stabellini [Mon, 30 Sep 2019 20:56:18 +0000 (13:56 -0700)]
xen/arm: boot with device trees with "mmu-masters" and "iommus"

Some Device Trees may expose both legacy SMMU and generic IOMMU bindings
together. However, the SMMU driver in Xen is only supporting the legacy
SMMU bindings, leading to fatal initialization errors at boot time.

This patch fixes the booting problem by adding a check to
iommu_add_dt_device: if the Xen driver doesn't support the new generic
bindings, and the device is behind an IOMMU, do not return error. The
following iommu_assign_dt_device should succeed.

This check will become superfluous, hence removable, once the Xen SMMU
driver gets support for the generic IOMMU bindings.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: don't try to manipulate json config for stubdomain
Marek Marczykowski-Górecki [Sat, 28 Sep 2019 14:20:37 +0000 (15:20 +0100)]
libxl: don't try to manipulate json config for stubdomain

A stubdomain does not have its own config file - its configuration is
derived from that of the target domain. Do not try to manipulate it when
attaching a PCI device.

This bug prevented starting an HVM domain with a stubdomain and a PCI
passthrough device attached.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: attach PCI device to qemu only after setting pciback/pcifront
Marek Marczykowski-Górecki [Tue, 1 Oct 2019 04:24:19 +0000 (05:24 +0100)]
libxl: attach PCI device to qemu only after setting pciback/pcifront

When qemu is running in a stubdomain, handling the "pci-ins" command will
fail if pcifront is not initialized already. Fix this by sending the
command only after confirming that pciback/front is running.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: do not attach xen-pciback to HVM domain, if stubdomain is in use
Marek Marczykowski-Górecki [Sat, 28 Sep 2019 14:20:35 +0000 (15:20 +0100)]
libxl: do not attach xen-pciback to HVM domain, if stubdomain is in use

HVM domains use IOMMU and device model assistance for communicating with
PCI devices; xen-pcifront/pciback isn't directly needed by an HVM domain.
But pciback also serves a second function - it resets the device when it
is deassigned from the guest, and for this reason pciback needs to be
used with HVM domains too.
When an HVM domain has its device model in a stubdomain, attaching
xen-pciback to the target domain itself may prevent attaching xen-pciback
to the (PV) stubdomain, effectively breaking PCI passthrough.

Fix this by attaching pciback only to one domain: if PV stubdomain is in
use, let it be stubdomain (the commit prevents attaching device to target
HVM in this case); otherwise, attach it to the target domain.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl: fix cold plugged PCI device with stubdomain
Marek Marczykowski-Górecki [Sat, 28 Sep 2019 14:20:34 +0000 (15:20 +0100)]
libxl: fix cold plugged PCI device with stubdomain

When libxl__device_pci_add() is called, stubdomain is already running,
even when still constructing the target domain. Previously, do_pci_add()
was called with 'starting' hardcoded to false, but now do_pci_add() shares
'starting' flag in pci_add_state for both target domain and stubdomain.

Fix this by resetting (local) 'starting' to false in pci_add_dm_done()
(previously part of do_pci_add()) when handling stubdomain, regardless
of pas->starting value.

Fixes: 11db56f9a6 (libxl_pci: Use libxl__ao_device with libxl__device_pci_add)
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agox86emul: adjust MOVSXD source operand handling
Jan Beulich [Fri, 4 Oct 2019 15:57:03 +0000 (17:57 +0200)]
x86emul: adjust MOVSXD source operand handling

XED commit 1b2fd94425 ("Update MOVSXD to modern behavior") points out
that as of SDM rev 064 MOVSXD is specified to read only 16 bits from
memory (or register) when used without REX.W and with operand size
override. Since the upper 16 bits of the value read won't be used
anyway in this case, make the emulation uniformly follow this more
compatible behavior when not emulating an AMD-like CPU, at the risk
of missing an exception when emulating on/for older hardware (the
boundary at SandyBridge noted in said commit looks questionable - I've
observed the "new" behavior also on Westmere, and a discussion there
led to Mark finding that even Merom has this behavior already).
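
The resulting source operand width can be sketched as a small decision
function. movsxd_src_bytes is a hypothetical helper, and the assumption
that an AMD-like CPU keeps the 32-bit read is inferred from the wording
above rather than taken from the emulator code:

```c
#include <assert.h>
#include <stdbool.h>

/* Number of bytes MOVSXD reads from its source operand. */
static unsigned int movsxd_src_bytes(bool rex_w, bool opsize_override,
                                     bool amd_like)
{
    /* With REX.W the source is always 32 bits wide. */
    if ( rex_w )
        return 4;

    /* Operand size override without REX.W: 16 bits unless AMD-like. */
    if ( opsize_override )
        return amd_like ? 4 : 2;

    /* Neither prefix: a plain 32-bit read. */
    return 4;
}
```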

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl_pci: Fix guest shutdown with PCI PT attached
Anthony PERARD [Mon, 30 Sep 2019 16:39:40 +0000 (17:39 +0100)]
libxl_pci: Fix guest shutdown with PCI PT attached

Before the problematic commit, libxl used to ignore errors when
destroying (force == true) a passthrough device. If the DM fails to
detach the pci device within the allowed time, the timeout error
raised skips part of pci_remove_*, but is also raised up to the
caller of libxl__device_pci_destroy_all, libxl__destroy_domid, and
thus the destruction of the domain fails.

When a *pci_destroy* function is called (so we have force=true), errors
should mostly be ignored. If the DM didn't confirm that the device
is removed, we will print a warning and keep going if force=true.
The patch reorders the functions so that pci_remove_timeout() calls
pci_remove_detatched() like it's done when DM calls are successful.
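
The intended error policy, reduced to a sketch. pci_remove_finish is a
hypothetical function and does not correspond to a libxl symbol; dm_rc
stands for whatever result the device model step produced:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Decide what to report after the device model was asked to detach a
 * device. With force set, a failure is only warned about.
 */
static int pci_remove_finish(int dm_rc, bool force)
{
    if ( dm_rc == 0 )
        return 0;

    if ( force )
    {
        fprintf(stderr,
                "warning: DM did not confirm removal (rc=%d), continuing\n",
                dm_rc);
        return 0;
    }

    return dm_rc;
}
```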

We also clean the QMP states and associated timeouts earlier, as soon
as they are not needed anymore.

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Fixes: fae4880c45fe015e567afa223f78bf17a6d98e1b
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agolibxl_pci: Don't ignore PCI PT error at guest creation
Anthony PERARD [Mon, 30 Sep 2019 15:35:52 +0000 (16:35 +0100)]
libxl_pci: Don't ignore PCI PT error at guest creation

Fixes: 11db56f9a6291
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
5 years agodocs: add "sched-gran" boot parameter documentation
Juergen Gross [Wed, 2 Oct 2019 07:27:45 +0000 (09:27 +0200)]
docs: add "sched-gran" boot parameter documentation

Add documentation for the new "sched-gran" hypervisor boot parameter.

Signed-off-by: Juergen Gross <jgross@suse.com>
5 years agoxen/sched: add scheduling granularity enum
Juergen Gross [Wed, 2 Oct 2019 07:27:44 +0000 (09:27 +0200)]
xen/sched: add scheduling granularity enum

Add a scheduling granularity enum ("cpu", "core", "socket") for
specification of the scheduling granularity. Initially it is set to
"cpu"; this can be modified by the new boot parameter (x86 only)
"sched-gran".

According to the selected granularity, sched_granularity is set after
all cpus are online.

A test is added to check that all sched resources hold the same number
of cpus. If they do not, fall back to core- or cpu-scheduling.
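
As a rough illustration of the option handling and the uniformity
test, here is a self-contained sketch. It is not taken from the patch:
the enum constants, parse_sched_gran() and gran_is_uniform() are
hypothetical names, and only the three option strings come from the
description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum; the commit only names the three option strings. */
enum sched_gran {
    SCHED_GRAN_CPU,
    SCHED_GRAN_CORE,
    SCHED_GRAN_SOCKET,
};

struct gran_name {
    const char *name;
    enum sched_gran mode;
};

static const struct gran_name gran_names[] = {
    { "cpu",    SCHED_GRAN_CPU },
    { "core",   SCHED_GRAN_CORE },
    { "socket", SCHED_GRAN_SOCKET },
};

/* Returns 0 and stores the mode on success, -1 for an unknown string. */
static int parse_sched_gran(const char *str, enum sched_gran *mode)
{
    size_t i;

    assert(str != NULL && mode != NULL);

    for (i = 0; i < sizeof(gran_names) / sizeof(gran_names[0]); i++) {
        if (strcmp(str, gran_names[i].name) == 0) {
            *mode = gran_names[i].mode;
            return 0;
        }
    }

    return -1;
}

/*
 * cpus_per_res[i] is the number of cpus in scheduling resource i.
 * Returns 1 if every resource holds the same number of cpus, else 0.
 * A caller would pick a finer granularity when this returns 0.
 */
static int gran_is_uniform(const unsigned int *cpus_per_res, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++)
        if (cpus_per_res[i] != cpus_per_res[0])
            return 0;

    return 1;
}
```

With this sketch, parsing "core" selects SCHED_GRAN_CORE, and a layout
such as { 2, 1, 2 } cpus per resource fails the uniformity test.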

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: disable scheduling when entering ACPI deep sleep states
Juergen Gross [Wed, 2 Oct 2019 07:27:43 +0000 (09:27 +0200)]
xen/sched: disable scheduling when entering ACPI deep sleep states

When entering deep sleep states all domains are paused resulting in
all cpus only running idle vcpus. This enables us to stop scheduling
completely in order to avoid synchronization problems with core
scheduling when individual cpus are offlined.

Disabling the scheduler is done by replacing the softirq handler
with a dummy scheduling routine only enabling tasklets to run.
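
The handler replacement can be pictured with the following sketch. It
is a simplified model, not Xen's softirq code: the handler table, the
counters and all function names here are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*softirq_handler)(void);

/* Hypothetical softirq slot index and handler table. */
enum { SCHEDULE_SOFTIRQ, NR_SOFTIRQS };

static softirq_handler softirq_handlers[NR_SOFTIRQS];

/* Counters so the effect of each handler can be observed. */
static unsigned int schedule_calls, tasklet_runs;

static void do_tasklets(void)
{
    tasklet_runs++;
}

/* Normal handler: makes a scheduling decision, then runs tasklets. */
static void schedule(void)
{
    schedule_calls++;
    do_tasklets();
}

/* Dummy handler: no scheduling decision, tasklets still run. */
static void schedule_dummy(void)
{
    do_tasklets();
}

static void scheduler_disable(void)
{
    softirq_handlers[SCHEDULE_SOFTIRQ] = schedule_dummy;
}

static void scheduler_enable(void)
{
    softirq_handlers[SCHEDULE_SOFTIRQ] = schedule;
}

/* Invoke whichever handler is currently installed. */
static void raise_schedule_softirq(void)
{
    assert(softirq_handlers[SCHEDULE_SOFTIRQ] != NULL);
    softirq_handlers[SCHEDULE_SOFTIRQ]();
}
```

In this model, calling scheduler_disable() before entering the sleep
state means later softirqs only run tasklets, and scheduler_enable()
restores normal scheduling on resume.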

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: support core scheduling for moving cpus to/from cpupools
Juergen Gross [Wed, 2 Oct 2019 07:27:42 +0000 (09:27 +0200)]
xen/sched: support core scheduling for moving cpus to/from cpupools

With core scheduling active it is necessary to move multiple cpus at
the same time to or from a cpupool in order to avoid split scheduling
resources in between.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: support differing granularity in schedule_cpu_[add/rm]()
Juergen Gross [Wed, 2 Oct 2019 07:27:41 +0000 (09:27 +0200)]
xen/sched: support differing granularity in schedule_cpu_[add/rm]()

With core scheduling active schedule_cpu_[add/rm]() has to cope with
different scheduling granularity: a cpu not in any cpupool is subject
to granularity 1 (cpu scheduling), while a cpu in a cpupool might be
in a scheduling resource with more than one cpu.

Handle that by having arrays of old/new pdata and vdata and loop over
those where appropriate.

Additionally the scheduling resource(s) must either be merged or
split.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: support multiple cpus per scheduling resource
Juergen Gross [Wed, 2 Oct 2019 07:27:40 +0000 (09:27 +0200)]
xen/sched: support multiple cpus per scheduling resource

Prepare for supporting multiple cpus per scheduling resource by
allocating the cpumask per resource dynamically.

Modify sched_res_mask to have only one bit per scheduling resource set.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: protect scheduling resource via rcu
Juergen Gross [Wed, 2 Oct 2019 07:27:39 +0000 (09:27 +0200)]
xen/sched: protect scheduling resource via rcu

In order to be able to move cpus to cpupools with core scheduling
active it is mandatory to merge multiple cpus into one scheduling
resource or to split a scheduling resource with multiple cpus in it
into multiple scheduling resources. This in turn requires modifying
the cpu <-> scheduling resource relation. In order to be able to free
unused resources, protect struct sched_resource via RCU. This ensures
there are no users left when freeing such a resource.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: split schedule_cpu_switch()
Juergen Gross [Wed, 2 Oct 2019 07:27:38 +0000 (09:27 +0200)]
xen/sched: split schedule_cpu_switch()

Instead of letting schedule_cpu_switch() handle moving cpus from and
to cpupools, split it into schedule_cpu_add() and schedule_cpu_rm().

This will allow us to drop allocating/freeing scheduler data for free
cpus as the idle scheduler doesn't need such data.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: prepare per-cpupool scheduling granularity
Juergen Gross [Wed, 2 Oct 2019 07:27:37 +0000 (09:27 +0200)]
xen/sched: prepare per-cpupool scheduling granularity

On- and offlining cpus with core scheduling is rather complicated as
the cpus are taken on- or offline one by one, but scheduling wants them
rather to be handled per core.

As the future plan is to be able to select scheduling granularity per
cpupool, prepare that by storing the granularity in struct
sched_resource (we need it there for free cpus which are not
associated with any cpupool). Free cpus will always use granularity 1.

Store the selected granularity option (cpu, core or socket) in the
cpupool, as we will need it to select the appropriate cpu mask when
populating the cpupool with cpus.

This will make on- and offlining of cpus much easier and avoids
writing code which would need to be thrown away later.

Move the granularity related variables to cpupool.c as they are now
used from there only.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: reject switching smt on/off with core scheduling active
Juergen Gross [Wed, 2 Oct 2019 07:27:36 +0000 (09:27 +0200)]
xen/sched: reject switching smt on/off with core scheduling active

When core or socket scheduling is active, enabling or disabling smt is
not possible as that would require a major host reconfiguration.

Add a bool sched_disable_smt_switching which will be set for core or
socket scheduling.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: move per-cpu variable cpupool to struct sched_resource
Juergen Gross [Wed, 2 Oct 2019 07:27:35 +0000 (09:27 +0200)]
xen/sched: move per-cpu variable cpupool to struct sched_resource

Having a pointer to struct cpupool in struct sched_resource instead
of per cpu is enough.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: move per-cpu variable scheduler to struct sched_resource
Juergen Gross [Wed, 2 Oct 2019 07:27:34 +0000 (09:27 +0200)]
xen/sched: move per-cpu variable scheduler to struct sched_resource

Having a pointer to struct scheduler in struct sched_resource instead
of per cpu is enough.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>