xenbits.xensource.com Git - people/dariof/xen.git/log
5 years ago x86/dom0: Fix build with clang
Andrew Cooper [Thu, 5 Mar 2020 17:57:37 +0000 (17:57 +0000)]
x86/dom0: Fix build with clang

find_memory() isn't marked as __init, so if it isn't fully inlined, it ends up
tripping:

  Error: size of dom0_build.o:.text is 0x0c1

Fixes: 73b47eea21 "x86/dom0: improve PVH initrd and metadata placement"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago xen/grant-table: Remove 'led' variable in map_grant_ref
Julien Grall [Tue, 25 Feb 2020 18:36:33 +0000 (18:36 +0000)]
xen/grant-table: Remove 'led' variable in map_grant_ref

The name of the variable 'led' is confusing, and it is only used in one
place, on the following line. So remove it.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen/grant-table: Remove outdated warning in gnttab_grow_table()
Julien Grall [Tue, 25 Feb 2020 12:32:49 +0000 (12:32 +0000)]
xen/grant-table: Remove outdated warning in gnttab_grow_table()

One of the warning messages in gnttab_grow_table() refers to a function that
was removed in commit 6425f91c72 "xen/gnttab: Fold grant_table_{create,
set_limits}() into grant_table_init()".

Since that commit, gt->active is allocated while initializing the
grant table at domain creation. Therefore gt->active will always be
valid.

Rather than replacing the warning by another one, drop the check
completely as we will likely not come back to a semi-initialized world.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen/x86: hap: Clean-up and harden hap_enable()
Julien Grall [Mon, 3 Feb 2020 23:57:05 +0000 (23:57 +0000)]
xen/x86: hap: Clean-up and harden hap_enable()

Unlike shadow_enable(), hap_enable() can only be called once during
domain creation and with the mode equal to
PG_external | PG_translate | PG_refcounts.

If it were called twice, then we might have some interesting problems
as the p2m tables would be re-allocated (and therefore all the mappings
would be lost).

Add code to sanity check the mode and that the function is only called
once. Take the opportunity to remove an if() checking that PG_translate is set.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/x86: hap: Fix coding style in hap_enable()
Julien Grall [Mon, 3 Feb 2020 23:57:53 +0000 (23:57 +0000)]
xen/x86: hap: Fix coding style in hap_enable()

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years ago iommu: fix check for autotranslated hardware domain
Roger Pau Monné [Thu, 5 Mar 2020 09:43:46 +0000 (10:43 +0100)]
iommu: fix check for autotranslated hardware domain

The current position of the check_hwdom_reqs() call is wrong, as there's an
is_iommu_enabled check at the top of the function that will prevent getting
to it on systems without an IOMMU, because the hardware domain
won't have the XEN_DOMCTL_CDF_iommu flag set.

Move the check so it's done before the
is_iommu_enabled one, and thus attempts to create a translated
hardware domain without an IOMMU can be detected.

Fixes: f89f555827a ('remove late (on-demand) construction of IOMMU page tables')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/dom0: improve PVH initrd and metadata placement
Roger Pau Monné [Thu, 5 Mar 2020 09:43:15 +0000 (10:43 +0100)]
x86/dom0: improve PVH initrd and metadata placement

Don't assume there's going to be enough space at the tail of the
loaded kernel and instead try to find a suitable memory area where the
initrd and metadata can be loaded.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/mm: switch to new APIs in arch_init_memory
Wei Liu [Thu, 5 Mar 2020 09:42:18 +0000 (10:42 +0100)]
x86/mm: switch to new APIs in arch_init_memory

The function will map and unmap pages on demand.

Since we now map and unmap Xen PTE pages, we would like to track the
lifetime of mappings so that 1) we do not dereference memory through a
variable after it is unmapped, 2) we do not unmap more than once.
Therefore, we introduce the UNMAP_DOMAIN_PAGE macro to nullify the
variable after unmapping, and ignore NULL.
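
A minimal sketch of what such a macro could look like (the exact form in the
patch may differ slightly):

    /* Unmap a domain page and clear the variable; a NULL pointer is a no-op. */
    #define UNMAP_DOMAIN_PAGE(p) do {    \
        if ( p )                         \
        {                                \
            unmap_domain_page(p);        \
            (p) = NULL;                  \
        }                                \
    } while ( 0 )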

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago allow only sizeof(bool) variables for boolean_param()
Juergen Gross [Thu, 5 Mar 2020 09:40:40 +0000 (10:40 +0100)]
allow only sizeof(bool) variables for boolean_param()

Supporting variable sizes other than that of a normal bool for
boolean_param() doesn't make sense, so catch any other sized variables
at build time.

Fix the one parameter using a plain int instead of bool.
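
As an illustration of the kind of build-time check meant here (a hedged
sketch, not the actual macro from the patch), a file-scope size assertion
could look like this:

    /* Fail the build if the parameter variable isn't exactly a bool in size. */
    #define ASSERT_BOOL_SIZED(var) \
        typedef char assert_##var##_is_bool_sized[(sizeof(var) == sizeof(bool)) ? 1 : -1]

    static bool __read_mostly opt_example;
    ASSERT_BOOL_SIZED(opt_example);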

Signed-off-by: Juergen Gross <jgross@suse.com>
[add __read_mostly]
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago libxl: wait for console path before firing console_available
Paweł Marczewski [Tue, 3 Mar 2020 13:28:20 +0000 (14:28 +0100)]
libxl: wait for console path before firing console_available

If the path doesn't become available after LIBXL_INIT_TIMEOUT
seconds, fail the domain creation.

If we skip the bootloader, the TTY path will be set by xenconsoled.
However, there is no guarantee that this will happen by the time we
want to call the console_available callback, so we have to wait.

Signed-off-by: Paweł Marczewski <pawel@invisiblethingslab.com>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
5 years ago automation: document vsyscall=emulate for old glibc
Wei Liu [Tue, 25 Feb 2020 12:10:48 +0000 (12:10 +0000)]
automation: document vsyscall=emulate for old glibc

Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years ago xen/arm: Workaround clang/armclang support for register allocation
Julien Grall [Mon, 17 Feb 2020 22:20:34 +0000 (22:20 +0000)]
xen/arm: Workaround clang/armclang support for register allocation

Clang 8.0 (see [1]) and by extension some versions of armclang do not
support register allocation using the rN syntax.

Thankfully, both GCC [2] and clang are able to support the xN syntax for
Arm64. Introduce a new macro ASM_REG() and use it in common code for
register allocation.

[1] https://reviews.llvm.org/rL328829
[2] https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html

Cc: Andrii Anisov <andrii_anisov@epam.com>
Signed-off-by: Julien Grall <julien@xen.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago MAINTAINERS: remove myself from REST and Public interfaces
Konrad Rzeszutek Wilk [Tue, 3 Mar 2020 15:04:03 +0000 (16:04 +0100)]
MAINTAINERS: remove myself from REST and Public interfaces

... due to -ENOTIME. I have been busy with management and have not had
much chance to do anything besides that.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
5 years ago MAINTAINERS: update my email address (again)
Paul Durrant [Tue, 3 Mar 2020 15:03:35 +0000 (16:03 +0100)]
MAINTAINERS: update my email address (again)

It is now more convenient for me to use my @amzn.com address rather
than @amazon.com.

Signed-off-by: Paul Durrant <pdurrant@amzn.com>
5 years ago MAINTAINERS: Paul to co-maintain vendor-independent IOMMU code
Jan Beulich [Tue, 3 Mar 2020 15:03:13 +0000 (16:03 +0100)]
MAINTAINERS: Paul to co-maintain vendor-independent IOMMU code

Having just a single maintainer is not helpful anywhere, and can be
avoided here quite easily, seeing that Paul has been doing quite a bit
of IOMMU work lately.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years ago sched: fix error path in cpupool_unassign_cpu_start()
Juergen Gross [Tue, 3 Mar 2020 15:02:32 +0000 (16:02 +0100)]
sched: fix error path in cpupool_unassign_cpu_start()

If moving all domains away from the cpu to be removed fails in
cpupool_unassign_cpu_start(), the error path misses releasing
sched_res_rculock.

The normal exit path releases domlist_read_lock instead (this is
currently not a problem, as the reference to the specific rcu lock is not
used by rcu_read_unlock()).

While at it indent the present error label by one space.

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years ago credit2: avoid NULL deref in csched2_res_pick() when tracing
Jan Beulich [Tue, 3 Mar 2020 15:01:30 +0000 (16:01 +0100)]
credit2: avoid NULL deref in csched2_res_pick() when tracing

The issue here results from one of the downsides of using goto: The
early "goto out" and "goto out_up" in the function very clearly bypass
any possible initialization of min_rqd, yet the tracing code at the end
of the function consumes the value. There's even a comment regarding the
trace record not being accurate in this case.

CID: 1460432
Fixes: 9c84bc004653 ("sched: rework credit2 run-queue allocation")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years ago xen: do live patching only from main idle loop
Juergen Gross [Tue, 11 Feb 2020 09:31:22 +0000 (10:31 +0100)]
xen: do live patching only from main idle loop

One of the main design goals of core scheduling is to avoid actions
which are not directly related to the domain currently running on a
given cpu or core. Live patching is one of those actions which are
allowed to take place on a cpu only when the idle scheduling unit is
active on that cpu.

Unfortunately live patching tries to force the cpus into the idle loop
just by raising the schedule softirq, which will no longer be
guaranteed to work with core scheduling active. Additionally there are
still some places in the hypervisor calling check_for_livepatch_work()
without being in the idle loop.

It is easy to force a cpu into the main idle loop by scheduling a
tasklet on it. So switch live patching to use tasklets for switching to
idle and raising scheduling events. Additionally the calls of
check_for_livepatch_work() outside the main idle loop can be dropped.

As tasklets are only running on idle vcpus and stop_machine_run()
is activating tasklets on all cpus but the one it has been called on
to rendezvous, it is mandatory for stop_machine_run() to be called on
an idle vcpu, too, as otherwise there is no way for scheduling to
activate the idle vcpu for the tasklet on the sibling of the cpu
stop_machine_run() has been called on.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
5 years ago x86/mce: fix logic and comments around MSR_PPIN_CTL
Tony Luck [Mon, 2 Mar 2020 14:40:50 +0000 (15:40 +0100)]
x86/mce: fix logic and comments around MSR_PPIN_CTL

There are two implemented bits in the PPIN_CTL MSR:

Bit 0: LockOut (R/WO)
       Set to 1 to prevent further writes to MSR_PPIN_CTL.

Bit 1: Enable_PPIN (R/W)
       If 1, enables MSR_PPIN to be accessible using RDMSR.
       If 0, an attempt to read MSR_PPIN will cause #GP.

So there are four defined values:
0: PPIN is disabled, PPIN_CTL may be updated.
1: PPIN is disabled, PPIN_CTL is locked against updates.
2: PPIN is enabled, PPIN_CTL may be updated.
3: PPIN is enabled, PPIN_CTL is locked against updates.

Code would only enable the X86_FEATURE_INTEL_PPIN feature for case "2",
when it should have done so for both case "2" and case "3".

Fix the final test to just check for the enable bit.
Also fix some of the other comments in this function.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[Linux commit ???]

One of the adjusted comments doesn't exist in our code, and I disagree
with the adjustment to the other one and its associated code change: I
don't think there's a point in trying to enable PPIN if the locked bit is
set. Hence it's just the main code change that gets pulled in, plus it
gets cloned to the AMD side.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/mce: add Xeon Icelake to list of CPUs that support PPIN
Tony Luck [Mon, 2 Mar 2020 14:40:09 +0000 (15:40 +0100)]
x86/mce: add Xeon Icelake to list of CPUs that support PPIN

New CPU model, same MSRs to control and read the inventory number.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[Linux commit dc6b025de95bcd22ff37c4fabb022ec8a027abf1]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen/guest: prepare hypervisor ops to use alternative calls
Roger Pau Monné [Mon, 2 Mar 2020 14:37:35 +0000 (15:37 +0100)]
xen/guest: prepare hypervisor ops to use alternative calls

Adapt the hypervisor ops framework so it can be used with the
alternative calls framework. So far no hooks are modified to make use
of the alternatives patching, as they are not in any hot path.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen: make sure stop_machine_run() is always called in a tasklet
Juergen Gross [Fri, 28 Feb 2020 17:13:48 +0000 (18:13 +0100)]
xen: make sure stop_machine_run() is always called in a tasklet

With core scheduling active it is mandatory for stop_machine_run() to
be called in idle context only (so either during boot or in a tasklet),
as otherwise a scheduling deadlock would occur: stop_machine_run()
does a cpu rendezvous by activating a tasklet on all other cpus. In
case stop_machine_run() was not called in an idle vcpu it would block
scheduling the idle vcpu on its siblings with core scheduling being
active, resulting in a hang.

Put a BUG_ON() into stop_machine_run() to test for being called in an
idle vcpu only and adapt the missing call site (ucode loading) to use a
tasklet for calling stop_machine_run().
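
A minimal sketch of the kind of sanity check being described (the condition
in the actual patch may be more elaborate):

    /* At the top of stop_machine_run(): only ever run from an idle vcpu,
     * i.e. during boot or from a tasklet. */
    BUG_ON(!is_idle_vcpu(current));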

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago IOMMU/x86: don't bypass softirq processing in arch_iommu_hwdom_init()
Jan Beulich [Mon, 2 Mar 2020 09:49:48 +0000 (10:49 +0100)]
IOMMU/x86: don't bypass softirq processing in arch_iommu_hwdom_init()

Even when a page doesn't need mapping, we should check whether softirq
processing should be invoked. Otherwise, with sufficiently much RAM, the
chances of a to-be-mapped page actually occurring while the loop counter
has the "right" value may become vanishingly small.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago AMD/IOMMU: correct handling when XT's prereq features are unavailable
Jan Beulich [Fri, 28 Feb 2020 15:25:43 +0000 (16:25 +0100)]
AMD/IOMMU: correct handling when XT's prereq features are unavailable

We should neither cause IOMMU initialization as a whole to fail in this
case (we should still be able to bring up the system in non-x2APIC or
x2APIC physical mode), nor should the remainder of the function be
skipped (as the main part of it won't get entered a 2nd time) in such an
event. It is merely necessary for the function to indicate to the caller
(iov_supports_xt()) that setup failed as far as x2APIC is concerned.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago x86/smp: use a dedicated CPU mask in send_IPI_mask
Roger Pau Monné [Fri, 28 Feb 2020 15:24:26 +0000 (16:24 +0100)]
x86/smp: use a dedicated CPU mask in send_IPI_mask

Some callers of send_IPI_mask pass the scratch cpumask as the mask
parameter of send_IPI_mask, so the scratch cpumask cannot be used by
the function. The following trace has been obtained with a debug patch
and shows one of those callers:

(XEN) scratch CPU mask already in use by arch/x86/mm.c#_get_page_type+0x1f9/0x1abf
(XEN) Xen BUG at smp.c:45
[...]
(XEN) Xen call trace:
(XEN)    [<ffff82d0802abb53>] R scratch_cpumask+0xd3/0xf9
(XEN)    [<ffff82d0802abc21>] F send_IPI_mask+0x72/0x1ca
(XEN)    [<ffff82d0802ac13e>] F flush_area_mask+0x10c/0x16c
(XEN)    [<ffff82d080296c56>] F arch/x86/mm.c#_get_page_type+0x3ff/0x1abf
(XEN)    [<ffff82d080298324>] F get_page_type+0xe/0x2c
(XEN)    [<ffff82d08038624f>] F pv_set_gdt+0xa1/0x2aa
(XEN)    [<ffff82d08027dfd6>] F arch_set_info_guest+0x1196/0x16ba
(XEN)    [<ffff82d080207a55>] F default_initialise_vcpu+0xc7/0xd4
(XEN)    [<ffff82d08027e55b>] F arch_initialise_vcpu+0x61/0xcd
(XEN)    [<ffff82d080207e78>] F do_vcpu_op+0x219/0x690
(XEN)    [<ffff82d08038be16>] F pv_hypercall+0x2f6/0x593
(XEN)    [<ffff82d080396432>] F lstar_enter+0x112/0x120

_get_page_type will use the scratch cpumask to call flush_tlb_mask,
which in turn calls send_IPI_mask.

Fix this by using a dedicated per CPU cpumask in send_IPI_mask.
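
A hedged sketch of the idea (the per-CPU variable name is illustrative):

    /* Private mask for send_IPI_mask(), so callers may keep using
     * scratch_cpumask for the mask they pass in. */
    static DEFINE_PER_CPU(cpumask_t, send_ipi_cpumask);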

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago iommu/arm: Don't allow the same micro-TLB to be shared between domains
Oleksandr Tyshchenko [Mon, 17 Feb 2020 15:05:35 +0000 (17:05 +0200)]
iommu/arm: Don't allow the same micro-TLB to be shared between domains

For the IPMMU-VMSA we need to prevent the use cases where devices
which use the same micro-TLB are assigned to different Xen domains
(micro-TLB cannot be shared between multiple Xen domains, since it
points to the context bank to use for the page walk).

As each Xen domain uses an individual context bank pointed to by context_id,
we can potentially recognize that use case by comparing the current and new
context_id for the already enabled micro-TLB and prevent a different
context bank from being set.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
5 years ago tools/libxl: Simplify callback handling in libxl-save-helper
Andrew Cooper [Thu, 2 Jan 2020 19:06:54 +0000 (19:06 +0000)]
tools/libxl: Simplify callback handling in libxl-save-helper

The {save,restore}_callback helpers can have their scope reduced vastly, and
helper_setcallbacks_{save,restore}() doesn't need to use a ternary operator to
write 0 (meaning NULL) into an already zeroed structure.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago x86/cpuid: Introduce and use default CPUID policies
Andrew Cooper [Fri, 21 Feb 2020 15:23:31 +0000 (15:23 +0000)]
x86/cpuid: Introduce and use default CPUID policies

For now, the default and max policies remain identical, but this will change
in the future.

Introduce calculate_{pv,hvm}_def_policy().  As *_def derives from *_max, quite
a bit of the derivation logic can be avoided the second time around - this
will cope with simple feature differences for now.

Update XEN_SYSCTL_get_cpu_* and init_domain_cpuid_policy() to use the default
policies as appropriate.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/cpuid: Compile out unused logic/objects
Andrew Cooper [Tue, 25 Feb 2020 17:36:12 +0000 (17:36 +0000)]
x86/cpuid: Compile out unused logic/objects

CPUID Policy objects are large (1860 bytes at the time of writing), so
compiling them out based on CONFIG_{PV,HVM} makes a lot of sense.

This involves a bit of complexity in init_domain_cpuid_policy() and
recalculate_cpuid_policy() as is_pv_domain() can't be evaluated at compile
time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/msr: Introduce and use default MSR policies
Andrew Cooper [Fri, 21 Feb 2020 15:23:31 +0000 (15:23 +0000)]
x86/msr: Introduce and use default MSR policies

For now, the default and max policies remain identical, but this will change
in the future.

Update XEN_SYSCTL_get_cpu_policy and init_domain_msr_policy() to use the
default policies.

Take the opportunity to sort PV ahead of HVM, as is the prevailing style
elsewhere.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/msr: Compile out unused logic/objects
Andrew Cooper [Wed, 26 Feb 2020 12:26:14 +0000 (12:26 +0000)]
x86/msr: Compile out unused logic/objects

Arrange to compile out the PV or HVM logic and objects as applicable.  This
involves a bit of complexity in init_domain_msr_policy() as is_pv_domain()
can't be evaluated at compile time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/gen-cpuid: Create max and default variations of INIT_*_FEATURES
Andrew Cooper [Tue, 25 Feb 2020 12:30:49 +0000 (12:30 +0000)]
x86/gen-cpuid: Create max and default variations of INIT_*_FEATURES

For now, write the same content for both.  Update the users of the
initialisers to use the new name, and extend xen-cpuid to dump both default
and max featuresets.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/gen-cpuid: Rework internal logic to ease future changes
Andrew Cooper [Tue, 25 Feb 2020 12:59:35 +0000 (12:59 +0000)]
x86/gen-cpuid: Rework internal logic to ease future changes

Better split the logic between parse/calculate/write.  Collect the feature
comments by their comment character(s), and perform the accumulation
operations in crunch_numbers().

Avoid rendering the featuresets to C uint32_t's in crunch_numbers(), and
instead do this in write_results().  Update format_uint32s() to call
featureset_to_uint32s() internally.

No functional change - the generated cpuid-autogen.h is identical.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago tools/libxc: Simplify xc_get_static_cpu_featuremask()
Andrew Cooper [Wed, 26 Feb 2020 18:15:35 +0000 (18:15 +0000)]
tools/libxc: Simplify xc_get_static_cpu_featuremask()

Drop XC_FEATUREMASK_DEEP_FEATURES.  It isn't used by any callers, and unlike
the other static masks, won't be of interest to anyone without other pieces of
cpuid-autogen.h

In xc_get_static_cpu_featuremask(), use a 2d array instead of individually
named variables, and drop the switch statement completely.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/sysctl: Don't return cpu policy data for compiled-out support (2)
Andrew Cooper [Wed, 26 Feb 2020 15:28:27 +0000 (15:28 +0000)]
x86/sysctl: Don't return cpu policy data for compiled-out support (2)

Just as with c/s 96dc77b4b1 for XEN_SYSCTL_get_cpu_policy,
XEN_SYSCTL_get_cpu_featureset wants to become conditional.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago build: fix section-renaming of libfdt and libelf
Anthony PERARD [Thu, 27 Feb 2020 14:47:23 +0000 (15:47 +0100)]
build: fix section-renaming of libfdt and libelf

In common/libelf/Makefile, SPECIAL_DATA_SECTIONS doesn't exist yet when
SECTIONS gets defined, so only the "text data" sections get renamed. This
was different before 48115d14743e ("Move more kernel decompression bits
to .init.* sections"). By introducing the same renaming mechanism to
libfdt (9ba1f198f61e ["xen/libfdt: Put all libfdt in init"]), the issue
was extended to there as well.

Move SPECIAL_DATA_SECTIONS in Rules.mk before including "Makefile" to
fix this.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago build: allow to test clang .include without asm symlink
Anthony PERARD [Thu, 27 Feb 2020 14:46:14 +0000 (15:46 +0100)]
build: allow to test clang .include without asm symlink

The clang test for "asm()-s support .include." needs to be modified
because the symbolic link asm -> asm-x86 may not exist when the test
is run. Since it's an x86-specific test, we don't need the link.

This will be an issue with the following patch "xen/build: have the
root Makefile generates the CFLAGS".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago libxl/PCI: align reserved device memory boundary for HAP guests
Jan Beulich [Thu, 27 Feb 2020 14:45:31 +0000 (15:45 +0100)]
libxl/PCI: align reserved device memory boundary for HAP guests

As the code comment says, this will allow use of a 2Mb super page
mapping at the end of "low" memory.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago libxl/PCI: pass correct "hotplug" argument to libxl__device_pci_setdefault()
Jan Beulich [Thu, 27 Feb 2020 14:45:05 +0000 (15:45 +0100)]
libxl/PCI: pass correct "hotplug" argument to libxl__device_pci_setdefault()

Uniformly passing "false" can't be right, but has been benign because of
the function not using this parameter.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago libxl/PCI: make "rdm=" parsing comply with documentation
Jan Beulich [Thu, 27 Feb 2020 14:44:41 +0000 (15:44 +0100)]
libxl/PCI: make "rdm=" parsing comply with documentation

Documentation says "<RDM_RESERVATION_STRING> is a comma separated list
of <KEY=VALUE> settings, from the following list". There's no mention
of a specific order, yet so far the parsing logic accepted only
strategy first, then policy (with neither of the two allowed to be
omitted). Make "state" move
- back to STATE_TYPE when finding a comma after having parsed the
  <VALUE> part of a setting,
- to STATE_TERMINAL otherwise.
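
With the order no longer fixed, both of the following example settings are
accepted (illustrative guest config snippets):

    rdm = "strategy=host,policy=relaxed"
    rdm = "policy=strict,strategy=host"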

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago libxl/PCI: establish per-device reserved memory policy earlier
Jan Beulich [Thu, 27 Feb 2020 14:44:17 +0000 (15:44 +0100)]
libxl/PCI: establish per-device reserved memory policy earlier

Reserved device memory region processing as well as E820 table creation
happen earlier than the adding of (PCI) devices, yet they consume the
policy (e.g. to decide whether to add entries to the E820 table). But so
far it was only at the stage of PCI device addition that the final
policy was established (i.e. if not explicitly specified by the guest
config file).

Note that I couldn't find the domain ID to be available in
libxl__domain_device_construct_rdm(), but observing that
libxl__device_pci_setdefault() also doesn't use it, for the time being
DOMID_INVALID gets passed. An obvious alternative would be to drop the
unused parameter/argument, yet at that time the question would be
whether to also drop other unused ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago libxl/PCI: honor multiple per-device reserved memory regions
Jan Beulich [Thu, 27 Feb 2020 14:43:55 +0000 (15:43 +0100)]
libxl/PCI: honor multiple per-device reserved memory regions

While in "host" strategy all regions get processed, of the per-device
ones only the first entry has been consumed so far.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago libxl: add initializers for libxl__domid_history
Paul Durrant [Wed, 26 Feb 2020 13:12:13 +0000 (13:12 +0000)]
libxl: add initializers for libxl__domid_history

This patch fixes Coverity issue CID 1459006 (Insecure data handling
(INTEGER_OVERFLOW)).

The problem is that the error paths for libxl__mark_domid_recent() and
libxl__is_domid_recent() check the 'f' field in struct libxl__domid_history
when it may not have been initialized.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago domctl: fix typo in comment
Olaf Hering [Wed, 26 Feb 2020 16:13:39 +0000 (17:13 +0100)]
domctl: fix typo in comment

Add missing 'a' to sharing.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
5 years ago build: remove use of AFLAGS-y
Anthony PERARD [Wed, 26 Feb 2020 16:41:53 +0000 (17:41 +0100)]
build: remove use of AFLAGS-y

And simply add directly to AFLAGS.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago build: remove confusing comment on the %.s:%.S rule
Anthony PERARD [Wed, 26 Feb 2020 16:41:37 +0000 (17:41 +0100)]
build: remove confusing comment on the %.s:%.S rule

That comment was introduced by 3943db776371 ("[XEN] Can be built
-std=gnu99 (except for .S files).") to explain why CFLAGS was removed
from the command line. The comment is already present where the
-std=gnu flag gets removed from AFLAGS, so there is no need to repeat it.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago Makefile: fix install-tests
Anthony PERARD [Wed, 26 Feb 2020 16:41:02 +0000 (17:41 +0100)]
Makefile: fix install-tests

The top-level Makefile makes use of internal implementation details of
the xen build system. Avoid that by creating a new target
"install-tests" in xen/Makefile, and by fixing the top-level Makefile
to not call xen/Rules.mk anymore.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/include: remove include of Config.mk
Anthony PERARD [Wed, 26 Feb 2020 16:40:06 +0000 (17:40 +0100)]
xen/include: remove include of Config.mk

It isn't necessary to include Config.mk here because this Makefile is
only used by xen/Rules.mk which already includes Config.mk.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/smp: do not use scratch_cpumask when in interrupt or exception context
Roger Pau Monné [Wed, 26 Feb 2020 16:38:58 +0000 (17:38 +0100)]
x86/smp: do not use scratch_cpumask when in interrupt or exception context

Using scratch_cpumask in send_IPI_mask is not safe in IRQ or exception
context because such contexts can nest, and hence send_IPI_mask could be
overwriting another user's scratch cpumask data when used in such
contexts.

Fall back to not using the scratch cpumask (and hence not attempting to
optimize IPI sending by using a shorthand) when in IRQ or exception
context. Note that the scratch cpumask cannot be used when
non-maskable interrupts are being serviced (NMI or #MC), and hence
fall back to not using the shorthand in that case, like was done
previously.

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86: track when in #MC context
Roger Pau Monné [Wed, 26 Feb 2020 16:38:11 +0000 (17:38 +0100)]
x86: track when in #MC context

Add helpers to track when executing in #MC handler context. This is
modeled after the in_irq helpers.

Note that there are no users of in_mce_handler() introduced by the
change, further users will be added by followup changes.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86: track when in NMI context
Roger Pau Monné [Wed, 26 Feb 2020 16:37:22 +0000 (17:37 +0100)]
x86: track when in NMI context

Add helpers to track when running in NMI handler context. This is
modeled after the in_irq helpers.

The SDM states that no NMI can be delivered while handling an NMI
until the processor has executed an iret instruction. It's possible
however that another fault is received while handling the NMI (a #MC
for example), and thus the iret from that fault would allow further
NMIs to be injected while still processing the previous one, and
hence an integer is needed in order to keep track of in-service NMIs.
The added macros only track when the execution context is in the NMI
handler, but that doesn't mean NMIs are blocked, for the reasons listed
above.

Note that there are no users of in_nmi_handler() introduced by the
change, further users will be added by followup changes.
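
A hedged sketch of what such helpers could look like, modeled on the in_irq()
pattern (the per-CPU field and the enter/exit names are illustrative
assumptions):

    DECLARE_PER_CPU(unsigned int, nmi_nesting);

    #define nmi_handler_enter()  (this_cpu(nmi_nesting)++)
    #define nmi_handler_exit()   (this_cpu(nmi_nesting)--)
    #define in_nmi_handler()     (this_cpu(nmi_nesting) != 0)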

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86: introduce a nmi_count tracking variable
Roger Pau Monné [Wed, 26 Feb 2020 16:36:30 +0000 (17:36 +0100)]
x86: introduce a nmi_count tracking variable

This is modeled after the irq_count variable, and is used to account
for all the NMIs handled by the system.

This will allow the nmi_count() helper to be repurposed so it can be used
in a similar manner to local_irq_count(): to account for the NMIs
currently in service.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/vPMU: don't blindly assume IA32_PERF_CAPABILITIES MSR exists
Jan Beulich [Wed, 26 Feb 2020 16:35:48 +0000 (17:35 +0100)]
x86/vPMU: don't blindly assume IA32_PERF_CAPABILITIES MSR exists

Just like in VMX's lbr_tsx_fixup_check(), the respective CPUID bit should
be consulted first.

Reported-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/mm: drop p2mt parameter from map_domain_gfn()
Jan Beulich [Wed, 26 Feb 2020 16:35:07 +0000 (17:35 +0100)]
x86/mm: drop p2mt parameter from map_domain_gfn()

No caller actually consumes it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago SVM: drop asm/hvm/emulate.h inclusion from vmcb.h
Jan Beulich [Wed, 26 Feb 2020 16:33:57 +0000 (17:33 +0100)]
SVM: drop asm/hvm/emulate.h inclusion from vmcb.h

It's not needed there and introduces a needless, almost global
dependency. Include the file (or in some cases just xen/err.h) where
actually needed, or - in one case - simply forward-declare a struct. In
microcode*.c take the opportunity and also re-order a few other
#include-s.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years ago x86/sysctl: Don't return cpu policy data for compiled-out support
Andrew Cooper [Tue, 25 Feb 2020 16:57:03 +0000 (16:57 +0000)]
x86/sysctl: Don't return cpu policy data for compiled-out support

Policy objects aren't tiny, and the derivation logic isn't trivial.  We are
about to increase the number of policy objects, so will have the opportunity
to drop logic and storage space based on CONFIG_{PV,HVM}.

Start by causing XEN_SYSCTL_get_cpu_policy to fail with -EOPNOTSUPP when
requesting data for a compiled-out subsystem.  Update xen-cpuid to cope and
continue on to further system policies, seeing as the indices are interleaved.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/gen-cpuid: Fix Py2/3 compatibility
Andrew Cooper [Tue, 25 Feb 2020 15:43:55 +0000 (15:43 +0000)]
x86/gen-cpuid: Fix Py2/3 compatibility

There is a fencepost error on the sys.version_info check which will break on
Python 3.0.  Reverse the logic to make py2 compatible with py3 (rather than
py3 compatible with py2) which will be more natural to follow as py2 usage
reduces.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago ns16550: Re-order the serial port address checking
Wei Xu [Wed, 26 Feb 2020 09:56:23 +0000 (10:56 +0100)]
ns16550: Re-order the serial port address checking

The serial port address space ID qualifies the address. Whether a value
of zero for the serial port address can sensibly mean "disabled" depends
on the address space ID. Hence check the address space ID before
checking the address.

Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago smp: convert cpu_hotplug_begin into a blocking lock acquisition
Roger Pau Monné [Wed, 26 Feb 2020 09:55:22 +0000 (10:55 +0100)]
smp: convert cpu_hotplug_begin into a blocking lock acquisition

Don't allow cpu_hotplug_begin to fail by converting the trylock into a
blocking lock acquisition. Write users of the cpu_add_remove_lock are
limited to CPU plug/unplug operations, and cannot deadlock between
themselves or other users taking the lock in read mode as
cpu_add_remove_lock is always locked with interrupts enabled. There
are also no other locks taken during the plug/unplug operations.

The exclusive lock usage in register_cpu_notifier is also converted
into a blocking lock acquisition, as it was previously not allowed to
fail anyway.

This is meaningful when running Xen in shim mode, since VCPU_{up/down}
hypercalls use cpu hotplug/unplug operations in the background, and
hence failing to take the lock results in VCPU_{up/down} failing with
-EBUSY, which most users are not prepared to handle.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago rwlock: allow recursive read locking when already locked in write mode
Roger Pau Monné [Wed, 26 Feb 2020 09:53:03 +0000 (10:53 +0100)]
rwlock: allow recursive read locking when already locked in write mode

Allow a CPU already holding the lock in write mode to also lock it in
read mode. There's no harm in allowing read locking a rwlock that's
already owned by the caller (ie: CPU) in write mode. Allowing such
accesses is required at least for the CPU maps use-case.

In order to do this, reserve 12 bits of the lock; this allows supporting
up to 4096 CPUs. Also reduce the write lock mask to 2 bits: one to
signal there are pending writers waiting on the lock and the other to
signal the lock is owned in write mode.

This reduces the maximum number of concurrent readers from 16777216 to
262144; I think this should still be enough, or else the lock field
can be expanded from 32 to 64 bits if all architectures support atomic
operations on 64-bit integers.
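
As a quick sanity check of those numbers: 12 bits for the writer's CPU plus
2 write-state bits leaves 32 - 12 - 2 = 18 bits for the reader count, and
2^18 = 262144 readers, down from the previous 2^24 = 16777216, while 2^12
covers the 4096 supported CPUs.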

Fixes: 5872c83b42c608 ('smp: convert the cpu maps lock into a rw lock')
Reported-by: Jan Beulich <jbeulich@suse.com>
Reported-by: Jürgen Groß <jgross@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago atomic: add atomic_and operations
Roger Pau Monné [Wed, 26 Feb 2020 09:51:31 +0000 (10:51 +0100)]
atomic: add atomic_and operations

To x86 and Arm. This performs an atomic AND operation against an
atomic_t variable with the provided mask.
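
For reference, a minimal sketch of what the x86 flavour of such a helper
could look like (illustrative only, not necessarily the exact code in the
patch):

    static inline void atomic_and(int m, atomic_t *v)
    {
        /* Atomically clear the bits not set in the mask. */
        asm volatile ( "lock; andl %1, %0"
                       : "+m" (v->counter)
                       : "ir" (m) );
    }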

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien@xen.org>
5 years ago sched: rework credit2 run-queue allocation
Juergen Gross [Wed, 26 Feb 2020 09:50:26 +0000 (10:50 +0100)]
sched: rework credit2 run-queue allocation

Currently the memory for each run-queue of the credit2 scheduler is
allocated at the scheduler's init function: for each cpu in the system
a struct csched2_runqueue_data is being allocated, even if the
current scheduler only handles one physical cpu or is configured to
work with a single run-queue. As each struct contains 4 cpumasks this
sums up to rather large memory sizes pretty fast.

Rework the memory allocation for run-queues to be done only when
needed, i.e. when adding a physical cpu to the scheduler requiring a
new run-queue.

In fact this fixes a bug in credit2 related to run-queue handling:
cpu_to_runqueue() will return the first free or matching run-queue,
whichever is found first. So in case a cpu is removed from credit2
this could result in e.g. run-queue 0 becoming free, and when another
cpu is added it will in any case be assigned to that free run-queue,
even if a matching run-queue would have been found later.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years ago x86/pvh: drop v2 suffix from pvh.pandoc
Wei Liu [Tue, 25 Feb 2020 14:22:32 +0000 (14:22 +0000)]
x86/pvh: drop v2 suffix from pvh.pandoc

There is now only one version of PVH implementation in Xen. Drop "v2" to
avoid confusion.

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago libxl: fix build with older glibc
Paul Durrant [Tue, 25 Feb 2020 12:33:43 +0000 (12:33 +0000)]
libxl: fix build with older glibc

Commit 2b02882ebbbc "libxl: add infrastructure to track and query
'recent' domids" added a call to clock_gettime() into libxl. The man-
page for this states:

"Link with -lrt (only for glibc versions before 2.17)."

Unfortunately CentOS 6 has a glibc prior to that version, and the
libxl Makefile was not updated to add '-lrt', so the build will fail in
that environment.

This patch simply adds '-lrt' to LIBXL_LIBS unconditionally, as it does
no harm in newer environments.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Fixes: 2b02882ebbbc ("libxl: add infrastructure to track and query 'recent' domids")
Acked-by: Wei Liu <wl@xen.org>
5 years ago x86/dom0_build: PVH ABI is now in pvh.pandoc
Wei Liu [Sun, 23 Feb 2020 21:13:30 +0000 (21:13 +0000)]
x86/dom0_build: PVH ABI is now in pvh.pandoc

Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xl: allow domid to be preserved on save/restore or migrate
Paul Durrant [Wed, 8 Jan 2020 15:40:55 +0000 (15:40 +0000)]
xl: allow domid to be preserved on save/restore or migrate

This patch adds a '-D' command line option to save and migrate to allow
the domain id to be incorporated into the saved domain configuration and
hence be preserved.

NOTE: Logically it may seem as though preservation of domid should be
      dealt with by libxl, but the libxl migration stream has no record
      in which to transfer domid and remote domain creation occurs before
      the migration stream is parsed. Hence this patch modifies xl rather
      than libxl.
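
Example usage (domain and file names are made up):

    xl save -D mydomain /var/lib/xen/save/mydomain.chk
    xl migrate -D mydomain dst-host.example.com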

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago xl.conf: introduce 'domid_policy'
Paul Durrant [Wed, 8 Jan 2020 12:32:14 +0000 (12:32 +0000)]
xl.conf: introduce 'domid_policy'

This patch adds a new global 'domid_policy' configuration option to decide
how domain id values are allocated for new domains. It may be set to one of
two values:

"xen", the default value, will cause an invalid domid value to be passed
to do_domain_create() preserving the existing behaviour of having Xen
choose the domid value during domain_create().

"random" will cause the special RANDOM_DOMID value to be passed to
do_domain_create() such that libxl__domain_make() will select a random
domid value.
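
For example, to opt in to random domid allocation one would add the
following to /etc/xen/xl.conf (illustrative snippet):

    # Let libxl pick a random domain id for newly created domains.
    domid_policy = "random"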

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago libxl: allow creation of domains with a specified or random domid
Paul Durrant [Mon, 23 Dec 2019 17:16:20 +0000 (17:16 +0000)]
libxl: allow creation of domains with a specified or random domid

This patch adds a 'domid' field to libxl_domain_create_info and then
modifies libxl__domain_make() to have Xen use that value if it is valid.
If the domid value is invalid then Xen will choose the domid, as before,
unless the value is the new special RANDOM_DOMID value added to the API.
This value instructs libxl__domain_make() to choose a random domid value
for Xen to use.

If Xen determines that a domid specified to or chosen by
libxl__domain_make() coincides with an existing domain then the create
operation will fail. In this case, if RANDOM_DOMID was specified to
libxl__domain_make() then a new random value will be chosen and the create
operation will be re-tried, otherwise libxl__domain_make() will fail.

After Xen has successfully created a new domain, libxl__domain_make() will
check whether its domid matches any recently used domid values. If it does
then the domain will be destroyed. If the domid used in creation was
specified to libxl__domain_make() then it will fail at this point,
otherwise the create operation will be re-tried with either a new random
or Xen-selected domid value.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago public/xen.h: add a definition for a 'valid domid' mask
Paul Durrant [Wed, 19 Feb 2020 08:53:31 +0000 (08:53 +0000)]
public/xen.h: add a definition for a 'valid domid' mask

A subsequent patch will modify libxl to allow selection of a random domid
value when creating domains. Valid values are limited to a width of 15 bits,
so add an appropriate mask definition to the public header.

NOTE: It is reasonable for this mask definition to be in a Xen public header
      rather than in, say, a libxenctrl header since it relates to the
      validity of a value passed to XEN_DOMCTL_createdomain. This new
      definition is placed in xen.h rather than domctl.h only to co-locate
      it with other domid-related definitions.
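
For illustration, a 15-bit mask of this kind would look as follows (see the
patch/xen.h for the actual name and placement):

    /* Valid domids occupy the low 15 bits of a domid_t. */
    #define DOMID_MASK  0x7fffU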

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Julien Grall <julien@xen.org>
5 years ago libxl: add infrastructure to track and query 'recent' domids
Paul Durrant [Tue, 7 Jan 2020 13:46:45 +0000 (13:46 +0000)]
libxl: add infrastructure to track and query 'recent' domids

A domid is considered recent if the domain it represents was destroyed
less than a specified number of seconds ago. For debugging and/or testing
purposes the number can be set using the environment variable
LIBXL_DOMID_REUSE_TIMEOUT. If the variable does not exist then a default
value of 60s is used.

Whenever a domain is destroyed, a time-stamped record will be written into
a history file (/var/run/xen/domid-history). To avoid the history file
growing too large, any records with time-stamps that indicate that the
age of a domid has exceeded the re-use timeout will also be purged.

A new utility function, libxl__is_recent_domid(), has been added. This
function reads the same history file checking whether a specified domid
has a record that does not exceed the re-use timeout. Since this utility
function does not write to the file, no records are actually purged by it.

NOTE: The history file is purged on boot so it is safe to use
      CLOCK_MONOTONIC as a time source.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago x86/msr: Drop {pv,hvm}_max_vcpu_msrs objects
Andrew Cooper [Mon, 24 Feb 2020 13:52:24 +0000 (13:52 +0000)]
x86/msr: Drop {pv,hvm}_max_vcpu_msrs objects

It turns out that these are unused, and we dup a type-dependent block of
zeros.  Use xzalloc() instead.

Read/write MSRs typically default to 0, and non-zero defaults would need
dealing with at suitable INIT/RESET points (e.g. arch_vcpu_regs_init).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago x86/msr: Start cleaning up msr-index.h
Andrew Cooper [Fri, 25 May 2018 15:12:05 +0000 (16:12 +0100)]
x86/msr: Start cleaning up msr-index.h

Make a start on cleaning up the constants in msr-index.h.

No functional change - only formatting changes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago ns16550: add ACPI support for ARM only
Wei Xu [Fri, 21 Feb 2020 16:20:22 +0000 (17:20 +0100)]
ns16550: add ACPI support for ARM only

Parse the ACPI SPCR table and initialize the 16550-compatible serial port
for ARM only. Currently we only support one UART on ARM. Some fields
which we do not care about yet on ARM are ignored.

Signed-off-by: Wei Xu <xuwei5@hisilicon.com>
Acked-by: Julien Grall <julien@xen.org>
5 years ago x86/p2m: drop p2m_access_t parameter from set_mmio_p2m_entry()
Jan Beulich [Fri, 21 Feb 2020 16:19:16 +0000 (17:19 +0100)]
x86/p2m: drop p2m_access_t parameter from set_mmio_p2m_entry()

Both callers request the host P2M's default access, which can as well be
done inside the function. While touching this anyway, make the "gfn"
parameter type-safe as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/p2m: p2m_flags_to_type() deals only with "unsigned int"
Jan Beulich [Fri, 21 Feb 2020 16:16:25 +0000 (17:16 +0100)]
x86/p2m: p2m_flags_to_type() deals only with "unsigned int"

PTE flags, for now at least, get stored in "unsigned int". Hence there's
no need to widen the values to "unsigned long" before processing them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/p2m: adjust non-PoD accounting in p2m_pod_decrease_reservation()
Jan Beulich [Fri, 21 Feb 2020 16:15:22 +0000 (17:15 +0100)]
x86/p2m: adjust non-PoD accounting in p2m_pod_decrease_reservation()

Throughout the function the equation

pod + nonpod == (1UL << order)

should hold. This has been violated by the final loop of the function:
* changing a range from a type other than p2m_populate_on_demand to
  p2m_invalid doesn't alter the amount of non-PoD pages in the region,
* changing a range from p2m_populate_on_demand to p2m_invalid does
  increase the amount of non-PoD pages in the region along with
  decreasing the amount of PoD pages there.
Fortunately the variable isn't used anymore after the loop. Instead of
correcting the updating of the "nonpod" variable, however, drop it
altogether, to avoid getting the above equation to not hold again by a
future change.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/p2m: fix PoD accounting in guest_physmap_add_entry()
Jan Beulich [Fri, 21 Feb 2020 16:09:28 +0000 (17:09 +0100)]
x86/p2m: fix PoD accounting in guest_physmap_add_entry()

The initial observation was that the mfn_valid() check comes too late:
Neither mfn_add() nor mfn_to_page() (let alone de-referencing the
result of the latter) are valid for MFNs failing this check. Move it up
and - noticing that there's no caller doing so - also add an assertion
that this should never produce "false" here.

In turn this would have meant that the "else" to that if() could now go
away, which didn't seem right at all. And indeed, considering callers
like memory_exchange() or various grant table functions, the PoD
accounting should have been outside of that if() from the very
beginning.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen/public: Obsolete HVM_PARAM_PAE_ENABLED
Andrew Cooper [Wed, 5 Feb 2020 14:33:00 +0000 (14:33 +0000)]
xen/public: Obsolete HVM_PARAM_PAE_ENABLED

Xen has never acted upon the value of HVM_PARAM_PAE_ENABLED, contrary perhaps
to expectations based on how other boolean fields work.

It was only ever used as a non-standard calling convention for
xc_cpuid_apply_policy() but that has been fixed now.

Purge its use, and any possible confusion over its behaviour, by having Xen
reject any attempts to use it.  Forgo setting it up in libxl's
hvm_set_conf_params().  The only backwards compatibility necessary is to have
the HVM restore stream discard it if found.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago xen/hvm: Fix handling of obsolete HVM_PARAMs
Andrew Cooper [Thu, 6 Feb 2020 12:40:50 +0000 (12:40 +0000)]
xen/hvm: Fix handling of obsolete HVM_PARAMs

The local xc_hvm_param_deprecated_check() in libxc tries to guess Xen's
behaviour for the MEMORY_EVENT params, but is wrong for the get side, where
Xen would return 0 (which is also a bug).  Delete the helper.

In Xen, perform the checks in hvm_allow_set_param(), rather than
hvm_set_param(), and actually implement checks on the get side so the
hypercall doesn't return successfully with 0 as an answer.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago x86/splitlock: CPUID and MSR details
Andrew Cooper [Mon, 23 Dec 2019 14:10:29 +0000 (14:10 +0000)]
x86/splitlock: CPUID and MSR details

A splitlock is an atomic operation which crosses a cache line boundary.  It
serialises operations in the cache coherency fabric and comes with a
multi-thousand cycle stall.

Intel Tremont CPUs introduce MSR_CORE_CAPS to enumerate various core-specific
features, and MSR_TEST_CTRL to adjust the behaviour in the case of a
splitlock.

Virtualising this for guests is distinctly tricky owing to the fact that
MSR_TEST_CTRL has core rather than thread scope.  In the meantime however,
prevent the MSR values leaking into guests.
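
For reference, the architectural MSR indices involved are shown below (the
constant names here are illustrative; the values are as published by Intel):

    #define MSR_TEST_CTRL                0x00000033
    #define  TEST_CTRL_SPLITLOCK_DETECT  (_AC(1, ULL) << 29)
    #define MSR_INTEL_CORE_CAPS          0x000000cf
    #define  CORE_CAPS_SPLITLOCK_DETECT  (_AC(1, ULL) <<  5)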

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
5 years ago x86/p2m: Allow p2m_get_page_from_gfn to return shared entries
Tamas K Lengyel [Mon, 10 Feb 2020 19:21:25 +0000 (11:21 -0800)]
x86/p2m: Allow p2m_get_page_from_gfn to return shared entries

The owner domain of shared pages is dom_cow; use that for get_page,
otherwise the function fails to return the correct page in some
situations. The check of whether dom_cow should be used was only performed
in a subset of use-cases. Fix the error and simplify the existing check,
since we can't have any shared entries with dom_cow being NULL.
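
A hedged sketch of the resulting owner selection (simplified):

    /* Shared pages are owned by dom_cow rather than the guest itself. */
    struct domain *fdom = p2m_is_shared(*t) ? dom_cow : d;

    if ( page && !get_page(page, fdom) )
        page = NULL;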

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen/arm: Restrict access to most HVM_PARAM's
Andrew Cooper [Wed, 5 Sep 2018 13:38:42 +0000 (14:38 +0100)]
xen/arm: Restrict access to most HVM_PARAM's

ARM currently has no restrictions on toolstack and guest access to the entire
HVM_PARAM block.  As the monitor feature isn't under security support, this
doesn't need an XSA.

The CALLBACK_IRQ and {STORE,CONSOLE}_{PFN,EVTCHN} details are only exposed
read-only to the guest, while MONITOR_RING_PFN is restricted to only toolstack
access.  No other parameters are used.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>
5 years ago MAINTAINERS: Step back to designated reviewer for mm/
George Dunlap [Thu, 20 Feb 2020 18:09:17 +0000 (18:09 +0000)]
MAINTAINERS: Step back to designated reviewer for mm/

With having to take over Lars' role as community manager, I don't have
the necessary time to review the mm/ subsystem.  Step back to being only
a designated reviewer, reverting maintainership to the x86 maintainers.

While here, fix my e-mail address in other places.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago libxl: modify libxl__logv() to only log valid domid values
Paul Durrant [Fri, 21 Feb 2020 11:20:45 +0000 (11:20 +0000)]
libxl: modify libxl__logv() to only log valid domid values

Some code-paths use values other than INVALID_DOMID to indicate an invalid
domain id. Specifically, xl will pass a value of 0 when creating/restoring
a domain. Therefore modify libxl__logv() to use libxl_domid_valid_guest()
as a validity test.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago xen: Move async_exception_* infrastructure into x86
Andrew Cooper [Thu, 13 Feb 2020 12:58:35 +0000 (12:58 +0000)]
xen: Move async_exception_* infrastructure into x86

The async_exception_{state,mask} infrastructure is implemented in common code,
but is limited to x86 because of the VCPU_TRAP_LAST ifdef-ary.

The internals are very x86 specific (and even then, in need of correction),
and won't be of interest to other architectures.  Move it all into x86
specific code.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/nmi: Corrections and improvements to do_nmi_stats()
Andrew Cooper [Thu, 13 Feb 2020 14:06:50 +0000 (14:06 +0000)]
x86/nmi: Corrections and improvements to do_nmi_stats()

The hardware domain doesn't necessarily have the domid 0.  Render v instead,
adjusting the strings to avoid printing trailing whitespace.

Rename i to cpu, and use separate booleans for pending/masked.  Drop the
unnecessary domain local variable.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/msr: Virtualise MSR_PLATFORM_ID properly
Andrew Cooper [Tue, 30 Apr 2019 11:07:04 +0000 (12:07 +0100)]
x86/msr: Virtualise MSR_PLATFORM_ID properly

This is an Intel-only, read-only MSR related to microcode loading.  Expose it
in similar circumstances as the PATCHLEVEL MSR.

This should have been done alongside c/s 013896cb8b2 "x86/msr: Fix handling
of MSR_AMD_PATCHLEVEL/MSR_IA32_UCODE_REV".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoMAINTAINERS: make Roger VPCI maintainer
Wei Liu [Thu, 20 Feb 2020 15:58:43 +0000 (15:58 +0000)]
MAINTAINERS: make Roger VPCI maintainer

Roger has kindly agreed to take on the burden.

Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86: introduce a new set of APIs to manage Xen page tables
Wei Liu [Tue, 28 Jan 2020 13:50:05 +0000 (13:50 +0000)]
x86: introduce a new set of APIs to manage Xen page tables

We are going to switch to using domheap pages for page tables.  A new set of
APIs is introduced to allocate and free pages of page tables based on mfn
instead of the xenheap direct map address.  The allocation and deallocation
work on mfn_t rather than page_info, because they are required to work even
before the frame table is set up.

Implement the old functions in terms of the new ones.  We will rewrite, site
by site, the other mm functions that manipulate page tables to use the new
APIs.

After the allocation, one needs to map and unmap via map_domain_page to
access the PTEs.  This does not break Xen halfway, since the new APIs still
use xenheap pages underneath, and map_domain_page will just use the
directmap for the mappings.  They will be switched to use domheap pages and
dynamic mappings once usage of the old APIs is eliminated.
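
A sketch of how the new interface is expected to be used (the function name
alloc_xen_pagetable_new() is assumed from this series; treat the snippet as
illustrative rather than the final code):

    /* Allocate a page-table page by MFN, map it to initialise the entries,
     * then drop the mapping again. */
    mfn_t mfn = alloc_xen_pagetable_new();

    if ( !mfn_eq(mfn, INVALID_MFN) )
    {
        l1_pgentry_t *pl1e = map_domain_page(mfn);

        clear_page(pl1e);
        /* ... fill in PTEs ... */
        unmap_domain_page(pl1e);
    }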

No functional change intended in this patch.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agotools/xentop: Cleanup some trailing whitespace
Sander Eikelenboom [Wed, 19 Feb 2020 20:31:32 +0000 (21:31 +0100)]
tools/xentop: Cleanup some trailing whitespace

Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Wei Liu <wl@xen.org>
5 years agotools/xentop: Remove dead code
Sander Eikelenboom [Wed, 19 Feb 2020 20:31:31 +0000 (21:31 +0100)]
tools/xentop: Remove dead code

The freeable_mb variable was made to always be zero when tmem support was
purged from the tools.  We can in fact just delete it and the code
associated with it.

Fixes: c588c002cc1 ("tools: remove tmem code and commands")
Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Wei Liu <wl@xen.org>
5 years agotools/xentop: Fix calculation of used memory
Sander Eikelenboom [Wed, 19 Feb 2020 20:31:30 +0000 (21:31 +0100)]
tools/xentop: Fix calculation of used memory

Used memory should be calculated by subtracting free memory from total
memory.
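
The fix itself boils down to a single expression (field names are
illustrative, not the exact xentop structures):

    /* Used memory is whatever part of the total is not currently free. */
    used_mem_kb = total_mem_kb - free_mem_kb;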

Fixes: c588c002cc1 ("tools: remove tmem code and commands")
Signed-off-by: Sander Eikelenboom <linux@eikelenboom.it>
Acked-by: Wei Liu <wl@xen.org>
5 years agosched: don't disable interrupts all the time when dumping run-queues
Juergen Gross [Thu, 20 Feb 2020 10:38:31 +0000 (11:38 +0100)]
sched: don't disable interrupts all the time when dumping run-queues

Having interrupts disabled all the time when running dump_runq() is not
necessary.  All the called functions do proper locking themselves and
disable interrupts where needed.
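
Schematically (the function and lock names here are placeholders, not the
actual scheduler code):

    unsigned long flags;

    /* Before: interrupts were kept off for the whole dump. */
    spin_lock_irqsave(&dump_lock, flags);
    dump_all_runqueues();
    spin_unlock_irqrestore(&dump_lock, flags);

    /* After: plain locking; the callees take their own IRQ-safe locks
     * where they actually need them. */
    spin_lock(&dump_lock);
    dump_all_runqueues();
    spin_unlock(&dump_lock);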

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoAMD/IOMMU: drop unused PCI-generic #define-s
Jan Beulich [Thu, 20 Feb 2020 10:38:00 +0000 (11:38 +0100)]
AMD/IOMMU: drop unused PCI-generic #define-s

Quite possibly they had been in use when some of the PCI interfacing was
done in an ad hoc way rather than using the PCI functions we have. Right
now these have no users (left).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: "spec-ctrl=no-xen" should also disable branch hardening
Jan Beulich [Thu, 20 Feb 2020 10:37:01 +0000 (11:37 +0100)]
x86: "spec-ctrl=no-xen" should also disable branch hardening

This is controlling Xen behavior alone, after all.

Reported-by: Jin Nan Wang <jnwang@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agosched: add some diagnostic info in the run queue keyhandler
Juergen Gross [Thu, 20 Feb 2020 10:36:16 +0000 (11:36 +0100)]
sched: add some diagnostic info in the run queue keyhandler

When dumping the run queue information add some more data regarding
current and (if known) previous vcpu for each physical cpu.

With core scheduling activated the printed data will be e.g.:

(XEN) CPUs info:
(XEN) CPU[00] current=d[IDLE]v0, curr=d[IDLE]v0, prev=NULL
(XEN) CPU[01] current=d[IDLE]v1
(XEN) CPU[02] current=d[IDLE]v2, curr=d[IDLE]v2, prev=NULL
(XEN) CPU[03] current=d[IDLE]v3

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agosmp: convert the cpu maps lock into a rw lock
Roger Pau Monné [Wed, 19 Feb 2020 15:09:03 +0000 (16:09 +0100)]
smp: convert the cpu maps lock into a rw lock

Most users of the cpu maps just care about the maps not changing while
the lock is being held, but don't actually modify the maps.

Convert the lock into a rw lock, and take the lock in read mode in
get_cpu_maps and in write mode in cpu_hotplug_begin. This will lower
the contention around the lock, since plug and unplug operations that
take the lock in write mode are not that common.

Note that the read lock can be taken recursively (as it's a shared
lock), and hence will keep the same behavior as the previously used
recursive lock. As for the write lock, it's only used by CPU
plug/unplug operations, and the lock is never taken recursively in
that case.

While there, also change the return type of get_cpu_maps to bool.
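
A simplified sketch of the locking scheme (the lock name and function bodies
are approximations, not the exact Xen implementation):

    static DEFINE_RWLOCK(cpu_maps_lock);

    /* Readers only need the maps to stay stable, so many may hold the
     * lock at once, and read locks may nest. */
    bool get_cpu_maps(void)
    {
        return read_trylock(&cpu_maps_lock);
    }

    void put_cpu_maps(void)
    {
        read_unlock(&cpu_maps_lock);
    }

    /* CPU plug/unplug modifies the maps, so it takes the write side. */
    void cpu_hotplug_begin(void)
    {
        write_lock(&cpu_maps_lock);
    }

    void cpu_hotplug_done(void)
    {
        write_unlock(&cpu_maps_lock);
    }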

Reported-by: Julien Grall <julien@xen.org>
Suggested-also-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agosched: fix get_cpu_idle_time() with core scheduling
Juergen Gross [Wed, 19 Feb 2020 15:08:11 +0000 (16:08 +0100)]
sched: fix get_cpu_idle_time() with core scheduling

get_cpu_idle_time() calls vcpu_runstate_get() for an idle vcpu.  With core
scheduling active this is fragile: idle vcpus are temporarily assigned to
other scheduling units, and in some cases that assignment is changed without
the scheduling lock being held.  As vcpu_runstate_get() uses v->sched_unit
as the parameter for unit_schedule_[un]lock_irq(), the ASSERT() in the
unlock path triggers if v->sched_unit has changed in the meantime.

Fix that by using a local unit variable holding the correct unit.
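
In outline the fix looks like this (condensed from the description above,
not the literal diff):

    /* Snapshot the unit once, and use the same pointer for both lock and
     * unlock, so a concurrent re-assignment of v->sched_unit can no longer
     * trip the ASSERT() in the unlock path. */
    struct sched_unit *unit = v->sched_unit;
    spinlock_t *lock = unit_schedule_lock_irq(unit);

    /* ... read the runstate information ... */

    unit_schedule_unlock_irq(lock, unit);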

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agosysctl: use xmalloc_array() for XEN_SYSCTL_page_offline_op
Jan Beulich [Tue, 18 Feb 2020 16:52:10 +0000 (17:52 +0100)]
sysctl: use xmalloc_array() for XEN_SYSCTL_page_offline_op

This is more robust than the raw xmalloc_bytes().

Also add a sanity check on the input page range, to avoid returning
the less applicable -ENOMEM in such cases (and trying the allocation in
the first place).
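
Roughly (the sysctl field names are assumptions based on the hypercall's
public interface):

    uint32_t *status;

    /* Reject a nonsensical range up front, rather than failing the
     * allocation with the less applicable -ENOMEM. */
    if ( op->u.page_offline.end < op->u.page_offline.start )
        return -EINVAL;

    /* Typed allocation: the element size and overflow handling come from
     * xmalloc_array() instead of a hand-rolled byte count. */
    status = xmalloc_array(uint32_t,
                           op->u.page_offline.end -
                           op->u.page_offline.start + 1);
    if ( !status )
        return -ENOMEM;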

Signed-off-by: Jan Beulich <jbeulich@suse.com>
5 years agonvmx: always trap accesses to x2APIC MSRs
Roger Pau Monne [Wed, 19 Feb 2020 10:22:56 +0000 (11:22 +0100)]
nvmx: always trap accesses to x2APIC MSRs

Nested VMX doesn't expose support for
SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE,
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT, and hence the x2APIC MSRs should
always be trapped in the nested guest MSR bitmap, or else a nested
guest could access the hardware x2APIC MSRs given certain conditions.

Accessing the hardware MSRs could be achieved by forcing the L0 Xen to
use SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE and
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT (if supported), and then creating a
L2 guest with a MSR bitmap that doesn't trap accesses to the x2APIC
MSR range. Then OR'ing both L0 and L1 MSR bitmaps would result in a
bitmap that doesn't trap certain x2APIC MSRs and a VMCS that doesn't
have SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE and
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY or
SECONDARY_EXEC_APIC_REGISTER_VIRT set either.

Fix this by making sure x2APIC MSRs are always trapped in the nested
MSR bitmap.
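
The fix amounts to forcing the intercept bits for the whole x2APIC range
when the nested MSR bitmap is built, along these lines (the constants,
structure and field names are illustrative):

    unsigned int msr;

    /* x2APIC MSRs live at 0x800-0x8ff, i.e. in the "low" half of the VMX
     * MSR bitmap.  Always intercept both reads and writes there,
     * regardless of what the L1 bitmap asks for. */
    for ( msr = MSR_X2APIC_FIRST; msr <= MSR_X2APIC_LAST; msr++ )
    {
        set_bit(msr, merged_bitmap->read_low);
        set_bit(msr, merged_bitmap->write_low);
    }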

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>