xenbits.xensource.com Git - xen.git/log
8 years ago xen: fix a (latent) cpupool-related race during domain destroy
Dario Faggioli [Wed, 3 Aug 2016 12:31:49 +0000 (13:31 +0100)]
xen: fix a (latent) cpupool-related race during domain destroy

So, during domain destruction, we do:
 cpupool_rm_domain()    [ in domain_destroy() ]
 sched_destroy_domain() [ in complete_domain_destroy() ]

Therefore, there's a window during which, from the
scheduler's point of view, a domain still exists outside
of any cpupool.

In fact, cpupool_rm_domain() does d->cpupool=NULL,
and we don't allow that to hold true, for anything
but the idle domain (and there are, in fact, ASSERT()s
and BUG_ON()s to that effect).

Currently, we never really check d->cpupool during the
window, but that does not mean the race is not there.
For instance, Credit2 at some point (during load balancing)
iterates on the list of domains, and if we add logic that
needs checking d->cpupool, and any one of them had
cpupool_rm_domain() called on itself already... Boom!

(In fact, calling __vcpu_has_soft_affinity() from inside
balance_load() makes `xl shutdown <domid>' reliably
crash, and this is how I discovered this.)

On the other hand, cpupool_rm_domain() "only" does
cpupool related bookkeeping, and there's no harm
postponing it a little bit.

Also, considering that, during domain initialization,
we do:
 cpupool_add_domain()
 sched_init_domain()

It makes sense for the destruction path to look like
the opposite of it, i.e.:
 sched_destroy_domain()
 cpupool_rm_domain()

And hence that's what this patch does.

Actually, for better robustness, what we really do is
moving both cpupool_add_domain() and cpupool_rm_domain()
inside sched_init_domain() and sched_destroy_domain(),
respectively (and also add a couple of ASSERT()-s).
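
The ordering argued for above can be sketched with a toy model. This is
only an illustration: the structures and the *_sketch names are
hypothetical stand-ins, not the hypervisor's real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for Xen's structures; all names here are illustrative. */
struct cpupool { int n_dom; };
struct domain {
    struct cpupool *cpupool;   /* NULL only before init / after destroy */
    int sched_active;          /* 1 while the scheduler knows the domain */
};

/* Init: join the cpupool first, then set up scheduler state. */
static void sched_init_domain_sketch(struct domain *d, struct cpupool *c)
{
    assert(d->cpupool == NULL);
    d->cpupool = c;            /* the cpupool_add_domain() step */
    c->n_dom++;
    d->sched_active = 1;       /* scheduler-specific init */
}

/* Destroy: mirror image, so the scheduler never sees cpupool == NULL. */
static void sched_destroy_domain_sketch(struct domain *d)
{
    assert(d->cpupool != NULL);
    d->sched_active = 0;       /* scheduler-specific teardown */
    d->cpupool->n_dom--;       /* the cpupool_rm_domain() step */
    d->cpupool = NULL;
}

/* The state a load balancer walking the domain list must never observe. */
static int domain_visible_without_pool(const struct domain *d)
{
    return d->sched_active && d->cpupool == NULL;
}
```

With this ordering there is no point at which a domain is still active
in the scheduler while its cpupool pointer is NULL.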

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years ago xen: credit2: issues in csched2_cpu_pick(), when tracing is enabled.
Dario Faggioli [Wed, 27 Jul 2016 03:09:49 +0000 (05:09 +0200)]
xen: credit2: issues in csched2_cpu_pick(), when tracing is enabled.

In fact, when not finding a suitable runqueue where to
place a vCPU, and hence using a fallback, we either:
 - don't issue any trace record (while we should, at
   least, output the chosen pcpu),
 - risk underrunning when accessing the runqueues
   array, while preparing the trace record.

Fix both issues and, while there, also a couple of style
problems found nearby.

Spotted by Coverity.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years ago mwait-idle: add Denverton
Jacob Pan [Wed, 3 Aug 2016 12:41:13 +0000 (14:41 +0200)]
mwait-idle: add Denverton

Denverton is an Intel Atom based micro server which shares the same
Goldmont architecture as Broxton. The available C-states on
Denverton are a subset of Broxton's, with only C1, C1e, and C6.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit: 0080d65b7719fc58e60b5595fc61acded330004f]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years ago x86/time: introduce and use rdtsc_ordered()
Jan Beulich [Wed, 3 Aug 2016 12:40:44 +0000 (14:40 +0200)]
x86/time: introduce and use rdtsc_ordered()

Matching Linux commit 03b9730b76 ("x86/asm/tsc: Add rdtsc_ordered() and
use it in trivial call sites") and earlier ones it builds upon, let's
make sure timing loops don't have their rdtsc()-s re-ordered, as that
would harm precision of the result (values were observed to be several
hundred clocks off without this adjustment).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
8 years ago x86/time: adjust local system time initialization
Jan Beulich [Wed, 3 Aug 2016 12:39:31 +0000 (14:39 +0200)]
x86/time: adjust local system time initialization

Using the bare return value from read_platform_stime() is not suitable
when local_time_calibration() is going to use its fast path: Divergence
of several dozen microseconds between NOW() return values on different
CPUs results when platform and local time don't stay in close sync.

Latch local and platform time on the CPU initiating AP bringup, such
that the AP can use these values to seed its stime_local_stamp with as
little of an error as possible. The boot CPU, otoh, can simply
calculate the correct initial value (other CPUs could do so too with
even greater accuracy than the approach being introduced, but that can
work only if all CPUs' TSCs start ticking at the same time, which
generally can't be assumed to be the case on multi-socket systems).

This slightly defers init_percpu_time() (moved ahead by commit
dd2658f966 ["x86/time: initialise time earlier during
start_secondary()"]) in order to reduce as much as possible the gap
between populating the stamps and consuming them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Dario Faggioli <dario.faggioli@citrix.com>
Tested-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years ago libxl: use llabs() instead abs() for int64_t argument
Juergen Gross [Tue, 2 Aug 2016 17:25:42 +0000 (19:25 +0200)]
libxl: use llabs() instead abs() for int64_t argument

Commit 57f8b13c724023c78fa15a80452d1de3e51a1418 ("libxl: memory size
in kb requires 64 bit variable") introduced a bug: abs() shouldn't
be called with an int64_t argument. llabs() is to be used here.

Caught by clang build with error message:

libxl.c:4198:33: error: absolute value function 'abs' given an argument
of type
    'int64_t' (aka 'long') but has parameter of type 'int' which may cause
    truncation of value [-Werror,-Wabsolute-value]
    if (target_memkb < 0 && abs(target_memkb) > current_target_memkb)
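
A minimal sketch of the difference, assuming a typical platform where
int is 32 bits wide. The helper names are hypothetical and are not taken
from libxl.

```c
#include <stdint.h>
#include <stdlib.h>

/* Magnitude of a signed 64-bit KiB delta, without going through int. */
static int64_t memkb_magnitude(int64_t target_memkb)
{
    return llabs(target_memkb);
}

/* What abs() effectively sees: the value narrowed to int first. The
 * narrowing is implementation-defined; on common compilers it wraps,
 * so a delta such as -4294967301 would be seen as -5. */
static int memkb_magnitude_truncated(int64_t target_memkb)
{
    return abs((int)target_memkb);
}
```

Only the llabs() variant preserves deltas whose magnitude does not fit
in an int.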

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years ago x86/mm: Annotate gfn_get_* helpers as requiring non-NULL parameters
Andrew Cooper [Wed, 27 Jul 2016 17:34:39 +0000 (18:34 +0100)]
x86/mm: Annotate gfn_get_* helpers as requiring non-NULL parameters

Introduce and use the nonnull attribute to help the compiler catch NULL
parameters being passed to functions which require their parameters not to be
NULL.  Experimentally, GCC 4.9 on Debian Jessie only warns of non-NULL-ness
from immediate callers, so propagate the attributes out to all helpers.

A sample error looks like:

mem_sharing.c: In function ‘mem_sharing_nominate_page’:
mem_sharing.c:884:13: error: null argument where non-null required (argument 3) [-Werror=nonnull]
             amfn = get_gfn_type_access(ap2m, gfn, NULL, &ap2ma, 0, NULL);
             ^

As part of this, replace the get_gfn_type_access() macro with an equivalent
static inline function for extra type safety, and the ability to be annotated.
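
As a rough sketch of the technique (not the actual Xen helper), a static
inline function can carry the attribute as follows. The macro, type, and
function names are made up for illustration.

```c
#include <stddef.h>

/* Shorthand for the GCC/Clang attribute the message describes. */
#define sketch_nonnull(...) __attribute__((__nonnull__(__VA_ARGS__)))

typedef unsigned long gfn_sketch_t;
typedef unsigned long mfn_sketch_t;

/* Arguments 2 and 3 must not be NULL; passing a literal NULL there makes
 * the compiler emit a -Wnonnull diagnostic at the call site. */
static inline sketch_nonnull(2, 3) mfn_sketch_t
lookup_gfn_sketch(gfn_sketch_t gfn, int *type, int *access)
{
    *type = 1;     /* written unconditionally, hence the annotation */
    *access = 7;
    return gfn + 0x1000;
}
```

A macro cannot carry such an attribute, which is one reason a static
inline function is the better fit here.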

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years ago systemd: remove hard-coded pid file in xendriverdomain service
Wei Liu [Wed, 20 Jul 2016 15:36:15 +0000 (16:36 +0100)]
systemd: remove hard-coded pid file in xendriverdomain service

Per the discussion in [0], the hard-coded pid file can be removed
completely. Systemd has no trouble figuring out the pid of devd all by
itself.

[0]: https://lists.xen.org/archives/html/xen-devel/2016-07/msg01393.html

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years ago libxl: memory size in kb requires 64 bit variable
Juergen Gross [Thu, 28 Jul 2016 13:35:19 +0000 (15:35 +0200)]
libxl: memory size in kb requires 64 bit variable

libxl_set_memory_target() and several other interface functions of
libxl use a 32 bit sized parameter for a memory size value in kBytes.
This limits the maximum size to be passed in such a parameter
depending on signedness of the parameter to 2TB or 4TB.

Correct this by using 64 bit types.
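
The 2TB and 4TB figures follow directly from the width of the type, as
this small sketch shows (the function names are illustrative only):

```c
#include <stdint.h>

/* Largest memory size, in bytes, expressible as a KiB count held in a
 * signed 32-bit variable: (2^31 - 1) * 1024, just under 2 TiB. */
static uint64_t max_bytes_int32_kb(void)
{
    return (uint64_t)INT32_MAX * 1024;
}

/* Same for an unsigned 32-bit variable: (2^32 - 1) * 1024, just under
 * 4 TiB. */
static uint64_t max_bytes_uint32_kb(void)
{
    return (uint64_t)UINT32_MAX * 1024;
}
```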

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years ago x86/mem-sharing: mem-sharing a range of memory
Tamas K Lengyel [Mon, 1 Aug 2016 17:14:27 +0000 (11:14 -0600)]
x86/mem-sharing: mem-sharing a range of memory

Currently mem-sharing can be performed on a page-by-page basis from the control
domain. However, this process is quite wasteful when a range of pages has to
be deduplicated.

This patch introduces a new mem_sharing memop for range sharing where
the user doesn't have to separately nominate each page in both the source and
destination domain, and the looping over all pages happens in the hypervisor.
This significantly reduces the overhead of sharing a range of memory.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years ago libxl: create xenstore nodes for control/feature-XXX flags
Paul Durrant [Mon, 1 Aug 2016 08:57:10 +0000 (09:57 +0100)]
libxl: create xenstore nodes for control/feature-XXX flags

The xenstore-paths documentation specifies various control/feature-XXX
flags to allow a guest to tell a toolstack about its abilities to
respond to values written to control/shutdown. However, because the
parent control xenstore key is created read-only to the guest, unless
empty nodes for the feature flags are also created read/write by the
toolstack, the guest will not be able to set any flags.

This patch adds code to create all specified feature flag nodes at
domain creation time.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years ago libxl: fix printing hotplug arguments/environment
Roger Pau Monne [Tue, 2 Aug 2016 10:49:51 +0000 (12:49 +0200)]
libxl: fix printing hotplug arguments/environment

An OS could decide to not pass any environment variables to hotplug scripts,
and this will trigger a bug in device_hotplug logic, since it expects the
environment array to exist. Allow env to be NULL.
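
The idea can be sketched as a printing helper that treats a NULL array
as empty. This is a hypothetical stand-alone function, not the code in
device_hotplug.

```c
#include <stdio.h>
#include <stddef.h>

/* Print the entries of a NULL-terminated string array and return how
 * many there were. A NULL array is treated as empty instead of being
 * dereferenced. */
static int log_string_array(FILE *out, const char *what, char *const *arr)
{
    int n = 0;

    if (arr == NULL)
        return 0;

    for (; arr[n] != NULL; n++)
        fprintf(out, "%s[%d]: %s\n", what, n, arr[n]);

    return n;
}
```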

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years ago docs: define semantics of vncpasswd in xl.cfg
Jim Fehlig [Fri, 29 Jul 2016 22:56:22 +0000 (16:56 -0600)]
docs: define semantics of vncpasswd in xl.cfg

A recent discussion around LSN-2016-0001 [1] included defining
the semantics of an empty string for a VNC password. It was stated
that "libxl interprets an empty password in the caller's
configuration to mean that passwordless access should be permitted".

The same applies to the vncpasswd setting in xl.cfg. This patch
extends the xl.cfg documentation to define the semantics of setting
vncpasswd to an empty string.

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years ago x86/PCI: update ACPI Check to include SGI Ux3
Boris Ostrovsky [Tue, 2 Aug 2016 15:52:44 +0000 (17:52 +0200)]
x86/PCI: update ACPI Check to include SGI Ux3

These systems use variations of SGI3* for the ID string.

Instead of adding another set of strings, do what Linux did
in commit 526018bc and look at the first three letters.
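
A prefix check of this kind might look like the sketch below. The helper
name is hypothetical; the only assumption about the data is that the
ACPI OEM ID is a fixed-width field of six characters that need not be
NUL-terminated.

```c
#include <string.h>
#include <stdbool.h>

/* True if the OEM ID starts with "SGI", which covers variants such as
 * "SGI3" without listing each full string. Only the first three bytes
 * are read, so a missing NUL terminator is harmless. */
static bool oem_id_is_sgi(const char *oem_id)
{
    return strncmp(oem_id, "SGI", 3) == 0;
}
```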

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years ago x86: use gcc6'es flags asm() output support
Jan Beulich [Tue, 2 Aug 2016 15:51:10 +0000 (17:51 +0200)]
x86: use gcc6'es flags asm() output support

..., rendering affected code more efficient and smaller.

Note that in atomic.h this at once does away with the redundant output
and input specifications of the memory location touched.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
8 years ago xen/types: Correct the definition of uintptr_t
Andrew Cooper [Mon, 1 Aug 2016 12:36:44 +0000 (13:36 +0100)]
xen/types: Correct the definition of uintptr_t

uintptr_t is specified as unsigned int in 32bit, not unsigned long.  This is
why, when copying inttypes.h from GCC, the use of PRIxPTR and similar is
broken for 32bit builds.

Use __attribute__((__mode__(__pointer__))) to get the compiler's default
pointer type, which matches the pre-existing inttypes.h.

Fix the identified breakage with ELF_PRPTRVAL.

Compile tested on all architectures, with a manual printk() to trigger any
potential -Wformat issues.
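
For illustration, the attribute can be used like this with GCC or Clang.
The typedef is given a different, made-up name here so that it does not
clash with the standard uintptr_t.

```c
/* An unsigned integer type exactly as wide as a pointer on the target;
 * the compiler picks the underlying type via the mode attribute. */
typedef unsigned int __attribute__((__mode__(__pointer__))) uintptr_sketch_t;

/* Round-trip helper: convert a pointer to its integer representation. */
static uintptr_sketch_t ptr_to_int(const void *p)
{
    return (uintptr_sketch_t)p;
}
```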

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years ago xen/common: Sort the obj build order
Andrew Cooper [Mon, 1 Aug 2016 13:03:32 +0000 (14:03 +0100)]
xen/common: Sort the obj build order

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years ago xen/types: Alter typedef for bool_t
Andrew Cooper [Mon, 1 Aug 2016 10:34:35 +0000 (11:34 +0100)]
xen/types: Alter typedef for bool_t

As xen/stdbool.h is included, the typedef should use bool rather than _Bool.

Suggested-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years ago x86/mm: Clean up the construction of base_disallow_mask
Andrew Cooper [Fri, 15 Jul 2016 18:34:00 +0000 (19:34 +0100)]
x86/mm: Clean up the construction of base_disallow_mask

 * Use _PAGE_AVAIL_HIGH and _PAGE_NX instead of opencoding them
 * Drop further remnants of the 32bit hypervisor build

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years ago x86/mm: Avoid NULL dereference when checking altp2m's for shareability
Andrew Cooper [Wed, 27 Jul 2016 17:54:16 +0000 (18:54 +0100)]
x86/mm: Avoid NULL dereference when checking altp2m's for shareability

Coverity identifies that __get_gfn_type_access() unconditionally writes to its
type parameter under a number of circumstances.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years ago x86/vMSI-x: check whether msixtbl_list in msixtbl_pt_register()
Chao Gao [Mon, 1 Aug 2016 16:22:54 +0000 (18:22 +0200)]
x86/vMSI-x: check whether msixtbl_list in msixtbl_pt_register()

MSI-x tables' initialization had been deferred in commit
74c6dc2d0ac4dcab0c6243cdf6ed550c1532b798. If an assigned device does not support
MSI-x, the msixtbl_list won't be initialized. However, the following path
    XEN_DOMCTL_bind_pt_irq
        pt_irq_create_bind
            msixtbl_pt_register
does not check this case. Some errors (malware, etc.) may lead to calling
XEN_DOMCTL_bind_pt_irq without a clear gtable and will cause a Xen panic.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years ago mwait-idle: correct/improve BXT support
Jan Beulich [Mon, 1 Aug 2016 16:21:37 +0000 (18:21 +0200)]
mwait-idle: correct/improve BXT support

Linux commit 5dcef69486 ("intel_idle: add BXT support") added an
8-element lookup array with just a 2-bit value used for lookups. As per
the SDM that bit field is really 3 bits wide. Since the top two array
entries are zero, deal with the resulting invalid (zero) values by
moving the zero-MSR-value check into irtl_2_usec() and having that
function's caller check its result instead.
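
The resulting shape of the conversion can be sketched as below. This is
an approximation, not the driver's code: the unit table values are
recalled from Linux's intel_idle and should be treated as an assumption,
as should the *_sketch names.

```c
#include <stdint.h>

/* Time-unit multipliers in ns, indexed by a 3-bit field. The last two
 * encodings have no valid unit and map to 0. Values are an assumption. */
static const unsigned int irtl_ns_units_sketch[8] = {
    1, 32, 1024, 32768, 1048576, 33554432, 0, 0
};

/* Returns the limit in microseconds, or 0 if the MSR value is zero or
 * selects one of the zero-valued units; the caller is expected to check
 * for a zero result. */
static uint64_t irtl_2_usec_sketch(uint64_t irtl)
{
    uint64_t ns;

    if (!irtl)
        return 0;

    ns = irtl_ns_units_sketch[(irtl >> 10) & 0x7];   /* 3 bits, not 2 */

    return (irtl & 0x3FF) * ns / 1000;
}
```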

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Linux commit: 3451ab3ebf92b12801878d8b5c94845afd4219f0]
[Linux commit: bef450962597ff39a7f9d53a30523aae9eb55843]

8 years ago MAINTAINERS: update Quan Xu's email address
Quan Xu [Mon, 1 Aug 2016 10:41:26 +0000 (11:41 +0100)]
MAINTAINERS: update Quan Xu's email address

Signed-off-by: Quan Xu <xuquan8@huawei.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
8 years ago libxl: compilation warning fix for arm & aarch64
Chris Patterson [Wed, 27 Jul 2016 20:01:26 +0000 (16:01 -0400)]
libxl: compilation warning fix for arm & aarch64

GCC 6 will warn on unused static const variables in C modules:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00847.html

When compiling with LIBXL_HAVE_NO_SUSPEND_RESUME set (arm & aarch64),
the compiler emits the following errors:
  xl_cmdimpl.c:101:19: error: 'migrate_report'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:99:19: error: 'migrate_permission_to_go'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:97:19: error: 'migrate_receiver_ready'
      defined but not used [-Werror=unused-const-variable=]
  xl_cmdimpl.c:95:19: error: 'migrate_receiver_banner'
      defined but not used [-Werror=unused-const-variable=]

These const variables are only used in functions which exist inside
the ifndef block:
   #ifndef LIBXL_HAVE_NO_SUSPEND_RESUME
   ...
   #endif

Wrap the same ifndef around these variables.

Signed-off-by: Chris Patterson <pattersonc@ainfosec.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years ago xsm: don't require configuring tools to build xen xsm blob
Wei Liu [Mon, 25 Jul 2016 15:13:13 +0000 (16:13 +0100)]
xsm: don't require configuring tools to build xen xsm blob

Starting from 08cffe66 ("xsm: add a default policy to .init.data") we
can attach an xsm policy blob to the hypervisor. To build that policy
blob, the hypervisor build system now needs to enter the tools directory.

The expectations for the hypervisor and tools build systems are different.
We don't want the xen build system to depend on configure, but we want the
tools build system to. That commit broke this expectation because it
required users to run configure before building the hypervisor. This broke
the ARM build because ARM developers normally build the hypervisor and
tools separately (and possibly on different platforms). It can also break
x86 if developers don't run configure before building the hypervisor with
XSM on.

To fix it, move the major part of tools/flask/policy/Makefile into
Makefile.common and create a tools-only Makefile to include that common
Makefile. The hypervisor Makefile will use Makefile.common to build the
xsm policy.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Tested-by: Julien Grall <julien.grall@arm.com>
8 years ago xen/arm: p2m: Pass the p2m in parameter rather the domain when it is possible
Julien Grall [Thu, 28 Jul 2016 14:20:20 +0000 (15:20 +0100)]
xen/arm: p2m: Pass the p2m in parameter rather the domain when it is possible

Some p2m functions do not care about the domain except to get the
associated p2m.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Replace flush_tlb_domain by p2m_flush_tlb
Julien Grall [Thu, 28 Jul 2016 14:20:19 +0000 (15:20 +0100)]
xen/arm: p2m: Replace flush_tlb_domain by p2m_flush_tlb

The function to flush the TLBs for a given p2m does not need to know about
the domain. So pass the p2m directly as a parameter.

At the same time rename the function to p2m_flush_tlb to match the
parameter change.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: Don't export flush_tlb_domain
Julien Grall [Thu, 28 Jul 2016 14:20:18 +0000 (15:20 +0100)]
xen/arm: Don't export flush_tlb_domain

The function flush_tlb_domain is not used outside of the file where it
has been declared.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Inline p2m_load_VTTBR into p2m_restore_state
Julien Grall [Thu, 28 Jul 2016 14:20:17 +0000 (15:20 +0100)]
xen/arm: p2m: Inline p2m_load_VTTBR into p2m_restore_state

p2m_restore_state is the last caller of p2m_load_VTTBR and already checks
if the vCPU does not belong to the idle domain.

Note that it is likely possible to remove some isb in the function
p2m_restore_state, however this is not the purpose of this patch. So the
numerous isb have been left.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Rework the context switch to another VTTBR in flush_tlb_domain
Julien Grall [Thu, 28 Jul 2016 14:20:16 +0000 (15:20 +0100)]
xen/arm: p2m: Rework the context switch to another VTTBR in flush_tlb_domain

The current implementation of flush_tlb_domain is relying on the domain
to have a single p2m. With the upcoming altp2m feature, a single domain
may have several p2ms. So we would need to switch to the correct p2m in
order to flush the TLBs.

Rather than checking whether the domain is not the current domain, check
whether the VTTBR is different. The resulting assembly code is much
smaller: from 38 instructions (+ 2 function calls) to 22 instructions.
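
The comparison described above can be modelled in plain C. In this
sketch a global variable stands in for the hardware VTTBR register and
counters stand in for the real barriers and TLB maintenance; all names
are hypothetical.

```c
#include <stdint.h>

struct p2m_sketch { uint64_t vttbr; };

static uint64_t hw_vttbr;      /* stands in for the VTTBR register */
static unsigned int nr_switches, nr_flushes;

static void flush_tlb_sketch(const struct p2m_sketch *p2m)
{
    uint64_t ovttbr = hw_vttbr;

    /* Switch only if a different translation table is currently live. */
    if (ovttbr != p2m->vttbr) {
        hw_vttbr = p2m->vttbr;
        nr_switches++;
    }

    nr_flushes++;              /* the actual TLB flush would go here */

    /* Restore whatever was loaded before the flush. */
    if (ovttbr != p2m->vttbr)
        hw_vttbr = ovttbr;
}
```

Comparing register values works regardless of how many p2ms a domain
owns, which is why it suits altp2m.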

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Don't need to restore the state for an idle vCPU.
Julien Grall [Thu, 28 Jul 2016 14:20:15 +0000 (15:20 +0100)]
xen/arm: p2m: Don't need to restore the state for an idle vCPU.

The function p2m_restore_state could be called with an idle vCPU as
argument (when called by construct_dom0). However, we will never return
to EL0/EL1 in this case, so it is not necessary to restore the p2m
registers.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Move the vttbr field from arch_domain to p2m_domain
Julien Grall [Thu, 28 Jul 2016 14:20:14 +0000 (15:20 +0100)]
xen/arm: p2m: Move the vttbr field from arch_domain to p2m_domain

The field vttbr holds the base address of the translation table for
the guest. Its value will depend on how the p2m has been initialized and
will only be used by the P2M code.

So move the field from arch_domain to p2m_domain. This will also ease
the implementation of altp2m.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: Don't call p2m_alloc_table from arch_domain_create
Julien Grall [Thu, 28 Jul 2016 14:20:13 +0000 (15:20 +0100)]
xen/arm: Don't call p2m_alloc_table from arch_domain_create

The p2m root table does not need to be allocated separately.

Also remove unnecessary field initialization as the structure is already
memset to 0 and the fields will be overridden by p2m_alloc_table.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Switch the p2m lock from spinlock to rwlock
Julien Grall [Thu, 28 Jul 2016 14:20:12 +0000 (15:20 +0100)]
xen/arm: p2m: Switch the p2m lock from spinlock to rwlock

P2M reads do not need to be serialized. Serializing them adds contention
when PV drivers are using multi-queue because parallel grant
map/unmaps/copies will happen on DomU's p2m.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Introduce p2m_{read,write}_{,un}lock helpers
Julien Grall [Thu, 28 Jul 2016 14:20:11 +0000 (15:20 +0100)]
xen/arm: p2m: Introduce p2m_{read,write}_{,un}lock helpers

Some functions in the p2m code do not need to modify the P2M.
Document it by introducing separate helpers to lock the p2m.

This patch does not change the lock. This will be done in a subsequent
patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Remove unnecessary locking
Julien Grall [Thu, 28 Jul 2016 14:20:10 +0000 (15:20 +0100)]
xen/arm: p2m: Remove unnecessary locking

The p2m is not yet in use when p2m_init and p2m_allocate_table are
called. Furthermore the p2m is not used anymore when p2m_teardown is
called. So taking the p2m lock is not necessary.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Find the memory attributes based on the p2m type
Julien Grall [Thu, 28 Jul 2016 14:20:09 +0000 (15:20 +0100)]
xen/arm: p2m: Find the memory attributes based on the p2m type

Currently, mfn_to_p2m_entry is relying on the caller to provide the
correct memory attribute and will deduce the shareability based on it.

Some of the callers, such as p2m_create_table, are using the same memory
attribute regardless of the underlying p2m type. For instance, this will
lead to changing the memory attribute from MATTR_DEV to MATTR_MEM when
an MMIO superpage is shattered.

Furthermore, it makes it more difficult to support different shareability
with the same memory attribute.

All the memory attributes could be deduced via the p2m type. This will
simplify the code by dropping one parameter.
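
A toy version of that deduction, assuming separate cacheable and
non-cacheable MMIO types exist, is sketched below. The enums are
simplified stand-ins and do not reproduce Xen's real type list or
attribute encodings.

```c
/* Illustrative subset of p2m types and attributes; not Xen's real enums. */
enum p2m_type_sketch { RAM_RW, RAM_RO, MMIO_DIRECT_NC, MMIO_DIRECT_C };
enum mattr_sketch { MATTR_DEV_SKETCH, MATTR_MEM_SKETCH };

/* The attribute is a pure function of the type, so callers no longer
 * need to pass it and cannot pass a mismatching one. */
static enum mattr_sketch mattr_for_type(enum p2m_type_sketch t)
{
    switch (t) {
    case MMIO_DIRECT_NC:
        return MATTR_DEV_SKETCH;     /* non-cacheable device mapping */
    case MMIO_DIRECT_C:              /* cacheable MMIO */
    default:                         /* RAM-backed types */
        return MATTR_MEM_SKETCH;
    }
}
```

With this shape, shattering a superpage re-derives the attribute from
the entry's type rather than from whatever the caller happened to pass.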

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Differentiate cacheable vs non-cacheable MMIO
Julien Grall [Thu, 28 Jul 2016 14:20:08 +0000 (15:20 +0100)]
xen/arm: p2m: Differentiate cacheable vs non-cacheable MMIO

Currently, the p2m type p2m_mmio_direct is used to map in stage-2
cacheable MMIO (via map_regions_rw_cache) and non-cacheable one (via
map_mmio_regions). The p2m code is relying on the caller to give the
correct memory attribute.

In a follow-up patch, the p2m code will rely on the p2m type to find the
correct memory attribute. In preparation of this, introduce
p2m_mmio_direct_nc and p2m_mmio_direct_c to differentiate the
cacheability of the MMIO.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Use a whitelist rather than blacklist in get_page_from_gfn
Julien Grall [Thu, 28 Jul 2016 14:20:07 +0000 (15:20 +0100)]
xen/arm: p2m: Use a whitelist rather than blacklist in get_page_from_gfn

Currently, the check in get_page_from_gfn is using a blacklist. This is
very fragile because we may forget to update the check when a new p2m
type is added.

To avoid any possible issue, use a whitelist. All types backed by a RAM
page can potentially be valid. The check is borrowed from x86.

Note that with this change, it is no longer possible to retrieve a page when
the p2m type is p2m_iommu_map_*. This is fine because they are special
mappings for the direct mapping workaround and the associated GFN should
not be used at all by callers of get_page_from_gfn.
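
The fragility argument can be seen in a small sketch. The type list is
invented for illustration, with T_FUTURE standing for a type added after
the check was written.

```c
#include <stdbool.h>

enum gfn_type_sketch {
    T_INVALID, T_RAM_RW, T_RAM_RO, T_MMIO_DIRECT, T_IOMMU_MAP_RW, T_FUTURE
};

/* Blacklist: anything not explicitly excluded is accepted, so a newly
 * added type slips through unless someone remembers to update this. */
static bool page_ok_blacklist(enum gfn_type_sketch t)
{
    return t != T_INVALID && t != T_MMIO_DIRECT;
}

/* Whitelist: only types known to be backed by a RAM page are accepted. */
static bool page_ok_whitelist(enum gfn_type_sketch t)
{
    return t == T_RAM_RW || t == T_RAM_RO;
}
```

A new type is rejected by default under the whitelist, which is the
safer failure mode.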

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: p2m: Use the typesafe MFN in mfn_to_p2m_entry
Julien Grall [Thu, 28 Jul 2016 14:20:06 +0000 (15:20 +0100)]
xen/arm: p2m: Use the typesafe MFN in mfn_to_p2m_entry

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago libxl: fix double free when doing xl save
Juergen Gross [Thu, 28 Jul 2016 07:21:43 +0000 (09:21 +0200)]
libxl: fix double free when doing xl save

Commit d2412fd63b14c6c21d0a3d4367afa448425dfb8a ("libxl: move common
nic stuff into one source") introduced a double free error in libxl
which occurred during "xl save".

Correct this error.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years ago xen/arm: Fix coding style and update comment in acpi_route_spis
Julien Grall [Wed, 27 Jul 2016 13:58:30 +0000 (14:58 +0100)]
xen/arm: Fix coding style and update comment in acpi_route_spis

The comment was not correctly indented. Also the preferred name for the
initial domain is "hardware domain" and not "dom0", so replace it.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: acpi: route all unused IRQs to DOM0
Julien Grall [Wed, 27 Jul 2016 13:58:29 +0000 (14:58 +0100)]
xen/arm: acpi: route all unused IRQs to DOM0

It is not possible to know which IRQs will be used by DOM0 when ACPI is
in use. The approach implemented by this patch will route all unused
IRQs to DOM0 before it has booted.

The number of IRQs routed is based on the maximum SPIs supported by the
hardware (up to ~1000). However, some of them might not be wired. So we
would allocate resources for nothing.

For each IRQ routed, Xen is allocating memory for irqaction (40 bytes)
and irq_guest (16 bytes). So in the worst case scenario ~54KB of memory
will be allocated. Given that ACPI will mostly be used by servers, I
think it is a small drawback.
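
The ~54KB figure can be reproduced with a few lines of arithmetic,
assuming the "~1000" above refers to the 988 SPI interrupt IDs (32 to
1019) that the GIC architecture allows. The constant names are made up
for this sketch.

```c
#include <stddef.h>

enum {
    SKETCH_NR_SPIS      = 1020 - 32, /* SPI interrupt IDs 32..1019 */
    SKETCH_IRQACTION_SZ = 40,        /* bytes, from the message above */
    SKETCH_IRQ_GUEST_SZ = 16         /* bytes, from the message above */
};

/* 988 * (40 + 16) = 55328 bytes, i.e. roughly 54 KiB. */
static size_t worst_case_bytes(void)
{
    return (size_t)SKETCH_NR_SPIS *
           (SKETCH_IRQACTION_SZ + SKETCH_IRQ_GUEST_SZ);
}
```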

map_irq_to_domain is slightly reworked to remove the dependency on
device-tree. So the function can also be used for ACPI and will
avoid code duplication.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
8 years ago xen/arm: Allow DOM0 to set the IRQ type
Julien Grall [Wed, 27 Jul 2016 13:58:28 +0000 (14:58 +0100)]
xen/arm: Allow DOM0 to set the IRQ type

The function route_irq_to_guest mandates the IRQ type, stored in
desc->arch.type, to be valid. However, in the case of ACPI, this
information is not part of the static tables. Therefore Xen needs to
rely on DOM0 to provide a valid type based on the firmware tables.

A new helper, irq_type_set_by_domain is provided to check whether a
domain is allowed to set the IRQ type. For now, only DOM0 is allowed to
configure.

When the helper returns 1, the routing function will not check whether
the IRQ type is correctly set and configure the GIC. Instead, this will
be done when the domain enables the interrupt.

Note that irq_set_spi_type is not called because it validates the type
and does not allow the domain to change the type after the first
write. It means that desc->arch.type may never be set, which is fine
because the field is only used to configure the type during the routing.

Based on 4.3.13 in ARM IHI 0048B.b, changing the value of Int_config is
UNPREDICTABLE when the corresponding interrupt is not disabled.

Therefore, setting the IRQ type when the guest is writing into ICFGR
would require more work to make sure the IRQ has been disabled before
writing into the host ICFGR. As the behavior is UNPREDICTABLE, the type
will be set before enabling the physical IRQ associated to the virtual IRQ.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Tested-by: Shanker Donthineni <shankerd@codeaurora.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago Revert "xen/arm: warn the user that we cannot route SPIs to Dom0 on ACPI"
Julien Grall [Wed, 27 Jul 2016 13:58:27 +0000 (14:58 +0100)]
Revert "xen/arm: warn the user that we cannot route SPIs to Dom0 on ACPI"

This reverts commit f91c84edebe67296e4051af055dbf0adafb13a37. SPI
routing for ACPI support will be added in a follow-up patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: gic: Document how gic_set_irq_type should be called
Julien Grall [Wed, 27 Jul 2016 13:58:26 +0000 (14:58 +0100)]
xen/arm: gic: Document how gic_set_irq_type should be called

Changing the value of Int_config is UNPREDICTABLE when the corresponding
interrupt is not disabled.

The driver is assuming the interrupt will be disabled by the caller of
gic_set_irq_type. Add an ASSERT to ensure it.
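
In spirit, the added check resembles the following sketch, where a plain
assert() and a toy descriptor stand in for Xen's ASSERT() and irq_desc.
The names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct irq_desc_sketch {
    bool disabled;
    unsigned int type;
};

static void set_irq_type_sketch(struct irq_desc_sketch *desc,
                                unsigned int type)
{
    /* Reprogramming the trigger type of an enabled interrupt is
     * UNPREDICTABLE, so insist that the caller disabled it first. */
    assert(desc->disabled);
    desc->type = type;
}
```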

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years ago xen/arm: gic: set_type: Pass the type in parameter rather than in desc->arch.type
Julien Grall [Wed, 27 Jul 2016 13:58:25 +0000 (14:58 +0100)]
xen/arm: gic: set_type: Pass the type in parameter rather than in desc->arch.type

A follow-up patch will not store the type in desc->arch.type. Also, the
callback prototype is more logical.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: gic: split set_irq_properties
Julien Grall [Wed, 27 Jul 2016 13:58:24 +0000 (14:58 +0100)]
xen/arm: gic: split set_irq_properties

The callback set_irq_properties will configure the GIC for a specific
IRQ with the type and the priority.

In a follow-up patch, Xen will configure the type and the priority at
different stages of the routing. So split it into 2 separate callbacks.

At the same time, move the ASSERTs checking the validity of the type and
that desc->lock is held into the common code (gic.c). This is because
the constraints are the same between GICv2 and GICv3, however the driver
of the latter did not contain any sanity check.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: gic: Do not configure affinity during routing
Julien Grall [Wed, 27 Jul 2016 13:58:23 +0000 (14:58 +0100)]
xen/arm: gic: Do not configure affinity during routing

The affinity of a guest IRQ is set every time the guest enables it (see
vgic_enable_irqs).

It is not necessary to set the affinity when the IRQ is routed to the
guest because Xen will never receive the IRQ until it has been enabled
by the guest.

To keep gic_route_irq_to_{xen,guest} behaving the same way (i.e just
setting up the routing), the affinity of IRQ routed to Xen is moved into
__setup_irq.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: gic: Consolidate the IRQ affinity set in a single place
Julien Grall [Wed, 27 Jul 2016 13:58:22 +0000 (14:58 +0100)]
xen/arm: gic: Consolidate the IRQ affinity set in a single place

The code to set the IRQ affinity is duplicated: once in
gicv{2,3}_set_properties and once in gicv{2,3}_irq_set_affinity.

Remove the code from gicv{2,3}_set_properties and call directly the
affinity set helper from the common code.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/domctl: Add DOMINFO_hap to xen_domctl_getdomaininfo
Andrew Cooper [Fri, 15 Jul 2016 15:43:48 +0000 (16:43 +0100)]
xen/domctl: Add DOMINFO_hap to xen_domctl_getdomaininfo

This allows a toolstack to identify whether a running domain is using hardware
assisted paging or not.

The appropriate tests differ by architecture, so introduce
arch_get_domain_info().  ARM unconditionally sets the new flag, while x86
checks with the paging subsystem first.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agolibxl: move common nic stuff into one source
Juergen Gross [Tue, 12 Jul 2016 15:30:44 +0000 (17:30 +0200)]
libxl: move common nic stuff into one source

Put all nic related stuff of libxl from common files into a dedicated
source file.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: add config update callback to device type framework
Juergen Gross [Tue, 12 Jul 2016 15:30:43 +0000 (17:30 +0200)]
libxl: add config update callback to device type framework

Some device types require a configuration update after resume of
domain. Add a callback for this purpose.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: split libxl vtpm code into one source
Juergen Gross [Tue, 12 Jul 2016 15:30:42 +0000 (17:30 +0200)]
libxl: split libxl vtpm code into one source

Put all vtpm related stuff of libxl into a dedicated source file.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: move library pvusb specific code into libxl_pvusb.c
Juergen Gross [Tue, 12 Jul 2016 15:30:41 +0000 (17:30 +0200)]
libxl: move library pvusb specific code into libxl_pvusb.c

Outside libxl_pvusb.c only libxl_util.c still contains some pvusb code.

Move it to libxl_pvusb.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: add "pv device mode needed" support to device type framework
Juergen Gross [Tue, 12 Jul 2016 15:30:40 +0000 (17:30 +0200)]
libxl: add "pv device mode needed" support to device type framework

Add another callback to the device type framework in order to aid
decision whether a pv domain needs a device model.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: add "merge" function to generic device type support
Juergen Gross [Tue, 12 Jul 2016 15:30:39 +0000 (17:30 +0200)]
libxl: add "merge" function to generic device type support

Instead of using a macro generating the code to merge xenstore and
json configuration data, use the generic device type support for
this purpose.

This requires adding some accessor functions to the framework and
a structure for disks (as disks are added separately they didn't need
such a structure up to now).

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoaltp2m: Allow shared entries to be copied to altp2m views during lazycopy
Tamas K Lengyel [Wed, 27 Jul 2016 09:31:59 +0000 (10:31 +0100)]
altp2m: Allow shared entries to be copied to altp2m views during lazycopy

Move sharing locks above altp2m to avoid locking order violation and crashing
the hypervisor during unsharing operations when altp2m is active.

Applying mem_access settings or remapping gfns in altp2m views will
automatically unshare the page if it was shared previously. Also,
disallow nominating pages for which there are pre-existing altp2m
mem_access settings or remappings present. However, allow altp2m to
populate altp2m views with shared entries during lazycopy as unsharing
will automatically propagate the change to these entries in altp2m
views as well.

While we're here, switch to using the appropriate wrappers rather than
calling p2m->get_entry() directly.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen/arm: p2m: Simplify p2m type check by using bitmask
Julien Grall [Wed, 20 Jul 2016 16:10:50 +0000 (17:10 +0100)]
xen/arm: p2m: Simplify p2m type check by using bitmask

The resulting assembly code for the macros is much simpler and will
never contain more than one branch instruction.

The idea is taken from x86 (see include/asm-x86/p2m.h). Also move the
two helpers earlier to keep all the p2m type definitions together.
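
A minimal sketch of the bitmask idea follows. The enum values and the
names TYPE_TO_MASK, RAM_TYPES and type_is_ram() are hypothetical and
chosen for illustration; the real type list and macros differ.

```c
#include <stdbool.h>

/* Hypothetical type list for illustration; the real enum differs. */
enum p2m_type_sketch {
    T_INVALID,
    T_RAM_RW,
    T_RAM_RO,
    T_MMIO,
    T_FOREIGN,
};

/* One bit per type. */
#define TYPE_TO_MASK(t)  (1UL << (t))

/* Group of types treated as RAM in this sketch. */
#define RAM_TYPES  (TYPE_TO_MASK(T_RAM_RW) | TYPE_TO_MASK(T_RAM_RO))

/* Membership test: one shift, one AND, one comparison against zero. */
static bool type_is_ram(enum p2m_type_sketch t)
{
    return (TYPE_TO_MASK(t) & RAM_TYPES) != 0;
}
```

Compared with a chain of "t == A || t == B" comparisons, adding a type
to a group only changes the mask, not the shape of the test.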

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Use p2m_is_foreign in get_page_from_gfn to avoid open coding
Julien Grall [Wed, 20 Jul 2016 16:10:49 +0000 (17:10 +0100)]
xen/arm: p2m: Use p2m_is_foreign in get_page_from_gfn to avoid open coding

No functional change.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Clean-up mfn_to_p2m_entry
Julien Grall [Wed, 20 Jul 2016 16:10:47 +0000 (17:10 +0100)]
xen/arm: p2m: Clean-up mfn_to_p2m_entry

The physical address is computed from the machine frame number, so
checking if the physical address is page aligned is pointless.

Furthermore, directly assign the MFN to the corresponding field in the
entry rather than converting to a physical address and ORing the value.
This avoids relying on the field position and makes the code clearer.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoarm/vgic: Change fixed number of mmio handlers to variable number
Shanker Donthineni [Wed, 20 Jul 2016 14:00:56 +0000 (09:00 -0500)]
arm/vgic: Change fixed number of mmio handlers to variable number

Compute the number of mmio handlers that are required for vGICv3 and
vGICv2 emulation drivers in vgic_v3_init()/vgic_v2_init(). Add
this variable number of mmio handlers to the fixed number MAX_IO_HANDLER
and pass the sum to domain_io_init() to allocate enough memory.

New code path:
 domain_vgic_register(&count)
   domain_io_init(count + MAX_IO_HANDLER)
     domain_vgic_init()

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/arm: io: Use binary search for mmio handler lookup
Shanker Donthineni [Wed, 20 Jul 2016 14:00:55 +0000 (09:00 -0500)]
xen/arm: io: Use binary search for mmio handler lookup

As the number of I/O handlers increases, the overhead associated with
linear lookup also increases. The system might have a maximum of 144
(assuming CONFIG_NR_CPUS=128) mmio handlers. In the worst case scenario,
it would require 144 iterations to find a matching handler. Now
it is time for us to change from a linear (complexity O(n)) to a binary
search (complexity O(log n)) to reduce mmio handler lookup overhead.
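
The lookup can be sketched with the C library's bsearch(). This is an
illustration only: struct mmio_region_sketch, cmp_addr_region() and
find_region() are hypothetical names, and the table is assumed to be
sorted by address with no overlapping regions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical handler record: one MMIO region [addr, addr + size). */
struct mmio_region_sketch {
    uint64_t addr;
    uint64_t size;
};

/* Compare a faulting address (key) against a region (array element). */
static int cmp_addr_region(const void *key, const void *elem)
{
    uint64_t gpa = *(const uint64_t *)key;
    const struct mmio_region_sketch *r = elem;

    if (gpa < r->addr)
        return -1;
    if (gpa >= r->addr + r->size)
        return 1;
    return 0;   /* gpa falls inside this region */
}

/* regions[] must be sorted by addr and must not overlap. */
static const struct mmio_region_sketch *
find_region(const struct mmio_region_sketch *regions, size_t n, uint64_t gpa)
{
    return bsearch(&gpa, regions, n, sizeof(*regions), cmp_addr_region);
}
```

With 144 sorted entries, a binary search needs about eight comparisons
in the worst case instead of 144.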

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen: Add generic implementation of binary search
Shanker Donthineni [Wed, 20 Jul 2016 14:00:54 +0000 (09:00 -0500)]
xen: Add generic implementation of binary search

This patch adds the generic implementation of binary search algorithm
which is copied from Linux kernel v4.7-rc7. No functional changes.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
8 years agoarm/io: Use separate memory allocation for mmio handlers
Shanker Donthineni [Wed, 20 Jul 2016 14:00:53 +0000 (09:00 -0500)]
arm/io: Use separate memory allocation for mmio handlers

The number of mmio handlers is limited to a compile time macro
MAX_IO_HANDLER which is 16. This number is not at all sufficient
to support per CPU distributor regions. Either it needs to be
increased to a bigger number, at least CONFIG_NR_CPUS+16, or
separate memory for mmio handlers needs to be allocated dynamically
during domain build.

This patch uses the dynamic allocation strategy to reduce memory
footprint for 'struct domain' instead of static allocation.
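
A rough sketch of the dynamic strategy, using calloc() in place of the
hypervisor allocator. struct io_table_sketch, io_table_init() and
io_table_free() are hypothetical names, not the ones used in the patch.

```c
#include <stdlib.h>

/* Hypothetical handler record. */
struct handler_sketch {
    unsigned long addr;
    unsigned long size;
};

/* Only a pointer and two counters live in the per-domain structure. */
struct io_table_sketch {
    unsigned int max;                 /* capacity chosen at domain build */
    unsigned int used;                /* handlers registered so far */
    struct handler_sketch *handlers;  /* separately allocated array */
};

/* Allocate room for 'max' handlers; returns 0 on success, -1 on failure. */
static int io_table_init(struct io_table_sketch *t, unsigned int max)
{
    t->handlers = calloc(max, sizeof(*t->handlers));
    if (!t->handlers)
        return -1;
    t->max = max;
    t->used = 0;
    return 0;
}

/* Release the array when the domain goes away. */
static void io_table_free(struct io_table_sketch *t)
{
    free(t->handlers);
    t->handlers = NULL;
    t->max = t->used = 0;
}
```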

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86/entry: Avoid SMAP violation in compat_create_bounce_frame()
Andrew Cooper [Wed, 15 Jun 2016 17:32:14 +0000 (18:32 +0100)]
x86/entry: Avoid SMAP violation in compat_create_bounce_frame()

A 32bit guest kernel might be running on user mappings.
compat_create_bounce_frame() must whitelist its guest accesses to avoid
risking a SMAP violation.

For both variants of create_bounce_frame(), re-blacklist user accesses if
execution exits via an exception table redirection.

This is XSA-183 / CVE-2016-6259

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/pv: Remove unsafe bits from the mod_l?_entry() fastpath
Andrew Cooper [Mon, 11 Jul 2016 13:32:03 +0000 (14:32 +0100)]
x86/pv: Remove unsafe bits from the mod_l?_entry() fastpath

All changes in writeability and cacheability must go through full
re-validation.

Rework the logic as a whitelist, to make it clearer to follow.
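
The whitelist shape can be sketched as follows. The flag names and the
choice of which bits count as safe are assumptions made purely for
illustration; they are not the set used by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define F_PRESENT   (1u << 0)
#define F_RW        (1u << 1)
#define F_PCD       (1u << 4)
#define F_ACCESSED  (1u << 5)
#define F_DIRTY     (1u << 6)

/* Assumed whitelist: bits that may change without re-validation. */
#define FASTPATH_SAFE  (F_ACCESSED | F_DIRTY)

/*
 * The fast path is allowed only if every bit that differs between the
 * old and new entry is on the whitelist. Anything else, including
 * writeability (F_RW) and cacheability (F_PCD), falls through to full
 * re-validation.
 */
static bool can_use_fastpath(uint64_t old_entry, uint64_t new_entry)
{
    return ((old_entry ^ new_entry) & ~(uint64_t)FASTPATH_SAFE) == 0;
}
```

A blacklist would have to enumerate every unsafe bit; with a whitelist
an unlisted bit is rejected by default.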

This is XSA-182

Reported-by: Jérémie Boutoille <jboutoille@ext.quarkslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
8 years agoxen: Remove buggy initial placement algorithm
George Dunlap [Fri, 15 Jul 2016 17:25:52 +0000 (18:25 +0100)]
xen: Remove buggy initial placement algorithm

The initial placement algorithm sometimes picks cpus outside of the
mask it's given, does a lot of unnecessary bitmasking, does its own
separate load calculation, and completely ignores vcpu hard and soft
affinities.  Just get rid of it and rely on the schedulers to do
initial placement.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoxen: Have schedulers revise initial placement
George Dunlap [Fri, 15 Jul 2016 16:20:36 +0000 (17:20 +0100)]
xen: Have schedulers revise initial placement

The generic domain creation logic in
xen/common/domctl.c:default_vcpu0_location() attempts to do
initial placement load-balancing by placing vcpu 0 on the least-busy
non-primary hyperthread available.  Unfortunately, the logic can end
up picking a pcpu that's not in the online mask.  When this is passed
to a scheduler which assumes that the initial assignment is
valid, it causes a null pointer dereference looking up the runqueue.

Furthermore, this initial placement doesn't take into account hard or
soft affinity, or any scheduler-specific knowledge (such as historic
runqueue load, as in credit2).

To solve this, when inserting a vcpu, always call the per-scheduler
"pick" function to revise the initial placement.  This will
automatically take all knowledge the scheduler has into account.

csched2_cpu_pick ASSERTs that the vcpu's pcpu scheduler lock has been
taken.  Grab and release the lock to minimize time spent with irqs
disabled.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
8 years agoxen: Some code motion to avoid having to do forward-declaration
George Dunlap [Mon, 25 Jul 2016 11:09:52 +0000 (12:09 +0100)]
xen: Some code motion to avoid having to do forward-declaration

For sched_credit2, move the vcpu insert / remove / free functions near the domain
insert / remove / alloc / free functions (and after cpu_pick).

For sched_rt, move rt_cpu_pick() further up.

This is pure code motion; no functional change.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
8 years agosystemd: use standard dependencies for xendriverdomain.service
Marek Marczykowski-Górecki [Sun, 24 Jul 2016 19:26:57 +0000 (21:26 +0200)]
systemd: use standard dependencies for xendriverdomain.service

Having DefaultDependencies=no means it can be started before / is
remounted read-write, which will result in various failures (to start
with opening the log).
Since "libxl: trigger attach events for devices attached before xl devd
startup" it is no longer important to start it as early as possible,
because it will process devices created before its startup.

Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/libxc: Properly increment ApicIdCoreSize field on AMD
Boris Ostrovsky [Fri, 22 Jul 2016 17:14:01 +0000 (13:14 -0400)]
tools/libxc: Properly increment ApicIdCoreSize field on AMD

Current code incorrectly adds 1 to full register instead of
incrementing the field in bits 15:12.
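
The difference can be sketched as below. The helper name
bump_field_15_12() and the wrap-around at 16 are assumptions made for
illustration; they do not reproduce the libxc code.

```c
#include <stdint.h>

#define FIELD_SHIFT  12
#define FIELD_MASK   (0xfu << FIELD_SHIFT)   /* bits 15:12 */

/* Increment the 4-bit field in bits 15:12, leaving all other bits alone. */
static uint32_t bump_field_15_12(uint32_t reg)
{
    uint32_t field = (reg & FIELD_MASK) >> FIELD_SHIFT;

    field = (field + 1) & 0xfu;   /* wraps at 16 in this sketch */
    return (reg & ~FIELD_MASK) | (field << FIELD_SHIFT);
}
```

For a register value of 0x3007, adding 1 to the full register gives
0x3008 and disturbs the low bits, while bumping the field gives 0x4007.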

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices
Andrew Cooper [Mon, 18 Jul 2016 21:04:43 +0000 (22:04 +0100)]
x86/vMSI-X: Fix host crash when shutting down guests with MSI capable devices

c/s 74c6dc2d "x86/vMSI-X: defer intercept handler registration" caused MSI-X
table infrastructure not to always be initialised, but it missed one path
which needed an is-initialised check.

If a device is passed through to a domain which is MSI capable but not MSI-X
capable, the call to msixtbl_init() is omitted, but a XEN_DOMCTL_unbind_pt_irq
hypercall still calls into msixtbl_pt_unregister().  This follows the linked
list pointer which is still NULL.

Introduce an is-initialised check to msixtbl_pt_unregister().

Furthermore, the purpose of the open-coded msixtbl_list.next check is rather
subtle.  Introduce an msixtbl_initialised() predicate instead, which makes its
purpose far more obvious.
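
The predicate can be sketched like this. The structures and function
names are hypothetical stand-ins; only the idea of testing the list
head's next pointer comes from the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal list head and domain structure. */
struct list_head_sketch {
    struct list_head_sketch *next, *prev;
};

struct dom_sketch {
    struct list_head_sketch msixtbl_list;
};

/* A zero-filled list head has next == NULL until the init routine runs. */
static bool table_initialised(const struct dom_sketch *d)
{
    return d->msixtbl_list.next != NULL;
}

/* Returns 0 when there is nothing to do, 1 when the list was examined. */
static int table_unregister(struct dom_sketch *d)
{
    if (!table_initialised(d))
        return 0;   /* never initialised: do not follow a NULL pointer */

    /* ... walk d->msixtbl_list here ... */
    return 1;
}
```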

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: don't let b_avgload go negative.
Dario Faggioli [Fri, 22 Jul 2016 12:04:53 +0000 (14:04 +0200)]
xen: credit2: don't let b_avgload go negative.

The ASSERT() made effective by b5b5876619bd8ec2e
("xen: credit2: fix two s_time_t handling issues
in load balancing") triggers for b_avgload (spotted
by OSSTest).

b_avgload is where we store the prediction of what
the load of a runqueue will look like in the medium
to long term, because of a vcpu being added to or
removed from there.

On vcpu removal, saturate down b_avgload to zero,
as it makes very little sense to predict that the
load of a runqueue will at some point become negative!
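
A minimal sketch of the saturation, with a hypothetical helper name and
plain int64_t standing in for the scheduler's time type:

```c
#include <stdint.h>

/* Subtract a departing vcpu's load, clamping the prediction at zero. */
static int64_t avgload_after_removal(int64_t b_avgload, int64_t vcpu_load)
{
    int64_t result = b_avgload - vcpu_load;

    return result < 0 ? 0 : result;
}
```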

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen/arm: p2m: Fix multi-lines coding style comments
Julien Grall [Wed, 20 Jul 2016 16:10:46 +0000 (17:10 +0100)]
xen/arm: p2m: Fix multi-lines coding style comments

The start and end markers should be on separate lines.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Restrict usage of get_page_from_gva to the current vCPU
Julien Grall [Wed, 20 Jul 2016 16:10:45 +0000 (17:10 +0100)]
xen/arm: p2m: Restrict usage of get_page_from_gva to the current vCPU

The function get_page_from_gva translates a guest virtual address to a
machine address. The translation involves the register VTTBR_EL2,
TTBR0_EL1, TTBR1_EL1 and SCTLR_EL1.

Currently, only the first register is context switched if the current
domain is not the same. This will result in using the wrong TTBR*_EL1 and
SCTLR_EL1 for the translation.

To fix the code properly, we would have to context switch all the
registers mentioned above when the vCPU in parameter is not the current
one. Similar things would need to be done in the callee
p2m_mem_check_and_get_page.

Given that the only caller of this function with the vCPU that may not
be current is a guest debugging function (show_guest_stack), restrict
the usage to the current vCPU for the time being.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: p2m: Pass the vCPU in parameter to get_page_from_gva
Julien Grall [Wed, 20 Jul 2016 16:10:44 +0000 (17:10 +0100)]
xen/arm: p2m: Pass the vCPU in parameter to get_page_from_gva

The function get_page_from_gva translates a guest virtual address to a
machine address. The translation involves the register VTTBR_EL2,
TTBR0_EL1, TTBR1_EL1 and SCTLR_EL1. Whilst the first register is per
domain (the p2m is common to every vCPUs), the last 3 are per-vCPU.

Therefore, the function should take the vCPU in parameter and not the
domain. Fixing the actual code path will be done in a separate patch.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: system: Use the correct parameter name in local_irq_restore
Julien Grall [Wed, 20 Jul 2016 16:10:43 +0000 (17:10 +0100)]
xen/arm: system: Use the correct parameter name in local_irq_restore

The parameter to store the flags is called 'x' and not 'flags'.
Thankfully all the users of the macro are passing 'flags'.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoarm/traps: fix bug in dump_guest_s1_walk handling of level 2 page tables
Jonathan Daugherty [Wed, 20 Jul 2016 16:10:17 +0000 (09:10 -0700)]
arm/traps: fix bug in dump_guest_s1_walk handling of level 2 page tables

dump_guest_s1_walk intends to walk to level 2 page table entries but
was failing to do so because of a check that caused level 2 page table
descriptors to be ignored. This change fixes the check so that level 2
page table walks occur as intended by ignoring descriptors unless their
low two bits match the expected sequence [0,1].

For more information, see the ARMv7-A ARM DDI 0406C.b, section B3.5.1.
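
The intended check can be sketched as below, assuming the usual
short-descriptor encoding in which low bits 0b01 mark a first-level
entry that points to a level 2 table. The macro and function names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low two bits of a first-level short-descriptor entry select its kind. */
#define DESC_TYPE_MASK   0x3u
#define DESC_TYPE_TABLE  0x1u   /* bit 1 = 0, bit 0 = 1 */

/* True when the first-level descriptor refers to a level 2 page table. */
static bool desc_points_to_l2(uint32_t desc)
{
    return (desc & DESC_TYPE_MASK) == DESC_TYPE_TABLE;
}
```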

Signed-off-by: Jonathan Daugherty <jtd@galois.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoarm/traps: fix bug in dump_guest_s1_walk L1 page table offset computation
Jonathan Daugherty [Wed, 20 Jul 2016 16:10:16 +0000 (09:10 -0700)]
arm/traps: fix bug in dump_guest_s1_walk L1 page table offset computation

The dump_guest_s1_walk function was incorrectly using the top 10 bits of
the virtual address to select the L1 page table index.  The correct
amount is 12 bits, resulting in a shift of 20 bits rather than 22.
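
The two shifts can be compared with a small sketch (function names are
hypothetical):

```c
#include <stdint.h>

/* Top 12 bits of the VA, i.e. bits [31:20], give a value in 0..4095. */
static uint32_t l1_index(uint32_t va)
{
    return va >> 20;
}

/* The earlier computation kept only the top 10 bits. */
static uint32_t l1_index_top10(uint32_t va)
{
    return va >> 22;
}
```

For the address 0x12345678 the first function yields 0x123, while the
10-bit variant yields 0x48 and therefore selects a different entry.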

For more details, see the ARMv7-A ARM DDI 0406C.b, section B3.5,
"Short-descriptor translation table format."

Signed-off-by: Jonathan Daugherty <jtd@galois.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxenstore: add assertion in database dumping code
Wei Liu [Wed, 20 Jul 2016 14:13:42 +0000 (15:13 +0100)]
xenstore: add assertion in database dumping code

If memfile is NULL, the signal handler won't be installed, hence fopen
won't dereference NULL. Coverity is not smart enough to figure that out
unfortunately.

Add an assertion to prevent coverity from complaining.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxenstore: send error earlier in do_mkdir
Wei Liu [Wed, 20 Jul 2016 14:13:41 +0000 (15:13 +0100)]
xenstore: send error earlier in do_mkdir

XenServer's coverity instance complains that a few lines below
create_node dereferences NULL if name == NULL. It however fails to
figure out that if node is NULL, errno won't be ENOENT, so do_mkdir
should have bailed before create_node.

That said, it would be good if we don't need to go through the hops.  We
can bail earlier if name is NULL.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agooxenstored: honour XEN_RUN_DIR
Wei Liu [Mon, 11 Jul 2016 17:28:09 +0000 (18:28 +0100)]
oxenstored: honour XEN_RUN_DIR

Move the default pid file under XEN_RUN_DIR. Note that it changes the
location from /var/run to /var/run/xen.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: David Scott <dave@recoil.org>
8 years agolibxenstat: honour XEN_RUN_DIR
Wei Liu [Mon, 11 Jul 2016 17:28:08 +0000 (18:28 +0100)]
libxenstat: honour XEN_RUN_DIR

This is because libxl uses XEN_RUN_DIR to generate the socket path for
libxenstat while libxenstat itself uses hard-coded path, which is not
necessarily the same path as XEN_RUN_DIR.  The default configuration
happened to work because XEN_RUN_DIR defaulted to /var/run/xen, which
matched the hard-coded path.

We should make libxenstat use XEN_RUN_DIR so that it works with
non-default configuration.

Generate a _paths.h because it is required to make this change work.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agohotplug/Linux: honour XEN_RUN_DIR
Wei Liu [Mon, 11 Jul 2016 17:28:07 +0000 (18:28 +0100)]
hotplug/Linux: honour XEN_RUN_DIR

Store various PID files under XEN_RUN_DIR. Note that this changes the
default location from /var/run to /var/run/xen.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agohotplug/NetBSD: honour XEN_RUN_DIR
Wei Liu [Mon, 11 Jul 2016 17:28:06 +0000 (18:28 +0100)]
hotplug/NetBSD: honour XEN_RUN_DIR

Store xldevd.pid under XEN_RUN_DIR. Note that this will change the
default location from /var/run to /var/run/xen.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agohotplug/FreeBSD: honour XEN_RUN_DIR
Wei Liu [Mon, 11 Jul 2016 17:28:05 +0000 (18:28 +0100)]
hotplug/FreeBSD: honour XEN_RUN_DIR

Store xldevd.pid under XEN_RUN_DIR. Note that the default location would
change from /var/run to /var/run/xen.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools/helper: honour XEN_RUN_DIR in init-xenstore-domain.c
Wei Liu [Mon, 11 Jul 2016 17:28:04 +0000 (18:28 +0100)]
tools/helper: honour XEN_RUN_DIR in init-xenstore-domain.c

Place the PID file under XEN_RUN_DIR. Note that this changes the default
location from /var/run to /var/run/xen.

Generate a _paths.h as that is required to make this change work.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxenconsoled: honour XEN_RUN_DIR
Wei Liu [Mon, 11 Jul 2016 17:28:03 +0000 (18:28 +0100)]
xenconsoled: honour XEN_RUN_DIR

Place the PID file under XEN_RUN_DIR by default. Note this changes the
default location from /var/run to /var/run/xen.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxl: rename variable pause to pause_after_migration
Wei Liu [Wed, 20 Jul 2016 08:30:17 +0000 (09:30 +0100)]
xl: rename variable pause to pause_after_migration

Gcc 4.4.4 complained that the "pause" variable introduced in 22b430e0
("xl: add option to leave domain paused after migration") shadowed
pause(2) declaration in unistd.h.

Rename "pause" to "pause_after_migration" to fix this issue.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxen: credit2: fix two s_time_t handling issues in load balancing
Dario Faggioli [Wed, 20 Jul 2016 09:50:12 +0000 (10:50 +0100)]
xen: credit2: fix two s_time_t handling issues in load balancing

both introduced in d205f8a7f48e2ec ("xen: credit2: rework
load tracking logic").

First, in __update_runq_load(), the ASSERT() was actually
useless. Let's instead check that the computed value of
the load has not overflowed (and hence gone negative).

While there, do that in __update_svc_load() as well.

Second, in balance_load(), cpus_max needs to be extended
in order to be correctly shifted, and the result compared
with an s_time_t value, without risking losing info.
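
The widening issue can be sketched as follows. s_time_sketch_t and
load_below_capacity() are hypothetical names, and the comparison is
only an illustration of the pattern, not the code in balance_load().

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_sketch_t;   /* stand-in for a signed 64-bit time type */

/*
 * Widen cpus_max before shifting, so that bits shifted past bit 31
 * are kept instead of being dropped by 32-bit arithmetic.
 */
static bool load_below_capacity(s_time_sketch_t load, unsigned int cpus_max,
                                unsigned int shift)
{
    s_time_sketch_t capacity = (s_time_sketch_t)cpus_max << shift;

    return load < capacity;
}
```

Without the cast, a value such as 8 shifted left by 31 in 32-bit
unsigned arithmetic wraps to 0, and the comparison gives the wrong
answer.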

Spotted by Coverity.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen: credit2: implement true SMT support
Dario Faggioli [Wed, 20 Jul 2016 09:55:55 +0000 (10:55 +0100)]
xen: credit2: implement true SMT support

In fact, right now, we recommend keeping runqueues
arranged per-core, so that it is the inter-runqueue load
balancing code that automatically spreads the work in an
SMT friendly way. This means that any other runq
arrangement one may want to use falls short of SMT
scheduling optimizations.

This commit implements SMT awareness --similar to the
one we have in Credit1-- for any possible runq
arrangement. This turned out to be pretty easy to do,
as the logic can live entirely in runq_tickle()
(although, in order to avoid for_each_cpu loops in
that function, we use a new cpumask which indeed needs
to be updated in other places).

In addition to disentangling SMT awareness from load
balancing, this also allows us to support the
sched_smt_power_savings parameter in Credit2 as well.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Anshul Makkar <anshul.makkar@citrix.com>
8 years agoxl: add option to leave domain paused after migration
Roger Pau Monne [Tue, 19 Jul 2016 08:58:15 +0000 (10:58 +0200)]
xl: add option to leave domain paused after migration

This is useful for debugging domains that crash on resume from migration.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: trigger attach events for devices attached before xl devd startup
Marek Marczykowski-Górecki [Fri, 15 Jul 2016 23:47:56 +0000 (01:47 +0200)]
libxl: trigger attach events for devices attached before xl devd startup

When this daemon is started after creating backend device, that device
will not be configured.

Racy situation:
1. driver domain is started
2. frontend domain is started (just after kicking driver domain off)
3. device in frontend domain is connected to the backend (as specified
   in frontend domain configuration)
4. xl devd is started in driver domain

End result is that backend device in driver domain is not configured
(like network interface is not enabled), so the device doesn't work.

Fix this by artificially triggering events for devices already present in
xenstore before xl devd is started. Do this only after xenstore watch is
already registered, and only for devices not already initialized (in
XenbusStateInitWait state).

Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoxenstore: add memory allocation debugging capability
Juergen Gross [Tue, 19 Jul 2016 12:08:18 +0000 (14:08 +0200)]
xenstore: add memory allocation debugging capability

Add support for debugging memory allocation statistics to xenstored.
Specifying "-M <file>" on the command line will enable the feature.
Whenever xenstored receives SIGUSR1 it will dump out a full talloc
report to <file>. This helps finding e.g. memory leaks in xenstored.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxenstore: use temporary memory context for firing watches
Juergen Gross [Tue, 19 Jul 2016 11:30:46 +0000 (13:30 +0200)]
xenstore: use temporary memory context for firing watches

Use a temporary memory context for memory allocations when firing
watches. This will avoid leaking memory in case of long living
connections and/or xenstore entries.

This requires adding a new parameter to fire_watches() and add_event()
to specify the memory context to use for allocations.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxenstore: add explicit memory context parameter to get_node()
Juergen Gross [Tue, 19 Jul 2016 11:30:45 +0000 (13:30 +0200)]
xenstore: add explicit memory context parameter to get_node()

Add a parameter to xenstored get_node() function to explicitly
specify the memory context to be used for allocations. This will make
it easier to avoid memory leaks by using a context which is freed
soon.

This requires adding the temporary context to errno_from_parents() and
ask_parents(), too.

When calling get_node() select a sensible memory context for the new
parameter by preferring a temporary one.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxenstore: add explicit memory context parameter to read_node()
Juergen Gross [Tue, 19 Jul 2016 11:30:44 +0000 (13:30 +0200)]
xenstore: add explicit memory context parameter to read_node()

Add a parameter to xenstored read_node() function to explicitly
specify the memory context to be used for allocations. This will make
it easier to avoid memory leaks by using a context which is freed
soon.

When calling read_node() select a sensible memory context for the new
parameter by preferring a temporary one.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxenstore: add explicit memory context parameter to get_parent()
Juergen Gross [Tue, 19 Jul 2016 11:30:43 +0000 (13:30 +0200)]
xenstore: add explicit memory context parameter to get_parent()

Add a parameter to xenstored get_parent() function to explicitly
specify the memory context to be used for allocations. This will make
it easier to avoid memory leaks by using a context which is freed
soon.

When available use a temporary context when calling get_parent(),
otherwise mimic the old behavior by calling get_parent() with the same
argument for both parameters.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoxenstore: call each xenstored command function with temporary context
Juergen Gross [Tue, 19 Jul 2016 11:30:42 +0000 (13:30 +0200)]
xenstore: call each xenstored command function with temporary context

In order to be able to avoid leaving temporary memory allocated after
processing of a command in xenstored call all command functions with
the temporary "in" context. Each function can then make use of that
temporary context for allocating temporary memory instead of either
leaving that memory allocated until the connection is dropped (or
even until end of xenstored) or freeing the memory itself.

This requires modifying the interfaces of the functions taking only
one argument from the connection by moving the call of onearg() into
the single functions. Other than that no functional change.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>