xenbits.xensource.com Git - people/liuw/libxenctrl-split/xen.git/log
people/liuw/libxenctrl-split/xen.git
x86: support 2- and 3-way alternatives
Jan Beulich [Thu, 4 Feb 2016 10:38:52 +0000 (11:38 +0100)]
x86: support 2- and 3-way alternatives

Parts taken from Linux, but implementing the ALTERNATIVE*() macros
recursively to avoid needless redundancy.

Also make the .discard section non-writable (we might even consider
dropping its alloc flag too) and limit the pushing and popping of
sections.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
x86/PV: fix unintended dependency of m2p-strict mode on migration-v2
Jan Beulich [Wed, 3 Feb 2016 13:12:00 +0000 (14:12 +0100)]
x86/PV: fix unintended dependency of m2p-strict mode on migration-v2

This went unnoticed until a backport of this to an older Xen got used,
causing migration of guests enabling this VM assist to fail, because
page table pinning there precedes vCPU context loading, and hence L4
tables get initialized for the wrong mode. Fix this by post-processing
L4 tables when setting the intended VM assist flags for the guest.

Note that this leaves in place a dependency on vCPU 0 getting its guest
context restored first, but afaict the logic here is not the only thing
depending on that.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
spinlock: fair read-write locks
Jennifer Herbert [Wed, 3 Feb 2016 13:10:33 +0000 (14:10 +0100)]
spinlock: fair read-write locks

The current rwlocks are write-biased and unfair.  This allows writers
to starve readers in situations where there are many writers (e.g.,
p2m type changes from log dirty updates during domain save).

Replace the current implementation with queued read-write locks which use
a fair spinlock (a ticket lock in this case) to ensure fairness between
readers and writers when they are contended.

This implementation is from the Linux commit 70af2f8a4f48 by Waiman
Long and Peter Zijlstra.

    locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks

    This rwlock uses the arch_spin_lock_t as a waitqueue, and assuming
    the arch_spin_lock_t is a fair lock (ticket,mcs etc..) the
    resulting rwlock is a fair lock.

    It fits in the same 8 bytes as the regular rwlock_t by folding the
    reader and writer count into a single integer, using the remaining
    4 bytes for the arch_spinlock_t.

    Architectures that can single-copy address bytes can optimize
    queue_write_unlock() with a 0 write to the LSB (the write count).

We do not yet make use of the architecture-specific optimization noted
above.
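
The folding of the reader and writer counts into a single integer can be
sketched as follows. This is only an illustration using C11 atomics, not the
Xen or Linux code: the sk_ names, the 8-bit writer field and the constants are
assumptions made for the example, and the ticket-lock queue that provides
fairness under contention is deliberately left out (only the uncontended
try-lock paths are shown).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed layout: low byte = writer state, remaining bits = reader count. */
#define SK_WRITER_LOCKED 0xffu
#define SK_WRITER_MASK   0xffu
#define SK_READER_BIAS   0x100u   /* one reader adds 1 above the low byte */

typedef struct { atomic_uint cnts; } sk_rwlock_t;

static void sk_rwlock_init(sk_rwlock_t *l) { atomic_init(&l->cnts, 0); }

/* Readers may enter only while the writer byte is clear. */
static bool sk_read_trylock(sk_rwlock_t *l)
{
    unsigned int old;

    if (atomic_load(&l->cnts) & SK_WRITER_MASK)
        return false;
    /* Optimistically add a reader, then re-check for a racing writer. */
    old = atomic_fetch_add(&l->cnts, SK_READER_BIAS);
    if (!(old & SK_WRITER_MASK))
        return true;
    atomic_fetch_sub(&l->cnts, SK_READER_BIAS);
    return false;
}

static void sk_read_unlock(sk_rwlock_t *l)
{
    unsigned int old = atomic_fetch_sub(&l->cnts, SK_READER_BIAS);

    assert(old >= SK_READER_BIAS);   /* there must have been a reader */
    (void)old;
}

/* A writer may enter only when there are no readers and no writer. */
static bool sk_write_trylock(sk_rwlock_t *l)
{
    unsigned int expected = 0;

    return atomic_compare_exchange_strong(&l->cnts, &expected,
                                          SK_WRITER_LOCKED);
}

/*
 * Generic unlock: subtract the writer value.  The optimization quoted
 * above would instead store 0 to the low byte only.
 */
static void sk_write_unlock(sk_rwlock_t *l)
{
    atomic_fetch_sub(&l->cnts, SK_WRITER_LOCKED);
}
```

Because both counts live in one word, a single atomic operation is enough to
observe "no writer" and register a reader, or to observe "nobody at all" and
become the writer.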

Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
spinlock: move rwlock API and per-cpu rwlocks into their own files
Jennifer Herbert [Wed, 3 Feb 2016 13:09:09 +0000 (14:09 +0100)]
spinlock: move rwlock API and per-cpu rwlocks into their own files

In preparation for a replacement read-write lock implementation, move
the API and the per-cpu read-write locks into their own files.

Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
pl011: Refactor pl011 driver to dt and common initialization parts
Shannon Zhao [Sat, 23 Jan 2016 08:00:20 +0000 (16:00 +0800)]
pl011: Refactor pl011 driver to dt and common initialization parts

Refactor pl011 driver to dt and common initialization parts. This will
be useful later when acpi specific uart initialization function is
introduced.

Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
arm/uart: Rename dt-uart.c to arm-uart.c
Shannon Zhao [Sat, 23 Jan 2016 08:00:19 +0000 (16:00 +0800)]
arm/uart: Rename dt-uart.c to arm-uart.c

Since we will add ACPI initialization for UART in this file later,
rename it with a generic name.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
arm/gic-v3: Refactor gicv3_init into generic and dt specific parts
Shannon Zhao [Thu, 28 Jan 2016 02:33:12 +0000 (10:33 +0800)]
arm/gic-v3: Refactor gicv3_init into generic and dt specific parts

Refactor gic-v3 related functions into dt and generic parts. This will be
helpful when adding acpi support for gic-v3.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
arm/gic-v2: Refactor gicv2_init into generic and dt specific parts
Shannon Zhao [Sat, 23 Jan 2016 08:00:17 +0000 (16:00 +0800)]
arm/gic-v2: Refactor gicv2_init into generic and dt specific parts

Refactor gic-v2 related functions into dt and generic parts. This will be
helpful when adding acpi support for gic.

Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
arm/smpboot: Move dt specific code in smp to seperate functions
Shannon Zhao [Sat, 23 Jan 2016 08:00:16 +0000 (16:00 +0800)]
arm/smpboot: Move dt specific code in smp to seperate functions

Partition smp initialization functions into generic and dt specific
parts. This will be useful when introducing new functions for smp
initialization based on acpi.

Signed-off-by: Parth Dixit <parth.dixit@linaro.org>
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
xenstore: add stddef.h to xenstore_lib.h
Ian Campbell [Wed, 27 Jan 2016 17:06:09 +0000 (17:06 +0000)]
xenstore: add stddef.h to xenstore_lib.h

xs_perm_to_string takes a size_t which isn't defined by anything
pulled in directly by this header.

Given the other headers xenstore_lib.h pulls in this looks to be an
oversight rather than a deliberate policy.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/libxl: improve logging on domain create failure.
Ian Campbell [Tue, 26 Jan 2016 14:38:46 +0000 (14:38 +0000)]
tools/libxl: improve logging on domain create failure.

A user reported[0] that xl create failed with just:
    libxl: error: libxl_create.c:892:initiate_domain_create: Unable to set domain build info defaults
and some resulting fallout, but without indicating why it was unable
to set the defaults, even in verbose mode[1].

Go through libxl__domain_{create,build}_info_setdefault and ensure
that each error path logs something.

In most cases this involved simply adding a call to LOG.

In two cases this involved switching from strdup to
libxl__strdup(NOGC) and removing the existing error handling.

When switching from qemu-xen to qemu-xen-traditional (because the
former is not available) log at level INFO rather than VERBOSE, so
the message would normally be printed. Also tweak the language here.

I'm not sure all these messages are reachable (some might be shadowed
by previous error paths) but it seems better to err on the side of
caution.

[0] http://lists.xen.org/archives/html/xen-users/2016-01/msg00125.html
[1] http://lists.xen.org/archives/html/xen-users/2016-01/msg00129.html

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: suse.dev@fea.st
tools: libxencall/foreignmemory: initialise handle->fd
Ian Campbell [Wed, 3 Feb 2016 10:09:42 +0000 (10:09 +0000)]
tools: libxencall/foreignmemory: initialise handle->fd

Otherwise the osdep close on the error path touches an uninitialised
variable.

CID: 1351231 (foreignmemory) and 1351230 (call)

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
libxenforeignmemory: handle partial failure correctly
Ian Campbell [Wed, 3 Feb 2016 10:10:01 +0000 (10:10 +0000)]
libxenforeignmemory: handle partial failure correctly

Coverity rightly points out that checking for ret == NULL and then
calling osdep unmap(ret) is wrong.

The intention on this code path is to turn partial failure into total
failure when the err argument is NULL, so we want to take this path
whenever ret is _non_ NULL (and err_to_free is set, indicating err was
NULL).

CID: 1351219

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
tools: xenconsole: cleanup when clock_gettime fails.
Ian Campbell [Wed, 3 Feb 2016 10:43:47 +0000 (10:43 +0000)]
tools: xenconsole: cleanup when clock_gettime fails.

All other error paths in the infinite loop in handle_io use break, so
as to free resources.

CID: 1351226

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
libxc: fix leak in xc_dom_load_hvm_kernel error path
Roger Pau Monne [Wed, 3 Feb 2016 10:59:57 +0000 (11:59 +0100)]
libxc: fix leak in xc_dom_load_hvm_kernel error path

Error path in xc_dom_load_hvm_kernel needs to use the 'error' label instead
of directly returning. This is needed so the entries local variable is
freed.

Coverity-ID: 1351227
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
libxl: Add CPU hotplug support for HVM domains without device model
Boris Ostrovsky [Tue, 2 Feb 2016 21:02:12 +0000 (16:02 -0500)]
libxl: Add CPU hotplug support for HVM domains without device model

HVMlite domains add/remove VCPUs by toggling the "availability" property in
xenstore.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
arm: p2m.c bug-fix: hypervisor hang on __p2m_get_mem_access
Corneliu ZUZU [Wed, 27 Jan 2016 12:24:35 +0000 (14:24 +0200)]
arm: p2m.c bug-fix: hypervisor hang on __p2m_get_mem_access

When __p2m_get_mem_access gets called, the p2m lock is already taken
by either get_page_from_gva or p2m_get_mem_access.

Possible code paths:
1) -> get_page_from_gva
-> p2m_mem_access_check_and_get_page
-> __p2m_get_mem_access
2) -> p2m_get_mem_access
-> __p2m_get_mem_access

In both cases if __p2m_get_mem_access subsequently gets to
call p2m_lookup (happens if !radix_tree_lookup(...)), a hypervisor
hang will occur, since p2m_lookup also spin-locks on the p2m lock.

This bug-fix simply replaces the p2m_lookup call from __p2m_get_mem_access
with a call to __p2m_lookup.

Following Ian's suggestion, we also add an ASSERT to ensure that
the p2m lock is taken upon __p2m_get_mem_access entry.
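
The pattern behind the fix can be sketched as follows. This is a toy model
under stated assumptions, not the ARM p2m code: all toy_ names are
hypothetical, the "lock" is a plain flag, and taking it while it is already
held trips an assertion where a real non-recursive spinlock would spin
forever (the hang described above).

```c
#include <assert.h>
#include <stdbool.h>

struct toy_p2m {
    bool locked;              /* stands in for the p2m spinlock */
    unsigned long table[4];   /* stands in for the translation tables */
};

static void toy_lock(struct toy_p2m *p)
{
    assert(!p->locked);       /* a real spinlock would deadlock here */
    p->locked = true;
}

static void toy_unlock(struct toy_p2m *p)
{
    assert(p->locked);
    p->locked = false;
}

/* Lock-free worker: the caller must already hold the lock. */
static unsigned long toy_lookup_locked(const struct toy_p2m *p,
                                       unsigned int idx)
{
    assert(p->locked);
    return p->table[idx % 4];
}

/* Public wrapper: takes the lock itself, so it must not be nested. */
static unsigned long toy_lookup(struct toy_p2m *p, unsigned int idx)
{
    unsigned long val;

    toy_lock(p);
    val = toy_lookup_locked(p, idx);
    toy_unlock(p);
    return val;
}

/* Runs with the lock held, so it may only call the _locked worker. */
static unsigned long toy_get_mem_access(struct toy_p2m *p, unsigned int idx)
{
    assert(p->locked);        /* mirrors the ASSERT added on entry */
    return toy_lookup_locked(p, idx);
}
```

Calling toy_lookup() from toy_get_mem_access() would re-take the lock and
fail; calling the _locked worker does not.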

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
xl: don't free additional memory on soft reset
Vitaly Kuznetsov [Thu, 28 Jan 2016 10:58:25 +0000 (11:58 +0100)]
xl: don't free additional memory on soft reset

We don't need to free anything extra from Dom0 in order to perform soft
reset. Trying to do so can even make the soft reset fail if this memory
(which we don't need) happens to be unavailable.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
tools/libxc: Provide evtchn_port_or_error_t for compat xenctrl interface
Andrew Cooper [Mon, 1 Feb 2016 11:08:03 +0000 (11:08 +0000)]
tools/libxc: Provide evtchn_port_or_error_t for compat xenctrl interface

c/s 2d2f789 "tools: rename libxc's evtchn_port_or_error_t with an xc_
prefix" doesn't cater for older applications which have requested
XC_WANT_COMPAT_EVTCHN_API.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
tools/libxl: run_helper - add #define for arguments.
Konrad Rzeszutek Wilk [Tue, 26 Jan 2016 21:31:00 +0000 (16:31 -0500)]
tools/libxl: run_helper - add #define for arguments.

Describe what the four (or more in the future) arguments
are for.

Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
libxc/xc_domain_resume: Update comment.
Konrad Rzeszutek Wilk [Tue, 26 Jan 2016 21:30:58 +0000 (16:30 -0500)]
libxc/xc_domain_resume: Update comment.

Update the comment to hopefully clarify what it meant. Also point out that
the mechanism by which the return value of 1 is produced relies on intimate
knowledge of the hypercall ABI (i.e. which register, eax, holds the return
value).

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
libxl: Use libxl_strdup instead of strdup on libxl_version_info
Konrad Rzeszutek Wilk [Tue, 26 Jan 2016 21:30:57 +0000 (16:30 -0500)]
libxl: Use libxl_strdup instead of strdup on libxl_version_info

The change is a simple replacement of raw strdup with a libxl variant.
The benefit is that the libxl variant has the extra behaviour of
abort-on-alloc-fail, which improves error handling.

libxl_version_info is a bit odd: it is a public function and, as libxl.h
mentions, callers of libxl_ public functions need to call the appropriate
_dispose() function.

"However libxl_get_version_info() is special and returns a cached
result from the ctx which cannot and should not be freed (as evidenced
by it returning a const struct). This data is freed in libxl_ctx_free()
by calling libxl_version_info_dispose(). This is why none of the callers
remember to free -- they shouldn't be doing so." (Ian Campbell)

So the patch makes sure to use the NOGC.

Suggested-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
xen/arm: drop hip04 support
Zoltan Kiss [Tue, 2 Feb 2016 13:13:04 +0000 (13:13 +0000)]
xen/arm: drop hip04 support

This platform is no longer actively used, but it makes GICv2 development
harder.

Signed-off-by: Zoltan Kiss <zoltan.kiss@linaro.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
credit: recalculate per-cpupool credits when updating timeslice
Juergen Gross [Tue, 2 Feb 2016 13:03:40 +0000 (14:03 +0100)]
credit: recalculate per-cpupool credits when updating timeslice

When modifying the timeslice of the credit scheduler in a cpupool the
cpupool global credit value (n_cpus * credits_per_tslice) isn't
recalculated. This will lead to wrong scheduling decisions later.

Do the recalculation when updating the timeslice.
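
The recalculation can be sketched as below. The structure, its field names
and the credits-per-millisecond constant are assumptions made for the
example, not the scheduler's actual definitions; the only thing taken from
the text above is the relation credit = n_cpus * credits_per_tslice.

```c
#include <assert.h>

#define SK_CREDITS_PER_MSEC 10u   /* assumed constant for the example */

struct sk_sched_priv {
    unsigned int ncpus;               /* CPUs in the cpupool */
    unsigned int tslice_ms;           /* timeslice length */
    unsigned int credits_per_tslice;  /* per-CPU credits per timeslice */
    unsigned int credit;              /* pool-wide: ncpus * credits_per_tslice */
};

static void sk_set_tslice(struct sk_sched_priv *prv, unsigned int tslice_ms)
{
    assert(tslice_ms > 0);
    prv->tslice_ms = tslice_ms;
    prv->credits_per_tslice = SK_CREDITS_PER_MSEC * tslice_ms;
    /* The step that was missing: keep the pool-wide total in sync. */
    prv->credit = prv->ncpus * prv->credits_per_tslice;
}
```

Without the last assignment, credit would keep reflecting the old timeslice
until something else happened to recompute it.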

Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Alan.Robinson <alan.robinson@ts.fujitsu.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
credit: update timeslice under lock
Juergen Gross [Tue, 2 Feb 2016 13:03:06 +0000 (14:03 +0100)]
credit: update timeslice under lock

When updating the timeslice of the credit scheduler, protect the
scheduler's private data with its lock. Today a possible race could
result only in some weird scheduling decisions during one timeslice,
but further adjustments will need the lock anyway.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
x86/hvm: fix use-after-free introduced by c/s 428607a
Andrew Cooper [Tue, 2 Feb 2016 13:02:37 +0000 (14:02 +0100)]
x86/hvm: fix use-after-free introduced by c/s 428607a

c/s 428607a "x86: shrink 'struct domain', was already PAGE_SIZE" introduced a
use-after-free error during domain destruction, because of the order in which
timers are torn down.

  (XEN) Xen call trace:
  (XEN)    [<ffff82d08013344e>] spinlock.c#check_lock+0x1e/0x40
  (XEN)    [<ffff82d08013349b>] _spin_lock+0x11/0x52
  (XEN)    [<ffff82d0801e8076>] vpt.c#pt_lock+0x24/0x40
  (XEN)    [<ffff82d0801e88f4>] destroy_periodic_time+0x18/0x81
  (XEN)    [<ffff82d0801e1089>] rtc_deinit+0x53/0x78
  (XEN)    [<ffff82d0801d1e5a>] hvm_domain_destroy+0x52/0x69
  (XEN)    [<ffff82d08016a758>] arch_domain_destroy+0x1a/0x98
  (XEN)    [<ffff82d080107cd5>] domain.c#complete_domain_destroy+0x6f/0x182
  (XEN)    [<ffff82d080126a19>] rcupdate.c#rcu_process_callbacks+0x144/0x1a6
  (XEN)    [<ffff82d080132c52>] softirq.c#__do_softirq+0x82/0x8d
  (XEN)    [<ffff82d080132caa>] do_softirq+0x13/0x15
  (XEN)    [<ffff82d080248ae1>] entry.o#process_softirqs+0x21/0x30
  (XEN)
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 3:
  (XEN) GENERAL PROTECTION FAULT
  (XEN) [error_code=0000]
  (XEN) ****************************************

Defer the freeing of d->arch.hvm_domain.pl_time until all timers have been
destroyed.

For safety, NULL out the pointers after freeing them, in an attempt to make
mistakes more obvious in the future.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86: only check for two watchdog NMIs
David Vrabel [Tue, 2 Feb 2016 13:01:57 +0000 (14:01 +0100)]
x86: only check for two watchdog NMIs

Since the NMI handler can now recognize watchdog NMIs, make
check_nmi_watchdog() only check for at least two watchdog NMIs.  This
prevents false negatives caused by other processors (which may be
being power managed by the BIOS) running at reduced clock frequencies.

We check for more than one NMI since there are apparently systems
where the NMI works only once.

This will also slightly speed up boot times since we only wait the
full 10 ticks if the NMI watchdog on one or more CPUs is not working.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/hvm: don't intercept #UD exceptions in general
Andrew Cooper [Tue, 2 Feb 2016 13:01:29 +0000 (14:01 +0100)]
x86/hvm: don't intercept #UD exceptions in general

c/s 0f1cb96e "x86 hvm: Allow cross-vendor migration" caused HVM domains to
unconditionally intercept #UD exceptions.  While cross-vendor migration is
cool as a demo, it is extremely niche.

Intercepting #UD allows userspace code in a multi-vcpu guest to execute
arbitrary instructions in the x86 emulator by having one thread execute a ud2a
instruction, and having a second thread rewrite the instruction before the
emulator performs an instruction fetch.

XSAs 105, 106 and 110 are all examples where guest userspace can use bugs in
the x86 emulator to compromise security of the domain, either by privilege
escalation or causing a crash.

c/s 2d67a7a4 "x86: synchronize PCI config space access decoding"
introduced (amongst other things) a per-domain vendor, based on the guests
cpuid policy.

Use the per-guest vendor to enable #UD interception only when a domain is
configured for a vendor different to the current hardware.  (#UD interception
is also enabled if hvm_fep is specified on the Xen command line.  This is a
debug-only option whose entire purpose is for testing the x86 emulator.)

As a result, the overwhelming majority of usecases now have #UD interception
disabled, removing an attack surface for malicious guest userspace.
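
The resulting policy can be sketched as a single predicate. The enum and
function names are hypothetical and stand in for the per-domain vendor, the
hardware vendor and the hvm_fep command line option mentioned above; this is
not the code added by the patch.

```c
#include <assert.h>
#include <stdbool.h>

enum sk_vendor { SK_VENDOR_UNKNOWN, SK_VENDOR_INTEL, SK_VENDOR_AMD };

/*
 * Intercept #UD only for cross-vendor guests, or when the debug-only
 * forced-emulation option is enabled.
 */
static bool sk_intercept_ud(enum sk_vendor guest, enum sk_vendor host,
                            bool fep_enabled)
{
    assert(host != SK_VENDOR_UNKNOWN);   /* the hardware vendor is known */
    return fep_enabled || guest != host;
}
```

For the common case (same vendor, no hvm_fep) the predicate is false, so
guest userspace cannot reach the emulator through #UD.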

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
x86/vmx: don't clobber exception_bitmap when entering/leaving emulated real mode
Andrew Cooper [Tue, 2 Feb 2016 13:00:52 +0000 (14:00 +0100)]
x86/vmx: don't clobber exception_bitmap when entering/leaving emulated real mode

Most updates to the exception bitmap set or clear individual bits.

However, entering or exiting emulated real mode unilaterally clobbers it,
leaving the exit code to recalculate what it should have been.  This is error
prone, and indeed currently fails to recalculate the TRAP_no_device intercept
appropriately.

Instead of overwriting exception_bitmap when entering emulated real mode, move
the override into vmx_update_exception_bitmap() and leave exception_bitmap
unmodified.

This means that recalculation is unnecessary, and that the use of
vmx_fpu_leave() and vmx_update_debug_state() while in emulated real mode
doesn't result in TRAP_no_device and TRAP_int3 being un-intercepted.

This is only a functional change on hardware lacking unrestricted guest
support.
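
The idea of moving the override into the update function can be sketched as
follows. The structure and sk_ names are assumptions for illustration, and a
plain field (hw_bitmap) stands in for the value that would be written to the
VMCS; only the vector numbers for #BP (3) and #NM (7) are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_TRAP_INT3      3u
#define SK_TRAP_NO_DEVICE 7u

struct sk_vcpu {
    bool     realmode_emulation;  /* emulated real mode active? */
    uint32_t exception_bitmap;    /* intended intercepts, never clobbered */
    uint32_t hw_bitmap;           /* stands in for the VMCS field */
};

/* Single place where the real-mode override is applied. */
static void sk_update_exception_bitmap(struct sk_vcpu *v)
{
    v->hw_bitmap = v->realmode_emulation ? UINT32_C(0xffffffff)
                                         : v->exception_bitmap;
}

static void sk_set_intercept(struct sk_vcpu *v, unsigned int trap, bool on)
{
    assert(trap < 32);
    if (on)
        v->exception_bitmap |= UINT32_C(1) << trap;
    else
        v->exception_bitmap &= ~(UINT32_C(1) << trap);
    sk_update_exception_bitmap(v);
}

static void sk_set_realmode(struct sk_vcpu *v, bool on)
{
    v->realmode_emulation = on;
    sk_update_exception_bitmap(v);   /* no recalculation of the bitmap */
}
```

Individual intercept changes made while in emulated real mode are recorded
in exception_bitmap and simply take effect once real mode is left.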

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
x86: shrink 'struct domain', was already PAGE_SIZE
Corneliu ZUZU [Mon, 1 Feb 2016 13:00:30 +0000 (14:00 +0100)]
x86: shrink 'struct domain', was already PAGE_SIZE

The X86 domain structure already occupied PAGE_SIZE (4096).

Looking @ the memory layout of the structure, we could see that
overall most was occupied by (used the pahole tool on domain.o):
 * sizeof(domain.arch) = sizeof(arch_domain) = 3328 bytes.
 * sizeof(domain.arch.hvm_domain) = 2224 bytes.
 * sizeof(domain.arch.hvm_domain.pl_time) = 1088 bytes.
This patch attempts to free some space, by making the pl_time
field in hvm_domain dynamically allocated.
We xzalloc/xfree it @ hvm_domain_initialise/hvm_domain_destroy.

After this change, the domain structure shrank by 1152 bytes (>1K!).

Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
MAINTAINERS: cover non-x86 vm_event files
Razvan Cojocaru [Mon, 1 Feb 2016 12:59:46 +0000 (13:59 +0100)]
MAINTAINERS: cover non-x86 vm_event files

This patch covers modifications to xen/arch/*/vm_event.c, in order
to include ARM vm_event maintainership.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
acpi: refactor acpi_os_map_memory to be architecturally independent
Shannon Zhao [Mon, 1 Feb 2016 12:56:54 +0000 (13:56 +0100)]
acpi: refactor acpi_os_map_memory to be architecturally independent

The first Mb handling is not necessary and the attribute of __vmap() is
different for ARM. Factor the first Mb handling only for x86 and define
a mapping attribute for each architecture.

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
ACPI: add config for BIOS table scan
Graeme Gregory [Mon, 1 Feb 2016 12:56:16 +0000 (13:56 +0100)]
ACPI: add config for BIOS table scan

With the addition of ARM64 that does not have a traditional BIOS to
scan, add a config option which is selected on x86 (ia64 doesn't need
it either, it is EFI/UEFI based system) to do the traditional BIOS
scanning for tables.

Signed-off-by: Graeme Gregory <graeme.gregory@linaro.org>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit 8a1664be0b922dd6afd60eca96a992ef5ec22c40]
[Include <xen/kconfig.h> in osl.c so that it could use IS_ENABLED]
Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Kconfig: import kconfig.h from Linux 4.3
Shannon Zhao [Mon, 1 Feb 2016 12:55:51 +0000 (13:55 +0100)]
Kconfig: import kconfig.h from Linux 4.3

To support using CONFIG_ options in C/CPP expressions, import kconfig.h
from the Linux v4.3 tag (commit id
6a13feb9c82803e2b815eca72fa7a9f5561d7861).
Only import IS_ENABLED for Xen since Xen doesn't support loadable
modules.
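
How a CONFIG_ option can be tested in a C expression is sketched below. The
macros follow the well-known preprocessor placeholder trick but are written
with SK_ names for this example; they are not a verbatim copy of the
imported header, and CONFIG_SKETCH_FEATURE / CONFIG_SKETCH_MISSING are
made-up options.

```c
#include <assert.h>

/* Options are either defined to 1 or not defined at all. */
#define CONFIG_SKETCH_FEATURE 1
/* CONFIG_SKETCH_MISSING is intentionally left undefined. */

/*
 * If the option expands to 1, SK_ARG_PLACEHOLDER_1 turns into "0," and
 * shifts the literal 1 into the second argument slot; otherwise the
 * pasted token is junk and the second slot holds 0.
 */
#define SK_ARG_PLACEHOLDER_1 0,
#define SK_IS_ENABLED(cfg)              SK_ENABLED_1(cfg)
#define SK_ENABLED_1(value)             SK_ENABLED_2(SK_ARG_PLACEHOLDER_##value)
#define SK_ENABLED_2(arg1_or_junk)      SK_ENABLED_3(arg1_or_junk 1, 0, 0)
#define SK_ENABLED_3(ignored, val, ...) val

/* The result is a constant, so it also works at compile time. */
static_assert(SK_IS_ENABLED(CONFIG_SKETCH_FEATURE) == 1,
              "a defined option must read as 1");

static int sk_feature_compiled_in(void)
{
    /* Both branches are parsed; the dead one is optimized away. */
    if (SK_IS_ENABLED(CONFIG_SKETCH_FEATURE))
        return 1;
    return 0;
}
```

Unlike #ifdef, code guarded this way is still seen by the compiler when the
option is off, so it keeps being syntax- and type-checked.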

Signed-off-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
x86: convert shadow-paging to Kconfig
Andrew Cooper [Mon, 1 Feb 2016 12:54:46 +0000 (13:54 +0100)]
x86: convert shadow-paging to Kconfig

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Tim Deegan <tim@xen.org>
x86/xstate: extend validation to cover full header
Jan Beulich [Mon, 1 Feb 2016 12:54:09 +0000 (13:54 +0100)]
x86/xstate: extend validation to cover full header

Since we never hand out compacted state, at least for now we're also
not going to accept such.

Reported-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/xstate: fix fault behavior on XRSTORS
Jan Beulich [Mon, 1 Feb 2016 12:53:40 +0000 (13:53 +0100)]
x86/xstate: fix fault behavior on XRSTORS

XRSTORS unconditionally faults when xcomp_bv has bit 63 clear. Instead
of just fixing this issue, overhaul the fault recovery code, which -
one of the many mistakes made when xstate support got introduced - was
blindly mirroring that accompanying FXRSTOR, neglecting the fact that
XRSTOR{,S} aren't all-or-nothing instructions. The new code, first of
all, does all the recovery actions in C, simplifying the inline
assembly used. And it does its work in a multi-stage fashion: Upon
first seeing a fault, state fixups get applied strictly based on what
architecturally may cause #GP. When seeing another fault despite the
fixups done, state gets fully reset. A third fault would then lead to
crashing the domain (instead of hanging the hypervisor in an infinite
loop of recurring faults).
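
The multi-stage recovery can be sketched as a small state machine. This is a
model under stated assumptions, not the xstate code: the sk_ names are
hypothetical, the restore attempt is a stub that "faults" depending on three
flags, and the fixup and reset steps just set those flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sk_outcome {
    SK_RESTORED,
    SK_RESTORED_AFTER_FIXUP,
    SK_RESTORED_AFTER_RESET,
    SK_DOMAIN_CRASHED
};

struct sk_state {
    bool header_ok;     /* what the targeted fixups can repair */
    bool contents_ok;   /* what only a full reset repairs */
    bool always_fault;  /* models a fault that no recovery step cures */
};

/* Stand-in for the restore instruction: false means "it faulted". */
static bool sk_try_restore(const struct sk_state *s)
{
    return !s->always_fault && s->header_ok && s->contents_ok;
}

static enum sk_outcome sk_restore(struct sk_state *s)
{
    static const enum sk_outcome on_success[] = {
        SK_RESTORED, SK_RESTORED_AFTER_FIXUP, SK_RESTORED_AFTER_RESET
    };
    unsigned int faults;

    assert(s != NULL);
    for (faults = 0; faults < 3; ++faults) {
        if (sk_try_restore(s))
            return on_success[faults];
        if (faults == 0) {
            s->header_ok = true;        /* first fault: targeted fixups */
        } else if (faults == 1) {
            s->header_ok = true;        /* second fault: full reset */
            s->contents_ok = true;
        }
    }
    return SK_DOMAIN_CRASHED;           /* third fault: give up */
}
```

The loop is bounded, so a fault that survives both recovery steps ends in a
domain crash rather than an endless retry.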

Reported-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86: adjust xsave structure attributes
Jan Beulich [Mon, 1 Feb 2016 12:53:16 +0000 (13:53 +0100)]
x86: adjust xsave structure attributes

The packed attribute was pointlessly used here - there are no
misaligned fields, and hence even if the attribute took effect, it
would at best lead to the compiler generating worse code.

At the same time specify the required alignment of the fpu_sse sub-
structure, such that the various typeof() uses on that field obtain
pointers to properly aligned memory (knowledge which a compiler may
want to make use of).

Also add suitable build-time checks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/xstate: fix xcomp_bv initialization
Jan Beulich [Mon, 1 Feb 2016 12:52:50 +0000 (13:52 +0100)]
x86/xstate: fix xcomp_bv initialization

We must not clear the compaction bit when using XSAVES/XRSTORS. And
we need to guarantee that xcomp_bv never has any bits clear which
are set in xstate_bv (which requires partly undoing commit 83ae0bb226
["x86/xsave: simplify xcomp_bv initialization"]). Split initialization
of xcomp_bv from the other FPU/SSE/AVX related state setup in
arch_set_info_guest() and hvm_load_cpu_ctxt().
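
The two invariants stated above, for the XSAVES/XRSTORS case, can be
sketched as follows. The helper names are hypothetical; bit 63 as the
compaction bit matches the XRSTORS description elsewhere in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_XSTATE_COMPACTION_ENABLED (UINT64_C(1) << 63)

/* Build xcomp_bv for the compacted format from the components in use. */
static uint64_t sk_make_xcomp_bv(uint64_t xstate_bv)
{
    /* Bit 63 is not a state component, so it cannot be set in xstate_bv. */
    assert(!(xstate_bv & SK_XSTATE_COMPACTION_ENABLED));
    return xstate_bv | SK_XSTATE_COMPACTION_ENABLED;
}

/* Check both invariants: compaction bit set, and xstate_bv is a subset. */
static bool sk_xcomp_bv_ok(uint64_t xstate_bv, uint64_t xcomp_bv)
{
    if (!(xcomp_bv & SK_XSTATE_COMPACTION_ENABLED))
        return false;
    return (xstate_bv & ~xcomp_bv) == 0;
}
```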

Reported-by: Harmandeep Kaur <write.harmandeep@gmail.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
vm_event: make sure the domain is paused in key domctls
Razvan Cojocaru [Mon, 1 Feb 2016 12:51:14 +0000 (13:51 +0100)]
vm_event: make sure the domain is paused in key domctls

This patch pauses the domain for all writes through the 'ad'
pointer in monitor_domctl(), defers a domain_unpause() call until
after the CRs are updated for the MONITOR_EVENT_WRITE_CTRLREG
case, and makes sure that the domain is paused for both vm_event
enable and disable cases in vm_event_domctl().
Thanks go to Andrew Cooper for his review and suggestions.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
x86/HVMlite: document the BSP/AP boot ABI
Roger Pau Monné [Mon, 1 Feb 2016 12:50:52 +0000 (13:50 +0100)]
x86/HVMlite: document the BSP/AP boot ABI

The discussion in [1] led to an agreement on the pieces missing from PVH
(or HVM without a device-model) in order to progress with its
implementation.

One of the missing pieces is a new boot ABI that replaces the PV boot
ABI. The aim of this new boot ABI is to remove the limitations of the
PV boot ABI that are no longer present when using auto-translated
guests. The new boot protocol should allow the same entry point to be
used for both 32bit and 64bit guests, and let the guest choose its
bitness and paging mode at run time without the domain builder knowing
in advance.

This patch introduces a new document called hvmlite.markdown, with the
intention of merging it into pvh.markdown once the HVMlite implementation
has feature parity with PVH and the old PVH ABI is replaced with the
HVMlite one.

[1] http://lists.xen.org/archives/html/xen-devel/2015-06/msg00258.html

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
spinlock: shrink struct lock_debug
Jennifer Herbert [Fri, 29 Jan 2016 16:52:23 +0000 (17:52 +0100)]
spinlock: shrink struct lock_debug

Reduce the size of struct lock_debug so increases in other lock
structures don't increase the size of struct domain too much.

Signed-off-by: Jennifer Herbert <jennifer.herbert@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
atomic: replace atomic_compareandswap() with atomic_cmpxchg()
David Vrabel [Fri, 29 Jan 2016 16:51:15 +0000 (17:51 +0100)]
atomic: replace atomic_compareandswap() with atomic_cmpxchg()

atomic_compareandswap() used atomic_t as the new, old and returned
values which is less convenient than using just int.
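
Why plain int values are more convenient shows up in a typical retry loop.
The sketch below uses C11 atomics and hypothetical sk_ names; it mimics the
shape of an int-based cmpxchg (return the value found, swap only if it
matched) and is not Xen's implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Returns the value found in *v.  The swap happened if and only if the
 * returned value equals old.
 */
static int sk_atomic_cmpxchg(atomic_int *v, int old, int new_val)
{
    /* On failure, old is overwritten with the current value of *v. */
    atomic_compare_exchange_strong(v, &old, new_val);
    return old;
}

/* Add delta to *v unless it currently equals limit; 1 if added, else 0. */
static int sk_add_unless(atomic_int *v, int delta, int limit)
{
    int cur = atomic_load(v);

    assert(delta != 0);
    for (;;) {
        int seen;

        if (cur == limit)
            return 0;
        seen = sk_atomic_cmpxchg(v, cur, cur + delta);
        if (seen == cur)
            return 1;
        cur = seen;          /* lost a race: retry with the fresh value */
    }
}
```

The returned int feeds straight back into the next comparison, with no
wrapping and unwrapping of an atomic_t.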

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
reset runstate_guest handles on soft reset
Vitaly Kuznetsov [Fri, 29 Jan 2016 16:50:41 +0000 (17:50 +0100)]
reset runstate_guest handles on soft reset

runstate_guest handles need to be reset to prevent update_runstate_area()
corrupting guest's memory after we resume the guest.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
x86/vm_event: reset monitor in vm_event_cleanup_domain()
Razvan Cojocaru [Fri, 29 Jan 2016 16:50:05 +0000 (17:50 +0100)]
x86/vm_event: reset monitor in vm_event_cleanup_domain()

It is currently possible to leave a monitor flag enabled even
after vm_event_cleanup_domain() has been called, potentially
leading to a crash in hvm_msr_write_intercept() and hvm_set_crX()
(when v->arch.vm_event has become NULL, but the corresponding
v->domain->arch.monitor flag is non-zero).
This patch zeroes out arch.monitor in vm_event_cleanup_domain().

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
x86/HVM: differentiate IO/mem resources tracked by ioreq server
Shuai Ruan [Fri, 29 Jan 2016 16:49:11 +0000 (17:49 +0100)]
x86/HVM: differentiate IO/mem resources tracked by ioreq server

Currently in ioreq server, guest write-protected ram pages are
tracked in the same rangeset as device mmio resources. Yet
unlike device mmio, which can be in big chunks, the guest
write-protected pages may be discrete ranges of 4K bytes each. This
patch uses a separate rangeset for the guest ram pages.

To differentiate the ioreq type between the write-protected memory
ranges and the mmio ranges when selecting an ioreq server, the p2m
type is retrieved by calling get_page_from_gfn(). And we do not
need to worry about the p2m type change during the ioreq selection
process.

Note: Previously, a new hypercall or subop was suggested to map
write-protected pages into ioreq server. However, it turned out that
the handler of this new hypercall would be almost the same as the
existing pair, HVMOP_[un]map_io_range_to_ioreq_server, and there's
already a type parameter in this hypercall. So no new hypercall is
defined; only a new type is introduced.

Signed-off-by: Shuai Ruan <shuai.ruan@linux.intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
arm: clean up build variables
Doug Goldstein [Wed, 20 Jan 2016 21:47:59 +0000 (15:47 -0600)]
arm: clean up build variables

This consolidates some of the different variables used for the ARM
builds. This change was prompted by the Kconfig changes but looking back
in time the CONFIG_ARM_{32,64} variables existed before Kconfig so this
should just be a generic cleanup.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- switch from ifdef X to ifeq (X,y) in one place as discussed ]

9 years agolibxl: make GC_FREE reachable in libxl_get_scheduler()
Chester Lin [Mon, 25 Jan 2016 00:45:51 +0000 (19:45 -0500)]
libxl: make GC_FREE reachable in libxl_get_scheduler()

Coverity CID 1343309

Make GC_FREE reachable in all cases in libxl_get_scheduler() by
eliminating the error-path return and instead storing the error code in
the returned variable.

To make this semantically consistent, change the return type of
libxl_get_scheduler() from libxl_scheduler to int, and make a note of
the interpretation of the return value in libxl.h.  N.B. This change
does not change the API in a way that affects functionality.

The libxl_scheduler enum is consistent with the sched_id return value
of xc_sched_id and this must continue to be true.
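
The single-exit shape described above can be sketched as follows. This is
only an illustration, not the libxl code: query_sched_id(), get_scheduler(),
ERR_FAIL and the cleanup_calls counter (standing in for GC_FREE) are all
hypothetical names.

```c
#define ERR_FAIL (-3)          /* illustrative error code, value assumed */

static int cleanup_calls;      /* counts how often the cleanup step ran */

/* Hypothetical stand-in for xc_sched_id(): 0 on success, -1 on failure. */
static int query_sched_id(int fail, int *sched_id)
{
    if (fail)
        return -1;
    *sched_id = 5;
    return 0;
}

/* Returns a scheduler id (>= 0) or a negative error code. */
static int get_scheduler(int fail)
{
    int r, sched, ret;

    r = query_sched_id(fail, &sched);
    if (r != 0) {
        ret = ERR_FAIL;        /* store the error instead of returning here */
        goto out;
    }
    ret = sched;

 out:
    cleanup_calls++;           /* stands in for GC_FREE; reached on every path */
    return ret;
}
```

Because the error is carried in the return value, the return type has to be
able to hold both scheduler ids and negative error codes, which is why an
int is used here rather than the enum.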

Suggested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Chester Lin <czylin@uwaterloo.ca>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: tidy libxl_get_scheduler() according to CODING_STYLE
Chester Lin [Mon, 25 Jan 2016 00:45:34 +0000 (19:45 -0500)]
libxl: tidy libxl_get_scheduler() according to CODING_STYLE

To more closely follow the guidelines in CODING_STYLE, store the result
of xc_sched_id() in the local variable r, and then check the result of
the call in a separate statement.  Change the type of the output
parameter given to xc_sched_id() from libxl_scheduler to int to match
the libxc interface.

Additionally, change the error log statement to more accurately reflect
the failure.  This is the only functional change introduced by this
patch.

Suggested-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Chester Lin <czylin@uwaterloo.ca>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: update comment to no longer mention Xen 4.3
Roger Pau Monne [Mon, 25 Jan 2016 15:25:30 +0000 (16:25 +0100)]
libxl: update comment to no longer mention Xen 4.3

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxenalyze: remove cr3_compare_total
Ian Campbell [Fri, 22 Jan 2016 14:27:29 +0000 (14:27 +0000)]
xenalyze: remove cr3_compare_total

gcc-6 complains:
xenalyze.c:4132:9: error: 'cr3_compare_total' defined but not used [-Werror=unused-function]
     int cr3_compare_total(const void *_a, const void *_b) {
         ^~~~~~~~~~~~~~~~~

I believe it is correct.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agoxenalyze: fix misleading indentation.
Ian Campbell [Fri, 22 Jan 2016 14:27:28 +0000 (14:27 +0000)]
xenalyze: fix misleading indentation.

gcc-6 adds -Wmisleading-indentation which found these issues.

xenalyze.c: In function 'weighted_percentile':
xenalyze.c:2136:18: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
             L=I; L_weight = I_weight;
                  ^~~~~~~~

xenalyze.c:2135:9: note: ...this 'if' clause, but it is not
         if(J_weight<K_weight)
         ^~

xenalyze.c:2138:18: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
             R=J; R_weight = J_weight;
                  ^~~~~~~~

xenalyze.c:2137:9: note: ...this 'if' clause, but it is not
         if(K_weight<I_weight)
         ^~

xenalyze.c: In function 'self_weighted_percentile':
xenalyze.c:2215:18: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
             L=I; L_weight = I_weight;
                  ^~~~~~~~

xenalyze.c:2214:9: note: ...this 'if' clause, but it is not
         if(J_weight<K_weight)
         ^~

xenalyze.c:2217:18: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
             R=J; R_weight = J_weight;
                  ^~~~~~~~

xenalyze.c:2216:9: note: ...this 'if' clause, but it is not
         if(K_weight<I_weight)
         ^~

I've modified according to what I think the intention is, i.e. added braces
rather than moving the line in question out a level.
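
The effect of adding braces can be sketched like this. The struct and
function names are hypothetical and only mirror the shape of the flagged
statements, they are not taken from xenalyze.c.

```c
struct bound {
    int index;
    long weight;
};

/*
 * Before the fix the code had the shape
 *     if (cond)
 *         b->index = i; b->weight = w;
 * so only the first assignment was guarded by the 'if' and the second
 * one always ran. With braces both assignments are guarded.
 */
static void update_bound(struct bound *b, int cond, int i, long w)
{
    if (cond) {
        b->index = i;
        b->weight = w;
    }
}
```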

I have only build tested the result.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agopublic/io/netif.h: change semantics of "request-multicast-control" flag
Paul Durrant [Wed, 20 Jan 2016 12:50:49 +0000 (12:50 +0000)]
public/io/netif.h: change semantics of "request-multicast-control" flag

My patch b2700877 "move and amend multicast control documentation"
clarified use of the multicast control protocol between frontend and
backend. However, it transpires that the restrictions that documentation
placed on the "request-multicast-control" flag make it hard for a
frontend to enable 'all multicast' promiscuous mode, in that to do so
would require the frontend and backend to disconnect and re-connect.

This patch adds a new "feature-dynamic-multicast-control" flag to allow
a backend to advertise that it will watch "request-multicast-control" hence
allowing it to be meaningfully modified by the frontend at any time rather
than only when the frontend and backend are disconnected.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Keir Fraser <keir@xen.org>
Cc: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools: avoid redefining xenevtchn_handle typedef for xc_suspend_*
Ian Campbell [Mon, 25 Jan 2016 17:10:49 +0000 (17:10 +0000)]
tools: avoid redefining xenevtchn_handle typedef for xc_suspend_*

Similar to the previous xentoollog case this is not allowed. Switch to
a forward decl of the struct and use of it in the APIs.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools: rename libxc's evtchn_port_or_error_t with an xc_ prefix
Ian Campbell [Mon, 25 Jan 2016 17:10:48 +0000 (17:10 +0000)]
tools: rename libxc's evtchn_port_or_error_t with an xc_ prefix

This is used only for xc_evtchn_alloc_unbound and the legacy/compat
versions of the old interfaces and avoids redefining the typedef. The
evtchn_port_or_error_t name is now used only by libxenevtchn.

None of the callers of xc_evtchn_alloc_unbound use the type
themselves.

NB xc_evtchn_alloc_unbound differs from xc_evtchn_bind_unbound_port
and the underlying xenevtchn_bind_unbound_port in that it allows the
specification of the local domain rather than assuming self. This is
only useful during domain build.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools: avoid redefinition of typedefs
Ian Campbell [Mon, 25 Jan 2016 15:29:21 +0000 (15:29 +0000)]
tools: avoid redefinition of typedefs

When splitting out various functionality from libxc into tools/libs/*
I attempted to make it possible to avoid callers being unnecessarily
exposed to the xentoollog interface by providing a typedef of the
xentoollog_logger handle in each of the headers.

However such typedefs are not allowed in C, instead it is necessary to
forward declare the struct and then use the struct xentoollog_logger
variant in the prototypes.
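
A minimal sketch of the forward declaration approach, with the contents of
several headers flattened into one file. The demo_* names are hypothetical
and stand in for the real logger type and library entry points.

```c
/* "Header A": forward-declare the struct, no typedef. */
struct demo_logger;
int demo_a_open(struct demo_logger *logger);

/* "Header B": repeating a forward declaration is always legal C. */
struct demo_logger;
int demo_b_open(struct demo_logger *logger);

/*
 * By contrast, repeating
 *     typedef struct demo_logger demo_logger;
 * in both headers is a typedef redefinition, which is not permitted
 * before C11 and is rejected by stricter or older compilers.
 */

/* The defining header supplies the full type. */
struct demo_logger {
    int level;
};

int demo_a_open(struct demo_logger *logger)
{
    return logger ? logger->level : -1;
}

int demo_b_open(struct demo_logger *logger)
{
    return logger != 0;
}
```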

It appears that older gcc (e.g. 4.4) complains about this issue while
newer ones (e.g. 4.9) are more tolerant unless -pedantic-errors is
used, this was a deliberate change
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=ce3765bf44e49ef0568a1ad4a0b7f807591d6412

As well as tools/libs/* it is also now necessary to give libvchan the
same treatment, since it previously inherited the typedef via one of
tools/libs/*.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libs/*: freebsd remove unused code
Ian Campbell [Mon, 25 Jan 2016 13:14:34 +0000 (13:14 +0000)]
tools/libs/*: freebsd remove unused code

"tools/libs/*: Use O_CLOEXEC on Linux and FreeBSD" left some dead code
in the FreeBSD case, which breaks the build on that platform.

Also fix a typo "uint_32".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agokdd: build using Werror
Ian Campbell [Mon, 25 Jan 2016 13:18:11 +0000 (13:18 +0000)]
kdd: build using Werror

We build most of tools using Werror and there seems to be no
deliberate reason for this to be an exception.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Tim Deegan <tim@xen.org>
9 years agokdd: Opt in to libxc compat xc_map_foreign_* interfaces.
Ian Campbell [Mon, 25 Jan 2016 12:45:32 +0000 (12:45 +0000)]
kdd: Opt in to libxc compat xc_map_foreign_* interfaces.

This:

kdd-xen.c: In function 'kdd_access_physical_page':
kdd-xen.c:508:9: warning: implicit declaration of function 'xc_map_foreign_range' [-Wimplicit-function-declaration]
         map = xc_map_foreign_range(g->xc_handle,
         ^
kdd-xen.c:508:13: warning: assignment makes pointer from integer without a cast
         map = xc_map_foreign_range(g->xc_handle,
             ^

was caused by the refactoring of this functionality into
libxenforeignmemory.

Reported by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Tested-by: Olaf Hering <olaf@aepfle.de>
9 years agox86/mce: fix misleading indentation in init_nonfatal_mce_checker()
Ian Campbell [Fri, 22 Jan 2016 15:19:51 +0000 (16:19 +0100)]
x86/mce: fix misleading indentation in init_nonfatal_mce_checker()

Debian bug 812166[0] reported this build failure due to
Wmisleading-indentation with gcc-6:

non-fatal.c: In function 'init_nonfatal_mce_checker':
non-fatal.c:103:2: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
  switch (c->x86_vendor) {
  ^~~~~~

non-fatal.c:97:5: note: ...this 'if' clause, but it is not
     if ( __get_cpu_var(poll_bankmask) == NULL )
     ^~

I was unable to reproduce (xen builds cleanly for me with "6.0.0 20160117
(experimental) [trunk revision 232481]") but looking at the code the issue
above is clearly real.

Correctly reindent the if statement.

This file uses Linux coding style (in fact the use of Xen style for
this line is the root cause of the warning) so use tabs and while
there remove the whitespace inside the if as Linux does.

[0] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=812166

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/PV: allow PV guests to have an emulated PIT
Roger Pau Monné [Fri, 22 Jan 2016 15:18:29 +0000 (16:18 +0100)]
x86/PV: allow PV guests to have an emulated PIT

This fixes the fallout from the HVMlite series, that removed the emulated
PIT from PV(H) guests. Also, this patch forces the hardware domain to
always have an emulated PIT, regardless of whether the toolstack specified
one or not.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/domctl: break out logic to update domain state from cpuid information
Andrew Cooper [Fri, 22 Jan 2016 15:18:02 +0000 (16:18 +0100)]
x86/domctl: break out logic to update domain state from cpuid information

Later changes will add to this logic.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agop2m: convert p2m rwlock to percpu rwlock
Malcolm Crossley [Fri, 22 Jan 2016 15:17:13 +0000 (16:17 +0100)]
p2m: convert p2m rwlock to percpu rwlock

The per domain p2m read lock suffers from significant contention when
performing multi-queue block or network IO due to the parallel
grant map/unmaps/copies occurring on the DomU's p2m.

On multi-socket systems, the contention results in the locked compare swap
operation failing frequently which results in a tight loop of retries of the
compare swap operation. As the coherency fabric can only support a specific
rate of compare swap operations for a particular data location then taking
the read lock itself becomes a bottleneck for p2m operations.

Percpu rwlock p2m performance with the same configuration is approximately
64 gbit/s vs the 48 gbit/s with grant table percpu rwlocks only.

Oprofile was used to determine the initial overhead of the read-write locks
and to confirm the overhead was dramatically reduced by the percpu rwlocks.

Note: altp2m users will not achieve a gain if they take an altp2m read lock
simultaneously with the main p2m lock.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agogrant_table: convert grant table rwlock to percpu rwlock
Malcolm Crossley [Fri, 22 Jan 2016 15:16:05 +0000 (16:16 +0100)]
grant_table: convert grant table rwlock to percpu rwlock

The per domain grant table read lock suffers from significant contention when
performing multi-queue block or network IO due to the parallel
grant map/unmaps/copies occurring on the DomU's grant table.

On multi-socket systems, the contention results in the locked compare swap
operation failing frequently which results in a tight loop of retries of the
compare swap operation. As the coherency fabric can only support a specific
rate of compare swap operations for a particular data location then taking
the read lock itself becomes a bottleneck for grant operations.

Standard rwlock performance of a single VIF VM-VM transfer with 16 queues
configured was limited to approximately 15 gbit/s on a 2 socket Haswell-EP
host.

Percpu rwlock performance with the same configuration is approximately
48 gbit/s.

Oprofile was used to determine the initial overhead of the read-write locks
and to confirm the overhead was dramatically reduced by the percpu rwlocks.

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agorwlock: add per-cpu reader-writer lock infrastructure
Malcolm Crossley [Fri, 22 Jan 2016 15:04:41 +0000 (16:04 +0100)]
rwlock: add per-cpu reader-writer lock infrastructure

Per-cpu read-write locks allow for the fast path read case to have
low overhead by only setting/clearing a per-cpu variable for using
the read lock. The per-cpu read fast path also avoids locked
compare swap operations which can be particularly slow on coherent
multi-socket systems, particularly if there is heavy usage of the
read lock itself.

The per-cpu reader-writer lock uses a local variable to control
the read lock fast path. This allows a writer to disable the fast
path and ensures the readers switch to using the underlying
read-write lock implementation instead of the per-cpu variable.

Once the writer has taken the write lock and disabled the fast path,
it must poll the per-cpu variable for all CPUs which have entered
the critical section for the specific read-write lock the writer is
attempting to take. This design allows for a single per-cpu variable
to be used for read/write locks belonging to separate data structures.
If two or more different per-cpu read locks are taken
simultaneously then the per-cpu data structure is not used and the
implementation takes the read lock of the underlying read-write lock,
this behaviour is equivalent to the slow path in terms of performance.
The per-cpu rwlock is not recursion safe for taking the per-cpu
read lock because there is no recursion count variable, this is
functionally equivalent to standard spin locks.

Slow path readers which are unblocked, set the per-cpu variable and
drop the read lock. This simplifies the implementation and allows
for fairness in the underlying read-write lock to be taken
advantage of.

There is more overhead on the per-cpu write lock path due to checking
each CPU's fast path per-cpu variable but this overhead is likely to be
hidden by the required delay of waiting for readers to exit the
critical section. The loop is optimised to only iterate over
the per-cpu data of active readers of the rwlock. The cpumask_t for
tracking the active readers is stored in a single per-cpu data
location and thus the write lock is not pre-emption safe. Therefore
the per-cpu write lock can only be used with interrupts disabled.
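
The path selection described above can be modelled, very roughly, as
below. This is a single-threaded toy, not the Xen implementation: it has
no memory barriers, no underlying read-write lock and no cpumask of active
readers, and every model_* name is hypothetical. It only shows when a
reader takes the fast path and what a writer has to poll.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS 4

struct model_percpu_rwlock {
    bool writer_activating;    /* set by a writer to disable the fast path */
};

/* One slot per CPU, shared by all locks: the lock held via the fast path. */
static struct model_percpu_rwlock *model_slot[MODEL_NR_CPUS];

enum model_read_path { MODEL_READ_FAST, MODEL_READ_SLOW };

/* Decide how a reader running on 'cpu' takes 'lock'. */
static enum model_read_path model_read_lock(unsigned int cpu,
                                            struct model_percpu_rwlock *lock)
{
    /*
     * Slot already in use for a lock, or a writer has disabled the fast
     * path: fall back to the underlying read-write lock (not modelled).
     */
    if (model_slot[cpu] != NULL || lock->writer_activating)
        return MODEL_READ_SLOW;

    model_slot[cpu] = lock;    /* fast path: just publish the lock pointer */
    return MODEL_READ_FAST;
}

static void model_read_unlock(unsigned int cpu,
                              struct model_percpu_rwlock *lock)
{
    if (model_slot[cpu] == lock)
        model_slot[cpu] = NULL;
    /* Otherwise the underlying read lock would be dropped here. */
}

static void model_write_begin(struct model_percpu_rwlock *lock)
{
    lock->writer_activating = true;
}

/* True while some CPU is still in a fast path read section of 'lock'. */
static bool model_writer_must_wait(const struct model_percpu_rwlock *lock)
{
    unsigned int cpu;

    for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (model_slot[cpu] == lock)
            return true;
    return false;
}

static void model_write_end(struct model_percpu_rwlock *lock)
{
    lock->writer_activating = false;
}
```

In the model a second, different lock taken on the same CPU goes down the
slow path because the single slot is already occupied, which matches the
behaviour described for two simultaneously held per-cpu read locks.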

Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agotools: Update CFLAGS for qemu-xen to allow it to use new libraries
Ian Campbell [Tue, 22 Sep 2015 14:16:05 +0000 (15:16 +0100)]
tools: Update CFLAGS for qemu-xen to allow it to use new libraries

This means adding -L for libxen{evtchn,gnttab,foreignmemory} so that
it can link them directly (rather than using the libxenctrl compat
layer exposed via -rpath-link). Also add -I for libxenforeignmemory.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libs/*: Use O_CLOEXEC on Linux and FreeBSD
Ian Campbell [Wed, 2 Dec 2015 16:21:41 +0000 (16:21 +0000)]
tools/libs/*: Use O_CLOEXEC on Linux and FreeBSD

In some cases this replaces an FD_CLOEXEC dance, in others it is new.

Linux has had O_CLOEXEC since 2.6.23 (October 2007), so we can rely on
it from Xen 4.7 I think. Some libc headers may still lack the
definition, so we take care of that if need be by defining to 0 (on
the premise that such an old glibc might barf on O_CLOEXEC even if the
kernel may or may not be so old).
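
A sketch of the fallback described above, assuming a POSIX system. The
helper names open_cloexec() and fd_is_cloexec() are hypothetical and not
part of tools/libs.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask the libc headers for O_CLOEXEC */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Fallback for libc headers that lack the definition: defining it to 0
 * turns the flag into a no-op instead of a build failure.
 */
#ifndef O_CLOEXEC
#define O_CLOEXEC 0
#endif

/* Open 'path' read-only, requesting close-on-exec at open time. */
static int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if close-on-exec is set on fd, 0 if not, -1 on error. */
static int fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

Setting the flag in open() avoids the window between open() and a later
fcntl(F_SETFD) in which another thread could fork and exec.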

All stable versions of FreeBSD support O_CLOEXEC (10.2, 9.3 and 8.4),
and we assume the libc there does too.

Remove various comments about having to take responsibility for this
(since really it is just hygiene, politeness, not a requirement) and
the reasons for using O_CLOEXEC seem pretty straightforward.

Backends for other OSes are untouched.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Roger.Pau@citrix.com
Cc: jbeulich@suse.com
9 years agotools/libs/{call,evtchn}: Document requirements around forking.
Ian Campbell [Fri, 11 Dec 2015 17:31:26 +0000 (17:31 +0000)]
tools/libs/{call,evtchn}: Document requirements around forking.

Much like for gnttab and foreignmemory xencall hypercall buffers need
care.

Evtchn is a bit simpler (no magic mappings) but may not work from
parent + child simultaneously, document "parent only" since it is
consistent with the others.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libs/call: linux: touch newly allocated pages after madvise lockdown
Ian Campbell [Mon, 14 Dec 2015 16:46:26 +0000 (16:46 +0000)]
tools/libs/call: linux: touch newly allocated pages after madvise lockdown

This avoids a potential issue with a fork after allocation but before
madvise.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libs/call: Avoid xc_memalign in netbsd and solaris backends
Ian Campbell [Tue, 22 Sep 2015 11:40:51 +0000 (12:40 +0100)]
tools/libs/call: Avoid xc_memalign in netbsd and solaris backends

These are already arch specific, so just use the appropriate
interfaces (as determined by looking at the xc_memalign backend).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libs/call: Describe return values and error semantics for xencall*
Ian Campbell [Fri, 27 Nov 2015 12:08:32 +0000 (12:08 +0000)]
tools/libs/call: Describe return values and error semantics for xencall*

This behaviour has been confirmed by inspection on:

 - Linux
 - NetBSD & FreeBSD (NB: hcall->retval is the hypercall return value
   only for values >= 0. For negative values the underlying privcmd
   driver translates the value from Xen to {Net,Free}BSD errno space
   and returns it as the result of the ioctl, which becomes
   ret=-1/errno=EFOO in userspace)
 - MiniOS (which takes care of errno in this library)
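
The userspace-visible convention mentioned for the BSDs (ret=-1/errno=EFOO)
can be sketched as below. demo_call() is a hypothetical stand-in, not a
xencall function: its argument plays the role of the raw result, with
failures encoded as a negative errno value.

```c
#include <errno.h>

/* Non-negative results pass through; failures become -1 plus errno. */
static int demo_call(int raw)
{
    if (raw < 0) {
        errno = -raw;
        return -1;
    }
    return raw;
}
```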

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Cc: Roger Pau Monné <roger.pau@citrix.com>
9 years agotools/libs/call: Update some log messages to not refer to xc.
Ian Campbell [Tue, 22 Sep 2015 11:37:16 +0000 (12:37 +0100)]
tools/libs/call: Update some log messages to not refer to xc.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libs/gnttab: Extensive updates to API documentation.
Ian Campbell [Tue, 22 Sep 2015 11:12:55 +0000 (12:12 +0100)]
tools/libs/gnttab: Extensive updates to API documentation.

In particular around error handling, behaviour on fork and the unmap
notification mechanism.

Behaviour of xengnttab_map_*grant_refs and xengntshr_share_pages on
partial failure has been confirmed/inferred (by inspection) on Linux
and Mini-os (the only two known implementations). Likewise the
behaviour of the notification mechanism has been confirmed/inferred
(by inspection) of the Linux implementation (currently the only
implementation) and libvchan (primary known user).

These updates are not folded into "tools: Refactor
/dev/xen/gnt{dev,shr} wrappers into libxengnttab." to try and reduce
the amount of non-movement changes in that patch.

While I'm not convinced by javadoc/doxygen, cause the existing comments
which appear to use that syntax to have the appropriate /** marker.

Also fix a typo in a code comment.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Daniel De Graaf <dgdegra@tycho.nsa.gov>
9 years agotools/libs: Clean up hard tabs.
Ian Campbell [Mon, 21 Sep 2015 16:34:42 +0000 (17:34 +0100)]
tools/libs: Clean up hard tabs.

These were wrong in the context of libxc before this code was
extracted, clean them up.

Also add some emacs magic blocks

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libs/evtchn: Use uint32_t for domid arguments
Ian Campbell [Wed, 16 Dec 2015 15:44:49 +0000 (15:44 +0000)]
tools/libs/evtchn: Use uint32_t for domid arguments

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libs/evtchn: Review and update doc comments.
Ian Campbell [Mon, 21 Sep 2015 15:54:05 +0000 (16:54 +0100)]
tools/libs/evtchn: Review and update doc comments.

Remove the reference to pre-4.1, since this is now a new library.

Fixup references to xc.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libs/foreignmemory: pull array length argument to map forward
Ian Campbell [Mon, 30 Nov 2015 10:32:28 +0000 (10:32 +0000)]
tools/libs/foreignmemory: pull array length argument to map forward

By having the "num" argument before the page and error arrays we can
potentially use a variable-length-array argument ("int pages[num]") in
the function prototype.

However VLAs are a C99 feature and we are currently targeting C89 and
later, so we don't actually make use of this here, merely arrange that
we can switch to VLAs in the future without changing the function ABI.
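
The argument ordering can be illustrated as follows. demo_count_errors()
is a hypothetical function, not part of libxenforeignmemory.

```c
#include <stddef.h>

/*
 * "num" comes before the array it sizes. In C99 the prototype could
 * then be written as
 *     int demo_count_errors(size_t num, const int err[num]);
 * The C89-compatible spelling below is call-compatible with it: in both
 * forms the array parameter is adjusted to a pointer.
 */
static int demo_count_errors(size_t num, const int err[])
{
    size_t i;
    int failed = 0;

    for (i = 0; i < num; i++)
        if (err[i] != 0)
            failed++;
    return failed;
}
```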

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libs/foreignmemory: Support err == NULL to map.
Ian Campbell [Fri, 27 Nov 2015 15:55:57 +0000 (15:55 +0000)]
tools/libs/foreignmemory: Support err == NULL to map.

The existing xc_map_foreign_bulk-like interface encourages callers to
miss error checking for partial failure (by forgetting to scan the err
array).

Add support for passing err==NULL which behaves in a
xc_map_foreign_pages-like manner and returns a global error for any
failure.
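
The two modes can be sketched with a toy model. demo_map() is hypothetical
and returns an int rather than a mapping; a negative page number is the
model's stand-in for a page that cannot be mapped.

```c
#include <errno.h>
#include <stddef.h>

/*
 * With err != NULL each page's result is recorded and the call as a
 * whole succeeds, so the caller must scan err[]. With err == NULL any
 * single failure turns into a global failure (-1 with errno set).
 */
static int demo_map(size_t num, const long pages[], int err[])
{
    size_t i;
    int any_failed = 0;

    for (i = 0; i < num; i++) {
        int e = (pages[i] < 0) ? -EINVAL : 0;

        if (err != NULL)
            err[i] = e;
        else if (e != 0)
            any_failed = 1;
    }

    if (any_failed) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}
```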

While documenting this also clarify the overall behaviour and the
behaviour with err!=NULL.

With this the compat wrapper of xc_map_foreign_pages() can be
simplified.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libs/foreignmemory: Mention restrictions on fork in docs.
Ian Campbell [Thu, 24 Sep 2015 16:14:45 +0000 (17:14 +0100)]
tools/libs/foreignmemory: Mention restrictions on fork in docs.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libs/foreignmemory: use size_t for size arguments.
Ian Campbell [Fri, 16 Oct 2015 15:12:56 +0000 (16:12 +0100)]
tools/libs/foreignmemory: use size_t for size arguments.

Surprisingly it appears no callers need updating.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools/libs/foreignmemory: provide xenforeignmemory_unmap.
Ian Campbell [Tue, 28 Jul 2015 13:20:01 +0000 (14:20 +0100)]
tools/libs/foreignmemory: provide xenforeignmemory_unmap.

And require it be used instead of direct munmap.

This will allow e.g. Valgrind hooks to help track incorrect use of
foreign mappings.

Switch all uses of xenforeignmemory_map to use
xenforeignmemory_unmap, note that foreign mappings via the libxc compat
xc_map_foreign_* interface will not take advantage of this and will
need converting.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agotools: Refactor foreign memory mapping into libxenforeignmemory
Ian Campbell [Thu, 18 Jun 2015 15:30:19 +0000 (16:30 +0100)]
tools: Refactor foreign memory mapping into libxenforeignmemory

libxenforeignmemory will provide a stable API and ABI for mapping
foreign domain memory (subject to appropriate privileges).

The new library exposes an interface equivalent to
xc_map_foreign_memory_bulk, which all the other
xc_map_foreign_memory_* functions (which remain in libxc) are
implemented in terms of.

Upon request (via #define XC_WANT_COMPAT_MAP_FOREIGN_API) libxenctrl
will provide a compat API for the old names. This is used by qemu-xen
and qemu-trad as well as various in tree things (which required
de-dupping various #includes in some too to get the #define before the
first).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- updated MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION ]

9 years agotools: Implement xc_map_foreign_range(s) in terms of common helper
Ian Campbell [Thu, 18 Jun 2015 10:19:09 +0000 (11:19 +0100)]
tools: Implement xc_map_foreign_range(s) in terms of common helper

Both Linux and FreeBSD already implemented these functions using
identical helpers based on xc_map_foreign_pages. Make one copy of
these common helpers and switch all OSes to use them, even those which
previously had a specific lower level implementation of this
functionality.

This makes two fewer low level interfaces to think about.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools: Remove xc_map_foreign_batch
Ian Campbell [Thu, 18 Jun 2015 09:52:30 +0000 (10:52 +0100)]
tools: Remove xc_map_foreign_batch

It can trivially be replaced by xc_map_foreign_pages which is the
interface I want to move to going forward (by standardising on _bulk
but handling err=NULL as _pages does).

The callers of _batch are checking a mixture of a NULL return or
looking to see if the top nibble of the (usually sole) mfn they pass
has been modified to be non-zero to detect errors. _pages never
modifies the mfn it was given (it's const) and returns NULL on
failure, so adjust the error handling where necessary. Some callers
use a copy of the mfn array, for reuse on failure with _batch, which
is no longer necessary as _pages doesn't modify the array, however I
haven't cleaned that up here.

This reduces the twisty maze of xc_map_foreign_* by one, which will be
useful when trying to come up with an underlying stable interface.

NetBSD and Solaris implemented xc_map_foreign_bulk in terms of
xc_map_foreign_batch via a compat layer, so xc_map_foreign_batch
becomes an internal osdep for them.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
9 years agotools/libxc: drop xc_map_foreign_bulk_compat wrappers
Ian Campbell [Thu, 18 Jun 2015 09:35:06 +0000 (10:35 +0100)]
tools/libxc: drop xc_map_foreign_bulk_compat wrappers

On Solaris and NetBSD xc_map_foreign_bulk is implemented by calling
xc_map_foreign_bulk_compat and xc_map_foreign_bulk_compat is exposed
as a symbol by libxenctrl.so.

Remove these wrappers and turn the compat function into the real thing
surrounded by the appropriate ifdef.

As this is a compat function all new ports should instead implement
xc_map_foreign_bulk properly, hence the ifdef should never be
expanded.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools: Refactor hypercall calling wrappers into libxencall.
Ian Campbell [Mon, 1 Jun 2015 15:20:09 +0000 (16:20 +0100)]
tools: Refactor hypercall calling wrappers into libxencall.

libxencall will provide a stable API and ABI for calling hypercalls
(although those hypercalls themselves may not have a stable API), as
well as the hypercall buffer infrastructure needed in order to safely
provide pointer arguments to hypercalls.

libxenctrl encapsulates an instance of this interface, so users of that
library are not currently subjected to any actual changes. However all
hypercalls made internally by libxc now use the correct interface. It
is expected that most users of this library will be other libraries
providing a higher level interface, rather than applications directly.

Only the basic functionality to allocate hypercall safe memory is
moved, the type safe stuff and bounce buffers remain in libxc.

Note that the functionality to map foreign pages using privcmd is not
yet moved, meaning that an xc_interface will now contain two open
privcmd file descriptors. Foreign memory mapping is logically separate
functionality and will be moved into its own library.

The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- updated MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION ]

9 years agotools/libxc: Remove osdep indirection for privcmd
Ian Campbell [Thu, 11 Jun 2015 16:39:00 +0000 (17:39 +0100)]
tools/libxc: Remove osdep indirection for privcmd

The alternative backend (a xen-api/xapi shim) is no longer around and
so this stuff is now just baggage which is getting in the way of
refactoring libxenctrl.

Nested virt probably suffices for this use case now.

This was the last component of the osdep infrastructure, so all the
dynamic loading etc stuff all falls away too.

As part of this I was forced to investigate the twisty
xc_map_foreign_* maze, which I have added to the
toolstack-library-apis doc in the hopes of doing something sensible.

NetBSD and Solaris now call xc_map_foreign_bulk_compat directly from
their xc_map_foreign_bulk, which could have been achieved by using
some ifdefs around a renamed function. This will fall out in the wash
when these functions move to their own library.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: David Scott <dave.scott@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago tools: Refactor /dev/xen/gnt{dev,shr} wrappers into libxengnttab.
Ian Campbell [Mon, 1 Jun 2015 15:20:09 +0000 (16:20 +0100)]
tools: Refactor /dev/xen/gnt{dev,shr} wrappers into libxengnttab.

libxengnttab will provide a stable API and ABI for accessing the
grant table devices.

The functions are moved into the xengnt{tab,shr} namespace to make a
clean break from libxc and avoid ambiguity regarding which interfaces
are stable.

All in-tree users are updated to use the new names.

Upon request (via #define XC_WANT_COMPAT_GNTTAB_API) libxenctrl will
provide a compat API for the old names. This is used by qemu-xen for
the time being. qemu-xen-traditional is updated in lockstep.

This leaves a few grant table related functions which go via privcmd
(GNTTABOP) rather than ioctls on the /dev/xen/gnt* devices in
libxenctrl. Specifically:

  - xc_gnttab_get_version
  - xc_gnttab_map_table_v1
  - xc_gnttab_map_table_v2
  - xc_gnttab_op

These functions do not appear to be needed by qemu-dm, qemu-pv
(provision of device model to HVM guests and PV backends respectively)
or by libvchan, suggesting they are not needed by non-toolstack uses of
grant tables.

The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.

After this change libxenvchan no longer needs to link against
libxenctrl. It still needs xenctrl.h in one file for xen_mb and
friends.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- updated MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION ]

9 years ago tools/libxc: Remove osdep indirection for xc_gnt{shr,tab}
Ian Campbell [Thu, 11 Jun 2015 16:39:00 +0000 (17:39 +0100)]
tools/libxc: Remove osdep indirection for xc_gnt{shr,tab}

The alternative backend (a xen-api/xapi shim) is no longer around and
so this stuff is now just baggage which is getting in the way of
refactoring libxenctrl.

Nested virt probably suffices for this use case now.

It is now necessary to provide explicit versions of things for
platforms which do not implement this functionality, since the osdep
dispatcher cannot fulfil this need any more. These are provided by
appropriate xc_nognt???.c files which are compiled and linked on the
appropriate platforms. In them open and close return failure and
everything else aborts, since if open fails they should never be
called.
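
The stub pattern described above might look roughly like the following
sketch. All names here (demo_gnttab, demo_gnttab_open and so on) are
hypothetical stand-ins and not the actual contents of the xc_nognt???.c
files; ENOSYS as the reported error is likewise an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical opaque handle type; the real libxc types differ. */
typedef struct demo_gnttab demo_gnttab;

/* open always fails on a platform without grant table device support. */
demo_gnttab *demo_gnttab_open(void)
{
    errno = ENOSYS;
    return NULL;
}

/* close also reports failure: no valid handle can exist to be closed. */
int demo_gnttab_close(demo_gnttab *h)
{
    (void)h;
    errno = ENOSYS;
    return -1;
}

/* Any other entry point is unreachable as long as callers check the
 * result of open, so getting here indicates a caller bug: abort. */
void *demo_gnttab_map(demo_gnttab *h, unsigned int domid, unsigned int ref)
{
    (void)h;
    (void)domid;
    (void)ref;
    abort();
}
```

A caller that checks the result of demo_gnttab_open() never reaches the
aborting functions, which is what makes aborting an acceptable choice.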

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago tools: Arrange to check public headers for ANSI compatibility
Ian Campbell [Thu, 11 Jun 2015 12:18:18 +0000 (13:18 +0100)]
tools: Arrange to check public headers for ANSI compatibility

Use the same rune as we use for the Xen public headers, except that we
do not need stdint.h here.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago tools: Refactor /dev/xen/evtchn wrappers into libxenevtchn.
Ian Campbell [Mon, 1 Jun 2015 15:20:09 +0000 (16:20 +0100)]
tools: Refactor /dev/xen/evtchn wrappers into libxenevtchn.

libxenevtchn will provide a stable API and ABI for accessing the
evtchn device.

The functions are moved into the xenevtchn namespace to make a clean
break from libxc and avoid ambiguity regarding which interfaces are
stable.

All in-tree users are updated to use the new names.

Upon request (via #define XC_WANT_COMPAT_EVTCHN_API) libxenctrl will
provide a compat API for the old names. This is used by qemu-xen for
the time being. qemu-xen-traditional is updated in lockstep.
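
As a rough illustration of how such an opt-in compat layer can work, the
sketch below keeps old names alive as thin wrappers that are only visible
when the consumer defines a macro first. Every identifier in it (the
demo_* names and DEMO_WANT_COMPAT_EVTCHN_API) is hypothetical; the real
declarations live in the Xen headers and are not reproduced here.

```c
/* Hypothetical stand-in for the handle of the new, stable library. */
typedef struct demo_evtchn_handle {
    int last_port;              /* records the last notified port */
} demo_evtchn_handle;

/* Hypothetical function in the new namespace. */
static inline int demo_newevtchn_notify(demo_evtchn_handle *h, int port)
{
    h->last_port = port;
    return 0;
}

/* A consumer that still uses the old names opts in by defining this
 * before including the header. In this single-file sketch the define
 * and the "header" part simply follow each other. */
#define DEMO_WANT_COMPAT_EVTCHN_API

#ifdef DEMO_WANT_COMPAT_EVTCHN_API
/* Old names, provided as thin wrappers around the new ones. */
typedef demo_evtchn_handle demo_old_evtchn;

static inline int demo_old_evtchn_notify(demo_old_evtchn *h, int port)
{
    return demo_newevtchn_notify(h, port);
}
#endif
```

Without the define, code using the old names fails to compile, which
makes any remaining dependency on the unstable namespace explicit.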

This leaves a few event channel related functions which go via privcmd
(EVTCHNOP) rather than ioctls on the /dev/xen/evtchn device in
libxenctrl. Specifically:

 - xc_evtchn_alloc_unbound
 - xc_evtchn_reset
 - xc_evtchn_status

Note that xc_evtchn_alloc_unbound's functionality is also provided by
xenevtchn_bind_unbound_port() (née xc_evtchn_bind_unbound_port) and is
probably redundant.

These functions do not appear to be needed by qemu-dm, qemu-pv
(provision of device model to HVM guests and PV backends respectively)
or by libvchan suggesting they are not needed by non-toolstack uses of
event channels. QEMU does use these in hw/xenpv/xen_domainbuild.c but
that is a "toolstack use".

The new library uses a version script to ensure that only expected
symbols are exported and to version them such that ABI guarantees can
be kept in the future.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- updated MINIOS_UPSTREAM_REVISION and QEMU_TRADITIONAL_REVISION ]

9 years ago tools/libxc: Remove osdep indirection for xc_evtchn
Ian Campbell [Tue, 9 Jun 2015 12:54:09 +0000 (13:54 +0100)]
tools/libxc: Remove osdep indirection for xc_evtchn

The alternative backend (a xen-api/xapi shim) is no longer around and
so this stuff is now just baggage which is getting in the way of
refactoring libxenctrl.

Note that the intention is to move this into a separate library
shortly.

Nested virt probably suffices for this use case now.

One incorrect instance of using xc_interface where xc_evtchn was
required (in the ocaml stubs) is removed. This used to work because they
were typedefs to the same struct, but is no longer permitted.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years ago x86: fix (and simplify) MTRR overlap checking
Jan Beulich [Thu, 21 Jan 2016 15:11:04 +0000 (16:11 +0100)]
x86: fix (and simplify) MTRR overlap checking

Obtaining one individual range per variable range register (via
get_mtrr_range()) was bogus from the beginning, as these registers may
cover multiple disjoint ranges. Do away with that, in favor of simply
comparing masked addresses.
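
A minimal sketch of the masked comparison idea, not the actual Xen code:
an address A is covered by a variable range when (A & mask) == (base &
mask), so two ranges share an address exactly when their bases agree on
every bit that both masks compare. The struct and function names are
hypothetical, and a real implementation would presumably also restrict
the comparison to the implemented physical address bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one variable-range register pair. */
struct demo_var_mtrr {
    uint64_t base;   /* PHYSBASE address bits (type bits stripped) */
    uint64_t mask;   /* PHYSMASK address bits (valid bit stripped) */
    bool     valid;  /* PHYSMASK valid bit */
};

/* Bits compared by only one of the two masks can always be chosen to
 * satisfy that register, so only the commonly compared bits matter. */
bool demo_var_mtrrs_overlap(const struct demo_var_mtrr *a,
                            const struct demo_var_mtrr *b)
{
    uint64_t common = a->mask & b->mask;

    if (!a->valid || !b->valid)
        return false;

    return (a->base & common) == (b->base & common);
}
```

With a discontiguous mask such as ~0x100FFF and base 0, the register
covers both [0, 0x1000) and [0x100000, 0x101000). The check above detects
the overlap with a 4 KiB range at 0x100000, which a single start/end
range per register would miss.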

Also, for is_var_mtrr_overlapped()'s result to be correct when called
from mtrr_wrmsr(), generic_set_mtrr() must update saved state first.

As minor cleanup changes, constify is_var_mtrr_overlapped()'s parameter
and make mtrr_wrmsr() static.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86: constrain MFN range Dom0 may access
Jan Beulich [Thu, 21 Jan 2016 15:10:42 +0000 (16:10 +0100)]
x86: constrain MFN range Dom0 may access

... to that covered by the physical address width supported by the
processor. This implicitly avoids Dom0 (accidentally or due to some
kind of abuse) passing out of range addresses to a guest, which in
turn eliminates this only possibility for PV guests to create PTEs
with one or more reserved bits set.
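
The arithmetic behind such a limit can be sketched as follows, assuming
4 KiB pages. The function names and the DEMO_PAGE_SHIFT constant are
hypothetical and only illustrate how a physical address width translates
into a highest permitted MFN; this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12  /* 4 KiB pages, as on x86 */

/* Highest machine frame number addressable with paddr_bits physical
 * address bits (the width the processor reports as supported). */
uint64_t demo_max_mfn(unsigned int paddr_bits)
{
    assert(paddr_bits > DEMO_PAGE_SHIFT && paddr_bits < 64);
    return (UINT64_C(1) << (paddr_bits - DEMO_PAGE_SHIFT)) - 1;
}

/* True when the whole inclusive range [first, last] is within reach. */
bool demo_mfn_range_ok(uint64_t first, uint64_t last,
                       unsigned int paddr_bits)
{
    return first <= last && last <= demo_max_mfn(paddr_bits);
}
```

For a 36-bit physical address width this gives a highest MFN of
0xFFFFFF, so a range ending at MFN 0x1000000 would be refused.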

Note that, because of XSA-77, this is not a security issue.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86/paging: invlpg() hook returns boolean
Jan Beulich [Thu, 21 Jan 2016 15:10:19 +0000 (16:10 +0100)]
x86/paging: invlpg() hook returns boolean

... so make its return type reflect this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years ago x86/PV: relax LDT address check
Jan Beulich [Thu, 21 Jan 2016 15:09:58 +0000 (16:09 +0100)]
x86/PV: relax LDT address check

There's no point placing restrictions on its address when the LDT size
is zero.

Also convert a local variable to a slightly more efficient type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago x86/mmuext: tighten TLB flush address checks
Jan Beulich [Thu, 21 Jan 2016 15:09:22 +0000 (16:09 +0100)]
x86/mmuext: tighten TLB flush address checks

Addresses passed by PV guests should be subjected to __addr_ok(),
avoiding undue TLB flushes.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years ago sched: use the auto-generated list of schedulers
Jonathan Creekmore [Thu, 21 Jan 2016 15:07:02 +0000 (16:07 +0100)]
sched: use the auto-generated list of schedulers

Instead of having a manually-curated list of schedulers, use the array
that was auto-generated simply by compiling in the scheduler files as
the sole source of truth of the available schedulers.

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
9 years ago sched: register the schedulers into the list
Jonathan Creekmore [Thu, 21 Jan 2016 15:06:36 +0000 (16:06 +0100)]
sched: register the schedulers into the list

Add a simple macro to place a pointer to a scheduler into an array
section at compile time. Also generate the array entries for each of
the schedulers.
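
The general technique can be sketched in user space as below. This is an
illustration under assumptions, not the Xen macro: the demo_* names and
the "demo_sched" section are hypothetical, and instead of a linker script
the sketch relies on GNU ld on ELF targets synthesising __start_<name>
and __stop_<name> symbols for sections whose names are valid C
identifiers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down scheduler descriptor. */
struct demo_scheduler {
    const char *name;
};

/* Place a pointer to the scheduler into the "demo_sched" section.
 * "used" keeps the otherwise unreferenced pointer from being dropped. */
#define DEMO_REGISTER_SCHEDULER(s)                                  \
    static const struct demo_scheduler *demo_sched_ptr_##s          \
    __attribute__((used, section("demo_sched"))) = &(s)

static const struct demo_scheduler demo_credit = { "credit" };
static const struct demo_scheduler demo_rt     = { "rt" };
DEMO_REGISTER_SCHEDULER(demo_credit);
DEMO_REGISTER_SCHEDULER(demo_rt);

/* Section bounds provided by the linker (assumption: GNU ld, ELF). */
extern const struct demo_scheduler *__start_demo_sched[];
extern const struct demo_scheduler *__stop_demo_sched[];

/* Number of registered schedulers: the section is the only list. */
size_t demo_scheduler_count(void)
{
    return (size_t)(__stop_demo_sched - __start_demo_sched);
}

/* Look a scheduler up by name by walking the gathered pointers. */
const struct demo_scheduler *demo_find_scheduler(const char *name)
{
    const struct demo_scheduler **p;

    for (p = __start_demo_sched; p < __stop_demo_sched; p++)
        if (strcmp((*p)->name, name) == 0)
            return *p;
    return NULL;
}
```

Adding a scheduler then only requires compiling in a file that invokes
the registration macro; no central table needs editing.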

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years ago build: alloc space for sched list in the link file
Jonathan Creekmore [Thu, 21 Jan 2016 15:06:10 +0000 (16:06 +0100)]
build: alloc space for sched list in the link file

Creates a section to contain scheduler entry pointers that are gathered
together into an array. This will allow, in a follow-on patch, scheduler
entries to be automatically gathered together into the array for
automatic parsing.

Signed-off-by: Jonathan Creekmore <jonathan.creekmore@gmail.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>