xenbits.xensource.com Git - people/liuw/libxenctrl-split/xen.git/log
people/liuw/libxenctrl-split/xen.git
9 years agoxen/arm: io: Shorten the name of the fields and clean up
Julien Grall [Tue, 29 Sep 2015 14:44:41 +0000 (15:44 +0100)]
xen/arm: io: Shorten the name of the fields and clean up

The field names in the IO emulation are really long and repeatedly use
the term "handler", which makes some lines cumbersome to read:

mmio_handler->mmio_handler_ops->write_handler

Also take the opportunity to do some clean up:
    - Avoid "handler" vs "handle" in register_mmio_handler
    - Use a local variable to initialize handler in
    register_mmio_handler
    - Add a comment explaining the dsb(ish) in register_mmio_handler
    - Rename the structure io_handler into vmmio because the io_handler
    is ultimately handling multiple handlers and the name of the field
    was io_handlers. Also rename the field io_handlers to vmmio
    - Rename the field mmio_handler_ops to ops because we are already in
    the structure mmio_handler and do not need to repeat it
    - Rename the field mmio_handlers to handlers because we are in the
    vmmio structure
    - Make it clear that register_mmio_ops is taking an ops and not a
    handle
    - Clean up local variables to make the code easier to understand

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v3: Correctly retrieve the vCPU associated to a re-distributor
Julien Grall [Tue, 29 Sep 2015 14:44:40 +0000 (15:44 +0100)]
xen/arm: vgic-v3: Correctly retrieve the vCPU associated to a re-distributor

When the guest is accessing the re-distributor, Xen retrieves the base
of the re-distributor using a mask based on the stride.

When the stride has multiple bits set, the corresponding mask is
computed incorrectly [1], giving an invalid vCPU and offset:

(XEN) d0v0: vGICR: unknown gpa read address 000000008d130008
(XEN) traps.c:2447:d0v1 HSR=0x93c08006 pc=0xffffffc00032362c
gva=0xffffff80000b0008 gpa=0x0000008d130008

For instance, if the re-distributor region starts at 0x8d100000
and the stride is 0x30000, an access to the address 0x8d130008 should
be valid and use the re-distributor of vCPU1 with an offset of 0x8.
However, Xen returns vCPU0 and an offset of 0x20008.

I didn't find a way to replace the current computation of the mask with
a valid one. The only solution I have found is to pass the region in
private data of the handler. So we can directly get the offset from the
beginning of the region and find the corresponding vCPU/offset in the
re-distributor.

This also makes the code simpler and avoids the fast/slow paths.
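
The numbers quoted above can be reproduced with a small sketch. The
function names are hypothetical, and the "old" form (masking with
stride - 1) is an assumption about the previous computation, chosen
only because it yields the same wrong result as reported.

```c
#include <stdint.h>

/* Assumed old-style lookup: treat (stride - 1) as an offset mask.
 * This is only correct when the stride is a power of two. */
unsigned int rdist_vcpu_by_mask(uint64_t region_base, uint64_t stride,
                                uint64_t gpa)
{
    uint64_t base = gpa & ~(stride - 1);

    return (unsigned int)((base - region_base) / stride);
}

uint64_t rdist_offset_by_mask(uint64_t stride, uint64_t gpa)
{
    return gpa & (stride - 1);
}

/* Region-relative lookup: work from the start of the region, so any
 * stride works, whether or not it is a power of two. */
unsigned int rdist_vcpu_by_region(uint64_t region_base, uint64_t stride,
                                  uint64_t gpa)
{
    return (unsigned int)((gpa - region_base) / stride);
}

uint64_t rdist_offset_by_region(uint64_t region_base, uint64_t stride,
                                uint64_t gpa)
{
    return (gpa - region_base) % stride;
}
```

With region_base 0x8d100000, stride 0x30000 and gpa 0x8d130008, the
mask form gives vCPU0/0x20008 while the region form gives vCPU1/0x8.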

[1] http://lists.xen.org/archives/html/xen-devel/2015-09/msg03372.html

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: io: Extend write/read handler to pass private data
Julien Grall [Tue, 29 Sep 2015 14:44:39 +0000 (15:44 +0100)]
xen/arm: io: Extend write/read handler to pass private data

Some handlers may need private data in order to quickly get
information related to the emulated region.
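
As a rough illustration of the idea, not the actual Xen interface: all
type and function names below are hypothetical. The handler receives
the pointer that was supplied at registration time and uses it instead
of searching for its region again.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read handler signature with a private data pointer. */
typedef int (*mmio_read_t)(uint64_t gpa, uint32_t *val, void *priv);

struct mmio_handler {
    uint64_t addr;      /* start of the emulated region */
    uint64_t size;      /* size of the emulated region */
    mmio_read_t read;
    void *priv;         /* handed back to the handler on every access */
};

struct region {
    uint64_t base;
};

/* Example handler: priv gives direct access to the region description. */
int region_read(uint64_t gpa, uint32_t *val, void *priv)
{
    const struct region *r = priv;

    *val = (uint32_t)(gpa - r->base);
    return 1;
}

/* Find the handler covering gpa and call it with its private data. */
int mmio_dispatch_read(const struct mmio_handler *h, size_t n,
                       uint64_t gpa, uint32_t *val)
{
    for (size_t i = 0; i < n; i++)
        if (gpa >= h[i].addr && gpa - h[i].addr < h[i].size)
            return h[i].read(gpa, val, h[i].priv);

    return 0; /* nobody handles this address */
}

/* Demo: one region at an arbitrary base; returns the offset read back,
 * or UINT32_MAX when the access is not handled. */
uint32_t demo_read_offset(uint64_t gpa)
{
    static struct region r = { 0x8d100000ULL };
    struct mmio_handler h = { r.base, 0x60000, region_read, &r };
    uint32_t val = 0;

    return mmio_dispatch_read(&h, 1, gpa, &val) ? val : UINT32_MAX;
}
```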

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: support gzip compressed kernels
Stefano Stabellini [Tue, 29 Sep 2015 15:59:04 +0000 (16:59 +0100)]
xen/arm: support gzip compressed kernels

Free the memory used for the compressed kernel and update the relative
mod->start and mod->size parameters with the uncompressed ones.

To decompress the kernel, allocate memory from domheap, because freeing
the modules is done by calling init_heap_pages, which frees to domheap.
Map these pages using vmap, because they might not be in the linear 1:1
map.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: ian.campbell@citrix.com
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen: move perform_gunzip to common
Stefano Stabellini [Tue, 29 Sep 2015 15:59:03 +0000 (16:59 +0100)]
xen: move perform_gunzip to common

The current gunzip code to decompress the Dom0 kernel is implemented in
inflate.c which is included by bzimage.c.

I am looking at doing the same on ARM64, but there are quite a few
boilerplate definitions that I would need to import in order for
inflate.c to work correctly.

Instead of copying/pasting the code from x86/bzimage.c, move those
definitions to a new common file, gunzip.c. Export only perform_gunzip
and gzip_check. Leave output_length where it is.
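
For readers unfamiliar with the format, a minimal sketch of what a
gzip check and an output-length query amount to, based on the gzip
file format (RFC 1952) rather than on the Xen sources. The function
names here are hypothetical and are not the ones exported by gunzip.c.

```c
#include <stddef.h>
#include <stdint.h>

/* A gzip member starts with the two magic bytes 0x1f 0x8b. */
int looks_like_gzip(const unsigned char *image, size_t len)
{
    return len >= 2 && image[0] == 0x1f && image[1] == 0x8b;
}

/* The last four bytes of a gzip member (ISIZE) hold the uncompressed
 * size modulo 2^32, stored little-endian. Returns 0 if the buffer is
 * too short to contain the field. */
uint32_t gzip_uncompressed_size(const unsigned char *image, size_t len)
{
    const unsigned char *p;

    if (len < 4)
        return 0;

    p = image + len - 4;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

Reading the size up front is what lets a caller allocate the
destination buffer before decompressing.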

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
CC: andrew.cooper3@citrix.com
9 years agolibxl: don't shadow global "socket" in psr code
Wei Liu [Wed, 30 Sep 2015 14:54:11 +0000 (15:54 +0100)]
libxl: don't shadow global "socket" in psr code

SLES11 and OpenSUSE 11.4 complain:

[ 1227s] libxl_psr.c: In function 'libxl_psr_cat_get_l3_info':
[ 1227s] libxl_psr.c:342: error: declaration of 'socket' shadows a global declaration

Change "socket" to "socketid" to fix the problem.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Cc: Chao Peng <chao.p.peng@linux.intel.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agointroduce VM_EVENT_FLAG_SET_REGISTERS
Razvan Cojocaru [Wed, 30 Sep 2015 12:46:32 +0000 (14:46 +0200)]
introduce VM_EVENT_FLAG_SET_REGISTERS

A previous version of this patch dealing with support for skipping
the current instruction when a vm_event response requested it
computed the instruction length in the hypervisor, adding non-trivial
code dependencies. This patch allows a userspace vm_event client to
simply request that the guest's EIP is set to an arbitrary value,
computed by the introspection application. The registers that can
now be set are EAX-EDX, ESP, EBP, ESI, EDI, R8-R15, EFLAGS, and EIP.
CR0, CR3 and CR4 are not set, as at the time of vm_event_resume()
we can't call hvm_set_cr{0,3,4}() and simply setting
v->arch.hvm_vcpu.guest_cr[{0,3,4}] is unlikely to have the desired
effect. The rest of the vm_event registers are not set because
they're not being filled by hvm_event_fill_regs(), but only by
p2m_vm_event_fill_regs(). Currently x86-only.
The VCPU needs to be paused for this flag to take effect.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
9 years agosched: adjustments to some performance counters
Dario Faggioli [Wed, 30 Sep 2015 12:46:02 +0000 (14:46 +0200)]
sched: adjustments to some performance counters

More specifically:

1) rename vcpu_destroy to vcpu_remove

It seems this should have been done as part of 7e6b926a
("cpupools: Make interface more consistent"), which
renamed the function but not the counter.

In fact, because of cpupools, vcpus are not only removed
from a scheduler when they are destroyed, but also when
domains move between pools.

Make the related statistics counter reflect that more
properly.

2) rename vcpu_init to vcpu_alloc

As it lives in *_alloc_vdata.

3) add vcpu_insert

matching vcpu_remove, and useful to quickly check
whether the number of insertions and removals matches,
or in general to investigate their relationship.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agosched: get rid of cpupool_scheduler_cpumask()
Dario Faggioli [Wed, 30 Sep 2015 12:45:23 +0000 (14:45 +0200)]
sched: get rid of cpupool_scheduler_cpumask()

and of (almost every) direct use of cpupool_online_cpumask().

In fact, what we really want most of the time
is the set of valid pCPUs of the cpupool a certain domain
is part of. Furthermore, in case it's called with a NULL
pool as argument, cpupool_scheduler_cpumask() does more
harm than good, by returning the bitmask of free pCPUs!

This commit, therefore:
 * gets rid of cpupool_scheduler_cpumask(), in favour of
   cpupool_domain_cpumask(), which makes it more evident
   what we are after, and accommodates some sanity checking;
 * replaces some of the calls to cpupool_online_cpumask()
   with calls to the new function too.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Joshua Whitehead <josh.whitehead@dornerworks.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agocredit1: fix tickling when it happens from a remote pCPU
Dario Faggioli [Wed, 30 Sep 2015 12:44:22 +0000 (14:44 +0200)]
credit1: fix tickling when it happens from a remote pCPU

especially if that is also from a different cpupool than the
processor of the vCPU that triggered the tickling.

In fact, it is possible that we get as far as calling vcpu_unblock()-->
vcpu_wake()-->csched_vcpu_wake()-->__runq_tickle() for the vCPU 'vc',
but all while running on a pCPU that is different from 'vc->processor'.

For instance, this can happen when an HVM domain runs in a cpupool,
with a different scheduler than the default one, and issues IOREQs
to Dom0, running in Pool-0 with the default scheduler.
In fact, right in this case, the following crash can be observed:

(XEN) ----[ Xen-4.7-unstable  x86_64  debug=y  Tainted:    C ]----
(XEN) CPU:    7
(XEN) RIP:    e008:[<ffff82d0801230de>] __runq_tickle+0x18f/0x430
(XEN) RFLAGS: 0000000000010086   CONTEXT: hypervisor (d1v0)
(XEN) rax: 0000000000000001   rbx: ffff8303184fee00   rcx: 0000000000000000
(XEN) ... ... ...
(XEN) Xen stack trace from rsp=ffff83031fa57a08:
(XEN)    ffff82d0801fe664 ffff82d08033c820 0000000100000002 0000000a00000001
(XEN)    0000000000006831 0000000000000000 0000000000000000 0000000000000000
(XEN) ... ... ...
(XEN) Xen call trace:
(XEN)    [<ffff82d0801230de>] __runq_tickle+0x18f/0x430
(XEN)    [<ffff82d08012348a>] csched_vcpu_wake+0x10b/0x110
(XEN)    [<ffff82d08012b421>] vcpu_wake+0x20a/0x3ce
(XEN)    [<ffff82d08012b91c>] vcpu_unblock+0x4b/0x4e
(XEN)    [<ffff82d080167bd0>] vcpu_kick+0x17/0x61
(XEN)    [<ffff82d080167c46>] vcpu_mark_events_pending+0x2c/0x2f
(XEN)    [<ffff82d08010ac35>] evtchn_fifo_set_pending+0x381/0x3f6
(XEN)    [<ffff82d08010a0f6>] notify_via_xen_event_channel+0xc9/0xd6
(XEN)    [<ffff82d0801c29ed>] hvm_send_ioreq+0x3e9/0x441
(XEN)    [<ffff82d0801bba7d>] hvmemul_do_io+0x23f/0x2d2
(XEN)    [<ffff82d0801bbb43>] hvmemul_do_io_buffer+0x33/0x64
(XEN)    [<ffff82d0801bc92b>] hvmemul_do_pio_buffer+0x35/0x37
(XEN)    [<ffff82d0801cc49f>] handle_pio+0x58/0x14c
(XEN)    [<ffff82d0801eabcb>] vmx_vmexit_handler+0x16b3/0x1bea
(XEN)    [<ffff82d0801efd21>] vmx_asm_vmexit_handler+0x41/0xc0

In this case, pCPU 7 is not in Pool-0, while the (Dom0's) vCPU being
woken is. pCPU 7's pool has a different scheduler than credit, but it
is, however, right from pCPU 7 that we are waking the Dom0's vCPUs.
Therefore, the current code tries to access csched_balance_mask for
pCPU 7, but that is not defined, and hence the Oops.

(Note that, in case the two pools run the same scheduler we see no
Oops, but things are still conceptually wrong.)

Cure things by making the csched_balance_mask macro accept a
parameter for fetching a specific pCPU's mask (instead of always
using smp_processor_id()).
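
The shape of that change can be sketched as follows. This is a
stand-alone illustration with hypothetical names: a plain array stands
in for Xen's per-CPU data and a global stands in for
smp_processor_id(). It does not model the per-scheduler allocation
that makes the real access fault.

```c
#define NR_CPUS 8

/* Stand-ins for per-CPU storage and for smp_processor_id(). */
unsigned long balance_mask_storage[NR_CPUS];
unsigned int current_cpu;

/* Before: the macro implicitly picks the CPU running the code. */
#define balance_mask_local() (&balance_mask_storage[current_cpu])

/* After: the caller names the pCPU whose scratch mask it wants. */
#define balance_mask(cpu) (&balance_mask_storage[(cpu)])

/* Tickle on behalf of a vCPU whose processor is target_cpu, possibly
 * while running on some other pCPU. */
void tickle_record(unsigned int target_cpu, unsigned long candidates)
{
    *balance_mask(target_cpu) = candidates;
}

unsigned long read_mask(unsigned int cpu)
{
    return *balance_mask(cpu);
}
```

With the parameterised form, code running on pCPU 7 touches only the
mask belonging to the woken vCPU's processor.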

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agoRevert "x86/PoD: shorten certain operations on higher order ranges"
Jan Beulich [Wed, 30 Sep 2015 12:43:21 +0000 (14:43 +0200)]
Revert "x86/PoD: shorten certain operations on higher order ranges"

This reverts commit dea4d7a9a847e8822f7fbfd7b143a5e203135179, which
has been found to be broken.

9 years agox86/PoD: shorten certain operations on higher order ranges
Jan Beulich [Tue, 29 Sep 2015 13:11:28 +0000 (15:11 +0200)]
x86/PoD: shorten certain operations on higher order ranges

Now that p2m->get_entry() always returns a valid order, utilize this
to accelerate some of the operations in PoD code. (There are two uses
of p2m->get_entry() left which don't easily lend themselves to this
optimization.)

Also adjust a few types as needed and remove stale comments from
p2m_pod_cache_add() (to avoid duplicating them yet another time).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/p2m-pt: ignore pt-share flag for shadow mode guests
Jan Beulich [Tue, 29 Sep 2015 11:56:03 +0000 (13:56 +0200)]
x86/p2m-pt: ignore pt-share flag for shadow mode guests

There is no page table sharing in shadow mode.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/p2m-pt: delay freeing of intermediate page tables
Jan Beulich [Tue, 29 Sep 2015 11:55:34 +0000 (13:55 +0200)]
x86/p2m-pt: delay freeing of intermediate page tables

Old intermediate page tables must be freed only after IOMMU side
updates/flushes have been carried out.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/EPT: adjust types in ept_split_super_page()
Jan Beulich [Tue, 29 Sep 2015 11:54:55 +0000 (13:54 +0200)]
x86/EPT: adjust types in ept_split_super_page()

The function returns a boolean and its current and target level inputs
are unsigned (which in turn allows simplifying the early-out check).
Also convert a non-standard loop variable to an ordinary function scope
one, at once making it unsigned too.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agoPVH Dom0 RMRR IOMMU mapping regression fix
Elena Ufimtseva [Tue, 29 Sep 2015 11:53:31 +0000 (13:53 +0200)]
PVH Dom0 RMRR IOMMU mapping regression fix

This patch addresses a regression introduced by commit
5ae03990c120a7b3067a52d9784c9aa72c0705a6 in new set_identity_p2m_entry.
RMRRs are not being mapped in IOMMU for PVH Dom0. This causes page faults and
some long 'hang-like' delays during Dom0 PVH boot and device assignments.

During construct_dom0, in the PVH path, the p2m is constructed and identity
mapped in the IOMMU. The p2m type is p2m_mmio_direct and the p2m access is
p2m_rwx. The new code used to map RMRRs, invoked from rmrr_identity_mapping,
checks whether a p2m entry exists with the same type and access and, if so,
skips the IOMMU mapping. Since there are p2m entries for PVH Dom0 iomem, RMRRs
are not being mapped in the IOMMU.

As was mentioned in the earlier discussion, the PVH Dom0 construction code
should be modified to properly map RMRR regions in the IOMMU. Since that change
would be too invasive, this is a temporary fix until a better solution is in
place. Also, as Jan mentioned, there is no need for 'x' permission on the p2m
entry of an MMIO region, so it is dropped here.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agodocs/man: resort sections
Chao Peng [Tue, 29 Sep 2015 07:49:55 +0000 (15:49 +0800)]
docs/man: resort sections

Section 'IGNORED FOR COMPATIBILITY WITH XM' separates 'CACHE MONITORING
TECHNOLOGY' and 'CACHE ALLOCATION TECHNOLOGY' but they really should be
put together.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: make xl-psr.markdown more precise
Chao Peng [Tue, 29 Sep 2015 07:49:54 +0000 (15:49 +0800)]
docs: make xl-psr.markdown more precise

Drop the chapter number as it can be confusing when it gets changed in
the referred document.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ ijc -- dropped hunk changing URL to specific revision, this is not
         needed now that the references do not include a specific
         chapter number ]

9 years agotools/libxl: fix range check in main_psr_cat_cbm_set
Chao Peng [Tue, 29 Sep 2015 07:49:53 +0000 (15:49 +0800)]
tools/libxl: fix range check in main_psr_cat_cbm_set

The 'end' should be inclusive.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libxl: return socket id from libxl_psr_cat_get_l3_info
Chao Peng [Tue, 29 Sep 2015 07:49:52 +0000 (15:49 +0800)]
tools/libxl: return socket id from libxl_psr_cat_get_l3_info

The entries returned from libxl_psr_cat_get_l3_info are assumed
to be socket-continuous. But this is not true in the hotplug case.

This patch gets the socket bitmap for all the sockets on the system
first and stores the socket id in the structure libxl_psr_cat_info in
libxl_psr_cat_get_l3_info. The xl or similar consumers then can display
socket information correctly.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libxl: fix socket display error for CMT
Chao Peng [Tue, 29 Sep 2015 07:49:51 +0000 (15:49 +0800)]
tools/libxl: fix socket display error for CMT

When displaying the CMT information for all the sockets, we assume socket
numbers are contiguous. This is not true in the hotplug case. For instance,
when the 3rd socket is unplugged on a 4-socket system, the available
socket numbers are 1,2,4 but currently we display the CMT
information for sockets 1,2,3.

The fix is getting the socket bitmap for all the sockets on the system
first and then displaying CMT information for_each_set_bit in that bitmap.
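
A minimal sketch of walking only the set bits of such a bitmap. The
names are hypothetical and a 32-bit word stands in for the libxl
bitmap type, purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Store the numbers of all set bits in out[] (at most max of them)
 * and return how many were stored. */
size_t online_sockets(uint32_t socketmap, unsigned int *out, size_t max)
{
    size_t n = 0;

    for (unsigned int bit = 0; bit < 32 && n < max; bit++)
        if (socketmap & (UINT32_C(1) << bit))
            out[n++] = bit;

    return n;
}

/* Convenience wrapper: the n-th online socket, or -1 if there are
 * fewer than n + 1 sockets online. */
int nth_online_socket(uint32_t socketmap, size_t n)
{
    unsigned int ids[32];
    size_t count = online_sockets(socketmap, ids, 32);

    return n < count ? (int)ids[n] : -1;
}
```

For a map with bits 1, 2 and 4 set (0x16), the walk visits sockets 1,
2 and 4 and never reports the missing socket 3.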

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
9 years agotools/libxl: introduce libxl_get_online_socketmap
Chao Peng [Tue, 29 Sep 2015 07:49:50 +0000 (15:49 +0800)]
tools/libxl: introduce libxl_get_online_socketmap

It sets the bit on the given bitmap if the corresponding socket is
available and clears the bit when the corresponding socket is not
available.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/p2m-ept: adjust some types in ept_set_entry()
Jan Beulich [Tue, 29 Sep 2015 08:26:05 +0000 (10:26 +0200)]
x86/p2m-ept: adjust some types in ept_set_entry()

Use unsigned and bool_t as appropriate.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/EPT: tighten conditions of IOMMU mapping updates
Jan Beulich [Tue, 29 Sep 2015 08:25:29 +0000 (10:25 +0200)]
x86/EPT: tighten conditions of IOMMU mapping updates

Permission changes should also result in updates or TLB flushes.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
9 years agofine grained control of REP emulation optimizations
Razvan Cojocaru [Mon, 28 Sep 2015 15:29:45 +0000 (17:29 +0200)]
fine grained control of REP emulation optimizations

Previously, if vm_event emulation support was enabled, then REP
optimizations were disabled when emulating REP-compatible
instructions. This patch allows fine-tuning of this behaviour by
providing a dedicated libxc helper function.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agocleanup domain builder declarations and related users
Juergen Gross [Mon, 28 Sep 2015 15:28:35 +0000 (17:28 +0200)]
cleanup domain builder declarations and related users

There are several unused function and structure declarations in the
hypervisor related to domain building. Remove them.

Use an enum for elf_dom_parms.pae instead of just hard coding the
values when setting the information and adjust the code to use those
instead of own macros (hypervisor and tools).

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoMAINTAINERS: fix path to p2m-ept.c
Ross Lagerwall [Mon, 28 Sep 2015 15:28:19 +0000 (17:28 +0200)]
MAINTAINERS: fix path to p2m-ept.c

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
9 years agoremove unused macros from sched.h
Juergen Gross [Fri, 25 Sep 2015 16:04:46 +0000 (18:04 +0200)]
remove unused macros from sched.h

The macros num_cpupool_cpus() and domain_is_locked() aren't used by
anyone. Remove them.

Signed-off-by: Juergen Gross <jgross@ssue.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoMerge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging
Jan Beulich [Fri, 25 Sep 2015 16:04:13 +0000 (18:04 +0200)]
Merge branch 'staging' of xenbits.xen.org:/home/xen/git/xen into staging

9 years agovt-d: fix IM bit unmask of Fault Event Control Register in init_vtd_hw()
Quan Xu [Fri, 25 Sep 2015 16:03:04 +0000 (18:03 +0200)]
vt-d: fix IM bit unmask of Fault Event Control Register in init_vtd_hw()

Bits 0:29 in the Fault Event Control Register are 'Reserved and
Preserved'; software cannot write 0 to them unconditionally. Software
must preserve the value read for writes.
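
The read-modify-write pattern this calls for can be sketched as below.
The helper names are hypothetical, and the position of the IM bit
(bit 31) follows my reading of the VT-d specification, so treat it as
an assumption here.

```c
#include <stdint.h>

/* Assumed position of the Interrupt Mask (IM) bit in the register. */
#define FECTL_IM (UINT32_C(1) << 31)

/* Unmask fault events: clear IM, keep every other bit as read. */
uint32_t fectl_unmask(uint32_t current)
{
    return current & ~FECTL_IM;
}

/* Mask fault events: set IM, keep every other bit as read. */
uint32_t fectl_mask(uint32_t current)
{
    return current | FECTL_IM;
}
```

The caller reads the register, passes the value through one of these
helpers and writes the result back, so whatever the hardware reported
in the preserved bits goes back unchanged.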

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Quan Xu <quan.xu@intel.com>
9 years agoxen/arm: Fix comment coding style in handle_node in domain_build.c
Julien Grall [Tue, 22 Sep 2015 17:47:37 +0000 (18:47 +0100)]
xen/arm: Fix comment coding style in handle_node in domain_build.c

Only coding style changes. No functional changes.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc: do initrd processing of domain builder in own function
Juergen Gross [Fri, 11 Sep 2015 12:32:19 +0000 (14:32 +0200)]
libxc: do initrd processing of domain builder in own function

Factor out the initrd processing in xc_dom_build_image() into its own
function to prepare for starting a domain with an unmapped initrd.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agolibxc: remove useless stuff from domain builder
Juergen Gross [Tue, 22 Sep 2015 12:20:52 +0000 (14:20 +0200)]
libxc: remove useless stuff from domain builder

Remove unused fields from the domain builder and associated functions.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: introduce gfx_passthru_kind
Tiejun Chen [Fri, 18 Sep 2015 08:30:17 +0000 (16:30 +0800)]
libxl: introduce gfx_passthru_kind

Although we already have 'gfx_passthru' in b_info, it no longer suffices
now that we want to handle IGD specifically. Define a new field,
gfx_passthru_kind, to indicate that we are trying to pass through an
IGD. This also means other specific devices can be supported later just
by extending gfx_passthru_kind. Together with gfx_passthru it addresses
the IGD cases as follows:

    gfx_passthru = 0    => sets build_info.u.gfx_passthru to false
    gfx_passthru = 1    => sets build_info.u.gfx_passthru to true and
                           build_info.u.gfx_passthru_kind to DEFAULT
    gfx_passthru = "igd"    => sets build_info.u.gfx_passthru to true
                               and build_info.u.gfx_passthru_kind to IGD

Here, if gfx_passthru_kind = DEFAULT, we call
libxl__is_igd_vga_passthru() to check whether the device is in that
table and hence whether that option needs to be passed to qemu. But if
gfx_passthru_kind = "igd" we always pass it.
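
The mapping and the decision described above can be sketched as
follows. The enum, struct and function names are hypothetical
stand-ins, not the libxl ones, and the table lookup is reduced to a
flag supplied by the caller.

```c
#include <string.h>

enum gfx_kind { GFX_KIND_DEFAULT = 0, GFX_KIND_IGD = 1 };

struct gfx_setting {
    int enabled;          /* corresponds to gfx_passthru */
    enum gfx_kind kind;   /* corresponds to gfx_passthru_kind */
};

/* Parse the config value; returns 0 on success, -1 if unrecognised. */
int parse_gfx_passthru(const char *value, struct gfx_setting *out)
{
    out->enabled = 0;
    out->kind = GFX_KIND_DEFAULT;

    if (strcmp(value, "0") == 0)
        return 0;
    if (strcmp(value, "1") == 0) {
        out->enabled = 1;
        return 0;
    }
    if (strcmp(value, "igd") == 0) {
        out->enabled = 1;
        out->kind = GFX_KIND_IGD;
        return 0;
    }
    return -1;
}

/* Decide whether the IGD option goes to qemu: always for an explicit
 * IGD kind, and for DEFAULT only when the table lookup matched. */
int want_igd_option(const struct gfx_setting *s, int table_says_igd)
{
    if (!s->enabled)
        return 0;
    return s->kind == GFX_KIND_IGD || table_says_igd;
}
```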

And "-gfx_passthru" was only introduced to work with qemu-xen-traditional,
so it should be removed from libxl__build_device_model_args_new() in
the case of upstream qemu.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: introduce libxl__is_igd_vga_passthru
Tiejun Chen [Fri, 18 Sep 2015 08:30:16 +0000 (16:30 +0800)]
libxl: introduce libxl__is_igd_vga_passthru

When working with qemu, IGD is a special device in the pass-through case,
so we need to identify it in order to handle it later. Here we define a table
recording all IGD types we can currently support. We also introduce two
helper functions to get the vendor and device IDs used to look up that table.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agovt-d: fix IM bit mask and unmask of Fault Event Control Register
Quan Xu [Fri, 25 Sep 2015 07:08:22 +0000 (09:08 +0200)]
vt-d: fix IM bit mask and unmask of Fault Event Control Register

Bits 0:29 in the Fault Event Control Register are 'Reserved and
Preserved'; software cannot write 0 to them unconditionally. Software
must preserve the value read for writes.

Signed-off-by: Quan Xu <quan.xu@intel.com>
Acked-by: Yang Zhang <yang.z.zhang@intel.com>
9 years agokeyhandler: rework keyhandler infrastructure
Andrew Cooper [Fri, 25 Sep 2015 07:06:34 +0000 (09:06 +0200)]
keyhandler: rework keyhandler infrastructure

struct keyhandler does not contain much information, and requires a lot
of boilerplate to use.  It is far more convenient to have
register_keyhandler() take each piece of information as a parameter,
especially when introducing temporary debugging keyhandlers.

This in turn allows struct keyhandler itself to become private to
keyhandler.c and for the key_table to become more efficient.

key_table doesn't need to contain 256 entries; all keys are ASCII which
limits them to 7 bits of index, rather than 8.  It can also become a
straight array, rather than an array of pointers.  The overall effect of
this is the key_table grows in size by 50%, but there are no longer
24-byte keyhandler structures all over the data section.

All of the key_table entries in keyhandler.c can be initialised at
compile time rather than runtime.
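
The 50% figure can be checked with a little arithmetic. The sketch
assumes 8-byte pointers (a 64-bit build) and takes the 24-byte entry
size from the text above; the names are illustrative only.

```c
#include <stddef.h>

enum {
    OLD_SLOTS     = 256, /* one pointer per 8-bit key value */
    POINTER_BYTES = 8,   /* assumption: 64-bit build */
    NEW_SLOTS     = 128, /* 7-bit ASCII keys only */
    ENTRY_BYTES   = 24,  /* size of one keyhandler structure */
};

/* Old layout: an array of pointers to separately defined structures. */
size_t old_table_bytes(void)
{
    return (size_t)OLD_SLOTS * POINTER_BYTES;
}

/* New layout: the structures themselves, stored inline in the table. */
size_t new_table_bytes(void)
{
    return (size_t)NEW_SLOTS * ENTRY_BYTES;
}

/* Growth of the table itself, in percent of the old size. */
unsigned int growth_percent(void)
{
    size_t before = old_table_bytes();
    size_t after = new_table_bytes();

    return (unsigned int)((after - before) * 100 / before);
}
```

That is 2048 bytes before and 3072 bytes after, with the separately
allocated structures no longer needed.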

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/PV: properly populate descriptor tables
Jan Beulich [Fri, 25 Sep 2015 07:05:29 +0000 (09:05 +0200)]
x86/PV: properly populate descriptor tables

Us extending the GDT limit past the Xen descriptors so far meant that
guests (including user mode programs) accessing any descriptor table
slot above the original OS'es limit but below the first Xen descriptor
caused a #PF, converted to a #GP in our #PF handler. Which is quite
different from the native behavior, where some of such accesses (LAR
and LSL) don't fault. Mimic that behavior by mapping a blank page into
unused slots.

While not strictly required, treat the LDT the same for consistency.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoadd missing license and copyright statements to public interface headers
Mike Belopuhov [Fri, 25 Sep 2015 07:04:24 +0000 (09:04 +0200)]
add missing license and copyright statements to public interface headers

The copyright line indicates a person, a group of people and/or a company
granting rights stated in the license text and is a required part of the
license.

The year of the copyright is chosen to be the same as when the license has
been applied to the file or when the file has been created in case there
was no license.  It is possible to update or add additional years if major
changes have been done to the file, but is generally not a requirement.

Signed-off-by: Mike Belopuhov <mike.belopuhov@esdenera.com>
9 years agox86/bigmem: eliminate struct domain address width restriction
Jan Beulich [Fri, 25 Sep 2015 07:02:02 +0000 (09:02 +0200)]
x86/bigmem: eliminate struct domain address width restriction

PDX-es are 64 bits wide in that case, and hence no limit needs to be
enforced.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoxen/arm: gic-v3: Allow Xen to run on hardware supporting GICv4
Julien Grall [Mon, 14 Sep 2015 15:32:24 +0000 (16:32 +0100)]
xen/arm: gic-v3: Allow Xen to run on hardware supporting GICv4

GICv4 is an extension of GICv3 (see 1.1 in ARM IHI 0069A) which means
that the GICv3 driver can run normally on GICv4 hardware.

The GICv4-only features currently won't be used.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: gic-v3: Clean-up the GIC*_PIDR2_* definitions
Julien Grall [Mon, 14 Sep 2015 15:32:23 +0000 (16:32 +0100)]
xen/arm: gic-v3: Clean-up the GIC*_PIDR2_* definitions

GICR_PIDR2 and GICD_PIDR2 use the same register layout. Rather than
define twice, one of which is an alias to the other, introduce GIC_PIDR2_*
defines.

Also:
    * Use the same prefix for the mask and the value
    * Integrate the shift in the value to avoid shifting in the code
    * Use GICv* to match the value name in the spec
    * Move them in a proper place
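
A sketch of what defines in this style look like and how a version
check reads with them. The names and values below follow my
understanding of the GIC architecture (architecture revision in bits
[7:4] of PIDR2) and should be read as assumptions, not as the exact
contents of the Xen header.

```c
#include <stdint.h>

/* Mask and values share one prefix; the shift is folded into the
 * values so callers compare without shifting. */
#define GIC_PIDR2_ARCH_MASK   0xf0u
#define GIC_PIDR2_ARCH_GICv3  0x30u
#define GIC_PIDR2_ARCH_GICv4  0x40u

/* Accept both GICv3 and GICv4 hardware, since GICv4 extends GICv3. */
int gic_arch_supported(uint32_t pidr2)
{
    uint32_t arch = pidr2 & GIC_PIDR2_ARCH_MASK;

    return arch == GIC_PIDR2_ARCH_GICv3 || arch == GIC_PIDR2_ARCH_GICv4;
}
```

The same helper works for a value read from either GICD_PIDR2 or
GICR_PIDR2, which is the point of sharing one set of defines.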

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years ago(lib)xl: soft reset support
Vitaly Kuznetsov [Mon, 21 Sep 2015 09:57:34 +0000 (11:57 +0200)]
(lib)xl: soft reset support

Use existing create/restore path to perform 'soft reset' for HVM
domains. Tear everything down, e.g. destroy domain's device model,
remove the domain from xenstore, save toolstack record and start
over.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: fix the cleanup of the backend path when using driver domains
Roger Pau Monne [Wed, 23 Sep 2015 10:06:56 +0000 (12:06 +0200)]
libxl: fix the cleanup of the backend path when using driver domains

With the current libxl implementation the control domain will remove both
the frontend and the backend xenstore paths of a device that's handled by a
driver domain. This is incorrect, since the driver domain possibly needs to
access the backend path in order to perform the disconnection and cleanup of
the device.

Fix this by making sure the control domain only cleans the frontend path,
leaving the backend path to be cleaned by the driver domain. Note that if
the device is not handled by a driver domain the control domain will perform
the removal of both the frontend and the backend paths.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reported-by: Alex Velazquez <alex.j.velazquez@gmail.com>
Cc: Alex Velazquez <alex.j.velazquez@gmail.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: fix devd removal path
Roger Pau Monne [Wed, 23 Sep 2015 10:06:55 +0000 (12:06 +0200)]
libxl: fix devd removal path

The current flow of the devd helper (in charge of launching hotplug scripts
inside of driver domains) is to wait for the device backend to switch to
state 6 (XenbusStateClosed) and then remove it. This is not correct, since
a domain can reconnect its PV devices as many times as it wants.

In order to fix this, introduce the following logic: the control domain will
set the "online" backend node to 0 when it wants the driver domain to
disconnect the device, so now the condition applied in devd is that "state"
must be 6 and "online" 0 in order to proceed with the disconnection.
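
The resulting condition is small enough to state as code. The function
and macro names are hypothetical; only the two values come from the
description above.

```c
/* Value of XenbusStateClosed, as given in the text. */
#define XENBUS_STATE_CLOSED 6

/* Remove the backend only when it is closed AND the control domain
 * has marked it offline. A closed but still online device may simply
 * be reconnecting. */
int devd_should_remove(int state, int online)
{
    return state == XENBUS_STATE_CLOSED && online == 0;
}
```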

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Alex Velazquez <alex.j.velazquez@gmail.com>
Cc: Alex Velazquez <alex.j.velazquez@gmail.com>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic: Correctly emulate write when byte is used
Julien Grall [Tue, 22 Sep 2015 20:18:48 +0000 (21:18 +0100)]
xen/arm: vgic: Correctly emulate write when byte is used

When a guest is writing a byte, the value will be located in bits[7:0]
of the register.

However, the current implementation expects the byte at the Nth
byte of the register, where N = address & 4.

When the address is not 4-byte aligned, the corresponding byte in the
internal state will therefore always be set to zero instead.

Note that byte accesses are only used for GICD_IPRIORITYR and
GICD_ITARGETSR. So the worst that could happen is that the priority is
not set correctly and the written target vCPU is ignored.
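
A minimal sketch of the intended behaviour, assuming the internal state is
kept as a 32-bit word. The helper name and the byte_index parameter (0..3,
derived by the caller from the access address) are hypothetical and not
taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: emulate a guest byte write into a 32-bit register.
 * "val" carries the written byte in bits[7:0]; "byte_index" selects which
 * byte of "reg" the access targets.
 */
void reg_write_byte(uint32_t *reg, uint32_t val, unsigned int byte_index)
{
    unsigned int shift = 8 * (byte_index & 3);

    *reg &= ~(UINT32_C(0xff) << shift);       /* clear the target byte */
    *reg |= (val & UINT32_C(0xff)) << shift;  /* insert bits[7:0] of val */
}
```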

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agox86/hvm: fold opt_hap_{2mb,1gb} into hap_capabilities
Andrew Cooper [Wed, 23 Sep 2015 09:16:51 +0000 (11:16 +0200)]
x86/hvm: fold opt_hap_{2mb,1gb} into hap_capabilities

This allows all runtime users to simply check hap_has_{2mb,1gb} rather than
having to check opt_hap_{2mb,1gb} as well.

As a result, opt_hap_{2mb,1gb} can move into __initdata.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/hvm: refine hap_has_{2mb,1gb} checks
Andrew Cooper [Wed, 23 Sep 2015 09:16:08 +0000 (11:16 +0200)]
x86/hvm: refine hap_has_{2mb,1gb} checks

HAP superpages are a host property and not dependent on domain configuration.
Drop the domain parameter (which was only used in one of the two callsites),
and drop the redundant hvm_ prefix to mirror the cpu_has_* style of feature
detection.

Finally, convert the checks to being proper booleans rather than just non-zero
integers.
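
To illustrate the difference between a non-zero integer and a proper
boolean, here is a small sketch. The macro and flag names are made up for
this example and are not the ones used in the tree.

```c
/* Hypothetical capability bits, for illustration only. */
#define CAP_SUPERPAGE_2MB (1u << 0)
#define CAP_SUPERPAGE_1GB (1u << 1)

/* Non-zero integer: evaluates to 0 or 2 for the 1GB bit. */
#define cap_1gb_raw(caps)  ((caps) & CAP_SUPERPAGE_1GB)

/* Proper boolean: !! collapses any non-zero value to exactly 1. */
#define cap_has_1gb(caps)  (!!((caps) & CAP_SUPERPAGE_1GB))
#define cap_has_2mb(caps)  (!!((caps) & CAP_SUPERPAGE_2MB))
```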

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86: record xsave features in c->x86_capabilities
Andrew Cooper [Wed, 23 Sep 2015 09:15:05 +0000 (11:15 +0200)]
x86: record xsave features in c->x86_capabilities

Convert existing cpu_has_x??? to being functions of boot_cpu_data
(matching the prevailing style), and mask out unsupported features.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/p2m: add PoD accounting to set_typed_p2m_entry()
Jan Beulich [Wed, 23 Sep 2015 09:14:05 +0000 (11:14 +0200)]
x86/p2m: add PoD accounting to set_typed_p2m_entry()

While neither PoD together with pass-through nor PVH are currently
supported we still shouldn't leave in place such latent issues.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86: shorten debug key 'u' output
Jan Beulich [Wed, 23 Sep 2015 09:13:21 +0000 (11:13 +0200)]
x86: shorten debug key 'u' output

... by grouping sequences of contiguous CPUs.
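
The grouping idea can be sketched as below. This is not the code from the
patch: the function name, the sorted-array input and the "first-last"
output format are assumptions made for the example.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format a sorted list of CPU numbers into "buf",
 * collapsing runs of consecutive values into "first-last".
 * Returns the number of characters written (excluding the NUL).
 */
size_t format_cpu_ranges(const unsigned int *cpus, size_t n,
                         char *buf, size_t len)
{
    size_t i = 0, used = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    while (i < n) {
        size_t j = i;
        int ret;

        /* Extend the run while the next CPU is contiguous. */
        while (j + 1 < n && cpus[j + 1] == cpus[j] + 1)
            j++;

        if (j > i)
            ret = snprintf(buf + used, len - used, "%s%u-%u",
                           used ? "," : "", cpus[i], cpus[j]);
        else
            ret = snprintf(buf + used, len - used, "%s%u",
                           used ? "," : "", cpus[i]);

        if (ret < 0 || (size_t)ret >= len - used) {
            buf[used] = '\0';   /* drop a partially written group */
            break;
        }
        used += (size_t)ret;
        i = j + 1;
    }

    return used;
}
```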

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agomemory: polish populate_physmap()
Jan Beulich [Wed, 23 Sep 2015 09:12:24 +0000 (11:12 +0200)]
memory: polish populate_physmap()

Adjust types, avoid a NULL check for a case where it's not needed, and
simplify setting a variable on the alternative path.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/xsm: Make p->policyvers be a local variable (ver) to shut up GCC 5.1.1 warnings.
Konrad Rzeszutek Wilk [Wed, 16 Sep 2015 19:57:27 +0000 (15:57 -0400)]
xen/xsm: Make p->policyvers be a local variable (ver) to shut up GCC 5.1.1 warnings.

policydb.c: In function ‘user_read’:
policydb.c:1443:26: error: ‘buf[2]’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
         usrdatum->bounds = le32_to_cpu(buf[2]);
                          ^
cc1: all warnings being treated as errors

Which (as Andrew mentioned) is because GCC cannot assume
that 'p->policyvers' has the same value between checks.

We make it local, optimize the name to 'ver' and the warnings go away.
We also update another call site with this modification to
make it more in line with the rest of the functions.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
9 years agoIOMMU: complete/correct comment explaining "iommu=" sub-options
Jan Beulich [Tue, 22 Sep 2015 10:48:43 +0000 (12:48 +0200)]
IOMMU: complete/correct comment explaining "iommu=" sub-options

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper@citrix.com>
9 years agox86/NPT: always return proper order value from p2m_pt_get_entry()
Jan Beulich [Tue, 22 Sep 2015 10:45:32 +0000 (12:45 +0200)]
x86/NPT: always return proper order value from p2m_pt_get_entry()

This is so that callers can determine what range of address space would
get altered by a corresponding "set".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/EPT: always return proper order value from ept_get_entry()
Jan Beulich [Tue, 22 Sep 2015 10:45:03 +0000 (12:45 +0200)]
x86/EPT: always return proper order value from ept_get_entry()

This is so that callers can determine what range of address space would
get altered by a corresponding "set".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agoimprove x86's alloc_vcpu_guest_context()
Andrew Cooper [Tue, 22 Sep 2015 10:42:21 +0000 (12:42 +0200)]
improve x86's alloc_vcpu_guest_context()

This essentially reverts c/s 2037f2adb "x86: introduce
alloc_vcpu_guest_context()", including the newer arm bits, but achieves
the same end goal by using the newer vmalloc() infrastructure.

For both x86 and ARM, {alloc,free}_vcpu_guest_context() become arch-local
static inlines (which avoids a call into a separate translation unit).
This also removes an x86 scalability limit when compiling with a large
NR_CPUS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoMAINTAINERS: update VT-x maintainers
Kevin Tian [Mon, 21 Sep 2015 14:15:13 +0000 (16:15 +0200)]
MAINTAINERS: update VT-x maintainers

Eddie will not act as a VT-x maintainer anymore. So remove
him from the list.

Signed-off-by: Kevin Tian <kevin.tian@intel.com>
9 years agobuild: drop unused SUBARCH variable
Doug Goldstein [Mon, 21 Sep 2015 14:14:19 +0000 (16:14 +0200)]
build: drop unused SUBARCH variable

This variable appears to be unused throughout the code base.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
9 years agotools/libxc: arm: Check the index before accessing the bank
Julien Grall [Thu, 17 Sep 2015 17:36:36 +0000 (18:36 +0100)]
tools/libxc: arm: Check the index before accessing the bank

When creating a guest with more than 3GB of memory, both banks will be
used and the loop will overrun. The code will fail later on because
Xen will refuse to populate the region:

domainbuilder: detail: xc_dom_devicetree_mem: called
domainbuilder: detail: xc_dom_mem_init: mem 3096 MB, pages 0xc1800 pages, 4k each
domainbuilder: detail: xc_dom_mem_init: 0xc1800 pages
domainbuilder: detail: xc_dom_boot_mem_init: called
domainbuilder: detail: set_mode: guest xen-3.0-aarch64, address size 64
domainbuilder: detail: xc_dom_malloc            : 14384 kB
domainbuilder: detail: populate_guest_memory: populating RAM @0000000040000000-0000000100000000 (3072MB)
domainbuilder: detail: populate_one_size: populated 0x3/0x3 entries with shift 18
domainbuilder: detail: populate_guest_memory: populating RAM @0000000200000000-0000000201800000 (24MB)
domainbuilder: detail: populate_one_size: populated 0xc/0xc entries with shift 9
domainbuilder: detail: populate_guest_memory: populating RAM @0000007fad41c000-0007fb39dd42c000 (2141954816MB)
domainbuilder: detail: populate_one_size: populated 0x100/0x1e4 entries with shift 0
domainbuilder: detail: populate_guest_memory: Not enough RAM

This is because we are currently accessing the bank before checking the
validity of the index. AFAICT, on Debian Jessie, the compiler (gcc 4.9.2)
assumes that it's not necessary to verify the index because it's used
before. This is a valid assumption because the operands of && are
evaluated from left to right.

Re-order the checks to verify the validity of the index before accessing
the bank.
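
The re-ordering can be illustrated with the sketch below. The names and the
bank count are assumptions for the example and are not taken from libxc.
With the operands swapped (size[i] != 0 && i < NR_BANKS), size[NR_BANKS]
would be read, which is undefined behaviour and allows the compiler to drop
the bounds check altogether.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_BANKS 2 /* assumed number of guest RAM banks for this sketch */

/*
 * Hypothetical helper: count how many leading banks hold RAM.
 * The index is validated *before* size[i] is read, so the loop can
 * never touch size[NR_BANKS].
 */
size_t count_populated_banks(const uint64_t size[NR_BANKS])
{
    size_t i;

    for (i = 0; i < NR_BANKS && size[i] != 0; i++)
        ;

    return i;
}
```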

The problem has been present since the introduction of the multi-bank
feature in commit 45d9867837f099e9eed4189dac5ed39d1fe2ed49 "tools: arm:
prepare domain builder for multiple banks of guest RAM".

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen/arm: vgic-v2: Map the GIC virtual CPU interface with the correct size
Julien Grall [Thu, 17 Sep 2015 18:00:03 +0000 (19:00 +0100)]
xen/arm: vgic-v2: Map the GIC virtual CPU interface with the correct size

On GICv2, the GIC virtual CPU interface is at minimum 8KB. Due to a
necessary quirk for GICs using a 64KB stride, we map the region in two
steps.
The first mapping is 4KB and the second one is 8KB, i.e. 12KB in total,
although the minimum supported (and widely used) size is 8KB. This means
that we are mapping 4KB more than needed to any guest using GICv2.

While this looks scary at first glance, the GIC virtual CPU interface is
most frequently at the end of the GIC I/O region. So we will most likely
map an unused I/O region or a mirrored version of GICV for platforms
using a 64KB stride.

Nonetheless, fix the second mapping to only map 4KB.

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: handle read-only drives with qemu-xen
Stefano Stabellini [Tue, 15 Sep 2015 09:52:14 +0000 (10:52 +0100)]
libxl: handle read-only drives with qemu-xen

The current libxl code doesn't deal with read-only drives at all.

Upstream QEMU and qemu-xen only support read-only cdrom drives: make
sure to specify "readonly=on" for cdrom drives and return error in case
the user requested a non-cdrom read-only drive.

This is XSA-142, discovered by Lin Liu
(https://bugzilla.redhat.com/show_bug.cgi?id=1257893).

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agoINSTALL: Mention MINIOS_UPSTREAM_URL
Ian Campbell [Thu, 17 Sep 2015 16:30:50 +0000 (17:30 +0100)]
INSTALL: Mention MINIOS_UPSTREAM_URL

All the other ones seem to be there.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: Migration feature document
Andrew Cooper [Tue, 15 Sep 2015 13:54:27 +0000 (14:54 +0100)]
docs: Migration feature document

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agodocs: Template for feature documents
Andrew Cooper [Tue, 15 Sep 2015 13:54:26 +0000 (14:54 +0100)]
docs: Template for feature documents

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: ensure xs transaction is initialised in libxl__device_pci_add_xenstore
Chunyan Liu [Wed, 16 Sep 2015 06:16:09 +0000 (14:16 +0800)]
libxl: ensure xs transaction is initialised in libxl__device_pci_add_xenstore

Running "xl pci-attach <domain> <pci_device>" a second time fails:
xl: libxl_xshelp.c:209: libxl__xs_transaction_start: Assertion `!*t' failed.
Aborted

To fix that, initialize xs_transaction to avoid libxl__xs_transaction_start
assertion error.

Signed-off-by: Chunyan Liu <cyliu@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- updated commit message ]

9 years agotools/xen-mceinj: Pass in GPA when injecting through MSR_MCI_ADDR
Haozhong Zhang [Wed, 16 Sep 2015 05:35:15 +0000 (13:35 +0800)]
tools/xen-mceinj: Pass in GPA when injecting through MSR_MCI_ADDR

This patch removes the address translation in xen-mceinj which
translates the guest physical address passed-in through the argument of
'-p' to the host machine address. Instead, xen-mceinj now passes a flag
MC_MSRINJ_F_GPADDR to ask do_mca() in the hypervisor to do this
translation.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agox86/mce: translate passed-in GPA to host machine address
Haozhong Zhang [Wed, 16 Sep 2015 09:40:26 +0000 (11:40 +0200)]
x86/mce: translate passed-in GPA to host machine address

This patch adds a new flag MC_MSRINJ_F_GPADDR to
xen_mc_msrinject.mcinj_flags, and makes do_mca() translate the
guest physical address passed-in through
xen_mc_msrinject.mcinj_msr[i].value to the host machine address if
this flag is present.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Christoph Egger <chegger@amazon.de>
9 years agotools/xen-mceinj: fix code style
Haozhong Zhang [Wed, 16 Sep 2015 09:40:16 +0000 (11:40 +0200)]
tools/xen-mceinj: fix code style

Remove trailing whitespaces in xen-mceinj.c.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Christoph Egger <chegger@amazon.de>
9 years agox86/mce: fix code style
Haozhong Zhang [Wed, 16 Sep 2015 09:39:16 +0000 (11:39 +0200)]
x86/mce: fix code style

Remove trailing whitespaces and fix indentations in mce.c and xen_mca.h.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Acked-by: Christoph Egger <chegger@amazon.de>
9 years agox86/sysctl: don't clobber memory if NCAPINTS > ARRAY_SIZE(pi->hw_cap)
Andrew Cooper [Wed, 16 Sep 2015 09:22:00 +0000 (11:22 +0200)]
x86/sysctl: don't clobber memory if NCAPINTS > ARRAY_SIZE(pi->hw_cap)

There is no current problem, as both NCAPINTS and pi->hw_cap are 8 entries,
but the limit should be calculated appropriately so as to avoid hypervisor
stack corruption if the two do get out of sync.
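
One way to express such a limit is sketched below. The structure is a
stand-in and the function name is hypothetical; only NCAPINTS, hw_cap and
the value 8 come from the text above.

```c
#include <stdint.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define MIN(a, b)     ((a) < (b) ? (a) : (b))

#define NCAPINTS 8 /* value quoted in the commit message */

struct physinfo_sketch {   /* stand-in for the real structure */
    uint32_t hw_cap[8];
};

/*
 * Copy capability words into pi->hw_cap without ever writing more
 * entries than the destination holds, whatever NCAPINTS becomes.
 * Returns the number of entries copied.
 */
size_t copy_hw_cap(struct physinfo_sketch *pi, const uint32_t caps[NCAPINTS])
{
    size_t n = MIN((size_t)NCAPINTS, ARRAY_SIZE(pi->hw_cap));

    memset(pi->hw_cap, 0, sizeof(pi->hw_cap));
    memcpy(pi->hw_cap, caps, n * sizeof(pi->hw_cap[0]));

    return n;
}
```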

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agovtd: correct loglevel when check group devices
Tiejun Chen [Wed, 16 Sep 2015 09:20:54 +0000 (11:20 +0200)]
vtd: correct loglevel when check group devices

Since commit 3848058e7dd6 (vtd/iommu: permit group devices to
passthrough in relaxed mode) was introduced, we always print the
message as XENLOG_G_WARNING, but that is not correct in the case of
strict mode. So make the log level of this message depend on the
specific mode.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
9 years agox86/MSI: fail if no hardware support
Jan Beulich [Wed, 16 Sep 2015 09:20:27 +0000 (11:20 +0200)]
x86/MSI: fail if no hardware support

This is to guard against buggy callers (luckily Dom0 only) invoking
the respective hypercall for a device not being MSI-capable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/boot: remove unneeded instruction
Daniel Kiper [Wed, 16 Sep 2015 09:18:38 +0000 (11:18 +0200)]
x86/boot: remove unneeded instruction

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agoarm: reduce power use by contented spin locks with WFE/SEV
David Vrabel [Mon, 3 Aug 2015 11:29:19 +0000 (12:29 +0100)]
arm: reduce power use by contented spin locks with WFE/SEV

Instead of cpu_relax() while spinning and observing the ticket head,
introduce arch_lock_relax() which executes a WFE instruction.  After
the ticket head is changed call arch_lock_signal() to execute an SEV
instruction (with the required DSB first) to wake any spinners.

This should improve power consumption when locks are contended and
spinning.

For consistency also move arch_lock_(acquire|release)_barrier to
asm/spinlock.h.

Booted the result on arm32 (Midway) and arm64 (Mustang). Build test
only on amd64.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[ijc: add barrier, rename as arch_lock_*, move arch_lock_*_barrier, test]
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
9 years agoxen/arm: hvm_domain drop unused field instropection_enabled
Julien Grall [Mon, 14 Sep 2015 15:30:38 +0000 (16:30 +0100)]
xen/arm: hvm_domain drop unused field instropection_enabled

Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxl: tighten parsing of "irq" and "iomem" list elements
Jan Beulich [Mon, 14 Sep 2015 13:53:27 +0000 (07:53 -0600)]
xl: tighten parsing of "irq" and "iomem" list elements

While "ioport" list element parsing already validates that the entire
input string got consumed, its two siblings so far didn't.
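
The kind of check meant here can be sketched with strtoul(). This is a
generic illustration, not the xl parser itself, and the helper name is
hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "str" as an unsigned number and succeed
 * only if the whole string was consumed.
 */
bool parse_whole_ulong(const char *str, unsigned long *out)
{
    char *end;

    errno = 0;
    *out = strtoul(str, &end, 0);

    /* Reject empty input, overflow, and trailing characters. */
    return end != str && errno == 0 && *end == '\0';
}
```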

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: slightly refine pci-assignable-{add, remove} handling
Jan Beulich [Thu, 10 Sep 2015 12:36:54 +0000 (06:36 -0600)]
libxl: slightly refine pci-assignable-{add, remove} handling

While it appears to be intentional for "xl pci-assignable-remove" to
not re-bind the original driver by default (requires the -r option),
permanently losing the information which driver was originally used
seems bad. Make "add; remove; add; remove -r" re-bind the original
driver by allowing "remove" to delete the information only upon
successful re-bind.

In the course of this I also noticed that binding information is lost
when upon first "add" pciback isn't loaded yet, due to its presence not
being checked for early enough. Adjust pciback_dev_is_assigned()
accordingly, and properly distinguish "yes" and "error" returns in the
"add" case (removing a redundant error message from the "remove" path
for consistency).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agox86/PoD: use clear_domain_page()
Jan Beulich [Mon, 14 Sep 2015 11:40:04 +0000 (13:40 +0200)]
x86/PoD: use clear_domain_page()

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/p2m: fix mismatched unlock
Jan Beulich [Mon, 14 Sep 2015 11:39:19 +0000 (13:39 +0200)]
x86/p2m: fix mismatched unlock

Luckily, due to gfn_unlock() currently mapping to p2m_unlock(), this is
only a cosmetic issue right now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agovtd/iommu: permit group devices to passthrough in relaxed mode
Tiejun Chen [Mon, 14 Sep 2015 11:38:02 +0000 (13:38 +0200)]
vtd/iommu: permit group devices to passthrough in relaxed mode

Currently we don't allow passing through any group devices which are
sharing the same RMRR entry since it would break security among VMs. And
indeed, we expect we can figure out a better way to handle this kind
of case completely.

But before the group assignment gets implemented, we might make this
permission dependent on our RMRR policy. So, now it would be allowed
in the relaxed mode.

Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxl: handle empty vnuma configuration
Wei Liu [Fri, 11 Sep 2015 13:50:09 +0000 (14:50 +0100)]
xl: handle empty vnuma configuration

When the user specifies vnuma = [], we need to skip the whole parser
function, otherwise the parser sets b_info->max_memkb to a garbage value.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxl/libxl: disallow saving a guest with vNUMA configured
Wei Liu [Wed, 9 Sep 2015 16:11:24 +0000 (17:11 +0100)]
xl/libxl: disallow saving a guest with vNUMA configured

This is because the migration stream does not preserve node information.

Note this is not a regression for migration v2 vs legacy migration
because neither of them preserves node information.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- some grammar fixes to the doc and fixed a comment ]

9 years agolibxc: introduce xc_domain_getvnuma
Wei Liu [Fri, 11 Sep 2015 13:50:07 +0000 (14:50 +0100)]
libxc: introduce xc_domain_getvnuma

A simple wrapper for XENMEM_get_vnumainfo.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxl: format fd flags with 0x since they are hex.
Ian Campbell [Fri, 11 Sep 2015 14:19:54 +0000 (15:19 +0100)]
libxl: format fd flags with 0x since they are hex.

Commit 93f5194e7270 "libxl: clear O_NONBLOCK|O_NDELAY on migration fd
and reinstate afterwards" added some logging of the fcntl F_GETFL flags
as %x without a 0x prefix to make it clear the numbers are hex. Fix
this along with an inadvertent logging of the fd itself as hex.
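
As a small illustration of the intended formatting (the message text and
the function name are made up for this sketch and are not libxl's):

```c
#include <stdio.h>

/*
 * Hypothetical helper: format an fd and its flags for a log message.
 * The fd is printed in decimal; the flags are printed in hex with an
 * explicit 0x prefix so the base is unambiguous.
 */
int format_fd_flags(char *buf, size_t len, int fd, int flags)
{
    return snprintf(buf, len, "fd %d flags 0x%x", fd, (unsigned int)flags);
}
```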

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/hvm: fix saved pmtimer and hpet values
Kouya Shimura [Fri, 11 Sep 2015 14:24:56 +0000 (16:24 +0200)]
x86/hvm: fix saved pmtimer and hpet values

The ACPI PM timer is sometimes broken on live migration, since
vcpu->arch.hvm_vcpu.guest_time is always zero in modes other than
"delay for missed ticks mode". Even in "delay for missed ticks mode",
vcpu's guest_time field is not valid (i.e. zero) when
the state of vcpu is "blocked". (see pt_save_timer function)

The original author (Tim Deegan) of pmtimer_save() must have intended
that it saves the last scheduled time of the vcpu. Unfortunately this
bug was already present at that point. FYI, there was no timer mode
other than "delay for missed ticks mode" back then.

For consistency with HPET, pmtimer_save() should refer to
hvm_get_guest_time() to update the counter, just as hpet_save() does.

Without this patch, the clock of Windows Server 2012R2 without HPET
might leap forward several minutes on live migration.

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Retain use of ->arch.hvm_vcpu.guest_time when non-zero. Do the inverse
adjustment for vHPET.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: Kouya Shimura <kouya@jp.fujitsu.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: clear O_NONBLOCK|O_NDELAY on migration fd and reinstate afterwards
Ian Campbell [Fri, 11 Sep 2015 10:42:51 +0000 (11:42 +0100)]
libxl: clear O_NONBLOCK|O_NDELAY on migration fd and reinstate afterwards

The fd passed to us by libvirt for both save and restore has at least
O_NONBLOCK set, which libxl does not expect and therefore fails to
handle any EAGAIN which might arise.

This has been observed with migration v2, but if v1 used to work I
think that would just be by luck and/or coincidence.

Unix convention (and the principle of least surprise) is usually to
ensure that an fd has no "strange" properties, such as being
non-blocking, when handing it to another component.

However for the convenience of the application arrange instead for
libxl to clear any unexpected flags on the file descriptors it is
given for save or restore and restore them to their original state at
the end. O_NDELAY could be similarly problematic so clear that as
well as O_NONBLOCK.

To do this, introduce a pair of new helper functions, one to modify+save
the flags and another to restore them, and call them in the appropriate
places.
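
A pair of helpers along these lines could look like the sketch below. The
names and signatures are hypothetical and do not match the libxl functions
added by the patch; only the fcntl() calls are standard.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: clear the "mask" bits in the status flags of fd
 * and store the original flags in *saved.
 * Returns 0 on success, -1 on failure (errno is set by fcntl).
 */
int fd_flags_clear(int fd, int mask, int *saved)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;

    *saved = flags;
    if (!(flags & mask))
        return 0;   /* nothing to clear */

    return fcntl(fd, F_SETFL, flags & ~mask) == -1 ? -1 : 0;
}

/* Hypothetical helper: put back the flags saved by fd_flags_clear(). */
int fd_flags_restore(int fd, int saved)
{
    return fcntl(fd, F_SETFL, saved) == -1 ? -1 : 0;
}
```

A caller would pass O_NONBLOCK|O_NDELAY as the mask before the save or
restore operation and call fd_flags_restore() once it has finished.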

The migration v1 code appeared to do some things with O_NONBLOCK in
the checkpoint case. Migration v2 doesn't seem to do so, and in any
case I wouldn't expect it to be relying on libvirt's setting of
O_NONBLOCK when xl doesn't use that flag.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Cc: Jim Fehlig <jfehlig@suse.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Cc: Yang Hongyang <yanghy@cn.fujitsu.com>
9 years agoxen: arm: Support <32MB frametables
Chris Brand [Fri, 21 Aug 2015 21:30:37 +0000 (14:30 -0700)]
xen: arm: Support <32MB frametables

setup_frametable_mappings() rounds frametable_size up to a multiple
of 32MB. This is wasteful on systems with less than 4GB of RAM,
although it does allow the "contig" bit to be set in the PTEs.

Where the frametable is less than 32MB in size, instead round up
to a multiple of 2MB, not setting the "contig" bit in the PTEs.
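
The rounding rule can be sketched as follows. The function name and the
use_contig output are hypothetical; only the 2MB and 32MB granularities
come from the description above.

```c
#include <stdint.h>

#define MB(x)         ((uint64_t)(x) << 20)
#define ROUNDUP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/*
 * Hypothetical helper: return the rounded frametable size and report
 * whether the "contig" bit could be used for the mapping.
 */
uint64_t frametable_round(uint64_t size, int *use_contig)
{
    if (size < MB(32)) {
        /* Small frametable: 2MB granularity, no contiguous hint. */
        *use_contig = 0;
        return ROUNDUP(size, MB(2));
    }

    /* Large frametable: keep the 32MB granularity and the contig bit. */
    *use_contig = 1;
    return ROUNDUP(size, MB(32));
}
```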

Signed-off-by: Chris Brand <chris.brand@broadcom.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen: arm: Be explicit about bit values in mfn_to_xen_entry()
Chris Brand [Thu, 10 Sep 2015 18:56:29 +0000 (11:56 -0700)]
xen: arm: Be explicit about bit values in mfn_to_xen_entry()

Ensure that every relevant bit is given an explicit value.
This has no effect on the generated code, but makes it
a little easier to follow.

Reported-by: Julien Grall <julien.grall@citrix.com>
Signed-off-by: Chris Brand <chris.brand@broadcom.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoxen: arm re-order assignments in mfn_to_xen_entry()
Chris Brand [Thu, 10 Sep 2015 18:56:28 +0000 (11:56 -0700)]
xen: arm re-order assignments in mfn_to_xen_entry()

Shuffle lines around so that the assignments in mfn_to_xen_entry()
occur in the same order as the bits are declared in lpae_pt_t.
This makes it easier to see which ones are never given a value.
No change in behaviour.

Also fix a minor comment typo.

Signed-off-by: Chris Brand <chris.brand@broadcom.com>
Reviewed-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agoQEMU_TAG update
Ian Jackson [Fri, 11 Sep 2015 10:18:53 +0000 (11:18 +0100)]
QEMU_TAG update

9 years agolibxl: add LIBXL_DEVICE_MODEL_SAVE_FILE
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:58:26 +0000 (16:58 +0200)]
libxl: add LIBXL_DEVICE_MODEL_SAVE_FILE

Use this in libxl_dm instead of hard-coding.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
9 years agolibxc: support XEN_DOMCTL_soft_reset operation
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:58:17 +0000 (16:58 +0200)]
libxc: support XEN_DOMCTL_soft_reset operation

Introduce xc_domain_soft_reset() function supporting XEN_DOMCTL_soft_reset.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agoarch-specific hooks for domain_soft_reset()
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:57:40 +0000 (16:57 +0200)]
arch-specific hooks for domain_soft_reset()

x86-specific hook cleans up the pirq-emuirq mappings, destroys all ioreq
servers and replaces the shared_info frame with an empty page to support
a subsequent XENMAPSPACE_shared_info call.

ARM-specific hook is -ENOSYS for now.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agoflask: DOMCTL_soft_reset support
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:54:48 +0000 (16:54 +0200)]
flask: DOMCTL_soft_reset support

Add new soft_reset vector to domain2 class, add it to create_domain
in the default policy.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
9 years agointroduce XEN_DOMCTL_soft_reset
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:54:09 +0000 (16:54 +0200)]
introduce XEN_DOMCTL_soft_reset

New domctl resets state for a domain allowing it to 'start over': register
vcpu_info, switch to FIFO ABI for event channels. Grants that are still
active are logged to help debug misbehaving backends.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agogrant_table: implement grant_table_warn_active_grants()
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:53:36 +0000 (16:53 +0200)]
grant_table: implement grant_table_warn_active_grants()

Log the first 10 active grants for a domain. This function is going to be
used for soft reset; active grants on this path usually mean misbehaving
backends refusing to release their mappings on shutdown. We need that in
addition to the already existing 'g' keyhandler, as such misbehaving
backends can cause a domain to crash right after the soft reset operation
and the 'g' option won't be available in this case.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agoevtchn: make evtchn_reset() ready for soft reset
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:53:08 +0000 (16:53 +0200)]
evtchn: make evtchn_reset() ready for soft reset

We need to close all event channels so the domain performing soft reset
will be able to reopen them.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
9 years agoxl: introduce enum domain_restart_type
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:52:58 +0000 (16:52 +0200)]
xl: introduce enum domain_restart_type

In preparation for adding a new restart type (soft reset), put all
restart types into an enum.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agolibxl: support SHUTDOWN_soft_reset shutdown reason
Vitaly Kuznetsov [Thu, 10 Sep 2015 14:52:08 +0000 (16:52 +0200)]
libxl: support SHUTDOWN_soft_reset shutdown reason

Use letter 'S' to indicate a domain in such state. Introduce new
'on_soft_reset' action and default it to 'restart' for now.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>