xenbits.xensource.com Git - people/royger/xen.git/log
9 years agoxsplice, symbols: Implement fast symbol names -> virtual addresses lookup
Konrad Rzeszutek Wilk [Wed, 27 Apr 2016 15:29:40 +0000 (11:29 -0400)]
xsplice, symbols: Implement fast symbol names -> virtual addresses lookup

The current mechanism is geared towards fast virtual address ->
symbol names lookup. This is fine for the normal use cases
(BUG_ON, WARN_ON, etc), but for xSplice - where we need to find
hypervisor symbols - it is slow.

To understand this patch, the existing method is described first.
Readers already familiar with it can skip to 'NEW CODE:'.

HOW IT WORKS:

The symbol table lookup mechanism uses a simple encoding mechanism
where it extracts the common ASCII character sequences that the symbols use.

This saves us space. The lookup mechanism is geared towards looking
up symbols based on address. We have one 0..N (where N is
the number of symbols, so 6849 for example) table:

symbols_addresses[0..N]

And a 1-1 mapping (in a loose fashion) to the (encoded) symbols in a
symbols_names stream of size N.

The N is variable (more on that below).

The symbols_names are sorted based on symbols_addresses, which
means that the decoded entries inside symbols_names are not in
ascending or descending order.

There is also the encoding mechanism - the table of 255 entries
called symbols_token_index[]. And the symbols_token_table which
is a stream of ASCIIZ strings, such as (it really
is not a table as the entries are of variable length):

@0   .asciz  "credit"
@7   .asciz  "mask"
..
@300 .asciz  "S"

And the symbols_token_index:
@0        .short  0
@1        .short  7
@2        .short  12
@3        .short  16
...
@84       .short  300

The relationship between them is that the symbols_token_index
gives us the offset to symbols_token_table.

The symbols_names[] array is a stream of encoded values. Each value
follows the same pattern - <len> followed by <encoding values>.
And then another <len> followed by <encoding values>.

Hence to find the right one you need to read <len>, add <len>
(to skip over), read <len>, add <len>, and so on until one
finds the right tuple offset.

The <encoding values> are the indices into the symbols_token_index.

Meaning if you have:
  0x04, 0x54, 0xda, 0xe2, 0x74
  [4, 84, 218, 226, 116 in human numbering]

The 0x04 tells us that the symbol is four bytes long (so the next
symbol starts at offset 5). If we look up symbols_token_index[84] we get 300.
symbols_token_table[300] gets us the "S". And so on; the string eventually
ends up being decoded as 'S_stext'. The first character is the type,
then optionally followed by the filename (and # right after filename)
and then lastly the symbol, such as:

tvpmu_intel.c#core2_vpmu_do_interrupt
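
The decoding described above can be sketched roughly as follows. This is a
minimal illustration, not the hypervisor's code: the miniature token_table,
token_index and names arrays, and the expand_symbol helper, are hypothetical
stand-ins for the build-time generated tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature tables; the real ones are generated at build time. */
static const char token_table[] = "S\0" "_st\0" "ext\0" "T\0" "foo";
static const unsigned short token_index[] = { 0, 2, 6, 10, 12 };

/* Two <len><token indices> records: "S_stext" and "Tfoo". */
static const unsigned char names[] = { 3, 0, 1, 2,   2, 3, 4 };

/*
 * Decode the record at names[off] into out and return the offset of the
 * next record. Output is truncated to fit and always NUL-terminated.
 */
static size_t expand_symbol(size_t off, char *out, size_t out_len)
{
    const unsigned char *data = &names[off];
    unsigned int len = *data++;
    size_t used = 0;

    assert(out_len > 0);
    while (len--) {
        /* Each encoded byte selects one token via the index table. */
        const char *tok = &token_table[token_index[*data++]];

        while (*tok != '\0' && used + 1 < out_len)
            out[used++] = *tok++;
    }
    out[used] = '\0';

    return off + names[off] + 1;
}
```

Calling expand_symbol(0, ...) on this toy data yields "S_stext" and returns 4,
the offset of the next record.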

Keep in mind that there are two fixed sized tables:
symbols_addresses[0..symbols_num_syms], and
symbols_markers[0..symbols_num_syms/256].

The symbols_markers is used to speed searching for the right address.
It gives us the offsets within symbols_names that start at the <len><encoded value>.

The way to find a symbol based on the address is:
1) Figure out the 'tuple offset' from symbols_addresses[0..symbols_num_syms].
   This table is sorted by virtual addresses so finding the value is simple.
2) Get starting offset of symbols_names by retrieving value of
   symbols_markers['tuple offset' / 256].
3) Iterate up to 'tuple offset & 255' in symbols_names stream starting
   at 'offset'.
4) Decode the <len><encoded value>
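
Steps 1) to 3) could look something like the sketch below. All names and data
are hypothetical, and the marker stride is shrunk to one marker per 2 symbols
(MARKER_SHIFT of 1) purely so that a four-entry toy table exercises the skip
loop; the stride described above is far larger.

```c
#include <stddef.h>

#define NUM_SYMS     4u
#define MARKER_SHIFT 1u                          /* toy stride: 2 symbols */
#define MARKER_MASK  ((1u << MARKER_SHIFT) - 1u)

/* Sorted by virtual address, as step 1 requires. */
static const unsigned long addrs[NUM_SYMS] = { 0x1000, 0x1040, 0x1100, 0x1200 };

/* <len><bytes> records; the payload bytes are arbitrary in this sketch. */
static const unsigned char names[] = { 1, 10,  2, 11, 12,  1, 13,  3, 14, 15, 16 };

/* Offsets into names[] of symbols 0 and 2 (every second symbol). */
static const unsigned int markers[] = { 0, 5 };

/* Step 1: index of the last symbol whose address is <= addr. */
static size_t addr_to_index(unsigned long addr)
{
    size_t low = 0, high = NUM_SYMS;

    while (high - low > 1) {
        size_t mid = low + (high - low) / 2;

        if (addrs[mid] <= addr)
            low = mid;
        else
            high = mid;
    }
    return low;
}

/* Steps 2 and 3: jump to the nearest marker, then skip whole records. */
static size_t index_to_name_offset(size_t pos)
{
    size_t off = markers[pos >> MARKER_SHIFT];

    for (size_t i = 0; i < (pos & MARKER_MASK); i++)
        off += names[off] + 1u;

    return off;
}
```

The offset returned by index_to_name_offset() is where step 4 would start
decoding.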

This however does not work very well if we want to search the other
way - we have the symbol name and want to find the address.

NEW CODE:

To make that work we add one fixed size table called symbols_sorted_offsets, in
which each entry has two elements: offset in symbol stream, offset in the symbol-address table.

This whole array is sorted on the original symbol name during build-time
(in case of collision we also take into account the type).

The values are for example:

symbols_sorted_offsets:
    .long 83363, 6302 # [.bss, len=5]
    .long 80459, 6084 # [.data, len=5]
..
[The # added for clarity]

Which makes it very easy to index into symbols_names and also
symbols_addresses (or symbols_offsets).

Searching for symbols is simplified as we can do a binary search
on symbols_sorted_offsets. Since the symbols are sorted it takes on
average 13 calls to symbols_expand_symbol.
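
A rough sketch of that binary search follows. It is illustrative only: the
name blob holds plain NUL-terminated strings instead of the compressed stream
(so strcmp stands in for symbols_expand_symbol), and every identifier and
value below is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Address-ordered toy data. */
static const unsigned long addrs[] = { 0x1000, 0x1040, 0x1100, 0x1200 };
static const char name_blob[] = "_stext\0" "do_work\0" "alloc_page\0" "panic";

/* One entry per symbol: offset into the name stream, index into addrs[]. */
struct sorted_entry {
    unsigned int stream;
    unsigned int addr;
};

/* Sorted by symbol name at "build time". */
static const struct sorted_entry sorted[] = {
    {  0, 0 },   /* _stext     */
    { 15, 2 },   /* alloc_page */
    {  7, 1 },   /* do_work    */
    { 26, 3 },   /* panic      */
};

/* Returns 0 and fills *addr on success, -1 if the name is unknown. */
static int lookup_by_name(const char *name, unsigned long *addr)
{
    size_t low = 0, high = sizeof(sorted) / sizeof(sorted[0]);

    while (low < high) {
        size_t mid = low + (high - low) / 2;
        int rc = strcmp(name, &name_blob[sorted[mid].stream]);

        if (rc == 0) {
            *addr = addrs[sorted[mid].addr];
            return 0;
        }
        if (rc < 0)
            high = mid;
        else
            low = mid + 1;
    }
    return -1;
}
```

Each probe costs one name decode, which is why the number of decodes grows
with the logarithm of the symbol count.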

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxsplice,symbols: Implement symbol name resolution on address.
Ross Lagerwall [Wed, 20 Apr 2016 20:19:27 +0000 (16:19 -0400)]
xsplice,symbols: Implement symbol name resolution on address.

If in the payload we do not have the old_addr we can resolve
the virtual address based on the UNDEFined symbols.

We also use a boolean flag, new_symbol, to track symbols. It is used
to support the following rules:

* A payload may introduce a new symbol
* A payload may override an existing symbol (introduced in Xen or another
  payload)
* Overriding symbols must exist in the symtab for backtraces.
* A payload must always link against the object which defines the new symbol.

Considering that payloads may be loaded in any order it would be incorrect to
link against a payload which simply overrides a symbol because you could end
up with a chain of jumps which is inefficient and may result in the expected
function not being executed.

Since the payload we get is a relocatable image (partially linked ELF file)
we have to match up the symbols. We follow the ELF visibility rules for that
and for local symbols do what binutils ld does.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.
Konrad Rzeszutek Wilk [Wed, 16 Sep 2015 14:37:12 +0000 (10:37 -0400)]
x86/xen_hello_world.xsplice: Test payload for patching 'xen_extra_version'.

This change demonstrates how to generate an xSplice ELF payload.

The idea here is that we want to patch in the hypervisor
the 'xen_extra_version' function with a function that will
return 'Hello World'. The 'xl info | grep extraversion'
will reflect the new value after the patching.

To generate this ELF payload file we need:
 - C code of the new code (xen_hello_world_func.c).
 - C code generating the .xsplice.funcs structure
   (xen_hello_world.c)
 - The address of the old code (xen_extra_version). We
   retrieve it by  using 'nm --defined' on xen-syms.
 - The size of the new and old code for which we use
   nm --defined -S on our code and xen-syms respectively.

There are two C files and one header file generated
during build. One could make this one C file if the
size of the newly patched function was known in
advance (or a random value was chosen).

There is also a strict order of compiling:
 1) xen_hello_world_func.c
 2) config.h - extract the size of the new function,
    the old function and the old function address.
 3) xen_hello_world.c - which contains the .xsplice.funcs
    structure.
 4) Link the object files into a xen_hello_world.xsplice file.

The use-case is simple:

$xen-xsplice load /usr/lib/debug/xen_hello_world.xsplice
$xen-xsplice list
 ID                                     | status
----------------------------------------+------------
xen_hello_world                           APPLIED
$xl info | grep extra
xen_extra              : Hello World
$xen-xsplice revert xen_hello_world
Performing revert: completed
$xen-xsplice unload xen_hello_world
Performing unload: completed
$xl info | grep extra
xen_extra              : -unstable

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com> [ARM]
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxsplice: Implement support for applying/reverting/replacing patches.
Ross Lagerwall [Wed, 27 Apr 2016 13:07:04 +0000 (09:07 -0400)]
xsplice: Implement support for applying/reverting/replacing patches.

Implement support for the apply, revert and replace actions.

To perform an action on a payload, the hypercall sets up a data
structure to schedule the work.  A hook is added in the reset_stack_and_jump
to check for work and execute it if needed (specifically we check a
per-cpu flag to make this as quick as possible).

In this way, patches can be applied with all CPUs idle and without
stacks.  The first CPU to run check_for_xsplice_work() becomes the
master and triggers a reschedule softirq to trigger all the other CPUs
to enter check_for_xsplice_work() with no stack.  Once all CPUs
have rendezvoused, all CPUs disable their IRQs and NMIs are ignored.
The system is then quiescent and the master performs the action.
After this, all CPUs enable IRQs and NMIs are re-enabled.

Note that it is unsafe to patch do_nmi and the xSplice internal functions.
Patching functions on NMI/MCE path is liable to end in disaster on x86.
This is not addressed in this patch and is mentioned in the
design doc as a further TODO.

The action to perform is one of:
- APPLY: For each function in the module, store the first arch-specific
  number of bytes of the old function and replace them with a jump to the
  new function. (on x86 it is 5 bytes, on ARM it will likely be 4 bytes).
- REVERT: Copy the previously stored bytes into the first arch-specific
  number of bytes of the old function (again, 5 bytes on x86).
- REPLACE: Revert each applied module and then apply the new module.
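
As an illustration of APPLY and REVERT on x86, the sketch below saves five
bytes and writes a near jump (opcode 0xE9 followed by a 32-bit displacement
relative to the end of the jump). It operates on an ordinary byte buffer; the
struct and function names are hypothetical, and a real implementation also has
to deal with page permissions and CPU synchronisation, which are omitted here.

```c
#include <stdint.h>
#include <string.h>

#define PATCH_INSN_SIZE 5   /* x86: 1 opcode byte + 4 displacement bytes */

/* Hypothetical per-function record for this sketch. */
struct patch_func {
    uint8_t *old_addr;             /* start of the function being replaced */
    const uint8_t *new_addr;       /* start of the replacement function */
    uint8_t undo[PATCH_INSN_SIZE]; /* bytes saved by APPLY for REVERT */
};

/* APPLY: save the original bytes, then write "jmp rel32" to new_addr. */
static void apply_patch(struct patch_func *f)
{
    /* The displacement is measured from the end of the 5-byte jump. */
    int32_t disp = (int32_t)(f->new_addr - (f->old_addr + PATCH_INSN_SIZE));

    memcpy(f->undo, f->old_addr, PATCH_INSN_SIZE);
    f->old_addr[0] = 0xe9;
    memcpy(&f->old_addr[1], &disp, sizeof(disp));
}

/* REVERT: put the saved bytes back. */
static void revert_patch(struct patch_func *f)
{
    memcpy(f->old_addr, f->undo, PATCH_INSN_SIZE);
}
```

REPLACE would then amount to calling revert_patch() for every applied record
before calling apply_patch() for the new module's records.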

To prevent a deadlock with any other barrier in the system, the master
will wait for up to 30ms before timing out.
Measurements found that patch application takes about 100 μs on a
72 CPU system, whether idle or fully loaded.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxsplice: Implement payload loading
Ross Lagerwall [Wed, 27 Apr 2016 13:01:51 +0000 (09:01 -0400)]
xsplice: Implement payload loading

Add support for loading xsplice payloads. This is somewhat similar to
the Linux kernel module loader, implementing the following steps:
- Verify the elf file.
- Parse the elf file.
- Allocate a region of memory mapped within a free area of
  [xen_virt_end, XEN_VIRT_END].
- Copy allocated sections into the new region. Split them in three
  regions - .text, .data, and .rodata. MUST have at least .text.
- Resolve section symbols. All other symbols must be absolute addresses.
  (Note that patch titled "xsplice,symbols: Implement symbol name resolution
   on address" implements that)
- Perform relocations.
- Secure the regions (.text,.data,.rodata) with proper permissions.

We capitalize on the vmalloc callback API (see patch titled:
"arm/x86/vmap: Add v[z|m]alloc_xen, and vm_init_type") to allocate
a region of memory within the [xen_virt_end, XEN_VIRT_END] for the code.

We also use the "x86/mm: Introduce modify_xen_mappings()"
to change the virtual address page-table permissions.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxsplice: Add helper elf routines
Ross Lagerwall [Fri, 19 Feb 2016 19:37:17 +0000 (14:37 -0500)]
xsplice: Add helper elf routines

Add Elf routines and data structures in preparation for loading an
xSplice payload.

We make an assumption that the max number of sections an ELF payload
can have is 64. We can in future make this be dependent on the
names of the sections and verifying against a list, but for right now
this suffices.

Also we add a whole lot of checks to make sure that the ELF payload
file is not corrupted and that the offsets do not point past the end of
the file.

For most of the checks we print a message if the hypervisor is built
with debug enabled.
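
The kind of checks meant here can be sketched as below. The helper names are
hypothetical and not taken from the patch; the point of the sketch is that the
offset check is written so that a huge offset or size cannot wrap around and
pass by accident.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_SECTIONS 64u   /* the section limit assumed by this patch */

/* True if [offset, offset + len) lies entirely inside the file. */
static bool range_in_file(uint64_t file_size, uint64_t offset, uint64_t len)
{
    /* Compare against the remaining space instead of computing offset + len. */
    return offset <= file_size && len <= file_size - offset;
}

/* True if the section count from the ELF header is within the limit. */
static bool section_count_ok(unsigned int shnum)
{
    return shnum <= MAX_SECTIONS;
}
```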

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoarm/x86/vmap: Add vmalloc_xen and vm_init_type
Konrad Rzeszutek Wilk [Tue, 26 Apr 2016 18:03:06 +0000 (14:03 -0400)]
arm/x86/vmap: Add vmalloc_xen and vm_init_type

For those users who want to use the virtual addresses that
are in the hypervisor's code/data region address space -
these three new functions allow that.

Implementation wise the vmap API keeps track of two virtual
address regions now:
 a) VMAP_VIRT_START
 b) Any provided virtual address space (need start and end).

The a) one is the default one and the existing behavior
for users of vmalloc, vmap, etc is the same.

If however one wishes to use the b) one only has to use
the vm_init_type to initialize and the vmzalloc_xen to utilize
it (vfree and vunmap are capable of searching both address spaces).

This allows users (such as xSplice) to provide their own
mechanism to change the page flags, and also use virtual
addresses closer to the hypervisor virtual addresses (at least
on x86) while not having to deal with the allocation of
pages.

For an example of a user, see patch titled "xsplice: Implement payload
loading", where we parse the payload's ELF relocations - which
are defined to be signed 32-bit (on x86) (max displacement hence
is 2GB virtual space, ARM32 is 128MB). The displacement of the
hypervisor virtual addresses to the vmalloc (on x86)
is more than 32-bits - which means that ELF relocations would
truncate the 33rd and 34th bits. Hence this alternate API.
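
The constraint can be made concrete with a small check of whether the distance
between two addresses fits a signed 32-bit relocation. This is only a sketch:
fits_rel32 is a hypothetical helper, and the conversion to int64_t assumes the
usual two's complement behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the displacement from place to target can be encoded in a
 * signed 32-bit relocation field (roughly +/- 2GB).
 */
static bool fits_rel32(uint64_t place, uint64_t target)
{
    /* Unsigned subtraction wraps; reinterpret the result as signed. */
    int64_t disp = (int64_t)(target - place);

    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With illustrative addresses (not taken from the source), a target 2MB above
the place fits, while a target tens of gigabytes below it does not, which is
the situation that motivates allocating payload memory close to the
hypervisor's own code.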

We also add extra checks in case the b) range has not been
initialized.

Part of this patch also removes the 'vm_alloc' and 'vm_free'
declarations as we do not have any users of them.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Suggested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@arm.com> [ARM]
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoarm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables...
Konrad Rzeszutek Wilk [Thu, 10 Mar 2016 21:35:50 +0000 (16:35 -0500)]
arm/x86: Use struct virtual_region to do bug, symbol, and (x86) exception tables lookup.

During execution of the hypervisor we have two regions of
executable code - _stext -> _etext, and _sinittext -> _einittext.

The latter is not needed after bootup.

We also have various built-in macros and functions to search
in between those two swaths depending on the state of the system.

That is either for bug_frames, exceptions (x86) or symbol
names for the instruction.

With xSplice in the picture - we need a mechanism for new payloads
to be searched as well for all of this.

Originally we had extra 'if (xsplice)...' but that gets
a bit tiring and does not hook up nicely.

This 'struct virtual_region' and virtual_region_list provide a
mechanism to search for the bug_frames, exception table,
and symbol names entries without having various calls in
other sub-components in the system.

Code which wishes to participate in bug_frames and exception table
entries search has to only use two public APIs:
 - register_virtual_region
 - unregister_virtual_region

to let the core code know.

If ->lookup_symbol is not set then the default internal symbol lookup
mechanism is used.
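
A much simplified sketch of the idea follows. The two function names come from
the description above, but the structure layout, the singly linked list and
the helper functions are assumptions made for illustration; the real structure
also carries bug frame and exception table pointers, and the real list needs
locking.

```c
#include <stddef.h>

/* Hypothetical, cut-down region descriptor. */
struct virtual_region {
    struct virtual_region *next;
    unsigned long start, end;                         /* [start, end) */
    const char *(*lookup_symbol)(unsigned long addr); /* may be NULL */
};

static struct virtual_region *region_list;

static void register_virtual_region(struct virtual_region *r)
{
    r->next = region_list;
    region_list = r;
}

static void unregister_virtual_region(struct virtual_region *r)
{
    for (struct virtual_region **pp = &region_list; *pp; pp = &(*pp)->next) {
        if (*pp == r) {
            *pp = r->next;
            r->next = NULL;
            return;
        }
    }
}

/* Stand-ins for the built-in lookup and a payload-provided one. */
static const char *default_lookup(unsigned long addr)
{
    (void)addr;
    return "core symbol";
}

static const char *payload_lookup(unsigned long addr)
{
    (void)addr;
    return "payload symbol";
}

/* Find the region containing addr and use its hook, else the default. */
static const char *symbol_for(unsigned long addr)
{
    for (const struct virtual_region *r = region_list; r; r = r->next) {
        if (addr >= r->start && addr < r->end)
            return r->lookup_symbol ? r->lookup_symbol(addr)
                                    : default_lookup(addr);
    }
    return NULL;
}
```

Callers such as the bug frame or backtrace code then only walk the list and
never need an xSplice-specific branch.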

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com> [ARM]
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen-xsplice: Tool to manipulate xsplice payloads
Konrad Rzeszutek Wilk [Wed, 16 Sep 2015 14:33:10 +0000 (10:33 -0400)]
xen-xsplice: Tool to manipulate xsplice payloads

A simple tool that allows a system admin to perform
basic xsplice operations:

 - Upload an xsplice file (with a unique name)
 - List all the xsplice payloads loaded.
 - Apply, revert, replace, or unload the payload using the
   unique name.
 - Do both - upload, and apply the payload in one go (load).
   Also will use the name of the file as the <name>

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxc: Implementation of XEN_XSPLICE_op in libxc
Konrad Rzeszutek Wilk [Mon, 25 Jan 2016 15:52:45 +0000 (10:52 -0500)]
libxc: Implementation of XEN_XSPLICE_op in libxc

The underlying toolstack code to do the basic
operations when using the XEN_XSPLICE_op syscalls:
 - upload the payload,
 - get status of a payload,
 - list all the payloads,
 - apply, check, replace, and revert the payload.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/xsplice: Hypervisor implementation of XEN_XSPLICE_op
Konrad Rzeszutek Wilk [Sat, 9 Apr 2016 00:19:34 +0000 (20:19 -0400)]
xen/xsplice: Hypervisor implementation of XEN_XSPLICE_op

The implementation does not actually do any patching.

It just adds the framework for doing the hypercalls,
keeping track of ELF payloads, and the basic operations:
 - query which payloads exist,
 - query for specific payloads,
 - check*1, apply*1, replace*1, and unload payloads.

*1: Which of course in this patch are nops.

The functionality is disabled on ARM until all arch
components are implemented.

Also by default it is disabled until the implementation
is in place.

We also use recursive spinlocks so that the find_payload
function does not need to have a 'lock' and 'non-lock' variant.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxsplice: Design document
Konrad Rzeszutek Wilk [Mon, 14 Sep 2015 13:05:11 +0000 (09:05 -0400)]
xsplice: Design document

A mechanism is required to binary patch the running hypervisor with new
opcodes that have come about primarily due to security updates.

This document describes the design of the API that would allow us to
upload binary patches to the hypervisor.

This document has been shaped by the input from:
  Martin Pohlack <mpohlack@amazon.de>
  Jan Beulich <jbeulich@suse.com>

Thank you!

Input-from: Martin Pohlack <mpohlack@amazon.de>
Input-from: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild: use C{C/XX} as HOSTC{C/XX} if we are not cross-compiling
Roger Pau Monné [Thu, 28 Apr 2016 13:11:19 +0000 (15:11 +0200)]
build: use C{C/XX} as HOSTC{C/XX} if we are not cross-compiling

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/vMSI-X: also snoop qword writes
Jan Beulich [Thu, 28 Apr 2016 13:10:45 +0000 (15:10 +0200)]
x86/vMSI-X: also snoop qword writes

... the high half of which may be a write to the Vector Control field.
This gets things in sync again with msixtbl_write().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/vMSI-X: add further checks to snoop logic
Jan Beulich [Thu, 28 Apr 2016 13:10:22 +0000 (15:10 +0200)]
x86/vMSI-X: add further checks to snoop logic

msixtbl_range(), like any other MMIO ->check() handler, may get called
with other than the base address of an access - avoid the snoop logic
considering those.

Also avoid considering vCPU-s not blocked in the hypervisor in
msixtbl_pt_register(), just to be on the safe side.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/HVM: fix forwarding of internally cached requests (part 2)
Jan Beulich [Thu, 28 Apr 2016 13:09:26 +0000 (15:09 +0200)]
x86/HVM: fix forwarding of internally cached requests (part 2)

Commit 96ae556569 ("x86/HVM: fix forwarding of internally cached
requests") wasn't quite complete: hvmemul_do_io() also needs to
propagate up the clipped count. (I really should have re-tested the
forward port resulting in the earlier change, instead of relying on the
testing done on the older version of Xen which the fix was first needed
for.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/HVM: fix emulation re-issue check
Jan Beulich [Thu, 28 Apr 2016 13:08:54 +0000 (15:08 +0200)]
x86/HVM: fix emulation re-issue check

?: has lower precedence than !=, hence parentheses are required here.
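
A small stand-alone illustration of the pitfall (the function and parameter
names are made up and unrelated to the actual emulation code):

```c
/* Parsed as (state != is_write) ? 1 : 2, which is rarely what is meant. */
static int without_parens(int state, int is_write)
{
    return state != is_write ? 1 : 2;
}

/* Compares state against the value selected by the conditional. */
static int with_parens(int state, int is_write)
{
    return state != (is_write ? 1 : 2);
}
```

For state = 2 and is_write = 0 the first form yields 1, because 2 != 0 is
true, while the second yields 0, because state equals the selected value 2.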

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agopublic/x86: remove HVMMEM_mmio_write_dm from the public interface
Yu Zhang [Thu, 28 Apr 2016 13:08:02 +0000 (15:08 +0200)]
public/x86: remove HVMMEM_mmio_write_dm from the public interface

HVMMEM_mmio_write_dm is removed for new xen interface versions, and
is replaced with type HVMMEM_unused. Attempts to set a page to this
type will return -EINVAL in xen after 4.7.0. And there will be no
pages with type p2m_mmio_write_dm, therefore HVMOP_get_mem_type will
never get the old type - HVMMEM_mmio_write_dm.

New approaches to write protect guest ram pages will be provided in
future patches.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild/xen: fix assembler instruction tests
Roger Pau Monné [Thu, 28 Apr 2016 13:07:37 +0000 (15:07 +0200)]
build/xen: fix assembler instruction tests

The current test performed in order to check if the assembler supports
certain instructions doesn't take into account the value of AFLAGS, which
when using clang contains the option that disables the integrated assembler
due to the lack of features.

As a result of this, the current instruction tests were performed against the
integrated assembler, but then at build time the non-integrated assembler
was used. If both have feature-parity, this is a non-issue, but we cannot
assume this.

Fix this by passing AFLAGS in the instruction test, and including the arch
Rules.mk makefile after AFLAGS is set.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agox86/time: fix gtime_to_gtsc for vtsc=1 PV guests
Jan Beulich [Thu, 28 Apr 2016 13:06:56 +0000 (15:06 +0200)]
x86/time: fix gtime_to_gtsc for vtsc=1 PV guests

For vtsc=1 PV guests, rdtsc is trapped and calculated from get_s_time()
using gtime_to_gtsc. Similarly the tsc_timestamp, part of struct
vcpu_time_info, is calculated from stime_local_stamp using
gtime_to_gtsc.

However gtime_to_gtsc can return 0, if time < vtsc_offset, which can
actually happen when gtime_to_gtsc is called passing stime_local_stamp
(the caller function is __update_vcpu_system_time).

In that case the pvclock protocol doesn't work properly and the guest is
unable to calculate the system time correctly. As a consequence when the
guest tries to set a timer event (for example calling the
VCPUOP_set_singleshot_timer hypercall), the event will be in the past
causing Linux to hang.

The purpose of the pvclock protocol is to allow the guest to calculate
the system_time in nanoseconds correctly. The guest calculates it as follows:

  from_vtsc_scale(rdtsc - vcpu_time_info.tsc_timestamp) + vcpu_time_info.system_time

Given that with vtsc=1:
  rdtsc = to_vtsc_scale(NOW() - vtsc_offset)
  vcpu_time_info.tsc_timestamp = to_vtsc_scale(vcpu_time_info.system_time - vtsc_offset)

The expression evaluates to NOW(), which is what we want.  However when
stime_local_stamp < vtsc_offset, vcpu_time_info.tsc_timestamp is
actually 0. As a consequence the calculated overall system_time is not
correct.

This patch fixes the issue by letting gtime_to_gtsc return a negative
integer in the form of a wrapped around unsigned integer, so that when the
guest subtracts vcpu_time_info.tsc_timestamp from rdtsc it will calculate
the right value.
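
The arithmetic can be checked with a simplified model. The sketch below
assumes a 1:1 TSC scale (so to_vtsc_scale and from_vtsc_scale are the
identity), which is not true in general, and the function names with suffixes
are hypothetical; it only shows why a wrapped unsigned value works where a
clamped 0 does not.

```c
#include <stdint.h>

/* Old behaviour: clamp to 0 when time is before vtsc_offset. */
static uint64_t gtime_to_gtsc_clamped(uint64_t time, uint64_t vtsc_offset)
{
    return time < vtsc_offset ? 0 : time - vtsc_offset;
}

/* New behaviour: let the unsigned subtraction wrap modulo 2^64. */
static uint64_t gtime_to_gtsc_wrapped(uint64_t time, uint64_t vtsc_offset)
{
    return time - vtsc_offset;
}

/* What the guest computes under the pvclock protocol (1:1 scale assumed). */
static uint64_t guest_system_time(uint64_t rdtsc, uint64_t tsc_timestamp,
                                  uint64_t system_time)
{
    return (rdtsc - tsc_timestamp) + system_time;
}
```

With NOW() = 1500, vtsc_offset = 1000 and a stamp of 900, rdtsc is 500. The
clamped timestamp gives the guest 1400, whereas the wrapped timestamp gives
1500, which matches NOW().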

Signed-off-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agotravis: Enable tools when building with clang
Andrew Cooper [Wed, 27 Apr 2016 13:06:04 +0000 (14:06 +0100)]
travis: Enable tools when building with clang

tools now build under clang, so let them be tested.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
9 years agotravis: Remove clang-3.8 build
Andrew Cooper [Wed, 27 Apr 2016 16:10:57 +0000 (17:10 +0100)]
travis: Remove clang-3.8 build

The package appears to have been renamed in Ubuntu.  The only reason this test
is currently passing is because the hypervisor build falls back to clang, at
version 3.5

Add an explicit test in the build script that our desired compiler is
available.  Note that travis already performs this step, but in a way which
isn't fatal to the build.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
9 years agotools/kdd: Fix uninitialised variable warning
Andrew Cooper [Wed, 27 Apr 2016 12:58:27 +0000 (13:58 +0100)]
tools/kdd: Fix uninitialised variable warning

Clang warns:

  kdd.c:1031:9: error: variable 'fd' is used uninitialized whenever '||'
  condition is true [-Werror,-Wsometimes-uninitialized]
      if (argc != 4
          ^~~~~~~~~
  kdd.c:1040:20: note: uninitialized use occurs here
      if (select(fd + 1, &fds, NULL, NULL, NULL) > 0)
                 ^~

This situation can't actually happen, as usage() is a terminal path.  Annotate
it appropriately.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
9 years agotools/blktap2: Fix use of uninitialised variable in _tap_list_join3()
Andrew Cooper [Wed, 27 Apr 2016 12:31:05 +0000 (13:31 +0100)]
tools/blktap2: Fix use of uninitialised variable in _tap_list_join3()

Clang points out:

  tap-ctl-list.c:457:28: error: variable 'entry' is uninitialized when
  used here [-Werror,-Wuninitialized]
          for (; *_entry != NULL; ++entry) {
                                    ^~~~~

The content of that loop clearly was meant to iterate over _entry rather than
entry, so is fixed to do so.  This presumably fixes a memory leak when
tapdisks get orphaned, as only the first item on the list got freed.

There is no use of entry at all.  It is referenced in a
list_for_each_entry(tl, &tap->list, entry) construct, but this is just a
member name, and not a reference to local scope variable of the same name.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
9 years agotools/blktap2: Fix array initialisers for tapdisk_disk_{types,drivers}[]
Andrew Cooper [Wed, 27 Apr 2016 10:23:04 +0000 (11:23 +0100)]
tools/blktap2: Fix array initialisers for tapdisk_disk_{types,drivers}[]

Clang points out:

  tapdisk-disktype.c:117:2: error: initializer overrides prior initialization
  of this subobject [-Werror,-Winitializer-overrides]
          0,
          ^
  tapdisk-disktype.c:115:23: note: previous initialization is here
          [DISK_TYPE_VINDEX]      = &vhd_index_disk,
                                    ^~~~~~~~~~~~~~~

Mixing different initialiser styles should be avoided; the actual behaviour is
different to the expected behaviour.  This specific example has been broken
since its introduction in c/s 7b4dea554 "blktap2: Fix tapdisk disktype issues"
in 2010, and is caused by the '#if 0' block removing &tapdisk_{sync,vmdk}.

First of all, remove what were intended to be trailing NULL entries in
tapdisk_disk_{types,drivers}[], making consistent use of Designated
Initialisers for the initialisation.

This requires changing the loop in tapdisk_disktype_find() to be based on the
number of elements in tapdisk_disk_types[], rather than looking for the first
NULL.  This fixes a latent bug, as the use of Designated Initializers leads
to intermediate zero entries if not all indices are explicitly specified.
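
The effect is easy to reproduce in isolation. In the sketch below (all names
are hypothetical, and index 1 is deliberately left unset to mimic an entry
removed by '#if 0'), a scan that stops at the first NULL would never reach the
later entries, while a loop bounded by the element count finds them.

```c
#include <stddef.h>
#include <string.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

enum { TYPE_AIO = 0, TYPE_VHD = 2, TYPE_QCOW = 3 };  /* index 1 unused */

/* Designated initialisers leave index 1 as an implicit NULL entry. */
static const char *const type_names[] = {
    [TYPE_AIO]  = "aio",
    [TYPE_VHD]  = "vhd",
    [TYPE_QCOW] = "qcow",
};

/* Bounded by the element count and tolerant of NULL holes. */
static int find_type(const char *name)
{
    for (size_t i = 0; i < ARRAY_SIZE(type_names); i++) {
        if (type_names[i] != NULL && strcmp(type_names[i], name) == 0)
            return (int)i;
    }
    return -1;
}
```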

There is a second latent bug where tapdisk_disktype_find() assumes that
tapdisk_disk_drivers[] has at least as many entries as tapdisk_disk_types[].
This is not the case and tapdisk_disk_drivers[] had one entry fewer than
tapdisk_disk_types[], but the NULL loop bound prevented an out-of-bounds read
of tapdisk_disk_drivers[].  Fix the issue by explicitly declaring
tapdisk_disk_drivers[] to have the same number of entries as
tapdisk_disk_types[].

Finally, this leads to a linker error.  It turns out that tapdisk_vhd_index
doesn't exist, and I can't find any evidence in the source history to suggest
that it ever did.  I can only presume that it would have been #if 0'd out like
tapdisk_sync and tapdisk_vmdk had it not been for this bug preventing a build
failure.  Drop all three.

No functional change, but only because of the specific layout of
tapdisk_disk_types[].

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
9 years agotools/blktap2: Use abort() instead of custom crash
Andrew Cooper [Wed, 27 Apr 2016 10:33:42 +0000 (11:33 +0100)]
tools/blktap2: Use abort() instead of custom crash

Like c/s 4d98d3ebf - there is a second instance.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/xenstat: Avoid comparing '0 <= unsigned integer'
Andrew Cooper [Wed, 27 Apr 2016 10:01:03 +0000 (11:01 +0100)]
tools/xenstat: Avoid comparing '0 <= unsigned integer'

Clang points out that this is tautological.

  src/xenstat.c:325:8: warning: comparison of 0 <= unsigned expression is
  always true [-Wtautological-compare]
          if (0 <= index && index < node->num_domains)
              ~ ^  ~~~~~

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: Force broadcast of TLB and instruction cache maintenance instructions
Julien Grall [Wed, 27 Apr 2016 11:22:53 +0000 (12:22 +0100)]
xen/arm: Force broadcast of TLB and instruction cache maintenance instructions

A UP guest may use TLB instructions that flush only on the local CPU.
Therefore, the TLB flush will not be broadcast across all the CPUs within
the same innershareable domain.

When the vCPU is migrated between different CPUs, it may be rescheduled
to a previous CPU where the TLB has not been flushed. The TLB may
contain stale entries which will result in incorrectly translating a VA to
an IPA or even cause TLB conflicts.

To avoid such a situation, it is possible to set HCR_EL2.FB, which will
force the broadcast of TLB and instruction cache maintenance instructions.
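
A minimal sketch of what setting the bit amounts to, assuming FB (Force Broadcast) is bit 9 of HCR_EL2. The macro and function names are illustrative and are not taken from Xen:

```c
#include <stdint.h>

/* Assumption: FB (Force Broadcast) is bit 9 of HCR_EL2. */
#define EXAMPLE_HCR_FB (UINT64_C(1) << 9)

/* Return the given HCR_EL2 value with forced broadcast enabled. */
static uint64_t example_hcr_force_broadcast(uint64_t hcr)
{
    return hcr | EXAMPLE_HCR_FB;
}
```

The resulting value would then be written to the register when the guest's vCPU state is set up; that step is omitted here because it needs privileged code.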

The performance impact of setting HCR_EL2.FB will depend on how often
a guest makes use of local flush instructions.

ARM64 Linux kernel is SMP-aware (no possibility to build only for UP).
Most of the flush instructions are innershareable. The local flushes are
limited to the boot (1 per CPU) and when a task is getting a new ASID.
Therefore the impact of setting HCR.FB for those guests is very limited.

The ARM32 Linux kernel offers the possibility to be built either for SMP or
UP. The number of local flushes is very limited in the former kernel
whilst the latter will only issue local flushes. Therefore setting HCR.FB
will have an impact on guest kernels built only for UP.

Note that the SMP kernel can run in a domain using 1 vCPU and it
will still make use of innershareable flush instructions.

Looking at other OSes, such as FreeBSD, they are very similar to the ARM32
Linux kernel (i.e. offering two configurations: SMP and UP).

However, nothing prevents an SMP-aware kernel from making more frequent use
of local flush instructions.

In the case that HCR_EL2.FB is not set, Xen would need to:
    * Flush all the TLBs for the VMID associated to this domain
    * Invalidate all the entries from the branch predictor
    * Invalidate all the entries from the instruction cache
Those actions would only be needed when the vCPU is migrating between 2
physical CPUs.

Whilst this solution would have a negative performance impact on kernels
which do not heavily use local flush instructions, this may improve
performance for kernels only built for UP system.

For now implement the easiest solution (i.e. setting HCR_EL2.FB). We can
revisit it if the performance impact is too high for UP kernels.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoxen: arm: doc: Add firmware requirements
Dirk Behme [Mon, 25 Apr 2016 15:42:54 +0000 (17:42 +0200)]
xen: arm: doc: Add firmware requirements

Add a section about what the firmware should do in EL3 before starting Xen.

E.g. the guest will use the HVC instruction to issue hypercalls. As this can
be enabled only at EL3, i.e. outside Xen, document this boot requirement.

Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoxen/arm: traps: Correctly interpret the content of the register HPFAR_EL2
Julien Grall [Fri, 22 Apr 2016 15:58:33 +0000 (16:58 +0100)]
xen/arm: traps: Correctly interpret the content of the register HPFAR_EL2

The register HPFAR_EL2 (resp. HPFAR on arm32) contains the bits [47:12]
(resp. [39:12]) of the faulting IPA. Unlike other registers that represent
an address, the upper bits of the IPA are stored in the register bits
[4:39] (resp. [4:21]).

However, Xen assumes that the register contains the faulting IPA correctly
offset. This results in a wrong IPA when the fault happens
during a translation table walk. Note that this only affects memaccess.

Introduce a new helper to get the faulting IPA from HPFAR_EL2 and
replace direct read from the register by the helper.
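
A hedged sketch of such a helper for the arm64 layout described above (register bits 39 down to 4 holding IPA bits 47 down to 12). The names are hypothetical, and taking the page offset from the faulting virtual address assumes 4 KiB pages:

```c
#include <stdint.h>

/* Register bits 39..4 hold IPA bits 47..12 (arm64 layout from the text). */
#define EXAMPLE_HPFAR_MASK       UINT64_C(0x000000FFFFFFFFF0)
/* Assumption: 4 KiB pages, so the low 12 bits are the in-page offset. */
#define EXAMPLE_PAGE_OFFSET_MASK UINT64_C(0xFFF)

static uint64_t example_faulting_ipa(uint64_t hpfar, uint64_t fault_va)
{
    /* Shift left by 12 - 4 = 8 so that register bit 4 lands on IPA bit 12. */
    uint64_t ipa = (hpfar & EXAMPLE_HPFAR_MASK) << (12 - 4);

    /* The offset within the page comes from the faulting virtual address. */
    return ipa | (fault_va & EXAMPLE_PAGE_OFFSET_MASK);
}
```

Using the raw register value as an address, without the shift, would be off by a factor of 256, which is the kind of error the message describes.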

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoxen/bitops: Introduce GENMASK to generate mask
Julien Grall [Fri, 22 Apr 2016 15:58:32 +0000 (16:58 +0100)]
xen/bitops: Introduce GENMASK to generate mask

The code has been imported from the header include/linux/bitops.h in
Linux v4.6-rc3.
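
For readers unfamiliar with the macro, here is a sketch modelled on the Linux-style definition. It uses unsigned long long (assumed to be 64 bits wide) rather than unsigned long so that it behaves the same on 32-bit and 64-bit hosts; the EXAMPLE_ names are not the ones used in Xen:

```c
/* Assumption: unsigned long long is 64 bits wide. */
#define EXAMPLE_BITS_PER_LLONG 64

/*
 * Build a mask with bits h..l (inclusive) set, where h >= l.
 * The left operand clears the bits below l, the right operand clears
 * the bits above h, and the AND keeps what is in between.
 */
#define EXAMPLE_GENMASK_ULL(h, l) \
    (((~0ULL) << (l)) & (~0ULL >> (EXAMPLE_BITS_PER_LLONG - 1 - (h))))
```

For instance, EXAMPLE_GENMASK_ULL(39, 4) selects a 36-bit field starting at bit 4.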

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agotools/pygrub: fix usage of LDFLAGS
Roger Pau Monne [Wed, 27 Apr 2016 09:55:27 +0000 (11:55 +0200)]
tools/pygrub: fix usage of LDFLAGS

LDFLAGS cannot be appended to CFLAGS, instead pass them down as env
variables.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/python: correctly use LDFLAGS and CFLAGS
Roger Pau Monne [Tue, 26 Apr 2016 10:25:28 +0000 (12:25 +0200)]
tools/python: correctly use LDFLAGS and CFLAGS

It is incorrect to add the LDFLAGS to the CFLAGS, and some compilers will
error out if linker flags are passed when creating object files. Fix this by
properly passing CFLAGS and LDFLAGS, instead of putting everything in
CFLAGS.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agooxenstored: fix error when shifting negative value
Roger Pau Monne [Tue, 26 Apr 2016 10:07:51 +0000 (12:07 +0200)]
oxenstored: fix error when shifting negative value

By explicitly casting it to unsigned.

Reasoning on why this is needed, provided by Andrew Cooper:

"Ocaml stores integers shifted left by one, and with the bottom bit set.

Values with the bottom bit clear are pointers into the GC'd heap. Values
with the bottom bit set are integers, and need to be shifted by 1 bit to
have calculations performed."
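
The encoding described in the quote can be sketched in plain C as follows. These helpers are hypothetical stand-ins and are not the macros from the OCaml runtime headers; the point is that the cast to an unsigned type happens before the left shift, since left-shifting a negative signed value is undefined behaviour in C:

```c
#include <stdint.h>

/* Tag an integer: shift left by one and set the bottom bit. */
static intptr_t example_val_of_int(intptr_t x)
{
    /* Cast to unsigned first so that negative inputs are shifted safely. */
    return (intptr_t)(((uintptr_t)x << 1) | 1u);
}

/*
 * Untag: shift right by one. Right-shifting a negative signed value is
 * implementation-defined in ISO C; this sketch assumes the arithmetic
 * shift that common compilers provide.
 */
static intptr_t example_int_of_val(intptr_t v)
{
    return v >> 1;
}

/* Values with the bottom bit set are integers, the rest are pointers. */
static int example_is_int(intptr_t v)
{
    return (int)(v & 1);
}
```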

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: fix passing the type argument to xc_psr_*
Roger Pau Monne [Tue, 26 Apr 2016 10:07:51 +0000 (12:07 +0200)]
libxl: fix passing the type argument to xc_psr_*

The xc_psr_* functions expect the type to be xc_psr_cat_type instead of
libxl_psr_cbm_type, so do the conversion.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: add explicit casts from yajl_gen_status to yajl_status
Roger Pau Monne [Tue, 26 Apr 2016 10:07:51 +0000 (12:07 +0200)]
libxl: add explicit casts from yajl_gen_status to yajl_status

Or else clang complains with:

implicit conversion from enumeration type 'yajl_gen_status' to different
enumeration type 'yajl_status' [-Werror,-Wenum-conversion]

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: remove extraneous ";" ]
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: convert libxl__device_model_xs_path to a macro
Roger Pau Monne [Wed, 27 Apr 2016 09:13:03 +0000 (11:13 +0200)]
libxl: convert libxl__device_model_xs_path to a macro

Since it's unsafe to code it as a function because it would end up passing a
non-literal string to a printf-like function.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: add the printf-like attributes to a couple of functions
Roger Pau Monne [Tue, 26 Apr 2016 10:07:50 +0000 (12:07 +0200)]
libxl: add the printf-like attributes to a couple of functions

Or else clang complains with:

error: format string is not a string literal
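
A sketch of the kind of annotation meant here, using the GCC/Clang `format` attribute on a hypothetical wrapper (this is not a libxl function). With the attribute in place the compiler treats `fmt` as a checked format string and can also validate the arguments at each call site:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * GCC/Clang extension: parameter 3 is a printf-style format string and
 * the variadic arguments to check start at parameter 4.
 */
static int example_format(char *buf, size_t len, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

static int example_format(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);  /* fmt is forwarded unchanged */
    va_end(ap);

    return n;
}
```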

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxl: fix usage of libxl_get_scheduler
Roger Pau Monne [Tue, 26 Apr 2016 10:07:50 +0000 (12:07 +0200)]
xl: fix usage of libxl_get_scheduler

It returns an int, not a libxl_scheduler.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: fix shutdown_reason type in list_domains
Roger Pau Monne [Tue, 26 Apr 2016 10:07:50 +0000 (12:07 +0200)]
libxl: fix shutdown_reason type in list_domains

It should be an enum, not an unsigned.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxc: fix usage of uninitialized variable
Roger Pau Monne [Tue, 26 Apr 2016 10:07:49 +0000 (12:07 +0200)]
libxc: fix usage of uninitialized variable

*size should be used instead, because it contains the size of the buffer in
out_buf.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/tools: fix substitution of __align8__ uint64_t inside of headers
Roger Pau Monne [Tue, 26 Apr 2016 10:07:49 +0000 (12:07 +0200)]
xen/tools: fix substitution of __align8__ uint64_t inside of headers

The current sedery doesn't work with BSD sed, so drop the attempt to also
match int64_t (since there are none at the moment). Also, apply the same
treatment to all arch headers; currently this is only done for x86_64
headers.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/headers: prevent adding two __align8__ to uint64_t in ARM headers
Roger Pau Monne [Tue, 26 Apr 2016 10:07:49 +0000 (12:07 +0200)]
tools/headers: prevent adding two __align8__ to uint64_t in ARM headers

Due to the fact that on ARM headers types are substituted to uint64_t and
then uint64_t is also substituted to contain the alignment, this would lead
to some types containing two __align8__ directives. Fix this by first
expanding Xen specific types to uint64_t only, and then replacing all the
uint64_t types to __align8__ uint64_t. This relies on the fact that all
Xen-specific types will have longer names, so they will always be replaced
first.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild: remove Kconfig forced gcc selection
Roger Pau Monne [Tue, 26 Apr 2016 10:07:49 +0000 (12:07 +0200)]
build: remove Kconfig forced gcc selection

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild: pass HOST{CC/CXX} value down to Kconfig
Roger Pau Monne [Tue, 26 Apr 2016 10:07:48 +0000 (12:07 +0200)]
build: pass HOST{CC/CXX} value down to Kconfig

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild: set HOSTCXX based on clang value for Kconfig xconfig target
Roger Pau Monne [Tue, 26 Apr 2016 10:07:48 +0000 (12:07 +0200)]
build: set HOSTCXX based on clang value for Kconfig xconfig target

The xconfig Kconfig target requires a C++ compiler because it uses Qt.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agobuild: make HOSTCC conditional on the value of clang
Roger Pau Monne [Tue, 26 Apr 2016 10:07:48 +0000 (12:07 +0200)]
build: make HOSTCC conditional on the value of clang

Previously HOSTCC was always hardcoded to gcc.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/vMSI-X: write snoops should ignore hvm_mmio_internal() requests
Jan Beulich [Tue, 26 Apr 2016 14:53:36 +0000 (16:53 +0200)]
x86/vMSI-X: write snoops should ignore hvm_mmio_internal() requests

Those aren't actual I/O requests (and hence are of no interest here
anyway). Since they don't get copied into struct vcpu, looking at that
copy reads whatever was left there. Use the state of the request to
determine its validity.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoblktap2: use abort() instead of custom crash
Doug Goldstein [Tue, 26 Apr 2016 13:04:46 +0000 (08:04 -0500)]
blktap2: use abort() instead of custom crash

Instead of trying to write a snippet of code that crashes the process
just use abort() directly. This is to fix the build on clang which
detects that the snippet of code will crash and fails to compile. At
the same time removed extraneous whitespace in the macro.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotravis: enable building of the tools
Doug Goldstein [Mon, 25 Apr 2016 14:46:18 +0000 (09:46 -0500)]
travis: enable building of the tools

For native (non-cross compiles) we now only require bcc, ld86, as86 for
building rombios, we can build the toolstack sans rombios and using the
system SeaBIOS due to known build issues. At the same time capture the
output of the configure scripts to help with tracking down future build
issues. This does not enable building of the toolstack with clang for
now due to multiple failures.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/HVM: slightly improve hvm_mmio_{first,last}_byte()
Jan Beulich [Tue, 26 Apr 2016 11:47:21 +0000 (13:47 +0200)]
x86/HVM: slightly improve hvm_mmio_{first,last}_byte()

EFLAGS.DF can be assumed to be usually clear, so unlikely()-annotate
the conditionals accordingly.

Also prefer latching p->size (used twice) into a local variable, at
once making it unnecessary for the reader to be sure expressions get
evaluated left to right (operand promotion would yield a different
result if p->addr + p->size - 1 was evaluated right to left).
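
A rough sketch of the two points above, using a hypothetical request structure rather than Xen's ioreq_t. The field names, the field widths, and the reading that a set direction flag means later repetitions move to lower addresses are assumptions made for illustration:

```c
#include <stdint.h>

/* GCC/Clang builtin: tell the compiler the condition is rarely true. */
#define example_unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical stand-in for an MMIO request. */
struct example_ioreq {
    uint64_t addr;   /* address of the first access */
    uint32_t size;   /* bytes per access */
    uint32_t count;  /* number of repetitions */
    uint8_t  df;     /* direction flag: 1 means descending addresses */
};

static uint64_t example_first_byte(const struct example_ioreq *p)
{
    uint64_t size = p->size;  /* widen once, before any arithmetic */

    return example_unlikely(p->df)
           ? p->addr - (p->count - UINT64_C(1)) * size
           : p->addr;
}

static uint64_t example_last_byte(const struct example_ioreq *p)
{
    uint64_t size = p->size;  /* latched: no reliance on evaluation order */

    return example_unlikely(p->df)
           ? p->addr + size - 1
           : p->addr + p->count * size - 1;
}
```

Because `size` is already 64 bits wide, every intermediate result is computed at full width regardless of how the additions and subtractions are grouped.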

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/HVM: correct last address emulation acceptance check
Jan Beulich [Tue, 26 Apr 2016 11:47:02 +0000 (13:47 +0200)]
x86/HVM: correct last address emulation acceptance check

For REPeated memory access instructions the repeat count also needs to
be considered. Utilize that "last" already takes this into account.

Also defer computing "last" until we really know we need it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/mm: introduce modify_xen_mappings()
Andrew Cooper [Tue, 26 Apr 2016 11:46:26 +0000 (13:46 +0200)]
x86/mm: introduce modify_xen_mappings()

To simply change the permissions on existing Xen mappings.  The existing
destroy_xen_mappings() is altered to support changing the PTE permissions.

A new destroy_xen_mappings() is introduced, as the special case of not passing
_PAGE_PRESENT to modify_xen_mappings().

As cleanup (and an ideal functional test), the boot logic which remaps Xen's
code and data with reduced permissions is altered to use
modify_xen_mappings(), rather than map_pages_to_xen() passing the same mfn's
back in.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
9 years agoRevert "HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane."
Konrad Rzeszutek Wilk [Tue, 26 Apr 2016 09:54:08 +0000 (11:54 +0200)]
Revert "HYPERCALL_version_op. New hypercall mirroring XENVER_ but sane."

This reverts commit 2716d875379d538c1dfccad78a99ca7db2e09f90.

As it was decided that the existing XENVER hypercall - while having
grown organically over the years - can still be expanded.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Requested-and-acked-by: Jan Beulich <jbeulich@suse.com>
9 years agoRevert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall"
Konrad Rzeszutek Wilk [Tue, 26 Apr 2016 09:53:49 +0000 (11:53 +0200)]
Revert "libxc/libxl/python/xenstat/ocaml: Use new XEN_VERSION hypercall"

This reverts commit d275ec9ca8a86f7c9c213f3551194d471ce90fbd.

As we prefer to still utilize the old XENVER_ hypercall.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/xstate: don't clobber or leak state when using XSAVES
Jan Beulich [Tue, 26 Apr 2016 09:53:18 +0000 (11:53 +0200)]
x86/xstate: don't clobber or leak state when using XSAVES

Commit 4d27280572 ("x86/xsaves: fix overwriting between non-lazy/lazy
xsaves") switched to always saving full state when using compacted
format (which is the only one XSAVES allows). It didn't, however, also
adjust the restore side: In order to save full state, we also need to
make sure we always load full state, or else the subject vCPU's state
would get clobbered by that of the vCPU which happened to last have in
use the respective component(s).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agox86/vMSI-X: avoid missing first unmask of vectors
Jan Beulich [Tue, 26 Apr 2016 09:52:22 +0000 (11:52 +0200)]
x86/vMSI-X: avoid missing first unmask of vectors

Recent changes to Linux result in there just being a single unmask
operation prior to expecting the first interrupts to arrive. However,
we've had a chicken-and-egg problem here: Qemu invokes
xc_domain_update_msi_irq(), ultimately leading to
msixtbl_pt_register(), upon seeing that first unmask operation. Yet
for msixtbl_range() to return true (in order for msixtbl_write() to get
invoked at all) msixtbl_pt_register() must have completed.

Deal with this by snooping suitable writes in msixtbl_range() and
triggering the invocation of msix_write_completion() from
msixtbl_pt_register() when that happens in the context of a still in
progress vector control field write.

Note that the seemingly unrelated deletion of the redundant
irq_desc->msi_desc check in msixtbl_pt_register() is to make clear to
any compiler version used that the "msi_desc" local variable isn't
being used uninitialized. (Doing the same in msixtbl_pt_unregister() is
just for consistency reasons.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs/arm64: clarify the documentation for loading XSM support
Fu Wei [Mon, 25 Apr 2016 16:38:57 +0000 (17:38 +0100)]
docs/arm64: clarify the documentation for loading XSM support

Improve the clarity of the wording introduced in 67831c4c
"docs/arm64: update the documentation for loading XSM support"

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Fu Wei <fu.wei@linaro.org>
CC: Julien Grall <julien.grall@arm.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: update FLASK cmd line instructions
Doug Goldstein [Mon, 25 Apr 2016 16:15:48 +0000 (17:15 +0100)]
docs: update FLASK cmd line instructions

The command line instructions for FLASK include a note on how to compile
Xen with FLASK but the note was out of date after the change to Kconfig.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibfsimage: fix bad header guard
Doug Goldstein [Mon, 25 Apr 2016 14:39:03 +0000 (09:39 -0500)]
libfsimage: fix bad header guard

The #ifndef / #define value used was not consistent so it did not
function as a proper header guard.

Signed-off-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/MSI-X: correctly track interrupt masking state
Jan Beulich [Mon, 25 Apr 2016 12:23:07 +0000 (14:23 +0200)]
x86/MSI-X: correctly track interrupt masking state

When a guest unmasks MSI-X interrupts before enabling MSI-X on the
device, so far nothing updates the {host,guest}_masked internal state;
this to date only gets done when MSI-X is already enabled. This is why
halfway recent Linux works (as it enables MSI-X first), while Windows
doesn't (as it enables MSI-X only after having set up and unmasked all
vectors). Since with a successful write to the vector control field
everything is ready internally, we should also update internal tracking
state there, regardless of the device's MSI-X enabled state.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/MMCFG: don't ignore error from intercept handler
Jan Beulich [Mon, 25 Apr 2016 12:22:29 +0000 (14:22 +0200)]
x86/MMCFG: don't ignore error from intercept handler

In commit 9256f66c16 ("x86/PCI: intercept all PV Dom0 MMCFG writes")
for an unclear to me reason I left pci_conf_write_intercept()'s return
value unchecked. Correct this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86/MSI: handle both MSI-X and MSI in cfg space write intercept
Jan Beulich [Mon, 25 Apr 2016 12:21:13 +0000 (14:21 +0200)]
x86/MSI: handle both MSI-X and MSI in cfg space write intercept

In commit aa7c1fdf9d ("x86/MSI: properly track guest masking requests")
I neglected to consider devices allowing for both MSI and MSI-X to be
used (not at the same time of course): The MSI-X part of the intercept
logic needs to fall through to the MSI one when the access is outside
the MSI-X capability bounds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoMAINTAINERS: drop Keir
Jan Beulich [Mon, 25 Apr 2016 12:20:00 +0000 (14:20 +0200)]
MAINTAINERS: drop Keir

... as per his agreement, which got privately forwarded to me by Lars.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agox86emul: special case far branch validation outside of long mode
Jan Beulich [Mon, 25 Apr 2016 12:18:55 +0000 (14:18 +0200)]
x86emul: special case far branch validation outside of long mode

In that case (with the new value being held in, or now in one case cast
to, a 32-bit variable) there's no need to go through the long mode part
of the checks.

Primarily this was meant to hopefully address Coverity ID 1355278, but
since the change produces smaller code as well I think we should use it
even if it doesn't help that (benign) warning.

Also it's more in line with jmp_rel() for commit_far_branch() to do the
_regs.eip update, so adjust that at once.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools: fix compile errors with -Og
Olaf Hering [Tue, 12 Apr 2016 13:55:19 +0000 (15:55 +0200)]
tools: fix compile errors with -Og

At least gcc-4.8 and older fail to recognize that err is always
initialized, so the build fails:
  xc_cpupool.c: In function 'xc_cpupool_removecpu':
  xc_cpupool.c:168:5: error: 'err' may be used uninitialized in this function [-Werror=maybe-uninitialized]
    return err;

Fix this in blktap2 and libxc.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoConfig.mk: update seabios revision
Wei Liu [Fri, 22 Apr 2016 18:04:35 +0000 (19:04 +0100)]
Config.mk: update seabios revision

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: pvusb: use %u to convert unsigned number
Wei Liu [Mon, 11 Apr 2016 13:08:03 +0000 (14:08 +0100)]
libxl: pvusb: use %u to convert unsigned number

Both be_domid and fe_domid are unsigned.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxenaccess: minor fixes and extra printouts
Tamas K Lengyel [Thu, 21 Apr 2016 00:39:25 +0000 (18:39 -0600)]
xenaccess: minor fixes and extra printouts

Without specifying the altp2m flag on the response the view never got switched.
Also, add extra information printouts that can be useful during debugging.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs/arm64: update the documentation for loading XSM support
Fu Wei [Thu, 21 Apr 2016 11:07:09 +0000 (19:07 +0800)]
docs/arm64: update the documentation for loading XSM support

This patch updates the documentation for allowing detection of an XSM
module that lacks a specific compatible string.
This mechanism has been added by the commit
ca32012341f3de7d3975407fb963e6028f0d0c8b.

Signed-off-by: Fu Wei <fu.wei@linaro.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen/arm: domain_build: Add PSCI 1.0 compatibility
Dirk Behme [Thu, 21 Apr 2016 05:33:50 +0000 (07:33 +0200)]
xen/arm: domain_build: Add PSCI 1.0 compatibility

Xen needs to blacklist any PSCI node as it will be recreated for DOM0.
Up to now, this was done only for arm,psci and arm,psci-0.2 compatible
nodes. Add PSCI 1.0 compatibility to make device tree nodes with

compatible = "arm,psci-1.0";

blacklisted, too.

Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoMAINTAINERS: ARM docs to be maintained by ARM maintainers
Jan Beulich [Fri, 22 Apr 2016 17:01:37 +0000 (18:01 +0100)]
MAINTAINERS: ARM docs to be maintained by ARM maintainers

I've been getting increasingly annoyed by people not applying common
sense to these docs updates.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Julien Grall <julien.grall@arm.com>
9 years agocommitters to be REST maintainers
George Dunlap [Fri, 22 Apr 2016 11:19:23 +0000 (12:19 +0100)]
committers to be REST maintainers

As proposed at the hackathon.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/ept: defer the invalidation until the p2m lock is released
David Vrabel [Tue, 12 Apr 2016 16:19:43 +0000 (17:19 +0100)]
x86/ept: defer the invalidation until the p2m lock is released

Holding the p2m lock while calling ept_sync_domain() is very expensive
since it does an on_selected_cpus() call.  IPIs on many socket
machines can be very slow and on_selected_cpus() is serialized.

It is safe to defer the invalidate until the p2m lock is released
except for two cases:

1. When freeing a page table page (since partial translations may be
   cached).
2. When reclaiming a zero page as part of PoD.

For these cases, add p2m_tlb_flush_sync() calls which will immediately
perform the invalidate before the page is freed or reclaimed.
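
The deferral scheme can be pictured with a single-threaded toy model. Everything below is hypothetical (there is no real lock and no real invalidation); it only shows how a "need flush" flag lets several requests made under the lock collapse into one flush after the lock is dropped, while still allowing an immediate flush for the two unsafe cases:

```c
#include <stdbool.h>

struct example_p2m {
    bool locked;           /* stands in for the p2m lock being held */
    bool need_flush;       /* an invalidation was requested while locked */
    unsigned int flushes;  /* number of (expensive) invalidations done */
};

static void example_do_flush(struct example_p2m *p)
{
    p->need_flush = false;
    p->flushes++;
}

/* Request an invalidation: deferred while the lock is held. */
static void example_sync(struct example_p2m *p)
{
    if (p->locked)
        p->need_flush = true;
    else
        example_do_flush(p);
}

/* For the unsafe cases: perform any pending invalidation right now. */
static void example_flush_sync(struct example_p2m *p)
{
    if (p->need_flush)
        example_do_flush(p);
}

static void example_lock(struct example_p2m *p)
{
    p->locked = true;
}

/* Drop the lock first, then do the deferred invalidation, if any. */
static void example_unlock(struct example_p2m *p)
{
    p->locked = false;
    if (p->need_flush)
        example_do_flush(p);
}
```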

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/xenstat: handle network interface name in uppercase.
Zhigang Wang [Wed, 20 Apr 2016 14:16:35 +0000 (10:16 -0400)]
tools/xenstat: handle network interface name in uppercase.

xentop will segfault in this case:

  # ip link set eth1 down
  # ip link set eth1 name ETH
  # xentop

This patch lets xentop handle all-uppercase network interface names.

Signed-off-by: Zhigang Wang <zhigang.x.wang@oracle.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agodocs: add misc/qemu-backends.txt
Juergen Gross [Mon, 11 Apr 2016 13:04:09 +0000 (15:04 +0200)]
docs: add misc/qemu-backends.txt

Document the interface between qemu and libxl regarding backends
supported by qemu.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agox86: limit GFNs to 32 bits for shadowed superpages.
Tim Deegan [Mon, 14 Mar 2016 11:05:48 +0000 (11:05 +0000)]
x86: limit GFNs to 32 bits for shadowed superpages.

Superpage shadows store the shadowed GFN in the backpointer field,
which for non-BIGMEM builds is 32 bits wide.  Shadowing a superpage
mapping of a guest-physical address above 2^44 would lead to the GFN
being truncated there, and a crash when we come to remove the shadow
from the hash table.
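
The 2^44 figure follows from the field width: a 32-bit field can hold frame numbers up to 2^32 - 1, and with 4 KiB pages (a 12-bit shift, assumed here) that covers guest-physical addresses below 2^(32+12) = 2^44. A small sketch with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, so a frame number is the address >> 12. */
#define EXAMPLE_PAGE_SHIFT 12

/* Does the frame number of this guest-physical address fit in 32 bits? */
static bool example_gfn_fits_32bit(uint64_t gpa)
{
    uint64_t gfn = gpa >> EXAMPLE_PAGE_SHIFT;

    /* Truncating to 32 bits and comparing detects any lost high bits. */
    return gfn == (uint32_t)gfn;
}
```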

Track the valid width of a GFN for each guest, including reporting it
through CPUID, and enforce it in the shadow pagetables.  Set the
maximum width to 32 for guests where this truncation could occur.

This is XSA-173.

Reported-by: Ling Liu <liuling-it@360.cn>
Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
9 years agotools/libxc: Correct use of X86_XSS_MASK in guest xstate generation
Andrew Cooper [Tue, 19 Apr 2016 17:27:05 +0000 (18:27 +0100)]
tools/libxc: Correct use of X86_XSS_MASK in guest xstate generation

c/s 75f9455e "tools/libxc: Calculate xstate cpuid leaf from guest information"
incorrectly inverted the shift and mask when using X86_XSS_MASK.  Luckily, the
mask is currently zero, avoiding incorrect calculations.

While adjusting this, use an explicit uint32_t cast rather than masking against
0xffffffff.
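
A generic sketch of why the order matters when splitting a masked 64-bit value into two 32-bit halves. This is not the libxc code; the function names are hypothetical and the mask value used in the note below is made up for the demonstration:

```c
#include <stdint.h>

/* Low half: apply the 64-bit mask, then truncate with an explicit cast. */
static uint32_t example_masked_low(uint64_t value, uint64_t mask)
{
    return (uint32_t)(value & mask);
}

/* High half: apply the 64-bit mask first, then shift down by 32. */
static uint32_t example_masked_high(uint64_t value, uint64_t mask)
{
    return (uint32_t)((value & mask) >> 32);
}
```

With value 0x0000000300000500 and mask 0x0000000100000100, masking first gives a high half of 1, whereas shifting first and then applying the same unshifted mask gives 0. With a mask of zero both orders give zero, which matches the remark that the bug was harmless so far.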

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen: cpupools: Document XEN_SYSCTL_CPUPOOL_OP_* error codes
Juergen Gross [Fri, 15 Apr 2016 14:54:17 +0000 (16:54 +0200)]
xen: cpupools: Document XEN_SYSCTL_CPUPOOL_OP_* error codes

Requested-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxc: cpupools: adjust retry loop in xc_cpupool_removecpu()
Juergen Gross [Fri, 15 Apr 2016 14:54:16 +0000 (16:54 +0200)]
libxc: cpupools: adjust retry loop in xc_cpupool_removecpu()

Commit 1ef6beea187b ("libxc: do some retries in xc_cpupool_removecpu()
for EBUSY case") added a retry loop in xc_cpupool_removecpu() for the
EBUSY case. As EBUSY was returned in multiple error situations the
loop would have been executed in situations where a retry would not
be successful. Additionally, calling sleep(1) between the retries is a
bad idea when being called in a daemon.

The hypervisor has been changed to return different error values now.
The retry added in the above-mentioned commit should now be done only
in the EADDRINUSE case. As the error condition should last only for a
very short time, the sleep(1) call can be removed.
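
A minimal sketch of such a loop, assuming an operation that returns 0 on
success or -1 with errno set. The callback type, the helper name, the
retry bound and the flaky_op() stub are all hypothetical and are not part
of libxc.

```c
#include <errno.h>

/* Hypothetical operation: returns 0 on success, or -1 with errno set. */
typedef int (*cpupool_op_fn)(void *ctx);

/*
 * Retry only while the operation fails with EADDRINUSE, without sleeping
 * between attempts. At most 1 + max_retries calls are made.
 */
static int retry_on_eaddrinuse(cpupool_op_fn op, void *ctx,
                               unsigned int max_retries)
{
    int rc;
    unsigned int attempt = 0;

    do {
        rc = op(ctx);
    } while (rc < 0 && errno == EADDRINUSE && attempt++ < max_retries);

    return rc;
}

/*
 * Stub for demonstration: fails with EADDRINUSE until the counter that
 * ctx points to reaches zero.
 */
static int flaky_op(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = EADDRINUSE;
        return -1;
    }
    return 0;
}
```

Any other errno value ends the loop immediately, so errors that a retry
cannot fix are reported to the caller at once.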

Requested-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Alan Robinson <alan.robinson@ts.fujitsu.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoxen: cpupools: return different error values for cpupool operations
Juergen Gross [Fri, 15 Apr 2016 14:54:15 +0000 (16:54 +0200)]
xen: cpupools: return different error values for cpupool operations

Today there are several different situations in which moving a cpu
from or to a cpupool will return -EBUSY. This makes it hard for the
user to know what went wrong, as the Xen tools are unable to
print a detailed error message.

Depending on the situation return different error codes in order to
enable the tools to print useful messages.

Requested-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl: fix old style declarations
Wei Liu [Sun, 17 Apr 2016 22:36:53 +0000 (23:36 +0100)]
libxl: fix old style declarations

Fix errors like:

/local/work/xen.git/dist/install/usr/local/include/libxl_uuid.h:59:1: error: 'static' is not at beginning of declaration [-Werror=old-style-declaration]
 void static inline libxl_uuid_copy_0x040400(libxl_uuid *dst,
 ^
/local/work/xen.git/dist/install/usr/local/include/libxl_uuid.h:59:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration]

/local/work/xen.git/dist/install/usr/local/include/libxl.h:1233:1: error: 'static' is not at beginning of declaration [-Werror=old-style-declaration]
 int static inline libxl_domain_create_restore_0x040200(
 ^
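
For reference, a small sketch of the two declaration orders. The function
name copy_id is hypothetical; the warning option is the one named in the
errors above.

```c
/*
 * Old style: storage class and 'inline' after the return type, e.g.
 *     void static inline copy_id(int *dst, const int *src);
 * With -Werror=old-style-declaration this form is rejected.
 */

/* Accepted form: 'static inline' first, then the return type. */
static inline void copy_id(int *dst, const int *src)
{
    *dst = *src;
}
```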

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agohotplug/Linux: fix same_vm check in block script
Wei Liu [Wed, 13 Apr 2016 17:02:36 +0000 (18:02 +0100)]
hotplug/Linux: fix same_vm check in block script

The original same_vm check has two bugs. The first occurs when a stubdom
is in use, because the check relies on the numeric domid to decide whether
two domains are in fact the same one. The second is that the check fails
when two stubdoms are checked against each other.

The first bug is fixed by using uuid to identify a domain. The second
bug is fixed by comparing the domains two stubdoms serve.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libxl: Fix legacy migration following COLO backchannel breakage
Andrew Cooper [Thu, 14 Apr 2016 19:54:15 +0000 (20:54 +0100)]
tools/libxl: Fix legacy migration following COLO backchannel breakage

c/s f5d947bf1b "tools/libxl: add back channel support to read stream"
made a bogus adjustment to libxl__stream_read_start(), including
removing the comment hinting at what was going on, which breaks
conversion of a legacy migration stream.

Symptoms look like:

  root@anonymi:~ # xl migrate domU host
  migration target: Ready to receive domain.
  Saving to migration stream new xl format (info 0x1/0x0/2677)
  xc: error: error polling suspend notification channel: -1: Internal error
  Loading new save file <incoming migration stream> (new xl fmt info 0x1/0x0/2677)
   Savefile contains xl domain config in JSON format
  Parsing config from <saved>
  libxl: error: libxl_stream_read.c:327:stream_header_done: Invalid ident: expected 0x4c6962786c466d74, got 0x01f00f0000000000
  libxl: error: libxl_utils.c:430:libxl_read_exactly: file/stream truncated reading ipc msg header from domain 1 save/restore helper stdout pipe

The adjustment is not required for backchannel support (as there is no
interaction between back channels and legacy conversion), and caused
stream->fd to be latched in the datacopier before legacy conversion
substitutes it for the fd which is the output of the conversion script.

This causes libxl to consume data from the legacy stream rather than the
v2 stream, and for the conversion script to encounter an error as the
legacy stream appears to skip ahead.

Undo the adjustments to libxl__stream_read_start(), and introduce a
better description of what is going on.  Introduce some extra assertions
to try and catch similar breakage in the future.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wen Congyang <wency@cn.fujitsu.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Olaf Hering <olaf@aepfle.de>
9 years agoxen: change the sizes of memory fields in the HVM start info to be 64bits
Roger Pau Monne [Tue, 12 Apr 2016 16:00:28 +0000 (18:00 +0200)]
xen: change the sizes of memory fields in the HVM start info to be 64bits

At the moment the only consumer of this structure is x86, but other arches
might also use it, so make all the fields 64bits. On x86 Xen will still try
to place everything below the 4GiB boundary, but that might not be feasible
in other arches.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Requested-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agolibxl/save: set domain_suspend_state->domid in do_domain_soft_reset()
Vitaly Kuznetsov [Mon, 11 Apr 2016 12:20:04 +0000 (14:20 +0200)]
libxl/save: set domain_suspend_state->domid in do_domain_soft_reset()

c/s d5c693d "libxl/save: Refactor libxl__domain_suspend_state" broke soft
reset as libxl__domain_suspend_device_model() now fails when domid is not set
in libxl__domain_suspend_state.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agoConfig.mk: update mini-os changeset
Wei Liu [Fri, 15 Apr 2016 10:35:03 +0000 (11:35 +0100)]
Config.mk: update mini-os changeset

[commits pulled in are:
  Fix time update
  Clean arch/x86/time.c
  Mini-OS: netfront: fix off-by-one error introduced in 7c8f3483
-iwj]

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
9 years agodrivers/pl011: ACPI: The interrupt should always be high level triggered
Julien Grall [Mon, 11 Apr 2016 13:33:33 +0000 (14:33 +0100)]
drivers/pl011: ACPI: The interrupt should always be high level triggered

The SPCR does not specify if the interrupt is edge or level triggered.
So the configuration needs to be hardcoded in the code.

Based on the PL011 TRM (see 2.2.8 in ARM DDI 0183G), the interrupt generated
will be active high. Whilst the wording may be interpreted differently,
the SBSA (section 4.3.2 in ARM-DEN-0029 v2.3) states the PL011 is
implemented with a level triggered interrupt.

So the driver should configure the interrupt as high level triggered.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
9 years agoxen: sched: fix spinlock issue in schedule_cpu_switch().
Dario Faggioli [Mon, 11 Apr 2016 16:27:01 +0000 (18:27 +0200)]
xen: sched: fix spinlock issue in schedule_cpu_switch().

Commit 94734ab7c3f5 ("xen: sched: close potential races
when switching scheduler to CPUs") buggily replaced a call
to pcpu_schedule_lock_irq() with just pcpu_schedule_lock(),
causing the relevant irq_safe vs. non-irq_safe ASSERT()
in check_lock() to trigger.

Fix that.
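
The kind of consistency check involved can be modelled roughly as below.
This is a simplified, hypothetical model and not Xen's check_lock(); the
struct and function names are made up for illustration.

```c
#include <stdbool.h>

/*
 * Hypothetical debug state for a lock:
 * -1 = not yet taken, 0 = first taken with IRQs enabled,
 *  1 = first taken with IRQs disabled.
 */
struct lock_debug {
    int irq_safe;
};

/*
 * Record how the lock is first taken and report whether a later
 * acquisition is consistent with that. A real implementation would
 * ASSERT() rather than return false.
 */
static bool check_lock_model(struct lock_debug *d, bool irqs_disabled)
{
    if (d->irq_safe < 0)
        d->irq_safe = irqs_disabled;
    return d->irq_safe == (int)irqs_disabled;
}
```

In this model, a lock first taken with IRQs disabled and later taken
with IRQs enabled fails the check, which is the mismatch the _irq
variant avoids.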

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
9 years agox86/pv: Correctly fold vIOPL back into vcpu_guest_context
Andrew Cooper [Mon, 11 Apr 2016 09:03:55 +0000 (10:03 +0100)]
x86/pv: Correctly fold vIOPL back into vcpu_guest_context

c/s f71ecb6 "x86: introduce a new VMASSIST for architectural behaviour of
iopl" shifted the vcpu iopl field by 12, but didn't update the logic which
reconstructs the guest's eflags for migration.

Existing guest kernels set a vIOPL of 1, to prevent them from faulting when
accessing IO ports.  This bug manifests as a crash after migrate, as the vIOPL
reverts back to the default of 0, and the guest suffers an unexpected #GP
fault.
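
A sketch of the failure mode, assuming the vcpu field holds the IOPL
already shifted into EFLAGS bit position (bits 12-13). The struct and
function names are hypothetical and the code is not taken from Xen.

```c
#include <stdint.h>

#define EFLAGS_IOPL_SHIFT 12
#define EFLAGS_IOPL_MASK  (UINT32_C(3) << EFLAGS_IOPL_SHIFT)

/* Hypothetical vcpu state: iopl is kept pre-shifted (IOPL 1 == 0x1000). */
struct vcpu_model {
    uint32_t iopl;
};

/* Stale logic: shifts an already-shifted value out of bits 12-13. */
static uint32_t fold_iopl_stale(const struct vcpu_model *v, uint32_t eflags)
{
    return eflags | (v->iopl << EFLAGS_IOPL_SHIFT);
}

/* Updated logic: the stored value can be OR-ed in directly. */
static uint32_t fold_iopl(const struct vcpu_model *v, uint32_t eflags)
{
    return eflags | (v->iopl & EFLAGS_IOPL_MASK);
}

/* Extract the IOPL field from an EFLAGS value. */
static unsigned int eflags_iopl(uint32_t eflags)
{
    return (eflags & EFLAGS_IOPL_MASK) >> EFLAGS_IOPL_SHIFT;
}
```

With a stored value of 0x1000, the stale path yields an EFLAGS image
whose IOPL field reads as 0, matching the symptom described above.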

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
9 years agoxen/arm: acpi: Print more error messages in acpi_map_gic_cpu_interface
Julien Grall [Mon, 11 Apr 2016 13:33:37 +0000 (14:33 +0100)]
xen/arm: acpi: Print more error messages in acpi_map_gic_cpu_interface

It's helpful to spot any error without having to modify the hypervisor
code.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
9 years agoxen/arm: acpi: Remove unnecessary check in acpi_map_gic_cpu_interface
Julien Grall [Mon, 11 Apr 2016 13:33:36 +0000 (14:33 +0100)]
xen/arm: acpi: Remove unnecessary check in acpi_map_gic_cpu_interface

This part of the code will never be executed when the entry
corresponds to the boot CPU.

Also print an error message when arch_cpu_init has failed.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
9 years agoxen/arm: acpi: Fix SMP support when booting with ACPI
Julien Grall [Mon, 11 Apr 2016 13:33:35 +0000 (14:33 +0100)]
xen/arm: acpi: Fix SMP support when booting with ACPI

The variable enabled_cpus is used to know the number of CPUs enabled in
the MADT.

Currently this variable is used to check the validity of the boot CPU.
It will be considered invalid when "enabled_cpus > 1".

However, this condition also means that multiple CPUs are present on the
system. So secondary CPUs will never be brought up.

The correct way to check the validity of the boot CPU is to use the
variable bootcpu_valid.
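
The difference between the two checks can be sketched as follows. Only
the names enabled_cpus and bootcpu_valid come from the text; the struct
and helper functions are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of an MADT scan. */
struct madt_scan {
    unsigned int enabled_cpus;  /* number of CPUs enabled in the MADT */
    bool bootcpu_valid;         /* an entry matching the boot CPU was found */
};

/* Old check: any system with more than one enabled CPU looks invalid. */
static bool boot_cpu_ok_old(const struct madt_scan *s)
{
    return !(s->enabled_cpus > 1);
}

/* New check: depends only on whether the boot CPU entry was found. */
static bool boot_cpu_ok(const struct madt_scan *s)
{
    return s->bootcpu_valid;
}
```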

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Shannon Zhao <shannon.zhao@linaro.org>
9 years agoxen/arm: acpi: The boot CPU does not always match the first entry in the MADT
Julien Grall [Mon, 11 Apr 2016 13:33:34 +0000 (14:33 +0100)]
xen/arm: acpi: The boot CPU does not always match the first entry in the MADT

Since the ACPI 6.0 errata document [1], the first entry in the MADT
does not have to correspond to the boot CPU.

Introduce a new variable to know if a MADT entry matching the boot CPU
is found. Furthermore, it's not necessary to check if the MPIDR is
duplicated for the boot CPU. So the rest of the function can be skipped.

[1] 1380 Unnecessary restrictions to FW vendors in ordering of GIC structures
in MADT

Signed-off-by: Julien Grall <julien.grall@arm.com>
9 years agox86: Alter nmi_callback_t typedef
Konrad Rzeszutek Wilk [Wed, 30 Mar 2016 17:45:59 +0000 (13:45 -0400)]
x86: Alter nmi_callback_t typedef

Drop parentheses and function pointer on nmi_callback_t typedef.

Make it more in line with how x86 maintainers want function
typedefs to be.
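
A sketch of the two typedef styles. The parameter list shown is an
assumption made for illustration, and example_nmi and current_callback
are hypothetical names.

```c
#include <stddef.h>

struct cpu_user_regs;  /* opaque here; only pointers are used */

/*
 * Before (pointer-to-function typedef):
 *     typedef int (*nmi_callback_t)(const struct cpu_user_regs *regs, int cpu);
 */

/* After: a function typedef; pointers are spelled out at the use site. */
typedef int nmi_callback_t(const struct cpu_user_regs *regs, int cpu);

/* The typedef can declare a function with the matching signature... */
static nmi_callback_t example_nmi;

/* ...and a pointer to such a function carries an explicit '*'. */
static nmi_callback_t *current_callback = example_nmi;

static int example_nmi(const struct cpu_user_regs *regs, int cpu)
{
    (void)regs;
    return cpu;
}
```

One benefit of the function typedef is that a handler's prototype can be
declared with it, so the compiler checks the handler against the
expected signature.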

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
9 years agotools/libxc: Calculate xstate cpuid leaf from guest information
Andrew Cooper [Tue, 24 Nov 2015 14:49:49 +0000 (14:49 +0000)]
tools/libxc: Calculate xstate cpuid leaf from guest information

The existing logic is broken for heterogeneous migration.  By always
advertising the host maximum xstate, a migration to a less capable host always
fails as Xen cannot accommodate the xcr0_accum in the migration stream.

By calculating xstate from the feature information (which a multi-host
toolstack will have levelled appropriately), the guest will have the current
host's maximum xstate advertised, allowing for correct migration to less
capable hosts.

In addition, some further improvements and corrections:
 - don't discard the known flags in sub-leaves 2..63 ECX
 - zap sub-leaves beyond 62
 - zap all bits in leaf 1, EBX/ECX.  No XSS features are currently supported.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libxc: Use featuresets rather than guesswork
Andrew Cooper [Wed, 7 Oct 2015 15:51:54 +0000 (16:51 +0100)]
tools/libxc: Use featuresets rather than guesswork

It is conceptually wrong to base a VM's featureset on the features visible to
the toolstack which happens to construct it.

Instead, the featureset used is either an explicit one passed by the
toolstack, or the default which Xen believes it can give to the guest.

Collect all the feature manipulation into a single function which adjusts the
featureset, and perform deep dependency removal.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools/libxc: Wire a featureset through to cpuid policy logic
Andrew Cooper [Tue, 6 Oct 2015 15:01:37 +0000 (16:01 +0100)]
tools/libxc: Wire a featureset through to cpuid policy logic

Later changes (Patch titled "tools/libxc: Use featuresets rather than
guesswork") will cause the cpuid generation logic to seed their
information from a featureset.  This patch adds the infrastructure to
specify a featureset, and will obtain the appropriate defaults from Xen
if omitted.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
9 years agotools: Utility for dealing with featuresets
Andrew Cooper [Thu, 4 Feb 2016 22:42:50 +0000 (22:42 +0000)]
tools: Utility for dealing with featuresets

It is able to report the current featuresets; both the static masks and
dynamic featuresets from Xen, or to decode an arbitrary featureset into
`/proc/cpuinfo` style strings.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
9 years agotools/libxc: Expose the automatically generated cpu featuremask information
Andrew Cooper [Mon, 25 Jan 2016 17:07:13 +0000 (17:07 +0000)]
tools/libxc: Expose the automatically generated cpu featuremask information

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>