xenbits.xensource.com Git - xen.git/log
xen.git
8 years agox86/cpu: Print CPU Family/Vendor information in both decimal and hexadecimal
Andrew Cooper [Mon, 12 Sep 2016 09:07:35 +0000 (10:07 +0100)]
x86/cpu: Print CPU Family/Vendor information in both decimal and hexadecimal

Different manuals use different representations.

A new sample looks like:

(XEN) CPU Vendor: Intel, Family 6 (0x6), Model 60 (0x3c), Stepping 3 (raw 000306c3)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
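
The decimal and hexadecimal values in the sample line of the x86/cpu commit above can be reproduced from the raw CPUID leaf 1 EAX value. The following is a minimal sketch, assuming the usual Intel family/model encoding; the struct and function names are hypothetical and are not taken from the Xen source.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded fields; not a Xen structure. */
struct cpu_sig {
    unsigned int family, model, stepping;
};

/* Decode a raw CPUID.1:EAX signature (assumed Intel-style encoding). */
static struct cpu_sig decode_cpu_sig(uint32_t eax)
{
    struct cpu_sig s;
    unsigned int base_family = (eax >> 8) & 0xf;

    s.stepping = eax & 0xf;
    s.model    = (eax >> 4) & 0xf;
    s.family   = base_family;

    /* The extended family is only added when the base family is 0xf. */
    if ( base_family == 0xf )
        s.family += (eax >> 20) & 0xff;

    /* The extended model supplies the high nibble for families 6 and 0xf. */
    if ( base_family == 0x6 || base_family == 0xf )
        s.model |= ((eax >> 16) & 0xf) << 4;

    return s;
}
```

For the raw value 000306c3 from the sample, this yields family 6, model 60 (0x3c) and stepping 3.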
8 years agolibxl: don't pass array size to libxl__xs_kvs_of_flexarray()
Juergen Gross [Thu, 8 Sep 2016 07:20:23 +0000 (09:20 +0200)]
libxl: don't pass array size to libxl__xs_kvs_of_flexarray()

Instead of passing the array size as an argument when calling
libxl__xs_kvs_of_flexarray() let the function get the size from the
array instead.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
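
The idea of the libxl change above, letting the callee take the length from the array rather than from a separate argument, can be sketched as follows. The struct is a simplified, hypothetical stand-in and not the libxl flexarray type; kvs_lookup() is likewise an illustrative helper only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a flexarray; not the libxl type. */
struct flexarray_sketch {
    size_t count;        /* number of used slots */
    const char **data;   /* key, value, key, value, ... */
};

/* Return the value stored for key, or NULL if the key is absent. */
static const char *kvs_lookup(const struct flexarray_sketch *fa,
                              const char *key)
{
    size_t i;

    /* The bound comes from fa->count; no separate length is passed in. */
    for ( i = 0; i + 1 < fa->count; i += 2 )
        if ( strcmp(fa->data[i], key) == 0 )
            return fa->data[i + 1];

    return NULL;
}
```

Keeping the length with the data removes the possibility of a caller passing a size that disagrees with the array contents.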
8 years agolibxl: add libxl__qmp_run_command_flexarray() function
Juergen Gross [Thu, 8 Sep 2016 07:20:22 +0000 (09:20 +0200)]
libxl: add libxl__qmp_run_command_flexarray() function

Add a function libxl__qmp_run_command_flexarray() to run a qmp command
with an array of arguments. The arguments are name-value pairs stored
in a flexarray.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: rename libxl_pvusb.c to libxl_usb.c
Juergen Gross [Thu, 8 Sep 2016 07:20:21 +0000 (09:20 +0200)]
libxl: rename libxl_pvusb.c to libxl_usb.c

Rename libxl_pvusb.c to libxl_usb.c in order to reflect future support
of USB passthrough via qemu emulated USB controllers.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/shadow: Use standard C array designators
Andrew Cooper [Mon, 12 Sep 2016 08:33:31 +0000 (08:33 +0000)]
x86/shadow: Use standard C array designators

Clang identifies:

  multi.c:82:23: error: use of GNU 'missing =' extension in
  designator [-Werror,-Wgnu-designator]
      [ft_prefetch]     "prefetch",

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
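
For reference, the difference between the GNU 'missing =' form and the standard C99 designator form flagged in the x86/shadow commit above looks roughly like this. The enum values other than ft_prefetch and the array name are hypothetical.

```c
/* Hypothetical fetch types; only ft_prefetch appears in the diagnostic. */
enum fetch_type { ft_prefetch, ft_demand_read, ft_demand_write, ft_count };

static const char *const fetch_type_names[ft_count] = {
    /* GNU extension, rejected under -Wgnu-designator:
     *     [ft_prefetch]     "prefetch",
     * Standard C99 form, with an explicit '=': */
    [ft_prefetch]     = "prefetch",
    [ft_demand_read]  = "demand read",
    [ft_demand_write] = "demand write",
};
```

Both forms initialise the same elements, which is consistent with the commit's "no functional change" note.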
8 years agoarm/vm_event: get/set registers
Tamas K Lengyel [Mon, 1 Aug 2016 17:59:14 +0000 (11:59 -0600)]
arm/vm_event: get/set registers

Add support for getting/setting registers through vm_event on ARM. Only
TTB/CR/R0/R1, PC and CPSR are sent as part of a request and only PC is set
as part of a response. The set of registers can be expanded in the future to
include other registers as well if necessary.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86,arm: Change arch_livepatch_quiesce() declaration.
Konrad Rzeszutek Wilk [Mon, 22 Aug 2016 18:41:41 +0000 (14:41 -0400)]
x86,arm: Change arch_livepatch_quiesce() declaration.

On ARM we need an alternative VA region to poke in the
hypervisor .text data. Since this is set up at runtime,
it may fail (it uses vmap, so the most likely error is ENOMEM).

As such, this error needs to be bubbled up, and the
livepatching aborted if it occurs.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
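
The shape of the arch_livepatch_quiesce() change above, turning a step that could not report failure into one that returns an error for the caller to act on, can be sketched as below. All names here are hypothetical; the flag merely simulates a runtime mapping that may fail.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical switch standing in for a runtime mapping that can fail. */
static bool mapping_available = true;

/* Counts how many times the (simulated) patching step actually ran. */
static int patches_applied;

/* Returns 0 on success or a negative errno value on failure. */
static int quiesce_sketch(void)
{
    if ( !mapping_available )
        return -ENOMEM;

    return 0;
}

static int apply_patch_sketch(void)
{
    int rc = quiesce_sketch();

    /* Bubble the error up and abort before anything is modified. */
    if ( rc )
        return rc;

    patches_applied++;
    return 0;
}
```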
8 years agoarm64/insn: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions
Konrad Rzeszutek Wilk [Tue, 9 Aug 2016 03:38:54 +0000 (23:38 -0400)]
arm64/insn: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions

This is copied from Linux 4.7, and the initial commit
that put this in is 5c5bf25d4f7a950382f94fc120a5818197b48fe9
"arm64: introduce aarch64_insn_gen_{nop|branch_imm}() helper functions"

This lays the groundwork for Livepatch to generate the
trampoline to jump to the new replacement function.
Also allows us to NOP the callsites.

Acked-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
--
Cc: Ross Lagerwall <ross.lagerwall@citrix.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Julien Grall <julien.grall@arm.com>
RFC: First submission
v1: The full copy of insn_gen_branch instead of just the code to make branch
v2: Added Julien's Ack.
    Remove the duplicate paragraph in the commit message.

8 years agoalternatives: x86 rename and change parameters on ARM
Konrad Rzeszutek Wilk [Wed, 17 Aug 2016 02:20:54 +0000 (22:20 -0400)]
alternatives: x86 rename and change parameters on ARM

On x86 we squash 'apply_alternatives' into
'alternative_instructions' (which was its sole user)
and rename 'apply_alternatives_nocheck' to 'apply_alternatives'.

On ARM we change the parameters of 'apply_alternatives'
to be 'const struct alt_instr *' instead of a void pointer and
a size.

We also add 'const' and make the arguments be on the
proper offset.

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> [x86 bits]
Reviewed-by: Julien Grall <julien.grall@arm.com> [ARM bits]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agox86/arm64: Expose the ALT_[ORIG|REPL]_PTR macros to header files.
Konrad Rzeszutek Wilk [Fri, 12 Aug 2016 20:11:27 +0000 (16:11 -0400)]
x86/arm64: Expose the ALT_[ORIG|REPL]_PTR macros to header files.

That way common code can use the same macro to access
the most common attributes without much #ifdef.

Take advantage of it right away in the livepatch code.

Note: on ARM we use tabs to conform to the style of the file.

Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
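
As an illustration of the kind of accessor such macros provide, the sketch below resolves a self-relative offset stored in an entry to an absolute address. It assumes the entry stores offsets relative to the field itself; the struct layout and the SKETCH_* macro names are hypothetical and are not the definitions from the Xen headers.

```c
#include <stdint.h>

/* Hypothetical, reduced entry: each offset is relative to its own field. */
struct alt_entry_sketch {
    int32_t orig_offset;   /* distance from this field to the original code */
    int32_t repl_offset;   /* distance from this field to the replacement   */
};

/* Resolve a self-relative offset to an absolute address. */
#define SKETCH_PTR(a, f)    ((void *)((char *)&(a)->f + (a)->f))
#define SKETCH_ORIG_PTR(a)  SKETCH_PTR(a, orig_offset)
#define SKETCH_REPL_PTR(a)  SKETCH_PTR(a, repl_offset)
```

Because the offsets are relative, an entry stays valid wherever the containing region is mapped, as long as the entry and its targets move together.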
8 years agolivepatch: Bubble up sanity checks on Elf relocs
Konrad Rzeszutek Wilk [Fri, 12 Aug 2016 20:03:18 +0000 (16:03 -0400)]
livepatch: Bubble up sanity checks on Elf relocs

The SHT_REL[,A] ELF sanity checks do not need to
be in the platform specific file and can be bubbled up
into the platform agnostic file.

This makes the ARM 32/64 implementation easier as the
duplicate checks don't have to be in the platform specific files.

Acked-by: Jan Beulich <jbeulich@suse.com> [x86 part]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agoxen/arm: alternative: Make it possible to patch outside of the hypervisor
Julien Grall [Fri, 9 Sep 2016 08:40:08 +0000 (09:40 +0100)]
xen/arm: alternative: Make it possible to patch outside of the hypervisor

With livepatch the alternatives that should be patched are outside of
the Xen hypervisor _start -> _end. The current code is assuming that
only Xen could be patched and therefore will explode when a payload
contains alternatives.

Given that alt_instr contains a relative offset, the function
__apply_alternatives could directly take in parameter the virtual
address of the alt_instr set of the re-mapped region. So we can mandate
the callers of __apply_alternatives to provide us with a region that has
read-write access.

The only caller that will patch directly the Xen binary is the function
__apply_alternatives_multi_stop. The other caller apply_alternatives
will work on the payload which will still have read-write access at that
time.

Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/arm: alternative: Clean-up __apply_alternatives
Julien Grall [Fri, 9 Sep 2016 08:40:07 +0000 (09:40 +0100)]
xen/arm: alternative: Clean-up __apply_alternatives

This patch contains only renaming and comment update. There are no
functional changes:
    - Don't mix _start and _stext, they both point to the same address
    but the former makes more sense (we are mapping the Xen binary, not
    only the text section).
    - s/text_mfn/xen_mfn/ and s/text_order/xen_order/ to make clear that
    we map the Xen binary.
    - Mention inittext, as alternatives may patch this section.
    - Use 1U instead of 1 in shift

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
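
The "1U instead of 1 in shift" item in the clean-up commit above matters once the shift count can reach the sign bit. A minimal sketch, assuming a 32-bit int; the function name is hypothetical:

```c
#include <stdint.h>

/* Build a single-bit mask; 1U keeps the shift in unsigned arithmetic. */
static uint32_t bit_mask(unsigned int bit)
{
    /*
     * With a plain 1 (a signed int), shifting by 31 would overflow a
     * 32-bit int, which is undefined behaviour.  1U << 31 is well defined.
     */
    return 1U << bit;
}
```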
8 years agoxen/x86: Fix build with clang following c/s 4fa0105
Andrew Cooper [Thu, 8 Sep 2016 17:52:46 +0000 (18:52 +0100)]
xen/x86: Fix build with clang following c/s 4fa0105

https://travis-ci.org/xen-project/xen/jobs/158494027#L2344

Clang complains:

  emulate.c:2016:14: error: comparison of unsigned enum expression < 0
  is always false [-Werror,-Wtautological-compare]
      if ( seg < 0 || seg >= ARRAY_SIZE(hvmemul_ctxt->seg_reg) )
           ~~~ ^ ~

Clang is wrong to raise a warning like this.  The signed-ness of an enum is
implementation defined in C, and robust code must not assume the choices made
by the compiler.

In this case, dropping the < 0 check creates a latent bug which would result
in an array underflow when compiled with a compiler which chooses a signed
enum.

Work around the bug by explicitly pulling seg into an unsigned integer, and
only perform the upper bounds check.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
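
The workaround described in the clang build fix above, copying the enum into an unsigned integer and keeping only the upper-bound check, can be sketched as follows. The enum, the array and the function name are hypothetical stand-ins for the Xen definitions.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical segment enum and cache; the real ones live in Xen. */
enum seg_sketch { seg_cs, seg_ds, seg_es, seg_ss, seg_none };

static int seg_cache[4];

/* Returns 1 if seg is a valid index into seg_cache[], 0 otherwise. */
static int seg_in_range(enum seg_sketch seg)
{
    /*
     * If the compiler picked a signed type for the enum, a negative value
     * converts to a very large unsigned one here, so the single upper-bound
     * check also rejects it.  No "< 0" comparison is needed.
     */
    unsigned int idx = seg;

    return idx < ARRAY_SIZE(seg_cache);
}
```

This keeps the check robust regardless of the signedness the compiler chooses, while avoiding the comparison clang objects to.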
8 years agoRemove ambiguities in the COPYING file; add CONTRIBUTING file
Lars Kurth [Fri, 12 Aug 2016 09:37:28 +0000 (10:37 +0100)]
Remove ambiguities in the COPYING file; add CONTRIBUTING file

COPYING file:
The motivation of this change is to make it easier for new
contributors to conduct a license and patent review, WITHOUT
changing any licenses.
- Remove references to BSD-style licenses as we have more
  common license exceptions and replace with "other license
  stanzas"
- List the most common situations under which code is licensed
  under licenses other than GPLv2 (section "Licensing Exceptions")
- List the most common non-GPLv2 licenses that are in use in
  this repository based on a recent FOSSology scan (section
  "Licensing Exceptions")
- List other license related conventions within the project
  to make it easier to conduct a license review.
- Clarify the incoming license as its omission has confused
  past contributors (section "Contributions")

CONTRIBUTING file:
The motivation of this file is to make it easier for contributors
to find contribution related resources. Add information on existing
license related conventions to avoid unintentional future licensing
issues. Provide templates for copyright headers for the most commonly
used licenses in this repository.

Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/hvm: Perform a user instruction fetch for a FEP in userspace
Andrew Cooper [Thu, 16 Jun 2016 13:36:44 +0000 (14:36 +0100)]
x86/hvm: Perform a user instruction fetch for a FEP in userspace

This matches hardware behaviour, and prevents erroneous failures when a guest
has SMEP/SMAP active and issues a FEP from userspace.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/hvm: Optimise segment accesses in hvmemul_write_segment()
Andrew Cooper [Sat, 2 Jul 2016 15:29:49 +0000 (16:29 +0100)]
x86/hvm: Optimise segment accesses in hvmemul_write_segment()

There is no need to read the segment information from VMCS/VMCB and cache it,
just to clobber the cached content immediately afterwards.

Write straight into the cache and set the accessed/dirty bits.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/segment: Bounds check accesses to emulation ctxt->seg_reg[]
Andrew Cooper [Fri, 1 Jul 2016 00:02:04 +0000 (01:02 +0100)]
x86/segment: Bounds check accesses to emulation ctxt->seg_reg[]

HVM HAP codepaths have space for all segment registers in the seg_reg[]
cache (with x86_seg_none still risking an array overrun), while the shadow
codepaths only have space for the user segments.

Range check the input segment of *_get_seg_reg() against the size of the array
used to cache the results, to avoid overruns in the case that the callers
don't filter their input suitably.

Subsume the is_x86_user_segment(seg) checks from the shadow code, which were
an incomplete attempt at range checking, and are now superseded.  Make
hvm_get_seg_reg() static, as it is not used outside of shadow/common.c

No functional change, but far easier to reason that no overflow is possible.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agohvm/fep: Allow testing of instructions crossing the -1 -> 0 virtual boundary
Andrew Cooper [Fri, 12 Aug 2016 13:35:28 +0000 (14:35 +0100)]
hvm/fep: Allow testing of instructions crossing the -1 -> 0 virtual boundary

The Force Emulation Prefix is named to follow its PV counterpart for cpuid or
rdtsc, but isn't really an instruction prefix.  It behaves as a break-out into
Xen, with the purpose of emulating the next instruction in the current state.

It is important to be able to test legal situations which occur in real
hardware, including instructions which cross certain boundaries, and
instructions starting at 0.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agofix EFI part of "symbols: Generate an xen-sym.map"
Jan Beulich [Thu, 8 Sep 2016 15:32:56 +0000 (17:32 +0200)]
fix EFI part of "symbols: Generate an xen-sym.map"

Commit 6ea24e53f1 introduced two problems: It left out a semicolon and
typo-ed the source file name of the EFI map file install command.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agoRevert "tools: remove blktap2 related code and documentation"
Wei Liu [Thu, 8 Sep 2016 15:15:59 +0000 (16:15 +0100)]
Revert "tools: remove blktap2 related code and documentation"

This reverts commit 3f0ae679f2704ca5671eef5be59ec30982fbf08a.

8 years agoRevert "tools: remove blktap2 source code"
Wei Liu [Thu, 8 Sep 2016 15:15:47 +0000 (16:15 +0100)]
Revert "tools: remove blktap2 source code"

This reverts commit 44b2829a8b97a8b04e063a93303dbe3a468642e3.

8 years agolibelf: drop pointless uses of __FUNCTION__
Jan Beulich [Thu, 8 Sep 2016 12:17:05 +0000 (14:17 +0200)]
libelf: drop pointless uses of __FUNCTION__

Non-debugging message text should be (and is in the cases here, albeit
often only with the addition of an ELF: prefix) distinguishable without
also logging function names.

In the messages touched at once use %#x (or variants thereof) in favor
of 0x%x.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
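
The two format styles mentioned in the libelf commit above differ slightly in output. A minimal sketch using the standard C library; the helper name is hypothetical:

```c
#include <stdio.h>

/* Format val with the '#' flag, letting printf add the "0x" prefix. */
static int fmt_alt_hex(char *buf, size_t len, unsigned int val)
{
    return snprintf(buf, len, "%#x", val);
}
```

Note that "%#x" only adds the prefix to non-zero values, so zero is printed as "0", whereas "0x%x" prints "0x0".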
8 years agox86/shadow: Avoid overflowing sh_ctxt->seg_reg[]
Andrew Cooper [Fri, 1 Jul 2016 00:02:04 +0000 (01:02 +0100)]
x86/shadow: Avoid overflowing sh_ctxt->seg_reg[]

hvm_get_seg_reg() does not perform a range check on its input segment, calls
hvm_get_segment_register() and writes straight into sh_ctxt->seg_reg[].

x86_seg_none is outside the bounds of sh_ctxt->seg_reg[], and will hit a BUG()
in {vmx,svm}_get_segment_register().

HVM guests running with shadow paging can end up performing a virtual to
linear translation with x86_seg_none.  This is used for addresses which are
already linear.  However, none of this is a legitimate pagetable update, so
fail the emulation in such a case.

This is XSA-187 / CVE-2016-7094.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
8 years agox86/emulate: Correct boundary interactions of emulated instructions
Andrew Cooper [Fri, 22 Jul 2016 16:02:54 +0000 (16:02 +0000)]
x86/emulate: Correct boundary interactions of emulated instructions

This reverts most of c/s 0640ffb6 "x86emul: fix rIP handling".

Experimentally, in long mode processors will execute an instruction stream
which crosses the 64bit -1 -> 0 virtual boundary, whether the instruction
boundary is aligned on the virtual boundary, or is misaligned.

In compatibility mode, Intel processors will execute an instruction stream
which crosses the 32bit -1 -> 0 virtual boundary, while AMD processors raise a
segmentation fault.  Xen's segmentation behaviour matches AMD.

For 16bit code, hardware does not ever truncate %ip.  %eip is always used and
behaves normally as a 32bit register, including in 16bit protected mode
segments, as well as in Real and Unreal mode.

This is XSA-186 / CVE-2016-7093.

Reported-by: Brian Marcotte <marcotte@panix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
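
The "-1 -> 0 virtual boundary" in the x86/emulate commit above refers to the instruction pointer wrapping around at the top of the address space. The sketch below shows only the arithmetic of such a wrap for a given address width; it does not model the emulator, segmentation checks or vendor differences, and the function name is hypothetical.

```c
#include <stdint.h>

/* Advance an instruction pointer by len bytes within a given width. */
static uint64_t advance_ip(uint64_t ip, unsigned int len,
                           unsigned int width_bits)
{
    uint64_t next = ip + len;   /* unsigned addition wraps modulo 2^64 */

    /* For narrower widths, keep only the low width_bits bits. */
    if ( width_bits < 64 )
        next &= (UINT64_C(1) << width_bits) - 1;

    return next;
}
```

For example, a 4-byte instruction starting two bytes below the top of a 32-bit address space ends at address 2.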
8 years agox86/32on64: don't allow recursive page tables from L3
Jan Beulich [Thu, 8 Sep 2016 12:14:53 +0000 (14:14 +0200)]
x86/32on64: don't allow recursive page tables from L3

L3 entries are special in PAE mode, and hence can't reasonably be used
for setting up recursive (and hence linear) page table mappings. Since
abuse is possible when the guest in fact gets run on 4-level page
tables, this needs to be excluded explicitly.

This is XSA-185 / CVE-2016-7092.

Reported-by: Jérémie Boutoille <jboutoille@ext.quarkslab.com>
Reported-by: "栾尚聪(好风)" <shangcong.lsc@alibaba-inc.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/paging: Make paging_mode_*() predicates behave like predicates
Andrew Cooper [Tue, 14 Jun 2016 11:45:56 +0000 (12:45 +0100)]
x86/paging: Make paging_mode_*() predicates behave like predicates

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoarm/arm64: Update comment about VA layout.
Konrad Rzeszutek Wilk [Mon, 22 Aug 2016 18:29:55 +0000 (14:29 -0400)]
arm/arm64: Update comment about VA layout.

It was missing 2MB.

Reviewed-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agox86/arm: Make 'make debug' work properly.
Konrad Rzeszutek Wilk [Sat, 13 Aug 2016 02:15:04 +0000 (22:15 -0400)]
x86/arm: Make 'make debug' work properly.

When doing cross-compilation we should use proper $(OBJDUMP).
Otherwise decompiling say ARM 32 code using x86 objdump
won't help much.

Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agosymbols: Generate an xen-sym.map
Konrad Rzeszutek Wilk [Mon, 18 Jul 2016 16:36:24 +0000 (12:36 -0400)]
symbols: Generate an xen-sym.map

You could construct _most_ of the names of the functions
by doing 'nm --defined' but unfortunately you do not get the
<file> prefix that is added on in Xen. For example:

$ cat xen-syms.symbols |grep do_domain_pause
0xffff82d080104920 t domain.c#do_domain_pause
$ nm --defined xen-syms|grep do_domain_pause
ffff82d080104920 t do_domain_pause

This is normally not an issue, but if one is doing livepatching and
wants to verify at build time that the symbols the livepatch payloads
will patch do correspond to the ones the hypervisor was built with, this helps a lot.

Note that during runtime one can do:
[root@localhost xen]# cat /proc/xen/xensyms |grep do_domain_pause
ffff82d080104920 t domain.c#do_domain_pause

But one may not want to build and verify a livepatch on the same host.

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agolivepatch: Move code from prepare_payload to own routine
Konrad Rzeszutek Wilk [Wed, 10 Aug 2016 13:53:52 +0000 (09:53 -0400)]
livepatch: Move code from prepare_payload to own routine

Specifically the code that is looking up f->old_addr - which
can be in its own routine instead of having it part of prepare_payload.

No functional change.

Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agoversion: Print build-id at bootup.
Konrad Rzeszutek Wilk [Tue, 6 Sep 2016 16:18:10 +0000 (12:18 -0400)]
version: Print build-id at bootup.

Livepatch expected at some point to be able to print the
build-id during bootup, which it did not.  The reason is
that xen_build_init and livepatch_init are both __initcall
type routines. This meant that when livepatch_init called
xen_build_id, it would return -ENODATA as build_id_len was
not set up yet (because xen_build_init would be called later).

The original patch fixed this by calling xen_build_init in
livepatch_init which allows us to print the build-id of
the hypervisor.

However the x86 maintainers pointed out that build-id
is independent of Livepatch and in fact should print
regardless whether Livepatch is enabled or not.

Therefore this patch moves the logic of printing the build-id
to version.c.

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agoversion/livepatch: Move xen_build_id_check to version.h
Konrad Rzeszutek Wilk [Tue, 9 Aug 2016 14:31:28 +0000 (10:31 -0400)]
version/livepatch: Move xen_build_id_check to version.h

It makes more sense for it to be there. However that
means version.h now has a dependency on <xen/elfstructs.h>
as the Elf_Note is a macro.

elfstructs.h has a dependency on types.h as well, so
we need that. We cannot put that #include <xen/types.h>
in elfstructs.h as that file is used by tools and they
do not have such a file.

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agolivepatch: Deal with payloads without any .text
Konrad Rzeszutek Wilk [Thu, 11 Aug 2016 01:04:43 +0000 (21:04 -0400)]
livepatch: Deal with payloads without any .text

This is possible, especially if the only thing they do is
NOP functions, in which case there are only .livepatch.funcs
sections.

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agox86/HVM: adjust feature checking in MSR intercept handling
Jan Beulich [Wed, 7 Sep 2016 10:35:40 +0000 (12:35 +0200)]
x86/HVM: adjust feature checking in MSR intercept handling

Consistently consult hvm_cpuid(). With that, BNDCFGS gets better
handled outside of VMX specific code, just like XSS. Don't needlessly
check for MTRR support when the MSR being accessed clearly is not an
MTRR one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agoVMX: correct feature checks for MPX and XSAVES
Jan Beulich [Wed, 7 Sep 2016 10:34:43 +0000 (12:34 +0200)]
VMX: correct feature checks for MPX and XSAVES

Their VMCS fields aren't tied to the respective base CPU feature flags
but instead to VMX specific ones.

Note that while the VMCS GUEST_BNDCFGS field exists if either of the
two respective features is available, MPX continues to get exposed to
guests only with both features present.

Also add the so far missing handling of
- GUEST_BNDCFGS in construct_vmcs()
- MSR_IA32_BNDCFGS in vmx_msr_{read,write}_intercept()
and mirror the extra correctness checks during MSR write to
vmx_load_msr().

Reported-by: "Rockosov, Dmitry" <dmitry.rockosov@intel.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: "Rockosov, Dmitry" <dmitry.rockosov@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/altp2m: use __get_gfn_type_access to avoid lock conflicts
Tamas K Lengyel [Wed, 7 Sep 2016 10:33:57 +0000 (12:33 +0200)]
x86/altp2m: use __get_gfn_type_access to avoid lock conflicts

Use __get_gfn_type_access instead of get_gfn_type_access when checking
the hostp2m entries during altp2m mem_access setting and gfn remapping
to avoid a lock conflict which can make dom0 freeze. During mem_access
setting the hp2m is already locked. For gfn remapping we change the flow
to lock the hp2m before locking the ap2m.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agoreplace bogus -ENOSYS uses
Jan Beulich [Wed, 7 Sep 2016 10:32:31 +0000 (12:32 +0200)]
replace bogus -ENOSYS uses

This doesn't cover all of them, just the ones that I think would most
obviously better be -EINVAL or -EOPNOTSUPP.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
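
One possible reading of the distinction drawn in the -ENOSYS commit above is sketched below: an operation the handler does not offer returns -EOPNOTSUPP, and a malformed argument returns -EINVAL. This mapping is an assumption made for illustration, as are the operation numbers and the function name; the commit itself does not spell out which call sites received which code.

```c
#include <errno.h>

/* Hypothetical operation numbers for an illustrative sub-op handler. */
enum { OP_GET = 0, OP_SET = 1 };

static int do_op_sketch(unsigned int op, int arg)
{
    switch ( op )
    {
    case OP_GET:
        return 0;

    case OP_SET:
        /* The operation exists, but the argument is malformed. */
        if ( arg < 0 )
            return -EINVAL;
        return 0;

    default:
        /* The interface exists, but this operation is not offered. */
        return -EOPNOTSUPP;
    }
}
```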
8 years agoxen: make clear gcov support limitation in Kconfig
Wei Liu [Tue, 6 Sep 2016 11:02:29 +0000 (12:02 +0100)]
xen: make clear gcov support limitation in Kconfig

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen: replace TEST_COVERAGE with CONFIG_GCOV
Wei Liu [Thu, 1 Sep 2016 13:58:28 +0000 (14:58 +0100)]
xen: replace TEST_COVERAGE with CONFIG_GCOV

The sole purpose of TEST_COVERAGE macro is to guard the availability of
gcov sysctl. Now we have a proper CONFIG_GCOV, use it.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agogcov: collect more sections to constructor list
Wei Liu [Thu, 1 Sep 2016 12:06:57 +0000 (13:06 +0100)]
gcov: collect more sections to constructor list

The version of gcc (4.9.2) I use puts constructors into .init_array*
section(s). Collect those sections into constructor list as well.

Modify both arm and x86 scripts to keep them in sync.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen: indicate gcov in log messages
Wei Liu [Fri, 2 Sep 2016 13:43:25 +0000 (14:43 +0100)]
xen: indicate gcov in log messages

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agodocs: document old SUSE/Novell unplug for HVM
Olaf Hering [Fri, 2 Sep 2016 09:32:55 +0000 (11:32 +0200)]
docs: document old SUSE/Novell unplug for HVM

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
8 years agox86/hypercall: Reduce the size of the hypercall tables
Andrew Cooper [Mon, 26 Jan 2015 15:21:30 +0000 (15:21 +0000)]
x86/hypercall: Reduce the size of the hypercall tables

The highest populated entry in each hypercall table is currently at index 49.
There is no need to extend both tables to 64 entries.

Range check eax against the hypercall table array size, and use a
BUILD_BUG_ON() to ensure that the hypercall tables don't grow larger than the
args table.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
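
The combination described in the commit above, a run-time range check against the table size plus a build-time check relating the two tables, can be sketched as below. C11 _Static_assert is used here as a stand-in for BUILD_BUG_ON(); the table contents, types and names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef long (*call_fn_sketch)(long);

static long op_identity(long a) { return a; }
static long op_double(long a)   { return a * 2; }
static long op_negate(long a)   { return -a; }

/* Sized by its highest populated entry, not padded to a fixed length. */
static const call_fn_sketch call_table[] = {
    [0] = op_identity,
    [1] = op_double,
    [2] = op_negate,
};

/* Argument counts, indexed by the same call number. */
static const unsigned char args_table[] = { 1, 1, 1, 0 };

/* Build-time check: the call table must not outgrow the args table. */
_Static_assert(ARRAY_SIZE(call_table) <= ARRAY_SIZE(args_table),
               "call table larger than args table");

static long dispatch_sketch(unsigned int nr, long arg)
{
    /* Run-time check: reject call numbers beyond the table. */
    if ( nr >= ARRAY_SIZE(call_table) )
        return -ENOSYS;

    return call_table[nr](arg);
}
```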
8 years agox86/hypercall: Merge the hypercall arg tables
Andrew Cooper [Mon, 26 Jan 2015 15:11:59 +0000 (15:11 +0000)]
x86/hypercall: Merge the hypercall arg tables

For the same reason as c/s 33a231e3f "x86/HVM: fold hypercall tables" and
c/s d6d67b047 "x86/pv: Merge the pv hypercall tables", this removes the
risk of accidentally updating only one of the tables.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/pv: Merge the pv hypercall tables
Andrew Cooper [Mon, 26 Jan 2015 14:46:12 +0000 (14:46 +0000)]
x86/pv: Merge the pv hypercall tables

For the same reason as c/s 33a231e3f "x86/HVM: fold hypercall tables", this
removes the risk of accidentally updating only one of the tables.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen/multicall: Rework arch multicall handling
Andrew Cooper [Mon, 26 Jan 2015 14:30:43 +0000 (14:30 +0000)]
xen/multicall: Rework arch multicall handling

The x86 multicall handling was previously some very hairy inline assembly, and
is hard to follow and maintain.

Replace the existing do_multicall_call() with arch_do_multicall_call().  The
x86 side needs to handle both compat and non-compat calls, so pass the full
multicall state, rather than just the multicall_entry sub-structure.

On the ARM side, alter the prototype to match, but there is no resulting
functional change.  On the x86 side, the implementation is now in plain C.

This allows the removal of both asm/multicall.h header files.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/hypercall: Move the hypercall tables into C
Andrew Cooper [Mon, 26 Jan 2015 14:15:23 +0000 (14:15 +0000)]
x86/hypercall: Move the hypercall tables into C

Editing (and indeed, finding) the hypercall tables can be tricky, especially
towards the end where .rept's are used to maintain the correct layout.

Move this all into C, and let the compiler do the hard work.

To do this, xen/hypercall.h and asm-x86/hypercall.h need to contain prototypes
for all the hypercalls; some were previously missing.  This in turn requires
some shuffling of definitions and includes.

One difference is that NULL function pointers are used instead of
{,compat_}do_ni_hypercall(), which pv_hypercall() handles correctly.  All
ni_hypercall() infrastructure is therefore dropped.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/pv: Implement pv_hypercall() in C
Andrew Cooper [Mon, 26 Jan 2015 12:01:00 +0000 (12:01 +0000)]
x86/pv: Implement pv_hypercall() in C

In a similar style to hvm_do_hypercall().  The C version is far easier to
understand and edit than the assembly versions.

There are a few small differences however.  The register clobbering values
have changed (to match the HVM side), and in particular clobber the upper
32bits of 64bit arguments.  The hypercall and performance counter record are
reordered to increase code sharing between the 32bit and 64bit cases.

The sole callers of __trace_hypercall_entry() were the assembly code.  Given
the new C layout, it is more convenient to fold __trace_hypercall_entry() into
pv_hypercall(), and call __trace_hypercall() directly.

Finally, pv_hypercall() will treat a NULL hypercall function pointer as
-ENOSYS, allowing further cleanup.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
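
Treating a NULL table entry as -ENOSYS, as the pv_hypercall() commit above describes, removes the need for a dedicated "not implemented" handler in every empty slot. A minimal sketch with hypothetical names and a hypothetical four-entry table:

```c
#include <errno.h>
#include <stddef.h>

typedef long (*handler_sketch)(long);

static long double_it(long a) { return 2 * a; }

/* Slots 0 and 2 are left unpopulated and are therefore NULL. */
static const handler_sketch handlers[4] = {
    [1] = double_it,
    [3] = double_it,
};

static long call_handler(unsigned int nr, long arg)
{
    /* Out-of-range numbers and empty slots are both "not implemented". */
    if ( nr >= sizeof(handlers) / sizeof(handlers[0]) || handlers[nr] == NULL )
        return -ENOSYS;

    return handlers[nr](arg);
}
```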
8 years agox86/hypercall: Move the hypercall arg tables into C
Andrew Cooper [Mon, 26 Jan 2015 11:25:43 +0000 (11:25 +0000)]
x86/hypercall: Move the hypercall arg tables into C

Editing (and indeed, finding) the hypercall args tables can be tricky,
especially towards the end where .rept's are used to maintain the correct
layout.

Move this all into C, and let the compiler do the hard work.  As 0 is the
default value, drop all explicit 0's.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
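
Dropping the explicit 0's in the arg tables commit above relies on C initialising every element that is not named in the initialiser to zero. A short sketch, with a hypothetical table:

```c
/* Only non-zero argument counts are listed; all other entries are 0. */
static const unsigned char arg_counts[8] = {
    [1] = 2,
    [4] = 3,
};
```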
8 years agox86/pv: Support do_set_segment_base() for compat guests
Andrew Cooper [Fri, 15 Jul 2016 13:12:01 +0000 (13:12 +0000)]
x86/pv: Support do_set_segment_base() for compat guests

set_segment_base is the only hypercall which exists in only one of the two modes
guests might run in; all other hypercalls are either implemented, or
unimplemented in both modes.

Remove this split, by allowing do_set_segment_base() to be called in the
compat hypercall path.  This change will simplify the verification logic in a
later change.

No behavioural change from a guest's point of view.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
8 years agox86/hypercall: Move some of the hvm hypercall infrastructure into hypercall.h
Andrew Cooper [Mon, 26 Jan 2015 11:10:02 +0000 (11:10 +0000)]
x86/hypercall: Move some of the hvm hypercall infrastructure into hypercall.h

It will be reused for PV hypercalls in subsequent changes.

 * Rename hvm_hypercall_t to hypercall_fn_t
 * Introduce hypercall_table_t

Finally, rework the #includes for hypercall.h so it may be included in
isolation.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <JBeulich@suse.com>
8 years agoConfig.mk: update OVMF commit
Wei Liu [Tue, 6 Sep 2016 11:54:47 +0000 (12:54 +0100)]
Config.mk: update OVMF commit

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agoConfig.mk: update seabios to 1.9.3 release
Wei Liu [Tue, 6 Sep 2016 11:50:44 +0000 (12:50 +0100)]
Config.mk: update seabios to 1.9.3 release

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools: add config parameter for maximum memory of xenstore domain
Juergen Gross [Mon, 8 Aug 2016 08:28:29 +0000 (10:28 +0200)]
tools: add config parameter for maximum memory of xenstore domain

Add a parameter to xencommons configuration file for specifying the
maximum memory size of the xenstore domain.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agostubdom: add CONFIG_BALLOON to xenstore config
Juergen Gross [Mon, 8 Aug 2016 08:28:28 +0000 (10:28 +0200)]
stubdom: add CONFIG_BALLOON to xenstore config

Compile xenstore stubdom with ballooning support.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools: add --maxmem parameter to init-xenstore-domain
Juergen Gross [Mon, 8 Aug 2016 08:28:27 +0000 (10:28 +0200)]
tools: add --maxmem parameter to init-xenstore-domain

Add a parameter to specify the maximum memory size of the xenstore
domain. If the xenstore domain supports ballooning, it will be
able to adjust its own size according to its memory needs.

The maximum memory size can be specified as an absolute value in
MiB, as a fraction of the host's memory, or as a combination of
both (the maximum of the absolute and the fraction value):

--maxmem <m>             maxmem is <m> MiB
--maxmem <a>/<b>         maxmem is hostmem * a / b
--maxmem <m>:<a>/<b>     maxmem is max(<m> MiB, hostmem * a / b)
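
Illustration only (not the code in init-xenstore-domain): a sketch of how the
three forms above could be parsed.  demo_parse_maxmem() and its error
convention are assumptions made for the example.

```c
#include <stdlib.h>

/*
 * Parse "<m>", "<a>/<b>" or "<m>:<a>/<b>" and return the limit in MiB,
 * or -1 on malformed input.  host_mib is the host memory size in MiB.
 * Overflow of very large values is not handled in this sketch.
 */
static long long demo_parse_maxmem(const char *str, long long host_mib)
{
    char *end;
    long long mib = 0, a, b, frac;

    a = strtoll(str, &end, 10);
    if ( end == str || a < 0 )
        return -1;

    if ( *end == '\0' )             /* "<m>": absolute value only */
        return a;

    if ( *end == ':' )              /* "<m>:<a>/<b>": keep <m>, read <a> */
    {
        mib = a;
        str = end + 1;
        a = strtoll(str, &end, 10);
        if ( end == str || a < 0 )
            return -1;
    }

    if ( *end != '/' )
        return -1;

    str = end + 1;
    b = strtoll(str, &end, 10);
    if ( end == str || *end != '\0' || b <= 0 )
        return -1;

    frac = host_mib * a / b;        /* hostmem * a / b */

    return mib > frac ? mib : frac; /* maximum of both values */
}
```

For a host with 8192 MiB, "1/4" would give 2048 and "512:1/100" would give
512, as the absolute value is the larger of the two.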

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools: remove blktap2 source code
Wei Liu [Mon, 15 Aug 2016 11:05:44 +0000 (12:05 +0100)]
tools: remove blktap2 source code

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agotools: remove blktap2 related code and documentation
Wei Liu [Mon, 15 Aug 2016 10:32:56 +0000 (11:32 +0100)]
tools: remove blktap2 related code and documentation

Blktap2 has effectively been dead code for a few years.

Notable changes in this patch:

0. Unhook blktap2 from build system
1. Now libxl no longer supports TAP disk backend, appropriate assertions
   are added and some code paths now return ERROR_FAIL
2. Tap is no longer a supported backend in doc
3. Remove relevant entries in MAINTAINERS

A patch to actually remove blktap2 directory will come later.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: George Dunlap <George.Dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agox86: correct CPUID output for out of bounds input
Jan Beulich [Tue, 6 Sep 2016 08:19:18 +0000 (10:19 +0200)]
x86: correct CPUID output for out of bounds input

Another place where we should try to behave sufficiently close to how
real hardware does; see the code comments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agomem_access: sanitize code around sending vm_event request
Tamas K Lengyel [Tue, 6 Sep 2016 08:17:46 +0000 (10:17 +0200)]
mem_access: sanitize code around sending vm_event request

The two functions monitor_traps and mem_access_send_req duplicate some of the
same functionality. The mem_access_send_req however leaves a lot of the
standard vm_event fields to be filled by other functions.

Remove mem_access_send_req() completely, making use of monitor_traps() to put
requests into the monitor ring.  This in turn causes some cleanup around the
old callsites of mem_access_send_req(). We also update monitor_traps() to now
include setting the common vcpu_id field so that all other call-sites can omit
this step.

Finally, this change identifies that errors from mem_access_send_req() were
never checked.  As errors constitute a problem with the monitor ring,
crashing the domain is the most appropriate action to take.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoRevert "x86: allow disabling sm{e,a}p for Xen itself"
Jan Beulich [Mon, 5 Sep 2016 13:04:53 +0000 (15:04 +0200)]
Revert "x86: allow disabling sm{e,a}p for Xen itself"

This reverts commit 5fdea6577098eda065c794c79e1ae23f33f103af,
which is still buggy.

8 years agolibxl: do not assume Dom0 backend while getting nic info
Marek Marczykowski-Górecki [Mon, 5 Sep 2016 09:26:04 +0000 (11:26 +0200)]
libxl: do not assume Dom0 backend while getting nic info

Fill backend_domid field based on backend path.

Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/firmware: Rename bios.bin to seabios.bin
Wei Liu [Mon, 5 Sep 2016 10:36:45 +0000 (11:36 +0100)]
tools/firmware: Rename bios.bin to seabios.bin

bios.bin as a name is far too generic.  Rename it to seabios.bin.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: fix up conflict, rerun autogen.sh ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agolibxl: update flex output files for DSA 3653-2
Wei Liu [Mon, 5 Sep 2016 09:21:28 +0000 (10:21 +0100)]
libxl: update flex output files for DSA 3653-2

We updated flex output files in 4b314c89 ("libxl: update flex output
files") for DSA 3653-1 / CVE-2016-6354. But Debian security team
discovered the fix to flex was incomplete and issued DSA 3653-2. We need
to update our flex output files accordingly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agox86: allow disabling sm{e,a}p for Xen itself
He Chen [Mon, 5 Sep 2016 10:49:43 +0000 (12:49 +0200)]
x86: allow disabling sm{e,a}p for Xen itself

SMEP/SMAP are security features which prevent the kernel from involuntarily
executing/accessing user addresses; any such access leads to a page fault.

In earlier code SMEP/SMAP are enabled (in CR4) for both Xen and HVM guests.
With the SMEP/SMAP bits set in Xen's CR4, the checks are also enforced for
32-bit PV guests, which suffer unexpected SMEP/SMAP page faults when the
guest kernel attempts to access user addresses, even though SMEP/SMAP are
not enabled for PV guests.

This patch introduces a new boot option value "hvm" for "sm{e,a}p", which
disables SMEP/SMAP for the Xen hypervisor itself while keeping them enabled
for HVM guests. This way, 32-bit PV guests will not suffer the SMEP/SMAP
faults. Users can choose whether to enable SMEP/SMAP for Xen itself,
especially when they are going to run 32-bit PV guests.

Signed-off-by: He Chen <he.chen@linux.intel.com>
[jbeulich: doc and style adjustments]
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agohave __DEFINE_COMPAT_HANDLE() generate const versions
Razvan Cojocaru [Mon, 5 Sep 2016 10:47:46 +0000 (12:47 +0200)]
have __DEFINE_COMPAT_HANDLE() generate const versions

DEFINE_XEN_GUEST_HANDLE() and __DEFINE_XEN_GUEST_HANDLE()
each produce both const and non-const handles, whereas of
the compat variants only DEFINE_COMPAT_HANDLE() does
(__DEFINE_COMPAT_HANDLE() does not). This patch has
__DEFINE_COMPAT_HANDLE() also produce a const handle.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/monitor: include EAX/ECX in CPUID monitor events
Tamas K Lengyel [Mon, 5 Sep 2016 10:47:16 +0000 (12:47 +0200)]
x86/monitor: include EAX/ECX in CPUID monitor events

Extend the CPUID monitor event to include EAX and ECX values that were used
when CPUID was executed. This is useful in identifying which leaf was queried.
We also adjust the xen-access output format to more closely resemble the output
of the Linux cpuid tool's raw format.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/cpuid: AVX-512 feature detection
Luwei Kang [Mon, 5 Sep 2016 10:46:13 +0000 (12:46 +0200)]
x86/cpuid: AVX-512 feature detection

AVX512 is an extension of AVX2. Its spec can be found at:
https://software.intel.com/sites/default/files/managed/b4/3a/319433-024.pdf
This patch detects AVX512 features by CPUID.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agotools: delete gtraceview and gtracestat
Wei Liu [Mon, 15 Aug 2016 15:27:27 +0000 (16:27 +0100)]
tools: delete gtraceview and gtracestat

There has not been any substantial update to them since 2011. My quick
check shows that they don't work.

Just delete them. It would be easy to resurrect them from git log should
people still need them.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agox86/mm: drop pointless use of __FUNCTION__
Jan Beulich [Fri, 2 Sep 2016 12:22:28 +0000 (14:22 +0200)]
x86/mm: drop pointless use of __FUNCTION__

Non-debugging message text should be (and is here) distinguishable
without also logging function names.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
8 years agox86emul: check alignment of SSE and AVX memory operands
Jan Beulich [Fri, 2 Sep 2016 12:20:23 +0000 (14:20 +0200)]
x86emul: check alignment of SSE and AVX memory operands

It only now occurred to me that there's no new hook needed to do so.
Eliminate the two work item comments.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agomemory: fix compat handling of XENMEM_access_op
Jan Beulich [Fri, 2 Sep 2016 12:19:51 +0000 (14:19 +0200)]
memory: fix compat handling of XENMEM_access_op

Within compat_memory_op() this needs to be placed in the first switch()
statement, or it ends up being dead code (as that first switch() has a
default case chaining to compat_arch_memory_op()).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/PV: make PMU MSR handling consistent
Jan Beulich [Fri, 2 Sep 2016 12:19:29 +0000 (14:19 +0200)]
x86/PV: make PMU MSR handling consistent

So far accesses to Intel MSRs on an AMD system fall through to the
default case, while accesses to AMD MSRs on an Intel system bail (in
the RDMSR case without updating EAX and EDX). Make the "AMD MSRs on
Intel" case match the "Intel MSR on AMD" one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86: correct PT_NOTE file position
Jan Beulich [Fri, 2 Sep 2016 12:18:52 +0000 (14:18 +0200)]
x86: correct PT_NOTE file position

Program and section headers disagreed about the file offset at which
the build ID note lives.

Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agocredit1: fix a race when picking initial pCPU for a vCPU
Dario Faggioli [Fri, 2 Sep 2016 12:17:55 +0000 (14:17 +0200)]
credit1: fix a race when picking initial pCPU for a vCPU

In the Credit1 hunk of 9f358ddd69463 ("xen: Have
schedulers revise initial placement") csched_cpu_pick()
is called without taking the runqueue lock of the
(temporary) pCPU that the vCPU has been assigned to
(e.g., in XEN_DOMCTL_max_vcpus).

However, although 'hidden' in the IS_RUNQ_IDLE() macro,
that function does access the runq (for doing load
balancing calculations). Two scenarios are possible:
 1) we are on cpu X, and IS_RUNQ_IDLE() peeks at cpu
    X's own runq;
 2) we are on cpu X, but IS_RUNQ_IDLE() peeks at some
    other cpu's runq.

Scenario 2) absolutely requires that the appropriate
runq lock is taken. Scenario 1) works even without
taking the cpu's own runq lock. That is actually what
happens when _csched_cpu_pick() is called from
csched_vcpu_acct() (in turn, called by csched_tick()).

Races have been observed and reported (by both XenServer's
own testing and OSSTest [1]), in the form of
IS_RUNQ_IDLE() falling over LIST_POISON, because we're
not currently holding the proper lock, in
csched_vcpu_insert(), when scenario 1) occurs.

However, for better robustness, from now on we always
ask for the proper runq lock to be held when calling
IS_RUNQ_IDLE() (which is also becoming a static inline
function instead of a macro).

In order to comply with that, we take the lock around
the call to _csched_cpu_pick() in csched_vcpu_acct().

[1] https://lists.xen.org/archives/html/xen-devel/2016-08/msg02144.html

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
8 years agoxen/trace: Turn the stub debugtrace_{dump,printk}() macros into functions
Andrew Cooper [Sat, 2 Jul 2016 10:43:02 +0000 (11:43 +0100)]
xen/trace: Turn the stub debugtrace_{dump,printk}() macros into functions

This allows printf format checking to be performed, and for
debugtrace_printk() to evaluate its arguments, even if debugtrace is disabled
at compile time.
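
Illustration only (not the code in the patch): a sketch of the difference
between a macro stub and a function stub.  The demo_* names are hypothetical,
and the format attribute is a GCC/Clang extension.

```c
/* Counter used to observe whether an argument expression was evaluated. */
static int demo_calls;

static int demo_next(void)
{
    return ++demo_calls;
}

/* Macro stub: the arguments vanish, so nothing is checked or evaluated. */
#define demo_trace_macro(fmt, ...) ((void)0)

/*
 * Function stub: the compiler checks the arguments against the format
 * string, and the argument expressions are still evaluated by the caller.
 */
static inline void __attribute__((format(printf, 1, 2)))
demo_trace_fn(const char *fmt, ...)
{
    (void)fmt;
}
```

Passing demo_next() to demo_trace_macro() leaves demo_calls untouched,
whereas passing it to demo_trace_fn() increments it.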

No intended change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/shadow: More consistent printing for debug messages
Andrew Cooper [Sat, 2 Jul 2016 10:28:13 +0000 (11:28 +0100)]
x86/shadow: More consistent printing for debug messages

 * Use %pv or just d%d in preference to the multiple current ways of
   presenting the same information.
 * Use PRI_mfn instead of opencoding it.
 * Drop all explicit use of __func__ from SHADOW_{PRINTK,DEBUG}() calls.  The
   wrappers already include it.
 * Use hex rather than decimal for printing a pagefault error code.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
8 years agostubdom: support Mini-OS config for Mini-OS apps
Juergen Gross [Tue, 30 Aug 2016 14:53:39 +0000 (16:53 +0200)]
stubdom: support Mini-OS config for Mini-OS apps

Mini-OS apps need to be compiled with the appropriate config settings
of Mini-OS, as there are various dependencies on those settings in
header files included by the apps.

Enhance stubdom Makefile to set the appropriate CPPFLAGS when calling
the apps' make.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: fold in change to Config.mk to update mini-os commit ]

8 years agolibxl: fix libxl_device_usbdev_list()
Juergen Gross [Fri, 2 Sep 2016 08:16:14 +0000 (10:16 +0200)]
libxl: fix libxl_device_usbdev_list()

Commit 03814de1d2ecdabedabceb8e728d934a632a43b9 ("libxl: Do not trust
frontend for vusb") introduced an error in libxl_device_usbdev_list().
Fix it.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agodoc: fix some typos
Juergen Gross [Thu, 1 Sep 2016 11:02:45 +0000 (13:02 +0200)]
doc: fix some typos

Fix some typos in docs/man/xl.cfg.pod.5.in

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools/migrate: Prevent PTE truncation from being fatal during the live phase
Andrew Cooper [Thu, 1 Sep 2016 09:45:03 +0000 (10:45 +0100)]
tools/migrate: Prevent PTE truncation from being fatal during the live phase

It is possible, when normalising a PV pagetable that the table has been freed
and reused for something else by the guest.

In such a case, data read might no longer be a pagetable, and fail the
truncation check.  However, this should only be fatal if we encounter such a
page in the paused phase.

This check is now consistent with all other checks in the same area.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
8 years agox86/levelling: fix breakage on older Intel boxes from c/s 08e7738
Andrew Cooper [Fri, 2 Sep 2016 06:12:29 +0000 (08:12 +0200)]
x86/levelling: fix breakage on older Intel boxes from c/s 08e7738

cpufeat_mask() yields an unsigned integer constant.  As a result, taking its
complement causes zero extension rather than sign extension.

The result is that, when a guest OS has OSXSAVE disabled, all features in 1d
are hidden from native CPUID.  Amongst other things, this causes the early
code in Linux to find no LAPIC, but for everything to appear fine later when
userspace is up and running.
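
Illustration only (not the code in the patch): a sketch of the general
pitfall, assuming a 32-bit unsigned int and a 64-bit value being masked.
DEMO_FEAT_MASK is a hypothetical stand-in for a cpufeat_mask() result.

```c
#include <stdint.h>

/* An unsigned int constant with a single bit set. */
#define DEMO_FEAT_MASK (1u << 27)

static uint64_t demo_clear_zero_ext(uint64_t val)
{
    /*
     * ~DEMO_FEAT_MASK is computed as a 32-bit unsigned value and then
     * zero-extended to 64 bits, so the upper 32 bits of val are cleared
     * along with the intended bit.
     */
    return val & ~DEMO_FEAT_MASK;
}

static uint64_t demo_clear_only_bit(uint64_t val)
{
    /* Widening before complementing keeps the upper 32 bits intact. */
    return val & ~(uint64_t)DEMO_FEAT_MASK;
}
```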

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Jan Beulich <jbeulich@suse.com>
8 years agox86: drop leftover uses of regparm attribute
Jan Beulich [Thu, 1 Sep 2016 13:24:20 +0000 (15:24 +0200)]
x86: drop leftover uses of regparm attribute

These were relevant only for 32-bit builds of Xen.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/32on64: misc adjustments to call gate emulation
Jan Beulich [Thu, 1 Sep 2016 13:23:46 +0000 (15:23 +0200)]
x86/32on64: misc adjustments to call gate emulation

- There's no 32-bit displacement in 16-bit addressing mode.
- It is wrong to ASSERT() anything on parts of an instruction fetched
  from guest memory.
- The two scaling bits of a SIB byte don't affect whether there is a
  scaled index register or not.
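
Illustration only (not the emulation code): a sketch of SIB byte decoding
without any REX prefix, as applies to 32-bit code.  The demo_* names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_sib {
    unsigned int scale;  /* bits 7:6, shift count applied to the index */
    unsigned int index;  /* bits 5:3 */
    unsigned int base;   /* bits 2:0 */
};

static struct demo_sib demo_decode_sib(uint8_t sib)
{
    struct demo_sib s = {
        .scale = sib >> 6,
        .index = (sib >> 3) & 7,
        .base  = sib & 7,
    };

    return s;
}

/*
 * An index field of 4 means "no index register".  The scale bits play
 * no part in that decision.
 */
static bool demo_sib_has_index(uint8_t sib)
{
    return demo_decode_sib(sib).index != 4;
}
```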

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86: drop pointless uses of __func__ / __FUNCTION__
Jan Beulich [Thu, 1 Sep 2016 13:21:06 +0000 (15:21 +0200)]
x86: drop pointless uses of __func__ / __FUNCTION__

Non-debugging message text should be (and is in the cases here)
distinguishable without also logging function names. Debugging message
text, otoh, already includes file name and line number, so also
logging function names is redundant. One relatively pointless debugging
message gets removed altogether. In another case a missing log level
specifier gets added at once.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/EFI: use less crude a way of generating the build ID
Jan Beulich [Thu, 1 Sep 2016 13:19:40 +0000 (15:19 +0200)]
x86/EFI: use less crude a way of generating the build ID

Recent enough binutils (2.25 onwards) support --build-id also for
COFF/PE output, and hence we should use that in favor of the original
hack when possible.

This gets complicated by the linker requiring at least one COFF object
file to attach the .buildid section to. Hence the patch introduces a
buildid.ihex (in order to avoid introducing binary files into the repo)
which then gets converted to a binary minimal COFF object (no sections,
no symbols).

Also (to avoid both code fragments going out of sync) remove an unneeded
ALIGN() from xen.lds.S: Adding an equivalent of it to the .buildid
section would cause the _erodata symbol to become associated with the
wrong section again (see commit 0970299de5 ["x86/EFI + Live Patch:
avoid symbol address truncation"]). And it's pointless because the
alignment already gets properly set by the input section(s).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/levelling: Provide architectural OSXSAVE handling to masked native CPUID
Andrew Cooper [Mon, 22 Aug 2016 16:50:55 +0000 (17:50 +0100)]
x86/levelling: Provide architectural OSXSAVE handling to masked native CPUID

Contrary to c/s b2507fe7 "x86/domctl: Update PV domain cpumasks when setting
cpuid policy", Intel CPUID masks are applied after fast forwarding hardware
state, rather than before.  (All behaviour in this regard appears completely
undocumented by both Intel and AMD).

Therefore, a set bit in the MSR causes hardware to be fast-forwarded, while a
clear bit forces the guest's view to 0, even if Xen's CR4.OSXSAVE is actually
set.

This allows Xen to provide an architectural view of a guest kernel's
CR4.OSXSAVE setting to any native CPUID instruction issued by guest kernel or
userspace, even when masking is used.

The masking value defaults to 1 (if the guest has XSAVE available) to cause
fast-forwarding to occur for the HVM and idle vcpus.

When setting the MSRs, a PV guest kernel's choice of OSXSAVE is taken into
account, and clobbered from the MSR if not set.  This causes the
fast-forwarding of Xen's CR4 state not to happen.

As a side effect however, levelling potentially needs updating on all PV CR4
changes.

Reported-by: Jan Beulich <JBeulich@suse.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/levelling: Pass a vcpu rather than a domain to ctxt_switch_levelling()
Andrew Cooper [Mon, 22 Aug 2016 16:39:44 +0000 (17:39 +0100)]
x86/levelling: Pass a vcpu rather than a domain to ctxt_switch_levelling()

A subsequent change needs to special-case OSXSAVE handling, which is per-vcpu
rather than per-domain.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agox86/levelling: Restrict non-architectural OSXSAVE handling to emulated CPUID
Andrew Cooper [Tue, 23 Aug 2016 10:10:12 +0000 (11:10 +0100)]
x86/levelling: Restrict non-architectural OSXSAVE handling to emulated CPUID

There is no need to extend the workaround to the faulted CPUID view, as
Linux's dependence on the workaround is strictly via the emulated view.

This causes a guest kernel faulted CPUID to observe architectural behaviour
with respect to its CR4.OSXSAVE setting.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen/Kconfig: Misc tweaks
Andrew Cooper [Thu, 18 Aug 2016 15:15:39 +0000 (16:15 +0100)]
xen/Kconfig: Misc tweaks

 * Drop one piece of trailing whitespace
 * Reposition LATE_HWDOM so it sits properly nested inside XSM in menuconfig

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
8 years agoxen/Kconfig: Drop redundant comments from Kconfig files
Andrew Cooper [Thu, 18 Aug 2016 12:14:05 +0000 (13:14 +0100)]
xen/Kconfig: Drop redundant comments from Kconfig files

Most of the comments are duplicated from the help text, and those without help
provide no useful additional input.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
8 years agox86/PV: remove read_descriptor()'s regs parameter
Jan Beulich [Wed, 31 Aug 2016 16:15:07 +0000 (18:15 +0200)]
x86/PV: remove read_descriptor()'s regs parameter

As of commit a35dc6ccbb ("x86: remove the use of vm86_mode()") it is
unused.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agopassthrough: fix a BUG_ON issue
Feng Wu [Wed, 31 Aug 2016 16:13:47 +0000 (18:13 +0200)]
passthrough: fix a BUG_ON issue

The 'idx' can be equal to the max number of vCPUs; fix it.

Signed-off-by: Feng Wu <feng.wu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen: add a gcov Kconfig option
Wei Liu [Wed, 31 Aug 2016 15:26:52 +0000 (16:26 +0100)]
xen: add a gcov Kconfig option

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
8 years agoxen: fix gcov compilation
Wei Liu [Wed, 31 Aug 2016 15:26:51 +0000 (16:26 +0100)]
xen: fix gcov compilation

Currently the hypervisor fails to build with gcov enabled because, although
26c9d03d ("gcov: Adding support for coverage information") claimed that
%.init.o files were excluded from having the compilation options applied,
this was in fact not true.

Fix that by filtering out the options correctly. Because the dependency
of stub.o in x86 EFI build can't be eliminated easily and we prefer a
generalised method going forward, we introduce nogcov-y to explicitly
mark objects that don't need to be built with gcov support.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Jan Beulich <JBeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoarm64: use "b" to branch to start_xen
Wei Liu [Wed, 31 Aug 2016 15:26:50 +0000 (16:26 +0100)]
arm64: use "b" to branch to start_xen

The cbz instruction has a limited range. When compiled with gcov
support the object is larger, so cbz can no longer reach the target. The
error message looks like:

aarch64-linux-gnu-ld    -EL  -T xen.lds -N prelink.o \
    /local/work/xen.git/xen/common/symbols-dummy.o -o /local/work/xen.git/xen/.xen-syms.0
prelink.o: In function `launch':
/local/work/xen.git/xen/arch/arm/arm64/head.S:602:(.text+0x408): relocation truncated to fit: R_AARCH64_CONDBR19 against symbol `start_xen' defined in .init.text section in prelink.o

Use "b" instead.
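
Illustration only (not part of the patch): a sketch of the range arithmetic.
cbz encodes a 19-bit signed word offset (hence R_AARCH64_CONDBR19), giving
roughly +/-1MiB, while "b" encodes a 26-bit one, giving roughly +/-128MiB.
demo_fits_branch() is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Check whether a byte offset fits a branch whose immediate is a signed
 * imm_bits-wide count of 4-byte instructions.
 */
static bool demo_fits_branch(int64_t offset, unsigned int imm_bits)
{
    /* 2^(imm_bits - 1) words of 4 bytes each, i.e. 2^(imm_bits + 1) bytes. */
    int64_t limit = (int64_t)1 << (imm_bits + 1);

    if ( offset & 3 )               /* targets must be word aligned */
        return false;

    return offset >= -limit && offset <= limit - 4;
}
```

An offset of exactly 1MiB does not fit with imm_bits of 19, but fits easily
with imm_bits of 26.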

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
8 years agoarm: acpi/boot.c is only used during initialisation
Wei Liu [Wed, 31 Aug 2016 15:26:49 +0000 (16:26 +0100)]
arm: acpi/boot.c is only used during initialisation

That file should contain code and data used during initialisation only.

Mark it as such in build system and correctly annotate enabled_cpus.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
8 years agolibxl: update flex output files
Wei Liu [Fri, 26 Aug 2016 10:11:46 +0000 (11:11 +0100)]
libxl: update flex output files

Libxl ships output files from flex (libxlu_*_l.{c,h}). We use the flex
shipped in Debian to generate those files. Debian just patched their
flex (DSA 3653-1) to fix CVE-2016-6354, which is a buffer overrun bug.

Note that libxl is _NOT_ vulnerable to that CVE. See below for Ian's
analysis to security@xen.

It would still be nice to update our shipped flex output files to
avoid confusion.

===QUOTE===

The bug is that with input >16K[1] flex would usually fail to resize
the input buffer, and then overrun it.

I have read the code in libxlu_cfg_l.c to try to understand the
implications for libxl.

AFAICT
 - libxl always does config file reading _from the file_ itself, and
   provides flex with a string or buffer.
 - so we always call whatever_yy_scan_bytes, not any other flex setup
   function to set up a `buffer' (as flex calls it)
 - yy_scan_bytes calls yy_scan_buffer to set up the buffer
 - yy_scan_buffer sets b->yy_fill_buffer
 - The effect of this is that yy_get_next_buffer will always
   return early, rather than continuing on to the vulnerable code.

So I think libxl is not vulnerable, regardless of the contents of the
configuration file.

[1] the default buffer size, or whatever other buffer size is
configured (but we don't change it)

===ENDQUOTE===

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agolibxc: correct max_pfn calculation for saving domain
Juergen Gross [Fri, 26 Aug 2016 11:58:55 +0000 (13:58 +0200)]
libxc: correct max_pfn calculation for saving domain

Commit 91e204d37f44913913776d0a89279721694f8b32 ("libxc: try to find
last used pfn when migrating") introduced a bug for the case of a
domain supporting the virtual mapped linear p2m list: the maximum pfn
of the domain calculated from the p2m memory allocation might be too
low.

Correct this.

Reported-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Tested-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>