]> xenbits.xensource.com Git - xen.git/log
xen.git
7 years agohvmloader: avoid tests when they would clobber used memory
Jan Beulich [Fri, 19 May 2017 14:04:38 +0000 (16:04 +0200)]
hvmloader: avoid tests when they would clobber used memory

First of all limit the memory range used for testing to 4Mb: There's no
point placing page tables right above 8Mb when they can equally well
live at the bottom of the chunk at 4Mb - rep_io_test() cares about the
5Mb...7Mb range only anyway. In a subsequent patch this will then also
allow simply looking for an unused 4Mb range (instead of using a build
time determined one).

Extend the "skip tests" condition beyond the "is there enough memory"
question.

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Gary Lin <glin@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoUse non-debug build for Xen 4.9
Julien Grall [Thu, 18 May 2017 15:38:29 +0000 (16:38 +0100)]
Use non-debug build for Xen 4.9

Modify Config.mk and Kconfig.debug to disable debug by default in
preparation for late RCs and eventual release.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
7 years agolibxl/devd: move the device allocation/removal code
Roger Pau Monne [Tue, 16 May 2017 07:59:25 +0000 (08:59 +0100)]
libxl/devd: move the device allocation/removal code

Move the device addition/removal code to the {add/remove}_device functions.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agolibxl/devd: correctly manipulate the dguest list
Roger Pau Monne [Tue, 16 May 2017 07:59:24 +0000 (08:59 +0100)]
libxl/devd: correctly manipulate the dguest list

Current code in backend_watch_callback has two issues when manipulating the
dguest list:

1. backend_watch_callback forgets to remove a libxl__ddomain_guest from the
list of tracked domains when the related data is freed, causing dereferences
later on when the list is traversed. Make sure that a domain is always removed
from the list when freed.

2. A spurious device state change can cause a dguest to be freed, with active
devices and without being removed from the list. Fix this by always checking if
a dguest has active devices before freeing and removing it.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Reinis Martinsons <admin@frp.lv>
Suggested-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agolibxl/devd: fix a race with concurrent device addition/removal
Roger Pau Monne [Tue, 16 May 2017 07:59:23 +0000 (08:59 +0100)]
libxl/devd: fix a race with concurrent device addition/removal

Current code can free the libxl__device inside of the libxl__ddomain_device
before the addition has finished if a removal happens while an addition is
still in process:

  backend_watch_callback
            |
            v
       add_device
            |                 backend_watch_callback
    (async operation)                   |
            |                           v
            |                     remove_device
            |                           |
            |                           V
            |                    device_complete
            |                 (free libxl__device)
            v
     device_complete
  (deref libxl__device)

Fix this by creating a temporary copy of the libxl__device, that's tracked by
the GC of the nested async operation. This ensures that the libxl__device used
by the async operations cannot be freed while being used.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agobuild: more adjustments to top-level Makefile dependencies
Wei Liu [Fri, 19 May 2017 11:55:26 +0000 (12:55 +0100)]
build: more adjustments to top-level Makefile dependencies

In the original code, top-level dist target unconditionally invokes
dist target for tools/include, which is wrong when tools component is
not enabled.

Make dist-tools depend on dist-tools-public-headers, which depends on
build-tools-public-headers.

Discovered by Travis-CI.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
elease-acked-by: Julien Grall <julien.grall@arm.com>

7 years agoarm: fix build with gcc 7
Jan Beulich [Fri, 19 May 2017 08:12:08 +0000 (10:12 +0200)]
arm: fix build with gcc 7

The compiler dislikes duplicate "const", and the ones it complains
about look like they we in fact meant to be placed differently.

Also fix array_access_okay() (just like on x86), despite the construct
being unused on ARM: -Wint-in-bool-context, enabled by default in
gcc 7, doesn't like multiplication in conditional operators. "Hide" it,
at the risk of the next compiler version becoming smarter and
recognizing even that. (The hope is that added smartness then would
also better deal with legitimate cases like the one here.) The change
could have been done in access_ok(), but I think we better keep it at
the place the compiler is actually unhappy about.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86: fix build with gcc 7
Jan Beulich [Fri, 19 May 2017 08:11:36 +0000 (10:11 +0200)]
x86: fix build with gcc 7

-Wint-in-bool-context, enabled by default in gcc 7, doesn't like
multiplication in conditional operators. "Hide" them, at the risk of
the next compiler version becoming smarter and recognizing even those.
(The hope is that added smartness then would also better deal with
legitimate cases like the ones here.)

The change could have been done in access_ok(), but I think we better
keep it at the places the compiler is actually unhappy about.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoxmalloc: correct _xmalloc_array() indentation
Jan Beulich [Fri, 19 May 2017 08:10:49 +0000 (10:10 +0200)]
xmalloc: correct _xmalloc_array() indentation

It's been wrongly indented using tabs till now, and the stray blank
ahead of the final return statement gets in the way of using .i files
for detailed analysis of other compiler issues
(-Wmisleading-indentation kicks in due to the tab->space
transformation done in the course of pre-processing).

Also add missing spaces inside the if() at once, including the similar
case in _xzalloc_array().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agobuild: add missing dependency
Wei Liu [Thu, 18 May 2017 10:57:32 +0000 (11:57 +0100)]
build: add missing dependency

Commit f745b55 missed install-tools' dependency on
build-tools-public-headers.

Discovered by Travis-CI.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agobuild: stubdom and tools should depend on public header target
Wei Liu [Wed, 17 May 2017 14:26:08 +0000 (15:26 +0100)]
build: stubdom and tools should depend on public header target

Build can fail if stubdom build is run before tools build because:

1. tools/include build uses relative path and depends on XEN_OS
2. stubdom needs tools/include to be built, at which time XEN_OS is
   mini-os and corresponding symlinks are created
3. libraries inside tools needs tools/include to be built, at which
   time XEN_OS is the host os name, but symlinks won't be created
   because they are already there
4. libraries get the wrong headers and fail to build

Since both tools and stubdom build need the public headers, we build
tools/include before stubdom and tools. Remove runes in stubdom and
tools to avoid building tools/include more than once.

Provide a new dist target for tools/include.  Hook up the install,
clean, dist and distclean targets for tools/include.

The new arrangement ensures tools build gets the correct headers
because XEN_OS is set to host os when building tools/include. As for
stubdom, it explicitly links to the mini-os directory without relying
on XEN_OS so it should be fine.

Reported-by: Steven Haigh <netwiz@crc.id.au>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Steven Haigh <netwiz@crc.id.au>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
7 years agotools/xenconsoled: Preserve errno while rotating logfile handles
Andrew Cooper [Tue, 16 May 2017 13:57:27 +0000 (14:57 +0100)]
tools/xenconsoled: Preserve errno while rotating logfile handles

The logic to optionally exit after a poll() error relies on errno, but
handle_log_reload() does not preserve it.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agolibxl/arm: Fix ARM build.
Andrii Anisov [Tue, 16 May 2017 15:57:53 +0000 (18:57 +0300)]
libxl/arm: Fix ARM build.

Initialise *size in default branch to prevent certain compilers (i.e.
Linaro GCC 5.2-2015.11-2) from reporting "variable may be used uninitialized"
errors in caller function.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/ioreq_server: make p2m_finish_type_change actually work
Xiong Zhang [Wed, 17 May 2017 15:24:45 +0000 (17:24 +0200)]
x86/ioreq_server: make p2m_finish_type_change actually work

Commit 6d774a951696 ("x86/ioreq server: synchronously reset outstanding
p2m_ioreq_server entries when an ioreq server unmaps") introduced
p2m_finish_type_change(), which was meant to synchronously finish a
previously initiated type change over a gpfn range.  It did this by
calling get_entry(), checking if it was the appropriate type, and then
calling set_entry().

Unfortunately, a previous commit (1679e0df3df6 "x86/ioreq server:
asynchronously reset outstanding p2m_ioreq_server entries") modified
get_entry() to always return the new type after the type change, meaning
that p2m_finish_type_change() never changed any entries.  Which means
when an ioreq server was detached and then re-attached (as happens in
XenGT on reboot) the re-attach failed.

Fix this by using the existing p2m-specific recalculation logic instead
of doing a read-check-write loop.

Fix: 'commit 6d774a951696 ("x86/ioreq server: synchronously reset
      outstanding p2m_ioreq_server entries when an ioreq server unmaps")'

Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/mm: fix incorrect unmapping of 2MB and 1GB pages
Igor Druzhinin [Wed, 17 May 2017 15:23:15 +0000 (17:23 +0200)]
x86/mm: fix incorrect unmapping of 2MB and 1GB pages

The same set of functions is used to set as well as to clean
P2M entries, except that for clean operations INVALID_MFN (~0UL)
is passed as a parameter. Unfortunately, when calculating an
appropriate target order for a particular mapping INVALID_MFN
is not taken into account which leads to 4K page target order
being set each time even for 2MB and 1GB mappings. This eventually
breaks down an EPT structure irreversibly into 4K mappings which
prevents consecutive high order mappings to this area.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/pv: Fix the handling of `int $x` for vectors which alias exceptions
Andrew Cooper [Mon, 15 May 2017 12:05:45 +0000 (13:05 +0100)]
x86/pv: Fix the handling of `int $x` for vectors which alias exceptions

The claim at the top of c/s 2e426d6eecf "x86/traps: Drop use_error_code
parameter from do_{,guest_}trap()" is only actually true for hardware
exceptions.  It is not true for `int $x` instructions (which never push error
code), irrespective of whether the vector aliases an exception or not.

Furthermore, c/s 6480cc6280e "x86/traps: Fix failed ASSERT() in
do_guest_trap()" really should have helped highlight that a regression had
been introduced.

Modify pv_inject_event() to understand event types other than
X86_EVENTTYPE_HW_EXCEPTION, and introduce pv_inject_sw_interrupt() for the
`int $x` handling code.

Add further assertions to pv_inject_event() concerning the type of events
passed in, which in turn requires that do_guest_trap() set its type
appropriately (which is now used exclusively for hardware exceptions).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoinclude: fix build without C++ compiler installed
Jan Beulich [Fri, 12 May 2017 06:52:54 +0000 (00:52 -0600)]
include: fix build without C++ compiler installed

The rule for headers++.chk wants to move headers++.chk.new to the
designated target, which means we have to create that file in the first
place.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoioemu-stubdom: don't link *-softmmu* and *-linux-user*
Wei Liu [Fri, 12 May 2017 15:21:06 +0000 (16:21 +0100)]
ioemu-stubdom: don't link *-softmmu* and *-linux-user*

They are generated by ./configure. Having them linked can cause race
between tools build and stubdom build.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agotools: don't require unavailable optional libraries in pkg-config files
Juergen Gross [Fri, 12 May 2017 13:10:51 +0000 (15:10 +0200)]
tools: don't require unavailable optional libraries in pkg-config files

blktap2 is optional, so there should be no pkg-config file requiring
xenblktapctl if it isn't enabled for the build.

Add a filter mechanism to tools/Rules.mk to filter out optional
libraries.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agopublic/elfnote: document non-alignment of relocated init-P2M 4.9.0-rc5
Jan Beulich [Fri, 12 May 2017 15:24:17 +0000 (17:24 +0200)]
public/elfnote: document non-alignment of relocated init-P2M

Since PV kernels can't use large pages anyway, when the init-P2M
support was added it was decided to keep the implementation simple and
not align large pages in PFN space. Document this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoFix broken package config file xenlight.pc.in
Charles Arnold [Thu, 11 May 2017 16:29:42 +0000 (10:29 -0600)]
Fix broken package config file xenlight.pc.in

The Requires line in this config file uses the wrong names for two dependencies.

The package config file for xenctrl is called 'xencontrol' and for blktapctl is
called 'xenblktapctl'. Running a command like 'pkg-config --exists xenlight' will
fail without this fix.

Signed-off-by: Charles Arnold <carnold@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agoxl: don't ignore return value from libxl_device_events_handler
Wei Liu [Fri, 12 May 2017 10:02:58 +0000 (11:02 +0100)]
xl: don't ignore return value from libxl_device_events_handler

That function can return a whole slew of error codes. Translate them
to EXIT_FAILURE.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agolibxenforeignmemory: bump minor version number
Wei Liu [Wed, 10 May 2017 11:51:09 +0000 (12:51 +0100)]
libxenforeignmemory: bump minor version number

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/pv: Align %rsp before pushing the failsafe stack frame
Andrew Cooper [Fri, 5 May 2017 16:38:19 +0000 (17:38 +0100)]
x86/pv: Align %rsp before pushing the failsafe stack frame

Architecturally, all 64bit stacks are aligned on a 16 byte boundary before an
exception frame is pushed.  The failsafe frame should not special in this
regard.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agox86/pv: Fix bugs with the handling of int80_bounce
Andrew Cooper [Fri, 5 May 2017 16:38:19 +0000 (17:38 +0100)]
x86/pv: Fix bugs with the handling of int80_bounce

Testing has revealed two issues:

 1) Passing a NULL handle to set_trap_table() is intended to flush the entire
    table.  The 64bit guest case (and 32bit guest on 32bit Xen, when it
    existed) called init_int80_direct_trap() to reset int80_bounce, but c/s
    cda335c279 which introduced the 32bit guest on 64bit Xen support omitted
    this step.  Previously therefore, it was impossible for a 32bit guest to
    reset its registered int80_bounce details.

 2) init_int80_direct_trap() doesn't honour the guests request to have
    interrupts disabled on entry.  PVops Linux requests that interrupts are
    disabled, but Xen currently leaves them enabled when following the int80
    fastpath.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/arm: Survive unknown traps from guests
Julien Grall [Fri, 5 May 2017 14:30:36 +0000 (15:30 +0100)]
xen/arm: Survive unknown traps from guests

Currently we crash Xen if we see an ESR_EL2.EC value we don't recognise.
As configurable disables/enables are added to the architecture
(controlled by RES1/RESO bits respectively), with associated synchronous
exceptions, it may be possible for a guest to trigger exceptions with
classes that we don't recognise.

While we can't service these exceptions in a manner useful to the guest,
we can avoid bringing down the host. Per ARM DDI 0487A.k_iss10775, page
D7-1937, EC values within the range 0x00 - 0x2c are reserved for future
use with synchronous exceptions, and EC within the range 0x2d - 0x3f may
be used for either synchronous or asynchronous exceptions.

The patch makes Xen handle any unknown EC by injecting an UNDEFINED
exception into the guest, with a corresponding (ratelimited) warning in
the log.

This patch is based on Linux commit f050fe7a9164 "arm: KVM: Survive unknown
traps from the guest".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: do_trap_hypervisor: Separate hypervisor and guest traps
Julien Grall [Fri, 5 May 2017 14:30:35 +0000 (15:30 +0100)]
xen/arm: do_trap_hypervisor: Separate hypervisor and guest traps

The function do_trap_hypervisor is currently handling both trap coming
from the hypervisor and the guest. This makes difficult to get specific
behavior when a trap is coming from either the guest or the hypervisor.

Split the function into two parts:
    - do_trap_guest_sync to handle guest traps
    - do_trap_hyp_sync to handle hypervisor traps

On AArch32, the Hyp Trap Exception provides the standard mechanism for
trapping Guest OS functions to the hypervisor (see B1.14.1 in ARM DDI
0406C.c). It cannot be generated when generated when the processor is in
Hyp Mode, instead other exception will be used. So it is fine to replace
the call to do_trap_hypervisor by do_trap_guest_sync.

For AArch64, there are two distincts exception depending whether the
exception was taken from the current level (hypervisor) or lower level
(guest).

Note that the unknown traps from guests will lead to panic Xen. This is
already behavior and is left unchanged for simplicy. A follow-up patch
will address that.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: arm32: Rename the trap to the correct name
Julien Grall [Fri, 5 May 2017 14:30:34 +0000 (15:30 +0100)]
xen/arm: arm32: Rename the trap to the correct name

Per Table B1-3 in ARM DDI 0406C.c, the vector 0x8 for hyp is called
"Hypervisor Call".

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86/mm: add temporary debugging code to get_page_from_gfn_p2m()
Jan Beulich [Mon, 8 May 2017 15:48:32 +0000 (17:48 +0200)]
x86/mm: add temporary debugging code to get_page_from_gfn_p2m()

See the code comment.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agolibxl: u.hvm.usbdevice_list is checked for emptiness
Robin Lee [Fri, 5 May 2017 19:02:32 +0000 (03:02 +0800)]
libxl: u.hvm.usbdevice_list is checked for emptiness

Currently usbdevice_list is only checked for nullity. But the OCaml
binding will convert empty list to a pointer to NULL, instead of a
NULL pointer. That means the OCaml binding will fail to disable USB.

This patch will check emptiness of usbdevice_list. And NULL is still a
valid empty list.

Signed-off-by: Robin Lee <robinlee.sysu@gmail.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86: correct boot time page table setup
Jan Beulich [Mon, 8 May 2017 12:56:14 +0000 (14:56 +0200)]
x86: correct boot time page table setup

While using alloc_domheap_pages() and assuming the allocated memory is
directly accessible is okay at boot time (as we run on the idle page
tables there), memory hotplug code too assumes it can access the
resulting page tables without using map_domain_page() or alike, and
hence we need to obtain memory suitable for ordinary page table use
here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86: correct create_bounce_frame
Jan Beulich [Mon, 8 May 2017 12:55:20 +0000 (14:55 +0200)]
x86: correct create_bounce_frame

Commit d9b7ef209a7 ("x86: drop failsafe callback invocation from
assembly") didn't go quite far enough with the cleanup it did: The
changed maximum frame size should also have been reflected in the early
address range check (which has now been pointed out to have been wrong
anyway, using 60 instead of 0x60), and it should have updated the
comment ahead of the function.

Also adjust the lower bound - all is fine (for our purposes) if the
initial guest kernel stack pointer points right at the hypervisor base
address, as only memory _below_ that address is going to be written.

Additionally limit the number of times %rsi is being adjusted to what
is really needed.

Finally move exception fixup code into the designated .fixup section
and macroize the stores to guest stack.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien grall <julien.grall@arm.com>
8 years agox86/vm_event: fix race between __context_switch() and vm_event_resume()
Razvan Cojocaru [Mon, 8 May 2017 12:54:00 +0000 (14:54 +0200)]
x86/vm_event: fix race between __context_switch() and vm_event_resume()

The introspection agent can reply to a vm_event faster than
vmx_vmexit_handler() can complete in some cases, where it is then
not safe for vm_event_set_registers() to modify v->arch.user_regs.
In the test scenario, we were stepping over an INT3 breakpoint by
setting RIP += 1. The quick reply tended to complete before the VCPU
triggering the introspection event had properly paused and been
descheduled. If the reply occurs before __context_switch() happens,
__context_switch() clobbers the reply by overwriting
v->arch.user_regs from the stack. If we don't pass through
__context_switch() (due to switching to the idle vCPU), reply data
wouldn't be picked up when switching back straight to the original
vCPU.

This patch ensures that vm_event_resume() code only sets per-VCPU
data to be used for the actual setting of registers later in
hvm_do_resume() (similar to the model used to control setting of CRs
and MSRs).

The patch additionally removes the sync_vcpu_execstate(v) call from
vm_event_resume(), which is no longer necessary, which removes the
associated broadcast TLB flush (read: performance improvement).

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/vm_event: add hvm/vm_event.{h,c}
Razvan Cojocaru [Mon, 8 May 2017 12:52:31 +0000 (14:52 +0200)]
x86/vm_event: add hvm/vm_event.{h,c}

Created arch/x86/hvm/vm_event.c and include/asm-x86/hvm/vm_event.h,
where HVM-specific vm_event-related code will live. This cleans up
hvm_do_resume() and ensures that the vm_event maintainers are
responsible for changes to that code.

Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/vpmu_intel: fix hypervisor crash by masking PC bit in MSR_P6_EVNTSEL
Mohit Gambhir [Mon, 8 May 2017 11:37:17 +0000 (13:37 +0200)]
x86/vpmu_intel: fix hypervisor crash by masking PC bit in MSR_P6_EVNTSEL

Setting Pin Control (PC) bit (19) in MSR_P6_EVNTSEL results in a General
Protection Fault and thus results in a hypervisor crash. This behavior has
been observed on two generations of Intel processors namely, Haswell and
Broadwell. Other Intel processor generations were not tested. However, it
does seem to be a possible erratum that hasn't yet been confirmed by Intel.

To fix the problem this patch masks PC bit and returns an error in
case any guest tries to write to it on any Intel processor. In addition
to the fact that setting this bit crashes the hypervisor on Haswell and
Broadwell, the PC flag bit toggles a hardware pin on the physical CPU
every time the programmed event occurs and the hardware behavior in
response to the toggle is undefined in the SDM, which makes this bit
unsafe to be used by guests and hence should be masked on all machines.

Signed-off-by: Mohit Gambhir <mohit.gambhir@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoVMX: constrain vmx_intr_assist() debugging code to debug builds
Jan Beulich [Mon, 8 May 2017 11:36:28 +0000 (13:36 +0200)]
VMX: constrain vmx_intr_assist() debugging code to debug builds

This is because that code, added by commit 997382b771 ("y86/vmx: dump
PIR and vIRR before ASSERT()"), was meant to be removed by the time we
finalize 4.9, but the root cause of the ASSERT() wrongly(?) triggering
still wasn't found.

Take the opportunity and also correct the format specifiers, which I
had got wrong when editing said change while committing.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/public: correct register naming 4.9.0-rc4
Jan Beulich [Fri, 5 May 2017 15:09:49 +0000 (17:09 +0200)]
x86/public: correct register naming

Commit 897129deab ("x86: use unambiguous register names") went a little
too far: With it we also get register names like _e15 and e15 for
non-Xen consumers using a gcc compatible compiler. Correct this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86: polish __{get,put}_user_{,no}check()
Jan Beulich [Fri, 5 May 2017 15:08:14 +0000 (17:08 +0200)]
x86: polish __{get,put}_user_{,no}check()

The primary purpose is correcting a latent bug in __get_user_check()
(the macro has no active user at present): The access_ok() check should
be before the actual access, or else any PV guest could initiate MMIO
reads with side effects.

Clean up all four macros at once:
- all arguments evaluated exactly once
- build the "check" flavor using the "nocheck" ones, instead of open
  coding them
- "int" is wide enough for error codes
- name local variables without using underscores as prefixes
- avoid pointless parentheses
- add blanks after commas separating parameters or arguments
- consistently use tabs for indentation

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien grall <julien.grall@arm.com>
8 years agox86/asm: Clobber %r{8..15} on exit to 32bit PV guests
Andrew Cooper [Thu, 13 Apr 2017 09:51:44 +0000 (10:51 +0100)]
x86/asm: Clobber %r{8..15} on exit to 32bit PV guests

In the presence of bugs such as XSA-214 where a 32bit PV guest can get its
hands on a long mode segment, this change prevents register content leaking
between domains.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/asm: Fold LOAD_C_CLOBBERED into RESTORE_ALL
Andrew Cooper [Wed, 12 Apr 2017 13:50:35 +0000 (13:50 +0000)]
x86/asm: Fold LOAD_C_CLOBBERED into RESTORE_ALL

With its sole other user removed, fold LOAD_C_CLOBBERED into RESTORE_ALL to
reduce the cognitive load of trying to work out which registers get modified.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/traps: Lift all non-entrypoint logic in entry_int82() up into C
Andrew Cooper [Wed, 12 Apr 2017 16:40:32 +0000 (17:40 +0100)]
x86/traps: Lift all non-entrypoint logic in entry_int82() up into C

This is more readable, maintainable, and livepatchable.

This involves declaring check_for_unexpected_msi(), untrusted_msi and
pv_hypercall() suitably for use by C.  While making these changes,
untrusted_msi is switched over to being a C99 bool.

No behavioural change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/traps: Rename compat_hypercall() to entry_int82()
Andrew Cooper [Wed, 12 Apr 2017 16:37:56 +0000 (17:37 +0100)]
x86/traps: Rename compat_hypercall() to entry_int82()

This follows the Linux example of naming the entry point by how it is arrived
at, rather than its purpose.

Doing so highlights that the SAVE_VOLATILE instantiation sets up the wrong
entry_vector on the stack (although this is currently benign as we never
sysret back to a 32bit PV, and the iret path doesn't care).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/mm: Further restrict permissions on some virtual mappings
Andrew Cooper [Tue, 11 Apr 2017 15:34:58 +0000 (16:34 +0100)]
x86/mm: Further restrict permissions on some virtual mappings

As originally reported, the Linear Pagetable slot maps 512GB of ram as RWX,
where the guest has full read access and a lot of direct or indirect control
over the written content.  It isn't hard for a PV guest to hide shellcode
here.

Therefore, increase defence in depth by auditing our current pagetable
mappings.

 * The regular linear, shadow linear, and per-domain slots have no business
   being executable (but need to be written), so are updated to be NX.
 * The Read Only mappings of the M2P (compat and regular) don't need to be
   writeable or executable.
 * The PV GDT mappings and bits of the directmap don't need to be executable.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/traps: Poison unused stack pointers in the TSS
Andrew Cooper [Tue, 11 Apr 2017 14:39:08 +0000 (15:39 +0100)]
x86/traps: Poison unused stack pointers in the TSS

This is for additional defence-in-depth following LDT/GDT/IDT corruption.

It causes attempted control transfers to ring 1 or 2 (via a call gate), or
attempts to use IST 3 through 7 to yield #SS, rather than executing with a
stack starting at the top of virtual address space.

Express the TSS setup in terms of structure assignment, which should be less
fragile if the IST indexes need to change, and has the useful side effect of
zeroing the reserved fields.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/traps: Drop 32bit fields out of tss_struct
Andrew Cooper [Tue, 11 Apr 2017 14:42:41 +0000 (15:42 +0100)]
x86/traps: Drop 32bit fields out of tss_struct

The backlink field doesn't exist in a 64bit TSS, and union for esp{0..2} is of
no practical use.  Specify everything with stdint types, and empty bitfields
for reserved values.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/arm32: Distinguish guest SError from Xen data aborts
Wei Chen [Thu, 4 May 2017 03:27:49 +0000 (11:27 +0800)]
xen/arm32: Distinguish guest SError from Xen data aborts

ARM32 doesn't have an exception similar to hyp_sync of ARM64 to catch
the synchronous data abort (For example, a NULL pointer has been referenced).
Hence the SError and sync data abort will be caught by the same data abort
exception.

Since commit "3f16c8cb" we treat all data aborts caught by this excetpion
as SError. This means, we will forward Xen synchronous data abort to guest,
if the serror_op=FORWARD. This is obviously incorrect. But we don't have
any method to distinguish SError from Xen data aborts.

But we can distinguish guest generated SError from Xen data aborts. So we
want to change the policy to handle data aborts for ARM32:
1. If this data abort is guest generated SError, we will handle this data
   abort follow the SError handle option setting.
2. If this data abort is synchronous data abort or Xen generate SError, we
   will PANIC the whole system.

Signed-off-by: Wei Chen <Wei.Chen@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: efi: Avoid out-of-bounds write in meminfo_add_bank
Julien Grall [Thu, 4 May 2017 19:36:41 +0000 (20:36 +0100)]
xen/arm: efi: Avoid out-of-bounds write in meminfo_add_bank

Commit 2c77db77 "xen/arm: efi: Avoid duplicating the addition of a new
bank", introduced a new function meminfo_add_bank that add a new bank.
This new code fails to check correctly the size of the array which may
result to an out-of-bounds write.

Coverity-ID: 1433183
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agolibs/devicemodel: Fix dependency with libxencall
Anthony PERARD [Thu, 4 May 2017 10:50:52 +0000 (11:50 +0100)]
libs/devicemodel: Fix dependency with libxencall

libxendevicemodel.so do depends on libxencall.so but the dependency was
missing at link time.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agohvm: fix hypervisor crash in hvm_save_one()
Jan Beulich [Thu, 4 May 2017 13:05:26 +0000 (15:05 +0200)]
hvm: fix hypervisor crash in hvm_save_one()

hvm_save_cpu_ctxt() returns success without writing any data into
hvm_domain_context_t when all VCPUs are offline. This can then crash
the hypervisor (with FATAL PAGE FAULT) in hvm_save_one() via the
"off < (ctxt.cur - sizeof(*desc))" for() test, where ctxt.cur remains 0,
causing an underflow which leads the hypervisor to go off the end of the
ctxt buffer.

This has been broken since Xen 4.4 (c/s e019c606f59).
It has happened in practice with an HVM Linux VM (Debian 8) queried around
shutdown:

(XEN) hvm.c:1595:d3v0 All CPUs offline -- powering off.
(XEN) ----[ Xen-4.9-rc  x86_64  debug=y   Not tainted ]----
(XEN) CPU:    5
(XEN) RIP:    e008:[<ffff82d0802496d2>] hvm_save_one+0x145/0x1fd
(XEN) RFLAGS: 0000000000010286   CONTEXT: hypervisor (d0v2)
(XEN) rax: ffff830492cbb445   rbx: 0000000000000000   rcx: ffff83039343b400
(XEN) rdx: 00000000ff88004d   rsi: fffffffffffffff8   rdi: 0000000000000000
(XEN) rbp: ffff8304103e7c88   rsp: ffff8304103e7c48   r8:  0000000000000001
(XEN) r9:  deadbeefdeadf00d   r10: 0000000000000000   r11: 0000000000000282
(XEN) r12: 00007f43a3b14004   r13: 00000000fffffffe   r14: 0000000000000000
(XEN) r15: ffff830400c41000   cr0: 0000000080050033   cr4: 00000000001526e0
(XEN) cr3: 0000000402e13000   cr2: ffff830492cbb447
(XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: e010   cs: e008
(XEN) Xen code around <ffff82d0802496d2> (hvm_save_one+0x145/0x1fd):
(XEN)  00 00 48 01 c8 83 c2 08 <66> 39 58 02 75 64 eb 08 48 89 c8 ba 08 00 00 00
(XEN) Xen stack trace from rsp=ffff8304103e7c48:
(XEN)    0000041000000000 ffff83039343b400 ffff8304103e7c70 ffff8304103e7da8
(XEN)    ffff830400c41000 00007f43a3b13004 ffff8304103b7000 ffffffffffffffea
(XEN)    ffff8304103e7d48 ffff82d0802683d4 ffff8300d19fd000 ffff82d0802320d8
(XEN)    ffff830400c41000 0000000000000000 ffff8304103e7cd8 ffff82d08026ff3d
(XEN)    0000000000000000 ffff8300d19fd000 ffff8304103e7cf8 ffff82d080232142
(XEN)    0000000000000000 ffff8300d19fd000 ffff8304103e7d28 ffff82d080207051
(XEN)    ffff8304103e7d18 ffff830400c41000 0000000000000202 ffff830400c41000
(XEN)    0000000000000000 00007f43a3b13004 0000000000000000 deadbeefdeadf00d
(XEN)    ffff8304103e7e68 ffff82d080206c47 0700000000000000 ffff830410375bd0
(XEN)    0000000000000296 ffff830410375c78 ffff830410375c80 0000000000000003
(XEN)    ffff8304103e7e68 ffff8304103b67c0 ffff8304103b7000 ffff8304103b67c0
(XEN)    0000000d00000037 0000000000000003 0000000000000002 00007f43a3b14004
(XEN)    00007ffd5d925590 0000000000000000 0000000100000000 0000000000000000
(XEN)    00000000ea8f8000 0000000000000000 00007ffd00000000 0000000000000000
(XEN)    00007f43a276f557 0000000000000000 00000000ea8f8000 0000000000000000
(XEN)    00007ffd5d9255e0 00007f43a23280b2 00007ffd5d926058 ffff8304103e7f18
(XEN)    ffff8300d19fe000 0000000000000024 ffff82d0802053e5 deadbeefdeadf00d
(XEN)    ffff8304103e7f08 ffff82d080351565 010000003fffffff 00007f43a3b13004
(XEN)    deadbeefdeadf00d deadbeefdeadf00d deadbeefdeadf00d deadbeefdeadf00d
(XEN)    ffff8800781425c0 ffff88007ce94300 ffff8304103e7ed8 ffff82d0802719ec
(XEN) Xen call trace:
(XEN)    [<ffff82d0802496d2>] hvm_save_one+0x145/0x1fd
(XEN)    [<ffff82d0802683d4>] arch_do_domctl+0xa7a/0x259f
(XEN)    [<ffff82d080206c47>] do_domctl+0x1862/0x1b7b
(XEN)    [<ffff82d080351565>] pv_hypercall+0x1ef/0x42c
(XEN)    [<ffff82d080355106>] entry.o#test_all_events+0/0x30
(XEN)
(XEN) Pagetable walk from ffff830492cbb447:
(XEN)  L4[0x106] = 00000000dbc36063 ffffffffffffffff
(XEN)  L3[0x012] = 0000000000000000 ffffffffffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 5:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: ffff830492cbb447
(XEN) ****************************************

At the same time pave the way for having zero-length records.

Inspired by an earlier patch from Andrew and Razvan.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Diagnosed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Release-Acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/mm: silence a pointless warning
Jan Beulich [Thu, 4 May 2017 13:04:29 +0000 (15:04 +0200)]
x86/mm: silence a pointless warning

get_page() logs a message when it fails (dom_cow is never dying or
paging_mode_external()), so better avoid the call when it's pointless
to do anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agovTPM: update email address and file path in MAINTAINERS file
Quan Xu [Thu, 27 Apr 2017 18:14:29 +0000 (02:14 +0800)]
vTPM: update email address and file path in MAINTAINERS file

Signed-off-by: Quan Xu <xuquan8@huawei.com>
8 years agox86: discard type information when stealing pages
Jan Beulich [Tue, 2 May 2017 12:46:58 +0000 (14:46 +0200)]
x86: discard type information when stealing pages

While a page having just a single general reference left necessarily
has a zero type reference count too, its type may still be valid (and
in validated state; at present this is only possible and relevant for
PGT_seg_desc_page, as page tables have their type forcibly zapped when
their type reference count drops to zero, and
PGT_{writable,shared}_page pages don't require any validation). In
such a case when the page is being re-used with the same type again,
validation is being skipped. As validation criteria differ between
32- and 64-bit guests, pages to be transferred between guests need to
have their validation indicator zapped (and with it we zap all other
type information at once).

This is XSA-214.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agomulticall: deal with early exit conditions
Jan Beulich [Tue, 2 May 2017 12:45:02 +0000 (14:45 +0200)]
multicall: deal with early exit conditions

In particular changes to guest privilege level require the multicall
sequence to be aborted, as hypercalls are permitted from kernel mode
only. While likely not very useful in a multicall, also properly handle
the return value in the HYPERVISOR_iret case (which should be the guest
specified value).

This is XSA-213.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86emul: correct stub invocation constraints again
Jan Beulich [Fri, 28 Apr 2017 14:03:40 +0000 (16:03 +0200)]
x86emul: correct stub invocation constraints again

While the hypervisor side of commit cd91ab08ea ("x86emul: correct stub
invocation constraints") was fine, the tools side triggered a bogus
error with old gcc (4.3 and 4.4 at least). Use a slightly less
appropriate variant instead, proven to be good enough to not
re-introduce the original problem: Which of the addresses is actually
used doesn't matter much as long as the compiler can't prove that the
two pointers don't alias one another.

Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Release-acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agoseabios: run olddefconfig 4.9.0-rc3
Wei Liu [Wed, 26 Apr 2017 11:13:34 +0000 (12:13 +0100)]
seabios: run olddefconfig

We provided a base config file in 970f8de3e. To generate a full config
file, running olddefconfig is required.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
8 years agox86emul: correct stub invocation constraints
Jan Beulich [Wed, 26 Apr 2017 07:49:24 +0000 (09:49 +0200)]
x86emul: correct stub invocation constraints

Stub invocations need to have the space the stub occupies as an input,
to prevent the compiler from re-ordering (or omitting) writes to it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/32on64: properly honor add-to-physmap-batch's size
Jan Beulich [Wed, 26 Apr 2017 07:48:45 +0000 (09:48 +0200)]
x86/32on64: properly honor add-to-physmap-batch's size

Commit 407a3c00ff ("compat/memory: fix build with old gcc") "fixed" a
build issue by switching to the use of uninitialized data. Due to
- the bounding of the uninitialized data item
- the accessed area being outside of Xen space
- arguments being properly verified by the native hypercall function
this is not a security issue.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agodocs: fix iommu_inclusive_mapping documentation
Roger Pau Monné [Wed, 26 Apr 2017 07:48:07 +0000 (09:48 +0200)]
docs: fix iommu_inclusive_mapping documentation

iommu_inclusive_mapping is enabled by default (and has been for a long time).

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Also "correct" -> "incorrect" in the text.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
8 years agodmop: add xendevicemodel_modified_memory_bulk()
Jennifer Herbert [Wed, 26 Apr 2017 07:47:30 +0000 (09:47 +0200)]
dmop: add xendevicemodel_modified_memory_bulk()

This new lib devicemodel call allows multiple extents of pages to be
marked as modified in a single call.  This is something needed for a
usecase I'm working on.

The xen side of the modified_memory call has been modified to accept
an array of extents.  The devicemodel library either provides an array
of length 1, to support the original library function, or a second
function allows an array to be provided.

Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agohvm/dmop: implement COPY_{TO,FROM}_GUEST_BUF_OFFSET() helpers
Andrew Cooper [Wed, 26 Apr 2017 07:46:57 +0000 (09:46 +0200)]
hvm/dmop: implement COPY_{TO,FROM}_GUEST_BUF_OFFSET() helpers

copy_{to,from}_guest_buf() are now implemented using an offset of 0.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
This does only extend to the functionality here, specifically not to
the use of all-upper-case names for the macros:

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agohvm/dmop: implement COPY_{TO,FROM}_GUEST_BUF() in terms of raw accessors
Andrew Cooper [Wed, 26 Apr 2017 07:40:22 +0000 (09:40 +0200)]
hvm/dmop: implement COPY_{TO,FROM}_GUEST_BUF() in terms of raw accessors

This also allows the usual cases to be simplified, by omitting an unnecessary
buf parameters, and because the macros can appropriately size the object.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
This does only extend to the functionality here, specifically not to
the use of all-upper-case names for the macros:

Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agohvm/dmop: make copy_buf_{from, to}_guest for a buffer not big enough an error
Jennifer Herbert [Wed, 26 Apr 2017 07:40:00 +0000 (09:40 +0200)]
hvm/dmop: make copy_buf_{from, to}_guest for a buffer not big enough an error

This makes copying to or from a buf that isn't big enough an error.
If the buffer isnt big enough, trying to carry on regardless
can only cause trouble later on.

Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agohvm/dmop: box dmop_args rather than passing multiple parameters around
Jennifer Herbert [Wed, 26 Apr 2017 07:39:14 +0000 (09:39 +0200)]
hvm/dmop: box dmop_args rather than passing multiple parameters around

No functional change.

Signed-off-by: Jennifer Herbert <Jennifer.Herbert@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years ago8250-uart: Fix typo in the header
Julien Grall [Tue, 25 Apr 2017 12:36:51 +0000 (13:36 +0100)]
8250-uart: Fix typo in the header

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
8 years agox86/mm: Add missing newline to a printk() in get_page_from_l1e()
Andrew Cooper [Mon, 24 Apr 2017 17:07:20 +0000 (18:07 +0100)]
x86/mm: Add missing newline to a printk() in get_page_from_l1e()

This avoids the log message being followed by

  <G><1>mm.c:5374:d0v0 could not get_page_from_l1e()

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen.h: fix comment for vcpu_guest_context
Wei Liu [Mon, 10 Apr 2017 10:28:01 +0000 (11:28 +0100)]
xen.h: fix comment for vcpu_guest_context

Use the correct vcpuop name and delete one trailing white space.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
8 years agoxen/arm: Don't unflatten DT when booting with ACPI
Punit Agrawal [Fri, 21 Apr 2017 13:12:54 +0000 (14:12 +0100)]
xen/arm: Don't unflatten DT when booting with ACPI

Unflattening the device tree when booting with "acpi=force" leads to the
following stack trace on AMD Seattle platform -

(XEN) Xen call trace:
(XEN)    [<0000000000204bfc>] dt_irq_translate+0x48/0x58 (PC)
(XEN)    [<0000000000204f5c>] dt_device_get_irq+0x34/0x38 (LR)
(XEN)    [<0000000000251a08>] platform_get_irq+0x14/0x44
(XEN)    [<00000000002952bc>] smmu.c#arm_smmu_dt_init+0x190/0x100c
(XEN)    [<0000000000299310>] device_init+0xa8/0xdc
(XEN)    [<00000000002950f0>] iommu_hardware_setup+0x34/0x68
(XEN)    [<0000000000294ef0>] iommu_setup+0x48/0x1c8
(XEN)    [<000000000029cecc>] start_xen+0xb94/0xd34
(XEN)    [<00000083fbba91dc>] 00000083fbba91dc

The problem arises due to the unflattened device tree being
unconditionally used in iommu_hardware_setup().

Let's re-arrange the code without changing boot order to unflatten the
device tree only when acpi is disabled.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agox86/hvm: Corrections and improvements to unhandled vmexit logging
Andrew Cooper [Wed, 19 Apr 2017 15:56:32 +0000 (16:56 +0100)]
x86/hvm: Corrections and improvements to unhandled vmexit logging

 * Use gprintk rather than gdprintk.  These logging messages shouldn't
   disappear in release builds, as they usually happen immediately before a
   domain crash.  Raise them from WARNING to ERR.
 * Format the vmexit reason in the same base as is used in the vendor
   manuals (decimal for Intel, hex for AMD), and consistently use 0x for hex
   numbers.
 * Consistently use "Unexpected vmexit" terminology.

In particular, this corrects the information printed for nested VT-x, and
actually prints information for nested SVM.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agocorrect rcu_unlock_domain()
Jan Beulich [Fri, 21 Apr 2017 10:10:51 +0000 (12:10 +0200)]
correct rcu_unlock_domain()

Match rcu_lock_domain(), and remove the slightly misleading comment:
This isn't just the companion to rcu_lock_domain_by_id() (and that
latter function indeed also keeps the domain locked, not the domain
list).

No functional change, as rcu_read_{,un}lock() ignore their arguments
anyway.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/vlapic: Don't reset APIC ID when handling INIT signal
Chao Gao [Fri, 21 Apr 2017 10:09:48 +0000 (12:09 +0200)]
x86/vlapic: Don't reset APIC ID when handling INIT signal

According to SDM "ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) ->
"EXTENDED XAPIC (X2APIC)" -> "x2APIC State Transitions", the APIC mode
and APIC ID are preserved when handling INIT signal and a reset places
APIC to xAPIC mode and APIC base address to 0xFEE00000h (this part
is in "Local APIC" -> "Local APIC Status and Location"). So there are
two problems in current code:
1. Using reset logic (aka vlapic_reset) to handle INIT signal.
2. Forgetting resetting APIC mode and base address in vlapic_reset()

This patch introduces a new function vlapic_do_init() and replaces the
wrongly used vlapic_reset(). Also reset APIC mode and APIC base address
in vlapic_reset().

Note that: LDR is read only in x2APIC mode. Resetting it to zero in x2APIC
mode is unreasonable. This patch also doesn't reset LDR when handling INIT
signal in x2APIC mode.

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/arm: Properly map the FDT in the boot page table
Julien Grall [Thu, 20 Apr 2017 15:12:28 +0000 (16:12 +0100)]
xen/arm: Properly map the FDT in the boot page table

Currently, Xen is assuming the FDT will always fit in a 2MB section.
Recently, I noticed an early crash on Xen when using GRUB with the
following call trace:

(XEN) Hypervisor Trap. HSR=0x96000006 EC=0x25 IL=1 Syndrome=0x6
(XEN) CPU0: Unexpected Trap: Hypervisor
(XEN) ----[ Xen-4.9-unstable  arm64  debug=y   Not tainted ]----
(XEN) CPU:    0
(XEN) PC:     0000000000264140 strlen+0x10/0x84
(XEN) LR:     00000000002401c0
(XEN) SP:     00000000002cfc20
(XEN) CPSR:   400003c9 MODE:64-bit EL2h (Hypervisor, handler)
(XEN)      X0: 0000000000801230  X1: 0000000000801230  X2: 0000000000005230
(XEN)      X3: 0000000000000030  X4: 0000000000000030  X5: 0000000000000038
(XEN)      X6: 0000000000000034  X7: 0000000000000000  X8: 7f7f7f7f7f7f7f7f
(XEN)      X9: 64622c6479687222 X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN)     X12: 0000000000000030 X13: ffffff00ff000000 X14: 0800000003000000
(XEN)     X15: ffffffffffffffff X16: 00000000fefff610 X17: 00000000000000f0
(XEN)     X18: 0000000000000004 X19: 0000000000000008 X20: 00000000007fc040
(XEN)     X21: 00000000007fc000 X22: 000000000000000e X23: 0000000000000000
(XEN)     X24: 00000000002a9f58 X25: 0000000000801230 X26: 00000000002a9f68
(XEN)     X27: 00000000002a9f58 X28: 0000000000298910  FP: 00000000002cfc20
(XEN)
(XEN)   VTCR_EL2: 80010c40
(XEN)  VTTBR_EL2: 0000082800203000
(XEN)
(XEN)  SCTLR_EL2: 30c5183d
(XEN)    HCR_EL2: 000000000038663f
(XEN)  TTBR0_EL2: 00000000f4912000
(XEN)
(XEN)    ESR_EL2: 96000006
(XEN)  HPFAR_EL2: 00000000e8071000
(XEN)    FAR_EL2: 0000000000801230
(XEN)
(XEN) Xen stack trace from sp=00000000002cfc20:
(XEN)    00000000002cfc70 0000000000240254 00000000002a9f58 00000000007fc000
(XEN)    0000000000000000 0000000000000000 0000000000000000 00000000007fc03c
(XEN)    00000000002cfd78 0000000000000000 00000000002cfca0 00000000002986fc
(XEN)    0000000000000000 00000000007fc000 0000000000000000 0000000000000000
(XEN)    00000000002cfcc0 0000000000298f1c 0000000000000000 00000000007fc000
(XEN)    00000000002cfdc0 000000000029904c 00000000f47fc000 00000000f4604000
(XEN)    00000000f47fc000 00000000007fc000 0000000000400000 0000000000000100
(XEN)    00000000f4604000 0000000000000001 0000000000000001 8000000000000002
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 00000000002cfdc0 0000000000299038
(XEN)    00000000f47fc000 00000000f4604000 00000000f47fc000 0000000000000000
(XEN)    00000000002cfe20 000000000029c420 00000000002d8000 00000000f4604000
(XEN)    00000000f47fc000 0000000000000000 0000000000400000 0000000000000100
(XEN)    00000000f4604000 0000000000000001 00000000f47fc000 000000000029c404
(XEN)    00000000fefff510 0000000000200624 00000000f4804000 00000000f4604000
(XEN)    00000000f47fc000 0000000000000000 0000000000400000 0000000000000100
(XEN)    0000000000000001 0000000000000001 0000000000000001 8000000000000002
(XEN)    00000000f47fc000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN)    [<0000000000264140>] strlen+0x10/0x84 (PC)
(XEN)    [<00000000002401c0>] fdt_get_property_namelen+0x9c/0xf0 (LR)
(XEN)    [<0000000000240254>] fdt_get_property+0x40/0x50
(XEN)    [<00000000002986fc>] bootfdt.c#device_tree_get_u32+0x18/0x5c
(XEN)    [<0000000000298f1c>] device_tree_for_each_node+0x84/0x144
(XEN)    [<000000000029904c>] boot_fdt_info+0x70/0x23c
(XEN)    [<000000000029c420>] start_xen+0x9c/0xd30
(XEN)    [<0000000000200624>] arm64/head.o#paging+0x84/0xbc
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) CPU0: Unexpected Trap: Hypervisor
(XEN)
(XEN) ****************************************

Indeed, the booting documentation for AArch32 and AArch64 only requires
the FDT to be placed on a 8-byte boundary. This means the Device-Tree can
cross a 2MB boundary.

Given that Xen limits the size of the FDT to 2MB, it will always fit in
a 4MB slot. So extend the fixmap slot for FDT from 2MB to 4MB.

The second 2MB superpage will only be mapped if the FDT is cross the 2MB
boundary.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: Check if the FDT passed by the bootloader is valid
Julien Grall [Thu, 20 Apr 2017 15:12:27 +0000 (16:12 +0100)]
xen/arm: Check if the FDT passed by the bootloader is valid

There is currently no sanity check on the FDT passed by the bootloader.
Whilst they are stricly not necessary, it will avoid us to spend hours
to try to find out why it does not work.

>From the booting documentation for AArch32 [1] and AArch64 [2] must :
    - be placed on 8-byte boundary
    - not exceed 2MB (only on AArch64)

Even if AArch32 does not seem to limit the size, Xen is not currently
able to support more the 2MB FDT. It is better to crash rather with a nice
error message than claiming we are supporting any size of FDT.

The checks are mostly borrowed from the Linux code (see fixmap_remap_fdt
in arch/arm64/mm/mmu.c).

[1] Section 2 in linux/Documentation/arm64/booting.txt
[2] Section 4b in linux/Documentation/arm/Booting

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: Move the code to map FDT in the boot tables from assembly to C
Julien Grall [Thu, 20 Apr 2017 15:12:26 +0000 (16:12 +0100)]
xen/arm: Move the code to map FDT in the boot tables from assembly to C

The FDT will not be accessed before start_xen (begining of C code) is
called and it will be easier to maintain as the code could be common
between AArch32 and AArch64.

A new function early_fdt_map is introduced to map the FDT in the boot
page table.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: mm: Move create_mappings function earlier in the file
Julien Grall [Thu, 20 Apr 2017 15:12:25 +0000 (16:12 +0100)]
xen/arm: mm: Move create_mappings function earlier in the file

This function will be called by other function later one. This will
avoid forward declaration and keep the new function close to sibling
ones.

This was moved just after *_fixmap helpers as they are page table
handling functions too.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/arm: Add BOOT_FDT_VIRT_END and BOOT_FDT_SLOT_SIZE
Julien Grall [Thu, 20 Apr 2017 15:12:24 +0000 (16:12 +0100)]
xen/arm: Add BOOT_FDT_VIRT_END and BOOT_FDT_SLOT_SIZE

The 2 new defines will help to avoid hardcoding the size and the end of
the slot in the code.

The newlines are added for clarity.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
8 years agoxen/kexec: remove spinlock now that all KEXEC hypercall ops are protected at the...
Eric DeVolder [Wed, 19 Apr 2017 21:01:49 +0000 (16:01 -0500)]
xen/kexec: remove spinlock now that all KEXEC hypercall ops are protected at the top-level

The spinlock in kexec_swap_images() was removed as
this function is only reachable on the kexec hypercall, which is
now protected at the top-level in do_kexec_op_internal(),
thus the local spinlock is no longer necessary.

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen/kexec: use hypercall_create_continuation to protect KEXEC ops
Eric DeVolder [Wed, 19 Apr 2017 21:01:48 +0000 (16:01 -0500)]
xen/kexec: use hypercall_create_continuation to protect KEXEC ops

When we concurrently try to unload and load crash
images we eventually get:

 Xen call trace:
    [<ffff82d08018b04f>] machine_kexec_add_page+0x3a0/0x3fa
    [<ffff82d08018b184>] machine_kexec_load+0xdb/0x107
    [<ffff82d080116e8d>] kexec.c#kexec_load_slot+0x11/0x42
    [<ffff82d08011724f>] kexec.c#kexec_load+0x119/0x150
    [<ffff82d080117c1e>] kexec.c#do_kexec_op_internal+0xab/0xcf
    [<ffff82d080117c60>] do_kexec_op+0xe/0x1e
    [<ffff82d08025c620>] pv_hypercall+0x20a/0x44a
    [<ffff82d080260116>] cpufreq.c#test_all_events+0/0x30

 Pagetable walk from ffff820040088320:
  L4[0x104] = 00000002979d1063 ffffffffffffffff
  L3[0x001] = 00000002979d0063 ffffffffffffffff
  L2[0x000] = 00000002979c7063 ffffffffffffffff
  L1[0x088] = 80037a91ede97063 ffffffffffffffff

The interesting thing is that the page bits (063) look legit.

The operation on which we blow up is us trying to write
in the L1 and finding that the L2 entry points to some
bizzare MFN. It stinks of a race, and it looks like
the issue is due to no concurrency locks when dealing
with the crash kernel space.

Specifically we concurrently call kimage_alloc_crash_control_page
which iterates over the kexec_crash_area.start -> kexec_crash_area.size
and once found:

  if ( page )
  {
      image->next_crash_page = hole_end;
      clear_domain_page(_mfn(page_to_mfn(page)));
  }

clears. Since the parameters of what MFN to use are provided
by the callers (and the area to search is bounded) the the 'page'
is probably the same. So #1 we concurrently clear the
'control_code_page'.

The next step is us passing this 'control_code_page' to
machine_kexec_add_page. This function requires the MFNs:
page_to_maddr(image->control_code_page).

And this would always return the same virtual address, as
the MFN of the control_code_page is inside of the
kexec_crash_area.start -> kexec_crash_area.size area.

Then machine_kexec_add_page updates the L1 .. which can be done
concurrently and on subsequent calls we mangle it up.

This is all a theory at this time, but testing reveals
that adding the hypercall_create_continuation() at the
kexec hypercall fixes the crash.

NOTE: This patch follows 5c5216 (kexec: clear kexec_image slot
when unloading kexec image) to prevent crashes during
simultaneous load/unloads.

NOTE: Consideration was given to using the existing flag
KEXEC_FLAG_IN_PROGRESS to denote a kexec hypercall in
progress. This, however, overloads the original intent of
the flag which is to denote that we are about-to/have made
the jump to the crash path. The overloading would lead to
failures in existing checks on this flag as the flag would
always be set at the top level in do_kexec_op_internal().
For this reason, the new flag KEXEC_FLAG_HC_IN_PROGRESS
was introduced.

While at it, fixed the #define mismatched spacing

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/microcode: Use the return value from early_microcode_update_cpu
Ross Lagerwall [Thu, 20 Apr 2017 13:18:00 +0000 (14:18 +0100)]
x86/microcode: Use the return value from early_microcode_update_cpu

Use the return value from early_microcode_update_cpu rather than
ignoring it.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-Acked-by: Julien Grall <julien.grall@arm.com>
8 years agohotplug/FreeBSD: configure xenstored
Wei Liu [Tue, 18 Apr 2017 14:48:59 +0000 (15:48 +0100)]
hotplug/FreeBSD: configure xenstored

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agooxenstored: provide options to define xenstored devices
Wei Liu [Tue, 18 Apr 2017 14:42:43 +0000 (15:42 +0100)]
oxenstored: provide options to define xenstored devices

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agopaths.m4: provide XENSTORED_{KVA,PORT}
Wei Liu [Tue, 18 Apr 2017 14:20:03 +0000 (15:20 +0100)]
paths.m4: provide XENSTORED_{KVA,PORT}

The default values are Linux device names. No users yet.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86: Move microcode loading earlier 4.9.0-rc2
Ross Lagerwall [Tue, 18 Apr 2017 15:47:24 +0000 (16:47 +0100)]
x86: Move microcode loading earlier

Move microcode loading earlier for the boot CPU and secondary CPUs so
that it takes place before identify_cpu() is called for each CPU.
Without this, the detected features may be wrong if the new microcode
loading adjusts the feature bits. That could mean that some fixes (e.g.
d6e9f8d4f35d ("x86/vmx: fix vmentry failure with TSX bits in LBR"))
don't work as expected.

Previously during boot, the microcode loader was invoked for each
secondary CPU started and then again for each CPU as part of an
initcall. Simplify the code so that it is invoked exactly once for each
CPU during boot.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86emul: force CLZERO feature flag in test harness
Jan Beulich [Wed, 19 Apr 2017 11:30:27 +0000 (13:30 +0200)]
x86emul: force CLZERO feature flag in test harness

Commit b988e88cc0 ("x86/emul: Add feature check for clzero") added a
feature check to the emulator, which breaks the harness without this
flag being forced to true.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/vioapic: allow holes in the GSI range for PVH Dom0
Roger Pau Monné [Wed, 19 Apr 2017 11:29:51 +0000 (13:29 +0200)]
x86/vioapic: allow holes in the GSI range for PVH Dom0

The current vIO APIC for PVH Dom0 doesn't allow non-contiguous GSIs, which
means that all GSIs must belong to an IO APIC. This doesn't match reality,
where there are systems with non-contiguous GSIs.

In order to fix this add a base_gsi field to each hvm_vioapic struct, in order
to store the base GSI for each emulated IO APIC. For PVH Dom0 those values are
populated based on the hardware ones.

Reported-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Chao Gao <chao.gao@intel.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/HVM: don't uniformly report "MMIO" for various forms of failed emulation
Jan Beulich [Wed, 19 Apr 2017 11:29:14 +0000 (13:29 +0200)]
x86/HVM: don't uniformly report "MMIO" for various forms of failed emulation

This helps distinguishing the call paths leading there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoVMX: don't blindly enable descriptor table exiting control
Jan Beulich [Wed, 19 Apr 2017 11:26:55 +0000 (13:26 +0200)]
VMX: don't blindly enable descriptor table exiting control

This is an optional feature and hence we should check for it before
use.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/HVM: restrict emulation in hvm_descriptor_access_intercept()
Jan Beulich [Wed, 19 Apr 2017 11:26:18 +0000 (13:26 +0200)]
x86/HVM: restrict emulation in hvm_descriptor_access_intercept()

While I did review d0a699a389 ("x86/monitor: add support for descriptor
access events") it didn't really occur to me that someone could be this
blunt and add unguarded emulation again just a few weeks after we
guarded all special purpose emulator invocations. Fix this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86emul: always fill x86_insn_modrm()'s outputs
Jan Beulich [Wed, 19 Apr 2017 11:25:44 +0000 (13:25 +0200)]
x86emul: always fill x86_insn_modrm()'s outputs

The function is rather unlikely to be called for insns which don't have
ModRM bytes, and hence addressing Coverity's recurring complaint of
callers potentially consuming uninitialized data when they know that
certain opcodes have ModRM bytes can be suppressed this way without
unduly adding overhead to fast paths.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86emul: add "unblock NMI" retire flag
Jan Beulich [Wed, 19 Apr 2017 11:24:18 +0000 (13:24 +0200)]
x86emul: add "unblock NMI" retire flag

No matter that we emulate IRET for (guest) real mode only right now, we
should get its effect on (virtual) NMI delivery right. Note that we can
simply check the other retire flags also in the !OKAY case, as the
insn emulator now guarantees them to only be set on OKAY.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agodocs: fix configuration syntax in xl.cfg manpage
Wei Liu [Tue, 11 Apr 2017 11:03:00 +0000 (12:03 +0100)]
docs: fix configuration syntax in xl.cfg manpage

No quote is required when a string is provided as part of a spec string.

Reported-by: Doug Freed <dwfreed@mtu.edu>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
8 years agotools: Use POSIX signal.h instead of sys/signal.h
Alistair Francis [Mon, 17 Apr 2017 21:33:11 +0000 (14:33 -0700)]
tools: Use POSIX signal.h instead of sys/signal.h

The POSIX spec specifies to use:
    #include <signal.h>
instead of:
    #include <sys/signal.h>
as seen here:
   http://pubs.opengroup.org/onlinepubs/009695399/functions/signal.html

This removes the warning:
    #warning redirecting incorrect #include <sys/signal.h> to <signal.h>
when building with the musl C-library.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agotools: Use POSIX poll.h instead of sys/poll.h
Alistair Francis [Mon, 17 Apr 2017 21:33:10 +0000 (14:33 -0700)]
tools: Use POSIX poll.h instead of sys/poll.h

The POSIX spec specifies to use:
    #include <poll.h>
instead of:
    #include <sys/poll.h>
as seen here:
    http://pubs.opengroup.org/onlinepubs/009695399/functions/poll.html

This removes the warning:
    #warning redirecting incorrect #include <sys/poll.h> to <poll.h>
when building with the musl C-library.

Signed-off-by: Alistair Francis <alistair.francis@xilinx.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/emul: Drop more redundant ctxt.event_pending checks
Andrew Cooper [Mon, 10 Apr 2017 12:11:06 +0000 (13:11 +0100)]
x86/emul: Drop more redundant ctxt.event_pending checks

Since c/s 92cf67888a, x86_emulate_wrapper() asserts stricter behaviour about
the relationship between X86EMUL_EXCEPTION and ctxt.event_pending.

These removals should have been included in the aforementioned changeset, and
were only omitted due an oversight.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agox86/vIO-APIC: fix uninitialized variable warning
Jan Beulich [Thu, 13 Apr 2017 15:35:02 +0000 (17:35 +0200)]
x86/vIO-APIC: fix uninitialized variable warning

In a release build modern gcc validly complains about "pin" possibly
being uninitialized in vioapic_irq_positive_edge().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoVT-d: correct a comment and remove an useless if() statement
Chao Gao [Thu, 13 Apr 2017 15:34:29 +0000 (17:34 +0200)]
VT-d: correct a comment and remove an useless if() statement

Fix two flaws in the patch (93358e8e VT-d: introduce update_irte to update
irte safely):
1. Expand a comment in update_irte() to make it clear that VT-d hardware
doesn't update IRTE and software can't update IRTE behind us since we hold
iremap_lock.
2. remove an useless if() statement

Signed-off-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoclang: disable the gcc-compat warnings for read_atomic
Roger Pau Monné [Thu, 13 Apr 2017 15:33:21 +0000 (17:33 +0200)]
clang: disable the gcc-compat warnings for read_atomic

clang gcc-compat warnings can wrongly fire when certain constructions are used,
at least the following flow:

switch ( ... )
{
case ...:
    while ( ({ int x; switch ( foo ) { case 1: x = 1; break; } x }) )
    {
        ...

Will cause clang to emit the following warning "'break' is bound to loop, GCC
binds it to switch", which is a false positive, and both gcc and clang bind
the break to the inner switch. In order to workaround this issue, disable the
gcc-compat checks for the usage of the read_atomic macro.

This has been reported upstream as http://bugs.llvm.org/show_bug.cgi?id=32595.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agotools:misc:xenpm: set max freq to all cpu with default cpuid
Luwei Kang [Thu, 13 Apr 2017 10:44:28 +0000 (18:44 +0800)]
tools:misc:xenpm: set max freq to all cpu with default cpuid

User can set max freq to specific cpu by
"xenpm set-scaling-maxfreq [cpuid] <HZ>"
or set max freq to all cpu with default cpuid by
"xenpm set-scaling-maxfreq <HZ>".

Set max freq with default cpuid will cause
segmentation fault after commit id d4906b5d05.
This patch will fix this issue and add ability
to set max freq with default cpuid.

Signed-off-by: Luwei Kang <luwei.kang@intel.com>
Compile-tested-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen: credit: change an ASSERT on nr_runnable so that it makes sense.
Dario Faggioli [Thu, 13 Apr 2017 07:49:54 +0000 (09:49 +0200)]
xen: credit: change an ASSERT on nr_runnable so that it makes sense.

Since the counter is unsigned, it's pointless/bogous to check
for if to be above zero.

Check that it is at least one before it's decremented, instead.

Spotted by Coverity.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agoxen: credit2: cleanup patch for type betterness
Praveen Kumar [Tue, 11 Apr 2017 18:08:42 +0000 (23:38 +0530)]
xen: credit2: cleanup patch for type betterness

The patch actually doesn't impact the functionality as such. This only replaces
bool_t with bool in credit2.

Signed-off-by: Praveen Kumar <kpraveen.lkml@gmail.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
8 years agomisc/release-checklist.txt: Try to avoid wrong-tag mistakes
Ian Jackson [Wed, 12 Apr 2017 15:36:30 +0000 (16:36 +0100)]
misc/release-checklist.txt: Try to avoid wrong-tag mistakes

Add some better checking and make the runes a bit more robust.

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>