xenbits.xensource.com Git - people/iwj/xen.git/log
7 years agoxen/pvshim: set correct domid value
Roger Pau Monne [Thu, 11 Jan 2018 11:41:19 +0000 (11:41 +0000)]
xen/pvshim: set correct domid value

If domid is not provided by L0 set domid to 1 by default. Note that L0
not providing the domid can cause trouble if the guest tries to use
its domid instead of DOMID_SELF when performing hypercalls that are
forwarded to the L0 hypervisor.

Since the domain created is no longer the hardware domain add a hook
to the domain shutdown path in order to forward shutdown operations to
the L0 hypervisor.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
Changes since v1:
 - s/get_dom0_domid/get_initial_domain_id/.
 - Add a comment regarding why dom0 needs to be global.
 - Fix compilation of xen/common/domain.c on ARM.

7 years agoxen/pvshim: modify Dom0 builder in order to build a DomU
Roger Pau Monne [Thu, 11 Jan 2018 11:41:18 +0000 (11:41 +0000)]
xen/pvshim: modify Dom0 builder in order to build a DomU

According to the PV ABI the initial virtual memory regions should
contain the xenstore and console pages after the start_info. Also set
the correct values in the start_info for DomU operation.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Modify the position of the __init attribute in dom0_update_physmap.
 - Move the addition of sizeof(struct dom0_vga_console_info) to
   vstartinfo_end with an existing if branch.
 - Add a TODO item for fill_console_start_info in the !CONFIG_VIDEO
   case.
 - s/replace_va/replace_va_mapping/.
 - Remove call to free_domheap_pages in replace_va_mapping.
   put_page_and_type should already take care of freeing the page.
 - Use PFN_DOWN in SET_AND_MAP_PARAM macro.
 - Parenthesize va in SET_AND_MAP_PARAM macro when required.
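
The last two changelog items are about macro hygiene. The following is a minimal sketch of why the argument needs parenthesizing, assuming 4 KiB pages. `VA_TO_PFN_UNSAFE` and `VA_TO_PFN_SAFE` are hypothetical stand-ins, not the actual SET_AND_MAP_PARAM macro.

```c
#include <stdint.h>

#define PAGE_SHIFT 12                      /* assumption: 4 KiB pages */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)    /* address -> page frame number */

/* Hypothetical macros, for illustration only. */
#define VA_TO_PFN_UNSAFE(va) (va >> PAGE_SHIFT)   /* argument not parenthesized */
#define VA_TO_PFN_SAFE(va)   PFN_DOWN(va)         /* argument parenthesized */

static inline uint64_t pfn_unsafe(uint64_t base, uint64_t off)
{
    /* Expands to base | (off >> PAGE_SHIFT): '>>' binds tighter than '|'. */
    return VA_TO_PFN_UNSAFE(base | off);
}

static inline uint64_t pfn_safe(uint64_t base, uint64_t off)
{
    /* Expands to ((base | off) >> PAGE_SHIFT), which is what was meant. */
    return VA_TO_PFN_SAFE(base | off);
}
```

With `base = 0x10000` and `off = 0x2000` the two helpers disagree, which is the class of bug the parenthesization avoids.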

7 years agoxen: mark xenstore/console pages as RAM
Roger Pau Monne [Thu, 11 Jan 2018 11:41:18 +0000 (11:41 +0000)]
xen: mark xenstore/console pages as RAM

This is required so that later they can be shared with the guest if
Xen is running in shim mode.

Also prevent them from being used by Xen by marking them as bad pages
in init_boot_pages.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Remove adding the pages to dom_io, there's no need since they are
   already marked as bad pages.
 - Use a static global array to store the memory address of these
   special pages, so Xen avoids having to call
   xen_hypercall_hvm_get_param twice.

7 years agoxen/pvshim: skip Dom0-only domain builder parts
Roger Pau Monne [Thu, 11 Jan 2018 11:41:18 +0000 (11:41 +0000)]
xen/pvshim: skip Dom0-only domain builder parts

Do not allow access to any iomem or ioport by the shim, and also
remove the check for Dom0 kernel support.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/pvh: do not mark the low 1MB as IO mem
Roger Pau Monne [Thu, 11 Jan 2018 11:41:18 +0000 (11:41 +0000)]
xen/pvh: do not mark the low 1MB as IO mem

On PVH there's nothing special on the low 1MB.

This is an optional patch that doesn't affect the functionality of the
shim.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/x86: make VGA support selectable
Roger Pau Monne [Tue, 28 Nov 2017 09:54:17 +0000 (09:54 +0000)]
xen/x86: make VGA support selectable

Through a Kconfig option. Enable it by default, and disable it for the
PV-in-PVH shim.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
Changes since v1:
 - Make the VGA option dependent on the shim one.

7 years agotools/firmware: Build and install xen-shim
Andrew Cooper [Wed, 22 Nov 2017 13:31:26 +0000 (13:31 +0000)]
tools/firmware: Build and install xen-shim

Link a minimum set of files to build the shim. The linkfarm rune can
handle creation and deletion of files. Introduce build-shim and
install-shim targets in xen/Makefile.

We can do better by properly generating the dependency from the list of
files but that's an improvement for later.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
[change default scheduler to credit]
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
v2: Introduce a top-level build-shim target. Split the xen-shim build
    with normal build.

7 years agox86/shim: Kconfig and command line options
Andrew Cooper [Fri, 10 Nov 2017 16:35:26 +0000 (16:35 +0000)]
x86/shim: Kconfig and command line options

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/guest: use PV console for Xen/Dom0 I/O
Sergey Dyasli [Fri, 24 Nov 2017 11:21:17 +0000 (11:21 +0000)]
x86/guest: use PV console for Xen/Dom0 I/O

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/guest: add PV console code
Sergey Dyasli [Fri, 24 Nov 2017 11:07:32 +0000 (11:07 +0000)]
x86/guest: add PV console code

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/guest: setup event channel upcall vector
Roger Pau Monne [Tue, 9 Jan 2018 12:51:37 +0000 (12:51 +0000)]
x86/guest: setup event channel upcall vector

And a dummy event channel upcall handler.

Note that with the current code the underlying Xen (L0) must support
HVMOP_set_evtchn_upcall_vector or else event channel setup is going to
fail. This limitation can be lifted by implementing more event channel
interrupt injection methods as a backup.

Register callback_irq to trick the toolstack into thinking the domain is
enlightened.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: don't swallow the first command line item in guest mode
Wei Liu [Thu, 11 Jan 2018 13:45:48 +0000 (13:45 +0000)]
x86: don't swallow the first command line item in guest mode

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86: read wallclock from Xen when running in pvh mode
Wei Liu [Fri, 17 Nov 2017 15:19:09 +0000 (15:19 +0000)]
x86: read wallclock from Xen when running in pvh mode

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: APIC timer calibration when running as a guest
Wei Liu [Fri, 17 Nov 2017 12:46:41 +0000 (12:46 +0000)]
x86: APIC timer calibration when running as a guest

The timer calibration currently depends on PIT. Introduce a variant
to wait for a tick's worth of time to elapse when running as a PVH
guest.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86: xen pv clock time source
Wei Liu [Thu, 16 Nov 2017 17:56:18 +0000 (17:56 +0000)]
x86: xen pv clock time source

It is a variant of TSC clock source.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Use the mapped vcpu_info.

7 years agox86/guest: map per-cpu vcpu_info area.
Roger Pau Monne [Thu, 28 Dec 2017 15:22:34 +0000 (15:22 +0000)]
x86/guest: map per-cpu vcpu_info area.

Mapping the per-vcpu vcpu_info area is required in order to use more
than XEN_LEGACY_MAX_VCPUS.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Make vcpu_info_mapped static.
 - Add a BUG_ON in case VCPUOP_register_vcpu_info fails.
 - Remove one indentation level in hypervisor_setup.
 - Make xen_hypercall_vcpu_op return int.

7 years agoxen/guest: fetch vCPU ID from Xen
Roger Pau Monne [Wed, 27 Dec 2017 09:23:01 +0000 (09:23 +0000)]
xen/guest: fetch vCPU ID from Xen

If available.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
[ wei: fix non-shim build ]
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/guest: map shared_info page
Roger Pau Monne [Tue, 9 Jan 2018 11:19:44 +0000 (11:19 +0000)]
x86/guest: map shared_info page

Use an unpopulated PFN in order to map it.

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
Changes since v1:
 - Use an unpopulated PFN to map the shared_info page.
 - Mask all event channels.
 - Report XENMEM_add_to_physmap error code in case of failure.

7 years agoxen/pvshim: keep track of used PFN ranges
Wei Liu [Wed, 3 Jan 2018 16:50:24 +0000 (16:50 +0000)]
xen/pvshim: keep track of used PFN ranges

Simple infrastructure to keep track of PFN space usage, so that we can
use unpopulated PFNs to map special pages like shared info and grant
table.

As rangeset depends on malloc being ready, hypervisor_setup is
introduced for things that can be initialised late in the process.

Note that the PFN is marked as reserved at least up to 4GiB (or more
if the guest has more memory). This is not a perfect solution but
avoids using the MMIO hole below 4GiB. Ideally the shim (L1) should
have a way to ask the underlying Xen (L0) which memory regions are
populated, unpopulated, or MMIO space.
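
The reservation rule in the note above can be sketched as follows. This is only an illustration, assuming 4 KiB pages; `toy_reserved_pfn_limit` is a hypothetical helper, not a function from the patch.

```c
#define PAGE_SHIFT 12   /* assumption: 4 KiB pages */

/*
 * Return the first PFN that is NOT treated as reserved: everything below
 * 4GiB is reserved, and more if the guest's memory extends beyond that.
 */
static unsigned long long toy_reserved_pfn_limit(unsigned long long guest_max_pfn)
{
    const unsigned long long pfn_4g = (1ULL << 32) >> PAGE_SHIFT;

    return guest_max_pfn > pfn_4g ? guest_max_pfn : pfn_4g;
}
```

Special pages such as shared info would then be placed at or above the returned PFN.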

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen: introduce rangeset_claim_range
Wei Liu [Wed, 3 Jan 2018 16:38:54 +0000 (16:38 +0000)]
xen: introduce rangeset_claim_range

Reserve a hole in a rangeset.
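
As a rough illustration of what reserving a hole means, here is a toy first-fit version over a sorted array. It is a sketch under stated assumptions, not Xen's rangeset code: `toy_rangeset`, `toy_range` and `toy_claim_range` are hypothetical names, the search order is an assumption, and adjacent ranges are not merged.

```c
#include <limits.h>
#include <stddef.h>

#define TOY_MAX_RANGES 16

struct toy_range { unsigned long s, e; };      /* inclusive [s, e] */

struct toy_rangeset {
    struct toy_range r[TOY_MAX_RANGES];        /* sorted, non-overlapping */
    size_t nr;
};

/*
 * Claim the lowest hole of `size` units. On success store its start in *s,
 * record it as a used range and return 0; return -1 if nothing fits.
 */
static int toy_claim_range(struct toy_rangeset *rs, unsigned long size,
                           unsigned long *s)
{
    unsigned long start = 0;
    size_t i, j;

    if (size == 0 || rs->nr == TOY_MAX_RANGES)
        return -1;

    for (i = 0; i < rs->nr; i++) {
        if (rs->r[i].s - start >= size)
            break;                       /* the gap before range i is big enough */
        if (rs->r[i].e == ULONG_MAX)
            return -1;                   /* no space left after this range */
        start = rs->r[i].e + 1;
    }

    /* Past the last range: make sure start + size - 1 does not wrap. */
    if (i == rs->nr && ULONG_MAX - start < size - 1)
        return -1;

    /* Insert the claimed range at index i, keeping the array sorted. */
    for (j = rs->nr; j > i; j--)
        rs->r[j] = rs->r[j - 1];
    rs->r[i].s = start;
    rs->r[i].e = start + size - 1;
    rs->nr++;

    *s = start;
    return 0;
}
```

Starting from the ranges [0, 9] and [20, 29], claiming 5 units yields 10, and a later claim of 8 units yields 30 because the remaining gap [15, 19] is too small.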

Signed-off-by: Roger Pau Monne <roger.pau@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
Changes since v1:
 - Change function name.
 - Use a local variable instead of *s.
 - Add unlikely to the !prev case.
 - Move the function prototype position in the header file.

7 years agoxen/console: Introduce console=xen
Wei Liu [Thu, 11 Jan 2018 10:18:09 +0000 (10:18 +0000)]
xen/console: Introduce console=xen

This specifies whether to use Xen specific console output. There are
two variants: one is the hypervisor console, the other is the magic
debug port 0xe9.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/pvh: Retrieve memory map from Xen
Wei Liu [Tue, 14 Nov 2017 18:19:09 +0000 (18:19 +0000)]
x86/pvh: Retrieve memory map from Xen

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
v2: fixed clang build, dropped rb tag

7 years agox86/shutdown: Support for using SCHEDOP_{shutdown,reboot}
Andrew Cooper [Tue, 21 Nov 2017 14:43:32 +0000 (14:43 +0000)]
x86/shutdown: Support for using SCHEDOP_{shutdown,reboot}

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v2:
1. Use sched_shutdown
2. Move header inclusion

7 years agox86/guest: Hypercall support
Andrew Cooper [Tue, 21 Nov 2017 13:54:47 +0000 (13:54 +0000)]
x86/guest: Hypercall support

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
v2: append underscores to tmp.

7 years agox86/entry: Probe for Xen early during boot
Andrew Cooper [Tue, 28 Nov 2017 14:53:51 +0000 (14:53 +0000)]
x86/entry: Probe for Xen early during boot

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2: Add __read_mostly.

7 years agox86/boot: Map more than the first 16MB
Andrew Cooper [Wed, 22 Nov 2017 11:39:04 +0000 (11:39 +0000)]
x86/boot: Map more than the first 16MB

TODO: Replace somehow (bootstrap_map() ?)

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/entry: Early PVH boot code
Wei Liu [Mon, 13 Nov 2017 17:32:19 +0000 (17:32 +0000)]
x86/entry: Early PVH boot code

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2:
1. Fix comment.
2. Use cmpb $0.
3. Address comments on pvh-boot.c.
4. Haven't changed the printk modifiers to accommodate future changes.
5. Missing a prerequisite patch to relocate pvh_info to make __va work reliably.
   [BLOCKER].

7 years agox86: produce a binary that can be booted as PVH
Wei Liu [Fri, 10 Nov 2017 16:19:40 +0000 (16:19 +0000)]
x86: produce a binary that can be booted as PVH

Produce a binary that can be booted as PVH. It doesn't do much yet.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
v2:
1. Remove shim-y dependency.
2. Remove extraneous blank line.
3. Fix bugs in xen.lds.S.
4. Haven't split code into pvh.S because that will break later
   patches.

7 years agox86: introduce ELFNOTE macro
Wei Liu [Fri, 10 Nov 2017 12:36:49 +0000 (12:36 +0000)]
x86: introduce ELFNOTE macro

It is needed later for introducing PVH entry point.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
v2:
1. Specify section attribute and type.
2. Use p2align.
3. Align instructions.
4. Haven't used .L or turned it into assembly macro.

7 years agox86/link: Relocate program headers
Andrew Cooper [Wed, 22 Nov 2017 11:09:41 +0000 (11:09 +0000)]
x86/link: Relocate program headers

When the xen binary is loaded by libelf (in the future) we rely on the
elf loader to load the binary accordingly. Specify the load address so
that the resulting binary can make p_vaddr and p_paddr have different
values.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
v2:
Clarify commit message. Haven't tested grub1 boot.

7 years agox86/Kconfig: Options for Xen and PVH support
Andrew Cooper [Fri, 10 Nov 2017 16:35:26 +0000 (16:35 +0000)]
x86/Kconfig: Options for Xen and PVH support

Introduce two options. One to detect whether the binary is running on
Xen, the other enables PVH ABI support.

The former will be useful to PV in HVM approach. Both will be used by
PV in PVH approach.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
v2:
Write commit message. Didn't change the config option value as it
requires a lot of changes in later patches.

7 years agox86: Common cpuid faulting support
Andrew Cooper [Thu, 11 Jan 2018 17:48:00 +0000 (17:48 +0000)]
x86: Common cpuid faulting support

With CPUID Faulting offered to SVM guests, move Xen's faulting code to being
common rather than Intel specific.

This is necessary for nested Xen (inc. pv-shim mode) to prevent PV guests from
finding the outer HVM Xen leaves via native cpuid.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/fixmap: Modify fix_to_virt() to return a void pointer
Andrew Cooper [Thu, 11 Jan 2018 17:48:00 +0000 (17:48 +0000)]
x86/fixmap: Modify fix_to_virt() to return a void pointer

Almost all users of fix_to_virt() actually want a pointer.  Include the cast
within the definition, so the callers don't need to.

Two users which need the integer value are switched to using __fix_to_virt()
directly.  A few users stay fully unchanged, due to GCC's void pointer
arithmetic extension causing the same behaviour.  Most users however have
their explicit casting dropped.
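
The shape of the change can be sketched as below. The macro names follow the commit message, but the bodies are simplified assumptions, and `FIXADDR_TOP` here is a made-up value chosen only so the example is self-contained.

```c
#include <stdint.h>

#define PAGE_SHIFT  12                        /* assumption: 4 KiB pages */
#define FIXADDR_TOP ((uintptr_t)0xfffff000)   /* hypothetical value */

/* Integer form, for the few callers that need the address as a number. */
#define __fix_to_virt(x) (FIXADDR_TOP - ((uintptr_t)(x) << PAGE_SHIFT))

/* Pointer form: the cast lives in the definition, not at every call site. */
#define fix_to_virt(x)   ((void *)__fix_to_virt(x))

/* A caller that wants a pointer no longer needs its own cast. */
static inline void *toy_slot_ptr(unsigned int slot)
{
    return fix_to_virt(slot);
}

/* A caller that wants the integer value uses the double-underscore form. */
static inline uintptr_t toy_slot_addr(unsigned int slot)
{
    return __fix_to_virt(slot);
}
```

Both helpers refer to the same address; only the type seen by the caller differs.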

Since __iomem is not used consistently in Xen, we drop it too.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
v2: update commit message and remove unnecessary parentheses.

7 years agotools/ocaml: Extend domain_create() to take arch_domainconfig
Jon Ludlam [Thu, 11 Jan 2018 17:47:59 +0000 (17:47 +0000)]
tools/ocaml: Extend domain_create() to take arch_domainconfig

No longer passing NULL into xc_domain_create() allows for the creation
of PVH guests.

Signed-off-by: Jon Ludlam <jonathan.ludlam@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agotools/ocaml: Expose arch_config in domaininfo
Andrew Cooper [Thu, 11 Jan 2018 17:47:59 +0000 (17:47 +0000)]
tools/ocaml: Expose arch_config in domaininfo

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/domctl: Return arch_config via getdomaininfo
Andrew Cooper [Thu, 11 Jan 2018 17:47:59 +0000 (17:47 +0000)]
xen/domctl: Return arch_config via getdomaininfo

This allows toolstack software to distinguish HVM from PVH guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v2: bump domctl version number

7 years agoACPICA: Make ACPI Power Management Timer (PM Timer) optional.
Bob Moore [Thu, 11 Jan 2018 17:47:59 +0000 (17:47 +0000)]
ACPICA: Make ACPI Power Management Timer (PM Timer) optional.

PM Timer is now optional.
This support is already in Windows8 and "SHOULD" come out in ACPI 5.0A
(if all goes well).

The change doesn't affect Xen directly, because it does not rely
on the presence of the PM timer.

Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ported to Xen]
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86/link: Introduce and use SECTION_ALIGN
Andrew Cooper [Thu, 11 Jan 2018 17:47:59 +0000 (17:47 +0000)]
x86/link: Introduce and use SECTION_ALIGN

... to reduce the quantity of #ifdef EFI.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
7 years agox86/time: Print a more helpful error when a platform timer can't be found
Andrew Cooper [Thu, 11 Jan 2018 17:47:59 +0000 (17:47 +0000)]
x86/time: Print a more helpful error when a platform timer can't be found

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agoxen/common: Widen the guest logging buffer slightly
Andrew Cooper [Thu, 11 Jan 2018 17:47:58 +0000 (17:47 +0000)]
xen/common: Widen the guest logging buffer slightly

This reduces the amount of line wrapping from guests; Xen in particular likes
to print lines longer than 80 characters.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/libxc: Multi modules support
Jonathan Ludlam [Thu, 11 Jan 2018 17:47:58 +0000 (17:47 +0000)]
tools/libxc: Multi modules support

Signed-off-by: Jonathan Ludlam <jonathan.ludlam@citrix.com>
Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
7 years agotools/libelf: fix elf notes check for PVH guest
Wei Liu [Thu, 11 Jan 2018 17:47:58 +0000 (17:47 +0000)]
tools/libelf: fix elf notes check for PVH guest

PVH only requires PHYS32_ENTRY to be set. Return immediately if that's
the case.

Also remove the printk in pvh_load_kernel.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agotools/libxc: remove extraneous newline in xc_dom_load_acpi
Wei Liu [Thu, 11 Jan 2018 17:47:58 +0000 (17:47 +0000)]
tools/libxc: remove extraneous newline in xc_dom_load_acpi

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/x86: report domain id on cpuid
Roger Pau Monne [Thu, 11 Jan 2018 17:47:58 +0000 (17:47 +0000)]
xen/x86: report domain id on cpuid

Use the ECX register of the hypervisor leaf 5. The EAX register on
this leaf is a flags field that can be used to notice the presence of
the domain id in ECX. Note that this is only available to HVM guests.
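
A guest-side consumer of this leaf might look roughly like the sketch below. The flag bit position is an assumption (the message does not name it), `TOY_CPUID_DOMID_PRESENT` and `toy_domid_from_leaf` are hypothetical names, and the fallback of 1 is borrowed from "xen/pvshim: set correct domid value" earlier in this log. The function takes register values as arguments so that no CPUID instruction is needed to try it.

```c
#include <stdint.h>

/* Hypothetical flag bit in EAX announcing that ECX holds the domain id. */
#define TOY_CPUID_DOMID_PRESENT (UINT32_C(1) << 0)

/* Fallback used by the shim when L0 does not report a domid. */
#define TOY_DEFAULT_DOMID 1

/* Decode the domain id from the EAX/ECX values of the hypervisor leaf. */
static uint16_t toy_domid_from_leaf(uint32_t eax, uint32_t ecx)
{
    if (eax & TOY_CPUID_DOMID_PRESENT)
        return (uint16_t)ecx;

    return TOY_DEFAULT_DOMID;
}
```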

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
 - Use leaf 5 instead.

7 years agox86/svm: Offer CPUID Faulting to AMD HVM guests as well
Andrew Cooper [Thu, 11 Jan 2018 17:47:57 +0000 (17:47 +0000)]
x86/svm: Offer CPUID Faulting to AMD HVM guests as well

CPUID Faulting can be virtualised for HVM guests without hardware support,
meaning it can be offered to SVM guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/upcall: inject a spurious event after setting upcall vector
Roger Pau Monné [Thu, 11 Jan 2018 17:51:14 +0000 (17:51 +0000)]
x86/upcall: inject a spurious event after setting upcall vector

In case the vCPU has pending events to inject. This fixes a bug that
happened if the guest mapped the vcpu info area using
VCPUOP_register_vcpu_info without having setup the event channel
upcall, and then setup the upcall vector.

In this scenario the guest would not receive any upcalls, because the
call to VCPUOP_register_vcpu_info would have marked the vCPU as having
pending events, but the vector could not be injected because it was
not yet setup.

This has not caused issues so far because all the consumers first
setup the vector callback and then map the vcpu info page, but there's
no limitation that prevents doing it in the inverse order.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/msr: Free msr_vcpu_policy during vcpu destruction
Andrew Cooper [Thu, 4 Jan 2018 13:32:01 +0000 (14:32 +0100)]
x86/msr: Free msr_vcpu_policy during vcpu destruction

c/s 4187f79dc7 "x86/msr: introduce struct msr_vcpu_policy" introduced a
per-vcpu memory allocation, but failed to free it in the clean vcpu
destruction case.

This is XSA-253.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: e204e60f77702bf5c884dd37c3f1b01f14e396ae
master date: 2018-01-04 14:27:38 +0100

7 years agox86/vmx: Don't use hvm_inject_hw_exception() in long_mode_do_msr_write()
Andrew Cooper [Wed, 20 Dec 2017 14:45:32 +0000 (15:45 +0100)]
x86/vmx: Don't use hvm_inject_hw_exception() in long_mode_do_msr_write()

Since c/s 49de10f3c1718 "x86/hvm: Don't raise #GP behind the emulators back
for MSR accesses", returning X86EMUL_EXCEPTION has pushed the exception
generation to the top of the call tree.

Using hvm_inject_hw_exception() and returning X86EMUL_EXCEPTION causes a
double #GP injection, which combines to #DF.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 896ee3980e72866b602e743396751384de301fb0
master date: 2017-12-14 18:05:45 +0000

7 years agoxen/efi: Fix build with clang-5.0
Andrew Cooper [Wed, 20 Dec 2017 14:44:57 +0000 (15:44 +0100)]
xen/efi: Fix build with clang-5.0

The clang-5.0 build is reliably failing with:

  Error: size of boot.o:.text is 0x01

which is because efi_arch_flush_dcache_area() exists as a single ret
instruction.  Mark it as __init like everything else in the files.

Spotted by Travis.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: c4f6ad4c5fd25cb0ccc0cdbe711db97e097f0407
master date: 2017-12-14 10:59:26 +0000

7 years agognttab: improve GNTTABOP_cache_flush locking
Jan Beulich [Wed, 20 Dec 2017 14:44:20 +0000 (15:44 +0100)]
gnttab: improve GNTTABOP_cache_flush locking

Dropping the lock before returning from grant_map_exists() means handing
possibly stale information back to the caller. Return back the pointer
to the active entry instead, for the caller to release the lock once
done.
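
The locking pattern described above can be illustrated in isolation. This is a sketch only: `toy_entry` and the `toy_*` helpers are hypothetical, and a C11 `atomic_flag` spinlock stands in for the real grant table locks.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_entry {
    atomic_flag lock;   /* stand-in for the per-entry lock */
    int pin_count;      /* > 0 means a mapping exists */
};

static void toy_lock(struct toy_entry *e)
{
    while (atomic_flag_test_and_set_explicit(&e->lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_entry *e)
{
    atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/* Old shape: the answer is computed under the lock, then the lock is dropped. */
static int toy_map_exists_racy(struct toy_entry *e)
{
    int exists;

    toy_lock(e);
    exists = e->pin_count > 0;
    toy_unlock(e);

    return exists;   /* may already be stale when the caller acts on it */
}

/* New shape: return the entry with its lock still held, or NULL if unmapped. */
static struct toy_entry *toy_map_find_locked(struct toy_entry *e)
{
    toy_lock(e);
    if (e->pin_count > 0)
        return e;    /* caller must call toy_unlock() once done */
    toy_unlock(e);

    return NULL;
}
```

The caller performs its work between `toy_map_find_locked()` and `toy_unlock()`, so the state it checked cannot change underneath it.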

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 553ac37137c2d1c03bf1b69cfb192ffbfe29daa4
master date: 2017-12-04 11:04:18 +0100

7 years agognttab: correct GNTTABOP_cache_flush empty batch handling
Jan Beulich [Wed, 20 Dec 2017 14:43:53 +0000 (15:43 +0100)]
gnttab: correct GNTTABOP_cache_flush empty batch handling

Jann validly points out that with a caller bogusly requesting a zero-
element batch with non-zero high command bits (the ones used for
continuation encoding), the assertion right before the call to
hypercall_create_continuation() would trigger. A similar situation would
arise afaict for non-empty batches with op and/or length zero in every
element.

While we want the former to succeed (as we do elsewhere for similar
no-op requests), the latter can clearly be converted to an error, as
this is a state that can't be the result of a prior operation.

Take the opportunity and also correct the order of argument checks:
We shouldn't accept zero-length elements with unknown bits set in "op".
Also constify cache_flush()'s first parameter.
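
The corrected order of argument checks might be sketched like this. It covers only the per-element ordering, not the continuation handling, and the op bits and `toy_check_elem` are hypothetical stand-ins rather than the real GNTTABOP_cache_flush definitions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical op bits, for illustration only. */
#define TOY_OP_CLEAN (UINT32_C(1) << 0)
#define TOY_OP_INVAL (UINT32_C(1) << 1)
#define TOY_OP_KNOWN (TOY_OP_CLEAN | TOY_OP_INVAL)

/*
 * Validate one batch element.
 * Returns -EINVAL for a malformed element, 0 for a no-op, 1 if there is
 * work to do.
 */
static int toy_check_elem(uint32_t op, uint32_t length)
{
    /* Reject unknown bits first, even when the element is zero-length. */
    if (op & ~TOY_OP_KNOWN)
        return -EINVAL;

    /* Only a well-formed element may be treated as a no-op. */
    if (length == 0 || op == 0)
        return 0;

    return 1;
}
```

With the checks in the opposite order, a zero-length element carrying unknown bits would be accepted as a no-op.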

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
master commit: 9c22e4d67f5552c7c896ed83bd95d5d4c5837a9d
master date: 2017-12-04 11:03:32 +0100

7 years agox86/microcode: Add support for fam17h microcode loading
Tom Lendacky [Wed, 20 Dec 2017 14:43:14 +0000 (15:43 +0100)]
x86/microcode: Add support for fam17h microcode loading

The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes.  Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
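
The effect of the new #define can be sketched as a size lookup. Only the two sizes mentioned in the message are modelled, and the `TOY_*` names and helper functions are hypothetical, not the identifiers used in the patch.

```c
#include <stddef.h>

#define TOY_MPB_DEFAULT_SIZE 2048   /* default limit named in the message */
#define TOY_MPB_FAM17H_SIZE  3200   /* family 17h limit named in the message */

/* Largest Microcode Patch Block accepted for a given CPU family. */
static size_t toy_max_patch_size(unsigned int family)
{
    switch (family) {
    case 0x17:
        return TOY_MPB_FAM17H_SIZE;
    default:
        return TOY_MPB_DEFAULT_SIZE;
    }
}

/* Return 1 if a patch of patch_size bytes would pass the size check. */
static int toy_patch_fits(unsigned int family, size_t patch_size)
{
    return patch_size <= toy_max_patch_size(family);
}
```

Without the 0x17 case, a 3200-byte patch would be compared against the 2048-byte default and rejected.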

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[Linux commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf]

Ported to Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 61d458ba8c171809e8dd9abd19339c87f3f934ca
master date: 2017-12-13 14:30:10 +0000

7 years agox86/mm: drop bogus paging mode assertion
Jan Beulich [Wed, 20 Dec 2017 14:42:42 +0000 (15:42 +0100)]
x86/mm: drop bogus paging mode assertion

Olaf has observed this assertion to trigger after an aborted migration
of a PV guest:

(XEN) Xen call trace:
(XEN)    [<ffff82d0802a85dc>] do_page_fault+0x39f/0x55c
(XEN)    [<ffff82d08036b7d8>] x86_64/entry.S#handle_exception_saved+0x66/0xa4
(XEN)    [<ffff82d0802a9274>] __copy_to_user_ll+0x22/0x30
(XEN)    [<ffff82d0802772d4>] update_runstate_area+0x19c/0x228
(XEN)    [<ffff82d080277371>] domain.c#_update_runstate_area+0x11/0x39
(XEN)    [<ffff82d080277596>] context_switch+0x1fd/0xf25
(XEN)    [<ffff82d0802395c5>] schedule.c#schedule+0x303/0x6a8
(XEN)    [<ffff82d08023d067>] softirq.c#__do_softirq+0x6c/0x95
(XEN)    [<ffff82d08023d0da>] do_softirq+0x13/0x15
(XEN)    [<ffff82d08036b2f1>] x86_64/entry.S#process_softirqs+0x21/0x30

Release builds work fine, which is a first indication that the assertion
isn't really needed.

What's worse though - there appears to be a timing window where the
guest runs in shadow mode, but not in log-dirty mode, and that is what
triggers the assertion (the same could, afaict, be achieved by test-
enabling shadow mode on a PV guest). This is because turning off log-
dirty mode is being performed in two steps: First the log-dirty bit gets
cleared (paging_log_dirty_disable() [having paused the domain] ->
sh_disable_log_dirty() -> shadow_one_bit_disable()), followed by
unpausing the domain and only then clearing shadow mode (via
shadow_test_disable(), which pauses the domain a second time).

Hence besides removing the ASSERT() here (or optionally replacing it by
explicit translate and refcounts mode checks, but this seems rather
pointless now that the three are tied together) I wonder whether either
shadow_one_bit_disable() should turn off shadow mode if no other bit
besides PG_SH_enable remains set (just like shadow_one_bit_enable()
enables it if not already set), or the domain pausing scope should be
extended so that both steps occur without the domain getting a chance to
run in between.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: b95f7be32d668fa4b09300892ebe19636ecebe36
master date: 2017-12-12 16:56:15 +0100

7 years agox86/mb2: avoid Xen image when looking for module/crashkernel position
Daniel Kiper [Wed, 20 Dec 2017 14:42:13 +0000 (15:42 +0100)]
x86/mb2: avoid Xen image when looking for module/crashkernel position

Commit e22e1c4 (x86/EFI: avoid Xen image when looking for module/kexec
position) added relevant check for EFI case. However, since commit
f75a304 (x86: add multiboot2 protocol support for relocatable images)
Multiboot2 compatible bootloaders are able to relocate Xen image too.
So we also have to avoid the Xen image region in such cases.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 9589927e5bf9e123ec42b6e0b0809f153bd92732
master date: 2017-12-12 14:30:53 +0100

7 years agox86/vvmx: don't enable vmcs shadowing for nested guests
Sergey Dyasli [Wed, 20 Dec 2017 14:41:33 +0000 (15:41 +0100)]
x86/vvmx: don't enable vmcs shadowing for nested guests

Running "./xtf_runner vvmx" in L1 Xen under L0 Xen produces the
following result on H/W with VMCS shadowing:

    Test: vmxon
    Failure in test_vmxon_in_root_cpl0()
      Expected 0x8200000f: VMfailValid(15) VMXON_IN_ROOT
           Got 0x82004400: VMfailValid(17408) <unknown>
    Test result: FAILURE

This happens because SDM allows vmentries with enabled VMCS shadowing
VM-execution control and VMCS link pointer value of ~0ull. But results
of a nested VMREAD are undefined in such cases.

Fix this by not copying the value of VMCS shadowing control from vmcs01
to vmcs02.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 19fdb8e258619aea265af9c183e035e545cbc2d2
master date: 2017-12-01 19:03:27 +0000

7 years agoxen/pv: Construct d0v0's GDT properly
Andrew Cooper [Wed, 20 Dec 2017 14:40:58 +0000 (15:40 +0100)]
xen/pv: Construct d0v0's GDT properly

c/s cf6d39f8199 "x86/PV: properly populate descriptor tables" changed the GDT
to reference zero_page for intermediate frames between the guest and Xen
frames.

Because dom0_construct_pv() doesn't call arch_set_info_guest(), some bits of
initialisation are missed, including the pv_destroy_gdt() which initially
fills the references to zero_page.

In practice, this means there is a window between starting and the first call
to HYPERCALL_set_gdt() where lar/lsl/verr/verw suffer non-architectural
behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 08f27f4468eedbeccaac9fdda4ef732247efd74e
master date: 2017-12-01 19:03:26 +0000

7 years agoupdate Xen version to 4.10.1-pre
Jan Beulich [Wed, 20 Dec 2017 14:39:44 +0000 (15:39 +0100)]
update Xen version to 4.10.1-pre

7 years agoXen 4.10 release: update README and xen/Makefile versions
Ian Jackson [Wed, 13 Dec 2017 11:37:59 +0000 (11:37 +0000)]
Xen 4.10 release: update README and xen/Makefile versions

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agoXen 4.10 release: update Config.mk revisions to refer to tags
Ian Jackson [Wed, 13 Dec 2017 11:36:12 +0000 (11:36 +0000)]
Xen 4.10 release: update Config.mk revisions to refer to tags

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agoMerge branch 'xsa248-251' into staging-4.10
Ian Jackson [Tue, 12 Dec 2017 12:23:17 +0000 (12:23 +0000)]
Merge branch 'xsa248-251' into staging-4.10

7 years agox86: don't wrongly trigger linear page table assertion (2)
Jan Beulich [Fri, 8 Dec 2017 15:32:05 +0000 (15:32 +0000)]
x86: don't wrongly trigger linear page table assertion (2)

_put_final_page_type(), when free_page_type() has exited early to allow
for preemption, should not update the time stamp, as the page continues
to retain the type which is in the process of being unvalidated. I can't
see why the time stamp update was put on that path in the first place
(albeit it may well have been me who had put it there years ago).

This is part of XSA-240.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/paging: don't unconditionally BUG() on finding SHARED_M2P_ENTRY
Jan Beulich [Fri, 8 Dec 2017 15:27:14 +0000 (15:27 +0000)]
x86/paging: don't unconditionally BUG() on finding SHARED_M2P_ENTRY

PV guests can fully control the values written into the P2M.

This is XSA-251.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/shadow: fix ref-counting error handling
Jan Beulich [Fri, 8 Dec 2017 15:27:14 +0000 (15:27 +0000)]
x86/shadow: fix ref-counting error handling

The old-Linux handling in shadow_set_l4e() mistakenly ORed together the
results of sh_get_ref() and sh_pin(). As the latter failing is not a
correctness problem, simply ignore its return value.

In sh_set_toplevel_shadow() a failing sh_get_ref() must not be
accompanied by installing the entry, despite the domain being crashed.
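The first problem can be sketched as below. This is an illustration only,
not the shadow code: take_ref() and pin() are hypothetical stand-ins for
sh_get_ref() and sh_pin(), and the sketch assumes both report success with a
non-zero value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shadow page; not a Xen structure. */
struct shadow {
    bool ref_available; /* whether taking a reference can succeed */
    bool pinned;
};

/* Stand-in for sh_get_ref(): non-zero on success (assumption). */
static bool take_ref(struct shadow *s) { return s->ref_available; }

/* Stand-in for sh_pin(): always succeeds in this sketch. */
static bool pin(struct shadow *s) { s->pinned = true; return true; }

/* Buggy pattern: a successful pin hides a failed reference. */
static bool install_buggy(struct shadow *s)
{
    bool ok = take_ref(s);

    ok |= pin(s);
    return ok;
}

/* Fixed pattern: the pin result is ignored, only the reference counts. */
static bool install_fixed(struct shadow *s)
{
    bool ok = take_ref(s);

    (void)pin(s);
    return ok;
}
```

With a shadow whose reference cannot be taken, install_buggy() still reports
success, while install_fixed() reports the failure.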

This is XSA-250.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
7 years agox86/shadow: fix refcount overflow check
Jan Beulich [Fri, 8 Dec 2017 15:27:14 +0000 (15:27 +0000)]
x86/shadow: fix refcount overflow check

Commit c385d27079 ("x86 shadow: for multi-page shadows, explicitly track
the first page") reduced the refcount width to 25, without adjusting the
overflow check. Eliminate the disconnect by using a manifest constant.

Interestingly, up to commit 047782fa01 ("Out-of-sync L1 shadows: OOS
snapshot") the refcount was 27 bits wide, yet the check was already
using 26.
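A minimal sketch of the "manifest constant" idea, assuming hypothetical names
(SH_REF_COUNT_BITS, struct shadow_page, shadow_get_ref()); only the 25-bit
width is taken from the text above. Both the field width and the overflow
check derive from one constant, so they cannot drift apart again.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; only the width of 25 comes from the commit text. */
#define SH_REF_COUNT_BITS 25
#define SH_REF_COUNT_MAX  ((1u << SH_REF_COUNT_BITS) - 1)

struct shadow_page {
    unsigned int count : SH_REF_COUNT_BITS;      /* reference count */
    unsigned int other : 32 - SH_REF_COUNT_BITS; /* unrelated bits */
};

/* Take a reference; fail instead of letting the bitfield wrap to zero. */
static bool shadow_get_ref(struct shadow_page *sp)
{
    if (sp->count >= SH_REF_COUNT_MAX)
        return false;

    sp->count++;
    return true;
}
```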

This is XSA-249.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
7 years agox86/mm: don't wrongly set page ownership
Jan Beulich [Fri, 8 Dec 2017 15:27:14 +0000 (15:27 +0000)]
x86/mm: don't wrongly set page ownership

PV domains can obtain mappings of any pages owned by the correct domain,
including ones that aren't actually assigned as "normal" RAM, but used
by Xen internally.  At the moment such "internal" pages marked as owned
by a guest include pages used to track logdirty bits, as well as p2m
pages and the "unpaged pagetable" for HVM guests. Since the PV memory
management and shadow code conflict in their use of struct page_info
fields, and since shadow code is being used for log-dirty handling for
PV domains, pages coming from the shadow pool must, for PV domains, not
have the domain set as their owner.

While the change could be done conditionally for just the PV case in
shadow code, do it unconditionally (and for consistency also for HAP),
just to be on the safe side.

There's one special case though for shadow code: The page table used for
running a HVM guest in unpaged mode is subject to get_page() (in
set_shadow_status()) and hence must have its owner set.

This is XSA-248.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agox86/HVM: don't retain emulated insn cache when exiting back to guest
Jan Beulich [Thu, 7 Dec 2017 09:59:22 +0000 (10:59 +0100)]
x86/HVM: don't retain emulated insn cache when exiting back to guest

vio->mmio_retry is being set when a repeated string insn is being split
up. In that case we'll exit to the guest, expecting immediate re-entry.
Interruptions, however, may be serviced by the guest before re-entry
from the repeated string insn. Any emulation needed in the course of
handling the interruption must not fetch from the internally maintained
cache.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
master commit: 5fcb26e69e8089e20c9168774bee681b8f5a3187
master date: 2017-12-06 12:50:23 +0100

7 years agox86/hvm: fix interaction between internal and external emulation
Paul Durrant [Tue, 28 Nov 2017 14:05:19 +0000 (14:05 +0000)]
x86/hvm: fix interaction between internal and external emulation

A call to handle_hvm_io_completion() is needed for completing I/O
that requires external emulation. Such completion should be requested when
hvm_vcpu_io_need_completion() returns true after hvm_emulate_once() has
completed. This is indicative of the underlying I/O emulation having
returned X86EMUL_RETRY and hence a re-emulation of the instruction is
needed to pick up the result of the I/O.

A call to handle_hvm_io_completion() is NOT needed when the underlying
I/O has not returned X86EMUL_RETRY since there will be no result to pick
up. Hence it is bogus to request such completion when mmio_retry is set,
since this can only happen if the underlying I/O emulation has returned
X86EMUL_OKAY (meaning the I/O has completed successfully).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit 9c9384d6d8184ca6d21975ccf4e4f72b560540cc)

7 years agox86: Avoid corruption on migrate for vcpus using CPUID Faulting
Andrew Cooper [Sat, 25 Nov 2017 15:17:14 +0000 (15:17 +0000)]
x86: Avoid corruption on migrate for vcpus using CPUID Faulting

Xen 4.8 and later virtualises CPUID Faulting support for guests.  However, the
value of MSR_MISC_FEATURES_ENABLES is omitted from the vcpu state, meaning
that the current cpuid faulting setting is lost on migrate/suspend/resume.

Instead of following the MSR status quo, take the opportunity to make the
logic more generic, and in particular, trivial to extend for future MSRs.

This is done by discarding the notion of optional MSRs, and requiring the
toolstack to be prepared to move all of the MSRs, although only a subset will
typically need to move.

This allows for the use of guest_{rd,wr}msr() alone to evaluate whether an MSR
needs moving.  This is a benefit because it means there is a single piece of
logic responsible for evaluating whether a guest can use an MSR, and which
values are acceptable.

One small adjustment to guest_wrmsr() is required to cope with being called in
toolstack context.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
(cherry picked from commit b90f86be161c74df8cb69c98d9f22885d9d87114)

7 years agoDisable debug for 4.10 stable branch, in preparation for release
Ian Jackson [Fri, 1 Dec 2017 15:15:39 +0000 (15:15 +0000)]
Disable debug for 4.10 stable branch, in preparation for release

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agoRevert "xen/arm: domain_builder: irq sanity check logic fix"
Andrew Cooper [Wed, 29 Nov 2017 11:45:02 +0000 (11:45 +0000)]
Revert "xen/arm: domain_builder: irq sanity check logic fix"

This reverts commit 11e7dd958de73a45645bd40d82280660bd2c9ee8.

It breaks boot on ARM.

Reported-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/arm: domain_builder: irq sanity check logic fix
Stewart Hildebrand [Tue, 28 Nov 2017 14:42:03 +0000 (14:42 +0000)]
xen/arm: domain_builder: irq sanity check logic fix

It's not possible for an irq to be both below 16 and greater than or equal to 32.
Also fix the reference to linux documentation while we're at it.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoarm64: ITS: fix cacheability adjustment
Andre Przywara [Thu, 16 Nov 2017 12:02:35 +0000 (12:02 +0000)]
arm64: ITS: fix cacheability adjustment

If the host GICv3 redistributor reports that the pending table cannot
use shareable memory, we try to drop the cacheability attributes as
well. However we fail horribly in doing computer science 101 bit
masking, effectively clearing the whole register instead of just a few
bits.
Fix this by removing the one redundant masking operation and adding the
magic negation for the actually needed other operation.
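The class of mistake can be sketched as follows. The field position and the
names (CACHE_ATTR_SHIFT, CACHE_ATTR_MASK) are made up for illustration and
are not the GICv3 register layout: ANDing with the mask keeps only the field
and discards every other bit, whereas ANDing with the negated mask clears
just the field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-bit field at bits 7..9; not the real register layout. */
#define CACHE_ATTR_SHIFT 7
#define CACHE_ATTR_MASK  (UINT64_C(0x7) << CACHE_ATTR_SHIFT)

/* Buggy: keeps only the field, clearing the rest of the register. */
static uint64_t clear_field_buggy(uint64_t reg)
{
    return reg & CACHE_ATTR_MASK;
}

/* Intended: clears only the field and preserves all other bits. */
static uint64_t clear_field(uint64_t reg)
{
    return reg & ~CACHE_ATTR_MASK;
}
```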

Reported-by: Manish Jaggi <manish.jaggi@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools: xentoolcore_restrict_all: Do deregistration before close
Ian Jackson [Tue, 14 Nov 2017 12:15:42 +0000 (12:15 +0000)]
tools: xentoolcore_restrict_all: Do deregistration before close

Closing the fd before unhooking it from the list runs the risk that a
concurrent thread calling xentoolcore_restrict_all will operate on the
old fd value, which might refer to a new fd by then.  So we need to do
it in the other order.

Sadly this weakens the guarantee provided by xentoolcore_restrict_all
slightly, but not (I think) in a problematic way.  It would be
possible to implement the previous guarantee, but it would involve
replacing all of the close() calls in all of the individual osdep
parts of all of the individual libraries with calls to a new function
which does
   dup2("/dev/null", thing->fd);
   pthread_mutex_lock(&handles_lock);
   thing->fd = -1;
   pthread_mutex_unlock(&handles_lock);
   close(fd);
which would be terribly tedious.
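For comparison, the ordering this patch adopts (deregister first, close
afterwards) might look roughly like the sketch below. struct handle, its
fields and handle_close() are hypothetical and do not reflect the
xentoolcore data structures; only the name handles_lock is borrowed from the
pseudocode above.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical handle; the real libraries track more state than this. */
struct handle {
    int fd;
    int registered; /* non-zero while visible to restrict_all */
};

static pthread_mutex_t handles_lock = PTHREAD_MUTEX_INITIALIZER;

/* Unhook the handle under the lock, then close the fd outside it. */
static int handle_close(struct handle *h)
{
    int fd;

    pthread_mutex_lock(&handles_lock);
    h->registered = 0; /* no longer reachable by a concurrent walker */
    fd = h->fd;
    h->fd = -1;
    pthread_mutex_unlock(&handles_lock);

    /* From here on, nobody else can act on a stale copy of fd. */
    return close(fd);
}
```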

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoimprove XENMEM_add_to_physmap_batch address checking
Jan Beulich [Tue, 28 Nov 2017 12:15:12 +0000 (13:15 +0100)]
improve XENMEM_add_to_physmap_batch address checking

As a follow-up to XSA-212 we should have addressed a similar issue here:
The handles being advanced at the top of xenmem_add_to_physmap_batch()
means we allow hypervisor space accesses (in particular, for "errs",
writes) with suitably crafted input arguments. This isn't a security
issue in this case because of the limited width of struct
xen_add_to_physmap_batch's size field: It being 16-bits wide, only the
r/o M2P area can be accessed. Still we can and should do better.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: check paging mode earlier in xenmem_add_to_physmap_one()
Jan Beulich [Tue, 28 Nov 2017 12:14:43 +0000 (13:14 +0100)]
x86: check paging mode earlier in xenmem_add_to_physmap_one()

There's no point in deferring this until after some initial processing,
and it's actively wrong for the XENMAPSPACE_gmfn_foreign handling to not
have such a check at all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: replace bad ASSERT() in xenmem_add_to_physmap_one()
Jan Beulich [Tue, 28 Nov 2017 12:14:10 +0000 (13:14 +0100)]
x86: replace bad ASSERT() in xenmem_add_to_physmap_one()

There are no locks being held, i.e. it is possible to be triggered by
racy hypercall invocations. Subsequent code doesn't really depend on the
checked values, so this is not a security issue.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agop2m: Check return value of p2m_set_entry() when decreasing reservation
George Dunlap [Tue, 28 Nov 2017 12:13:26 +0000 (13:13 +0100)]
p2m: Check return value of p2m_set_entry() when decreasing reservation

If the entire range specified to p2m_pod_decrease_reservation() is marked
populate-on-demand, then it will make a single p2m_set_entry() call,
reducing its PoD entry count.

Unfortunately, in the right circumstances, this p2m_set_entry() call
may fail.  In that case, repeated calls to decrease_reservation() may
cause p2m->pod.entry_count to fall below zero, potentially tripping
over BUG_ON()s to the contrary.

Instead, check to see if the entry succeeded, and return false if not.
The caller will then call guest_remove_page() on the gfns, which will
return -EINVAL upon finding no valid memory there to return.

Unfortunately if the order > 0, the entry may have partially changed.
A domain_crash() is probably the safest thing in that case.

Other p2m_set_entry() calls in the same function should be fine,
because they are writing the entry at its current order.  Nonetheless,
check the return value and crash if our assumption turns out to be
wrong.

This is part of XSA-247.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agop2m: Always check to see if removing a p2m entry actually worked
George Dunlap [Tue, 28 Nov 2017 12:13:03 +0000 (13:13 +0100)]
p2m: Always check to see if removing a p2m entry actually worked

The PoD zero-check functions speculatively remove memory from the p2m,
then check to see if it's completely zeroed, before putting it in the
cache.

Unfortunately, the p2m_set_entry() calls may fail if the underlying
pagetable structure needs to change and the domain has exhausted its
p2m memory pool: for instance, if we're removing a 2MiB region out of
a 1GiB entry (in the p2m_pod_zero_check_superpage() case), or a 4k
region out of a 2MiB or larger entry (in the p2m_pod_zero_check()
case); and the return value is not checked.

The underlying mfn will then be added into the PoD cache, and at some
point mapped into another location in the p2m.  If the guest
afterwards balloons out this memory, it will be freed to the hypervisor
and potentially reused by another domain, in spite of the fact that
the original domain still has writable mappings to it.

There are several places where p2m_set_entry() shouldn't be able to
fail, as it is guaranteed to write an entry of the same order that
succeeded before.  Add a backstop of crashing the domain just in case,
and an ASSERT_UNREACHABLE() to flag up the broken assumption on debug
builds.

While we're here, use PAGE_ORDER_2M rather than a magic constant.

This is part of XSA-247.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pod: prevent infinite loop when shattering large pages
Julien Grall [Tue, 28 Nov 2017 12:11:55 +0000 (13:11 +0100)]
x86/pod: prevent infinite loop when shattering large pages

When populating pages, the PoD may need to split large ones using
p2m_set_entry and request the caller to retry (see ept_get_entry for
instance).

p2m_set_entry may fail to shatter if it is not possible to allocate
memory for the new page table. However, the error is not propagated
resulting in the callers retrying the PoD operation indefinitely.

Prevent the infinite loop by returning false when it is not possible to
shatter the large mapping.
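A simplified model of the retry pattern, using hypothetical types and
functions (struct pool, struct mapping, split_large(), demand_populate(),
lookup()) rather than the p2m code: once the split failure is propagated as
false, the caller's retry loop terminates instead of spinning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; none of these are Xen types. */
struct pool { int free_pages; };
struct mapping { bool large; bool populated; };

/* Splitting needs one page for the new page table; fail when none is left. */
static int split_large(struct pool *p, struct mapping *m)
{
    if (p->free_pages == 0)
        return -1;

    p->free_pages--;
    m->large = false;
    return 0;
}

/* Returns false on failure; true means progress was made, so retry is fine. */
static bool demand_populate(struct pool *p, struct mapping *m)
{
    if (m->large)
        return split_large(p, m) == 0; /* propagate the error */

    m->populated = true;
    return true;
}

/* Caller: retries only while demand_populate() reports progress. */
static bool lookup(struct pool *p, struct mapping *m)
{
    while (!m->populated)
        if (!demand_populate(p, m))
            return false;

    return true;
}
```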

This is XSA-246.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoSUPPORT.md: Add statement on PCI passthrough
George Dunlap [Wed, 22 Nov 2017 19:19:04 +0000 (19:19 +0000)]
SUPPORT.md: Add statement on PCI passthrough

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add secondary memory management features
George Dunlap [Wed, 22 Nov 2017 19:19:04 +0000 (19:19 +0000)]
SUPPORT.md: Add secondary memory management features

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add Security-related features
George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add Security-related features

With the exception of driver domains, which depend on PCI passthrough,
and will be introduced later.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agoSUPPORT.md: Add 'easy' HA / FT features
George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add 'easy' HA / FT features

Migration being one of the key 'non-easy' ones to be added later.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add Debugging, analysis, crash post-mortem
George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add Debugging, analysis, crash post-mortem

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add ARM-specific virtual hardware
George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add ARM-specific virtual hardware

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoSUPPORT.md: Add x86-specific virtual hardware
George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add x86-specific virtual hardware

x86-specific virtual hardware provided by the hypervisor, toolstack,
or QEMU.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
7 years agoSUPPORT.md: Add virtual devices common to ARM and x86
George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add virtual devices common to ARM and x86

Mostly PV protocols.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Toolstack core
George Dunlap [Wed, 22 Nov 2017 19:19:01 +0000 (19:19 +0000)]
SUPPORT.md: Toolstack core

For now only include xl-specific features, or interaction with the
system.  Feature support matrix will be added when features are
mentioned.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agoSUPPORT.md: Add scalability features
George Dunlap [Wed, 22 Nov 2017 19:19:01 +0000 (19:19 +0000)]
SUPPORT.md: Add scalability features

Superpage support and PVHVM.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoSUPPORT.md: Add core ARM features
George Dunlap [Thu, 23 Nov 2017 17:32:16 +0000 (17:32 +0000)]
SUPPORT.md: Add core ARM features

Hardware support and guest type.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoSUPPORT.md: Add some x86 features
George Dunlap [Thu, 23 Nov 2017 17:32:16 +0000 (17:32 +0000)]
SUPPORT.md: Add some x86 features

Including host architecture support and guest types.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add core functionality
George Dunlap [Thu, 23 Nov 2017 17:32:15 +0000 (17:32 +0000)]
SUPPORT.md: Add core functionality

Core memory management and scheduling.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoIntroduce skeleton SUPPORT.md
George Dunlap [Thu, 23 Nov 2017 17:32:14 +0000 (17:32 +0000)]
Introduce skeleton SUPPORT.md

Add a machine-readable file to describe what features are in what
state of being 'supported', as well as information about how long this
release will be supported, and so on.

The document should be formatted using "semantic newlines" [1], to make
changes easier.

Begin with the basic framework.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
[1] http://rhodesmill.org/brandon/2012/one-sentence-per-line/

7 years agox86emul/test: keep compiler from using {x,y,z}mm registers itself
Jan Beulich [Thu, 23 Nov 2017 10:40:31 +0000 (11:40 +0100)]
x86emul/test: keep compiler from using {x,y,z}mm registers itself

Since the emulator acts on the live hardware registers, we need to
prevent the compiler from using them e.g. for inlined memcpy() /
memset() (as gcc7 does). We can't, however, set this from the command
line, as otherwise the 64-bit build would face issues with functions
returning floating point values and being declared in standard headers.

As the pragma isn't available prior to gcc6, we need to invoke it
conditionally. Luckily up to gcc6 we haven't seen generated code access
SIMD registers beyond what our asm()s do.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agosync CPU state upon final domain destruction
Jan Beulich [Thu, 23 Nov 2017 10:38:22 +0000 (11:38 +0100)]
sync CPU state upon final domain destruction

See the code comment being added for why we need this.

This is being placed here to balance between the desire to prevent
future similar issues (the risk of which would grow if it was put
further down the call stack, e.g. in vmx_vcpu_destroy()) and the
intention to limit the performance impact (otherwise it could also go
into rcu_do_batch(), paralleling the use in do_tasklet_work()).

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/hvm: Don't corrupt the HVM context stream when writing the MSR record
Andrew Cooper [Thu, 16 Nov 2017 21:34:02 +0000 (21:34 +0000)]
x86/hvm: Don't corrupt the HVM context stream when writing the MSR record

Ever since it was introduced in c/s bd1f0b45ff, hvm_save_cpu_msrs() has had a
bug whereby it corrupts the HVM context stream if some, but fewer than the
maximum number of MSRs are written.

_hvm_init_entry() creates an hvm_save_descriptor with length for
msr_count_max, but in the case that we write fewer than max, h->cur only moves
forward by the amount of space used, causing the subsequent
hvm_save_descriptor to be written within the bounds of the previous one.

To resolve this, reduce the length reported by the descriptor to match the
actual number of bytes used.
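The fix can be pictured with the sketch below, which uses a made-up
descriptor and stream layout (struct desc, struct stream, save_records())
rather than the real HVM save format: the descriptor is reserved with the
maximum length, and its length is then shrunk to the bytes actually written
so that it agrees with how far the cursor advanced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout; not the real hvm_save_descriptor. */
struct desc { uint16_t typecode; uint16_t instance; uint32_t length; };
struct stream { uint8_t buf[256]; size_t cur; };

/*
 * Write a descriptor followed by 'used' records out of a possible 'max'.
 * The caller must ensure the buffer has room for the descriptor plus 'max'
 * records.  Returns the offset of the descriptor.
 */
static size_t save_records(struct stream *s, const uint64_t *vals,
                           unsigned int used, unsigned int max)
{
    struct desc d = { .typecode = 1, .instance = 0,
                      .length = (uint32_t)(max * sizeof(*vals)) };
    size_t desc_off = s->cur;

    s->cur += sizeof(d);
    memcpy(&s->buf[s->cur], vals, used * sizeof(*vals));
    s->cur += used * sizeof(*vals);

    /* Shrink the advertised length to match the bytes actually used. */
    d.length = (uint32_t)(used * sizeof(*vals));
    memcpy(&s->buf[desc_off], &d, sizeof(d));

    return desc_off;
}
```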

A typical failure on the destination side looks like:

    (XEN) HVM4 restore: CPU_MSR 0
    (XEN) HVM4.0 restore: not enough data left to read 56 MSR bytes
    (XEN) HVM4 restore: failed to load entry 20/0

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/libxc: Fix restoration of PV MSRs after migrate
Andrew Cooper [Thu, 16 Nov 2017 21:10:00 +0000 (21:10 +0000)]
tools/libxc: Fix restoration of PV MSRs after migrate

There are two bugs in process_vcpu_msrs() which clearly demonstrate that I
didn't test this bit of Migration v2 very well when writing it...

vcpu->msrsz is always expected to be a multiple of xen_domctl_vcpu_msr_t
records in a spec-compliant stream, so the modulo yields 0 for the msr_count,
rather than the actual number sent in the stream.

Passing 0 for the msr_count causes the hypercall to exit early, and hides the
fact that the guest handle is inserted into the wrong field in the domctl
union.
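The first bug boils down to using the remainder where the quotient was
meant. In the sketch below, struct msr_rec is a hypothetical stand-in for
xen_domctl_vcpu_msr_t and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xen_domctl_vcpu_msr_t. */
struct msr_rec {
    uint32_t index;
    uint32_t reserved;
    uint64_t value;
};

/* Buggy: the remainder is 0 for any well-formed stream. */
static size_t msr_count_buggy(size_t msrsz)
{
    return msrsz % sizeof(struct msr_rec);
}

/* Intended: the number of whole records in the blob. */
static size_t msr_count(size_t msrsz)
{
    return msrsz / sizeof(struct msr_rec);
}
```

For a blob holding three records, msr_count_buggy() yields 0 while
msr_count() yields 3.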

The reason that these bugs have gone unnoticed for so long is that the only
MSRs passed like this for PV guests are the AMD DBGEXT MSRs, which only exist
in fairly modern hardware, and whose use doesn't appear to be implemented in
any contemporary PV guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/hvm: Fix altp2m_vcpu_enable_notify error handling
Adrian Pop [Wed, 15 Nov 2017 13:47:59 +0000 (15:47 +0200)]
x86/hvm: Fix altp2m_vcpu_enable_notify error handling

The altp2m_vcpu_enable_notify subop handler might skip calling
rcu_unlock_domain() after rcu_lock_current_domain().  Albeit since both
rcu functions are no-ops when run on the current domain, this doesn't
really have repercussions.

The second change is adding a missing break that would have potentially
enabled #VE for the current domain even if it had intended to enable it
for another one (not a supported functionality).
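A generic sketch of the second change, with hypothetical names (enum subop,
struct state, handle_subop()) that do not mirror the real handler: without
the break in the error path, control would carry on and enable the feature
for the current domain even though the request was rejected.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subops and state; not the real altp2m interface. */
enum subop { OP_QUERY, OP_ENABLE_NOTIFY };
struct state { int notify_enabled; };

static int handle_subop(struct state *cur, enum subop op, int target_is_current)
{
    int rc = 0;

    switch (op) {
    case OP_ENABLE_NOTIFY:
        if (!target_is_current) {
            rc = -EINVAL;
            break; /* omitting this would still enable it below */
        }
        cur->notify_enabled = 1;
        break;

    default:
        rc = -ENOSYS;
        break;
    }

    return rc;
}
```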

Signed-off-by: Adrian Pop <apop@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/shadow: correct SH_LINEAR mapping detection in sh_guess_wrmap()
Andrew Cooper [Thu, 16 Nov 2017 09:38:14 +0000 (10:38 +0100)]
x86/shadow: correct SH_LINEAR mapping detection in sh_guess_wrmap()

The fix for XSA-243 / CVE-2017-15592 (c/s bf2b4eadcf379) introduced a change
in behaviour for sh_guess_wrmap(), where it had to cope with no shadow linear
mapping being present.

As the name suggests, guest_vtable is a mapping of the guest's pagetable, not
Xen's pagetable, meaning that it isn't the pagetable we need to check for the
shadow linear slot in.

The practical upshot is that a shadow HVM vcpu which switches into 4-level
paging mode, with an L4 pagetable that contains a mapping which aliases Xen's
SH_LINEAR_PT_VIRT_START will fool the safety check for whether a SHADOW_LINEAR
mapping is present.  As the check passes (when it should have failed), Xen
subsequently falls over the missing mapping with a pagefault such as:

    (XEN) Pagetable walk from ffff8140a0503880:
    (XEN)  L4[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L3[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L2[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L1[0x103] = 0000000000000000 ffffffffffffffff

This is part of XSA-243.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
7 years agox86: don't wrongly trigger linear page table assertion
Jan Beulich [Thu, 16 Nov 2017 09:37:29 +0000 (10:37 +0100)]
x86: don't wrongly trigger linear page table assertion

_put_page_type() may do multiple iterations until its cmpxchg()
succeeds. It invokes set_tlbflush_timestamp() on the first
iteration, however. Code inside the function takes care of this, but
- the assertion in _put_final_page_type() would trigger on the second
  iteration if time stamps in a debug build are permitted to be
  sufficiently much wider than the default 6 bits (see WRAP_MASK in
  flushtlb.c),
- it returning -EINTR (for a continuation to be scheduled) would leave
  the page in an inconsistent state (until the re-invocation completes).
Make the set_tlbflush_timestamp() invocation conditional, bypassing it
(for now) only in the case we really can't tolerate the stamp to be
stored.

This is part of XSA-240.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>