xenbits.xensource.com Git - xen.git/log
Jan Beulich [Tue, 12 Dec 2017 13:29:45 +0000 (14:29 +0100)]
x86/shadow: fix ref-counting error handling

The old-Linux handling in shadow_set_l4e() mistakenly ORed together the
results of sh_get_ref() and sh_pin(). As the latter failing is not a
correctness problem, simply ignore its return value.
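The following sketch shows why OR-ing the two results was wrong. It uses hypothetical stand-ins for sh_get_ref() and sh_pin(), assumed here to return true on success and false on failure. It is a simplified model, not the Xen code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for sh_get_ref() and sh_pin(): each returns
 * true on success and false on failure. */
static bool fake_get_ref(bool succeed) { return succeed; }
static bool fake_pin(bool succeed)     { return succeed; }

/* Buggy pattern: OR-ing the results hides a failed reference
 * acquisition whenever pinning happens to succeed. */
static bool set_l4e_buggy(bool ref_ok, bool pin_ok)
{
    return fake_get_ref(ref_ok) | fake_pin(pin_ok);
}

/* Fixed pattern: only the reference result decides success. A pin
 * failure is not a correctness problem, so its result is ignored. */
static bool set_l4e_fixed(bool ref_ok, bool pin_ok)
{
    if ( !fake_get_ref(ref_ok) )
        return false;
    (void)fake_pin(pin_ok);
    return true;
}
```

With the buggy version, `set_l4e_buggy(false, true)` reports success even though no reference was taken.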

In sh_set_toplevel_shadow(), the entry must not be installed when
sh_get_ref() fails, even though the domain is crashed in that case.

This is XSA-250.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 12 Dec 2017 13:29:13 +0000 (14:29 +0100)]
x86/shadow: fix refcount overflow check

Commit c385d27079 ("x86 shadow: for multi-page shadows, explicitly track
the first page") reduced the refcount width to 25 bits, without adjusting the
overflow check. Eliminate the disconnect by using a manifest constant.
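The idea of a manifest constant can be sketched as follows. One hypothetical macro, SH_REFCOUNT_WIDTH, sizes the bitfield and also bounds the overflow check, so the two can no longer drift apart. The struct layout is illustrative and does not reproduce struct page_info.

```c
#include <assert.h>
#include <stdbool.h>

/* Single source of truth for the refcount bitfield width
 * (hypothetical name; 25 bits as described above). */
#define SH_REFCOUNT_WIDTH 25

struct shadow_page {
    unsigned int count : SH_REFCOUNT_WIDTH;  /* shadow refcount */
    unsigned int type  : 5;                  /* other packed state */
};

/* Take a reference without wrapping the bitfield. Because the check
 * uses the same constant as the field declaration, narrowing the
 * field automatically tightens the check. */
static bool get_ref(struct shadow_page *sp)
{
    unsigned long nx = sp->count + 1UL;

    if ( nx >= (1UL << SH_REFCOUNT_WIDTH) )
        return false;          /* would overflow: fail instead */
    sp->count = nx;
    return true;
}
```

Suppose the check had instead been hard-coded against 26 bits while the field held 25. A count of 2^25 - 1 could then be incremented, and storing 2^25 into the 25-bit field wraps it to 0.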

Interestingly, up to commit 047782fa01 ("Out-of-sync L1 shadows: OOS
snapshot") the refcount was 27 bits wide, yet the check was already
using 26.

This is XSA-249.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 12 Dec 2017 13:28:36 +0000 (14:28 +0100)]
x86/mm: don't wrongly set page ownership

PV domains can obtain mappings of any pages owned by the correct domain,
including ones that aren't actually assigned as "normal" RAM, but used
by Xen internally.  At the moment such "internal" pages marked as owned
by a guest include pages used to track logdirty bits, as well as p2m
pages and the "unpaged pagetable" for HVM guests. Since the PV memory
management and shadow code conflict in their use of struct page_info
fields, and since shadow code is being used for log-dirty handling for
PV domains, pages coming from the shadow pool must not, for PV domains,
have the domain set as their owner.

While the change could be done conditionally for just the PV case in
shadow code, do it unconditionally (and for consistency also for HAP),
just to be on the safe side.

There's one special case though for shadow code: The page table used for
running a HVM guest in unpaged mode is subject to get_page() (in
set_shadow_status()) and hence must have its owner set.

This is XSA-248.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 12 Dec 2017 13:27:34 +0000 (14:27 +0100)]
x86: don't wrongly trigger linear page table assertion (2)

_put_final_page_type(), when free_page_type() has exited early to allow
for preemption, should not update the time stamp, as the page continues
to retain the type which is in the process of being unvalidated. I can't
see why the time stamp update was put on that path in the first place
(although it may well have been me who put it there years ago).

This is part of XSA-240.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap.com>
Julien Grall [Wed, 1 Nov 2017 14:03:14 +0000 (14:03 +0000)]
xen/arm32: mm: Rework is_xen_heap_page to avoid nameclash

The arm32 version of the function is_xen_heap_page currently defines a
variable _mfn. This will lead to a compiler error once typesafe MFN is
used in a follow-up patch:

called object '_mfn' is not a function or function pointer

Fix it by renaming the local variable _mfn to mfn_.
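The error quoted above is what GCC reports when a local variable shadows a function of the same name and the code then tries to call it. The sketch below uses a minimal, hypothetical typesafe-MFN wrapper and a hypothetical is_heap_frame() helper. It shows the renamed local, mfn_, coexisting with the _mfn() constructor.

```c
#include <assert.h>

/* Minimal stand-in for a typesafe MFN: a raw frame number wrapped in
 * a struct (hypothetical, for illustration only). */
typedef struct { unsigned long mfn; } mfn_t;

static inline mfn_t _mfn(unsigned long m) { return (mfn_t){ m }; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }

/* If the local below were called _mfn, it would shadow the function,
 * and the call _mfn(...) would fail with "called object '_mfn' is not
 * a function or function pointer". Naming it mfn_ avoids the clash. */
static int is_heap_frame(unsigned long maddr, unsigned long heap_start,
                         unsigned long heap_end)
{
    unsigned long mfn_ = maddr >> 12;   /* assumes 4 KiB pages */
    mfn_t m = _mfn(mfn_);               /* still refers to the function */

    return mfn_x(m) >= heap_start && mfn_x(m) < heap_end;
}
```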

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 1 Nov 2017 14:03:13 +0000 (14:03 +0000)]
xen/arm: domain_build: Clean-up insert_11_bank

    - Remove spurious ()
    - Add missing spaces
    - Turn 1 << to 1UL <<
    - Rename spfn to smfn and switch to mfn_t

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andre Przywara [Thu, 7 Dec 2017 16:14:08 +0000 (16:14 +0000)]
ARM: VGIC: move gic_remove_irq_from_queues()

gic_remove_irq_from_queues() was not only misnamed, it also had the wrong
abstraction, as it should not live in gic.c.
Move it into vgic.c and vgic.h, where it belongs, and rename it along
the way.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 6 Dec 2017 14:51:37 +0000 (14:51 +0000)]
xen/arm: gic-v3: Bail out if gicv3_cpu_init fail

When system registers are not enabled, all accesses to them trap to
EL2. In Xen, system registers are enabled by gicv3_cpu_init only on
success. As the rest of the code (e.g. gicv3_hyp_init) relies on
system registers, it is better to bail out directly.

This will save time when debugging early boot issues on GICv3 platforms.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 29 Nov 2017 17:46:35 +0000 (17:46 +0000)]
xen/arm: Surround HSR_SYSREG macro value with ()

The value of the macro HSR_SYSREG is not surrounded by (). This means
the behavior may change depending on how it is used.

Thankfully, recent GCC will issue a warning for that.
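Here is a generic illustration of why a macro's value should be parenthesised. The macros below are hypothetical and are not the real HSR_SYSREG definition. Without parentheses, the expansion merges with the surrounding operators according to normal precedence rules.

```c
#include <assert.h>

/* Unparenthesised: the expansion merges with surrounding operators. */
#define WIDTH_BAD   2 + 3
/* Parenthesised: the macro always behaves as a single operand. */
#define WIDTH_GOOD  (2 + 3)

static int area_bad(void)  { return WIDTH_BAD * 4; }   /* 2 + 3 * 4 */
static int area_good(void) { return WIDTH_GOOD * 4; }  /* (2 + 3) * 4 */
```

`area_bad()` yields 14 rather than the intended 20, because multiplication binds tighter than the unprotected addition.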

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andre Przywara [Thu, 19 Oct 2017 12:48:37 +0000 (13:48 +0100)]
ARM: vGIC: fix nr_irq definition

The global variable "nr_irqs" is used for x86 and some common Xen code.
To make the latter work easily for ARM, it was #defined to NR_IRQS.
This not only violated the common habit of capitalizing macros, but
also caused issues if one wanted to use a rather innocent "nr_irqs" as
a local variable name or as a function parameter.
Drop the optimization and make nr_irqs a normal variable on ARM as well.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Andre Przywara [Thu, 19 Oct 2017 12:48:36 +0000 (13:48 +0100)]
ARM: remove unneeded gic.h inclusions

gic.h is supposed to hold defines and prototypes for the hardware side
of the GIC interrupt controller. A lot of parts in Xen should not be
bothered with that, as they either only care about the VGIC or use
more generic interfaces.
Remove the inclusions of gic.h from files that do not actually need
them.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Julien Grall [Wed, 29 Nov 2017 17:57:32 +0000 (17:57 +0000)]
xen/arm: bootfdt: Use proper default for #address-cells and #size-cells

Per the device-tree specification [1], when the #address-cells and
#size-cells properties are not present, the default values should be
2 and 1 respectively.

[1] https://www.devicetree.org/downloads/devicetree-specification-v0.1-20160524.pdf
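The Devicetree Specification states that a missing #address-cells defaults to 2 and a missing #size-cells defaults to 1. The sketch below applies those defaults and shows why they matter: they decide how many bytes a single (address, size) entry occupies in a child's "reg" property. The struct and function names are hypothetical and are not Xen's bootfdt API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Defaults from the Devicetree Specification for absent properties. */
#define DT_DEFAULT_ADDR_CELLS 2u
#define DT_DEFAULT_SIZE_CELLS 1u

/* Hypothetical parsed view of a node's optional cell properties. */
struct dt_cells_props {
    bool has_addr_cells;
    uint32_t addr_cells;
    bool has_size_cells;
    uint32_t size_cells;
};

static uint32_t addr_cells_of(const struct dt_cells_props *p)
{
    return p->has_addr_cells ? p->addr_cells : DT_DEFAULT_ADDR_CELLS;
}

static uint32_t size_cells_of(const struct dt_cells_props *p)
{
    return p->has_size_cells ? p->size_cells : DT_DEFAULT_SIZE_CELLS;
}

/* Each cell is one 32-bit value, so one (address, size) entry in a
 * child's "reg" property occupies (addr + size) * 4 bytes. */
static uint32_t reg_entry_bytes(const struct dt_cells_props *p)
{
    return (addr_cells_of(p) + size_cells_of(p)) * 4u;
}
```

With both properties absent, an entry is 12 bytes. Using the wrong defaults would misparse every "reg" entry under that node.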

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 7 Dec 2017 17:18:46 +0000 (17:18 +0000)]
xen/arm64: head.S: Introduce macro to load the physical address of a symbol

A lot of places in the ARM64 assembly code need to load the
physical address of a symbol. Rather than open-coding the translation,
introduce a new macro that loads the physical address of a symbol.

Lastly, use this new macro to replace all the current open-coded versions.

Note that most of the comments associated with the changed code have been
removed because the code is now self-explanatory.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Thu, 7 Dec 2017 17:19:11 +0000 (17:19 +0000)]
xen/arm: Remove unused fixmap slots

There are quite a few fixmap slots that have not been used for a while.
Remove them.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jan Beulich [Thu, 7 Dec 2017 10:10:12 +0000 (11:10 +0100)]
x86: rename DIRTY_GS_BASE_USER

As of commit 91f85280b9 ("x86: fix GS-base-dirty determination") the
USER part of it isn't really appropriate anymore.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 7 Dec 2017 10:09:31 +0000 (11:09 +0100)]
mm: don't use domain_shutdown() when re-offlining a page

It is entirely silent, leaving open what actually caused the crash.
Use domain_crash() instead, which leaves a log message before calling
domain_shutdown(..., SHUTDOWN_crash).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 7 Dec 2017 10:08:41 +0000 (11:08 +0100)]
pdx: correct indentation

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 6 Dec 2017 11:50:23 +0000 (12:50 +0100)]
x86/HVM: don't retain emulated insn cache when exiting back to guest

vio->mmio_retry is being set when a repeated string insn is being split
up. In that case we'll exit to the guest, expecting immediate re-entry.
Interruptions, however, may be serviced by the guest before re-entry
from the repeated string insn. Any emulation needed in the course of
handling the interruption must not fetch from the internally maintained
cache.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Tue, 5 Dec 2017 16:25:40 +0000 (17:25 +0100)]
drop stray .0 from hypervisor version

Jan Beulich [Tue, 5 Dec 2017 16:23:53 +0000 (17:23 +0100)]
x86: don't ignore foreigndom on L2/L3/L4 page table updates

Silently assuming DOMID_SELF is unlikely to be a good idea for page
table updates. For PGT_writable pages, though, it seems better to allow
the writes, so the same check isn't being applied there.

Also add blank lines between the individual case blocks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 5 Dec 2017 16:23:18 +0000 (17:23 +0100)]
x86: tighten MMU_*PT_UPDATE* check and combine error paths

Don't accept anything other than r/w RAM pages as page table pages and
move the paged-out check into the (unlikely) error path following that
check.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 5 Dec 2017 16:22:31 +0000 (17:22 +0100)]
x86/mm: drop yet another relic of translated PV domains from new_guest_cr3()

The function can be called for PV domains only, which commit 5a0b9fba92
("x86/mm: drop further relics of translated PV domains") sort of
realized, but not fully.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 5 Dec 2017 16:18:37 +0000 (17:18 +0100)]
x86/HVM: tighten re-issue check in hvmemul_do_io()

I'm not sure why we had left out the address check in case of indirect
accesses (where "data" holds a guest physical address).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Jan Beulich [Tue, 5 Dec 2017 16:17:57 +0000 (17:17 +0100)]
XSM/flask: constification of IRQ mapping interfaces

This clarifies that the involved structures are read-only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Jan Beulich [Tue, 5 Dec 2017 16:17:23 +0000 (17:17 +0100)]
x86/MSI: leverage local variables

... instead of using redundant calculations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Daniel Kiper [Tue, 5 Dec 2017 16:16:04 +0000 (17:16 +0100)]
efi: use ROUNDUP() macro instead of open code

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 4 Dec 2017 10:04:18 +0000 (11:04 +0100)]
gnttab: improve GNTTABOP_cache_flush locking

Dropping the lock before returning from grant_map_exists() means handing
possibly stale information back to the caller. Return the pointer
to the active entry instead, for the caller to release the lock once
done.
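The "return with the lock held" pattern can be sketched with POSIX mutexes. The struct and function names below are hypothetical, not the grant table code. The lookup keeps the entry's lock on success, and the caller releases it only after it has finished reading, so the information cannot go stale in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a grant table active entry. */
struct active_entry {
    pthread_mutex_t lock;
    unsigned long frame;
    int in_use;
};

/* On success, return the entry with its lock still held; the caller
 * must call release_entry() when done. On failure, return NULL with
 * no lock held. */
static struct active_entry *lookup_locked(struct active_entry *e)
{
    pthread_mutex_lock(&e->lock);
    if ( !e->in_use )
    {
        pthread_mutex_unlock(&e->lock);
        return NULL;
    }
    return e;   /* lock intentionally kept */
}

static void release_entry(struct active_entry *e)
{
    pthread_mutex_unlock(&e->lock);
}

/* Example caller: reads the frame while the lock is still held. */
static long frame_of(struct active_entry *e)
{
    struct active_entry *act = lookup_locked(e);
    long frame;

    if ( !act )
        return -1;
    frame = (long)act->frame;
    release_entry(act);
    return frame;
}
```

The design choice is that lock ownership moves across the function boundary. Every success path in the caller must therefore end with exactly one release.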

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jan Beulich [Mon, 4 Dec 2017 10:03:32 +0000 (11:03 +0100)]
gnttab: correct GNTTABOP_cache_flush empty batch handling

Jann validly points out that with a caller bogusly requesting a zero-
element batch with non-zero high command bits (the ones used for
continuation encoding), the assertion right before the call to
hypercall_create_continuation() would trigger. A similar situation would
arise afaict for non-empty batches with op and/or length zero in every
element.

While we want the former to succeed (as we do elsewhere for similar
no-op requests), the latter can clearly be converted to an error, as
this is a state that can't be the result of a prior operation.

Take the opportunity to also correct the order of argument checks:
We shouldn't accept zero-length elements with unknown bits set in "op".
Also constify cache_flush()'s first parameter.
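The corrected check order can be sketched as a per-element validator. The flag names, bit values and struct below are hypothetical and do not reproduce the real hypercall interface. The point is that unknown op bits are rejected before the "zero length or no op means nothing to do" shortcut, so a zero-length element cannot carry unknown bits through.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical op bits and page size for illustration only. */
#define FLUSH_CLEAN  (1u << 0)
#define FLUSH_INVAL  (1u << 1)
#define FLUSH_KNOWN  (FLUSH_CLEAN | FLUSH_INVAL)
#define PAGE_BYTES   4096u

struct flush_elem {
    uint32_t op;
    uint16_t offset;
    uint16_t length;
};

/* Returns 1 if there is work to do, 0 for a valid no-op element, or
 * -EINVAL. Unknown bits are checked first, then the range, and only
 * then the no-op shortcut. */
static int check_flush_elem(const struct flush_elem *e)
{
    if ( e->op & ~FLUSH_KNOWN )
        return -EINVAL;
    if ( e->offset >= PAGE_BYTES || e->length > PAGE_BYTES - e->offset )
        return -EINVAL;
    if ( e->op == 0 || e->length == 0 )
        return 0;
    return 1;
}
```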

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Roger Pau Monné [Mon, 4 Dec 2017 10:02:46 +0000 (11:02 +0100)]
pci: introduce a type to store a SBDF

This type provides direct access to all the members that constitute a SBDF.
The only function switched to use it is hvm_pci_decode_addr, because
it makes following patches simpler.
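The sketch below packs a PCI segment/bus/device/function tuple into one 32-bit value, using the conventional layout seg[31:16], bus[15:8], dev[7:3], func[2:0]. It uses explicit shifts and masks so the layout does not depend on the compiler. A union of bitfields would also work, but its bit order is implementation-defined. The type and helper names are hypothetical and are not Xen's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* One 32-bit value holding seg[31:16] bus[15:8] dev[7:3] func[2:0]. */
typedef struct { uint32_t sbdf; } sbdf_t;

static inline sbdf_t make_sbdf(uint16_t seg, uint8_t bus, uint8_t dev,
                               uint8_t func)
{
    return (sbdf_t){ ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
                     ((uint32_t)(dev & 0x1f) << 3) | (func & 0x7) };
}

static inline uint16_t sbdf_seg(sbdf_t s)  { return s.sbdf >> 16; }
static inline uint8_t  sbdf_bus(sbdf_t s)  { return (s.sbdf >> 8) & 0xff; }
static inline uint8_t  sbdf_dev(sbdf_t s)  { return (s.sbdf >> 3) & 0x1f; }
static inline uint8_t  sbdf_func(sbdf_t s) { return s.sbdf & 0x7; }
/* The low 16 bits form the BDF; dev and func together form devfn. */
static inline uint16_t sbdf_bdf(sbdf_t s)  { return s.sbdf & 0xffff; }
```

For example, segment 1, bus 0x3a, device 0x1f, function 5 packs to 0x13afd.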

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Roger Pau Monné [Mon, 4 Dec 2017 10:02:16 +0000 (11:02 +0100)]
x86/pio: allow internal PIO handlers to return RETRY

Fix handle_pio so that internal PIO handlers can return X86EMUL_RETRY and
have it handled properly, i.e. without advancing the IP.
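The intended handling can be sketched as follows. The enum, struct and finish_pio() are hypothetical simplifications, not Xen's handle_pio(). Only a successful emulation retires the instruction by advancing the IP. A RETRY leaves the IP untouched, so the same IN/OUT instruction is executed again later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of emulation return codes. */
enum emul_rc { EMUL_OKAY, EMUL_RETRY, EMUL_UNHANDLEABLE };

struct vcpu_regs { uint64_t ip; };

/* Returns 1 if the exit was handled and 0 on failure. The IP is
 * advanced only when the port access has actually completed. */
static int finish_pio(struct vcpu_regs *regs, unsigned int insn_len,
                      enum emul_rc rc)
{
    switch ( rc )
    {
    case EMUL_OKAY:
        regs->ip += insn_len;   /* retire the instruction */
        return 1;
    case EMUL_RETRY:
        return 1;               /* handled, instruction not retired */
    default:
        return 0;               /* caller must deal with the failure */
    }
}
```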

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Gregory Herrero [Mon, 4 Dec 2017 10:01:48 +0000 (11:01 +0100)]
libelf: allow having HYPERCALL_PAGE entry before VIRT_BASE in __xen_guest section

When filling the __xen_guest section of a guest, the user may define
HYPERCALL_PAGE earlier than VIRT_BASE in the section, leading to an
incorrect hypercall page address, since a not-yet-defined virt_base could
be used to compute it.
If there is no VIRT_BASE entry in the __xen_guest section, a default value
of 0 is used for virt_base. Thus, setting the hypercall page address to
the HYPERCALL_PAGE value is correct in this case too.
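Order independence can be sketched by recording the raw HYPERCALL_PAGE value while scanning the entries and combining it with VIRT_BASE only after all entries have been seen. The key/value struct and hypercall_addr() are hypothetical. The formula (VIRT_BASE plus the HYPERCALL_PAGE value shifted by a 4 KiB page size) is an assumption for illustration, and the exact encoding libelf uses may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One KEY=VALUE entry of a __xen_guest-style string, already split
 * (hypothetical parsed form). */
struct guest_kv { const char *key; const char *value; };

/* Scan every entry first and combine afterwards, so the result does
 * not depend on whether VIRT_BASE appears before or after
 * HYPERCALL_PAGE. Assumed formula: virt_base + (page << 12). */
static uint64_t hypercall_addr(const struct guest_kv *kv, size_t n)
{
    uint64_t virt_base = 0;      /* default when VIRT_BASE is absent */
    uint64_t hcall_page = 0;
    size_t i;

    for ( i = 0; i < n; i++ )
    {
        if ( !strcmp(kv[i].key, "VIRT_BASE") )
            virt_base = strtoull(kv[i].value, NULL, 0);
        else if ( !strcmp(kv[i].key, "HYPERCALL_PAGE") )
            hcall_page = strtoull(kv[i].value, NULL, 0);
    }

    return virt_base + (hcall_page << 12);
}
```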

Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Zhenzhong Duan [Mon, 4 Dec 2017 10:01:24 +0000 (11:01 +0100)]
x86/physdev: remove redundant code in branch MAP_PIRQ_TYPE_MSI

The same code is already present in allocate_and_map_msi_pirq().

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
David Esler [Mon, 4 Dec 2017 10:00:24 +0000 (11:00 +0100)]
x86/boot: rename send_chr to print_err

The send_chr function sends an entire C string rather than a single
character, and it no longer necessarily sends it over the serial UART,
so rename it to print_err, which is closer to what it does.

Signed-off-by: David Esler <drumandstrum@gmail.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Brian Woods [Thu, 16 Nov 2017 22:11:15 +0000 (16:11 -0600)]
x86/svm: Add virtual GIF support

This patch detects and enables Virtual GIF if available.  This allows
a nested hypervisor to perform STGIs and CLGIs without having to be
intercepted by the host hypervisor.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Brian Woods [Thu, 16 Nov 2017 22:11:14 +0000 (16:11 -0600)]
x86/svm: Add virtual GIF feature definition

Add support for enabling the virtual GIF feature.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 28 Nov 2017 18:48:07 +0000 (18:48 +0000)]
x86/traps: Drop redundant printk() in fatal_trap()

show_page_walk() already prints the linear address of the walk, and
show_execution_state() has printed a raw %cr2 value.  This avoids having
two adjacent log lines with identical information.

  (XEN) Faulting linear address: 00000000025ff028
  (XEN) Pagetable walk from 00000000025ff028:
  ...

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 20 Nov 2017 13:18:45 +0000 (13:18 +0000)]
x86/vmx: Drop more PVHv1 remenants

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Boris Ostrovsky [Thu, 9 Nov 2017 15:37:53 +0000 (10:37 -0500)]
x86/pvh: Do not add DSDT and FACS to PVH dom0 XSDT

These tables are pointed to from FADT. Adding them will
result in duplicate entries in the guest's tables.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Euan Harris [Thu, 26 Oct 2017 17:03:11 +0000 (18:03 +0100)]
x86/vvmx: Remove enum vmx_regs_enc

This is the standard register encoding, is not VVMX-specific and is only
used in a couple of places.

Signed-off-by: Euan Harris <euan.harris@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Sergey Dyasli [Mon, 23 Oct 2017 09:33:02 +0000 (10:33 +0100)]
x86/vvmx: don't enable vmcs shadowing for nested guests

Running "./xtf_runner vvmx" in L1 Xen under L0 Xen produces the
following result on H/W with VMCS shadowing:

    Test: vmxon
    Failure in test_vmxon_in_root_cpl0()
      Expected 0x8200000f: VMfailValid(15) VMXON_IN_ROOT
           Got 0x82004400: VMfailValid(17408) <unknown>
    Test result: FAILURE

This happens because the SDM allows VM entries with the VMCS shadowing
VM-execution control enabled and a VMCS link pointer value of ~0ull, but
the results of a nested VMREAD are undefined in such cases.

Fix this by not copying the value of VMCS shadowing control from vmcs01
to vmcs02.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Brian Woods [Tue, 31 Oct 2017 22:03:08 +0000 (17:03 -0500)]
x86/svm: add virtual VMLOAD/VMSAVE support

On AMD family 17h server processors, there is a feature called virtual
VMLOAD/VMSAVE.  This allows a nested hypervisor to perform a VMLOAD or
VMSAVE without needing to be intercepted by the host hypervisor.
Virtual VMLOAD/VMSAVE requires the host hypervisor to be in long mode
and nested page tables to be enabled.  For more information about it
please see:

AMD64 Architecture Programmer’s Manual Volume 2: System Programming
http://support.amd.com/TechDocs/24593.pdf
Section: VMSAVE and VMLOAD Virtualization (Section 15.33.1)

This patch series adds support to check for and enable the virtual
VMLOAD/VMSAVE features if available.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Brian Woods [Tue, 31 Oct 2017 22:03:07 +0000 (17:03 -0500)]
x86/svm: add virtual VMLOAD/VMSAVE feature definition

Add support for enabling the virtual VMLOAD/VMSAVE feature.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Brian Woods [Tue, 31 Oct 2017 22:03:06 +0000 (17:03 -0500)]
x86/svm: rename lbr control field in vmcb

Rename the lbr_control field in the vmcb for future/upcoming changes.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Haozhong Zhang [Mon, 11 Sep 2017 04:37:43 +0000 (12:37 +0800)]
x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()

Replace pdx_to_page(pfn_to_pdx(pfn)) with mfn_to_page(pfn), which is
equivalent to the former.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 20 Oct 2017 13:56:23 +0000 (14:56 +0100)]
x86/vmx: Don't use rdmsr() to fill HOST_SYSENTER_{CS,EIP}

These are compile-time constants, and don't need to be read back from
hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 17 Oct 2017 17:06:23 +0000 (18:06 +0100)]
x86/vmx: Don't rewrite HOST_TR_SELECTOR on every context switch

TSS_ENTRY is a compile time constant, so HOST_TR_SELECTOR can be set up during
VMCS construction and left alone thereafter, rather than rewriting it on every
context switch.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 3 Oct 2017 18:46:40 +0000 (19:46 +0100)]
x86/pv: Misc improvements to pv_destroy_gdt()

Hoist the l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO) calculation out of the
loop, and switch the code over to using mfn_t.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 3 Oct 2017 15:30:54 +0000 (15:30 +0000)]
x86/pv: Use DIV_ROUND_UP() when converting between GDT entries and frames

Also consistently use nr_frames, rather than mixing nr_pages with a
frames[] array.
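Converting a GDT entry count into a frame count is a round-up division. The sketch assumes the common definition DIV_ROUND_UP(n, d) = ((n) + (d) - 1) / (d), 8-byte descriptors, and 4 KiB frames, which gives 512 entries per frame. gdt_nr_frames() is a hypothetical helper name.

```c
#include <assert.h>

/* Round-up integer division, in the usual form of such helpers. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

#define PAGE_BYTES        4096u
#define DESC_BYTES        8u                        /* one descriptor */
#define ENTRIES_PER_FRAME (PAGE_BYTES / DESC_BYTES) /* 512 */

/* Number of frames needed to hold nr_entries descriptors. */
static unsigned int gdt_nr_frames(unsigned int nr_entries)
{
    return DIV_ROUND_UP(nr_entries, ENTRIES_PER_FRAME);
}
```

A full frame of 512 entries needs exactly one frame, and a single extra entry, 513, spills into a second frame.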

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 3 Oct 2017 15:30:01 +0000 (15:30 +0000)]
x86/pv: Move compat_set_gdt() to be beside do_set_gdt()

This also makes the do_update_descriptor() pair of functions adjacent.

Purely code motion; no functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 13 Oct 2017 10:55:00 +0000 (10:55 +0000)]
x86/pv: Factor out the calculation of LDT/GDT descriptor pointers

This avoids open-coding it in two places.  While only used in the PV emulation
code, this helper is in principle usable anywhere in the hypervisor.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 16 Oct 2017 13:20:07 +0000 (13:20 +0000)]
xen/pv: Construct d0v0's GDT properly

c/s cf6d39f8199 "x86/PV: properly populate descriptor tables" changed the GDT
to reference zero_page for intermediate frames between the guest and Xen
frames.

Because dom0_construct_pv() doesn't call arch_set_info_guest(), some bits of
initialisation are missed, including the pv_destroy_gdt() call which initially
fills the references to zero_page.

In practice, this means there is a window between d0v0 starting and its first
call to HYPERCALL_set_gdt() where lar/lsl/verr/verw suffer non-architectural
behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
This probably wants backporting to Xen 4.7 and later.

Andrew Cooper [Mon, 2 Oct 2017 14:13:38 +0000 (14:13 +0000)]
x86/ldt: Alter how invalidate_shadow_ldt() deals with TLB flushes

Modify invalidate_shadow_ldt() to return a boolean indicating whether mappings
have been dropped, rather than taking a flush parameter.  Tweak the internal
logic to be able to ASSERT() that v->arch.pv_vcpu.shadow_ldt_mapcnt matches
the number of PTEs removed.

This allows MMUEXTOP_SET_LDT to avoid a local TLB flush if no LDT entries had
been faulted in to begin with.
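The new calling convention can be sketched as follows. The invalidation routine reports whether it dropped any mappings, and asserts that its bookkeeping count matches what it removed. The caller then flushes only when that is true. All names, the slot count and the flush counter are hypothetical simplifications of the real code.

```c
#include <assert.h>
#include <stdbool.h>

#define LDT_SLOTS 16   /* hypothetical number of shadow LDT slots */

struct shadow_ldt {
    bool mapped[LDT_SLOTS];   /* which slots have a live mapping */
    unsigned int mapcnt;      /* must equal the number of true slots */
};

static unsigned int tlb_flushes;   /* counts simulated local flushes */
static void flush_tlb_local(void) { tlb_flushes++; }

/* Drop all shadow LDT mappings and report whether any existed,
 * rather than taking a "flush" parameter. */
static bool invalidate_shadow_ldt_sketch(struct shadow_ldt *l)
{
    unsigned int i, removed = 0;

    for ( i = 0; i < LDT_SLOTS; i++ )
    {
        if ( l->mapped[i] )
        {
            l->mapped[i] = false;
            removed++;
        }
    }

    /* The bookkeeping must match what was actually torn down. */
    assert(removed == l->mapcnt);
    l->mapcnt = 0;

    return removed != 0;
}

/* Caller: only pay for a local TLB flush if mappings were dropped. */
static void set_ldt(struct shadow_ldt *l)
{
    if ( invalidate_shadow_ldt_sketch(l) )
        flush_tlb_local();
}
```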

Finally, correct a comment in __get_page_type().  Under no circumstance is it
safe to forgo the TLB shootdown for GDT/LDT pages, as that would allow one
vcpu to gain a writeable mapping to a frame still mapped as a GDT/LDT by
another vcpu.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 2 Oct 2017 13:58:17 +0000 (13:58 +0000)]
xen/x86: Introduce static inline wrappers for l{idt,gdt,ldt,tr}()

This avoids indirection and parameter constraint issues.  Doing so relaxes the
load_LDT() constraints from %ax to any general purpose register.  The helpers
are upgraded to full compiler barriers, because nothing good will come of
having these reordered with respect to other segment accesses.

The triple-fault reboot method stays as is, to avoid the int3 possibly getting
moved relative to the lidt.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 2 Oct 2017 13:50:05 +0000 (13:50 +0000)]
x86/smp: Rework cpu_smpboot_alloc() to cope with more than just -ENOMEM

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Paul Durrant [Tue, 28 Nov 2017 14:05:19 +0000 (14:05 +0000)]
x86/hvm: fix interaction between internal and external emulation

A call to handle_hvm_io_completion() is needed for completing I/O
that requires external emulation. Such completion should be requested when
hvm_vcpu_io_need_completion() returns true after hvm_emulate_once() has
completed. This is indicative of the underlying I/O emulation having
returned X86EMUL_RETRY and hence a re-emulation of the instruction is
needed to pick up the result of the I/O.

A call to handle_hvm_io_completion() is NOT needed when the underlying
I/O has not returned X86EMUL_RETRY since there will be no result to pick
up. Hence it is bogus to request such completion when mmio_retry is set,
since this can only happen if the underlying I/O emulation has returned
X86EMUL_OKAY (meaning the I/O has completed successfully).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
Andrew Cooper [Sat, 25 Nov 2017 15:17:14 +0000 (15:17 +0000)]
x86: Avoid corruption on migrate for vcpus using CPUID Faulting

Xen 4.8 and later virtualises CPUID Faulting support for guests.  However, the
value of MSR_MISC_FEATURES_ENABLES is omitted from the vcpu state, meaning
that the current cpuid faulting setting is lost on migrate/suspend/resume.

Instead of following the MSR status quo, take the opportunity to make the
logic more generic, and in particular, trivial to extend for future MSRs.

This is done by discarding the notion of optional MSRs, and requiring the
toolstack to be prepared to move all of the MSRs, although only a subset will
typically need to move.

This allows for the use of guest_{rd,wr}msr() alone to evaluate whether an MSR
needs moving.  This is a benefit because it means there is a single piece of
logic responsible for evaluating whether a guest can use an MSR, and which
values are acceptable.

One small adjustment to guest_wrmsr() is required to cope with being called in
toolstack context.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
Ian Jackson [Fri, 1 Dec 2017 15:06:11 +0000 (15:06 +0000)]
README, Makefiles, Config.mk: Update for branching 4.10 vs 4.11-unstable

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Andrew Cooper [Wed, 29 Nov 2017 11:45:02 +0000 (11:45 +0000)] (tag: 4.10.0-rc7)
Revert "xen/arm: domain_builder: irq sanity check logic fix"

This reverts commit 11e7dd958de73a45645bd40d82280660bd2c9ee8.

It breaks boot on ARM.

Reported-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Stewart Hildebrand [Tue, 28 Nov 2017 14:42:03 +0000 (14:42 +0000)]
xen/arm: domain_builder: irq sanity check logic fix

It's not possible for an irq to be both below 16 and greater than or equal
to 32, so the check as written can never trigger.
Also fix the reference to the Linux documentation while we're at it.
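To make the logic error concrete, here is a minimal sketch, assuming the check is meant to insist on a GIC PPI (interrupt IDs 16 to 31). The function names are hypothetical and the real domain builder code is not reproduced. As recorded above in this log, this commit was subsequently reverted because it broke boot on ARM.

```c
#include <stdbool.h>

/*
 * Original form: no integer is both below 16 and at least 32, so this
 * condition is always false and the sanity check never fires.
 */
static bool irq_check_broken(unsigned int irq)
{
    return irq < 16 && irq >= 32;
}

/* Intended form: flag anything outside the PPI range 16..31. */
static bool irq_is_not_ppi(unsigned int irq)
{
    return irq < 16 || irq >= 32;
}
```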

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoarm64: ITS: fix cacheability adjustment
Andre Przywara [Thu, 16 Nov 2017 12:02:35 +0000 (12:02 +0000)]
arm64: ITS: fix cacheability adjustment

If the host GICv3 redistributor reports that the pending table cannot
use shareable memory, we try to drop the cacheability attributes as
well. However, we get basic (computer science 101) bit masking wrong
there, effectively clearing the whole register instead of just a few
bits.
Fix this by removing the one redundant masking operation and adding the
missing negation to the other operation, which is actually needed.
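A sketch of the bug and the fix, assuming a register layout loosely modelled on GICR_PENDBASER (InnerCache in bits [9:7], Shareability in bits [11:10]) and a non-cacheable encoding of 1. These constants and function names are illustrative assumptions, not Xen's definitions.

```c
#include <stdint.h>

/* Assumed field layout, loosely modelled on GICR_PENDBASER. */
#define INNER_CACHE_SHIFT 7
#define INNER_CACHE_MASK  (UINT64_C(0x7) << INNER_CACHE_SHIFT)
#define SHAREABILITY_MASK (UINT64_C(0x3) << 10)
#define CACHE_nC          UINT64_C(1)   /* assumed non-cacheable encoding */

/* Buggy shape: shareability already reads 0, so the first AND wipes reg. */
static uint64_t adjust_buggy(uint64_t reg)
{
    if ( !(reg & SHAREABILITY_MASK) )
    {
        reg &= SHAREABILITY_MASK;   /* clears every bit */
        reg &= INNER_CACHE_MASK;    /* would keep only the field to clear */
        reg |= CACHE_nC << INNER_CACHE_SHIFT;
    }
    return reg;
}

/* Fixed shape: drop the redundant AND, negate the mask on the other one. */
static uint64_t adjust_fixed(uint64_t reg)
{
    if ( !(reg & SHAREABILITY_MASK) )
    {
        reg &= ~INNER_CACHE_MASK;   /* clear only the cacheability bits */
        reg |= CACHE_nC << INNER_CACHE_SHIFT;
    }
    return reg;
}
```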

Reported-by: Manish Jaggi <manish.jaggi@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools: xentoolcore_restrict_all: Do deregistration before close
Ian Jackson [Tue, 14 Nov 2017 12:15:42 +0000 (12:15 +0000)]
tools: xentoolcore_restrict_all: Do deregistration before close

Closing the fd before unhooking it from the list runs the risk that a
concurrent thread calling xentoolcore_restrict_all will operate on the
old fd value, which might refer to a new fd by then.  So we need to do
it in the other order.

Sadly this weakens the guarantee provided by xentoolcore_restrict_all
slightly, but not (I think) in a problematic way.  It would be
possible to implement the previous guarantee, but it would involve
replacing all of the close() calls in all of the individual osdep
parts of all of the individual libraries with calls to a new function
which does
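   int nullfd = open("/dev/null", O_RDWR);
   dup2(nullfd, thing->fd);   /* keep the fd number allocated, but inert */
   close(nullfd);
   pthread_mutex_lock(&handles_lock);
   int fd = thing->fd;        /* unhook under the lock */
   thing->fd = -1;
   pthread_mutex_unlock(&handles_lock);
   close(fd);                 /* only now may the number be reused */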
which would be terribly tedious.
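The deregister-then-close ordering can be sketched with a mutex-protected list, assuming a much simplified registry. `struct sketch_handle` and the `sketch_*` functions are hypothetical, not xentoolcore's API.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical registry of active handles. */
struct sketch_handle {
    int fd;
    struct sketch_handle *next;
};

static struct sketch_handle *sketch_handles;
static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

static void sketch_register(struct sketch_handle *h)
{
    pthread_mutex_lock(&sketch_lock);
    h->next = sketch_handles;
    sketch_handles = h;
    pthread_mutex_unlock(&sketch_lock);
}

static void sketch_deregister(struct sketch_handle *h)
{
    pthread_mutex_lock(&sketch_lock);
    for ( struct sketch_handle **pp = &sketch_handles; *pp; pp = &(*pp)->next )
        if ( *pp == h )
        {
            *pp = h->next;
            break;
        }
    pthread_mutex_unlock(&sketch_lock);
}

/* Number of handles a concurrent "restrict all" pass would visit. */
static size_t sketch_count(void)
{
    size_t n = 0;

    pthread_mutex_lock(&sketch_lock);
    for ( const struct sketch_handle *h = sketch_handles; h; h = h->next )
        n++;
    pthread_mutex_unlock(&sketch_lock);
    return n;
}

/*
 * Unhook first, then close: once the handle is off the list, no other
 * thread can pick up its fd number, so the number being reused by a
 * later open() cannot be observed through the registry.
 */
static int sketch_close(struct sketch_handle *h)
{
    sketch_deregister(h);
    return close(h->fd);
}
```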

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoimprove XENMEM_add_to_physmap_batch address checking
Jan Beulich [Tue, 28 Nov 2017 12:15:12 +0000 (13:15 +0100)]
improve XENMEM_add_to_physmap_batch address checking

As a follow-up to XSA-212, we should have addressed a similar issue here:
because the handles are advanced at the top of
xenmem_add_to_physmap_batch(), suitably crafted input arguments allow
hypervisor space accesses (in particular, writes for "errs"). This isn't
a security issue in this case because of the limited width of struct
xen_add_to_physmap_batch's size field: since it is only 16 bits wide,
only the r/o M2P area can be accessed. Still, we can and should do better.
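One way to write the kind of range check this calls for, as a sketch. `sketch_range_ok()` and its parameters are hypothetical (Xen has its own guest-handle checking helpers, not reproduced here). It validates the whole `[base, base + count * elem_size)` span before any handle is advanced, and is written so that neither the multiplication nor the addition can wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: does [base, base + count * elem_size) lie entirely
 * at or below `limit`? Dividing instead of multiplying avoids overflow.
 */
static bool sketch_range_ok(uint64_t base, uint64_t count, uint64_t elem_size,
                            uint64_t limit)
{
    if ( base > limit )
        return false;
    /* count * elem_size <= limit - base, rearranged to avoid wrapping. */
    if ( elem_size && count > (limit - base) / elem_size )
        return false;
    return true;
}
```

A naive `base + count * elem_size <= limit` would accept the fourth case in the tests, because the product wraps around to a small value.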

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: check paging mode earlier in xenmem_add_to_physmap_one()
Jan Beulich [Tue, 28 Nov 2017 12:14:43 +0000 (13:14 +0100)]
x86: check paging mode earlier in xenmem_add_to_physmap_one()

There's no point in deferring this until after some initial processing,
and it's actively wrong for the XENMAPSPACE_gmfn_foreign handling to not
have such a check at all.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: replace bad ASSERT() in xenmem_add_to_physmap_one()
Jan Beulich [Tue, 28 Nov 2017 12:14:10 +0000 (13:14 +0100)]
x86: replace bad ASSERT() in xenmem_add_to_physmap_one()

No locks are held, i.e. the assertion can be triggered by racy
hypercall invocations. Subsequent code doesn't really depend on the
checked values, so this is not a security issue.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agop2m: Check return value of p2m_set_entry() when decreasing reservation
George Dunlap [Tue, 28 Nov 2017 12:13:26 +0000 (13:13 +0100)]
p2m: Check return value of p2m_set_entry() when decreasing reservation

If the entire range specified to p2m_pod_decrease_reservation() is marked
populate-on-demand, then it will make a single p2m_set_entry() call,
reducing its PoD entry count.

Unfortunately, in the right circumstances, this p2m_set_entry() call
may fail.  In that case, repeated calls to decrease_reservation() may
cause p2m->pod.entry_count to fall below zero, potentially tripping
over BUG_ON()s to the contrary.

Instead, check whether the p2m_set_entry() call succeeded, and return
false if not.
The caller will then call guest_remove_page() on the gfns, which will
return -EINVAL upon finding no valid memory there to return.

Unfortunately if the order > 0, the entry may have partially changed.
A domain_crash() is probably the safest thing in that case.

Other p2m_set_entry() calls in the same function should be fine,
because they are writing the entry at its current order.  Nonetheless,
check the return value and crash if our assumption turns out to be
wrong.
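The resulting error handling can be sketched as below, assuming a heavily simplified domain. `struct sketch_domain`, `sketch_set_entry()` (standing in for p2m_set_entry() and failing when the p2m pool is exhausted) and `sketch_release_pod_range()` are hypothetical, and setting `crashed` stands in for domain_crash().

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified domain state. */
struct sketch_domain {
    bool crashed;                 /* stand-in for domain_crash() */
    long pod_entry_count;         /* outstanding PoD entries */
    unsigned int p2m_pool_pages;  /* pages left for new page-table levels */
};

/*
 * Stand-in for p2m_set_entry(): an update that must first split a larger
 * entry needs a fresh page-table page, which fails once the pool is empty.
 */
static int sketch_set_entry(struct sketch_domain *d, bool needs_split)
{
    if ( needs_split )
    {
        if ( d->p2m_pool_pages == 0 )
            return -ENOMEM;
        d->p2m_pool_pages--;
    }
    return 0;
}

/*
 * Release a PoD range of 2^order pages. Returns false if the caller must
 * fall back to per-page removal; the entry count is only adjusted once
 * the p2m update is known to have succeeded.
 */
static bool sketch_release_pod_range(struct sketch_domain *d,
                                     unsigned int order, bool needs_split)
{
    if ( sketch_set_entry(d, needs_split) != 0 )
    {
        /* A multi-page entry may have been partially changed: crash. */
        if ( order > 0 )
            d->crashed = true;
        return false;
    }
    d->pod_entry_count -= 1L << order;
    return true;
}
```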

This is part of XSA-247.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agop2m: Always check to see if removing a p2m entry actually worked
George Dunlap [Tue, 28 Nov 2017 12:13:03 +0000 (13:13 +0100)]
p2m: Always check to see if removing a p2m entry actually worked

The PoD zero-check functions speculatively remove memory from the p2m,
then check to see if it's completely zeroed, before putting it in the
cache.

Unfortunately, the p2m_set_entry() calls may fail if the underlying
pagetable structure needs to change and the domain has exhausted its
p2m memory pool: for instance, if we're removing a 2MiB region out of
a 1GiB entry (in the p2m_pod_zero_check_superpage() case), or a 4k
region out of a 2MiB or larger entry (in the p2m_pod_zero_check()
case); and the return value is not checked.

The underlying mfn will then be added into the PoD cache, and at some
point mapped into another location in the p2m.  If the guest
afterwards balloons out this memory, it will be freed to the hypervisor
and potentially reused by another domain, in spite of the fact that
the original domain still has writable mappings to it.

There are several places where p2m_set_entry() shouldn't be able to
fail, as it is guaranteed to write an entry of the same order that
succeeded before.  Add a backstop of crashing the domain just in case,
and an ASSERT_UNREACHABLE() to flag up the broken assumption on debug
builds.

While we're here, use PAGE_ORDER_2M rather than a magic constant.

This is part of XSA-247.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pod: prevent infinite loop when shattering large pages
Julien Grall [Tue, 28 Nov 2017 12:11:55 +0000 (13:11 +0100)]
x86/pod: prevent infinite loop when shattering large pages

When populating pages, the PoD may need to split large ones using
p2m_set_entry and request the caller to retry (see ept_get_entry for
instance).

p2m_set_entry may fail to shatter if it is not possible to allocate
memory for the new page table. However, the error is not propagated,
resulting in the callers retrying the PoD operation forever.

Prevent the infinite loop by returning false when it is not possible to
shatter the large mapping.
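A minimal model of the retry protocol, assuming a single p2m slot. `struct sketch_p2m` and the `sketch_*` functions are hypothetical. Before the fix, a failed shatter behaved like `SK_RETRY` with the superpage still in place, so the caller's loop never terminated; reporting `SK_FAILED` lets it stop.

```c
#include <stdbool.h>

/* Hypothetical model of a p2m slot that may be covered by a superpage. */
struct sketch_p2m {
    bool superpage;         /* slot currently covered by a large mapping */
    unsigned int free_pt;   /* page-table pages available for shattering */
};

enum sketch_result { SK_MAPPED, SK_RETRY, SK_FAILED };

/* One populate attempt: shatter a superpage first, then ask for a retry. */
static enum sketch_result sketch_populate_once(struct sketch_p2m *p2m)
{
    if ( !p2m->superpage )
        return SK_MAPPED;
    if ( p2m->free_pt == 0 )
        return SK_FAILED;   /* propagate the shatter failure */
    p2m->free_pt--;
    p2m->superpage = false;
    return SK_RETRY;
}

/* Caller: retries only after a shatter has actually made progress. */
static bool sketch_populate(struct sketch_p2m *p2m, unsigned int *attempts)
{
    enum sketch_result r;

    *attempts = 0;
    do {
        r = sketch_populate_once(p2m);
        ++*attempts;
    } while ( r == SK_RETRY );

    return r == SK_MAPPED;
}
```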

This is XSA-246.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoSUPPORT.md: Add statement on PCI passthrough
George Dunlap [Wed, 22 Nov 2017 19:19:04 +0000 (19:19 +0000)]
SUPPORT.md: Add statement on PCI passthrough

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add secondary memory management features
George Dunlap [Wed, 22 Nov 2017 19:19:04 +0000 (19:19 +0000)]
SUPPORT.md: Add secondary memory management features

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add Security-related features
George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add Security-related features

Driver domains are excluded: they depend on PCI passthrough and will
be introduced later.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
7 years agoSUPPORT.md: Add 'easy' HA / FT features
George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add 'easy' HA / FT features

Migration is one of the key 'non-easy' ones and will be added later.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add Debugging, analysis, crash post-mortem
George Dunlap [Wed, 22 Nov 2017 19:19:03 +0000 (19:19 +0000)]
SUPPORT.md: Add Debugging, analysis, crash post-mortem

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add ARM-specific virtual hardware
George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add ARM-specific virtual hardware

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoSUPPORT.md: Add x86-specific virtual hardware
George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add x86-specific virtual hardware

x86-specific virtual hardware provided by the hypervisor, toolstack,
or QEMU.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
7 years agoSUPPORT.md: Add virtual devices common to ARM and x86
George Dunlap [Wed, 22 Nov 2017 19:19:02 +0000 (19:19 +0000)]
SUPPORT.md: Add virtual devices common to ARM and x86

Mostly PV protocols.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Toolstack core
George Dunlap [Wed, 22 Nov 2017 19:19:01 +0000 (19:19 +0000)]
SUPPORT.md: Toolstack core

For now only include xl-specific features, or interaction with the
system.  A feature support matrix will be added when features are
mentioned.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agoSUPPORT.md: Add scalability features
George Dunlap [Wed, 22 Nov 2017 19:19:01 +0000 (19:19 +0000)]
SUPPORT.md: Add scalability features

Superpage support and PVHVM.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoSUPPORT.md: Add core ARM features
George Dunlap [Thu, 23 Nov 2017 17:32:16 +0000 (17:32 +0000)]
SUPPORT.md: Add core ARM features

Hardware support and guest type.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoSUPPORT.md: Add some x86 features
George Dunlap [Thu, 23 Nov 2017 17:32:16 +0000 (17:32 +0000)]
SUPPORT.md: Add some x86 features

Including host architecture support and guest types.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoSUPPORT.md: Add core functionality
George Dunlap [Thu, 23 Nov 2017 17:32:15 +0000 (17:32 +0000)]
SUPPORT.md: Add core functionality

Core memory management and scheduling.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoIntroduce skeleton SUPPORT.md
George Dunlap [Thu, 23 Nov 2017 17:32:14 +0000 (17:32 +0000)]
Introduce skeleton SUPPORT.md

Add a machine-readable file to describe what features are in what
state of being 'supported', as well as information about how long this
release will be supported, and so on.

The document should be formatted using "semantic newlines" [1], to make
changes easier.

Begin with the basic framework.

Signed-off-by: Ian Jackson <ian.jackson@citrix.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
[1] http://rhodesmill.org/brandon/2012/one-sentence-per-line/

7 years agox86emul/test: keep compiler from using {x,y,z}mm registers itself
Jan Beulich [Thu, 23 Nov 2017 10:40:31 +0000 (11:40 +0100)]
x86emul/test: keep compiler from using {x,y,z}mm registers itself

Since the emulator acts on the live hardware registers, we need to
prevent the compiler from using them, e.g. for inlined memcpy() /
memset() (as gcc7 does). We can't, however, set this from the command
line, as otherwise the 64-bit build would face issues with functions
returning floating point values and being declared in standard headers.

As the pragma isn't available prior to gcc6, we need to invoke it
conditionally. Luckily up to gcc6 we haven't seen generated code access
SIMD registers beyond what our asm()s do.
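A sketch of the conditional pragma, assuming `no-sse` is an adequate target string; the exact options Xen's test harness uses are not quoted in this commit message. `push_options`/`pop_options` confine the restriction to the fenced-off region, so code outside it (including anything using floating point) is compiled normally. The `SKETCH_NO_SIMD` macro and `sketch_fill()` are hypothetical.

```c
#include <stddef.h>

/*
 * Only GCC 6 or newer targeting x86 gets the pragma. Clang, which also
 * defines __GNUC__, is excluded explicitly; older GCC compiles the code
 * below unchanged.
 */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 6 && \
    (defined(__x86_64__) || defined(__i386__))
# define SKETCH_NO_SIMD 1
# pragma GCC push_options
# pragma GCC target("no-sse")
#else
# define SKETCH_NO_SIMD 0
#endif

/* The compiler may not emit SSE instructions for code in this region. */
static void sketch_fill(unsigned char *buf, size_t len, unsigned char val)
{
    for ( size_t i = 0; i < len; i++ )
        buf[i] = val;
}

#if SKETCH_NO_SIMD
# pragma GCC pop_options
#endif
```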

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agosync CPU state upon final domain destruction
Jan Beulich [Thu, 23 Nov 2017 10:38:22 +0000 (11:38 +0100)]
sync CPU state upon final domain destruction

See the code comment being added for why we need this.

This is being placed here to balance between the desire to prevent
future similar issues (the risk of which would grow if it was put
further down the call stack, e.g. in vmx_vcpu_destroy()) and the
intention to limit the performance impact (otherwise it could also go
into rcu_do_batch(), paralleling the use in do_tasklet_work()).

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/hvm: Don't corrupt the HVM context stream when writing the MSR record 4.10.0-rc6
Andrew Cooper [Thu, 16 Nov 2017 21:34:02 +0000 (21:34 +0000)]
x86/hvm: Don't corrupt the HVM context stream when writing the MSR record

Ever since it was introduced in c/s bd1f0b45ff, hvm_save_cpu_msrs() has had a
bug whereby it corrupts the HVM context stream if some, but fewer than the
maximum number, of MSRs are written.

_hvm_init_entry() creates an hvm_save_descriptor with length for
msr_count_max, but in the case that we write fewer than max, h->cur only moves
forward by the amount of space used, causing the subsequent
hvm_save_descriptor to be written within the bounds of the previous one.

To resolve this, reduce the length reported by the descriptor to match the
actual number of bytes used.

A typical failure on the destination side looks like:

    (XEN) HVM4 restore: CPU_MSR 0
    (XEN) HVM4.0 restore: not enough data left to read 56 MSR bytes
    (XEN) HVM4 restore: failed to load entry 20/0
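The descriptor mismatch can be modelled as follows. `struct sketch_desc` assumes a layout like Xen's hvm_save_descriptor (16-bit typecode and instance, 32-bit length), the typecode 20 mirrors the `entry 20/0` in the log above (assumed to be CPU_MSR), and the other names are hypothetical. Without the fix, the recorded length overshoots the bytes actually written, so a reader skipping `length` bytes misses the next descriptor.

```c
#include <stdint.h>
#include <string.h>

/* Assumed descriptor layout, modelled on hvm_save_descriptor. */
struct sketch_desc {
    uint16_t typecode;
    uint16_t instance;
    uint32_t length;   /* bytes of payload following this descriptor */
};

struct sketch_msr_rec { uint32_t index; uint32_t pad; uint64_t val; };

struct sketch_ctx {
    uint8_t buf[256];
    uint32_t cur;      /* write cursor, like h->cur */
};

/*
 * Reserve a descriptor sized for `max` records, then write only `used`.
 * With `fix` set, the length is shrunk to the bytes actually written.
 */
static void sketch_save_msrs(struct sketch_ctx *h,
                             const struct sketch_msr_rec *msrs,
                             uint32_t used, uint32_t max, int fix)
{
    struct sketch_desc d = { 20, 0, max * (uint32_t)sizeof(*msrs) };
    uint32_t desc_off = h->cur;

    memcpy(h->buf + h->cur, &d, sizeof(d));
    h->cur += sizeof(d);
    memcpy(h->buf + h->cur, msrs, used * sizeof(*msrs));
    h->cur += used * sizeof(*msrs);   /* cursor moves only by what was used */

    if ( fix )
    {
        d.length = used * (uint32_t)sizeof(*msrs);
        memcpy(h->buf + desc_off, &d, sizeof(d));
    }
}

/* Reader side: does the recorded length end where the cursor stopped? */
static int sketch_length_consistent(const struct sketch_ctx *h,
                                    uint32_t desc_off)
{
    struct sketch_desc d;

    memcpy(&d, h->buf + desc_off, sizeof(d));
    return desc_off + sizeof(d) + d.length == h->cur;
}
```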

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/libxc: Fix restoration of PV MSRs after migrate
Andrew Cooper [Thu, 16 Nov 2017 21:10:00 +0000 (21:10 +0000)]
tools/libxc: Fix restoration of PV MSRs after migrate

There are two bugs in process_vcpu_msrs() which clearly demonstrate that I
didn't test this bit of Migration v2 very well when writing it...

vcpu->msrsz is always expected to be a multiple of
sizeof(xen_domctl_vcpu_msr_t) in a spec-compliant stream, so the modulo
yields 0 for the msr_count, rather than the actual number of records
sent in the stream.

Passing 0 for the msr_count causes the hypercall to exit early, and hides the
fact that the guest handle is inserted into the wrong field in the domctl
union.
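The arithmetic slip is easy to see in isolation. This sketch assumes a 16-byte record (32-bit index, 32-bit reserved, 64-bit value) as a stand-in for xen_domctl_vcpu_msr_t; the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed record layout: 16 bytes per MSR. */
struct sketch_vcpu_msr {
    uint32_t index;
    uint32_t reserved;
    uint64_t value;
};

/* Buggy: for a well-formed blob the remainder is always 0. */
static uint32_t msr_count_buggy(uint32_t msrsz)
{
    return (uint32_t)(msrsz % sizeof(struct sketch_vcpu_msr));
}

/* Fixed: the number of records is the quotient. */
static uint32_t msr_count_fixed(uint32_t msrsz)
{
    return (uint32_t)(msrsz / sizeof(struct sketch_vcpu_msr));
}
```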

The reason that these bugs have gone unnoticed for so long is that the only
MSRs passed like this for PV guests are the AMD DBGEXT MSRs, which only exist
in fairly modern hardware, and whose use doesn't appear to be implemented in
any contemporary PV guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/hvm: Fix altp2m_vcpu_enable_notify error handling
Adrian Pop [Wed, 15 Nov 2017 13:47:59 +0000 (15:47 +0200)]
x86/hvm: Fix altp2m_vcpu_enable_notify error handling

The altp2m_vcpu_enable_notify subop handler might skip calling
rcu_unlock_domain() after rcu_lock_current_domain().  However, since both
rcu functions are no-ops when run on the current domain, this doesn't
really have repercussions.

The second change adds a missing break, without which #VE could
potentially have been enabled for the current domain even when the
caller intended to enable it for another one (which is not supported
functionality).

Signed-off-by: Adrian Pop <apop@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/shadow: correct SH_LINEAR mapping detection in sh_guess_wrmap()
Andrew Cooper [Thu, 16 Nov 2017 09:38:14 +0000 (10:38 +0100)]
x86/shadow: correct SH_LINEAR mapping detection in sh_guess_wrmap()

The fix for XSA-243 / CVE-2017-15592 (c/s bf2b4eadcf379) introduced a change
in behaviour for sh_guess_wrmap(), where it had to cope with no shadow linear
mapping being present.

As the name suggests, guest_vtable is a mapping of the guest's pagetable, not
Xen's pagetable, meaning that it isn't the pagetable we need to check for the
shadow linear slot in.

The practical upshot is that a shadow HVM vcpu which switches into 4-level
paging mode, with an L4 pagetable that contains a mapping which aliases Xen's
SH_LINEAR_PT_VIRT_START will fool the safety check for whether a SHADOW_LINEAR
mapping is present.  As the check passes (when it should have failed), Xen
subsequently falls over the missing mapping with a pagefault such as:

    (XEN) Pagetable walk from ffff8140a0503880:
    (XEN)  L4[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L3[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L2[0x102] = 000000046c218063 ffffffffffffffff
    (XEN)  L1[0x103] = 0000000000000000 ffffffffffffffff

This is part of XSA-243.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
7 years agox86: don't wrongly trigger linear page table assertion
Jan Beulich [Thu, 16 Nov 2017 09:37:29 +0000 (10:37 +0100)]
x86: don't wrongly trigger linear page table assertion

_put_page_type() may do multiple iterations until its cmpxchg()
succeeds. It invokes set_tlbflush_timestamp() on the first
iteration, however. Code inside the function takes care of this, but
- the assertion in _put_final_page_type() would trigger on the second
  iteration if time stamps in a debug build are permitted to be
  sufficiently wider than the default 6 bits (see WRAP_MASK in
  flushtlb.c),
- it returning -EINTR (for a continuation to be scheduled) would leave
  the page in an inconsistent state (until the re-invocation completes).
Make the set_tlbflush_timestamp() invocation conditional, bypassing it
(for now) only in the case where we really can't tolerate the stamp
being stored.

This is part of XSA-240.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
7 years agoxen/arm: p2m: Add more debug in get_page_from_gva
Julien Grall [Wed, 15 Nov 2017 19:34:14 +0000 (19:34 +0000)]
xen/arm: p2m: Add more debug in get_page_from_gva

The function get_page_from_gva is used by copy_*_guest helpers to
translate a guest virtual address to a machine physical address and take
a reference on the page.

There are a couple of error paths that return the same value, making
it difficult to know the exact error. Add more debug output on each
error path, for debug builds only.

This should help narrow down the intermittent failure with the
hypercall GNTTABOP_copy (see [1]).

[1] https://lists.xen.org/archives/html/xen-devel/2017-11/msg00942.html

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: mm: Change the return value of gvirt_to_maddr
Julien Grall [Wed, 15 Nov 2017 19:34:13 +0000 (19:34 +0000)]
xen/arm: mm: Change the return value of gvirt_to_maddr

Currently, gvirt_to_maddr returns -EFAULT when the translation fails.
It might be useful to return the PAR_EL1 (Physical Address Register)
in such a case to get a better idea of the reason.

So change it to return 0 on success, or the PAR on failure.

The callers are modified to reflect the change of the return value.

Note that with the change in gvirt_to_maddr, ma needs to be initialized
to avoid GCC being confused by the new construction (i.e. warning that
the value may be used uninitialized).
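A sketch of the new calling convention, assuming a 48-bit physical address space: on AArch64, PAR_EL1 bit 0 (F) flags a failed translation and, on success, bits [47:12] hold the output address. Here the PAR value is passed in to stand in for the hardware translation step, and all names are hypothetical rather than Xen's.

```c
#include <stdint.h>

typedef uint64_t sketch_paddr_t;

#define SKETCH_PAR_F       UINT64_C(0x1)                /* PAR_EL1.F */
#define SKETCH_PAR_PA_MASK UINT64_C(0x0000fffffffff000) /* PA, bits [47:12] */
#define SKETCH_PAGE_OFFSET UINT64_C(0xfff)

/*
 * 0 on success (with *ma filled in), otherwise the raw PAR value so the
 * caller can report why the walk failed. *ma is untouched on failure.
 */
static uint64_t sketch_gvirt_to_maddr(uint64_t va, sketch_paddr_t *ma,
                                      uint64_t par)
{
    if ( par & SKETCH_PAR_F )
        return par;

    *ma = (par & SKETCH_PAR_PA_MASK) | (va & SKETCH_PAGE_OFFSET);
    return 0;
}
```

A caller then tests for a non-zero return and logs it, initialising `ma` up front, as the commit does to keep GCC quiet.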

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86/mm: fix race condition in modify_xen_mappings()
Yu Zhang [Tue, 14 Nov 2017 16:11:26 +0000 (17:11 +0100)]
x86/mm: fix race condition in modify_xen_mappings()

In modify_xen_mappings(), an L1/L2 page table shall be freed if all
entries of this page table are empty. The corresponding L2/L3 PTE then
needs to be cleared.

However, concurrent paging structure modifications on different
CPUs may cause the L2/L3 PTEs to already be cleared or set
to reference a superpage.

Therefore the logic to enumerate the L1/L2 page table and to
reset the corresponding L2/L3 PTE needs to be protected with a
spinlock, and the _PAGE_PRESENT and _PAGE_PSE flags need to be
checked after the lock is obtained.
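The lock-then-recheck pattern can be sketched as follows. The flag values match x86 _PAGE_PRESENT (0x001) and _PAGE_PSE (0x080); `struct sketch_pte`, the lock and `sketch_try_reclaim()` are hypothetical simplifications of the real page-table code.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define SK_PRESENT 0x001u  /* same bit as _PAGE_PRESENT */
#define SK_PSE     0x080u  /* same bit as _PAGE_PSE */

/* Hypothetical upper-level entry pointing at a lower-level table. */
struct sketch_pte { uint32_t flags; uint64_t *table; };

static pthread_mutex_t sketch_map_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Reclaim the lower-level table only if, after taking the lock, the entry
 * still references a table (present, not a superpage) and all `nr` slots
 * are empty. Returns the table for the caller to free, or NULL.
 */
static uint64_t *sketch_try_reclaim(struct sketch_pte *pte, unsigned int nr)
{
    uint64_t *victim = NULL;

    pthread_mutex_lock(&sketch_map_lock);
    if ( (pte->flags & SK_PRESENT) && !(pte->flags & SK_PSE) )
    {
        unsigned int i;

        for ( i = 0; i < nr && pte->table[i] == 0; i++ )
            ;
        if ( i == nr )
        {
            victim = pte->table;
            pte->flags = 0;     /* clear the upper-level entry */
            pte->table = NULL;
        }
    }
    pthread_mutex_unlock(&sketch_map_lock);

    return victim;
}
```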

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/mm: fix race conditions in map_pages_to_xen()
Min He [Tue, 14 Nov 2017 16:10:56 +0000 (17:10 +0100)]
x86/mm: fix race conditions in map_pages_to_xen()

In map_pages_to_xen(), an L2 page table entry may be reset to point to
a superpage, in which case its corresponding L1 page table needs to be
freed; this happens when the L1 page table entries map consecutive
page frames and have the same mapping flags.

However, variable `pl1e` is not protected by the lock before the L1 page
table is enumerated. A race condition may happen if this code path is
invoked simultaneously on different CPUs.

For example, `pl1e` on CPU0 may hold an obsolete value, pointing
to a page which has just been freed on CPU1. Besides, until this page
is reused, it will still hold the old PTEs, referencing consecutive
page frames. Consequently `free_xen_pagetable(l2e_to_l1e(ol2e))` will
be triggered on CPU0, resulting in the unexpected freeing of a normal page.

This patch fixes the above problem by protecting `pl1e` with the lock.

There are also other potential race conditions. For instance, the L2/L3
entry may be modified concurrently on different CPUs, by routines such as
map_pages_to_xen(), modify_xen_mappings() etc. To fix this, this patch
checks the _PAGE_PRESENT and _PAGE_PSE flags of the corresponding L2/L3
entry after the spinlock is obtained.

Signed-off-by: Min He <min.he@intel.com>
Signed-off-by: Yi Zhang <yi.z.zhang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86/hvm: do not register hpet mmio during s3 cycle
Eric Chanudet [Tue, 14 Nov 2017 16:09:50 +0000 (17:09 +0100)]
x86/hvm: do not register hpet mmio during s3 cycle

Do it once at domain creation (hpet_init).

Repeated Sleep -> Resume cycles will eventually crash an HVM guest with
HPET, because the sequence during resume takes the path:
-> hvm_s3_suspend
  -> hpet_reset
    -> hpet_deinit
    -> hpet_init
      -> register_mmio_handler
        -> hvm_next_io_handler

register_mmio_handler will consume a new I/O handler each time, until
the count eventually reaches NR_IO_HANDLERS, at which point
hvm_next_io_handler calls domain_crash.
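A toy model of the handler leak, assuming a small fixed handler table. `SK_NR_IO_HANDLERS` is a hypothetical value (not Xen's NR_IO_HANDLERS) and the `sketch_*` functions are illustrative only.

```c
#include <stdbool.h>

#define SK_NR_IO_HANDLERS 4   /* hypothetical table size */

struct sketch_dom {
    unsigned int nr_handlers;  /* slots handed out so far; never returned */
    bool crashed;
};

/* Like hvm_next_io_handler(): hand out a new slot or crash the domain. */
static bool sketch_register_mmio(struct sketch_dom *d)
{
    if ( d->nr_handlers == SK_NR_IO_HANDLERS )
    {
        d->crashed = true;
        return false;
    }
    d->nr_handlers++;
    return true;
}

/* Old shape: every S3 resume re-ran init, registering yet another slot. */
static void sketch_hpet_reset_old(struct sketch_dom *d)
{
    sketch_register_mmio(d);
}

/* New shape: registered once at creation; reset only touches timer state. */
static void sketch_hpet_reset_new(struct sketch_dom *d)
{
    (void)d;   /* timer state reset elided */
}
```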

Signed-off-by: Eric Chanudet <chanudete@ainfosec.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agotools/xenstored: Check number of strings passed to do_control() 4.10.0-rc5
Pawel Wieczorkiewicz [Fri, 27 Oct 2017 16:32:15 +0000 (16:32 +0000)]
tools/xenstored: Check number of strings passed to do_control()

It is possible to send a zero-string message body to xenstore's
XS_CONTROL handling function. Then the number of strings is used
for an array allocation. This leads to a crash in strcmp() in a
CONTROL sub-command invocation loop.
The output of xs_count_string() should be verified, and all 0 or
negative values should be rejected with EINVAL. At least the
sub-command name must be specified.

The xenstore crash can only be triggered from within dom0 (there
is a check in do_control() rejecting all non-dom0 requests with
an EACCES).
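A sketch of the validation, assuming the message body is a sequence of NUL-terminated strings whose count is the number of NUL bytes. `sketch_count_strings()` and `sketch_check_control()` are hypothetical stand-ins for the xenstored helpers.

```c
#include <errno.h>
#include <stddef.h>

/* Count NUL-terminated strings in a message body. */
static int sketch_count_strings(const char *data, size_t len)
{
    int n = 0;

    for ( size_t i = 0; i < len; i++ )
        if ( data[i] == '\0' )
            n++;
    return n;
}

/* Reject a control request that does not at least name a sub-command. */
static int sketch_check_control(const char *body, size_t len)
{
    if ( sketch_count_strings(body, len) < 1 )
        return -EINVAL;
    return 0;
}
```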

Testing: reproduced with the following command:
python3 -c 'print(16*"\x00")' | nc -U $XENSTORED_RUNDIR/socket

Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Martin Pohlack <mpohlack@amazon.de>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agolibxl: Fix the bug introduced in commit "libxl: use correct type modifier for vuart_gfn"
Bhupinder Thakur [Tue, 31 Oct 2017 06:55:05 +0000 (12:25 +0530)]
libxl: Fix the bug introduced in commit "libxl: use correct type modifier for vuart_gfn"

In libxl__device_vuart_add, vuart_gfn is stored as a hex value:

> flexarray_append(ro_front, GCSPRINTF("%"PRI_xen_pfn, state->vuart_gfn));

However, the value is read back from xenstore as a decimal value, so the
wrong address is mapped and the mapping fails.

This patch introduces a new format specifier "PRIu_xen_pfn", which formats
the value as a decimal.
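The mismatch can be reproduced in a few lines. This sketch assumes xen_pfn_t is 64 bits wide, so the format macros are defined locally as `PRIx64`/`PRIu64` rather than taken from Xen's public headers, and the reader side is modelled as a base-10 strtoull(). `sketch_roundtrip()` is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed definitions for a 64-bit xen_pfn_t. */
#define SKETCH_PRI_xen_pfn  PRIx64
#define SKETCH_PRIu_xen_pfn PRIu64

/* Write the gfn as the frontend entry would, then parse it as decimal. */
static uint64_t sketch_roundtrip(uint64_t gfn, int use_decimal)
{
    char buf[32];

    if ( use_decimal )
        snprintf(buf, sizeof(buf), "%" SKETCH_PRIu_xen_pfn, gfn);
    else
        snprintf(buf, sizeof(buf), "%" SKETCH_PRI_xen_pfn, gfn);

    return strtoull(buf, NULL, 10);   /* reader assumes base 10 */
}
```

For example, 0x40000 written in hex is "40000", which the reader turns into 40000 rather than 262144.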

Signed-off-by: Bhupinder Thakur <bhupinder.thakur@linaro.org>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agolibs/evtchn: Remove active handler on clean-up or failure
Julien Grall [Fri, 10 Nov 2017 17:10:50 +0000 (17:10 +0000)]
libs/evtchn: Remove active handler on clean-up or failure

Commit 89d55473ed16543044a31d1e0d4660cf5a3f49df "xentoolcore_restrict_all:
Implement for libxenevtchn" added a call to register the handle, so that
the event channel can be restricted.

However, the handle was not deregistered if open failed, nor when the
event channel was closed. This corrupts the list of handlers and can
crash the application later on.

Fix it by calling xentoolcore_deregister_active_handle on failure and
closure.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoConfig.mk: Update QEMU changeset
Anthony PERARD [Mon, 13 Nov 2017 12:27:32 +0000 (12:27 +0000)]
Config.mk: Update QEMU changeset

New commits:
- xen/pt: allow QEMU to request MSI unmasking at bind time
To fix a passthrough bug.
- ui/gtk: Fix deprecation of vte_terminal_copy_clipboard
A build fix.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Julien Grall <julien.grall@arm.com>
7 years agodocs: rename hvmlite.markdown to pvh.markdown
Wei Liu [Sun, 12 Nov 2017 11:03:06 +0000 (11:03 +0000)]
docs: rename hvmlite.markdown to pvh.markdown

Also remove a stale paragraph and escape underscores.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agolibevtchn: fix build on non-Linux hosts
Roger Pau Monne [Wed, 8 Nov 2017 12:52:57 +0000 (12:52 +0000)]
libevtchn: fix build on non-Linux hosts

Non-Linux hosts (where osdep_evtchn_restrict is not yet supported)
made use of errno without including errno.h; fix this by including the
header.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agogcov: return EOPNOTSUPP for unimplemented gcov sysctl
Roger Pau Monné [Wed, 8 Nov 2017 12:41:51 +0000 (13:41 +0100)]
gcov: return EOPNOTSUPP for unimplemented gcov sysctl

ENOSYS should only be used by unimplemented top-level syscalls. Use
EOPNOTSUPP instead.
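A sketch of the convention, with hypothetical sub-operation codes: an unknown or unimplemented sub-operation of an existing call reports -EOPNOTSUPP, leaving -ENOSYS to mean that the top-level call itself does not exist.

```c
#include <errno.h>

/* Hypothetical sub-operation codes for a sysctl-style interface. */
enum { SK_GCOV_GET_SIZE = 1, SK_GCOV_READ = 2, SK_GCOV_RESET = 3 };

static int sketch_gcov_op(int subop)
{
    switch ( subop )
    {
    case SK_GCOV_GET_SIZE:
    case SK_GCOV_READ:
    case SK_GCOV_RESET:
        return 0;           /* real work elided */
    default:
        return -EOPNOTSUPP; /* the call exists; this sub-op does not */
    }
}
```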

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>