xenbits.xensource.com Git - xen.git/log
Roger Pau Monne [Thu, 9 Nov 2017 11:16:00 +0000 (12:16 +0100)]
kconfig/gcov: remove gcc version choice from kconfig

Use autodetect only.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 15 Dec 2017 10:18:06 +0000 (11:18 +0100)]
VMX: drop bogus gpa parameter from __invept()

Perhaps there once was a plan to have a flush type requiring this, but
the current SDM has no mention of such and all callers pass zero anyway.

Take the opportunity and also change involved types to uint64_t.
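For reference, the Intel SDM defines the INVEPT descriptor as 128 bits wide. The EPTP occupies bits 63:0 and bits 127:64 are reserved (must be zero), so the descriptor has no slot that a gpa could legitimately fill. The sketch below only builds such a descriptor in ordinary memory and does not execute INVEPT. The struct and function names are hypothetical, not Xen's.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the 128-bit INVEPT descriptor (Intel SDM):
 * bits 63:0 hold the EPTP, bits 127:64 are reserved and must be zero. */
struct invept_desc {
    uint64_t eptp;
    uint64_t rsvd;
};

static_assert(sizeof(struct invept_desc) == 16,
              "INVEPT descriptor must be 128 bits");

/* With no gpa parameter, nothing can leak into the reserved quadword. */
static struct invept_desc make_invept_desc(uint64_t eptp)
{
    struct invept_desc d = { .eptp = eptp, .rsvd = 0 };

    return d;
}
```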

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Fri, 15 Dec 2017 10:17:19 +0000 (11:17 +0100)]
domctl: improve locking during domain destruction

There is no need to hold the global domctl lock across domain_kill() -
the domain lock is fully sufficient here, and parallel cleanup after
multiple domains performs quite a bit better this way.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 15 Dec 2017 10:16:32 +0000 (11:16 +0100)]
x86: make _get_page_type() a proper counterpart of _put_page_type() again

Drop one of the leading underscores and use bool for its "preemptible"
parameter.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 15 Dec 2017 10:15:54 +0000 (11:15 +0100)]
x86: use switch() in _put_page_type()

Use this to cheaply add another assertion.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 15 Dec 2017 10:15:16 +0000 (11:15 +0100)]
x86: improve _put_page_type() readability

By limiting the scope of rc it is more obvious that failure can be
reported only if _put_final_page_type() failed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 15 Dec 2017 10:14:31 +0000 (11:14 +0100)]
x86: remove _PAGE_PSE check from get_page_from_l2e()

With L2_DISALLOW_MASK containing _PAGE_PSE unconditionally as of commit
56fff3e5e9 ("x86: nuke PV superpage option and code") there's no point
anymore in separately checking for the bit.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 15 Dec 2017 10:13:49 +0000 (11:13 +0100)]
x86: make get_page_from_mfn() return struct page_info *

Almost all users of it want it, and it calculates it anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 15 Dec 2017 10:11:36 +0000 (11:11 +0100)]
x86/HVM: fix hvmemul_rep_outs_set_context()

There were two issues with this function: Its use of
hvmemul_do_pio_buffer() was wrong (the function deals only with
individual port accesses, not repeated ones, i.e. passing it
"*reps * bytes_per_rep" does not have the intended effect). And it
could have processed a larger set of operations in one go than was
probably intended (limited just by the size that xmalloc() can hand
back).

By converting to proper use of hvmemul_do_pio_buffer(), no intermediate
buffer is needed at all. As a result a preemption check is being added.
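Here is a minimal sketch of the per-element approach. It assumes hypothetical helpers do_pio_one() (one port access per call) and preempt_pending(). These stand in for the real hvmemul_do_pio_buffer() and Xen's preemption machinery, which are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical single-port write: exactly one element per call. */
static unsigned int port_writes;
static void do_pio_one(uint16_t port, const void *data, unsigned int bytes)
{
    (void)port;
    (void)data;
    (void)bytes;
    port_writes++;
}

/* Hypothetical preemption check: reports "pending" after N calls. */
static unsigned long preempt_after = (unsigned long)-1;
static int preempt_pending(void)
{
    return preempt_after != 0 && --preempt_after == 0;
}

/* Emulate "rep outs" one element at a time.  On return *reps holds the
 * number of elements actually written, so the caller can resume later. */
static void rep_outs(uint16_t port, const uint8_t *src,
                     unsigned int bytes_per_rep, unsigned long *reps)
{
    unsigned long done = 0;

    assert(bytes_per_rep == 1 || bytes_per_rep == 2 || bytes_per_rep == 4);
    while (done < *reps) {
        /* Never hand the port helper a "*reps * bytes_per_rep" blob. */
        do_pio_one(port, src + done * bytes_per_rep, bytes_per_rep);
        done++;
        if (done < *reps && preempt_pending())
            break;
    }
    *reps = done;
}
```

In this sketch, writing the completed count back through *reps lets the caller continue the remaining iterations later. No intermediate buffer is ever allocated.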

Also drop unused parameters from the function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Yi Sun [Fri, 20 Oct 2017 08:50:00 +0000 (10:50 +0200)]
x86: implement data structure and CPU init flow for MBA

This patch implements the main MBA data structures.

Like the CAT features, the MBA hardware info has cos_max, the maximum THRTL
register number, and thrtl_max, the maximum throttle (delay) value. It also
has a flag indicating whether the throttle value is linear or non-linear.

Each MBA THRTL register stores the throttle value for one or more domains.
The throttle value is the delay applied to traffic between the L2 cache and
the next cache level.
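Here is a rough sketch of the hardware-info structure described above, using the field names from the text (cos_max, thrtl_max and a linear flag). The struct name, the validity check and the example values are assumptions. Any further rules for linear vs non-linear throttle values are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical MBA hardware info, using the field names from the text. */
struct mba_hw_info {
    unsigned int cos_max;    /* highest THRTL register (COS) number */
    unsigned int thrtl_max;  /* largest throttle (delay) value */
    bool linear;             /* throttle values linear or non-linear */
};

/* Check a requested delay for a COS against the advertised limits only. */
static bool mba_thrtl_valid(const struct mba_hw_info *info,
                            unsigned int cos, unsigned int thrtl)
{
    assert(info != NULL);
    return cos <= info->cos_max && thrtl <= info->thrtl_max;
}
```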

This patch also implements the MBA init flow and registers stub callback
functions.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Yi Sun [Fri, 20 Oct 2017 08:50:00 +0000 (10:50 +0200)]
x86: a few optimizations to psr codes

This patch refines psr codes:
1. Change type of 'cat_init_feature' to 'bool' to remove the pointless
   returning of error code.
2. Move the printk in 'cat_init_feature' to remove a return path.
3. Define a local variable 'feat_mask' in 'psr_cpu_init' to reduce calls to
   'cpuid_count_leaf()'.
4. Change 'PSR_INFO_IDX_CAT_FLAG' to 'PSR_INFO_IDX_CAT_FLAGS'.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Yi Sun [Fri, 20 Oct 2017 08:50:00 +0000 (10:50 +0200)]
x86: rename 'cbm_type' to 'psr_type' to make it general

This patch renames 'cbm_type' to 'psr_type' to generalize it.
Then, we can reuse this for all psr allocation features.

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Yi Sun [Tue, 24 Oct 2017 09:33:00 +0000 (11:33 +0200)]
Rename PSR sysctl/domctl interfaces and xsm policy to make them be general

This patch renames the PSR sysctl/domctl interfaces and related xsm policy
to make them general for all resource allocation features rather than only
CAT. Then, we can reuse the interfaces for all allocation features.

Basically, it changes 'psr_cat_op' to 'psr_alloc' and removes 'CAT_' from
some macros, e.g.:
1. psr_cat_op -> psr_alloc
2. XEN_DOMCTL_psr_cat_op -> XEN_DOMCTL_psr_alloc
3. XEN_SYSCTL_psr_cat_op -> XEN_SYSCTL_psr_alloc
4. XEN_DOMCTL_PSR_CAT_SET_L3_CBM -> XEN_DOMCTL_PSR_SET_L3_CBM
5. XEN_SYSCTL_PSR_CAT_get_l3_info -> XEN_SYSCTL_PSR_get_l3_info

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Yi Sun [Fri, 20 Oct 2017 08:50:00 +0000 (10:50 +0200)]
docs: create Memory Bandwidth Allocation (MBA) feature document

This patch creates the MBA feature document in doc/features/. It describes
the key points of implementing MBA, which is described in detail in the
Intel SDM section "Introduction to Memory Bandwidth Allocation".

Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Wed, 6 Dec 2017 17:46:20 +0000 (17:46 +0000)]
x86/vmx: Don't use hvm_inject_hw_exception() in long_mode_do_msr_write()

Since c/s 49de10f3c1718 "x86/hvm: Don't raise #GP behind the emulators back
for MSR accesses", returning X86EMUL_EXCEPTION has pushed the exception
generation to the top of the call tree.

Using hvm_inject_hw_exception() and returning X86EMUL_EXCEPTION causes a
double #GP injection, which combines to #DF.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 23 Oct 2017 09:49:33 +0000 (10:49 +0100)]
x86/efer: Make {read,write}_efer() into inline helpers

There is no need for the overhead of a call to a separate translation unit.
While moving the implementation, update them to use uint64_t instead of u64.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 1 Dec 2017 13:16:12 +0000 (13:16 +0000)]
x86/domctl: Avoid redundant zeroing in XEN_DOMCTL_get_vcpu_msrs

Zero the msr structure once at initialisation time, and avoid re-zeroing the
reserved field every time the structure is used.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 13 Dec 2017 16:55:38 +0000 (16:55 +0000)]
xen/efi: Fix build with clang-5.0

The clang-5.0 build is reliably failing with:

  Error: size of boot.o:.text is 0x01

which is because efi_arch_flush_dcache_area() exists as a single ret
instruction.  Mark it as __init like everything else in the files.

Spotted by Travis.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Tom Lendacky [Thu, 30 Nov 2017 22:46:40 +0000 (16:46 -0600)]
x86/microcode: Add support for fam17h microcode loading

The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes.  Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
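Here is a sketch of the per-family size lookup, using a hypothetical helper mpb_max_size(). Only the two sizes quoted above are modeled (3200 bytes for family 17h, 2048 bytes otherwise). The macro names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes quoted in the commit message; other families are omitted. */
#define MPB_MAX_SIZE_DEFAULT 2048  /* fallback used for unlisted families */
#define MPB_MAX_SIZE_FAM17H  3200  /* AMD family 17h */

/* Hypothetical helper: maximum patch block size for a CPU family. */
static size_t mpb_max_size(unsigned int family)
{
    switch (family) {
    case 0x17:
        return MPB_MAX_SIZE_FAM17H;
    default:
        /* Without a fam17h case, family 17h would land here too. */
        return MPB_MAX_SIZE_DEFAULT;
    }
}

/* A patch is rejected if it is larger than its family's limit. */
static int patch_size_ok(unsigned int family, size_t patch_size)
{
    return patch_size <= mpb_max_size(family);
}
```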

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[Linux commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf]

Ported to Xen.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 6 Dec 2017 18:44:15 +0000 (18:44 +0000)]
x86/intel: Drop zeroed-out select_idle_routine() function

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Tue, 12 Dec 2017 19:02:12 +0000 (19:02 +0000)]
xen/arm: traps: Merge do_trap_instr_abort_guest and do_trap_data_abort_guest

The two helpers do_trap_instr_abort_guest and do_trap_data_abort_guest
are used to handle stage-2 aborts. While the former handles only prefetch
aborts and the latter only data aborts, they are very similar and do not
warrant separate helpers.

Merging them will make stage-2 abort handling easier to maintain. So
consolidate the two helpers into a new helper, do_trap_stage2_abort.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:11 +0000 (19:02 +0000)]
xen/arm: traps: Move the definition of mmio_info_t in try_handle_mmio

mmio_info_t is currently filled by do_trap_data_abort_guest, but it is only
relevant when emulating an MMIO region.

A follow-up patch will merge the stage-2 prefetch abort and stage-2 data
abort handling into a single helper. To prepare for that, mmio_info_t is
now filled by try_handle_mmio.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:10 +0000 (19:02 +0000)]
xen/arm: traps: Remove the field gva from mmio_info_t

mmio_info_t is used to gather the information needed to emulate a region.
The guest virtual address is unlikely to be useful and is not currently
used. So remove the field gva from mmio_info_t and replace it with a local
variable.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:09 +0000 (19:02 +0000)]
xen/arm: p2m: Fold p2m_tlb_flush into p2m_force_tlb_flush_sync

p2m_tlb_flush is called in 2 places: p2m_alloc_table and
p2m_force_tlb_flush_sync.

p2m_alloc_table is called when the domain is initialized, and the flush
there could be replaced by a call to p2m_force_tlb_flush_sync with the P2M
write locked.

This may seem a bit pointless, but it allows a single API for flushing
and avoids misuse in the P2M code.

So update p2m_alloc_table to use p2m_force_tlb_flush_sync and fold
p2m_tlb_flush in p2m_force_tlb_flush_sync.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:08 +0000 (19:02 +0000)]
xen/arm: p2m: Introduce p2m_tlb_flush_sync, export it and use it

Multiple places in the code need to flush the TLBs only when
p2m->need_flush is set.

Rather than open-coding it, introduce a new helper p2m_tlb_flush_sync to
do it.
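The relationship between the conditional and the unconditional helper can be sketched as follows. The function names match the ones in these commit messages, but the struct, the flush counter and the bodies are simplified assumptions. The real code also needs the P2M lock held and performs actual TLB maintenance.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced P2M state. */
struct p2m_domain {
    bool need_flush;        /* set when mappings were changed or removed */
    unsigned int flushes;   /* counts "real" flushes, for illustration */
};

/* Flush unconditionally (stand-in for the real TLB maintenance). */
static void p2m_force_tlb_flush_sync(struct p2m_domain *p2m)
{
    p2m->flushes++;
    p2m->need_flush = false;
}

/* Flush only if something actually requires it. */
static void p2m_tlb_flush_sync(struct p2m_domain *p2m)
{
    if (p2m->need_flush)
        p2m_force_tlb_flush_sync(p2m);
}
```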

Note that p2m_tlb_flush_sync is exported as it might be used by other
parts of Xen.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:07 +0000 (19:02 +0000)]
xen/arm: p2m: Rename p2m_flush_tlb and p2m_flush_tlb_sync

Rename p2m_flush_tlb and p2m_flush_tlb_sync to respectively
p2m_tlb_flush and p2m_force_tlb_flush_sync.

At first glance, swapping 'flush' and 'tlb' might seem pointless, but it
will make it easier in the future to port code from the x86 P2M, or even
to share code with it.

For p2m_flush_tlb_sync, 'force' was added because the TLBs are flushed
unconditionally. A follow-up patch will add a helper to flush TLBs only in
certain cases.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:06 +0000 (19:02 +0000)]
xen/arm: domain_build: Use copy_to_guest_phys_flush_dcache in dtb_load

The function dtb_load is dealing with IPA but uses gvirt_to_maddr to do
the translation. This is currently working fine because the stage-1 MMU
is disabled.

Rather than relying on such an assumption, use the new
copy_to_guest_phys_flush_dcache. This also results in slightly more
comprehensible code.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:05 +0000 (19:02 +0000)]
xen/arm: domain_build: Rework initrd_load to use the generic copy helper

The function initrd_load is dealing with IPA but uses gvirt_to_maddr to
do the translation. This is currently working fine because the stage-1 MMU
is disabled.

Furthermore, the function implements its own copy to guest, resulting in
code duplication and making it more difficult to update the page-table
logic (such as support for Populate On Demand).

The new copy_to_guest_phys_flush_dcache can be used here by temporarily
mapping the full initrd in the virtual space.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:04 +0000 (19:02 +0000)]
xen/arm: kernel: Rework kernel_zimage_load to use the generic copy helper

The function kernel_zimage_load is dealing with IPA but uses gvirt_to_maddr
to do the translation. This is currently working fine because the stage-1
MMU is disabled.

Furthermore, the function implements its own copy to guest, resulting in
code duplication and making it more difficult to update the page-table
logic (such as support for Populate On Demand).

The new copy_to_guest_phys_flush_dcache can be used here by
temporarily mapping the full kernel in the virtual space.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:03 +0000 (19:02 +0000)]
xen/arm: Introduce copy_to_guest_phys_flush_dcache

This new function will be used in a follow-up patch to copy data to the guest
using the IPA (aka guest physical address) and then clean the cache.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:02 +0000 (19:02 +0000)]
xen/arm: Extend copy_to_guest to support copying from/to guest physical address

The only differences between copy_to_guest and access_guest_memory_by_ipa are:
    - The latter does not support copying data that crosses a page boundary
    - The former copies from/to a guest VA, whilst the latter copies from/to
    a guest PA

copy_to_guest can easily be extended to support copying from/to a guest
physical address. For that, a new bit is used to tell whether a linear
address or an IPA is being used.

Lastly, access_guest_memory_by_ipa is reimplemented using copy_to_guest.
This also broadens its use: it is now possible to copy data that crosses
a page boundary.
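To show why a single helper can handle copies that cross a page boundary, here is a sketch that copies one page-sized chunk at a time and re-translates the address at each boundary. The tiny page size, the toy guest-to-host frame table and translate() are hypothetical stand-ins for a linear-address or IPA walk. The flag that selects between the two is not modeled. Following the copy_to_user-style convention, the helper returns the number of bytes not copied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 16u        /* tiny pages so the example stays small */
#define NR_PAGES  4u

/* Hypothetical "host" memory and a guest-frame -> host-frame table.
 * Guest frames are deliberately mapped out of order. */
static uint8_t host_mem[NR_PAGES * PAGE_SIZE];
static const unsigned int g2h[NR_PAGES] = { 2, 0, 3, 1 };

/* Translate one guest address; the result is valid up to the page end. */
static uint8_t *translate(uint64_t gaddr)
{
    assert(gaddr < (uint64_t)NR_PAGES * PAGE_SIZE);
    return &host_mem[g2h[gaddr / PAGE_SIZE] * PAGE_SIZE + gaddr % PAGE_SIZE];
}

/* Copy to the guest one page-sized chunk at a time, re-translating at
 * every page boundary; returns the number of bytes NOT copied. */
static size_t copy_to_guest_sketch(uint64_t gaddr, const void *buf, size_t len)
{
    const uint8_t *src = buf;

    while (len) {
        size_t offset = gaddr % PAGE_SIZE;
        size_t size = PAGE_SIZE - offset;

        if (size > len)
            size = len;
        if (gaddr >= (uint64_t)NR_PAGES * PAGE_SIZE)
            break;                 /* stand-in for a failed translation */
        memcpy(translate(gaddr), src, size);
        src += size;
        gaddr += size;
        len -= size;
    }
    return len;
}
```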

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:01 +0000 (19:02 +0000)]
xen/arm: guest_copy: Extend the prototype to pass the vCPU

Currently, guest_copy assumes the copy will only be done for the current
vCPU. copy_guest is meant to be vCPU agnostic, so extend the prototype
to pass the vCPU.

At the same time, encapsulate the vCPU in a union to allow extension
for copying from a guest domain (IPA case) in the future.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:02:00 +0000 (19:02 +0000)]
xen/arm: Extend copy_to_guest to support zeroing guest VA and use it

The function copy_to_guest can easily be extended to support zeroing
guest VA. To avoid using a new bit, a NULL buffer (i.e. buf == NULL) is
taken to mean that the guest memory will be zeroed.
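The NULL-buffer convention can be sketched like this. copy_or_zero() and clear_sketch() are hypothetical names. The guest address translation is left out so that only the copy-or-zero decision is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical core of a copy-to-guest helper: a NULL source means
 * "zero the destination" instead of requiring a new flag bit. */
static void copy_or_zero(void *dst, const void *src, size_t len)
{
    assert(dst != NULL);
    if (src == NULL)
        memset(dst, 0, len);
    else
        memcpy(dst, src, len);
}

/* A clear_guest()-style wrapper is then just a call with src == NULL. */
static void clear_sketch(void *dst, size_t len)
{
    copy_or_zero(dst, NULL, len);
}
```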

Lastly, reimplement raw_clear_guest using copy_to_guest.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:01:59 +0000 (19:01 +0000)]
xen/arm: Extend copy_to_guest to support copying from guest VA and use it

The only differences between copy_to_guest (formerly called
raw_copy_to_guest_helper) and raw_copy_from_guest are:
    - The direction of the memcpy
    - The permission used for translating the address

Extend copy_to_guest to support copying from guest VA by using a bit in
the flags to tell the direction of the copy.

Lastly, reimplement raw_copy_from_guest using copy_to_guest.
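Here is a sketch of a single direction bit selecting the copy direction. COPY_TO_GUEST, COPY_FROM_GUEST and copy_guest_sketch() are hypothetical names, and Xen's actual flag names and values may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical direction bit; Xen's actual flag names/values may differ. */
#define COPY_FROM_GUEST  0u
#define COPY_TO_GUEST    (1u << 0)

/* One helper for both directions: the flag decides which buffer memcpy
 * writes.  A real helper would also use the same bit to request read or
 * write permission when translating the guest address; that is omitted. */
static void copy_guest_sketch(void *xen_buf, void *guest_buf, size_t len,
                              unsigned int flags)
{
    assert(xen_buf != NULL && guest_buf != NULL);
    if (flags & COPY_TO_GUEST)
        memcpy(guest_buf, xen_buf, len);
    else
        memcpy(xen_buf, guest_buf, len);
}
```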

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:01:58 +0000 (19:01 +0000)]
xen/arm: raw_copy_to_guest_helper: Rework the prototype and rename it

All the helpers within arch/arm/guestcopy.c are doing the same things:
copy data from/to the guest.

At the moment, the logic is duplicated in each helper, making it more
difficult to implement new variants.

The first step for the consolidation is to get a common prototype and a
base. For convenience (it is at the beginning of the file!),
raw_copy_to_guest_helper is chosen.

The function is now renamed copy_guest to show it will be a
generic function to copy data from/to the guest. Note that for now, only
copying to guest virtual address is supported. Follow-up patches will
extend the support.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Tue, 12 Dec 2017 19:01:57 +0000 (19:01 +0000)]
xen/arm: raw_copy_to_guest_helper: Rename flush_dcache to flags

In a follow-up patch, it will be necessary to pass more flags to the
function.

Rename flush_dcache to flags and introduce a define to tell whether the
cache needs to be flushed after the copy.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Jan Beulich [Tue, 12 Dec 2017 15:56:15 +0000 (16:56 +0100)]
x86/mm: drop bogus paging mode assertion

Olaf has observed this assertion to trigger after an aborted migration
of a PV guest:

(XEN) Xen call trace:
(XEN)    [<ffff82d0802a85dc>] do_page_fault+0x39f/0x55c
(XEN)    [<ffff82d08036b7d8>] x86_64/entry.S#handle_exception_saved+0x66/0xa4
(XEN)    [<ffff82d0802a9274>] __copy_to_user_ll+0x22/0x30
(XEN)    [<ffff82d0802772d4>] update_runstate_area+0x19c/0x228
(XEN)    [<ffff82d080277371>] domain.c#_update_runstate_area+0x11/0x39
(XEN)    [<ffff82d080277596>] context_switch+0x1fd/0xf25
(XEN)    [<ffff82d0802395c5>] schedule.c#schedule+0x303/0x6a8
(XEN)    [<ffff82d08023d067>] softirq.c#__do_softirq+0x6c/0x95
(XEN)    [<ffff82d08023d0da>] do_softirq+0x13/0x15
(XEN)    [<ffff82d08036b2f1>] x86_64/entry.S#process_softirqs+0x21/0x30

Release builds work fine, which is a first indication that the assertion
isn't really needed.

What's worse though - there appears to be a timing window where the
guest runs in shadow mode, but not in log-dirty mode, and that is what
triggers the assertion (the same could, afaict, be achieved by
test-enabling shadow mode on a PV guest). This is because turning off
log-dirty mode is performed in two steps: First the log-dirty bit gets
cleared (paging_log_dirty_disable() [having paused the domain] ->
sh_disable_log_dirty() -> shadow_one_bit_disable()), followed by
unpausing the domain and only then clearing shadow mode (via
shadow_test_disable(), which pauses the domain a second time).

Hence besides removing the ASSERT() here (or optionally replacing it by
explicit translate and refcounts mode checks, but this seems rather
pointless now that the three are tied together) I wonder whether either
shadow_one_bit_disable() should turn off shadow mode if no other bit
besides PG_SH_enable remains set (just like shadow_one_bit_enable()
enables it if not already set), or the domain pausing scope should be
extended so that both steps occur without the domain getting a chance to
run in between.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 12 Dec 2017 13:31:55 +0000 (14:31 +0100)]
x86emul: build SIMD tests with -Os

Specifically in the context of putting together subsequent patches I've
noticed that together with the touch() macro using -Os further
increases the chances of the compiler using memory operands for the
instructions we actually care to test.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Daniel Kiper [Tue, 12 Dec 2017 13:30:53 +0000 (14:30 +0100)]
x86/mb2: avoid Xen image when looking for module/crashkernel position

Commit e22e1c4 (x86/EFI: avoid Xen image when looking for module/kexec
position) added the relevant check for the EFI case. However, since commit
f75a304 (x86: add multiboot2 protocol support for relocatable images),
Multiboot2-compatible bootloaders are able to relocate the Xen image too.
So we must also avoid the Xen image region in such cases.
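Here is a simplified sketch of the placement check. It uses half-open physical ranges and a hypothetical place_avoiding_xen() that tries the top of each RAM region first, then the space just below the Xen image. It only illustrates the overlap test and is not the actual Xen boot code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open physical ranges [start, end). */
struct range { uint64_t start, end; };

static bool overlaps(struct range a, struct range b)
{
    return a.start < b.end && b.start < a.end;
}

/* Hypothetical placement: for each RAM region, try an aligned slot of
 * `size` bytes at the top, then just below the Xen image. */
static bool place_avoiding_xen(const struct range *ram, unsigned int nr,
                               struct range xen, uint64_t size,
                               uint64_t align, uint64_t *out)
{
    assert(align && !(align & (align - 1)));   /* power of two */

    for (unsigned int i = 0; i < nr; i++) {
        uint64_t top = ram[i].end;

        for (int attempt = 0; attempt < 2; attempt++) {
            if (top < size)
                break;
            uint64_t s = (top - size) & ~(align - 1);
            struct range cand = { s, s + size };

            if (s < ram[i].start)
                break;
            if (!overlaps(cand, xen)) {
                *out = s;
                return true;
            }
            top = xen.start;   /* retry below the Xen image */
        }
    }
    return false;
}
```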

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 12 Dec 2017 13:30:17 +0000 (14:30 +0100)]
x86/paging: don't unconditionally BUG() on finding SHARED_M2P_ENTRY

PV guests can fully control the values written into the P2M.

This is XSA-251.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 12 Dec 2017 13:29:45 +0000 (14:29 +0100)]
x86/shadow: fix ref-counting error handling

The old-Linux handling in shadow_set_l4e() mistakenly ORed together the
results of sh_get_ref() and sh_pin(). As the latter failing is not a
correctness problem, simply ignore its return value.

In sh_set_toplevel_shadow() a failing sh_get_ref() must not be
accompanied by installing the entry, despite the domain being crashed.
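The problem with OR-ing the two results can be shown with hypothetical stand-ins fake_get_ref() and fake_pin(). The sketch assumes both return nonzero on success. The real sh_get_ref()/sh_pin() signatures are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: true on success, false on failure. */
static bool get_ref_ok;   /* controls whether the "get ref" step succeeds */
static bool fake_get_ref(void) { return get_ref_ok; }
static bool fake_pin(void) { return true; }

/* Buggy pattern: OR-ing the results lets a successful pin hide a failed
 * get_ref, so the caller proceeds without holding a reference. */
static bool set_entry_buggy(void)
{
    return fake_get_ref() | fake_pin();
}

/* Fixed pattern: only get_ref decides success; a pin failure is not a
 * correctness problem, so its result is deliberately ignored. */
static bool set_entry_fixed(void)
{
    if (!fake_get_ref())
        return false;
    (void)fake_pin();
    return true;
}
```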

This is XSA-250.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 12 Dec 2017 13:29:13 +0000 (14:29 +0100)]
x86/shadow: fix refcount overflow check

Commit c385d27079 ("x86 shadow: for multi-page shadows, explicitly track
the first page") reduced the refcount width to 25, without adjusting the
overflow check. Eliminate the disconnect by using a manifest constant.

Interestingly, up to commit 047782fa01 ("Out-of-sync L1 shadows: OOS
snapshot") the refcount was 27 bits wide, yet the check was already
using 26.
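Here is a sketch of the manifest-constant approach, using a hypothetical REFCOUNT_BITS and struct. The same constant sizes the bitfield and bounds the check. If the check used 26 bits on a 25-bit field, nx == 1 << 25 would pass and the stored count would silently wrap to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: one constant drives both the bitfield width and the
 * overflow check, so the two can no longer drift apart. */
#define REFCOUNT_BITS 25

struct shadow_page {
    unsigned int count : REFCOUNT_BITS;
};

/* Take a reference unless that would overflow the bitfield. */
static bool get_ref(struct shadow_page *sp)
{
    unsigned int nx = sp->count + 1;

    if (nx >= (1u << REFCOUNT_BITS))
        return false;           /* would wrap to 0: refuse */
    sp->count = nx;
    return true;
}
```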

This is XSA-249.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 12 Dec 2017 13:28:36 +0000 (14:28 +0100)]
x86/mm: don't wrongly set page ownership

PV domains can obtain mappings of any pages owned by the correct domain,
including ones that aren't actually assigned as "normal" RAM, but used
by Xen internally.  At the moment such "internal" pages marked as owned
by a guest include pages used to track logdirty bits, as well as p2m
pages and the "unpaged pagetable" for HVM guests. Since the PV memory
management and shadow code conflict in their use of struct page_info
fields, and since shadow code is being used for log-dirty handling for
PV domains, pages coming from the shadow pool must, for PV domains, not
have the domain set as their owner.

While the change could be done conditionally for just the PV case in
shadow code, do it unconditionally (and for consistency also for HAP),
just to be on the safe side.

There's one special case though for shadow code: The page table used for
running a HVM guest in unpaged mode is subject to get_page() (in
set_shadow_status()) and hence must have its owner set.

This is XSA-248.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Tim Deegan <tim@xen.org>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 12 Dec 2017 13:27:34 +0000 (14:27 +0100)]
x86: don't wrongly trigger linear page table assertion (2)

_put_final_page_type(), when free_page_type() has exited early to allow
for preemption, should not update the time stamp, as the page continues
to retain the type which is in the process of being unvalidated. I can't
see why the time stamp update was put on that path in the first place
(albeit it may well have been me who put it there years ago).

This is part of XSA-240.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Julien Grall [Wed, 1 Nov 2017 14:03:14 +0000 (14:03 +0000)]
xen/arm32: mm: Rework is_xen_heap_page to avoid nameclash

The arm32 version of the function is_xen_heap_page currently defines a
local variable _mfn. This will lead to a compiler error once typesafe MFN
is used in a follow-up patch:

called object '_mfn' is not a function or function pointer

Fix it by renaming the local variable _mfn to mfn_.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 1 Nov 2017 14:03:13 +0000 (14:03 +0000)]
xen/arm: domain_build: Clean-up insert_11_bank

    - Remove spurious ()
    - Add missing spaces
    - Turn 1 << to 1UL <<
    - Rename spfn to smfn and switch to mfn_t
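On arm64 (LP64), int is 32 bits while unsigned long is 64 bits. So `1 << n` is undefined once n reaches 32, whereas `1UL << n` is well defined up to 63. A hypothetical helper that computes a bank size shows the safe form.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: bytes covered by a bank of 2^order pages. */
static unsigned long bank_size(unsigned int order, unsigned int page_shift)
{
    assert(order + page_shift < sizeof(unsigned long) * CHAR_BIT);
    /* `1 << n` would be an int shift, undefined once n >= the width of
     * int (32 on common ABIs); `1UL << n` shifts in unsigned long. */
    return 1UL << (order + page_shift);
}
```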

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andre Przywara [Thu, 7 Dec 2017 16:14:08 +0000 (16:14 +0000)]
ARM: VGIC: move gic_remove_irq_from_queues()

gic_remove_irq_from_queues() was not only misnamed, it also has the wrong
abstraction, as it should not live in gic.c.
Move it into vgic.c and vgic.h, where it belongs, and rename it on
the way.

Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 6 Dec 2017 14:51:37 +0000 (14:51 +0000)]
xen/arm: gic-v3: Bail out if gicv3_cpu_init fail

When system registers are not enabled, all accesses to them will trap
to EL2. In Xen, system registers will be enabled by gicv3_cpu_init only
on success. As the rest of the code (e.g. gicv3_hyp_init) relies on
system registers, it is better to bail out directly.

This will save time debugging early boot issues on GICv3 platforms.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Julien Grall [Wed, 29 Nov 2017 17:46:35 +0000 (17:46 +0000)]
xen/arm: Surround HSR_SYSREG macro value with ()

The value of the macro HSR_SYSREG is not surrounded by (). This means
the behavior may change depending on how it is used.

Thankfully, recent GCC will issue a warning for that.
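The hazard can be shown with two hypothetical macros (the real HSR_SYSREG combines more fields). Operator precedence in the surrounding expression silently regroups the unparenthesized version.

```c
#include <assert.h>

/* Hypothetical encodings, not the real HSR_SYSREG fields. */
#define SYSREG_BAD(a, b)   (a) << 8 | (b)       /* value not wrapped in () */
#define SYSREG_GOOD(a, b)  (((a) << 8) | (b))   /* whole value wrapped */

/* 2 * (1) << 8 | (2) groups as ((2 * 1) << 8) | 2 == 514. */
static unsigned int scaled_bad(void)  { return 2 * SYSREG_BAD(1, 2); }

/* 2 * (((1) << 8) | (2)) == 2 * 258 == 516, as intended. */
static unsigned int scaled_good(void) { return 2 * SYSREG_GOOD(1, 2); }
```

Comparisons are affected too: `SYSREG_BAD(1, 2) == 258` parses as `(1) << 8 | ((2) == 258)`, which evaluates to 256.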

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Andre Przywara [Thu, 19 Oct 2017 12:48:37 +0000 (13:48 +0100)]
ARM: vGIC: fix nr_irq definition

The global variable "nr_irqs" is used for x86 and some common Xen code.
To make the latter work easily for ARM, it was #defined to NR_IRQS.
This not only violated the common habit of capitalizing macros, but
also caused issues if one wanted to use a rather innocent "nr_irqs" as
a local variable name or as a function parameter.
Drop the optimization and make nr_irqs a normal variable for ARM also.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
7 years agoARM: remove unneeded gic.h inclusions
Andre Przywara [Thu, 19 Oct 2017 12:48:36 +0000 (13:48 +0100)]
ARM: remove unneeded gic.h inclusions

gic.h is supposed to hold defines and prototypes for the hardware side
of the GIC interrupt controller. Many parts of Xen should not be
bothered with that, as they either only care about the VGIC or use
more generic interfaces.
Remove the inclusions of gic.h from files that do not actually need them.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
7 years agoxen/arm: bootfdt: Use proper default for #address-cells and #size-cells
Julien Grall [Wed, 29 Nov 2017 17:57:32 +0000 (17:57 +0000)]
xen/arm: bootfdt: Use proper default for #address-cells and #size-cells

Per the device-tree specification [1], when the properties #address-cells
and #size-cells are not present, the default values should be 2 and 1
respectively.

[1] https://www.devicetree.org/downloads/devicetree-specification-v0.1-20160524.pdf
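
The defaults matter in practice when parsing a `reg` property. That property is a list of (address, size) pairs whose widths are given in 32-bit cells. The sketch below is illustrative only. The helper names are hypothetical, and it assumes the cells have already been converted from the big-endian FDT encoding to host order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Devicetree specification defaults when the properties are absent. */
#define DT_DEFAULT_ADDR_CELLS 2u
#define DT_DEFAULT_SIZE_CELLS 1u

/* Hypothetical helper: 'prop' is NULL when the property is missing. */
static uint32_t cells_or_default(const uint32_t *prop, uint32_t dflt)
{
    return prop ? *prop : dflt;
}

/* Consume 'n' cells (most significant first) from *cell and combine them.
 * Cells are assumed to be in host byte order already. */
static uint64_t read_cells(const uint32_t **cell, uint32_t n)
{
    uint64_t val = 0;

    assert(n <= 2);   /* wider values do not fit in 64 bits */
    while (n--)
        val = (val << 32) | *(*cell)++;
    return val;
}
```

With both properties absent, the cells {0x1, 0x80000000, 0x1000} parse as address 0x180000000 and size 0x1000.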

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm64: head.S: Introduce macro to load the physical address of a symbol
Julien Grall [Thu, 7 Dec 2017 17:18:46 +0000 (17:18 +0000)]
xen/arm64: head.S: Introduce macro to load the physical address of a symbol

A lot of places in the ARM64 assembly code require loading the
physical address of a symbol. Rather than open-coding the translation,
introduce a new macro that will load the physical address of a symbol.

Finally, use this new macro to replace all the current open-coded versions.

Note that most of the comments associated with the changed code have been
removed because the code is now self-explanatory.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agoxen/arm: Remove unused fixmap slots
Julien Grall [Thu, 7 Dec 2017 17:19:11 +0000 (17:19 +0000)]
xen/arm: Remove unused fixmap slots

There are quite a few fixmap slots that have not been used for a while.
Remove them.

Signed-off-by: Julien Grall <julien.grall@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agox86: rename DIRTY_GS_BASE_USER
Jan Beulich [Thu, 7 Dec 2017 10:10:12 +0000 (11:10 +0100)]
x86: rename DIRTY_GS_BASE_USER

As of commit 91f85280b9 ("x86: fix GS-base-dirty determination") the
USER part of it isn't really appropriate anymore.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agomm: don't use domain_shutdown() when re-offlining a page
Jan Beulich [Thu, 7 Dec 2017 10:09:31 +0000 (11:09 +0100)]
mm: don't use domain_shutdown() when re-offlining a page

It is completely silent, leaving open what actually caused the crash.
Use domain_crash() instead, which logs a message before calling
domain_shutdown(..., SHUTDOWN_crash).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agopdx: correct indentation
Jan Beulich [Thu, 7 Dec 2017 10:08:41 +0000 (11:08 +0100)]
pdx: correct indentation

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/HVM: don't retain emulated insn cache when exiting back to guest
Jan Beulich [Wed, 6 Dec 2017 11:50:23 +0000 (12:50 +0100)]
x86/HVM: don't retain emulated insn cache when exiting back to guest

vio->mmio_retry is set when a repeated string insn is split
up. In that case we'll exit to the guest, expecting immediate re-entry.
Interruptions, however, may be serviced by the guest before re-entry
from the repeated string insn. Any emulation needed in the course of
handling the interruption must not fetch from the internally maintained
cache.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agodrop stray .0 from hypervisor version
Jan Beulich [Tue, 5 Dec 2017 16:25:40 +0000 (17:25 +0100)]
drop stray .0 from hypervisor version

7 years agox86: don't ignore foreigndom on L2/L3/L4 page table updates
Jan Beulich [Tue, 5 Dec 2017 16:23:53 +0000 (17:23 +0100)]
x86: don't ignore foreigndom on L2/L3/L4 page table updates

Silently assuming DOMID_SELF is unlikely to be a good idea for page
table updates. For PGT_writable pages, though, it seems better to allow
the writes, so the same check isn't being applied there.

Also add blank lines between the individual case blocks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86: tighten MMU_*PT_UPDATE* check and combine error paths
Jan Beulich [Tue, 5 Dec 2017 16:23:18 +0000 (17:23 +0100)]
x86: tighten MMU_*PT_UPDATE* check and combine error paths

Don't accept anything other than r/w RAM pages as page table pages and
move the paged-out check into the (unlikely) error path following that
check.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/mm: drop yet another relic of translated PV domains from new_guest_cr3()
Jan Beulich [Tue, 5 Dec 2017 16:22:31 +0000 (17:22 +0100)]
x86/mm: drop yet another relic of translated PV domains from new_guest_cr3()

The function can be called for PV domains only, which commit 5a0b9fba92
("x86/mm: drop further relics of translated PV domains") sort of
realized, but not fully.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/HVM: tighten re-issue check in hvmemul_do_io()
Jan Beulich [Tue, 5 Dec 2017 16:18:37 +0000 (17:18 +0100)]
x86/HVM: tighten re-issue check in hvmemul_do_io()

I'm not sure why we had left out the address check in case of indirect
accesses (where "data" holds a guest physical address).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agoXSM/flask: constification of IRQ mapping interfaces
Jan Beulich [Tue, 5 Dec 2017 16:17:57 +0000 (17:17 +0100)]
XSM/flask: constification of IRQ mapping interfaces

This clarifies that the involved structures are read-only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
7 years agox86/MSI: leverage local variables
Jan Beulich [Tue, 5 Dec 2017 16:17:23 +0000 (17:17 +0100)]
x86/MSI: leverage local variables

... instead of using redundant calculations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoefi: use ROUNDUP() macro instead of open code
Daniel Kiper [Tue, 5 Dec 2017 16:16:04 +0000 (17:16 +0100)]
efi: use ROUNDUP() macro instead of open code

Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agognttab: improve GNTTABOP_cache_flush locking
Jan Beulich [Mon, 4 Dec 2017 10:04:18 +0000 (11:04 +0100)]
gnttab: improve GNTTABOP_cache_flush locking

Dropping the lock before returning from grant_map_exists() means handing
possibly stale information back to the caller. Instead, return the pointer
to the active entry, leaving it to the caller to release the lock once
done.
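
The locking pattern described above can be sketched as follows: a lookup returns with the lock still held, and the caller drops it. All names are hypothetical, and the real grant_map_exists() has a different signature. The lock is modelled as a flag purely so the example runs without threads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical active entry; 'locked' stands in for a real spinlock. */
struct active_entry {
    bool locked;
    unsigned long frame;
};

static void entry_lock(struct active_entry *act)
{
    assert(!act->locked);
    act->locked = true;
}

static void entry_unlock(struct active_entry *act)
{
    assert(act->locked);
    act->locked = false;
}

/* On success, return the matching entry with its lock still held so the
 * caller sees consistent data; on failure, return NULL with no lock held. */
static struct active_entry *lookup_locked(struct active_entry *tbl, size_t n,
                                          unsigned long frame)
{
    for (size_t i = 0; i < n; i++) {
        entry_lock(&tbl[i]);
        if (tbl[i].frame == frame)
            return &tbl[i];          /* caller must call entry_unlock() */
        entry_unlock(&tbl[i]);
    }
    return NULL;
}

/* Caller: use the entry while the lock is held, then release it. */
static bool flush_if_mapped(struct active_entry *tbl, size_t n,
                            unsigned long frame)
{
    struct active_entry *act = lookup_locked(tbl, n, frame);

    if (!act)
        return false;
    /* ... the cache flush would happen here, while 'act' cannot change ... */
    entry_unlock(act);
    return true;
}
```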

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agognttab: correct GNTTABOP_cache_flush empty batch handling
Jan Beulich [Mon, 4 Dec 2017 10:03:32 +0000 (11:03 +0100)]
gnttab: correct GNTTABOP_cache_flush empty batch handling

Jann validly points out that with a caller bogusly requesting a zero-
element batch with non-zero high command bits (the ones used for
continuation encoding), the assertion right before the call to
hypercall_create_continuation() would trigger. A similar situation would
arise afaict for non-empty batches with op and/or length zero in every
element.

While we want the former to succeed (as we do elsewhere for similar
no-op requests), the latter can clearly be converted to an error, as
this is a state that can't be the result of a prior operation.

Take the opportunity and also correct the order of argument checks:
We shouldn't accept zero-length elements with unknown bits set in "op".
Also constify cache_flush()'s first parameter.
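
A simplified sketch of the resulting validation order follows. The flag values and helper names are hypothetical; they are not the public GNTTAB_CACHE_* definitions. Continuation encoding is left out, and the choice of error codes is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch only. */
#define SK_CACHE_CLEAN (1u << 0)
#define SK_CACHE_INVAL (1u << 1)
#define SK_KNOWN_OPS   (SK_CACHE_CLEAN | SK_CACHE_INVAL)

struct flush_elem {
    uint32_t op;
    uint32_t length;
};

/* Returns 1 if the element does work, 0 for a no-op element, or -errno.
 * Unknown op bits are rejected before the zero-length shortcut. */
static int check_elem(const struct flush_elem *e)
{
    if (e->op & ~SK_KNOWN_OPS)
        return -EOPNOTSUPP;
    if (!e->op || !e->length)
        return 0;
    return 1;   /* a real implementation would flush here */
}

/* An empty batch succeeds as a no-op; a non-empty batch in which no
 * element does any work is treated as an error. */
static int process_batch(const struct flush_elem *elems, size_t count)
{
    int did_work = 0;

    for (size_t i = 0; i < count; i++) {
        int rc = check_elem(&elems[i]);

        if (rc < 0)
            return rc;
        did_work |= rc;
    }
    return (count && !did_work) ? -EINVAL : 0;
}
```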

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andre Przywara <andre.przywara@linaro.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
7 years agopci: introduce a type to store a SBDF
Roger Pau Monné [Mon, 4 Dec 2017 10:02:46 +0000 (11:02 +0100)]
pci: introduce a type to store a SBDF

This provides direct access to all the members that make up an SBDF.
The only function switched to use it is hvm_pci_decode_addr(), because
doing so makes the following patches simpler.
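
For readers unfamiliar with the encoding, the sketch below decodes a PCI configuration mechanism #1 address, i.e. the value written to port 0xcf8. It extracts the bus, device and function numbers and packs them into the usual seg:bus:dev.fn layout. The plain struct is a hypothetical stand-in: Xen's pci_sbdf_t is a union and may be laid out differently. The segment is fixed at 0 because mechanism #1 cannot express one, and extended register bits are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical plain-struct SBDF used only in this sketch. */
struct sbdf {
    uint16_t seg;
    uint8_t  bus;
    uint8_t  dev;   /* 5 bits */
    uint8_t  fn;    /* 3 bits */
};

/* Pack into the conventional 32-bit seg:bus:dev.fn form. */
static uint32_t sbdf_pack(struct sbdf s)
{
    return ((uint32_t)s.seg << 16) | ((uint32_t)s.bus << 8) |
           ((uint32_t)(s.dev & 0x1f) << 3) | (s.fn & 0x7);
}

/* Decode a mechanism #1 address into an SBDF and a register offset. */
static struct sbdf decode_cf8(uint32_t cf8, unsigned int *reg)
{
    struct sbdf s = {
        .seg = 0,
        .bus = (cf8 >> 16) & 0xff,
        .dev = (cf8 >> 11) & 0x1f,
        .fn  = (cf8 >> 8) & 0x7,
    };

    *reg = cf8 & 0xfc;   /* dword-aligned register number */
    return s;
}
```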

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/pio: allow internal PIO handlers to return RETRY
Roger Pau Monné [Mon, 4 Dec 2017 10:02:16 +0000 (11:02 +0100)]
x86/pio: allow internal PIO handlers to return RETRY

Fix handle_pio() so that internal PIO handlers can return X86EMUL_RETRY,
which is then handled properly by not advancing the IP.
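
Here is a minimal sketch of the intended completion logic. It assumes hypothetical return codes modelled on the X86EMUL_* convention and a register block reduced to RIP.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical return codes modelled on the X86EMUL_* convention. */
enum emul_rc { EMUL_OKAY, EMUL_RETRY, EMUL_UNHANDLEABLE };

struct vcpu_regs {
    uint64_t rip;
};

/* Complete a port I/O exit. RIP is only advanced when the access finished;
 * on RETRY it is left alone so the same IN/OUT instruction runs again.
 * Returns 1 if handled, 0 if the caller must deal with a failure. */
static int finish_pio(struct vcpu_regs *regs, enum emul_rc rc,
                      unsigned int insn_len)
{
    switch (rc) {
    case EMUL_OKAY:
        regs->rip += insn_len;
        return 1;
    case EMUL_RETRY:
        return 1;
    default:
        return 0;
    }
}
```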

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
7 years agolibelf: allow having HYPERCALL_PAGE entry before VIRT_BASE in __xen_guest section
Gregory Herrero [Mon, 4 Dec 2017 10:01:48 +0000 (11:01 +0100)]
libelf: allow having HYPERCALL_PAGE entry before VIRT_BASE in __xen_guest section

When filling in the __xen_guest section of a guest, the user may define
HYPERCALL_PAGE earlier than VIRT_BASE in the section. This leads to an
incorrect hypercall page address, since a not-yet-defined virt_base could
be used to compute it.
If there is no VIRT_BASE entry in the __xen_guest section, the default
value of 0 is used for virt_base. Thus, setting the hypercall page address
to the HYPERCALL_PAGE value is correct in this case too.
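
One way to make the parsing order-independent is to record the raw HYPERCALL_PAGE value and combine it with virt_base only after every entry has been read. The sketch below does that for a comma-separated KEY=VALUE string. It assumes HYPERCALL_PAGE is a page number relative to VIRT_BASE with 4 KiB pages. The function name is hypothetical and is not libelf's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12  /* assumed 4 KiB pages */

/* Parse a "KEY=VALUE,KEY=VALUE" string and return the hypercall page's
 * virtual address, or 0 if there is no HYPERCALL_PAGE entry. Entries may
 * appear in any order; a missing VIRT_BASE defaults to 0. */
static uint64_t hypercall_page_addr(const char *info)
{
    uint64_t virt_base = 0, hcall_pfn = 0;
    int have_hcall = 0;
    const char *p = info;

    while (*p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        char entry[64];

        if (len < sizeof(entry)) {   /* silently skip over-long entries */
            char *eq;

            memcpy(entry, p, len);
            entry[len] = '\0';
            eq = strchr(entry, '=');
            if (eq) {
                uint64_t val;

                *eq = '\0';
                val = strtoull(eq + 1, NULL, 0);
                if (!strcmp(entry, "VIRT_BASE"))
                    virt_base = val;
                else if (!strcmp(entry, "HYPERCALL_PAGE")) {
                    hcall_pfn = val;
                    have_hcall = 1;
                }
            }
        }
        if (!end)
            break;
        p = end + 1;
    }

    /* Only combine once all entries are known, so order does not matter. */
    return have_hcall ? virt_base + (hcall_pfn << SKETCH_PAGE_SHIFT) : 0;
}
```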

Signed-off-by: Gregory Herrero <gregory.herrero@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/physdev: remove redundant code in branch MAP_PIRQ_TYPE_MSI
Zhenzhong Duan [Mon, 4 Dec 2017 10:01:24 +0000 (11:01 +0100)]
x86/physdev: remove redundant code in branch MAP_PIRQ_TYPE_MSI

The same code is already present in allocate_and_map_msi_pirq().

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Reviewed-by: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/boot: rename send_chr to print_err
David Esler [Mon, 4 Dec 2017 10:00:24 +0000 (11:00 +0100)]
x86/boot: rename send_chr to print_err

The send_chr function sends an entire C string rather than a single
character, and no longer necessarily sends it over the serial UART, so
rename it to print_err so that its name is closer to what it does.

Signed-off-by: David Esler <drumandstrum@gmail.com>
Reviewed-by: Doug Goldstein <cardoe@cardoe.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
7 years agox86/svm: Add virtual GIF support
Brian Woods [Thu, 16 Nov 2017 22:11:15 +0000 (16:11 -0600)]
x86/svm: Add virtual GIF support

This patch detects and enables Virtual GIF if available.  This allows
a nested hypervisor to perform STGIs and CLGIs without having to be
intercepted by the host hypervisor.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/svm: Add virtual GIF feature definition
Brian Woods [Thu, 16 Nov 2017 22:11:14 +0000 (16:11 -0600)]
x86/svm: Add virtual GIF feature definition

Add support for enabling the virtual GIF feature.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/traps: Drop redundant printk() in fatal_trap()
Andrew Cooper [Tue, 28 Nov 2017 18:48:07 +0000 (18:48 +0000)]
x86/traps: Drop redundant printk() in fatal_trap()

show_page_walk() already prints the linear address of the walk, and
show_execution_state() has printed a raw %cr2 value.  This avoids having
two adjacent log lines with identical information.

  (XEN) Faulting linear address: 00000000025ff028
  (XEN) Pagetable walk from 00000000025ff028:
  ...

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/vmx: Drop more PVHv1 remenants
Andrew Cooper [Mon, 20 Nov 2017 13:18:45 +0000 (13:18 +0000)]
x86/vmx: Drop more PVHv1 remenants

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/pvh: Do not add DSDT and FACS to PVH dom0 XSDT
Boris Ostrovsky [Thu, 9 Nov 2017 15:37:53 +0000 (10:37 -0500)]
x86/pvh: Do not add DSDT and FACS to PVH dom0 XSDT

These tables are pointed to from FADT. Adding them will
result in duplicate entries in the guest's tables.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
7 years agox86/vvmx: Remove enum vmx_regs_enc
Euan Harris [Thu, 26 Oct 2017 17:03:11 +0000 (18:03 +0100)]
x86/vvmx: Remove enum vmx_regs_enc

This is the standard register encoding, is not VVMX-specific and is only
used in a couple of places.

Signed-off-by: Euan Harris <euan.harris@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/vvmx: don't enable vmcs shadowing for nested guests
Sergey Dyasli [Mon, 23 Oct 2017 09:33:02 +0000 (10:33 +0100)]
x86/vvmx: don't enable vmcs shadowing for nested guests

Running "./xtf_runner vvmx" in L1 Xen under L0 Xen produces the
following result on H/W with VMCS shadowing:

    Test: vmxon
    Failure in test_vmxon_in_root_cpl0()
      Expected 0x8200000f: VMfailValid(15) VMXON_IN_ROOT
           Got 0x82004400: VMfailValid(17408) <unknown>
    Test result: FAILURE

This happens because the SDM allows VM entries with the VMCS shadowing
VM-execution control enabled and a VMCS link pointer value of ~0ull, but
the results of a nested VMREAD are undefined in such cases.

Fix this by not copying the value of the VMCS shadowing control from
vmcs01 to vmcs02.
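
The fix amounts to masking one bit when deriving vmcs02's controls. The sketch below assumes a hypothetical helper that merges L0's own secondary controls with those requested by L1. Bit 14 is the VMCS shadowing control in the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

/* Secondary processor-based VM-execution control: VMCS shadowing (bit 14). */
#define SEC_EXEC_VMCS_SHADOWING (1u << 14)

/* Hypothetical helper: combine the controls L0 uses for vmcs01 with those
 * L1 requested for its guest. VMCS shadowing may be enabled in vmcs01 so
 * that L1's VMREAD/VMWRITE avoid exits, but it must not leak into vmcs02,
 * which is used to run L2. */
static uint32_t vmcs02_secondary_ctrl(uint32_t host_ctrl, uint32_t l1_ctrl)
{
    host_ctrl &= ~SEC_EXEC_VMCS_SHADOWING;
    return l1_ctrl | host_ctrl;
}
```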

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/svm: add virtual VMLOAD/VMSAVE support
Brian Woods [Tue, 31 Oct 2017 22:03:08 +0000 (17:03 -0500)]
x86/svm: add virtual VMLOAD/VMSAVE support

On AMD family 17h server processors, there is a feature called virtual
VMLOAD/VMSAVE.  This allows a nested hypervisor to perform a VMLOAD or
VMSAVE without needing to be intercepted by the host hypervisor.
Virtual VMLOAD/VMSAVE requires the host hypervisor to be in long mode
and nested page tables to be enabled.  For more information about it
please see:

AMD64 Architecture Programmer’s Manual Volume 2: System Programming
http://support.amd.com/TechDocs/24593.pdf
Section: VMSAVE and VMLOAD Virtualization (Section 15.33.1)

This patch series adds support to check for and enable the virtual
VMLOAD/VMSAVE features if available.
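
The sketch below shows the availability check described above. The helper name is hypothetical. The CPUID bit position (Fn8000_000A EDX bit 15) is our reading of the AMD documentation and should be checked against the APM before relying on it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID Fn8000_000A EDX bit advertising virtual VMLOAD/VMSAVE (assumed). */
#define SVM_FEATURE_VLOADSAVE_BIT 15

/* Hypothetical helper: the feature is only usable when the CPU advertises
 * it, the host runs in long mode, and nested paging is enabled. */
static bool vloadsave_usable(uint32_t cpuid_8000000a_edx, bool long_mode,
                             bool nested_paging)
{
    bool supported = cpuid_8000000a_edx & (1u << SVM_FEATURE_VLOADSAVE_BIT);

    return supported && long_mode && nested_paging;
}
```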

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agox86/svm: add virtual VMLOAD/VMSAVE feature definition
Brian Woods [Tue, 31 Oct 2017 22:03:07 +0000 (17:03 -0500)]
x86/svm: add virtual VMLOAD/VMSAVE feature definition

Add support for enabling the virtual VMLOAD/VMSAVE feature.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agox86/svm: rename lbr control field in vmcb
Brian Woods [Tue, 31 Oct 2017 22:03:06 +0000 (17:03 -0500)]
x86/svm: rename lbr control field in vmcb

Rename the lbr_control field in the vmcb in preparation for upcoming changes.

Signed-off-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
7 years agox86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()
Haozhong Zhang [Mon, 11 Sep 2017 04:37:43 +0000 (12:37 +0800)]
x86_64/mm: drop redundant MFN to page conventions in cleanup_frame_table()

Replace pdx_to_page(pfn_to_pdx(pfn)) with mfn_to_page(pfn), which is
equivalent to the former.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agox86/vmx: Don't use rdmsr() to fill HOST_SYSENTER_{CS,EIP}
Andrew Cooper [Fri, 20 Oct 2017 13:56:23 +0000 (14:56 +0100)]
x86/vmx: Don't use rdmsr() to fill HOST_SYSENTER_{CS,EIP}

These are compile-time constants, and don't need to be read back from
hardware.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/vmx: Don't rewrite HOST_TR_SELECTOR on every context switch
Andrew Cooper [Tue, 17 Oct 2017 17:06:23 +0000 (18:06 +0100)]
x86/vmx: Don't rewrite HOST_TR_SELECTOR on every context switch

TSS_ENTRY is a compile time constant, so HOST_TR_SELECTOR can be set up during
VMCS construction and left alone thereafter, rather than rewriting it on every
context switch.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
7 years agox86/pv: Misc improvements to pv_destroy_gdt()
Andrew Cooper [Tue, 3 Oct 2017 18:46:40 +0000 (19:46 +0100)]
x86/pv: Misc improvements to pv_destroy_gdt()

Hoist the l1e_from_pfn(zero_pfn, __PAGE_HYPERVISOR_RO) calculation out of the
loop, and switch the code over to using mfn_t.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Use DIV_ROUND_UP() when converting between GDT entries and frames
Andrew Cooper [Tue, 3 Oct 2017 15:30:54 +0000 (15:30 +0000)]
x86/pv: Use DIV_ROUND_UP() when converting between GDT entries and frames

Also consistently use nr_frames, rather than mixing nr_pages with a
frames[] array.
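
Here is a small worked example of the conversion. The helper is hypothetical, and DIV_ROUND_UP() is written out with its conventional definition. With 8-byte descriptors, a 4 KiB frame holds 512 entries, so 513 entries need 2 frames where truncating division would give 1.

```c
#include <assert.h>

/* Round-up integer division, conventional definition. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

#define SKETCH_PAGE_SIZE 4096u
#define DESC_SIZE        8u
#define DESCS_PER_FRAME  (SKETCH_PAGE_SIZE / DESC_SIZE)   /* 512 */

/* Number of frames needed to hold 'nr_entries' descriptors. */
static unsigned int gdt_entries_to_frames(unsigned int nr_entries)
{
    return DIV_ROUND_UP(nr_entries, DESCS_PER_FRAME);
}
```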

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Move compat_set_gdt() to be beside do_set_gdt()
Andrew Cooper [Tue, 3 Oct 2017 15:30:01 +0000 (15:30 +0000)]
x86/pv: Move compat_set_gdt() to be beside do_set_gdt()

This also makes the do_update_descriptor() pair of functions adjacent.

Purely code motion; no functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/pv: Factor out the calculation of LDT/GDT descriptor pointers
Andrew Cooper [Fri, 13 Oct 2017 10:55:00 +0000 (10:55 +0000)]
x86/pv: Factor out the calculation of LDT/GDT descriptor pointers

Factor it out rather than open-coding it in two places.  While only used in the PV emulation
code, this helper is in principle usable anywhere in the hypervisor.
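
The calculation being factored out follows the x86 selector format. Bit 2 (TI) picks the LDT or the GDT, bits 1:0 are the RPL, and the remaining bits index 8-byte descriptors. The hypothetical sketch below also checks the table limit, but ignores the null-selector special case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor-table description: linear base and limit in
 * bytes (limit = size - 1, as in GDTR/LDTR). */
struct desc_table {
    uint64_t base;
    uint32_t limit;
};

/* Compute the linear address of the descriptor named by 'sel'.
 * Returns false if the 8-byte descriptor lies outside the table limit. */
static bool desc_addr(uint16_t sel, const struct desc_table *gdt,
                      const struct desc_table *ldt, uint64_t *addr)
{
    const struct desc_table *tbl = (sel & 4) ? ldt : gdt;  /* TI bit */
    uint32_t off = sel & ~7u;                              /* drop TI and RPL */

    if (off + 7 > tbl->limit)
        return false;
    *addr = tbl->base + off;
    return true;
}
```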

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/pv: Construct d0v0's GDT properly
Andrew Cooper [Mon, 16 Oct 2017 13:20:07 +0000 (13:20 +0000)]
xen/pv: Construct d0v0's GDT properly

c/s cf6d39f8199 "x86/PV: properly populate descriptor tables" changed the GDT
to reference zero_page for intermediate frames between the guest and Xen
frames.

Because dom0_construct_pv() doesn't call arch_set_info_guest(), some bits of
initialisation are missed, including the pv_destroy_gdt() which initially
fills the references to zero_page.

In practice, this means there is a window between starting and the first call
to HYPERCALL_set_gdt() where lar/lsl/verr/verw suffer non-architectural
behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
This probably wants backporting to Xen 4.7 and later.

7 years agox86/ldt: Alter how invalidate_shadow_ldt() deals with TLB flushes
Andrew Cooper [Mon, 2 Oct 2017 14:13:38 +0000 (14:13 +0000)]
x86/ldt: Alter how invalidate_shadow_ldt() deals with TLB flushes

Modify invalidate_shadow_ldt() to return a boolean indicating whether mappings
have been dropped, rather than taking a flush parameter.  Tweak the internal
logic to be able to ASSERT() that v->arch.pv_vcpu.shadow_ldt_mapcnt matches
the number of PTEs removed.

This allows MMUEXTOP_SET_LDT to avoid a local TLB flush if no LDT entries had
been faulted in to begin with.
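
Below is a reduced sketch of the new contract. It uses a hypothetical model of the shadow LDT state in place of real PTEs. The caller would only issue a TLB flush when the function returns true.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_LDT_SLOTS 16

/* Hypothetical per-vCPU shadow LDT state. */
struct shadow_ldt {
    bool mapped[SKETCH_LDT_SLOTS];   /* PTE present for this LDT page? */
    unsigned int mapcnt;             /* should equal the number mapped */
};

/* Drop all shadow LDT mappings and report whether any were removed. */
static bool invalidate_ldt(struct shadow_ldt *s)
{
    unsigned int removed = 0;

    for (unsigned int i = 0; i < SKETCH_LDT_SLOTS; i++) {
        if (s->mapped[i]) {
            s->mapped[i] = false;
            removed++;
        }
    }
    assert(removed == s->mapcnt);    /* counter must match PTEs removed */
    s->mapcnt = 0;
    return removed != 0;
}
```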

Finally, correct a comment in __get_page_type().  Under no circumstance is it
safe to forgo the TLB shootdown for GDT/LDT pages, as that would allow one
vcpu to gain a writeable mapping to a frame still mapped as a GDT/LDT by
another vcpu.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agoxen/x86: Introduce static inline wrappers for l{idt,gdt,ldt,tr}()
Andrew Cooper [Mon, 2 Oct 2017 13:58:17 +0000 (13:58 +0000)]
xen/x86: Introduce static inline wrappers for l{idt,gdt,ldt,tr}()

This avoids indirection and parameter constraint issues.  Doing so relaxes the
load_LDT() constraints from %ax to any general purpose register.  The helpers
are upgraded to full compiler barriers, because nothing good will come of
having these reordered with respect to other segment accesses.

The triple-fault reboot method stays as is, to avoid the int3 possibly getting
moved relative to the lidt.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
7 years agox86/smp: Rework cpu_smpboot_alloc() to cope with more than just -ENOMEM
Andrew Cooper [Mon, 2 Oct 2017 13:50:05 +0000 (13:50 +0000)]
x86/smp: Rework cpu_smpboot_alloc() to cope with more than just -ENOMEM

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
7 years agox86/hvm: fix interaction between internal and external emulation
Paul Durrant [Tue, 28 Nov 2017 14:05:19 +0000 (14:05 +0000)]
x86/hvm: fix interaction between internal and external emulation

A call to handle_hvm_io_completion() is needed for completing I/O
that requires external emulation. Such completion should be requested when
hvm_vcpu_io_need_completion() returns true after hvm_emulate_once() has
completed. This is indicative of the underlying I/O emulation having
returned X86EMUL_RETRY and hence a re-emulation of the instruction is
needed to pick up the result of the I/O.

A call to handle_hvm_io_completion() is NOT needed when the underlying
I/O has not returned X86EMUL_RETRY since there will be no result to pick
up. Hence it is bogus to request such completion when mmio_retry is set,
since this can only happen if the underlying I/O emulation has returned
X86EMUL_OKAY (meaning the I/O has completed successfully).

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agox86: Avoid corruption on migrate for vcpus using CPUID Faulting
Andrew Cooper [Sat, 25 Nov 2017 15:17:14 +0000 (15:17 +0000)]
x86: Avoid corruption on migrate for vcpus using CPUID Faulting

Xen 4.8 and later virtualises CPUID Faulting support for guests.  However, the
value of MSR_MISC_FEATURES_ENABLES is omitted from the vcpu state, meaning
that the current cpuid faulting setting is lost on migrate/suspend/resume.

Instead of following the MSR status quo, take the opportunity to make the
logic more generic, and in particular, trivial to extend for future MSRs.

This is done by discarding the notion of optional MSRs, and requiring the
toolstack to be prepared to move all of the MSRs, although only a subset will
typically need to move.

This allows for the use of guest_{rd,wr}msr() alone to evaluate whether an MSR
needs moving.  This is a benefit because it means there is a single piece of
logic responsible for evaluating whether a guest can use an MSR, and which
values are acceptable.
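
The sketch below illustrates the "offer every MSR, let the read accessor decide" approach. It uses a hypothetical vCPU model in which only MISC_FEATURES_ENABLES (index 0x140, bit 0 = CPUID faulting) is readable. Skipping MSRs whose value is zero assumes zero is their reset value. The function names are illustrative, not Xen's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MSR_MISC_FEATURES_ENABLES 0x00000140u   /* bit 0: CPUID faulting */

struct msr_entry {
    uint32_t index;
    uint64_t val;
};

/* Hypothetical per-vCPU state: only one MSR is modelled. */
struct vcpu_msrs {
    uint64_t misc_features_enables;
};

/* Stand-in for a guest_rdmsr()-style accessor: 0 on success, -1 if the
 * guest may not access this MSR. */
static int sketch_guest_rdmsr(const struct vcpu_msrs *v, uint32_t msr,
                              uint64_t *val)
{
    switch (msr) {
    case SK_MSR_MISC_FEATURES_ENABLES:
        *val = v->misc_features_enables;
        return 0;
    default:
        return -1;
    }
}

/* Offer every candidate MSR; keep only those the read accessor accepts and
 * whose value is non-zero. Returns the number of entries written. */
static size_t save_msrs(const struct vcpu_msrs *v, const uint32_t *candidates,
                        size_t nr, struct msr_entry *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nr; i++) {
        uint64_t val;

        if (sketch_guest_rdmsr(v, candidates[i], &val) != 0 || !val)
            continue;
        out[n].index = candidates[i];
        out[n].val = val;
        n++;
    }
    return n;
}
```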

One small adjustment to guest_wrmsr() is required to cope with being called in
toolstack context.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoREADME, Makefiles, Config.mk: Update for branching 4.10 vs 4.11-unstable
Ian Jackson [Fri, 1 Dec 2017 15:06:11 +0000 (15:06 +0000)]
README, Makefiles, Config.mk: Update for branching 4.10 vs 4.11-unstable

Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com>
7 years agoRevert "xen/arm: domain_builder: irq sanity check logic fix" 4.10.0-rc7
Andrew Cooper [Wed, 29 Nov 2017 11:45:02 +0000 (11:45 +0000)]
Revert "xen/arm: domain_builder: irq sanity check logic fix"

This reverts commit 11e7dd958de73a45645bd40d82280660bd2c9ee8.

It breaks boot on ARM.

Reported-by: Julien Grall <julien.grall@linaro.org>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
7 years agoxen/arm: domain_builder: irq sanity check logic fix
Stewart Hildebrand [Tue, 28 Nov 2017 14:42:03 +0000 (14:42 +0000)]
xen/arm: domain_builder: irq sanity check logic fix

It's not possible for an irq to be both below 16 and greater than or equal to 32.
Also fix the reference to the Linux documentation while we're at it.

Signed-off-by: Stewart Hildebrand <stewart.hildebrand@dornerworks.com>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-acked-by: Julien Grall <julien.grall@linaro.org>
7 years agoarm64: ITS: fix cacheability adjustment
Andre Przywara [Thu, 16 Nov 2017 12:02:35 +0000 (12:02 +0000)]
arm64: ITS: fix cacheability adjustment

If the host GICv3 redistributor reports that the pending table cannot
use shareable memory, we try to drop the cacheability attributes as
well. However, we fail horribly at computer science 101 bit masking,
effectively clearing the whole register instead of just a few bits.
Fix this by removing the one redundant masking operation and adding the
missing negation to the other, actually needed, operation.
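
To make the masking mistake concrete, the sketch below contrasts `&= mask` with `&= ~mask`. It uses a hypothetical 3-bit cacheability field at bits [9:7], with the value 1 assumed to mean non-cacheable.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout and encoding for this sketch. */
#define CACHE_SHIFT 7
#define CACHE_MASK  (UINT64_C(7) << CACHE_SHIFT)
#define CACHE_NC    UINT64_C(1)   /* assumed "non-cacheable" encoding */

/* Buggy: '&=' with the mask keeps only the field and wipes every other
 * bit of the register. */
static uint64_t drop_cacheability_buggy(uint64_t reg)
{
    reg &= CACHE_MASK;
    reg |= CACHE_NC << CACHE_SHIFT;
    return reg;
}

/* Fixed: '&= ~mask' clears just the field, leaving the rest intact,
 * before the new value is inserted. */
static uint64_t drop_cacheability(uint64_t reg)
{
    reg &= ~CACHE_MASK;
    reg |= CACHE_NC << CACHE_SHIFT;
    return reg;
}
```

For reg = 0x12345280 (field value 5), the buggy variant returns 0x280 while the fixed one returns 0x12345080.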

Reported-by: Manish Jaggi <manish.jaggi@linaro.org>
Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
Reviewed-by: Julien Grall <julien.grall@linaro.org>
Release-Acked-by: Julien Grall <julien.grall@linaro.org>