xenbits.xensource.com Git - people/dwmw2/xen.git/log
Andrew Cooper [Wed, 8 Jan 2020 13:36:42 +0000 (13:36 +0000)]
x86/boot: Rationalise stack handling during early boot

The top (numerically higher addresses) of cpu0_stack[] contains the BSP's
cpu_info block.  Logic in Xen expects this to be initialised to 0, but this
area of stack is also used during early boot.

Update the head.S code to avoid using the cpu_info block.  Additionally,
update the stack_start variable to match, which avoids __high_start() and
efi_arch_post_exit_boot() needing to make the adjustment manually.

Finally, leave a big warning by the BIOS BSS initialisation, because it is by
no means obvious that the stack doesn't survive the REP STOS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 9 Jan 2020 10:09:02 +0000 (11:09 +0100)]
x86/MCE: correct struct mcinfo_extended for compat guests

The use of any kind of pointers in the public interface is wrong,
including dimensioning arrays based on the size of pointers. The least
bad option of addressing the issue looks to be to pin down the number
that the (64-bit) hypervisor has used anyway (even when passing
information to compat but privileged guests). There aren't actual
instantiations of the structure apart from ones allocated dynamically
out of struct mc_info's mi_data[], which is entirely controlled by the
hypervisor.
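To make the problem concrete, the sketch below contrasts an array dimensioned from `sizeof(void *)` with one pinned to the count a 64-bit build produces. The structure names and the `sizeof(void *) * 4` expression are illustrative assumptions, not a copy of the public MCA header.

```c
#include <stdint.h>

/* Hypothetical stand-in for one MSR record in the public interface. */
struct msr_rec {
    uint64_t reg;
    uint64_t value;
};

/*
 * Problematic layout: the entry count follows the compiler's pointer
 * size, so a 32-bit (compat) build sees 16 entries while a 64-bit
 * build sees 32.
 */
struct ext_bad {
    uint32_t mc_msrs;
    struct msr_rec mc_msr[sizeof(void *) * 4];
};

/* Pinned layout: always the count a 64-bit hypervisor uses (8 * 4). */
#define EXT_NR_MSRS 32
struct ext_fixed {
    uint32_t mc_msrs;
    struct msr_rec mc_msr[EXT_NR_MSRS];
};

/* Number of MSR slots each layout provides in the current build. */
unsigned int ext_bad_capacity(void)
{
    return sizeof(((struct ext_bad *)0)->mc_msr) / sizeof(struct msr_rec);
}

unsigned int ext_fixed_capacity(void)
{
    return sizeof(((struct ext_fixed *)0)->mc_msr) / sizeof(struct msr_rec);
}
```

Only the pinned layout gives the same answer whichever ABI the consumer is compiled for.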

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 9 Jan 2020 10:08:29 +0000 (11:08 +0100)]
x86/MCE: avoid leaking stack data

While HYPERVISOR_mca is a privileged operation, we still shouldn't leak
stack contents (the tail of every array entry's mc_msrvalues[] of
XEN_MC_physcpuinfo output). Simply use a zeroing allocation here.

Take the opportunity to also restrict the scope of the local variable involved.
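A user-space sketch of the leak and of the fix. It assumes an output record with a fixed-size MSR array of which only the first `nr` slots are filled, and uses `calloc()` as a stand-in for a zeroing allocator; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_MSR_SLOTS 8   /* hypothetical fixed array size */

/* Hypothetical stand-in for one per-CPU output entry. */
struct cpu_entry {
    uint32_t nr_msrs;                    /* number of valid slots below */
    uint64_t msr_values[NR_MSR_SLOTS];
};

/*
 * Build 'count' entries with 'nr' valid MSR values each.  calloc()
 * zeroes the whole buffer, so the unused tail of every msr_values[]
 * (and any padding) cannot carry stale memory contents to the caller.
 */
struct cpu_entry *collect_entries(size_t count, uint32_t nr)
{
    struct cpu_entry *e = calloc(count, sizeof(*e));

    if (!e)
        return NULL;

    if (nr > NR_MSR_SLOTS)
        nr = NR_MSR_SLOTS;

    for (size_t i = 0; i < count; i++) {
        e[i].nr_msrs = nr;
        for (uint32_t j = 0; j < nr; j++)
            e[i].msr_values[j] = 0x1000 + j;   /* placeholder data */
    }

    return e;
}
```

With a non-zeroing buffer, the slots past `nr_msrs` would hold whatever the memory contained before, which is exactly what gets copied out.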

Reported-by: Ilja Van Sprundel <ivansprundel@ioactive.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Thu, 9 Jan 2020 10:07:38 +0000 (11:07 +0100)]
x86: clear per cpu stub page information in cpu_smpboot_free()

cpu_smpboot_free() removes the stubs for the cpu going offline, but it
isn't clearing the related percpu variables. This will result in
crashes when a stub page is released because all related cpus have gone
offline, and one of those cpus comes online again later.

Fix that by clearing stubs.addr and stubs.mfn in order to allocate a
new stub page when needed, irrespective of whether the CPU gets parked
or removed.
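The following sketch models the fix with hypothetical names and a simplified one-page-per-CPU allocator (the real stub pages can be shared between CPUs). The per-CPU record is cleared on teardown, so the next bring-up allocates a fresh page instead of reusing a stale address.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical per-CPU stub bookkeeping (cf. stubs.addr / stubs.mfn). */
struct stub_info {
    uintptr_t addr;
    unsigned long mfn;
};

static struct stub_info stubs[NR_CPUS];
static unsigned long next_mfn = 100;   /* fake page allocator state */

/* Bring-up: allocate a stub page only if none is recorded for the CPU. */
bool cpu_stub_setup(unsigned int cpu)
{
    if (cpu >= NR_CPUS)
        return false;

    if (stubs[cpu].addr == 0) {
        stubs[cpu].mfn = next_mfn++;
        stubs[cpu].addr = 0x100000 + stubs[cpu].mfn * 0x1000;
    }

    return true;
}

/*
 * Teardown: after the page is released (not modelled here), clear the
 * per-CPU fields whether the CPU is parked or removed.  Without this,
 * a later cpu_stub_setup() would keep using a freed page.
 */
void cpu_stub_free(unsigned int cpu)
{
    if (cpu >= NR_CPUS)
        return;

    stubs[cpu].addr = 0;
    stubs[cpu].mfn = 0;
}

unsigned long cpu_stub_mfn(unsigned int cpu)
{
    return cpu < NR_CPUS ? stubs[cpu].mfn : 0;
}
```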

Fixes: 2e6c8f182c9c50 ("x86: distinguish CPU offlining from CPU removal")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Tao Xu <tao3.xu@intel.com>
Andrew Cooper [Wed, 8 Jan 2020 13:11:13 +0000 (13:11 +0000)]
x86/boot: Simplify BSS zeroing

There is no need to load a non-flat %es to zero the BSS.  Use sym_esi()
instead, which is easier to follow, faster (avoids two segment loads) and
doesn't require use of the stack.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 6 Jan 2020 13:36:30 +0000 (13:36 +0000)]
x86/boot: Map the trampoline as read-only

c/s ec92fcd1d08, which caused the trampoline GDT Access bits to be set,
removed the final writes which occurred between enabling paging and switching
to the high mappings.  There is no plausible need for any memory writes in the
few instructions it takes to perform this transition.

As a consequence, we can remove the RWX mapping of the trampoline.  It is RX
via its identity mapping below 1M, and RW via the directmap.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sat, 28 Dec 2019 14:41:11 +0000 (14:41 +0000)]
x86/boot: Check for E820_RAM earlier when searching the E820

There is no point performing the masking calculations if we are going to
throw the result away.
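A sketch of the reordering, assuming a simplified E820 entry layout and a caller-supplied power-of-two alignment; the function name is hypothetical. The type check rejects non-RAM entries before any rounding arithmetic is done.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM 1   /* E820 type value for usable RAM */

struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Total bytes of 'align'-aligned RAM in the map ('align' must be a
 * power of two).  Non-RAM entries are skipped first, since their
 * masking results would be thrown away anyway.
 */
uint64_t aligned_ram_bytes(const struct e820entry *map, size_t nr,
                           uint64_t align)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nr; i++) {
        if (map[i].type != E820_RAM)
            continue;                    /* cheap check before masking */

        uint64_t start = (map[i].addr + align - 1) & ~(align - 1);
        uint64_t end = (map[i].addr + map[i].size) & ~(align - 1);

        if (end > start)
            total += end - start;
    }

    return total;
}
```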

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Wed, 8 Jan 2020 16:57:16 +0000 (17:57 +0100)]
MAINTAINERS: fix malformed entry

MAINTAINERS entries tagged with "L:" should have a pure mail address
as the second word; otherwise add_maintainers.pl will produce an empty
"Cc:" line. Fix a malformed entry.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Wed, 8 Jan 2020 10:43:24 +0000 (11:43 +0100)]
xen/spinlock: disable spinlock debugging in console_force_unlock()

console_force_unlock() might result in subsequent ASSERT()s triggering
when CONFIG_DEBUG_LOCKS is active. Avoid that by calling
spin_debug_disable() in console_force_unlock() and making the spinlock
debug assertions trigger only if spin_debug is active.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 7 Jan 2020 12:12:51 +0000 (12:12 +0000)]
x86/boot: boot_vid_mode doesn't need to be global

AFAICT, it has never had an external user since its introduction.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 7 Jan 2020 13:41:40 +0000 (13:41 +0000)]
x86/mem_sharing: Fix RANDCONFIG build

Travis reports: https://travis-ci.org/andyhhp/xen/jobs/633751811

  mem_sharing.c:361:13: error: 'rmap_has_entries' defined but not used [-Werror=unused-function]
   static bool rmap_has_entries(const struct page_info *page)
               ^
  cc1: all warnings being treated as errors

This happens in a release build (which disables MEM_SHARING_AUDIT) when
CONFIG_MEM_SHARING is enabled.

Expand both trivial helpers into their single callsite.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tamas K Lengyel <tamas@tklengyel.com>

Anthony PERARD [Thu, 19 Dec 2019 14:42:16 +0000 (14:42 +0000)]
tools: Allow to make *-dir-force-update without ./configure

This also allows running `make src-tarball` without first having to run
`./configure`.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Wei Liu [Tue, 7 Jan 2020 17:17:03 +0000 (17:17 +0000)]
x86/hyperv: drop all __packed from hyperv-tlfs.h

All structures are already naturally aligned. Linux added those
attributes out of paranoia.

In Xen we have had instances where we had to drop pointless __packed
attributes to placate gcc 9 (see ca9310b24e "x86/IO-APIC: fix build with
gcc9"), so it is better to drop those attributes from hyperv-tlfs.h as well.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 7 Jan 2020 17:09:38 +0000 (17:09 +0000)]
x86/hyperv: drop usage of GENMASK_ULL from hyperv-tlfs.h

I'm told that GENMASK_ULL shouldn't be used outside of Arm code in its
current form.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 8 Jan 2020 14:04:36 +0000 (15:04 +0100)]
libxl: don't needlessly report "highmem" in use

Due to the unconditional updating of dom->highmem_end in
libxl__domain_device_construct_rdm(), I've observed the following on a 2GB
HVM guest with a passed-through device (without overly large BARs, and with
no RDM ranges at all):

(d2) RAM in high memory; setting high_mem resource base to 100000000
...
(d2) E820 table:
(d2)  [00]: 00000000:00000000 - 00000000:000a0000: RAM
(d2)  HOLE: 00000000:000a0000 - 00000000:000d0000
(d2)  [01]: 00000000:000d0000 - 00000000:00100000: RESERVED
(d2)  [02]: 00000000:00100000 - 00000000:7f800000: RAM
(d2)  HOLE: 00000000:7f800000 - 00000000:fc000000
(d2)  [03]: 00000000:fc000000 - 00000001:00000000: RESERVED
(d2)  [04]: 00000001:00000000 - 00000001:00000000: RAM

Neither the "RAM in high memory" message nor the zero-sized RAM entry [04]
is appropriate in this case. Arrange for them not to appear.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Wed, 8 Jan 2020 14:03:58 +0000 (15:03 +0100)]
x86/mm: re-order a few conditionals

is_{hvm,pv}_*() can be expensive now, so where possible evaluate cheaper
conditions first.
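A small illustration of the principle with hypothetical predicates: because `&&` short-circuits, putting the cheap test first means the expensive one runs only when it can still affect the result. The call counter stands in for the cost of an is_{hvm,pv}_*()-style query.

```c
#include <stdbool.h>

/* Counts calls to the costly predicate. */
static unsigned int expensive_calls;

/* Hypothetical expensive query; the logic is an arbitrary placeholder. */
static bool expensive_is_pv(int domid)
{
    expensive_calls++;
    return domid % 2 == 0;
}

/* Cheap flag test first: expensive_is_pv() is skipped when bit 0 is clear. */
bool needs_work(unsigned int flags, int domid)
{
    return (flags & 1u) && expensive_is_pv(domid);
}

unsigned int expensive_call_count(void)
{
    return expensive_calls;
}
```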

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 8 Jan 2020 14:03:19 +0000 (15:03 +0100)]
x86/mm: rename and tidy create_pae_xen_mappings()

After dad74b0f9e ("i386: fix handling of Xen entries in final L2 page
table") and the removal of 32-bit support the function doesn't modify
state anymore, and hence its name has been misleading. Change its name,
constify parameters and a local variable, and make it return bool.

Also drop the call to it from mod_l3_entry(): the function explicitly
disallows 32-bit domains from modifying slot 3. This way we also won't
re-check slot 3 when a slot other than slot 3 changes. Doing so has
needlessly disallowed making some L2 table recursively link back to an
L2 used in some L3's 3rd slot, as we check for the type ref count to be
1. (Note that allowing dynamic changes of L3 entries in the way we do is
bogus anyway, as that's not how L3s behave in the native and EPT cases:
They get re-evaluated only upon CR3 reloads. NPT is different in this
regard.)

As a result of this we no longer need to play games to get at the start
of the L3 table.

Additionally move the single remaining call site, allowing us to drop one
is_pv_32bit_domain() invocation and a _PAGE_PRESENT check (in the
function itself), and to exit the loop early (the remaining entries
have all been set to empty just ahead of this loop).

Further move a BUG_ON() such that in the common case its condition
wouldn't need evaluating.

Finally, while at it, move init_xen_pae_l2_slots() next to the
renamed function, as they really belong together (in fact
init_xen_pae_l2_slots() was [indirectly] broken out of this function).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 8 Jan 2020 14:02:26 +0000 (15:02 +0100)]
x86/mm: mod_l<N>_entry() have no need to use __copy_from_user()

mod_l1_entry()'s need to do so went away with commit 2d0557c5cb ("x86:
Fold page_info lock into type_info"), and the other three never had such
a need, at least going back as far as 3.2.0. Replace the uses by
l<N>e_read_atomic().
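The replacement pattern, one atomic read of the whole entry followed by decisions on that snapshot, can be sketched with C11 `<stdatomic.h>`. The `pte_t` type and helper names are hypothetical stand-ins, not Xen's `l<N>e_read_atomic()` implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit page-table entry, as held in a live table. */
typedef struct {
    _Atomic uint64_t raw;
} pte_t;

#define PTE_PRESENT 0x1ull

/* Read the whole entry in a single atomic access, so a concurrent
 * writer can never be observed half-way through an update. */
uint64_t pte_read_atomic(pte_t *p)
{
    return atomic_load_explicit(&p->raw, memory_order_relaxed);
}

/* Example consumer: test the snapshot rather than re-reading memory. */
int pte_is_present(pte_t *p)
{
    uint64_t val = pte_read_atomic(p);

    return (val & PTE_PRESENT) != 0;
}
```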

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Wed, 8 Jan 2020 13:59:25 +0000 (14:59 +0100)]
sched: fix resuming from S3 with smt=0

When resuming from S3 with smt=0 or maxcpus= specified, we must not
do anything in cpu_schedule_callback(). Today this does not hold when
taking down a cpu during resume.

If anything goes wrong during resume, all the scheduler-related error
handling is in cpupool.c, so we can just bail out early from
cpu_schedule_callback() when suspending or resuming.

This fixes commit 0763cd2687897b55e7 ("xen/sched: don't disable
scheduler on cpus during suspend").

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Wei Liu [Tue, 7 Jan 2020 12:06:49 +0000 (12:06 +0000)]
x86/mm: change pl*e to l*t in virt_to_xen_l*e

We will need to have a variable named pl*e when we rewrite
virt_to_xen_l*e. Change pl*e to l*t to better reflect its purpose.
This will make reviewing the later patch easier.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 7 Jan 2020 12:06:46 +0000 (12:06 +0000)]
x86/mm: introduce l{1,2}t local variables to modify_xen_mappings

The pl2e and pl1e variables are heavily (ab)used in that function.  It
is fine at the moment because all page tables are always mapped, so
there is no need to track the lifetime of each variable.

We will soon have the requirement to map and unmap page tables. We
need to track the lifetime of each variable to avoid leakage.

Introduce some l{1,2}t variables with limited scope so that we can
track the lifetime of pointers to Xen page tables more easily.
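The motivation can be sketched with hypothetical `map_table()`/`unmap_table()` helpers that merely count outstanding mappings. Declaring each `l2t` in the narrowest block makes it easy to see that every mapping is released before the pointer goes out of scope.

```c
#include <stdint.h>

#define ENTRIES 512

static uint64_t fake_l2[ENTRIES];   /* stand-in for one page-table page */
static int live_mappings;           /* map_table() calls not yet undone */

/* Hypothetical transient-mapping helpers that only count usage. */
static uint64_t *map_table(void)
{
    live_mappings++;
    return fake_l2;
}

static void unmap_table(uint64_t *table)
{
    (void)table;
    live_mappings--;
}

/* l2t is declared in the narrowest block and unmapped before it ends. */
void set_entry(unsigned int idx, uint64_t val)
{
    if (idx < ENTRIES) {
        uint64_t *l2t = map_table();

        l2t[idx] = val;
        unmap_table(l2t);
    }
}

uint64_t get_entry(unsigned int idx)
{
    uint64_t val = 0;

    if (idx < ENTRIES) {
        uint64_t *l2t = map_table();

        val = l2t[idx];
        unmap_table(l2t);
    }

    return val;
}

int outstanding_mappings(void)
{
    return live_mappings;
}
```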

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 7 Jan 2020 12:06:45 +0000 (12:06 +0000)]
x86/mm: introduce l{1,2}t local variables to map_pages_to_xen

The pl2e and pl1e variables are heavily (ab)used in that function. It
is fine at the moment because all page tables are always mapped, so
there is no need to track the lifetime of each variable.

We will soon have the requirement to map and unmap page tables. We
need to track the lifetime of each variable to avoid leakage.

Introduce some l{1,2}t variables with limited scope so that we can
track the lifetime of pointers to Xen page tables more easily.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Tue, 7 Jan 2020 12:06:43 +0000 (12:06 +0000)]
x86: move some xen mm function declarations

They were put into page.h but mm.h is more appropriate.

The real reason is that I will be adding some new functions which
take mfn_t, and that turns out to be a bit difficult to do in page.h.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 17 Dec 2019 18:20:33 +0000 (18:20 +0000)]
tools/dombuilder: Don't allocate dom->p2m_host[] for translated domains

xc_dom_p2m() and dom->p2m_host[] implement a linear transform for translated
domains, but waste a substantial chunk of RAM doing so.

ARM literally never reads dom->p2m_host[] (because of the xc_dom_translated()
short circuit in xc_dom_p2m()).  Drop it all.

x86 HVM does use dom->p2m_host[] for xc_domain_populate_physmap_exact() calls
when populating 4k pages.  Reuse the tactic already used for 2M/1G ranges and
use an on-stack array instead.  Drop the memory allocation.
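For a translated guest the transform is the identity, so each batch of frame numbers can be generated on the fly in a small on-stack array instead of being read from a guest-sized `dom->p2m_host[]`. This is a sketch only: the `pfn_t` type, the batch size and the populate callback (standing in for xc_domain_populate_physmap_exact()) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t pfn_t;            /* hypothetical frame-number type */
#define POPULATE_BATCH 64          /* hypothetical batch size */

/* Hypothetical stand-in for the populate call; returns 0 on success. */
typedef int (*populate_fn)(const pfn_t *pfns, size_t nr, void *ctx);

/*
 * Populate [base, base + count) with 4k pages.  With an identity
 * transform (gfn == pfn) each batch is built in a small on-stack
 * array instead of being read out of a guest-sized p2m array.
 */
int populate_4k_range(pfn_t base, size_t count, populate_fn populate,
                      void *ctx)
{
    pfn_t extents[POPULATE_BATCH];
    size_t done = 0;

    while (done < count) {
        size_t nr = count - done;
        int rc;

        if (nr > POPULATE_BATCH)
            nr = POPULATE_BATCH;

        for (size_t i = 0; i < nr; i++)
            extents[i] = base + done + i;

        rc = populate(extents, nr, ctx);
        if (rc)
            return rc;

        done += nr;
    }

    return 0;
}
```

Stack usage is bounded by the batch size rather than by guest memory size.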

x86 PV guests do use dom->p2m_host[] as a non-identity transform.  Rename the
field to pv_p2m to make it clear it is PV-only.

No change in the constructed guests.

Reported-by: Varad Gautam <vrd@amazon.de>
Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 17 Dec 2019 17:41:36 +0000 (17:41 +0000)]
tools/dombuilder: Remove p2m_guest from the common interface

In-guest p2m's are a concept specific to x86 PV guests.  alloc_p2m_list() is
the only hook which initialises dom->p2m_guest, making
xc_dom_update_guest_p2m() a nop for non-PV guests.

Move p2m_guest into xc_dom_image_x86 and adjust alloc_p2m_list() to match.

Drop xc_dom_update_guest_p2m() entirely.

One caller, move_l3_below_4G(), only uses it to modify a single entry, so
rewriting the whole guest p2m is wasteful - opencode the single update
instead.  The other caller is common code.  Instead, move the logic into the
setup_pgtables() hooks, which know their own sizeof_pfn and can do away with
the switch statement.

No change in the constructed guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 17 Dec 2019 17:08:22 +0000 (17:08 +0000)]
tools/dombuilder: Remove PV-only, mandatory hooks

Currently, the setup_pgtables() hook is optional, but the alloc_pgtables()
hook is not.  Both are specific to x86 PV guests, and stubbed in various ways
by the dombuilders for translated guests (x86 HVM, ARM).

Make alloc_pgtables() optional, and drop all the stubs for translated guest
types.
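Making a hook optional usually means that common code tests the function pointer before calling it, so translated guest types can leave it NULL instead of supplying a stub. A sketch with a hypothetical hook table; the `setup_meminit` hook and the call order are illustrative only.

```c
#include <stddef.h>

struct dom_image;   /* opaque in this sketch */

/* Hypothetical per-guest-type hook table. */
struct dom_arch_hooks {
    int (*alloc_pgtables)(struct dom_image *dom);   /* optional */
    int (*setup_pgtables)(struct dom_image *dom);   /* optional */
    int (*setup_meminit)(struct dom_image *dom);    /* mandatory */
};

/* Common code: NULL optional hooks are simply skipped. */
int build_image(const struct dom_arch_hooks *h, struct dom_image *dom)
{
    int rc;

    if (h->alloc_pgtables && (rc = h->alloc_pgtables(dom)) != 0)
        return rc;

    rc = h->setup_meminit(dom);
    if (rc)
        return rc;

    if (h->setup_pgtables && (rc = h->setup_pgtables(dom)) != 0)
        return rc;

    return 0;
}
```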

No change in the constructed guests.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Julien Grall <julien@xen.org>
Andrew Cooper [Tue, 17 Dec 2019 17:03:17 +0000 (17:03 +0000)]
tools/dombuilder: xc_dom_x86 cleanup

The two xc_dom_params structures for PV pagetables are never modified and can
live in .rodata.  Reduce their scope to the alloc_pgtable_*() functions which
construct xc_dom_image_x86 appropriately.

Rename {alloc,setup}_pgtables() to {alloc,setup}_pgtables_pv() to highlight
that they are PV only, and drop some _x86() suffixes from static helpers.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Mon, 28 Oct 2019 10:58:02 +0000 (10:58 +0000)]
x86/shim: Short circuit control/hardware checks in PV_SHIM_EXCLUSIVE builds

The net diffstat is:
  add/remove: 0/13 grow/shrink: 25/129 up/down: 6297/-20469 (-14172)

With the following objects/functions removed entirely:
  iommu_hwdom_none                               1       -      -1
  hwdom_max_order                                4       -      -4
  extra_hwdom_irqs                               4       -      -4
  ctldom_max_order                               4       -      -4
  acpi_c1e_quirk                                43       -     -43
  hvm_pirq_eoi                                  62       -     -62
  max_order                                     94       -     -94
  conring_puts                                 104       -    -104
  propagate_node                               119       -    -119
  mmio_ro_emulate_ops                          224       -    -224
  mmcfg_intercept_ops                          224       -    -224
  pci_cfg_ok                                   295       -    -295
  p2m_lock                                     546       -    -546

And the following reduced to stubs:
  arch_iommu_hwdom_init                        852       2    -850
  p2m_add_foreign                              880      16    -864

This patch also has the unintended but useful consequence of stopping
hardware_dom= functionality from being usable (in at least PV_SHIM_EXCLUSIVE
builds).

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Tested-by: Sergey Dyasli <sergey.dyasli@citrix.com>
Andrew Cooper [Fri, 3 Jan 2020 18:31:46 +0000 (18:31 +0000)]
tools/save: Drop unused parameters from xc_domain_save()

XCFLAGS_CHECKPOINT_COMPRESS has been unused since c/s b15bc4345 (2015),
XCFLAGS_HVM since c/s 9e8672f1c (2013), and XCFLAGS_STDVGA since c/s
087d43326 (2007).  Drop the constants, and code which sets them.

The separate hvm parameter (which appeared in c/s d11bec8a1 in 2007 and was
ultimately redundant with XCFLAGS_HVM) is used for sanity checking and debug
printing, then discarded and replaced with Xen's idea of whether the domain is
PV or HVM.

Rearrange the logic in xc_domain_save() to ask Xen slightly earlier, and use a
consistent idea of 'hvm' throughout.  Removing this parameter removes the
final user of libxl's dss->hvm, so drop that field as well.

Update the doxygen comment to be accurate.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <Ian.Jackson@citrix.com>
Andrew Cooper [Fri, 29 Mar 2019 16:51:12 +0000 (16:51 +0000)]
xen/cpupool: Fold error paths in cpupool_create()

The compiler can't fold because of the write to *perr in the first hunk.

No functional change, but slightly better compiled code.
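A generic illustration of folding duplicated error paths into a single exit label. The function, its structure and its failure points are hypothetical, not cpupool_create() itself; as in the original, the error is reported through a `*perr` out-parameter.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct pool {
    int id;
    int *cpus;
};

/*
 * Every failure funnels through one 'fail' label, which frees what was
 * allocated and writes *perr exactly once.
 */
struct pool *pool_create(int id, size_t ncpus, int *perr)
{
    struct pool *p = calloc(1, sizeof(*p));
    int err = -ENOMEM;

    if (!p)
        goto fail;

    p->id = id;

    err = -EINVAL;
    if (id < 0 || ncpus == 0)
        goto fail;

    err = -ENOMEM;
    p->cpus = calloc(ncpus, sizeof(*p->cpus));
    if (!p->cpus)
        goto fail;

    *perr = 0;
    return p;

 fail:
    free(p);        /* free(NULL) is a no-op; p->cpus is NULL here */
    *perr = err;
    return NULL;
}
```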

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Pawel Wieczorkiewicz [Mon, 6 Jan 2020 12:56:23 +0000 (12:56 +0000)]
livepatch: use proper rc variable in livepatch_do_action()

Fix a copy-and-paste bug in the LIVEPATCH_ACTION_REPLACE case of
livepatch_do_action(). In this case, the correct variable for the return
code of the revert action is other->rc.

Coverity-ID: 1457467
Fixes: 6047104c3c ("livepatch: Add per-function applied/reverted state tracking marker")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Mon, 6 Jan 2020 13:26:28 +0000 (13:26 +0000)]
Coverity: Improve model for {,un}map_domain_page()

The first attempt resulted in several "Free of address-of
expression (BAD_FREE)" issues, because of code which relies on the fact that
any pointer in the same page is OK to pass to unmap_domain_page().

Model this property to remove the issues.
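The property being modelled is that unmap_domain_page() accepts any address within the mapped page, not only the one map_domain_page() returned. One way to express it, sketched here with an assumed 4 KiB page size and hypothetical helper names, is to round the pointer down to its page base before comparing:

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ul << PAGE_SHIFT)

/* Round an arbitrary pointer down to the start of its page. */
uintptr_t page_base(const void *ptr)
{
    return (uintptr_t)ptr & ~(uintptr_t)(PAGE_SIZE - 1);
}

/* Hypothetical check: any pointer into the mapped page is acceptable. */
int unmap_page_ok(const void *mapped_base, const void *ptr)
{
    return page_base(ptr) == (uintptr_t)mapped_base;
}
```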

Coverity IDs: 1135356 113536{0,1} 1401300 141809{0,1} 1438864
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 6 Jan 2020 13:22:11 +0000 (13:22 +0000)]
x86/smpboot: Use printk_once() rather than opencoding it

Shrink the text to be less verbose.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Fri, 3 Jan 2020 17:29:35 +0000 (18:29 +0100)]
tools/libxc: disable x2APIC when using nested virtualization

osstest has reported issues when Xen is running nested on
itself and the L1 Xen is using x2APIC. While those are being
investigated, disable announcing the x2APIC feature in CPUID when nested
HVM mode is enabled.
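x2APIC support is advertised in CPUID leaf 1, ECX bit 21. A sketch of masking it out of the guest-visible value when nested HVM is enabled; the function name and the `nested_hvm` flag are hypothetical, not libxc's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_X2APIC (1u << 21)   /* CPUID.01H:ECX[21] */

/* Return the leaf 1 ECX value to expose to the guest. */
uint32_t guest_leaf1_ecx(uint32_t host_ecx, bool nested_hvm)
{
    uint32_t ecx = host_ecx;

    /* Temporary workaround: hide x2APIC from nested-virt guests. */
    if (nested_hvm)
        ecx &= ~CPUID1_ECX_X2APIC;

    return ecx;
}
```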

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Juergen Gross [Thu, 19 Dec 2019 07:42:09 +0000 (08:42 +0100)]
xen: make gdbsx support configurable

Gdbsx support in the hypervisor is rarely used, and it opens an easy
way for dom0 to modify the running hypervisor.

Remove the ability to read/write hypervisor memory; it was never
used by gdbsx.

Add a Kconfig option to control support of gdbsx. Default is on.

While at it, correct a wrong comment in related code and remove dead
code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Thu, 19 Dec 2019 07:42:08 +0000 (08:42 +0100)]
xen: put more code under CONFIG_CRASH_DEBUG

debugger_trap_entry() is not needed without CONFIG_CRASH_DEBUG, so only
include it if CONFIG_CRASH_DEBUG is defined.

While at it, remove CONFIG_HAS_GDBSX, as it can easily be replaced by
CONFIG_CRASH_DEBUG.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 3 Jan 2020 17:06:51 +0000 (17:06 +0000)]
tools/restore: Drop unused parameters from xc_domain_restore()

The hvm and pae parameters are a remnant of legacy migration.  They have 0
passed in from libxl_stream_read.c's process_record(), and are discarded in
xc_domain_restore().

While dropping these, update the doxygen comment to be accurate, and simplify
the other hvm vs pv handling in xc_domain_restore().

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Thu, 2 Jan 2020 14:38:32 +0000 (14:38 +0000)]
x86/boot: Clean up the trampoline transition into Long mode

The jmp after setting %cr0 is redundant with the following ljmp.

The CPUID to protect the jump to higher mappings was inserted due to an
abundance of caution/paranoia before Spectre was public.  It doesn't usefully
protect against an attack that is able to leak memory with one single
instruction's worth of onward speculation.

Only CPU Hotplug (if used at all) will use this path while guests are
executing.  An attacker would have to be running and primed on an adjacent
thread while a hotplug event occurred, to gain one single data sample, and
have some other way of inferring that a hotplug event has occurred, which it
won't know directly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 2 Jan 2020 16:20:17 +0000 (16:20 +0000)]
x86/boot: Drop stale comment

This ought to have disappeared in c/s 60685089cb0

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 2 Jan 2020 13:52:23 +0000 (13:52 +0000)]
xen/efi: Drop infinite loops and use unreachable()/noreturn

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Wei Liu [Wed, 25 Dec 2019 17:58:35 +0000 (17:58 +0000)]
x86: rename guest/hypercall.h to guest/xen-hcall.h

We will provide a header file for Hyper-V hypercalls.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Fri, 27 Dec 2019 17:14:58 +0000 (17:14 +0000)]
x86/hyperv: detect absolutely necessary MSRs

If they are not available, disable Hyper-V related features.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Wei Liu [Sun, 29 Dec 2019 18:29:25 +0000 (18:29 +0000)]
x86: include xen/lib.h in guest/pvh-boot.h

It needs ASSERT_UNREACHABLE.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Fri, 3 Jan 2020 16:06:57 +0000 (17:06 +0100)]
domctl: return EEXIST from XEN_DOMCTL_createdomain...

...if a specified domid is already in use.

XEN_DOMCTL_createdomain allows a domid to be specified by its caller and
will correctly fail if that domid is already in use. However the errno
returned in this case will be EINVAL, making it indistinguishable from
several other failures. Also a value of EINVAL does not seem appropriate
as the specified domid is valid [1] but just not (transiently) available.

[1] any invalid value passed in is ignored and causes Xen to choose an
    unused and valid value.
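A sketch of the distinction, assuming a hypothetical in-use table and a simplified validity rule (the real domid range and allocation logic live in Xen's domain code). A valid but taken domid yields -EEXIST, distinct from argument errors, while an invalid request is ignored and an unused id is chosen, as footnote [1] describes.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_DOMID 16                 /* hypothetical table size */

static bool domid_in_use[MAX_DOMID];

/* Simplified validity rule for this sketch only. */
static bool domid_valid(int domid)
{
    return domid > 0 && domid < MAX_DOMID;
}

int domid_claim(int requested)
{
    if (domid_valid(requested)) {
        if (domid_in_use[requested])
            return -EEXIST;          /* valid, but transiently taken */
        domid_in_use[requested] = true;
        return requested;
    }

    /* Invalid request: ignore it and pick an unused, valid id. */
    for (int id = 1; id < MAX_DOMID; id++) {
        if (!domid_in_use[id]) {
            domid_in_use[id] = true;
            return id;
        }
    }

    return -ENOMEM;                  /* hypothetical exhaustion error */
}
```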

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Fri, 3 Jan 2020 16:06:03 +0000 (17:06 +0100)]
x86/save: reserve HVM save record numbers that have been consumed...

...for patches not (yet) upstream.

This patch simply adds a comment reserving save record number space,
to avoid the risk of clashes between existing downstream changes made by
Amazon and future upstream changes which may be incompatible.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 3 Jan 2020 16:04:41 +0000 (17:04 +0100)]
x86/HVM: use single (atomic) MOV for aligned emulated writes

Using memcpy() may result in multiple individual byte accesses
(depending on how memcpy() is implemented and how the resulting insns,
e.g. REP MOVSB, get carried out in hardware), which isn't what we
want/need for carrying out guest insns as correctly as possible. Fall
back to memcpy() only for accesses not 2, 4, or 8 bytes in size.
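A sketch of the size dispatch, assuming a suitably aligned destination: sizes 2, 4 and 8 go through a single volatile store of matching width (which compilers normally emit as one MOV on x86), and any other size falls back to memcpy(). The function name is hypothetical, and `volatile` here only discourages the compiler from splitting the access; it is not a formal atomicity guarantee.

```c
#include <stdint.h>
#include <string.h>

/* Write 'size' bytes from 'src' to 'dst' ('dst' assumed aligned). */
void emul_write(void *dst, const void *src, unsigned int size)
{
    switch (size) {
    case 2: {
        uint16_t v;

        memcpy(&v, src, sizeof(v));       /* source may be unaligned */
        *(volatile uint16_t *)dst = v;    /* one 16-bit store */
        break;
    }
    case 4: {
        uint32_t v;

        memcpy(&v, src, sizeof(v));
        *(volatile uint32_t *)dst = v;    /* one 32-bit store */
        break;
    }
    case 8: {
        uint64_t v;

        memcpy(&v, src, sizeof(v));
        *(volatile uint64_t *)dst = v;    /* one 64-bit store */
        break;
    }
    default:
        memcpy(dst, src, size);           /* no single-MOV form */
        break;
    }
}
```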

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Chad Dougherty [Thu, 2 Jan 2020 18:10:51 +0000 (18:10 +0000)]
tools/xl/xl_cmdtable.c: Fix a simple typo.

Signed-off-by: Chad Dougherty <crd@acm.org>
Ian Jackson [Fri, 13 Dec 2019 17:01:44 +0000 (17:01 +0000)]
docs/process/branching-checklist: Fix a broken rune

cr-daily-branch ought to be called via cr-for-branches so that we take
the lock.  Otherwise strange things can occur if cron runs
cr-daily-branch in the same directory - in particular, it is likely to
update the osstest revision, breaking everything.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Fri, 26 Apr 2019 15:53:27 +0000 (16:53 +0100)]
xen/tasklet: Switch data parameter from unsigned long to void *.

Most users pass a vcpu pointer, and only stopmachine_action() takes an integer
parameter.  Switch to using void * to substantially reduce the number of
explicit casts.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 11 Apr 2019 12:54:36 +0000 (13:54 +0100)]
xen/tasklet: Fix return value truncation on arm64

The use of return_reg() assumes ARM's 32bit ABI.  Therefore, a failure such as
-EINVAL will appear as a large positive number near 4 billion to a 64bit ARM
guest which happens to use continue_hypercall_on_cpu().

Introduce a new arch_hypercall_tasklet_result() hook which is implemented by
both architectures, and drop the return_reg() macros.  This logic will be
extended in a later change to make continuations out of the tasklet work.
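The truncation is easy to reproduce in user space by modelling the 32-bit return_reg() as a cast to a 32-bit register; the function names are hypothetical. With EINVAL equal to 22, as on Linux and in Xen's public errno list, a 64-bit guest would see 2^32 - 22 = 4294967274 instead of -22.

```c
#include <errno.h>
#include <stdint.h>

/* What a 64-bit guest sees if only the low 32 bits of rc are kept. */
uint64_t truncated_return(long rc)
{
    uint32_t reg32 = (uint32_t)rc;   /* models the 32-bit return_reg() */

    return reg32;                    /* zero-extended, not sign-extended */
}

/* Returning the full register width preserves the negative errno. */
uint64_t full_return(long rc)
{
    return (uint64_t)rc;
}
```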

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien@xen.org>
Andrew Cooper [Thu, 31 May 2018 17:50:50 +0000 (18:50 +0100)]
x86/debug: Plumb pending_dbg through the monitor and devicemodel interfaces

Like %cr2 for pagefaults, %dr6 contains ancillary information for debug
exceptions, and needs similar handling.

For xendevicemodel_inject_event(), no ABI change is needed (although an API
one would be ideal).  Switch from 'cr2' to 'extra' in variable names which
don't constitute an API change, and update the documentation to match.

For the monitor interface, vm_event_debug needs extending with a pending_dbg
field.  This shall behave like the VT-x PENDING_DBG control.  Extend
hvm_monitor_debug() and for now, always pass in 0 - this will be fixed
eventually, when other hypervisor bugfixes are complete.

While modifying hvm_monitor_debug(), take the opportunity to correct trap type
and instruction length from unsigned long to unsigned int, as they are both
tiny values.

Finally, adjust xen-access.c to the new expectations.  Introspection tools
intercepting debug exceptions should mirror the new pending_dbg field into
xendevicemodel_inject_event() for %dr6 to be processed correctly for the
guest.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Petre Pircalabu <ppircalabu@bitdefender.com>
5 years agotools/libxc: Fix HVM_PARAM_PAE_ENABLED handling in xc_cpuid_apply_policy()
Andrew Cooper [Fri, 20 Dec 2019 15:26:00 +0000 (15:26 +0000)]
tools/libxc: Fix HVM_PARAM_PAE_ENABLED handling in xc_cpuid_apply_policy()

Despite what c/s 685e922d6f3 suggested, not all HVM_PARAMs are handled
in the same way.  HVM_PARAM_PAE_ENABLED is a toolstack-only value, and
xc_cpuid_apply_policy() used to be its only consumer.

Reinstate the old behaviour (mad as it is) to avoid regressions.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/boot: Fold gdt_48 into the bottom of trampoline_gdt
Andrew Cooper [Mon, 19 Aug 2019 13:16:53 +0000 (14:16 +0100)]
x86/boot: Fold gdt_48 into the bottom of trampoline_gdt

Saves 8 bytes in the trampoline.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/boot: Reposition trampoline data
Andrew Cooper [Mon, 19 Aug 2019 13:16:53 +0000 (14:16 +0100)]
x86/boot: Reposition trampoline data

... to separate code from data.  In particular, trampoline_realmode_entry's
write to trampoline_cpu_started clobbers the I-cache line containing
trampoline_protmode_entry, which won't be great for AP startup performance.

Reformat the comments for trampoline_gdt to reduce their volume.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/mm: drop redundant smp_wmb() from _put_final_page_type()
Jan Beulich [Fri, 27 Dec 2019 09:02:48 +0000 (10:02 +0100)]
x86/mm: drop redundant smp_wmb() from _put_final_page_type()

get_page_light()'s use of cmpxchg() is a full barrier already anyway.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/mm: avoid IOMMU operations in more cases in _get_page_type()
Jan Beulich [Fri, 27 Dec 2019 09:01:43 +0000 (10:01 +0100)]
x86/mm: avoid IOMMU operations in more cases in _get_page_type()

All that really matters is whether writability of a page changes; in
particular e.g. page table -> page table (but different levels)
transitions do not require unmapping the page from the IOMMU again.

Note that the XSA-288 fix already arranged for PGT_none pages not to
need special consideration here.
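The rule "only a change in writability matters" can be sketched as a single predicate. The enum below is a simplified, hypothetical model of page types; Xen's real PGT_* values are bit fields in `type_info`.

```c
#include <stdbool.h>

/* Hypothetical, simplified page types. */
enum page_type {
    PGT_none,
    PGT_l1_page_table,
    PGT_l2_page_table,
    PGT_seg_desc_page,
    PGT_writable_page,
};

/* An IOMMU (un)map is needed only when the page's writability changes;
 * e.g. L1 -> L2 page table transitions need no IOMMU operation. */
static bool needs_iommu_op(enum page_type old_type, enum page_type new_type)
{
    return (old_type == PGT_writable_page) != (new_type == PGT_writable_page);
}
```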

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: move vgc_flags to struct pv_vcpu
Jan Beulich [Fri, 27 Dec 2019 08:57:05 +0000 (09:57 +0100)]
x86: move vgc_flags to struct pv_vcpu

There's been effectively no use of the field for HVM.

Also shrink the field to unsigned int, even if this doesn't immediately
yield any space benefit for the structure itself. The resulting 32-bit
padding slot can eventually be used for some other field. The change in
size makes accesses slightly more efficient though, as no REX.W prefix
is going to be needed anymore on the respective insns.

Mirror the HVM side change here (dropping the setting of the field to
VGCF_online) on Arm as well, on the assumption that the Arm code was
originally cloned from it. VGCF_online really should simply and consistently be
the guest view of the inverse of VPF_down, and hence needs representing
only in the get/set vCPU context interfaces.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: move and rename NR_VECTORS
Jan Beulich [Fri, 27 Dec 2019 08:56:04 +0000 (09:56 +0100)]
x86: move and rename NR_VECTORS

This is an architectural definition, so move it to x86-defns.h and add
an X86_ prefix. This in particular allows removing the inclusion of
irq_vectors.h by virtually every source file, due to irq.h and
hvm/vmx/vmcs.h having needed to include it: Changes to IRQ vector usage
shouldn't really trigger full rebuilds.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: re-use legacy vector ranges on APs
Jan Beulich [Fri, 27 Dec 2019 08:54:59 +0000 (09:54 +0100)]
x86/IRQ: re-use legacy vector ranges on APs

The legacy vectors have been actively used on CPU 0 only. CPUs not
sharing vector space with CPU 0 can easily re-use them, slightly
increasing the relatively scarce resource of total vectors available in
the system. As a result the legacy vector range simply becomes a
sub-range of the dynamic one, with an extra check performed in
_assign_irq_vector() (we can't rely on the
"per_cpu(vector_irq, new_cpu)[vector] >= 0" check in the subsequent
loop, as we need to also exclude vectors of disabled legacy IRQs).

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: flip legacy and dynamic vector ranges
Jan Beulich [Fri, 27 Dec 2019 08:54:19 +0000 (09:54 +0100)]
x86/IRQ: flip legacy and dynamic vector ranges

There's no reason to have the PIC vectors (which are typically entirely
unused on 64-bit systems anyway) right below the high priority ones. Put
them in the lowest possible range, and shift the dynamic vector range up
accordingly. This is to reduce the priority of PIC vectors in the LAPIC
vs all other ones.
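The local APIC ranks pending interrupts by priority class, which is the upper four bits of the vector, so placing the PIC vectors in the lowest range available to external interrupts (vectors below 0x20 are reserved for exceptions) gives them the lowest class. A minimal sketch; the vectors in the test are illustrative, not Xen's actual allocation.

```c
#include <stdint.h>

/* Priority class of an interrupt vector: its upper nibble. */
static unsigned int apic_priority_class(uint8_t vector)
{
    return vector >> 4;
}
```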

Note that irq_move_cleanup_interrupt(), despite using
FIRST_DYNAMIC_VECTOR, does not get touched, as PIC interrupts aren't
movable.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: simplify pending EOI stack logic for internally used IRQs
Jan Beulich [Fri, 27 Dec 2019 08:53:35 +0000 (09:53 +0100)]
x86/IRQ: simplify pending EOI stack logic for internally used IRQs

In 5655ce8b1ec2 ("x86/IRQ: make internally used IRQs also honor the
pending EOI stack") it was mentioned that both the check_eoi_deferral
per-CPU variable and the cpu_has_pending_apic_eoi() guard were added just to
have as little impact on existing behavior as possible, to reduce the
risk of a last minute regression in 4.13.

Upon closer inspection, dropping the variable is an option only if all
callers of ->end() would assume the responsibility of also calling
flush_ready_eoi(). Therefore only drop the cpu_has_pending_apic_eoi()
guard now.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: move and rename __do_IRQ_guest()
Jan Beulich [Fri, 27 Dec 2019 08:52:41 +0000 (09:52 +0100)]
x86/IRQ: move and rename __do_IRQ_guest()

This places it next to do_IRQ(). Beyond the actual code movement,
this:
- drops the leading underscores,
- passes in desc and vector, rather than irq,
- flips the order of two ASSERT()s,
- changes i and sp to unsigned int,
- restricts the scope of d and sp,
- corrects style.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/IRQ: move do_IRQ()
Jan Beulich [Fri, 27 Dec 2019 08:51:52 +0000 (09:51 +0100)]
x86/IRQ: move do_IRQ()

This is to avoid forward declarations of static functions. Beyond the
actual code movement this does
- u8 -> uint8_t,
- convert to Xen style,
- drop unnecessary parentheses and the like,
- strip trailing white space.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/hvm/rtc: preserved guest RTC offset during suspend/resume/migrate
Paul Durrant [Fri, 27 Dec 2019 08:50:31 +0000 (09:50 +0100)]
x86/hvm/rtc: preserved guest RTC offset during suspend/resume/migrate

The emulated RTC is synchronized with the PV wallclock; any write to the
RTC will update struct domain's 'time_offset_seconds' field and call
update_domain_wallclock().

However, the value of 'time_offset_seconds' is not preserved in any save
record and indeed, when the RTC save record is loaded, the CMOS values
will be updated based on an offset value which may or may not have been
set by the toolstack [1]. This may result in making bogus values available
to the guest and messing up any calculations done in the call to
alarm_timer_update() at the end of rtc_load().

This patch extends the RTC save record to contain an offset value, which
will be zero filled on load of an older record. The 'time_offset_seconds'
field in struct domain is also modified into a 'time_offset' struct,
containing a 'seconds' field and a boolean 'set' field.

The code in rtc_load() then uses the new value in the save record to
update the value of struct domain's 'time_offset.seconds' unless
'time_offset.set' is true, which will only be the case if the toolstack has
already performed a XEN_DOMCTL_settimeoffset.
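The precedence rule in rtc_load() can be sketched as follows. `struct time_offset` mirrors the 'seconds' and 'set' fields described above; the field types, `struct rtc_save_record`, its `rtc_offset` member and `rtc_load_offset()` are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the new struct domain field. */
struct time_offset {
    int64_t seconds;
    bool set;             /* true once the toolstack did XEN_DOMCTL_settimeoffset */
};

/* Hypothetical subset of the extended RTC save record. */
struct rtc_save_record {
    int64_t rtc_offset;   /* zero-filled when loading an older record */
};

static void rtc_load_offset(struct time_offset *to,
                            const struct rtc_save_record *rec)
{
    /* An explicit toolstack setting takes precedence over the record. */
    if ( !to->set )
        to->seconds = rec->rtc_offset;
}
```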

[1] There is currently no way for a toolstack to read the value of
    'time_offset_seconds' from struct domain. In the past, any hope of
    preservation of the value across a guest life-cycle operation relied
    on qemu-dm writing a value into xenstore whenever the RTC was updated,
    in response to an IOREQ with type IOREQ_TYPE_TIMEOFFSET being sent by
    Xen; see:

    https://xenbits.xen.org/gitweb/?p=qemu-xen-traditional.git;a=blob;f=i386-dm/helper2.c#l457

    but this behaviour was never forward-ported into upstream QEMU, which
    completely ignores that IOREQ type.
    In either case, nothing in xl or libxl ever samples the value of
    the RTC offset from xenstore, so any offset adjustment to a non-zero value
    performed by the guest (which in the case of Windows is highly likely
    as it normally writes RTC in local time, whereas Xen maintains time in
    UTC) is completely lost with the de-facto toolstack, and always has
    been. Instead, PV drivers are relied upon to paper over this gaping
    hole.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
5 years agox86/vvmx: virtualize x2APIC mode and APIC accesses can't both be enabled
Roger Pau Monne [Tue, 24 Dec 2019 15:32:47 +0000 (16:32 +0100)]
x86/vvmx: virtualize x2APIC mode and APIC accesses can't both be enabled

According to the Intel SDM, "virtualize x2APIC mode" and "virtualize
APIC accesses" can't be enabled at the same time, or else a
VM entry failure occurs. This was seen when running Xen
nested and with x2APIC enabled:

  (XEN) d3v0 VMLAUNCH error: 0x7
  [...]
  (XEN) *** Control State ***
  (XEN) PinBased=0000003f CPUBased=b6a075fe SecondaryExec=000014fb
  [...]

Fix this by making sure nvmx_update_secondary_exec_control() clears the
incompatible bits from the host VMCS before merging it with the nested
VMCS.
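Decoding the logged SecondaryExec=000014fb shows both bit 0 ("virtualize APIC accesses") and bit 4 ("virtualize x2APIC mode") set, the combination behind VM-instruction error 7 (invalid control field). The merge below is a sketch under the assumption that any APIC-virtualization mode requested by the nested controls overrides the host's; `merge_secondary()` is hypothetical, not the actual body of nvmx_update_secondary_exec_control().

```c
#include <stdint.h>

/* Secondary processor-based VM-execution control bits (Intel SDM). */
#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001u  /* bit 0 */
#define SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE   0x00000010u  /* bit 4 */

#define APIC_VIRT_BITS (SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | \
                        SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)

/* VM entry fails if both controls are set. */
static int controls_conflict(uint32_t ctl)
{
    return (ctl & APIC_VIRT_BITS) == APIC_VIRT_BITS;
}

/* Drop the host's APIC virtualization bits before OR-ing in the nested
 * ones, whenever the nested controls request either mode. */
static uint32_t merge_secondary(uint32_t host, uint32_t nested)
{
    if ( nested & APIC_VIRT_BITS )
        host &= ~APIC_VIRT_BITS;

    return host | nested;
}
```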

This fixes a regression reported by osstest in the
test-amd64-amd64-qemuu-nested-intel job.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agolibxc/migration: Drop unimplemented domain types
Andrew Cooper [Tue, 17 Dec 2019 17:49:47 +0000 (17:49 +0000)]
libxc/migration: Drop unimplemented domain types

x86 PVH is completely obsolete - it was intended for legacy PVH before that
idea was abandoned.  There was an RFC series for ARM in 2015, but there is
plenty of outstanding work which hasn't been done yet.

No functional change.  New types can be (re)introduced with the code which
actually implements them.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien@xen.org>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxc/migration: Rename TSC_INFO to X86_TSC_INFO
Andrew Cooper [Tue, 17 Dec 2019 13:38:14 +0000 (13:38 +0000)]
libxc/migration: Rename TSC_INFO to X86_TSC_INFO

This record is specific to x86, and should have had a prefix to begin with.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agodocs/migration: Remove numbering for typical records
Andrew Cooper [Mon, 16 Dec 2019 17:15:23 +0000 (17:15 +0000)]
docs/migration: Remove numbering for typical records

The numbers aren't referenced directly, and explicit numbering makes an
unnecessarily large diff when inserting something new in the middle.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxc/restore: Don't duplicate state in process_vcpu_basic()
Andrew Cooper [Wed, 18 Dec 2019 19:43:18 +0000 (19:43 +0000)]
libxc/restore: Don't duplicate state in process_vcpu_basic()

vcpu_guest_context_any_t is currently allocated on the stack, and copied from
a mutable buffer which is freed immediately after its use here.  Mutate the
buffer in place instead of duplicating it.

The code is as it is due to how it was developed.  Originally,
process_vcpu_basic() operated on a const pointer from the X86_VCPU_BASIC
record, but during upstreaming, the addition of Remus support required
buffering of X86_VCPU_BASIC records on each checkpoint.

By the time process_vcpu_basic() runs, we are committed to completing state
restoration and unpausing the guest.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agogolang/xenlight: implement array C to Go marshaling
Nick Rosbrook [Mon, 23 Dec 2019 15:17:02 +0000 (10:17 -0500)]
golang/xenlight: implement array C to Go marshaling

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agogolang/xenlight: add error return type to Context.Cpupoolinfo
Nick Rosbrook [Mon, 23 Dec 2019 15:17:07 +0000 (10:17 -0500)]
golang/xenlight: add error return type to Context.Cpupoolinfo

A previous commit that removed Context.CheckOpen revealed
an ineffectual assignment to err in Context.Cpupoolinfo, as
there is no error return type.

Since it appears that the intent is to return an error here,
add an error return value to the function signature.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agogolang/xenlight: revise use of Context type
Nick Rosbrook [Mon, 23 Dec 2019 15:17:06 +0000 (10:17 -0500)]
golang/xenlight: revise use of Context type

Remove the exported global context variable, 'Ctx'. Generally, it is
better to not export global variables for use through a Go package.
However, there are some exceptions that can be found in the standard
library.

Add a NewContext function instead, and remove the Open, IsOpen, and
CheckOpen functions as a result.

Also, comment-out an ineffectual assignment to 'err' inside the function
Context.CpupoolInfo so that compilation does not fail.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agoMAINTAINERS: put hyperv-tlfs.h under viridian maintainership
Wei Liu [Mon, 23 Dec 2019 12:51:43 +0000 (12:51 +0000)]
MAINTAINERS: put hyperv-tlfs.h under viridian maintainership

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Wei Liu <wl@xen.org>
Acked-by: Paul Durrant <paul@xen.org>
5 years agox86emul: introduce CASE_SIMD_..._FP_VEX()
Jan Beulich [Mon, 23 Dec 2019 13:16:11 +0000 (14:16 +0100)]
x86emul: introduce CASE_SIMD_..._FP_VEX()

Since there are many AVX{,2} insns having legacy SIMD counterparts, have
macros covering both in one go. This (imo) improves readability and helps
prepare for optionally disabling SIMD support in the emulator.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: drop CASE_SIMD_DOUBLE_FP()
Jan Beulich [Mon, 23 Dec 2019 13:15:17 +0000 (14:15 +0100)]
x86emul: drop CASE_SIMD_DOUBLE_FP()

It's used only by CASE_SIMD_ALL_FP(), which can equally well be
implemented in terms of CASE_SIMD_{PACKED,SCALAR}_FP().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86emul: introduce CASE_SIMD_PACKED_INT_VEX()
Jan Beulich [Mon, 23 Dec 2019 13:13:37 +0000 (14:13 +0100)]
x86emul: introduce CASE_SIMD_PACKED_INT_VEX()

Since there are many AVX{,2} insns having legacy MMX and SIMD
counterparts, have a macro covering all three in one go. This (imo)
improves readability (simply by the shrunk number of lines) and helps
prepare for optionally disabling MMX and SIMD support in the emulator.
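The idea of one macro expanding to several `case` labels can be sketched with a hypothetical opcode encoding (low byte = opcode, plus flag bits for the 0x66-prefixed SSE form and the VEX form). The encoding and macro names are illustrative only and do not match x86emul's X86EMUL_OPC*() definitions.

```c
#include <stdint.h>

/* Hypothetical encoding flags. */
#define OPC_66  0x10000u   /* 0x66-prefixed (SSE) form */
#define OPC_VEX 0x20000u   /* VEX-encoded (AVX{,2}) form */

#define OPC_MMX(op) ((uint32_t)(op))
#define OPC_SSE(op) (OPC_MMX(op) | OPC_66)
#define OPC_AVX(op) (OPC_MMX(op) | OPC_66 | OPC_VEX)

/* Analogue of CASE_SIMD_PACKED_INT_VEX(): MMX, SSE and AVX forms of one
 * packed-integer operation in a single line. */
#define CASE_PACKED_INT_VEX(op) \
    case OPC_MMX(op): case OPC_SSE(op): case OPC_AVX(op)

static int is_paddd_family(uint32_t opc)
{
    switch ( opc )
    {
    CASE_PACKED_INT_VEX(0xfe):   /* 0F FE: paddd / vpaddd */
        return 1;
    default:
        return 0;
    }
}
```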

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/hyperv: change hv_tlb_flush_ex to fix clang build
Wei Liu [Mon, 23 Dec 2019 11:03:30 +0000 (11:03 +0000)]
x86/hyperv: change hv_tlb_flush_ex to fix clang build

Clang complains:

In file included from synic.c:15:
/builds/xen-project/xen/xen/include/asm/guest/hyperv-tlfs.h:900:18: error: field 'hv_vp_set' with variable sized type 'struct hv_vpset' not at the end of a struct or class is a GNU extension [-Werror,-Wgnu-variable-sized-type-not-at-end]
        struct hv_vpset hv_vp_set;
                        ^
1 error generated.
/builds/xen-project/xen/xen/Rules.mk:198: recipe for target 'synic.o' failed
make[6]: *** [synic.o] Error 1

Comment out the last variable-sized array member from hv_tlb_flush_ex to fix
clang builds.

Fixes: bbba482664 ("x86: import hyperv-tlfs.h from Linux")
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/viridian: drop viridian_stimer_config_msr
Wei Liu [Sun, 22 Dec 2019 23:12:15 +0000 (23:12 +0000)]
x86/viridian: drop viridian_stimer_config_msr

Use hv_stimer_config instead. No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agox86/viridian: drop virdian_sint_msr
Wei Liu [Sun, 22 Dec 2019 23:06:00 +0000 (23:06 +0000)]
x86/viridian: drop virdian_sint_msr

Use hv_synic_sint in hyperv-tlfs.h instead.

This requires adding the missing "polling" member to hv_synic_sint.

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agox86/viridian: drop a wrong invalid value from reference TSC implementation
Wei Liu [Fri, 20 Dec 2019 21:08:28 +0000 (21:08 +0000)]
x86/viridian: drop a wrong invalid value from reference TSC implementation

The only invalid value mentioned in Hyper-V TLFS 5.0c is 0. Michael
Kelley confirmed that 0xFFFFFFFF was never used [0].

[0] https://lists.xen.org/archives/html/xen-devel/2019-12/msg01564.html

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agox86: move viridian_guest_os_id_msr to hyperv-tlfs.h
Wei Liu [Fri, 20 Dec 2019 19:43:59 +0000 (19:43 +0000)]
x86: move viridian_guest_os_id_msr to hyperv-tlfs.h

Suggested-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agox86: provide and use hv_tsc_scale
Wei Liu [Fri, 20 Dec 2019 19:18:16 +0000 (19:18 +0000)]
x86: provide and use hv_tsc_scale

The Hyper-V clock source and Xen's own viridian code need the same
functionality.

Move the function in viridian/time.c to hyperv.h and use it in both
places.
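The Hyper-V TLFS defines the reference time as ((VirtualTsc * TscScale) >> 64) + TscOffset, where TscScale is a 0.64 fixed-point fraction. A minimal sketch of the scaling step, assuming a compiler with `unsigned __int128` (GCC/Clang); `scale_tsc_sketch()` is a hypothetical name, not Xen's hv_tsc_scale() signature.

```c
#include <stdint.h>

/* 64x64 -> 128-bit multiply, keeping the upper 64 bits. */
static uint64_t scale_tsc_sketch(uint64_t tsc, uint64_t scale)
{
    return (uint64_t)(((unsigned __int128)tsc * scale) >> 64);
}
```

For example, a scale of 1 << 63 represents 0.5, so a TSC delta of 1000 maps to 500.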

No functional change.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agox86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c
Wei Liu [Tue, 17 Dec 2019 18:28:39 +0000 (18:28 +0000)]
x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c

Use the one defined in hyperv-tlfs.h instead. No functional change
intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agox86/viridian: drop duplicate defines from private.h and viridian.c
Wei Liu [Tue, 17 Dec 2019 17:20:01 +0000 (17:20 +0000)]
x86/viridian: drop duplicate defines from private.h and viridian.c

Also add HVCALL_EXT_CALL_QUERY_CAPABILITIES to hyperv-tlfs.h.
HvGetPartitionID was never used in code, so just drop it.

No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years agox86: Hyper-V clock source's offset should be signed
Wei Liu [Fri, 20 Dec 2019 19:47:49 +0000 (19:47 +0000)]
x86: Hyper-V clock source's offset should be signed

Also drop the useless inline keyword.

Fixes: 685d16bd5 (x86: implement Hyper-V clock source)
Signed-off-by: Wei Liu <liuwe@microsoft.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agolivepatch: Fix typos and other errors in tests Makefile
Pawel Wieczorkiewicz [Fri, 20 Dec 2019 18:23:39 +0000 (18:23 +0000)]
livepatch: Fix typos and other errors in tests Makefile

There were a number of typos (s/actions/action/) as well as one missing
config.h target dependency. Also, the xen_expectation target has an
unnecessary cyclic dependency.

Fixes: 25164571fc ('Merge branch 'livepatch.aws.v6' into staging')
Signed-off-by: Pawel Wieczorkiewicz <wipawel@amazon.de>
Tested-by: Julien Grall <julien@xen.org>
5 years agox86/viridian: drop private copy of definitions from synic.c
Wei Liu [Wed, 18 Dec 2019 14:42:30 +0000 (14:42 +0000)]
x86/viridian: drop private copy of definitions from synic.c

Use hyperv-tlfs.h instead. No functional change intended.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years agox86: implement Hyper-V clock source
Wei Liu [Thu, 24 Oct 2019 14:54:15 +0000 (15:54 +0100)]
x86: implement Hyper-V clock source

Implement a clock source using Hyper-V's reference TSC page.
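A reader of the reference TSC page follows a sequence-counter protocol: a sequence of 0 marks the page invalid (per TLFS 5.0c), and the scale/offset pair is re-read if the sequence changes underneath. This is a sketch assuming the TLFS header layout and GCC/Clang's `unsigned __int128`; for testability the caller passes in a TSC sample, whereas a real implementation reads the TSC (with barriers) between the two sequence loads.

```c
#include <stdbool.h>
#include <stdint.h>

/* Header of the reference TSC page, per the Hyper-V TLFS. */
struct ref_tsc_page {
    volatile uint32_t tsc_sequence;  /* 0 means "invalid, don't use" */
    uint32_t reserved1;
    volatile uint64_t tsc_scale;     /* 0.64 fixed-point multiplier */
    volatile int64_t tsc_offset;     /* signed */
};

/* Returns false if the page is invalid and another clock is needed. */
static bool ref_tsc_read(const struct ref_tsc_page *p, uint64_t tsc,
                         uint64_t *out)
{
    uint32_t seq;
    uint64_t scale;
    int64_t offset;

    do {
        seq = p->tsc_sequence;
        if ( seq == 0 )
            return false;
        scale = p->tsc_scale;
        offset = p->tsc_offset;
    } while ( p->tsc_sequence != seq );   /* page updated: retry */

    *out = (uint64_t)(((unsigned __int128)tsc * scale) >> 64) + offset;
    return true;
}
```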

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/hyperv: extract more information from Hyper-V
Wei Liu [Thu, 24 Oct 2019 13:22:53 +0000 (14:22 +0100)]
x86/hyperv: extract more information from Hyper-V

Provide a structure to store that information. The structure will be
accessed from other places later so make it public.

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agox86: import hyperv-tlfs.h from Linux
Wei Liu [Thu, 24 Oct 2019 11:17:03 +0000 (12:17 +0100)]
x86: import hyperv-tlfs.h from Linux

Take a pristine copy from Linux commit b2d8b167e15bb5ec2691d1119c025630a247f649.

Do the following to fix it up for Xen:

1. include xen/types.h and xen/bitops.h
2. fix up invocations of BIT macro

Signed-off-by: Wei Liu <liuwe@microsoft.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agotools/libxc: Drop unused xc_compression_*()
Andrew Cooper [Thu, 19 Dec 2019 14:51:31 +0000 (14:51 +0000)]
tools/libxc: Drop unused xc_compression_*()

There have been no users of the xc_compression_*() interface since Migration
v2 replaced legacy migration (2015, c/s b15bc4345).

It would need adjusting to fit into migration v2, and can be pulled out of git
history if someone wants to resurrect it in the future.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agotools/libxc: Drop other examples of the 'goto x; } else if' antipattern
Andrew Cooper [Wed, 18 Dec 2019 22:08:02 +0000 (22:08 +0000)]
tools/libxc: Drop other examples of the 'goto x; } else if' antipattern

None of these are buggy, but the resulting code is more robust.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agox86emul: use CASE_SIMD_PACKED_INT() where possible
Jan Beulich [Fri, 20 Dec 2019 15:46:20 +0000 (16:46 +0100)]
x86emul: use CASE_SIMD_PACKED_INT() where possible

This (imo) improves readability (simply by the shrunk number of lines)
and helps prepare for optionally disabling MMX and SIMD support in the
emulator.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/vm_event: add short-circuit for breakpoints (aka "fast single step")
Sergey Kovalev [Fri, 20 Dec 2019 15:45:32 +0000 (16:45 +0100)]
x86/vm_event: add short-circuit for breakpoints (aka "fast single step")

When using DRAKVUF (or another system using altp2m with shadow pages similar
to what is described in
https://xenproject.org/2016/04/13/stealthy-monitoring-with-xen-altp2m),
after a breakpoint is hit the system switches to the default
unrestricted altp2m view with singlestep enabled. When the singlestep
traps to Xen another vm_event is sent to the monitor agent, which then
normally disables singlestepping and switches the altp2m view back to
the restricted view.

This patch short-circuits that last part, so that Xen doesn't need to send
the vm_event out for the singlestep event and instead switches back to the
restricted view automatically.

This optimization yields a speed-up of about 35%.

Tested on the Debian branch of Xen 4.12; see:
https://github.com/skvl/xen/tree/debian/knorrie/4.12/fast-singlestep

Rebased on master:
https://github.com/skvl/xen/tree/fast-singlestep

Signed-off-by: Sergey Kovalev <valor@list.ru>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
5 years agox86/time: update vtsc_last with cmpxchg and drop vtsc_lock
Igor Druzhinin [Fri, 20 Dec 2019 15:44:38 +0000 (16:44 +0100)]
x86/time: update vtsc_last with cmpxchg and drop vtsc_lock

Now that vtsc_last is the only entity protected by vtsc_lock we can
simply update it using a single atomic operation and drop the spinlock
entirely. This is extremely important for the case of running nested
(e.g. shim instance with lots of vCPUs assigned) since if preemption
happens somewhere inside the critical section, that would immediately
mean that other vCPUs stop making progress (and are probably preempted
as well) while waiting for the spinlock to be freed.
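A lock-free update of a strictly increasing "last value handed out" can be sketched with a C11 compare-and-swap loop. The assumed semantics (hand out `now` if it is newer, otherwise `last + 1`) mirror a monotonic virtual TSC; `vtsc_update()` is a hypothetical name, and Xen itself uses its own cmpxchg() rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdint.h>

static uint64_t vtsc_update(_Atomic uint64_t *last, uint64_t now)
{
    uint64_t old = atomic_load(last);
    uint64_t new_val;

    do {
        /* Never go backwards: if now isn't newer, hand out old + 1. */
        new_val = (int64_t)(now - old) > 0 ? now : old + 1;
        /* On failure, old is refreshed with the current value. */
    } while ( !atomic_compare_exchange_weak(last, &old, new_val) );

    return new_val;
}
```

No vCPU can be preempted while holding anything: a CPU that loses the race simply retries with the fresher value.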

This fixes shim guest lockups during boot which occurred consistently with
~32 vCPUs when vCPUs were overcommitted (which increases the likelihood of preemption).

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86: explicitly disallow guest access to PPIN
Jan Beulich [Fri, 20 Dec 2019 15:30:13 +0000 (16:30 +0100)]
x86: explicitly disallow guest access to PPIN

To fulfill the "protected" in its name, don't let the real hardware
values leak. While we could report a control register value expressing
this (which I would have preferred), unconditionally raise #GP for all
accesses (in the interest of getting this done).
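The shape of such an intercept can be sketched as below. 0x4e and 0x4f are Intel's MSR_PPIN_CTL and MSR_PPIN; AMD's PPIN MSRs are omitted, and the handler, its result codes and the default branch are hypothetical placeholders rather than Xen's guest_rdmsr().

```c
#include <stdint.h>

#define MSR_INTEL_PPIN_CTL 0x0000004eu
#define MSR_INTEL_PPIN     0x0000004fu

/* Hypothetical result codes for an MSR intercept handler. */
enum msr_result { MSR_OKAY, MSR_RAISE_GP };

static enum msr_result guest_rdmsr_sketch(uint32_t msr, uint64_t *val)
{
    switch ( msr )
    {
    case MSR_INTEL_PPIN_CTL:
    case MSR_INTEL_PPIN:
        /* Never leak real hardware values: fault unconditionally. */
        return MSR_RAISE_GP;

    default:
        *val = 0;   /* placeholder for handling of all other MSRs */
        return MSR_OKAY;
    }
}
```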

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/apic: allow enabling x2APIC mode regardless of interrupt remapping
Roger Pau Monné [Fri, 20 Dec 2019 15:29:22 +0000 (16:29 +0100)]
x86/apic: allow enabling x2APIC mode regardless of interrupt remapping

x2APIC mode doesn't mandate interrupt remapping, and hence can be
enabled independently. This patch enables x2APIC when available,
regardless of whether there's interrupt remapping support.

This is beneficial especially when running on virtualized environments,
since it reduces the number of vmexits. For example when sending an
IPI in xAPIC mode Xen performs at least 3 different accesses to the
APIC MMIO region, while when using x2APIC mode a single wrmsr is used.
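The difference comes from the ICR layout. In xAPIC mode the destination sits in bits 31:24 of a separate ICR-high register (MMIO 0x310) that must be written before ICR-low (0x300), typically after polling delivery status. In x2APIC mode the ICR is one 64-bit MSR (0x800 + (0x300 >> 4) = 0x830) with the destination in bits 63:32. A sketch with hypothetical helper names:

```c
#include <stdint.h>

#define APIC_ICR_MMIO    0x300u  /* xAPIC ICR (low) register offset */
#define MSR_X2APIC_FIRST 0x800u

/* x2APIC MSR index corresponding to an xAPIC MMIO register offset. */
static uint32_t x2apic_msr(uint32_t mmio_offset)
{
    return MSR_X2APIC_FIRST + (mmio_offset >> 4);
}

/* xAPIC: destination in bits 31:24 of ICR high (a separate access). */
static uint32_t xapic_icr_high(uint8_t dest)
{
    return (uint32_t)dest << 24;
}

/* x2APIC: a single 64-bit WRMSR value, destination in bits 63:32. */
static uint64_t x2apic_icr(uint32_t dest, uint32_t icr_low)
{
    return ((uint64_t)dest << 32) | icr_low;
}
```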

The following numbers are from a lock profiling of a Xen PV shim
running a Linux PV kernel with 32 vCPUs and xAPIC mode:

(XEN) Global lock flush_lock: addr=ffff82d0804af1c0, lockval=03190319, not locked
(XEN)   lock:656153(892606463454), block:602183(9495067321843)

Average lock time:   1360363ns
Average block time: 15767743ns

While the following are from the same configuration but with the shim
using x2APIC mode:

(XEN) Global lock flush_lock: addr=ffff82d0804b01c0, lockval=1adb1adb, not locked
(XEN)   lock:1841883(1375128998543), block:1658716(10193054890781)

Average lock time:   746588ns
Average block time: 6145147ns

Enabling x2APIC has roughly halved the average lock time and reduced the
average block time to less than 40% of its previous value, thus reducing
contention.
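The averages above appear to be each "count(total ns)" pair from the lock profile divided out and truncated to an integer. That reading is an inference, but it reproduces all four figures; `avg_ns()` is a hypothetical helper.

```c
#include <stdint.h>

/* Average time per event in ns from a "count(total ns)" pair. */
static uint64_t avg_ns(uint64_t total_ns, uint64_t count)
{
    return count ? total_ns / count : 0;   /* truncating division */
}
```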

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/smp: check APIC ID on AP bringup
Roger Pau Monné [Fri, 20 Dec 2019 15:28:27 +0000 (16:28 +0100)]
x86/smp: check APIC ID on AP bringup

Check that the APIC ID of the processor to be woken up is addressable in
the current APIC mode.

Note that in practice systems with APIC IDs > 255 should already have
x2APIC enabled by the firmware, and hence this is mostly a safety
belt.
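A sketch of such a check, assuming that in xAPIC mode the physical destination is an 8-bit field in which 0xff means broadcast, while in x2APIC mode only 0xffffffff is reserved for broadcast. `apic_id_addressable()` is hypothetical, not the function added by this commit.

```c
#include <stdbool.h>
#include <stdint.h>

#define XAPIC_BROADCAST_ID 0xffu

static bool apic_id_addressable(uint32_t apic_id, bool x2apic_enabled)
{
    if ( x2apic_enabled )
        return apic_id != UINT32_MAX;      /* x2APIC broadcast ID */

    return apic_id < XAPIC_BROADCAST_ID;   /* 8-bit field, 0xff = broadcast */
}
```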

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/apic: force phys mode if interrupt remapping is disabled
Roger Pau Monné [Fri, 20 Dec 2019 15:27:48 +0000 (16:27 +0100)]
x86/apic: force phys mode if interrupt remapping is disabled

Cluster mode can only be used with interrupt remapping support, since
the top 16 bits of the logical APIC ID are filled with the cluster ID;
hence, even on systems where every physical ID is smaller than 255,
the resulting logical IDs are not. Force x2APIC to use physical mode if there's no
interrupt remapping support.
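In cluster mode the logical x2APIC ID is derived from the x2APIC ID (Intel SDM): bits 31:16 hold the cluster (x2APIC ID >> 4) and bits 15:0 a one-hot position (1 << (x2APIC ID & 0xf)). Without interrupt remapping, IO-APIC and MSI destinations are only 8 bits wide, so already physical ID 16 produces an unreachable logical ID. `x2apic_logical_id()` is a hypothetical helper name.

```c
#include <stdint.h>

static uint32_t x2apic_logical_id(uint32_t x2apic_id)
{
    uint32_t cluster = x2apic_id >> 4;

    return (cluster << 16) | (1u << (x2apic_id & 0xf));
}
```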

Note that this requires a further patch in order to enable x2APIC
without interrupt remapping support.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/ioapic: only use dest32 with x2apic and interrupt remapping enabled
Roger Pau Monné [Fri, 20 Dec 2019 15:26:09 +0000 (16:26 +0100)]
x86/ioapic: only use dest32 with x2apic and interrupt remapping enabled

The IO-APIC code assumes that x2apic being enabled also implies
interrupt remapping being enabled, and hence will use the 32bit
destination field in the IO-APIC entry.

This is safe now, but there's no reason to not enable x2APIC even
without interrupt remapping, and hence the IO-APIC code needs to use
the 32 bit destination field only when both interrupt remapping and
x2APIC are enabled.
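The resulting rule for choosing the destination encoding can be sketched as a helper producing the upper 32 bits of an IO-APIC redirection entry. The helper is hypothetical; in the legacy format the destination occupies RTE bits 63:56, i.e. bits 31:24 of the upper half.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t rte_dest_high(uint32_t dest, bool x2apic, bool intremap)
{
    if ( x2apic && intremap )
        return dest;                 /* full 32-bit ("dest32") destination */

    return (dest & 0xff) << 24;      /* legacy 8-bit destination field */
}
```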

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>