xenbits.xensource.com Git - people/dwmw2/xen.git/log
Ref: lu-master-v3
David Woodhouse [Thu, 30 Jan 2020 11:06:07 +0000 (11:06 +0000)]
x86/setup: simplify handling of initrdidx when no initrd present

Remove a ternary operator that made my brain hurt and replace it with
something simpler that makes it clearer that the >= mbi->mods_count
is because of what find_first_bit() returns when it doesn't find
anything. Just have a simple condition to set initrdidx to zero in
that case, and a much simpler ternary operator in the create_dom0()
call.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
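Here is a minimal sketch of the simplified logic. It assumes, as in Xen's setup code, that module 0 is always the dom0 kernel, so index 0 can never name an initrd and can safely mean "none". find_first_bit() is a simplified stand-in. pick_initrdidx() and initrd_module() are hypothetical helpers, not functions from the tree.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

struct module { const char *name; };

/* Simplified stand-in for find_first_bit(): index of the first set bit,
 * or 'size' when no bit in [0, size) is set. */
static unsigned int find_first_bit(const unsigned long *map, unsigned int size)
{
    for ( unsigned int i = 0; i < size; i++ )
        if ( map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)) )
            return i;
    return size;
}

/* Hypothetical helper: the first unclaimed module is taken as the initrd. */
static unsigned int pick_initrdidx(const unsigned long *module_map,
                                   unsigned int mods_count)
{
    unsigned int initrdidx = find_first_bit(module_map, mods_count);

    /* "Not found" comes back as an index >= mods_count. Fold it to 0,
     * which cannot be an initrd because module 0 is the dom0 kernel. */
    if ( initrdidx >= mods_count )
        initrdidx = 0;

    return initrdidx;
}

/* With that convention, the call site only needs a trivial ternary. */
static struct module *initrd_module(struct module *mod, unsigned int initrdidx)
{
    return initrdidx ? mod + initrdidx : NULL;
}
```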
David Woodhouse [Thu, 30 Jan 2020 09:53:28 +0000 (09:53 +0000)]
x86/setup: finish plumbing in live update path through __start_xen()

With this we are fairly much done hacking up __start_xen() to support
live update. The live update functions themselves are still stubs,
but now we can start populating those with actual save/restore of
domain information.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Thu, 30 Jan 2020 08:49:59 +0000 (08:49 +0000)]
x86/setup: lift dom0 creation out into create_dom0 function

It's about to become optional as __start_xen() grows a different path
for live update, so move it out of the way.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 29 Jan 2020 15:52:06 +0000 (15:52 +0000)]
Add shell of lu_reserve_pages()

This currently only iterates over the records and prints the version of
Xen that we're live updating from.

In the fullness of time, it will also reserve the pages passed over as
M2P as well as the pages belonging to preserved domains.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
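This sketch shows one way to iterate over the records. It assumes the record framing of the libxc migration stream v2 format: a 32-bit type, a 32-bit body length, then the body padded to a multiple of 8 bytes. The LU_REC_* type values and lu_walk_records() are hypothetical. In Xen, the real stream lives in pages handed over from the previous Xen rather than in a plain buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed framing: 32-bit type, 32-bit body length, body padded to 8. */
struct rec_hdr {
    uint32_t type;
    uint32_t length;
};

#define LU_REC_VERSION 1u   /* hypothetical type values */
#define LU_REC_END     2u

#define ROUNDUP8(x) (((size_t)(x) + 7) & ~(size_t)7)

/* Walk the records in buf[0..len). Returns the number of records read up to
 * and including LU_REC_END, or -1 if the stream is truncated or has no end
 * record. A version record's body is printed as text. */
static int lu_walk_records(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while ( off + sizeof(struct rec_hdr) <= len )
    {
        struct rec_hdr hdr;

        memcpy(&hdr, buf + off, sizeof(hdr));   /* avoid unaligned access */
        off += sizeof(hdr);
        if ( hdr.length > len - off )
            return -1;                          /* body overruns buffer */

        count++;
        if ( hdr.type == LU_REC_VERSION )
            printf("Live update from Xen %.*s\n", (int)hdr.length,
                   (const char *)(buf + off));
        else if ( hdr.type == LU_REC_END )
            return count;

        off += ROUNDUP8(hdr.length);            /* skip body and padding */
    }

    return -1;
}
```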
David Woodhouse [Mon, 27 Jan 2020 23:46:42 +0000 (23:46 +0000)]
Add LU_VERSION and LU_END records to live update stream

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Mon, 27 Jan 2020 23:46:19 +0000 (23:46 +0000)]
Add lu_stream_{open,close,append}_record()

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
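This sketches the writer side, assuming migration-stream-v2-style framing (32-bit type, 32-bit length, body padded to 8 bytes). The function names follow the commit title, but the signatures and the division of labour are guesses. Here, open writes a header with a zero length, append copies body bytes and bumps that length, and close pads the record to 8 bytes. The fixed-size struct lu_stream is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct rec_hdr {
    uint32_t type;
    uint32_t length;
};

/* Hypothetical in-memory stream; the real one would live in Xen pages. */
struct lu_stream {
    uint8_t buf[4096];
    size_t len;          /* bytes written so far */
    size_t hdr_off;      /* offset of the currently open record's header */
};

static int lu_stream_open_record(struct lu_stream *s, uint32_t type)
{
    struct rec_hdr hdr = { .type = type, .length = 0 };

    if ( sizeof(hdr) > sizeof(s->buf) - s->len )
        return -1;
    s->hdr_off = s->len;
    memcpy(s->buf + s->len, &hdr, sizeof(hdr));
    s->len += sizeof(hdr);
    return 0;
}

/* Must follow lu_stream_open_record(); appends to that record's body. */
static int lu_stream_append_record(struct lu_stream *s, const void *data,
                                   size_t n)
{
    struct rec_hdr hdr;

    if ( n > sizeof(s->buf) - s->len )
        return -1;
    memcpy(s->buf + s->len, data, n);
    s->len += n;

    /* Keep the open record's length field in step with its body. */
    memcpy(&hdr, s->buf + s->hdr_off, sizeof(hdr));
    hdr.length += (uint32_t)n;
    memcpy(s->buf + s->hdr_off, &hdr, sizeof(hdr));
    return 0;
}

static int lu_stream_close_record(struct lu_stream *s)
{
    size_t pad = (8 - (s->len % 8)) % 8;

    if ( pad > sizeof(s->buf) - s->len )
        return -1;
    memset(s->buf + s->len, 0, pad);   /* zero the alignment padding */
    s->len += pad;
    return 0;
}
```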
David Woodhouse [Mon, 27 Jan 2020 16:54:01 +0000 (16:54 +0000)]
Migrate migration stream definitions into Xen public headers

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Mon, 27 Jan 2020 15:41:58 +0000 (15:41 +0000)]
Start documenting the live update handover

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Thu, 16 Jan 2020 14:14:50 +0000 (15:14 +0100)]
Detect live update breadcrumb at boot and map data stream

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 22 Jan 2020 13:02:14 +0000 (13:02 +0000)]
x86/setup: move vm_init() before end_boot_allocator()

We would like to be able to use vmap() to map the live update data, and
we need to do a first pass of the live update data before we prime the
heap because we need to know which pages need to be preserved.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 22 Jan 2020 12:41:49 +0000 (12:41 +0000)]
xen/vmap: allow vmap() to be called during early boot

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Wei Liu [Wed, 12 Dec 2018 12:17:09 +0000 (12:17 +0000)]
xen/vmap: allow vm_init_type to be called during early_boot

We want to move vm_init, which calls vm_init_type under the hood, to the
early boot stage. Add a path to get pages from the boot allocator instead.

Add an emacs block to that file while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
David Woodhouse [Tue, 21 Jan 2020 14:05:21 +0000 (14:05 +0000)]
Don't add bad pages above HYPERVISOR_VIRT_END to the domheap

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Thu, 16 Jan 2020 13:18:55 +0000 (14:18 +0100)]
Add basic lu_save_all() shell

David Woodhouse [Wed, 15 Jan 2020 17:46:54 +0000 (18:46 +0100)]
Add kimage_add_live_update_data()

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Thu, 16 Jan 2020 12:55:44 +0000 (13:55 +0100)]
Add basic live update stream creation

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Wed, 15 Jan 2020 16:58:44 +0000 (17:58 +0100)]
Add IND_WRITE64 primitive to kexec kimage

This allows a single page-aligned physical address to be written to
the current destination, intended to pass the location of the live
update data stream from one Xen to the next.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
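This is a userspace model of the idea. It assumes kexec-style indirection entries, where a page-aligned address shares a 64-bit word with flag bits in its low (page-offset) bits. IND_DESTINATION and IND_DONE use the usual kexec values. The IND_WRITE64 value, the 8-byte advance of the destination, and process_entries() are assumptions made for illustration. Because the written value is page-aligned, the entry can carry it directly, with no separate source page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (~(uint64_t)(PAGE_SIZE - 1))

/* Flag bits live in the low, page-offset bits of each entry. */
#define IND_DESTINATION 0x1
#define IND_DONE        0x4
#define IND_WRITE64     0x20    /* hypothetical value */

/* Apply a flat entry list to a simulated "physical memory" buffer.
 * Returns 0 on IND_DONE, -1 on a bad or unterminated list. */
static int process_entries(const uint64_t *entries, size_t n,
                           uint8_t *mem, size_t mem_size)
{
    uint64_t dest = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        uint64_t e = entries[i];
        uint64_t addr = e & PAGE_MASK;

        if ( e & IND_DONE )
            return 0;
        if ( e & IND_DESTINATION )
            dest = addr;                         /* set current destination */
        else if ( e & IND_WRITE64 )
        {
            /* The payload is the page-aligned address in the entry itself. */
            if ( dest > mem_size || mem_size - dest < sizeof(addr) )
                return -1;
            memcpy(mem + dest, &addr, sizeof(addr));
            dest += sizeof(addr);                /* assumed advance */
        }
        else
            return -1;
    }

    return -1;
}
```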
David Woodhouse [Wed, 15 Jan 2020 16:57:08 +0000 (17:57 +0100)]
Add KEXEC_TYPE_LIVE_UPDATE

This is identical to the default case... for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Thu, 12 Dec 2019 17:02:10 +0000 (17:02 +0000)]
Add KEXEC_RANGE_MA_LIVEUPDATE

This allows kexec userspace to tell the next Xen where the range is,
on its command line.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Thu, 16 Jan 2020 08:51:45 +0000 (09:51 +0100)]
Reserve live update memory regions

The live update handover requires that a region of memory be reserved
for the new Xen to use in its boot allocator. The original Xen may use
that memory but not for any pages which are mapped to domains, or which
would need to be preserved across the live update for any other reason.

The same constraints apply to initmem pages freed from the Xen image,
since the new Xen will be loaded into the same physical location as the
previous Xen.

There is separate work ongoing which will make the xenheap meet this
requirement by eliminating share_xen_page_with_guest(). In the meantime,
just don't add those pages to the heap at all in the live update case.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
David Woodhouse [Mon, 9 Dec 2019 16:32:01 +0000 (16:32 +0000)]
x86/boot: Reserve live update boot memory

For live update to work, it will need a region of memory that can be
given to the boot allocator while it parses the state information from
the previous Xen and works out which of the other pages of memory it
can consume.

Reserve that like the crashdump region, and accept it on the command
line. Use only that region for early boot, and register the remaining
RAM (all of it for now, until the real live update happens) later.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
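Registering "the remaining RAM" comes down to subtracting the reserved bootmem region from each RAM range. Below is a minimal sketch using half-open ranges. struct range and ram_minus_reserved() are hypothetical, and a real implementation would work from the e820 map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };   /* half-open [start, end) */

/* Write the parts of each RAM range that lie outside 'resv' to 'out'
 * (capacity 'max'). Returns the piece count, or -1 if 'out' is too small. */
static int ram_minus_reserved(const struct range *ram, size_t n,
                              struct range resv,
                              struct range *out, size_t max)
{
    size_t k = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        struct range r = ram[i];
        struct range pieces[2] = {
            /* part below the reservation */
            { r.start, r.end < resv.start ? r.end : resv.start },
            /* part above the reservation */
            { r.start > resv.end ? r.start : resv.end, r.end },
        };

        for ( int j = 0; j < 2; j++ )
        {
            if ( pieces[j].start >= pieces[j].end )
                continue;                 /* empty piece */
            if ( k == max )
                return -1;
            out[k++] = pieces[j];
        }
    }

    return (int)k;
}
```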
David Woodhouse [Mon, 2 Dec 2019 16:39:00 +0000 (16:39 +0000)]
x86/setup: Don't skip 2MiB underneath relocated Xen image

Set 'e' correctly to reflect the location that Xen is actually relocated
to from its default 2MiB location. Not 2MiB below that.

This is only vaguely a bug fix. The "missing" 2MiB would have been used
in the end, and fed to the allocator. It's just that other things don't
get to sit right up *next* to the Xen image, and it isn't very tidy.

For live update, I'd quite like a single contiguous region for the
reserved bootmem and Xen, allowing the 'slack' in the former to be used
when Xen itself grows larger. Let's not allow 2MiB of random heap pages
to get in the way...

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Paul Durrant [Fri, 24 Jan 2020 15:31:03 +0000 (15:31 +0000)]
xen/mm: remove donate_page()

This function was only ever used by TMEM, so had its sole caller dropped by
c/s c492e19fdd "xen: remove tmem from hypervisor".

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Julien Grall <julien@xen.org>
Paul Durrant [Fri, 24 Jan 2020 15:30:59 +0000 (15:30 +0000)]
x86/hvm: make domain_destroy() method optional

This method is currently empty for SVM so make it optional and, while in
the neighbourhood, make it an alternative_vcall().

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Fri, 24 Jan 2020 15:30:58 +0000 (15:30 +0000)]
x86/hvm: add domain_relinquish_resources() method

There are two functions in hvm.c to deal with tear-down of a domain:
hvm_domain_relinquish_resources() and hvm_domain_destroy(). However, only
the latter has an associated method in 'hvm_funcs'. This patch adds
a method for the former.

A subsequent patch will define a VMX implementation.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Fri, 24 Jan 2020 15:30:57 +0000 (15:30 +0000)]
x86/vmx: make apic_access_mfn type-safe

Use mfn_t rather than unsigned long.  Fix vmx_free_vlapic_mapping() to be
fully idempotent by avoiding a double free, but the sentinel needs to remain
as _mfn(0) to be safe even in the case that vmx_alloc_vlapic_mapping() hasn't
been called.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
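This sketch covers both halves of the change. The one-member struct mirrors how Xen's TYPE_SAFE() wrapper makes mfn_t a distinct type in debug builds, so mixing it with a plain unsigned long fails to compile. struct vmx_domain, free_frame() and the free counter are hypothetical stand-ins. Resetting the field to the _mfn(0) sentinel before freeing turns a second call into a no-op. The same holds for a call on a domain whose mapping was never allocated.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a TYPE_SAFE() wrapper: a one-member struct. */
typedef struct { unsigned long mfn; } mfn_t;

static inline mfn_t _mfn(unsigned long n) { return (mfn_t){ n }; }
static inline unsigned long mfn_x(mfn_t m) { return m.mfn; }
static inline bool mfn_eq(mfn_t a, mfn_t b) { return mfn_x(a) == mfn_x(b); }

/* Hypothetical per-domain state and a counter standing in for the free. */
struct vmx_domain { mfn_t apic_access_mfn; };
static unsigned int frees;

static void free_frame(mfn_t mfn) { (void)mfn; frees++; }

static void vmx_free_vlapic_mapping(struct vmx_domain *d)
{
    mfn_t mfn = d->apic_access_mfn;

    /* _mfn(0) means "nothing allocated"; it is also the value of a
     * zero-initialised domain whose alloc function never ran. */
    if ( mfn_eq(mfn, _mfn(0)) )
        return;

    d->apic_access_mfn = _mfn(0);   /* reset first: repeat calls are no-ops */
    free_frame(mfn);
}
```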
Andrew Cooper [Fri, 20 Dec 2019 17:13:41 +0000 (17:13 +0000)]
tools/libxl: Reposition build_pre() logic between architectures

The call to xc_domain_disable_migrate() is made only from x86, while its
handling in Xen is common.  Move it to libxl__build_pre().

hvm_set_conf_params(), hvm_set_viridian_features(),
hvm_set_mca_capabilities(), and the altp2m logic is all in common code (parts
ifdef'd) but despite this, is all actually x86 specific, at least as currently
implemented in Xen.  Some concepts (nested virt, altp2m) are common in
principle, but need their interface changing to be part of domain_create, and
are not expected to survive in their current HVM_PARAM form.

Move it all into x86 specific code, and fold all of the xc_hvm_param_set()
calls together into hvm_set_conf_params() in a far more coherent way.

Finally - ensure that all hypercalls have their return values checked.

No practical change in constructed domains.  Fewer useless hypercalls now to
construct an ARM guest.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Andrew Cooper [Tue, 14 Jan 2020 19:54:04 +0000 (19:54 +0000)]
xen/list: Remove prefetching

Xen inherited its list infrastructure from Linux.  One area where it has fallen
behind is that of prefetching, which, as it turns out, is a performance penalty
in most cases.

Prefetch of NULL on x86 is now widely measured to have glacial performance
properties, and will unconditionally hit on every hlist use due to the
termination condition.

Cross-port the following Linux patches:

  75d65a425c (2011) "hlist: remove software prefetching in hlist iterators"
  e66eed651f (2011) "list: remove prefetching from regular list iterators"
  c0d15cc7ee (2013) "linked-list: Remove __list_for_each"

to Xen, which results in the following net diffstat on x86:

  add/remove: 0/1 grow/shrink: 27/83 up/down: 576/-1648 (-1072)

(The code additions come from a few now-inlined functions, and slightly
different basic block padding.)

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
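After the change, the iterator is a plain pointer walk. Before the Linux commits listed above, the hlist version roughly tested `pos && ({ prefetch(pos->next); 1; })`, which issues a prefetch of NULL on the last node of every walk. The sketch below uses simplified copies of the list types, and hlist_count() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/* Post-change iterator: no prefetch(pos->next), so nothing is issued for
 * the terminating NULL. */
#define hlist_for_each(pos, head) \
    for ( (pos) = (head)->first; (pos); (pos) = (pos)->next )

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
    struct hlist_node *first = h->first;

    n->next = first;
    if ( first )
        first->pprev = &n->next;
    h->first = n;
    n->pprev = &h->first;
}

static unsigned int hlist_count(const struct hlist_head *h)
{
    const struct hlist_node *pos;
    unsigned int n = 0;

    hlist_for_each(pos, h)
        n++;
    return n;
}
```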
Anthony PERARD [Thu, 23 Jan 2020 16:56:46 +0000 (16:56 +0000)]
libxl: Fix comment about dcs.sdss

The field 'sdss' was previously named 'dmss'; commit 3148bebbf0ab did the
rename but didn't update the comment.

Fixes: 3148bebbf0ab ("libxl: rename a field in libxl__domain_create_state")
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Fri, 17 Jan 2020 10:53:52 +0000 (10:53 +0000)]
xen/test/livepatch: remove include of Config.mk

livepatch/Makefile seems to only be used via Rules.mk, which already
includes Config.mk, so avoid the second include.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Anthony PERARD [Fri, 17 Jan 2020 10:53:47 +0000 (10:53 +0000)]
xen/build: Remove left over -DMAX_PHYS_IRQS

The use of MAX_PHYS_IRQS has been removed in cf5e6f2d3441 ("x86:
eliminate hard-coded NR_IRQS"), so remove the left over CFLAGS.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Tue, 21 Jan 2020 10:13:01 +0000 (11:13 +0100)]
xen: make CONFIG_DEBUG_LOCKS usable without CONFIG_DEBUG

In expert mode it is possible to enable CONFIG_DEBUG_LOCKS without
having enabled CONFIG_DEBUG. The code, however, depends on CONFIG_DEBUG,
as it uses ASSERT().

Fix that by using BUG_ON() instead of ASSERT() in rel_lock().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
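The difference between the two macros is easiest to see side by side. In a non-debug build, ASSERT() compiles to a check that is never evaluated, while BUG_ON() is always checked. The macros below are simplified models of Xen's, and BUG() just counts hits so the demo can run. struct lock_debug and the two rel_lock variants are hypothetical, loosely shaped like a "released by the owning CPU" check.

```c
#include <assert.h>

#define CONFIG_DEBUG 0   /* model a non-debug build with DEBUG_LOCKS on */

static unsigned int bugs_hit;          /* stands in for a crash */
#define BUG() (bugs_hit++)

#if CONFIG_DEBUG
#define ASSERT(p) do { if ( !(p) ) BUG(); } while ( 0 )
#else
/* Non-debug: the predicate is type-checked but never evaluated. */
#define ASSERT(p) do { if ( 0 && (p) ) {} } while ( 0 )
#endif

/* BUG_ON() is checked in every build. */
#define BUG_ON(p) do { if ( p ) BUG(); } while ( 0 )

#define NO_CPU (-1)
struct lock_debug { int cpu; };        /* hypothetical: CPU holding the lock */

static void rel_lock_with_assert(struct lock_debug *d, int this_cpu)
{
    ASSERT(d->cpu == this_cpu);        /* silently skipped in this build */
    d->cpu = NO_CPU;
}

static void rel_lock_with_bug_on(struct lock_debug *d, int this_cpu)
{
    BUG_ON(d->cpu != this_cpu);        /* still enforced */
    d->cpu = NO_CPU;
}
```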
Andrew Cooper [Fri, 20 Dec 2019 12:42:47 +0000 (12:42 +0000)]
tools/libxl: Code-gen improvements for libxl_save_msgs_gen.pl

our @msgs() is an array of $msginfo's where the first element is a
unique number.  The $msgnum_used check ensures they are unique.  Instead
of specifying them explicitly, generate msgnum locally.  This reduces
the diff necessary to edit the middle of the @msgs() array.

All other hunks are adjusting formatting in the generated C, to make it
easier to follow.

No change in behaviour of the generated C.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Tamas K Lengyel [Fri, 24 Jan 2020 13:56:21 +0000 (06:56 -0700)]
x86/mem_access: move _ve functions to x86 header

These functions don't belong in the common mem_access header as there is no #VE
equivalent on ARM.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 24 Jan 2020 14:53:09 +0000 (14:53 +0000)]
Revert "tools/libxl: Plumb domain_create_state down into libxl__build_pre()"

This reverts commit aacc143006429de46932aabae17c13846c71fa45.

OSSTest reports that it breaks stubdoms.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 Jan 2020 12:48:13 +0000 (13:48 +0100)]
Arm/p2m: fix build after ea22bcd030da and 2aa977eb6baa

Each of these commits introduced a function prototype referencing a
structure which hadn't even been forward declared. Add such
declarations.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Eslam Elnikety [Fri, 24 Jan 2020 09:31:55 +0000 (10:31 +0100)]
x86/microcode: use const qualifier for microcode buffer

The buffer holding the microcode bits should be marked as const.

Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Eslam Elnikety [Fri, 24 Jan 2020 09:31:21 +0000 (10:31 +0100)]
x86/microcode: avoid unnecessary xmalloc/memcpy of ucode data

When using `ucode=scan` and if a matching module is found, the microcode
payload is maintained in an xmalloc()'d region. This is unnecessary, since
the bootmap would do just as well. Remove the xmalloc and xfree on the microcode
module scan path.

This commit also does away with the microcode module size limit. The
concern that a large microcode module would consume too much memory,
preventing guests from launching, is misplaced, since this is all on the
init path. While having such safeguards is valuable, this should apply
across the board for all early/late microcode loading. Having it just on
the `scan` path is confusing.

Looking forward, we are a bit closer (i.e., one xmalloc down) to pulling
the early microcode loading of the BSP a bit earlier in the early boot
process. This commit is the low-hanging fruit. There is still a sizable
amount of work to get there, as there are still a handful of xmalloc() calls
in microcode_{amd,intel}.c.

First, there are xmallocs on the path of finding a matching microcode
update. Similar to the commit at hand, searching through the microcode
blob can be done on the already present buffer with no need to xmalloc
any further. Even better, do the filtering in microcode.c before
requesting the microcode update on all CPUs. The latter requires careful
restructuring and exposing the arch-specific logic for iterating over
patches and declaring a match.

Second, there are xmallocs for the microcode cache. Here, we would need
to ensure that the cache corresponding to the BSP gets xmalloc()'d and
populated after the fact.

Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Eslam Elnikety [Fri, 24 Jan 2020 09:30:54 +0000 (10:30 +0100)]
x86/microcode: improve documentation for ucode=

Specify applicability and the default value. Also state that, in case of
EFI, the microcode update blob specified in the EFI cfg takes precedence
over `ucode=scan`, if the latter is specified on the Xen command line.

No functional changes.

Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Fri, 24 Jan 2020 09:30:05 +0000 (10:30 +0100)]
sched: avoid cpumasks on stack in sched/core.c

There are still several instances of cpumask_t on the stack in
scheduling code. Avoid them as far as possible.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:28:56 +0000 (10:28 +0100)]
x86/mem_sharing: Skip xen heap pages in memshr nominate

Trying to share these would fail anyway, better to skip them early.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:28:22 +0000 (10:28 +0100)]
x86/mem_sharing: enable mem_sharing on first memop

It is wasteful to require separate hypercalls to enable sharing on both the
parent and the client domain during VM forking. To speed things up we enable
sharing on the first memop in case it wasn't already enabled.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:27:35 +0000 (10:27 +0100)]
x86/mem_sharing: convert MEM_SHARING_DESTROY_GFN to a bool

MEM_SHARING_DESTROY_GFN is used on the 'flags' bitfield during unsharing.
However, the bitfield is not used for anything else, so just convert it to a
bool instead.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:25:47 +0000 (10:25 +0100)]
x86/mem_sharing: make add_to_physmap static and shorten name

It's not being called from outside mem_sharing.c.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:25:12 +0000 (10:25 +0100)]
x86/mem_sharing: use INVALID_MFN and p2m_is_shared in relinquish_shared_pages

While using _mfn(0) is of no consequence during teardown, INVALID_MFN is the
correct value that should be used.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:24:18 +0000 (10:24 +0100)]
x86/mem_sharing: define mem_sharing_domain to hold some scattered variables

Create struct mem_sharing_domain under hvm_domain and move mem sharing
variables into it from p2m_domain and hvm_domain.

Expose the mem_sharing_enabled macro to be used consistently across Xen.

Remove some duplicate calls to mem_sharing_enabled in mem_sharing.c

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:21:16 +0000 (10:21 +0100)]
x86/mem_sharing: don't try to unshare twice during page fault

get_gfn_type_access has already tried to unshare the page. If that
didn't work, then trying again is pointless. Don't try to send a vm_event
again either; simply check whether there is a ring.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:19:42 +0000 (10:19 +0100)]
x86/mem_sharing: drop flags from mem_sharing_unshare_page

All callers pass 0 in.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Tamas K Lengyel [Fri, 24 Jan 2020 09:18:10 +0000 (10:18 +0100)]
x86/mem_sharing: make get_two_gfns take locks conditionally

During VM forking the client lock will already be taken.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Alexandru Stefan ISAILA [Fri, 17 Jan 2020 13:31:33 +0000 (13:31 +0000)]
x86/mm: Make use of the default access param from xc_altp2m_create_view

At this moment the default_access param from xc_altp2m_create_view is
not used.

This patch assigns default_access to p2m->default_access at the time of
initializing a new altp2m view.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Reviewed-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Alexandru Stefan ISAILA [Fri, 17 Jan 2020 13:31:31 +0000 (13:31 +0000)]
x86/mm: Pull vendor-independent altp2m code out of p2m-ept.c and into p2m.c

No functional changes.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Alexandru Stefan ISAILA [Fri, 17 Jan 2020 13:31:30 +0000 (13:31 +0000)]
x86/altp2m: Add hypercall to set a range of sve bits

By default the sve bits are not set.
This patch adds a new hypercall, xc_altp2m_set_supress_ve_multi(),
to set a range of sve bits.
The core function, p2m_set_suppress_ve_multi(), does not stop on an
error; it makes a best effort to set the bits in the given range. A
check for continuation is made so that large ranges are preemptible.
The gfn of the first error is stored in
xen_hvm_altp2m_suppress_ve_multi.first_error_gfn and the error code is
stored in xen_hvm_altp2m_suppress_ve_multi.first_error.
If no error occurred the values will be 0.

Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
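This sketches the best-effort loop with a preemption point. The struct fields echo first_error_gfn and first_error from the description. The other names, the per-gfn failure rule, and the fixed work budget (standing in for a real continuation check) are hypothetical. Progress is kept in the struct, so a preempted call can simply be reissued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request/result block; first_gfn advances as work is done. */
struct suppress_ve_multi {
    uint64_t first_gfn, last_gfn;   /* inclusive range */
    uint64_t first_error_gfn;       /* 0 if no error occurred */
    int first_error;                /* 0 if no error occurred */
};

/* Hypothetical per-gfn setter: fails on multiples of 5 for the demo. */
static int set_one(uint64_t gfn)
{
    return gfn % 5 == 0 ? -22 /* -EINVAL */ : 0;
}

/* Process up to 'budget' (> 0) gfns. Returns true if preempted with work
 * left, in which case the caller reissues the call with the same struct. */
static bool set_suppress_ve_multi(struct suppress_ve_multi *s,
                                  unsigned int budget)
{
    unsigned int done = 0;

    if ( s->first_gfn > s->last_gfn )
        return false;

    for ( ;; )
    {
        uint64_t gfn = s->first_gfn;
        int rc = set_one(gfn);

        /* Best effort: remember only the first failure, keep going. */
        if ( rc && !s->first_error )
        {
            s->first_error = rc;
            s->first_error_gfn = gfn;
        }

        if ( gfn == s->last_gfn )
            return false;           /* whole range handled */

        s->first_gfn = gfn + 1;
        if ( ++done == budget )
            return true;            /* continuation point */
    }
}
```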
Alexandru Stefan ISAILA [Fri, 17 Jan 2020 13:31:26 +0000 (13:31 +0000)]
x86/mm: Add array_index_nospec to guest provided index values

This patch aims to sanitize indexes, potentially guest provided
values, for altp2m_eptp[] and altp2m_p2m[] arrays.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
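One branch-free way to implement the clamp follows the generic C fallback used by Linux's array_index_mask_nospec(); architecture-specific versions typically use inline assembly instead. The sketch assumes that index and size do not exceed LONG_MAX, and that right-shifting a negative long is arithmetic, as it is with GCC and Clang. MAX_ALTP2M's value here and read_view() are illustrative.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* ~0UL if index < size, 0 otherwise, with no conditional branch.
 * Assumes index, size <= LONG_MAX and an arithmetic right shift of
 * negative longs (GCC and Clang behave this way). */
static inline unsigned long array_index_mask_nospec(unsigned long index,
                                                    unsigned long size)
{
    /* Top bit of (index | (size - 1 - index)) is set iff index >= size. */
    return (unsigned long)(~(long)(index | (size - 1UL - index)) >>
                           (BITS_PER_LONG - 1));
}

/* Clamp to 0 when out of range, even on a mispredicted path. */
#define array_index_nospec(index, size) \
    ((index) & array_index_mask_nospec((index), (size)))

#define MAX_ALTP2M 10                        /* illustrative size */
static int altp2m_view[MAX_ALTP2M] = { [7] = 42 };

static int read_view(unsigned long idx)
{
    if ( idx >= MAX_ALTP2M )
        return -1;
    /* Even if the check above is mispredicted, the load stays in bounds. */
    return altp2m_view[array_index_nospec(idx, MAX_ALTP2M)];
}
```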
Andrew Cooper [Thu, 9 Jan 2020 14:06:38 +0000 (14:06 +0000)]
x86/boot: Drop sym_fs()

All remaining users of sym_fs() can trivially be switched to using sym_esi()
instead.  This is shorter to encode and faster to execute.

This removes the final uses of %fs during boot, which allows us to drop
BOOT_FS from the trampoline GDT, which removes an arbitrary 16M limit on Xen's
compiled size.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 10 Jan 2020 01:04:28 +0000 (01:04 +0000)]
x86/boot: Simplify pagetable manipulation loops

For __page_tables_{start,end} and L3 bootmap initialisation, the logic is
unnecessarily complicated owing to its attempt to use the LOOP instruction,
which results in an off-by-8 memory address owing to LOOP's termination
condition.

Rewrite both loops for improved clarity and speed.

Misc notes:
 * TEST $IMM, MEM can't macrofuse.  The loop has 0x1200 iterations, so pull
   the $_PAGE_PRESENT constant out into a spare register to turn the TEST into
   its %REG, MEM form, which can macrofuse.
 * Avoid the use of %fs-relative references.  %esi-relative is the more common
   form in the code, and doesn't suffer an address generation overhead.
 * Avoid LOOP.  CMP/JB isn't microcoded and is faster to execute in all cases.
 * Even compilers unroll trivial 4-iteration loops like these.  The
   generated code size is a fraction larger, but this is init and the asm is
   far easier to follow.
 * Reposition the l2=>l1 bootmap construction so the asm reads in pagetable
   level order.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 9 Jan 2020 14:06:08 +0000 (14:06 +0000)]
x86/boot: Drop explicit %fs uses

The trampoline relocation code uses %fs for accessing Xen, and this comes with
an arbitrary 16M limitation.  We could adjust the limit, but the boot code is
a confusing mix of %ds/%esi-based and %fs-based accesses, and the use of %fs
is longer to encode, and incurs an address generation overhead.

Rewrite the logic to use %ds, for better consistency with the surrounding
code, and a marginal performance improvement.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 10 Jan 2020 14:05:29 +0000 (14:05 +0000)]
x86/boot: Size the boot/directmap mappings dynamically

... rather than presuming that 16M will do.  On the EFI side, use
l2e_add_flags() to reduce the code-generation overhead of using
l2e_from_paddr() twice.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 10 Jan 2020 16:35:14 +0000 (16:35 +0000)]
x86/boot: Create the l2_xenmap[] mappings dynamically

The build-time construction of l2_xenmap[] imposes an arbitrary limit of 16M
total, which is a limit we are looking to lift.

Adjust both the BIOS and EFI paths to fill it in dynamically, based on the
final linked size of Xen.  l2_xenmap[] stays between __page_tables_{start,end}
(rather than move into .bss.page_aligned) as it is expected to gain a
different pagetable reference shortly.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
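Sizing the mappings from the linked image size is a round-up division by 2MiB. This minimal sketch assumes a 2MiB-aligned load address and a single 512-entry L2 table. fill_l2_xenmap() and the flag value (present, writable, 2MiB page: 0x83) are illustrative, not the flags Xen actually uses for its image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2_PAGETABLE_SHIFT 21                        /* 2MiB superpages */
#define MB2 ((uint64_t)1 << L2_PAGETABLE_SHIFT)
#define L2_ENTRIES 512

#define XENMAP_FLAGS 0x83u   /* illustrative: present | writable | 2MiB page */

/* Fill l2[] with 2MiB mappings covering an image of 'size' bytes loaded at
 * 2MiB-aligned physical address 'phys'. Returns the number of entries used,
 * or -1 if 'phys' is misaligned or the image needs more than one table. */
static int fill_l2_xenmap(uint64_t *l2, uint64_t phys, uint64_t size)
{
    uint64_t n = (size + MB2 - 1) >> L2_PAGETABLE_SHIFT;   /* round up */

    if ( (phys & (MB2 - 1)) || n > L2_ENTRIES )
        return -1;

    for ( uint64_t i = 0; i < n; i++ )
        l2[i] = (phys + i * MB2) | XENMAP_FLAGS;

    return (int)n;
}
```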
Juergen Gross [Fri, 8 Nov 2019 16:15:35 +0000 (17:15 +0100)]
xen/sched: add const qualifier where appropriate

Make use of the const qualifier more often in scheduling code.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Meng Xu <mengxu@cis.upenn.edu>
Juergen Gross [Fri, 8 Nov 2019 15:33:32 +0000 (16:33 +0100)]
xen/sched: eliminate sched_tick_suspend() and sched_tick_resume()

sched_tick_suspend() and sched_tick_resume() only call rcu related
functions, so eliminate them and do the rcu_idle_timer*() calling in
rcu_idle_[enter|exit]().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Juergen Gross [Fri, 8 Nov 2019 11:50:58 +0000 (12:50 +0100)]
xen/sched: switch scheduling to bool where appropriate

Scheduling code has several places using int or bool_t instead of bool.
Switch those.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Juergen Gross [Fri, 8 Nov 2019 11:16:10 +0000 (12:16 +0100)]
xen/sched: replace null scheduler percpu-variable with pdata hook

Instead of having its own percpu variable for per-cpu private data, the
null scheduler should use the generic scheduler interface for that purpose.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Juergen Gross [Fri, 8 Nov 2019 08:15:04 +0000 (09:15 +0100)]
xen/sched: use scratch cpumask instead of allocating it on the stack

In the rt scheduler there are three instances of cpumasks allocated on the
stack. Replace them by using cpumask_scratch.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Meng Xu <mengxu@cis.upenn.edu>
Juergen Gross [Fri, 8 Nov 2019 07:02:53 +0000 (08:02 +0100)]
xen/sched: remove special cases for free cpus in schedulers

With the idle scheduler now taking care of all cpus not in any cpupool,
the special cases in the other schedulers for cpus without an associated
cpupool can be removed.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: cleanup sched.h
Juergen Gross [Fri, 8 Nov 2019 09:56:42 +0000 (10:56 +0100)]
xen/sched: cleanup sched.h

There are some items in include/xen/sched.h which can be moved to
private.h as they are scheduler private.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: make sched-if.h really scheduler private
Juergen Gross [Thu, 7 Nov 2019 14:34:37 +0000 (15:34 +0100)]
xen/sched: make sched-if.h really scheduler private

include/xen/sched-if.h should be private to scheduler code, so move it
to common/sched/private.h and move the remaining use cases to
cpupool.c and core.c.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen/sched: move schedulers and cpupool coding to dedicated directory
Juergen Gross [Wed, 22 Jan 2020 14:06:43 +0000 (15:06 +0100)]
xen/sched: move schedulers and cpupool coding to dedicated directory

Move sched*.c and cpupool.c to a new directory, common/sched.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoVT-d: don't pass bridge devices to domain_context_mapping_one()
Jan Beulich [Wed, 22 Jan 2020 15:39:58 +0000 (16:39 +0100)]
VT-d: don't pass bridge devices to domain_context_mapping_one()

When passed a non-NULL pdev, the function does an owner check when it
finds an already existing context mapping. Bridges, however, don't get
passed through to guests, so their owner is always going to be Dom0,
which causes assignment of all but one of the functions of a
multi-function PCI device behind a bridge to fail.

Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agox86/smp: use APIC ALLBUT destination shorthand when possible
Roger Pau Monné [Wed, 22 Jan 2020 15:38:39 +0000 (16:38 +0100)]
x86/smp: use APIC ALLBUT destination shorthand when possible

If the IPI destination mask matches the mask of online CPUs, use the
APIC ALLBUT destination shorthand in order to send an IPI to all CPUs
on the system except the current one. This can only be safely used
when no CPU hotplug or unplug operations are taking place, there are
no offline CPUs (or any offline CPUs have been onlined and parked), all
CPUs in the system have been accounted for (i.e. the number of CPUs
doesn't exceed NR_CPUS and APIC IDs are below MAX_APICS), and there's
no possibility of CPU hotplug (i.e. no disabled CPUs have been reported
by the firmware tables).
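The conditions listed above can be condensed into a single predicate. This is a minimal sketch, not Xen's send_IPI_mask(): the struct and its fields are hypothetical stand-ins for the real CPU maps and firmware-derived state, CPUs are modelled as bits in a 64-bit mask, and only a destination that excludes the sending CPU is handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the preconditions; field names are illustrative. */
struct shorthand_state {
    uint64_t online;          /* mask of online CPUs */
    unsigned int self;        /* CPU sending the IPI */
    bool maps_locked;         /* CPU maps locked: no hotplug/unplug in flight */
    bool offline_unparked;    /* some offline CPU was never onlined and parked */
    bool all_accounted;       /* CPU count <= NR_CPUS, APIC IDs < MAX_APICS */
    bool disabled_cpus;       /* firmware reported disabled (hot-pluggable) CPUs */
};

/* True if an IPI to 'dest' may use the ALLBUT (all-but-self) shorthand. */
static bool can_use_allbut(const struct shorthand_state *s, uint64_t dest)
{
    uint64_t self_bit = UINT64_C(1) << s->self;

    if ( !s->maps_locked || s->offline_unparked ||
         !s->all_accounted || s->disabled_cpus )
        return false;

    /* The destination must be exactly every online CPU except the sender. */
    return !(dest & self_bit) && (dest | self_bit) == s->online;
}
```

Any single failed precondition forces the per-destination path, which is why the shorthand is an optimisation rather than a replacement.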

This is especially beneficial when using the PV shim, since using the
shorthand avoids performing an APIC register write (or multiple ones
if using xAPIC mode) for each destination when doing a global TLB
flush.

The lock time of flush_lock on a 32 vCPU guest using the shim in
x2APIC mode without the shorthand is:

Global lock flush_lock: addr=ffff82d0804b21c0, lockval=f602f602, not locked
  lock:228455938(79406065573135), block:205908580(556416605761539)

Average lock time: 347577ns

While the same guest using the shorthand:

Global lock flush_lock: addr=ffff82d0804b41c0, lockval=d9c4d9bc, cpu=12
  lock:1890775(416719148054), block:1663958(2500161282949)

Average lock time: 220395ns

That is roughly a one-third reduction in average lock time (347577ns
down to 220395ns, about 37%).

Note that this requires locking the CPU maps (get_cpu_maps) which uses
a trylock. This is currently safe as all users of cpu_add_remove_lock
do a trylock, but will need reevaluating if non-trylock users appear.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/arm: gic: Remove pointless assertion against enum gic_sgi
Julien Grall [Sat, 18 Jan 2020 15:39:24 +0000 (15:39 +0000)]
xen/arm: gic: Remove pointless assertion against enum gic_sgi

The Arm Compiler will complain that the assertions ASSERT(sgi < 16) are
always true. This is because sgi is of type enum gic_sgi, which has
fewer than 16 members.

Rather than using ASSERTs, introduce a new item in the enum that can
be checked against at build time.

Take the opportunity to remove the specific values assigned to each
item. This is fine because an enum always starts at zero and values are
assigned in increments of one. None of our code relies on the
hard-coded values either.
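A minimal sketch of the approach described above: let the enum number itself from zero and add a trailing sentinel that a build-time check can compare against. The member names mirror Xen's SGIs but are illustrative, GIC_SGI_MAX is an assumed name for the new item, and C11 `_Static_assert` stands in for whatever build-time assertion macro the real code uses.

```c
/* Implicit numbering: first member is 0, each following one is +1. */
enum gic_sgi {
    GIC_SGI_EVENT_CHECK,    /* 0 */
    GIC_SGI_DUMP_STATE,     /* 1 */
    GIC_SGI_CALL_FUNCTION,  /* 2 */
    GIC_SGI_MAX,            /* sentinel: number of SGIs in use */
};

/* Build-time replacement for the always-true run-time ASSERT(sgi < 16). */
_Static_assert(GIC_SGI_MAX <= 16, "the GIC provides at most 16 SGIs");
```

Adding a new SGI before the sentinel automatically bumps GIC_SGI_MAX, so the check keeps guarding the limit without any run-time cost.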

[stefano: grammar fixes in commit message]

Signed-off-by: Julien Grall <julien@xen.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
CC: Andrii Anisov <andrii_anisov@epam.com>
5 years agoRevert "xen/arm32: setup: Give a xenheap page to the boot allocator"
Julien Grall [Thu, 16 Jan 2020 21:51:36 +0000 (21:51 +0000)]
Revert "xen/arm32: setup: Give a xenheap page to the boot allocator"

Since commit c61c1b4943 "xen/page_alloc: statically allocate
bootmem_region_list", the boot allocator does not use the first page of
the first region passed for its own purpose.

This reverts commit ae84f55353475f569daddb9a81ac0a6bc7772c90.

Signed-off-by: Julien Grall <julien@xen.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agogolang/xenlight: Don't leak memory on context open failure
George Dunlap [Fri, 17 Jan 2020 14:01:05 +0000 (14:01 +0000)]
golang/xenlight: Don't leak memory on context open failure

If libxl_ctx_alloc() returns an error, we need to destroy the logger
that we made.

Restructure the Close() method such that it checks for each resource
to be freed and then frees it.  This allows Close() to become
idempotent, as well as to be a useful clean-up for a partially-created
context.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years agogolang/xenlight: Errors are negative
George Dunlap [Thu, 26 Dec 2019 17:18:14 +0000 (17:18 +0000)]
golang/xenlight: Errors are negative

Commit 871e51d2d4 changed the sign on the xenlight error types (making
the values negative, same as the C-generated constants), but failed to
flip the sign in the Error() string function.  The result is that
ErrorNonspecific.String() prints "libxl error: 1" rather than the
human-readable error message.

Get rid of the whole issue by making libxlErrors a map, mapping
actual error values to strings, and falling back to printing the actual
value of the Error type if it's not present.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years agogo/xenlight: More informative error messages
George Dunlap [Thu, 26 Dec 2019 14:45:08 +0000 (14:45 +0000)]
go/xenlight: More informative error messages

If an error is encountered deep in a complicated data structure, it's
often difficult to tell where the error actually is.  Make the error
messages from the generated toC() and fromC() functions more
informative by tagging each error with the field whose conversion
failed.  This will have the effect of giving a "stack trace" of the
failure inside a nested data structure.

NB that my version of python insists on reordering a couple of switch
statements for some reason; in other patches I've reverted those
changes, but in this case it's more difficult because they interact
with actual code changes.  I'll leave this here for now, as we're
going to remove helpers.gen.go from being tracked by git at some point
in the near future anyway.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years agogo/xenlight: Fix CpuidPoliclyList conversion
George Dunlap [Thu, 26 Dec 2019 17:43:17 +0000 (17:43 +0000)]
go/xenlight: Fix CpuidPoliclyList conversion

Empty Go strings should be converted to `nil` libxl_cpuid_policy_list;
otherwise libxl_cpuid_parse_config gets confused.

Also, libxl_cpuid_parse_config returns a non-standard error rather than
a "normal" libxl error; if it returns one of these non-standard errors,
convert it to ErrorInval.

Finally, make the fromC() method take a pointer, and set the value of
CpuidPolicyList such that it will generate a valid CpuidPolicyList in
response.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years agogolang/xenlight: Do proper nil / NULL conversions for builtin Bitmap type
George Dunlap [Thu, 26 Dec 2019 17:40:33 +0000 (17:40 +0000)]
golang/xenlight: Do proper nil / NULL conversions for builtin Bitmap type

Similar to the autogenerated types, but for the `builtin` Bitmap type.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years agoIntroduce CHANGELOG.md
Paul Durrant [Mon, 13 Jan 2020 15:32:17 +0000 (15:32 +0000)]
Introduce CHANGELOG.md

As agreed during the 2020-01 community call [1] this patch introduces a
changelog, based on the principles explained at keepachangelog.com [2].
A new MAINTAINERS entry is also added, with myself as (currently sole)
maintainer.

[1] See C.2 at https://cryptpad.fr/pad/#/2/pad/edit/ERZtMYD5j6k0sv-NG6Htl-AJ/
[2] https://keepachangelog.com/en/1.0.0/

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Lars Kurth <lars.kurth@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolinkfarm: Exclude .*.tmp
Anthony PERARD [Wed, 15 Jan 2020 16:44:54 +0000 (16:44 +0000)]
linkfarm: Exclude .*.tmp

Exclude the intermediate files .*.tmp from the linkfarm; they are
generated by the %.o:%.c rules in xen/Rules.mk when
CONFIG_ENFORCE_UNIQUE_SYMBOLS=y.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agolibxl: event: Document lifetime API for libxl_childproc_setmode
Ian Jackson [Fri, 17 Jan 2020 18:12:07 +0000 (18:12 +0000)]
libxl: event: Document lifetime API for libxl_childproc_setmode

There is already an identical comment for
libxl_osevent_register_hooks.

libxl_childproc_setmode's hooks parameter has the same property and
this should be documented.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agox86/smp: move and clean APIC helpers
Roger Pau Monné [Mon, 20 Jan 2020 11:48:05 +0000 (12:48 +0100)]
x86/smp: move and clean APIC helpers

Move __prepare_ICR{2}, apic_wait_icr_idle and
__default_send_IPI_shortcut to the top of the file, since they will be
used by send_IPI_mask in future changes.

While there, take the opportunity to remove the leading underscores,
drop the inline attribute, drop the default prefix from the shorthand
helper, change the return type of the prepare helpers to unsigned and
do some minor style cleanups.
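For orientation, the two prepare helpers compose the xAPIC ICR words roughly as below. This is a sketch: the encodings are the architectural ones (destination shorthand in ICR bits 19:18, destination APIC ID in ICR2 bits 31:24), but the macro names are modelled on common header names rather than copied from Xen, and the helpers show only the unsigned return type mentioned above.

```c
#include <stdint.h>

#define APIC_DM_FIXED      0x00000u  /* delivery mode: fixed */
#define APIC_DEST_ALLBUT   0xC0000u  /* shorthand 11b: all excluding self */
#define SET_xAPIC_DEST_FIELD(x) ((uint32_t)(x) << 24)  /* ICR2 bits 31:24 */

/* Low ICR word: delivery mode | destination shorthand | vector. */
static unsigned int prepare_ICR(unsigned int shortcut, int vector)
{
    return APIC_DM_FIXED | shortcut | vector;
}

/* High ICR word (ICR2): destination APIC ID in xAPIC mode. */
static unsigned int prepare_ICR2(unsigned int mask)
{
    return SET_xAPIC_DEST_FIELD(mask);
}
```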

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoVT-d: dma_pte_clear_one() can't fail anymore
Jan Beulich [Mon, 20 Jan 2020 11:47:31 +0000 (12:47 +0100)]
VT-d: dma_pte_clear_one() can't fail anymore

Hence it's pointless for it to return an error indicator, and it's even
less useful for it to be __must_check. This is a result of commit
e8afe1124cc1 ("iommu: elide flushing for higher order map/unmap
operations") moving the TLB flushing out of the function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agoVT-d: adjust log messages in domain_context_mapping_one()
Jan Beulich [Mon, 20 Jan 2020 11:46:13 +0000 (12:46 +0100)]
VT-d: adjust log messages in domain_context_mapping_one()

Add missing newlines, use %pd, and drop exclamation marks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years agoxen/char: scif-uart: Remove useless ASSERT condition
Artem Mygaiev [Wed, 9 Oct 2019 14:20:16 +0000 (17:20 +0300)]
xen/char: scif-uart: Remove useless ASSERT condition

cnt is unsigned, so it is always >= 0 and checking that in the ASSERT
is useless.
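A small illustration of why the condition can never fail: for an unsigned operand, `cnt >= 0` is a tautology, even for a value that was "meant" to be negative, because it wraps to a large positive number. The function name is hypothetical and only isolates the comparison.

```c
/* Always 1 for any unsigned input; compilers may warn about it (-Wtype-limits). */
static int cnt_nonnegative(unsigned int cnt)
{
    return cnt >= 0;
}
```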

Coverity-ID: 1381848
Signed-off-by: Artem Mygaiev <artem_mygaiev@epam.com>
[julien: Update commit title]
Acked-by: Julien Grall <julien@xen.org>
5 years agobuild: fix dependency file generation with ENFORCE_UNIQUE_SYMBOLS=y
Jan Beulich [Fri, 17 Jan 2020 16:38:19 +0000 (17:38 +0100)]
build: fix dependency file generation with ENFORCE_UNIQUE_SYMBOLS=y

The target recorded in the dependency file is, unless overridden by -MQ
(or -MT), the file specified by -o. That doesn't produce correct
dependencies, and hence causes failure to re-build when included files
change.

Fixes: 81ecb38b83b0 ("build: provide option to disambiguate symbol names")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/shadow: use single (atomic) MOV for emulated writes
Jason Andryuk [Fri, 17 Jan 2020 15:19:16 +0000 (16:19 +0100)]
x86/shadow: use single (atomic) MOV for emulated writes

This is the corresponding change to the shadow code as made by
bf08a8a08a2e "x86/HVM: use single (atomic) MOV for aligned emulated
writes" to the non-shadow HVM code.

The bf08a8a08a2e commit message:
Using memcpy() may result in multiple individual byte accesses
(depending how memcpy() is implemented and how the resulting insns,
e.g. REP MOVSB, get carried out in hardware), which isn't what we
want/need for carrying out guest insns as correctly as possible. Fall
back to memcpy() only for accesses not 2, 4, or 8 bytes in size.
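A minimal sketch of the size-based dispatch the quoted message describes. The function name is hypothetical, dst is assumed to be suitably aligned for the access size, and the volatile store of the matching width is what in practice makes the compiler emit a single MOV (the C standard itself does not promise a single instruction).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void emulated_write(void *dst, const void *src, size_t bytes)
{
    switch ( bytes )
    {
    case 2: { uint16_t v; memcpy(&v, src, 2); *(volatile uint16_t *)dst = v; break; }
    case 4: { uint32_t v; memcpy(&v, src, 4); *(volatile uint32_t *)dst = v; break; }
    case 8: { uint64_t v; memcpy(&v, src, 8); *(volatile uint64_t *)dst = v; break; }
    default:
        /* Other sizes: memcpy(), which may split into byte accesses. */
        memcpy(dst, src, bytes);
        break;
    }
}
```

Only the copy out of src goes through memcpy() for the sized cases; the guest-visible store is one access of the full width.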

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Tim Deegan <tim@xen.org>
5 years agox86/sm{e, a}p: do not enable SMEP/SMAP in PV shim by default on AMD
Igor Druzhinin [Fri, 17 Jan 2020 15:18:20 +0000 (16:18 +0100)]
x86/sm{e, a}p: do not enable SMEP/SMAP in PV shim by default on AMD

Because AMD and Hygon CPUs are unable to selectively trap CR4 bit
modifications, running a 32-bit PV guest inside the PV shim comes with a
significant performance hit. Moreover, for SMEP in particular, every time
CR4.SMEP changes on a context switch to/from a 32-bit PV guest, the write
is trapped by L0 Xen, which then tries to perform a global TLB invalidation
for the PV shim domain. This usually results in an eventual hang of a PV
shim with at least several vCPUs.

Since the overall security risk is generally lower for shim Xen, as it is
there more as a defense-in-depth mechanism, choose to disable SMEP/SMAP in
it by default on AMD and Hygon unless the user chooses otherwise.
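The resulting default can be sketched as a small decision function. The tri-state encoding (-1 meaning "not set on the command line") and all names here are assumptions for illustration, not Xen's actual option handling.

```c
#include <stdbool.h>

enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_HYGON };

/* opt: -1 if the user made no choice, otherwise 0 (off) or 1 (on). */
static bool smep_smap_default(int opt, bool pv_shim, enum cpu_vendor v)
{
    if ( opt >= 0 )
        return opt;   /* an explicit user choice always wins */

    /* Default off only for the PV shim on AMD/Hygon. */
    return !(pv_shim && (v == VENDOR_AMD || v == VENDOR_HYGON));
}
```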

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86: adjust EFI-related build message
Jan Beulich [Fri, 17 Jan 2020 15:17:23 +0000 (16:17 +0100)]
x86: adjust EFI-related build message

As of commit 93249f7fc17c ("x86/efi: split compiler vs linker support"),
EFI support in xen.gz may be available even if no xen.efi gets
generated. Distinguish the cases when emitting the message.

Also drop the pointlessly (afaict) left use of $(filter ...) (needed
only when used in $(if ...)), from the ifeq() introduced by 7059afb202ff
("x86/Makefile: remove $(guard) use from $(TARGET).efi target").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86: refine link time stub area related assertion
Jan Beulich [Fri, 17 Jan 2020 15:15:28 +0000 (16:15 +0100)]
x86: refine link time stub area related assertion

Although I introduced this myself, the use of | there has become (and
perhaps was from the very beginning) misleading. Rather than avoiding
the right side of it when linking the xen.efi intermediate file at a
different base address, make the expression cope with that case, thus
verifying placement at every step.

Furthermore the original check was too strict: We don't use one page per
CPU, so account for this as well. This involves moving the
STUBS_PER_PAGE definition and making DIV_ROUND_UP() accessible from
assembly (and hence the linker script); move a few other potentially
generally useful definitions along with it.
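A worked example of why one page per CPU was too strict. PAGE_SIZE is 4 KiB on x86, but the 128-byte per-CPU stub size is an assumed value for illustration, and the macros are written as plain C rather than in the assembly/linker-script-friendly form the change needs.

```c
#define PAGE_SIZE       4096u
#define STUB_BUF_SIZE   128u   /* assumed per-CPU stub size */
#define STUBS_PER_PAGE  (PAGE_SIZE / STUB_BUF_SIZE)   /* 32 here */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Pages actually needed when several CPUs share each stub page. */
static unsigned int stub_pages(unsigned int nr_cpus)
{
    return DIV_ROUND_UP(nr_cpus, STUBS_PER_PAGE);
}
```

Under these assumptions 256 CPUs need 8 stub pages rather than 256, and a 257th CPU needs a ninth page.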

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/time: update TSC stamp on restore from deep C-state
Igor Druzhinin [Fri, 17 Jan 2020 15:11:20 +0000 (16:11 +0100)]
x86/time: update TSC stamp on restore from deep C-state

If ITSC is not available on the CPU (e.g. if running nested as PV shim),
X86_FEATURE_NONSTOP_TSC is not advertised in certain cases, i.e. on all
AMD and some old Intel processors. In that case the TSC needs to be
restored on the CPU from platform time by Xen upon exiting C-states.

As platform time might be behind the last TSC stamp recorded for the
current CPU, the invariant that the TSC stamp is always behind the local
TSC counter is violated. This can cause get_s_time() to go negative,
eventually resulting in a system hang or crash.

Fix this issue by updating local TSC stamp along with TSC counter write.
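A minimal model of the invariant and the fix, with hypothetical names and the actual TSC write elided. If the stamp is left ahead of the counter restored from platform time, the unsigned tick delta wraps to a huge value; refreshing the stamp together with the counter write keeps the delta sane.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU stamp: TSC value at the last calibration point. */
struct tsc_stamp {
    uint64_t local_tsc;
};

/* Invariant: the stamp must never be ahead of the live TSC counter. */
static bool stamp_invariant_holds(const struct tsc_stamp *s, uint64_t live_tsc)
{
    return s->local_tsc <= live_tsc;
}

/* Ticks since the stamp; wraps to a huge value if the invariant is broken. */
static uint64_t tsc_delta(const struct tsc_stamp *s, uint64_t live_tsc)
{
    return live_tsc - s->local_tsc;
}

/* C-state exit: the TSC is rewritten from platform time (write elided).
 * 'update_stamp' models the fix of refreshing the stamp at the same time. */
static uint64_t restore_tsc(struct tsc_stamp *s, uint64_t platform_tsc,
                            bool update_stamp)
{
    if ( update_stamp )
        s->local_tsc = platform_tsc;
    return platform_tsc;   /* new value of the live counter */
}
```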

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoget-maintainer.pl: Dont fall over when L: contains a display name
Lars Kurth [Fri, 17 Jan 2020 15:10:57 +0000 (16:10 +0100)]
get-maintainer.pl: Dont fall over when L: contains a display name

Prior to this change, e-mail addresses of the form "display name
<email>" resulted in empty output. Also see
https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg00753.html

Signed-off-by: Lars Kurth <lars.kurth@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>
5 years agox86/page: Remove bifurcated PAGE_HYPERVISOR constant
Andrew Cooper [Mon, 13 Jan 2020 12:42:09 +0000 (12:42 +0000)]
x86/page: Remove bifurcated PAGE_HYPERVISOR constant

Although I was vaguely aware of it, the difference between PAGE_HYPERVISOR in
ASM and C code has nevertheless caused several bugs I should have caught, and
contributed to review confusion.

There are exactly 4 uses of these constants in asm code (and one is shortly
going to disappear).

Instead of creating constants which behave differently between ASM and C
code, expose all the constants and use non-ambiguous non-NX ones in ASM.
Adjust the hiding to just _PAGE_NX, which contains a C ternary expression.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agotools/libxc: Construct 32bit PV guests with L3 A/D bits set
Andrew Cooper [Tue, 14 Jan 2020 12:17:45 +0000 (12:17 +0000)]
tools/libxc: Construct 32bit PV guests with L3 A/D bits set

With the 32-bit PAE build of Xen gone, 32bit PV guests' top level pagetables no
longer behave exactly like PAE in hardware.

They should have A/D bits set, for the same performance reasons as apply to
other levels.  This brings the domain builder in line with how Xen constructs
a 32bit dom0.

As a pure code improvement, make use of range notation to initialise
identical values in adjacent array elements.
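The range notation mentioned above is the GNU C designated-initializer extension `[first ... last] = value`. A minimal sketch with a hypothetical flags array; the bit positions are the architectural x86 Present, Accessed and Dirty bits, but the macro names are illustrative.

```c
#include <stdint.h>

#define PTE_PRESENT   (UINT64_C(1) << 0)
#define PTE_ACCESSED  (UINT64_C(1) << 5)
#define PTE_DIRTY     (UINT64_C(1) << 6)

/* GNU C extension: elements 0..3 all receive the same initialiser. */
static const uint64_t l3_flags[4] = {
    [0 ... 3] = PTE_PRESENT | PTE_ACCESSED | PTE_DIRTY,
};
```

This replaces four identical initialisers with one, so the A/D bits cannot be accidentally set on only some of the entries.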

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agotools/libxl: Plumb domain_create_state down into libxl__build_pre()
Andrew Cooper [Thu, 2 Jan 2020 21:37:36 +0000 (21:37 +0000)]
tools/libxl: Plumb domain_create_state down into libxl__build_pre()

To fix CPUID handling, libxl__build_pre() is going to have to distinguish
between a brand new VM vs one which is being migrated-in/resumed.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agogolang/xenlight: implement array Go to C marshaling
Nick Rosbrook [Sat, 4 Jan 2020 21:00:53 +0000 (16:00 -0500)]
golang/xenlight: implement array Go to C marshaling

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agogolang/xenlight: implement keyed union Go to C marshaling
Nick Rosbrook [Sat, 4 Jan 2020 21:00:52 +0000 (16:00 -0500)]
golang/xenlight: implement keyed union Go to C marshaling

Since the C union cannot be directly populated, populate the fields of the
corresponding C struct defined in the cgo preamble, and then copy that
struct as bytes into the byte slice that Go uses as the union.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agogolang/xenlight: begin Go to C type marshaling
Nick Rosbrook [Sat, 4 Jan 2020 21:00:51 +0000 (16:00 -0500)]
golang/xenlight: begin Go to C type marshaling

Implement conversions for basic types such as strings and integer
types in toC functions.

Modify function signatures of toC implementations for builtin
types to be consistent with the signature of the generated toC
functions.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years agox86/hvm: always expose x2APIC feature in max HVM cpuid policy
Roger Pau Monne [Tue, 24 Dec 2019 10:18:10 +0000 (11:18 +0100)]
x86/hvm: always expose x2APIC feature in max HVM cpuid policy

On hardware without x2APIC support, Xen's emulated local APIC will still
provide x2APIC mode, and hence the feature should be set in the maximum
HVM cpuid policy.

Not exposing it in the maximum policy results in HVM domains not
getting the feature exposed unless it's also supported by the
underlying hardware.

This regression was introduced by c/s 3e0c8272f20, which caused x2APIC
not to be enabled unilaterally for HVM guests.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agolibxc/migration: Adjust layout of struct xc_sr_context
Andrew Cooper [Thu, 19 Dec 2019 21:19:35 +0000 (21:19 +0000)]
libxc/migration: Adjust layout of struct xc_sr_context

We are shortly going to want to introduce some common x86 fields, so having
x86_pv and x86_hvm as the top level objects is a problem.  Insert a
surrounding struct x86 and drop the x86 prefix from the pv/hvm objects.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agotools/migration: Formatting and style cleanup
Andrew Cooper [Thu, 5 Dec 2019 15:57:13 +0000 (15:57 +0000)]
tools/migration: Formatting and style cleanup

The code has deviated from the prevailing style in many ways.  Adjust spacing,
indentation, position of operators, layout of multiline comments, removal of
superfluous comments, constness, trailing commas, and use of unqualified
'unsigned'.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agoxen/vcpu: Improve sanity checks in vcpu_create()
Andrew Cooper [Wed, 15 Jan 2020 18:44:18 +0000 (18:44 +0000)]
xen/vcpu: Improve sanity checks in vcpu_create()

The BUG_ON() is confusing to follow.  The (!is_idle_domain(d) || vcpu_id) part
is a vestigial remnant of architectures poisoning idle_vcpu[0] with non-NULL
pointers.

Now that idle_vcpu[0] is NULL on all architectures, and d->max_vcpus is specified
before vcpu_create() is called, we can properly range check the requested
vcpu_id.
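The kind of range check this enables can be sketched as follows. The struct and function are hypothetical, and the sketch covers only the range and empty-slot checks, not everything vcpu_create() validates.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal domain: just enough state for the check. */
struct domain_sketch {
    unsigned int max_vcpus;   /* fixed before any vcpu is created */
    void **vcpu;              /* max_vcpus slots, NULL when unused */
};

/* Reject out-of-range IDs and occupied slots instead of a BUG_ON(). */
static bool vcpu_id_ok(const struct domain_sketch *d, unsigned int vcpu_id)
{
    /* Short-circuit: the slot is only read once the index is in range. */
    return vcpu_id < d->max_vcpus && d->vcpu[vcpu_id] == NULL;
}
```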

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>