xenbits.xensource.com Git - people/dwmw2/xen.git/log
5 years ago live update: document the handover protocol [lu-master]
David Woodhouse [Mon, 27 Jan 2020 15:41:58 +0000 (15:41 +0000)]
live update: document the handover protocol

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago live update: define LU_VERSION record and add it to live update stream
David Woodhouse [Fri, 31 Jan 2020 23:43:44 +0000 (23:43 +0000)]
live update: define LU_VERSION record and add it to live update stream

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago live update: add basic lu_save_all() shell
David Woodhouse [Thu, 16 Jan 2020 13:18:55 +0000 (14:18 +0100)]
live update: add basic lu_save_all() shell

This is the template for iterating over all domains and creating the
live update migration stream to pass the required information to the
next Xen. So far it only writes a single END record.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago live update: add helper functions for handling migration records
David Woodhouse [Mon, 27 Jan 2020 23:46:19 +0000 (23:46 +0000)]
live update: add helper functions for handling migration records

On top of the basic byte stream handling for struct lu_stream, add
helper functions to create and process migration records of the form now
defined in <public/migration_stream.h>.
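
As an illustration only, a record helper of this kind might look like the sketch below. The header layout (32-bit type, 32-bit body length, body padded to 8 bytes) and every name in it are assumptions made for the example; they are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: 32-bit type and 32-bit body length. */
struct sk_rec_hdr {
    uint32_t type;
    uint32_t length;
};

#define SK_REC_ALIGN 8u  /* assumption: bodies are padded to 8 bytes */

/* Round a body length up to the assumed record alignment. */
static size_t sk_rec_padded_len(size_t len)
{
    return (len + SK_REC_ALIGN - 1) & ~(size_t)(SK_REC_ALIGN - 1);
}

/*
 * Append one record (header, body, zero padding) to buf at offset *pos.
 * Returns 0 on success, -1 if the record does not fit.
 */
static int sk_rec_append(uint8_t *buf, size_t cap, size_t *pos,
                         uint32_t type, const void *body, uint32_t len)
{
    struct sk_rec_hdr hdr = { .type = type, .length = len };
    size_t padded = sk_rec_padded_len(len);

    assert(*pos <= cap);
    if (cap - *pos < sizeof(hdr) + padded)
        return -1;

    memcpy(buf + *pos, &hdr, sizeof(hdr));
    if (len)
        memcpy(buf + *pos + sizeof(hdr), body, len);
    memset(buf + *pos + sizeof(hdr) + len, 0, padded - len);
    *pos += sizeof(hdr) + padded;
    return 0;
}
```

Padding each body keeps every header naturally aligned, so a reader can walk the stream record by record without unaligned accesses.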

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/setup: finish plumbing in live update path through __start_xen()
David Woodhouse [Wed, 29 Jan 2020 15:52:06 +0000 (15:52 +0000)]
x86/setup: finish plumbing in live update path through __start_xen()

With this we are fairly much done modifying __start_xen() to support
live update. The live update functions themselves are still stubs,
but next we can start populating those with actual save/restore of
memory and domain information.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/setup: detect live update breadcrumb at boot and map data stream
David Woodhouse [Thu, 16 Jan 2020 14:14:50 +0000 (15:14 +0100)]
x86/setup: detect live update breadcrumb at boot and map data stream

The breadcrumb is written to the first page of the reserved bootmem,
as the last instructions processed by kexec_reloc(). Check for it there
and follow it to the migration data stream.

Mark the pages of the sglist and the individual data stream PGC_allocated
so that init_heap_pages() won't touch them.

Also make lu_stream_free() remove the PGC_allocated flag. That function
is already used for the cleanup in the error case on kexec but the flag
should never be set in that case so clearing it will be a no-op.

Other pages which are handed across live update will also need to have
their PGC_allocated flag removed as part of the "rehabilitation" as
they are introduced directly into domains, etc.

They will remain in state PGC_state_uninitialised so that when they are
eventually returned to the heap, init_heap_pages() will process them
correctly and create node structures as appropriate, etc.

Note: we can't use PGC_state_inuse for this as we do want those pages
to be processed by init_heap_pages() if/when they are subsequently freed.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago live update: add kimage_add_live_update_data()
David Woodhouse [Wed, 15 Jan 2020 17:46:54 +0000 (18:46 +0100)]
live update: add kimage_add_live_update_data()

This appends the IND_WRITE64 directives to the kimage, to write the
breadcrumb which leads the next Xen to the live update data stream.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago live update: basic data stream handling
David Woodhouse [Thu, 16 Jan 2020 12:55:44 +0000 (13:55 +0100)]
live update: basic data stream handling

This adds the basic struct lu_stream which manages the scatter/gather
list of data pages, and helper functions to append to it and free it.
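
A minimal user-space sketch of such a stream is given below, assuming a simple array of page pointers as the scatter/gather list. The names (sk_stream and friends) and the use of malloc-family allocators are stand-ins for illustration; the real struct lu_stream is not reproduced here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SK_PAGE_SIZE 4096u

/* Hypothetical stand-in for struct lu_stream: a list of data pages. */
struct sk_stream {
    unsigned char **pages;  /* scatter/gather list of page pointers */
    size_t nr_pages;        /* entries in pages[] */
    size_t len;             /* bytes written so far */
};

/* Append len bytes, allocating further pages on demand. 0 on success. */
static int sk_stream_append(struct sk_stream *s, const void *data, size_t len)
{
    const unsigned char *src = data;

    while (len) {
        size_t off = s->len % SK_PAGE_SIZE;
        size_t chunk = SK_PAGE_SIZE - off;

        if (s->len / SK_PAGE_SIZE == s->nr_pages) {
            /* The current page is full (or none exists yet): add one. */
            unsigned char **list =
                realloc(s->pages, (s->nr_pages + 1) * sizeof(*list));
            if (!list)
                return -1;
            s->pages = list;
            s->pages[s->nr_pages] = calloc(1, SK_PAGE_SIZE);
            if (!s->pages[s->nr_pages])
                return -1;
            s->nr_pages++;
        }
        if (chunk > len)
            chunk = len;
        memcpy(s->pages[s->len / SK_PAGE_SIZE] + off, src, chunk);
        s->len += chunk;
        src += chunk;
        len -= chunk;
    }
    return 0;
}

/* Release every data page and the list itself, leaving an empty stream. */
static void sk_stream_free(struct sk_stream *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < s->nr_pages; i++)
        free(s->pages[i]);
    free(s->pages);
    memset(s, 0, sizeof(*s));
}
```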

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/kexec: add IND_WRITE64 primitive to kexec kimage
David Woodhouse [Wed, 15 Jan 2020 16:58:44 +0000 (17:58 +0100)]
x86/kexec: add IND_WRITE64 primitive to kexec kimage

This allows a single page-aligned physical address to be written to
the current destination, intended to place the 'breadcrumb' leading to
the live update data stream into the first page of the reserved boot
memory (which might be in active use by the running Xen, so can't be
written earlier).
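
The idea can be sketched as follows, under stated assumptions: entries are page-aligned addresses with flag bits in the low bits, the flag value chosen for the new primitive is hypothetical, and advancing the destination after the write is a guess rather than documented behaviour. The real relocation code is assembly in kexec_reloc() and is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_PAGE_SIZE   4096u
#define SK_IND_DEST    0x1u   /* set the current destination */
#define SK_IND_DONE    0x4u   /* end of list */
#define SK_IND_WRITE64 0x20u  /* hypothetical value for the new primitive */
#define SK_IND_FLAGS   (SK_PAGE_SIZE - 1)

/*
 * Walk an entry list. mem models physical memory; addresses are byte
 * offsets into it.
 */
static void sk_process(const uint64_t *entries, uint8_t *mem)
{
    uint64_t dest = 0;

    for (;; entries++) {
        uint64_t addr = *entries & ~(uint64_t)SK_IND_FLAGS;

        if (*entries & SK_IND_DONE)
            return;
        if (*entries & SK_IND_DEST) {
            dest = addr;
        } else if (*entries & SK_IND_WRITE64) {
            /* Store the address itself as a 64-bit value at dest. */
            assert(dest % sizeof(addr) == 0);
            memcpy(mem + dest, &addr, sizeof(addr));
            dest += sizeof(addr);
        }
    }
}
```

Because the entry list is only processed at the very end of the handover, the write lands after the running Xen has stopped using the reserved page.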

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago kexec: add KEXEC_TYPE_LIVE_UPDATE
David Woodhouse [Wed, 15 Jan 2020 16:57:08 +0000 (17:57 +0100)]
kexec: add KEXEC_TYPE_LIVE_UPDATE

This is identical to the default case... for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago kexec: add KEXEC_RANGE_MA_LIVEUPDATE
David Woodhouse [Thu, 12 Dec 2019 17:02:10 +0000 (17:02 +0000)]
kexec: add KEXEC_RANGE_MA_LIVEUPDATE

This allows kexec userspace to tell the next Xen where the reserved
bootmem is, using the liveupdate= command line parameter.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/setup: reserve live update memory regions
David Woodhouse [Thu, 16 Jan 2020 08:51:45 +0000 (09:51 +0100)]
x86/setup: reserve live update memory regions

The live update handover requires that a region of memory be reserved
for the new Xen to use in its boot allocator. The original Xen may use
that memory but not for any pages which are mapped to domains, or which
would need to be preserved across the live update for any other reason.

The same constraints apply to initmem pages freed from the Xen image,
since the new Xen will be loaded into the same physical location as the
previous Xen.

There is separate work ongoing which will make the xenheap meet this
requirement by eliminating share_xen_page_with_guest(). For the meantime,
just don't add those pages to the heap at all in the live update case.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/boot: Add CONFIG_LIVE_UPDATE and liveupdate= command line parameter
David Woodhouse [Fri, 31 Jan 2020 22:01:00 +0000 (22:01 +0000)]
x86/boot: Add CONFIG_LIVE_UPDATE and liveupdate= command line parameter

For live update to work, the newly booting Xen will need a region of
memory that can be given to the boot allocator while it parses the state
information from the previous Xen and works out which of the other pages
of memory it can consume.

Reserve that like the crashdump region, and accept it on the command
line. Use only that region for early boot, and register the remaining
RAM later.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago libxc: migrate migration stream definitions into Xen public headers
David Woodhouse [Mon, 27 Jan 2020 16:54:01 +0000 (16:54 +0000)]
libxc: migrate migration stream definitions into Xen public headers

These data structures will be used for live update, which is basically
just live migration from one Xen to the next on the same machine via
in-memory data structures, instead of across the network.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Well-excellent-carry-on-then-by: Ian Jackson <ian.jackson@eu.citrix.com>
Go-with-that-for-now-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago x86/setup: move vm_init() before acpi calls
David Woodhouse [Tue, 17 Mar 2020 23:22:30 +0000 (23:22 +0000)]
x86/setup: move vm_init() before acpi calls

For direct map removal and also for live update, we want to be able
to use vmap() early. Make it capable of using the boot allocator for
its page tables and data structures, and call vm_init() earlier in boot.

Combined effort from Wei, Hongyan and David.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: David Woodhouse <dwmw2@amazon.co.uk>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
5 years ago x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END
David Woodhouse [Tue, 21 Jan 2020 14:05:21 +0000 (14:05 +0000)]
x86/setup: Fix badpage= handling for memory above HYPERVISOR_VIRT_END

Bad pages are identified by get_platform_badpages() and with badpage=
on the command line.

The boot allocator currently automatically elides these from the regions
passed to it with init_boot_pages(). The xenheap is then initialised
with the pages which are still marked as free by the boot allocator when
end_boot_allocator() is called.

However, any memory above HYPERVISOR_VIRT_END is passed directly to
init_domheap_pages() later in __start_xen(), and the bad page list is
not consulted.

Fix this by marking those pages as PGC_broken in the frametable at the
time end_boot_allocator() runs, and then making init_heap_pages() skip
over any pages which are so marked.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago xen/page_alloc: move bad page enumeration into elide_bad_pages() function
David Woodhouse [Mon, 3 Feb 2020 17:31:03 +0000 (17:31 +0000)]
xen/page_alloc: move bad page enumeration into elide_bad_pages() function

Right now only the boot allocator cares about bad pages, but this is a
bug. Pages added directly with init_domheap_pages() (such as anything
above HYPERVISOR_VIRT_END) don't get the bad pages filtered out.

As the first step toward fixing that, create a separate function for the
logic which parses the badpage= command line option and gathers the
other lists of pages to be avoided, and make it invoke a callback for
each bad page range.
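
The callback shape described above could look roughly like this. It is a sketch with hypothetical names: the real function also parses badpage= and gathers platform lists, which is omitted here in favour of a ready-made array of ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bad-page range: inclusive start MFN, exclusive end MFN. */
struct sk_range {
    unsigned long smfn, emfn;
};

typedef void (*sk_bad_cb)(unsigned long bad_smfn, unsigned long bad_emfn,
                          void *ctx);

/* Inline so the compiler can turn the callback into a direct call. */
static inline void sk_for_each_bad_range(const struct sk_range *bad,
                                         size_t nr, sk_bad_cb cb, void *ctx)
{
    for (size_t i = 0; i < nr; i++) {
        assert(bad[i].smfn < bad[i].emfn);
        cb(bad[i].smfn, bad[i].emfn, ctx);
    }
}

/* Example callback: count the bad pages seen. */
static void sk_count_pages(unsigned long bad_smfn, unsigned long bad_emfn,
                           void *ctx)
{
    *(unsigned long *)ctx += bad_emfn - bad_smfn;
}
```

With one enumeration function, the boot allocator and any later consumer can share the same list and differ only in the callback they pass.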

Make it inline to avoid the excessive cost of indirect function calls
if retpoline is enabled. And use bad_smfn, bad_emfn as the variable
names instead of calling them pfns.

No functional change.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago xen/mm: Introduce PGC_state_uninitialised
David Woodhouse [Fri, 7 Feb 2020 13:01:36 +0000 (13:01 +0000)]
xen/mm: Introduce PGC_state_uninitialised

It is possible for pages to enter general circulation without ever
being processed by init_heap_pages().

For example, pages of the multiboot module containing the initramfs may
be assigned via assign_pages() to dom0 as it is created. And some code
including map_pages_to_xen() has checks on 'system_state' to determine
whether to use the boot or the heap allocator, but it seems impossible
to prove that pages allocated by the boot allocator are not subsequently
freed with free_heap_pages().

This actually works fine in the majority of cases; there are only a few
esoteric corner cases which init_heap_pages() handles before handing the
page range off to free_heap_pages():
 • Excluding MFN #0 to avoid inappropriate cross-zone merging.
 • Ensuring that the node information structures exist, when the first
   page(s) of a given node are handled.
 • High order allocations crossing from one node to another.

To handle this case, shift PGC_state_inuse from its current value of
zero, to another value. Use zero, which is the initial state of the
entire frame table, as PGC_state_uninitialised.

Fix a couple of assertions which were assuming that PGC_state_inuse is
zero, and make them cope with the PGC_state_uninitialised case too where
appropriate.

Finally, make free_heap_pages() call through to init_heap_pages() when
given a page range which has not been initialised. This cannot keep
recursing because init_heap_pages() will set each page state to
PGC_state_inuse before passing it back to free_heap_pages() for the
second time.
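
The bounded recursion described above can be modelled in a few lines. This is a toy model with hypothetical names, tracking only the state transitions; the node setup and range handling of the real functions are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page states; zero is the frame table's initial value. */
enum sk_state {
    SK_UNINITIALISED = 0,
    SK_INUSE,
    SK_FREE,
};

struct sk_page {
    enum sk_state state;
};

static unsigned int sk_init_calls;  /* counts one-time initialisations */

static void sk_free_heap_page(struct sk_page *pg);

/* One-time setup would happen here; then hand the page back to free. */
static void sk_init_heap_page(struct sk_page *pg)
{
    sk_init_calls++;
    pg->state = SK_INUSE;  /* this is what stops the recursion */
    sk_free_heap_page(pg);
}

static void sk_free_heap_page(struct sk_page *pg)
{
    if (pg->state == SK_UNINITIALISED) {
        sk_init_heap_page(pg);
        return;
    }
    assert(pg->state == SK_INUSE);
    pg->state = SK_FREE;
}
```

A page that starts out uninitialised passes through the init path exactly once, while a page that was already in use goes straight to the free state.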

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago xen/mm: fold PGC_broken into PGC_state bits
David Woodhouse [Fri, 7 Feb 2020 11:56:50 +0000 (11:56 +0000)]
xen/mm: fold PGC_broken into PGC_state bits

Only PGC_state_offlining and PGC_state_offlined are valid in conjunction
with PGC_broken. The other two states (free and inuse) were never valid
for a broken page.

By folding PGC_broken in, we can have three bits for PGC_state which
allows up to 8 states, of which 6 are currently used and 2 are available
for new use cases.
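
To make the encoding concrete, here is a sketch of a three-bit state field. The bit position, the state names and their numbering are all assumptions for illustration; only the "three bits, six states used" arithmetic comes from the text above.

```c
#include <assert.h>

/* Hypothetical layout: a 3-bit state field at bit 9 of count_info. */
#define SK_STATE_SHIFT 9
#define SK_STATE_MASK  (7ul << SK_STATE_SHIFT)
#define SK_STATE(n)    ((unsigned long)(n) << SK_STATE_SHIFT)

enum {
    SK_ST_INUSE,
    SK_ST_OFFLINING,
    SK_ST_OFFLINED,
    SK_ST_FREE,
    SK_ST_BROKEN_OFFLINING,
    SK_ST_BROKEN,           /* six of the eight encodings are used */
};

static unsigned int sk_get_state(unsigned long count_info)
{
    return (count_info & SK_STATE_MASK) >> SK_STATE_SHIFT;
}

static unsigned long sk_set_state(unsigned long count_info, unsigned int st)
{
    assert(st < 8);
    return (count_info & ~SK_STATE_MASK) | SK_STATE(st);
}

/* "Broken" is now a state value rather than a separate flag bit. */
static int sk_is_broken(unsigned long count_info)
{
    unsigned int st = sk_get_state(count_info);

    return st == SK_ST_BROKEN_OFFLINING || st == SK_ST_BROKEN;
}
```

Encoding brokenness in the state itself makes the invalid combinations (for example broken and free) unrepresentable.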

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/setup: lift dom0 creation out into create_dom0() function
David Woodhouse [Fri, 31 Jan 2020 22:51:02 +0000 (22:51 +0000)]
x86/setup: lift dom0 creation out into create_dom0() function

The creation of dom0 can be relatively self-contained. Shift it into
a separate function and simplify __start_xen() a little bit.

This is a cleanup in its own right, but will be even more desirable
when live update provides an alternative path through __start_xen()
that doesn't involve creating a new dom0 at all.

Move the calculation of the 'initrd' parameter for create_dom0()
down past the cosmetic printk about NX support, because in the fullness
of time the whole initrd and create_dom0() part will be under the same
"not live update" conditional. And in the meantime it's just neater.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/setup: simplify handling of initrdidx when no initrd present
David Woodhouse [Fri, 31 Jan 2020 22:41:30 +0000 (22:41 +0000)]
x86/setup: simplify handling of initrdidx when no initrd present

Remove a ternary operator that made my brain hurt.

Replace it with something simpler that makes it somewhat clearer that
the check for initrdidx < mbi->mods_count is because larger values are
what find_first_bit() will return when it doesn't find anything.

Also drop the explicit check for module #0 since that would be the
dom0 kernel and the corresponding bit is always clear in module_map.
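
The "not found" convention relied on here can be shown with a simplified, single-word stand-in for find_first_bit(). The helper names and the -1 return value are hypothetical and only illustrate why the comparison against the module count is sufficient.

```c
#include <assert.h>
#include <limits.h>

/*
 * Simplified single-word stand-in for find_first_bit(): returns the index
 * of the lowest set bit below size, or size if no such bit is set.
 */
static unsigned int sk_find_first_bit(unsigned long map, unsigned int size)
{
    for (unsigned int i = 0; i < size; i++)
        if (map & (1ul << i))
            return i;
    return size;
}

/* Returns the initrd module index, or -1 if no module is left unclaimed. */
static int sk_initrd_index(unsigned long module_map, unsigned int mods_count)
{
    unsigned int idx;

    assert(mods_count <= sizeof(module_map) * CHAR_BIT);
    idx = sk_find_first_bit(module_map, mods_count);
    if (idx < mods_count)
        return (int)idx;
    return -1;
}
```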

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Acked-by: Julien Grall <julien@xen.org>
5 years ago Add -MP to CFLAGS along with -MMD.
David Woodhouse [Tue, 17 Mar 2020 14:21:22 +0000 (14:21 +0000)]
Add -MP to CFLAGS along with -MMD.

This causes gcc (yes, and clang) to emit phony targets for each dependency.

This means that when a header file is deleted, the C files which *used*
to include it will no longer stop building with bogus out-of-date
dependencies like this:

  make[5]: *** No rule to make target
  '/home/dwmw2/git/xen/xen/include/asm/hvm/svm/amd-iommu-proto.h',
  needed by 'p2m.o'. Stop.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago libxl: fix cleanup bug in initiate_domain_create()
Paweł Marczewski [Fri, 13 Mar 2020 11:25:10 +0000 (11:25 +0000)]
libxl: fix cleanup bug in initiate_domain_create()

In case of errors, we immediately call domcreate_complete()
which cleans up the console_xswait object. Make sure it is initialized
before we start cleanup.

Signed-off-by: Paweł Marczewski <pawel@invisiblethingslab.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago libfsimage: fix parentheses in macro parameters
Roger Pau Monne [Fri, 13 Mar 2020 08:45:58 +0000 (09:45 +0100)]
libfsimage: fix parentheses in macro parameters

VERIFY_DN_TYPE and VERIFY_OS_TYPE should use parentheses when
accessing the type parameter. Note that none of the current usages
require this, it's just done for correctness.
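
A small, made-up pair of macros shows what can go wrong without the parentheses. These are not the libfsimage macros, just an illustration of how an argument containing a low-precedence operator rebinds inside the expansion.

```c
#include <assert.h>

/* Unparenthesised parameter: operators in the argument can rebind. */
#define SK_NE_BAD(val, type)  ((val) != type)
/* Parenthesised parameter: the argument is evaluated as a unit. */
#define SK_NE_GOOD(val, type) ((val) != (type))

/* Expands to (val != flag) ? 5 : 7, which is never zero. */
static int sk_bad(int val, int flag)
{
    return SK_NE_BAD(val, flag ? 5 : 7);
}

/* Expands to val != (flag ? 5 : 7), the intended comparison. */
static int sk_good(int val, int flag)
{
    assert(flag == 0 || flag == 1);
    return SK_NE_GOOD(val, flag ? 5 : 7);
}
```

With val = 5 and flag = 1, the intended comparison is 5 != 5, yet the unparenthesised form yields a non-zero result.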

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago libfsimage: fix clang 10 build
Roger Pau Monne [Fri, 13 Mar 2020 08:45:57 +0000 (09:45 +0100)]
libfsimage: fix clang 10 build

clang complains with:

fsys_zfs.c:826:2: error: converting the enum constant to a boolean [-Werror,-Wint-in-bool-context]
        VERIFY_DN_TYPE(dn, DMU_OT_PLAIN_FILE_CONTENTS);
        ^
/wrkdirs/usr/ports/sysutils/xen-tools/work/xen-4.13.0/tools/libfsimage/zfs/../../../tools/libfsimage/zfs/fsys_zfs.h:74:11: note: expanded from macro 'VERIFY_DN_TYPE'
        if (type && (dnp)->dn_type != type) { \
                 ^
1 error generated.

Fix this by not forcing an implicit conversion of the enum into a
boolean and instead comparing with the 0 enumerator.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago tools/helpers: xen-init-dom0: Mark clear_domid_history() static
Julien Grall [Thu, 12 Mar 2020 20:24:07 +0000 (20:24 +0000)]
tools/helpers: xen-init-dom0: Mark clear_domid_history() static

xen-init-dom0 is a standalone binary, so all the functions but the
main() should be static.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Cc: paul@xen.org
Acked-by: Wei Liu <wl@xen.org>
5 years ago scripts: Replace tabs in locking.sh
Jason Andryuk [Thu, 12 Mar 2020 14:54:16 +0000 (10:54 -0400)]
scripts: Replace tabs in locking.sh

Replace two stray tabs with spaces to make the file whitespace
consistent.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
5 years ago rcu: fix rcu_lock_domain()
Juergen Gross [Wed, 11 Mar 2020 12:18:49 +0000 (13:18 +0100)]
rcu: fix rcu_lock_domain()

rcu_lock_domain() misuses the domain structure as an rcu lock, which
works only as long as rcu_read_lock() isn't evaluating the lock.

Fix that by adding an rcu lock to struct domain and using that for
rcu_lock_domain().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago rcu: use rcu softirq for forcing quiescent state
Juergen Gross [Wed, 11 Mar 2020 12:17:41 +0000 (13:17 +0100)]
rcu: use rcu softirq for forcing quiescent state

As rcu callbacks are processed in __do_softirq() there is no need to
use the scheduling softirq for forcing quiescent state. Any other
softirq would do the job and the scheduling one is the most expensive.

So use the already existing rcu softirq for that purpose. To tell
apart why the rcu softirq was raised, add a flag for the current usage.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago memaccess: reduce include dependencies
Jan Beulich [Tue, 10 Mar 2020 16:06:57 +0000 (17:06 +0100)]
memaccess: reduce include dependencies

The common header doesn't itself need to include public/vm_event.h nor
public/memory.h. Drop their inclusion. This requires using the non-
typedef names in two prototypes and an inline function; by not changing
the callers and function definitions at the same time it'll remain
certain that the build would fail if the typedef itself was changed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
5 years ago x86 / p2m: replace page_list check in p2m_alloc_table...
Paul Durrant [Tue, 10 Mar 2020 16:06:09 +0000 (17:06 +0100)]
x86 / p2m: replace page_list check in p2m_alloc_table...

... with a check of domain_tot_pages().

The check of page_list prevents the prior allocation of PGC_extra pages,
whereas what the code is trying to verify is that the toolstack has not
already allocated RAM for the domain.

Signed-off-by: Paul Durrant <paul@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago vmevent: reduce include dependencies
Jan Beulich [Tue, 10 Mar 2020 14:38:25 +0000 (15:38 +0100)]
vmevent: reduce include dependencies

There's no need for virtually everything to include public/vm_event.h.
Move its inclusion out of sched.h. This requires using the non-typedef
name in p2m_mem_paging_resume()'s prototype; by not changing the
function definition at the same time it'll remain certain that the build
would fail if the typedef itself was changed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
5 years ago IOMMU: iommu_snoop is x86-only
Jan Beulich [Tue, 10 Mar 2020 14:37:30 +0000 (15:37 +0100)]
IOMMU: iommu_snoop is x86-only

In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor. Provide a #define for the opposite case.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years ago IOMMU: iommu_qinval is x86-only
Jan Beulich [Tue, 10 Mar 2020 14:36:45 +0000 (15:36 +0100)]
IOMMU: iommu_qinval is x86-only

In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years ago IOMMU: iommu_igfx is x86-only
Jan Beulich [Tue, 10 Mar 2020 14:35:57 +0000 (15:35 +0100)]
IOMMU: iommu_igfx is x86-only

In fact it's VT-d specific, but we don't have a way yet to build code
for just one vendor.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years ago IOMMU: iommu_intpost is x86/HVM-only
Jan Beulich [Tue, 10 Mar 2020 14:33:56 +0000 (15:33 +0100)]
IOMMU: iommu_intpost is x86/HVM-only

Provide a #define for all other cases.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years ago IOMMU: iommu_intremap is x86-only
Jan Beulich [Tue, 10 Mar 2020 14:32:16 +0000 (15:32 +0100)]
IOMMU: iommu_intremap is x86-only

Provide a #define for other cases; it didn't seem worthwhile to me to
introduce an IOMMU_INTREMAP Kconfig option at this point.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Paul Durrant <paul@xen.org>
5 years ago x86/hap: improve hypervisor assisted guest TLB flush
Roger Pau Monné [Tue, 10 Mar 2020 14:30:27 +0000 (15:30 +0100)]
x86/hap: improve hypervisor assisted guest TLB flush

The current implementation of the hypervisor assisted flush for HAP is
extremely inefficient.

First of all there's no need to call paging_update_cr3, as the only
relevant part of that function when doing a flush is the ASID vCPU
flush, so just call that function directly.

Since hvm_asid_flush_vcpu is protected against concurrent callers by
using atomic operations there's no need anymore to pause the affected
vCPUs.

Finally the global TLB flush performed by flush_tlb_mask is also not
necessary, since we only want to flush the guest TLB state it's enough
to trigger a vmexit on the pCPUs currently holding any vCPU state, as
such vmexit will already perform an ASID/VPID update, and thus clear
the guest TLB.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86/paging: add TLB flush hook
Roger Pau Monné [Tue, 10 Mar 2020 14:29:24 +0000 (15:29 +0100)]
x86/paging: add TLB flush hook

Add shadow and hap implementation specific helpers to perform guest
TLB flushes. Note that the code for both is exactly the same at the
moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
further patches that will add implementation specific optimizations to
them.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Tim Deegan <tim@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amzn.com> [viridian]
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago x86: refine APIC ID restriction
Jan Beulich [Tue, 10 Mar 2020 14:27:56 +0000 (15:27 +0100)]
x86: refine APIC ID restriction

Now that we distinguish "restricted" and "full" interrupt remapping
mode, the 8-bit-APIC-ID restriction also needs to be enforced for
"restricted".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years ago AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode
Jan Beulich [Tue, 10 Mar 2020 14:25:58 +0000 (15:25 +0100)]
AMD/IOMMU: without XT, x2APIC needs to be forced into physical mode

The wider cluster mode APIC IDs aren't generally representable. Convert
the iommu_intremap variable into a tristate, allowing the AMD IOMMU
driver to signal this special restriction to the apic_x2apic_probe().
(Note: assignments to the variable get adjusted, while existing
consumers - all assuming a boolean property - are left alone.)
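
One way to picture the tristate is sketched below. The enumerator names and the two helper functions are hypothetical; the point is only that keeping "off" at zero lets boolean-style consumers stay untouched while a new consumer can distinguish the restricted case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tristate; "off" is zero so boolean tests keep working. */
enum sk_intremap {
    SK_INTREMAP_OFF = 0,
    SK_INTREMAP_RESTRICTED,  /* remapping on, but only 8-bit APIC IDs */
    SK_INTREMAP_FULL,
};

/* Existing style of consumer, unchanged: treats the value as a boolean. */
static bool sk_intremap_enabled(enum sk_intremap m)
{
    return m != 0;
}

/* New style of consumer: cluster mode needs the full APIC ID range
 * (assumed policy for this sketch). */
static bool sk_must_use_physical_mode(enum sk_intremap m)
{
    assert(m <= SK_INTREMAP_FULL);
    return m != SK_INTREMAP_FULL;
}
```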

While we are not aware of any hardware/firmware with this as a
restriction, it is a situation which could be created on fully x2apic-
capable systems via firmware settings.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years ago golang/xenlight: Fix handling of marshalling of empty elements for keyed unions
George Dunlap [Thu, 5 Mar 2020 11:34:07 +0000 (11:34 +0000)]
golang/xenlight: Fix handling of marshalling of empty elements for keyed unions

Keyed types in libxl_types.idl can have elements of type 'None'.  The
golang type generator (correctly) doesn't implement any union types for
these empty elements.  However, the toC and fromC helper generators
incorrectly treat these elements as invalid.

Consider for example, libxl_channelinfo.  The idl contains the
following keyed element:

    ("u", KeyedUnion(None, libxl_channel_connection, "connection",
           [("unknown", None),
            ("pty", Struct(None, [("path", string),])),
            ("socket", None),
           ])),

But the toC marshaller currently looks like this:

    switch x.Connection {
    case ChannelConnectionPty:
        tmp, ok := x.ConnectionUnion.(ChannelinfoConnectionUnionPty)
        if !ok {
            return errors.New("wrong type for union key connection")
        }
        var pty C.libxl_channelinfo_connection_union_pty
        if tmp.Path != "" {
            pty.path = C.CString(tmp.Path)
        }
        ptyBytes := C.GoBytes(unsafe.Pointer(&pty), C.sizeof_libxl_channelinfo_connection_union_pty)
        copy(xc.u[:], ptyBytes)
    default:
        return fmt.Errorf("invalid union key '%v'", x.Connection)
    }

Which means toC() will fail for ChannelConnectionUnknown or
ChannelConnectionSocket.

Modify the generator to handle keyed union elements of type 'None'.
For fromC, set the value to 'nil'; for toC, leave things as-is.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Reviewed-by: Nick Rosbrook <rosbrookn@ainfosec.com>
5 years ago golang/xenlight: implement constructor generation
Nick Rosbrook [Mon, 2 Mar 2020 20:10:24 +0000 (15:10 -0500)]
golang/xenlight: implement constructor generation

Generate constructors for generated Go types. Call libxl_<type>_init so
the Go type can be properly initialized.

If a type has a keyed union field, add a parameter to the function
signature to set the key variable, and call the init function for the
keyed union.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>
5 years ago VT-d: fix and extend RMRR reservation check
Jan Beulich [Mon, 9 Mar 2020 09:00:26 +0000 (10:00 +0100)]
VT-d: fix and extend RMRR reservation check

First of all in commit d6573bc6e6b7 ("VT-d: check all of an RMRR for
being E820-reserved") along with changing the function used, the enum-
like value passed should have been changed too (to E820_*). Do so now.
(Luckily the actual values of RAM_TYPE_RESERVED and E820_RESERVED
match, so the breakage introduced was "only" latent.)

Furthermore one of my systems surfaces RMRR in an ACPI NVS E820 range.
The purpose of the check is just to make sure there won't be "ordinary"
mappings of these ranges, and domains (including Dom0) won't want to
use the region to e.g. put PCI device BARs there. The two ACPI related
E820 types are good enough for this purpose, so allow them as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
5 years ago MAINTAINERS: Update my entries (again again)
Paul Durrant [Fri, 6 Mar 2020 11:24:17 +0000 (11:24 +0000)]
MAINTAINERS: Update my entries (again again)

Unfortunately I need to stop using all my Amazon email addresses for all
open source work.

Signed-off-by: Paul Durrant <paul@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years ago x86/hvm: allow ASID flush when v != current
Roger Pau Monné [Fri, 6 Mar 2020 09:18:13 +0000 (10:18 +0100)]
x86/hvm: allow ASID flush when v != current

Current implementation of hvm_asid_flush_vcpu is not safe to use
unless the target vCPU is either paused or the currently running one,
as it modifies the generation without any locking.

Fix this by using atomic operations when accessing the generation
field, both in hvm_asid_flush_vcpu_asid and other ASID functions. This
allows the current ASID generation to be flushed safely. Note that if
the vCPU is currently running, a vmexit is required for the flush to
take effect.

Compilers will normally do such writes and reads as a single
instruction, so the atomic operations mostly serve as a safety
measure.
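
A rough model of the idea, using C11 atomics as a stand-in for the hypervisor's own atomic accessors, is given below. The structure, the function names and the single generation field are simplifications invented for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-vCPU ASID state; generation 0 means "needs a new ASID". */
struct sk_vcpu_asid {
    _Atomic uint64_t generation;
    _Atomic uint32_t asid;
};

/* Callable from any CPU: the flush is a single atomic store. */
static void sk_asid_flush_vcpu(struct sk_vcpu_asid *a)
{
    atomic_store_explicit(&a->generation, 0, memory_order_relaxed);
}

/* Runs on the vmentry path: returns 1 if a fresh ASID was assigned. */
static int sk_asid_handle_vmenter(struct sk_vcpu_asid *a,
                                  uint64_t core_generation,
                                  uint32_t next_asid)
{
    assert(core_generation != 0);
    if (atomic_load_explicit(&a->generation, memory_order_relaxed) ==
        core_generation)
        return 0;
    atomic_store_explicit(&a->asid, next_asid, memory_order_relaxed);
    atomic_store_explicit(&a->generation, core_generation,
                          memory_order_relaxed);
    return 1;
}
```

The remote flush only invalidates the generation; the new ASID is picked up the next time the vCPU passes through the vmentry helper, which is why a running vCPU needs a vmexit first.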

Note the same could be achieved by introducing an extra field to
hvm_vcpu_asid that signals hvm_asid_handle_vmenter the need to call
hvm_asid_flush_vcpu on the given vCPU before vmentry, this however
seems unnecessary as hvm_asid_flush_vcpu itself only sets two vCPU
fields to 0, so there's no need to delay this to the vmentry ASID
helper.

This is not a bugfix as no callers that would violate the assumptions
listed in the first paragraph have been found, but a preparatory
change in order to allow remote flushing of HVM vCPUs.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago build: move as-option-add to xen/
Anthony PERARD [Fri, 6 Mar 2020 09:16:24 +0000 (10:16 +0100)]
build: move as-option-add to xen/

Only xen/ uses as-option-add and as-insn, so they aren't needed in
Config.mk.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago build: make tests in test/ directly
Anthony PERARD [Fri, 6 Mar 2020 09:16:07 +0000 (10:16 +0100)]
build: make tests in test/ directly

It is unnecessary to make _tests via Rules.mk because the target
uses Rules.mk as well.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago build: run targets csopes,tags,.. without Rules.mk
Anthony PERARD [Fri, 6 Mar 2020 09:15:49 +0000 (10:15 +0100)]
build: run targets csopes,tags,.. without Rules.mk

Those targets make use of $(all_sources) which depends on TARGET_ARCH,
so we just need to set TARGET_ARCH earlier and once.

XEN_TARGET_ARCH isn't expected to change during the build, so
TARGET_SUBARCH and TARGET_ARCH aren't going to change either. Set them
once and for all in the Xen root Makefile. This allows more targets to
be run without Rules.mk.

XEN_TARGET_ARCH is actually changed in arch/x86/boot/build32.mk, but
it doesn't use the TARGET_{,SUB}ARCH variables either, and doesn't use
Rules.mk (it replaces it).

TARGET_{,SUB}ARCH are no longer overridden because that would have
no effect on the values that Rules.mk will use.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago build: extract clean target from Rules.mk
Anthony PERARD [Fri, 6 Mar 2020 09:14:33 +0000 (10:14 +0100)]
build: extract clean target from Rules.mk

Most of the code executed by Rules.mk isn't necessary for the clean
target, especially not the CFLAGS. This patch makes running make clean
much faster.

The patch extracts the clean target into a different Makefile,
Makefile.clean.

Since Makefile.clean doesn't want to include Config.mk, we need to
define the variables DEPS_INCLUDE and DEPS in a place common to
Rules.mk and Makefile.clean; this is Kbuild.include. DEPS_RM is only
needed in Makefile.clean so can be defined there.

Even though Rules.mk includes Config.mk, it includes Kbuild.include
afterwards, so the effective definition of DEPS_INCLUDE is the "xen/"
one, the same one as used by Makefile.clean.

This is inspired by Kbuild, with Makefile.clean partially copied from
Linux v5.4.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years ago build: use $(clean) shorthand for clean targets
Anthony PERARD [Fri, 6 Mar 2020 09:14:18 +0000 (10:14 +0100)]
build: use $(clean) shorthand for clean targets

Collect all the clean targets as we are going to modify them shortly.
Also, this is inspired by Linux's Kbuild.

"Kbuild.include" isn't included by "Makefile", but the "_clean" target
is only used by Rules.mk, which includes Kbuild.include.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agobuild: use obj-y += subdir/ instead of subdir-y
Anthony PERARD [Fri, 6 Mar 2020 09:11:23 +0000 (10:11 +0100)]
build: use obj-y += subdir/ instead of subdir-y

This is part of upgrading our build system and importing more of
Linux's.

In Linux, subdir-y in Makefiles is only used to descend into a
subdirectory when there are no objects to build. Xen doesn't have that
case: all subdirs have objects to be included in the final binary.

To allow the new syntax, the "obj-y" and "subdir-*" calculation in
Rules.mk is changed and partially imported from Linux's Kbuild.

The command used to modify the Makefile was:
    sed -i -r 's#^subdir-(.*)#obj-\1/#;' **/Makefile

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
5 years agox86/dom0: Fix build with clang
Andrew Cooper [Thu, 5 Mar 2020 17:57:37 +0000 (17:57 +0000)]
x86/dom0: Fix build with clang

find_memory() isn't marked as __init, so if it isn't fully inlined, it ends up
tripping:

  Error: size of dom0_build.o:.text is 0x0c1

Fixes: 73b47eea21 "x86/dom0: improve PVH initrd and metadata placement"
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agoxen/grant-table: Remove 'led' variable in map_grant_ref
Julien Grall [Tue, 25 Feb 2020 18:36:33 +0000 (18:36 +0000)]
xen/grant-table: Remove 'led' variable in map_grant_ref

The name of the variable 'led' is confusing and only used in one place a
line after. So remove it.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/grant-table: Remove outdated warning in gnttab_grow_table()
Julien Grall [Tue, 25 Feb 2020 12:32:49 +0000 (12:32 +0000)]
xen/grant-table: Remove outdated warning in gnttab_grow_table()

One of the warning messages in gnttab_grow_table() refers to a function
that was removed in commit 6425f91c72 "xen/gnttab: Fold grant_table_{create,
set_limits}() into grant_table_init()".

Since the commit, gt->active will be allocated while initializing the
grant table at domain creation. Therefore gt->active will always be
valid.

Rather than replacing the warning by another one, drop the check
completely as we will likely not come back to a semi-initialized world.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/x86: hap: Clean-up and harden hap_enable()
Julien Grall [Mon, 3 Feb 2020 23:57:05 +0000 (23:57 +0000)]
xen/x86: hap: Clean-up and harden hap_enable()

Unlike shadow_enable(), hap_enable() can only be called once during
domain creation and with the mode equal to
PG_external | PG_translate | PG_refcounts.

If it were called twice, then we might have some interesting problems
as the p2m tables would be re-allocated (and therefore all the mappings
would be lost).

Add code to sanity check the mode and that the function is only called
once. Take the opportunity to drop an if checking that PG_translate is
set.
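
The two checks can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code from the patch: the
flag values, struct paging_sketch, hap_enable_sketch() and the error
codes are all hypothetical.

```c
#include <errno.h>

/* Hypothetical flag values; the real PG_* constants are defined by Xen. */
#define PG_REFCOUNTS_SKETCH (1u << 0)
#define PG_TRANSLATE_SKETCH (1u << 1)
#define PG_EXTERNAL_SKETCH  (1u << 2)

struct paging_sketch {
    unsigned int mode;   /* 0 until paging has been enabled */
};

static int hap_enable_sketch(struct paging_sketch *p, unsigned int mode)
{
    const unsigned int required =
        PG_EXTERNAL_SKETCH | PG_TRANSLATE_SKETCH | PG_REFCOUNTS_SKETCH;

    if ( mode != required )
        return -EINVAL;   /* only the one full mode is acceptable */

    if ( p->mode != 0 )
        return -EEXIST;   /* second call: refuse to re-allocate */

    p->mode = mode;
    return 0;
}
```

With the mode compared against the full set, a separate test for
PG_translate alone would be redundant.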

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/x86: hap: Fix coding style in hap_enable()
Julien Grall [Mon, 3 Feb 2020 23:57:53 +0000 (23:57 +0000)]
xen/x86: hap: Fix coding style in hap_enable()

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
5 years agoiommu: fix check for autotranslated hardware domain
Roger Pau Monné [Thu, 5 Mar 2020 09:43:46 +0000 (10:43 +0100)]
iommu: fix check for autotranslated hardware domain

The current position of the check_hwdom_reqs is wrong, as there's a
is_iommu_enabled at the top of the function that will prevent getting
to the check on systems without an IOMMU, because the hardware domain
won't have the XEN_DOMCTL_CDF_iommu flag set.

Move the position of the check so it's done before the
is_iommu_enabled one, and thus attempts to create a translated
hardware domain without an IOMMU can be detected.

Fixes: f89f555827a ('remove late (on-demand) construction of IOMMU page tables')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/dom0: improve PVH initrd and metadata placement
Roger Pau Monné [Thu, 5 Mar 2020 09:43:15 +0000 (10:43 +0100)]
x86/dom0: improve PVH initrd and metadata placement

Don't assume there's going to be enough space at the tail of the
loaded kernel and instead try to find a suitable memory area where the
initrd and metadata can be loaded.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/mm: switch to new APIs in arch_init_memory
Wei Liu [Thu, 5 Mar 2020 09:42:18 +0000 (10:42 +0100)]
x86/mm: switch to new APIs in arch_init_memory

The function will map and unmap pages on demand.

Since we now map and unmap Xen PTE pages, we would like to track the
lifetime of mappings so that 1) we do not dereference memory through a
variable after it is unmapped, 2) we do not unmap more than once.
Therefore, we introduce the UNMAP_DOMAIN_PAGE macro to nullify the
variable after unmapping, and ignore NULL.
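
As a rough illustration of the idea (not the actual Xen definition), a
macro of this shape unmaps a pointer and then clears it.
fake_unmap(), unmap_calls and UNMAP_PAGE_SKETCH are hypothetical
stand-ins so that the sketch compiles outside the hypervisor.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real unmap function; only counts calls. */
static int unmap_calls;

static void fake_unmap(const void *ptr)
{
    (void)ptr;
    unmap_calls++;
}

/*
 * Sketch of an unmap-and-nullify macro: a NULL pointer is ignored, and
 * the variable is cleared so that a stale mapping can neither be
 * dereferenced nor unmapped a second time.
 */
#define UNMAP_PAGE_SKETCH(p) do { \
    if ( (p) != NULL )            \
    {                             \
        fake_unmap(p);            \
        (p) = NULL;               \
    }                             \
} while ( 0 )

/* Returns how many real unmaps happen for two back-to-back calls. */
static int demo_double_unmap(void)
{
    int page;
    int *mapping = &page;

    unmap_calls = 0;
    UNMAP_PAGE_SKETCH(mapping);   /* unmaps and clears 'mapping' */
    UNMAP_PAGE_SKETCH(mapping);   /* no-op: 'mapping' is NULL now */

    return unmap_calls;
}
```

In this sketch the second invocation does nothing, which is the "do not
unmap more than once" property described above.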

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoallow only sizeof(bool) variables for boolean_param()
Juergen Gross [Thu, 5 Mar 2020 09:40:40 +0000 (10:40 +0100)]
allow only sizeof(bool) variables for boolean_param()

Supporting variable sizes other than that of a normal bool for
boolean_param() doesn't make sense, so catch any other sized variables
at build time.

Fix the one parameter using a plain int instead of bool.
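
A build-time size check of this kind could look roughly like the sketch
below. It uses C11 _Static_assert, which is an assumption of this
example and not necessarily the mechanism used by the patch;
BOOLEAN_PARAM_SKETCH, struct bool_param_sketch and opt_example are
hypothetical names.

```c
#include <stdbool.h>

/*
 * Hypothetical simplification of a boolean parameter registration
 * macro: it performs the compile-time size check and records the
 * address of the backing variable.
 */
struct bool_param_sketch {
    const char *name;
    bool *var;
};

#define BOOLEAN_PARAM_SKETCH(_name, _var)                          \
    _Static_assert(sizeof(_var) == sizeof(bool),                   \
                   "boolean parameter must be backed by a bool");  \
    static const struct bool_param_sketch param_##_var =           \
        { _name, &(_var) }

static bool opt_example = true;       /* accepted: sizeof(bool) */
BOOLEAN_PARAM_SKETCH("example", opt_example);

/*
 * static int opt_bad;
 * BOOLEAN_PARAM_SKETCH("bad", opt_bad);   would fail to compile
 */

static bool read_example_param(void)
{
    return *param_opt_example.var;
}
```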

Signed-off-by: Juergen Gross <jgross@suse.com>
[add __read_mostly]
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agolibxl: wait for console path before firing console_available
Paweł Marczewski [Tue, 3 Mar 2020 13:28:20 +0000 (14:28 +0100)]
libxl: wait for console path before firing console_available

If the path doesn't become available after LIBXL_INIT_TIMEOUT
seconds, fail the domain creation.

If we skip the bootloader, the TTY path will be set by xenconsoled.
However, there is no guarantee that this will happen by the time we
want to call the console_available callback, so we have to wait.

Signed-off-by: Paweł Marczewski <pawel@invisiblethingslab.com>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
5 years agoautomation: document vsyscall=emulate for old glibc
Wei Liu [Tue, 25 Feb 2020 12:10:48 +0000 (12:10 +0000)]
automation: document vsyscall=emulate for old glibc

Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Doug Goldstein <cardoe@cardoe.com>
5 years agoxen/arm: Workaround clang/armclang support for register allocation
Julien Grall [Mon, 17 Feb 2020 22:20:34 +0000 (22:20 +0000)]
xen/arm: Workaround clang/armclang support for register allocation

Clang 8.0 (see [1]) and, by extension, some versions of armclang do
not support register allocation using the syntax rN.

Thankfully, both GCC [2] and clang are able to support the xN syntax for
Arm64. Introduce a new macro ASM_REG() and use it in common code for
register allocation.

[1] https://reviews.llvm.org/rL328829
[2] https://gcc.gnu.org/onlinedocs/gcc/Machine-Constraints.html

Cc: Andrii Anisov <andrii_anisov@epam.com>
Signed-off-by: Julien Grall <julien@xen.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years agoMAINTAINERS: remove myself from REST and Public interfaces
Konrad Rzeszutek Wilk [Tue, 3 Mar 2020 15:04:03 +0000 (16:04 +0100)]
MAINTAINERS: remove myself from REST and Public interfaces

.due to -ENOTIME. Been busy with management and have had
not much chance to do anything besides that.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
5 years agoMAINTAINERS: update my email address (again)
Paul Durrant [Tue, 3 Mar 2020 15:03:35 +0000 (16:03 +0100)]
MAINTAINERS: update my email address (again)

It is now more convenient for me to use my @amzn.com address rather
than @amazon.com.

Signed-off-by: Paul Durrant <pdurrant@amzn.com>
5 years agoMAINTAINERS: Paul to co-maintain vendor-independent IOMMU code
Jan Beulich [Tue, 3 Mar 2020 15:03:13 +0000 (16:03 +0100)]
MAINTAINERS: Paul to co-maintain vendor-independent IOMMU code

Having just a single maintainer is not helpful anywhere, and can be
avoided here quite easily, seeing that Paul has been doing quite a bit
of IOMMU work lately.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
5 years agosched: fix error path in cpupool_unassign_cpu_start()
Juergen Gross [Tue, 3 Mar 2020 15:02:32 +0000 (16:02 +0100)]
sched: fix error path in cpupool_unassign_cpu_start()

If moving all domains away from the cpu to be removed fails in
cpupool_unassign_cpu_start(), the error path fails to release
sched_res_rculock.

The normal exit path is releasing domlist_read_lock instead (this is
currently no problem as the reference to the specific rcu lock is not
used by rcu_read_unlock()).

While at it indent the present error label by one space.

Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
5 years agocredit2: avoid NULL deref in csched2_res_pick() when tracing
Jan Beulich [Tue, 3 Mar 2020 15:01:30 +0000 (16:01 +0100)]
credit2: avoid NULL deref in csched2_res_pick() when tracing

The issue here results from one of the downsides of using goto: The
early "goto out" and "goto out_up" in the function very clearly bypass
any possible initialization of min_rqd, yet the tracing code at the end
of the function consumes the value. There's even a comment regarding the
trace record not being accurate in this case.

CID: 1460432
Fixes: 9c84bc004653 ("sched: rework credit2 run-queue allocation")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Dario Faggioli <dfaggioli@suse.com>
5 years agoxen: do live patching only from main idle loop
Juergen Gross [Tue, 11 Feb 2020 09:31:22 +0000 (10:31 +0100)]
xen: do live patching only from main idle loop

One of the main design goals of core scheduling is to avoid actions
which are not directly related to the domain currently running on a
given cpu or core. Live patching is one of those actions which are
allowed to take place on a cpu only when the idle scheduling unit is
active on that cpu.

Unfortunately live patching tries to force the cpus into the idle loop
just by raising the schedule softirq, which will no longer be
guaranteed to work with core scheduling active. Additionally there are
still some places in the hypervisor calling check_for_livepatch_work()
without being in the idle loop.

It is easy to force a cpu into the main idle loop by scheduling a
tasklet on it. So switch live patching to use tasklets for switching to
idle and raising scheduling events. Additionally the calls of
check_for_livepatch_work() outside the main idle loop can be dropped.

As tasklets are only running on idle vcpus and stop_machine_run()
is activating tasklets on all cpus but the one it has been called on
to rendezvous, it is mandatory for stop_machine_run() to be called on
an idle vcpu, too, as otherwise there is no way for scheduling to
activate the idle vcpu for the tasklet on the sibling of the cpu
stop_machine_run() has been called on.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
5 years agox86/mce: fix logic and comments around MSR_PPIN_CTL
Tony Luck [Mon, 2 Mar 2020 14:40:50 +0000 (15:40 +0100)]
x86/mce: fix logic and comments around MSR_PPIN_CTL

There are two implemented bits in the PPIN_CTL MSR:

Bit 0: LockOut (R/WO)
      Set 1 to prevent further writes to MSR_PPIN_CTL.

Bit 1: Enable_PPIN (R/W)
       If 1, enables MSR_PPIN to be accessible using RDMSR.
       If 0, an attempt to read MSR_PPIN will cause #GP.

So there are four defined values:
0: PPIN is disabled. PPIN_CTL may be updated
1: PPIN is disabled. PPIN_CTL is locked against updates
2: PPIN is enabled. PPIN_CTL may be updated
3: PPIN is enabled. PPIN_CTL is locked against updates

Code would only enable the X86_FEATURE_INTEL_PPIN feature for case "2",
when it should have done so for both case "2" and case "3".

Fix the final test to just check for the enable bit.
Also fix some of the other comments in this function.
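
The difference between the two tests can be sketched as follows. The
macro and function names are hypothetical, and the "old" test is only
one possible way of matching case "2" alone, not a quote of the
original code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PPIN_CTL_LOCKOUT_SKETCH (1u << 0) /* bit 0: further writes blocked */
#define PPIN_CTL_ENABLE_SKETCH  (1u << 1) /* bit 1: MSR_PPIN readable */

/* Illustration of the old behaviour: only the exact value 2 counted. */
static bool ppin_usable_old(uint64_t ctl)
{
    return (ctl & (PPIN_CTL_LOCKOUT_SKETCH | PPIN_CTL_ENABLE_SKETCH)) ==
           PPIN_CTL_ENABLE_SKETCH;
}

/* Fixed behaviour: test just the enable bit, so 2 and 3 both qualify. */
static bool ppin_usable_fixed(uint64_t ctl)
{
    return (ctl & PPIN_CTL_ENABLE_SKETCH) != 0;
}
```

Only the value 3 (enabled and locked) is treated differently by the two
functions.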

Signed-off-by: Tony Luck <tony.luck@intel.com>
[Linux commit ???]

One of the adjusted comments doesn't exist in our code, and I disagree
with the adjustment to the other one and its associated code change: I
don't think there's a point trying to enable PPIN if the locked bit is
set. Hence it's just the main code change that gets pulled in, plus it
gets cloned to the AMD side.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agox86/mce: add Xeon Icelake to list of CPUs that support PPIN
Tony Luck [Mon, 2 Mar 2020 14:40:09 +0000 (15:40 +0100)]
x86/mce: add Xeon Icelake to list of CPUs that support PPIN

New CPU model, same MSRs to control and read the inventory number.

Signed-off-by: Tony Luck <tony.luck@intel.com>
[Linux commit dc6b025de95bcd22ff37c4fabb022ec8a027abf1]
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoxen/guest: prepare hypervisor ops to use alternative calls
Roger Pau Monné [Mon, 2 Mar 2020 14:37:35 +0000 (15:37 +0100)]
xen/guest: prepare hypervisor ops to use alternative calls

Adapt the hypervisor ops framework so it can be used with the
alternative calls framework. So far no hooks are modified to make use
of the alternatives patching, as they are not in any hot path.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen: make sure stop_machine_run() is always called in a tasklet
Juergen Gross [Fri, 28 Feb 2020 17:13:48 +0000 (18:13 +0100)]
xen: make sure stop_machine_run() is always called in a tasklet

With core scheduling active it is mandatory for stop_machine_run() to
be called in idle context only (so either during boot or in a tasklet),
as otherwise a scheduling deadlock would occur: stop_machine_run()
does a cpu rendezvous by activating a tasklet on all other cpus. In
case stop_machine_run() was not called in an idle vcpu it would block
scheduling the idle vcpu on its siblings with core scheduling being
active, resulting in a hang.

Put a BUG_ON() into stop_machine_run() to test for being called in an
idle vcpu only and adapt the missing call site (ucode loading) to use a
tasklet for calling stop_machine_run().

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoIOMMU/x86: don't bypass softirq processing in arch_iommu_hwdom_init()
Jan Beulich [Mon, 2 Mar 2020 09:49:48 +0000 (10:49 +0100)]
IOMMU/x86: don't bypass softirq processing in arch_iommu_hwdom_init()

Even when a page doesn't need mapping, we should check whether softirq
processing should be invoked. Otherwise, with a sufficiently large amount
of RAM, the chances of a to-be-mapped page actually occurring with the
loop counter having the "right" value may become vanishingly small.
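
The loop shape this is aiming for can be sketched as below. Everything
here is hypothetical (function names, the "every third page" predicate,
the polling interval); it only shows the periodic check sitting outside
the branch that decides whether a page needs mapping.

```c
#include <stdbool.h>

static unsigned int softirq_checks;   /* counts hypothetical softirq polls */

static void poll_softirqs_sketch(void)
{
    softirq_checks++;
}

/* Hypothetical predicate: pretend only every 3rd page needs a mapping. */
static bool needs_mapping_sketch(unsigned long pfn)
{
    return pfn % 3 == 0;
}

/* Returns the number of pages "mapped"; polls every 'interval' pages. */
static unsigned long walk_pages_sketch(unsigned long nr,
                                       unsigned long interval)
{
    unsigned long pfn, mapped = 0;

    for ( pfn = 0; pfn < nr; pfn++ )
    {
        if ( needs_mapping_sketch(pfn) )
            mapped++;             /* stands in for the real mapping work */

        /*
         * The poll is outside the "needs mapping" branch, so skipped
         * pages still advance towards the next softirq check.
         */
        if ( (pfn + 1) % interval == 0 )
            poll_softirqs_sketch();
    }

    return mapped;
}
```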

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years agoAMD/IOMMU: correct handling when XT's prereq features are unavailable
Jan Beulich [Fri, 28 Feb 2020 15:25:43 +0000 (16:25 +0100)]
AMD/IOMMU: correct handling when XT's prereq features are unavailable

We should neither cause IOMMU initialization as a whole to fail in this
case (we should still be able to bring up the system in non-x2APIC or
x2APIC physical mode), nor should the remainder of the function be
skipped (as the main part of it won't get entered a 2nd time) in such an
event. It is merely necessary for the function to indicate to the caller
(iov_supports_xt()) that setup failed as far as x2APIC is concerned.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
5 years agox86/smp: use a dedicated CPU mask in send_IPI_mask
Roger Pau Monné [Fri, 28 Feb 2020 15:24:26 +0000 (16:24 +0100)]
x86/smp: use a dedicated CPU mask in send_IPI_mask

Some callers of send_IPI_mask pass the scratch cpumask as the mask
parameter of send_IPI_mask, so the scratch cpumask cannot be used by
the function. The following trace has been obtained with a debug patch
and shows one of those callers:

(XEN) scratch CPU mask already in use by arch/x86/mm.c#_get_page_type+0x1f9/0x1abf
(XEN) Xen BUG at smp.c:45
[...]
(XEN) Xen call trace:
(XEN)    [<ffff82d0802abb53>] R scratch_cpumask+0xd3/0xf9
(XEN)    [<ffff82d0802abc21>] F send_IPI_mask+0x72/0x1ca
(XEN)    [<ffff82d0802ac13e>] F flush_area_mask+0x10c/0x16c
(XEN)    [<ffff82d080296c56>] F arch/x86/mm.c#_get_page_type+0x3ff/0x1abf
(XEN)    [<ffff82d080298324>] F get_page_type+0xe/0x2c
(XEN)    [<ffff82d08038624f>] F pv_set_gdt+0xa1/0x2aa
(XEN)    [<ffff82d08027dfd6>] F arch_set_info_guest+0x1196/0x16ba
(XEN)    [<ffff82d080207a55>] F default_initialise_vcpu+0xc7/0xd4
(XEN)    [<ffff82d08027e55b>] F arch_initialise_vcpu+0x61/0xcd
(XEN)    [<ffff82d080207e78>] F do_vcpu_op+0x219/0x690
(XEN)    [<ffff82d08038be16>] F pv_hypercall+0x2f6/0x593
(XEN)    [<ffff82d080396432>] F lstar_enter+0x112/0x120

_get_page_type will use the scratch cpumask to call flush_tlb_mask,
which in turn calls send_IPI_mask.

Fix this by using a dedicated per CPU cpumask in send_IPI_mask.

Fixes: 5500d265a2a8 ('x86/smp: use APIC ALLBUT destination shorthand when possible')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoiommu/arm: Don't allow the same micro-TLB to be shared between domains
Oleksandr Tyshchenko [Mon, 17 Feb 2020 15:05:35 +0000 (17:05 +0200)]
iommu/arm: Don't allow the same micro-TLB to be shared between domains

For the IPMMU-VMSA we need to prevent the use cases where devices
which use the same micro-TLB are assigned to different Xen domains
(micro-TLB cannot be shared between multiple Xen domains, since it
points to the context bank to use for the page walk).

As each Xen domain uses an individual context bank pointed to by
context_id, we can potentially recognize that use case by comparing the
current and new context_id for the already enabled micro-TLB and prevent
a different context bank from being set.
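
A minimal sketch of that comparison, assuming the state is kept in a
plain structure (the real driver may read it back from hardware
instead). struct utlb_sketch, utlb_enable_sketch() and the -EBUSY return
value are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-micro-TLB bookkeeping; not the driver's real layout. */
struct utlb_sketch {
    bool enabled;
    unsigned int context_id;
};

static int utlb_enable_sketch(struct utlb_sketch *utlb,
                              unsigned int context_id)
{
    /* Already enabled for another context bank: refuse to rebind it. */
    if ( utlb->enabled && utlb->context_id != context_id )
        return -EBUSY;

    utlb->enabled = true;
    utlb->context_id = context_id;
    return 0;
}
```

Re-enabling with the same context_id stays harmless in this sketch, so
several devices of one domain can still share a micro-TLB.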

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <julien@xen.org>
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
5 years agotools/libxl: Simplify callback handling in libxl-save-helper
Andrew Cooper [Thu, 2 Jan 2020 19:06:54 +0000 (19:06 +0000)]
tools/libxl: Simplify callback handling in libxl-save-helper

The {save,restore}_callback helpers can have their scope reduced vastly, and
helper_setcallbacks_{save,restore}() doesn't need to use a ternary operator to
write 0 (meaning NULL) into an already zeroed structure.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
5 years agox86/cpuid: Introduce and use default CPUID policies
Andrew Cooper [Fri, 21 Feb 2020 15:23:31 +0000 (15:23 +0000)]
x86/cpuid: Introduce and use default CPUID policies

For now, the default and max policies remain identical, but this will change
in the future.

Introduce calculate_{pv,hvm}_def_policy().  As *_def derives from *_max, quite
a bit of the derivation logic can be avoided the second time around - this
will cope with simple feature differences for now.

Update XEN_SYSCTL_get_cpu_* and init_domain_cpuid_policy() to use the default
policies as appropriate.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/cpuid: Compile out unused logic/objects
Andrew Cooper [Tue, 25 Feb 2020 17:36:12 +0000 (17:36 +0000)]
x86/cpuid: Compile out unused logic/objects

CPUID Policy objects are large (1860 bytes at the time of writing), so
compiling them out based on CONFIG_{PV,HVM} makes a lot of sense.

This involves a bit of complexity in init_domain_cpuid_policy() and
recalculate_cpuid_policy() as is_pv_domain() can't be evaluated at compile
time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/msr: Introduce and use default MSR policies
Andrew Cooper [Fri, 21 Feb 2020 15:23:31 +0000 (15:23 +0000)]
x86/msr: Introduce and use default MSR policies

For now, the default and max policies remain identical, but this will change
in the future.

Update XEN_SYSCTL_get_cpu_policy and init_domain_msr_policy() to use the
default policies.

Take the opportunity to sort PV ahead of HVM, as is the prevailing style
elsewhere.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/msr: Compile out unused logic/objects
Andrew Cooper [Wed, 26 Feb 2020 12:26:14 +0000 (12:26 +0000)]
x86/msr: Compile out unused logic/objects

Arrange to compile out the PV or HVM logic and objects as applicable.  This
involves a bit of complexity in init_domain_msr_policy() as is_pv_domain()
can't be evaluated at compile time.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/gen-cpuid: Create max and default variations of INIT_*_FEATURES
Andrew Cooper [Tue, 25 Feb 2020 12:30:49 +0000 (12:30 +0000)]
x86/gen-cpuid: Create max and default variations of INIT_*_FEATURES

For now, write the same content for both.  Update the users of the
initialisers to use the new name, and extend xen-cpuid to dump both default
and max featuresets.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/gen-cpuid: Rework internal logic to ease future changes
Andrew Cooper [Tue, 25 Feb 2020 12:59:35 +0000 (12:59 +0000)]
x86/gen-cpuid: Rework internal logic to ease future changes

Better split the logic between parse/calculate/write.  Collect the feature
comment by their comment character(s), and perform the accumulation operations
in crunch_numbers().

Avoid rendering the featuresets to C uint32_t's in crunch_numbers(), and
instead do this in write_results().  Update format_uint32s() to call
featureset_to_uint32s() internally.

No functional change - the generated cpuid-autogen.h is identical.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agotools/libxc: Simplify xc_get_static_cpu_featuremask()
Andrew Cooper [Wed, 26 Feb 2020 18:15:35 +0000 (18:15 +0000)]
tools/libxc: Simplify xc_get_static_cpu_featuremask()

Drop XC_FEATUREMASK_DEEP_FEATURES.  It isn't used by any callers, and unlike
the other static masks, won't be of interest to anyone without other pieces of
cpuid-autogen.h.

In xc_get_static_cpu_featuremask(), use a 2d array instead of individually
named variables, and drop the switch statement completely.
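
The table-lookup pattern can be sketched as follows. The enum, the word
count and the mask values are made up for illustration and do not match
the real featuresets.

```c
#include <stddef.h>
#include <stdint.h>

#define FEATURESET_WORDS_SKETCH 2   /* hypothetical; the real count differs */

enum mask_sketch { MASK_KNOWN, MASK_SPECIAL, MASK_PV, MASK_NR };

/* One row per mask; the values are invented for this example. */
static const uint32_t masks_sketch[MASK_NR][FEATURESET_WORDS_SKETCH] = {
    [MASK_KNOWN]   = { 0xffffffffu, 0x0000ffffu },
    [MASK_SPECIAL] = { 0x00000001u, 0x00000000u },
    [MASK_PV]      = { 0x0fffffffu, 0x000000ffu },
};

/* Returns the requested row, or NULL for an out-of-range index. */
static const uint32_t *get_mask_sketch(unsigned int idx)
{
    return idx < MASK_NR ? masks_sketch[idx] : NULL;
}
```

A single bounds check on the index replaces the per-case switch arms.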

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agox86/sysctl: Don't return cpu policy data for compiled-out support (2)
Andrew Cooper [Wed, 26 Feb 2020 15:28:27 +0000 (15:28 +0000)]
x86/sysctl: Don't return cpu policy data for compiled-out support (2)

Just as with c/s 96dc77b4b1 for XEN_SYSCTL_get_cpu_policy,
XEN_SYSCTL_get_cpu_featureset wants to become conditional.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agobuild: fix section-renaming of libfdt and libelf
Anthony PERARD [Thu, 27 Feb 2020 14:47:23 +0000 (15:47 +0100)]
build: fix section-renaming of libfdt and libelf

In common/libelf/Makefile, when SECTIONS gets defined
SPECIAL_DATA_SECTIONS doesn't exist, so only "text data" sections are
being renamed. This was different before 48115d14743e ("Move more
kernel decompression bits to .init.* sections"). By introducing the
same renaming mechanism to libfdt (9ba1f198f61e ["xen/libfdt: Put
all libfdt in init"]), the issue was extended to there as well.

Move SPECIAL_DATA_SECTIONS in Rules.mk before including "Makefile" to
fix this.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agobuild: allow to test clang .include without asm symlink
Anthony PERARD [Thu, 27 Feb 2020 14:46:14 +0000 (15:46 +0100)]
build: allow to test clang .include without asm symlink

The clang test for "asm()-s support .include." needs to be modified
because the symbolic link asm -> asm-x86 may not exist when the test
is run. Since it's an x86 test, we don't need the link.

This will be an issue with the following patch "xen/build: have the
root Makefile generates the CFLAGS".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agolibxl/PCI: align reserved device memory boundary for HAP guests
Jan Beulich [Thu, 27 Feb 2020 14:45:31 +0000 (15:45 +0100)]
libxl/PCI: align reserved device memory boundary for HAP guests

As the code comment says, this will allow use of a 2Mb super page
mapping at the end of "low" memory.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolibxl/PCI: pass correct "hotplug" argument to libxl__device_pci_setdefault()
Jan Beulich [Thu, 27 Feb 2020 14:45:05 +0000 (15:45 +0100)]
libxl/PCI: pass correct "hotplug" argument to libxl__device_pci_setdefault()

Uniformly passing "false" can't be right, but has been benign because of
the function not using this parameter.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolibxl/PCI: make "rdm=" parsing comply with documentation
Jan Beulich [Thu, 27 Feb 2020 14:44:41 +0000 (15:44 +0100)]
libxl/PCI: make "rdm=" parsing comply with documentation

Documentation says "<RDM_RESERVATION_STRING> is a comma separated list
of <KEY=VALUE> settings, from the following list". There's no mention
of a specific order, yet so far the parsing logic did accept only
strategy, then policy (and neither of the two omitted). Make "state"
move
- back to STATE_TYPE when finding a comma after having parsed the
  <VALUE> part of a setting,
- to STATE_TERMINAL otherwise.
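
The state movement can be sketched with a much simplified parser. This
is not the libxl code: STATE_VALUE, struct rdm_sketch and
parse_rdm_sketch() are hypothetical, only the keys "strategy" and
"policy" are handled, and values are stored as plain strings.

```c
#include <string.h>

enum parse_state { STATE_TYPE, STATE_VALUE, STATE_TERMINAL };

struct rdm_sketch {
    char strategy[16];
    char policy[16];
};

/* Returns 0 on success, -1 on a malformed string. */
static int parse_rdm_sketch(const char *str, struct rdm_sketch *out)
{
    enum parse_state state = STATE_TYPE;
    char tok[16];
    size_t len = 0;
    char *dest = NULL;

    for ( ; state != STATE_TERMINAL; str++ )
    {
        char c = *str;

        if ( state == STATE_TYPE )
        {
            if ( c == '=' )
            {
                tok[len] = '\0';
                if ( !strcmp(tok, "strategy") )
                    dest = out->strategy;
                else if ( !strcmp(tok, "policy") )
                    dest = out->policy;
                else
                    return -1;           /* unknown key */
                len = 0;
                state = STATE_VALUE;
            }
            else if ( c == '\0' || c == ',' || len == sizeof(tok) - 1 )
                return -1;               /* key without '=' or too long */
            else
                tok[len++] = c;
        }
        else /* STATE_VALUE */
        {
            if ( c == ',' || c == '\0' )
            {
                if ( len == 0 )
                    return -1;           /* empty value */
                tok[len] = '\0';
                strcpy(dest, tok);
                len = 0;
                /* A comma goes back to key parsing; anything else ends. */
                state = (c == ',') ? STATE_TYPE : STATE_TERMINAL;
            }
            else if ( len == sizeof(tok) - 1 )
                return -1;               /* value too long */
            else
                tok[len++] = c;
        }
    }

    return 0;
}
```

Because a comma always returns to STATE_TYPE, "policy=...,strategy=..."
is accepted just like the reverse order.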

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolibxl/PCI: establish per-device reserved memory policy earlier
Jan Beulich [Thu, 27 Feb 2020 14:44:17 +0000 (15:44 +0100)]
libxl/PCI: establish per-device reserved memory policy earlier

Reserved device memory region processing as well as E820 table creation
happen earlier than the adding of (PCI) devices, yet they consume the
policy (e.g. to decide whether to add entries to the E820 table). But so
far it was only at the stage of PCI device addition that the final
policy was established (i.e. if not explicitly specified by the guest
config file).

Note that I couldn't find the domain ID to be available in
libxl__domain_device_construct_rdm(), but observing that
libxl__device_pci_setdefault() also doesn't use it, for the time being
DOMID_INVALID gets passed. An obvious alternative would be to drop the
unused parameter/argument, yet at that time the question would be
whether to also drop other unused ones.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolibxl/PCI: honor multiple per-device reserved memory regions
Jan Beulich [Thu, 27 Feb 2020 14:43:55 +0000 (15:43 +0100)]
libxl/PCI: honor multiple per-device reserved memory regions

While in "host" strategy all regions get processed, of the per-device
ones only the first entry has been consumed so far.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agolibxl: add initializers for libxl__domid_history
Paul Durrant [Wed, 26 Feb 2020 13:12:13 +0000 (13:12 +0000)]
libxl: add initializers for libxl__domid_history

This patch fixes Coverity issue CID 1459006 (Insecure data handling
(INTEGER_OVERFLOW)).

The problem is that the error paths for libxl__mark_domid_recent() and
libxl__is_domid_recent() check the 'f' field in struct libxl__domid_history
when it may not have been initialized.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agodomctl: fix typo in comment
Olaf Hering [Wed, 26 Feb 2020 16:13:39 +0000 (17:13 +0100)]
domctl: fix typo in comment

Add missing 'a' to sharing.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Wei Liu <wl@xen.org>
5 years agobuild: remove use of AFLAGS-y
Anthony PERARD [Wed, 26 Feb 2020 16:41:53 +0000 (17:41 +0100)]
build: remove use of AFLAGS-y

And simply add directly to AFLAGS.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years agobuild: remove confusing comment on the %.s:%.S rule
Anthony PERARD [Wed, 26 Feb 2020 16:41:37 +0000 (17:41 +0100)]
build: remove confusing comment on the %.s:%.S rule

That comment was introduced by 3943db776371 ("[XEN] Can be built
-std=gnu99 (except for .S files).") to explain why CFLAGS was removed
from the command line. The comment is already written where the
-std=gnu flag gets removed from AFLAGS, no need to repeat it.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
5 years agoMakefile: fix install-tests
Anthony PERARD [Wed, 26 Feb 2020 16:41:02 +0000 (17:41 +0100)]
Makefile: fix install-tests

The top-level makefile makes use of internal implementation details of
the xen build system. Avoid that by creating a new target
"install-tests" in xen/Makefile, and by fixing the top-level Makefile
to not call xen/Rules.mk anymore.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
5 years agoxen/include: remove include of Config.mk
Anthony PERARD [Wed, 26 Feb 2020 16:40:06 +0000 (17:40 +0100)]
xen/include: remove include of Config.mk

It isn't necessary to include Config.mk here because this Makefile is
only used by xen/Rules.mk which already includes Config.mk.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>