xenbits.xensource.com Git - people/dwmw2/xen.git/log
5 years ago x86/setup: simplify handling of initrdidx when no initrd present [lu-4.11.2]
David Woodhouse [Thu, 30 Jan 2020 11:06:07 +0000 (11:06 +0000)]
x86/setup: simplify handling of initrdidx when no initrd present

Remove a ternary operator that made my brain hurt and replace it with
something simpler that makes it clearer that the >= mbi->mods_count
is because of what find_first_bit() returns when it doesn't find
anything. Just have a simple condition to set initrdidx to zero in
that case, and a much simpler ternary operator in the create_dom0()
call.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
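
The "not found" behaviour this message relies on can be sketched in isolation. This is a minimal stand-alone illustration, not the Xen code: `first_set_bit()` is a hypothetical stand-in for find_first_bit(), and `pick_initrd_index()` is an invented name for the simplified condition.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for find_first_bit(): returns the index of the
 * first set bit in 'map', or 'nbits' when no bit below 'nbits' is set. */
static unsigned int first_set_bit(unsigned long map, unsigned int nbits)
{
    assert(nbits <= sizeof(map) * CHAR_BIT);
    for (unsigned int i = 0; i < nbits; i++)
        if (map & (1UL << i))
            return i;
    return nbits;
}

/* Illustrative only: treat "not found" (index >= count) as "no initrd"
 * by resetting the index to zero. */
static unsigned int pick_initrd_index(unsigned long module_map,
                                      unsigned int mods_count)
{
    unsigned int idx = first_set_bit(module_map, mods_count);

    if (idx >= mods_count)
        idx = 0;
    return idx;
}
```

With the index normalised once, a later caller only has to test whether it is non-zero.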
5 years ago x86/setup: finish plumbing in live update path through __start_xen()
David Woodhouse [Thu, 30 Jan 2020 09:53:28 +0000 (09:53 +0000)]
x86/setup: finish plumbing in live update path through __start_xen()

With this we are fairly much done hacking up __start_xen() to support
live update. The live update functions themselves are still stubs,
but now we can start populating those with actual save/restore of
domain information.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/setup: lift dom0 creation out into create_dom0 function
David Woodhouse [Thu, 30 Jan 2020 08:49:59 +0000 (08:49 +0000)]
x86/setup: lift dom0 creation out into create_dom0 function

It's about to become optional as __start_xen() grows a different path
for live update, so move it out of the way.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Add shell of lu_reserve_pages()
David Woodhouse [Wed, 29 Jan 2020 15:52:06 +0000 (15:52 +0000)]
Add shell of lu_reserve_pages()

This currently only iterates over the records and prints the version of
Xen that we're live updating from.

In the fullness of time, it will also reserve the pages passed over as
M2P as well as the pages belonging to preserved domains.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Add LU_VERSION and LU_END records to live update stream
David Woodhouse [Mon, 27 Jan 2020 23:46:42 +0000 (23:46 +0000)]
Add LU_VERSION and LU_END records to live update stream

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Add lu_stream_{open,close,append}_record()
David Woodhouse [Mon, 27 Jan 2020 23:46:19 +0000 (23:46 +0000)]
Add lu_stream_{open,close,append}_record()

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Migrate migration stream definitions into Xen public headers
David Woodhouse [Mon, 27 Jan 2020 16:54:01 +0000 (16:54 +0000)]
Migrate migration stream definitions into Xen public headers

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Start documenting the live update handover
David Woodhouse [Mon, 27 Jan 2020 15:41:58 +0000 (15:41 +0000)]
Start documenting the live update handover

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Detect live update breadcrumb at boot and map data stream
David Woodhouse [Thu, 16 Jan 2020 14:14:50 +0000 (15:14 +0100)]
Detect live update breadcrumb at boot and map data stream

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/setup: move vm_init() before end_boot_allocator()
David Woodhouse [Wed, 22 Jan 2020 13:02:14 +0000 (13:02 +0000)]
x86/setup: move vm_init() before end_boot_allocator()

We would like to be able to use vmap() to map the live update data, and
we need to do a first pass of the live update data before we prime the
heap because we need to know which pages need to be preserved.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago xen/vmap: allow vmap() to be called during early boot
David Woodhouse [Wed, 22 Jan 2020 12:41:49 +0000 (12:41 +0000)]
xen/vmap: allow vmap() to be called during early boot

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago xen/vmap: allow vm_init_type to be called during early_boot
Wei Liu [Wed, 12 Dec 2018 12:17:09 +0000 (12:17 +0000)]
xen/vmap: allow vm_init_type to be called during early_boot

We want to move vm_init, which calls vm_init_type under the hood, to
early boot stage. Add a path to get page from boot allocator instead.

Add an emacs block to that file while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
5 years ago Don't add bad pages above HYPERVISOR_VIRT_END to the domheap
David Woodhouse [Tue, 21 Jan 2020 14:05:21 +0000 (14:05 +0000)]
Don't add bad pages above HYPERVISOR_VIRT_END to the domheap

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago xen/page_alloc: statically allocate bootmem_region_list
Hongyan Xia [Tue, 17 Dec 2019 14:33:19 +0000 (14:33 +0000)]
xen/page_alloc: statically allocate bootmem_region_list

The existing code assumes that the first mfn passed to the boot
allocator is mapped, which creates problems when, e.g., we do not have
a direct map, and may create other bootstrapping problems in the
future. Make it static. The size is kept the same as before (1 page).

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Julien Grall <julien@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c61c1b49430527ee16fbf5b55aca195c325b1a23)

This patch is necessary for liveupdate to prevent the boot allocator from
using the first page of the ranges passed. This page may belong to a
restored domain.

5 years ago Add basic lu_save_all() shell
David Woodhouse [Thu, 16 Jan 2020 13:18:55 +0000 (14:18 +0100)]
Add basic lu_save_all() shell

5 years ago Add kimage_add_live_update_data()
David Woodhouse [Wed, 15 Jan 2020 17:46:54 +0000 (18:46 +0100)]
Add kimage_add_live_update_data()

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Add basic live update stream creation
David Woodhouse [Thu, 16 Jan 2020 12:55:44 +0000 (13:55 +0100)]
Add basic live update stream creation

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Add IND_WRITE64 primitive to kexec kimage
David Woodhouse [Wed, 15 Jan 2020 16:58:44 +0000 (17:58 +0100)]
Add IND_WRITE64 primitive to kexec kimage

This allows a single page-aligned physical address to be written to
the current destination, intended to pass the location of the live
update data stream from one Xen to the next.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Add KEXEC_TYPE_LIVE_UPDATE
David Woodhouse [Wed, 15 Jan 2020 16:57:08 +0000 (17:57 +0100)]
Add KEXEC_TYPE_LIVE_UPDATE

This is identical to the default case... for now.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Add KEXEC_RANGE_MA_LIVEUPDATE
David Woodhouse [Thu, 12 Dec 2019 17:02:10 +0000 (17:02 +0000)]
Add KEXEC_RANGE_MA_LIVEUPDATE

This allows kexec userspace to tell the next Xen where the range is,
on its command line.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago Reserve live update memory regions
David Woodhouse [Thu, 16 Jan 2020 08:51:45 +0000 (09:51 +0100)]
Reserve live update memory regions

The live update handover requires that a region of memory be reserved
for the new Xen to use in its boot allocator. The original Xen may use
that memory but not for any pages which are mapped to domains, or which
would need to be preserved across the live update for any other reason.

The same constraints apply to initmem pages freed from the Xen image,
since the new Xen will be loaded into the same physical location as the
previous Xen.

There is separate work ongoing which will make the xenheap meet this
requirement by eliminating share_xen_page_with_guest(). For the meantime,
just don't add those pages to the heap at all in the live update case.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/boot: Reserve live update boot memory
David Woodhouse [Mon, 9 Dec 2019 16:32:01 +0000 (16:32 +0000)]
x86/boot: Reserve live update boot memory

For live update to work, it will need a region of memory that can be
given to the boot allocator while it parses the state information from
the previous Xen and works out which of the other pages of memory it
can consume.

Reserve that like the crashdump region, and accept it on the command
line. Use only that region for early boot, and register the remaining
RAM (all of it for now, until the real live update happens) later.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago x86/setup: Don't skip 2MiB underneath relocated Xen image
David Woodhouse [Mon, 2 Dec 2019 16:39:00 +0000 (16:39 +0000)]
x86/setup: Don't skip 2MiB underneath relocated Xen image

Set 'e' correctly to reflect the location that Xen is actually relocated
to from its default 2MiB location. Not 2MiB below that.

This is only vaguely a bug fix. The "missing" 2MiB would have been used
in the end, and fed to the allocator. It's just that other things don't
get to sit right up *next* to the Xen image, and it isn't very tidy.

For live update, I'd quite like a single contiguous region for the
reserved bootmem and Xen, allowing the 'slack' in the former to be used
when Xen itself grows larger. Let's not allow 2MiB of random heap pages
to get in the way...

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
5 years ago update Xen version to 4.11.2 [RELEASE-4.11.2]
Jan Beulich [Tue, 25 Jun 2019 07:12:12 +0000 (09:12 +0200)]
update Xen version to 4.11.2

5 years ago xen/arm: time: cycles_t should be an uint64_t and not unsigned long
Julien Grall [Thu, 20 Jun 2019 17:47:06 +0000 (18:47 +0100)]
xen/arm: time: cycles_t should be an uint64_t and not unsigned long

Since commit ca73ac8e7d "xen/arm: Add an isb() before reading CNTPCT_EL0
to prevent re-ordering", get_cycles() is now returning the number of
cycles and used in more callers.

While the counter registers are always 64-bit, get_cycles() will only
return a 32-bit value on Arm32 and therefore truncate the counter. This will
result in weird behavior by both Xen and the guest as the timer will not
be set up correctly.

This could be resolved by switching cycles_t from unsigned long to
uint64_t.

This change was originally introduced by
da3d55ae67225798c2ad8f42af2f432f6f2b2214 "console: avoid printing no or
null time stamps".

Signed-off-by: Julien Grall <julien.grall@arm.com>
[Stefano: improve commit message]
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
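
The truncation described in this message can be reproduced in a few lines. This is only a sketch: `cycles32_t`, `cycles64_t` and the two accessor functions are hypothetical names, with `uint32_t` standing in for what `unsigned long` amounts to on Arm32.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t cycles32_t;   /* stand-in for 'unsigned long' on Arm32 */
typedef uint64_t cycles64_t;   /* the width of the counter register */

static_assert(sizeof(cycles32_t) < sizeof(cycles64_t),
              "the narrow type cannot hold a full counter value");

/* Returning the counter through the narrow type drops the upper 32 bits. */
static cycles32_t get_cycles_narrow(uint64_t counter)
{
    return (cycles32_t)counter;
}

/* Returning it through a 64-bit type preserves the whole value. */
static cycles64_t get_cycles_wide(uint64_t counter)
{
    return counter;
}
```

Once the counter passes 2^32 the narrow variant wraps back to small values, which is the kind of timer misbehaviour the message refers to.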
5 years ago x86: drop arch_evtchn_inject()
Jan Beulich [Tue, 18 Jun 2019 14:06:40 +0000 (16:06 +0200)]
x86: drop arch_evtchn_inject()

For whatever reason this was omitted from the backport of d9195962a6
("events: drop arch_evtchn_inject()").

Signed-off-by: Jan Beulich <jbeulich@suse.com>
5 years ago XSM: adjust Kconfig names
Jan Beulich [Tue, 18 Jun 2019 14:05:31 +0000 (16:05 +0200)]
XSM: adjust Kconfig names

Since the Kconfig option renaming was not backported, the new uses of
involved CONFIG_* settings should have been adapted to the existing
names in the XSA-295 series. Do this now, also changing XSM_SILO to just
SILO to better match its FLASK counterpart.

To avoid breaking the Kconfig menu structure also adjust XSM_POLICY's
dependency (as was also silently done on master during the renaming).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/arm: grant-table: Protect gnttab_clear_flag against guest misbehavior
Julien Grall [Mon, 29 Apr 2019 14:05:30 +0000 (15:05 +0100)]
xen/arm: grant-table: Protect gnttab_clear_flag against guest misbehavior

The function gnttab_clear_flag is used to clear the access flags. On
Arm, it is implemented using a loop and guest_cmpxchg.

It is possible that guest_cmpxchg will always return a different value
than old. This can happen if the guest updated the memory before Xen has
time to do the exchange. Because of that, there is no way to
guarantee the loop will end.

It is possible to make the current code safe by re-using the same
principle as applied on the guest atomic helper. However this patch
takes a different approach that should lead to more efficient code in
the default case.

A new helper is introduced to clear a set of bits in a 16-bit word.
This should avoid an extra loop to check that cmpxchg succeeded.

Note that a mask is used instead of a bit, so the helper can be re-used
later on for clearing multiple flags at the same time.

This is part of XSA-295.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
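
The difference between the two approaches can be sketched with C11 atomics. These are stand-ins, not Xen's helpers: `clear_flags_cmpxchg()` and `clear_flags_mask()` are hypothetical names. On Arm the atomic AND may itself compile to an exclusive load/store sequence, which is why the series also bounds that inner loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Loop-based approach: retry compare-and-swap until it succeeds. If another
 * writer keeps changing *word, nothing bounds the number of iterations. */
static void clear_flags_cmpxchg(_Atomic uint16_t *word, uint16_t mask)
{
    uint16_t old = atomic_load(word);

    assert(word);
    while (!atomic_compare_exchange_weak(word, &old, (uint16_t)(old & ~mask)))
        ; /* 'old' is refreshed by the failed exchange */
}

/* Mask-based approach: a single atomic AND clears every bit in 'mask',
 * so there is no outer loop checking whether a cmpxchg succeeded. */
static void clear_flags_mask(_Atomic uint16_t *word, uint16_t mask)
{
    assert(word);
    atomic_fetch_and(word, (uint16_t)~mask);
}
```

Taking a mask rather than a bit number lets one call clear several flags at once, matching the note in the message.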
5 years ago xen/arm: Add performance counters in guest atomic helpers
Julien Grall [Mon, 29 Apr 2019 14:05:29 +0000 (15:05 +0100)]
xen/arm: Add performance counters in guest atomic helpers

Add performance counters in guest atomic helpers to be able to detect
whether a guest is often paused during the operations.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen: Use guest atomics helpers when modifying atomically guest memory
Julien Grall [Mon, 29 Apr 2019 14:05:28 +0000 (15:05 +0100)]
xen: Use guest atomics helpers when modifying atomically guest memory

On Arm, exclusive load-store atomics should only be used between trusted
threads. As not all the guests are trusted, it may be possible to DoS Xen
when atomically updating memory shared with a guest.

This patch replaces all the atomics operations on shared memory with
a guest by the new guest atomics helpers. The x86 code was not audited
to know where guest atomics helpers could be used. I will leave that
to the x86 folks.

Note that some rework was required in order to plumb the new guest
atomics into the event channel and grant-table code.

Because guest_test_bit is ignoring the parameter "d" for now, a lot of
places do not need to drop the const. We may want
to revisit this in the future if the parameter "d" becomes necessary.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/cmpxchg: Provide helper to safely modify guest memory atomically
Julien Grall [Mon, 29 Apr 2019 14:05:27 +0000 (15:05 +0100)]
xen/cmpxchg: Provide helper to safely modify guest memory atomically

On Arm, exclusive load-store atomics should only be used between trusted
threads. As not all the guests are trusted, it may be possible to DoS Xen
when atomically updating memory shared with a guest.

This patch adds a new helper that will update the guest memory safely.
For x86, it is already possible to use the current helper safely. So
just wrap it.

For Arm, we will first attempt to update the guest memory with the
loop bounded by a maximum number of iterations. If it fails, we will
pause the domain and try again.

Note that this heuristic assumes that a page can only
be shared between Xen and one domain, not Xen and multiple domains.

The maximum number of iterations is based on how many times atomic_inc()
can be executed in 1uS. The maximum value is per-CPU to cater for big.LITTLE
and calculated when the CPU is booting.

The maximum number of iterations is based on how many times a simple
load-store atomic operation can be executed in 1uS. The maximum
value is per-CPU to cater for big.LITTLE and calculated when the CPU is
booting. The heuristic was randomly chosen and can be modified if
it impacts well-behaved guests too much.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
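
The "bounded attempt, then pause and retry" scheme described in this message can be sketched with C11 atomics. Everything here is an assumption made for illustration: `struct domain`, `domain_pause()`, `domain_unpause()`, `cmpxchg_bounded()` and `guest_cmpxchg_sketch()` are simplified stand-ins, and the weak compare-exchange models an exclusive store that can fail without the value having changed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for pausing and unpausing the guest. */
struct domain { int pause_count; };
static void domain_pause(struct domain *d)   { d->pause_count++; }
static void domain_unpause(struct domain *d) { d->pause_count--; }

/* Try the exchange at most 'max_try' times. Returns true if it completed,
 * either by swapping or by observing a genuinely different value (stored
 * in *old). Returns false if every attempt failed spuriously. */
static bool cmpxchg_bounded(_Atomic uint32_t *p, uint32_t *old,
                            uint32_t desired, unsigned int max_try)
{
    for (unsigned int i = 0; i < max_try; i++) {
        uint32_t expected = *old;

        if (atomic_compare_exchange_weak(p, &expected, desired))
            return true;
        if (expected != *old) {
            *old = expected;
            return true;
        }
    }
    return false;
}

/* Returns the value seen in memory, like cmpxchg. If the bounded attempt
 * times out, pause the (single) sharing domain so it cannot interfere,
 * then retry. */
static uint32_t guest_cmpxchg_sketch(struct domain *d, _Atomic uint32_t *p,
                                     uint32_t old, uint32_t desired,
                                     unsigned int max_try)
{
    uint32_t seen = old;

    assert(d);
    if (cmpxchg_bounded(p, &seen, desired, max_try))
        return seen;

    domain_pause(d);
    atomic_compare_exchange_strong(p, &seen, desired);
    domain_unpause(d);
    return seen;
}
```

Pausing only helps because the page is assumed to be shared with exactly one domain, which is the assumption the message spells out.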
5 years ago xen/bitops: Provide helpers to safely modify guest memory atomically
Julien Grall [Mon, 29 Apr 2019 14:05:26 +0000 (15:05 +0100)]
xen/bitops: Provide helpers to safely modify guest memory atomically

On Arm, exclusive load-store atomics should only be used between trusted
threads. As not all the guests are trusted, it may be possible to DoS Xen
when atomically updating memory shared with a guest.

This patch adds a new set of helpers that will update the guest memory
safely. For x86, it is already possible to use the current helpers
safely. So just wrap them.

For Arm, we will first attempt to update the guest memory with the loop
bounded by a maximum number of iterations. If it fails, we will pause the
domain and try again.

Note that this heuristic assumes that a page can only be shared between
Xen and one domain, not Xen and multiple domains.

The maximum number of iterations is based on how many times a simple
load-store atomic operation can be executed in 1uS. The maximum value is
per-CPU to cater for big.LITTLE and calculated when the CPU is booting. The
heuristic was randomly chosen and can be modified if it impacts
well-behaved guests too much.

Note, while test_bit does not require an atomic operation, a
wrapper for test_bit was added for completeness. In this case, the
domain stays constified to avoid major rework in the caller for the
time-being.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: Turn on SILO mode by default on Arm
Julien Grall [Mon, 29 Apr 2019 14:05:25 +0000 (15:05 +0100)]
xen/arm: Turn on SILO mode by default on Arm

On Arm, exclusive load-store atomics should only be used between trusted
threads. As not all the guests are trusted, it may be possible to DoS Xen
when atomically updating memory shared with a guest.

Recent patches introduced new helpers to atomically update memory shared
with a guest. Those helpers rely on a memory region being shared between
Xen and a single guest.

At the moment, nothing prevents a guest from sharing a page with Xen and
also with another guest (e.g. via grant table).

For the scope of the XSA, the quickest way is to deny communications
between unprivileged guests. So this patch enables and uses SILO
mode by default on Arm.

Users wanting a finer-grained policy can write their own Flask policy.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
5 years ago xen/xsm: Add new SILO mode for XSM
Xin Li [Tue, 9 Oct 2018 09:33:20 +0000 (17:33 +0800)]
xen/xsm: Add new SILO mode for XSM

When SILO is enabled, there would be no page-sharing or event notifications
between unprivileged VMs (no grant tables or event channels).

Signed-off-by: Xin Li <xin.li@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
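
The policy stated in this message amounts to a small predicate. The sketch below is an assumption about its shape rather than the XSM code: `struct domain` is reduced to two fields and `silo_allows()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a domain: only what the check needs. */
struct domain {
    int id;
    bool is_privileged;   /* true for the control domain */
};

/* Allow a grant or event-channel link only when at least one end is
 * privileged, or when both ends are the same domain. Two distinct
 * unprivileged domains are kept apart. */
static bool silo_allows(const struct domain *a, const struct domain *b)
{
    assert(a != NULL && b != NULL);
    return a->is_privileged || b->is_privileged || a == b;
}
```

Under this rule guests can still talk to the control domain, so ordinary backends keep working while guest-to-guest sharing is refused.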
5 years ago xen/xsm: Introduce new boot parameter xsm
Xin Li [Tue, 9 Oct 2018 09:33:19 +0000 (17:33 +0800)]
xen/xsm: Introduce new boot parameter xsm

Introduce new boot parameter xsm to choose which xsm module is enabled,
and set default to dummy. And add new option in Kconfig to choose the
default XSM implementation.

Signed-off-by: Xin Li <xin.li@citrix.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
5 years ago xen/xsm: remove unnecessary #define
Xin Li [Tue, 9 Oct 2018 09:33:18 +0000 (17:33 +0800)]
xen/xsm: remove unnecessary #define

This #define is unnecessary since XSM_INLINE is redefined in
xsm/dummy.h. It is a risk of build breakage, so remove it.

Signed-off-by: Xin Li <xin.li@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
5 years ago xen/arm: cmpxchg: Provide a new helper that can timeout
Julien Grall [Wed, 22 May 2019 20:39:17 +0000 (13:39 -0700)]
xen/arm: cmpxchg: Provide a new helper that can timeout

Exclusive load-store atomics should only be used between trusted
threads. As not all the guests are trusted, it may be possible to DoS
Xen when atomically updating memory shared with a guest.

To prevent the infinite loop, we introduce a new helper that can timeout.
The timeout is based on the maximum number of iterations.

It will be used in a follow-up patch to make atomic operations on shared
memory safe.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
5 years ago xen/arm: bitops: Implement a new set of helpers that can timeout
Julien Grall [Mon, 29 Apr 2019 14:05:23 +0000 (15:05 +0100)]
xen/arm: bitops: Implement a new set of helpers that can timeout

Exclusive load-store atomics should only be used between trusted
threads. As not all the guests are trusted, it may be possible to DoS
Xen when atomically updating memory shared with a guest.

To prevent the infinite loop, we introduce a new set of helpers that can
timeout. The timeout is based on the maximum number of iterations.

They will be used in a follow-up patch to make atomic operations
on shared memory safe.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
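
A bit operation with an iteration-count timeout might look like the following. This is a sketch under stated assumptions: `set_bit_timeout()` is a hypothetical name and signature, and C11 compare-exchange stands in for the Arm exclusive load/store pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Set bit 'nr' in the bitmap 'words', giving up after 'max_try' failed
 * attempts. Returns true if the bit was set, false on timeout. */
static bool set_bit_timeout(unsigned int nr, _Atomic uint32_t *words,
                            unsigned int max_try)
{
    _Atomic uint32_t *w = &words[nr / 32];
    uint32_t mask = UINT32_C(1) << (nr % 32);
    uint32_t old = atomic_load(w);

    assert(words);
    for (unsigned int i = 0; i < max_try; i++) {
        /* On failure 'old' is refreshed and the attempt counts
         * against the bound. */
        if (atomic_compare_exchange_weak(w, &old, old | mask))
            return true;
    }
    return false;
}
```

The caller decides what to do on timeout, which is what lets a later patch pause the guest and retry.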
5 years ago xen/arm32: cmpxchg: Simplify the cmpxchg implementation
Julien Grall [Mon, 29 Apr 2019 14:05:22 +0000 (15:05 +0100)]
xen/arm32: cmpxchg: Simplify the cmpxchg implementation

The only difference between each case of the cmpxchg is the size
used. Rather than duplicating the code, provide a macro to generate each
case.

This makes the code easier to read and modify.

While doing the rework, the case for 64-bit cmpxchg is removed. This is
unused today (already commented) and it would not be possible to use
it directly.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
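
Generating the per-size cases from one macro could look roughly like this. It is not the Xen implementation, which uses inline assembly: the sketch relies on the GCC/Clang `__atomic_compare_exchange_n` builtin, and `GEN_CMPXCHG`, `cmpxchg_1/2/4` and `cmpxchg_sized()` are hypothetical names. There is no 64-bit case, in line with the arm32 rework.

```c
#include <assert.h>
#include <stdint.h>

/* One macro body, instantiated once per operand size. */
#define GEN_CMPXCHG(name, type)                                             \
static inline type cmpxchg_##name(volatile type *ptr, type old, type new_)  \
{                                                                           \
    __atomic_compare_exchange_n(ptr, &old, new_, 0,                         \
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);        \
    return old; /* value seen in memory, whether or not the swap happened */\
}

GEN_CMPXCHG(1, uint8_t)
GEN_CMPXCHG(2, uint16_t)
GEN_CMPXCHG(4, uint32_t)

/* Size-based dispatcher over the generated functions. */
static inline uint32_t cmpxchg_sized(volatile void *ptr, uint32_t old,
                                     uint32_t new_, unsigned int size)
{
    switch (size) {
    case 1: return cmpxchg_1(ptr, (uint8_t)old, (uint8_t)new_);
    case 2: return cmpxchg_2(ptr, (uint16_t)old, (uint16_t)new_);
    case 4: return cmpxchg_4(ptr, old, new_);
    }
    assert(0 && "unsupported size");
    return 0;
}
```

Adding a variant, such as one with a timeout, then means changing a single macro body instead of three near-identical functions.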
5 years ago xen/arm64: cmpxchg: Simplify the cmpxchg implementation
Julien Grall [Wed, 22 May 2019 20:37:53 +0000 (13:37 -0700)]
xen/arm64: cmpxchg: Simplify the cmpxchg implementation

The only difference between each case of the cmpxchg is the size
used. Rather than duplicating the code, provide a macro to generate each
case.

This makes the code easier to read and modify.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
5 years ago xen/arm: bitops: Consolidate prototypes in one place
Julien Grall [Mon, 29 Apr 2019 14:05:20 +0000 (15:05 +0100)]
xen/arm: bitops: Consolidate prototypes in one place

The prototypes are the same between arm32 and arm64. Consolidate them in
asm-arm/bitops.h.

This change will help the introduction of new helpers in a follow-up
patch.

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm32: bitops: Rewrite bitop helpers in C
Julien Grall [Mon, 29 Apr 2019 14:05:19 +0000 (15:05 +0100)]
xen/arm32: bitops: Rewrite bitop helpers in C

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
5 years ago xen/arm64: bitops: Rewrite bitop helpers in C
Julien Grall [Mon, 29 Apr 2019 14:05:18 +0000 (15:05 +0100)]
xen/arm64: bitops: Rewrite bitop helpers in C

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefanos@xilinx.com>
5 years ago xen/grant_table: Rework the prototype of _set_status* for lisibility
Julien Grall [Mon, 29 Apr 2019 14:05:17 +0000 (15:05 +0100)]
xen/grant_table: Rework the prototype of _set_status* for lisibility

It is not clear from the parameter names whether domid and gt_version
correspond to the local or remote domain. A follow-up patch will make
them more confusing.

So rename domid (resp. gt_version) to ldomid (resp. rgt_version). At
the same time re-order the parameters to hopefully make it more
readable.

This is part of XSA-295.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago xen/arm: Add an isb() before reading CNTPCT_EL0 to prevent re-ordering
Julien Grall [Mon, 29 Apr 2019 14:05:16 +0000 (15:05 +0100)]
xen/arm: Add an isb() before reading CNTPCT_EL0 to prevent re-ordering

Per D8.2.1 in ARM DDI 0487C.a, "a read to CNTPCT_EL0 can occur
speculatively and out of order relative to other instructions executed
on the same PE."

Add an instruction barrier to get an accurate number of cycles when
requested in get_cycles(). For the other users of CNTPCT_EL0, replace them
with a call to get_cycles().

This is part of XSA-295.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
5 years ago common: avoid atomic read-modify-write accesses in map_vcpu_info()
Jan Beulich [Tue, 12 Mar 2019 13:40:56 +0000 (14:40 +0100)]
common: avoid atomic read-modify-write accesses in map_vcpu_info()

There's no need to set the evtchn_pending_sel bits one by one. Simply
write full words with all ones.

For Arm this requires extending write_atomic() to also handle 64-bit
values; for symmetry read_atomic() gets adjusted as well.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
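
The change from per-bit updates to whole-word writes can be illustrated with C11 atomics. Both function names are hypothetical and a single `unsigned long` stands in for the evtchn_pending_sel field.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bit-by-bit: one atomic read-modify-write per bit in the word. */
static void mark_all_bitwise(_Atomic unsigned long *sel)
{
    assert(sel);
    for (unsigned int i = 0; i < BITS_PER_LONG; i++)
        atomic_fetch_or(sel, 1UL << i);
}

/* Whole-word: a single atomic store of all ones, with no read-modify-write. */
static void mark_all_word(_Atomic unsigned long *sel)
{
    assert(sel);
    atomic_store(sel, ~0UL);
}
```

Both leave the word in the same final state. The second only needs an atomic store of the full width, which is why the message mentions extending write_atomic() to 64-bit values on Arm.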
5 years ago events: drop arch_evtchn_inject()
Jan Beulich [Tue, 12 Mar 2019 13:40:24 +0000 (14:40 +0100)]
events: drop arch_evtchn_inject()

Have the only user call vcpu_mark_events_pending() instead, at the same
time arranging for correct ordering of the writes (evtchn_pending_sel
should be written before evtchn_upcall_pending).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
5 years ago xen/arm: mm: Set-up page permission for Xen mappings earlier on
Julien Grall [Thu, 29 Nov 2018 11:37:43 +0000 (11:37 +0000)]
xen/arm: mm: Set-up page permission for Xen mappings earlier on

Xen mapping is first created using a 2MB page and then shattered into 4KB
pages for fine-grained permissions. However, it is not safe to break down
a superpage without going through an intermediate step invalidating
the entry.

As we are changing Xen mappings, we cannot go through the intermediate
step. The only solution is to create the Xen mapping using 4KB entries
directly. As Xen should always access the mappings according to
the runtime permissions, it is then possible to set up the permissions
while creating the mapping.

We are still playing with fire as there are still some
break-before-make issues in setup_pagetables (i.e. switching between 2 sets
of page-tables). But it should be slightly better than the current state.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Reported-by: Jan-Peter Larsson <Jan-Peter.Larsson@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Matthew Daley <mattd@bugfuzz.com>
(cherry picked from commit 00c96d77422a4b84247bec5dadf434363d312cac)

5 years ago libacpi: report PCI slots as enabled only for hotpluggable devices
Igor Druzhinin [Thu, 6 Jun 2019 12:11:24 +0000 (14:11 +0200)]
libacpi: report PCI slots as enabled only for hotpluggable devices

DSDT for qemu-xen lacks _STA method of PCI slot object. If _STA method
doesn't exist then the slot is assumed to be always present and active
which in conjunction with _EJ0 method makes every device ejectable for
an OS even if it's not the case.

qemu-kvm is able to dynamically add _EJ0 method only to those slots
that either have hotpluggable devices or are free for PCI passthrough.
As Xen lacks this capability we cannot use their way.

qemu-xen-traditional DSDT has _STA method which only reports that
the slot is present if there is a PCI device hotplugged there.
This is done through querying of its PCI hotplug controller.
qemu-xen has similar capability that reports if device is "hotpluggable
or absent" which we can use to achieve the same result.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6761965243b113230bed900d6105be05b28f5cea
master date: 2019-05-24 10:30:21 +0200

5 years ago x86/IO-APIC: fix build with gcc9
Jan Beulich [Thu, 6 Jun 2019 12:11:09 +0000 (14:11 +0200)]
x86/IO-APIC: fix build with gcc9

There are a number of pointless __packed attributes which cause gcc 9 to
legitimately warn:

utils.c: In function 'vtd_dump_iommu_info':
utils.c:287:33: error: converting a packed 'struct IO_APIC_route_entry' pointer (alignment 1) to a 'struct IO_APIC_route_remap_entry' pointer (alignment 8) may result in an unaligned pointer value [-Werror=address-of-packed-member]
  287 |                 remap = (struct IO_APIC_route_remap_entry *) &rte;
      |                                 ^~~~~~~~~~~~~~~~~~~~~~~~~

intremap.c: In function 'ioapic_rte_to_remap_entry':
intremap.c:343:25: error: converting a packed 'struct IO_APIC_route_entry' pointer (alignment 1) to a 'struct IO_APIC_route_remap_entry' pointer (alignment 8) may result in an unaligned pointer value [-Werror=address-of-packed-member]
  343 |     remap_rte = (struct IO_APIC_route_remap_entry *) old_rte;
      |                         ^~~~~~~~~~~~~~~~~~~~~~~~~

Simply drop these attributes. Take the liberty and also re-format the
structure definitions at the same time.

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: ca9310b24e6205de5387e5982ccd42c35caf89d4
master date: 2019-05-24 10:19:59 +0200
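
The alignment mismatch behind this warning can be seen from a pair of toy structures. This is an illustration only: `rte_packed` and `rte_plain` are hypothetical types, not the IO-APIC structures, and `__attribute__((packed))` is the GCC/Clang spelling.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Packed: the compiler may place this at any byte address. */
struct __attribute__((packed)) rte_packed { uint64_t raw; };

/* Not packed: inherits the natural alignment of its member. */
struct rte_plain { uint64_t raw; };

static_assert(alignof(struct rte_packed) == 1,
              "packed lowers the required alignment to 1");
static_assert(alignof(struct rte_plain) == alignof(uint64_t),
              "an unpacked struct keeps its member's alignment");
static_assert(sizeof(struct rte_packed) == sizeof(struct rte_plain),
              "packing changes nothing about the layout here");
```

Because the layout is identical either way, the attribute only loosens the alignment guarantee, so casting a packed pointer to a more strictly aligned type draws the warning and dropping the attribute costs nothing.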

5 years ago x86emul: add support for missing {,V}PMADDWD insns
Jan Beulich [Thu, 6 Jun 2019 12:10:46 +0000 (14:10 +0200)]
x86emul: add support for missing {,V}PMADDWD insns

Their pre-AVX512 incarnations have clearly been overlooked during much
earlier work. Their memory access pattern is entirely standard, so no
specific tests get added to the harness.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Alexandru Isaila <aisaila@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 1a48bdd599b268a2d9b7d0c45f1fd40c4892186e
master date: 2019-05-16 13:43:17 +0200

5 years ago x86/IRQ: avoid UB (or worse) in trace_irq_mask()
Jan Beulich [Thu, 6 Jun 2019 12:09:56 +0000 (14:09 +0200)]
x86/IRQ: avoid UB (or worse) in trace_irq_mask()

Dynamically allocated CPU mask objects may be smaller than cpumask_t, so
copying has to be restricted to the actual allocation size. This is
particularly important since the function doesn't bail early when tracing
is not active, so even production builds would be affected by potential
misbehavior here.

Take the opportunity and also
- use initializers instead of assignment + memset(),
- constify the cpumask_t input pointer,
- u32 -> uint32_t.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
master commit: 6fafb8befa99620a2d7323b9eca5c387bad1f59f
master date: 2019-05-13 16:41:03 +0200
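
The size-restricted copy described in this message can be sketched as follows. The names are hypothetical: `cpumask_full_t` stands in for cpumask_t, `NR_CPUS_MAX` for the compile-time maximum, and `nr_cpu_ids` for the number of CPUs a dynamically allocated mask actually covers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define NR_CPUS_MAX   256   /* illustrative compile-time maximum */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

typedef struct {
    unsigned long bits[NR_CPUS_MAX / BITS_PER_LONG];
} cpumask_full_t;

/* Bytes actually backing a mask that covers 'nr_cpu_ids' CPUs. */
static size_t mask_bytes(unsigned int nr_cpu_ids)
{
    size_t longs = (nr_cpu_ids + BITS_PER_LONG - 1) / BITS_PER_LONG;

    return longs * sizeof(unsigned long);
}

/* Copy only what the source allocation holds and zero the rest, instead
 * of reading sizeof(cpumask_full_t) bytes from a smaller object. */
static void copy_mask_bounded(cpumask_full_t *dst, const unsigned long *src,
                              unsigned int nr_cpu_ids)
{
    size_t n = mask_bytes(nr_cpu_ids);

    assert(n <= sizeof(*dst));
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, n);
}
```

A plain structure assignment would read the full fixed size from the source and run past the end of the smaller allocation.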

5 years ago x86/boot: Fix latent memory corruption with early_boot_opts_t
Andrew Cooper [Thu, 6 Jun 2019 12:09:37 +0000 (14:09 +0200)]
x86/boot: Fix latent memory corruption with early_boot_opts_t

c/s ebb26b509f "xen/x86: make VGA support selectable" added an #ifdef
CONFIG_VIDEO into the middle of the backing space for early_boot_opts_t,
but didn't adjust the structure definition in cmdline.c

This only functions correctly because the affected fields are at the end
of the structure, and cmdline.c doesn't write to them in this case.

To retain the slimming effect of compiling out CONFIG_VIDEO, adjust
cmdline.c with enough #ifdef-ary to make C's idea of the structure match
the declaration in asm.  This requires adding __maybe_unused annotations
to two helper functions.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 30596213617fcf4dd7b71d244e16c8fc0acf456b
master date: 2019-05-13 10:35:38 +0100

5 years agox86/svm: Fix handling of ICEBP intercepts
Andrew Cooper [Thu, 6 Jun 2019 12:09:20 +0000 (14:09 +0200)]
x86/svm: Fix handling of ICEBP intercepts

c/s 9338a37d "x86/svm: implement debug events" added support for introspecting
ICEBP debug exceptions, but didn't account for the fact that
svm_get_insn_len() (previously __get_instruction_length) can fail and may
already have raised #GP with the guest.

If svm_get_insn_len() fails, return back to guest context rather than
continuing and mistaking a trap-style VMExit for a fault-style one.

Spotted by Coverity.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Acked-by: Brian Woods <brian.woods@amd.com>
master commit: 1495b4ff9b4af2b9c0f12cdb6491082cecf34f86
master date: 2019-05-13 10:35:37 +0100

5 years agodrivers/video: drop framebuffer size constraints
Marek Marczykowski-Górecki [Thu, 6 Jun 2019 12:08:29 +0000 (14:08 +0200)]
drivers/video: drop framebuffer size constraints

The 1900x1200 limit does not match real world devices (1900 looks like a
typo, should be 1920). But in practice the limits are arbitrary and do
not serve any real purpose. As discussed in "Increase framebuffer size
to todays standards" thread, drop them completely.

This fixes the graphics console on devices with 3840x2160 native resolution.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
drivers/video: drop unused limits

MAX_BPP, MAX_FONT_W, MAX_FONT_H are not used in the code at all.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 19600eb75aa9b1df3e4b0a4e55a5d08b957e1fd9
master date: 2019-05-13 10:13:24 +0200
master commit: 343459e34a6d32ba44a21f8b8fe4c1f69b1714c2
master date: 2019-05-13 10:12:56 +0200

5 years agobitmap: fix bitmap_fill with zero-sized bitmap
Marek Marczykowski-Górecki [Thu, 6 Jun 2019 12:08:10 +0000 (14:08 +0200)]
bitmap: fix bitmap_fill with zero-sized bitmap

When bitmap_fill(..., 0) is called, do not try to write anything. Before
this patch, it tried to write almost LONG_MAX, surely overwriting
something.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 93df28be2d4f620caf18109222d046355ac56327
master date: 2019-05-13 10:12:00 +0200
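
The underflow can be sketched in standalone C as follows. This is an assumption-laden illustration, not Xen's bitmap_fill(): the function name bitmap_fill_sketch() and the exact structure are hypothetical. With a zero-sized bitmap the word count is 0, so computing (nlongs - 1) wraps around to a huge value unless the empty case returns early.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Set the low nbits bits of the bitmap at dst; write nothing if nbits is 0. */
static void bitmap_fill_sketch(unsigned long *dst, unsigned int nbits)
{
    size_t nlongs = BITS_TO_LONGS(nbits);
    unsigned int rem = nbits % BITS_PER_LONG;

    if (nlongs == 0)
        return; /* without this, (nlongs - 1) below wraps around */

    assert(dst != NULL);
    /* All words except the last are filled completely. */
    memset(dst, 0xff, (nlongs - 1) * sizeof(unsigned long));
    /* The last word only gets the remaining low bits. */
    dst[nlongs - 1] = rem ? (1UL << rem) - 1 : ~0UL;
}
```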

5 years agox86/vmx: correctly gather gs_shadow value for current vCPU
Tamas K Lengyel [Thu, 6 Jun 2019 12:07:54 +0000 (14:07 +0200)]
x86/vmx: correctly gather gs_shadow value for current vCPU

Currently the gs_shadow value is only cached when the vCPU is being scheduled
out by Xen. Reporting this (usually) stale value through vm_event is incorrect,
since it doesn't represent the actual state of the vCPU at the time the event
was recorded. This prevents vm_event subscribers from correctly finding kernel
structures in the guest when it is trapped while in ring3.

Refresh shadow_gs value when the context being saved is for the current vCPU.

Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: f69fc1c2f36e8a74ba54c9c8fa5c904ea1ad319e
master date: 2019-05-13 09:55:59 +0200

5 years agox86/mtrr: recalculate P2M type for domains with iocaps
Igor Druzhinin [Thu, 6 Jun 2019 12:07:06 +0000 (14:07 +0200)]
x86/mtrr: recalculate P2M type for domains with iocaps

This change reflects the logic in epte_get_entry_emt() and allows
changes in guest MTRRs to be reflected in EPT for domains having
direct access to certain hardware memory regions but without IOMMU
context assigned (e.g. XenGT).

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: f3d880bf2be92534c5bacf11de2f561cbad550fb
master date: 2019-05-13 09:54:45 +0200

5 years agoAMD/IOMMU: disable previously enabled IOMMUs upon init failure
Jan Beulich [Thu, 6 Jun 2019 12:06:49 +0000 (14:06 +0200)]
AMD/IOMMU: disable previously enabled IOMMUs upon init failure

If any IOMMUs were successfully initialized before encountering failure,
the successfully enabled ones should be disabled again before cleaning
up their resources.

Move disable_iommu() next to enable_iommu() to avoid a forward
declaration, and take the opportunity to remove stray blank lines ahead
of both functions' final closing braces.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
master commit: 87a3347d476443c66c79953d77d6aef1d2bb3bbd
master date: 2019-05-13 09:52:43 +0200
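
The unwind-on-failure pattern described above might look roughly like this in standalone C. struct unit, enable_unit(), disable_unit() and init_all() are hypothetical stand-ins, not the AMD IOMMU driver's functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one IOMMU instance. */
struct unit {
    bool enabled;
    bool fail_on_enable; /* test hook: simulate an init failure */
};

static int enable_unit(struct unit *u)
{
    if (u->fail_on_enable)
        return -1;
    u->enabled = true;
    return 0;
}

static void disable_unit(struct unit *u)
{
    u->enabled = false;
}

/*
 * Enable every unit in order. On the first failure, disable the units
 * that were already enabled (in reverse order) before reporting the error,
 * so the caller can release their resources safely.
 */
static int init_all(struct unit *units, size_t n)
{
    size_t i;

    assert(units != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (enable_unit(&units[i]) != 0) {
            while (i-- > 0)
                disable_unit(&units[i]);
            return -1;
        }
    }
    return 0;
}
```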

5 years agotrace: fix build with gcc9
Jan Beulich [Thu, 6 Jun 2019 12:06:29 +0000 (14:06 +0200)]
trace: fix build with gcc9

While I've not observed this myself, gcc 9 (imo validly) reportedly may
complain

trace.c: In function '__trace_hypercall':
trace.c:826:19: error: taking address of packed member of 'struct <anonymous>' may result in an unaligned pointer value [-Werror=address-of-packed-member]
  826 |     uint32_t *a = d.args;

and the fix is rather simple - remove the __packed attribute. Introduce
a BUILD_BUG_ON() as replacement, for the unlikely case that Xen might
get ported to an architecture where array alignment is higher than that of
its elements.

Reported-by: Martin Liška <martin.liska@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
master commit: 3fd3b266d4198c06e8e421ca515d9ba09ccd5155
master date: 2019-05-13 09:51:23 +0200
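
As an illustration of replacing a packed attribute with a compile-time layout check, here is a small C11 sketch. The struct name hypercall_rec, its field names and the helper rec_args() are assumptions for this example, and _Static_assert stands in for Xen's BUILD_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Not packed: taking the address of args is then always well aligned. */
struct hypercall_rec {
    uint32_t op;
    uint32_t args[6];
};

/*
 * Compile-time checks that dropping the packed attribute did not change
 * the layout: no padding before the array and none at the end.
 */
_Static_assert(offsetof(struct hypercall_rec, args) == sizeof(uint32_t),
               "unexpected padding before args");
_Static_assert(sizeof(struct hypercall_rec) == 7 * sizeof(uint32_t),
               "unexpected padding in hypercall_rec");

/* Returning a plain pointer to the array is fine without packing. */
static uint32_t *rec_args(struct hypercall_rec *r)
{
    assert(r != NULL);
    return r->args;
}
```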

5 years agoxen/sched: fix csched2_deinit_pdata()
Juergen Gross [Mon, 27 May 2019 13:55:20 +0000 (15:55 +0200)]
xen/sched: fix csched2_deinit_pdata()

Commit 753ba43d6d16e688 ("xen/sched: fix credit2 smt idle handling")
introduced a regression when switching cpus between cpupools.

When assigning a cpu to a cpupool with credit2 being the default
scheduler csched2_deinit_pdata() is called for the credit2 private data
after the new scheduler's private data has been hooked to the per-cpu
scheduler data. Unfortunately csched2_deinit_pdata() will cycle through
all per-cpu scheduler areas it knows of for removing the cpu from the
respective sibling masks including the area of the just moved cpu. This
will (depending on the new scheduler) either clobber the data of the
new scheduler or in case of sched_rt lead to a crash.

Avoid that by removing the cpu from the list of active cpus in credit2
data first.

The opposite problem is occurring when removing a cpu from a cpupool:
init_pdata() of credit2 will access the per-cpu data of the old
scheduler.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
master commit: ffd3367ed682b6ac6f57fcb151921054dd4cce7e
master date: 2019-05-17 15:41:17 +0200

5 years agooxenstored: Don't re-open a xenctrl handle for every domain introduction
Andrew Cooper [Wed, 3 Oct 2018 09:32:54 +0000 (10:32 +0100)]
oxenstored: Don't re-open a xenctrl handle for every domain introduction

Currently, an xc handle is opened in main() which is used for cleanup
activities, and a new xc handle is temporarily opened every time a domain is
introduced.  This is inefficient, and amongst other things, requires full root
privileges for the lifetime of oxenstored.

All code using the Xenctrl handle is in domains.ml, so initialise xc as a
global (now happens just before main() is called) and drop it as a parameter
from Domains.create and Domains.cleanup.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
(cherry picked from commit 129025fe30934c6a04bbd9c05ade479d34ce4985)

5 years agoxl: handle PVH type in apply_global_affinity_masks again
Wei Liu [Fri, 12 Apr 2019 10:03:25 +0000 (11:03 +0100)]
xl: handle PVH type in apply_global_affinity_masks again

A call site in create_domain can call it with PVH type. That site was
missed during the review of 48dab9767.

Reinstate PVH type in the switch.

Reported-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 860d6e158dbb581c3aabc6a20ae8d83b325bffd8)
(cherry picked from commit b4f291b0ca914454cbac9fa5580bb35f8ab04eee)

5 years agotools/libxc: Fix issues with libxc and Xen having different featureset lengths
Andrew Cooper [Thu, 29 Nov 2018 18:10:38 +0000 (18:10 +0000)]
tools/libxc: Fix issues with libxc and Xen having different featureset lengths

In almost all cases, Xen and libxc will agree on the featureset length,
because they are built from the same source.

However, there are circumstances (e.g. security hotfixes) where the featureset
gets longer and dom0 will, after installing updates, be running with an old
Xen but new libxc.  Despite writing the code with this scenario in mind, there
were some bugs.

First, xen-cpuid's get_featureset() erroneously allocates a buffer based on
Xen's featureset length, but records libxc's length, which may be longer.

In this situation, the hypercall bounce buffer code reads/writes the recorded
length, which is beyond the end of the allocated object, and a later free()
encounters corrupt heap metadata.  Fix this by recording the same length that
we allocate.

Secondly, get_cpuid_domain_info() has a related bug when the passed-in
featureset is a different length to libxc's.

A large amount of the libxc cpuid functionality depends on info->featureset
being as long as expected, and it is allocated appropriately.  However, in the
case that a shorter external featureset is passed in, the logic to check for
trailing nonzero bits may read off the end of it.  Rework the logic to use the
correct upper bound.

In addition, leave a comment next to the fields in struct cpuid_domain_info
explaining the relationship between the various lengths, and how to cope with
different lengths.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit c393b64dcee6684da25257b033148740cb6d7ff0)
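
Both fixes come down to keeping the length used for access in step with the length of the object. A hedged standalone sketch, in which struct fs_buf, fs_alloc() and fs_has_trailing_bits() are hypothetical names and not libxc's API:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* A featureset buffer together with the number of words it really holds. */
struct fs_buf {
    uint32_t *words;
    unsigned int nr;
};

/* Allocate for hv_len words and record that same length. */
static int fs_alloc(struct fs_buf *b, unsigned int hv_len)
{
    assert(b != NULL);
    b->words = calloc(hv_len ? hv_len : 1, sizeof(*b->words));
    if (!b->words)
        return -1;
    b->nr = hv_len; /* recording a different (longer) length is the bug */
    return 0;
}

/*
 * Report whether any word at index >= keep is non-zero. The loop is
 * bounded by fs_len, the length of the passed-in buffer, so a buffer
 * shorter than keep is never read past its end.
 */
static int fs_has_trailing_bits(const uint32_t *fs, unsigned int fs_len,
                                unsigned int keep)
{
    unsigned int i;

    for (i = keep; i < fs_len; i++)
        if (fs[i])
            return 1;
    return 0;
}
```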

5 years agotools/xl: use libxl_domain_info to get domain type for vcpu-pin
Igor Druzhinin [Tue, 9 Apr 2019 12:01:58 +0000 (13:01 +0100)]
tools/xl: use libxl_domain_info to get domain type for vcpu-pin

Parsing the config seems to be overkill for this particular task
and the config might simply be absent. Type returned from libxl_domain_info
should be either LIBXL_DOMAIN_TYPE_HVM or LIBXL_DOMAIN_TYPE_PV but in
that context the distinction between PVH and HVM should be irrelevant.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 48dab9767d2eb173495707cb1fd8ceaf73604ac1)
(cherry picked from commit c59579d8319b776ae6243da1999737e2b4737710)

5 years agotools/libxl: correct vcpu affinity output with sparse physical cpu map
Juergen Gross [Fri, 31 Aug 2018 15:22:04 +0000 (17:22 +0200)]
tools/libxl: correct vcpu affinity output with sparse physical cpu map

With not all physical cpus online (e.g. with smt=0) the output of the
vcpu affinities is wrong, as the affinity bitmaps are capped after
nr_cpus bits, instead of using max_cpu_id.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit 2ec5339ec9218fbf1583fa85b74d1d2f15f1b3b8)

5 years agotools/ocaml: Dup2 /dev/null to stdin in daemonize()
Christian Lindig [Wed, 27 Feb 2019 10:33:42 +0000 (10:33 +0000)]
tools/ocaml: Dup2 /dev/null to stdin in daemonize()

Don't close stdin in daemonize() but dup2 /dev/null instead.  Otherwise, fd 0
gets reused later:

  [root@idol ~]# ls -lav /proc/`pgrep xenstored`/fd
  total 0
  dr-x------ 2 root root  0 Feb 28 11:02 .
  dr-xr-xr-x 9 root root  0 Feb 27 15:59 ..
  lrwx------ 1 root root 64 Feb 28 11:02 0 -> /dev/xen/evtchn
  l-wx------ 1 root root 64 Feb 28 11:02 1 -> /dev/null
  l-wx------ 1 root root 64 Feb 28 11:02 2 -> /dev/null
  lrwx------ 1 root root 64 Feb 28 11:02 3 -> /dev/xen/privcmd
  ...

Signed-off-by: Christian Lindig <christian.lindig@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
(cherry picked from commit 677e64dbe315343620c3b266e9eb16623b118038)
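
The actual change is in OCaml; the same idea can be sketched in C with POSIX calls. The function name redirect_stdin_to_devnull() is hypothetical. Keeping fd 0 occupied means later open() calls cannot hand out descriptor 0 to an unrelated file such as /dev/xen/evtchn.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Point fd 0 at /dev/null instead of closing it.
 * Returns 0 on success, -1 on failure.
 */
static int redirect_stdin_to_devnull(void)
{
    int fd = open("/dev/null", O_RDONLY);

    if (fd < 0)
        return -1;

    if (fd != STDIN_FILENO) {
        /* dup2() atomically replaces whatever fd 0 referred to. */
        if (dup2(fd, STDIN_FILENO) < 0) {
            close(fd);
            return -1;
        }
        close(fd); /* the temporary descriptor is no longer needed */
    }
    return 0;
}
```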

5 years agotools/misc/xenpm: fix getting info when some CPUs are offline
Marek Marczykowski-Górecki [Wed, 31 Oct 2018 13:04:58 +0000 (14:04 +0100)]
tools/misc/xenpm: fix getting info when some CPUs are offline

Use physinfo.max_cpu_id instead of physinfo.nr_cpus to get max CPU id.
This fixes for example 'xenpm get-cpufreq-para' with smt=off, which
otherwise would miss half of the cores.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
(cherry picked from commit ffb60a58df48419c1f2607cd3cc919fa2bfc9c2d)
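
Why the loop bound matters with a sparse CPU map can be shown with a small standalone sketch. The helper count_online() and the example topology are assumptions: four cores with two threads each and smt=off, so only the even CPU IDs are online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Count the online CPUs found when scanning CPU IDs 0 .. limit - 1. */
static unsigned int count_online(const bool *online, unsigned int limit)
{
    unsigned int id, found = 0;

    assert(online != NULL);
    for (id = 0; id < limit; id++)
        if (online[id])
            found++;
    return found;
}
```

With online IDs 0, 2, 4 and 6, nr_cpus is 4 and max_cpu_id is 6. Scanning up to nr_cpus only reaches IDs 0..3 and finds two CPUs, while scanning up to max_cpu_id + 1 finds all four.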

5 years agox86: fix build race when generating temporary object files
Jan Beulich [Wed, 15 May 2019 07:49:35 +0000 (09:49 +0200)]
x86: fix build race when generating temporary object files

The rules to generate xen-syms and xen.efi may run in parallel, but both
recursively invoke $(MAKE) to build symbol/relocation table temporary
object files. These recursive builds would both re-generate the .*.d2
files (where needed). Both would in turn invoke the same rule, thus
allowing for a race on the .*.d2.tmp intermediate files.

The dependency files of the temporary .xen*.o files live in xen/ rather
than xen/arch/x86/ anyway, so won't be included no matter what. Take the
opportunity and delete them, as the just re-generated .xen*.S files will
trigger a proper re-build of the .xen*.o ones anyway.

Empty the DEPS variable in case the set of goals consists of just those
temporary object files, thus eliminating the race.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 761bb575ce97255029d2d2249b2719e54bc76825
master date: 2019-04-11 10:25:05 +0200

5 years agoVT-d: posted interrupts require interrupt remapping
Jan Beulich [Wed, 15 May 2019 07:49:04 +0000 (09:49 +0200)]
VT-d: posted interrupts require interrupt remapping

Initially I had just noticed the unnecessary indirection in the call
from pi_update_irte(). The generic wrapper having an iommu_intremap
conditional made me look at the setup code though. So first of all
enforce the necessary dependency.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 6c54663786d9f1ed04153867687c158675e7277d
master date: 2019-04-09 15:12:07 +0200
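
Enforcing such a dependency between two options can be sketched as below. struct iommu_opts and enforce_deps() are hypothetical; the field names only loosely mirror the iommu_intremap conditional mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option flags. */
struct iommu_opts {
    bool intremap; /* interrupt remapping enabled */
    bool intpost;  /* posted interrupts enabled */
};

/* Posted interrupts depend on interrupt remapping: drop them without it. */
static void enforce_deps(struct iommu_opts *o)
{
    assert(o != NULL);
    if (!o->intremap)
        o->intpost = false;
}
```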

5 years agovm_event: fix XEN_VM_EVENT_RESUME domctl
Petre Pircalabu [Wed, 15 May 2019 07:48:28 +0000 (09:48 +0200)]
vm_event: fix XEN_VM_EVENT_RESUME domctl

Make XEN_VM_EVENT_RESUME return 0 in case of success, instead of
-EINVAL.
Remove vm_event_resume from the vm_event.h header and set the function's
visibility to static as it is used only in vm_event.c.
Move the vm_event_check_ring test inside vm_event_resume in order to
simplify the code.

Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com>
Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
master commit: b32c0446b103aa801ee18780b2fdd78dfc0b9052
master date: 2019-04-05 15:42:03 +0200

5 years agoxen/timers: Fix memory leak with cpu unplug/plug
Andrew Cooper [Wed, 15 May 2019 07:47:32 +0000 (09:47 +0200)]
xen/timers: Fix memory leak with cpu unplug/plug

timer_softirq_action() realloc's itself a larger timer heap whenever
necessary, which includes bootstrapping from the empty dummy_heap.  Nothing
ever freed this allocation.

CPU plug and unplug has the side effect of zeroing the percpu data area, which
clears ts->heap.  This in turn causes new timers to be put on the list rather
than the heap, and for timer_softirq_action() to bootstrap itself again.

This in practice leaks ts->heap every time a CPU is unplugged and replugged.

Implement free_percpu_timers() which includes freeing ts->heap when
appropriate, and update the notifier callback with the recent cpu parking
logic and free-avoidance across suspend.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/cpu: Fix ARM build following c/s 597fbb8

c/s 597fbb8 "xen/timers: Fix memory leak with cpu unplug/plug" broke the ARM
build by being the first patch to add park_offline_cpus to common code.

While it is currently specific to Intel hardware (for reasons of being able to
handle machine check exceptions without an immediate system reset), it isn't
inherently architecture specific, so define it to be false on ARM for now.

Add a comment in both smp.h headers explaining the intended behaviour of the
option.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
timers: move back migrate_timers_from_cpu() invocation

Commit 597fbb8be6 ("xen/timers: Fix memory leak with cpu unplug/plug")
went a little too far: Migrating timers away from a CPU being offlined
needs to happen independent of whether it gets parked or fully offlined.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
xen/timers: Fix memory leak with cpu unplug/plug (take 2)

Previous attempts to fix this leak failed to identify the root cause, and
ultimately failed.  The cause is the CPU_UP_PREPARE case (re)initialising
ts->heap back to dummy_heap, which leaks the previous allocation.

Rearrange the logic to only initialise ts once.  This also avoids the
redundant (but benign, due to ts->inactive always being empty) initialising of
the other ts fields.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 597fbb8be6021440cd53493c14201c32671bade1
master date: 2019-04-08 11:16:06 +0100
master commit: a6448adfd3d537aacbbd784e5bf1777ab3ff5f85
master date: 2019-04-09 10:12:57 +0100
master commit: 1aec95350ac8261cba516371710d4d837c26f6a0
master date: 2019-04-15 17:51:30 +0100
master commit: e978e9ed9e1ff0dc326e72708ed03cac2ba41db8
master date: 2019-05-13 10:35:37 +0100
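
The root cause named in the "take 2" message, re-initialising a pointer that may already own an allocation, can be sketched as follows. struct timers_sketch, cpu_up_prepare() and grow_heap() are hypothetical names; the real logic in Xen's timer code differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int dummy_heap[1]; /* placeholder used before any real allocation */

struct timers_sketch {
    int *heap;
    bool initialised;
};

/* Called every time the CPU is brought up. Initialises the state once. */
static void cpu_up_prepare(struct timers_sketch *ts)
{
    assert(ts != NULL);
    if (ts->initialised)
        return; /* resetting heap here would leak an earlier allocation */
    ts->heap = dummy_heap;
    ts->initialised = true;
}

/* Replace the dummy heap with a real allocation on first use. */
static int grow_heap(struct timers_sketch *ts, size_t n)
{
    int *h;

    if (ts->heap != dummy_heap)
        return 0;
    h = calloc(n, sizeof(*h));
    if (!h)
        return -1;
    ts->heap = h;
    return 0;
}
```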

5 years agox86emul: suppress general register update upon AVX gather failures
Jan Beulich [Wed, 15 May 2019 07:46:41 +0000 (09:46 +0200)]
x86emul: suppress general register update upon AVX gather failures

While destination and mask registers may indeed need updating in this
case, the rIP update in particular needs to be avoided, as well as e.g.
raising a single step trap.

Reported-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 74f299bbd7d5cc52325b5866c17b44dd0bd1c5a2
master date: 2019-04-03 10:14:32 +0200

5 years agoxen/sched: fix credit2 smt idle handling
Juergen Gross [Wed, 15 May 2019 07:45:58 +0000 (09:45 +0200)]
xen/sched: fix credit2 smt idle handling

Credit2's smt_idle_mask_set() and smt_idle_mask_clear() are used to
identify idle cores where vcpus can be moved to. A core is thought to
be idle when all siblings are known to have the idle vcpu running on
them.

Unfortunately the information of a vcpu running on a cpu is per
runqueue. So in case not all siblings are in the same runqueue a core
will never be regarded to be idle, as the sibling not in the runqueue
is never known to run the idle vcpu.

Use a credit2 specific cpumask of siblings with only those cpus
being marked which are in the same runqueue as the cpu in question.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
master commit: 753ba43d6d16e688f688e01e1c77463ea2c6ec9f
master date: 2019-03-29 18:28:21 +0000
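
The effect of restricting the sibling mask to the runqueue can be shown with plain bitmasks. This is an illustrative sketch only; core_is_idle() and siblings_in_runqueue() are hypothetical helpers operating on 64-bit masks rather than cpumask_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* A core counts as idle when every sibling in the mask is known idle. */
static bool core_is_idle(uint64_t idle, uint64_t siblings)
{
    return (idle & siblings) == siblings;
}

/* Keep only those siblings that belong to the given runqueue. */
static uint64_t siblings_in_runqueue(uint64_t siblings, uint64_t rq_members)
{
    return siblings & rq_members;
}
```

For example, if CPUs 0 and 1 are siblings but only CPU 0 is in the runqueue, the runqueue's idle mask can never contain bit 1, so a check against the full sibling mask never succeeds, whereas a check against the restricted mask does.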

5 years agox86/spec-ctrl: Introduce options to control VERW flushing
Andrew Cooper [Wed, 12 Dec 2018 19:22:15 +0000 (19:22 +0000)]
x86/spec-ctrl: Introduce options to control VERW flushing

The Microarchitectural Data Sampling vulnerability is split into categories
with subtly different properties:

 MLPDS - Microarchitectural Load Port Data Sampling
 MSBDS - Microarchitectural Store Buffer Data Sampling
 MFBDS - Microarchitectural Fill Buffer Data Sampling
 MDSUM - Microarchitectural Data Sampling Uncacheable Memory

MDSUM is a special case of the other three, and isn't distinguished further.

These issues pertain to three microarchitectural buffers.  The Load Ports, the
Store Buffers and the Fill Buffers.  Each of these structures is flushed by
the new enhanced VERW functionality, but the conditions under which flushing
is necessary vary.

For this concise overview of the issues and default logic, the abbreviations
SP (Store Port), FB (Fill Buffer), LP (Load Port) and HT (Hyperthreading) are
used for brevity:

 * Vulnerable hardware is divided into two categories - parts which suffer
   from SP only, and parts with any other combination of vulnerabilities.

 * SP only has an HT interaction when the thread goes idle, due to the static
   partitioning of resources.  LP and FB have HT interactions at all points,
   due to the competitive sharing of resources.  All issues potentially leak
   data across the return-to-guest transition.

 * The microcode which implements VERW flushing also extends MSR_FLUSH_CMD, so
   we don't need to do both on the HVM return-to-guest path.  However, some
   parts are not vulnerable to L1TF (therefore have no MSR_FLUSH_CMD), but are
   vulnerable to MDS, so do require VERW on the HVM path.

Note that we deliberately support mds=1 even without MD_CLEAR in case the
microcode has been updated but the feature bit not exposed.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 3c04c258ab40405a74e194d9889a4cbc7abe94b4)

5 years agox86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers
Andrew Cooper [Wed, 12 Dec 2018 19:22:15 +0000 (19:22 +0000)]
x86/spec-ctrl: Infrastructure to use VERW to flush pipeline buffers

Three synthetic features are introduced, as we need individual control of
each, depending on circumstances.  A later change will enable them at
appropriate points.

The verw_sel field doesn't strictly need to live in struct cpu_info.  It lives
there because there is a convenient hole it can fill, and it reduces the
complexity of the SPEC_CTRL_EXIT_TO_{PV,HVM} assembly by avoiding the need for
any temporary stack maintenance.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 548a932ac786d6bf3584e4b54f2ab993e1117710)

5 years agox86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data Sampling
Andrew Cooper [Wed, 12 Sep 2018 13:36:00 +0000 (14:36 +0100)]
x86/spec-ctrl: CPUID/MSR definitions for Microarchitectural Data Sampling

The MD_CLEAR feature can be automatically offered to guests.  No
infrastructure is needed in Xen to support the guest making use of it.

This is part of XSA-297, CVE-2018-12126, CVE-2018-12127, CVE-2018-12130, CVE-2019-11091.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit d4f6116c080dc013cd1204c4d8ceb95e5f278689)

5 years agox86/spec-ctrl: Misc non-functional cleanup
Andrew Cooper [Wed, 12 Sep 2018 13:36:00 +0000 (14:36 +0100)]
x86/spec-ctrl: Misc non-functional cleanup

 * Identify BTI in the spec_ctrl_{enter,exit}_idle() comments, as other
   mitigations will shortly appear.
 * Use alternative_input() and cover the lack of memory clobber with a further
   barrier.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit 9b62eba6c429c327e1507816bef403ccc87357ae)

5 years agox86/boot: Detect the firmware SMT setting correctly on Intel hardware
Andrew Cooper [Fri, 5 Apr 2019 12:26:30 +0000 (13:26 +0100)]
x86/boot: Detect the firmware SMT setting correctly on Intel hardware

While boot_cpu_data.x86_num_siblings is an accurate value to use on AMD
hardware, it isn't on Intel when the user has disabled Hyperthreading in the
firmware.  As a result, a user who has chosen to disable HT still gets
nagged on L1TF-vulnerable hardware when they haven't chosen an explicit
smt=<bool> setting.

Make use of the largely-undocumented MSR_INTEL_CORE_THREAD_COUNT which in
practice exists since Nehalem, when booting on real hardware.  Fall back to
using the ACPI table APIC IDs.

While adjusting this logic, fix a latent bug in amd_get_topology().  The
thread count field in CPUID.0x8000001e.ebx is documented as 8 bits wide,
rather than 2 bits wide.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit b12fec4a125950240573ea32f65c61fb9afa74c3)
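
A hedged sketch of the two field decodes involved. The bit positions are assumptions based on publicly available descriptions, not on this commit message: thread count in bits 15:0 and core count in bits 31:16 of the MSR value, and the AMD thread count field in bits 15:8 of CPUID.0x8000001e.ebx, holding the count minus one. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: threads in bits 15:0, cores in bits 31:16. */
static unsigned int ctc_threads(uint64_t msr)
{
    return (unsigned int)(msr & 0xffff);
}

static unsigned int ctc_cores(uint64_t msr)
{
    return (unsigned int)((msr >> 16) & 0xffff);
}

/* SMT is active in firmware when there are more threads than cores. */
static bool smt_active(uint64_t msr)
{
    return ctc_threads(msr) > ctc_cores(msr);
}

/* Assumed layout: 8-bit field at bits 15:8 holding (threads per core - 1). */
static unsigned int amd_threads_per_core(uint32_t ebx)
{
    return ((ebx >> 8) & 0xff) + 1; /* a 2-bit mask would truncate this */
}
```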

5 years agox86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT
Andrew Cooper [Fri, 5 Apr 2019 12:26:30 +0000 (12:26 +0000)]
x86/msr: Definitions for MSR_INTEL_CORE_THREAD_COUNT

This is a model specific register which details the current configuration
of cores and threads in the package.  Because of how Hyperthread and Core
configuration works in firmware, the MSR is de-facto constant and
will remain unchanged until the next system reset.

It is a read only MSR (so unilaterally reject writes), but for now retain its
leaky-on-read properties.  Further CPUID/MSR work is required before we can
start virtualising a consistent topology to the guest, and retaining the old
behaviour is the safest course of action.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit d4120936bcd1695faf5b575f1259c58e31d2b18b)

5 years agox86/spec-ctrl: Reposition the XPTI command line parsing logic
Andrew Cooper [Wed, 12 Sep 2018 13:36:00 +0000 (14:36 +0100)]
x86/spec-ctrl: Reposition the XPTI command line parsing logic

It has ended up in the middle of the mitigation calculation logic.  Move it to
be beside the other command line parsing.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
(cherry picked from commit c2c2bb0d60c642e64a5243a79c8b1548ffb7bc5b)

6 years agox86/spec-ctrl: Extend repoline safey calcuations for eIBRS and Atom parts
Andrew Cooper [Fri, 3 May 2019 08:55:55 +0000 (10:55 +0200)]
x86/spec-ctrl: Extend repoline safey calcuations for eIBRS and Atom parts

All currently-released Atom processors are in practice retpoline-safe, because
they don't fall back to a BTB prediction on RSB underflow.

However, an additional meaning of Enhanced IBRS is that the processor may not
be retpoline-safe.  The Gemini Lake platform, based on the Goldmont Plus
microarchitecture is the first Atom processor to support eIBRS.

Until Xen gets full eIBRS support, Gemini Lake will still be safe using
regular IBRS.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 17f74242ccf0ce6e51c03a5860947865c0ef0dc2
master date: 2019-03-18 16:26:40 +0000

6 years agox86/msr: Shorten ARCH_CAPABILITIES_* constants
Andrew Cooper [Fri, 3 May 2019 08:55:10 +0000 (10:55 +0200)]
x86/msr: Shorten ARCH_CAPABILITIES_* constants

They are unnecessarily verbose, and ARCH_CAPS_* is already the more common
version.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: ba27aaa88548c824a47dcf5609288ee1c05d2946
master date: 2019-03-18 16:26:40 +0000

6 years agox86/e820: fix build with gcc9
Jan Beulich [Fri, 3 May 2019 08:53:40 +0000 (10:53 +0200)]
x86/e820: fix build with gcc9

e820.c: In function ‘clip_to_limit’:
.../xen/include/asm/string.h:10:26: error: ‘__builtin_memmove’ offset [-16, -36] is out of the bounds [0, 20484] of object ‘e820’ with type ‘struct e820map’ [-Werror=array-bounds]
   10 | #define memmove(d, s, n) __builtin_memmove(d, s, n)
      |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~
e820.c:404:13: note: in expansion of macro ‘memmove’
  404 |             memmove(&e820.map[i], &e820.map[i+1],
      |             ^~~~~~~
e820.c:36:16: note: ‘e820’ declared here
   36 | struct e820map e820;
      |                ^~~~

While I can't see where the negative offsets would come from, converting
the loop index to unsigned type helps. Take the opportunity and also
convert several other local variables and copy_e820_map()'s second
parameter to unsigned int (and bool in one case).

Reported-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: 22e2f8dddf5fbed885b5e4db3ffc9e1101be9ec0
master date: 2019-03-18 11:38:36 +0100
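
Removing an array entry with memmove() while using unsigned indices can be sketched as below. struct region, struct region_map, MAX_ENTRIES and remove_entry() are hypothetical names loosely modelled on the e820 map; this is not the code from e820.c.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 8 /* hypothetical capacity */

struct region {
    unsigned long long addr, size;
};

struct region_map {
    unsigned int nr; /* number of valid entries */
    struct region map[MAX_ENTRIES];
};

/*
 * Delete entry i by shifting the following entries down by one.
 * Unsigned index and count keep the offset arithmetic non-negative.
 */
static void remove_entry(struct region_map *m, unsigned int i)
{
    assert(m != NULL && i < m->nr);
    memmove(&m->map[i], &m->map[i + 1],
            (m->nr - i - 1) * sizeof(m->map[0]));
    m->nr--;
}
```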

6 years agoxen: Fix backport of "xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct"
Andrew Cooper [Fri, 3 May 2019 08:52:32 +0000 (10:52 +0200)]
xen: Fix backport of "xen/cmdline: Fix buggy strncmp(s, LITERAL, ss - s) construct"

These were missed as a consequence of being rebased over other cmdline
cleanup.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agoxen: Fix backport of "x86/tsx: Implement controls for RTM force-abort mode"
Andrew Cooper [Fri, 3 May 2019 08:51:31 +0000 (10:51 +0200)]
xen: Fix backport of "x86/tsx: Implement controls for RTM force-abort mode"

The posted version of this patch depends on c/s 3c555295 "x86/vpmu: Improve
documentation and parsing for vpmu=" (Xen 4.12 and later) to prevent
`vpmu=rtm-abort` implying `vpmu=1`, which is outside of security support.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
6 years agotools/firmware: update OVMF Makefile, when necessary
Wei Liu [Wed, 28 Nov 2018 17:43:33 +0000 (17:43 +0000)]
tools/firmware: update OVMF Makefile, when necessary

[ This is two commits from master aka staging-4.12: ]

OVMF has become dependent on OpenSSL, which is included as a
submodule.  Initialise submodules before building.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
(cherry picked from commit b16281870e06f5f526029a4e69634a16dc38e8e4)

tools: only call git when necessary in OVMF Makefile

Users may choose to export a snapshot of OVMF and build it
with xen.git supplied ovmf-makefile. In that case we don't
need to call `git submodule`.

Fixes b16281870e.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
(cherry picked from commit 68292c94a60eab24514ab4a8e4772af24dead807)

6 years agoArm/atomic: correct asm() constraints in build_add_sized()
Jan Beulich [Tue, 12 Mar 2019 13:42:17 +0000 (14:42 +0100)]
Arm/atomic: correct asm() constraints in build_add_sized()

The memory operand is an in/out one, and the auxiliary register gets
written to early.

Take the opportunity and also drop the redundant cast (the inline
functions' parameters are already of the casted-to type).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Julien Grall <julien.grall@arm.com>
(cherry picked from commit 51ceb1623b9956440f1b9943c67010a90d61f5c5)

6 years agox86/pv: Fix construction of 32bit dom0's
Andrew Cooper [Mon, 18 Mar 2019 16:09:08 +0000 (17:09 +0100)]
x86/pv: Fix construction of 32bit dom0's

dom0_construct_pv() has logic to transition dom0 into a compat domain when
booting an ELF32 image.

One aspect which is missing is the CPUID policy recalculation, meaning that a
32bit dom0 sees a 64bit policy, which differs by the Long Mode feature flag in
particular.  Another missing item is the x87_fip_width initialisation.

Update dom0_construct_pv() to use switch_compat(), rather than retaining the
opencoding.  Position the call to switch_compat() such that the compat32 local
variable can disappear entirely.

The 32bit monitor table is now created by setup_compat_l4(), avoiding the need
for manual creation later.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 356f437171c5bb90701ac9dd7ba4dbbd05988e38
master date: 2019-03-15 14:59:27 +0000

6 years agox86/tsx: Implement controls for RTM force-abort mode
Andrew Cooper [Mon, 18 Mar 2019 16:08:25 +0000 (17:08 +0100)]
x86/tsx: Implement controls for RTM force-abort mode

The CPUID bit and MSR are deliberately not exposed to guests, because they
won't exist on newer processors.  As vPMU isn't security supported, the
misbehaviour of PCR3 isn't expected to impact production deployments.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 6be613f29b4205349275d24367bd4c82fb2960dd
master date: 2019-03-12 17:05:21 +0000

6 years agox86/vtd: Don't include control register state in the table pointers
Andrew Cooper [Mon, 18 Mar 2019 16:07:45 +0000 (17:07 +0100)]
x86/vtd: Don't include control register state in the table pointers

iremap_maddr and qinval_maddr point to the base of a block of contiguous RAM,
allocated by the driver, holding the Interrupt Remapping table, and the Queued
Invalidation ring.

Despite their name, they are actually the values of the hardware register,
including control metadata in the lower 12 bits.  While uses of these fields
do appear to correctly shift out the metadata, this is very subtle behaviour
and confusing to follow.

Nothing uses the metadata, so make the fields actually point at the base of
the relevant tables.
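
As a rough illustration of the distinction between a raw register value and
a table base, here is a minimal sketch. It assumes only what is stated above
(control metadata in the lower 12 bits); the names REG_CTRL_MASK,
table_base_from_reg() and reg_from_table_base() are hypothetical and are not
taken from the VT-d driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: the low 12 bits hold control metadata. */
#define REG_CTRL_BITS 12
#define REG_CTRL_MASK ((UINT64_C(1) << REG_CTRL_BITS) - 1)

/* Strip the control metadata, leaving the base address of the table. */
static inline uint64_t table_base_from_reg(uint64_t reg_val)
{
    return reg_val & ~REG_CTRL_MASK;
}

/* Build a register value from an aligned table base plus control bits. */
static inline uint64_t reg_from_table_base(uint64_t base, uint64_t ctrl)
{
    assert((base & REG_CTRL_MASK) == 0);   /* base must be 4 KiB aligned */
    assert((ctrl & ~REG_CTRL_MASK) == 0);  /* ctrl must fit in the low bits */
    return base | ctrl;
}
```

Storing only the masked base in a field named *_maddr means callers never
have to remember to shift or mask before using it as an address.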

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: a9a05aeee10a5a3763a41305a9f38112dd1fcc82
master date: 2019-03-12 13:57:13 +0000

6 years agox86/HVM: don't crash guest in hvmemul_find_mmio_cache()
Jan Beulich [Mon, 18 Mar 2019 16:07:11 +0000 (17:07 +0100)]
x86/HVM: don't crash guest in hvmemul_find_mmio_cache()

Commit 35a61c05ea ("x86emul: adjust handling of AVX2 gathers") builds
upon the fact that the domain will actually survive running out of MMIO
result buffer space. Drop the domain_crash() invocation. Also delay
incrementing of the usage counter, such that the function can't possibly
use/return an out-of-bounds slot/pointer in case execution subsequently
makes it into the function again without a prior reset of state.
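
The "check first, increment only on success" ordering can be sketched as
follows. This is not the hvmemul_find_mmio_cache() code; struct slot_cache,
slot_alloc() and MAX_SLOTS are hypothetical names used only to show why the
delayed increment keeps repeated calls within bounds.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 3   /* hypothetical capacity of the result buffer */

struct slot {
    unsigned long gpa;
};

struct slot_cache {
    unsigned int count;              /* number of slots handed out so far */
    struct slot slots[MAX_SLOTS];
};

/* Hand out the next free slot, or NULL once the buffer is exhausted. */
static struct slot *slot_alloc(struct slot_cache *c, unsigned long gpa)
{
    struct slot *s;

    assert(c != NULL);

    /*
     * Bail before touching the counter: if this function is entered again
     * without a reset, count is still <= MAX_SLOTS and no out-of-bounds
     * index can be formed.
     */
    if ( c->count >= MAX_SLOTS )
        return NULL;

    s = &c->slots[c->count];
    s->gpa = gpa;
    c->count++;   /* only incremented once the slot is known to be valid */

    return s;
}
```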

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
master commit: a43c1dec246bdee484e6a3de001cc6850a107abe
master date: 2019-03-12 14:39:46 +0100

6 years agoiommu: leave IOMMU enabled by default during kexec crash transition
Igor Druzhinin [Mon, 18 Mar 2019 16:06:37 +0000 (17:06 +0100)]
iommu: leave IOMMU enabled by default during kexec crash transition

It's unsafe to disable the IOMMU on a live system, which is the case if
we're crashing, since remapping hardware doesn't usually know what to do
with ongoing bus transactions and frequently raises NMI/MCE/SMI, etc.
(depending on the firmware configuration) to signal these abnormalities.
This, in turn, doesn't play well with the kexec transition process, as
there is currently no handling for this kind of event, resulting in
failures to enter the kernel.

Modern Linux kernels have been taught to copy all the necessary DMAR/IR
tables from the previous kernel (Xen in our case) following kexec, so it's
now normal to keep the IOMMU enabled. It might require minor changes to the
kdump command line to enable the IOMMU drivers (e.g. intel_iommu=on /
intremap=on), but recent kernels don't require any additional changes for
the transition to be transparent.

A fallback option is still left for compatibility with ancient crash
kernels which didn't like to have IOMMU active under their feet on boot.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 12c36f577d454996c882ecdc5da8113ca2613646
master date: 2019-03-12 14:38:12 +0100

6 years agox86/cpuid: add missing PCLMULQDQ dependency
Jan Beulich [Mon, 18 Mar 2019 16:05:45 +0000 (17:05 +0100)]
x86/cpuid: add missing PCLMULQDQ dependency

Since we don't seem to be able to settle our discussion for the wider
adjustment previously posted, let's at least add the missing dependency
for 4.12. I'm not convinced though that attaching it to SSE is correct.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: eeb31ee522c7bb8541eb4c037be2c42bfcf0a3c3
master date: 2019-03-05 18:04:23 +0100

6 years agox86/mm: fix #GP(0) in switch_cr3_cr4()
Jan Beulich [Mon, 18 Mar 2019 16:05:07 +0000 (17:05 +0100)]
x86/mm: fix #GP(0) in switch_cr3_cr4()

With "pcid=no-xpti" and opposite XPTI settings in two 64-bit PV domains
(achievable with one of "xpti=no-dom0" or "xpti=no-domu"), switching
from a PCID-disabled to a PCID-enabled 64-bit PV domain fails to set
CR4.PCIDE in time, as CR4.PGE would not be set in either (see
pv_fixup_guest_cr4(), in particular as used by write_ptbase()), and
hence the early CR4 write would be skipped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
master commit: fdc2056767ba74346dfd8bbe868bb22521ba1418
master date: 2019-03-05 17:02:36 +0100

6 years agox86/nmi: correctly check MSB of P6 performance counter MSR in watchdog
Igor Druzhinin [Mon, 18 Mar 2019 16:04:30 +0000 (17:04 +0100)]
x86/nmi: correctly check MSB of P6 performance counter MSR in watchdog

The logic currently tries to work out if a recent overflow (that indicates
that NMI comes from the watchdog) happened by checking MSB of performance
counter MSR that is initially sign extended from a negative value
that we program it to. A possibly incorrect assumption here is that
MSB is always bit 32 while on modern hardware it's usually 47 and
the actual bit-width is reported through CPUID. Checking bit 32 for
overflows is usually fine since we never program it to anything
exceeding 32-bits and NMI is handled shortly after overflow occurs.

A problematic scenario that we saw occurs on systems where SMIs taking
significant time are possible. In that case, NMI handling is deferred
until firmware exits SMI, which might take enough time for the counter
to count up through bit 32 and set it to 1 again. The logic described above
then misreads it and erroneously reports an unknown NMI.

Fortunately, we can use the actual MSB, which is usually higher than the
currently hardcoded 32, and treat this case correctly at least on modern
hardware.
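
A minimal sketch of the check, for illustration only: counter_overflowed()
is a hypothetical helper, and the width parameter stands in for the counter
bit-width that CPUID reports. The sketch treats the top bit of a counter that
is "width" bits wide as bit (width - 1), counting from zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * The counter is programmed with a negative value, so its top bit starts
 * out set and clears when the counter wraps.  A clear top bit therefore
 * suggests a recent overflow, i.e. that the NMI came from the watchdog.
 */
static bool counter_overflowed(uint64_t counter, unsigned int width)
{
    assert(width >= 1 && width <= 64);

    return !(counter & (UINT64_C(1) << (width - 1)));
}
```

With a 48-bit counter that has kept counting during a long SMI and reached
0x80000000, testing against a width of 32 sees a set bit and misses the
overflow, whereas testing against the real width of 48 still detects it.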

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: 0452d02b6e7849537914dd30cbfc8eb27cdad2ce
master date: 2019-02-28 13:44:40 +0000

6 years agox86/hvm: Increase the triple fault log message level to XENLOG_ERR
Andrew Cooper [Mon, 18 Mar 2019 16:03:41 +0000 (17:03 +0100)]
x86/hvm: Increase the triple fault log message level to XENLOG_ERR

At INFO level, it doesn't get printed out by default in release builds,
leading to unqualified logging such as this:

  (XEN) [   66.995993] Freed 524kB init memory
  (XEN) [ 1993.144997] *** Dumping Dom9 vcpu#2 state: ***
  (XEN) [ 1993.145008] ----[ Xen-4.11.1  x86_64  debug=n   Not tainted ]----
  (XEN) [ 1993.145011] CPU:    21
  (XEN) [ 1993.145015] RIP:    0010:[<ffffe0002ba950ef>]
  (XEN) [ 1993.145018] RFLAGS: 0000000000010246   CONTEXT: hvm guest (d9v2)
  (XEN) [ 1993.145026] rax: 00000000ffffe000   rbx: ffffe0002d8e1440   rcx: 0000ffffe0002ba9
  (XEN) [ 1993.145031] rdx: 0000000000000000   rsi: ffffe0002ba93575   rdi: fffff803dfb9f340
  (XEN) [ 1993.145035] rbp: ffffd001cd791200   rsp: ffffd001cd791140   r8:  0000000000000130
  (XEN) [ 1993.145039] r9:  0000000080000000   r10: 0000000000000000   r11: 0000000000000020
  (XEN) [ 1993.145043] r12: ffffe0002ba9306d   r13: 0000000000000000   r14: 0000000000000001
  (XEN) [ 1993.145047] r15: fffff803dfb9f200   cr0: 0000000080050031   cr4: 0000000000170678
  (XEN) [ 1993.145051] cr3: 00000000001aa002   cr2: 0000020488403f70
  (XEN) [ 1993.145056] fsb: 0000000060f71000   gsb: ffffd001cc1af000   gss: 0000009d60f6f000
  (XEN) [ 1993.145060] ds: 002b   es: 002b   fs: 0053   gs: 002b   ss: 0018   cs: 0010

A triple fault is fatal to the domain under all circumstances (so will print
at most once), and in practice is always an error condition rather than a
reboot fallback.

Reported-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
master commit: 1c8ca185e3c6e003398471edd9dbac0cd118137c
master date: 2019-02-28 11:16:27 +0000

6 years agox86/vmx: Properly flush the TLB when an altp2m is modified
Andrew Cooper [Mon, 18 Mar 2019 16:02:49 +0000 (17:02 +0100)]
x86/vmx: Properly flush the TLB when an altp2m is modified

Modifications to an altp2m mark the p2m as needing flushing, but this was
never wired up in the return-to-guest path.  As a result, stale TLB entries
can remain after resuming the guest.

In practice, this manifests as a missing EPT_VIOLATION or #VE exception when
the guest subsequently accesses a page which has had its permissions reduced.

vmx_vmenter_helper() now has 11 p2ms to potentially invalidate, but issuing 11
INVEPT instructions isn't clever.  Instead, count how many contexts need
invalidating, and use INVEPT_ALL_CONTEXT if two or more are in need of
flushing.
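
The counting approach might look roughly like the sketch below. It is not
the vmx_vmenter_helper() code: plan_flush(), struct flush_plan and the
FLUSH_* values are hypothetical, and the sketch only decides which kind of
invalidation to issue rather than executing INVEPT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum flush_kind { FLUSH_NONE, FLUSH_SINGLE, FLUSH_ALL };

struct flush_plan {
    enum flush_kind kind;
    size_t single_idx;    /* only meaningful when kind == FLUSH_SINGLE */
};

/* Decide between no flush, one single-context flush, or one global flush. */
static struct flush_plan plan_flush(const bool *need_flush, size_t n)
{
    struct flush_plan plan = { FLUSH_NONE, 0 };
    size_t pending = 0;

    assert(need_flush != NULL || n == 0);

    for ( size_t i = 0; i < n; i++ )
    {
        if ( !need_flush[i] )
            continue;

        /* Remember the first context in case it turns out to be the only one. */
        if ( ++pending == 1 )
            plan.single_idx = i;
    }

    if ( pending == 1 )
        plan.kind = FLUSH_SINGLE;
    else if ( pending >= 2 )
        plan.kind = FLUSH_ALL;   /* one global flush instead of many */

    return plan;
}
```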

This doesn't have an XSA because altp2m is not yet a security-supported
feature.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Kevin Tian <kevin.tian@intel.com>
master commit: 69f7643df68ef8e994221a996e336a47cbb7bbc8
master date: 2019-02-28 11:16:27 +0000

6 years agox86/shadow: don't use map_domain_page_global() on paths that may not fail
Jan Beulich [Mon, 18 Mar 2019 16:02:02 +0000 (17:02 +0100)]
x86/shadow: don't use map_domain_page_global() on paths that may not fail

The assumption (according to one comment) and hope (according to
another) that map_domain_page_global() can't fail are both wrong on
large enough systems. Do away with the guest_vtable field altogether,
and establish / tear down the desired mapping as necessary.

The alternatives, discarded as being undesirable, would have been to
either crash the guest in sh_update_cr3() when the mapping fails, or to
bubble up an error indicator, which upper layers would have a hard time
dealing with (other than again by crashing the guest).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
master commit: 43282a5e64da26fad544e0100abf35048cf65b46
master date: 2019-02-26 16:56:26 +0100

6 years agoviridian: fix the HvFlushVirtualAddress/List hypercall implementation
Paul Durrant [Mon, 18 Mar 2019 16:01:21 +0000 (17:01 +0100)]
viridian: fix the HvFlushVirtualAddress/List hypercall implementation

The current code uses hvm_asid_flush_vcpu() but this is insufficient for
a guest running in shadow mode, which results in guest crashes early in
boot if 'hcall_remote_tlb_flush' is enabled.

This patch, instead of open coding a new flush algorithm, adapts the one
already used by the HVMOP_flush_tlbs Xen hypercall. The implementation is
modified to allow TLB flushing a subset of a domain's vCPUs. A callback
function determines whether or not a vCPU requires flushing. This mechanism
was chosen because, while it is the case that the currently implemented
viridian hypercalls specify a vCPU mask, there are newer variants which
specify a sparse HV_VP_SET and thus use of a callback will avoid needing to
expose details of this outside of the viridian subsystem if and when those
newer variants are implemented.

NOTE: Use of the common flush function requires that the hypercalls are
      restartable and so, with this patch applied, viridian_hypercall()
      can now return HVM_HCALL_preempted. This is safe as no modification
      to struct cpu_user_regs is done before the return.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
master commit: ce98ee3050a824994ce4957faa8f53ecb8c7da9d
master date: 2019-02-26 16:55:06 +0100