Julien Grall [Tue, 26 May 2020 17:31:33 +0000 (18:31 +0100)]
xen: Check the alignment of the offset passed via VCPUOP_register_vcpu_info
Currently a guest is able to register any guest physical address to use
for the vcpu_info structure as long as the structure can fit in the
rest of the frame.
This means a guest can provide an address that is not aligned to the
natural alignment of the structure.
On Arm 32-bit, unaligned accesses are completely forbidden by the
hypervisor. This will result in a data abort which is fatal.
On Arm 64-bit, unaligned accesses are only forbidden when used for atomic
access. As the structure contains fields (such as evtchn_pending_self)
that are updated using atomic operations, any unaligned access will be
fatal as well.
While the misalignment is only fatal on Arm, a generic check is added
as an x86 guest shouldn't sensibly pass an unaligned address (this
would result in a split lock).
This is XSA-327.
Reported-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
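The check described in the VCPUOP_register_vcpu_info commit above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Xen code: `struct demo_vcpu_info`, `DEMO_PAGE_SIZE` and `demo_offset_ok()` are hypothetical names, and the structure layout is invented for the example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real structure layout and page size differ. */
#define DEMO_PAGE_SIZE 4096u

struct demo_vcpu_info {
    unsigned char upcall_pending;
    unsigned char upcall_mask;
    unsigned long pending_sel;    /* imagined to be updated atomically */
    unsigned long long stamp[4];
};

/*
 * Accept an offset into a guest frame only if the structure both fits in
 * the rest of the frame and starts at its natural alignment.
 */
static bool demo_offset_ok(size_t offset)
{
    if (offset > DEMO_PAGE_SIZE - sizeof(struct demo_vcpu_info))
        return false;             /* would run past the end of the frame */

    if (offset % _Alignof(struct demo_vcpu_info) != 0)
        return false;             /* misaligned: atomic accesses could fault */

    return true;
}
```

With this sketch, an offset of 0 is accepted, while an offset of 1 (misaligned) and an offset equal to the page size (does not fit) are both rejected.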
x86/ept: flush cache when modifying PTEs and sharing page tables
Modifications made to the page tables by EPT code need to be written
to memory when the page tables are shared with the IOMMU, as Intel
IOMMUs can be non-coherent and thus require changes to be written to
memory in order to be visible to the IOMMU.
In order to achieve this make sure data is written back to memory
after writing an EPT entry when the recalc bit is not set in
atomic_write_ept_entry. If such bit is set, the entry will be
adjusted and atomic_write_ept_entry will be called a second time
without the recalc bit set. Note that when splitting a super page the
new tables resulting from the split should also be written back.
Failure to do so can allow devices behind the IOMMU access to the
stale super page, or cause coherency issues as changes made by the
processor to the page tables are not visible to the IOMMU.
This allows removing the VT-d specific iommu_pte_flush helper, since
the cache write back is now performed by atomic_write_ept_entry, and
hence iommu_iotlb_flush can be used to flush the IOMMU TLB. The newly
used method (iommu_iotlb_flush) can result in fewer flushes, since it
might sometimes be called rightly with 0 flags, in which case it
becomes a no-op.
This is part of XSA-321.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Some VT-d IOMMUs are non-coherent, which requires a cache write back
in order for the changes made by the CPU to be visible to the IOMMU.
This cache write back was unconditionally done using clflush, but there are
other more efficient instructions to do so, hence implement support
for them using the alternative framework.
This is part of XSA-321.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
vtd: don't assume addresses are aligned in sync_cache
Current code in sync_cache assumes that the address passed in is
aligned to a cache line size. Fix the code to support passing in
arbitrary addresses not necessarily aligned to a cache line size.
This is part of XSA-321.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
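A minimal sketch of flushing an arbitrary, possibly unaligned byte range, as the sync_cache commit above describes. The names `demo_sync_cache()`, `demo_flush_line()` and `demo_lines_flushed` are hypothetical; the hook only counts lines where real code would issue a cache write back instruction, and the line size is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines the hypothetical hook has been asked to flush. */
static unsigned int demo_lines_flushed;

/* Hypothetical hook standing in for a real cache line write back. */
static void demo_flush_line(uintptr_t line_addr)
{
    (void)line_addr;
    demo_lines_flushed++;
}

/*
 * Flush every cache line touched by the byte range [addr, addr + size).
 * line_size must be a power of two.
 */
static void demo_sync_cache(uintptr_t addr, size_t size, size_t line_size)
{
    uintptr_t end = addr + size;

    if (size == 0)
        return;

    /* Round the start down so a leading partial line is covered as well. */
    addr &= ~(uintptr_t)(line_size - 1);

    for (; addr < end; addr += line_size)
        demo_flush_line(addr);
}
```

For a 64-byte line size, a 64-byte range starting at an aligned address touches one line, while the same range starting 32 bytes later straddles two lines and both get flushed.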
The hook is only implemented for VT-d and it uses the already existing
iommu_sync_cache function present in VT-d code. The new hook is
added so that the cache can be flushed by code outside of VT-d when
using shared page tables.
Note that alloc_pgtable_maddr must use the now locally defined
sync_cache function, because IOMMU ops are not yet set up the first
time the function gets called during IOMMU initialization.
No functional change intended.
This is part of XSA-321.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Rename __iommu_flush_cache to iommu_sync_cache and remove
iommu_flush_cache_page. Also remove the iommu_flush_cache_entry
wrapper and just use iommu_sync_cache instead. Note the _entry suffix
was meaningless as the wrapper was already taking a size parameter in
bytes. While there also constify the addr parameter.
No functional change intended.
This is part of XSA-321.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 7 Jul 2020 12:37:46 +0000 (14:37 +0200)]
vtd: improve IOMMU TLB flush
Do not limit PSI flushes to order 0 pages, in order to avoid doing a
full TLB flush if the passed in page has an order greater than 0 and
is aligned. Should increase the performance of IOMMU TLB flushes when
dealing with page orders greater than 0.
This is part of XSA-321.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
x86/ept: atomically modify entries in ept_next_level
ept_next_level was passing a live PTE pointer to ept_set_middle_entry,
which was then modified without taking into account that the PTE could
be part of a live EPT table. This wasn't a security issue because the
pages returned by p2m_alloc_ptp are zeroed, so adding such an entry
before actually initializing it didn't allow a guest to access
physical memory addresses it wasn't supposed to access.
This is part of XSA-328.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 7 Jul 2020 12:36:52 +0000 (14:36 +0200)]
x86/EPT: ept_set_middle_entry() related adjustments
ept_split_super_page() wants to further modify the newly allocated
table, so have ept_set_middle_entry() return the mapped pointer rather
than tearing it down and then getting re-established right again.
Similarly ept_next_level() wants to hand back a mapped pointer of
the next level page, so re-use the one established by
ept_set_middle_entry() in case that path was taken.
Pull the setting of suppress_ve ahead of insertion into the higher level
table, and don't have ept_split_super_page() set the field a 2nd time.
This is part of XSA-328.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 7 Jul 2020 12:36:24 +0000 (14:36 +0200)]
x86/shadow: correct an inverted conditional in dirty VRAM tracking
This originally was "mfn_x(mfn) == INVALID_MFN". Make it like this
again, taking the opportunity to also drop the unnecessary nearby
braces.
This is XSA-319.
Fixes: 246a5a3377c2 ("xen: Use a typesafe to define INVALID_MFN") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
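To illustrate why a conversion to a typesafe frame number can silently invert a check like the one fixed above, here is a small sketch. It is not the Xen definition: `demo_mfn_t`, `DEMO_INVALID_MFN`, `demo_mfn_x()` and `demo_mfn_eq()` are hypothetical miniatures. Once the value is wrapped in a struct, a plain `==` no longer compiles, so every comparison is rewritten through a helper, and a stray `!` in that rewrite flips the meaning.

```c
#include <stdbool.h>

/* Hypothetical miniature of a typesafe frame-number wrapper. */
typedef struct { unsigned long m; } demo_mfn_t;

#define DEMO_INVALID_MFN ((demo_mfn_t){ ~0UL })

/* Unwrap the raw frame number. */
static unsigned long demo_mfn_x(demo_mfn_t mfn)
{
    return mfn.m;
}

/* Typesafe equality; replaces a direct == on the raw values. */
static bool demo_mfn_eq(demo_mfn_t a, demo_mfn_t b)
{
    return demo_mfn_x(a) == demo_mfn_x(b);
}

/* True when the frame is invalid; negating the result inverts the check. */
static bool demo_mfn_is_invalid(demo_mfn_t mfn)
{
    return demo_mfn_eq(mfn, DEMO_INVALID_MFN);
}
```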
Julien Grall [Thu, 19 Mar 2020 13:17:31 +0000 (13:17 +0000)]
xen/common: event_channel: Don't ignore error in get_free_port()
Currently, get_free_port() is assuming that the port has been allocated
when evtchn_allocate_port() does not return -EBUSY.
However, the function may return an error when:
- We exhausted all the event channels. This can happen if the limit
configured by the administrator for the guest ('max_event_channels'
in xl cfg) is higher than the ABI used by the guest. For instance,
if the guest is using 2L, the limit should not be higher than 4095.
- We cannot allocate memory (e.g. Xen has no more memory).
Users of get_free_port() (such as EVTCHNOP_alloc_unbound) will validly
assume the port is valid and will next call evtchn_from_port(). This
will result in a crash as the memory backing the event channel structure
is not present.
Fixes: 368ae9a05fe ("xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
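The error handling described in the get_free_port() commit above can be sketched as follows. This is an assumption-laden illustration, not the Xen implementation: `demo_allocate_port()`, `demo_get_free_port()` and `DEMO_MAX_PORTS` are hypothetical, and the allocation failure is simulated for one fixed port.

```c
#include <errno.h>

#define DEMO_MAX_PORTS 8

/*
 * Hypothetical allocator: returns 0 on success, -EBUSY if the port is in
 * use, or another negative errno value (here a simulated -ENOMEM for
 * port 5) on failure.
 */
static int demo_allocate_port(const unsigned char *in_use, int port)
{
    if (port == 5)
        return -ENOMEM;      /* simulated allocation failure */
    return in_use[port] ? -EBUSY : 0;
}

/* Return a free port number, or a negative errno value. */
static int demo_get_free_port(const unsigned char *in_use)
{
    for (int port = 0; port < DEMO_MAX_PORTS; port++) {
        int rc = demo_allocate_port(in_use, port);

        if (rc == 0)
            return port;     /* allocated */
        if (rc != -EBUSY)
            return rc;       /* real error: propagate, do not treat as success */
        /* -EBUSY: try the next port */
    }
    return -ENOSPC;
}
```

The point of the sketch is the middle branch: only a return value of 0 means the port was allocated, and anything other than -EBUSY ends the search with that error.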
Jan Beulich [Mon, 6 Jul 2020 15:14:24 +0000 (17:14 +0200)]
x86emul: fix FXRSTOR test for most AMD CPUs
AMD CPUs that we classify as X86_BUG_FPU_PTRS don't touch the selector/
offset portion of the save image during FXSAVE unless an unmasked
exception is pending. Hence the selector zapping done between the
initial FXSAVE and the emulated FXRSTOR needs to be mirrored onto the
second FXSAVE, output of which gets fed into memcmp() to compare with
the input image.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Wei Liu [Fri, 3 Jul 2020 20:10:01 +0000 (20:10 +0000)]
kdd: fix build again
Restore Tim's patch. The one that was committed was recreated by me
because git didn't accept my saved copy. I made some mistakes while
recreating that patch and here we are.
Fixes: 3471cafbdda3 ("kdd: stop using [0] arrays to access packet contents") Reported-by: Michael Young <m.a.young@durham.ac.uk> Signed-off-by: Wei Liu <wl@xen.org> Reviewed-by: Tim Deegan <tim@xen.org> Release-acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Thu, 2 Jul 2020 09:11:40 +0000 (11:11 +0200)]
build: tweak variable exporting for make 3.82
While I've been running into an issue here only because of an additional
local change I'm carrying, to be able to override just the compiler in
$(XEN_ROOT)/.config (rather than the whole tool chain), in
config/StdGNU.mk:
I'd like to propose to nevertheless correct the underlying issue:
Exporting an unset variable changes its origin from "undefined" to
"file". This comes into effect because of our adding of -rR to
MAKEFLAGS, which make 3.82 wrongly applies also upon re-invoking itself
after having updated auto.conf{,.cmd}.
Move the export statement past $(XEN_ROOT)/config/$(XEN_OS).mk inclusion
(which happens through $(XEN_ROOT)/Config.mk) such that the variables
already have their designated values at that point, while retaining
their initial origin up to the point they get defined.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Bertrand Marquis <bertrand.marquis@arm.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Commit e9aca9470ed86 introduced a regression when avoiding sending
IPIs for certain flush operations. Xen page fault handler
(spurious_page_fault) relies on blocking interrupts in order to
prevent handling TLB flush IPIs and thus preventing other CPUs from
removing page tables pages. Switching to assisted flushing avoided such
IPIs, and thus can result in pages belonging to the page tables being
removed (and possibly re-used) while __page_fault_type is being
executed.
Force some of the TLB flushes to use IPIs, thus avoiding the assisted
TLB flush. Those selected flushes are the page type change (when
switching from a page table type to a different one, ie: a page that
has been removed as a page table) and page allocation. This sadly has
a negative performance impact on the pvshim, as fewer assisted flushes
can be used. Note the flush in grant-table code is also switched to
use an IPI even when not strictly needed. This is done so that a
common arch_flush_tlb_mask can be introduced and always used in common
code.
Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
using an IPI (x86 only). Note that the flag is only meaningfully defined
when the hypervisor supports PV or shadow paging mode, as otherwise
hardware assisted paging domains are in charge of their page tables and
won't share page tables with Xen, thus not influencing the result of
page walks performed by the spurious fault handler.
Just passing this new flag when calling flush_area_mask prevents the
usage of the assisted flush without any other side effects.
Note the flag is not defined on Arm.
Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available') Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Release-acked-by: Paul Durrant <paul@xen.org>
Trusted Applications use a popular approach to determine the required
size of a buffer: the client provides a memory reference with the NULL
pointer to a buffer. This is the so-called "Null memory reference". TA
updates the reference with the required size and returns it back to the
client. Then the client allocates a buffer of the needed size and
repeats the operation.
This behavior is described in TEE Client API Specification, paragraph
3.2.5. Memory References.
OP-TEE represents this null memory reference as a TMEM parameter with
buf_ptr = 0x0. This is the only case when we should allow a TMEM
buffer without the OPTEE_MSG_ATTR_NONCONTIG flag. This is also the
special case for a buffer with the OPTEE_MSG_ATTR_NONCONTIG flag.
This could lead to a potential issue, because IPA 0x0 is a valid
address, but OP-TEE will treat it as a special case. So, care should
be taken when constructing an OP-TEE enabled guest to make sure that such
a guest has no memory at IPA 0x0 and none of its memory is mapped at PA
0x0.
optee: immediately free buffers that are released by OP-TEE
Normal World can share a buffer with OP-TEE for two reasons:
1. A client application wants to exchange data with TA
2. OP-TEE asks for shared buffer for internal needs
The second case was handled more strictly than necessary:
1. In RPC request OP-TEE asks for buffer
2. NW allocates buffer and provides it via RPC response
3. Xen pins pages and translates data
4. Xen provides buffer to OP-TEE
5. OP-TEE uses it
6. OP-TEE sends request to free the buffer
7. NW frees the buffer and sends the RPC response
8. Xen unpins pages and forgets about the buffer
The problem is that Xen should forget about buffer in between stages 6
and 7. I.e. the right flow should be like this:
6. OP-TEE sends request to free the buffer
7. Xen unpins pages and forgets about the buffer
8. NW frees the buffer and sends the RPC response
This is because OP-TEE internally frees the buffer before sending the
"free SHM buffer" request. So we have no reason to hold reference for
this buffer anymore. Moreover, in multiprocessor systems NW has time
to reuse the buffer cookie for another buffer. Xen complained about this
and denied the new buffer registration. I have seen this issue while
running tests on iMX SoC.
So, this patch basically corrects that behavior by freeing the buffer
earlier, when handling RPC return from OP-TEE.
Andrew Cooper [Wed, 1 Jul 2020 11:39:59 +0000 (12:39 +0100)]
x86/spec-ctrl: Protect against CALL/JMP straight-line speculation
Some x86 CPUs speculatively execute beyond indirect CALL/JMP instructions.
With CONFIG_INDIRECT_THUNK / Retpolines, indirect CALL/JMP instructions are
converted to direct CALL/JMP's to __x86_indirect_thunk_REG(), leaving just a
handful of indirect JMPs implementing those stubs.
There is no architectural execution beyond an indirect JMP, so use INT3 as
recommended by vendors to halt speculative execution. This is shorter than
LFENCE (which would also work fine), but also shows up in logs if we do
unexpectedly execute them.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Roger Pau Monné [Mon, 29 Jun 2020 16:03:49 +0000 (18:03 +0200)]
mm: fix public declaration of struct xen_mem_acquire_resource
XENMEM_acquire_resource and its related structure are currently inside
a __XEN__ or __XEN_TOOLS__ guarded section to limit their scope to the
hypervisor or the toolstack only. This is wrong as the hypercall is
already being used by the Linux kernel at least, and as such needs to
be public.
Also switch the usage of uint64_aligned_t to plain uint64_t, as
uint64_aligned_t is only to be used by the toolstack. Doing such
change will reduce the size of the structure on 32bit x86 by 4 bytes,
since there will be no padding added after the frame_list handle.
This is fine, as users of the previous layout will allocate 4 bytes of
padding that won't be read by Xen, and users of the new layout won't
allocate those, which is also fine since Xen won't try to access them.
Note that the structure already has compat handling, and such handling
will take care of copying the right size (ie: minus the padding) when
called from a 32bit x86 context. This is true for the compat code both
before and after this patch, since the structures in the memory.h
compat header are subject to a pragma pack(4), which already removed
the trailing padding that would otherwise be introduced by the
alignment of the frame field to 8 bytes.
Fixes: 3f8f12281dd20 ('x86/mm: add HYPERVISOR_memory_op to acquire guest resources') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Fri, 26 Jun 2020 14:35:27 +0000 (15:35 +0100)]
changelog: Add notes about CET and Migration changes
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Mon, 8 Jun 2020 17:47:58 +0000 (18:47 +0100)]
x86/livepatch: Make livepatching compatible with CET Shadow Stacks
Just like the alternatives infrastructure, the livepatch infrastructure
disables CR0.WP to perform patching, which is not permitted with CET active.
Modify arch_livepatch_{quiesce,revive}() to disable CET before disabling WP,
and reset the dirty bits on all virtual regions before re-enabling CET.
One complication is that arch_livepatch_revive() has to fix up the top of the
shadow stack. This depends on the functions not being inlined, even under
LTO. Another limitation is that reset_virtual_region_perms() may shatter the
final superpage of .text depending on alignment.
This logic, and its downsides, are temporary until the patching infrastructure
can be adjusted to not use CR0.WP.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Fri, 19 Jun 2020 11:14:32 +0000 (12:14 +0100)]
x86/msr: Disallow access to Processor Trace MSRs
We do not expose the feature to guests, so should disallow access to the
respective MSRs. For simplicity, drop the entire block of MSRs, not just the
subset which have been specified thus far.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wl@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
input_fd & output_fd may be the same FD. In that case, mark both as -1
when closing one. That avoids a dangling FD reference.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
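The shared-FD case from the commit above might look like the following sketch. `struct demo_state` and `demo_close_input()` are hypothetical; only the field names `input_fd` and `output_fd` come from the commit message.

```c
#include <unistd.h>

/* Hypothetical state holder; field names follow the commit message. */
struct demo_state {
    int input_fd;
    int output_fd;
};

/* Close the input side; if output shares the descriptor, invalidate it too. */
static void demo_close_input(struct demo_state *s)
{
    if (s->input_fd == -1)
        return;               /* already closed */

    close(s->input_fd);
    if (s->output_fd == s->input_fd)
        s->output_fd = -1;    /* same FD: avoid a dangling reference */
    s->input_fd = -1;
}
```

When the two fields hold different descriptors, only `input_fd` is reset and `output_fd` is left untouched.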
Jason Andryuk [Thu, 11 Jun 2020 03:29:35 +0000 (23:29 -0400)]
vchan-socket-proxy: Cleanup resources on exit
Close open FDs and close the vchan connection when exiting the program.
This addresses some Coverity findings about leaking file descriptors.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Thu, 11 Jun 2020 03:29:34 +0000 (23:29 -0400)]
vchan-socket-proxy: Set closed FDs to -1
These FDs are closed, so set them to -1 so they are no longer valid.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Thu, 11 Jun 2020 03:29:33 +0000 (23:29 -0400)]
vchan-socket-proxy: Switch data_loop() to take state
Switch data_loop to take a pointer to vchan_proxy_state.
No functional change.
This removes a dead store to input_fd identified by Coverity.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Thu, 11 Jun 2020 03:29:32 +0000 (23:29 -0400)]
vchan-socket-proxy: Use a struct to store state
Use a struct to group the vchan ctrl and FDs. This will facilitate
tracking the state of open and closed FDs and ctrl in data_loop().
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Thu, 11 Jun 2020 03:29:31 +0000 (23:29 -0400)]
vchan-socket-proxy: Unify main return value
Introduce 'ret' for main's return value and remove direct returns. This
is in preparation for a unified exit path with resource cleanup.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Thu, 11 Jun 2020 03:29:30 +0000 (23:29 -0400)]
vchan-socket-proxy: Check xs_watch return value
Check the return value of xs_watch and error out on failure.
This was found by Citrix's Coverity.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Thu, 11 Jun 2020 03:29:29 +0000 (23:29 -0400)]
vchan-socket-proxy: Move perror() into connect_socket
errno is reset by subsequent system & library calls, so it may be
inaccurate by the time connect_socket returns. Call perror immediately
after failing system calls to print the proper message.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Thu, 11 Jun 2020 03:29:28 +0000 (23:29 -0400)]
vchan-socket-proxy: Move perror() into listen_socket
The use of perror on the return from listen_socket can produce
misleading results like:
UNIX socket path "/tmp/aa....aa" too long (156 >= 108)
listen socket: Success
errno is reset by subsequent system & library calls, so it may be
inaccurate by the time listen_socket returns. Call perror immediately
after failing system calls to print the proper message.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
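The errno pattern behind the two perror() commits above, sketched with a stand-in system call. `demo_open_reporting()` is a hypothetical helper, and open() is used here in place of the socket calls purely to keep the example short.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Hypothetical helper: open a path read-only and report a failure at the
 * point where errno still describes that failure.
 * Returns the FD, or -1 with *saved_errno set to the failure code.
 */
static int demo_open_reporting(const char *path, int *saved_errno)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0) {
        *saved_errno = errno;  /* capture before any other call can change it */
        perror("open");        /* report immediately, next to the failing call */
        return -1;
    }

    *saved_errno = 0;
    return fd;
}
```

Reporting inside the helper means the caller no longer has to rely on errno surviving whatever cleanup the helper performs before returning.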
Check the socket path length to ensure sun_path is NUL terminated.
This was spotted by Citrix's Coverity.
Also use strcpy to avoid a warning "'__builtin_strncpy' specified bound
108 equals destination size [-Werror=stringop-truncation]" flagged by
gcc 10.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Release-acked-by: Paul Durrant <paul@xen.org>
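A sketch of the path length check described in the commit above. `demo_fill_sockaddr()` is a hypothetical helper name; the use of `struct sockaddr_un`, `sun_path` and strcpy after an explicit length check follows the commit message.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill in a UNIX socket address. Returns 0 on success, or -1 if the path
 * plus its terminating NUL does not fit in sun_path.
 */
static int demo_fill_sockaddr(struct sockaddr_un *addr, const char *path)
{
    size_t len = strlen(path);

    if (len >= sizeof(addr->sun_path))
        return -1;                /* would leave sun_path unterminated */

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path); /* safe: length was checked above */
    return 0;
}
```

Because the length is validated first, strcpy cannot overflow, and the strncpy call that triggers the stringop-truncation warning is no longer needed.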
Grzegorz Uriasz [Sun, 14 Jun 2020 16:17:08 +0000 (16:17 +0000)]
libxl: tooling expects wrong errno
When iommu is not enabled for a given domain then pci passthrough
hypercalls such as xc_test_assign_device return EOPNOTSUPP.
The code responsible for this is in "iommu_do_domctl" inside
xen/drivers/passthrough/iommu.c
This patch fixes the error message reported by libxl when assigning
pci devices to domains without iommu.
Signed-off-by: Grzegorz Uriasz <gorbak25@gmail.com> Tested-by: Grzegorz Uriasz <gorbak25@gmail.com>
Backport: 4.13 Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Tim Deegan [Fri, 26 Jun 2020 10:40:44 +0000 (10:40 +0000)]
kdd: stop using [0] arrays to access packet contents
GCC 10 is unhappy about this, and we already use 64k buffers
in the only places where packets are allocated, so move the
64k size into the packet definition.
Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: Wei Liu <wl@xen.org> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Wed, 10 Jun 2020 11:40:04 +0000 (12:40 +0100)]
tools: fix error path of xendevicemodel_open()
c/s 6902cb00e03 "tools/libxendevicemodel: extract functions and add a compat
layer" introduced calls to both xencall_open() and osdep_xendevicemodel_open()
but failed to fix up the error path.
c/s f68c7c618a3 "libs/devicemodel: free xencall handle in error path in
_open()" fixed up the xencall_open() aspect of the error path (missing the
osdep_xendevicemodel_open() aspect), but positioned the xencall_close()
incorrectly, creating the same pattern proved to be problematic by c/s
30a72f02870 "tools: fix error path of xenhypfs_open()".
Reposition xtl_logger_destroy(), and introduce the missing
osdep_xendevicemodel_close().
Fixes: 6902cb00e03 ("tools/libxendevicemodel: extract functions and add a compat layer") Fixes: f68c7c618a3 ("libs/devicemodel: free xencall handle in error path in _open()")
Backport: 4.9+ Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Juergen Gross <jgross@suse.com> Release-acked-by: Paul Durrant <paul@xen.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Thu, 25 Jun 2020 15:16:02 +0000 (17:16 +0200)]
scripts: don't rely on "stat -" support
While commit b72682c602b8 ("scripts: Use stat to check lock claim")
validly indicates that stat has gained support for the special "-"
command line option in 2009, we should still try to avoid breaking being
able to run on even older distros. As it has been determined, contrary to
the comment in the script, using /dev/stdin (/proc/self/fd/$_lockfd) is
fine here, as Linux specially treats these /proc inodes.
Suggested-by: Ian Jackson <ian.jackson@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Tested-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Thu, 25 Jun 2020 07:12:21 +0000 (09:12 +0200)]
x86/CPUID: fill all fields in x86_cpuid_policy_fill_native()
Coverity validly complains that the new call from
tools/tests/cpu-policy/test-cpu-policy.c:test_cpuid_current() leaves
two fields uninitialized, yet they get then consumed by
x86_cpuid_copy_to_buffer(). (All other present callers of the function
pass a pointer to a static - and hence initialized - buffer.)
Coverity-ID: 1464809 Fixes: c22ced93e167 ("tests/cpu-policy: Confirm that CPUID serialisation is sorted") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Grzegorz Uriasz [Thu, 25 Jun 2020 07:11:09 +0000 (09:11 +0200)]
x86/acpi: use FADT flags to determine the PMTMR width
On some computers the bit width of the PM Timer as reported
by ACPI is 32 bits when in fact the FADT flags report correctly
that the timer is 24 bits wide. On affected machines such as the
ASUS FX504GM and newer gaming laptops this results in the inability
to resume the machine from suspend. Without this patch suspend is
broken on affected machines and even if a machine manages to resume
correctly then the kernel time and xen timers are trashed.
Signed-off-by: Grzegorz Uriasz <gorbak25@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
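A rough sketch of deriving the PM timer width from the FADT flags, as the commit above describes. The helper names are hypothetical and this is not the Xen code; the only assumption taken from the ACPI specification is that the TMR_VAL_EXT fixed feature flag (bit 8) indicates a 32-bit timer, with 24 bits otherwise.

```c
#include <stdint.h>

/* FADT fixed feature flag TMR_VAL_EXT (bit 8): set for a 32-bit PM timer. */
#define DEMO_FADT_TMR_VAL_EXT (UINT32_C(1) << 8)

/* Pick the counter mask from the FADT flags, not from a reported bit width. */
static uint32_t demo_pmtmr_mask(uint32_t fadt_flags)
{
    return (fadt_flags & DEMO_FADT_TMR_VAL_EXT) ? UINT32_C(0xffffffff)
                                                : UINT32_C(0x00ffffff);
}

/* Ticks elapsed between two raw reads, allowing for one counter wrap. */
static uint32_t demo_pmtmr_delta(uint32_t prev, uint32_t now, uint32_t mask)
{
    return (now - prev) & mask;
}
```

With a 24-bit mask, a read of 0x000010 following a read of 0xfffff0 yields a delta of 0x20 ticks. Treating the same counter as 32 bits wide would instead produce a huge bogus delta, which is the kind of time corruption the commit message mentions.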
These files are in tree so that people can build (including from git)
without needing less-than-a-decade-old flex and bison.
We should update them periodically. Debian buster has been Debian
stable for a while. Our CI is running buster.
There should be no significant functional change; it's possible that
there are bugfixes but I have not reviewed the changes. I *have*
checked that the flex I am using has the fix for CVE-2016-6354.
CC: Paul Durrant <paul@xen.org> CC: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Ian Jackson [Fri, 12 Jun 2020 14:31:06 +0000 (15:31 +0100)]
tools: Commit autoconf output from Debian buster
These files are in tree so that people can build (including from git)
without needing recent autotools.
We should update them periodically. Debian buster has been Debian
stable for a while. Our CI is running buster.
There should be no significant functional change; it's possible that
there are bugfixes to the configure scripts but I have not reviewed
them.
These files were last changed in 83c845033dc8bb3a35ae245effb7832b6823174a
libxl: use vchan for QMP access with Linux stubdomain
where a new feature was added. However, that commit contains a lot of
extraneous noise in configure compared to its parent.
Compared to 83c845033dc8bb3a35ae245effb7832b6823174a~, this commit
restores those extraneous changes, leaving precisely the correct
changes. So one way of looking at the changes we are making now, is
that we are undoing accidental changes to the autoconf output.
I used Debian's autoconf 2.69-11 on amd64.
CC: Wei Liu <wl@xen.org> CC: Nick Rosbrook <rosbrookn@gmail.com> Reported-by: Nick Rosbrook <rosbrookn@gmail.com> CC: Paul Durrant <paul@xen.org> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Tamas K Lengyel [Fri, 19 Jun 2020 13:24:55 +0000 (15:24 +0200)]
x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE
While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
observed due to a mm-lock order violation while copying the HVM CPU context
from the parent. This issue has been identified to be due to
hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
call also creates a shared entry in the fork's memory map for the cr3 gfn. The
function later calls hap_update_cr3 while holding the paging_lock, which
results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
the above entry when it grabs the page with the P2M_UNSHARE flag set.
Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
the p2m is properly populated.
Note that the lock order violation is avoided because before the paging_lock is
taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
which don't take the paging_lock and that have no previous lookup. Currently no
other code-path exists leading there with the paging_lock taken, thus no
further adjustments are necessary.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Release-acked-by: Paul Durrant <paul@xen.org>
Roger Pau Monné [Fri, 19 Jun 2020 13:23:50 +0000 (15:23 +0200)]
x86/hvm: check against VIOAPIC_LEVEL_TRIG in hvm_gsi_deassert
In order to avoid relying on the specific values of
VIOAPIC_{LEVEL/EDGE}_TRIG.
No functional change.
Requested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Olaf Hering [Wed, 17 Jun 2020 06:08:41 +0000 (07:08 +0100)]
stubdom/vtpm: add extern to function declarations
Code compiled with gcc10 will not link properly due to multiple definition of the same function.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Samuel Thibault <samuel.thibaut@ens-lyon.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Wed, 17 Jun 2020 02:36:42 +0000 (03:36 +0100)]
xl: Allow shutdown wait for domain death
`xl shutdown -w` waits for the first of either domain shutdown or death.
Shutdown is the halting of the guest operating system, and death is the
freeing of domain resources.
Allow specifying -w multiple times to wait for only domain death. This
is useful in scripts so that all resources are free before the script
continues.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Igor Druzhinin [Wed, 17 Jun 2020 02:19:13 +0000 (03:19 +0100)]
tools/xen-ucode: return correct exit code on failed microcode update
Otherwise it's difficult to know whether the operation failed when running
under automation.
While at it, also switch to returning 1 and 2 instead of errno to avoid
incompatibilities between errno and special exit code numbers.
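To illustrate why small fixed exit codes are safer than a raw errno, here is a
Python sketch. It is not the xen-ucode source: the stage names, and which
failure maps to 1 or 2, are assumptions made for the example.

```python
# Hypothetical mapping: the commit only says "1 and 2", not which failure
# gets which value.
EXIT_OPEN_FAILED = 1
EXIT_UPDATE_FAILED = 2

# Exit statuses that POSIX shells give a special meaning to:
# 126 = found but not executable, 127 = command not found.
SHELL_RESERVED = {126, 127}


def exit_code(stage_failed):
    """Return a fixed exit code for the failing stage, or 0 on success."""
    if stage_failed is None:
        return 0
    return {"open": EXIT_OPEN_FAILED, "update": EXIT_UPDATE_FAILED}[stage_failed]


def errno_is_ambiguous(err):
    """True if using err directly as an exit status would be misleading."""
    status = err & 0xFF  # the parent process only sees the low 8 bits
    # 0 looks like success; values above 128 conventionally mean "killed by
    # a signal" when reported by a shell.
    return status == 0 or status in SHELL_RESERVED or status > 128
```

A script driving the tool can then test for exactly 0, 1 or 2 without having
to know the platform's errno table.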
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Release-acked-by: Paul Durrant <paul@xen.org> Reviewed-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Andrew Cooper [Fri, 12 Jun 2020 12:39:13 +0000 (13:39 +0100)]
x86/spec-ctrl: Hide RDRAND by default on IvyBridge client
To combat the absence of mitigating microcode, arrange to hide RDRAND by
default on IvyBridge client hardware.
Adjust the default feature derivation to hide RDRAND on IvyBridge client
parts, unless `cpuid=rdrand` is explicitly provided.
Adjust the restore path in xc_cpuid_apply_policy() to not hide RDRAND from VMs
which migrated from pre-4.14.
In all cases, individual guests can continue using RDRAND if explicitly
enabled in their config files.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Mon, 15 Jun 2020 12:42:11 +0000 (13:42 +0100)]
x86/cpuid: Introduce missing feature adjustment in calculate_pv_def_policy()
This was an accidental asymmetry with the HVM side.
No change in behaviour at this point.
Fixes: 83b387382 ("x86/cpuid: Introduce and use default CPUID policies") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Mon, 24 Feb 2020 17:15:56 +0000 (17:15 +0000)]
x86/hvm: Disable MPX by default
Memory Protection eXtension support has been dropped from GCC and Linux, and
will be dropped from future Intel CPUs.
With all other default/max pieces in place, move MPX from default to max.
This means that VMs won't be offered it by default, but can explicitly opt
into using it via cpuid="host,mpx=1" in their vm.cfg file.
Adjust the legacy restore path in libxc to cope safely with pre-4.14 VMs.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Tue, 25 Feb 2020 15:33:31 +0000 (15:33 +0000)]
x86/gen-cpuid: Distinguish default vs max in feature annotations
The toolstack logic can now correctly distinguish a clean boot from a
migrate/restore.
Allow lowercase a/s/h to be used to annotate a non-default feature.
Due to the emulator work prepared earlier in 4.14, this now allows VMs to
explicitly opt in to the TSXLDTRK, MOVDIR{I,64B} and SERIALIZE instructions via
their xl.cfg file, rather than getting them as a matter of default.
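The default-vs-max distinction can be sketched in Python as follows. This is
not the gen-cpuid script itself: the function name is hypothetical, and the
meaning given to each letter (A = all guests, S = all HVM guests, H = HVM
guests using HAP) is an assumption stated in the comments.

```python
# Assumed meaning of the annotation letters: A = all guests,
# S = all HVM guests, H = HVM guests using HAP only.
SCOPES = {"a": "all", "s": "hvm", "h": "hap"}


def parse_annotation(letter):
    """Return (scope, in_default) for a single annotation letter."""
    scope = SCOPES[letter.lower()]
    # Upper case: offered by default. Lower case: present in the max policy
    # only, so a VM has to opt in explicitly.
    return scope, letter.isupper()
```

With this reading, a feature annotated "a" is visible to all guest types that
ask for it, but is never handed out on a clean boot.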
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Fri, 12 Jun 2020 13:07:10 +0000 (14:07 +0100)]
tools/libx[cl]: Plumb bool restore down into xc_cpuid_apply_policy()
In order to safely disable some features by default, without breaking
migration from 4.13 or older, the CPUID logic needs to distinguish the two
cases.
Plumb a restore boolean down from the two callers of libxl__cpuid_legacy() all
the way down into xc_cpuid_apply_policy().
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Fri, 12 Jun 2020 13:07:10 +0000 (14:07 +0100)]
tools/libx[cl]: Merge xc_cpuid_set() into xc_cpuid_apply_policy()
This reduces the number of CPUID handling entry-points to just one.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Fri, 12 Jun 2020 13:07:10 +0000 (14:07 +0100)]
tools/libx[cl]: Move processing loop down into xc_cpuid_set()
Currently, libxl__cpuid_legacy() passes each element of the policy list to
xc_cpuid_set() individually. This is wasteful both in terms of the number of
hypercalls made, and the quantity of repeated merging/auditing work performed
by Xen.
Move the loop processing down into xc_cpuid_set(), which allows us to do one
set of hypercalls, rather than one per list entry.
In xc_cpuid_set(), obtain the full host, guest max and current policies to
begin with, and loop over the xend array, processing one leaf at a time.
Replace the linear search with a binary search, seeing as the serialised
leaves are sorted.
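A minimal Python sketch of the lookup described above, assuming each
serialised leaf is a (leaf, subleaf, regs) tuple and the list is sorted by
(leaf, subleaf). The tuple layout and the name find_leaf are illustrative,
not the libxc data structures.

```python
def find_leaf(leaves, leaf, subleaf):
    """Binary-search a list of (leaf, subleaf, regs) sorted by (leaf, subleaf)."""
    target = (leaf, subleaf)
    lo, hi = 0, len(leaves)
    while lo < hi:
        mid = (lo + hi) // 2
        key = leaves[mid][:2]
        if key == target:
            return leaves[mid]
        if key < target:
            lo = mid + 1
        else:
            hi = mid
    return None  # no such leaf in the policy


# Example data; the register contents are placeholders.
sample = [
    (0x00000000, 0, "a"),
    (0x00000001, 0, "b"),
    (0x00000007, 0, "c"),
    (0x00000007, 1, "d"),
    (0x80000000, 0, "e"),
]
```

Each lookup costs O(log n) comparisons instead of a scan of the whole array,
which only works because the serialised order is guaranteed.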
No change in behaviour from the guest's point of view.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Fri, 12 Jun 2020 15:48:02 +0000 (16:48 +0100)]
tests/cpu-policy: Confirm that CPUID serialisation is sorted
The existing x86_cpuid_copy_to_buffer() does produce sorted results, and we're
about to start relying on this. Extend the unit tests.
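The property being tested can be expressed as a short Python check. This is a
sketch rather than the unit test itself: it assumes each entry starts with
(leaf, subleaf), and that "sorted" means strictly increasing with no duplicate
keys.

```python
def is_strictly_sorted(leaves):
    """True if (leaf, subleaf) keys strictly increase across the list."""
    keys = [(leaf, subleaf) for leaf, subleaf, *_ in leaves]
    # Compare each key with its successor; an empty or single-entry list
    # is trivially sorted.
    return all(a < b for a, b in zip(keys, keys[1:]))
```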
As test_cpuid_serialise_success() is a fairly limited set of synthetic
examples right now, introduce test_cpuid_current() to operate on the full
policy for the current CPU.
Tweak the fail() macro to allow for simplified control flow.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Fri, 12 Jun 2020 13:05:44 +0000 (14:05 +0100)]
tools/libx[cl]: Introduce struct xc_xend_cpuid for xc_cpuid_set()
In order to combine the functionality of xc_cpuid_set() with
xc_cpuid_apply_policy(), arrange to pass the data in a single contained
struct, rather than two arrays.
libxl__cpuid_policy is the ideal structure to use, but that would introduce a
reverse dependency between libxc and libxl. Introduce xc_xend_cpuid (with a
transparent union to provide more useful names for the inputs), and use this
structure in libxl.
The public API has libxl_cpuid_policy as an opaque type referencing
libxl__cpuid_policy. Drop the inappropriate comment about its internals, and
use xc_xend_cpuid as a differently named opaque backing object. Users of both
libxl and libxc are not permitted to look at the internals.
No change in behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Bertrand Marquis [Tue, 16 Jun 2020 08:31:26 +0000 (10:31 +0200)]
x86/boot: use BASEDIR for include path
Use $(BASEDIR)/include instead of $(XEN_ROOT)/xen/include for the
include path to be coherent with the rest of the Makefiles.
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jason Andryuk [Tue, 16 Jun 2020 08:31:08 +0000 (10:31 +0200)]
libacpi: widen TPM detection
The hardcoded tpm_signature is too restrictive to detect many TPMs. For
instance, it doesn't accept a QEMU emulated TPM (VID 0x1014 DID 0x0001).
Make the TPM detection match that in rombios which accepts a wider
range.
With this change, the TPM's TCPA ACPI table is generated and the guest
OS can automatically load the tpm_tis driver. It also allows seabios to
detect and use the TPM. However, seabios skips some TPM initialization
when running under Xen, so it will not populate any PCRs unless modified
to run the initialization under Xen.
Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Tamas K Lengyel [Tue, 16 Jun 2020 08:30:48 +0000 (10:30 +0200)]
libxc: xc_memshr_fork with interrupts blocked
Toolstack side for creating forks with interrupt injection blocked.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wl@xen.org> Release-acked-by: Paul Durrant <paul@xen.org>
Tamas K Lengyel [Tue, 16 Jun 2020 08:29:16 +0000 (10:29 +0200)]
x86/mem_sharing: block interrupt injection for forks
When running VM forks without device models (QEMU), it may
be undesirable for Xen to inject interrupts. When creating such forks from
Windows VMs we have observed the kernel trying to process interrupts
immediately after the fork is executed. However without QEMU running such
interrupt handling may not be possible because it may attempt to interact with
devices that are not emulated by a backend. In the best case scenario such
interrupt handling would only present a detour in the VM forks' execution
flow, but in the worst case, as we actually observed, it can completely stall it.
By disabling interrupt injection a fuzzer can exercise the target code without
interference. For other use-cases this option probably doesn't make sense,
that's why this is not enabled by default.
Forks & memory sharing are only available on Intel CPUs so this only applies
to vmx. Note that this is part of the experimental VM forking feature that's
completely disabled by default and can only be enabled by using
XEN_CONFIG_EXPERT during compile time.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Wei Liu <wl@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Release-acked-by: Paul Durrant <paul@xen.org>
Nick Rosbrook [Mon, 15 Jun 2020 15:39:42 +0000 (11:39 -0400)]
golang/xenlight: sort cases in switch statement
The xenlight_golang_union_from_C function iterates over a dict to
construct a switch statement that marshals a C keyed union into a Go
type. Because python does not guarantee dict ordering across all
versions, this can result in the switch statement being generated in a
different order depending on the version of python used. For example,
running gengotypes.py with python2.7 and python3.6 will yield different
orderings.
Iterate over sorted(cases.items()) rather than cases.items() to fix
this.
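The effect of the change can be shown with a small Python sketch. The function
emit_switch and the sample keys are hypothetical stand-ins, not code from
gengotypes.py.

```python
def emit_switch(cases):
    """Render one 'case' line per entry, always in key order."""
    lines = []
    # sorted() fixes the order regardless of how the dict was built or
    # which Python version is running.
    for key, go_type in sorted(cases.items()):
        lines.append("case {}: // {}".format(key, go_type))
    return "\n".join(lines)
```

Two dicts holding the same entries in a different insertion order now produce
byte-identical output, which keeps the generated file stable in git.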
This patch changes the ordering from what was previously checked-in, but
running gengotypes.py with different versions of python will now yield
the same result.
Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Nick Rosbrook [Fri, 12 Jun 2020 14:31:02 +0000 (15:31 +0100)]
tools: check go compiler version if present
Currently, no minimum go compiler version is required by the configure
scripts. However, the go bindings actually will not build with some
older versions of go. Add a check for a minimum go version of 1.11.1 in
accordance with tools/golang/xenlight/go.mod.
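The configure check itself is written in m4/shell; the comparison it has to
make can be sketched in Python as below. The function names are hypothetical,
and the format of the `go version` output is assumed to be
"go version goX.Y[.Z] os/arch".

```python
import re

MIN_GO = (1, 11, 1)


def parse_go_version(text):
    """Extract (major, minor, patch) from `go version` output, or None."""
    m = re.search(r"go(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        return None
    # A missing patch component (e.g. "go1.11") is treated as 0.
    return tuple(int(part or 0) for part in m.groups())


def go_is_new_enough(text, minimum=MIN_GO):
    """True if the reported version is at least the required minimum."""
    version = parse_go_version(text)
    return version is not None and version >= minimum
```

Comparing integer tuples rather than strings matters here: as strings,
"1.9.7" sorts after "1.11.1", which would wrongly accept an old compiler.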
Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Tested-by: Nick Rosbrook <rosbrookn@ainfosec.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Fri, 12 Jun 2020 10:55:19 +0000 (11:55 +0100)]
tools/libxc: Drop config_transformed parameter from xc_cpuid_set()
libxl is now the sole caller of xc_cpuid_set(). The config_transformed output
is ignored, and this patch trivially highlights the resulting memory leak.
"transformed" config is now properly forwarded on migrate as part of the
general VM state, so delete the transformation logic completely, rather than
trying to adjust just libxl to avoid leaking memory.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Release-acked-by: Paul Durrant <paul@xen.org> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Roger Pau Monne [Wed, 10 Jun 2020 14:29:23 +0000 (16:29 +0200)]
x86/passthrough: introduce a flag for GSIs not requiring an EOI or unmask
There's no need to set up a timer for GSIs that are edge triggered,
since those don't require any EOI or unmask, and hence couldn't block
other interrupts.
Note this is only used by PVH dom0, which can set up the passthrough of
edge triggered interrupts from the vIO-APIC. One example of such kind
of interrupt that can be used by a PVH dom0 would be the RTC timer.
While there introduce an out label to do the unlock and reduce code
duplication.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Roger Pau Monne [Wed, 10 Jun 2020 14:29:22 +0000 (16:29 +0200)]
x86/passthrough: do not assert edge triggered GSIs for PVH dom0
Edge triggered interrupts do not assert the line, so the handling done
in Xen should also avoid asserting it. Asserting the line prevents
further edge triggered interrupts on the same vIO-APIC pin from being
delivered, since the line is not de-asserted.
One case of such kind of interrupt is the RTC timer, which is edge
triggered and available to a PVH dom0. Note this should not affect
domUs, as it only modifies the behavior of IDENTITY_GSI kind of passed
through interrupts.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Release-acked-by: Paul Durrant <paul@xen.org>
Juergen Gross [Wed, 20 May 2020 08:35:01 +0000 (10:35 +0200)]
tools/libxengnttab: correct size of allocated memory
The size of the memory allocated for the IOCTL_GNTDEV_MAP_GRANT_REF
ioctl() parameters is calculated incorrectly, which results in too much
memory being allocated.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Juergen Gross [Tue, 9 Jun 2020 14:48:49 +0000 (16:48 +0200)]
tools: fix error path of xenhypfs_open()
In case of an error in xenhypfs_open() the error path will cause a
segmentation fault due to a wrong sequence of closing calls.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Fixes: 86234eafb9529 ("libs: add libxenhypfs") Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Paul Durrant <paul@xen.org> Release-acked-by: Paul Durrant <paul@xen.org> Acked-by: Wei Liu <wl@xen.org>
Ian Jackson [Tue, 9 Jun 2020 11:26:36 +0000 (12:26 +0100)]
docs-parse-support-md: Cope with buster's pandoc
Provide the implementation for newer pandoc json.
I have done an adhoc test and this now works on both buster and
stretch and seems to produce the expected support matrix when run
using the example rune (which processes unstable and 4.11).
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Ian Jackson [Tue, 9 Jun 2020 11:21:48 +0000 (12:21 +0100)]
docs-parse-support-md: Prepare for coping with pandoc versions
Different pandoc versions generate, and expect, a different toplevel
structure for their json output and input. Newer pandoc's toplevel
is a hash. We are going to want to support this. We can tell what
kind of output we should produce by looking at the input we got (which
itself came from pandoc). So:
* Make space for code to read toplevel objects which are not arrays.
Currently this code is absent and we just die explicitly (rather
than dying because we tried to use a hashref as an array ref).
* Move generation of the toplevel json structure out of
pandoc2html_inline, and abstract it away through a subref which is
set up when we read the input file.
This is just prep work. No functional change other than a change to
an error message.
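The idea of choosing the output shape from the input shape can be sketched in
Python (the script itself is Perl). The layouts assumed here are an older
two-element array of [meta, blocks] and a newer object with a "blocks" member;
the function name read_pandoc is hypothetical. Unlike this prep commit, the
sketch also fills in the non-array branch.

```python
import json


def read_pandoc(text):
    """Return (meta, blocks, wrap); wrap() rebuilds the same toplevel shape."""
    doc = json.loads(text)
    if isinstance(doc, list):
        # Assumed older layout: [meta, blocks]
        meta, blocks = doc
        return meta, blocks, lambda new_blocks: [meta, new_blocks]
    if isinstance(doc, dict):
        # Assumed newer layout: an object with a "blocks" member; all other
        # members are passed through unchanged.
        def wrap(new_blocks):
            out = dict(doc)
            out["blocks"] = new_blocks
            return out
        return doc.get("meta"), doc["blocks"], wrap
    raise ValueError("unexpected toplevel JSON type: " + type(doc).__name__)
```

The closure returned as wrap plays the role of the subref: code that
generates output calls it without needing to know which pandoc version
produced the input.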
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Paul Durrant [Tue, 9 Jun 2020 10:56:24 +0000 (12:56 +0200)]
ioreq: handle pending emulation racing with ioreq server destruction
When an emulation request is initiated in hvm_send_ioreq() the guest vcpu is
blocked on an event channel until that request is completed. If, however,
the emulator is killed whilst that emulation is pending then the ioreq
server may be destroyed. Thus when the vcpu is awoken the code in
handle_hvm_io_completion() will find no pending request to wait for, but will
leave the internal vcpu io_req.state set to IOREQ_READY and the vcpu shutdown
deferall flag in place (because hvm_io_assist() will never be called). The
emulation request is then completed anyway. This means that any subsequent call
to hvmemul_do_io() will find an unexpected value in io_req.state and will
return X86EMUL_UNHANDLEABLE, which in some cases will result in continuous
re-tries.
This patch fixes the issue by moving the setting of io_req.state and clearing
of shutdown deferral (as well as MSI-X write completion) out of hvm_io_assist()
and directly into handle_hvm_io_completion().
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Tue, 9 Jun 2020 10:55:53 +0000 (12:55 +0200)]
x86/Intel: insert Ice Lake and Comet Lake model numbers
Both match prior generation processors as far as LBR and C-state MSRs
go (SDM rev 072) as well as applicability of the if_pschange_mc erratum
(recent spec updates).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Tamas K Lengyel [Tue, 9 Jun 2020 10:54:17 +0000 (12:54 +0200)]
x86/monitor: revert default behavior when monitoring register write events
For the last couple of years we have received numerous reports from users of
monitor vm_events of spurious guest crashes when using events. In particular,
it has been observed that the problem occurs when vm_events are being disabled.
The nature of the guest crash varied widely and it only occurred occasionally.
This made debugging the issue particularly hard. We had discussions about this
issue even here on the xen-devel mailing list with no luck figuring it out.
The bug has now been identified as a race-condition between register event
handling and disabling the monitor vm_event interface. The default behavior
regarding emulation of register write events is changed so that they get
postponed until the corresponding vm_event handler decides whether to allow such
write to take place. Unfortunately this can only be implemented by performing the
deny/allow step when the vCPU gets scheduled.
Due to that postponed emulation of the event, if the user decides to pause the
VM in the vm_event handler and then disable events, the entire emulation step
is skipped the next time the vCPU is resumed. Even if the user doesn't pause
during the vm_event handling but exits immediately and disables vm_event, the
situation becomes racy as disabling vm_event may succeed before the guest's
vCPUs get scheduled with the pending emulation task. This has been particularly
the case with VMs that have several vCPUs as after the VM is unpaused it may
actually take a long time before all vCPUs get scheduled.
In this patch we are reverting the default behavior to always perform emulation
of register write events when the event occurs. Postponing them can be turned
on as an option. In that case the user of the interface still has to take care
of only disabling the interface when it's safe as it remains buggy.
Fixes: 96760e2fba10 ('vm_event: deny register writes if refused by vm_event
reply').
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Reviewed-by: Roger Pau Monné <rogerpau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Roger Pau Monné [Mon, 8 Jun 2020 16:13:53 +0000 (18:13 +0200)]
x86/rtc: provide mediated access to RTC for PVH dom0
Mediated access to the RTC was provided for PVHv1 dom0 using the PV
code paths (guest_io_{write/read}), but those accesses were never
implemented for PVHv2 dom0. This patch provides such mediated accesses
to the RTC for PVH dom0, just like it's provided for a classic PV
dom0.
Pull out some of the RTC logic from guest_io_{read/write} into
specific helpers that can be used by both PV and HVM guests. The
setup of the handlers for PVH is done in rtc_init, which is already
used to initialize the fully emulated RTC.
Without this a Linux PVH dom0 will read garbage when trying to access
the RTC, and one vCPU will be constantly looping in
rtc_timer_do_work.
Note that such issue doesn't happen on domUs because the ACPI
NO_CMOS_RTC flag is set in FADT, which prevents the OS from accessing
the RTC. Also the X86_EMU_RTC flag is not set for PVH dom0, as the
accesses are not emulated but rather forwarded to the physical
hardware.
No functional change expected for classic PV dom0.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Julien Grall [Sun, 7 Jun 2020 15:51:54 +0000 (16:51 +0100)]
xen/arm: mm: Access a PT entry before the table is unmapped
xen_pt_next_level() will retrieve the MFN from the entry right after the
page-table has been unmapped.
After calling xen_unmap_table(), there is no guarantee the mapping will
still be valid. Depending on the implementation, this may result in a
data abort in Xen.
Re-order the code to retrieve the MFN before the table is unmapped.
Fixes: 53abb9a1dcd9 ("xen/arm: mm: Rework Xen page-tables walk during update") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Release-acked-by: Paul Durrant <paul@xen.org>
Nick Rosbrook [Mon, 8 Jun 2020 16:10:39 +0000 (17:10 +0100)]
golang/xenlight: remove call to go fmt in gengotypes.py
Since the golang bindings are now set to be re-generated whenever a
change is made to tools/libxl/libxl_types.idl, the call to go fmt in
gengotypes.py results in a dirty git tree for users without go
installed.
As an immediate fix, just remove the call to go fmt from gengotypes.py.
While here, make sure the DO NOT EDIT comment and package declaration
remain formatted correctly. All other generated code is left
un-formatted for now.
Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Remove trailing whitespace.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Thu, 4 Jun 2020 20:39:37 +0000 (21:39 +0100)]
docs/support-matrix: unbreak docs rendering
The cronjob which renders https://xenbits.xen.org/docs/ has been broken for a
while. commitish_version() pulls an old version of xen/Makefile out of
history, and uses the xenversion rule.
Currently, this fails with:
tmp.support-matrix.xen.make:130: scripts/Kbuild.include: No such file or directory
which is because the Makefile legitimately references Kbuild.include with a
relative rather than absolute path.
Rework support-matrix-generate to use sed to extract the major/minor version,
rather than expecting xen/Makefile to be usable in a different tree.
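The approach of reading the version straight out of the Makefile text can be
sketched in Python (the actual fix uses sed). The variable names XEN_VERSION
and XEN_SUBVERSION, and the helper name, are assumptions made for this
example.

```python
import re


def xen_major_minor(makefile_text):
    """Return 'MAJOR.MINOR' from Makefile text, or None if either is missing."""
    found = {}
    for name in ("XEN_VERSION", "XEN_SUBVERSION"):
        # Match lines such as "export XEN_VERSION = 4" without running make.
        m = re.search(r"^\s*(?:export\s+)?%s\s*:?=\s*(\d+)\s*$" % name,
                      makefile_text, re.MULTILINE)
        if m is None:
            return None
        found[name] = m.group(1)
    return "{}.{}".format(found["XEN_VERSION"], found["XEN_SUBVERSION"])
```

Because nothing is executed, it no longer matters whether the Makefile's
includes can be resolved from the temporary location it was extracted to.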
Fixes: 945e80a7301f ("docs: Provide support-matrix-generate, to generate a support matrix in HTML")
Backport: 4.11+ Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 8 Jun 2020 08:25:40 +0000 (10:25 +0200)]
build: fix dependency tracking for preprocessed files
While the issue is more general, I noticed that asm-macros.i was not getting
re-generated as needed. This was due to its .*.d file mentioning
asm-macros.o instead of asm-macros.i. Use -MQ here as well, and while at
it also use -MQ to avoid the somewhat fragile sed-ary on the *.lds
dependency tracking files. While there, further avoid open-coding $(CPP)
and drop the bogus (Arm) / stale (x86) -Ui386.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com> Release-acked-by: Paul Durrant <paul@xen.org>
Igor Druzhinin [Fri, 5 Jun 2020 15:12:11 +0000 (17:12 +0200)]
x86/svm: do not try to handle recalc NPT faults immediately
A recalculation NPT fault doesn't always require additional handling
in hvm_hap_nested_page_fault(). Moreover, in the general case, if no
explicit handling is done there, the fault is wrongly considered fatal.
This covers a specific case of migration with vGPU assigned which
uses direct MMIO mappings made by XEN_DOMCTL_memory_mapping hypercall:
at a moment log-dirty is enabled globally, recalculation is requested
for the whole guest memory including those mapped MMIO regions
which causes a page fault being raised at the first access to them;
but due to MMIO P2M type not having any explicit handling in
hvm_hap_nested_page_fault() a domain is erroneously crashed with unhandled
SVM violation.
Instead of trying to be opportunistic - use safer approach and handle
P2M recalculation in a separate NPT fault by attempting to retry after
making the necessary adjustments. This is aligned with Intel behavior
where there are separate VMEXITs for recalculation and EPT violations
(faults) and only faults are handled in hvm_hap_nested_page_fault().
Do it by also unifying do_recalc return code with Intel implementation
where returning 1 means P2M was actually changed.
Since there was no case previously where p2m_pt_handle_deferred_changes()
could return a positive value - it's safe to replace ">= 0" with just "== 0"
in VMEXIT_NPF handler. finish_type_change() is also not affected by the
change as being able to deal with >0 return value of p2m->recalc from
EPT implementation.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Wei Liu [Fri, 5 Jun 2020 11:37:25 +0000 (12:37 +0100)]
libs/hypfs: use correct zlib name in pc file
Its name is "zlib" not "z".
Reported-by: Olaf Hering <olaf@aepfle.de> Fixes: 86234eafb952 ("libs: add libxenhypfs") Signed-off-by: Wei Liu <wl@xen.org> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Andrew Cooper [Wed, 3 Jun 2020 15:56:03 +0000 (16:56 +0100)]
x86/shim: Fix defconfig selection and trim the build further
Several options (TBOOT, XENOPROF, Scheduler) depend on EXPERT to be able to
deselect/configure.
Enabling EXPERT now causes the request of the Credit1 scheduler to be honoured
(rather than giving us Credit2), but take this opportunity to switch to Null,
as the previously problematic issues are now believed to be fixed.
Enabling EXPERT also allows XEN_SHSTK to be selected, and we don't want this
being built for shim. We also don't want TRACEBUFFER or GDBSX either.
Take this opportunity to swap the disable of HVM_FEP for a general disable of
HVM (likely to have wider implications in the future), and disable ARGO (will
necessarily need plumbing work to function in shim).
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Juergen Gross [Wed, 3 Jun 2020 11:28:07 +0000 (13:28 +0200)]
fix build with CONFIG_HYPFS_CONFIG enabled
Commit 58263ed7713e ("xen: add /buildinfo/config entry to hypervisor
filesystem") added a dependency to .config, but the hypervisor's build
config could have another name via setting KCONFIG_CONFIG.
Fix that by using $(KCONFIG_CONFIG) instead. Additionally reference
the config file via $(XEN_ROOT) instead of a relative path.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Extend the disclaimer about runtime loading. While we've done our best to
make the mechanism reliable, the safety of late loading does ultimately depend
on the contents of the blobs.
Extend the xen-ucode portion with examples of how to use it.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
--- CC: George Dunlap <George.Dunlap@eu.citrix.com> CC: Ian Jackson <ian.jackson@citrix.com> CC: Jan Beulich <JBeulich@suse.com> CC: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> CC: Stefano Stabellini <sstabellini@kernel.org> CC: Wei Liu <wl@xen.org> CC: Julien Grall <julien@xen.org> CC: Paul Durrant <paul@xen.org>
Andrew Cooper [Mon, 1 Jun 2020 14:37:20 +0000 (15:37 +0100)]
x86/ucode: Fix errors with start/end_update()
c/s 9267a439c "x86/ucode: Document the behaviour of the microcode_ops hooks"
identified several poor behaviours of the start_update()/end_update_percpu()
hooks.
AMD have subsequently confirmed that OSVW don't, and are not expected to,
change across a microcode load, rendering all of this complexity unnecessary.
Instead of fixing up the logic to not leave the OSVW state reset in a number
of corner cases, delete the logic entirely.
This in turn allows for the removal of the poorly-named 'start_update'
parameter to microcode_update_one(), and for svm_host_osvw_{init,reset}() to
become static.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Wei Liu [Tue, 2 Jun 2020 09:01:38 +0000 (10:01 +0100)]
m4: use test instead of []
It is reported that [] was removed by autoconf, which caused the
following error:
./configure: line 4681: -z: command not found
Switch to test. That's what is used throughout our configure scripts.
Also put the variable expansion in quotes.
Signed-off-by: Wei Liu <wl@xen.org> Reported-by: Bertrand Marquis <Bertrand.Marquis@arm.com> Fixes: 8a6b1665d987 ("configure: also add EXTRA_PREFIX to {CPP/LD}FLAGS") Signed-off-by: Wei Liu <wl@xen.org> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Release-acked-by: Paul Durrant <paul@xen.org> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>