xenbits.xensource.com Git - people/hx242/xen.git/log

Force disable direct map

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

hap: switch to xenheap pages for HAP...

...and use the direct map for EPT walking.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/setup: destroy mappings in the direct map when directmap=no

Detailed comments were added explaining why we need to create then
destroy instead of not creating them in the first place.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/setup: vmap heap nodes when they are outside the direct map

Since we now have early vmap, we can simply vmap the nodes we they do
not fall inside the direct map region. As a consequence, we now need to
leave early boot slightly sooner.

The reason is that when we do not have a direct map, we need to allocate
memory for the page tables of vmap or xenheap when mapping the nodes.
Unfortunately, when system == early_boot, PTE pages are allocated from
alloc_boot_pages, but we are in the middle of passing boot pages to the
heap allocator, so those pages might be allocated again later by the
heap allocator to another user, contaminating the PTE pages!

Therefore, we leave early_boot status slightly earlier, so that page
table pages are allocated from the heap instead of the boot allocator.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86: map/unmap pages in restore_all_guests.

Before, it assumed the pv cr3 could be accessed via a direct map. This
is no longer true.

Note that we do not map and unmap root_pgt for now since it is still a
xenheap page.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

xen/page_alloc: add a path for xenheap when there is no direct map

When there is not an always-mapped direct map, xenheap allocations need
to be mapped and unmapped on-demand.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/domain_page: remove the fast paths when no direct map

When there is no direct map, never use mfn_to_virt for any mappings.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/mapcache: initialise the mapcache even for the idle domain

In order to use the mapcache in the idle domain, we also have to
populate its page tables in the PERDOMAIN region.

Signed-off-by: Wei Wang <wawei@amazon.de>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/domain_page: implement pure per-vCPU mapping infrastructure

Note that entries in the mapcache will occupy inuse slots. With 16 slots
per vCPU and a mapcache capacity of 8, we only have another 8 available,
which is not enough for nested page table walks. We need to increase the
number of slots.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86: lift mapcache to the arch level

It is going to be needed by HVM and idle domain as well, because without
the direct map, both need a mapcache to map pages.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Wang <wawei@amazon.de>
Reviewed-by: Hongyan Xia <hongyxia@amazon.com>

x86: add Persistent Map (PMAP) infrastructure

The basic idea is like Persistent Kernel Map (PKMAP) in linux. We
pre-populate all the relevant page tables before system is fully set
up.

It is needed to bootstrap map domain page infrastructure -- we need
some way to map pages to set up the mapcache without a direct map.

This infrastructure is not lock-protected therefore can only be used
before smpboot. After smpboot, mapcache has to be used.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/srat: map the pages for acpi_slit when there is no directmap

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/numa: map the pages for memnodemap when there is no directmap

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

acpi: map pages when there is no direct map

Also, add a helper to vmap pages that are physically contiguous.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/setup: move vm_init() before acpi calls

Signed-off-by: David Woodhouse <dwmw2@amazon.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v2:
- add variable to track whether vm_init() has been called, since it is
not idempotent.
- remove fixmap usage in acpi_os_map/unmap_memory, because now vmap is
always ready before any acpi functions are called.

xen/vmap: allow vmap() to be called during early boot

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Hongyan Xia <hongyxia@amazon.co.uk>

xen/vmap: allow vm_init_type to be called during early_boot

We want to move vm_init, which calls vm_init_type under the hood, to
early boot stage. Add a path to get page from boot allocator instead.

Add an emacs block to that file while I was there.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed since posting it upstream:
- always check the return value of allocations.

x86: add an option to enable and disable the direct map

Also add a helper function to retrieve it.

Change arch_mfn_in_direct_map to check this option before returning.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/pv: rewrite how building PV dom0 handles domheap mappings

Building a PV dom0 is allocating from the domheap but uses it like the
xenheap. This is clearly wrong. Fix.

Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v2:
- use lXstart_mfn instead of lXstart_maddr.
- change the pattern on mpt_alloc to be closer to the original.

x86/pv: domheap pages should be mapped while relocating initrd

Xen shouldn't use domheap page as if they were xenheap pages. Map and
unmap pages accordingly.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Wei Wang <wawei@amazon.de>
Reviewed-by: Hongyan Xia <hongyxia@amazon.de>

x86/mm: drop _new suffix for page table APIs

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86: switch to use domheap page for page tables

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/mm: drop old page table APIs

Two sets of old APIs, alloc/free_xen_pagetable() and lXe_to_lYe(), are
now dropped to avoid the dependency on direct map.

There are two special cases which still have not been re-written into
the new APIs, thus need special treatment:

rpt in smpboot.c cannot use ephemeral mappings yet. The problem is that
rpt is read and written in context switch code, but the mapping
infrastructure is NOT context-switch-safe, meaning we cannot map rpt in
one domain and unmap in another. Before the mapping infrastructure
supports context switches, rpt has to be globally mapped.

Also, lXe_to_lYe() during Xen image relocation cannot be converted into
map/unmap pairs. We cannot hold on to mappings while the mapping
infrastructure is being relocated! It is enough to remove the direct map
in the second e820 pass, so we still use the direct map (<4GiB) in Xen
relocation (which is during the first e820 pass).

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/smpboot: switch clone_mapping() to new APIs

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v7:
- change patch title
- remove initialiser of pl3e.
- combine the initialisation of pl3e into a single assignment.
- use the new alloc_map_clear() helper.
- use the normal map_domain_page() in the error path.

x86/smpboot: add exit path for clone_mapping()

We will soon need to clean up page table mappings in the exit path.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v7:
- edit commit message.
- begin with rc = 0 and set it to -ENOMEM ahead of if().

efi: switch to new APIs in EFI code

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v7:
- add blank line after declaration.
- rename efi_l4_pgtable into efi_l4t.
- pass the mapped efi_l4t to copy_mapping() instead of map it again.
- use the alloc_map_clear_xen_pt() API.
- unmap pl3e, pl2e, l1t earlier.

efi: use new page table APIs in copy_mapping

After inspection ARM doesn't have alloc_xen_pagetable so this function
is x86 only, which means it is safe for us to change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v7:
- hoist l3 variables out of the loop to avoid repetitive mappings.

x86_64/mm: switch to new APIs in setup_m2p_table

Avoid repetitive mapping of l2_ro_mpt by keeping it across loops, and
only unmap and map it when crossing 1G boundaries.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v7:
- avoid repetitive mapping of l2_ro_mpt.
- edit commit message.
- switch to alloc_map_clear_xen_pt().

x86_64/mm: switch to new APIs in paging_init

Map and unmap pages instead of relying on the direct map.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v7:
- use the new alloc_map_clear_xen_pt() helper.
- move the unmap of pl3t up a bit.
- remove the unmaps in the nomem path.

x86_64/mm: introduce pl2e in paging_init

We will soon map and unmap pages in paging_init(). Introduce pl2e so
that we can use l2_ro_mpt to point to the page table itself.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
---
Changed in v7:
- reword commit message.

x86/mm: switch to new APIs in modify_xen_mappings

Page tables allocated in that function should be mapped and unmapped
now.

Note that pl2e now maybe mapped and unmapped in different iterations, so
we need to add clean-ups for that.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v7:
- use normal unmap in the error path.

x86/mm: switch to new APIs in map_pages_to_xen

Page tables allocated in that function should be mapped and unmapped
now.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>

x86/mm: rewrite virt_to_xen_l*e

Rewrite those functions to use the new APIs. Modify its callers to unmap
the pointer returned. Since alloc_xen_pagetable_new() is almost never
useful unless accompanied by page clearing and a mapping, introduce a
helper alloc_map_clear_xen_pt() for this sequence.

Note that the change of virt_to_xen_l1e() also requires vmap_to_mfn() to
unmap the page, which requires domain_page.h header in vmap.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed in v7:
- remove a comment.
- use l1e_get_mfn() instead of converting things back and forth.
- add alloc_map_clear_xen_pt().
- unmap before the next mapping to reduce mapcache pressure.
- use normal unmap calls instead of the macro in error paths because
unmap can handle NULL now.

x86/mm: make sure there is one exit path for modify_xen_mappings

We will soon need to handle dynamically mapping / unmapping page
tables in the said function. Since dynamic mappings may map and unmap
pl3e in different iterations, lift pl3e out of the loop.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed since v4:
- drop the end_of_loop goto label.

Changed since v3:
- remove asserts on rc since it never gets changed to anything else.

x86/mm: map_pages_to_xen would better have one exit path

We will soon rewrite the function to handle dynamically mapping and
unmapping of page tables. Since dynamic mappings may map and unmap pages
in different iterations of the while loop, we need to lift pl3e out of
the loop.

No functional change.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Signed-off-by: Hongyan Xia <hongyxia@amazon.com>
---
Changed since v4:
- drop the end_of_loop goto label.

Changed since v3:
- remove asserts on rc since rc never gets changed to anything else.
- reword commit message.

README, Makefile: Xen 4.14.0 release

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

Config.mk: Nail subtrees to the Xen 4.14.0 release tags

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

SUPPORT.md: Set version and release/support dates

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

Revert "SUPPORT.md: Set version and release/support dates"

This reverts commit e4670f8b045b11a524171b119d9d4a20bf643367.

SUPPORT.md: Set version and release/support dates

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Acked-by: Julien Grall <jgrall@amazon.com>

SUPPORT.md: Spell Experimental correctly

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
(cherry picked from commit 139ce42388c3fe7096a09b3d397250fe14906809)

docs: Replace non-UTF-8 character in hypfs-paths.pandoc

From the docs cronjob on xenbits:

  /usr/bin/pandoc --number-sections --toc --standalone misc/hypfs-paths.pandoc --output html/misc/hypfs-paths.html
  pandoc: Cannot decode byte '\x92': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8 stream
  make: *** [Makefile:236: html/misc/hypfs-paths.html] Error 1

Fixes: 5a4a411bde4 ("docs: specify stability of hypfs path documentation")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
(cherry picked from commit 9ffdda96d9e7c3d9c7a5bbe2df6ab30f63927542)

docs: specify stability of hypfs path documentation

In docs/misc/hypfs-paths.pandoc the supported paths in the hypervisor
file system are specified. Make it more clear that path availability
might change, e.g. due to scope widening or narrowing (e.g. being
limited to a specific architecture).

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
master commit: 5a4a411bde4f73ff8ce43d6e52b77302973e8f68
master date: 2020-07-20 13:38:00 +0200

x86: restore pv_rtc_handler() invocation

This was lost when making the logic accessible to PVH Dom0.

While doing so make the access to the global function pointer safe
against races (as noticed by Roger): The only current user wants to be
invoked just once (but can tolerate to be invoked multiple times),
zapping the pointer at that point.

Fixes: 835d8d69d96a ("x86/rtc: provide mediated access to RTC for PVH dom0")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
master commit: f8fe3c07363d11fc81d8e7382dbcaa357c861569
master date: 2020-07-15 15:46:30 +0200

Branch 4.14: Turn off debug on this stable branch

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

SUPPORT.md: Set release notes link

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>

pvcalls: Document correctly and explicitely the padding for all arches

The specification of pvcalls suggests there is padding for 32-bit x86 at
the end of most the structure. However, they are not described in in the
public header.

Because of that all the structures would have a different size between
32-bit x86 and 64-bit x86.

For all the other architectures supported (Arm and 64-bit x86), the
structure have the sames sizes because they contain implicit padding
thanks to the 64-bit alignment of the field uint64_t field.

Given the specification is authoritative, the padding will now be the
same for all architectures. The potential breakage of compatibility is
ought to be fine as pvcalls is still a tech preview.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>

pvcalls: Clearly spell out that the header is just a reference

A recent thread on xen-devel [1] pointed out that the header was
provided as a reference for the specification.

Unfortunately, this was never written down in xen.git so for an external
user (or a reviewer) it is not clear whether the spec or the header
should be followed when there is a conflict.

To avoid more confusion, a paragraph is added at the top of the header
to clearly spell out it is only provided for reference.

[1] https://lore.kernel.org/xen-devel/alpine.DEB.2.21.2006151343430.9074@sstabellini-ThinkPad-T480s/

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>

xen: Check the alignment of the offset pased via VCPUOP_register_vcpu_info

Currently a guest is able to register any guest physical address to use
for the vcpu_info structure as long as the structure can fits in the
rest of the frame.

This means a guest can provide an address that is not aligned to the
natural alignment of the structure.

On Arm 32-bit, unaligned access are completely forbidden by the
hypervisor. This will result to a data abort which is fatal.

On Arm 64-bit, unaligned access are only forbidden when used for atomic
access. As the structure contains fields (such as evtchn_pending_self)
that are updated using atomic operations, any unaligned access will be
fatal as well.

While the misalignment is only fatal on Arm, a generic check is added
as an x86 guest shouldn't sensibly pass an unaligned address (this
would result to a split lock).

This is XSA-327.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

x86/ept: flush cache when modifying PTEs and sharing page tables

Modifications made to the page tables by EPT code need to be written
to memory when the page tables are shared with the IOMMU, as Intel
IOMMUs can be non-coherent and thus require changes to be written to
memory in order to be visible to the IOMMU.

In order to achieve this make sure data is written back to memory
after writing an EPT entry when the recalc bit is not set in
atomic_write_ept_entry. If such bit is set, the entry will be
adjusted and atomic_write_ept_entry will be called a second time
without the recalc bit set. Note that when splitting a super page the
new tables resulting of the split should also be written back.

Failure to do so can allow devices behind the IOMMU access to the
stale super page, or cause coherency issues as changes made by the
processor to the page tables are not visible to the IOMMU.

This allows to remove the VT-d specific iommu_pte_flush helper, since
the cache write back is now performed by atomic_write_ept_entry, and
hence iommu_iotlb_flush can be used to flush the IOMMU TLB. The newly
used method (iommu_iotlb_flush) can result in less flushes, since it
might sometimes be called rightly with 0 flags, in which case it
becomes a no-op.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

vtd: optimize CPU cache sync

Some VT-d IOMMUs are non-coherent, which requires a cache write back
in order for the changes made by the CPU to be visible to the IOMMU.
This cache write back was unconditionally done using clflush, but there are
other more efficient instructions to do so, hence implement support
for them using the alternative framework.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/alternative: introduce alternative_2

It's based on alternative_io_2 without inputs or outputs but with an
added memory clobber.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

vtd: don't assume addresses are aligned in sync_cache

Current code in sync_cache assume that the address passed in is
aligned to a cache line size. Fix the code to support passing in
arbitrary addresses not necessarily aligned to a cache line size.

This is part of XSA-321.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/iommu: introduce a cache sync hook

The hook is only implemented for VT-d and it uses the already existing
iommu_sync_cache function present in VT-d code. The new hook is
added so that the cache can be flushed by code outside of VT-d when
using shared page tables.

Note that alloc_pgtable_maddr must use the now locally defined
sync_cache function, because IOMMU ops are not yet setup the first
time the function gets called during IOMMU initialization.

No functional change intended.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

vtd: prune (and rename) cache flush functions

Rename __iommu_flush_cache to iommu_sync_cache and remove
iommu_flush_cache_page. Also remove the iommu_flush_cache_entry
wrapper and just use iommu_sync_cache instead. Note the _entry suffix
was meaningless as the wrapper was already taking a size parameter in
bytes. While there also constify the addr parameter.

No functional change intended.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

vtd: improve IOMMU TLB flush

Do not limit PSI flushes to order 0 pages, in order to avoid doing a
full TLB flush if the passed in page has an order greater than 0 and
is aligned. Should increase the performance of IOMMU TLB flushes when
dealing with page orders greater than 0.

This is part of XSA-321.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

x86/ept: atomically modify entries in ept_next_level

ept_next_level was passing a live PTE pointer to ept_set_middle_entry,
which was then modified without taking into account that the PTE could
be part of a live EPT table. This wasn't a security issue because the
pages returned by p2m_alloc_ptp are zeroed, so adding such an entry
before actually initializing it didn't allow a guest to access
physical memory addresses it wasn't supposed to access.

This is part of XSA-328.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/EPT: ept_set_middle_entry() related adjustments

ept_split_super_page() wants to further modify the newly allocated
table, so have ept_set_middle_entry() return the mapped pointer rather
than tearing it down and then getting re-established right again.

Similarly ept_next_level() wants to hand back a mapped pointer of
the next level page, so re-use the one established by
ept_set_middle_entry() in case that path was taken.

Pull the setting of suppress_ve ahead of insertion into the higher level
table, and don't have ept_split_super_page() set the field a 2nd time.

This is part of XSA-328.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

x86/shadow: correct an inverted conditional in dirty VRAM tracking

This originally was "mfn_x(mfn) == INVALID_MFN". Make it like this
again, taking the opportunity to also drop the unnecessary nearby
braces.

This is XSA-319.

Fixes: 246a5a3377c2 ("xen: Use a typesafe to define INVALID_MFN")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen/common: event_channel: Don't ignore error in get_free_port()

Currently, get_free_port() is assuming that the port has been allocated
when evtchn_allocate_port() is not return -EBUSY.

However, the function may return an error when:
    - We exhausted all the event channels. This can happen if the limit
    configured by the administrator for the guest ('max_event_channels'
    in xl cfg) is higher than the ABI used by the guest. For instance,
    if the guest is using 2L, the limit should not be higher than 4095.
    - We cannot allocate memory (e.g Xen has not more memory).

Users of get_free_port() (such as EVTCHNOP_alloc_unbound) will validly
assuming the port was valid and will next call evtchn_from_port(). This
will result to a crash as the memory backing the event channel structure
is not present.

Fixes: 368ae9a05fe ("xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86emul: fix FXRSTOR test for most AMD CPUs

AMD CPUs that we classify as X86_BUG_FPU_PTRS don't touch the selector/
offset portion of the save image during FXSAVE unless an unmasked
exception is pending. Hence the selector zapping done between the
initial FXSAVE and the emulated FXRSTOR needs to be mirrored onto the
second FXSAVE, output of which gets fed into memcmp() to compare with
the input image.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

Config: Update QEMU

Backport 2 commits to fix building QEMU without PCI passthrough
support.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>

kdd: fix build again

Restore Tim's patch. The one that was committed was recreated by me
because git didn't accept my saved copy. I made some mistakes while
recreating that patch and here we are.

Fixes: 3471cafbdda3 ("kdd: stop using [0] arrays to access packet contents")
Reported-by: Michael Young <m.a.young@durham.ac.uk>
Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Tim Deegan <tim@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>

build: tweak variable exporting for make 3.82

While I've been running into an issue here only because of an additional
local change I'm carrying, to be able to override just the compiler in
$(XEN_ROOT)/.config (rather than the whole tool chain), in
config/StdGNU.mk:

ifeq ($(filter-out default undefined,$(origin CC)),)

I'd like to propose to nevertheless correct the underlying issue:
Exporting an unset variable changes its origin from "undefined" to
"file". This comes into effect because of our adding of -rR to
MAKEFLAGS, which make 3.82 wrongly applies also upon re-invoking itself
after having updated auto.conf{,.cmd}.

Move the export statement past $(XEN_ROOT)/config/$(XEN_OS).mk inclusion
(which happens through $(XEN_ROOT)/Config.mk) such that the variables
already have their designated values at that point, while retaining
their initial origin up to the point they get defined.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/tlb: fix assisted flush usage

Commit e9aca9470ed86 introduced a regression when avoiding sending
IPIs for certain flush operations. Xen page fault handler
(spurious_page_fault) relies on blocking interrupts in order to
prevent handling TLB flush IPIs and thus preventing other CPUs from
removing page tables pages. Switching to assisted flushing avoided such
IPIs, and thus can result in pages belonging to the page tables being
removed (and possibly re-used) while __page_fault_type is being
executed.

Force some of the TLB flushes to use IPIs, thus avoiding the assisted
TLB flush. Those selected flushes are the page type change (when
switching from a page table type to a different one, ie: a page that
has been removed as a page table) and page allocation. This sadly has
a negative performance impact on the pvshim, as less assisted flushes
can be used. Note the flush in grant-table code is also switched to
use an IPI even when not strictly needed. This is done so that a
common arch_flush_tlb_mask can be introduced and always used in common
code.

Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
using an IPI (x86 only). Note that the flag is only meaningfully defined
when the hypervisor supports PV or shadow paging mode, as otherwise
hardware assisted paging domains are in charge of their page tables and
won't share page tables with Xen, thus not influencing the result of
page walks performed by the spurious fault handler.

Just passing this new flag when calling flush_area_mask prevents the
usage of the assisted flush without any other side effects.

Note the flag is not defined on Arm.

Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Paul Durrant <paul@xen.org>

optee: allow plain TMEM buffers with NULL address

Trusted Applications use a popular approach to determine the required
size of a buffer: the client provides a memory reference with the NULL
pointer to a buffer. This is so called "Null memory reference". TA
updates the reference with the required size and returns it back to the
client. Then the client allocates a buffer of the needed size and
repeats the operation.

This behavior is described in TEE Client API Specification, paragraph
3.2.5. Memory References.

OP-TEE represents this null memory reference as a TMEM parameter with
buf_ptr = 0x0. This is the only case when we should allow a TMEM
buffer without the OPTEE_MSG_ATTR_NONCONTIG flag. This also the
special case for a buffer with OPTEE_MSG_ATTR_NONCONTIG flag.

This could lead to a potential issue, because IPA 0x0 is a valid
address, but OP-TEE will treat it as a special case. So, care should
be taken when construction OP-TEE enabled guest to make sure that such
guest have no memory at IPA 0x0 and none of its memory is mapped at PA
0x0.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Release-acked-by: Paul Durrant <paul@xen.org>

optee: immediately free buffers that are released by OP-TEE

Normal World can share a buffer with OP-TEE for two reasons:
1. A client application wants to exchange data with TA
2. OP-TEE asks for shared buffer for internal needs

The second case was handled more strictly than necessary:

1. In RPC request OP-TEE asks for buffer
2. NW allocates buffer and provides it via RPC response
3. Xen pins pages and translates data
4. Xen provides buffer to OP-TEE
5. OP-TEE uses it
6. OP-TEE sends request to free the buffer
7. NW frees the buffer and sends the RPC response
8. Xen unpins pages and forgets about the buffer

The problem is that Xen should forget about buffer in between stages 6
and 7. I.e. the right flow should be like this:

6. OP-TEE sends request to free the buffer
7. Xen unpins pages and forgets about the buffer
8. NW frees the buffer and sends the RPC response

This is because OP-TEE internally frees the buffer before sending the
"free SHM buffer" request. So we have no reason to hold reference for
this buffer anymore. Moreover, in multiprocessor systems NW have time
to reuse the buffer cookie for another buffer. Xen complained about this
and denied the new buffer registration. I have seen this issue while
running tests on iMX SoC.

So, this patch basically corrects that behavior by freeing the buffer
earlier, when handling RPC return from OP-TEE.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/spec-ctrl: Protect against CALL/JMP straight-line speculation

Some x86 CPUs speculatively execute beyond indirect CALL/JMP instructions.

With CONFIG_INDIRECT_THUNK / Retpolines, indirect CALL/JMP instructions are
converted to direct CALL/JMP's to __x86_indirect_thunk_REG(), leaving just a
handful of indirect JMPs implementing those stubs.

There is no architectrual execution beyond an indirect JMP, so use INT3 as
recommended by vendors to halt speculative execution. This is shorter than
LFENCE (which would also work fine), but also shows up in logs if we do
unexpected execute them.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>

kconfig: fix typo in XEN_SHSTK description

Rename 'vai' to 'via'.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>

mm: fix public declaration of struct xen_mem_acquire_resource

XENMEM_acquire_resource and it's related structure is currently inside
a __XEN__ or __XEN_TOOLS__ guarded section to limit it's scope to the
hypervisor or the toolstack only. This is wrong as the hypercall is
already being used by the Linux kernel at least, and as such needs to
be public.

Also switch the usage of uint64_aligned_t to plain uint64_t, as
uint64_aligned_t is only to be used by the toolstack. Doing such
change will reduce the size of the structure on 32bit x86 by 4bytes,
since there will be no padding added after the frame_list handle.

This is fine, as users of the previous layout will allocate 4bytes of
padding that won't be read by Xen, and users of the new layout won't
allocate those, which is also fine since Xen won't try to access them.

Note that the structure already has compat handling, and such handling
will take care of copying the right size (ie: minus the padding) when
called from a 32bit x86 context. This is true for the compat code both
before and after this patch, since the structures in the memory.h
compat header are subject to a pragma pack(4), which already removed
the trailing padding that would otherwise be introduced by the
alignment of the frame field to 8 bytes.

Fixes: 3f8f12281dd20 ('x86/mm: add HYPERVISOR_memory_op to acquire guest resources')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>

xsm: Drop trailing whitespace from build scripts

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/boot: Don't disable PV32 when XEN_SHSTK is compiled out

There is no need to automatically disable PV32 support on SHSTK-capable
hardware if Xen isn't actually using the feature.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>

changelog: Add notes about CET and Migration changes

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/livepatch: Make livepatching compatible with CET Shadow Stacks

Just like the alternatives infrastructure, the livepatch infrastructure
disables CR0.WP to perform patching, which is not permitted with CET active.

Modify arch_livepatch_{quiesce,revive}() to disable CET before disabling WP,
and reset the dirty bits on all virtual regions before re-enabling CET.

One complication is that arch_livepatch_revive() has to fix up the top of the
shadow stack. This depends on the functions not being inlined, even under
LTO. Another limitation is that reset_virtual_region_perms() may shatter the
final superpage of .text depending on alignment.

This logic, and its downsides, are temporary until the patching infrastructure
can be adjusted to not use CR0.WP.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/msr: Disallow access to Processor Trace MSRs

We do not expose the feature to guests, so should disallow access to the
respective MSRs. For simplicity, drop the entire block of MSRs, not just the
subset which have been specified thus far.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Handle closing shared input/output_fd

input_fd & output_fd may be the same FD. In that case, mark both as -1
when closing one. That avoids a dangling FD reference.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Cleanup resources on exit

Close open FDs and close th vchan connection when exiting the program.

This addresses some Coverity findings about leaking file descriptors.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Set closed FDs to -1

These FDs are closed, so set them to -1 so they are no longer valid.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Switch data_loop() to take state

Switch data_loop to take a pointer to vchan_proxy_state.

No functional change.

This removes a dead store to input_fd identified by Coverity.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Use a struct to store state

Use a struct to group the vchan ctrl and FDs. This will facilite
tracking the state of open and closed FDs and ctrl in data_loop().

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Unify main return value

Introduce 'ret' for main's return value and remove direct returns. This
is in preparation for a unified exit path with resource cleanup.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Check xs_watch return value

Check the return value of xs_watch and error out on failure.

This was found by Citrix's Coverity.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Move perror() into connect_socket

errno is reset by subsequent system & library calls, so it may be
inaccurate by the time connect_socket returns. Call perror immediately
after failing system calls to print the proper message.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Move perror() into listen_socket

The use of perror on the return from listen_socket can produce
misleading results like:
UNIX socket path "/tmp/aa....aa" too long (156 >= 108)
listen socket: Success

errno is reset by subsequent system & library calls, so it may be
inaccurate by the time listen_socket returns. Call perror immediately
after failing system calls to print the proper message.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

vchan-socket-proxy: Ensure UNIX path NUL terminated

Check the socket path length to ensure sun_path is NUL terminated.

This was spotted by Citrix's Coverity.

Also use strcpy to avoid a warning "'__builtin_strncpy' specified bound
108 equals destination size [-Werror=stringop-truncation]" flagged by
gcc 10.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>

libxl: tooling expects wrong errno

When iommu is not enabled for a given domain then pci passthrough
hypercalls such as xc_test_assign_device return EOPNOTSUPP.
The code responsible for this is in "iommu_do_domctl" inside
xen/drivers/passthrough/iommu.c
This patch fixes the error message reported by libxl when assigning
pci devices to domains without iommu.

Signed-off-by: Grzegorz Uriasz <gorbak25@gmail.com>
Tested-by: Grzegorz Uriasz <gorbak25@gmail.com>
Backport: 4.13
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

kdd: stop using [0] arrays to access packet contents

GCC 10 is unhappy about this, and we already use 64k buffers
in the only places where packets are allocated, so move the
64k size into the packet definition.

Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>

tools: fix error path of xendevicemodel_open()

c/s 6902cb00e03 "tools/libxendevicemodel: extract functions and add a compat
layer" introduced calls to both xencall_open() and osdep_xendevicemodel_open()
but failed to fix up the error path.

c/s f68c7c618a3 "libs/devicemodel: free xencall handle in error path in
_open()" fixed up the xencall_open() aspect of the error path (missing the
osdep_xendevicemodel_open() aspect), but positioned the xencall_close()
incorrectly, creating the same pattern proved to be problematic by c/s
30a72f02870 "tools: fix error path of xenhypfs_open()".

Reposition xtl_logger_destroy(), and introduce the missing
osdep_xendevicemodel_close().

Fixes: 6902cb00e03 ("tools/libxendevicemodel: extract functions and add a compat layer")
Fixes: f68c7c618a3 ("libs/devicemodel: free xencall handle in error path in _open()")
Backport: 4.9+
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>

scripts: don't rely on "stat -" support

While commit b72682c602b8 ("scripts: Use stat to check lock claim")
validly indicates that stat has gained support for the special "-"
command line option in 2009, we should still try to avoid breaking being
able to run on even older distros. As it has been determined, contary to
the comment in the script using /dev/stdin (/proc/self/fd/$_lockfd) is
fine here, as Linux specially treats these /proc inodes.

Suggested-by: Ian Jackson <ian.jackson@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/CPUID: fill all fields in x86_cpuid_policy_fill_native()

Coverity validly complains that the new call from
tools/tests/cpu-policy/test-cpu-policy.c:test_cpuid_current() leaves
two fields uninitialized, yet they get then consumed by
x86_cpuid_copy_to_buffer(). (All other present callers of the function
pass a pointer to a static - and hence initialized - buffer.)

Coverity-ID: 1464809
Fixes: c22ced93e167 ("tests/cpu-policy: Confirm that CPUID serialisation is sorted")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/acpi: use FADT flags to determine the PMTMR width

On some computers the bit width of the PM Timer as reported
by ACPI is 32 bits when in fact the FADT flags report correctly
that the timer is 24 bits wide. On affected machines such as the
ASUS FX504GM and never gaming laptops this results in the inability
to resume the machine from suspend. Without this patch suspend is
broken on affected machines and even if a machine manages to resume
correctly then the kernel time and xen timers are trashed.

Signed-off-by: Grzegorz Uriasz <gorbak25@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>

tools: Commit flex (2.6.4) & bison (3.3.2) output from Debian buster

These files are in tree so that people can build (including from git)
without needing less-than-a-decade-old flex and bison.

We should update them periodically.  Debian buster has been Debian
stable for a while.  Our CI is running buster.

There should be no significant functional change; it's possible that
there are bugfixes but I have not reviewed the changes.  I *have*
checked that the flex I am using has the fix for CVE-2016-6354.

CC: Paul Durrant <paul@xen.org>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

tools: Commit autoconf output from Debian buster

These files are in tree so that people can build (including from git)
without needing recent autotools.

We should update them periodically.  Debian buster has been Debian
stable fopr a while.  Our CI is running buster.

There should be no significant functional change; it's possible that
there are bugfixes to the configure scripts but I have not reviewed
them.

These files were last changed in
  83c845033dc8bb3a35ae245effb7832b6823174a
  libxl: use vchan for QMP access with Linux stubdomain
where a new feature was added.  However, that commit contains a lot of
extraneous noise in configure compared to its parent.

Compared to 83c845033dc8bb3a35ae245effb7832b6823174a~, this commit
restores those extraneous changes, leaving precisely the correct
changes.  So one way of looking at the changes we are making now, is
that we are undoing accidental changes to the autoconf output.

I used Debian's autoconf 2.69-11 on amd64.

CC: Wei Liu <wl@xen.org>
CC: Nick Rosbrook <rosbrookn@gmail.com>
Reported-by: Nick Rosbrook <rosbrookn@gmail.com>
CC: Paul Durrant <paul@xen.org>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/vmx: use P2M_ALLOC in vmx_load_pdptrs instead of P2M_UNSHARE

While forking VMs running a small RTOS system (Zephyr) a Xen crash has been
observed due to a mm-lock order violation while copying the HVM CPU context
from the parent. This issue has been identified to be due to
hap_update_paging_modes first getting a lock on the gfn using get_gfn. This
call also creates a shared entry in the fork's memory map for the cr3 gfn. The
function later calls hap_update_cr3 while holding the paging_lock, which
results in the lock-order violation in vmx_load_pdptrs when it tries to unshare
the above entry when it grabs the page with the P2M_UNSHARE flag set.

Since vmx_load_pdptrs only reads from the page its usage of P2M_UNSHARE was
unnecessary to start with. Using P2M_ALLOC is the appropriate flag to ensure
the p2m is properly populated.

Note that the lock order violation is avoided because before the paging_lock is
taken a lookup is performed with P2M_ALLOC that forks the page, thus the second
lookup in vmx_load_pdptrs succeeds without having to perform the fork. We keep
P2M_ALLOC in vmx_load_pdptrs because there are code-paths leading up to it
which don't take the paging_lock and that have no previous lookup. Currently no
other code-path exists leading there with the paging_lock taken, thus no
further adjustments are necessary.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/hvm: check against VIOAPIC_LEVEL_TRIG in hvm_gsi_deassert

In order to avoid relying on the specific values of
VIOAPIC_{LEVEL/EDGE}_TRIG.

No functional change.

Requested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>

stubdom/vtpm: add extern to function declarations

Code compiled with gcc10 will not link properly due to multiple definition of the same function.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Samuel Thibault <samuel.thibaut@ens-lyon.org>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

xl: Allow shutdown wait for domain death

`xl shutdown -w` waits for the first of either domain shutdown or death.
Shutdown is the halting of the guest operating system, and death is the
freeing of domain resources.

Allow specifying -w multiple times to wait for only domain death. This
is useful in scripts so that all resources are free before the script
continues.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

tools/xen-ucode: return correct exit code on failed microcode update

Otherwise it's difficult to know if operation failed inside the automation.

While at it, also switch to returning 1 and 2 instead of errno to avoid
incompatibilies between errno and special exit code numbers.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Reviewed-by: Igor Druzhinin <igor.druzhinin@citrix.com>

x86/spec-ctrl: Hide RDRAND by default on IvyBridge client

To combat the absence of mitigating microcode, arrange to hide RDRAND by
default on IvyBridge client hardware.

Adjust the default feature derivation to hide RDRAND on IvyBridge client
parts, unless `cpuid=rdrand` is explicitly provided.

Adjust the restore path in xc_cpuid_apply_policy() to not hide RDRAND from VMs
which migrated from pre-4.14.

In all cases, individual guests can continue using RDRAND if explicitly
enabled in their config files.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>

x86/cpuid: Introduce missing feature adjustment in calculate_pv_def_policy()

This was an accidental asymmetry with the HVM side.

No change in behaviour at this point.

Fixes: 83b387382 ("x86/cpuid: Introduce and use default CPUID policies")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>