xenbits.xensource.com Git - people/sstabellini/xen-unstable.git/.git/log
4 years ago kernel-doc: public/hvm/params.h hyp-docs-1
Stefano Stabellini [Thu, 6 Aug 2020 23:27:22 +0000 (16:27 -0700)]
kernel-doc: public/hvm/params.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/elfnote.h
Stefano Stabellini [Thu, 6 Aug 2020 23:27:16 +0000 (16:27 -0700)]
kernel-doc: public/elfnote.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/xen.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:14 +0000 (16:05 -0700)]
kernel-doc: public/xen.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/version.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:14 +0000 (16:05 -0700)]
kernel-doc: public/version.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/vcpu.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:14 +0000 (16:05 -0700)]
kernel-doc: public/vcpu.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/sched.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:14 +0000 (16:05 -0700)]
kernel-doc: public/sched.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/memory.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:14 +0000 (16:05 -0700)]
kernel-doc: public/memory.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/hypfs.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:14 +0000 (16:05 -0700)]
kernel-doc: public/hypfs.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/grant_table.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:14 +0000 (16:05 -0700)]
kernel-doc: public/grant_table.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/features.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:14 +0000 (16:05 -0700)]
kernel-doc: public/features.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/event_channel.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:13 +0000 (16:05 -0700)]
kernel-doc: public/event_channel.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/device_tree_defs.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:13 +0000 (16:05 -0700)]
kernel-doc: public/device_tree_defs.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/hvm/hvm_op.h
Stefano Stabellini [Thu, 6 Aug 2020 23:05:13 +0000 (16:05 -0700)]
kernel-doc: public/hvm/hvm_op.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago kernel-doc: public/arch-arm.h
Stefano Stabellini [Thu, 6 Aug 2020 23:04:56 +0000 (16:04 -0700)]
kernel-doc: public/arch-arm.h

Convert in-code comments to kernel-doc format wherever possible.

Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years ago x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()
Paul Durrant [Fri, 31 Jul 2020 15:43:31 +0000 (17:43 +0200)]
x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()

Re-factor the code to take advantage of the fact that the APIC access page is
a 'special' page. The VMX code is left alone and hence the APIC access page is
still inserted into the P2M with type p2m_mmio_direct. This is left alone as it
is not obvious there is another suitable type to use, and the necessary
re-ordering in epte_get_entry_emt() is straightforward.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/hvm: set 'ipat' in EPT for special pages
Paul Durrant [Fri, 31 Jul 2020 15:42:47 +0000 (17:42 +0200)]
x86/hvm: set 'ipat' in EPT for special pages

All non-MMIO ranges (i.e. those not mapping real device MMIO regions) that
map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
when PV drivers running in a guest populate the BAR space of the Xen Platform
PCI Device with pages such as the Shared Info page or Grant Table pages,
accesses to these pages will be cachable.

However, should IOMMU mappings be enabled for the guest then these
accesses become uncachable. This has a substantial negative effect on I/O
throughput of PV devices. Arguably PV drivers should not be using BAR space to
host the Shared Info and Grant Table pages but it is currently commonplace for
them to do this and so this problem needs mitigation. Hence this patch makes
sure the 'ipat' bit is set for any special page regardless of where in GFN
space it is mapped.

NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
      that there is any similar mitigation possible for AMD NPT. Downstreams
      such as Citrix XenServer have been carrying a patch similar to this for
      several releases though.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86emul: replace UB shifts
Jan Beulich [Fri, 31 Jul 2020 15:41:58 +0000 (17:41 +0200)]
x86emul: replace UB shifts

Displacement values can be negative, hence we shouldn't left-shift them.
Or else we get

(XEN) UBSAN: Undefined behaviour in x86_emulate/x86_emulate.c:3482:55
(XEN) left shift of negative value -2

While auditing shifts, I noticed a pair of missing parentheses, which
also get added right here.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago tools/xen-cpuid: show enqcmd
Olaf Hering [Fri, 31 Jul 2020 15:41:27 +0000 (17:41 +0200)]
tools/xen-cpuid: show enqcmd

Translate <29> into a feature string.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Jan Beulich <jbeulich@suse.com>
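
The "tools/xen-cpuid: show enqcmd" change above can be pictured with a small Python model. This is only a sketch under assumptions: the table and function names are hypothetical, and the only fact taken from the commit is that bit 29 was previously shown as `<29>` and now gets the name "enqcmd".

```python
# Hypothetical name table: only the entry for bit 29 comes from the commit.
FEATURE_NAMES = {29: "enqcmd"}


def feature_string(bit, names=FEATURE_NAMES):
    """Return the feature name for a bit, or '<bit>' when no name is known."""
    return names.get(bit, "<%d>" % bit)


print(feature_string(29))  # a named bit
print(feature_string(30))  # an unnamed bit falls back to its number
```

Passing an empty table reproduces the pre-patch display for bit 29.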
4 years ago x86/PV: drop a few misleading paging_mode_refcounts() checks
Jan Beulich [Fri, 31 Jul 2020 15:40:13 +0000 (17:40 +0200)]
x86/PV: drop a few misleading paging_mode_refcounts() checks

The filling and cleaning up of v->arch.guest_table in new_guest_cr3()
was apparently inconsistent so far: There was a type ref acquired
unconditionally for the new top level page table, but the dropping of
the old type ref was conditional upon !paging_mode_refcounts(). Mirror
this also to arch_set_info_guest().

Also move new_guest_cr3()'s #ifdef to around the function - both callers
now get built only when CONFIG_PV, i.e. no need to retain a stub.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago tools/configure: drop BASH configure variable
Andrew Cooper [Fri, 26 Jun 2020 16:46:38 +0000 (17:46 +0100)]
tools/configure: drop BASH configure variable

This is a weird variable to have in the first place.  The only user of it is
XSM's CONFIG_SHELL, which opencodes a fallback to sh.  The scripts are shebang
sh, which is already necessary to support non-Linux build environments.

Make the mkflask.sh and mkaccess_vector.sh scripts executable, drop the
CONFIG_SHELL, and drop the $BASH variable to prevent further use.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years ago xen/spinlock: move debug helpers inside the locked regions
Roger Pau Monne [Wed, 29 Jul 2020 11:13:30 +0000 (13:13 +0200)]
xen/spinlock: move debug helpers inside the locked regions

Debug helpers such as lock profiling or the invariant pCPU assertions
must strictly be performed inside the exclusive locked region, or else
races might happen.

Note the issue was not strictly introduced by the pointed commit in
the Fixes tag, since lock stats were already incremented before the
barrier, but that commit made it more apparent as manipulating the cpu
field could happen outside of the locked regions and thus trigger the
BUG_ON on rel_lock(). This is only enabled on debug builds, and thus
releases are not affected.

Fixes: 80cba391a35 ('spinlocks: in debug builds store cpu holding the lock')
Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
4 years ago x86/cpuid: Fix APIC bit clearing
Fam Zheng [Wed, 29 Jul 2020 17:51:45 +0000 (18:51 +0100)]
x86/cpuid: Fix APIC bit clearing

The bug is obvious here: other places in this function used
"cpufeat_mask" correctly.

Fixes: b648feff8ea2 ("xen/x86: Improvements to in-hypervisor cpuid sanity checks")
Signed-off-by: Fam Zheng <famzheng@amazon.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago x86/hvm: Clean up track_dirty_vram() calltree
Andrew Cooper [Fri, 20 Jul 2018 17:22:25 +0000 (17:22 +0000)]
x86/hvm: Clean up track_dirty_vram() calltree

 * Rename nr to nr_frames.  A plain 'nr' is confusing to follow in the
   lower levels.
 * Use DIV_ROUND_UP() rather than opencoding it in several different ways
 * The hypercall input is capped at uint32_t, so there is no need for
   nr_frames to be unsigned long in the lower levels.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
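
As an illustration of the DIV_ROUND_UP() point in the commit above, here is a Python model of the rounding helper next to two ways it tends to be open-coded. The bitmap use case and the variable names are assumptions made for the example, not taken from the patch.

```python
def div_round_up(n, d):
    """Integer division rounding up, for non-negative n and positive d."""
    return (n + d - 1) // d


# Hypothetical use: bytes needed for a bitmap holding one bit per frame.
nr_frames = 1025
bitmap_bytes = div_round_up(nr_frames, 8)

# Two open-coded spellings of the same computation.
open_coded_a = (nr_frames + 7) // 8
open_coded_b = nr_frames // 8 + (1 if nr_frames % 8 else 0)
```

All three expressions agree; using one named helper makes that obvious at each call site.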
4 years ago x86/hvm: only translate ISA interrupts to GSIs in virtual timers
Roger Pau Monne [Mon, 27 Jul 2020 17:05:39 +0000 (19:05 +0200)]
x86/hvm: only translate ISA interrupts to GSIs in virtual timers

Only call hvm_isa_irq_to_gsi for ISA interrupts; interrupts
originating from an IO APIC pin already use a GSI and don't need to be
translated.

I haven't observed any issues from this, but I think it's better to
use it correctly.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/vpt: only try to resume timers belonging to enabled devices
Roger Pau Monne [Mon, 27 Jul 2020 17:05:38 +0000 (19:05 +0200)]
x86/vpt: only try to resume timers belonging to enabled devices

Check whether the emulated device is actually enabled before trying to
resume the associated timers.

Thankfully all those structures are zeroed at initialization, and
since the devices are not enabled they are never populated, which
triggers the pt->vcpu check at the beginning of pt_resume forcing an
exit from the function.

While there, limit the scope of i and make it unsigned.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/hvm: fix ISA IRQ 0 handling when set as lowest priority mode in IO APIC
Roger Pau Monne [Mon, 27 Jul 2020 17:05:37 +0000 (19:05 +0200)]
x86/hvm: fix ISA IRQ 0 handling when set as lowest priority mode in IO APIC

Lowest priority destination mode does allow the vIO APIC code to
select a vCPU to inject the interrupt to, but the selected vCPU must
be part of the possible destinations configured for such IO APIC pin.

Fix the code in order to only force vCPU 0 if it's part of the
listed destinations.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
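
The rule described in the lowest-priority commit above can be sketched in Python as follows. This is a model only, not the vIO-APIC code: the function name, the `irq0_special` flag and the `arbitrate` callback (standing in for the normal vCPU selection) are all hypothetical.

```python
def pick_lowest_priority_target(dest_vcpus, irq0_special, arbitrate=min):
    """Choose the vCPU that receives a lowest-priority interrupt.

    dest_vcpus:   set of vCPU ids configured as possible destinations.
    irq0_special: True when the ISA IRQ 0 special routing applies.
    arbitrate:    hypothetical stand-in for the normal selection logic.
    """
    if not dest_vcpus:
        return None
    # vCPU 0 is forced only when it is one of the listed destinations.
    if irq0_special and 0 in dest_vcpus:
        return 0
    return arbitrate(dest_vcpus)
```

With destinations {1, 2} the special routing no longer overrides the configuration, because vCPU 0 is not a listed destination.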
4 years ago x86/hvm: don't force vCPU 0 for IRQ 0 when using fixed destination mode
Roger Pau Monne [Mon, 27 Jul 2020 17:05:36 +0000 (19:05 +0200)]
x86/hvm: don't force vCPU 0 for IRQ 0 when using fixed destination mode

When the IO APIC pin mapped to the ISA IRQ 0 has been configured to
use fixed delivery mode, do not forcefully route interrupts to vCPU 0,
as the OS might have set up those interrupts to be injected to a
different vCPU, and injecting to vCPU 0 can cause the OS to miss such
interrupts or errors to happen due to unexpected vectors being
injected on vCPU 0.

To fix this, remove such handling altogether for fixed destination
mode pins and just inject them according to the data set up in the
IO-APIC entry.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/hvm: fix vIO-APIC build without IRQ0_SPECIAL_ROUTING
Roger Pau Monne [Mon, 27 Jul 2020 17:05:35 +0000 (19:05 +0200)]
x86/hvm: fix vIO-APIC build without IRQ0_SPECIAL_ROUTING

pit_channel0_enabled needs to be guarded with IRQ0_SPECIAL_ROUTING
since it's only used when the special handling of ISA IRQ 0 is
enabled. However, since the helper is a single line, it's better to just
inline it directly in vioapic_deliver where it's used.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago print: introduce a format specifier for pci_sbdf_t
Roger Pau Monne [Mon, 27 Jul 2020 10:31:36 +0000 (12:31 +0200)]
print: introduce a format specifier for pci_sbdf_t

The new format specifier is '%pp', and prints a pci_sbdf_t using the
seg:bus:dev.func format. Replace all SBDFs printed using
'%04x:%02x:%02x.%u' to use the new format specifier.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Acked-by: Julien Grall <julien.grall@arm.com>
For just the pieces where Jan is the only maintainer:
Acked-by: Jan Beulich <jbeulich@suse.com>
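
To show what the seg:bus:dev.func output of the '%pp' commit above looks like, here is a Python sketch that decodes a 32-bit SBDF value and prints it with the old '%04x:%02x:%02x.%u' pattern. The bit layout used (segment in the top 16 bits, then bus, then a 5-bit device and 3-bit function) is the conventional PCI packing and is an assumption here, as is the function name.

```python
def format_sbdf(sbdf):
    """Format a packed segment/bus/device/function value as seg:bus:dev.func."""
    seg = (sbdf >> 16) & 0xFFFF
    bus = (sbdf >> 8) & 0xFF
    dev = (sbdf >> 3) & 0x1F
    func = sbdf & 0x7
    return "%04x:%02x:%02x.%u" % (seg, bus, dev, func)


print(format_sbdf((3 << 8) | (0x1F << 3) | 7))
```

A single specifier taking the packed value avoids repeating the four-argument pattern at every call site.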
4 years ago public/domctl: Fix the struct xen_domctl ABI in 32bit builds
Andrew Cooper [Mon, 27 Jul 2020 18:21:09 +0000 (19:21 +0100)]
public/domctl: Fix the struct xen_domctl ABI in 32bit builds

The Xen domctl ABI currently relies on the union containing a field with
alignment of 8.

32bit projects which only copy the used subset of functionality end up with an
ABI breakage if they don't have at least one uint64_aligned_t field copied.

Insert explicit padding, and some build assertions to ensure it never changes
moving forwards.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
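
The alignment problem described in the domctl commit above can be reproduced with a small layout calculator. This is a Python sketch under assumptions: the header below is a simplified, hypothetical stand-in for the real structure, and the 128-byte union size and the three 16-bit padding slots are illustrative values.

```python
def layout(fields):
    """Compute C-style member offsets.

    fields: list of (name, size, align) tuples in declaration order.
    Returns a dict mapping each name to its byte offset.
    """
    offsets = {}
    pos = 0
    for name, size, align in fields:
        pos = (pos + align - 1) // align * align  # round up to the alignment
        offsets[name] = pos
        pos += size
    return offsets


# Hypothetical, simplified header: two 32-bit fields and one 16-bit field.
HEADER = [("cmd", 4, 4), ("interface_version", 4, 4), ("domain", 2, 2)]
PAD = [("_pad", 6, 2)]  # explicit padding: three 16-bit slots (assumed)


def union_offset(union_align, padded):
    """Offset of the trailing union for a given union alignment."""
    fields = HEADER + (PAD if padded else []) + [("u", 128, union_align)]
    return layout(fields)["u"]
```

Without the padding the union lands at offset 16 when it has 8-byte alignment but at 12 when a trimmed copy only has 4-byte alignment; with explicit padding both cases give 16.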
4 years ago xen/displif: Protocol version 2
Oleksandr Andrushchenko [Wed, 1 Jul 2020 07:19:23 +0000 (10:19 +0300)]
xen/displif: Protocol version 2

1. Add protocol version as an integer

Version string, which is in fact an integer, is hard to handle in the
code that supports different protocol versions. To simplify that
also add the version as an integer.

2. Pass buffer offset with XENDISPL_OP_DBUF_CREATE

There are cases when a display data buffer is created with a non-zero
offset to the data start. Handle such cases and provide that offset
while creating a display buffer.

3. Add XENDISPL_OP_GET_EDID command

Add an optional request for reading Extended Display Identification
Data (EDID) structure which allows better configuration of the
display connectors over the configuration set in XenStore.
With this change connectors may have multiple resolutions defined
with respect to detailed timing definitions and additional properties
normally provided by displays.

If this request is not supported by the backend then visible area
is defined by the relevant XenStore's "resolution" property.

If backend provides extended display identification data (EDID) with
XENDISPL_OP_GET_EDID request then EDID values must take precedence
over the resolutions defined in XenStore.

4. Bump protocol version to 2.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
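
The precedence rule from item 3 of the displif commit above (EDID values win over the XenStore "resolution" property, which remains the fallback) might look like this in a frontend. It is a Python sketch with hypothetical names; parsing of real EDID data is out of scope and the modes are plain (width, height) tuples.

```python
def effective_modes(xenstore_resolution, edid_modes=None):
    """Pick the display modes a frontend should use.

    xenstore_resolution: (width, height) from the XenStore "resolution" property.
    edid_modes: list of (width, height) taken from EDID, or None when the
                backend does not support the EDID request.
    """
    if edid_modes:
        return list(edid_modes)   # EDID takes precedence when provided
    return [xenstore_resolution]  # otherwise fall back to XenStore
```

Treating an empty EDID mode list like a missing one is a choice made for this sketch only.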
4 years ago x86/pv: Make the PV default WRMSR path match the HVM default
Andrew Cooper [Thu, 23 Jul 2020 17:33:51 +0000 (18:33 +0100)]
x86/pv: Make the PV default WRMSR path match the HVM default

The current HVM default for writes to unknown MSRs is to inject #GP if the MSR
is unreadable, and discard writes otherwise. While this behaviour isn't great,
the PV default is even worse, because it swallows writes even to non-readable
MSRs.  i.e. A PV guest doesn't even get a #GP fault for a write to a totally
bogus index.

Update PV to make it consistent with HVM, which will simplify the task of
making other improvements to the default MSR behaviour.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
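
The policy stated in the WRMSR commit above fits in a two-line truth table. The following Python model uses hypothetical function names and string results purely for illustration; it is not the hypervisor's MSR handling code.

```python
def default_wrmsr(msr_is_readable):
    """Default for a write to an unknown MSR (HVM, and PV after this change)."""
    return "discard" if msr_is_readable else "#GP"


def old_pv_default_wrmsr(msr_is_readable):
    """Previous PV behaviour as described: every such write was swallowed."""
    return "discard"
```

The two functions differ only for unreadable MSRs, which is exactly the case the commit message calls out.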
4 years ago lockprof: don't pass name into registration function
Jan Beulich [Fri, 24 Jul 2020 08:19:25 +0000 (10:19 +0200)]
lockprof: don't pass name into registration function

The type uniquely identifies the associated name, hence the name fields
can be statically initialized.

Also constify not just the involved struct field, but also struct
lock_profile's. Rather than specifying lock_profile_ancs[]' dimension at
definition time, add a suitable build time check, such that at least
missing tail additions to the initializer can be spotted easily.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago lockprof: don't leave locks uninitialized upon allocation failure
Jan Beulich [Fri, 24 Jul 2020 08:18:30 +0000 (10:18 +0200)]
lockprof: don't leave locks uninitialized upon allocation failure

Even if a specific struct lock_profile instance can't be allocated, the
lock itself should still be functional. As this isn't a production use
feature, also log a message in the event that the profiling struct can't
be allocated.

Fixes: d98feda5c756 ("Make lock profiling usable again")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago x86/S3: put data segment registers into known state upon resume
Jan Beulich [Fri, 24 Jul 2020 08:17:26 +0000 (10:17 +0200)]
x86/S3: put data segment registers into known state upon resume

wakeup_32 sets %ds and %es to BOOT_DS, while leaving %fs at what
wakeup_start did set it to, and %gs at whatever BIOS did load into it.
All of this may end up confusing the first load_segments() to run on
the BSP after resume, in particular allowing a non-null selector value
to be left in %fs.

Alongside %ss, also put all other data segment registers into the same
state that the boot and CPU bringup paths put them in.

Reported-by: M. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago x86/vmce: Dispatch vmce_{rd,wr}msr() from guest_{rd,wr}msr()
Andrew Cooper [Tue, 21 Jul 2020 17:25:15 +0000 (18:25 +0100)]
x86/vmce: Dispatch vmce_{rd,wr}msr() from guest_{rd,wr}msr()

... rather than from the default clauses of the PV and HVM MSR handlers.

This means that we no longer take the vmce lock for any unknown MSR, and
accesses to architectural MCE banks outside of the subset implemented for the
guest no longer fall further through the unknown MSR path.

The bank limit of 32 isn't stated anywhere I can locate, but is a consequence
of the MSR layout described in SDM Volume 4.

With the vmce calls removed, the hvm alternative_call()'s expression can be
simplified substantially.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years ago x86/svm: Misc coding style corrections
Andrew Cooper [Fri, 7 Feb 2020 15:35:54 +0000 (15:35 +0000)]
x86/svm: Misc coding style corrections

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years ago x86/svm: Fold nsvm_{wr,rd}msr() into svm_msr_{read,write}_intercept()
Andrew Cooper [Mon, 10 Dec 2018 11:58:03 +0000 (11:58 +0000)]
x86/svm: Fold nsvm_{wr,rd}msr() into svm_msr_{read,write}_intercept()

... to simplify the default cases.

There are multiple errors with the handling of these three MSRs, but they are
deliberately not addressed at this point.

This removes the dance converting -1/0/1 into X86EMUL_*, allowing for the
removal of the 'ret' variable.

While cleaning this up, drop the gdprintk()'s for #GP conditions, and the
'result' variable from svm_msr_write_intercept() as it is never modified.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years ago tools/ocaml: Default to useful build output
Elliott Mitchell [Sat, 18 Jul 2020 03:32:42 +0000 (20:32 -0700)]
tools/ocaml: Default to useful build output

While hiding details of build output looks pretty to some, defaulting to
doing so deviates from the rest of Xen.  Switch the OCAML tools to match
everything else.

Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
4 years ago tools: Partially revert "Cross-compilation fixes."
Elliott Mitchell [Sat, 18 Jul 2020 03:31:21 +0000 (20:31 -0700)]
tools: Partially revert "Cross-compilation fixes."

This partially reverts commit 16504669c5cbb8b195d20412aadc838da5c428f7.

Doesn't look like much of 16504669c5cbb8b195d20412aadc838da5c428f7
actually remains due to passage of time.

Of the 3, both Python and pygrub appear to mostly be building just fine
cross-compiling.  The OCAML portion is being troublesome; this is going
to cause bug reports elsewhere soon.  The OCAML portion though can
already be disabled by setting OCAML_TOOLS=n and shouldn't have this
extra form of disabling.

Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago tools/xen-cpuid: use dashes consistently in feature names
Jan Beulich [Tue, 21 Jul 2020 12:04:59 +0000 (14:04 +0200)]
tools/xen-cpuid: use dashes consistently in feature names

We've grown to a mix of dashes and underscores - switch to consistent
naming in the hope that future additions will play by this.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wl@xen.org>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago oxenstored: fix ABI breakage introduced in Xen 4.9.0
Edwin Török [Wed, 15 Jul 2020 15:10:56 +0000 (16:10 +0100)]
oxenstored: fix ABI breakage introduced in Xen 4.9.0

dbc84d2983969bb47d294131ed9e6bbbdc2aec49 (Xen >= 4.9.0) deleted XS_RESTRICT
from oxenstored, which caused all the following opcodes to be shifted by 1:
reset_watches became off-by-one compared to the C version of xenstored.

Looking at the C code the opcode for reset watches needs:
XS_RESET_WATCHES = XS_SET_TARGET + 2

So add the placeholder `Invalid` in the OCaml<->C mapping list.
(Note that the code here doesn't simply convert the OCaml constructor to
 an integer, so we don't need to introduce a dummy constructor).

Igor says that with a suitably patched xenopsd to enable watch reset,
we now see `reset watches` during kdump of a guest in xenstored-access.log.

Signed-off-by: Edwin Török <edvin.torok@citrix.com>
Tested-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
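
The off-by-one described in the oxenstored commit above comes from deleting an entry in a position-based mapping list. The Python sketch below models only the tail of such a list; the entry names, the `base` parameter and the helper are hypothetical, and the real OCaml mapping is not reproduced here.

```python
def opcode_of(name, mapping, base=0):
    """Wire opcode of an operation: its position in the mapping list."""
    return base + mapping.index(name)


# Only the tail of the list matters for the example.
BROKEN = ["Set_target", "Reset_watches"]            # restrict slot deleted
FIXED = ["Set_target", "Invalid", "Reset_watches"]  # placeholder keeps the slot

broken_gap = opcode_of("Reset_watches", BROKEN) - opcode_of("Set_target", BROKEN)
fixed_gap = opcode_of("Reset_watches", FIXED) - opcode_of("Set_target", FIXED)
```

Only the list with the placeholder reproduces the distance of 2 that the C definition XS_RESET_WATCHES = XS_SET_TARGET + 2 requires.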
4 years ago golang/xenlight: fix code generation for python 2.6
Nick Rosbrook [Mon, 20 Jul 2020 23:54:40 +0000 (19:54 -0400)]
golang/xenlight: fix code generation for python 2.6

Before python 2.7, str.format() calls required that the format fields
were explicitly enumerated, e.g.:

  '{0} {1}'.format(foo, bar)

  vs.

  '{} {}'.format(foo, bar)

Currently, gengotypes.py uses the latter pattern everywhere, which means
the Go bindings do not build on python 2.6. Use the 2.6 syntax for
format() in order to support python 2.6 for now.

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Acked-by: Wei Liu <wl@xen.org>
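
The two str.format() spellings from the commit above can be compared directly. This is a minimal sketch with made-up values; on any interpreter that accepts both forms they produce the same string, while only the explicitly numbered form also works on Python 2.6.

```python
foo, bar = "hello", "world"

explicit = "{0} {1}".format(foo, bar)  # accepted by Python 2.6 and later
implicit = "{} {}".format(foo, bar)    # automatic numbering, not in 2.6

print(explicit)
```

Because the numbered form is a strict superset in terms of interpreter support, switching a generator script to it costs nothing on newer versions.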
4 years ago MAINTAINERS: add myself as a golang bindings maintainer
Nick Rosbrook [Thu, 16 Jul 2020 16:00:26 +0000 (12:00 -0400)]
MAINTAINERS: add myself as a golang bindings maintainer

Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
4 years ago SUPPORT.md: Spell Experimental correctly
Julien Grall [Mon, 20 Jul 2020 17:35:55 +0000 (18:35 +0100)]
SUPPORT.md: Spell Experimental correctly

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago x86emul: support AVX512_VP2INTERSECT insns
Jan Beulich [Tue, 21 Jul 2020 12:00:25 +0000 (14:00 +0200)]
x86emul: support AVX512_VP2INTERSECT insns

The standard memory access pattern once again should allow us to go
without a test harness addition beyond the EVEX Disp8-scaling one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago x86/shadow: l3table[] and gl3e[] are HVM only
Jan Beulich [Tue, 21 Jul 2020 11:59:28 +0000 (13:59 +0200)]
x86/shadow: l3table[] and gl3e[] are HVM only

... by the very fact that they're 3-level specific, while PV always gets
run in 4-level mode. This requires adding some seemingly redundant
#ifdef-s - some of them will be possible to drop again once 2- and
3-level guest code doesn't get built anymore in !HVM configs, but I'm
afraid there's still quite a bit of disentangling work to be done to
make this possible.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years ago x86/shadow: have just a single instance of sh_set_toplevel_shadow()
Jan Beulich [Tue, 21 Jul 2020 11:58:56 +0000 (13:58 +0200)]
x86/shadow: have just a single instance of sh_set_toplevel_shadow()

The only guest/shadow level dependent piece here is the call to
sh_make_shadow(). Make a pointer to the respective function an
argument of sh_set_toplevel_shadow(), allowing it to be moved to
common.c.

This implies making get_shadow_status() available to common.c; its set
and delete counterparts are moved along with it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years ago x86/shadow: shadow_table[] needs only one entry for PV-only configs
Jan Beulich [Tue, 21 Jul 2020 11:58:15 +0000 (13:58 +0200)]
x86/shadow: shadow_table[] needs only one entry for PV-only configs

Furthermore the field isn't needed at all with shadow support disabled -
move it into struct shadow_vcpu.

Introduce for_each_shadow_table(), shortening loops for the 4-level case
at the same time.

Adjust loop variables and a function parameter to be "unsigned int"
where applicable at the same time. Also move a comment that ended up
misplaced due to incremental additions.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years ago x86/shadow: dirty VRAM tracking is needed for HVM only
Jan Beulich [Tue, 21 Jul 2020 11:57:06 +0000 (13:57 +0200)]
x86/shadow: dirty VRAM tracking is needed for HVM only

Move shadow_track_dirty_vram() into hvm.c (requiring two static
functions to become non-static). More importantly though make sure we
don't de-reference d->arch.hvm.dirty_vram for a non-HVM guest. This was
a latent issue only just because the field lives far enough into struct
hvm_domain to be outside the part overlapping with struct pv_domain.

While moving shadow_track_dirty_vram() some purely typographic
adjustments are being made, like inserting missing blanks or putting
braces on their own lines.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
4 years ago docs: Replace non-UTF-8 character in hypfs-paths.pandoc
Andrew Cooper [Mon, 20 Jul 2020 16:54:52 +0000 (17:54 +0100)]
docs: Replace non-UTF-8 character in hypfs-paths.pandoc

From the docs cronjob on xenbits:

  /usr/bin/pandoc --number-sections --toc --standalone misc/hypfs-paths.pandoc --output html/misc/hypfs-paths.html
  pandoc: Cannot decode byte '\x92': Data.Text.Internal.Encoding.decodeUtf8: Invalid UTF-8 stream
  make: *** [Makefile:236: html/misc/hypfs-paths.html] Error 1

Fixes: 5a4a411bde4 ("docs: specify stability of hypfs path documentation")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago Arm: prune #include-s needed by domain.h
Jan Beulich [Wed, 15 Jul 2020 10:39:06 +0000 (12:39 +0200)]
Arm: prune #include-s needed by domain.h

asm/domain.h is a dependency of xen/sched.h, and hence should not itself
include xen/sched.h. Nor should any of the other #include-s used by it.
While at it, also drop two other #include-s that aren't needed by this
particular header.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
4 years ago docs: specify stability of hypfs path documentation
Juergen Gross [Mon, 20 Jul 2020 11:38:00 +0000 (13:38 +0200)]
docs: specify stability of hypfs path documentation

In docs/misc/hypfs-paths.pandoc the supported paths in the hypervisor
file system are specified. Make it more clear that path availability
might change, e.g. due to scope widening or narrowing (e.g. being
limited to a specific architecture).

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years ago x86/mtrr: Drop workaround for old 32bit CPUs
Andrew Cooper [Wed, 8 Jul 2020 10:12:51 +0000 (11:12 +0100)]
x86/mtrr: Drop workaround for old 32bit CPUs

This logic is dead as Xen is 64bit-only these days.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years ago x86/vmx: add Intel PT MSR definitions
Michal Leszczynski [Tue, 30 Jun 2020 12:33:44 +0000 (14:33 +0200)]
x86/vmx: add Intel PT MSR definitions

Define constants related to Intel Processor Trace features.

Signed-off-by: Michal Leszczynski <michal.leszczynski@cert.pl>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years ago compat: add a little bit of description to xlat.lst
Jan Beulich [Fri, 17 Jul 2020 15:52:14 +0000 (17:52 +0200)]
compat: add a little bit of description to xlat.lst

Requested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years ago x86/HVM: fold both instances of looking up a hvm_ioreq_vcpu with a request pending
Jan Beulich [Fri, 17 Jul 2020 15:51:07 +0000 (17:51 +0200)]
x86/HVM: fold both instances of looking up a hvm_ioreq_vcpu with a request pending

It seems pretty likely that the "break" in the loop getting replaced in
handle_hvm_io_completion() was meant to exit both nested loops at the
same time. Re-purpose what has been hvm_io_pending() to hand back the
struct hvm_ioreq_vcpu instance found, and use it to replace the two
nested loops.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
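
The refactoring idea in the commit above (a lookup helper that hands back the entry it found, so callers need no nested loops and no "break" that leaves only the inner one) can be sketched in Python. The data shapes and names below are hypothetical stand-ins for the ioreq server lists, not the actual structures.

```python
def find_pending(servers, vcpu):
    """Return the first per-vCPU entry for `vcpu` with a request pending.

    servers: iterable of lists of dicts with "vcpu" and "pending" keys
             (a hypothetical stand-in for the per-server vCPU lists).
    """
    for entries in servers:
        for entry in entries:
            if entry["vcpu"] == vcpu and entry["pending"]:
                return entry  # returning leaves both loops at once
    return None


def io_pending(servers, vcpu):
    """Boolean view for callers that only need a yes/no answer."""
    return find_pending(servers, vcpu) is not None
```

Callers that previously re-ran the nested search can work on the returned entry directly.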
4 years ago x86/HVM: re-work hvm_wait_for_io() a little
Jan Beulich [Fri, 17 Jul 2020 15:50:40 +0000 (17:50 +0200)]
x86/HVM: re-work hvm_wait_for_io() a little

Convert the function's main loop to a more ordinary one, without goto
and without initial steps not needing to be inside a loop at all.

Take the opportunity and add blank lines between case blocks.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
4 years ago x86/HVM: fold hvm_io_assist() into its only caller
Jan Beulich [Fri, 17 Jul 2020 15:50:09 +0000 (17:50 +0200)]
x86/HVM: fold hvm_io_assist() into its only caller

While there are two call sites, the function they're in can be slightly
re-arranged such that the code sequence can be added at its bottom. Note
that the function's only caller has already checked sv->pending, and
that the prior while() loop was just a slightly more fancy if()
(allowing an early break out of the construct).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
4 years agoVT-d: use clear_page() in alloc_pgtable_maddr()
Jan Beulich [Fri, 17 Jul 2020 15:49:29 +0000 (17:49 +0200)]
VT-d: use clear_page() in alloc_pgtable_maddr()

For full pages this is (meant to be) more efficient. Also change the
type and reduce the scope of the involved local variable.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agoVT-d: install sync_cache hook on demand
Jan Beulich [Fri, 17 Jul 2020 15:48:42 +0000 (17:48 +0200)]
VT-d: install sync_cache hook on demand

Instead of checking inside the hook whether any non-coherent IOMMUs are
present, simply install the hook only when this is the case.

To prove that there are no other references to the now dynamically
updated ops structure (and hence that its updating happens early
enough), make it static and rename it at the same time.

Note that this change implies that sync_cache() shouldn't be called
directly unless there are unusual circumstances, like is the case in
alloc_pgtable_maddr(), which gets invoked too early for iommu_ops to
be set already (and therefore we also need to be careful there to
avoid accessing vtd_ops later on, as it lives in .init).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agox86: restore pv_rtc_handler() invocation
Jan Beulich [Wed, 15 Jul 2020 13:46:30 +0000 (15:46 +0200)]
x86: restore pv_rtc_handler() invocation

This was lost when making the logic accessible to PVH Dom0.

While doing so make the access to the global function pointer safe
against races (as noticed by Roger): The only current user wants to be
invoked just once (but can tolerate being invoked multiple times),
zapping the pointer at that point.

Fixes: 835d8d69d96a ("x86/rtc: provide mediated access to RTC for PVH dom0")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoVT-x: simplify/clarify vmx_load_pdptrs()
Jan Beulich [Tue, 14 Jul 2020 08:00:45 +0000 (10:00 +0200)]
VT-x: simplify/clarify vmx_load_pdptrs()

* Guests outside of long mode can't have PCID enabled. Drop the
  respective check to make more obvious that there's no security issue
  (from potentially accessing past the mapped page's boundary).

* Only bits 5...31 of CR3 are relevant in 32-bit PAE mode; all others
  are ignored. The high 32 ones may in particular have remained
  unchanged after leaving long mode.

* Drop the unnecessary and badly typed local variable p.

* Don't open-code hvm_long_mode_active() (and extend this to the related
  nested VT-x code).

* Constify guest_pdptes to clarify that we're only reading from the
  page.

* Drop the "crash" label now that there's only a single path leading
  there.
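
As a rough illustration of the CR3 point above (this is not the code
from the patch, and the helper name is hypothetical), the PDPT base in
32-bit PAE mode can be derived by dropping the upper 32 bits and the
low 5 bits:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: in 32-bit PAE mode only
 * bits 5...31 of CR3 locate the page directory pointer table.
 */
static uint32_t sketch_pae_pdpt_base(uint64_t cr3)
{
    /* The cast drops bits 32...63, the mask drops bits 0...4. */
    uint32_t base = (uint32_t)cr3 & ~UINT32_C(0x1f);

    assert((base & 0x1f) == 0); /* the table is 32-byte aligned */
    return base;
}
```

Stale high bits left over from long mode therefore never reach the
address used to map the guest's PDPTEs.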

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
4 years agox86/mwait: remove unneeded local variables
Roger Pau Monné [Tue, 14 Jul 2020 07:57:17 +0000 (09:57 +0200)]
x86/mwait: remove unneeded local variables

Remove the eax and cstate local variables; the same values can be
fetched directly from acpi_processor_cx without any transformation.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agoConfig.mk: Unnail versions (for unstable branch)
Ian Jackson [Mon, 13 Jul 2020 13:50:33 +0000 (14:50 +0100)]
Config.mk: Unnail versions (for unstable branch)

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agoBranch Xen 4.15: Change version numbers
Ian Jackson [Mon, 13 Jul 2020 13:50:06 +0000 (14:50 +0100)]
Branch Xen 4.15: Change version numbers

And rerun autogen.sh.  No changes other than to versions.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
4 years agopvcalls: Document correctly and explicitely the padding for all arches
Julien Grall [Sat, 27 Jun 2020 09:55:33 +0000 (10:55 +0100)]
pvcalls: Document correctly and explicitely the padding for all arches

The specification of pvcalls suggests there is padding for 32-bit x86 at
the end of most of the structures. However, it is not described in the
public header.

Because of that all the structures would have a different size between
32-bit x86 and 64-bit x86.

For all the other architectures supported (Arm and 64-bit x86), the
structures have the same sizes because they contain implicit padding
thanks to the 64-bit alignment of the uint64_t field.

Given the specification is authoritative, the padding will now be the
same for all architectures. The potential breakage of compatibility
ought to be fine as pvcalls is still a tech preview.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years agopvcalls: Clearly spell out that the header is just a reference
Julien Grall [Sat, 27 Jun 2020 09:55:32 +0000 (10:55 +0100)]
pvcalls: Clearly spell out that the header is just a reference

A recent thread on xen-devel [1] pointed out that the header was
provided as a reference for the specification.

Unfortunately, this was never written down in xen.git so for an external
user (or a reviewer) it is not clear whether the spec or the header
should be followed when there is a conflict.

To avoid more confusion, a paragraph is added at the top of the header
to clearly spell out it is only provided for reference.

[1] https://lore.kernel.org/xen-devel/alpine.DEB.2.21.2006151343430.9074@sstabellini-ThinkPad-T480s/

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Paul Durrant <paul@xen.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
4 years agoxen: Check the alignment of the offset pased via VCPUOP_register_vcpu_info
Julien Grall [Tue, 26 May 2020 17:31:33 +0000 (18:31 +0100)]
xen: Check the alignment of the offset pased via VCPUOP_register_vcpu_info

Currently a guest is able to register any guest physical address to use
for the vcpu_info structure as long as the structure can fit in the
rest of the frame.

This means a guest can provide an address that is not aligned to the
natural alignment of the structure.

On Arm 32-bit, unaligned accesses are completely forbidden by the
hypervisor. This will result in a data abort, which is fatal.

On Arm 64-bit, unaligned accesses are only forbidden when used for
atomic accesses. As the structure contains fields (such as
evtchn_pending_self) that are updated using atomic operations, any
unaligned access will be fatal as well.

While the misalignment is only fatal on Arm, a generic check is added
as an x86 guest shouldn't sensibly pass an unaligned address (this
would result in a split lock).
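
A minimal sketch of such a check, assuming a 4 KiB frame and a stand-in
structure (the names and the layout below are hypothetical, not the
ones used by the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Stand-in structure; the real vcpu_info layout differs. */
struct sketch_vcpu_info {
    uint64_t pending_sel;
    uint32_t flags;
    uint32_t pad;
};

static_assert(sizeof(struct sketch_vcpu_info) <= SKETCH_PAGE_SIZE,
              "structure must fit in one frame");

static bool sketch_offset_ok(size_t offset)
{
    /* The structure must fit in the rest of the frame ... */
    if (offset > SKETCH_PAGE_SIZE - sizeof(struct sketch_vcpu_info))
        return false;

    /* ... and start at a multiple of its natural alignment. */
    return offset % _Alignof(struct sketch_vcpu_info) == 0;
}
```

The first condition corresponds to the check that already existed; the
second one is the kind of test this change adds.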

This is XSA-327.

Reported-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
4 years agox86/ept: flush cache when modifying PTEs and sharing page tables
Roger Pau Monné [Tue, 7 Jul 2020 12:40:11 +0000 (14:40 +0200)]
x86/ept: flush cache when modifying PTEs and sharing page tables

Modifications made to the page tables by EPT code need to be written
to memory when the page tables are shared with the IOMMU, as Intel
IOMMUs can be non-coherent and thus require changes to be written to
memory in order to be visible to the IOMMU.

In order to achieve this make sure data is written back to memory
after writing an EPT entry when the recalc bit is not set in
atomic_write_ept_entry. If such bit is set, the entry will be
adjusted and atomic_write_ept_entry will be called a second time
without the recalc bit set. Note that when splitting a super page the
new tables resulting from the split should also be written back.

Failure to do so can allow devices behind the IOMMU to access the
stale super page, or cause coherency issues as changes made by the
processor to the page tables are not visible to the IOMMU.

This allows removing the VT-d specific iommu_pte_flush helper, since
the cache write back is now performed by atomic_write_ept_entry, and
hence iommu_iotlb_flush can be used to flush the IOMMU TLB. The newly
used method (iommu_iotlb_flush) can result in fewer flushes, since it
might sometimes be called rightly with 0 flags, in which case it
becomes a no-op.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agovtd: optimize CPU cache sync
Roger Pau Monné [Tue, 7 Jul 2020 12:39:54 +0000 (14:39 +0200)]
vtd: optimize CPU cache sync

Some VT-d IOMMUs are non-coherent, which requires a cache write back
in order for the changes made by the CPU to be visible to the IOMMU.
This cache write back was unconditionally done using clflush, but there are
other more efficient instructions to do so, hence implement support
for them using the alternative framework.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/alternative: introduce alternative_2
Roger Pau Monné [Tue, 7 Jul 2020 12:39:25 +0000 (14:39 +0200)]
x86/alternative: introduce alternative_2

It's based on alternative_io_2 without inputs or outputs but with an
added memory clobber.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
4 years agovtd: don't assume addresses are aligned in sync_cache
Roger Pau Monné [Tue, 7 Jul 2020 12:39:05 +0000 (14:39 +0200)]
vtd: don't assume addresses are aligned in sync_cache

Current code in sync_cache assumes that the address passed in is
aligned to a cache line size. Fix the code to support passing in
arbitrary addresses not necessarily aligned to a cache line size.
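
To illustrate the idea (a sketch only, with hypothetical names; the
real function issues a cache write back per line rather than counting
them), the range can be widened by rounding the start address down to a
cache line boundary before walking it:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of cache lines touched by [addr, addr + size). */
static size_t sketch_lines_to_sync(uintptr_t addr, size_t size,
                                   size_t line_size)
{
    /* The line size is assumed to be a non-zero power of two. */
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    if (size == 0)
        return 0;

    /* Round the start down so a leading partial line is covered. */
    uintptr_t start = addr & ~(uintptr_t)(line_size - 1);
    uintptr_t end = addr + size; /* exclusive */

    return (end - start + line_size - 1) / line_size;
}
```

With 64-byte lines, an 8-byte write at offset 60 spans two lines,
whereas dividing the size by the line size alone would cover none.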

This is part of XSA-321.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/iommu: introduce a cache sync hook
Roger Pau Monné [Tue, 7 Jul 2020 12:38:34 +0000 (14:38 +0200)]
x86/iommu: introduce a cache sync hook

The hook is only implemented for VT-d and it uses the already existing
iommu_sync_cache function present in VT-d code. The new hook is
added so that the cache can be flushed by code outside of VT-d when
using shared page tables.

Note that alloc_pgtable_maddr must use the now locally defined
sync_cache function, because IOMMU ops are not yet set up the first
time the function gets called during IOMMU initialization.

No functional change intended.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agovtd: prune (and rename) cache flush functions
Roger Pau Monné [Tue, 7 Jul 2020 12:38:13 +0000 (14:38 +0200)]
vtd: prune (and rename) cache flush functions

Rename __iommu_flush_cache to iommu_sync_cache and remove
iommu_flush_cache_page. Also remove the iommu_flush_cache_entry
wrapper and just use iommu_sync_cache instead. Note the _entry suffix
was meaningless as the wrapper was already taking a size parameter in
bytes. While there also constify the addr parameter.

No functional change intended.

This is part of XSA-321.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agovtd: improve IOMMU TLB flush
Jan Beulich [Tue, 7 Jul 2020 12:37:46 +0000 (14:37 +0200)]
vtd: improve IOMMU TLB flush

Do not limit PSI flushes to order 0 pages, in order to avoid doing a
full TLB flush if the passed in page has an order greater than 0 and
is aligned. Should increase the performance of IOMMU TLB flushes when
dealing with page orders greater than 0.
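
The alignment condition can be sketched as follows. This assumes that
eligibility for a page-selective flush depends only on the frame number
being aligned to the requested order; the helper name is hypothetical,
and the real code may apply further conditions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if pfn is aligned to 2^order pages. */
static bool sketch_psi_aligned(uint64_t pfn, unsigned int order)
{
    assert(order < 64); /* keep the shift below well defined */

    return (pfn & ((UINT64_C(1) << order) - 1)) == 0;
}
```

An order 9 (2 MiB) flush at frame 0x200 passes this test, while the
same order at frame 0x201 does not and would need a wider flush.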

This is part of XSA-321.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/ept: atomically modify entries in ept_next_level
Roger Pau Monné [Tue, 7 Jul 2020 12:37:12 +0000 (14:37 +0200)]
x86/ept: atomically modify entries in ept_next_level

ept_next_level was passing a live PTE pointer to ept_set_middle_entry,
which was then modified without taking into account that the PTE could
be part of a live EPT table. This wasn't a security issue because the
pages returned by p2m_alloc_ptp are zeroed, so adding such an entry
before actually initializing it didn't allow a guest to access
physical memory addresses it wasn't supposed to access.

This is part of XSA-328.

Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86/EPT: ept_set_middle_entry() related adjustments
Jan Beulich [Tue, 7 Jul 2020 12:36:52 +0000 (14:36 +0200)]
x86/EPT: ept_set_middle_entry() related adjustments

ept_split_super_page() wants to further modify the newly allocated
table, so have ept_set_middle_entry() return the mapped pointer rather
than tearing it down only for it to be re-established right away.

Similarly ept_next_level() wants to hand back a mapped pointer of
the next level page, so re-use the one established by
ept_set_middle_entry() in case that path was taken.

Pull the setting of suppress_ve ahead of insertion into the higher level
table, and don't have ept_split_super_page() set the field a 2nd time.

This is part of XSA-328.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
4 years agox86/shadow: correct an inverted conditional in dirty VRAM tracking
Jan Beulich [Tue, 7 Jul 2020 12:36:24 +0000 (14:36 +0200)]
x86/shadow: correct an inverted conditional in dirty VRAM tracking

This originally was "mfn_x(mfn) == INVALID_MFN". Make it like this
again, taking the opportunity to also drop the unnecessary nearby
braces.

This is XSA-319.

Fixes: 246a5a3377c2 ("xen: Use a typesafe to define INVALID_MFN")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
4 years agoxen/common: event_channel: Don't ignore error in get_free_port()
Julien Grall [Thu, 19 Mar 2020 13:17:31 +0000 (13:17 +0000)]
xen/common: event_channel: Don't ignore error in get_free_port()

Currently, get_free_port() is assuming that the port has been allocated
when evtchn_allocate_port() does not return -EBUSY.

However, the function may return an error when:
    - We exhausted all the event channels. This can happen if the limit
    configured by the administrator for the guest ('max_event_channels'
    in xl cfg) is higher than the ABI used by the guest. For instance,
    if the guest is using 2L, the limit should not be higher than 4095.
    - We cannot allocate memory (e.g. Xen has no more memory).

Users of get_free_port() (such as EVTCHNOP_alloc_unbound) will
legitimately assume the port is valid and will next call
evtchn_from_port(). This will result in a crash as the memory backing
the event channel structure is not present.
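
The corrected control flow can be sketched like this. It is a
simplified model, not the patch itself: the array stands in for the
result of each allocation attempt, and all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * alloc_result[port] models what an allocation attempt for that port
 * would return: 0 on success, -EBUSY if the port is in use, or another
 * negative errno value on a real failure.
 */
static int sketch_get_free_port(const int *alloc_result, int nr_ports)
{
    assert(alloc_result != NULL);

    for (int port = 0; port < nr_ports; port++) {
        int rc = alloc_result[port];

        if (rc == 0)
            return port;   /* allocated, hand the port back */
        if (rc != -EBUSY)
            return rc;     /* real error: propagate, don't ignore */
        /* -EBUSY: the port is taken, try the next one */
    }

    return -ENOSPC;        /* every port was busy */
}
```

Only -EBUSY means "keep looking"; treating every other value as
success is the mistake described above.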

Fixes: 368ae9a05fe ("xen/pvshim: forward evtchn ops between L0 Xen and L2 DomU")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
4 years agox86emul: fix FXRSTOR test for most AMD CPUs
Jan Beulich [Mon, 6 Jul 2020 15:14:24 +0000 (17:14 +0200)]
x86emul: fix FXRSTOR test for most AMD CPUs

AMD CPUs that we classify as X86_BUG_FPU_PTRS don't touch the selector/
offset portion of the save image during FXSAVE unless an unmasked
exception is pending. Hence the selector zapping done between the
initial FXSAVE and the emulated FXRSTOR needs to be mirrored onto the
second FXSAVE, output of which gets fed into memcmp() to compare with
the input image.

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoConfig: Update QEMU
Anthony PERARD [Fri, 3 Jul 2020 13:55:33 +0000 (14:55 +0100)]
Config: Update QEMU

Backport 2 commits to fix building QEMU without PCI passthrough
support.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agokdd: fix build again
Wei Liu [Fri, 3 Jul 2020 20:10:01 +0000 (20:10 +0000)]
kdd: fix build again

Restore Tim's patch. The one that was committed was recreated by me
because git didn't accept my saved copy. I made some mistakes while
recreating that patch and here we are.

Fixes: 3471cafbdda3 ("kdd: stop using [0] arrays to access packet contents")
Reported-by: Michael Young <m.a.young@durham.ac.uk>
Signed-off-by: Wei Liu <wl@xen.org>
Reviewed-by: Tim Deegan <tim@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agobuild: tweak variable exporting for make 3.82
Jan Beulich [Thu, 2 Jul 2020 09:11:40 +0000 (11:11 +0200)]
build: tweak variable exporting for make 3.82

While I've been running into an issue here only because of an additional
local change I'm carrying, to be able to override just the compiler in
$(XEN_ROOT)/.config (rather than the whole tool chain), in
config/StdGNU.mk:

ifeq ($(filter-out default undefined,$(origin CC)),)

I'd like to propose to nevertheless correct the underlying issue:
Exporting an unset variable changes its origin from "undefined" to
"file". This comes into effect because of our adding of -rR to
MAKEFLAGS, which make 3.82 wrongly applies also upon re-invoking itself
after having updated auto.conf{,.cmd}.

Move the export statement past $(XEN_ROOT)/config/$(XEN_OS).mk inclusion
(which happens through $(XEN_ROOT)/Config.mk) such that the variables
already have their designated values at that point, while retaining
their initial origin up to the point they get defined.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/tlb: fix assisted flush usage
Roger Pau Monné [Thu, 2 Jul 2020 09:05:53 +0000 (11:05 +0200)]
x86/tlb: fix assisted flush usage

Commit e9aca9470ed86 introduced a regression when avoiding sending
IPIs for certain flush operations. Xen page fault handler
(spurious_page_fault) relies on blocking interrupts in order to
prevent handling TLB flush IPIs and thus prevent other CPUs from
removing page table pages. Switching to assisted flushing avoided such
IPIs, and thus can result in pages belonging to the page tables being
removed (and possibly re-used) while __page_fault_type is being
executed.

Force some of the TLB flushes to use IPIs, thus avoiding the assisted
TLB flush. Those selected flushes are the page type change (when
switching from a page table type to a different one, ie: a page that
has been removed as a page table) and page allocation. This sadly has
a negative performance impact on the pvshim, as fewer assisted flushes
can be used. Note the flush in grant-table code is also switched to
use an IPI even when not strictly needed. This is done so that a
common arch_flush_tlb_mask can be introduced and always used in common
code.

Introduce a new flag (FLUSH_FORCE_IPI) and helper to force a TLB flush
using an IPI (x86 only). Note that the flag is only meaningfully defined
when the hypervisor supports PV or shadow paging mode, as otherwise
hardware assisted paging domains are in charge of their page tables and
won't share page tables with Xen, thus not influencing the result of
page walks performed by the spurious fault handler.

Just passing this new flag when calling flush_area_mask prevents the
usage of the assisted flush without any other side effects.

Note the flag is not defined on Arm.

Fixes: e9aca9470ed86 ('x86/tlb: use Xen L0 assisted TLB flush when available')
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agooptee: allow plain TMEM buffers with NULL address
Volodymyr Babchuk [Fri, 19 Jun 2020 22:34:01 +0000 (22:34 +0000)]
optee: allow plain TMEM buffers with NULL address

Trusted Applications use a popular approach to determine the required
size of a buffer: the client provides a memory reference with the NULL
pointer to a buffer. This is a so-called "Null memory reference". The TA
updates the reference with the required size and returns it back to the
client. Then the client allocates a buffer of the needed size and
repeats the operation.

This behavior is described in TEE Client API Specification, paragraph
3.2.5. Memory References.

OP-TEE represents this null memory reference as a TMEM parameter with
buf_ptr = 0x0. This is the only case when we should allow a TMEM
buffer without the OPTEE_MSG_ATTR_NONCONTIG flag. This is also the
special case for a buffer with OPTEE_MSG_ATTR_NONCONTIG flag.

This could lead to a potential issue, because IPA 0x0 is a valid
address, but OP-TEE will treat it as a special case. So, care should
be taken when constructing an OP-TEE enabled guest to make sure that
such a guest has no memory at IPA 0x0 and none of its memory is mapped
at PA 0x0.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agooptee: immediately free buffers that are released by OP-TEE
Volodymyr Babchuk [Fri, 19 Jun 2020 22:33:59 +0000 (22:33 +0000)]
optee: immediately free buffers that are released by OP-TEE

Normal World can share a buffer with OP-TEE for two reasons:
1. A client application wants to exchange data with TA
2. OP-TEE asks for shared buffer for internal needs

The second case was handled more strictly than necessary:

1. In RPC request OP-TEE asks for buffer
2. NW allocates buffer and provides it via RPC response
3. Xen pins pages and translates data
4. Xen provides buffer to OP-TEE
5. OP-TEE uses it
6. OP-TEE sends request to free the buffer
7. NW frees the buffer and sends the RPC response
8. Xen unpins pages and forgets about the buffer

The problem is that Xen should forget about the buffer in between stages 6
and 7. I.e. the right flow should be like this:

6. OP-TEE sends request to free the buffer
7. Xen unpins pages and forgets about the buffer
8. NW frees the buffer and sends the RPC response

This is because OP-TEE internally frees the buffer before sending the
"free SHM buffer" request. So we have no reason to hold reference for
this buffer anymore. Moreover, in multiprocessor systems NW has time
to reuse the buffer cookie for another buffer. Xen complained about this
and denied the new buffer registration. I have seen this issue while
running tests on iMX SoC.

So, this patch basically corrects that behavior by freeing the buffer
earlier, when handling RPC return from OP-TEE.

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/spec-ctrl: Protect against CALL/JMP straight-line speculation
Andrew Cooper [Wed, 1 Jul 2020 11:39:59 +0000 (12:39 +0100)]
x86/spec-ctrl: Protect against CALL/JMP straight-line speculation

Some x86 CPUs speculatively execute beyond indirect CALL/JMP instructions.

With CONFIG_INDIRECT_THUNK / Retpolines, indirect CALL/JMP instructions are
converted to direct CALL/JMP's to __x86_indirect_thunk_REG(), leaving just a
handful of indirect JMPs implementing those stubs.

There is no architectural execution beyond an indirect JMP, so use INT3 as
recommended by vendors to halt speculative execution.  This is shorter than
LFENCE (which would also work fine), but also shows up in logs if we do
unexpectedly execute them.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agokconfig: fix typo in XEN_SHSTK description
Olaf Hering [Tue, 30 Jun 2020 10:21:19 +0000 (12:21 +0200)]
kconfig: fix typo in XEN_SHSTK description

Rename 'vai' to 'via'.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agomm: fix public declaration of struct xen_mem_acquire_resource
Roger Pau Monné [Mon, 29 Jun 2020 16:03:49 +0000 (18:03 +0200)]
mm: fix public declaration of struct xen_mem_acquire_resource

XENMEM_acquire_resource and its related structure are currently inside
a __XEN__ or __XEN_TOOLS__ guarded section to limit their scope to the
hypervisor or the toolstack only. This is wrong as the hypercall is
already being used by the Linux kernel at least, and as such needs to
be public.

Also switch the usage of uint64_aligned_t to plain uint64_t, as
uint64_aligned_t is only to be used by the toolstack. Doing such
change will reduce the size of the structure on 32bit x86 by 4 bytes,
since there will be no padding added after the frame_list handle.

This is fine, as users of the previous layout will allocate 4 bytes of
padding that won't be read by Xen, and users of the new layout won't
allocate those, which is also fine since Xen won't try to access them.

Note that the structure already has compat handling, and such handling
will take care of copying the right size (ie: minus the padding) when
called from a 32bit x86 context. This is true for the compat code both
before and after this patch, since the structures in the memory.h
compat header are subject to a pragma pack(4), which already removed
the trailing padding that would otherwise be introduced by the
alignment of the frame field to 8 bytes.

Fixes: 3f8f12281dd20 ('x86/mm: add HYPERVISOR_memory_op to acquire guest resources')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agoxsm: Drop trailing whitespace from build scripts
Andrew Cooper [Fri, 26 Jun 2020 16:48:49 +0000 (17:48 +0100)]
xsm: Drop trailing whitespace from build scripts

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/boot: Don't disable PV32 when XEN_SHSTK is compiled out
Andrew Cooper [Fri, 26 Jun 2020 10:30:55 +0000 (11:30 +0100)]
x86/boot: Don't disable PV32 when XEN_SHSTK is compiled out

There is no need to automatically disable PV32 support on SHSTK-capable
hardware if Xen isn't actually using the feature.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agochangelog: Add notes about CET and Migration changes
Andrew Cooper [Fri, 26 Jun 2020 14:35:27 +0000 (15:35 +0100)]
changelog: Add notes about CET and Migration changes

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/livepatch: Make livepatching compatible with CET Shadow Stacks
Andrew Cooper [Mon, 8 Jun 2020 17:47:58 +0000 (18:47 +0100)]
x86/livepatch: Make livepatching compatible with CET Shadow Stacks

Just like the alternatives infrastructure, the livepatch infrastructure
disables CR0.WP to perform patching, which is not permitted with CET active.

Modify arch_livepatch_{quiesce,revive}() to disable CET before disabling WP,
and reset the dirty bits on all virtual regions before re-enabling CET.

One complication is that arch_livepatch_revive() has to fix up the top of the
shadow stack.  This depends on the functions not being inlined, even under
LTO.  Another limitation is that reset_virtual_region_perms() may shatter the
final superpage of .text depending on alignment.

This logic, and its downsides, are temporary until the patching infrastructure
can be adjusted to not use CR0.WP.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agox86/msr: Disallow access to Processor Trace MSRs
Andrew Cooper [Fri, 19 Jun 2020 11:14:32 +0000 (12:14 +0100)]
x86/msr: Disallow access to Processor Trace MSRs

We do not expose the feature to guests, so should disallow access to the
respective MSRs.  For simplicity, drop the entire block of MSRs, not just the
subset which have been specified thus far.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wl@xen.org>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agovchan-socket-proxy: Handle closing shared input/output_fd
Jason Andryuk [Thu, 11 Jun 2020 03:29:36 +0000 (23:29 -0400)]
vchan-socket-proxy: Handle closing shared input/output_fd

input_fd & output_fd may be the same FD.  In that case, mark both as -1
when closing one.  That avoids a dangling FD reference.
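
A minimal sketch of the pattern, using a hypothetical state structure
and helper rather than the proxy's actual definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the proxy state. */
struct sketch_proxy_state {
    int input_fd;
    int output_fd;
};

static void sketch_close_input(struct sketch_proxy_state *s)
{
    assert(s != NULL);

    if (s->input_fd == -1)
        return;

    close(s->input_fd);

    /* Both fields may name the same descriptor: invalidate both. */
    if (s->output_fd == s->input_fd)
        s->output_fd = -1;
    s->input_fd = -1;
}
```

Without the comparison, output_fd would keep a number that is already
closed and could later be reused for an unrelated descriptor.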

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agovchan-socket-proxy: Cleanup resources on exit
Jason Andryuk [Thu, 11 Jun 2020 03:29:35 +0000 (23:29 -0400)]
vchan-socket-proxy: Cleanup resources on exit

Close open FDs and close the vchan connection when exiting the program.

This addresses some Coverity findings about leaking file descriptors.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agovchan-socket-proxy: Set closed FDs to -1
Jason Andryuk [Thu, 11 Jun 2020 03:29:34 +0000 (23:29 -0400)]
vchan-socket-proxy: Set closed FDs to -1

These FDs are closed, so set them to -1 so they are no longer valid.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agovchan-socket-proxy: Switch data_loop() to take state
Jason Andryuk [Thu, 11 Jun 2020 03:29:33 +0000 (23:29 -0400)]
vchan-socket-proxy: Switch data_loop() to take state

Switch data_loop to take a pointer to vchan_proxy_state.

No functional change.

This removes a dead store to input_fd identified by Coverity.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>
4 years agovchan-socket-proxy: Use a struct to store state
Jason Andryuk [Thu, 11 Jun 2020 03:29:32 +0000 (23:29 -0400)]
vchan-socket-proxy: Use a struct to store state

Use a struct to group the vchan ctrl and FDs.  This will facilitate
tracking the state of open and closed FDs and ctrl in data_loop().

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Release-acked-by: Paul Durrant <paul@xen.org>