Enable CPU erratum of Speculative AT on the Neoverse N1 processor
versions r0p0 to r2p0.
Also fix the Cortex A76 erratum string, which had a wrong erratum number.
Roger Pau Monne [Mon, 17 Aug 2020 15:57:52 +0000 (17:57 +0200)]
x86/pv: handle writes to the EFER MSR
Silently drop writes to the EFER MSR for PV guests if the value is
unchanged from what is being reported. Current PV Linux will attempt
to write to the MSR with the same value that's been read, and raising
a fault will result in a guest crash.
As part of this work introduce a helper to easily get the EFER value
reported to guests.
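A minimal sketch of the write handling described above. The names
pv_efer_write_sketch() and the result codes are hypothetical and only
illustrate the idea; the sketch also assumes that any write which would
change the reported value is still refused with a fault.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result codes; Xen's real MSR handlers use their own. */
enum efer_write_result { EFER_WRITE_DROPPED, EFER_WRITE_FAULT };

/*
 * Sketch: a write that leaves the reported EFER value unchanged is
 * dropped silently; any other write is reported as a fault.
 */
static enum efer_write_result pv_efer_write_sketch(uint64_t reported,
                                                   uint64_t val)
{
    if ( val == reported )
        return EFER_WRITE_DROPPED;

    return EFER_WRITE_FAULT;
}

/* Usage: writing back what was read is harmless, changing a bit is not. */
static void pv_efer_write_example(void)
{
    const uint64_t reported = 0x501; /* arbitrary illustrative value */

    assert(pv_efer_write_sketch(reported, reported) == EFER_WRITE_DROPPED);
    assert(pv_efer_write_sketch(reported, reported ^ 1) == EFER_WRITE_FAULT);
}
```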
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Mon, 17 Aug 2020 18:45:47 +0000 (19:45 +0100)]
tools/ocaml/xenstored: drop select based socket watching
Poll has been the default since 2014, I think we can safely say by now
that poll() works and we don't need to fall back to select().
This will allow fixing up the way we call poll to be more efficient
(and pave the way for introducing epoll support):
currently poll wraps the select API, which is inefficient.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
xen/arm: cmpxchg: Add missing memory barriers in __cmpxchg_mb_timeout()
The function __cmpxchg_mb_timeout() was intended to have the same
semantics as __cmpxchg_mb(). Unfortunately, the memory barriers were
not added when first implemented.
There is no known issue with the existing callers, but the barriers are
added given this is the expected semantics in Xen.
The issue was introduced by XSA-295.
Backport: 4.8+ Fixes: 86b0bc958373 ("xen/arm: cmpxchg: Provide a new helper that can timeout") Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Roger Pau Monné [Wed, 12 Aug 2020 12:47:05 +0000 (14:47 +0200)]
x86/hvm: change EOI exit bitmap helper parameter
Change the last parameter of the update_eoi_exit_bitmap helper to be a
set/clear boolean instead of a triggering field. This is already
in line with how the function is implemented, and will allow deciding
whether an exit is required by the higher layers that call into
update_eoi_exit_bitmap. Note that the current behavior is not changed
by this patch.
No functional change intended.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Don Slutz [Sun, 9 Aug 2020 18:22:34 +0000 (14:22 -0400)]
rpmball: Adjust to new rpm, do not require --force
Also prevent warning: directory /boot: remove failed
Before:
[root@TestCloud1 xen]# rpm -hiv dist/xen*rpm
Preparing... ################################# [100%]
file /boot from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/bin from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/lib from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/lib64 from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
file /usr/sbin from install of xen-4.15-unstable.x86_64 conflicts with file from package filesystem-3.2-25.el7.x86_64
[root@TestCloud1 xen]# rpm -e xen
warning: directory /boot: remove failed: Device or resource busy
Paul Durrant [Tue, 4 Aug 2020 13:41:59 +0000 (14:41 +0100)]
x86/iommu: convert AMD IOMMU code to use new page table allocator
This patch converts the AMD IOMMU code to use the new page table allocator
function. This allows all the free-ing code to be removed (since it is now
handled by the general x86 code) which reduces TLB and cache thrashing as well
as shortening the code.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 4 Aug 2020 13:41:57 +0000 (14:41 +0100)]
x86/iommu: add common page-table allocator
Instead of having separate page table allocation functions in VT-d and AMD
IOMMU code, we could use a common allocation function in the general x86 code.
This patch adds a new allocation function, iommu_alloc_pgtable(), for this
purpose. The function adds the page table pages to a list. The pages in this
list are then freed by iommu_free_pgtables(), which is called by
domain_relinquish_resources() after PCI devices have been de-assigned.
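The allocate-onto-a-list, free-all-at-teardown pattern can be sketched
as below. The types and function names are hypothetical stand-ins (with
calloc()/free() in place of Xen's page allocator), not the actual
iommu_alloc_pgtable()/iommu_free_pgtables() code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page table page tracked on a list. */
struct pgtable_page_sketch {
    struct pgtable_page_sketch *next;
};

/* Hypothetical per-domain list of allocated page table pages. */
struct pgtable_list_sketch {
    struct pgtable_page_sketch *head;
};

/* Allocate one zeroed page table page and remember it on the list. */
static struct pgtable_page_sketch *
alloc_pgtable_sketch(struct pgtable_list_sketch *list)
{
    struct pgtable_page_sketch *pg;

    assert(list != NULL);

    pg = calloc(1, sizeof(*pg));
    if ( pg == NULL )
        return NULL;

    pg->next = list->head;
    list->head = pg;

    return pg;
}

/* Free every page on the list in one pass; returns how many were freed. */
static unsigned int free_pgtables_sketch(struct pgtable_list_sketch *list)
{
    unsigned int freed = 0;

    while ( list->head != NULL )
    {
        struct pgtable_page_sketch *pg = list->head;

        list->head = pg->next;
        free(pg);
        freed++;
    }

    return freed;
}
```

With this shape, callers never free individual pages; teardown only has
to walk the list once.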
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 4 Aug 2020 13:41:56 +0000 (14:41 +0100)]
x86/iommu: re-arrange arch_iommu to separate common fields...
... from those specific to VT-d or AMD IOMMU, and put the latter in a union.
There is no functional change in this patch, although the initialization of
the 'mapped_rmrrs' list occurs slightly later in iommu_domain_init() since
it is now done (correctly) in VT-d specific code rather than in general x86
code.
NOTE: I have not combined the AMD IOMMU 'root_table' and VT-d 'pgd_maddr'
fields even though they perform essentially the same function. The
concept of 'root table' in the VT-d code is different from that in the
AMD code so attempting to use a common name will probably only serve
to confuse the reader.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
David Woodhouse [Thu, 19 Mar 2020 20:40:24 +0000 (20:40 +0000)]
tools/xenstore: Do not abort xenstore-ls if a node disappears while iterating
The do_ls() function has somewhat inconsistent handling of errors.
If reading the node's contents with xs_read() fails, then do_ls() will
just quietly not display the contents.
If reading the node's permissions with xs_get_permissions() fails, then
do_ls() will print a warning, continue, and ultimately won't exit with
an error code (unless another error happens).
If recursing into the node with xs_directory() fails, then do_ls() will
abort immediately, not printing any further nodes.
For persistent failure modes — such as ENOENT because a node has been
removed, or EACCES because it has had its permissions changed since the
xs_directory() on the parent directory returned its name — it's
obviously quite likely that if either of the first two errors occur for
a given node, then so will the third and thus xenstore-ls will abort.
The ENOENT one is actually a fairly common case, and has caused tools to
fail to clean up a network device because it *apparently* already
doesn't exist in xenstore.
There is a school of thought that says, "Well, xenstore-ls returned an
error. So the tools should not trust its output."
The natural corollary of this would surely be that the tools must re-run
xenstore-ls as many times as is necessary until it manages to exit
without hitting the race condition. I am not keen on that conclusion.
For the specific case of ENOENT it seems reasonable to declare that,
but for the timing, we might as well just not have seen that node at
all when calling xs_directory() for the parent. By ignoring the error,
we give acceptable output.
The issue can be reproduced as follows:
(dom0) # for a in `seq 1 1000` ; do
xenstore-write /local/domain/2/foo/$a $a ;
done
Now simultaneously:
(dom0) # for a in `seq 1 999` ; do
xenstore-rm /local/domain/2/foo/$a ;
done
(dom2) # while true ; do
./xenstore-ls -p /local/domain/2/foo | grep -c 1000 ;
done
We should expect to see node 1000 in the output, every time.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Paul Durrant [Thu, 13 Aug 2020 10:35:53 +0000 (11:35 +0100)]
x86/viridian: remove the viridian_vcpu msg_pending bit mask
The mask does not actually serve a useful purpose as we only use the SynIC
for timer messages. Dropping the mask means that the EOM MSR handler
essentially becomes a no-op. This means we can avoid setting 'message_pending'
for timer messages and hence avoid a VMEXIT for the EOM.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Trammell Hudson [Wed, 12 Aug 2020 17:42:48 +0000 (17:42 +0000)]
x86/setup: Ignore early boot parameters like no-real-mode
There are parameters in xen/arch/x86/boot/cmdline.c that
are only used early in the boot process, so handlers are
necessary to avoid an "Unknown command line option" in
dmesg.
This also updates ignore_param() to generate a temporary
variable name so that the macro can be used more than once
per file.
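One way to generate such a temporary name is token pasting with the
__COUNTER__ extension (GCC and Clang). The sketch below assumes that
approach; the struct, the IGNORE_PARAM_SKETCH() macro and the second
option string are hypothetical and not the real ignore_param()
definition.

```c
#include <assert.h>

/* Two-level paste so that __COUNTER__ is expanded before pasting. */
#define PASTE_SKETCH_(a, b)     a ## b
#define PASTE_SKETCH(a, b)      PASTE_SKETCH_(a, b)
#define TEMP_NAME_SKETCH(base)  PASTE_SKETCH(base, __COUNTER__)

/* Hypothetical record for a parameter that is accepted but ignored. */
struct ignored_param_sketch {
    const char *name;
};

/* Each use defines a variable with a distinct, generated name. */
#define IGNORE_PARAM_SKETCH(str)                                      \
    static const struct ignored_param_sketch                          \
        __attribute__((unused)) TEMP_NAME_SKETCH(ignored_param_) = { str }

/* Two uses in one file no longer clash. */
IGNORE_PARAM_SKETCH("no-real-mode");
IGNORE_PARAM_SKETCH("example-early-option"); /* hypothetical option */

/* Each use above consumed one counter value, so at least two are gone. */
enum { TEMP_NAMES_USED_SKETCH = __COUNTER__ };
static_assert(TEMP_NAMES_USED_SKETCH >= 2, "each use takes a fresh name");
```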
Signed-off-by: Trammell hudson <hudson@trmm.net> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Leave note to stop TEMP_NAME() finding more general use] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 7 Aug 2020 11:32:11 +0000 (13:32 +0200)]
x86/EFI: sanitize build logic
With changes done over time and as far as linking goes, the only special
thing about building with EFI support enabled is the need for the dummy
relocations object for xen.gz uniformly in all build stages. All other
efi/*.o can be consumed from the built_in*.o files.
In efi/Makefile, besides moving relocs-dummy.o to "extra", also properly
split between obj-y and obj-bin-y.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 7 Aug 2020 11:14:02 +0000 (13:14 +0200)]
x86: slightly re-arrange 32-bit handling in dom0_construct_pv()
Add #ifdef-s (the 2nd one will be needed in particular, to guard the
uses of m2p_compat_vstart and HYPERVISOR_COMPAT_VIRT_START()) and fold
duplicate uses of elf_32bit().
Also adjust what gets logged: Avoid "compat32" when support isn't built
in, and don't assume ELF class <> ELFCLASS64 means ELFCLASS32.
While doing this, in code getting touched anyway:
- use ROUNDUP() instead of open-coding it,
- drop a stale (dead) BUG_ON(),
- replace panic() by printk() plus error return, for being consistent
with other code.
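As an illustration of the ROUNDUP() point, assuming the usual
power-of-two definition (the macro below is a local stand-in, not copied
from Xen's headers):

```c
#include <assert.h>

/* Assumed definition for this sketch; only valid for power-of-two 'a'. */
#define ROUNDUP_SKETCH(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* The open-coded form that such a macro replaces. */
static unsigned long round_up_open_coded(unsigned long x, unsigned long a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Same computation, with the intent visible at the call site. */
static unsigned long round_up_with_macro(unsigned long x, unsigned long a)
{
    assert(a != 0 && (a & (a - 1)) == 0); /* power of two only */

    return ROUNDUP_SKETCH(x, a);
}
```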
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
The originally used sed expression converted not just multiple leading
zeroes (as intended), but also trailing ones, rendering the error
message somewhat confusing. Collapse zeroes in just the one place where
we need them collapsed, and leave objdump's output as is for all other
purposes.
Fixes: 48115d14743e ("Move more kernel decompression bits to .init.* sections") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 7 Aug 2020 11:12:00 +0000 (13:12 +0200)]
build: work around bash issue
Older bash (observed with 3.2.57(2)) fails to honor "set -e" for certain
built-in commands ("while" here), despite the command's status correctly
being non-zero. The subsequent objcopy invocation now being separated by
a semicolon results in no failure. Insert an explicit "exit" (replacing
; by && ought to be another possible workaround).
Fixes: e321576f4047 ("xen/build: start using if_changed") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
during boot. The units on the first line are Hz, not MHz, so correct that and
add a space for clarity.
Also, for the min/max line, use three dots instead of two and add more spaces
so that the line can't be mistaken for being a double decimal point typo.
Andrew Cooper [Wed, 5 Aug 2020 11:05:27 +0000 (12:05 +0100)]
x86/ioapic: Fix fixmap error path logic in ioapic_init_mappings()
In the case that bad_ioapic_register() fails, the current position of idx++
means that clear_fixmap(idx) will be called with the wrong index, and not
clean up the mapping just created.
Increment idx as part of the loop, rather than midway through the loop body.
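The shape of the fix can be sketched as follows. Everything here is a
simplified stand-in (a boolean array instead of real fixmap slots, a
'bad' flag instead of bad_ioapic_register()); only the placement of the
idx increment matters.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IOAPICS_SKETCH 4

/* Stand-in for fixmap slots: true while a mapping is in place. */
static bool fixmap_in_use[NR_IOAPICS_SKETCH];

static void set_fixmap_sketch(unsigned int idx)
{
    assert(idx < NR_IOAPICS_SKETCH);
    fixmap_in_use[idx] = true;
}

static void clear_fixmap_sketch(unsigned int idx)
{
    assert(idx < NR_IOAPICS_SKETCH);
    fixmap_in_use[idx] = false;
}

/* bad[i] models a failed register check; returns mappings left in place. */
static unsigned int init_mappings_sketch(const bool *bad)
{
    unsigned int i, idx, in_use = 0;

    for ( i = 0; i < NR_IOAPICS_SKETCH; i++ )
        fixmap_in_use[i] = false;

    /* idx advances with the loop, not midway through the body. */
    for ( i = 0, idx = 0; i < NR_IOAPICS_SKETCH; i++, idx++ )
    {
        set_fixmap_sketch(idx);

        if ( bad[i] )
            clear_fixmap_sketch(idx); /* same idx that was just mapped */
    }

    for ( i = 0; i < NR_IOAPICS_SKETCH; i++ )
        in_use += fixmap_in_use[i];

    return in_use;
}
```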
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 5 Aug 2020 08:30:18 +0000 (10:30 +0200)]
x86emul: correct AVX512_BF16 insn names in EVEX Disp8 test
The leading 'v' ought to be omitted from the table entries.
Fixes: 7ff66809ccd5 ("x86emul: support AVX512_BF16 insns") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:29:18 +0000 (10:29 +0200)]
x86emul: AVX512PF insns aren't memory accesses
These are prefetches, so should be treated just like other prefetches.
Fixes: 467e91bde720 ("x86emul: support AVX512PF insns") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:28:40 +0000 (10:28 +0200)]
x86emul: AVX512F scatter insns are memory writes
While the custom handling renders the "to_mem" field generally unused,
x86_insn_is_mem_write() still (indirectly) consumes that information,
and hence the table entries want to be correct.
Fixes: 7d569b848036 ("x86emul: support AVX512F scatter insns") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:28:01 +0000 (10:28 +0200)]
x86emul: AVX512{F,BW} down conversion moves are memory writes
For this to be properly reported, the case labels need to move to a
different switch() block.
Fixes: 30e0bdf79828 ("x86emul: support AVX512{F,BW} down conversion moves") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:26:11 +0000 (10:26 +0200)]
x86emul: adjustments to mem access / write logic testing
The combination of specifying a ModR/M byte with the upper two bits set
and the modrm field set to T is pointless - the same test will be
executed twice, i.e. overall things will be slower for no extra gain. I
can only assume this was a copy-and-paste-without-enough-editing mistake
of mine.
Furthermore adjust the base type of a few bit fields to shrink table
size, as subsequently quite a few new entries will get added to the
tables using this type.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 5 Aug 2020 08:20:59 +0000 (10:20 +0200)]
x86emul: further FPU env testing relaxation for AMD-like CPUs
See the code comment that's being extended. Additionally a few more
zap_fpsel() invocations are needed - whenever we stored state after
there potentially having been a context switch behind our backs.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 3 Aug 2020 08:06:32 +0000 (10:06 +0200)]
libxl: avoid golang building without CONFIG_GOLANG=y
While this doesn't address the real problem I've run into (attempting to
update r/o source files), not recursing into tools/golang/xenlight/ is
enough to fix the build for me for the moment. I don't currently see why 60db5da62ac0 ("libxl: Generate golang bindings in libxl Makefile") found
it necessary to invoke this build step unconditionally.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Mon, 3 Aug 2020 14:27:22 +0000 (16:27 +0200)]
x86emul: avoid assembler warning about .type not taking effect in test harness
gcc re-orders top level blocks by default when optimizing. This
re-ordering results in all our .type directives getting emitted to the
assembly file first, followed by gcc's. The assembler warns about
attempts to change the type of a symbol when it was already set (and
when there's no intervening setting to "notype").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Fri, 31 Jul 2020 15:43:31 +0000 (17:43 +0200)]
x86/hvm: simplify 'mmio_direct' check in epte_get_entry_emt()
Re-factor the code to take advantage of the fact that the APIC access page is
a 'special' page. The VMX code is left alone and hence the APIC access page is
still inserted into the P2M with type p2m_mmio_direct. This is left alone as it
is not obvious there is another suitable type to use, and the necessary
re-ordering in epte_get_entry_emt() is straightforward.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Fri, 31 Jul 2020 15:42:47 +0000 (17:42 +0200)]
x86/hvm: set 'ipat' in EPT for special pages
All non-MMIO ranges (i.e. those not mapping real device MMIO regions) that
map valid MFNs are normally marked MTRR_TYPE_WRBACK and 'ipat' is set. Hence
when PV drivers running in a guest populate the BAR space of the Xen Platform
PCI Device with pages such as the Shared Info page or Grant Table pages,
accesses to these pages will be cachable.
However, should IOMMU mappings be enabled for the guest then these
accesses become uncachable. This has a substantial negative effect on I/O
throughput of PV devices. Arguably PV drivers should not be using BAR space to
host the Shared Info and Grant Table pages but it is currently commonplace for
them to do this and so this problem needs mitigation. Hence this patch makes
sure the 'ipat' bit is set for any special page regardless of where in GFN
space it is mapped.
NOTE: Clearly this mitigation only applies to Intel EPT. It is not obvious
that there is any similar mitigation possible for AMD NPT. Downstreams
such as Citrix XenServer have been carrying a patch similar to this for
several releases though.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 31 Jul 2020 15:41:58 +0000 (17:41 +0200)]
x86emul: replace UB shifts
Displacement values can be negative, hence we shouldn't left-shift them.
Or else we get
(XEN) UBSAN: Undefined behaviour in x86_emulate/x86_emulate.c:3482:55
(XEN) left shift of negative value -2
While auditing shifts, I noticed a pair of missing parentheses, which
also get added right here.
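One common way to scale a possibly negative displacement without a left
shift is to multiply by a power of two instead. The helper below is a
hypothetical illustration of that idea and is not taken from
x86_emulate.c.

```c
#include <assert.h>
#include <limits.h>

/*
 * Sketch: 'disp << shift' is undefined when disp is negative, whereas
 * multiplying by (1L << shift) is well defined as long as the result
 * fits in a long.
 */
static long scale_disp_sketch(long disp, unsigned int shift)
{
    assert(shift < sizeof(long) * CHAR_BIT - 1);

    return disp * (1L << shift);
}
```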
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 31 Jul 2020 15:40:13 +0000 (17:40 +0200)]
x86/PV: drop a few misleading paging_mode_refcounts() checks
The filling and cleaning up of v->arch.guest_table in new_guest_cr3()
was apparently inconsistent so far: There was a type ref acquired
unconditionally for the new top level page table, but the dropping of
the old type ref was conditional upon !paging_mode_refcounts(). Mirror
this also to arch_set_info_guest().
Also move new_guest_cr3()'s #ifdef to around the function - both callers
now get built only when CONFIG_PV, i.e. no need to retain a stub.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 26 Jun 2020 16:46:38 +0000 (17:46 +0100)]
tools/configure: drop BASH configure variable
This is a weird variable to have in the first place. The only user of it is
XSM's CONFIG_SHELL, which opencodes a fallback to sh. The scripts are shebang
sh, which is already necessary to support non-Linux build environments.
Make the mkflask.sh and mkaccess_vector.sh scripts executable, drop the
CONFIG_SHELL, and drop the $BASH variable to prevent further use.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/spinlock: move debug helpers inside the locked regions
Debug helpers such as lock profiling or the invariant pCPU assertions
must strictly be performed inside the exclusive locked region, or else
races might happen.
Note the issue was not strictly introduced by the pointed commit in
the Fixes tag, since lock stats were already incremented before the
barrier, but that commit made it more apparent as manipulating the cpu
field could happen outside of the locked regions and thus trigger the
BUG_ON on rel_lock(). This is only enabled on debug builds, and thus
releases are not affected.
Fixes: 80cba391a35 ('spinlocks: in debug builds store cpu holding the lock') Reported-by: Igor Druzhinin <igor.druzhinin@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Andrew Cooper [Fri, 20 Jul 2018 17:22:25 +0000 (17:22 +0000)]
x86/hvm: Clean up track_dirty_vram() calltree
* Rename nr to nr_frames. A plain 'nr' is confusing to follow in the
lower levels.
* Use DIV_ROUND_UP() rather than opencoding it in several different ways
* The hypercall input is capped at uint32_t, so there is no need for
nr_frames to be unsigned long in the lower levels.
No functional change.
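For reference, a sketch of the rounding-up division idiom with a 32-bit
frame count. The macro is a local stand-in with the usual definition,
and the bitmap-size helper is a hypothetical example of where such a
computation appears.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition for this sketch. */
#define DIV_ROUND_UP_SKETCH(n, d) (((n) + (d) - 1) / (d))

/* Bytes needed for a bitmap holding one bit per frame. */
static uint32_t dirty_bitmap_bytes_sketch(uint32_t nr_frames)
{
    assert(nr_frames <= UINT32_MAX - 7); /* keep the addition in range */

    return DIV_ROUND_UP_SKETCH(nr_frames, 8u);
}
```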
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
x86/vpt: only try to resume timers belonging to enabled devices
Check whether the emulated device is actually enabled before trying to
resume the associated timers.
Thankfully all those structures are zeroed at initialization, and
since the devices are not enabled they are never populated, which
triggers the pt->vcpu check at the beginning of pt_resume forcing an
exit from the function.
While there limit the scope of i and make it unsigned.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/hvm: fix ISA IRQ 0 handling when set as lowest priority mode in IO APIC
Lowest priority destination mode does allow the vIO APIC code to
select a vCPU to inject the interrupt to, but the selected vCPU must
be part of the possible destinations configured for such IO APIC pin.
Fix the code in order to only force vCPU 0 if it's part of the
listed destinations.
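The rule can be sketched with a destination bitmask. The function name
is hypothetical, and the fallback choice (lowest-numbered vCPU in the
mask) is only a placeholder for the real lowest priority arbitration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch: bit n of dest_mask set means vCPU n is a configured
 * destination for the pin. vCPU 0 is only forced for IRQ 0 when it is
 * among those destinations.
 */
static unsigned int pick_lowest_prio_vcpu_sketch(uint32_t dest_mask,
                                                 bool is_irq0)
{
    unsigned int vcpu = 0;

    assert(dest_mask != 0);

    if ( is_irq0 && (dest_mask & 1u) )
        return 0;

    /* Placeholder arbitration: first vCPU present in the mask. */
    while ( !(dest_mask & (1u << vcpu)) )
        vcpu++;

    return vcpu;
}
```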
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
x86/hvm: don't force vCPU 0 for IRQ 0 when using fixed destination mode
When the IO APIC pin mapped to the ISA IRQ 0 has been configured to
use fixed delivery mode, do not forcefully route interrupts to vCPU 0,
as the OS might have setup those interrupts to be injected to a
different vCPU, and injecting to vCPU 0 can cause the OS to miss such
interrupts or errors to happen due to unexpected vectors being
injected on vCPU 0.
In order to fix this, remove such handling altogether for fixed destination
mode pins and just inject them according to the data setup in the
IO-APIC entry.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86/hvm: fix vIO-APIC build without IRQ0_SPECIAL_ROUTING
pit_channel0_enabled needs to be guarded with IRQ0_SPECIAL_ROUTING
since it's only used when the special handling of ISA IRQ 0 is
enabled. However such helper being a single line it's better to just
inline it directly in vioapic_deliver where it's used.
No functional change.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
print: introduce a format specifier for pci_sbdf_t
The new format specifier is '%pp', and prints a pci_sbdf_t using the
seg:bus:dev.func format. Replace all SBDFs printed using
'%04x:%02x:%02x.%u' to use the new format specifier.
No functional change intended.
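For readers unfamiliar with the layout, the output that '%pp' is meant
to produce matches the old open-coded format string. The struct and
helper below are hypothetical and use plain snprintf(), since '%pp'
itself is specific to Xen's printk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical unpacked SBDF; Xen's pci_sbdf_t is a packed type. */
struct sbdf_sketch {
    unsigned int seg, bus, dev, fn;
};

/* Format as seg:bus:dev.func using the old open-coded specifiers. */
static int format_sbdf_sketch(char *buf, size_t len, struct sbdf_sketch s)
{
    return snprintf(buf, len, "%04x:%02x:%02x.%u",
                    s.seg, s.bus, s.dev, s.fn);
}

/* Usage: segment 0, bus 3, device 0x1f, function 2. */
static void format_sbdf_example(void)
{
    char buf[16];
    const struct sbdf_sketch s = { 0, 0x03, 0x1f, 2 };

    format_sbdf_sketch(buf, sizeof(buf), s);
    assert(strcmp(buf, "0000:03:1f.2") == 0);
}
```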
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Julien Grall <julien.grall@arm.com>
For just the pieces where Jan is the only maintainer: Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 27 Jul 2020 18:21:09 +0000 (19:21 +0100)]
public/domctl: Fix the struct xen_domctl ABI in 32bit builds
The Xen domctl ABI currently relies on the union containing a field with
alignment of 8.
32bit projects which only copy the used subset of functionality end up with an
ABI breakage if they don't have at least one uint64_aligned_t field copied.
Insert explicit padding, and some build assertions to ensure it never changes
moving forwards.
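The effect of explicit padding plus a build-time check can be sketched
with a cut-down structure. The layout below is a hypothetical stand-in
for struct xen_domctl, with a union that deliberately contains no
8-byte aligned member.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down layout, for illustration only. */
struct domctl_sketch {
    uint32_t cmd;
    uint32_t interface_version;
    uint16_t domain;
    uint16_t pad[3];          /* explicit padding up to offset 16 */
    union {
        uint32_t small_op;    /* only 4-byte alignment required */
        uint8_t  raw[128];
    } u;
};

/*
 * Without pad[], the union would start at offset 12 here. With it, the
 * offset no longer depends on which union members happen to be present.
 */
static_assert(offsetof(struct domctl_sketch, u) == 16,
              "union must start at offset 16");
static_assert(sizeof(struct domctl_sketch) == 16 + 128,
              "overall size must not change");
```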
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Version string, which is in fact an integer, is hard to handle in the
code that supports different protocol versions. To simplify that
also add the version as an integer.
2. Pass buffer offset with XENDISPL_OP_DBUF_CREATE
There are cases when display data buffer is created with non-zero
offset to the data start. Handle such cases and provide that offset
while creating a display buffer.
3. Add XENDISPL_OP_GET_EDID command
Add an optional request for reading Extended Display Identification
Data (EDID) structure which allows better configuration of the
display connectors over the configuration set in XenStore.
With this change connectors may have multiple resolutions defined
with respect to detailed timing definitions and additional properties
normally provided by displays.
If this request is not supported by the backend then visible area
is defined by the relevant XenStore's "resolution" property.
If backend provides extended display identification data (EDID) with
XENDISPL_OP_GET_EDID request then EDID values must take precedence
over the resolutions defined in XenStore.
Andrew Cooper [Thu, 23 Jul 2020 17:33:51 +0000 (18:33 +0100)]
x86/pv: Make the PV default WRMSR path match the HVM default
The current HVM default for writes to unknown MSRs is to inject #GP if the MSR
is unreadable, and discard writes otherwise. While this behaviour isn't great,
the PV default is even worse, because it swallows writes even to non-readable
MSRs. i.e. A PV guest doesn't even get a #GP fault for a write to a totally
bogus index.
Update PV to make it consistent with HVM, which will simplify the task of
making other improvements to the default MSR behaviour.
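The shared default described above reduces to a single decision. The
enum and function names below are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a write to an otherwise unhandled MSR. */
enum wrmsr_outcome_sketch { WRMSR_DISCARD, WRMSR_GP_FAULT };

/*
 * Sketch of the default policy: a write to an MSR that cannot be read
 * faults; a write to a readable one is discarded.
 */
static enum wrmsr_outcome_sketch default_wrmsr_sketch(bool msr_is_readable)
{
    return msr_is_readable ? WRMSR_DISCARD : WRMSR_GP_FAULT;
}

/* Usage: a bogus (unreadable) index now faults for PV as well. */
static void default_wrmsr_example(void)
{
    assert(default_wrmsr_sketch(false) == WRMSR_GP_FAULT);
    assert(default_wrmsr_sketch(true) == WRMSR_DISCARD);
}
```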
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 24 Jul 2020 08:19:25 +0000 (10:19 +0200)]
lockprof: don't pass name into registration function
The type uniquely identifies the associated name, hence the name fields
can be statically initialized.
Also constify not just the involved struct field, but also struct
lock_profile's. Rather than specifying lock_profile_ancs[]' dimension at
definition time, add a suitable build time check, such that at least
missing tail additions to the initializer can be spotted easily.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 Jul 2020 08:18:30 +0000 (10:18 +0200)]
lockprof: don't leave locks uninitialized upon allocation failure
Even if a specific struct lock_profile instance can't be allocated, the
lock itself should still be functional. As this isn't a production use
feature, also log a message in the event that the profiling struct can't
be allocated.
Fixes: d98feda5c756 ("Make lock profiling usable again") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 Jul 2020 08:17:26 +0000 (10:17 +0200)]
x86/S3: put data segment registers into known state upon resume
wakeup_32 sets %ds and %es to BOOT_DS, while leaving %fs at what
wakeup_start did set it to, and %gs at whatever BIOS did load into it.
All of this may end up confusing the first load_segments() to run on
the BSP after resume, in particular allowing a non-nul selector value
to be left in %fs.
Alongside %ss, also put all other data segment registers into the same
state that the boot and CPU bringup paths put them in.
Reported-by: M. Vefa Bicakci <m.v.b@runbox.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 21 Jul 2020 17:25:15 +0000 (18:25 +0100)]
x86/vmce: Dispatch vmce_{rd,wr}msr() from guest_{rd,wr}msr()
... rather than from the default clauses of the PV and HVM MSR handlers.
This means that we no longer take the vmce lock for any unknown MSR, and
accesses to architectural MCE banks outside of the subset implemented for the
guest no longer fall further through the unknown MSR path.
The bank limit of 32 isn't stated anywhere I can locate, but is a consequence
of the MSR layout described in SDM Volume 4.
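One reading of that layout, stated here as an assumption rather than a
quote from the SDM: the per-bank MCE registers start at index 0x400 with
four MSRs per bank, and the next architectural block (the VMX capability
MSRs) starts at 0x480. The arithmetic then gives:

```c
#include <assert.h>

/* Assumed MSR indices for this sketch. */
#define MSR_IA32_MC0_CTL_SKETCH    0x400u
#define MSR_IA32_VMX_BASIC_SKETCH  0x480u

/* Assumed: CTL, STATUS, ADDR and MISC for each bank. */
#define MSRS_PER_MCE_BANK_SKETCH   4u

/* (0x480 - 0x400) / 4 = 128 / 4 = 32 banks fit in the range. */
enum {
    MAX_MCE_BANKS_SKETCH =
        (MSR_IA32_VMX_BASIC_SKETCH - MSR_IA32_MC0_CTL_SKETCH) /
        MSRS_PER_MCE_BANK_SKETCH
};

static_assert(MAX_MCE_BANKS_SKETCH == 32, "range holds 32 banks");
```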
With the vmce calls removed, the hvm alternative_call()'s expression can be
simplified substantially.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Mon, 10 Dec 2018 11:58:03 +0000 (11:58 +0000)]
x86/svm: Fold nsvm_{wr,rd}msr() into svm_msr_{read,write}_intercept()
... to simplify the default cases.
There are multiple errors with the handling of these three MSRs, but they are
deliberately not addressed at this point.
This removes the dance converting -1/0/1 into X86EMUL_*, allowing for the
removal of the 'ret' variable.
While cleaning this up, drop the gdprintk()'s for #GP conditions, and the
'result' variable from svm_msr_write_intercept() as it is never modified.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
While hiding details of build output looks pretty to some, defaulting to
doing so deviates from the rest of Xen. Switch the OCAML tools to match
everything else.
Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Of the 3, both Python and pygrub appear to mostly be building just fine
cross-compiling. The OCAML portion is being troublesome, and this is going
to cause bug reports elsewhere soon. The OCAML portion though can
already be disabled by setting OCAML_TOOLS=n and shouldn't have this
extra form of disabling.
Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Wei Liu <wl@xen.org>
Edwin Török [Wed, 15 Jul 2020 15:10:56 +0000 (16:10 +0100)]
oxenstored: fix ABI breakage introduced in Xen 4.9.0
dbc84d2983969bb47d294131ed9e6bbbdc2aec49 (Xen >= 4.9.0) deleted XS_RESTRICT
from oxenstored, which caused all the following opcodes to be shifted by 1:
reset_watches became off-by-one compared to the C version of xenstored.
Looking at the C code the opcode for reset watches needs:
XS_RESET_WATCHES = XS_SET_TARGET + 2
So add the placeholder `Invalid` in the OCaml<->C mapping list.
(Note that the code here doesn't simply convert the OCaml constructor to
an integer, so we don't need to introduce a dummy constructor).
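The off-by-one and its fix can be pictured with a positional mapping
table. The sketch is in C with hypothetical names and covers only the
tail of the list; the real mapping lives in the OCaml bindings.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical tail of the opcode list, numbered relative to set_target. */
enum xs_op_sketch {
    OP_SET_TARGET_SKETCH,
    OP_INVALID_SKETCH,        /* placeholder keeping the removed slot */
    OP_RESET_WATCHES_SKETCH
};

/* Position in this table determines the wire value. */
static const char *const op_names_sketch[] = {
    "set_target",
    "invalid",
    "reset_watches",
};

static const char *op_name_sketch(unsigned int op)
{
    assert(op < sizeof(op_names_sketch) / sizeof(op_names_sketch[0]));

    return op_names_sketch[op];
}

/* With the placeholder in place, reset_watches is set_target + 2. */
static_assert(OP_RESET_WATCHES_SKETCH == OP_SET_TARGET_SKETCH + 2,
              "reset_watches must keep its wire value");
```

Dropping the placeholder entry would shift "reset_watches" down by one,
which is the breakage described above.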
Igor says that with a suitably patched xenopsd to enable watch reset,
we now see `reset watches` during kdump of a guest in xenstored-access.log.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Tested-by: Igor Druzhinin <igor.druzhinin@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Nick Rosbrook [Mon, 20 Jul 2020 23:54:40 +0000 (19:54 -0400)]
golang/xenlight: fix code generation for python 2.6
Before python 2.7, str.format() calls required that the format fields
were explicitly enumerated, e.g.:
'{0} {1}'.format(foo, bar)
vs.
'{} {}'.format(foo, bar)
Currently, gengotypes.py uses the latter pattern everywhere, which means
the Go bindings do not build on python 2.6. Use the 2.6 syntax for
format() in order to support python 2.6 for now.
Signed-off-by: Nick Rosbrook <rosbrookn@ainfosec.com> Acked-by: Wei Liu <wl@xen.org>
Jan Beulich [Tue, 21 Jul 2020 11:59:28 +0000 (13:59 +0200)]
x86/shadow: l3table[] and gl3e[] are HVM only
... by the very fact that they're 3-level specific, while PV always gets
run in 4-level mode. This requires adding some seemingly redundant
#ifdef-s - some of them will be possible to drop again once 2- and
3-level guest code doesn't get built anymore in !HVM configs, but I'm
afraid there's still quite a bit of disentangling work to be done to
make this possible.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 21 Jul 2020 11:58:56 +0000 (13:58 +0200)]
x86/shadow: have just a single instance of sh_set_toplevel_shadow()
The only guest/shadow level dependent piece here is the call to
sh_make_shadow(). Make a pointer to the respective function an
argument of sh_set_toplevel_shadow(), allowing it to be moved to
common.c.
This implies making get_shadow_status() available to common.c; its set
and delete counterparts are moved along with it.
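The refactoring pattern can be sketched roughly as follows, in Python rather than C. All names and signatures below are hypothetical stand-ins; the real sh_set_toplevel_shadow() takes different arguments and does considerably more work. The point is only that the level-specific function is passed in by the caller, so a single common implementation suffices.

```python
# Hypothetical stand-ins for the per-level sh_make_shadow() variants.
def make_shadow_3level(gmfn):
    return ("l3-shadow", gmfn)


def make_shadow_4level(gmfn):
    return ("l4-shadow", gmfn)


def set_toplevel_shadow(shadow_table, slot, gmfn, make_shadow):
    """Level-independent logic; only make_shadow depends on the guest level."""
    shadow_table[slot] = make_shadow(gmfn)
    return shadow_table[slot]


# Each caller supplies the variant matching its own paging level.
table = [None]
print(set_toplevel_shadow(table, 0, 0x1234, make_shadow_4level))
```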
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 21 Jul 2020 11:58:15 +0000 (13:58 +0200)]
x86/shadow: shadow_table[] needs only one entry for PV-only configs
Furthermore the field isn't needed at all with shadow support disabled -
move it into struct shadow_vcpu.
Introduce for_each_shadow_table(), shortening loops for the 4-level case
at the same time.
Adjust loop variables and a function parameter to be "unsigned int"
where applicable at the same time. Also move a comment that ended up
misplaced due to incremental additions.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Tue, 21 Jul 2020 11:57:06 +0000 (13:57 +0200)]
x86/shadow: dirty VRAM tracking is needed for HVM only
Move shadow_track_dirty_vram() into hvm.c (requiring two static
functions to become non-static). More importantly though make sure we
don't de-reference d->arch.hvm.dirty_vram for a non-HVM guest. This was
a latent issue only just because the field lives far enough into struct
hvm_domain to be outside the part overlapping with struct pv_domain.
While moving shadow_track_dirty_vram() some purely typographic
adjustments are being made, like inserting missing blanks or putting
braces on their own lines.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Fixes: 5a4a411bde4 ("docs: specify stability of hypfs path documentation") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Wed, 15 Jul 2020 10:39:06 +0000 (12:39 +0200)]
Arm: prune #include-s needed by domain.h
asm/domain.h is a dependency of xen/sched.h, and hence should not itself
include xen/sched.h. Nor should any of the other #include-s used by it.
While at it, also drop two other #include-s that aren't needed by this
particular header.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
docs: specify stability of hypfs path documentation
In docs/misc/hypfs-paths.pandoc the supported paths in the hypervisor
file system are specified. Make it more clear that path availability
might change, e.g. due to scope widening or narrowing (e.g. being
limited to a specific architecture).
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 17 Jul 2020 15:52:14 +0000 (17:52 +0200)]
compat: add a little bit of description to xlat.lst
Requested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 17 Jul 2020 15:51:07 +0000 (17:51 +0200)]
x86/HVM: fold both instances of looking up a hvm_ioreq_vcpu with a request pending
It seems pretty likely that the "break" in the loop getting replaced in
handle_hvm_io_completion() was meant to exit both nested loops at the
same time. Re-purpose what has been hvm_io_pending() to hand back the
struct hvm_ioreq_vcpu instance found, and use it to replace the two
nested loops.
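A minimal Python sketch of the control-flow difference, using a hypothetical data model (a list of servers, each a list of per-vCPU records held as dicts) rather than the real Xen structures: a "break" leaves only the inner loop, while a lookup helper that returns the match leaves both loops at once.

```python
def find_pending(servers, vcpu):
    """Return the first pending record for vcpu, or None."""
    for server in servers:
        for sv in server:
            if sv["vcpu"] == vcpu and sv["pending"]:
                return sv  # leaves both loops at once
    return None


def count_visited_with_break(servers, vcpu):
    """Nested loops where break only leaves the inner loop."""
    visited = 0
    for server in servers:
        for sv in server:
            visited += 1
            if sv["vcpu"] == vcpu and sv["pending"]:
                break  # the outer loop still moves on to the next server
    return visited


servers = [
    [{"vcpu": 0, "pending": True}, {"vcpu": 1, "pending": False}],
    [{"vcpu": 0, "pending": False}],
]
print(find_pending(servers, 0) is servers[0][0])
print(count_visited_with_break(servers, 0))
```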
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 17 Jul 2020 15:50:09 +0000 (17:50 +0200)]
x86/HVM: fold hvm_io_assist() into its only caller
While there are two call sites, the function they're in can be slightly
re-arranged such that the code sequence can be added at its bottom. Note
that the function's only caller has already checked sv->pending, and
that the prior while() loop was just a slightly more fancy if()
(allowing an early break out of the construct).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 17 Jul 2020 15:48:42 +0000 (17:48 +0200)]
VT-d: install sync_cache hook on demand
Instead of checking inside the hook whether any non-coherent IOMMUs are
present, simply install the hook only when this is the case.
To prove that there are no other references to the now dynamically
updated ops structure (and hence that its updating happens early
enough), make it static and rename it at the same time.
Note that this change implies that sync_cache() shouldn't be called
directly unless there are unusual circumstances, as is the case in
alloc_pgtable_maddr(), which gets invoked too early for iommu_ops to
be set already (and therefore we also need to be careful there to
avoid accessing vtd_ops later on, as it lives in .init).
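The idea of installing the hook on demand can be sketched as below. This is a Python illustration under assumed names (build_ops, maybe_sync, and the dict layout are hypothetical), not the VT-d code: the coherency check is done once when the ops table is built, and callers go through the table instead of calling the hook directly.

```python
def sync_cache(addr, size):
    """Stand-in for the cache flush; just reports what it was asked to do."""
    return ("flushed", addr, size)


def build_ops(iommus):
    """Install the sync_cache hook only if some IOMMU is non-coherent."""
    ops = {"sync_cache": None}
    if any(not iommu["coherent"] for iommu in iommus):
        ops["sync_cache"] = sync_cache
    return ops


def maybe_sync(ops, addr, size):
    """Call the hook through the ops table, if one was installed."""
    hook = ops["sync_cache"]
    return hook(addr, size) if hook is not None else None


coherent_only = build_ops([{"coherent": True}])
mixed = build_ops([{"coherent": True}, {"coherent": False}])
print(maybe_sync(coherent_only, 0x1000, 64))
print(maybe_sync(mixed, 0x1000, 64))
```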
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>