Wei Liu [Tue, 29 Jan 2019 12:59:55 +0000 (12:59 +0000)]
x86/mm: change pl3e to l3t in virt_to_xen_l3e
We will need to have a variable named pl3e when we rewrite
virt_to_xen_l3e. Change pl3e to l3t to better reflect its purpose.
This will make reviewing later patches easier.
Wei Liu [Tue, 29 Jan 2019 12:57:35 +0000 (12:57 +0000)]
x86/mm: change pl1e to l1t in virt_to_xen_l1e
We will need to have a variable named pl1e when we rewrite
virt_to_xen_l1e. Change pl1e to l1t to better reflect its purpose.
This will make reviewing later patches easier.
Wei Liu [Tue, 29 Jan 2019 12:54:48 +0000 (12:54 +0000)]
x86/mm: change pl2e to l2t in virt_to_xen_l2e
We will need to have a variable named pl2e when we rewrite
virt_to_xen_l2e. Change pl2e to l2t to better reflect its purpose.
This will make reviewing later patches easier.
Wei Liu [Wed, 23 Jan 2019 15:33:07 +0000 (15:33 +0000)]
x86: introduce a new set of APIs to manage Xen page tables
We are going to switch to using domheap pages for page tables.
A new set of APIs is introduced to allocate, map, unmap and free pages
for page tables.
The allocation and deallocation work on mfn_t rather than page_info,
because they are required to work even before the frame table is set up.
Implement the old functions with the new ones. We will rewrite, patch
by patch, other mm functions that manipulate page tables to use the
new APIs.
Note that these new APIs still use xenheap pages underneath, and no
actual mapping or unmapping is done, so that we don't break Xen halfway
through. They will later be switched to use domheap pages and dynamic
mappings.
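The allocate/map/unmap/free shape described above can be sketched as follows. This is a minimal illustration and not the Xen implementation: the names pt_alloc, pt_map, pt_unmap and pt_free, the toy mfn_t, and the fixed page pool are all assumptions made for the example. As in the interim state the commit describes, the backing memory is always mapped, so mapping only translates and unmapping does nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-ins (hypothetical): a typed frame number and a small page pool. */
typedef struct { uint64_t m; } mfn_t;
#define INVALID_MFN ((mfn_t){ UINT64_MAX })
#define PAGE_SIZE   4096
#define POOL_PAGES  4

static unsigned char pool[POOL_PAGES][PAGE_SIZE];
static int pool_used[POOL_PAGES];

int mfn_valid(mfn_t mfn) { return mfn.m < POOL_PAGES; }

/* Allocate a zeroed page-table page, identified by mfn rather than page_info. */
mfn_t pt_alloc(void)
{
    for (uint64_t i = 0; i < POOL_PAGES; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            memset(pool[i], 0, PAGE_SIZE);
            return (mfn_t){ i };
        }
    }
    return INVALID_MFN;
}

/* Map: the pool is permanently mapped, so this only translates mfn to pointer. */
void *pt_map(mfn_t mfn)
{
    return mfn_valid(mfn) ? pool[mfn.m] : NULL;
}

/* Unmap: deliberately a no-op until dynamic mappings exist. */
void pt_unmap(void *va) { (void)va; }

/* Free: return the page to the pool; invalid mfns are ignored. */
void pt_free(mfn_t mfn)
{
    if (mfn_valid(mfn))
        pool_used[mfn.m] = 0;
}
```

Because callers only ever see an mfn_t plus explicit map/unmap calls, the backing store could later change without touching them, which is the point of introducing the API before switching the implementation.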
xen-bugtool relies on code that has been removed in commit 9e8672f1c3
"tools: remove xend and associated python modules", more than 5 years
ago. Remove it, since it confuses users.
-$ /usr/sbin/xen-bugtool
Traceback (most recent call last):
  File "/usr/sbin/xen-bugtool", line 9, in <module>
    from xen.util import bugtool
ImportError: No module named xen.util
Anthony PERARD [Thu, 31 Jan 2019 10:57:48 +0000 (10:57 +0000)]
libxl: When restricted, start QEMU paused
libxl issues the "cont" command later during guest creation, i.e. it
expects that QEMU will not do any emulation before then. Use the "-S"
command-line option to achieve this.
Unfortunately, when QEMU is started with "-S", it won't write QEMU's
readiness into xenstore. So only activate this option when we have a
QEMU startup notification via QMP available, i.e. when dm_restrict
is activated.
The -S option has the side-effect of suppressing the startup
notification via xenstore: libxl will only get the notification via
QMP.
It is important to rely only on QMP for notification when we have
QMP available, as (due to a qemu bug) not waiting for that QMP
notification may result in the QMP socket becoming blocked, so that
QEMU stops responding to new connections even if no existing ones
are active.
When the QEMU bug happens, the actions taken by both libxl and QEMU
are roughly as follows:
- libxl connects and handshakes with QEMU, then sends the
cmd "query-status".
- QEMU prepares and maybe tries to send the response,
while also writing "running" into xenstore.
- libxl sees via xenstore that QEMU is running and disconnects from the
QMP socket before receiving the response from the cmd.
=> The QMP socket (monitor) is thereby blocked and will never reply
to commands on new connections.
This is due to QEMU only responding to one command at a time, and
suspending its monitor (QMP) until the command has been processed and
sent. Disconnecting from the socket doesn't unsuspend the monitor. The
race described here is very likely to happen with QEMU 3.1.50 (during
3.2 development), but can be reproduced with QEMU 3.1.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Drop the debug-only early exit if the sources of length disagree, because the
only effect it has is to avoid the more detailed analysis of what went wrong.
Reported-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Thu, 13 Dec 2018 17:01:24 +0000 (17:01 +0000)]
x86/svm: Drop enum instruction_index and simplify svm_get_insn_len()
Passing a 32-bit integer index into an array with entries containing less than
32 bits of data is wasteful, and creates an unnecessary error condition of
passing an out-of-range index.
The width of the X86EMUL_OPC() encoding is currently 20 bits for the
instructions used, which leaves room for a modrm byte. Drop opc_tab[]
entirely, and encode the expected opcode/modrm information directly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
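The idea of encoding the expected opcode and ModRM byte directly, rather than indexing a table, can be sketched like this. The macro INSN_ENC and the helper names are hypothetical and chosen only for illustration; the 20-bit opcode width comes from the commit text, and the numeric values in any usage are arbitrary. With 20 bits of opcode and 8 bits of ModRM, the pair fits in a single 32-bit value.

```c
#include <stdint.h>

/*
 * Hypothetical packing: a 20-bit opcode encoding in bits 8..27 and the
 * expected ModRM byte in bits 0..7, all within one 32-bit value.
 */
#define INSN_ENC(opc, modrm) \
    ((((uint32_t)(opc) & 0xfffffu) << 8) | ((uint32_t)(modrm) & 0xffu))

/* Extract the opcode part of an encoded value. */
uint32_t insn_opcode(uint32_t enc) { return enc >> 8; }

/* Extract the ModRM part of an encoded value. */
uint8_t insn_modrm(uint32_t enc) { return (uint8_t)(enc & 0xffu); }

/* Does a decoded opcode/ModRM pair match what the caller expected? */
int insn_matches(uint32_t expected, uint32_t opc, uint8_t modrm)
{
    return expected == INSN_ENC(opc, modrm);
}
```

A caller passes the encoded constant itself, so there is no index that could be out of range and no table lookup to get wrong.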
Andrew Cooper [Thu, 13 Dec 2018 17:01:24 +0000 (09:01 -0800)]
x86/svm: Remove list functionality from __get_instruction_length_* infrastructure
The existing __get_instruction_length_from_list() has a single user
which uses the list functionality. That user however should be looking
specifically for INVD or WBINVD, as reported by the vmexit exit reason.
Modify svm_vmexit_do_invalidate_cache() to ask for the correct
instruction, and drop all list functionality from the helper.
Take the opportunity to rename it to svm_get_insn_len(), and drop the
IOIO length handling which has never been used.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Brian Woods <brian.woods@amd.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Thu, 31 Jan 2019 10:38:24 +0000 (11:38 +0100)]
x86emul: correct AVX512BW write masking checks
For VPSADBW this likely was a result of bad copy-and-paste.
For VPS{L,R}LDQ comment and code were not in line, but then again the
comment also wasn't fully updated from the AVX2 original it got cloned
from.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Thu, 31 Jan 2019 10:37:56 +0000 (11:37 +0100)]
tools: fix build dependency upon generated header(s)
Commit fd35f32b4b ("tools/x86emul: Use struct cpuid_policy in the
userspace test harnesses") didn't account for the dependencies of
cpuid-autogen.h potentially changing between incremental builds.
Putting the make invocation to produce the header together with the
directory tree creation therefore does not work. Introduce a separate
goal.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Wei Liu [Wed, 30 Jan 2019 13:55:55 +0000 (13:55 +0000)]
x86/pvh-boot: don't mandate validity of RSDP pointer
RSDP is not mandatory according to PVH spec. Remove the BUG_ON. The
guest (xen) will fall back to scanning if necessary.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Andrii Anisov [Fri, 25 Jan 2019 17:06:02 +0000 (19:06 +0200)]
xen/arm: gic-vgic: Fix the assert condition in vgic_connect_hw_irq
Currently, the assert condition in vgic_connect_hw_irq does not
correspond to the comment above it, which results in the assertion
being hit on HW IRQ disconnection.
Fix the condition so it corresponds to the comment and allows IRQ
disconnection on debug builds.
Fixes: ec2a2f1 ("ARM: VGIC: factor out vgic_connect_hw_irq()")
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Suggested-by: Stefan Nuernberger <snu@amazon.de>
Reviewed-by: Andre Przywara <andre.przywara@arm.com>
[julieng: Reword the commit message]
Acked-by: Julien Grall <julien.grall@arm.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Wei Liu [Tue, 29 Jan 2019 11:37:59 +0000 (11:37 +0000)]
libxl: correctly dispose of dominfo list in libxl_name_to_domid
Tamas reported that ssid_label was leaked. Use the designated function
to free the dominfo list to fix the leak.
Reported-by: Tamas K Lengyel <tamas@tklengyel.com>
Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Tested-by: Tamas K Lengyel <tamas@tklengyel.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
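Why a plain free() of such a list leaks can be shown with a self-contained mock. The struct, the counter and the function names below are hypothetical stand-ins that only mirror the shape of the problem (list elements owning a heap-allocated label); they are not libxl's types.

```c
#include <stdlib.h>
#include <string.h>

/* Mock list element that owns a heap-allocated string. */
struct dominfo {
    unsigned int domid;
    char *ssid_label;
};

/* Counts outstanding ssid_label allocations, to make leaks observable. */
int labels_live;

/* Build a list of n elements, each with its own label allocation. */
struct dominfo *dominfo_list_new(int n)
{
    struct dominfo *list = calloc((size_t)n, sizeof(*list));

    if (!list)
        return NULL;
    for (int i = 0; i < n; i++) {
        list[i].domid = (unsigned int)i;
        list[i].ssid_label = malloc(8);
        if (list[i].ssid_label) {
            strcpy(list[i].ssid_label, "label");
            labels_live++;
        }
    }
    return list;
}

/* Dispose of each element's owned members first, then the array itself. */
void dominfo_list_free(struct dominfo *list, int n)
{
    if (!list)
        return;
    for (int i = 0; i < n; i++) {
        if (list[i].ssid_label) {
            free(list[i].ssid_label);
            labels_live--;
        }
    }
    free(list);
}
```

Calling free(list) alone would release the array but leave every label allocated; routing all callers through one list-free function keeps that knowledge in a single place.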
Andrew Cooper [Fri, 25 Jan 2019 16:23:46 +0000 (16:23 +0000)]
x86/hvm: Fix bit checking for CR4 and MSR_EFER
Before the cpuid_policy logic came along, %cr4/EFER auditing on migrate-in was
complicated, because at that point no CPUID information had been set for the
guest. Auditing against the host CPUID was better than nothing, but not
ideal.
Similarly at the time, PVHv1 lacked the "CPUID passed through from hardware"
behaviour which PV guests had, and PVH dom0 had to be special-cased to be able
to boot.
Order of information in the migration stream is still an issue (hence we still
need to keep the restore parameter to cope with a nested virt corner case for
%cr4), but since Xen 4.9, all domains start with a suitable CPUID policy,
which is a more appropriate upper bound than host_cpuid_policy.
Finally, reposition the UMIP logic as it is the only row out of order.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Tue, 22 Jan 2019 18:58:56 +0000 (18:58 +0000)]
x86/p2m: Drop erroneous #VE-enabled check in ept_set_entry()
Code clearing the "Suppress VE" bit in an EPT entry isn't necessarily running
in current context. In ALTP2M_external mode, it definitely is not, and in PV
context, vcpu_altp2m(current) acts upon the HVM union.
Even if we could sensibly resolve the target vCPU, it may legitimately not be
fully set up at this point, so rejecting the EPT modification would be buggy.
There is a path in hvm_hap_nested_page_fault() which explicitly emulates #VE
in the cpu_has_vmx_virt_exceptions case, so the -EOPNOTSUPP part of this
condition is also wrong.
Drop the !sve check entirely.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
In order to solve it, move the vioapic_hwdom_map_gsi outside of the
locked region in vioapic_write_redirent. vioapic_hwdom_map_gsi will
not access any of the vioapic fields, so there's no need to call the
function holding the hvm.irq_lock.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Julien Grall [Mon, 28 Jan 2019 11:50:25 +0000 (11:50 +0000)]
xen/arm: Implement workaround for Cortex-A76 erratum 1165522
Early versions of Cortex-A76 can end up with corrupt TLBs if they
speculate an AT instruction while the S1/S2 system registers are in an
inconsistent state.
This can happen during guest context switch and when invalidating the
TLBs for a VMID other than the current one.
The workaround implemented in Xen will:
- Use an empty stage-2 with a reserved VMID while context switching
between 2 guests
- Use an empty stage-2 with the VMID where TLBs need to be flushed
Julien Grall [Mon, 28 Jan 2019 11:50:24 +0000 (11:50 +0000)]
xen/arm: p2m: Only use isb() when it is necessary
The EL1 translation regime is out-of-context when running at EL2. This
means the processor cannot speculate memory accesses using the registers
associated with that regime.
An isb() is only needed if Xen is going to use the translation regime
before returning to the guest (exception returns will synchronize the
context).
Remove unnecessary isb() and document the ones left.
Julien Grall [Mon, 28 Jan 2019 11:50:23 +0000 (11:50 +0000)]
xen/arm: domain_build: Don't switch to the guest P2M when copying data
Until recently, kernel/initrd/dtb were loaded using guest VA, which
required temporarily restoring the P2M. This was reworked
in a series of commits (up to 9292086 "xen/arm: domain_build: Use
copy_to_guest_phys_flush_dcache in dtb_load") to use a guest PA.
This will also help a follow-up patch which will require
p2m_{save,restore}_state to work as a pair to work around an erratum.
Jan Beulich [Mon, 28 Jan 2019 16:40:39 +0000 (17:40 +0100)]
x86/AMD: flush TLB after ucode update
The increased number of messages (spec_ctrl.c:print_details()) within a
certain time window made me notice some slowness of boot time screen
output. Experimentally I've narrowed the time window to be from
immediately after the early ucode update on the BSP to the PAT write in
cpu_init(), which upon further investigation has an effect because of
the full TLB flush that's implied by that write.
For that reason, as a workaround, flush the TLB of the mapping of the
page that holds the blob. Note that flushing just a single page is
sufficient: as per verify_patch_size(), patch size can't exceed 4k, and
given the way xmalloc() works, the blob can't cross a page boundary.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Brian Woods <brian.woods@amd.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
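The single-page argument above rests on two conditions: the blob is at most one page long, and it does not straddle a page boundary. A small sketch of that check follows; the helper name fits_in_one_page is hypothetical, and the 4096-byte page size is an assumption matching the "4k" in the commit text.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/*
 * True if [addr, addr + size) lies entirely within one page, in which
 * case flushing that one page's mapping covers the whole buffer.
 */
int fits_in_one_page(uintptr_t addr, size_t size)
{
    if (size == 0 || size > PAGE_SIZE)
        return 0;
    /* Compare the page number of the first and last byte. */
    return (addr / PAGE_SIZE) == ((addr + size - 1) / PAGE_SIZE);
}
```

The size bound alone is not enough: a 4k buffer starting mid-page spans two pages, which is why the commit also leans on the allocator's placement behaviour.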
During instruction emulation, the cpuid instruction is emulated with
data that is controlled by the guest. As speculation might pass bound
checks, we have to ensure that no out-of-bound loads are possible.
To not rely on the compiler to perform value propagation, instead of
using the array_index_nospec macro, we replace the variable with the
constant to be propagated instead.
This commit is part of the SpectreV1+L1TF mitigation patch series.
When interacting with hpet, read and write operations can be executed
during instruction emulation, where the guest controls the data that
is used. As it is hard to predict the number of instructions that are
executed speculatively, we prevent out-of-bound accesses by using the
array_index_nospec function for guest specified addresses that should
be used for hpet operations.
We introduce another macro that uses the ARRAY_SIZE macro to block
speculative accesses. For arrays that are statically accessed, this macro
can be used instead of the usual macro. Using this macro results in more
readable code, and allows the way this case is handled to be modified in
a single place.
This commit is part of the SpectreV1+L1TF mitigation patch series.
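The index-clamping technique these commits rely on can be sketched in portable C. This is an illustration under stated assumptions, not the Xen code: the helper index_mask_nospec and the wrapper name array_access_nospec are chosen for the example, the mask assumes size <= LONG_MAX and an arithmetic right shift of negative values (true for GCC and Clang, implementation-defined in C), and real implementations may use architecture-specific assembly to guarantee that the compiler emits no branch.

```c
#include <limits.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * All-ones when index < size, zero otherwise, computed without a
 * comparison so that there is no branch to mispredict.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
    return (unsigned long)(~(long)(index | (size - 1 - index)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/* Clamp an out-of-bounds index to 0. Note: evaluates index twice. */
#define array_index_nospec(index, size)                          \
    ((unsigned long)(index) &                                    \
     index_mask_nospec((unsigned long)(index), (unsigned long)(size)))

/* Convenience wrapper for arrays whose size is known statically. */
#define array_access_nospec(array, index) \
    ((array)[array_index_nospec(index, ARRAY_SIZE(array))])
```

An in-bounds index passes through unchanged, while an out-of-bounds one is forced to element 0, so a load that runs ahead of the bounds check still stays inside the array. Deriving the bound from ARRAY_SIZE() removes the chance of passing a size that does not match the array.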
George Dunlap [Thu, 24 Jan 2019 17:48:27 +0000 (17:48 +0000)]
docs: Fix dm_restrict documentation
Remove "chatty" and redundant information from the xl man page;
restrict it to functional descriptions only, and point instead to
qemu-depriv.pandoc and SUPPORT.md as locations for "canonical"
information.
Add a man page entry for device_model_user.
Update qemu-deprivilege.pandoc:
Changes in missing feature list:
- Migration is functional
- But qdisk backends are not
Add a missing restriction list.
The following statements from the man page are dropped:
- Mentioning PV; PV guests never have a device model.
- Drop the confusing statement about stdvga and cirrus vga options.
- Re-used domain IDs are now handled.
- Device models should no longer be able to create world-readable
files on dom0's filesystem.
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Thu, 24 Jul 2014 10:06:39 +0000 (11:06 +0100)]
xen/sched: Introduce domain_vcpu() helper
The progression of multi-vcpu support in Xen (originally a single pointer,
then an embedded d->vcpu[] array, then a dynamically allocated array) has
resulted in a large quantity of ad-hoc code for looking a vcpu up by id, and a
large number of ways that the toolstack can cause Xen to trip over a NULL
pointer. Some of this has been addressed in Xen 4.12, and work is ongoing.
Another property of looking a vcpu up by id is that it is frequently done in
unprivileged hypercall context, making it an attractive target for speculative
sidechannel attacks.
Introduce a helper to do the lookup correctly, and without speculative
interference. For performance reasons, it is useful not to have an smp_rmb()
in this helper on ARM, and luckily this is safe to do, because of the
serialisation offered by the global domlist lock.
As a minor change noticed when checking the safety of this construct, sanity
check during boot that idle->max_vcpus is a suitable upper bound for
idle->vcpu[].
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
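A lookup helper of the kind described above might look roughly like the following. The struct layouts are reduced mocks, and the names domain_vcpu_sketch and idx_mask are hypothetical; the mask makes the same assumptions about arithmetic right shift as compilers such as GCC and Clang satisfy. Memory-ordering concerns mentioned in the commit (smp_rmb(), the domlist lock) are outside the scope of this sketch.

```c
#include <limits.h>
#include <stddef.h>

/* Reduced mock types, for illustration only. */
struct vcpu { unsigned int vcpu_id; };

struct domain {
    unsigned int max_vcpus;
    struct vcpu **vcpu;      /* array of max_vcpus entries, may be NULL */
};

/* All-ones when idx < size, zero otherwise, without a comparison. */
static unsigned long idx_mask(unsigned long idx, unsigned long size)
{
    return (unsigned long)(~(long)(idx | (size - 1 - idx)) >>
                           (sizeof(long) * CHAR_BIT - 1));
}

/*
 * Look a vcpu up by id. The architectural bounds check returns NULL for
 * bad ids; the masked index keeps any speculative load inside the array.
 */
struct vcpu *domain_vcpu_sketch(const struct domain *d, unsigned int vcpu_id)
{
    unsigned long idx = vcpu_id & idx_mask(vcpu_id, d->max_vcpus);

    return vcpu_id >= d->max_vcpus ? NULL : d->vcpu[idx];
}
```

Because the bounds check comes before the dereference, a domain with max_vcpus of 0 and no vcpu array is handled without touching a NULL pointer, which addresses both problems the commit message names.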
Andrew Cooper [Fri, 21 Dec 2018 17:23:32 +0000 (17:23 +0000)]
x86/pvh-dom0: Remove unnecessary function pointer call from modify_identity_mmio()
Function pointer calls are far more expensive in a post-Spectre world, and
this one doesn't need to be.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Andrew Cooper [Fri, 7 Dec 2018 13:43:27 +0000 (13:43 +0000)]
xen/dom0: Add a dom0-iommu=none option
For development purposes, it is very convenient to boot Xen as a PVH guest,
with an XTF PV or PVH "dom0". The edit-compile-go cycle is a matter of
seconds, and you can reasonably insert printk() debugging in places
which would be completely infeasible when booting fully-fledged guests.
However, the PVH dom0 path insists on having a working IOMMU, which doesn't
exist when virtualised as a PVH guest, and isn't necessary for XTF anyway.
Introduce a developer mode to skip the IOMMU requirement.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>