UBSAN (which I happened to have active in my build at the time) identifies the
problem explicitly:
(XEN) Using APIC driver default
(XEN) ================================================================================
(XEN) UBSAN: Undefined behaviour in /local/xen.git/xen/include/xsm/xsm.h:309:19
(XEN) member access within null pointer of type 'struct xsm_operations'
(XEN) ----[ Xen-4.13-unstable x86_64 debug=y Not tainted ]----
"adjust system domain creation (and call it earlier on x86)" didn't account
for the fact that domain_create() depends on XSM already being set up.
Therefore, domain_create() follows xsm_ops->alloc_security_domain() which is
offset 0 from a NULL pointer, meaning that we execute the 16bit IVT until
happening to explode in __x86_indirect_thunk_rax().
xsm_multiboot_init() does nothing very interesting beyond allocating
memory, which means that it is safe to move earlier during setup.
xen/arm: mm: Check start is always before end in {destroy, modify}_xen_mappings
The two helpers {destroy, modify}_xen_mappings don't check that the
start is always before the end. This should never happen, but if it
does, it will result in unexpected behavior.
Catch such issues earlier on by adding an ASSERT in destroy_xen_mappings
and modify_xen_mappings.
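A minimal sketch of the kind of check described above. The function and
parameter names are hypothetical stand-ins rather than the actual Xen code,
and the use of "<=" (allowing an empty range) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: a range is sane if start does not exceed end. */
static bool range_is_sane(uintptr_t start, uintptr_t end)
{
    return start <= end;
}

/* Hypothetical stand-in for a helper such as destroy_xen_mappings(). */
static int destroy_mappings_sketch(uintptr_t start, uintptr_t end)
{
    /* Catch a reversed range early instead of acting on a bogus size. */
    assert(range_is_sane(start, end));

    /* ... the real helper would update the page-tables here ... */
    return 0;
}
```

With the check in place, a caller passing a reversed range trips the
assertion in a debug build rather than silently misbehaving later.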
Since commit f60658c6ae "xen/arm: Stop relocating Xen", the function
setup_page_tables() does not require any information from the FDT.
So the initialization of the page-tables can be done much earlier in the
boot process. The earliest setup_page_tables() can be called is after
traps have been initialized, so we can get a backtrace if an error
occurs.
Moving the initialization of the page-tables also avoids the dance to map
the FDT again in the new set of page-tables.
xen/arm: mm: Introduce DEFINE_PAGE_TABLE{,S} and use it
We have multiple static page-tables defined in arch/arm/mm.c. The
current way to define them is difficult to read and does not help when
making modifications.
Two new helpers DEFINE_PAGE_TABLES (to define multiple page-tables) and
DEFINE_PAGE_TABLE (alias of DEFINE_PAGE_TABLES(..., 1)) are introduced
and now used to define static page-tables.
Note that DEFINE_PAGE_TABLES() alignment differs from what is currently
used for allocating page-tables. This is fine because page-tables are
only required to be aligned to a page-size.
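As an illustration, the two helpers could be sketched in standalone C as
below. This is an approximation under stated assumptions: lpae_t is reduced
to a plain 64-bit integer, C11 _Alignas stands in for Xen's alignment
attribute, and the "static" storage class and the table names are choices
made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE    4096u
#define LPAE_ENTRIES 512u            /* 4KB page / 8-byte entries */

typedef uint64_t lpae_t;             /* simplified stand-in for Xen's lpae_t */

/* Define 'nr' contiguous page-tables, aligned to a page. */
#define DEFINE_PAGE_TABLES(name, nr) \
    static _Alignas(PAGE_SIZE) lpae_t name[LPAE_ENTRIES * (nr)]

/* A single page-table is just the nr == 1 case. */
#define DEFINE_PAGE_TABLE(name) DEFINE_PAGE_TABLES(name, 1)

/* Hypothetical tables, for illustration only. */
DEFINE_PAGE_TABLE(sketch_root);
DEFINE_PAGE_TABLES(sketch_second, 2);
```

Each definition now states only the name and the number of tables; size
and alignment follow from the macro.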
xen/arm32: mm: Avoid to zero and clean cache for CPU0 domheap
The page-table walker is configured to use the same shareability and
cacheability as the access performed when updating the page-tables. This
means cleaning the cache for CPU0 domheap is unnecessary.
Furthermore, CPU0 page-tables are part of the Xen binary and will already
be zeroed before being used. So it is pointless to zero the domheap again.
xen/arm32: head: Always zero r3 before update a page-table entry
The boot code is using r2 and r3 to hold the page-table entry value.
While r2 is always updated before storing the value, this is not always
the case for r3.
Thankfully today, r3 will always be zero when we care. But this is
difficult to track and error-prone.
So always zero r3 within the few instructions before the write of the
page-table entry.
There is no reason to assume the HW CPU ID will be 0 when the
processor is part of a uniprocessor system. At best, this will result in
conflicting output, as the rest of Xen uses the value directly read from
MPIDR.
So remove the zeroing and logic to check if the CPU is part of a
uniprocessor system.
xen/arm: p2m: configure stage-2 page table to support upto 42-bit PA systems
At the moment, on platforms supporting 42-bit PA, Xen will only expose
40 bits' worth of IPA to all domains.
The limitation was to prevent allocating too much memory for the root
page tables as those platforms only support 3-level page-tables. At the
time, this was deemed acceptable because none of the platforms had
addresses wired above 40 bits.
However, newer platforms take advantage of the full address space. This
breaks Dom0 boot, as it can't access anything above 40 bits.
The only way to support 42-bit IPA is to allocate 8 pages for the root
page-tables. This is a bit of a waste of memory as Xen does not offer
per-guest stage-2 configuration. But it is considered acceptable as
current platforms supporting 42-bit PA have a lot of memory.
In the future, we may want to consider per-guest stage-2 configuration
to reduce the waste.
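The figure of 8 pages follows from simple arithmetic, sketched below. This
assumes a 4KB translation granule, which the text does not state; the
helper name is made up for the example.

```c
#include <assert.h>

/*
 * With a 4KB granule the page offset covers 12 bits and each level
 * resolves 9 bits, so a 3-level walk from a single root table covers
 * 12 + 3 * 9 = 39 bits.
 */
#define PAGE_SHIFT        12u
#define BITS_PER_LEVEL     9u
#define LEVELS             3u
#define SINGLE_ROOT_BITS  (PAGE_SHIFT + LEVELS * BITS_PER_LEVEL)

/* Number of concatenated root pages needed for a given IPA width. */
static unsigned int root_pages_for_ipa(unsigned int ipa_bits)
{
    /* Stage-2 allows at most 16 concatenated tables at the start level. */
    assert(ipa_bits <= SINGLE_ROOT_BITS + 4);

    if ( ipa_bits <= SINGLE_ROOT_BITS )
        return 1;

    /* Every extra address bit doubles the number of root pages. */
    return 1u << (ipa_bits - SINGLE_ROOT_BITS);
}
```

Under these assumptions root_pages_for_ipa(40) is 2 and
root_pages_for_ipa(42) is 8, which is where the extra memory goes.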
Jan Beulich [Fri, 31 May 2019 09:53:39 +0000 (03:53 -0600)]
Arm64: further speed-up to hweight{32,64}()
According to Linux commit e75bef2a4f ("arm64: Select
ARCH_HAS_FAST_MULTIPLIER") this is a further improvement over the
variant using only bitwise operations on at least some hardware, and no
worse on others.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
xen: actually skip the first MAX_ORDER bits in pfn_pdx_hole_setup
pfn_pdx_hole_setup is meant to skip the first MAX_ORDER bits, but
actually it only skips the first MAX_ORDER-1 bits. The issue was
probably introduced by bdb5439c3f ("x86_64: Ensure frame-table
compression leaves MAX_ORDER aligned"): when changing the loop to start
from MAX_ORDER-1, an adjustment by 1 was needed in the call to
find_next_bit() but was not done.
Fix the issue by passing j+1 and i+1 to find_next_zero_bit and
find_next_bit. Also add a check for i >= BITS_PER_LONG because
find_{,next_}zero_bit() are free to assume that their last argument is
less than their middle one.
pfn_to_pdx expects an address, not a size, as a parameter. Specifically,
it expects the end address, then the masks calculations compensate for
any holes between start and end. Thus, we should pass the end address to
pfn_to_pdx.
The initial pdx is stored in frametable_base_pdx, so we can subtract the
result of pfn_to_pdx(start_address) from nr_pdxs; we know that we don't
need to cover any memory in the range 0-start in the frametable.
Remove the variable `nr_pages' because it is unused.
Pu Wen [Thu, 4 Apr 2019 13:48:13 +0000 (21:48 +0800)]
tools/libxc: Add Hygon Dhyana support
Add Hygon Dhyana support to calculate the cpuid policies for creating PV
or HVM guests by using the AMD code path.
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Wei Liu <wei.liu2@citrix.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Pu Wen [Thu, 4 Apr 2019 13:48:04 +0000 (21:48 +0800)]
x86/cpuid: Add Hygon Dhyana support
The Hygon Dhyana family 18h processor shares the same cpuid leaves as
the AMD family 17h one. So add Hygon Dhyana support to calculate the
cpuid policies as the AMD CPU does.
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Pu Wen [Thu, 4 Apr 2019 13:47:40 +0000 (21:47 +0800)]
x86/domctl: Add Hygon Dhyana support
Add Hygon Dhyana support to update cpuid info for creating PV guest.
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Pu Wen [Thu, 4 Apr 2019 13:47:29 +0000 (21:47 +0800)]
x86/domain: Add Hygon Dhyana support
Add Hygon Dhyana support to handle HyperTransport range.
Also loading a nul selector does not clear bases and limits on Hygon
CPUs, so add Hygon Dhyana support to the function preload_segment.
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Pu Wen [Thu, 4 Apr 2019 13:46:33 +0000 (21:46 +0800)]
x86/spec_ctrl: Add Hygon Dhyana to the respective mitigation machinery
The Hygon Dhyana CPU has the same speculative execution as AMD family
17h, so share AMD Retpoline and PTI mitigation code with Hygon Dhyana.
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Pu Wen [Thu, 4 Apr 2019 13:46:23 +0000 (21:46 +0800)]
x86/cpu/mce: Add Hygon Dhyana support to the MCA infrastructure
The machine check architecture for Hygon Dhyana CPU is similar to the
AMD family 17h one. Add vendor checking for Hygon Dhyana to share the
code path of AMD family 17h.
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Pu Wen [Thu, 4 Apr 2019 13:46:11 +0000 (21:46 +0800)]
x86/cpu/vpmu: Add Hygon Dhyana and AMD Zen support for vPMU
As the Hygon Dhyana CPU shares a similar PMU architecture with the AMD
family 17h one, add Hygon Dhyana support in vpmu_arch_initialise() and
vpmu_init() by sharing the AMD code path.
Split the common part of amd_vpmu_init() out into a static function
_vpmu_init(), so that AMD and Hygon both call the shared function to
initialize vPMU.
As the current vPMU still does not support AMD Zen (family 17h), add 0x17
support to amd_vpmu_init().
Also create a function hygon_vpmu_init() for Hygon vPMU initialization.
Both AMD 17h and Hygon 18h have the same performance event select
and counter MSRs as AMD 15h has, so reuse the 15h definitions for them.
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Jan Beulich <jbeulich@suse.com>
Pu Wen [Thu, 4 Apr 2019 13:45:42 +0000 (21:45 +0800)]
x86/cpu: Fix common cpuid faulting probing for AMD and Hygon
There is no MSR_INTEL_PLATFORM_INFO for AMD and Hygon families. Reading
this MSR will stop the Xen initialization process on some Hygon
systems or produce GPF(0). So directly return false in the function
probe_cpuid_faulting() if !cpu_has_hypervisor.
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Jan Beulich <jbeulich@suse.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Pu Wen [Thu, 4 Apr 2019 13:45:03 +0000 (21:45 +0800)]
x86/cpu: Create Hygon Dhyana architecture support file
Add x86 architecture support for a new processor: Hygon Dhyana Family
18h. To make the Hygon initialization flow clearer, carve out code from
amd.c into a separate file hygon.c, and remove unnecessary code for
Hygon Dhyana.
To identify the Hygon Dhyana CPU, add a new vendor type X86_VENDOR_HYGON
and vendor ID "HygonGenuine" for system recognition, and fit the new
x86 vendor lookup mechanism.
Hygon can fully use the function early_init_amd(), so make this common
function non-static and call it directly from Hygon code.
Add a separate hygon_get_topology(), which calculates phys_proc_id from
AcpiId[6] (see reference [1]).
Signed-off-by: Pu Wen <puwen@hygon.cn> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Rebase over 0cd074144cb "x86/cpu: Renumber X86_VENDOR_* to form a bitmap" and 64933920c9b "x86/cpu: Drop cpu_devs[] and $VENDOR_init_cpu() hooks"] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 3 Jan 2019 18:03:25 +0000 (18:03 +0000)]
tools/fuzz: Add a cpu-policy fuzzing harness
There is now enough complexity that a fuzzing harness is a good idea, and
enough supporting logic to implement one which AFL seems happy with.
Take the existing recalculate_synth() helper and export it as
x86_cpuid_policy_recalc_synth(), as it is needed by the fuzzing harness.
While editing the MAINTAINERS file, insert a related entry which was
accidentally missed from c/s 919ddc3c0 "tools/cpu-policy: Add unit tests", and
sort the lines.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 21 May 2019 16:56:43 +0000 (17:56 +0100)]
libx86: Helper for clearing out-of-range CPUID leaves
When merging a levelled policy, stale out-of-range leaves may remain.
Introduce a helper to clear them, and test a number of the subtle corner
cases.
The logic based on cpuid_policy_xstates() is liable to need changing when XCR0
has bit 63 defined. Leave BUILD_BUG_ON()'s behind with comments in all
impacted areas, which includes x86_cpuid_policy_fill_native().
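To illustrate the basic idea only, here is a much simplified sketch of
clearing leaves beyond a stated maximum. The structure layout, the array
size and all names are hypothetical; the real policy has several leaf
ranges and the corner cases mentioned above, none of which are modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NR_BASIC_LEAVES 8u           /* hypothetical array size */

struct leaf_sketch { uint32_t a, b, c, d; };

struct policy_sketch {
    uint32_t max_leaf;               /* highest valid index into basic[] */
    struct leaf_sketch basic[NR_BASIC_LEAVES];
};

static void clear_out_of_range_sketch(struct policy_sketch *p)
{
    /* Nothing lies beyond the limit if it reaches the end of the array. */
    if ( p->max_leaf >= NR_BASIC_LEAVES - 1 )
        return;

    /* Zero every leaf after max_leaf so no stale data survives. */
    memset(&p->basic[p->max_leaf + 1], 0,
           (NR_BASIC_LEAVES - 1 - p->max_leaf) * sizeof(p->basic[0]));
}
```

The early return also guards against a max_leaf value larger than the
array, where "max_leaf + 1" could otherwise wrap around.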
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 6 Jun 2019 14:05:27 +0000 (16:05 +0200)]
x86/IRQ: ACKTYPE_NONE cannot make it into irq_guest_eoi_timer_fn()
action->ack_type is set once before the timer even gets initialized, and
is never changed later. The timer gets activated only for EOI and UNMASK
types. Hence there's no need to have a respective if() in there. Replace
it by an ASSERT().
Jan Beulich [Thu, 6 Jun 2019 14:04:53 +0000 (16:04 +0200)]
x86/IRQ: bail early from irq_guest_eoi_timer_fn() when nothing is in flight
There's no point entering the loop in the function in this case. Instead
there still being something in flight _after_ the loop would be an
actual problem: No timer would be running anymore for issuing the EOI
eventually, and hence this IRQ (and possibly lower priority ones) would
be blocked, perhaps indefinitely.
Issue a warning instead and prefer breaking some (presumably
misbehaving) guest over stalling perhaps the entire system.
Jan Beulich [Thu, 6 Jun 2019 14:04:09 +0000 (16:04 +0200)]
x86/IRQ: don't keep EOI timer running without need
The timer needs to remain active only until all pending IRQ instances
have seen EOIs from their respective domains. Stop it when the in-flight
count has reached zero in desc_guest_eoi(). Note that this is race free
(with __do_IRQ_guest()), as the IRQ descriptor lock is being held at
that point.
Also pull up stopping of the timer in __do_IRQ_guest() itself: Instead
of stopping it immediately before re-setting, stop it as soon as we've
made it past any early returns from the function (and hence we're sure
it'll get set again).
Finally bail from the actual timer handler in case we find the timer
already active again by the time we've managed to acquire the IRQ
descriptor lock. Without this we may forcibly EOI an IRQ immediately
after it got sent to a guest. For this, timer_is_active() gets split out
of active_timer(), deliberately moving just one of the two ASSERT()s (to
allow the function to be used also on a never initialized timer).
Jan Beulich [Thu, 6 Jun 2019 14:03:10 +0000 (16:03 +0200)]
memory: don't depend on guest_handle_subrange_okay() implementation details
guest_handle_subrange_okay() takes inclusive first and last parameters,
i.e. checks that [first, last] is valid. Many callers, however, actually
need to see whether [first, limit) is valid (i.e., limit is non-
inclusive), and to do this they subtract 1 from the size. This is
normally correct, except in cases where first == limit, in which case
guest_handle_subrange_okay() will be passed a second parameter less than
its first.
As it happens, due to the way the math is implemented in x86's
guest_handle_subrange_okay(), the return value turns out to be correct;
but we shouldn't rely on this behavior.
Make sure all callers handle first == limit explicitly before calling
guest_handle_subrange_okay().
Note that the other uses (increase-reservation, populate-physmap, and
decrease-reservation) are already fine due to a suitable check in
do_memory_op().
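A small sketch of the inclusive versus half-open distinction described
above. Both functions are hypothetical stand-ins working on plain indices
rather than guest handles, and treating the empty range as trivially valid
is an assumption made for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: checks the inclusive index range [first, last]. */
static bool subrange_okay_inclusive(unsigned long first, unsigned long last,
                                    unsigned long nr_elems)
{
    assert(first <= last);           /* callers must not invert the range */
    return last < nr_elems;
}

/* Half-open check for [first, limit): handle the empty range explicitly. */
static bool subrange_okay_exclusive(unsigned long first, unsigned long limit,
                                    unsigned long nr_elems)
{
    if ( first == limit )
        return true;                 /* nothing to access, nothing to check */

    if ( first > limit )
        return false;

    /* Only now is "limit - 1" guaranteed not to fall below first. */
    return subrange_okay_inclusive(first, limit - 1, nr_elems);
}
```

Handling first == limit before the subtraction means the inclusive helper
never sees a last parameter below its first one.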
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Thu, 6 Jun 2019 09:16:57 +0000 (11:16 +0200)]
adjust system domain creation (and call it earlier on x86)
Split out this mostly arch-independent code into a common-code helper
function. (This does away with Arm's arch_init_memory() altogether.)
On x86 this needs to happen before acpi_boot_init(): Commit 9fa94e1058
("x86/ACPI: also parse AMD IOMMU tables early") only appeared to work
fine - it's really broken, and doesn't crash (on non-EFI AMD systems)
only because of there being a mapping of linear address 0 during early
boot. On EFI there is:
Andrew Cooper [Fri, 31 May 2019 19:54:28 +0000 (12:54 -0700)]
xen/vm-event: Misc fixups
* Drop redundant brackets, and inline qualifiers.
* Insert newlines and spaces where appropriate.
* Drop redundant NDEBUG - gdprintk() is already conditional. Fix the
logging level, as gdprintk() already prefixes the guest marker.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Andrew Cooper [Fri, 31 May 2019 19:29:27 +0000 (12:29 -0700)]
xen/vm-event: Fix interactions with the vcpu list
vm_event_resume() should use domain_vcpu(), rather than opencoding it
without its Spectre v1 safety.
vm_event_wake_blocked() can't ever be invoked in a case where d->vcpu is
NULL, so drop the outer if() and reindent, fixing up style issues.
The comment, which is left alone, is false. This algorithm still has
starvation issues when there is an asymmetric rate of generated events.
However, the existing logic is sufficiently complicated and fragile that
I don't think I've followed it fully, and because we're trying to
obsolete this interface, the safest course of action is to leave it
alone, rather than to end up making things subtly different.
Therefore, no practical change that callers would notice.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
The use of (*ved)-> leads to poor code generation, as the compiler can't
assume the pointer hasn't changed, and results in hard-to-follow code.
For both vm_event_{en,dis}able(), rename the ved parameter to p_ved, and
work primarily with a local ved pointer.
This has a key advantage in vm_event_enable(), in that the partially
constructed vm_event_domain only becomes globally visible once it is
fully constructed. As a consequence, the spinlock doesn't need holding.
Furthermore, rearrange the order of operations to be more sensible.
Check for repeated enables and a bad HVM_PARAM before allocating
memory, and gather the trivial setup into one place, dropping the
redundant zeroing.
No practical change that callers will notice.
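The pattern can be sketched as follows. This is a simplified userspace
illustration, not the Xen code: the structure, its field and the function
names are hypothetical, and the teardown order in the disable path is an
assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct ved_sketch {          /* hypothetical stand-in for vm_event_domain */
    int ring_ready;
};

static int enable_sketch(struct ved_sketch **p_ved)
{
    struct ved_sketch *ved;

    /* Check for a repeated enable before allocating anything. */
    if ( *p_ved )
        return -EBUSY;

    ved = calloc(1, sizeof(*ved));
    if ( !ved )
        return -ENOMEM;

    /* All setup happens through the local pointer ... */
    ved->ring_ready = 1;

    /* ... and the object is published only once fully constructed. */
    *p_ved = ved;

    return 0;
}

static void disable_sketch(struct ved_sketch **p_ved)
{
    struct ved_sketch *ved = *p_ved;

    *p_ved = NULL;           /* assumption: unpublish first, then free */
    free(ved);
}
```

Working on the local ved also lets the compiler keep the pointer in a
register instead of reloading *p_ved on every access.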
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Andrew Cooper [Fri, 31 May 2019 20:57:03 +0000 (13:57 -0700)]
xen/vm-event: Expand vm_event_* spinlock macros and rename the lock
These serve no purpose, but to add to the cognitive load of following
the code. Remove the level of indirection.
Furthermore, the lock protects all data in vm_event_domain, making
ring_lock a poor choice of name.
For vm_event_get_response() and vm_event_grab_slot(), fold the exit
paths to have a single unlock, as the compiler can't make this
optimisation itself.
No functional change.
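The single-unlock shape mentioned above might look roughly like this. The
lock is a trivial hypothetical stand-in for a spinlock and the ring
structure is invented for the example; only the control flow is the point.

```c
#include <assert.h>
#include <errno.h>

struct lock_sketch { int held; };    /* hypothetical stand-in for a spinlock */

static void sketch_lock(struct lock_sketch *l)
{
    assert(!l->held);
    l->held = 1;
}

static void sketch_unlock(struct lock_sketch *l)
{
    assert(l->held);
    l->held = 0;
}

struct ring_sketch {
    struct lock_sketch lock;         /* protects everything in the struct */
    unsigned int free_slots;
};

static int grab_slot_sketch(struct ring_sketch *r)
{
    int rc = -EBUSY;

    sketch_lock(&r->lock);

    if ( r->free_slots == 0 )
        goto out;

    r->free_slots--;
    rc = 0;

 out:
    sketch_unlock(&r->lock);         /* the only unlock in the function */
    return rc;
}
```

Every exit path funnels through the "out" label, so the lock cannot be
leaked by a later edit that adds another early return.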
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
Andrew Cooper [Fri, 31 May 2019 19:35:55 +0000 (12:35 -0700)]
xen/vm-event: Drop unused u_domctl parameter from vm_event_domctl()
This parameter isn't used at all. Furthermore, elide the copyback in
failing cases, as it is only successful paths which generate data which
needs sending back to the caller.
Finally, drop a redundant d == NULL check, as that logic is all common
at the beginning of do_domctl().
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Jan Beulich <jbeulich@suse.com>
x86: remove alternative_callN usage of ALTERNATIVE asm macro
There is a bug in llvm that needs to be fixed before switching to use
the alternative assembly macros in inline assembly call sites.
Therefore alternative_callN using inline assembly to generate the
alternative patch sites should be using the ALTERNATIVE C preprocessor
macro rather than the ALTERNATIVE assembly macro. Using the assembly
macro in an inline assembly instance triggers the following bug on
llvm based toolchains:
Jan Beulich [Mon, 3 Jun 2019 15:21:05 +0000 (17:21 +0200)]
x86: further speed-up to hweight{32,64}()
According to Linux commit 0136611c62 ("optimize hweight64 for x86_64")
this is a further improvement over the variant using only bitwise
operations. It's also a slight further code size reduction.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 3 Jun 2019 15:20:13 +0000 (17:20 +0200)]
bitops: speed up hweight<N>()
Algorithmically this gets us in line with current Linux, where the same
change did happen about 13 years ago. See in particular Linux commits
f9b4192923 ("bitops: hweight() speedup") and 0136611c62 ("optimize
hweight64 for x86_64").
Kconfig changes for actually setting HAVE_FAST_MULTIPLY will follow.
Take the opportunity and change generic_hweight64()'s return type to
unsigned int.
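For reference, the two well-known 32-bit population count variants can be
sketched as below: one using a multiply for the final sum, one using only
shifts and adds. The function names are made up for this example and are
not the names used in Xen.

```c
#include <assert.h>
#include <stdint.h>

/* Population count using a multiply for the final horizontal sum. */
static unsigned int hweight32_mul(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;                         /* 2-bit sums */
    w  = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);   /* 4-bit sums */
    w  = (w + (w >> 4)) & 0x0f0f0f0fu;                   /* 8-bit sums */
    return (w * 0x01010101u) >> 24;                      /* add the 4 bytes */
}

/* Same idea without a multiply, for hardware where multiplying is slow. */
static unsigned int hweight32_shift(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;
    w  = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    w  = (w + (w >> 4)) & 0x0f0f0f0fu;
    w += w >> 8;
    return (w + (w >> 16)) & 0xffu;
}
```

The multiply replaces the last two shift-and-add steps, which is why
selecting it only pays off where the multiplier is fast.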
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 3 Jun 2019 15:15:06 +0000 (17:15 +0200)]
x86emul/fuzz: add a state sanity checking function
This is to accompany sanitize_input(). Just like for initial state we
want to have state between two emulated insns sane, at least as far as
assumptions in the main emulator go. Do minimal checking after segment
register, CR, and MSR writes, and roll back to the old value in case of
failure (raising #GP(0) at the same time).
In the particular case observed, a CR0 write clearing CR0.PE was
followed by a VEX-encoded insn, which the decoder accepts based on
guest address size, restricting things just outside of the 64-bit case
(real and virtual modes don't allow VEX-encoded insns). Subsequently
_get_fpu() would then assert that CR0.PE must be set (and EFLAGS.VM
clear) when trying to invoke YMM, ZMM, or OPMASK state.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Tue, 23 Oct 2018 10:18:07 +0000 (11:18 +0100)]
x86/hvm: Make the altp2m locking in hvm_hap_nested_page_fault() easier to follow
Drop the ap2m_active boolean, and consistently use the unlocking form:
    if ( p2m != hostp2m )
        __put_gfn(p2m, gfn);
    __put_gfn(hostp2m, gfn);
which makes it clear that we always unlock the altp2m's gfn if it is in use,
and always unlock the hostp2m's gfn. This also drops the ternary expression
in the logdirty case.
Extend the logdirty comment to identify where the locking violation is liable
to occur.
No (intended) overall change in behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Petre Pircalabu [Thu, 30 May 2019 14:18:17 +0000 (17:18 +0300)]
vm_event: Make ‘local’ functions ‘static’
vm_event_get_response, vm_event_resume, and vm_event_mark_and_pause are
used only in xen/common/vm_event.c.
Signed-off-by: Petre Pircalabu <ppircalabu@bitdefender.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Andrew Cooper [Fri, 17 May 2019 18:35:08 +0000 (19:35 +0100)]
x86/mpparse: Don't print "limit reached" for every subsequent processor
When you boot Xen with the default 256 NR_CPUS, on a box with rather more
processors, the resulting spew is unnecessarily verbose. Instead, print the
message once, e.g.:
Andrew Cooper [Fri, 17 May 2019 18:30:47 +0000 (19:30 +0100)]
xen/lib: Introduce printk_once() and replace some opencoded examples
Reflow the ZynqMP message for grepability, and fix the omission of a newline.
There is a race condition where multiple cpus could race to set once_ boolean.
However, the use of this construct is mainly useful for boot time code, and
the only consequence of the race is a repeated print message.
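A userspace sketch of such a print-once construct is shown below. It is an
illustration only: log_once(), log_line() and the message text are
hypothetical, and stdio stands in for printk().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static unsigned int lines_printed;   /* lets the example observe its output */

static void log_line(const char *msg)
{
    lines_printed++;
    fputs(msg, stdout);
}

/*
 * Print on the first pass through this call site and stay silent
 * afterwards.  Each expansion gets its own static flag.  As noted above,
 * the flag is not atomic, so racing CPUs could print the message twice.
 */
#define log_once(msg)                   \
    do {                                \
        static bool once_;              \
        if ( !once_ )                   \
        {                               \
            once_ = true;               \
            log_line(msg);              \
        }                               \
    } while ( 0 )

/* Hypothetical caller, invoked once per surplus processor. */
static void note_limit_reached(void)
{
    log_once("limit reached, ignoring further processors\n");
}
```

Calling note_limit_reached() any number of times from a single thread
emits the line once.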
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Fri, 17 May 2019 18:23:55 +0000 (19:23 +0100)]
x86/spec-ctrl: Knights Landing/Mill are retpoline-safe
They are both Airmont-based and should have been included in c/s 17f74242ccf
"x86/spec-ctrl: Extend repoline safey calcuations for eIBRS and Atom parts".
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Fri, 31 May 2019 09:40:52 +0000 (11:40 +0200)]
x86/vhpet: avoid 'small' time diff test on resume
It appears that even 64-bit versions of Windows 10, when not using
synthetic timers, will use 32-bit HPET non-periodic timers. There is a test
in hpet_set_timer(), specific to 32-bit timers, that tries to disambiguate
between a comparator value that is in the past and one that is sufficiently
far in the future that it wraps. This is done by assuming that the delta
between the main counter and comparator will be 'small' [1], if the
comparator value is in the past. Unfortunately, more often than not, this
is not the case if the timer is being re-started after a migrate and so
the timer is set to fire far in the future (in excess of a minute in
several observed cases) rather than set to fire immediately. This has a
rather odd symptom where the guest console is alive enough to be able to
deal with mouse pointer re-rendering, but any keyboard activity or mouse
clicks yield no response.
This patch simply adds an extra check of 'creation_finished' into
hpet_set_timer() so that the 'small' time test is omitted when the function
is called to restart timers after migration, and thus any negative delta
causes a timer to fire immediately.
[1] The number of ticks that equate to 0.9765625 milliseconds
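The decision can be sketched for a 32-bit timer as follows. This is an
interpretation of the description above and not the actual
hpet_set_timer() code; the function name, its parameters and the
small_span threshold are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t ticks_until_fire(uint32_t counter, uint32_t comparator,
                                 bool creation_finished, uint64_t small_span)
{
    int64_t diff = (int64_t)comparator - (int64_t)counter;

    if ( diff >= 0 )
        return (uint64_t)diff;       /* comparator is ahead: just wait */

    /*
     * The comparator is numerically behind the counter.  In normal
     * operation a large gap is taken to mean "wrapped, far in the
     * future", while a small gap means "just missed, fire now".  While
     * restoring (creation not yet finished) the gap is not trusted and
     * the timer fires immediately.
     */
    if ( creation_finished && (uint64_t)(-diff) > small_span )
        return (uint32_t)diff;       /* distance modulo 2^32 */

    return 0;
}
```

With creation_finished false, any negative delta yields 0 ticks, which
corresponds to firing the timer immediately after migration.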
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 31 May 2019 09:39:49 +0000 (11:39 +0200)]
VT-d: change bogus return value of intel_iommu_lookup_page()
The function passes 0 as "alloc" argument to addr_to_dma_page_maddr(),
so -ENOMEM simply makes no sense (and its use was probably simply a
copy-and-paste effect originating at intel_iommu_map_page()).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
There is no reason to assume the HW CPU ID will be 0 when the
processor is part of a uniprocessor system. At best, this will result in
conflicting output, as the rest of Xen uses the value directly read from
MPIDR_EL1.
So remove the zeroing and logic to check if the CPU is part of a
uniprocessor system.
Andrew Cooper [Thu, 28 Mar 2019 14:23:13 +0000 (14:23 +0000)]
x86: init_hypercall_page() cleanup
The various pieces of the hypercall page infrastructure have grown
organically over time and ended up in a bit of a mess.
* Rename all functions to be of the form *_init_hypercall_page(). This
makes them somewhat shorter, and means they can actually be grepped
for in one go.
* Move init_hypercall_page() to domain.c. The 64-bit traps.c isn't a
terribly appropriate place for it to live.
* Drop an obsolete comment from hvm_init_hypercall_page() and drop the
domain parameter from hvm_funcs.init_hypercall_page() as it isn't
necessary.
* Rearrange the logic in each function to avoid needing extra local
variables, and to write the page in one single pass.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Wed, 29 May 2019 04:19:11 +0000 (05:19 +0100)]
x86/altp2m: Fix style errors introduced with c/s 9abcac7ff
Drop introduced trailing whitespace, excessively long lines, mal-indentation,
superfluous use of PRI macros for int-or-smaller types, and incorrect PRI
macros for gfns and mfns.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Tamas K Lengyel [Tue, 28 May 2019 13:10:36 +0000 (14:10 +0100)]
x86/altp2m: cleanup p2m_altp2m_lazy_copy
The p2m_altp2m_lazy_copy is responsible for lazily populating an
altp2m view when the guest traps out due to no EPT entry being present
in the active view. Currently, in addition to taking a number of
unused arguments, the whole calling convention has a number of
redundant p2m lookups: the function reads the hostp2m, even though the
caller has just read the same hostp2m entry; and then the caller
re-reads the altp2m entry that the function has just read (and possibly set).
Rework this function to make it a bit more rational. Specifically:
- Pass the current hostp2m entry values we have just read for it to
use to populate the altp2m entry if it finds the entry empty.
- If the altp2m entry is not empty, pass out the values we've read so
the caller doesn't need to re-walk the tables
- Either way, return with the gfn 'locked', to make clean-up handling
more consistent.
Rename the function to better reflect this functionality.
While we're here, change bool_t to bool, and return true/false rather
than 1/0.
It's a bit grating to do both the p2m_lock() and the get_gfn(),
knowing that they boil down to the same thing at the moment; but we
have to maintain the fiction until such time as we decide to get rid
of it entirely.
Signed-off-by: Tamas K Lengyel <tamas@tklengyel.com> Signed-off-by: George Dunlap <george.dunlap@citrix.com> Tested-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Mon, 27 May 2019 10:26:20 +0000 (12:26 +0200)]
vm_event: fix rc check for uninitialized ring
vm_event_claim_slot() returns -EOPNOTSUPP for an uninitialized ring
since commit 15e4dd5e866b43bbc ("common/vm_event: Initialize vm_event
lists on domain creation"), but the callers test for -ENOSYS.
Jan Beulich [Mon, 27 May 2019 10:25:44 +0000 (12:25 +0200)]
vsprintf: constify "end" parameters
Except in the top level function we don't mean to ever write through
"end". The variable is used solely for pointer comparison purposes
there. Add const everywhere.
Also make function heading wrapping style uniform again for all of the
involved functions.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 27 May 2019 10:24:37 +0000 (12:24 +0200)]
x86/CPUID: adjust SSEn dependencies
Along the lines of b9f6395590 ("x86/cpuid: adjust dependencies of
post-SSE ISA extensions") further convert SSEn dependencies to be more
chain like, with each successor addition depending on its immediate
predecessor. This is more in line with how hardware has evolved, and
how other projects like gcc and binutils connect things together.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 24 May 2019 13:14:17 +0000 (14:14 +0100)]
tests/cpu-policy: Skip building on older versions of GCC
GCC 4.4 (as included in CentOS 6) is too old to handle designated initialisers
in anonymous unions. As this is just a developer tool, skip the test in this
case, rather than sacrificing the legibility/expressibility of the test cases.
This fixes the Gitlab CI tests.
While adding this logic to cpu-policy, adjust the equivalent logic from
x86_emulator on which this was based. Printing:
Test harness not built, use newer compiler than "gcc"
isn't helpful for anyone unexpectedly encountering the error.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Alistair Francis [Fri, 24 May 2019 08:30:39 +0000 (10:30 +0200)]
gitignore: ignore xen.lds and asm-offsets.s for all archs
Instead of ignoring xen.lds and asm-offsets.s for every specific arch,
let's instead just use gitignore's wildcard feature to ignore them for
all archs.
Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Igor Druzhinin [Fri, 24 May 2019 08:30:21 +0000 (10:30 +0200)]
libacpi: report PCI slots as enabled only for hotpluggable devices
DSDT for qemu-xen lacks _STA method of PCI slot object. If _STA method
doesn't exist then the slot is assumed to be always present and active
which in conjunction with _EJ0 method makes every device ejectable for
an OS even if it's not the case.
qemu-kvm is able to dynamically add _EJ0 method only to those slots
that either have hotpluggable devices or are free for PCI passthrough.
As Xen lacks this capability we cannot use the same approach.
qemu-xen-traditional DSDT has _STA method which only reports that
the slot is present if there is a PCI device hotplugged there.
This is done through querying of its PCI hotplug controller.
qemu-xen has similar capability that reports if device is "hotpluggable
or absent" which we can use to achieve the same result.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Norbert Manthey [Fri, 24 May 2019 08:28:26 +0000 (10:28 +0200)]
common/grant_table: harden helpers
Guests can issue grant table operations and provide guest controlled
data to them. This data is used for memory loads in helper functions
and macros. To avoid speculative out-of-bound accesses, we use the
array_index_nospec macro where applicable, or the block_speculation
macro.
This is part of the speculative hardening effort.
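The idea behind clamping an index without a conditional branch can be
sketched as below. This is not Xen's array_index_nospec(); the function names
are hypothetical, and the sketch assumes a non-zero size, values that fit in
ptrdiff_t, and an arithmetic right shift of negative values (which is
implementation-defined in C). Real implementations typically rely on
architecture-specific code so that the compiler cannot reintroduce a branch.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Sketch only.  Returns all-ones when index < size and zero otherwise,
 * computed without a conditional branch.
 */
static size_t index_mask_sketch(size_t index, size_t size)
{
    return (size_t)(~(ptrdiff_t)(index | (size - 1 - index)) >>
                    (sizeof(ptrdiff_t) * CHAR_BIT - 1));
}

/* In-bounds indices pass through unchanged; anything else becomes 0. */
static size_t clamp_index_sketch(size_t index, size_t size)
{
    return index & index_mask_sketch(index, size);
}
```

A caller would still perform the usual bounds check first and then use the
clamped value for the actual array access, so that a mispredicted check
cannot lead to an out-of-bounds load.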
Signed-off-by: Norbert Manthey <nmanthey@amazon.de> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 24 May 2019 08:27:24 +0000 (10:27 +0200)]
x86emul: support AVX512{F,ER} reciprocal insns
Also include the only other AVX512ER insn pair, VEXP2P{D,S}.
Note that despite the replacement of the SHA insns' table slots there's
no need to special case their decoding: Their insn-specific code already
sets op_bytes (as was required due to simd_other), and TwoOp is of no
relevance for legacy encoded SIMD insns.
The raising of #UD when EVEX.L'L is 3 for AVX512ER scalar insns is done
to be on the safe side. The SDM does not clarify behavior there, and
it's even more ambiguous here (without AVX512VL in the picture).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 May 2019 08:22:55 +0000 (10:22 +0200)]
x86emul: support AVX512F legacy-equivalent scalar int/FP conversion insns
VCVT{,T}S{S,D}2SI use EVEX.W for their destination (register) rather
than their (possibly memory) source operand size and hence need a
"manual" override of disp8scale.
While the SDM claims that EVEX.L'L needs to be zero for the 32-bit forms
of VCVT{,U}SI2SD (exception type E10NF), observations on my test system
do not confirm this (and I've got informal confirmation that this is a
doc mistake). Nevertheless, to be on the safe side, force evex.lr to be
zero in this case when constructing the stub.
Slightly adjust the scalar to_int() in the test harness, to increase the
chances of the operand ending up in memory.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 24 May 2019 08:19:59 +0000 (10:19 +0200)]
x86/IO-APIC: fix build with gcc9
There are a number of pointless __packed attributes which cause gcc 9 to
legitimately warn:
utils.c: In function 'vtd_dump_iommu_info':
utils.c:287:33: error: converting a packed 'struct IO_APIC_route_entry' pointer (alignment 1) to a 'struct IO_APIC_route_remap_entry' pointer (alignment 8) may result in an unaligned pointer value [-Werror=address-of-packed-member]
287 | remap = (struct IO_APIC_route_remap_entry *) &rte;
| ^~~~~~~~~~~~~~~~~~~~~~~~~
intremap.c: In function 'ioapic_rte_to_remap_entry':
intremap.c:343:25: error: converting a packed 'struct IO_APIC_route_entry' pointer (alignment 1) to a 'struct IO_APIC_route_remap_entry' pointer (alignment 8) may result in an unaligned pointer value [-Werror=address-of-packed-member]
343 | remap_rte = (struct IO_APIC_route_remap_entry *) old_rte;
| ^~~~~~~~~~~~~~~~~~~~~~~~~
Simply drop these attributes. Take the liberty and also re-format the
structure definitions at the same time.
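Why the attribute matters for the warning can be sketched with two
hypothetical structures (not the real IO-APIC definitions), assuming GCC or
Clang attribute syntax: packing leaves the size unchanged here but drops the
alignment to 1, so a pointer to the packed type may be misaligned for a more
strictly aligned type.

```c
#include <stdint.h>

/* Hypothetical 64-bit entry, packed: alignment drops to 1. */
struct entry_packed_sketch {
    uint32_t lo;
    uint32_t hi;
} __attribute__((packed));

/* Same fields without the attribute: natural alignment is kept. */
struct entry_plain_sketch {
    uint32_t lo;
    uint32_t hi;
};

/* No padding to remove, so packing gains nothing in size ... */
_Static_assert(sizeof(struct entry_packed_sketch) ==
               sizeof(struct entry_plain_sketch), "same size");
/* ... while the alignment guarantee is lost. */
_Static_assert(_Alignof(struct entry_packed_sketch) == 1, "packed align");
```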
Reported-by: Charles Arnold <carnold@suse.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 20 May 2019 10:14:05 +0000 (10:14 +0000)]
x86/boot: Link opt_dom0_verbose to CONFIG_VERBOSE_DEBUG
We currently have an asymmetric setup where CONFIG_VERBOSE_DEBUG controls
extra diagnostics for a PV dom0, and opt_dom0_verbose controls extra
diagnostics for a PVH dom0.
Default opt_dom0_verbose to CONFIG_VERBOSE_DEBUG and use opt_dom0_verbose
consistently.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 14 Sep 2018 17:50:01 +0000 (18:50 +0100)]
x86/boot: Wire up dom0=shadow for PV dom0
This would have been very handy when debugging some pv-l1tf issues. As there
is no cost to supporting it, wire it up.
Due to the way dom0 is constructed, switching into shadow mode must be done
after the pagetables are written, and because of partially being in dom0
context, shadow_enable() doesn't like the state it finds.
Reuse the pv_l1tf tasklet for convenience, which will switch dom0 into shadow
mode just before it starts executing.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 20 May 2019 10:14:01 +0000 (10:14 +0000)]
x86/pv: Fix error handling in dom0_construct_pv()
One path in dom0_construct_pv() returns -1 unlike all other error paths.
Switch it to returning -EINVAL.
This was last modified by c/s c84481fb XSA-55, but the bug predates that
series. However, this patch did (for no obvious reason) introduce a
bifurcated tail to the function with two subtly different elf_check_broken()
clauses.
As the elf_check_broken() is just a warning and doesn't influence the further
boot, fold the exit paths together and use a single clause.
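The resulting shape can be sketched as follows. The function, its parameters
and the warning counter are hypothetical stand-ins, not the code of
dom0_construct_pv(): every path funnels through one exit label, errors use
errno-style values, and the diagnostic does not alter the return value.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for emitting a warning; counts how often it would fire. */
static unsigned int warnings_emitted_sketch;

static int construct_sketch(bool image_valid, bool image_broken)
{
    int rc;

    if ( !image_valid )
    {
        rc = -EINVAL;   /* errno-style value rather than a bare -1 */
        goto out;
    }

    rc = 0;

 out:
    /* Single shared clause: warn only, rc is left untouched. */
    if ( image_broken )
        ++warnings_emitted_sketch;

    return rc;
}
```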
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 21 May 2019 17:19:33 +0000 (18:19 +0100)]
libx86: Elide more empty CPUID leaves when serialising a policy
x86_cpuid_copy_to_buffer() currently serialises the full content of the
various subleaf unions. While leaves 4, 0xb and 0xd don't have a concrete
max_subleaf field, they do have well defined upper bounds.
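As a generic illustration only, the sketch below trims trailing all-zero
subleaves before serialising. The type and function names are hypothetical,
and the per-leaf upper bounds that the change relies on for leaves 4, 0xb and
0xd are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block for one subleaf. */
struct leaf_regs_sketch {
    uint32_t a, b, c, d;
};

/*
 * Return how many subleaves need serialising: everything up to and
 * including the last non-empty entry within the first "bound" entries.
 */
static size_t used_subleaves_sketch(const struct leaf_regs_sketch *l,
                                    size_t bound)
{
    while ( bound > 0 )
    {
        const struct leaf_regs_sketch *e = &l[bound - 1];

        if ( e->a | e->b | e->c | e->d )
            break;

        --bound;
    }

    return bound;
}
```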
Diffing the results of `xen-cpuid -p` shows the resulting saving:
Commit 03957f58db "xen/const: Extend the existing macro BIT to take a
suffix in parameter" didn't convert all the callers of the macro BIT.
This will result in a build breakage when enabling Livepatch on arm64.
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Julien Grall <julien.grall@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>