tools/arm: optee: create optee firmware node in DT if tee=optee
If TEE support is enabled with "tee=optee" option in xl.cfg,
then we need to inform guest about available TEE, by creating
corresponding node in the guest's device tree.
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com> Reviewed-by: Julien Grall <julien.grall@arm.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Wed, 18 Sep 2019 13:20:00 +0000 (15:20 +0200)]
x86/CPUID: drop INVPCID dependency on PCID
PCID validly depends on LM, as it can be enabled in Long Mode only.
INVPCID, otoh, can be used not only without PCID enabled, but also
outside of Long Mode altogether. In both cases its functionality is
simply restricted to PCID 0, which is sort of expected as no other PCID
can be activated there.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
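Illustration (not part of the commit): the effect of dropping the dependency can be pictured with a small model. The DEPS table and the disable() helper below are hypothetical names chosen for this sketch, not Xen's actual dependency tables. It only shows that clearing LM still removes PCID, while INVPCID survives once it is no longer listed as a dependent of PCID.

```python
# Hypothetical model of feature dependencies: key -> features that require it.
DEPS = {
    "LM": ["PCID"],  # PCID can only be enabled in Long Mode
    "PCID": [],      # INVPCID is no longer listed as depending on PCID
}


def disable(features, name):
    """Return `features` without `name` and all of its transitive dependents."""
    remaining = set(features)
    stack = [name]
    while stack:
        feat = stack.pop()
        remaining.discard(feat)
        stack.extend(DEPS.get(feat, []))
    return remaining


guest = {"LM", "PCID", "INVPCID"}
print(sorted(disable(guest, "LM")))
```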
Jan Beulich [Wed, 18 Sep 2019 13:14:49 +0000 (15:14 +0200)]
x86: limit the amount of TLB flushing in switch_cr3_cr4()
We really need to flush the TLB just once, if we do so with or after the
CR3 write. The only case where two flushes are unavoidable is when we
mean to turn off CR4.PGE (perhaps just temporarily; see the code
comment).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 18 Sep 2019 13:13:21 +0000 (15:13 +0200)]
x86emul: treat Hygon guests like AMD ones
For some reason the Hygon enabling series left out the insn emulator.
Make appropriate adjustments wherever we've been special casing AMD.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Wei Liu <wl@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Release-acked-by: Juergen Gross <jgross@suse.com>
Jan Beulich [Wed, 18 Sep 2019 13:12:33 +0000 (15:12 +0200)]
core-parking: interact with runtime SMT-disabling
When disabling SMT at runtime, secondary threads should no longer be
candidates for bringing back up in response to _PUR ACPI events. Purge
them from the tracking array.
Doing so involves adding locking to guard accounting data in the core
parking code. While adding the declaration for the lock, take the
liberty to drop two unnecessary forward function declarations.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
make subdirs-install
make[2]: Entering directory `/home/travis/build/andyhhp/xen/tools'
make[3]: Entering directory `/home/travis/build/andyhhp/xen/tools'
make -C libs install
make[4]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs'
make[5]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs'
make -C toolcore install
make[6]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs/toolcore'
make libs
make[7]: Entering directory `/home/travis/build/andyhhp/xen/tools/libs/toolcore'
for i in include/xentoolcore.h include/xentoolcore_internal.h; do \
gcc -x c -ansi -Wall -Werror -I<snip>/xen/tools/libs/toolcore/../../../tools/include \
-S -o /dev/null $i || exit 1; \
echo $i; \
done >headers.chk.new
include/xentoolcore_internal.h:30:31: fatal error: _xentoolcore_list.h: No such file or directory
#include "_xentoolcore_list.h"
^
compilation terminated.
make[7]: *** [headers.chk] Error 1
The problem is that xentoolcore_internal.h includes _xentoolcore_list.h which
hasn't been generated yet.
The toolcore headers.chk rule (unlike the other libraries) had an additional
dependency against $(AUTOINCS), which forced the headers to be generated
first. Replicate this in the common libs.mk
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen/arm: Zero BSS after the MMU and D-cache is turned on
At the moment BSS is zeroed before the MMU and D-Cache are turned on.
In other words, the cache will be bypassed when zeroing the BSS section.
On Arm64, per the Image protocol [1], the state of the cache for BSS region
is not known because it is not part of the "loaded kernel image".
On Arm32, the boot protocol [2] does not mention anything about the
state of the cache. Therefore, it should be assumed that it is not known
for BSS region.
This means that the cache will need to be invalidated twice for the BSS
region:
1) Before zeroing to remove any dirty cache line. Otherwise such lines
may get evicted while zeroing and therefore overwrite the zeroed value.
2) After zeroing to remove any cache line that may have been
speculated. Otherwise when turning on MMU and D-Cache, the CPU may
see old values.
At the moment, the only reason to have BSS zeroed early is because the
boot page tables are part of it. To avoid the two cache invalidations,
it would be better if the boot page tables are part of the "loaded
kernel image" and therefore be zeroed when loading the image into
memory. A good candidate is the section .data.page_aligned.
A new macro DEFINE_BOOT_PAGE_TABLE is introduced to create and mark
page-tables used before BSS is zeroed. This includes all boot_* but also
xen_fixmap as zero_bss() will print a message when earlyprintk is
enabled.
Boot CPU and secondary CPUs will use different entry points to C code. At
the moment, the decision on which entry to use is taken within launch().
In order to avoid using a conditional instruction and make the call
clearer, launch() is reworked to take the entry point and its arguments
as parameters.
Lastly, document the behavior and the main registers usage within the
function.
Andrew Cooper [Fri, 13 Sep 2019 16:17:21 +0000 (17:17 +0100)]
drivers/acpi: Drop "ERST table was not found" message
ERST isn't a mandatory table, and also isn't very common to find. The message
is unnecessary noise during boot. Furthermore, it is redundant with the list
of found ACPI tables printed just ahead.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
This patch defines a new bit reported in the hw_cap field of struct
xen_sysctl_physinfo to indicate whether the platform supports sharing of
HAP page tables (i.e. the P2M) with the IOMMU. This informs the toolstack
whether the domain needs extra memory to store discrete IOMMU page tables
or not.
NOTE: This patch makes sure iommu_hap_pt_shared is clear if HAP is not
supported or the IOMMU is disabled, and defines it to false if
!CONFIG_HVM.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Wei Liu <wl@xen.org> Acked-by: Julien Grall <julien.grall@arm.com>
Paul Durrant [Tue, 17 Sep 2019 14:11:48 +0000 (16:11 +0200)]
use is_iommu_enabled() where appropriate...
...rather than testing the global iommu_enabled flag and ops pointer.
Now that there is a per-domain flag indicating whether the domain is
permitted to use the IOMMU (which determines whether the ops pointer will
be set), many tests of the global iommu_enabled flag and ops pointer can
be translated into tests of the per-domain flag. Some of the other tests of
purely the global iommu_enabled flag can also be translated into tests of
the per-domain flag.
NOTE: The comment in iommu_share_p2m_table() is also fixed; need_iommu()
disappeared some time ago. Also, whilst the style of the 'if' in
flask_iommu_resource_use_perm() is fixed, I have not translated any
instances of u32 into uint32_t to keep consistency. IMO such a
translation would be better done globally for the source module in
a separate patch.
The change to the definition of iommu_call() is to keep the PV shim
build happy. Without this change it will fail to compile with errors
of the form:
Paul Durrant [Tue, 17 Sep 2019 14:10:38 +0000 (16:10 +0200)]
domain: introduce XEN_DOMCTL_CDF_iommu flag
This patch introduces a common domain creation flag to determine whether
the domain is permitted to make use of the IOMMU. Currently the flag is
always set for both dom0 and any domU created by libxl if the IOMMU is
globally enabled (i.e. iommu_enabled == 1). sanitise_domain_config() is
modified to reject the flag if !iommu_enabled.
A new helper function, is_iommu_enabled(), is added to test the flag and
iommu_domain_init() will return immediately if !is_iommu_enabled(). This is
slightly different to the previous behaviour based on !iommu_enabled where
the call to arch_iommu_domain_init() was made regardless, however it appears
that this call was only necessary to initialize the dt_devices list for ARM
such that iommu_release_dt_devices() can be called unconditionally by
domain_relinquish_resources(). Adding a simple check of is_iommu_enabled()
into iommu_release_dt_devices() keeps this unconditional call working.
No functional change should be observed with this patch applied.
Subsequent patches will allow the toolstack to control whether use of the
IOMMU is enabled for a domain.
NOTE: The introduction of the is_iommu_enabled() helper function might
seem excessive but its use is expected to increase with subsequent
patches. Also, having iommu_domain_init() bail before calling
arch_iommu_domain_init() is not strictly necessary, but I think the
consequent addition of the call to is_iommu_enabled() in
iommu_release_dt_devices() makes the code clearer.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: "Roger Pau Monné" <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com>
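Illustration (not part of the commit): a minimal model of the checks described above. The bit value given to XEN_DOMCTL_CDF_iommu here is an assumption made for the sketch, and the functions only mimic the described behaviour. They are not the hypervisor's C implementation.

```python
# Assumed bit position, for illustration only; the real constant may differ.
XEN_DOMCTL_CDF_iommu = 1 << 5


def sanitise_domain_config(flags, iommu_enabled):
    """Reject the per-domain IOMMU flag when the IOMMU is globally disabled."""
    if (flags & XEN_DOMCTL_CDF_iommu) and not iommu_enabled:
        raise ValueError("IOMMU flag set but IOMMU is not enabled")
    return flags


def is_iommu_enabled(domain_flags):
    """Test the per-domain flag rather than the global iommu_enabled."""
    return bool(domain_flags & XEN_DOMCTL_CDF_iommu)


def iommu_domain_init(domain_flags):
    """Bail out early for domains that are not permitted to use the IOMMU."""
    if not is_iommu_enabled(domain_flags):
        return "skipped"
    return "initialised"


print(iommu_domain_init(sanitise_domain_config(XEN_DOMCTL_CDF_iommu, True)))
print(iommu_domain_init(sanitise_domain_config(0, False)))
```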
sched: populate cpupool0 only after all cpus are up
Simplify cpupool initialization by populating cpupool0 with cpus only
after all cpus are up. This avoids having to call the cpu notifier
directly for cpu 0.
With that in place there is no need to create cpupool0 earlier, so
do that just before assigning the cpus. Initialize free cpus with all
online cpus at that time in order to be able to add the cpu notifier
late, too.
Print the lock profile data when the system crashes and add some more
information for each lock data (lock address, cpu holding the lock).
While at it use the PRI_stime format specifier for printing time data.
This is especially beneficial for watchdog triggered crashes in case
of deadlocks.
In order to have the cpu holding the lock available let the
lock profile config option select DEBUG_LOCKS.
As printing the lock profile data will make use of locking, too, we
need to disable spinlock debugging before calling
spinlock_profile_printall() from panic().
While at it remove a superfluous #ifdef CONFIG_LOCK_PROFILE and rename
CONFIG_LOCK_PROFILE to CONFIG_DEBUG_LOCK_PROFILE.
Also move the .lockprofile.data section to init area in linker scripts
as the data is no longer needed after boot.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
spinlocks: in debug builds store cpu holding the lock
Add the cpu currently holding the lock to struct lock_debug. This makes
analysis of locking errors easier and it can be tested whether the
correct cpu is releasing a lock again.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 17 Sep 2019 14:06:15 +0000 (16:06 +0200)]
x86/PCI: read MSI-X table entry count early
Rather than doing this every time we set up interrupts for a device
anew (and then in two distinct places) fill this invariant field
right after allocating struct arch_msix.
While at it also obtain the MSI-X capability structure position just
once, in msix_capability_init(), rather than in each caller.
Furthermore take the opportunity and eliminate the multi_msix_capable()
alias of msix_table_size().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
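Illustration (not part of the commit): the entry count is invariant because it is encoded in the MSI-X Message Control register, where the Table Size field occupies bits 10:0 and holds N - 1. The sketch below decodes it once at allocation time. ArchMsix is a hypothetical stand-in for struct arch_msix, and the nr_entries field name is an assumption.

```python
# Table Size field: bits 10:0 of MSI-X Message Control, encoded as N - 1.
PCI_MSIX_FLAGS_QSIZE = 0x07FF


def msix_table_size(control):
    """Decode the number of MSI-X table entries from Message Control."""
    return (control & PCI_MSIX_FLAGS_QSIZE) + 1


class ArchMsix:
    """Hypothetical stand-in for struct arch_msix."""

    def __init__(self, control):
        # Filled exactly once, right after allocation.
        self.nr_entries = msix_table_size(control)


# 0x8003: MSI-X Enable (bit 15) set, Table Size field = 3.
dev = ArchMsix(0x8003)
print(dev.nr_entries)
```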
Jan Beulich [Tue, 17 Sep 2019 14:05:01 +0000 (16:05 +0200)]
AMD/IOMMU: introduce a "valid" flag for IVRS mappings
For us to no longer blindly allocate interrupt remapping tables for
everything the ACPI tables name, we can't use struct ivrs_mappings'
intremap_table field anymore to also have the meaning of "this entry
is valid". Add a separate boolean field instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 17 Sep 2019 14:03:44 +0000 (16:03 +0200)]
AMD/IOMMU: don't free shared IRT multiple times
Calling amd_iommu_free_intremap_table() for every IVRS entry is correct
only in per-device-IRT mode. Use a NULL 2nd argument to indicate that
the shared table should be freed, and call the function exactly once in
shared mode.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
microcode: pass a patch pointer to apply_microcode()
apply_microcode()'s always loading the cached ucode patch forces
a patch to be stored before being loaded. Make apply_microcode()
accept a patch pointer to remove the limitation so that a patch
can be stored after a successful loading.
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
microcode/amd: call svm_host_osvw_init() in common code
Introduce a vendor hook, .end_update_percpu, for svm_host_osvw_init().
The hook function is called on each cpu after loading an update.
It is a preparation for splitting out apply_microcode() from
cpu_request_microcode().
Note that svm_host_osvw_init() should be called regardless of the
result of loading an update.
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Some callbacks in microcode_ops or related functions take a cpu
id parameter. But at the current call sites, the cpu id parameter is
always equal to the current cpu id. Some of them even use an assertion
to guarantee this. Remove this redundant 'cpu' parameter.
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Remove the per-cpu cache field in struct ucode_cpu_info since it has
been replaced by a global cache. This leaves only one field in
ucode_cpu_info, so the struct is removed and the remaining field
(cpu signature) is stored in the per-cpu area.
The cpu status notifier is also removed. It was used to free the "mc"
field to avoid a memory leak.
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Previously, a per-cpu ucode cache was maintained. Each CPU had one
per-cpu update cache and there might be multiple versions of microcode.
Thus microcode_resume_cpu tried its best to update microcode by loading
every update cache until a load succeeded.
But now the cache struct is simplified a lot and only a single ucode is
cached. A single invocation of ->apply_microcode() loads the cache
and updates the microcode.
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
microcode: introduce a global cache of ucode patch
to replace the current per-cpu cache 'uci->mc'.
With the assumption that all CPUs in the system have the same signature
(family, model, stepping and 'pf'), a microcode update that matches
one cpu should match the others. Having differing microcode revisions
on cpus would make the system unstable and should be avoided. Hence, caching
one microcode update is good enough for all cases.
Introduce a global variable, microcode_cache, to store the newest
matching microcode update. Whenever we get a new valid microcode update,
its revision id is compared against that of the cached microcode update to
determine whether the "microcode_cache" needs to be replaced. And
this global cache is loaded to the cpu in apply_microcode().
All operations on the cache are protected by 'microcode_mutex'.
Note that I deliberately avoid touching the old per-cpu cache ('uci->mc')
as I am going to remove it completely in the following patches. We copy
everything to create the new cache blob to avoid reusing some buffers
previously allocated for the old per-cpu cache. It is not so efficient,
but it is already corrected by a patch later in this series.
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
microcode/amd: distinguish old and mismatched ucode in microcode_fits()
Sometimes, a ucode with a level lower than or equal to the current CPU's
patch level is useful. For example, to work around a broken BIOS which
only loads ucode for the BSP, when the BSP parses a ucode blob during bootup,
it is better to save a ucode with a lower or equal level for the APs.
No functional change is made in this patch. But a following patch will
handle "old ucode" and "mismatched ucode" separately.
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
to a more generic function, so that it can be used alone to check
an update against the CPU signature and current update revision.
Note that since enum microcode_match_result will be used in common code
(aka microcode.c), it has been placed in the common header. Also
constify the parameter of microcode_sanity_check() so that it
can be called by microcode_update_match().
Signed-off-by: Chao Gao <chao.gao@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
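Illustration (not part of the commit): a three-way match result lets callers tell an old update apart from one built for a different CPU. The enumerator names and the argument list below are assumptions made for this sketch. The real enum microcode_match_result and microcode_update_match() may look different.

```python
from enum import Enum


class MatchResult(Enum):
    """Hypothetical mirror of a three-way match result."""
    OLD_UCODE = 0  # signature matches, revision is not newer
    NEW_UCODE = 1  # signature matches, revision is newer
    MIS_UCODE = 2  # signature does not match this CPU


def microcode_update_match(cpu_sig, cpu_rev, patch_sig, patch_rev):
    """Check an update against the CPU signature and current revision."""
    if patch_sig != cpu_sig:
        return MatchResult.MIS_UCODE
    if patch_rev > cpu_rev:
        return MatchResult.NEW_UCODE
    return MatchResult.OLD_UCODE


print(microcode_update_match(0x506E3, 0xC2, 0x506E3, 0xC6))
```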
Paul Durrant [Fri, 13 Sep 2019 10:21:47 +0000 (12:21 +0200)]
public/xen.h: update the comment explaining 'Wallclock time'
Since commit 0629adfd80e "Actually set a HVM domain's time offset when it
sets the RTC", the comment in the public header has been misleading, since
it claims that wallclock time is only updated by control software.
Moreover, the comments stating that wc_sec and wc_nsec are seconds and
nanoseconds (respectively) in UTC since the Unix epoch are bogus. Their
values are adjusted by the domain's time_offset_seconds value, which is
updated by a guest write to the emulated RTC and hence the wallclock
timezone is under guest control.
This patch attempts to bring the comment in line with reality whilst
keeping it reasonably short.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
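Illustration (not part of the commit): the adjustment amounts to simple arithmetic. The helper name and the sample values below are made up for this sketch. It only shows that the wallclock seconds a guest sees differ from plain UTC by the domain's time_offset_seconds.

```python
def adjusted_wc_sec(utc_seconds, time_offset_seconds):
    """Hypothetical helper: wallclock seconds after applying the offset."""
    return utc_seconds + time_offset_seconds


utc_seconds = 1_568_370_107          # arbitrary sample value
offset_after_rtc_write = 2 * 60 * 60  # guest set its RTC two hours ahead

print(adjusted_wc_sec(utc_seconds, 0))
print(adjusted_wc_sec(utc_seconds, offset_after_rtc_write))
```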
xen/arm: setup: Relocate the Device-Tree later on in the boot
At the moment, the Device-Tree is relocated into xenheap while setting
up the memory subsystem. This is actually not necessary because the
early mapping is still present and we don't require the virtual address
to be stable until unflattening the Device-Tree.
So the relocation can safely be moved after the memory subsystem is
fully set up. This has the nice advantage of making the relocation common
and letting the xenheap allocator decide where to put it.
Lastly, the device-tree is not going to be used on ACPI systems. So
there is no need to relocate it and it can just be discarded.
Lars Kurth [Fri, 30 Aug 2019 19:35:13 +0000 (20:35 +0100)]
scripts/add_maintainers.pl: Add logic to use V entry
Add logic to use V section entry in THE REST for identifying xen trees
Specifically:
* Move check until after the MAINTAINERS file has been read
* Add get_xen_maintainers_file_version() for check
* Remove top_of_tree as not needed any more
* Fail with extended error message when used out of tree
Signed-off-by: Lars Kurth <lars.kurth@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Lars Kurth [Fri, 30 Aug 2019 17:42:56 +0000 (18:42 +0100)]
MAINTAINERS: Add V section entry to allow identification of Xen file
This change provides sufficient information to allow get_maintainer.pl /
add_maintainers.pl scripts to be run on xen sister repositories such as
mini-os.git, osstest.git, etc
A suggested template for sister repositories of Xen is
========================================================
This file follows the same conventions as outlined in
xen.git:MAINTAINERS. Please refer to the file in xen.git
for more information.
debugtrace: add entry when entry count is wrapping
The debugtrace entry count is a 32 bit variable, so it can wrap when
lots of trace entries are being produced. Making it wider would result
in a waste of buffer space as the printed count value would consume
more bytes when not wrapping.
So instead of letting the count value grow to huge values let it wrap
and add a wrap counter printed in this situation. This will keep the
needed buffer space at today's value without losing a way to
sort all entries in case multiple trace buffers are involved.
Note that the wrap message will be printed before the first trace
entry in case output is switched to console early. This is on purpose
in order to enable a future support of debugtrace to console without
any allocated buffer.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
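Illustration (not part of the commit): a 32-bit entry count combined with a wrap counter still gives a total order. The EntryCounter class is a hypothetical model, not the debugtrace code. It returns a (wraps, count) pair that can be used as a sort key across buffers.

```python
COUNT_MASK = 0xFFFFFFFF  # the entry count is a 32-bit variable


class EntryCounter:
    """Hypothetical model of a wrapping entry count plus wrap counter."""

    def __init__(self, start=0):
        self.count = start
        self.wraps = 0

    def next(self):
        """Advance the count and return a (wraps, count) sort key."""
        self.count = (self.count + 1) & COUNT_MASK
        if self.count == 0:
            # The count wrapped: bump the wrap counter instead of widening it.
            self.wraps += 1
        return (self.wraps, self.count)


counter = EntryCounter(start=0xFFFFFFFE)
keys = [counter.next() for _ in range(3)]
print(keys)
```

Sorting by the pair keeps entries in order even though the bare count restarts at zero.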
debugtrace is normally writing trace entries into a single trace
buffer. There are cases where this is not optimal, e.g. when hunting
a bug which requires writing lots of trace entries and one cpu is
stuck. This will result in other cpus filling the trace buffer and
finally overwriting the interesting trace entries of the hanging cpu.
In order to be able to debug such situations add the capability to use
per-cpu trace buffers. This can be selected by specifying the
debugtrace boot parameter with the modifier "cpu:", like:
debugtrace=cpu:16
At the same time switch the parsing function to accept size modifiers
(e.g. 4M or 1G).
Printing out the trace entries is done for each buffer in order to
minimize the effort needed during printing. As each entry is prefixed
with its sequence number sorting the entries can easily be done when
analyzing them.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
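Illustration (not part of the commit): a sketch of parsing the parameter with the "cpu:" modifier and size suffixes. The parse_debugtrace() helper is hypothetical, and the unit applied to a bare number (KiB here) is an assumption. The commit message does not state it.

```python
SUFFIXES = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}
DEFAULT_UNIT = 1 << 10  # assumption: a bare number counts KiB


def parse_debugtrace(value):
    """Return (per_cpu, size_in_bytes) for a debugtrace= argument."""
    per_cpu = value.startswith("cpu:")
    if per_cpu:
        value = value[len("cpu:"):]
    suffix = value[-1:].lower()
    if suffix in SUFFIXES:
        size = int(value[:-1]) * SUFFIXES[suffix]
    else:
        size = int(value) * DEFAULT_UNIT
    return per_cpu, size


print(parse_debugtrace("cpu:16"))
print(parse_debugtrace("4M"))
```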
Report whether shadow paging is supported by the hypervisor, since it
can be disabled at build time.
Reuse and tweak LIBXL_HAVE_PHYSINFO_CAP_HAP as it hasn't appeared in a
released version of Xen yet.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Current libxl code will always enable Hardware Assisted Paging (HAP),
expecting that the hypervisor will fall back to shadow if HAP is not
available. With the changes to DOMCTL_createdomain that's not the case
any longer, and the hypervisor will raise an error if HAP is not
available instead of silently falling back to shadow.
In order to keep the previous functionality report whether HAP is
available or not in XEN_SYSCTL_physinfo, so that the toolstack can
select a sane default if there's no explicit user selection of whether
HAP should be used.
Note that on ARM hardware HAP capability is always reported since it's
a required feature in order to run Xen.
Fixes: d0c0ba7d3de ('x86/hvm/domain: remove the 'hap_enabled' flag') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
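Illustration (not part of the commit): the default selection can be sketched as below. The choose_hap() helper and its arguments are hypothetical. In the toolstack the capability would come from the physinfo data, and the user's choice from the guest configuration.

```python
def choose_hap(user_choice, cap_hap):
    """Pick whether to request HAP for a new domain.

    user_choice: True, False, or None when the user gave no explicit setting.
    cap_hap: whether the hypervisor reports HAP as available.
    """
    if user_choice is not None:
        # An explicit selection is passed through; the hypervisor may reject it.
        return user_choice
    # No explicit selection: default to whatever the platform supports.
    return cap_hap


print(choose_hap(None, cap_hap=False))
print(choose_hap(None, cap_hap=True))
```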
Jan Beulich [Wed, 11 Sep 2019 12:54:34 +0000 (14:54 +0200)]
x86/shadow: fold p2m page accounting into sh_min_allocation()
This is to make the function live up to the promise its name makes. And
it simplifies all callers.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Ian Jackson [Tue, 10 Sep 2019 15:16:51 +0000 (16:16 +0100)]
tools/ocaml: abi check: #include on x86 only. Spotted by Gitlab CI
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 10 Sep 2019 14:35:09 +0000 (16:35 +0200)]
x86emul: fix test harness and fuzzer build dependencies
Commit fd35f32b4b ("tools/x86emul: Use struct cpuid_policy in the
userspace test harnesses") didn't account for the dependencies of
cpuid-autogen.h to potentially change between incremental builds. In
particular the harness has a "run" goal which is supposed to be usable
independently of the rest of the tools sub-tree building, and both the
harness and the fuzzer code are also supposed to be buildable
independently. Therefore a re-build of the generated header needs to be
triggered first, which is achieved by introducing a new top-level target
pattern (for just the "run" part for now).
Further cpuid.o did not have any dependencies added for it.
Finally, while at it, add a "run" target to the cpu-policy test harness.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 10 Sep 2019 14:34:21 +0000 (16:34 +0200)]
x86/IRQ: make 'i' debug output more tabular again
Since the affinity values are no longer of uniform width, move them
further to the right such that as much of the output as possible comes
out aligned with one another.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
The loop in FOR_EACH_IOREQ_SERVER runs backwards, hence the cleanup on
failure needs to be done forwards.
Fixes: 97a5a3e30161 ('x86/hvm/ioreq: maintain an array of ioreq servers rather than a list') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
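Illustration (not part of the commit): a model of why the direction matters. The helpers below are hypothetical. Servers are visited from the highest id down, so when the operation fails at some id, the servers already processed are the ones with higher ids, and the cleanup has to walk upwards from the failing id.

```python
def for_each_ioreq_server(servers):
    """Yield (id, server) from the highest id down, skipping empty slots."""
    for sid in range(len(servers) - 1, -1, -1):
        if servers[sid] is not None:
            yield sid, servers[sid]


def add_vcpu_all(servers, fails_at):
    """Process every server; on failure return the ids that need cleanup."""
    done = []
    for sid, _ in for_each_ioreq_server(servers):
        if sid == fails_at:
            # Cleanup runs forwards: ids above the failing one, ascending.
            cleanup = [i for i in range(sid + 1, len(servers))
                       if servers[i] is not None]
            return False, done, cleanup
        done.append(sid)
    return True, done, []


ok, done, cleanup = add_vcpu_all(["a", None, "c", "d"], fails_at=0)
print(ok, done, cleanup)
```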
Andrew Cooper [Tue, 10 Sep 2019 14:04:55 +0000 (15:04 +0100)]
tools/ocaml: Fix build error with CentOS 7
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28) complains:
xenctrl_stubs.c: In function 'stub_xc_domain_create':
xenctrl_stubs.c:216:28: error: 'val' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
cfg.arch.emulation_flags = ocaml_list_to_c_bitmap
^
xenctrl_stubs.c:198:12: error: 'val' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
cfg.flags = ocaml_list_to_c_bitmap
^
cc1: all warnings being treated as errors
GCC doesn't point at the correct piece of code, but the diagnostic text is
correct, and can occur when the list is empty. Initialise val to 0.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Andrew Cooper [Tue, 10 Sep 2019 11:17:30 +0000 (12:17 +0100)]
tools/ocaml: abi: Use formal conversion and check in more places
Now we have a caller for ocaml_list_to_c_bitmap.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Tue, 10 Sep 2019 11:27:45 +0000 (12:27 +0100)]
tools/ocaml: abi-check: Check properly.
Fix a broken regexp which would mention `$/' when it ought to have
mentioned `$'. The result would be that it would match lines like
type some_ocaml_type = Thing | Other_Thing
but ignore everything but the type name, giving wrong answers.
Check that we check mentioned types. Otherwise if we fail to spot
some suitable thing in the ocaml, we would just omit checking this
type !
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Andrew Cooper [Tue, 10 Sep 2019 11:14:51 +0000 (12:14 +0100)]
tools/ocaml: Reformat domain_create_flag
This will allow us to apply the abi checker soon.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Tue, 10 Sep 2019 11:25:26 +0000 (12:25 +0100)]
tools/ocaml: abi-check: Cope with multiple conversions of same type
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Tue, 10 Sep 2019 11:34:38 +0000 (12:34 +0100)]
tools/ocaml: abi-check: Improve output and error messages
In the generated C, add some comments saying where we found the ocaml
type. This helps with debugging. (I considered emitting #line
directives but decided this would be more confusing than helpful.)
Improve two dies.
Use better-named filehandles (perl prints their names when it dies).
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Andrew Cooper [Tue, 10 Sep 2019 11:18:45 +0000 (12:18 +0100)]
tools/ocaml: abi handling: Provide ocaml->C conversion/check
No users of this yet so no overall change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Tue, 10 Sep 2019 11:12:44 +0000 (12:12 +0100)]
tools/ocaml: abi-check: Add comments
Provide interface documentation for this script.
Explain why we check .ml not .mli.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Mon, 9 Sep 2019 17:12:06 +0000 (18:12 +0100)]
tools/ocaml: Introduce xenctrl ABI build-time checks
c/s f089fddd941 broke the Ocaml ABI by renumbering
XEN_SYSCTL_PHYSCAP_directio without adjusting the Ocaml
physinfo_cap_flag enumeration.
Add build machinery which will check the ABI correspondence.
This will result in a compile time failure whenever constants get
renumbered/added without a compatible adjustment to the Ocaml ABI.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
Andrew Cooper [Mon, 9 Sep 2019 17:12:05 +0000 (18:12 +0100)]
tools/ocaml: Add missing CAP_PV
c/s f089fddd941 broke the Ocaml ABI by renumbering XEN_SYSCTL_PHYSCAP_directio
without adjusting the Ocaml physinfo_cap_flag enumeration. Fix this by
inserting CAP_PV between CAP_HVM and CAP_DirectIO.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
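Illustration (not part of the commit): the correspondence being repaired can be checked mechanically. The sketch assumes that the n-th OCaml constructor maps to bit n on the C side, and that the C constants have the values shown. Both are assumptions for this example, and check_abi() is a hypothetical helper, not the build-time checker itself.

```python
# Assumed C-side bit values, for illustration only.
C_PHYSCAP = {"hvm": 1 << 0, "pv": 1 << 1, "directio": 1 << 2}


def check_abi(ocaml_constructors, c_values):
    """Return the constructors whose position does not match the C bit."""
    mismatched = []
    for index, name in enumerate(ocaml_constructors):
        c_name = name[len("CAP_"):].lower()
        if c_values.get(c_name) != 1 << index:
            mismatched.append(name)
    return mismatched


print(check_abi(["CAP_HVM", "CAP_DirectIO"], C_PHYSCAP))
print(check_abi(["CAP_HVM", "CAP_PV", "CAP_DirectIO"], C_PHYSCAP))
```

With CAP_PV missing, CAP_DirectIO lands on the wrong bit; inserting it restores the match.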
Andrew Cooper [Mon, 9 Sep 2019 10:35:03 +0000 (11:35 +0100)]
x86/boot: Improve code generation from bootsym()
The code generation for bootsym() is atrocious, and unnecessarily complicated.
Given the appropriate physical address, all we need is to construct a virtual
address of the appropriate type.
Andrew Cooper [Fri, 6 Sep 2019 15:59:02 +0000 (16:59 +0100)]
x86/cpuid: Fix handling of the CPUID.7[0].eax levelling MSR
7a0 is an integer field, not a mask - taking the bitwise AND of the hardware
and policy values results in nonsense. Instead, take the policy value
directly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@cirtrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
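Illustration (not part of the commit): a short worked example of why ANDing an integer field goes wrong. The function names and the sample values are made up for this sketch.

```python
def levelled_7a0_old(hw, policy):
    """Old behaviour: treat the integer field as if it were a bit mask."""
    return hw & policy


def levelled_7a0_new(hw, policy):
    """Fixed behaviour: use the policy value directly."""
    return policy


hw, policy = 2, 1  # sample values: hardware reports 2, policy asks for 1
print(levelled_7a0_old(hw, policy))
print(levelled_7a0_new(hw, policy))
```

Here 2 & 1 is 0, which matches neither the hardware value nor the requested one.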
As a preparation for per-cpu buffers do a little refactoring of the
debugtrace data: put the needed buffer admin data into the buffer as
it will be needed for each buffer. In order not to limit buffer size
switch the related fields from unsigned int to unsigned long, as on
huge machines with RAM in the TB range it might be interesting to
support buffers >4GB.
While at it switch debugtrace_send_to_console and debugtrace_used to
bool and delete an empty line.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
After dumping the debugtrace buffer it is cleared. This results in some
entries not being printed in case the buffer is dumped again before
having wrapped.
While at it remove the trailing zero byte in the buffer as it is no
longer needed. Commit b5e6e1ee8da59f introduced passing the number of
chars to be printed in the related interfaces, so the trailing 0 byte
is no longer required.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Current physcaps in XEN_SYSCTL_physinfo are only used by x86, albeit
the capabilities themselves are not x86 specific.
This patch adds support for also reporting the current capabilities on
Arm hardware. Note that on Arm PHYSCAP_hvm is always reported, and
setting PHYSCAP_directio has been moved to common code since the same
logic to set it is used by x86 and Arm.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
xen/arm32: head: Don't setup the fixmap on secondary CPUs
setup_fixmap() will set up the fixmap in the boot page tables in order to
use earlyprintk and also update the register r11 holding the address to
the UART.
However, secondary CPUs are not using earlyprintk between turning the
MMU on and switching to the runtime page table. So setting up the
fixmap in the boot page tables is pointless.
This means most of setup_fixmap() is not necessary for the secondary
CPUs. The update of UART address is now moved out of setup_fixmap() and
duplicated in the boot CPU and secondary CPUs boot paths. Additionally, the
call to setup_fixmap() is removed from secondary CPUs boot.
Lastly, take the opportunity to replace load from literal pool with the
new macro mov_w.
xen/arm32: head: Move assembly switch to the runtime PT in secondary CPUs path
The assembly switch to the runtime PT is only necessary for the
secondary CPUs. So move the code in the secondary CPUs path.
This is definitely not compliant with the Arm Arm, as we are
switching between two different sets of page-tables without turning off
the MMU. However, turning off the MMU is impossible here as the ID map may
clash with other mappings in the runtime page-tables. This will require more
rework to avoid the problem. So for now add a TODO in the code.
Finally, the code currently assumes that r5 will be properly set to 0
beforehand. This is done by create_page_tables() which is called quite
early in the boot process. There is a risk this may be overlooked in the
future, thereby breaking secondary CPUs boot. Instead, set r5 to 0
just before using it.
Document the behavior and the main registers usage within the function.
Note that r6 is now only used within the function, so it does not need
to be part of the common register.
xen/arm32: head: Rework and document check_cpu_mode()
A branch in the success case can be avoided by inverting the branch
condition. At the same time, remove a pointless comment as Xen can only
run at Hypervisor Mode.
Lastly, document the behavior and the main registers usage within the
function.
Julien Grall [Wed, 26 Jun 2019 12:46:56 +0000 (13:46 +0100)]
xen/arm32: head: Introduce distinct paths for the boot CPU and secondary CPUs
The boot code is currently quite difficult to go through because of the
lack of documentation and a number of indirections to avoid executing
some paths in either the boot CPU or secondary CPUs.
In an attempt to make the boot code easier to follow, each part of the
boot is now in a separate function. Furthermore, the paths for the boot
CPU and secondary CPUs are now distinct and for now will call each
function.
Follow-ups will remove unnecessary calls and do further improvement
(such as adding documentation and reshuffling).
Note that the switch from using the ID mapping to the runtime mapping
is duplicated for each path. This is because in the future we will need
to stay longer in the ID mapping for the boot CPU.
Lastly, it is now required to save lr in cpu_init() because the
function will call other functions and therefore clobber lr.
xen/arm32: head: Rework UART initialization on boot CPU
Anything executed after the label common_start can be executed on all
CPUs. However most of the instructions executed between the label
common_start and init_uart are not executed on the boot CPU.
The only instructions executed are to look up the CPUID so it can be
printed on the console (if earlyprintk is enabled). Printing the CPUID
is not particularly useful for the boot CPU and requires a
conditional branch to bypass unused instructions.
Furthermore, the function init_uart is only called for boot CPU
requiring another conditional branch. This makes the code a bit tricky
to follow.
The UART initialization is now moved before the label common_start. This
now requires a slightly altered print for the boot CPU and setting
the early UART base address in each of the two paths (boot CPU and
secondary CPUs).
This has the nice effect of removing a couple of conditional branches in
the code.
After this rework, the CPUID is only used at the very beginning of the
secondary CPUs boot path. So there is no need to "reserve" x24 for the
CPUID.
Lastly, take the opportunity to replace load from literal pool with the
new macro mov_w.
xen/arm32: head: Don't clobber r14/lr in the macro PRINT
The current implementation of the macro PRINT will clobber r14/lr. This
means the user should save r14 if it cares about it.
Follow-up patches will introduce more uses of PRINT in places where lr
should be preserved. Rather than requiring all users to preserve lr,
the macro PRINT is modified to save and restore it.
While the comment states r3 will be clobbered, this is not the case. So
PRINT will use r3 to preserve lr.
Lastly, take the opportunity to move the comment on top of PRINT and use
PRINT in init_uart. Both changes will be helpful in a follow-up patch.
Julien Grall [Mon, 17 Jun 2019 13:51:21 +0000 (14:51 +0100)]
xen/arm64: head: Introduce a macro to get a PC-relative address of a symbol
Arm64 provides instructions to load a PC-relative address, but with some
limitations:
- adr is able to cope with +/-1MB
- adrp is able to cope with +/-4GB but relative to a 4KB page
address
Because of that, the code requires the use of 2 instructions to load any Xen
symbol. To make the code more obvious, a new macro adr_l is
introduced.
The new macro is used to replace a couple of open-coded uses in
efi_xen_start.
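The two-instruction sequence can be modelled arithmetically. The sketch below is in C rather than assembly and only mirrors the address computation: an adrp-style step produces the 4KB page address of the symbol, and a second step adds the low 12 bits. The function names are hypothetical, and range checking (+/-4GB) is deliberately left out.

```c
#include <stdint.h>

/* Model of adrp: page address of 'sym', computed relative to the page of 'pc'. */
static inline uint64_t ex_adrp(uint64_t pc, uint64_t sym)
{
    uint64_t page_delta = (sym >> 12) - (pc >> 12); /* wraps for negative deltas */

    return (pc & ~UINT64_C(0xfff)) + (page_delta << 12);
}

/* Model of the two-step load: page address plus the low 12 bits of the symbol. */
static inline uint64_t ex_adr_l(uint64_t pc, uint64_t sym)
{
    return ex_adrp(pc, sym) + (sym & UINT64_C(0xfff));
}
```

The first step alone only recovers the symbol's page; the second step is what makes the result exact regardless of where in its page the symbol lives.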
Julien Grall [Tue, 6 Aug 2019 17:14:08 +0000 (18:14 +0100)]
xen/arm: lpae: Allow more LPAE helpers to be used in assembly
A follow-up patch will require using the *_table_offset() and *_MASK helpers
from assembly. This can be achieved by using the _AT() macro to remove the type
when called from assembly.
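A minimal sketch of the _AT() idiom follows. The _AT() definition is the usual form of the idiom; every EX_* constant and ex_third_table_offset() is a hypothetical stand-in for illustration, not a definition from lpae.h.

```c
#include <stdint.h>

/*
 * When included from assembly, drop the cast (the assembler has no types);
 * when included from C, keep it.
 */
#ifdef __ASSEMBLY__
#define _AT(T, X) X
#else
#define _AT(T, X) ((T)(X))
#endif

/* Hypothetical stand-ins: 4KB pages, 512 entries per table. */
#define EX_THIRD_SHIFT  12
#define EX_ENTRY_MASK   (_AT(uint64_t, 0x1ff))

/* Index of the third-level entry covering 'va'. */
#define ex_third_table_offset(va) (((va) >> EX_THIRD_SHIFT) & EX_ENTRY_MASK)
```

In C, ex_third_table_offset(0x203000) evaluates to 3 with the mask typed as uint64_t. With __ASSEMBLY__ defined, the same macro expands to a plain constant expression without any cast.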
Andrew Cooper [Mon, 26 Nov 2018 17:06:23 +0000 (17:06 +0000)]
x86/cpuid: Extend the cpuid= option to support all named features
For gen-cpuid.py, fix a comment describing self.names, and generate the
reverse mapping in self.values. Write out INIT_FEATURE_NAMES which maps a
string name to a bit position.
For parse_cpuid(), use cmdline_strcmp() and perform a binary search over
INIT_FEATURE_NAMES. A tweak to cmdline_strcmp() is needed to break at equals
signs as well.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
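The lookup described above can be sketched as follows. This is an illustration only: ex_features stands in for the generated INIT_FEATURE_NAMES table, ex_opt_cmp() stands in for cmdline_strcmp(), and the bit positions are made up. The one property carried over from the commit message is that the comparison treats '=' (as well as ',') as the end of the command-line fragment.

```c
#include <stddef.h>

struct ex_feature_name {
    const char *name;
    unsigned int bit;
};

/* Must be sorted by name for the binary search; bit numbers are made up. */
static const struct ex_feature_name ex_features[] = {
    { "avx",   10 },
    { "avx2",  11 },
    { "pcid",  12 },
    { "smep",  13 },
    { "xsave", 14 },
};

/* Compare a fragment ending at NUL, ',' or '=' against a NUL-terminated name. */
static int ex_opt_cmp(const char *frag, const char *name)
{
    for ( ; ; frag++, name++ )
    {
        unsigned char f = (unsigned char)*frag, n = (unsigned char)*name;

        if ( f == ',' || f == '=' )
            f = '\0';
        if ( f != n )
            return (int)f - (int)n;
        if ( f == '\0' )
            return 0;
    }
}

/* Returns the bit position for the named feature, or -1 if unknown. */
static int ex_lookup_feature(const char *frag)
{
    size_t lo = 0, hi = sizeof(ex_features) / sizeof(ex_features[0]);

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;
        int c = ex_opt_cmp(frag, ex_features[mid].name);

        if ( c == 0 )
            return (int)ex_features[mid].bit;
        if ( c < 0 )
            hi = mid;
        else
            lo = mid + 1;
    }

    return -1;
}
```

Because the comparison stops at the equals sign, a fragment such as "avx=1" matches "avx" rather than being ordered after "avx2".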
Bandan Das [Fri, 6 Sep 2019 15:07:55 +0000 (17:07 +0200)]
x86/apic: do not initialize LDR and DFR for bigsmp
Legacy apic init uses bigsmp for smp systems with 8 and more CPUs. The
bigsmp APIC implementation uses physical destination mode, but it
nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
multiple bits being set.
This does not cause a functional problem because LDR and DFR are ignored
when physical destination mode is active, but it triggered a problem on a
32-bit KVM guest which jumps into a kdump kernel.
The multiple bits set unearthed a bug in the KVM APIC implementation. The
code which creates the logical destination map for VCPUs ignores the
disabled state of the APIC and ends up overwriting an existing valid entry
and as a result, APIC calibration hangs in the guest during kdump
initialization.
Remove the bogus LDR/DFR initialization.
This is not intended to work around the KVM APIC bug. The LDR/DFR
initialization is wrong on its own.
Bandan Das [Fri, 6 Sep 2019 15:07:14 +0000 (17:07 +0200)]
x86/apic: include the LDR when clearing out APIC registers
Although APIC initialization will typically clear out the LDR before
setting it, the APIC cleanup code should reset the LDR.
This was discovered with a 32-bit KVM guest jumping into a kdump
kernel. The stale bits in the LDR triggered a bug in the KVM APIC
implementation which caused the destination mapping for VCPUs to be
corrupted.
Note that this isn't intended to paper over the KVM APIC bug. The kernel
has to clear the LDR when resetting the APIC registers except when X2APIC
is enabled.
Signed-off-by: Bandan Das <bsd@redhat.com>
[Linux commit 558682b5291937a70748d36fd9ba757fb25b99ae] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[Linux commit 04b1d5d098491244f506c4265cc95b87210eef2f] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
LLVM code generation can attempt to load from a variable in the next
condition of an expression under certain circumstances, thus
attempting to load use_xsave regardless of the value of the bsp
variable, which leads to a page fault when the init section has
already been unmapped.
Fix this by making use_xsave non-init, thus preventing the page fault;
use __read_mostly instead. The LLVM bug with the discussion about this
issue can be found at:
https://bugs.llvm.org/show_bug.cgi?id=39707
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>