Report whether shadow paging is supported by the hypervisor, since it
can be disabled at build time.
Reuse and tweak LIBXL_HAVE_PHYSINFO_CAP_HAP as it hasn't appeared in a
released version of Xen yet.
Requested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Current libxl code will always enable Hardware Assisted Paging (HAP),
expecting that the hypervisor will fall back to shadow if HAP is not
available. With the changes to DOMCTL_createdomain that's not the case
any longer, and the hypervisor will raise an error if HAP is not
available instead of silently falling back to shadow.
In order to keep the previous functionality, report whether HAP is
available or not in XEN_SYSCTL_physinfo, so that the toolstack can
select a sane default if there's no explicit user selection of whether
HAP should be used.
Note that on ARM hardware HAP capability is always reported since it's
a required feature in order to run Xen.
Fixes: d0c0ba7d3de ('x86/hvm/domain: remove the 'hap_enabled' flag') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
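For illustration only (this is not the libxl code from the commit), the default selection described above could be sketched as follows. The capability bit values, the tri-state enum and the function name want_hap() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; the real ones are defined by the hypervisor. */
#define CAP_HAP    (1u << 0)
#define CAP_SHADOW (1u << 1)

/* Tri-state user setting: CHOICE_UNSET means "let the toolstack decide". */
enum user_choice { CHOICE_UNSET, CHOICE_OFF, CHOICE_ON };

/*
 * Decide whether a new domain should request HAP.
 * An explicit user choice always wins; otherwise HAP is requested only
 * when the reported capabilities say it is available.
 */
bool want_hap(enum user_choice choice, uint32_t caps)
{
    if (choice != CHOICE_UNSET)
        return choice == CHOICE_ON;

    return (caps & CAP_HAP) != 0;
}
```

With no user selection and only CAP_SHADOW reported, want_hap() returns false, so domain creation would not request a mode the hypervisor then rejects.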
Jan Beulich [Wed, 11 Sep 2019 12:54:34 +0000 (14:54 +0200)]
x86/shadow: fold p2m page accounting into sh_min_allocation()
This is to make the function live up to the promise its name makes. And
it simplifies all callers.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Ian Jackson [Tue, 10 Sep 2019 15:16:51 +0000 (16:16 +0100)]
tools/ocaml: abi check: #include on x86 only. Spotted by Gitlab CI
Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 10 Sep 2019 14:35:09 +0000 (16:35 +0200)]
x86emul: fix test harness and fuzzer build dependencies
Commit fd35f32b4b ("tools/x86emul: Use struct cpuid_policy in the
userspace test harnesses") didn't account for the dependencies of
cpuid-autogen.h to potentially change between incremental builds. In
particular the harness has a "run" goal which is supposed to be usable
independently of the rest of the tools sub-tree building, and both the
harness and the fuzzer code are also supposed to be buildable
independently. Therefore a re-build of the generated header needs to be
triggered first, which is achieved by introducing a new top-level target
pattern (for just the "run" part for now).
Further cpuid.o did not have any dependencies added for it.
Finally, while at it, add a "run" target to the cpu-policy test harness.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 10 Sep 2019 14:34:21 +0000 (16:34 +0200)]
x86/IRQ: make 'i' debug output more tabular again
Since the affinity values are no longer of uniform width, move them
further to the right such that as much of the output as possible comes
out aligned with one another.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
The loop in FOR_EACH_IOREQ_SERVER is backwards hence the cleanup on
failure needs to be done forwards.
Fixes: 97a5a3e30161 ('x86/hvm/ioreq: maintain an array of ioreq servers rather than a list') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
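A minimal sketch of why the cleanup direction matters, assuming a hypothetical server array and setup routine (none of these names come from the Xen source). Because the setup loop walks from the highest index down, the entries already set up when a failure occurs are the ones with a higher index, so the cleanup has to walk upwards from the failing index.

```c
#include <stdbool.h>

#define MAX_SERVERS 8

/* Hypothetical per-server state; 'fail_setup' lets a caller force an error. */
struct server {
    bool present;
    bool ready;
    bool fail_setup;
};

/*
 * Set up every present server, iterating from the last slot to the first.
 * On failure, undo the servers that were already set up: those sit at
 * indices above the failing one, hence the forward cleanup loop.
 * Returns 0 on success, -1 on failure.
 */
int setup_all(struct server s[MAX_SERVERS])
{
    int id;

    for (id = MAX_SERVERS - 1; id >= 0; id--) {
        if (!s[id].present)
            continue;
        if (s[id].fail_setup)
            goto fail;
        s[id].ready = true;
    }
    return 0;

fail:
    /* Start just above the failing slot and walk forwards. */
    for (id++; id < MAX_SERVERS; id++)
        if (s[id].present)
            s[id].ready = false;
    return -1;
}
```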
Andrew Cooper [Tue, 10 Sep 2019 14:04:55 +0000 (15:04 +0100)]
tools/ocaml: Fix build error with CentOS 7
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-28) complains:
xenctrl_stubs.c: In function 'stub_xc_domain_create':
xenctrl_stubs.c:216:28: error: 'val' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
cfg.arch.emulation_flags = ocaml_list_to_c_bitmap
^
xenctrl_stubs.c:198:12: error: 'val' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
cfg.flags = ocaml_list_to_c_bitmap
^
cc1: all warnings being treated as errors
GCC doesn't point at the correct piece of code, but the diagnostic text is
correct, and can occur when the list is empty. Initialise val to 0.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
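The empty-list case can be shown with a small stand-in. This is a sketch only: the linked-list type and the function list_to_bitmap() are hypothetical and do not use the OCaml runtime API that the real stub works with.

```c
#include <stddef.h>

/* Hypothetical stand-in for an OCaml list of constructor indices. */
struct node {
    unsigned int index;
    const struct node *next;
};

/* Fold a list of bit indices into a bitmap. */
unsigned int list_to_bitmap(const struct node *l)
{
    /* Without the initialiser, an empty list would return an indeterminate value. */
    unsigned int val = 0;

    for (; l != NULL; l = l->next)
        val |= 1u << l->index;

    return val;
}
```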
Andrew Cooper [Tue, 10 Sep 2019 11:17:30 +0000 (12:17 +0100)]
tools/ocaml: abi: Use formal conversion and check in more places
Now we have a caller for ocaml_list_to_c_bitmap.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Tue, 10 Sep 2019 11:27:45 +0000 (12:27 +0100)]
tools/ocaml: abi-check: Check properly.
Fix a broken regexp which would mention `$/' when it ought to have
mentioned `$'. The result would be that it would match lines like
type some_ocaml_type = Thing | Other_Thing
but ignore everything but the type name, giving wrong answers.
Check that we check mentioned types. Otherwise if we fail to spot
some suitable thing in the ocaml, we would just omit checking this
type!
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Andrew Cooper [Tue, 10 Sep 2019 11:14:51 +0000 (12:14 +0100)]
tools/ocaml: Reformat domain_create_flag
This will allow us to apply the abi checker soon.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Tue, 10 Sep 2019 11:25:26 +0000 (12:25 +0100)]
tools/ocaml: abi-check: Cope with multiple conversions of same type
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Tue, 10 Sep 2019 11:34:38 +0000 (12:34 +0100)]
tools/ocaml: abi-check: Improve output and error messages
In the generated C, add some comments saying where we found the ocaml
type. This helps with debugging. (I considered emitting #line
directives but decided this would be more confusing than helpful.)
Improve two dies.
Use better-named filehandles (perl prints their names when it dies).
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Andrew Cooper [Tue, 10 Sep 2019 11:18:45 +0000 (12:18 +0100)]
tools/ocaml: abi handling: Provide ocaml->C conversion/check
No users of this yet so no overall change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Tue, 10 Sep 2019 11:12:44 +0000 (12:12 +0100)]
tools/ocaml: abi-check: Add comments
Provide interface documentation for this script.
Explain why we check .ml not .mli.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Ian Jackson [Mon, 9 Sep 2019 17:12:06 +0000 (18:12 +0100)]
tools/ocaml: Introduce xenctrl ABI build-time checks
c/s f089fddd941 broke the Ocaml ABI by renumbering
XEN_SYSCTL_PHYSCAP_directio without adjusting the Ocaml
physinfo_cap_flag enumeration.
Add build machinery which will check the ABI correspondence.
This will result in a compile time failure whenever constants get
renumbered/added without a compatible adjustment to the Ocaml ABI.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <Andrew.Cooper3@citrix.com>
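The idea of a build-time correspondence check can be sketched in plain C as below. This is an assumption-laden illustration, not the generated code from the commit: the constant names, their values and the helper ocaml_cap_to_c() are hypothetical, and C11 _Static_assert stands in for whatever build-time assertion the real machinery emits.

```c
/* Hypothetical C-side bit positions, standing in for the hypervisor's constants. */
#define CAP_BIT_HVM      0
#define CAP_BIT_PV       1
#define CAP_BIT_DIRECTIO 2

/* Hypothetical mirror of the OCaml variant: constructor N must map to bit N. */
enum ocaml_cap {
    OCAML_CAP_HVM,
    OCAML_CAP_PV,
    OCAML_CAP_DIRECTIO
};

/* The build fails if one side is renumbered without the other. */
_Static_assert(OCAML_CAP_HVM == CAP_BIT_HVM, "CAP_HVM out of sync");
_Static_assert(OCAML_CAP_PV == CAP_BIT_PV, "CAP_PV out of sync");
_Static_assert(OCAML_CAP_DIRECTIO == CAP_BIT_DIRECTIO, "CAP_DirectIO out of sync");

/* Convert a constructor index into the corresponding C bitmask value. */
unsigned int ocaml_cap_to_c(enum ocaml_cap c)
{
    return 1u << c;
}
```

Inserting a new constant in the middle of the C list without touching the enum would shift the later values and trip one of the assertions at compile time.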
Andrew Cooper [Mon, 9 Sep 2019 17:12:05 +0000 (18:12 +0100)]
tools/ocaml: Add missing CAP_PV
c/s f089fddd941 broke the Ocaml ABI by renumbering XEN_SYSCTL_PHYSCAP_directio
without adjusting the Ocaml physinfo_cap_flag enumeration. Fix this by
inserting CAP_PV between CAP_HVM and CAP_DirectIO.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Ian Jackson <ian.jackson@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Andrew Cooper [Mon, 9 Sep 2019 10:35:03 +0000 (11:35 +0100)]
x86/boot: Improve code generation from bootsym()
The code generation for bootsym() is atrocious, and unnecessarily complicated.
Given the appropriate physical address, all we need is to construct a virtual
address of the appropriate type.
Andrew Cooper [Fri, 6 Sep 2019 15:59:02 +0000 (16:59 +0100)]
x86/cpuid: Fix handling of the CPUID.7[0].eax levelling MSR
7a0 is an integer field, not a mask - taking the bitwise AND of the hardware
and policy values results in nonsense. Instead, take the policy value
directly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@cirtrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
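A small worked example of the problem, with hypothetical function names. CPUID.7[0].eax holds a count (the maximum subleaf), so combining two such counts bit by bit does not yield a meaningful count.

```c
#include <stdint.h>

/* Treats the maximum-subleaf count as if it were a feature mask. */
uint32_t level_by_and(uint32_t hw_max_subleaf, uint32_t policy_max_subleaf)
{
    return hw_max_subleaf & policy_max_subleaf;
}

/* Approach described in the commit message: use the policy value directly. */
uint32_t level_by_policy(uint32_t hw_max_subleaf, uint32_t policy_max_subleaf)
{
    (void)hw_max_subleaf;
    return policy_max_subleaf;
}
```

With a hardware value of 2 and a policy value of 1, the AND variant yields 0 (2 & 1), hiding subleaf 1 even though both hardware and policy allow it, whereas the policy variant yields 1.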
As a preparation for per-cpu buffers do a little refactoring of the
debugtrace data: put the needed buffer admin data into the buffer as
it will be needed for each buffer. In order not to limit buffer size
switch the related fields from unsigned int to unsigned long, as on
huge machines with RAM in the TB range it might be interesting to
support buffers >4GB.
While at it switch debugtrace_send_to_console and debugtrace_used to
bool and delete an empty line.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
After dumping the debugtrace buffer it is cleared. This results in some
entries not being printed in case the buffer is dumped again before
having wrapped.
While at it remove the trailing zero byte in the buffer as it is no
longer needed. Commit b5e6e1ee8da59f introduced passing the number of
chars to be printed in the related interfaces, so the trailing 0 byte
is no longer required.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Current physcaps in XEN_SYSCTL_physinfo are only used by x86, albeit
the capabilities themselves are not x86 specific.
This patch adds support for also reporting the current capabilities on
Arm hardware. Note that on Arm PHYSCAP_hvm is always reported, and
setting PHYSCAP_directio has been moved to common code since the same
logic to set it is used by x86 and Arm.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
xen/arm32: head: Don't setup the fixmap on secondary CPUs
setup_fixmap() will set up the fixmap in the boot page tables in order to
use earlyprintk and also update the register r11 holding the address of
the UART.
However, secondary CPUs are not using earlyprintk between turning the
MMU on and switching to the runtime page table. So setting up the
fixmap in the boot page tables is pointless.
This means most of setup_fixmap() is not necessary for the secondary
CPUs. The update of the UART address is now moved out of setup_fixmap() and
duplicated in the boot CPU and secondary CPUs boot paths. Additionally, the
call to setup_fixmap() is removed from the secondary CPUs boot path.
Lastly, take the opportunity to replace load from literal pool with the
new macro mov_w.
xen/arm32: head: Move assembly switch to the runtime PT in secondary CPUs path
The assembly switch to the runtime PT is only necessary for the
secondary CPUs. So move the code into the secondary CPUs path.
This is definitely not compliant with the Arm Arm, as we are
switching between two different sets of page-tables without turning off
the MMU. Turning off the MMU is impossible here as the ID map may clash
with other mappings in the runtime page-tables. This will require more
rework to avoid the problem. So for now add a TODO in the code.
Finally, the code currently assumes that r5 will be properly set to 0
beforehand. This is done by create_page_tables(), which is called quite
early in the boot process. There is a risk this may be overlooked in the
future, thereby breaking secondary CPUs boot. Instead, set r5 to 0
just before using it.
Document the behavior and the main registers usage within the function.
Note that r6 is now only used within the function, so it does not need
to be part of the common register.
xen/arm32: head: Rework and document check_cpu_mode()
A branch in the success case can be avoided by inverting the branch
condition. At the same time, remove a pointless comment as Xen can only
run in Hypervisor mode.
Lastly, document the behavior and the main registers usage within the
function.
Julien Grall [Wed, 26 Jun 2019 12:46:56 +0000 (13:46 +0100)]
xen/arm32: head: Introduce distinct paths for the boot CPU and secondary CPUs
The boot code is currently quite difficult to go through because of the
lack of documentation and a number of indirections to avoid executing
some paths in either the boot CPU or secondary CPUs.
In an attempt to make the boot code easier to follow, each part of the
boot is now in a separate function. Furthermore, the paths for the boot
CPU and secondary CPUs are now distinct and for now will call each
function.
Follow-ups will remove unnecessary calls and make further improvements
(such as adding documentation and reshuffling).
Note that the switch from using the ID mapping to the runtime mapping
is duplicated for each path. This is because in the future we will need
to stay longer in the ID mapping for the boot CPU.
Lastly, it is now required to save lr in cpu_init() because the
function will call other functions and therefore clobber lr.
xen/arm32: head: Rework UART initialization on boot CPU
Anything executed after the label common_start can be executed on all
CPUs. However most of the instructions executed between the label
common_start and init_uart are not executed on the boot CPU.
The only instructions executed are to look up the CPUID so it can be
printed on the console (if earlyprintk is enabled). Printing the CPUID
is not particularly useful for the boot CPU and requires a
conditional branch to bypass unused instructions.
Furthermore, the function init_uart is only called for the boot CPU,
requiring another conditional branch. This makes the code a bit tricky
to follow.
The UART initialization is now moved before the label common_start. This
now requires a slightly altered print for the boot CPU and setting
the early UART base address in each of the two paths (boot CPU and
secondary CPUs).
This has the nice effect of removing a couple of conditional branches in
the code.
After this rework, the CPUID is only used at the very beginning of the
secondary CPUs boot path. So there is no need to "reserve" x24 for the
CPUID.
Lastly, take the opportunity to replace load from literal pool with the
new macro mov_w.
xen/arm32: head: Don't clobber r14/lr in the macro PRINT
The current implementation of the macro PRINT will clobber r14/lr. This
means the user should save r14 if they care about it.
Follow-up patches will introduce more uses of PRINT in places where lr
should be preserved. Rather than requiring all users to preserve lr,
the macro PRINT is modified to save and restore it.
While the comment states r3 will be clobbered, this is not the case. So
PRINT will use r3 to preserve lr.
Lastly, take the opportunity to move the comment on top of PRINT and use
PRINT in init_uart. Both changes will be helpful in a follow-up patch.
Julien Grall [Mon, 17 Jun 2019 13:51:21 +0000 (14:51 +0100)]
xen/arm64: head: Introduce a macro to get a PC-relative address of a symbol
Arm64 provides instructions to load a PC-relative address, but with some
limitations:
- adr is able to cope with +/-1MB
- adrp is able to cope with +/-4GB but relative to a 4KB page
address
Because of that, the code requires 2 instructions to load any Xen
symbol. To make the code more obvious, a new macro adr_l is
introduced.
The new macro is used to replace a couple of open-coded uses in
efi_xen_start.
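The address arithmetic behind the two-instruction sequence can be modelled in C. This is only a model of the computation under the 4KB page granularity stated above, with hypothetical function names; the real macro is written in assembly. The first step yields the 4KB page of the symbol relative to the page of the PC, and the second adds the low 12 bits.

```c
#include <stdint.h>

#define PAGE_OFFSET_MASK 0xfffULL

/* Model of adrp: page of the PC plus the page distance to the symbol. */
uint64_t adrp_result(uint64_t pc, uint64_t sym)
{
    uint64_t pc_page = pc & ~PAGE_OFFSET_MASK;
    int64_t page_delta = (int64_t)((sym & ~PAGE_OFFSET_MASK) - pc_page);

    return pc_page + (uint64_t)page_delta;
}

/* Model of the full sequence: adrp followed by adding the low 12 bits. */
uint64_t adr_l_result(uint64_t pc, uint64_t sym)
{
    return adrp_result(pc, sym) + (sym & PAGE_OFFSET_MASK);
}
```

For a PC of 0x200123 and a symbol at 0x345678, the first step gives 0x345000 and the second step completes it to 0x345678.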
Julien Grall [Tue, 6 Aug 2019 17:14:08 +0000 (18:14 +0100)]
xen/arm: lpae: Allow more LPAE helpers to be used in assembly
A follow-up patch will need to use *_table_offset() and *_MASK helpers
from assembly. This can be achieved by using the _AT() macro to remove the type
when called from assembly.
Andrew Cooper [Mon, 26 Nov 2018 17:06:23 +0000 (17:06 +0000)]
x86/cpuid: Extend the cpuid= option to support all named features
For gen-cpuid.py, fix a comment describing self.names, and generate the
reverse mapping in self.values. Write out INIT_FEATURE_NAMES which maps a
string name to a bit position.
For parse_cpuid(), use cmdline_strcmp() and perform a binary search over
INIT_FEATURE_NAMES. A tweak to cmdline_strcmp() is needed to break at equals
signs as well.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
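The lookup could be sketched as below, using the C library's bsearch(). Everything here is illustrative: the table contents, bit numbers and function names are hypothetical, and the comparison helper only imitates the described behaviour of stopping at an equals sign.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical name-to-bit table; it must stay sorted by name for bsearch(). */
struct feature_name {
    const char *name;
    unsigned int bit;
};

static const struct feature_name feature_names[] = {
    { "bar", 3 },
    { "baz", 7 },
    { "foo", 1 },
};

/* Compare a command-line fragment, ending at '\0', ',' or '=', with a name. */
static int cmp_fragment(const void *key, const void *elem)
{
    const char *frag = key;
    const char *name = ((const struct feature_name *)elem)->name;
    size_t len = strcspn(frag, "=,");
    int res = strncmp(frag, name, len);

    /* On an equal prefix, the fragment sorts first if the name is longer. */
    return res ? res : -(name[len] != '\0');
}

/* Return the bit position for the named feature, or -1 if it is unknown. */
int lookup_feature(const char *frag)
{
    const struct feature_name *f =
        bsearch(frag, feature_names,
                sizeof(feature_names) / sizeof(feature_names[0]),
                sizeof(feature_names[0]), cmp_fragment);

    return f ? (int)f->bit : -1;
}
```

Stopping at '=' lets a fragment such as "baz=0" match the table entry "baz" without first copying the name out of the command line.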
Bandan Das [Fri, 6 Sep 2019 15:07:55 +0000 (17:07 +0200)]
x86/apic: do not initialize LDR and DFR for bigsmp
Legacy apic init uses bigsmp for smp systems with 8 or more CPUs. The
bigsmp APIC implementation uses physical destination mode, but it
nevertheless initializes LDR and DFR. The LDR even ends up incorrectly with
multiple bits being set.
This does not cause a functional problem because LDR and DFR are ignored
when physical destination mode is active, but it triggered a problem on a
32-bit KVM guest which jumps into a kdump kernel.
The multiple bits set unearthed a bug in the KVM APIC implementation. The
code which creates the logical destination map for VCPUs ignores the
disabled state of the APIC and ends up overwriting an existing valid entry
and as a result, APIC calibration hangs in the guest during kdump
initialization.
Remove the bogus LDR/DFR initialization.
This is not intended to work around the KVM APIC bug. The LDR/DFR
initialization is wrong on its own.
Bandan Das [Fri, 6 Sep 2019 15:07:14 +0000 (17:07 +0200)]
x86/apic: include the LDR when clearing out APIC registers
Although APIC initialization will typically clear out the LDR before
setting it, the APIC cleanup code should reset the LDR.
This was discovered with a 32-bit KVM guest jumping into a kdump
kernel. The stale bits in the LDR triggered a bug in the KVM APIC
implementation which caused the destination mapping for VCPUs to be
corrupted.
Note that this isn't intended to paper over the KVM APIC bug. The kernel
has to clear the LDR when resetting the APIC registers except when X2APIC
is enabled.
Signed-off-by: Bandan Das <bsd@redhat.com>
[Linux commit 558682b5291937a70748d36fd9ba757fb25b99ae] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[Linux commit 04b1d5d098491244f506c4265cc95b87210eef2f] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
LLVM code generation can attempt to load from a variable in the next
condition of an expression under certain circumstances, thus
attempting to load use_xsave regardless of the value of the bsp
variable, which leads to a page fault when the init section has
already been unmapped.
Fix this by making use_xsave non-init, thus preventing the page fault;
use __read_mostly instead. The LLVM bug with the discussion about this
issue can be found at:
https://bugs.llvm.org/show_bug.cgi?id=39707
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 27 Dec 2018 15:14:01 +0000 (15:14 +0000)]
x86/AMD: Fix handling of x87 exception pointers on Fam17h hardware
AMD Pre-Fam17h CPUs "optimise" {F,}X{SAVE,RSTOR} by not saving/restoring
FOP/FIP/FDP if an x87 exception isn't pending. This causes an information
leak, CVE-2006-1056, and is worked around by several OSes, including Xen. AMD
Fam17h CPUs no longer have this leak, and advertise so in a CPUID bit.
Introduce the RSTR_FP_ERR_PTRS feature, as specified by AMD, and expose to all
guests by default. While adjusting libxl's cpuid table, add CLZERO which
looks to have been omitted previously.
Also introduce an X86_BUG bit to trigger the (F)XRSTOR workaround, and set it
on AMD hardware where RSTR_FP_ERR_PTRS is not advertised. Optimise the
conditions for the workaround paths.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 27 Nov 2018 15:06:15 +0000 (15:06 +0000)]
x86/vtd: Drop struct iommu_flush
It is unclear why this abstraction exists, but iommu_get_flush() returns
possibly NULL and every user unconditionally dereferences the result. In
practice, I can't spot a path where iommu is NULL, so I think it is mostly
dead.
Move the two function pointers into struct vtd_iommu (using a flush prefix),
and delete iommu_get_flush(). Furthermore, there is no need to pass the IOMMU
pointer to the callbacks via a void pointer, so change the parameter to be
correctly typed as struct vtd_iommu. Clean up bool_t to bool in surrounding
context.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 27 Nov 2018 15:02:18 +0000 (15:02 +0000)]
x86/vtd: Drop struct ir_ctrl
It is unclear why this abstraction exists, but iommu_ir_ctrl() returns
possibly NULL and every user unconditionally dereferences the result. In
practice, I can't spot a path where iommu is NULL, so I think it is mostly
dead.
Move the fields into struct vtd_iommu, and delete iommu_ir_ctrl().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 27 Nov 2018 14:57:14 +0000 (14:57 +0000)]
x86/vtd: Drop struct qi_ctrl
It is unclear why this abstraction exists, but iommu_qi_ctrl() returns
possibly NULL and every user unconditionally dereferences the result. In
practice, I can't spot a path where iommu is NULL, so I think it is mostly
dead.
Move the sole member into struct vtd_iommu, and delete iommu_qi_ctrl().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Sep 2019 08:02:11 +0000 (10:02 +0200)]
VT-d/ATS: tidy device_in_domain()
Use appropriate types. Drop unnecessary casts. Check for failures which
can (at least in theory because of non-obvious breakage elsewhere)
occur, instead of ones which really can't (map_domain_page() won't
return NULL).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Sep 2019 08:00:07 +0000 (10:00 +0200)]
x86/cpu-policy: work around bogus warning in test harness
Despite %.12s properly limiting the number of characters read from
ident[], gcc 9 (at least up to 9.2.0) warns about the strings not
being nul-terminated:
test-cpu-policy.c:64:18: error: '%.12s' directive argument is not a nul-terminated string [-Werror=format-overflow=]
64 | fail(" Test '%.12s', expected vendor %u, got %u\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test-cpu-policy.c:20:12: note: in definition of macro 'fail'
20 | printf(fmt, ##__VA_ARGS__); \
| ^~~
test-cpu-policy.c:64:27: note: format string is defined here
64 | fail(" Test '%.12s', expected vendor %u, got %u\n",
| ^~~~~
test-cpu-policy.c:44:7: note: referenced argument declared here
44 | } tests[] = {
| ^~~~~
The issue was reported against gcc in their bugzilla (bug 91667).
Re-order array entries, oddly enough suppressing the warning.
Reported-by: Christopher Clark <christopher.w.clark@gmail.com> Reported-by: Dario Faggioli <dfaggioli@suse.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
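For reference, the construct the warning complains about is valid C: a precision on %s bounds how many bytes are read, so the array does not need a terminator as long as it holds at least that many bytes. A minimal sketch, with a hypothetical helper name:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format a 12-byte identifier that carries no terminating NUL.
 * The ".12" precision caps the read at 12 bytes.
 * Returns the number of characters written, as snprintf() does.
 */
int format_ident(char *out, size_t out_len, const char ident[12])
{
    return snprintf(out, out_len, "%.12s", ident);
}
```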
p2m/ept: add _subtree suffix to ept_invalidate_emt
So that the name implies the function is used to walk the page table
pointer passed as parameter. Drop the parent_ prefix from the level
parameter, since the level passed is the one matching the EPT entry
passed in the mfn parameter.
While there also change bool_t to bool and add an assert to make sure
no level 0 entries (ie: 4K EPT leaf entries) are passed as parameters.
No functional change intended.
Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Sep 2019 07:58:17 +0000 (09:58 +0200)]
VT-d: avoid PCI device lookup
The two uses of pci_get_pdev_by_domain() lack proper locking, but are
also only used to get hold of a NUMA node ID. Calculate and store the
node ID earlier on and remove the lookups (in lieu of fixing the
locking).
While doing this it became apparent that iommu_alloc()'s use of
alloc_pgtable_maddr() would occur before RHSAs would have been parsed:
iommu_alloc() gets called from the DRHD parsing routine, which - on
spec conforming platforms - happens strictly before RHSA parsing. Defer
the allocation until after all ACPI table parsing has finished,
establishing the node ID there first.
Suggested-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 5 Sep 2019 07:56:42 +0000 (09:56 +0200)]
x86/shadow: don't enable shadow mode with too small a shadow allocation (part 2)
Commit 2634b997af ("x86/shadow: don't enable shadow mode with too small
a shadow allocation") was incomplete: The adjustment done there to
shadow_enable() is also needed in shadow_one_bit_enable(). The (new)
problem report was (apparently) a failed PV guest migration followed by
another migration attempt for that same guest. Disabling log-dirty mode
after the first one had left a couple of shadow pages allocated (perhaps
something that also wants fixing), and hence the second enabling of
log-dirty mode wouldn't have allocated anything further.
Reported-by: James Wang <jnwang@suse.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
x86/altp2m: Add a new hypercall to get the active altp2m index
The patch adds a new libxc function (xc_altp2m_get_vcpu_p2m_idx) that
uses a new hvmop (HVMOP_altp2m_get_p2m_idx) to get the active altp2m
index from a given vcpu.
Signed-off-by: Alexandru Isaila <aisaila@bitdefender.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Mon, 2 Sep 2019 16:16:53 +0000 (17:16 +0100)]
tools/shim: Apply more duct tape to the linkfarm logic
Sander reported a build failure which manifests as `make; make install`
failing with:
<snip>/cross-install -m0644 -p xen-dir/xen-shim //usr/local/lib/xen/boot/xen-shim
install: cannot stat 'xen-dir/xen-shim': No such file or directory
make[4]: *** [Makefile:52: install] Error 1
make[4]: Leaving directory '/usr/src/new/xen-unstable/tools/firmware'
It has subsequently been seen intermittently by OSSTest. This was caused by
c/s 32b1d628 triggering a preexisting linkfarm bug for partial rebuilds.
Between the first `make` and the subsequent `make install`, the linkfarm logic
observes new final build products and regenerates the linkfarm. This includes
a distclean, which throws away everything from the first `make`.
As the xen-shim rule uses a symlink, the link itself remains up-to-date
but is broken due to the distclean, which causes install to fail.
Update the linkfarm logic to not regenerate itself when build artefacts
appear. This isn't a comprehensive fix but is the best which can be done
easily. Any further effort would be better spent making out-of-tree builds
work for Xen.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Today dumping the debugtrace buffers is done via sercon_puts(), while
direct printing of trace entries (after toggling output to the console)
is using serial_puts().
Use sercon_puts() in both cases, as the difference between the two does
not really make sense.
In order to prepare for moving debugtrace functionality to its own source
file, rename sercon_puts() to console_serial_puts() and make it globally
visible.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 3 Sep 2019 12:50:33 +0000 (14:50 +0200)]
x86emul: support INVPCID
Just like for INVLPGA the HVM hook only supports PCID 0 for the time
being for individual address invalidation. It also translates the other
types to a full flush, which is architecturally permitted and
performance-wise presumably not much worse because emulation is slow
anyway.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Sep 2019 12:49:52 +0000 (14:49 +0200)]
x86emul: generalize invlpg() hook
The hook is already in use for INVLPGA as well. Rename the hook and add
parameters. For the moment INVLPGA with a non-zero ASID remains
unsupported, but the TODO item gets pushed into the actual hook handler.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Sep 2019 12:49:20 +0000 (14:49 +0200)]
x86/HVM: ignore guest INVD uses
The only place we'd expect the insn to be sensibly used is in
(virtualization unaware) firmware.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Sep 2019 12:48:19 +0000 (14:48 +0200)]
x86emul: support WBNOINVD
Rev 037 of Intel's ISA extensions document does not state intercept
behavior for the insn (I've been unofficially told that the distinction
is going to be by exit qualification, as I would have assumed
considering that this way it's sufficiently transparent to unaware
software, as using WBINVD in place of WBNOINVD is always correct, just
less efficient). Similarly AMD's PM volume 2 version 3.31 only states
that both use the same VMEXIT, but not how to distinguish them (other
than by decoding the insn). Therefore in the HVM case for now it'll be
backed by the same ->wbinvd_intercept() handlers.
Use this occasion and also add the two missing table entries for
CLDEMOTE, which doesn't require any further changes to make work.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
public: add macro for defining variable length array in public headers
Several public headers of the hypervisor contain structures with
variable length arrays. In order to be usable with different compilers
those definitions are depending on the compiler type and the standard
supported by the compiler.
In order to avoid open coding the different variants in each header
add a common macro for that purpose in xen.h.
This at once corrects most of the definitions which miss one case
leading to not defining the array at all.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
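A sketch of what such a common macro can look like. The macro name FLEX_ARRAY_DIM, the structure and the helper are hypothetical and chosen for illustration; the point is that every compiler case ends up defining the array, including the fallback one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: pick the array dimension per compiler and standard. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define FLEX_ARRAY_DIM          /* C99 flexible array member */
#elif defined(__GNUC__)
#define FLEX_ARRAY_DIM 0        /* GNU zero-length array extension */
#else
#define FLEX_ARRAY_DIM 1        /* fallback: one-element array */
#endif

/* Hypothetical structure with a variable number of trailing entries. */
struct record {
    uint32_t nr_entries;
    uint32_t entries[FLEX_ARRAY_DIM];
};

/* Bytes needed for a record holding n entries, independent of the variant. */
size_t record_size(uint32_t n)
{
    return offsetof(struct record, entries) + (size_t)n * sizeof(uint32_t);
}
```

Sizing through offsetof() rather than sizeof(struct record) keeps the result the same whichever of the three variants the compiler selects.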
Andrew Cooper [Thu, 29 Aug 2019 12:35:33 +0000 (13:35 +0100)]
x86/apci: Adjust command line parsing for "acpi_sleep"
Perform parsing in a custom_param, rather than stashing the content in a
string and parsing in an initcall. Adjust the parsing to conform to current
standards.
No practical change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
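As an illustration only, parsing a comma separated option list directly
in the parameter handler could look like the sketch below. The function
name, the flag values and the return convention are assumptions made for
this example and do not reproduce the Xen code:

```c
#include <string.h>

#define EX_S3_BIOS 1u   /* assumed flag value for "s3_bios" */
#define EX_S3_MODE 2u   /* assumed flag value for "s3_mode" */

/*
 * Walk the comma separated tokens in s and accumulate flags.
 * Returns 0 on success, -1 if any token was not recognised.
 * Recognised tokens still take effect when another token is bad.
 */
static int example_parse_acpi_sleep(const char *s, unsigned int *flags)
{
    int rc = 0;

    for ( ; ; )
    {
        size_t len = strcspn(s, ",");

        if ( len == 7 && !strncmp(s, "s3_bios", len) )
            *flags |= EX_S3_BIOS;
        else if ( len == 7 && !strncmp(s, "s3_mode", len) )
            *flags |= EX_S3_MODE;
        else
            rc = -1;

        if ( s[len] == '\0' )
            break;

        s += len + 1;   /* skip past the comma */
    }

    return rc;
}
```

Parsing at parameter time means no intermediate string buffer has to be
kept around for a later initcall.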
Jan Beulich [Mon, 2 Sep 2019 12:45:57 +0000 (14:45 +0200)]
x86: shrink video_{flags,mode} to {8,16} bits
We really don't need them to be any wider.
Also remove the C level declaration (and hence also the GLOBAL) of
video_mode altogether; it's used in assembly code only.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 2 Sep 2019 12:45:18 +0000 (14:45 +0200)]
x86: a little bit of 16-bit video mode setting code cleanup
To "compensate" for the code size growth by an earlier change:
- drop "trampoline" labels (in almost all cases the target label is
reachable with an 8-bit-displacement branch anyway, and a single 16-
bit-displacement branch is still better than a pair of two branches)
- drop an entirely dead insn from wakeup.S:mode_setw
- reduce code size in a few other (obvious I hope) cases, by more
suitable insn/operands selection
Also drop redundant #define-s (move suitable #include a little earlier
instead) and add two alignment directives.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 2 Sep 2019 12:41:19 +0000 (14:41 +0200)]
x86/ACPI: restore VESA mode upon resume from S3
In order for "acpi_sleep=s3_mode" to have any effect, we should record
the video mode we switched to during boot. Since right now there's mode
setting code for VESA modes only in the resume case, record the mode
just in that one case.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 2 Sep 2019 12:40:15 +0000 (14:40 +0200)]
x86emul: generalize wbinvd() hook
The hook is already in use for other purposes, and emulating e.g.
CLFLUSH by issuing WBINVD is, well, not very nice. Rename the hook and
add parameters. Use lighter weight flushing insns when possible in
hvmemul_cache_op().
hvmemul_cache_op() treating x86emul_invd the same as x86emul_wbinvd is
to retain original behavior, but I'm not sure this is what we want in
the long run.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Paul Durrant <paul.durrant@citrix.com>
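A rough sketch of the idea, assuming a hook that receives the kind of
cache operation and lets the handler choose the cheapest sufficient
flush. All names below are hypothetical and the real hook takes further
parameters:

```c
/* Hypothetical operation kinds passed to the generalized hook. */
enum example_cache_op {
    EX_OP_CLFLUSH,
    EX_OP_INVD,
    EX_OP_WBINVD
};

/* Hypothetical flush strategies a handler may choose from. */
enum example_flush {
    EX_FLUSH_LINE,   /* flush just the affected cache line */
    EX_FLUSH_ALL     /* heavyweight whole-cache flush */
};

/* Pick the lightest flush that is still correct for the given operation. */
static enum example_flush example_cache_op(enum example_cache_op op)
{
    switch ( op )
    {
    case EX_OP_CLFLUSH:
        return EX_FLUSH_LINE;

    case EX_OP_INVD:     /* treated like WBINVD to retain prior behavior */
    case EX_OP_WBINVD:
        return EX_FLUSH_ALL;
    }

    return EX_FLUSH_ALL; /* conservative default for unknown values */
}
```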
Jan Beulich [Mon, 2 Sep 2019 12:38:37 +0000 (14:38 +0200)]
timers: limit heap size
First and foremost make timer_softirq_action() avoid growing the heap
if its new size can't be stored without truncation. 64k entries is a
lot, and I don't think we're at risk of actually running into the issue,
but I also think it's better not to allow for hard to debug problems to
occur in the first place.
Furthermore also adjust the code such that the size/limit fields becoming
unsigned int would at least work from a mere sizing point of view. For
this also switch various uses of plain int to unsigned int.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
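The truncation check can be sketched as follows, assuming 16-bit
size/limit fields and a simple doubling growth policy. The structure,
the function and the growth policy are hypothetical and only illustrate
the guard:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical heap header with 16-bit bookkeeping fields. */
struct example_heap {
    uint16_t size;    /* entries currently in use */
    uint16_t limit;   /* entries the heap can hold */
};

/*
 * Try to grow the heap. Returns false and leaves the heap untouched
 * when the new limit could not be stored in the 16-bit field.
 */
static bool example_heap_grow(struct example_heap *h)
{
    /* Compute in a wider type so the overflow is visible. */
    unsigned int new_limit = h->limit ? h->limit * 2u : 16u;

    if ( new_limit > UINT16_MAX )
        return false;

    h->limit = (uint16_t)new_limit;
    return true;
}
```

Refusing to grow keeps the stored limit consistent with the actual
allocation, instead of silently wrapping to a small value.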
Igor Druzhinin [Fri, 30 Aug 2019 13:23:01 +0000 (15:23 +0200)]
x86/domain: don't destroy IOREQ servers on soft reset
Performing soft reset should not opportunistically kill IOREQ servers
for device emulators that might be currently running for a domain.
Every emulator is supposed to clean up IOREQ servers for itself on exit.
This allows a toolstack to elect whether or not a particular device
model should be restarted.
The original code was introduced in 3235cbfe ("arch-specific hooks for
domain_soft_reset()"), likely because the 'default' IOREQ server that
existed in Xen at the time, and was used by QEMU, had no API call to
destroy it. Since the removal of the 'default' IOREQ server from Xen this
reason has gone away.
Since commit ba7fdd64b ("xen: cleanup IOREQ server on exit") QEMU now
destroys its IOREQ server itself, as every other device emulator
is supposed to do. It's now safe to remove this code from the soft reset
path - existing systems with old QEMU should still work because,
even if there are IOREQ servers left behind, a new QEMU instance will
override their ranges anyway.
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 30 Aug 2019 13:21:54 +0000 (15:21 +0200)]
x86: move INVPCID_TYPE_* to x86-defns.h
This way the insn emulator can then too use the #define-s. In place of
the TYPE infix add an X86 prefix.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 30 Aug 2019 08:24:13 +0000 (10:24 +0200)]
x86/ACPI: re-park previously parked CPUs upon resume from S3
Aiui when resuming from S3, CPUs come back out of RESET/INIT. Therefore
they need to undergo the same procedure as was added elsewhere by
commits d8f974f1a6 ("x86: command line option to avoid use of secondary
hyper-threads") and 8797d20a6e ("x86: possibly bring up all CPUs even
if not all are supposed to be used").
Just like done at boot time, avoid (at least pointlessly) using
stop-machine logic.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 29 Aug 2019 13:10:07 +0000 (15:10 +0200)]
x86: properly gate clearing of PKU feature
setup_clear_cpu_cap() is __init and hence may not be called post-boot.
Note that opt_pku nevertheless is not getting __initdata added - see
e.g. commit 43fa95ae6a ("mm: make opt_bootscrub non-init").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monné [Thu, 29 Aug 2019 13:08:46 +0000 (15:08 +0200)]
partially revert "x86/mm: Clean IOMMU flags from p2m-pt code"
This partially reverts commit 854a49a7486a02edae5b3e53617bace526e9c1b1 by re-adding the logic that
propagates changes to the domain physmap done by p2m_pt_set_entry into
the iommu page tables. Without this logic changes to the guest physmap
are not propagated to the iommu, leaving stale iommu entries that can
leak data, or failing to add new entries.
Note that this commit doesn't re-introduce iommu flags to the cpu page
table entries, since the logic to add/remove entries to the iommu page
tables is based on the p2m type and the mfn.
Fixes: 854a49a7486a02 ('x86/mm: Clean IOMMU flags from p2m-pt code') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Alexandru Isaila <aisaila@bitdefender.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Wed, 1 May 2019 17:14:03 +0000 (18:14 +0100)]
x86/boot: Drop all use of lmsw
lmsw is an obsolete relic of the 286 processor - so much so that it even lacks
intercept assistance on AMD processors.
Use a plain mov to %cr0 which is easier to follow, certainly faster to
virtualise on AMD hardware, and almost certainly a faster microcode path in
real hardware.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 12 Aug 2019 17:40:04 +0000 (18:40 +0100)]
x86/suspend: Simplify system table handling on resume
load_TR() is used exclusively in the resume path, but jumps through a lot of
unnecessary hoops. As suspend/resume is strictly on CPU0 in idle context, the
correct GDT to use is boot_gdt, which means it doesn't need saving on suspend.
Although doing more than strictly necessary, reuse load_system_tables(), which
is already used by APs on the S3 resume path.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 7 Aug 2019 11:53:51 +0000 (12:53 +0100)]
xen: Drop XEN_DOMCTL_{get,set}_machine_address_size
This functionality is obsolete. It was introduced by c/s 41296317a31 into
Xend, but was never exposed in libxl.
Nothing limits this to PV guests, but it makes no sense for HVM guests.
Looking through the XenServer templates, this was used to work around bugs in
the 32bit RHEL/CentOS 4.7 and 4.8 kernels (fixed in 4.9) and RHEL/CentOS/OEL
5.2 and 5.3 kernels (fixed in 5.4). RHEL 4 as a major version went out of
support in 2017, whereas the 5.2/5.3 kernels went out of support when 5.4 was
released in 2009.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Christian Lindig <christian.lindig@citrix.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 7 Aug 2019 11:49:37 +0000 (12:49 +0100)]
xen: Drop XEN_DOMCTL_suppress_spurious_page_faults
This functionality is obsolete. It was introduced by c/s 39407bed9c0 into
Xend, but never exposed in libxl.
While not explicitly limited to PV guests, this is PV-only by virtue of its
position in the pagefault handler.
Looking through the XenServer templates, this was used to work around bugs in
the 32bit RHEL/CentOS 4.{5..7} kernels (fixed in 4.8). RHEL 4 as a major
version went out of support in 2017.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wl@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Wed, 28 Aug 2019 14:58:45 +0000 (16:58 +0200)]
x86/hvm/domain: remove the 'hap_enabled' flag
The hap_enabled() macro can determine whether the feature is available
using the domain 'options'; there is no need for a separate flag.
NOTE: Furthermore, by extending sanitizing of the domain 'options', the
macro can be transformed into an inline function and re-located to
xen/sched.h. This also makes hap_enabled() common, thus allowing
removal of an ugly ifdef CONFIG_X86 from the common iommu code.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
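A minimal sketch of deriving the predicate from the domain 'options',
assuming one option bit for HVM and one for HAP. The bit positions, the
structure and both function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define EX_CDF_HVM (1u << 0)   /* assumed bit: domain is HVM */
#define EX_CDF_HAP (1u << 1)   /* assumed bit: HAP requested */

/* Hypothetical stand-in for the common domain structure. */
struct example_domain {
    uint32_t options;
};

/* Reject option combinations that make no sense: HAP requires HVM. */
static bool example_sanitise_options(uint32_t options)
{
    return !(options & EX_CDF_HAP) || (options & EX_CDF_HVM);
}

/* With options sanitised at creation, no separate flag is needed. */
static inline bool example_hap_enabled(const struct example_domain *d)
{
    return (d->options & EX_CDF_HAP) != 0;
}
```

Because invalid combinations are refused up front, the inline predicate
can trust the option bit and needs no architecture specific state.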
Roger Pau Monné [Wed, 28 Aug 2019 14:57:36 +0000 (16:57 +0200)]
p2m/ept: pass correct level to atomic_write_ept_entry in ept_invalidate_emt
The level passed to ept_invalidate_emt corresponds to the EPT entry
passed as the mfn parameter, which is a pointer to an EPT page table,
hence the entries in that page table will have one level less than the
parent.
Fix the call to atomic_write_ept_entry to pass the correct level, ie:
one level less than the parent.
Fixes: 50fe6e73705 ('pvh dom0: add and remove foreign pages') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Julien Grall [Wed, 14 Aug 2019 09:36:07 +0000 (10:36 +0100)]
xen/arm: traps: Remove all zero padding before PRIregister format
Commit af156ff085 "xen/arm: types: Specify the zero padding in the
definition of PRIregister" moved the zero padding within the definition
of PRIregister.
However, some of the users still had zero padding before it, which
resulted in tens of zeros being printed when dumping the CPU state.
To prevent this, remove the last users of zero padding before
PRIregister.
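To illustrate why doubled padding misbehaves, the sketch below assumes a
format macro that already embeds the zero padding. EX_PRIregister and
the helper are hypothetical stand-ins, not the Xen definitions:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed shape: the padding ("016") lives inside the macro itself. */
#define EX_PRIregister "016" PRIx64

/*
 * Format a register value with the padding supplied by the macro only.
 * A leftover prefix such as "%016" EX_PRIregister would concatenate to
 * "%016016...", which printf reads as the '0' flag with width 16016.
 */
static int example_format_reg(char *buf, size_t len, uint64_t val)
{
    return snprintf(buf, len, "%" EX_PRIregister, val);
}
```

With the padding kept in one place, every user prints a fixed 16 digit
value and cannot accidentally widen the field.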