Xenia Ragiadakou [Thu, 11 Aug 2022 09:47:34 +0000 (11:47 +0200)]
arm/vgic: fix coding style in macro REG_RANK_INDEX()
Add parentheses around the macro parameter 's' to protect against unintended
expansions. This also resolves a MISRA C 2012 Rule 20.7 violation warning.
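A minimal sketch of the pattern (the macro body here is illustrative, not
necessarily the exact vGIC definition):

    /* before: 's' expands unparenthesized; passing e.g. "r & m" would give
       ((n) >> r) & m, since '>>' binds tighter than '&' */
    #define REG_RANK_INDEX(b, n, s) ((((n) >> s) & ((b) - 1)) % 32)
    /* after: the parameter is parenthesized at its use site */
    #define REG_RANK_INDEX(b, n, s) ((((n) >> (s)) & ((b) - 1)) % 32)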
Anthony PERARD [Thu, 11 Aug 2022 09:47:11 +0000 (11:47 +0200)]
tools/libxl: Replace deprecated -sdl option on QEMU command line
"-sdl" is deprecated upstream since 6695e4c0fd9e ("softmmu/vl:
Deprecate the -sdl and -curses option"), QEMU v6.2, and the option is
removed by 707d93d4abc6 ("ui: Remove deprecated options "-sdl" and
"-curses""), in upcoming QEMU v7.1.
Instead, use "-display sdl", available since 1472a95bab1e ("Introduce
-display argument"), before QEMU v1.0.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
Dario Faggioli [Thu, 11 Aug 2022 09:46:22 +0000 (11:46 +0200)]
xen/sched: setup dom0 vCPUs affinity only once
Right now, affinity for dom0 vCPUs is set up in two steps. This is a
problem as, at least in Credit2, unit_insert() sees and uses the
"intermediate" affinity, and places the vCPUs on CPUs where they cannot
be run. This in turn results in boot hangs if the "dom0_nodes"
parameter is used.
Fix this by setting up the affinity properly once and for all, in
sched_init_vcpu() called by create_vcpu().
Note that, unless a soft-affinity is explicitly specified for dom0 (by
using the relaxed mode of "dom0_nodes"), we set it to the default, which
is all CPUs, instead of computing it based on the hard affinity (if any).
This is because hard and soft affinity should be considered as
independent user-controlled properties. In fact, if we do derive dom0's
soft-affinity from its boot-time hard-affinity, such a computed value
will continue to be used even if the user later changes the
hard-affinity. And this could result in the vCPUs behaving differently
than what the user wanted and expected.
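A hedged sketch of the resulting one-shot setup (sched_set_affinity() and
is_hardware_domain() are existing helpers; dom0_relaxed and dom0_cpus are
illustrative names, not the literal patch):

    /* in sched_init_vcpu(): set both affinities before unit_insert() runs */
    if ( is_hardware_domain(d) )
        sched_set_affinity(unit,
                           dom0_relaxed ? &cpumask_all : &dom0_cpus, /* hard */
                           dom0_relaxed ? &dom0_cpus : &cpumask_all); /* soft */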
Fixes: dafd936dddbd ("Make credit2 the default scheduler") Reported-by: Olaf Hering <ohering@suse.de> Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/arm: vreg: Fix MISRA C 2012 Rule 20.7 violation
In VREG_REG_HELPERS(), the macro parameter 'offmask' is used as an
expression, and should therefore be enclosed in parentheses to protect
against unintended expansions.
xen/arm: regs: Fix MISRA C 2012 Rule 20.7 violation
In the macro psr_mode(), the macro parameter 'm' is used as an
expression, and should therefore be enclosed in parentheses to protect
against unintended expansions.
Jason Andryuk [Tue, 19 Jul 2022 20:08:15 +0000 (16:08 -0400)]
x86: Expose more MSR_ARCH_CAPS to hwdom
commit e46474278a0e ("x86/intel: Expose MSR_ARCH_CAPS to dom0") started
exposing MSR_ARCH_CAPS to dom0. More bits in MSR_ARCH_CAPS have since
been defined, but they haven't been exposed. Update the list to allow
them through.
As one example, this allows a Linux Dom0 to know that it has the
appropriate microcode via FB_CLEAR. Notably, with the updated microcode,
dom0's /sys/devices/system/cpu/vulnerabilities/mmio_stale_data changes from:
"Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown"
to:
"Mitigation: Clear CPU buffers; SMT Host state unknown"
This exposes the MMIO Stale Data and Intel Branch History Injection
(BHI) controls as well as the page size change MCE issue bit.
Fixes: commit 2ebe8fe9b7e0 ("x86/spec-ctrl: Enumeration for MMIO Stale Data controls") Fixes: commit cea9ae062295 ("x86/spec-ctrl: Enumeration for new Intel BHI controls") Fixes: commit 59e89cdabc71 ("x86/vtx: Disable executable EPT superpages to work around CVE-2018-12207") Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
In the MASK_DECLARE_ macros, the macro parameter 'x' is used as an
expression, and should therefore be enclosed in parentheses to protect
against unintended expansions. While there, add the missing blanks around
the + operators involved.
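The shape of the change, sketched on one representative definition (the real
file has several MASK_DECLARE_ variants):

    /* before: 'x' unparenthesized, no blanks around '+' */
    #define MASK_DECLARE_2(x) MASK_DECLARE_1(x), MASK_DECLARE_1(x+1)
    /* after */
    #define MASK_DECLARE_2(x) MASK_DECLARE_1(x), MASK_DECLARE_1((x) + 1)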
Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Jane Malalane [Tue, 9 Aug 2022 09:49:43 +0000 (11:49 +0200)]
x86/kexec: Add the '.L_' prefix to is_* and call_* labels
These are local symbols and shouldn't be externally visible.
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jane Malalane <jane.malalane@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
automation: qemu-smoke-arm64: Run ping test over a pv network interface
This patch modifies the test in the following way:
- Dom0 is booted with an Alpine Linux rootfs with the xen tools.
- Once Dom0 is booted, it starts xenstored, calls init-dom0less to set up
the xenstore interface for the dom0less Dom1, sets up the bridged network
and attaches a pv network interface to Dom1.
- In the meantime, Dom1 in its init script tries to assign an IP address
to eth0 and ping Dom0.
- If Dom1 manages to ping Dom0, it prints 'passed'.
Use kernel 5.19 to unblock testing of dom0less "xen,enhanced" support.
This kernel version has the necessary patches for deferring xenbus probe
until xenstore is fully initialized.
Also, build the kernel with bridging and xen netback support enabled because
it will be used for testing network connectivity between Dom0 and Dom1
over a pv network interface.
automation: disable xen,enhanced in qemu-smoke-arm64
Disable xen,enhanced because we don't use PV drivers in this test and
also because the kernel used for testing is old and unpatched and would
break if xen,enhanced is passed.
Edwin Török [Fri, 29 Jul 2022 17:53:25 +0000 (18:53 +0100)]
tools/ocaml/*/Makefile: generate paths.ml from configure
paths.ml contains various paths known to configure, and currently is generated
via a Makefile rule. Simplify this and generate it through configure, similar
to how oxenstored.conf is generated from oxenstored.conf.in.
This will allow the generated file to be reused more easily with Dune.
No functional change.
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Acked-by: Christian Lindig <christian.lindig@citrix.com>
Andrew Cooper [Tue, 2 Aug 2022 13:30:30 +0000 (14:30 +0100)]
x86/spec-ctrl: Use IST RSB protection for !SVM systems
There is a corner case where a VT-x guest which manages to reliably trigger
non-fatal #MC's could evade the rogue RSB speculation protections that were
supposed to be in place.
This is a lack of defence in depth; Xen does not architecturally execute more
RET than CALL instructions, so an attacker would have to locate a different
gadget (e.g. SpectreRSB) first to execute a transient path of excess RET
instructions.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/hypfs: check the return value of snprintf to avoid accidentally leaking stack contents
The function snprintf() returns the number of characters that would have been
written in the buffer if the buffer size had been sufficiently large,
not counting the terminating null character.
Hence, the value returned is not guaranteed to be smaller than the buffer size.
Check the return value of snprintf() to prevent leaking stack contents to the
guest by accident.
Also, for debug builds, add an assertion to ensure that the assumption made on
the size of the destination buffer still holds.
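A hedged sketch of the pattern (buffer, value and error code are
illustrative, not the literal hypfs code):

    int ret = snprintf(buf, sizeof(buf), "%u", val);

    /* debug builds: assert the buffer-size assumption still holds */
    ASSERT(ret > 0 && ret < (int)sizeof(buf));

    /* release builds: bail out rather than copy unwritten stack bytes */
    if ( ret < 0 || ret >= (int)sizeof(buf) )
        return -ENOBUFS;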
xen/compiler: fix MISRA C 2012 Rule 20.7 violation
In __must_be_array(), the macro parameter 'a' is used as an expression,
and should therefore be enclosed in parentheses to protect against
unintended expansions.
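For reference, a Linux-style definition with the parameter parenthesized at
both use sites (illustrative; the exact definition may differ):

    #define __must_be_array(a) \
        BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0]))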
Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Fri, 5 Aug 2022 06:36:54 +0000 (08:36 +0200)]
tools/xenstore: add documentation for new set/get-feature commands
Add documentation for two new Xenstore wire commands SET_FEATURE and
GET_FEATURE used to set or query the Xenstore features visible in the
ring page of a given domain.
When calling Python tools to convert the MISRA documentation or merge
cppcheck XML files, use $(PYTHON).
While there, make the MISRA document conversion script executable.
Fixes: 57caa5375321 ("xen: Add MISRA support to cppcheck make rule") Fixes: 43aa3f6e72d3 ("xen/build: Add cppcheck and cppcheck-html make rules") Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Add examples of git commands that can be used to generate fixes, and show
how to use the pretty configuration for git.
This should make it easier for contributors to use the right format.
Jan Beulich [Wed, 3 Aug 2022 10:10:26 +0000 (12:10 +0200)]
evtchn: convert domain event lock to an r/w one
Especially for the use in evtchn_move_pirqs() (called when moving a vCPU
across pCPU-s) and the ones in EOI handling in PCI pass-through code,
serializing perhaps an entire domain isn't helpful when no state (which
isn't e.g. further protected by the per-channel lock) changes.
Unfortunately this implies dropping of lock profiling for this lock,
until r/w locks may get enabled for such functionality.
While ->notify_vcpu_id is now meant to be consistently updated with the
per-channel lock held, an extension applies to ECS_PIRQ: The field is
also guaranteed to not change with the per-domain event lock held for
writing. Therefore the link_pirq_port() call from evtchn_bind_pirq()
could in principle be moved out of the per-channel locked regions, but
this further code churn didn't seem worth it.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Hongda Deng [Fri, 29 Jul 2022 08:36:02 +0000 (16:36 +0800)]
arm/vgic-v3: fix virq offset in the rank when storing irouter
When vGIC performs irouter registers emulation, to get the target vCPU
via virq conveniently, Xen doesn't store the irouter value directly,
instead it will use the value (affinities) in irouter to calculate the
target vCPU, and then save the target vCPU in irq rank->vcpu[offset].
When vGIC tries to get the target vCPU, it first calculates the target
vCPU index via
int target = read_atomic(&rank->vcpu[virq & INTERRUPT_RANK_MASK]);
and then it gets the target vCPU via
v->domain->vcpu[target];
When vGIC tries to store irouter for one virq, the target vCPU index
in the rank is computed as
offset &= virq & INTERRUPT_RANK_MASK;
finally it gets the target vCPU via
d->vcpu[read_atomic(&rank->vcpu[offset])];
There is a difference between them when getting the target vCPU index
in the rank. Actually, (virq & INTERRUPT_RANK_MASK) already yields the
target vCPU index in the rank; it's wrong to add '&' before '=' when
calculating the offset.
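Concretely, the fix drops the stray '&':

    /* before (buggy): offset is masked with itself */
    offset &= virq & INTERRUPT_RANK_MASK;
    /* after */
    offset = virq & INTERRUPT_RANK_MASK;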
For example, the target vCPU index in the rank should be 6 for virq 38,
but vGIC will get offset=0 when vGIC stores the irouter for this virq,
and finally vGIC will access the wrong target vCPU index in the rank
when updating the irouter.
Fixes: 5d495f4349b5 ("xen/arm: vgic: Optimize the way to store the target vCPU in the rank") Signed-off-by: Hongda Deng <Hongda.Deng@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
xen/efi: efibind: fix MISRA C 2012 Directive 4.10 violation
Prevent the header file from being included more than once by adding an
ifndef guard.
In order to stay close to the gnu-efi code:
- for x86_64, use the same guard
- for arm64, where there is no guard in gnu-efi, use a similar format and
position to the x86_64 guard, for consistency
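The guard follows the usual shape (the macro name here is illustrative; the
real one mirrors gnu-efi):

    #ifndef XEN_X86_64_EFIBIND_H
    #define XEN_X86_64_EFIBIND_H
    /* ... header contents ... */
    #endif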
automation: qemu-smoke-arm64.sh: Rename the device tree to avoid confusion
Rename the device tree from virt-gicv3 to virt-gicv2 to avoid confusion
since the version of the generic interrupt controller used for this test
is the v2 and not the v3.
xen/arm: domain: Fix MISRA C 2012 Rule 8.7 violation
The function idle_loop() is referenced only in domain.c.
Change its linkage from external to internal by adding the storage-class
specifier static to its definition.
Add the function as a 'fake' input operand to the inline assembly statement,
to make the compiler aware that the function is used. Fake means that the
function is not actually used as an operand by the asm code; that is because
there is no suitable gcc arm32 asm constraint for labels.
Declare return_to_new_vcpu32() and return_to_new_vcpu64(), which are also
referenced by this inline asm statement.
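A hedged sketch of the technique (template simplified; the real macro
differs):

    /* "X" accepts any operand; the asm template never actually uses %1,
       the operand only keeps the now-static idle_loop referenced */
    asm volatile ( "mov sp, %0; b idle_loop"
                   :: "r" (stack), "X" (idle_loop) : "memory" );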
Also, this patch resolves indirectly a MISRA C 2012 Rule 8.4 violation warning.
xen/arm: mm: Reduce the area that xen_second covers
At the moment, xen_second is used to cover the first 2GB of the
virtual address space. With the recent rework of the page-tables,
only the first 1GB region (where Xen resides) is effectively used.
In addition to that, I would like to reshuffle the memory layout, so Xen
mappings may no longer be in the first 2GB of the virtual address space.
Therefore, rework xen_second so it only covers the 1GB region where
Xen will reside.
With this change, xen_second no longer covers the xenheap area on arm32.
So we first need to add memory to the boot allocator before setting up
the xenheap mappings.
Take the opportunity to update the comments on top of xen_fixmap and
xen_xenmap.
xen/arm: mm: Move domain_{,un}map_* helpers in a separate file
The file xen/arch/arm/mm.c has been growing quite a lot. It now contains
various independent parts of the MM subsystem.
One of them is the helpers to map/unmap a page, which are only used
by arm32 and protected by CONFIG_ARCH_MAP_DOMAIN_PAGE. Move them to a
new file, xen/arch/arm/domain_page.c.
xen: Rename CONFIG_DOMAIN_PAGE to CONFIG_ARCH_MAP_DOMAIN_PAGE and...
move it to Kconfig.
The define CONFIG_DOMAIN_PAGE indicates whether the architecture provides
helpers to map/unmap a domain page. Rename it to CONFIG_ARCH_MAP_DOMAIN_PAGE
so it is clearer that support for domain pages is not something that
can be disabled in Xen.
Take the opportunity to move CONFIG_ARCH_MAP_DOMAIN_PAGE to Kconfig as this
will soon be necessary to use it in the Makefile.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> #arm part
xen/arm32: mm: Consolidate the domheap mappings initialization
At the moment, the domheap mappings initialization is done separately for
the boot CPU and secondary CPUs. The main difference is that for the former
the pages are part of the Xen binary, whilst for the latter they are
dynamically allocated.
It would be good to have a single helper so it is easier to rework how
the domheap is initialized.
For CPU0, we still need to use pre-allocated pages because the
allocators may use map_domain_page(), so we need to have the domheap
area ready first. But we can still delay the initialization to setup_mm().
Introduce a new helper init_domheap_mappings() that will be called
from setup_mm() for the boot CPU and from init_secondary_pagetables()
for secondary CPUs.
At the moment, *_VIRT_END may either point to the address after the end
or to the last address of the region.
The lack of consistency makes it quite difficult to reason with them.
Furthermore, there is a risk of overflow in the case where the address
points past the end. I am not aware of any such cases, so this is only a
latent bug.
Start to solve the problem by removing all the *_VIRT_END definitions
exclusively used by the Arm code, and adding *_VIRT_SIZE when it is not
present.
Take the opportunity to rename BOOT_FDT_SLOT_SIZE to BOOT_FDT_VIRT_SIZE
for better consistency and use _AT(vaddr_t, ).
Also take the opportunity to fix the coding style of the comment touched
in mm.c.
xsm/dummy: fix MISRA C 2012 Directive 4.10 violation
Protect the header file from being included more than once by adding an ifndef guard.
Signed-off-by: Xenia Ragiadakou <burzalodowa@gmail.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Jan Beulich [Fri, 29 Jul 2022 06:50:25 +0000 (08:50 +0200)]
x86/shadow: drop CONFIG_HVM conditionals from sh_update_cr3()
Now that we're not building multi.c anymore for 2 and 3 guest levels
when !HVM, there's no point in having these conditionals anymore. (As
somewhat a special case, the last of the removed conditionals really
builds on shadow_mode_external() always returning false when !HVM.) This
way the code becomes a tiny bit more readable.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 29 Jul 2022 06:48:26 +0000 (08:48 +0200)]
x86/shadow: properly handle get_page() failing
We should not blindly (in a release build) insert the new entry in the
hash if a reference to the guest page cannot be obtained, or else an
excess reference would be put when removing the hash entry again. Crash
the domain in that case instead. The sole caller doesn't further care
about the state of the guest page: All it does is return the
corresponding shadow page (which was obtained successfully before) to
its caller.
To compensate we further need to adjust hash removal: Since the shadow
page already has had its backlink set, domain cleanup code would try to
destroy the shadow, and hence still cause a put_page() without
corresponding get_page(). Leverage that the failed get_page() leads to
no hash insertion, making shadow_hash_delete() no longer assume it will
find the requested entry. Instead return back whether the entry was
found. This way delete_shadow_status() can avoid calling put_page() in
the problem scenario.
For the other caller of shadow_hash_delete() simply reinstate the
otherwise dropped assertion at the call site.
While touching the conditionals in {set,delete}_shadow_status() anyway,
also switch around their two pre-existing parts, to have the cheap one
first (frequently allowing to avoid evaluation of the expensive - due to
evaluate_nospec() - one altogether).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
automation: arm64: Create a test job for testing static allocation on qemu
Enable CONFIG_STATIC_MEMORY in the existing arm64 build.
Create a new test job, called qemu-smoke-arm64-gcc-staticmem.
Adjust the qemu-smoke-arm64.sh script to accommodate the static memory test
as a new test variant. The test variant is determined based on the first
argument passed to the script. For testing static memory, the argument is
'static-mem'.
The test configures DOM1 with a static memory region and adds a check in the
init script.
The check consists of comparing the contents of the /proc/device-tree
memory entry with the static memory range with which DOM1 was configured.
If the memory layout is correct, a message gets printed by DOM1.
At the end of the qemu run, the script searches for the specific message
in the logs and fails if not found.
The EXPERT config option can no longer be selected via the environment
variable XEN_CONFIG_EXPERT. Remove the stale references to
XEN_CONFIG_EXPERT from the automation code.
libxl/arm: Create specific IOMMU node to be referred by virtio-mmio device
Reuse the generic IOMMU device tree bindings to communicate Xen-specific
information for the virtio devices for which restricted memory access
using Xen grant mappings needs to be enabled.
Insert an "iommus" property, pointing to the IOMMU node with the
"xen,grant-dma" compatible, into all virtio devices whose backends are
going to run in non-hardware domains (which are untrusted by default).
Based on device-tree binding from Linux:
Documentation/devicetree/bindings/iommu/xen,grant-dma.yaml
This patch introduces helpers to allocate Virtio MMIO params
(IRQ and memory region) and create a specific device node in
the Guest device-tree with the allocated params. In order to deal
with multiple Virtio devices, reserve corresponding ranges.
For now, we reserve 1MB for memory regions and 10 SPIs.
As these helpers should be used for every Virtio device attached
to the Guest, call them for the Virtio disk(s).
Please note, with statically allocated Virtio IRQs there is
a risk of a clash with the physical IRQs of passthrough devices.
For the first version, this is fine, but we should consider allocating
the Virtio IRQs automatically. Thankfully, we know in advance which
IRQs will be used for passthrough, so we are able to choose
non-clashing ones.
This patch adds basic support for configuring and assisting a virtio-mmio
based virtio-disk backend (emulator), which is intended to run outside of
Qemu and could run in any domain.
Although the Virtio block device is quite different from traditional
Xen PV block device (vbd) from the toolstack's point of view:
- as the frontend is virtio-blk, which is not a Xenbus driver, nothing
written to Xenstore is fetched by the frontend currently ("vdev"
is not passed to the frontend). But this might need to be revised
in future, so frontend data might be written to Xenstore in order to
support hotplugging virtio devices or passing the backend domain id
on architectures where the device-tree is not available.
- the ring-ref/event-channel are not used for the backend<->frontend
communication; the proposed IPC for Virtio is IOREQ/DM
it is still a "block device" and ought to be integrated into the existing
"disk" handling. So, re-use (and adapt) the "disk" parsing/configuration
logic to deal with Virtio devices as well.
For the immediate purpose and the ability to extend that support to
other use-cases in future (Qemu, virtio-pci, etc.), perform the
following actions:
- Add new disk backend type (LIBXL_DISK_BACKEND_STANDALONE) and reflect
that in the configuration
- Introduce new disk "specification" and "transport" fields to struct
libxl_device_disk. Both are written to the Xenstore. The transport
field is only used for the specification "virtio", and it only assumes
the value "mmio" for now.
- Introduce new "specification" option with "xen" communication
protocol being default value.
- Add a new device kind (LIBXL__DEVICE_KIND_VIRTIO_DISK), as the current
one (LIBXL__DEVICE_KIND_VBD) doesn't fit the Virtio disk model.
An example of domain configuration for Virtio disk:
disk = [ 'phy:/dev/mmcblk0p3, xvda1, backendtype=standalone, specification=virtio']
Nothing has changed for default Xen disk configuration.
Please note, this patch is not enough for virtio-disk to work
on Xen (Arm), as for every Virtio device (including disk) we need
to allocate Virtio MMIO params (IRQ and memory region), pass
them to the backend, and also update the Guest device-tree. The subsequent
patch will add these missing bits. For the current patch,
the default "irq" and "base" are just written to the Xenstore.
This is not an ideal split, but this way we avoid breaking
bisectability.
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Jan Beulich [Wed, 27 Jul 2022 11:00:08 +0000 (13:00 +0200)]
x86/PV: correct post-preemption progress recording in iommu_memory_setup()
Coverity validly points out that the mfn_add() as used was dead code.
Coverity ID: 1507475 Fixes: c1e1564c8995 ("IOMMU/x86: perform PV Dom0 mappings in batches") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 27 Jul 2022 10:58:50 +0000 (12:58 +0200)]
mm: enforce return value checking on get_page()
It's hard to imagine a case where an error may legitimately be ignored
here. It's bad enough that in at least one case (set_shadow_status())
the return value was checked only by way of ASSERT()ing.
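The enforcement mechanism is the usual attribute on the declaration
(sketch; the exact signature is the tree's):

    bool __must_check get_page(struct page_info *page,
                               const struct domain *domain);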
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Wed, 27 Jul 2022 10:58:16 +0000 (12:58 +0200)]
x86/shadow: drop shadow_prepare_page_type_change()'s 3rd parameter
As of 8cc5036bc385 ("x86/pv: Fix ABAC cmpxchg() race in
_get_page_type()") this no longer needs passing separately - the type
can now be read from struct page_info, as the call now happens after its
writing.
While there also constify the 2nd parameter.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Edwin Török [Wed, 27 Jul 2022 10:57:10 +0000 (12:57 +0200)]
x86/msr: fix X2APIC_LAST
The latest Intel manual now says the X2APIC reserved range is only
0x800 to 0x8ff (NOT 0xbff).
This changed between SDM 68 (Nov 2018) and SDM 69 (Jan 2019).
The AMD manual documents 0x800-0x8ff too.
There are non-X2APIC MSRs in the 0x900-0xbff range now:
e.g. 0x981 is IA32_TME_CAPABILITY, an architectural MSR.
The new MSR in this range appears to have been introduced in Icelake,
so this commit should be backported to Xen versions supporting Icelake.
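The resulting range definition then takes roughly this shape (macro names
assumed):

    #define MSR_X2APIC_FIRST 0x00000800
    #define MSR_X2APIC_LAST  (MSR_X2APIC_FIRST + 0xff) /* was + 0x3ff */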
Backport: 4.13+
Signed-off-by: Edwin Török <edvin.torok@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 26 Jul 2022 13:11:33 +0000 (14:11 +0100)]
x86/vpmu: Fix build following vmfork addition
GCC with IBT extensions complains:
arch/x86/cpu/vpmu.c:351:15: error: conflicting types for 'vpmu_save_force'; have 'void(void *)' with implied 'nocf_check' attribute
351 | void cf_check vpmu_save_force(void *arg)
| ^~~~~~~~~~~~~~~
In file included from ./arch/x86/include/asm/domain.h:10,
from ./include/xen/domain.h:8,
from ./include/xen/sched.h:11,
from ./include/xen/event.h:12,
from arch/x86/cpu/vpmu.c:23:
./arch/x86/include/asm/vpmu.h:117:6: note: previous declaration of 'vpmu_save_force' with type 'void(void *)'
117 | void vpmu_save_force(void *arg);
| ^~~~~~~~~~~~~~~
Adjust the declaration.
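That is, the prototype gains the attribute, matching the definition quoted
in the error above:

    void cf_check vpmu_save_force(void *arg);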
Fixes: 755087eb9b10 ("xen/mem_sharing: support forks with active vPMU state") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 19 Jul 2022 20:37:43 +0000 (21:37 +0100)]
x86/pv: Inject #GP for implicit grant unmaps
This is a debug behaviour to identify buggy kernels. Crashing the domain is
the most unhelpful thing to do, because it discards the relevant context.
Instead, inject #GP[0] like other permission errors in x86. In particular,
this lets the kernel provide a backtrace which is more likely to be helpful to
a developer.
As a bugfix, this always injects #GP[0] to current, not l1e_owner. It is not
l1e_owner's fault if dom0 using superpowers triggers an implicit unmap.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 26 Jul 2022 12:54:34 +0000 (14:54 +0200)]
x86/mm: correct TLB flush condition in _get_page_type()
When this logic was moved, it was moved across the point where nx is
updated to hold the new type for the page. IOW originally it was
equivalent to using x (and perhaps x would better have been used), but
now it isn't anymore. Switch to using x, which then brings things in
line again with the slightly earlier comment there (now) talking about
transitions _from_ writable.
I have to confess though that I cannot make a direct connection between
the reported observed behavior of guests leaving several pages around
with pending general references and the change here. Repeated testing,
nevertheless, confirms the reported issue is no longer there.
This is CVE-2022-33745 / XSA-408.
Reported-by: Charles Arnold <carnold@suse.com> Fixes: 8cc5036bc385 ("x86/pv: Fix ABAC cmpxchg() race in _get_page_type()") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
In common/memory.c, the ifdef code surrounding ptdom_max_order is
using HAS_PASSTHROUGH instead of CONFIG_HAS_PASSTHROUGH. Fix the
problem by using the correct macro.
Fixes: e0d44c1f9461 ("build: convert HAS_PASSTHROUGH use to Kconfig") Signed-off-by: Luca Fancellu <luca.fancellu@arm.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 26 Jul 2022 06:33:10 +0000 (08:33 +0200)]
page-alloc: fix initialization of cross-node regions
Quite obviously to determine the split condition successive pages'
attributes need to be evaluated, not always those of the initial page.
Fixes: 72b02bc75b47 ("xen/heap: pass order to free_heap_pages() in heap init") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Jan Beulich [Mon, 25 Jul 2022 13:46:21 +0000 (15:46 +0200)]
include: correct re-building conditions around hypercall-defs.h
For a .cmd file to be picked up, the respective target needs to be
listed in $(targets). This wasn't the case for hypercall-defs.i, leading
to permanent re-building even on an entirely unchanged tree (because of
the command apparently having changed).
In exchange the target doesn't need naming in $(clean-files) anymore.
Fixes: eca1f00d0227 ("xen: generate hypercall interface related code") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Tamas K Lengyel [Mon, 25 Jul 2022 13:44:33 +0000 (15:44 +0200)]
xen/mem_sharing: support forks with active vPMU state
Currently the vPMU state from a parent isn't copied to VM forks. To enable the
vPMU state to be copied to a fork VM we export certain vPMU functions. First,
the vPMU context needs to be allocated for the fork if the parent has one. For
this we introduce vpmu->allocate_context, which previously has only been
called when the guest enabled the PMU on itself. Furthermore, we export
vpmu_save_force so that the PMU context can be saved on-demand even if no
context switch took place on the parent's CPU yet. Additionally, we make sure
all relevant configuration MSRs are saved in the vPMU context so the copy is
complete and the fork starts with the same PMU config as the parent.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Re-generate golang bindings to reflect changes to libxl_types.idl
from the following commit: 54d8f27d0477 ("tools/libxl: report trusted backend status to frontends")
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Mon, 25 Jul 2022 13:43:35 +0000 (15:43 +0200)]
VT-d: fold dma_pte_clear_one() into its only caller
This way intel_iommu_unmap_page() ends up quite a bit more similar to
intel_iommu_map_page().
No functional change intended.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:42:33 +0000 (15:42 +0200)]
IOMMU/x86: add perf counters for page table splitting / coalescing
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:41:48 +0000 (15:41 +0200)]
VT-d: replace all-contiguous page tables by superpage mappings
When a page table ends up with all contiguous entries (including all
identical attributes), it can be replaced by a superpage entry at the
next higher level. The page table itself can then be scheduled for
freeing.
The adjustment to LEVEL_MASK is merely to avoid leaving a latent trap
for whenever we (and obviously hardware) start supporting 512G mappings.
Note that cache sync-ing is likely more strict than necessary. This is
both to be on the safe side as well as to maintain the pattern of all
updates of (potentially) live tables being accompanied by a flush (if so
needed).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:41:12 +0000 (15:41 +0200)]
AMD/IOMMU: replace all-contiguous page tables by superpage mappings
When a page table ends up with all contiguous entries (including all
identical attributes), it can be replaced by a superpage entry at the
next higher level. The page table itself can then be scheduled for
freeing.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:40:41 +0000 (15:40 +0200)]
VT-d: free all-empty page tables
When a page table ends up with no present entries left, it can be
replaced by a non-present entry at the next higher level. The page table
itself can then be scheduled for freeing.
Note that while its output isn't used there yet,
pt_update_contig_markers() right away needs to be called in all places
where entries get updated, not just the one where entries get cleared.
Note further that while pt_update_contig_markers() updates perhaps
several PTEs within the table, since these are changes to "avail" bits
only, I do not think that cache flushing would be needed afterwards. Such
cache flushing (of entire pages, unless adding yet more logic to be more
selective) would be quite noticeable performance-wise (very prominent
during Dom0 boot).
Also note that cache sync-ing is likely more strict than necessary. This
is both to be on the safe side as well as to maintain the pattern of all
updates of (potentially) live tables being accompanied by a flush (if so
needed).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:40:00 +0000 (15:40 +0200)]
AMD/IOMMU: free all-empty page tables
When a page table ends up with no present entries left, it can be
replaced by a non-present entry at the next higher level. The page table
itself can then be scheduled for freeing.
Note that while its output isn't used there yet,
pt_update_contig_markers() right away needs to be called in all places
where entries get updated, not just the one where entries get cleared.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:38:22 +0000 (15:38 +0200)]
IOMMU/x86: prefill newly allocated page tables
Page tables are used for two purposes after allocation: They either
start out all empty, or they are filled to replace a superpage.
Subsequently, to allow replacing all-empty or fully contiguous page tables,
contiguous sub-regions will be recorded within individual page tables.
Install the initial set of markers immediately after allocation. Make
sure to retain these markers when further populating a page table in
preparation for it to replace a superpage.
The markers are simply 4-bit fields holding the order value of
contiguous entries. To demonstrate this, if a page table had just 16
entries, this would be the initial (fully contiguous) set of markers:
index:  0 1 2 3 4 5 6 7 8 9 A B C D E F
marker: 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
"Contiguous" here means not only present entries with successively
increasing MFNs, each one suitably aligned for its slot, and identical
attributes, but also a respective number of all non-present (zero except
for the markers) entries.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:37:34 +0000 (15:37 +0200)]
x86: introduce helper for recording degree of contiguity in page tables
This is a re-usable helper (kind of a template) which gets introduced
without users so that the individual subsequent patches introducing such
users can get committed independently of one another.
See the comment at the top of the new file. To demonstrate the effect,
if a page table had just 16 entries, this would be the set of markers
for a page table with fully contiguous mappings:
index:  0 1 2 3 4 5 6 7 8 9 A B C D E F
marker: 4 0 1 0 2 0 1 0 3 0 1 0 2 0 1 0
"Contiguous" here means not only present entries with successively
increasing MFNs, each one suitably aligned for its slot, but also a
respective number of all non-present entries.
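One way to compute the initial markers shown above (an illustrative helper,
not the actual code):

    /* the marker for slot i is its alignment order, i.e. the number of
       trailing zero bits, with slot 0 carrying the whole table's order
       (4 in the 16-entry example) */
    static unsigned int initial_marker(unsigned int i, unsigned int order)
    {
        return i ? __builtin_ctz(i) : order;
    }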
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:36:33 +0000 (15:36 +0200)]
VT-d: allow use of superpage mappings
... depending on feature availability (and absence of quirks).
Also make the page table dumping function aware of superpages.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:35:40 +0000 (15:35 +0200)]
AMD/IOMMU: allow use of superpage mappings
No separate feature flags exist which would control availability of
these; the only restriction is HATS (establishing the maximum number of
page table levels in general), and even that has a lower bound of 4.
Thus we can unconditionally announce 2M and 1G mappings. (Via
non-default page sizes the implementation in principle permits arbitrary
size mappings, but these require multiple identical leaf PTEs to be
written, which isn't all that different from having to write multiple
consecutive PTEs with increasing frame numbers. IMO that's therefore
beneficial only on hardware where suitable TLBs exist; I'm unaware of
such hardware.)
Note that in principle 512G and 256T mappings could also be supported
right away, but the freeing of page tables (to be introduced in
subsequent patches) when replacing a sufficiently populated tree with a
single huge page would need suitable preemption, which will require
extra work.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:34:55 +0000 (15:34 +0200)]
IOMMU/x86: new command line option to suppress use of superpage mappings
Before actually enabling their use, provide a means to suppress it in
case of problems. Note that using the option can also affect the sharing
of page tables in the VT-d / EPT combination: If EPT would use large
page mappings but the option is in effect, page table sharing would be
suppressed (to properly fulfill the admin request).
Requested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:33:34 +0000 (15:33 +0200)]
IOMMU/x86: support freeing of pagetables
For vendor specific code to support superpages we need to be able to
deal with a superpage mapping replacing an intermediate page table (or
hierarchy thereof). Consequently an iommu_alloc_pgtable() counterpart is
needed to free individual page tables while a domain is still alive.
Since the freeing needs to be deferred until after a suitable IOTLB
flush was performed, released page tables get queued for processing by a
tasklet.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Mon, 25 Jul 2022 13:32:59 +0000 (15:32 +0200)]
IOMMU/x86: perform PV Dom0 mappings in batches
For large page mappings to be easily usable (i.e. in particular without
un-shattering of smaller page mappings) and for mapping operations to
then also be more efficient, pass batches of Dom0 memory to iommu_map().
In dom0_construct_pv() and its helpers (covering strict mode) this
additionally requires establishing the type of those pages (albeit with
zero type references).
The earlier establishing of PGT_writable_page | PGT_validated requires
the existing places where this gets done (through get_page_and_type())
to be updated: For pages which actually have a mapping, the type
refcount needs to be 1.
There is actually a related bug that gets fixed here as a side effect:
Typically the last L1 table would get marked as such only after
get_page_and_type(..., PGT_writable_page). While this is fine as far as
refcounting goes, the page did remain mapped in the IOMMU in this case
(when "iommu=dom0-strict").
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
The loop in iommu_{,un}map() can be arbitrarily large, and as such it
needs to handle preemption. Introduce a new flag that signals whether
the function should do preemption checks, returning the number of pages
that have been processed in case a need for preemption was actually
found.
Note that the cleanup done in iommu_map() can now be incomplete if
preemption has happened, and hence callers would need to take care of
unmapping the whole range (ie: ranges already mapped by previously
preempted calls). So far none of the callers care about having those
ranges unmapped, so error handling in arch_iommu_hwdom_init() can be
kept as-is.
Note that iommu_legacy_{un,}map() are left without preemption handling:
callers of those interfaces aren't going to be modified to pass bigger
chunks, and hence the functions won't be modified as they are legacy and
uses should be replaced with iommu_{un,}map() instead if preemption is
required.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul@xen.org>
Anthony PERARD [Thu, 21 Jul 2022 12:46:02 +0000 (13:46 +0100)]
automation: use "needs" instead of "dependencies" for test jobs
Like with "dependencies", the jobs will get artifacts from the jobs
listed in "needs". But the test jobs can run as soon as the build jobs
listed have finished.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Hongyan Xia [Wed, 24 Feb 2021 18:43:13 +0000 (18:43 +0000)]
xen/heap: pass order to free_heap_pages() in heap init
The idea is to split the range into multiple aligned power-of-2 regions,
each of which needs only one call to free_heap_pages(). We check the least
significant set bit of the start address and use its bit index as the
order of this increment. This makes sure that each increment is both
power-of-2 and properly aligned, which can be safely passed to
free_heap_pages(). Of course, the order also needs to be sanity checked
against the upper bound and MAX_ORDER.
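A hedged sketch of that splitting loop (helper names as used elsewhere in
page_alloc.c; the actual code differs):

    while ( s < e ) /* [s, e) expressed in MFNs */
    {
        /* largest order aligned at s ... */
        unsigned int order = s ? __builtin_ctzl(s) : MAX_ORDER;

        /* ... clipped to the allocator's bound and the remaining range */
        if ( order > MAX_ORDER )
            order = MAX_ORDER;
        while ( (1UL << order) > e - s )
            order--;

        free_heap_pages(mfn_to_page(_mfn(s)), order, false);
        s += 1UL << order;
    }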
Tested in a nested environment on c5.metal with various amounts
of RAM and CONFIG_DEBUG=n. Time for end_boot_allocator() to complete:
         Before     After
- 90GB: 1445 ms     96 ms
-  8GB:  126 ms      8 ms
-  4GB:   62 ms      4 ms
At the moment, init_heap_pages() will call free_heap_pages() page
by page. To reduce the time to initialize the heap, we will want
to provide multiple pages at the same time.
init_heap_pages() is now split in two parts:
- init_heap_pages(): will break down the range into multiple sets
of contiguous pages. For now, the criterion is that the pages should
belong to the same NUMA node.
- _init_heap_pages(): will initialize a set of pages belonging to
the same NUMA node. In a follow-up patch, new requirements will
be added (e.g. pages should belong to the same zone). For now the
pages are still passed one by one to free_heap_pages().
Note that the comment before init_heap_pages() is heavily outdated and
does not reflect the current code. So update it.
This patch is a merge/rework of patches from David Woodhouse and
Hongyan Xia.
Signed-off-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/gnttab: Store frame GFN in struct page_info on Arm
Rework the Arm implementation to store the grant table frame GFN
in struct page_info directly, instead of keeping it in
standalone status/shared arrays. This patch is based on
the assumption that a grant table page is a xenheap page.
To cover 64-bit/40-bit IPA on Arm64/Arm32 we need the space
to hold a 52-bit/28-bit + extra bit value respectively. In order
not to grow the size of struct page_info, borrow the required
amount of bits from type_info's count portion, which the current
context can spare (currently only 1 bit is used on Arm).
Please note, to minimize code changes and avoid introducing
extra #ifdef-s to the header, we keep the same amount of
bits on both subarches, although the count portion on Arm64
could be wider, so we waste some bits here.
Introduce corresponding PGT_* constructs and access macro
page_get(set)_xenheap_gfn. Please note, all accesses to
the GFN portion of type_info field should always be protected
by the P2M lock. In case when it is not feasible to satisfy
that requirement (risk of deadlock, lock inversion, etc)
it is important to make sure that all non-protected updates
to this field are atomic.
As several non-protected read accesses still exist within
current code (most calls to page_get_xenheap_gfn() are not
protected by the P2M lock) the subsequent patch will introduce
hardening code for p2m_remove_mapping() to be called with P2M
lock held in order to check any difference between what is
already mapped and what is requested to be unmapped.
Update existing gnttab macros to deal with GFN value according
to new location. Also update the use of count portion of type_info
field on Arm in share_xen_page_with_guest().
While at it, extend this simplified M2P-like approach to any
xenheap pages which are processed in xenmem_add_to_physmap_one()
except foreign ones. Update the code to set GFN portion after
establishing new mapping for the xenheap page in said function
and to clean GFN portion when putting a reference on that page
in p2m_put_l3_page().
And for everything to work correctly introduce arch-specific
initialization pattern PGT_TYPE_INFO_INITIALIZER to be applied
to type_info field during initialization at alloc_heap_pages()
and acquire_staticmem_pages(). The pattern's purpose on Arm
is to clear the GFN portion before use, on x86 it is just
a stub.
This patch is intended to fix the potential issue on Arm
which might happen when remapping grant-table frame.
A guest (or the toolstack) will unmap the grant-table frame
using XENMEM_remove_physmap. This is a generic hypercall,
so on x86, we are relying on the fact the M2P entry will
be cleared on removal. For architecture without the M2P,
the GFN would still be present in the grant frame/status
array. So on the next call to map the page, we will end up
requesting the P2M to remove whatever mapping was at the given GFN.
This could well be another mapping.
Please note, this patch also changes how the shared_info page (which is
a xenheap RAM page) is mapped in xenmem_add_to_physmap_one().
Now, we only allow the shared_info page to be mapped once. Subsequent
attempts to map it will result in -EBUSY. Doing that, we mandate
the caller to first unmap the page before mapping it again. This is
to prevent Xen creating an unwanted hole in the P2M. For instance,
this could happen if the firmware stole a RAM address for mapping
the shared_info page into but forgot to unmap it afterwards.
Besides that, this patch simplifies the arch code on Arm by
removing the arrays and corresponding management code, and
as a result the gnttab_init_arch/gnttab_destroy_arch helpers
and struct grant_table_arch become useless and can be
dropped globally.
xen/arm: Harden the P2M code in p2m_remove_mapping()
Borrow the x86 check from p2m_remove_page(), which was added
by the following commit: c65ea16dbcafbe4fe21693b18f8c2a3c5d14600e
"x86/p2m: don't assert that the passed in MFN matches for a remove"
and adjust it to the Arm code base.
Basically, this check will be strictly needed for the xenheap pages
after applying a subsequent commit which will introduce xenheap based
M2P approach on Arm. But it is a good opportunity to harden
the P2M code for *every* RAM page, since it is currently possible on Arm
to remove any GFN - MFN mapping (even with the wrong helpers).
Jan Beulich [Wed, 20 Jul 2022 13:48:49 +0000 (15:48 +0200)]
x86: also suppress use of MMX insns
Passing -mno-sse alone is not enough: The compiler may still find
(questionable) reasons to use MMX insns. In particular with gcc12 use
of MOVD+PUNPCKLDQ+MOVQ was observed in an apparent attempt to
auto-vectorize the storing of two adjacent zeroes, 32 bits each.
Reported-by: ChrisD <chris@dalessio.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 20 Jul 2022 13:46:48 +0000 (15:46 +0200)]
x86emul: add memory operand low bits checks for ENQCMD{,S}
Already ISE rev 044 added text to this effect; rev 045 further dropped
leftover earlier text indicating the contrary:
- ENQCMD requires the low 32 bits of the memory operand to be clear,
- ENQCMDS requires bits 20...30 of the memory operand to be clear.
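A hedged sketch of the added checks (variable and exception-helper names
follow emulator conventions but are assumptions here):

    /* mirror the ISE requirements quoted above */
    if ( is_enqcmd ? (data & 0xffffffffUL)      /* low 32 bits clear */
                   : (data & (0x7ffUL << 20)) ) /* bits 20...30 clear */
        generate_exception(EXC_GP, 0);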
Fixes: d27385968741 ("x86emul: support ENQCMD insns") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Mon, 18 Jul 2022 13:15:08 +0000 (14:15 +0100)]
x86/spec-ctrl: Make svm_vmexit_spec_ctrl conditional
The logic was written this way out of an abundance of caution, but the reality
is that AMD parts don't currently have the RAS-flushing side effect, nor do
they intend to gain it.
This removes one WRMSR from the VMExit path by default on Zen2 systems.
Fixes: 614cec7d79d7 ("x86/svm: VMEntry/Exit logic for MSR_SPEC_CTRL") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 30 Jun 2022 21:15:25 +0000 (22:15 +0100)]
x86/spec-ctrl: Consistently halt speculation using int3
The RSB stuffing loop and retpoline thunks date from the very beginning, when
halting speculation was a brand new field.
These days, we've largely settled on int3 for halting speculation in
non-architectural paths. It's a single byte, and is fully serialising - a
requirement for delivering #BP if it were to execute.
Update the thunks. Mostly for consistency across the codebase, but it does
shrink every entrypath in Xen by 6 bytes, which is a marginal win.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
tools/xl: use sparse init for dom_info, remove duplicate vars
Rather than having shadow variables for every element of dom_info, it is
better to properly initialize dom_info at the start. This also removes
the misleading memset() in the middle of main_create().
Remove the dryrun element of domain_create as that has been displaced
by the global "dryrun_only" variable.
Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Tue, 19 Jul 2022 06:37:29 +0000 (08:37 +0200)]
x86: deal with gcc12 release build issues
While a number of issues we previously had with pre-release gcc12 were
fixed in the final release, we continue to have one issue (with multiple
instances) when doing release builds (i.e. at higher optimization
levels): The compiler takes issue with subtracting (always 1 in our
case) from artificial labels (expressed as arrays) marking the end of
certain regions. This isn't an unreasonable position to take. Simply
hide the "array-ness" by casting to an integer type. To keep things
looking consistent, apply the same cast also to the respective
expressions dealing with the starting addresses. (Note how
efi_arch_memory_setup()'s l2_table_offset() invocations avoid a similar
issue by already having the necessary casts.) In is_xen_fixed_mfn()
further switch from __pa() to virt_to_maddr() to better match the left
sides of the <= operators.
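The pattern of the fix, sketched (symbol and variable names illustrative):

    extern const char _end[];  /* artificial label: an array to gcc */

    /* gcc12 objects to "_end - 1" (out-of-bounds array arithmetic);
       casting to an integer type first hides the array-ness */
    paddr_t last = virt_to_maddr((unsigned long)_end - 1);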
Reported-by: Charles Arnold <carnold@suse.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>