Jan Beulich [Tue, 5 Apr 2022 12:17:21 +0000 (14:17 +0200)]
VT-d: drop ownership checking from domain_context_mapping_one()
Despite putting in quite a bit of effort it was not possible to
establish why exactly this code exists (beyond possibly sanity
checking). Instead of a subsequent change further complicating this
logic, simply get rid of it.
Take the opportunity and move the respective unmap_vtd_domain_page() out
of the locked region.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
This is to make more obvious that nothing outside of domain_iommu(d)
actually changes or is otherwise needed by the function.
No functional change intended.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 5 Apr 2022 12:16:10 +0000 (14:16 +0200)]
VT-d: fix add/remove ordering when RMRRs are in use
In the event that the RMRR mappings are essential for device operation,
they should be established before updating the device's context entry,
while they should be torn down only after the device's context entry was
successfully cleared.
Also switch to %pd in related log messages.
Fixes: fa88cfadf918 ("vt-d: Map RMRR in intel_iommu_add_device() if the device has RMRR") Fixes: 8b99f4400b69 ("VT-d: fix RMRR related error handling") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 5 Apr 2022 12:15:33 +0000 (14:15 +0200)]
VT-d: fix (de)assign ordering when RMRRs are in use
In the event that the RMRR mappings are essential for device operation,
they should be established before updating the device's context entry,
while they should be torn down only after the device's context entry was
successfully updated.
Also adjust a related log message.
This is CVE-2022-26358 / part of XSA-400.
Fixes: 8b99f4400b69 ("VT-d: fix RMRR related error handling") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul@xen.org> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Tue, 5 Apr 2022 12:12:27 +0000 (14:12 +0200)]
VT-d: correct ordering of operations in cleanup_domid_map()
The function may be called without any locks held (leaving aside the
domctl one, which we surely don't want to depend on here), so needs to
play safe wrt other accesses to domid_map[] and domid_bitmap[]. This is
to avoid context_set_domain_id()'s writing of domid_map[] to be reset to
zero right away in the case of it racing the freeing of a DID.
For the interaction with context_set_domain_id() and did_to_domain_id()
see the code comment.
{check_,}cleanup_domid_map() are called with pcidevs_lock held or during
domain cleanup only (and pcidevs_lock is also held around
context_set_domain_id()), i.e. racing calls with the same (dom, iommu)
tuple cannot occur.
domain_iommu_domid(), besides its use by cleanup_domid_map(), has its
result used only to control flushing, and hence a stale result would
only lead to a stray extra flush.
This is CVE-2022-26357 / XSA-399.
Fixes: b9c20c78789f ("VT-d: per-iommu domain-id") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monne [Wed, 23 Feb 2022 08:40:40 +0000 (09:40 +0100)]
x86/hap: do not switch on log dirty for VRAM tracking
XEN_DMOP_track_dirty_vram possibly calls into paging_log_dirty_enable
when using HAP mode, and it can interact badly with other ongoing
paging domctls, as XEN_DMOP_track_dirty_vram is not holding the domctl
lock.
This was detected as a result of the following assert triggering when
doing repeated migrations of a HAP HVM domain with a stubdom:
Assertion 'd->arch.paging.log_dirty.allocs == 0' failed at paging.c:198
----[ Xen-4.17-unstable x86_64 debug=y Not tainted ]----
CPU: 34
RIP: e008:[<ffff82d040314b3b>] arch/x86/mm/paging.c#paging_free_log_dirty_bitmap+0x606/0x6
RFLAGS: 0000000000010206 CONTEXT: hypervisor (d0v23)
[...]
Xen call trace:
[<ffff82d040314b3b>] R arch/x86/mm/paging.c#paging_free_log_dirty_bitmap+0x606/0x63a
[<ffff82d040279f96>] S xsm/flask/hooks.c#domain_has_perm+0x5a/0x67
[<ffff82d04031577f>] F paging_domctl+0x251/0xd41
[<ffff82d04031640c>] F paging_domctl_continuation+0x19d/0x202
[<ffff82d0403202fa>] F pv_hypercall+0x150/0x2a7
[<ffff82d0403a729d>] F lstar_enter+0x12d/0x140
Such assert triggered because the stubdom used
XEN_DMOP_track_dirty_vram while dom0 was in the middle of executing
XEN_DOMCTL_SHADOW_OP_OFF, and so log dirty become enabled while
retiring the old structures, thus leading to new entries being
populated in already clear slots.
Fix this by not enabling log dirty for VRAM tracking, similar to what
is done when using shadow instead of HAP. Call
p2m_enable_hardware_log_dirty when enabling VRAM tracking in order to
get some hardware assistance if available. As a side effect the memory
pressure on the p2m pool should go down if only VRAM tracking is
enabled, as the dirty bitmap is no longer allocated.
Note that paging_log_dirty_range (used to get the dirty bitmap for
VRAM tracking) doesn't use the log dirty bitmap, and instead relies on
checking whether each gfn on the range has been switched from
p2m_ram_logdirty to p2m_ram_rw in order to account for dirty pages.
This is CVE-2022-26356 / XSA-397.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 5 Apr 2022 09:40:58 +0000 (11:40 +0200)]
x86/time: use fake read_tsc()
Go a step further than bed9ae54df44 ("x86/time: switch platform timer
hooks to altcall") did and eliminate the "real" read_tsc() altogether:
It's not used except in pointer comparisons, and hence it looks overall
more safe to simply poison plt_tsc's read_counter hook.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Apr 2022 09:38:04 +0000 (11:38 +0200)]
x86/APIC: make connections between seemingly arbitrary numbers
Making adjustments to arbitrarily chosen values shouldn't require
auditing the code for possible derived numbers - such a change should
be doable in a single place, having an effect on all code depending on
that choice.
For one make the TDCR write actually use APIC_DIVISOR. With the
necessary mask constant introduced, also use that in vLAPIC code. While
introducing the constant, drop APIC_TDR_DIV_TMBASE: The bit has been
undefined in halfway recent SDM and PM versions.
And then introduce a constant tying together the scale used when
converting nanoseconds to bus clocks.
No functional change intended.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 5 Apr 2022 09:36:32 +0000 (11:36 +0200)]
x86/APIC: calibrate against platform timer when possible
Use the original calibration against PIT only when the platform timer
is PIT. This implicitly excludes the "xen_guest" case from using the PIT
logic (init_pit() fails there, and as of 5e73b2594c54 ["x86/time: minor
adjustments to init_pit()"] using_pit also isn't being set too early
anymore), so the respective hack there can be dropped at the same time.
This also reduces calibration time from 100ms to 50ms, albeit this step
is being skipped as of 0731a56c7c72 ("x86/APIC: no need for timer
calibration when using TDT") anyway.
While re-indenting the PIT logic in calibrate_APIC_clock(), besides
adjusting style also switch around the 2nd TSC/TMCCT read pair, to match
the order of the 1st one, yielding more consistent deltas.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
tools/firmware: do not add a .note.gnu.property section
Prevent the assembler from creating a .note.gnu.property section on
the output objects, as it's not useful for firmware related binaries,
and breaks the resulting rombios image.
This requires modifying the cc-option Makefile macro so it can test
assembler options (by replacing the usage of the -S flag with -c) and
also stripping the -Wa, prefix if present when checking for the test
output.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
tools/firmware: fix setting of fcf-protection=none
Setting the fcf-protection=none option in EMBEDDED_EXTRA_CFLAGS in the
Makefile doesn't get it propagated to the subdirectories, so instead
set the flag in firmware/Rules.mk, like it's done for other compiler
flags.
Fixes: 3667f7f8f7 ('x86: Introduce support for CET-IBT') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jason Andryuk [Fri, 1 Apr 2022 14:33:10 +0000 (10:33 -0400)]
libxl: Re-scope qmp_proxy_spawn.ao usage
I've observed this failed assertion:
libxl_event.c:2057: libxl__ao_inprogress_gc: Assertion `ao' failed.
AFAICT, this is happening in qmp_proxy_spawn_outcome where
sdss->qmp_proxy_spawn.ao is NULL.
The out label of spawn_stub_launch_dm() calls qmp_proxy_spawn_outcome(),
but it is only in the success path that sdss->qmp_proxy_spawn.ao gets
set to the current ao.
qmp_proxy_spawn_outcome() should instead use sdss->dm.spawn.ao, which is
the already in-use ao when spawn_stub_launch_dm() is called. The same
is true for spawn_qmp_proxy().
With this, move sdss->qmp_proxy_spawn.ao initialization to
spawn_qmp_proxy() since its use is for libxl__spawn_spawn() and it can
be initialized along with the rest of sdss->qmp_proxy_spawn.
Fixes: 83c845033dc8 ("libxl: use vchan for QMP access with Linux stubdomain") Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Move dcs->console_xswait initialization into the callers of
initiate_domain_create, do_domain_create() and do_domain_soft_reset(),
so it is initialized along with the other dcs state.
Fixes: c57e6ebd8c3e ("(lib)xl: soft reset support") Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Jason Andryuk [Wed, 30 Mar 2022 18:17:41 +0000 (14:17 -0400)]
xl: Fix global pci options
commit babde47a3fed "introduce a 'passthrough' configuration option to
xl.cfg..." moved the pci list parsing ahead of the global pci option
parsing. This broke the global pci configuration options since they
need to be set first so that looping over the pci devices assigns their
values.
Move the global pci options ahead of the pci list to restore their
function.
Fixes: babde47a3fed ("introduce a 'passthrough' configuration option to xl.cfg...") Signed-off-by: Jason Andryuk <jandryuk@gmail.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Jan Beulich [Thu, 31 Mar 2022 08:45:46 +0000 (10:45 +0200)]
livepatch: account for patch offset when applying NOP patch
While not triggered by the trivial xen_nop in-tree patch on
staging/master, that patch exposes a problem on the stable trees, where
all functions have ENDBR inserted. When NOP-ing out a range, we need to
account for this. Handle this right in livepatch_insn_len().
This requires livepatch_insn_len() to be called _after_ ->patch_offset
was set.
Fixes: 6974c75180f1 ("xen/x86: Livepatch: support patching CET-enhanced functions") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Tue, 29 Mar 2022 13:48:15 +0000 (15:48 +0200)]
build: generic top-level rule to build individual files
In particular when cross-compiling or having in place other tool chain
overrides, invoking make to build individual files (e.g. object,
preprocessed, or assembly ones) so far involves putting the various
overrides on the command line instead of simply getting them from
./.config.
Furthermore this helps working around a yet unaddressed make quirk [1]:
Variables put on the command line are invisible to $(shell ...), unless
invoked from a recursive make: During the recursive invocation such
variables are put in the recursive make's environment and hence become
"visible".
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
[1] https://savannah.gnu.org/bugs/?10593
xen/arm: set CPSR Z bit when creating aarch32 guests
The first 32 bytes of zImage are NOPs. When CONFIG_EFI is enabled in the
kernel, certain versions of Linux will use an UNPREDICTABLE NOP
encoding, sometimes resulting in an unbootable kernel. Whether the
resulting kernel is bootable or not depends on the processor. See commit a92882a4d270 in the Linux kernel for all the details.
All kernel releases starting from Linux 4.9 without commit a92882a4d270
are affected.
Fortunately there is a simple workaround: setting the "Z" bit in CPSR
make it so those invalid NOP instructions are never executed. That is
because the instruction is conditional (not equal). So, on QEMU at
least, the instruction will end up to be ignored and not generate an
exception. Setting the "Z" bit makes those kernel versions bootable
again and it is harmless in the other cases.
Note that both U-Boot and QEMU -kernel set the "Z" bit in CPSR when
booting a zImage kernel on aarch32.
Jan Beulich [Tue, 22 Mar 2022 12:10:59 +0000 (13:10 +0100)]
x86/build: work around older GNU ld not leaving .got.plt empty
The initial three entries in .got.plt are "static", i.e. present
independent of actual entries allocation of which is triggered by
respective relocations. When no real entries are needed, halfway recent
ld discards the "static" portion of the table as well, but older GNU ld
fails to do so.
Fixes: dedb0aa42c6d ("x86/build: use --orphan-handling linker option if available") Reported-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Julien Grall <jgrall@amazon.com>
Andrew Cooper [Tue, 22 Mar 2022 12:07:24 +0000 (13:07 +0100)]
x86/hvm: Annotate hvm_physdev_op() with cf_check
This was missed previously, and would yield a fatal #CP for any HVM domain
which issues a physdevop hypercall.
Fixes: cdbe2b0a1aec ("x86: Enable CET Indirect Branch Tracking") Reported-by: Juergen Gross <jgross@suse.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Raphael Ning [Wed, 16 Mar 2022 18:38:41 +0000 (18:38 +0000)]
evtchn/fifo: Don't set PENDING bit if guest misbehaves
Currently, evtchn_fifo_set_pending() will mark the event as PENDING even
if it fails to lock the FIFO event queue(s), or if the guest has not
initialized the FIFO control block for the target vCPU. A well-behaved
guest should never trigger either of these cases.
There is no good reason to set the PENDING bit (the guest should not
depend on this behaviour anyway) or check for pollers in such corner
cases, so skip that. In fact, both the comment above the for loop and
the commit message for
suggest that the bit should be set after the FIFO queue(s) are locked.
Take the opportunity to rename the was_pending variable (flipping its
sense) and switch to the standard bool type.
Suggested-by: David Vrabel <dvrabel@amazon.co.uk> Signed-off-by: Raphael Ning <raphning@amazon.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: David Vrabel <dvrabel@amazon.co.uk> Tested-by: Luca Fancellu <luca.fancellu@arm.com>
xen/arm64: io: Handle the abort due to access to stage1 translation table
If the abort was caused due to access to stage1 translation table, Xen
will try to set the p2m entry (assuming that the Stage 1 translation
table is in a non MMIO region).
If there is no such entry found, then Xen will try to map the address as
a MMIO region (assuming that the Stage 1 translation table is in a
direct MMIO region).
If that fails as well, then there are the two following scenarios:-
1. Stage 1 translation table being in an emulated MMIO region - Xen
can read the region, but it has no way to return the value read to the
CPU page table walker (which tries to go through the stage1 tables to
resolve the translation fault).
2. Stage 1 translation table address is invalid.
In both the above scenarios, Xen will forward the abort to the guest.
xen/arm64: io: Emulate instructions (with invalid ISS) on MMIO region
When an instruction is trapped in Xen due to translation fault, Xen
checks if the ISS is invalid (for data abort) or it is an instruction
abort. If so, Xen tries to resolve the translation fault using p2m page
tables. In case of data abort, Xen will try to map the mmio region to
the guest (ie tries to emulate the mmio region).
If the ISS is not valid and it is a data abort, then Xen tries to
decode the instruction. In case of ioreq, Xen saves the decoding state,
rn and imm9 to vcpu_io. Whenever the vcpu handles the ioreq successfully,
it will read the decoding state to determine if the instruction decoded
was a ldr/str post indexing (ie INSTR_LDR_STR_POSTINDEXING). If so, it
uses these details to post increment rn.
In case of mmio handler, if the mmio operation was successful, then Xen
retrives the decoding state, rn and imm9. For state ==
INSTR_LDR_STR_POSTINDEXING, Xen will update rn.
If there is an error encountered while decoding/executing the instruction,
Xen will forward the abort to the guest.
Also, the logic to infer the type of instruction has been moved from
try_handle_mmio() to try_decode_instruction() which is called before.
try_handle_mmio() is solely responsible for handling the mmio operation.
Bjoern Doebel [Thu, 10 Mar 2022 07:35:36 +0000 (07:35 +0000)]
xen/x86: Livepatch: support patching CET-enhanced functions
Xen enabled CET for supporting architectures. The control flow aspect of
CET require functions that can be called indirectly (i.e., via function
pointers) to start with an ENDBR64 instruction. Otherwise a control flow
exception is raised.
This expectation breaks livepatching flows because we patch functions by
overwriting their first 5 bytes with a JMP + <offset>, thus breaking the
ENDBR64. We fix this by checking the start of a patched function for
being ENDBR64. In the positive case we move the livepatch JMP to start
behind the ENDBR64 instruction.
To avoid having to guess the ENDBR64 offset again on patch reversal
(which might race with other mechanisms adding/removing ENDBR
dynamically), use the livepatch metadata to store the computed offset
along with the saved bytes of the overwritten function.
Signed-off-by: Bjoern Doebel <doebel@amazon.de> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com> Tested-by: Jiamei Xie <jiamei.xie@arm.com>
Andrew Cooper [Tue, 15 Mar 2022 12:07:18 +0000 (12:07 +0000)]
x86/cet: Remove writeable mapping of the BSPs shadow stack
An unintended consequence of the BSP using cpu0_stack[] is that writeable
mappings to the BSPs shadow stacks are retained in the bss. This renders
CET-SS almost useless, as an attacker can update both return addresses and the
ret will not fault.
We specifically don't want to shatter the superpage mapping .data and .bss, so
the only way to fix this is to not have the BSP stack in the main Xen image.
Break cpu_alloc_stack() out of cpu_smpboot_alloc(), and dynamically allocate
the BSP stack as early as reasonable in __start_xen(). As a consequence,
there is no need to delay the BSP's memguard_guard_stack() call.
Copy the top of cpu info block just before switching to use the new stack.
Fix a latent bug by setting %rsp to info->guest_cpu_user_regs rather than
->es; this would be buggy if reinit_bsp_stack() called schedule() (which
rewrites the GPR block) directly, but luckily it doesn't.
Finally, move cpu0_stack[] into .init, so it can be reclaimed after boot.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 8 Mar 2022 13:47:25 +0000 (13:47 +0000)]
x86/cet: Use dedicated NOP4 for cf_clobber
For livepatching, we need to look at a potentially clobbered function and
determine whether it used to have an ENDBR64 instruction.
Use a non-default 4-byte P6 long nop, not emitted by toolchains, and extend
check-endbr.sh to look for it. The same logic can check for the absence of
any endbr32 instructions, so include a check for those too.
The choice of nop has some complicated consequences. nopw (%rax) has a ModRM
byte of 0, which the Bourne compatible shells unconditionally strip from
parameters, meaning that we can't pass it to `grep -aob`.
Therefore, use nopw (%rcx) so the ModRM byte becomes 1.
This then demonstrates another bug. Under perl regexes, \1 thru \9 are
subpattern matches, and not octal escapes, while the behaviour of \10 and
higher depend on the number of capture groups. Switch the `grep -P` runes to
use hex escapes instead, which are unambiguous.
The build time check then requires that the endbr64 poison have the same
treatment as endbr64 to avoid placing the byte pattern in immediate operands.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 14 Mar 2022 10:30:46 +0000 (10:30 +0000)]
x86/cet: Clear IST supervisor token busy bits on S3 resume
Stacks are not freed across S3. Execution just stops, leaving supervisor
token busy bits active. Fixing this for the primary shadow stack was done
previously, but there is a (rare) risk that an IST token is left busy too, if
the platform power-off happens to intersect with an NMI/#MC arriving. This
will manifest as #DF next time the IST vector gets used.
Introduce rdssp() and wrss() helpers in a new shstk.h, cleaning up
fixup_exception_return() and explaining the trick with the literal 1.
Then this infrastructure to rewrite the IST tokens in load_system_tables()
when all the other IST details are being set up. In the case that an IST
token were left busy across S3, this will clear the busy bit before the stack
gets used.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 17 Mar 2022 16:42:13 +0000 (17:42 +0100)]
x86emul/test: correct VSCALEF{P,S}{S,D} entries in predicates test
I can't see why these would want / need to suppress testing of the
register forms of the insns. Quite likely a copy-and-paste oversight
when originally creating the table.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
George Dunlap [Thu, 10 Mar 2022 21:37:05 +0000 (21:37 +0000)]
MAINTAINERS: Propose Henry Wang as the new release manager
ARM has proposed Henry Wang as a release manager for 4.17. Signify
this by giving him maintainership over CHANGELOG.md.
Below is an introduction given by Bertrand Marquis:
Henry Wang is an open-source software engineer at Arm focusing on the
hypervisor and virtualization technology. Before joining the
AIS-Hypervisor team, he was one of the leading Arm contributors of the
Rust-VMM and the Cloud Hypervisor community. He is the Arm reviewer
of the Cloud Hypervisor project. His work includes basic project
enabling on Arm platform, Arm device emulation, advanced features
support on Arm and bug fixes.
After joining the AIS-Hypervisor team at Arm, he has been involved in Xen feature
development on Arm in various areas, including:
1. Xen Arm MPAM extension research and PoC: Ongoing, the design will
share in xen-devel soon.
2. Port of Xen to Arm MPU systems: Working together with Penny Zheng
on coding and testing, will be soon sent to xen-devel.
3. Static Xen heap on Arm: Work done but depend on the direct mapping
series from Penny Zheng, will be upstreamed in the next weeks.
4. Virtio PoC for Xen on Arm using kvmtool as the Xen virtio backend:
Work done, including the enabling of the virtio and the virtio
performance tuning.
5. Participated in code reviews and discussions in xen-devel,
including the foreign memory mapping series from EPAM, etc.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Henry Wang <Henry.Wang@arm.com> Acked-by: Julien Grall <jgrall@amazon.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
This adds support for serial console as found in a laptop with TGL-LP
(StarBook MkV). Since the device is on the bus 0, it needs to be enabled
via "com1=...,amt", not just "...,pci".
Device specification is in Intel docs 631119-007 and 631120-001.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
xen/arm64: io: Handle the abort due to access to stage1 translation table
If the abort was caused due to access to stage1 translation table, Xen
will try to set the p2m entry (assuming that the Stage 1 translation
table is in the non MMIO region).
If there is no such entry found, then Xen will try to map the address as
a MMIO region (assuming that the Stage 1 translation table is in the
direct MMIO region).
If that fails as well, then there are the two following scenarios:-
1. Stage 1 translation table being in an emulated MMIO region - Xen
can read the region, but it has no way to return the value read to the
CPU page table walker (which tries to go through the stage1 tables to
resolve the translation fault).
2. Stage 1 translation table address is invalid.
In both the above scenarios, Xen will forward the abort to the guest.
xen/arm64: io: Support instructions (for which ISS is not valid) on emulated MMIO region using MMIO/ioreq handler
When an instruction is trapped in Xen due to translation fault, Xen
checks if the ISS is invalid (for data abort) or it is an instruction
abort. If so, Xen tries to resolve the translation fault using p2m page
tables. In case of data abort, Xen will try to map the mmio region to
the guest (ie tries to emulate the mmio region).
If the ISS is not valid and it is a data abort, then Xen tries to
decode the instruction. In case of ioreq, Xen saves the decoding state,
rn and imm9 to vcpu_io. Whenever the vcpu handles the ioreq successfully,
it will read the decoding state to determine if the instruction decoded
was a ldr/str post indexing (ie INSTR_LDR_STR_POSTINDEXING). If so, it
uses these details to post increment rn.
In case of mmio handler, if the mmio operation was successful, then Xen
retrives the decoding state, rn and imm9. For state ==
INSTR_LDR_STR_POSTINDEXING, Xen will update rn.
If there is an error encountered while decoding/executing the instruction,
Xen will forward the abort to the guest.
Also, the logic to infer the type of instruction has been moved from
try_handle_mmio() to try_decode_instruction() which is called before.
try_handle_mmio() is solely responsible for handling the mmio operation.
xen/arm64: Decode ldr/str post increment operations
At the moment, Xen does not decode any of the arm64 instructions. This
means that when hsr_dabt.isv == 0, Xen cannot handle those instructions.
This will lead to Xen to abort the guests (from which those instructions
originate).
With this patch, Xen is able to decode ldr/str post indexing instructions.
These are a subset of instructions for which hsr_dabt.isv == 0.
Jan Beulich [Mon, 14 Mar 2022 09:33:35 +0000 (10:33 +0100)]
x86/build: use --orphan-handling linker option if available
As was e.g. making necessary 4b7fd8153ddf ("x86: fold sections in final
binaries"), arbitrary sections appearing without our linker script
placing them explicitly can be a problem. Have the linker make us aware
of such sections, so we would know that the script needs adjusting.
To deal with the resulting warnings:
- Retain .note.* explicitly for ELF, and discard all of them (except the
earlier consumed .note.gnu.build-id) for PE/COFF.
- Have explicit statements for .got, .plt, and alike and add assertions
that they're empty. No output sections will be created for these as
long as they remain empty (or else the assertions would cause early
failure anyway).
- Collect all .rela.* into a single section, with again an assertion
added for the resulting section to be empty.
- Extend the enumerating of .debug_* to ELF. Note that for Clang adding
of .debug_macinfo is necessary. Amend this by its Dwarf5 counterpart,
.debug_macro, then as well (albeit more may need adding for full
coverage).
- For LLVM ld also enumerate .symtab, .strtab, and .shstrtab.
Suggested-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Mon, 14 Mar 2022 09:32:40 +0000 (10:32 +0100)]
IOMMU/x86: tidy adjust_irq_affinities hook
As of 3e56754b0887 ("xen/cet: Fix __initconst_cf_clobber") there's no
need for a non-void return value anymore, as the hook functions are no
longer themselves passed to __initcall(). For the same reason the
iommu_enabled checks can now move from the individual functions to the
wrapper.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monné [Mon, 14 Mar 2022 09:30:02 +0000 (10:30 +0100)]
pci/ats: do not allow broken devices to be assigned to guests
Introduce a new field to mark devices as broken: having it set prevents
the device from being assigned to guests. Use the field in order to mark
ATS devices that have failed a flush when using VT-d as broken, thus
preventing them to be assigned to any guest.
This allows the device IOMMU context entry to be cleaned up properly, as
calling _pci_hide_device will just change the ownership of the device,
but the IOMMU context entry of the device would be left as-is. It would
also leak a VT-d Domain ID if using one, as removing the device from
its previous owner will allow releasing the IOMMU DID used by the device
without having cleaned up the context entry.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Roger Pau Monné [Mon, 14 Mar 2022 09:29:24 +0000 (10:29 +0100)]
x86/vmx: remove dead code to create domains without a vLAPIC
After the removal of PVHv1 it's no longer supported to create a domain
using hardware virtualization extensions and without a local APIC:
PVHv2 mandates domains to always have a LAPIC. Remove some stale code
in VMCS construction and related helpers that catered for that
use-case.
No functional change.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Mon, 14 Mar 2022 09:27:57 +0000 (10:27 +0100)]
x86/time: further improve TSC / CPU freq calibration accuracy
Calibration logic assumes that the platform timer (HPET or ACPI PM
timer) and the TSC are read at about the same time. This assumption may
not hold when a long latency event (e.g. SMI or NMI) occurs between the
two reads. Reduce the risk of reading uncorrelated values by doing at
least four pairs of reads, using the tuple where the delta between the
enclosing TSC reads was smallest. From the fourth iteration onwards bail
if the new TSC delta isn't better (smaller) than the best earlier one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Roger Pau Monne [Wed, 9 Mar 2022 12:28:46 +0000 (13:28 +0100)]
livepatch: set -f{function,data}-sections compiler option
If livepatching support is enabled build the hypervisor with
-f{function,data}-sections compiler options, which is required by the
livepatching tools to detect changes and create livepatches.
This shouldn't result in any functional change on the hypervisor
binary image, but does however require some changes in the linker
script in order to handle that each function and data item will now be
placed into its own section in object files. As a result add catch-all
for .text, .data and .bss in order to merge each individual item
section into the final image.
The main difference will be that .text.startup will end up being part
of .text rather than .init, and thus won't be freed. .text.exit will
also be part of .text rather than dropped. Overall this could make the
image bigger, and package some .text code in a sub-optimal way.
On Arm the .data.read_mostly needs to be moved ahead of the .data
section like it's already done on x86, so the .data.* catch-all
doesn't also include .data.read_mostly. The alignment of
.data.read_mostly also needs to be set to PAGE_SIZE so it doesn't end
up being placed at the tail of a read-only page from the previous
section. While there move the alignment of the .data section ahead of
the section declaration, like it's done for other sections.
The benefit of having CONFIG_LIVEPATCH enable those compiler option
is that the livepatch build tools no longer need to fiddle with the
build system in order to enable them. Note the current livepatch tools
are broken after the recent build changes due to the way they
attempt to set -f{function,data}-sections.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Wed, 9 Mar 2022 12:28:45 +0000 (13:28 +0100)]
xen/build: put image header into a separate section
So it can be explicitly placed ahead of the rest of the .text content
in the linker script (and thus the resulting image). This is a
prerequisite for further work that will add a catch-all to the text
section (.text.*).
Note that placement of the sections inside of .text is also slightly
adjusted to be more similar to the position found in the default GNU
ld linker script.
The special handling of the object file containing the header data as
the first object file passed to the linker command line can also be
removed.
While there also remove the special handling of efi/ on x86. There's
no need for the resulting object file to be passed in any special
order to the linker.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Andrew Cooper [Mon, 7 Mar 2022 20:19:18 +0000 (20:19 +0000)]
x86/kexec: Fix kexec-reboot with CET active
The kexec_reloc() asm has an indirect jump to relocate onto the identity
trampoline. While we clear CET in machine_crash_shutdown(), we fail to clear
CET for the non-crash path. This in turn highlights that the same is true of
resetting the CPUID masking/faulting.
Move both pieces of logic from machine_crash_shutdown() to machine_kexec(),
the latter being common for all kexec transitions. Adjust the condition for
CET being considered active to check in CR4, which is simpler and more robust.
Fixes: 311434bfc9d1 ("x86/setup: Rework MSR_S_CET handling for CET-IBT") Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks") Fixes: 5ab9564c6fa1 ("x86/cpu: Context switch cpuid masks and faulting state in context_switch()") Reported-by: David Vrabel <dvrabel@amazon.co.uk> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: David Vrabel <dvrabel@amazon.co.uk>
Bjoern Doebel [Wed, 9 Mar 2022 15:22:03 +0000 (16:22 +0100)]
livepatch: resolve old address before function verification
When verifying that a livepatch can be applied, we may as well want to
inspect the target function to be patched. To do so, we need to resolve
this function's address before running the arch-specific
livepatch_verify hook.
Signed-off-by: Bjoern Doebel <doebel@amazon.de> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Roger Pau Monné [Wed, 9 Mar 2022 15:21:01 +0000 (16:21 +0100)]
vpci/msix: fix PBA accesses
Map the PBA in order to access it from the MSI-X read and write
handlers. Note that previously the handlers would pass the physical
host address into the {read,write}{l,q} handlers, which is wrong as
those expect a linear address.
Map the PBA using ioremap when the first access is performed. Note
that 32bit arches might want to abstract the call to ioremap into a
vPCI arch handler, so they can use a fixmap range to map the PBA.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Alex Olson <Alex.Olson@starlab.io>
Andrew Cooper [Mon, 7 Mar 2022 16:35:52 +0000 (16:35 +0000)]
x86/spec-ctrl: Cease using thunk=lfence on AMD
AMD have updated their Spectre v2 guidance, and lfence/jmp is no longer
considered safe. AMD are recommending using retpoline everywhere.
Retpoline is incompatible with CET. All CET-capable hardware has efficient
IBRS (specifically, not something retrofitted in microcode), so use IBRS (and
STIBP for consistency sake).
This is a logical change on AMD, but not on Intel as the default calculations
would end up with these settings anyway. Leave behind a message if IBRS is
found to be missing.
Also update the default heuristics to never select THUNK_LFENCE. This causes
AMD CPUs to change their default to retpoline.
Also update the printed message to include the AMD MSR_SPEC_CTRL settings, and
STIBP now that we set it for consistency sake.
This is part of XSA-398 / CVE-2021-26401.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Bertrand Marquis [Thu, 17 Feb 2022 14:52:54 +0000 (14:52 +0000)]
xen/arm: Allow to discover and use SMCCC_ARCH_WORKAROUND_3
Allow guest to discover whether or not SMCCC_ARCH_WORKAROUND_3 is
supported and create a fastpath in the code to handle guests request to
do the workaround.
The function SMCCC_ARCH_WORKAROUND_3 will be called by the guest for
flushing the branch history. So we want the handling to be as fast as
possible.
As the mitigation is applied on every guest exit, we can check for the
call before saving all context and return very early.
Rahul Singh [Mon, 14 Feb 2022 18:47:32 +0000 (18:47 +0000)]
xen/arm: Add Spectre BHB handling
This commit is adding Spectre BHB handling to Xen on Arm.
The commit is introducing new alternative code to be executed during
exception entry:
- SMCC workaround 3 call
- loop workaround (with 8, 24 or 32 iterations)
- use of new clearbhb instruction
Cpuerrata is modified by this patch to apply the required workaround for
CPU affected by Spectre BHB when CONFIG_ARM64_HARDEN_BRANCH_PREDICTOR is
enabled.
To do this the system previously used to apply smcc workaround 1 is
reused and new alternative code to be copied in the exception handler is
introduced.
To define the type of workaround required by a processor, 4 new cpu
capabilities are introduced (for each number of loop and for smcc
workaround 3).
When a processor is affected, enable_spectre_bhb_workaround is called
and if the processor does not have CSV2 set to 3 or ECBHB feature (which
would mean that the processor is doing what is required in hardware),
the proper code is enabled at exception entry.
In the case where workaround 3 is not supported by the firmware, we
enable workaround 1 when possible as it will also mitigate Spectre BHB
on systems without CSV2.
Bertrand Marquis [Wed, 23 Feb 2022 09:42:18 +0000 (09:42 +0000)]
xen/arm: Add ECBHB and CLEARBHB ID fields
Introduce ID coprocessor register ID_AA64ISAR2_EL1.
Add definitions in cpufeature and sysregs of ECBHB field in mmfr1 and
CLEARBHB in isar2 ID coprocessor registers.
Bertrand Marquis [Tue, 15 Feb 2022 10:39:47 +0000 (10:39 +0000)]
xen/arm: move errata CSV2 check earlier
CSV2 availability check is done after printing to the user that
workaround 1 will be used. Move the check before to prevent saying to the
user that workaround 1 is used when it is not because it is not needed.
This will also allow to reuse install_bp_hardening_vec function for
other use cases.
Code previously returning "true", now returns "0" to conform to
enable_smccc_arch_workaround_1 returning an int and surrounding code
doing a "return 0" if workaround is not needed.
Andrew Cooper [Mon, 7 Mar 2022 12:34:48 +0000 (12:34 +0000)]
x86/cet: Force -fno-jump-tables for CET-IBT
Both GCC and Clang have a (mis)feature where, even with
-fcf-protection=branch, jump tables are created using a notrack jump rather
than using endbr's in each case statement.
This is incompatible with the safety properties we want in Xen, and enforced
by not setting MSR_S_CET.NOTRACK_EN. The consequence is a fatal #CP[endbr].
-fno-jump-tables is generally active as a side effect of
CONFIG_INDIRECT_THUNK (retpoline), but as of c/s 95d9ab461436 ("x86/Kconfig:
introduce option to select retpoline usage"), we explicitly support turning
retpoline off.
Fixes: 3667f7f8f7c4 ("x86: Introduce support for CET-IBT") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Lasse Collin [Mon, 7 Mar 2022 08:09:26 +0000 (09:09 +0100)]
xz: move s->lzma.len = 0 initialization to lzma_reset()
It's a more logical place even if the resetting needs to be done
only once per LZMA2 stream (if lzma_reset() called in the middle
of an LZMA2 stream, .len will already be 0).
Link: https://lore.kernel.org/r/20211010213145.17462-4-xiang@kernel.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git a98a25408b0e Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Lasse Collin [Mon, 7 Mar 2022 08:08:54 +0000 (09:08 +0100)]
xz: validate the value before assigning it to an enum variable
This might matter, for example, if the underlying type of enum xz_check
was a signed char. In such a case the validation wouldn't have caught an
unsupported header. I don't know if this problem can occur in the kernel
on any arch but it's still good to fix it because some people might copy
the XZ code to their own projects from Linux instead of the upstream
XZ Embedded repository.
This change may increase the code size by a few bytes. An alternative
would have been to use an unsigned int instead of enum xz_check but
using an enumeration looks cleaner.
Link: https://lore.kernel.org/r/20211010213145.17462-3-xiang@kernel.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4f8d7abaa413 Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Lasse Collin [Mon, 7 Mar 2022 08:08:08 +0000 (09:08 +0100)]
xz: avoid overlapping memcpy() with invalid input with in-place decompression
From: Lasse Collin <lasse.collin@tukaani.org>
With valid files, the safety margin described in lib/decompress_unxz.c
ensures that these buffers cannot overlap. But if the uncompressed size
of the input is larger than the caller thought, which is possible when
the input file is invalid/corrupt, the buffers can overlap. Obviously
the result will then be garbage (and usually the decoder will return
an error too) but no other harm will happen when such an over-run occurs.
This change only affects uncompressed LZMA2 chunks and so this
should have no effect on performance.
Link: https://lore.kernel.org/r/20211010213145.17462-2-xiang@kernel.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 83d3c4f22a36 Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Lasse Collin [Mon, 7 Mar 2022 08:06:31 +0000 (09:06 +0100)]
xz: fix XZ_DYNALLOC to avoid useless memory reallocations
s->dict.allocated was initialized to 0 but never set after a successful
allocation, thus the code always thought that the dictionary buffer has
to be reallocated.
Link: http://lkml.kernel.org/r/20191104185107.3b6330df@tukaani.org Reported-by: Yu Sun <yusun2@cisco.com> Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Acked-by: Daniel Walker <danielwa@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Origin: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 8e20ba2e53fc Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Luca Fancellu <luca.fancellu@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 7 Mar 2022 07:59:46 +0000 (08:59 +0100)]
x86/tboot: adjust Kconfig default
We shouldn't include unsupported code by default, with not even a means
for its building to be disabled. Convert the dependency from merely
affecting the prompt's visibility to a real one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Jan Beulich [Fri, 4 Mar 2022 09:49:22 +0000 (10:49 +0100)]
x86: also discard .fini_array in linker script
This simply parallels .dtors. Both section types can reference
.text.exit, which requires them to be discarded together with that one.
Compilers, depending on their findings during the configure phase, may
elect to use either model. While .{init,fini}_array look to be
preferred, cross compilers apparently have this guessed, likely
resulting in a fallback to .{c,d}tors. Hence we need to support both
sets.
Fixes: 4b7fd8153ddf ("x86: fold sections in final binaries") Reported-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Fri, 4 Mar 2022 08:29:42 +0000 (09:29 +0100)]
x86emul/test: correct VRNDSCALES{S,D} entries in predicates test
While benign (because only the decoder is exercised here, whereas a
wrong EVEX.W would cause an exception only during actual emulation),
let's still have correct information in the table entries.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 4 Mar 2022 08:28:09 +0000 (09:28 +0100)]
x86/time: add CF-clobber annotations
With bed9ae54df44 ("x86/time: switch platform timer hooks to altcall")
in place we can further arrange for ENDBR removal from the functions no
longer subject to indirect calls. Note that plt_tsc is adjusted as well,
despite presently not holding any pointer eligible for ENDBR removal.
This is just to be on the safe side going forward.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
When overriding the tool chain via CROSS_COMPILE, the resulting
components need to be made available to, in particular (but not limited
to) the check-endbr.sh script. Note that we don't allow overriding
ADDR2LINE yet; this would first require additions to some config/*.mk
before it would make sense to export the resulting variable as well.
The lack of NM exporting was apparently not a problem so far, but add it
at this occasion as well - we're using the tool, after all.
Fixes: 4d037425dccf ("x86: Build check for embedded endbr64 instructions") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Fri, 4 Mar 2022 08:25:39 +0000 (09:25 +0100)]
build,x86: remove the need for build32.mk
Rework "arch/x86/boot/Makefile" to allow it to build both file
"cmdline.S" and "reloc.S" without "build32.mk".
These will now use the main rules for "%.o: %.c", and thus generate a
dependency file. (We will not need to track the dependency manually
anymore.)
But for that, we need to override the main CFLAGS to do a 32bit build.
We introduce XEN_TREEWIDE_CFLAGS which can be reused in boot/Makefile,
and avoid the need to reparse Config.mk with a different value for
XEN_TARGET_ARCH. From this new $(XEN_TREEWIDE_CFLAGS), we only need to
change -m64 to have the 32bit flags. Then those are applied only to
"cmdline.o" and "reloc.o".
Specifically apply the rule "%.S: %.bin" to both cmdline.S and reloc.S
to avoid make trying to regenerate other %.S files with it.
There is no change expected to the resulting "cmdline.S" and
"reloc.S", only the *.o file changes as their symbol for FILE goes
from "cmdline.c" to "arch/x86//cmdline.c". (No idea why "boot" is
missing from the string.) (I've only check with GCC, not clang.)
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Michal Orzel [Wed, 2 Mar 2022 09:59:11 +0000 (10:59 +0100)]
xen/arm: gic: Introduce GIC_PRI_{IRQ/IPI}_ALL
Introduce macros GIC_PRI_IRQ_ALL and GIC_PRI_IPI_ALL to be used in all
the places where we want to set default priority for all the offsets
in interrupt priority register. This will improve readability and
allow to get rid of introducing variables just to store this value.
Take the opportunity to mark GIC_PRI_{IRQ/IPI} as unsigned values
to suppress static analyzer warnings as they are used in expressions
exceeding integer range (shifting into signed bit). Modify also other
priority related macros to be coherent.
Signed-off-by: Michal Orzel <michal.orzel@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
Andrew Cooper [Wed, 2 Mar 2022 20:27:46 +0000 (20:27 +0000)]
xen/cet: Fix __initconst_cf_clobber
The linker script collecting .init.rodata.* ahead of .init.rodata.cf_clobber
accidentally causes __initconst_cf_clobber to be a no-op.
Rearrange the linker script to unbreak this.
The IOMMU adjust_irq_affinities() hooks currently violate the safety
requirement for being cf_clobber, by also being plain __initcall()'s.
Consolidate to a single initcall using the iommu_adjust_irq_affinities()
wrapper (satisfying the cf_clobber safety requirement by using iommu_call()
under the hood), and also removes the dubious property that we'd call into
both vendors IOMMU drivers on boot, relying on the for_each_*() loops to be
empty for safety.
With this fixed, an all-enabled build of Xen has 1681 endbr64's in .text with
382 (23%) being clobbered during boot.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 2 Mar 2022 08:29:55 +0000 (09:29 +0100)]
x86: fold sections in final binaries
Especially when linking a PE binary (xen.efi), standalone output
sections are expensive: Often the linker will align the subsequent one
on the section alignment boundary (2Mb) when the linker script doesn't
otherwise place it. (I haven't been able to derive from observed
behavior under what conditions it would not do so.)
With gcov enabled (and with gcc11) I'm observing enough sections that,
as of quite recently, the resulting image doesn't fit in 16Mb anymore,
failing the final ASSERT() in the linker script. (That assertion is
slated to go away, but that's a separate change.)
Any destructor related sections can be discarded, as we never "exit"
the hypervisor. This includes .text.exit, which is referenced from
.dtors.*. Constructor related sections need to all be taken care of, not
just those with historically used names: .ctors.* and .text.startup is
what gcc11 populates. While there re-arrange ordering / sorting to match
that used by the linker provided scripts.
Finally, for xen.efi only, also discard .note.gnu.*. These are
meaningless in a PE binary. Quite likely, while not meaningless there,
the section is also of no use in ELF, but keep it there for now.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Wed, 2 Mar 2022 08:28:51 +0000 (09:28 +0100)]
x86/altcall: silence undue warning
Suitable compiler options are passed only when the actual feature
(XEN_IBT) is enabled, not when merely the compiler capability was found
to be available.
Fixes: 12e3410e071e ("x86/altcall: Check and optimise altcall targets") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 2 Mar 2022 08:28:06 +0000 (09:28 +0100)]
docs: correct "gnttab=" documented default
Defaults differ for Arm and x86, not the least because of v2 not even
being security supported on Arm.
Also drop a bogus sentence from gnttab_max_maptrack_frames, which was
presumably mistakenly cloned from gnttab_max_frames (albeit even there
what is being said is neither very precise nor very useful imo).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Julien Grall [Tue, 1 Mar 2022 19:56:15 +0000 (19:56 +0000)]
xen/arm32: head: Mark the end of subroutines with ENDPROC (take two)
Commit 2ac705a59ef5 ("xen/arm32: head: Mark the end of subroutines
with ENDPROC") intended to mark all the subroutines with ENDPROC.
Unfortunately, I missed fail(), switch_ttbr(), init_uart() and
__lookup_processor_type(). Add ENDPROC for the benefits of
static analysis tools and the reader.
Julien Grall [Sun, 27 Feb 2022 19:21:59 +0000 (19:21 +0000)]
xen/arm: lpae: Use the generic helpers to defined the Xen PT helpers
Currently, Xen PT helpers are only working with 4KB page granularity
and open-code the generic helpers. To allow more flexibility, we can
re-use the generic helpers and pass Xen's page granularity
(PAGE_SHIFT).
As Xen PT helpers are used in both C and assembly, we need to move
the generic helpers definition outside of the !__ASSEMBLY__ section.
Take the opportunity to prefix LPAE_SHIFT, LPAE_ENTRIES and
LPAE_ENTRY_MASK with XEN_PT_.
Note the aliases for each level are still kept for the time being so we
can avoid a massive patch to change all the callers.
Julien Grall [Sun, 27 Feb 2022 19:21:58 +0000 (19:21 +0000)]
xen/arm: lpae: Rename LPAE_ENTRIES_MASK_GS to LPAE_ENTRY_MASK_GS
Commit 05031fa87357 "xen/arm: guest_walk: Only generate necessary
offsets/masks" introduced LPAE_ENTRIES_MASK_GS. In a follow-up patch,
we will use it to define LPAE_ENTRY_MASK.
This will lead to inconsistent naming. As LPAE_ENTRY_MASK is used in
many places, it is better to rename LPAE_ENTRIES_MASK_GS and avoid
some churn.
So rename LPAE_ENTRIES_MASK_GS to LPAE_ENTRY_MASK_GS.
Anthony PERARD [Fri, 25 Feb 2022 14:54:08 +0000 (14:54 +0000)]
build: fix auto defconfig rule
We should only run "defconfig" if ".config" is missing. Commit 317c98cb91 have added a dependency on "tools/fixdep", so make would
start runnning "defconfig" also when "tools/fixdep" is newer than
".config" and thus overwrite any changes made by a developer.
Reintroduce intended behavior of the rule to only generate a default
Kconfig when ".config" is missing.
Fixes: 317c98cb91 ("build: hook kconfig into xen build system") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
and then later (on at least two Intel TigerLake platforms), the next HVM vCPU
to be scheduled on the BSP dies with:
(XEN) d1v0 Unexpected vmexit: reason 3
(XEN) domain_crash called from vmx.c:4304
(XEN) Domain 1 (vcpu#0) crashed on cpu#0:
The VMExit reason is EXIT_REASON_INIT, which has nothing to do with the
scheduled vCPU, and will be addressed in a subsequent patch. It is a
consequence of the APs triple faulting.
The reason the APs triple fault is because we don't tear down the stacks on
suspend. The idle/play_dead loop is killed in the middle of running, meaning
that the supervisor token is left busy.
On resume, SETSSBSY finds busy bit set, suffers #CP and triple faults because
the IDT isn't configured this early.
Rework the AP bring-up path to (re)create the supervisor token. This ensures
the primary stack is non-busy before use.
Note: There are potential issues with the IST shadow stacks too, but fixing
those is more involved.
Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks") Link: https://github.com/QubesOS/qubes-issues/issues/7283 Reported-by: Thiner Logoer <logoerthiner1@163.com> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Thiner Logoer <logoerthiner1@163.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Juergen Gross [Fri, 25 Feb 2022 10:10:19 +0000 (11:10 +0100)]
xen/public: add comment to struct xen_mem_acquire_resource
Commit 7c7f7e8fba01 changed xen/include/public/memory.h in an incompatible
way. Unfortunately the changed parts were already in use in the Linux
kernel, so an update of the header in the kernel would result in a build
breakage.
As the change of above commit was in a section originally meant to be not
stable, it was the usage in the kernel which was wrong.
Add a comment to the modified struct for not reusing the now removed bit,
in order to avoid kernels using it stumbling over a possible new meaning
of the bit.
In case the kernel is updating to a new version of the header, the wrong
use case must be removed first.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>