Roger Pau Monne [Tue, 1 Mar 2022 11:34:54 +0000 (12:34 +0100)]
livepatch: set -f{function,data}-sections compiler option
If livepatching support is enabled, build the hypervisor with the
-f{function,data}-sections compiler options, which are required by the
livepatching tools to detect changes and create livepatches.
This shouldn't result in any functional change on the hypervisor
binary image, but does however require some changes in the linker
script in order to handle that each function and data item will now be
placed into its own section in object files. As a result, add catch-alls
for .text, .data and .bss in order to merge each individual item
section into the final image.
The main difference will be that .text.startup will end up being part
of .text rather than .init, and thus won't be freed. .text.exit will
also be part of .text rather than dropped. Overall this could make the
image bigger, and package some .text code in a sub-optimal way.
Note that placement of the sections inside of .text is also slightly
adjusted to be more similar to the position found in the default GNU
ld linker script.
The benefit of having CONFIG_LIVEPATCH enable those compiler options
is that the livepatch build tools no longer need to fiddle with the
build system in order to enable them. Note the current livepatch tools
are broken after the recent build changes due to the way they
attempt to set -f{function,data}-sections.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
Changes since v1:
- Introduce CC_SPLIT_SECTIONS for selecting the compiler options.
- Drop check for compiler options, all supported versions have them.
- Re-arrange section placement in .text, to match the default linker
script.
- Introduce .text.header to contain the headers bits that must appear
first in the final binary.
Jan Beulich [Wed, 2 Mar 2022 08:29:55 +0000 (09:29 +0100)]
x86: fold sections in final binaries
Especially when linking a PE binary (xen.efi), standalone output
sections are expensive: Often the linker will align the subsequent one
on the section alignment boundary (2Mb) when the linker script doesn't
otherwise place it. (I haven't been able to derive from observed
behavior under what conditions it would not do so.)
With gcov enabled (and with gcc11) I'm observing enough sections that,
as of quite recently, the resulting image doesn't fit in 16Mb anymore,
failing the final ASSERT() in the linker script. (That assertion is
slated to go away, but that's a separate change.)
Any destructor related sections can be discarded, as we never "exit"
the hypervisor. This includes .text.exit, which is referenced from
.dtors.*. Constructor related sections need to all be taken care of, not
just those with historically used names: .ctors.* and .text.startup is
what gcc11 populates. While there re-arrange ordering / sorting to match
that used by the linker provided scripts.
Finally, for xen.efi only, also discard .note.gnu.*. These are
meaningless in a PE binary. Quite likely, while not meaningless there,
the section is also of no use in ELF, but keep it there for now.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Wed, 2 Mar 2022 08:28:51 +0000 (09:28 +0100)]
x86/altcall: silence undue warning
Suitable compiler options are passed only when the actual feature
(XEN_IBT) is enabled, not when merely the compiler capability was found
to be available.
Fixes: 12e3410e071e ("x86/altcall: Check and optimise altcall targets") Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 2 Mar 2022 08:28:06 +0000 (09:28 +0100)]
docs: correct "gnttab=" documented default
Defaults differ for Arm and x86, not the least because of v2 not even
being security supported on Arm.
Also drop a bogus sentence from gnttab_max_maptrack_frames, which was
presumably mistakenly cloned from gnttab_max_frames (albeit even there
what is being said is neither very precise nor very useful imo).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Julien Grall [Tue, 1 Mar 2022 19:56:15 +0000 (19:56 +0000)]
xen/arm32: head: Mark the end of subroutines with ENDPROC (take two)
Commit 2ac705a59ef5 ("xen/arm32: head: Mark the end of subroutines
with ENDPROC") intended to mark all the subroutines with ENDPROC.
Unfortunately, I missed fail(), switch_ttbr(), init_uart() and
__lookup_processor_type(). Add ENDPROC for the benefit of
static analysis tools and the reader.
Julien Grall [Sun, 27 Feb 2022 19:21:59 +0000 (19:21 +0000)]
xen/arm: lpae: Use the generic helpers to define the Xen PT helpers
Currently, Xen PT helpers are only working with 4KB page granularity
and open-code the generic helpers. To allow more flexibility, we can
re-use the generic helpers and pass Xen's page granularity
(PAGE_SHIFT).
As Xen PT helpers are used in both C and assembly, we need to move
the generic helpers definition outside of the !__ASSEMBLY__ section.
Take the opportunity to prefix LPAE_SHIFT, LPAE_ENTRIES and
LPAE_ENTRY_MASK with XEN_PT_.
Note the aliases for each level are still kept for the time being so we
can avoid a massive patch to change all the callers.
Julien Grall [Sun, 27 Feb 2022 19:21:58 +0000 (19:21 +0000)]
xen/arm: lpae: Rename LPAE_ENTRIES_MASK_GS to LPAE_ENTRY_MASK_GS
Commit 05031fa87357 "xen/arm: guest_walk: Only generate necessary
offsets/masks" introduced LPAE_ENTRIES_MASK_GS. In a follow-up patch,
we will use it to define LPAE_ENTRY_MASK.
This will lead to inconsistent naming. As LPAE_ENTRY_MASK is used in
many places, it is better to rename LPAE_ENTRIES_MASK_GS and avoid
some churn.
So rename LPAE_ENTRIES_MASK_GS to LPAE_ENTRY_MASK_GS.
Anthony PERARD [Fri, 25 Feb 2022 14:54:08 +0000 (14:54 +0000)]
build: fix auto defconfig rule
We should only run "defconfig" if ".config" is missing. Commit 317c98cb91 added a dependency on "tools/fixdep", so make would
start running "defconfig" also when "tools/fixdep" is newer than
".config" and thus overwrite any changes made by a developer.
Reintroduce intended behavior of the rule to only generate a default
Kconfig when ".config" is missing.
Fixes: 317c98cb91 ("build: hook kconfig into xen build system") Reported-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
and then later (on at least two Intel TigerLake platforms), the next HVM vCPU
to be scheduled on the BSP dies with:
(XEN) d1v0 Unexpected vmexit: reason 3
(XEN) domain_crash called from vmx.c:4304
(XEN) Domain 1 (vcpu#0) crashed on cpu#0:
The VMExit reason is EXIT_REASON_INIT, which has nothing to do with the
scheduled vCPU, and will be addressed in a subsequent patch. It is a
consequence of the APs triple faulting.
The APs triple fault because we don't tear down the stacks on
suspend. The idle/play_dead loop is killed in the middle of running, meaning
that the supervisor token is left busy.
On resume, SETSSBSY finds the busy bit set, suffers #CP and triple faults because
the IDT isn't configured this early.
Rework the AP bring-up path to (re)create the supervisor token. This ensures
the primary stack is non-busy before use.
Note: There are potential issues with the IST shadow stacks too, but fixing
those is more involved.
Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks") Link: https://github.com/QubesOS/qubes-issues/issues/7283 Reported-by: Thiner Logoer <logoerthiner1@163.com> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Thiner Logoer <logoerthiner1@163.com> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
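The failure mode described above can be modelled in a few lines. This is only a simplified sketch, not Xen code: it assumes a free supervisor shadow stack token holds its own address with bit 0 (busy) clear, and the names `setssbsy`, `recreate_token`, `TOKEN_ADDR` and `ControlProtectionFault` are hypothetical stand-ins for the instruction, the reworked bring-up step, the token location and #CP.

```python
BUSY = 0x1  # bit 0 of a supervisor shadow stack token marks it busy


class ControlProtectionFault(Exception):
    """Stands in for #CP in this model."""


def setssbsy(memory, token_addr):
    """Model of SETSSBSY: claim the token, faulting if it is not free."""
    if memory[token_addr] != token_addr:
        # Busy bit already set (or the token is otherwise invalid).
        raise ControlProtectionFault(hex(memory[token_addr]))
    memory[token_addr] = token_addr | BUSY


def recreate_token(memory, token_addr):
    """Hypothetical bring-up step: rewrite the token as free before use."""
    memory[token_addr] = token_addr


TOKEN_ADDR = 0x8000_0FF8  # hypothetical, 8-byte aligned
memory = {TOKEN_ADDR: TOKEN_ADDR | BUSY}  # state left behind by suspend

# Without the fix: the stale busy bit makes the AP fault on resume.
try:
    setssbsy(memory, TOKEN_ADDR)
    resumed_without_fix = True
except ControlProtectionFault:
    resumed_without_fix = False

# With the fix: the token is (re)created first, so SETSSBSY succeeds.
recreate_token(memory, TOKEN_ADDR)
setssbsy(memory, TOKEN_ADDR)
resumed_with_fix = memory[TOKEN_ADDR] == (TOKEN_ADDR | BUSY)
```

In the real system the fault is fatal because no IDT is installed yet; the model only shows why recreating the token removes the precondition for the fault.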
Juergen Gross [Fri, 25 Feb 2022 10:10:19 +0000 (11:10 +0100)]
xen/public: add comment to struct xen_mem_acquire_resource
Commit 7c7f7e8fba01 changed xen/include/public/memory.h in an incompatible
way. Unfortunately the changed parts were already in use in the Linux
kernel, so an update of the header in the kernel would result in a build
breakage.
As the change of above commit was in a section originally meant to be not
stable, it was the usage in the kernel which was wrong.
Add a comment to the modified struct for not reusing the now removed bit,
in order to avoid kernels using it stumbling over a possible new meaning
of the bit.
In case the kernel is updating to a new version of the header, the wrong
use case must be removed first.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 25 Feb 2022 10:09:21 +0000 (11:09 +0100)]
x86/altp2m: p2m_altp2m_propagate_change() should honor present page order
For higher order mappings the comparison against p2m->min_remapped_gfn
needs to take the upper bound of the covered GFN range into account, not
just the base GFN. Otherwise, i.e. when dropping a mapping overlapping
the remapped range but the base GFN outside of that range, an altp2m may
wrongly not get reset.
Note that there's no need to call get_gfn_type_access() ahead of the
check against the remapped range boundaries: None of its outputs are
needed earlier, and p2m_reset_altp2m() doesn't require the lock to be
held. In fact this avoids a latent lock order violation: With per-GFN
locking p2m_reset_altp2m() not only doesn't require the GFN lock to be
held, but holding such a lock would actually not be allowed, as the
function acquires a P2M lock.
Local variables are moved into the more narrow scope (one is deleted
altogether) to help see their actual life ranges.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tamas K Lengyel <tamas@tklengyel.com>
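The range comparison in the first paragraph above can be illustrated with a small sketch. It is not the Xen implementation: `base_only_check`, `range_check` and `max_remapped_gfn` are names assumed here for illustration, and only `min_remapped_gfn` appears in the commit message.

```python
def base_only_check(gfn, min_remapped_gfn, max_remapped_gfn):
    """Old-style test: only the base GFN is compared against the range."""
    return min_remapped_gfn <= gfn <= max_remapped_gfn


def range_check(gfn, page_order, min_remapped_gfn, max_remapped_gfn):
    """Order-aware test: the whole [gfn, last_gfn] span is compared."""
    last_gfn = gfn + (1 << page_order) - 1
    return gfn <= max_remapped_gfn and last_gfn >= min_remapped_gfn


# A 2M mapping (order 9) based at GFN 0x200 covers 0x200..0x3ff.
# The remapped range 0x300..0x310 lies inside it, but above the base GFN.
missed = base_only_check(0x200, 0x300, 0x310)
caught = range_check(0x200, 9, 0x300, 0x310)
```

With these inputs the base-only test reports no overlap while the order-aware test does, which is the case where an altp2m would wrongly not get reset.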
Anthony PERARD [Fri, 25 Feb 2022 10:07:52 +0000 (11:07 +0100)]
build: rework coverage and ubsan CFLAGS handling
When assigning a value to a target-specific variable, that also affects
the prerequisites of the target. This is mostly fine, but there is one case
where we do not want COV_FLAGS added to the CFLAGS.
In arch/x86/boot, we have "head.o" with "cmdline.S" as a prerequisite
and ultimately "cmdline.o"; we don't want COV_FLAGS applied to that last one.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Fri, 25 Feb 2022 10:07:21 +0000 (11:07 +0100)]
build: rework "clean" to clean from the root dir
This will allow "clean" to work from an out-of-tree build once
that becomes available.
Some of the files being removed in the current "clean" target aren't added
to $(clean-files) because they are already listed in $(extra-) or
$(extra-y).
Also start to clean files listed in $(targets). This allows cleaning
"common/config_data.S" and "xsm/flask/flask-policy.S" without
having to list them a second time.
Also clean files in "arch/x86/boot" from that directory by allowing
"clean" to descend into the subdir by adding "boot" into $(subdir-).
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com> # XSM Acked-by: Julien Grall <jgrall@amazon.com>
Anthony PERARD [Fri, 25 Feb 2022 10:07:13 +0000 (11:07 +0100)]
build: clean-up "clean" rules of duplication
All those files are already removed by the main Makefile,
either by the "find" command or directly (for $(TARGET).efi).
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com> # XSM
Anthony PERARD [Fri, 25 Feb 2022 10:07:04 +0000 (11:07 +0100)]
build: generate x86's asm-macros.h with filechk
When we build out-of-tree, make is going to try to generate
"asm-macros.h" before the directory "arch/x86/include/asm" exists,
thus we would need to call `mkdir` explicitly. We will use "filechk"
for that as it does everything that the current recipe does and also
calls `mkdir`.
Also, there are no more "*.new" files generated in this directory.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Fri, 25 Feb 2022 10:03:35 +0000 (11:03 +0100)]
build: hook kconfig into xen build system
Now that Xen's build system is very close to Linux's, we can hook
"Makefile.host" into Xen's build system, and we can build Kconfig with
that.
"tools/kconfig/Makefile" now needs a workaround to not rebuild
"$(XEN_ROOT)/.config", as `make` tries the rules "%.config" which
fails with:
tools/kconfig/Makefile:95: *** No configuration exists for this target on this architecture. Stop.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Fri, 25 Feb 2022 10:03:17 +0000 (11:03 +0100)]
build: rename __LINKER__ to LINKER_SCRIPT
For two reasons: this macro is used to generate a "linker script" and
is not used by the linker, and names starting with an underscore '_' are
supposed to be reserved, so better to avoid them when not needed.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <jgrall@amazon.com>
Anthony PERARD [Fri, 25 Feb 2022 10:01:51 +0000 (11:01 +0100)]
build: introduce if_changed_deps
This macro compares the command line like if_changed, but it also
rewrites the dependencies generated by $(CC) in order to depend on a
CONFIG_* as generated by kconfig instead of depending on autoconf.h.
This allows making a change in kconfig options and only rebuilding the
objects that use that CONFIG_* option.
cmd_and_record isn't needed anymore as it is replaced by
cmd_and_fixdep.
There's only one .*.d dependency file left, which is explicitly
included as a workaround; all the others have been absorbed into the .*.cmd
dependency files via `fixdep`. So including .*.d can be removed from
the makefile.
Also adjust the "cloc" recipe due to .*.d being replaced by .*.cmd files.
This imports fixdep.c and if_changed_deps macro from Linux v5.12.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
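The dependency rewriting described above can be sketched roughly as follows. This is not fixdep itself: the function `rewrite_deps`, the `include/generated/autoconf.h` location and the `include/config/<NAME>` layout are assumptions made for illustration only.

```python
import re


def rewrite_deps(deps, source_text, autoconf="include/generated/autoconf.h"):
    """Swap the blanket autoconf.h dependency for per-option dependencies.

    deps        -- dependency paths as emitted by the compiler
    source_text -- contents of the source file being compiled
    autoconf    -- assumed path of the generated autoconf header
    """
    # Collect every CONFIG_* option the source actually mentions.
    options = sorted(set(re.findall(r"\bCONFIG_([A-Za-z0-9_]+)", source_text)))
    # Drop the header that changes whenever any option changes ...
    kept = [d for d in deps if d != autoconf]
    # ... and depend on one (hypothetical) marker file per option instead.
    return kept + [f"include/config/{name}" for name in options]


source = """
#ifdef CONFIG_LIVEPATCH
void livepatch_hook(void);
#endif
#if defined(CONFIG_HVM)
void hvm_hook(void);
#endif
"""
deps = ["foo.c", "include/generated/autoconf.h", "include/xen/lib.h"]
new_deps = rewrite_deps(deps, source)
```

With per-option dependencies, toggling an option that a file never mentions no longer makes that file look out of date.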
Anthony PERARD [Fri, 25 Feb 2022 10:01:15 +0000 (11:01 +0100)]
build: build everything from the root dir, use obj=$subdir
A subdirectory is now built by setting "$(obj)" instead of changing
directory. "$(obj)" should always be set when using "Rules.mk" and
thus a shortcut "$(build)" is introduced and should be used.
A new variable "$(need-builtin)" is introduced. It is to be used
whenever a "built_in.o" is wanted from a subdirectory. "built_in.o"
isn't the main target anymore, and thus only needs to depend on the
objects that should be part of "built_in.o".
Introduce $(srctree) and $(objtree) to replace $(BASEDIR) in cases a
relative path is better, and $(abs_srctree) and $(abs_objtree) which
have an absolute path.
DEPS is updated as the existing macro to deal with it doesn't know
about $(obj).
There are some changes in "Rules.mk" which, in addition to dealing with
"$(obj)", also make it look more like "Makefile.build" from Linux
v5.12.
test/Makefile doesn't need special handling in order to build
everything under test/, Rules.mk will visit test/livepatch via
$(subdir-y), thus the "tests", "all" and "build" targets are removed.
"subtree-force-update" target isn't useful so it is removed as well.
test/livepatch/Makefile doesn't need default target anymore, Rules.mk
will build everything in $(extra-y) and thus all *.livepatch.
Adjust cloc recipe: dependency files generated by CC will now have the
full path to the source file, so we don't need to prepend the
subdirectory. This fixes some issues with sources not being parsed by cloc
before. Also, sources from tools/kconfig would be listed with the changes in
this patch, so adjust the find command to stop listing the "tools"
directory and thus kconfig. With a default build of Xen on x86, there
are a few new files parsed by cloc:
arch/x86/x86_64/compat/mm.c
arch/x86/x86_64/mm.c
common/compat/domain.c
common/compat/memory.c
common/compat/xlat.c
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Bob Eshleman <bobbyeshleman@gmail.com> Acked-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com> # XSM
Anthony PERARD [Fri, 25 Feb 2022 09:55:30 +0000 (10:55 +0100)]
build: rework test/livepatch/Makefile
This reworks the livepatch/Makefile to make it less repetitive and make
use of the facilities. All the targets to be built are now listed in
$(extra-y) which will allow Rules.mk to build them without the need of
a local target in a future patch.
There are some changes/fixes in this patch:
- when "xen-syms" is used for a target, it is added to the dependency
list of the target, which allows rebuilding the target when xen-syms
changes. But if "xen-syms" is missing, make simply fails.
- modinfo.o wasn't removing its $@.bin file like the other targets,
this is now done.
- The command to build *.livepatch targets has been fixed to use
$(XEN_LDFLAGS) rather than just $(LDFLAGS) which is a fallout from 2740d96efdd3 ("xen/build: have the root Makefile generates the
CFLAGS")
make will find out the dependencies of the *.livepatch files and thus
what to build by "looking" at the objects listed in the *-objs
variables. The actual dependencies are generated by the new
"multi-depend" macro.
"$(targets)" needs to be updated with the objects listed in the
different *-objs variables to allow make to load the .*.cmd dependency
files.
This patch copies the macro "multi_depend" from Linux 5.12, and renames
it to "multi-depend".
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 25 Feb 2022 09:49:17 +0000 (10:49 +0100)]
x86: make embedded endbr64 check compatible with older GNU grep
With version 2.7 I'm observing support for binary searches, but
unreliable results: Only a subset of the supposed matches is actually
reported; for our pattern I've never observed any match. This same
version works fine when handing it a Perl regexp using hex or octal
escapes. Probe for support of -P and prefer that over the original
approach.
Fixes: 4d037425dccf ("x86: Build check for embedded endbr64 instructions") Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 25 Feb 2022 09:48:20 +0000 (10:48 +0100)]
x86/time: switch platform timer hooks to altcall
Except in the "clocksource=tsc" case we can replace the indirect calls
involved in accessing the platform timers by direct ones, as they get
established once and never changed. To also cover the "tsc" case, invoke
what read_tsc() resolves to directly. In turn read_tsc() then becomes
unreachable and hence can move to .init.*.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Julien Grall [Wed, 23 Feb 2022 19:08:33 +0000 (19:08 +0000)]
xen/mm: pg_offlined can be defined as bool in free_heap_pages()
The local variable pg_offlined in free_heap_pages() can only take two
values. So switch it to a bool.
Fixes: 289610483fc43 ("mm: fix broken tainted value in mark_page_free") Signed-off-by: Julien Grall <jgrall@amazon.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Michal Orzel [Tue, 22 Feb 2022 10:56:12 +0000 (11:56 +0100)]
xen/arm: Rename psr_mode_is_32bit to regs_mode_is_32bit
Commit aa2f5aefa8de ("xen/arm: Rework psr_mode_is_32bit()") modified
the function to take a struct cpu_user_regs instead of psr.
Perform renaming of psr_mode_is_32bit to regs_mode_is_32bit to reflect
that change.
Signed-off-by: Michal Orzel <michal.orzel@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
Juergen Gross [Thu, 17 Feb 2022 11:47:26 +0000 (12:47 +0100)]
docs: add some clarification to xenstore-migration.md
The Xenstore migration document is missing the specification that a
node record must be preceded by the record of its parent node in case
of live update.
Julien Grall [Wed, 23 Feb 2022 18:38:31 +0000 (18:38 +0000)]
xen/mm: Remove always true ASSERT() in free_heap_pages()
free_heap_pages() has an ASSERT() checking that node is >= 0. However
node is defined as an unsigned int. So it cannot be negative.
Therefore remove the check as it will always be true.
Coverity-ID: 1055631 Signed-off-by: Julien Grall <jgrall@amazon.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
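Why the removed check could never fail can be shown with a short sketch. It is only an illustration using Python's `ctypes` to mimic C's `unsigned int` conversion; `as_unsigned_int` is a hypothetical helper, not anything from Xen.

```python
import ctypes


def as_unsigned_int(value):
    """Convert a Python int the way a C assignment to unsigned int would."""
    return ctypes.c_uint(value).value


# Whatever is stored, the value read back is never negative, so a
# ">= 0" assertion on such a variable carries no information.
samples = (-1, 0, 1, -2**31, 2**31)
always_true = all(as_unsigned_int(v) >= 0 for v in samples)

# -1 wraps around to the largest representable value instead.
wrapped = as_unsigned_int(-1)
```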
Jan Beulich [Thu, 24 Feb 2022 10:22:08 +0000 (11:22 +0100)]
x86/cpuid: replace more cpufeat_word() uses
Complete what e3662437eb43 ("x86/cpuid: Disentangle logic for new
feature leaves") has begun:
"Switch to using FEATURESET_* just like the policy/featureset helpers. This
breaks the cognitive complexity of needing to know which leaf a specifically
named feature should reside in, and is shorter to write. It is also far
easier to identify as correct at a glance, given the correlation with the
CPUID leaf being read."
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 24 Feb 2022 10:21:08 +0000 (11:21 +0100)]
x86: drop NOP_DS_PREFIX
This wasn't really necessary to introduce: The binutils change
permitting use of standalone "ds" (and "cs") in 64-bit code predates
the minimum binutils version we support.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 24 Feb 2022 10:20:34 +0000 (11:20 +0100)]
x86/mwait-idle: re-order state entry/exit code a little
The initial observation is that unlike the original ACPI idle driver we
have a 2nd cpu_is_haltable() in here. By making the actual state entry
conditional, the emitted trace records as well as the subsequent stats
update are at least misleading in case the state wasn't actually entered.
Hence they would want moving inside the conditional. At which point the
cpuidle_get_tick() invocations could (and hence should) move as well.
cstate_restore_tsc() also isn't needed if we didn't actually enter the
state.
This leaves only the errata_c6_workaround() and lapic_timer_off()
invocations outside the conditional. As a result it looks easier to
drop the conditional (and come back in sync with the other driver again)
than to move almost everything into the conditional.
While there also move the TRACE_6D() out of the IRQ-disabled region.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 24 Feb 2022 10:19:06 +0000 (11:19 +0100)]
x86/perfc: fold HVM's VM-exit counter arrays
Only one of them can be in use at a time, so make the whole set union-
like. While doing the rename in SVM code, combine the two perf_incra(),
generalizing the range upwards of VMEXIT_NPF.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Jan Beulich [Thu, 24 Feb 2022 10:17:26 +0000 (11:17 +0100)]
SVM: sync VM-exit perf counters with known VM-exit reasons
This has gone out of sync over time, resulting in NPF and XSETBV exits
incrementing the same counter. Introduce a simplistic mechanism to
hopefully keep things in better sync going forward.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Fri, 21 Jan 2022 03:47:05 +0000 (03:47 +0000)]
x86/hvm: Drop get_shadow_gs_base() hook and use hvm_get_reg()
This is a trivial accessor for an MSR, so use hvm_get_reg() rather than a
dedicated hook. In arch_get_info_guest(), rework the logic to read GS_SHADOW
only once.
get_hvm_registers() is called on current, meaning that diagnostics print a
stale GS_SHADOW from the previous vcpu context switch. Adjust both
implementations to obtain the correct value.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Mon, 21 Feb 2022 17:09:15 +0000 (17:09 +0000)]
x86/IOMMU: Use altcall, and __initconst_cf_clobber
Most IOMMU hooks are already altcall for performance reasons. Convert the
rest of them so we can harden all the hooks in Control Flow Integrity
configurations. This necessitates the use of iommu_{v,}call() in debug builds
too. Switch to using an ASSERT() as all forms should resolve to &iommu_ops.
Move the root iommu_ops from __read_mostly to __ro_after_init now that the
latter exists.
Since c/s 3330013e6739 ("VT-d / x86: re-arrange cache syncing"), vtd_ops is
not modified and doesn't need a forward declaration, so we can use
__initconst_cf_clobber for both VT-d and AMD.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 30 Nov 2021 21:31:55 +0000 (21:31 +0000)]
x86/vpmu: Harden indirect branches
As all function pointer calls are resolved to direct calls on boot, clobber
the endbr64 instructions too, to make life harder for an attacker who has
managed to hijack a function pointer.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Sun, 7 Nov 2021 11:35:50 +0000 (11:35 +0000)]
x86/ucode: Use altcall, and __initconst_cf_clobber
Microcode loading is not a fastpath, but there are control flow integrity
hardening benefits from using altcall, because it allows us to clobber the
endbr64 instructions on all function pointer targets.
Convert the existing microcode_ops pointer into an __ro_after_init structure,
and move {amd,intel}_ucode_ops into __initconst_cf_clobber.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 14 Feb 2022 12:12:13 +0000 (12:12 +0000)]
x86/hvm: Use __initdata_cf_clobber for hvm_funcs
Now that all calls through hvm_funcs are fully altcall'd, harden all the svm
and vmx function pointer targets. This drops 106 endbr64 instructions.
Clobbering does come with a theoretical risk. The non-pointer fields of
{svm,vmx}_function_table can in theory happen to form a bit pattern matching a
pointer into .text at a legal endbr64 instruction, but this is expected to be
implausible for anything liable to pass code review.
While at it, move hvm_funcs into __ro_after_init now that this exists.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
of 1655 on an everything-enabled build of Xen, which is ~12%.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Andrew Cooper [Thu, 4 Nov 2021 19:36:23 +0000 (19:36 +0000)]
x86/altcall: Optimise away endbr64 instruction where possible
With altcall, we convert indirect branches into direct ones. With that
complete, none of the potential targets need an endbr64 instruction.
Furthermore, removing the endbr64 instructions is a security defence-in-depth
improvement, because it limits the options available to an attacker who has
managed to hijack a function pointer.
Introduce new .init.{ro,}data.cf_clobber sections. Have _apply_alternatives()
walk over this, looking for any pointers into .text, and clobber an endbr64
instruction if found. This is some minor structure (ab)use but it works
alarmingly well.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
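The section walk described above can be sketched as follows. This is a model under stated assumptions, not the `_apply_alternatives()` logic: `clobber_endbr64` is a hypothetical name, the section is treated as a flat array of little-endian 8-byte words, and the 4-byte NOP used as the replacement is an assumption, since the commit message does not say what the instruction is overwritten with.

```python
import struct

ENDBR64 = bytes.fromhex("f30f1efa")  # encoding of endbr64
NOP4 = bytes.fromhex("0f1f4000")     # assumed replacement: a 4-byte NOP


def clobber_endbr64(text, text_base, clobber_section):
    """Treat every 8-byte word as a candidate pointer into .text.

    text            -- bytearray holding the .text image
    text_base       -- virtual address of text[0]
    clobber_section -- bytes of the section being walked
    Returns the number of endbr64 instructions overwritten.
    """
    clobbered = 0
    for (ptr,) in struct.iter_unpack("<Q", clobber_section):
        off = ptr - text_base
        in_text = 0 <= off <= len(text) - len(ENDBR64)
        if in_text and text[off:off + 4] == ENDBR64:
            text[off:off + 4] = NOP4
            clobbered += 1
    return clobbered


BASE = 0xFFFF82D040200000  # hypothetical .text base
# Two functions, each starting with endbr64 and ending in ret.
text = bytearray(ENDBR64 + b"\xc3" + b"\x90" * 3 + ENDBR64 + b"\xc3")
# One real function pointer, one non-pointer field, one pointer to a ret.
section = struct.pack("<QQQ", BASE + 0, 0x1234, BASE + 4)
count = clobber_endbr64(text, BASE, section)
```

Because every word is treated as a potential pointer, a non-pointer field whose value happens to equal the address of an endbr64 would also be acted on, which is the theoretical risk noted for hvm_funcs.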
Andrew Cooper [Fri, 26 Nov 2021 15:42:48 +0000 (15:42 +0000)]
x86/altcall: Check and optimise altcall targets
When converting indirect to direct calls, there is no need to execute endbr64
instructions. Detect and optimise this case, leaving a warning in the case
that no endbr64 was found, as it likely indicates a build error.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 1 Nov 2021 15:17:20 +0000 (15:17 +0000)]
x86: Enable CET Indirect Branch Tracking
With all the pieces now in place, turn CET-IBT on when available.
MSR_S_CET, like SMEP/SMAP, controls Ring1 meaning that ENDBR_EN can't be
enabled for Xen independently of PV32 kernels. As we already disable PV32 for
CET-SS, extend this to all CET, adjusting the documentation/comments as
appropriate.
Introduce a cet=no-ibt command line option to allow the admin to disable IBT
even when everything else is configured correctly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 1 Nov 2021 21:54:26 +0000 (21:54 +0000)]
x86/EFI: Disable CET-IBT around Runtime Services calls
UEFI Runtime services, at the time of writing, aren't CET-IBT compatible.
Work is ongoing to address this. In the meantime, unconditionally disable IBT.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 1 Nov 2021 16:13:29 +0000 (16:13 +0000)]
x86/setup: Rework MSR_S_CET handling for CET-IBT
CET-SS and CET-IBT can be independently controlled, so the configuration of
MSR_S_CET can't be constant any more.
Introduce xen_msr_s_cet_value(), mostly because I don't fancy
writing/maintaining that logic in assembly. Use this in the 3 paths which
alter MSR_S_CET when both features are potentially active.
To activate CET-IBT, we only need CR4.CET and MSR_S_CET.ENDBR_EN. This is
common with the CET-SS setup, so reorder the operations to set up CR4 and
MSR_S_CET for any nonzero result from xen_msr_s_cet_value(), and set up
MSR_PL0_SSP and SSP if SHSTK_EN was also set.
Adjust the crash path to disable CET-IBT too.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
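The reordered setup can be sketched as a small decision function. This is an illustration, not the Xen implementation of xen_msr_s_cet_value(): the bit positions follow the architectural layout of the supervisor CET MSR (SH_STK_EN is bit 0, ENDBR_EN is bit 2), while `msr_s_cet_value`, `setup_steps` and the step strings are hypothetical, and any further bits the real helper may set are left out.

```python
CET_SHSTK_EN = 1 << 0  # shadow stack enable
CET_ENDBR_EN = 1 << 2  # indirect branch tracking enable


def msr_s_cet_value(shstk_enabled, ibt_enabled):
    """Compose the MSR value from the two independently controlled features."""
    value = 0
    if shstk_enabled:
        value |= CET_SHSTK_EN
    if ibt_enabled:
        value |= CET_ENDBR_EN
    return value


def setup_steps(shstk_enabled, ibt_enabled):
    """List the setup actions implied by the computed MSR value."""
    value = msr_s_cet_value(shstk_enabled, ibt_enabled)
    steps = []
    if value:
        # Common to CET-SS and CET-IBT.
        steps += ["set CR4.CET", "write MSR_S_CET"]
        if value & CET_SHSTK_EN:
            # Only needed when shadow stacks are in use.
            steps += ["write MSR_PL0_SSP", "load SSP"]
    return steps


ibt_only = setup_steps(shstk_enabled=False, ibt_enabled=True)
both = setup_steps(shstk_enabled=True, ibt_enabled=True)
```

The order of the steps within each group is not meant to be authoritative; the point is only which steps depend on which bits.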
Andrew Cooper [Mon, 1 Nov 2021 17:08:24 +0000 (17:08 +0000)]
x86/entry: Make IDT entrypoints CET-IBT compatible
Each IDT vector needs to land on an endbr64 instruction. This is especially
important for the #CP handler, which will recurse indefinitely if the endbr64
is missing, eventually escalating to #DF if guard pages are active.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 1 Nov 2021 09:51:16 +0000 (09:51 +0000)]
x86/entry: Make syscall/sysenter entrypoints CET-IBT compatible
Each of MSR_{L,C}STAR and MSR_SYSENTER_EIP needs to land on an endbr64
instruction. For sysenter, this is easy.
Unfortunately for syscall, the stubs are already 29 bytes long with a limit of
32. endbr64 is 4 bytes. Luckily, there is a 1 byte instruction which can
move from the stubs into the main handlers.
Move the push %rax out of the stub and into {l,c}star_entry(), allowing room
for the endbr64 instruction when appropriate. Update the comment describing
the entry state.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
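The byte budget behind this change works out as follows, using only the sizes quoted in the message above (the constant names are illustrative):

```python
STUB_LIMIT = 32    # maximum stub size in bytes
STUB_LEN = 29      # current stub length
ENDBR64_LEN = 4    # endbr64 instruction
PUSH_RAX_LEN = 1   # the 1-byte instruction moved into the main handler

# Simply prepending endbr64 overflows the stub by one byte.
naive_len = STUB_LEN + ENDBR64_LEN

# Moving push %rax out first makes the stub fit exactly.
reworked_len = STUB_LEN - PUSH_RAX_LEN + ENDBR64_LEN

naive_fits = naive_len <= STUB_LIMIT
reworked_fits = reworked_len <= STUB_LIMIT
```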
Andrew Cooper [Mon, 1 Nov 2021 12:36:33 +0000 (12:36 +0000)]
x86/traps: Rework write_stub_trampoline() to not hardcode the jmp
For CET-IBT, we will need to optionally insert an endbr64 instruction at the
start of the stub. Don't hardcode the jmp displacement assuming that it
starts at byte 24 of the stub.
Also add extra comments describing what is going on. The mix of %rax and %rsp
is far from trivial to follow.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
x86: Build check for embedded endbr64 instructions
An interesting corner case occurs when the byte sequence making up endbr64
ends up on a non-instruction boundary. Such embedded instructions mark legal
indirect branch targets as far as the CPU is concerned, which aren't legal as
far as the logic is concerned.
When CET-IBT is active, check for embedded byte sequences. Example failures
look like:
check-endbr.sh xen-syms Fail: Found 2 embedded endbr64 instructions
0xffff82d040325677: test_endbr64 at /local/xen.git/xen/arch/x86/x86_64/entry.S:28
0xffff82d040352da6: init_done at /local/xen.git/xen/arch/x86/setup.c:675
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
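The idea of the check can be sketched as a byte scan. This is not check-endbr.sh: `find_embedded_endbr64` is a hypothetical function, and it assumes that a set of genuine instruction start addresses is available from some other source, for example a disassembler.

```python
ENDBR64 = bytes.fromhex("f30f1efa")  # encoding of endbr64


def find_embedded_endbr64(text, base, instruction_starts):
    """Return addresses where the endbr64 bytes occur off an instruction boundary.

    text               -- raw bytes of the code being checked
    base               -- virtual address of text[0]
    instruction_starts -- set of addresses known to begin an instruction
    """
    hits = []
    pos = text.find(ENDBR64)
    while pos != -1:
        addr = base + pos
        if addr not in instruction_starts:
            hits.append(addr)
        pos = text.find(ENDBR64, pos + 1)
    return hits


# A real endbr64, then "mov $0xfa1e0ff3, %eax" (b8 + imm32), then ret.
# The immediate operand contains the endbr64 byte sequence.
code = ENDBR64 + b"\xb8" + ENDBR64 + b"\xc3"
starts = {0x1000, 0x1004, 0x1009}
embedded = find_embedded_endbr64(code, 0x1000, starts)
```

Here the only reported address is the one inside the immediate operand, which the CPU would accept as an indirect branch target even though no instruction starts there.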
Andrew Cooper [Thu, 11 Nov 2021 13:09:19 +0000 (13:09 +0000)]
x86: Use control flow typechecking where possible
Now all indirect branch targets have been annotated, turn on typechecking to
catch issues in the future.
This extension isn't in a released version of GCC yet, so provide a container
to use with the extension included, and add it to CI. RANDCONFIG is necessary
because some stubs for compiled-out subsystems are used as function pointer
targets.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 2 Nov 2021 20:58:59 +0000 (20:58 +0000)]
x86/bugframe: CFI hardening
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.
Use cf_check to annotate function pointer targets for the toolchain.
run_in_exception_handler() managed to escape typechecking, as the compiler
can't see where the function pointer gets called. After adding some ad-hoc
typechecking, it turns out that dump_execution_state() alone differs in
const-ness from the other users of run_in_exception_handler().
Introduce a new show_execution_state_nonconst() to make the typechecking
happy.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 29 Oct 2021 17:04:02 +0000 (18:04 +0100)]
x86/stack: CFI hardening
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.
Use cf_check to annotate function pointer targets for the toolchain.
The function typecheck in switch_stack_and_jump() is incompatible with control
flow typechecking. It's ok for reset_stack_and_jump_ind(), but for
reset_stack_and_jump(), it would force us to endbr64 the targets which are
branched to directly.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 29 Oct 2021 16:28:04 +0000 (17:28 +0100)]
x86/emul: CFI hardening
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.
Use cf_check to annotate function pointer targets for the toolchain.
pv_emul_is_mem_write() is only used in a single file. Move it out of its
header file, so it doesn't risk being duplicated in multiple translation
units.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 29 Oct 2021 19:15:24 +0000 (20:15 +0100)]
x86/hvm: CFI hardening for hvm_funcs
Control Flow Integrity schemes use toolchain and optionally hardware support
to help protect against call/jump/return oriented programming attacks.
Use cf_check to annotate function pointer targets for the toolchain.
In svm.c, make a few rearrangements. svm_update_guest_cr() has no external
callers so can become static, but needs moving along with svm_fpu_enter() to
avoid a forward declaration. Move svm_fpu_leave() too, to match. Also move
svm_update_guest_efer() to drop its forward declaration.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>