MAINTAINERS: adjust x86/mm/shadow maintainers

Better reflect reality: Andrew and Jan are active maintainers
and I review patches. Keep myself as a reviewer so I can help
with historical context &c.

Signed-off-by: Tim Deegan <tim@xen.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

AMD/IOMMU: drop command completion timeout

First and foremost - such timeouts were not signaled to callers, making
them believe they're fine to e.g. free previously unmapped pages.

Mirror VT-d's behavior: A fixed number of loop iterations is not a
suitable way to detect timeouts in an environment (CPU and bus speeds)
independent manner anyway. Furthermore, leaving an in-progress operation
pending when it appears to take too long is problematic: If a command
completed later, the signaling of its completion may instead be
understood to signal a subsequently started command's completion.

Log excessively long processing times (with a progressive threshold) to
have some indication of problems in this area. Allow callers to specify
a non-default timeout bias for this logging, using the same values as
VT-d does, which in particular means a (by default) much larger value
for device IO TLB invalidation.
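
As a rough illustration of the progressive threshold described above,
the following C sketch waits for completion without ever giving up and
doubles the logging threshold each time it is exceeded. The fake_dev
structure, the function names and the tick-based clock are hypothetical
stand-ins rather than the driver's actual code; the bias parameter is
assumed to simply scale the first threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Simulated device (hypothetical): every clock read advances time by one
 * tick, and the command counts as complete once finish_at is reached. */
struct fake_dev {
    uint64_t ticks;
    uint64_t finish_at;
};

static bool cmd_done(const struct fake_dev *d)
{
    return d->ticks >= d->finish_at;
}

static uint64_t read_clock(struct fake_dev *d)
{
    return d->ticks++;
}

/*
 * Wait for completion with no timeout. Whenever the elapsed time reaches
 * the current threshold, log once and double the threshold, so a slow
 * command yields ever rarer messages. Returns the number of messages.
 */
static unsigned int wait_for_completion(struct fake_dev *d,
                                        uint64_t base_threshold,
                                        unsigned int bias)
{
    uint64_t start = read_clock(d);
    uint64_t threshold = base_threshold * bias;
    unsigned int logged = 0;

    assert(threshold > 0);

    while ( !cmd_done(d) )
    {
        uint64_t elapsed = read_clock(d) - start;

        if ( elapsed >= threshold )
        {
            fprintf(stderr, "command pending for %llu ticks\n",
                    (unsigned long long)elapsed);
            threshold *= 2;   /* progressive threshold */
            logged++;
        }
    }

    return logged;
}
```

A larger bias delays the first message, which is the effect one would
want for operations that are expected to be slow.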

This is part of XSA-373 / CVE-2021-28692.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

AMD/IOMMU: wait for command slot to be available

No caller cared about send_iommu_command() indicating unavailability of
a slot. Hence if a sufficient number of prior commands timed out, we did
blindly assume that the requested command was submitted to the IOMMU
when really it wasn't. This could mean both a hanging system (waiting
for a command to complete that was never seen by the IOMMU) or blindly
propagating success back to callers, making them believe they're fine
to e.g. free previously unmapped pages.

Fold the three involved functions into one, add spin waiting for an
available slot along the lines of VT-d's qinval_next_index(), and as a
consequence drop all error indicator return types/values.
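
The idea of spinning for a free slot instead of reporting failure can
be sketched in C as below. This is a simplified model under stated
assumptions: the ring layout, RING_ENTRIES, fake_hw_consume() and
send_command() are hypothetical, and the consumer is simulated in
software where real code would re-read a hardware head register.

```c
#include <assert.h>
#include <stdint.h>

#define RING_ENTRIES 4u   /* illustrative size, must be a power of two */

struct cmd_ring {
    uint32_t slots[RING_ENTRIES];
    unsigned int head;   /* next entry the simulated IOMMU will consume */
    unsigned int tail;   /* next entry software will fill */
};

/* Stand-in for the hardware making progress: consume one pending command. */
static void fake_hw_consume(struct cmd_ring *r)
{
    if ( r->head != r->tail )
        r->head = (r->head + 1) & (RING_ENTRIES - 1);
}

/*
 * Queue a command, spinning until a slot is free. There is deliberately
 * no failure return, so the caller can rely on the command being queued.
 * The poll count is returned for illustration only.
 */
static unsigned int send_command(struct cmd_ring *r, uint32_t cmd)
{
    unsigned int next = (r->tail + 1) & (RING_ENTRIES - 1);
    unsigned int polls = 0;

    assert((RING_ENTRIES & (RING_ENTRIES - 1)) == 0);

    while ( next == r->head )   /* ring full: wait for the consumer */
    {
        fake_hw_consume(r);
        polls++;
    }

    r->slots[r->tail] = cmd;
    r->tail = next;

    return polls;
}
```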

This is part of XSA-373 / CVE-2021-28692.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

x86/spec-ctrl: Mitigate TAA after S3 resume

The user chosen setting for MSR_TSX_CTRL needs restoring after S3.

All APs get the correct setting via start_secondary(), but the BSP was missed
out.

This is XSA-377 / CVE-2021-28690.

Fixes: 8c4330818f6 ("x86/spec-ctrl: Mitigate the TSX Asynchronous Abort sidechannel")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/spec-ctrl: Protect against Speculative Code Store Bypass

Modern x86 processors have far-better-than-architecturally-guaranteed self
modifying code detection.  Typically, when a write hits an instruction in
flight, a Machine Clear occurs to flush stale content in the frontend and
backend.

For self modifying code, before a write which hits an instruction in flight
retires, the frontend can speculatively decode and execute the old instruction
stream.  Speculation of this form can suffer from type confusion in registers,
and potentially leak data.

Furthermore, updates are typically byte-wise, rather than atomic.  Depending
on timing, speculation can race ahead multiple times between individual
writes, and execute the transiently-malformed instruction stream.

Xen has stubs which are used in certain cases for emulation purposes.  Inhibit
speculation between updating the stub and executing it.

This is XSA-375 / CVE-2021-0089.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

VT-d: eliminate flush related timeouts

Leaving an in-progress operation pending when it appears to take too
long is problematic: If e.g. a QI command completed later, the write to
the "poll slot" may instead be understood to signal a subsequently
started command's completion. Also our accounting of the timeout period
was actually wrong: We included the time it took for the command to
actually make it to the front of the queue, which could be heavily
affected by guests other than the one for which the flush is being
performed.

Do away with all timeout detection on all flush related code paths.
Log excessively long processing times (with a progressive threshold) to
have some indication of problems in this area.

Additionally log (once) if qinval_next_index() didn't immediately find
an available slot. Together with the earlier change sizing the queue(s)
dynamically, we should now have a guarantee that with our fully
synchronous model any demand for slots can actually be satisfied.

This is part of XSA-373 / CVE-2021-28692.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

AMD/IOMMU: size command buffer dynamically

With the present synchronous model, we need two slots for every
operation (the operation itself and a wait command). There can be one
such pair of commands pending per CPU. To ensure that under all normal
circumstances a slot is always available when one is requested, size the
command ring according to the number of present CPUs.
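
The sizing rule above amounts to simple arithmetic, sketched here in C.
The rounding to a power of two and the MIN_RING_ENTRIES lower bound are
assumptions made for the example, and ring_entries() is a hypothetical
name, not the function used by the patch.

```c
#include <assert.h>

#define MIN_RING_ENTRIES 16u   /* hypothetical lower bound */

/* Smallest power of two that is >= n. */
static unsigned int round_up_pow2(unsigned int n)
{
    unsigned int p = 1;

    assert(n >= 1 && n <= (1u << 30));

    while ( p < n )
        p <<= 1;

    return p;
}

/* Two slots per CPU: the operation itself plus the wait command. */
static unsigned int ring_entries(unsigned int nr_cpus)
{
    unsigned int needed;

    assert(nr_cpus >= 1 && nr_cpus <= (1u << 29));

    needed = round_up_pow2(2 * nr_cpus);

    return needed < MIN_RING_ENTRIES ? MIN_RING_ENTRIES : needed;
}
```

With these assumptions a 64-CPU host would get 128 entries and a
100-CPU host 256.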

This is part of XSA-373 / CVE-2021-28692.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

VT-d: size qinval queue dynamically

With the present synchronous model, we need two slots for every
operation (the operation itself and a wait descriptor). There can be
one such pair of requests pending per CPU. To ensure that under all
normal circumstances a slot is always available when one is requested,
size the queue ring according to the number of present CPUs.

This is part of XSA-373 / CVE-2021-28692.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>

xen/arm: Boot modules should always be scrubbed if bootscrub={on, idle}

The function to initialize the pages (see init_heap_pages()) will request
scrub when the admin requests idle bootscrub (default) and state ==
SYS_STATE_active. When bootscrub=on, Xen will scrub any free pages in
heap_init_late().

Currently, the boot modules (e.g. kernels, initramfs) will be discarded/
freed after heap_init_late() is called and system_state switched to
SYS_STATE_active. This means the pages associated with the boot modules
will not get scrubbed before getting re-purposed.

If the memory is assigned to an untrusted domU, it may be able to
retrieve secrets from the modules.

This is part of XSA-372 / CVE-2021-28693.

Fixes: 1774e9b1df27 ("xen/arm: introduce create_domUs")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>

xen/arm: Create dom0less domUs earlier

In a follow-up patch we will need to unallocate the boot modules
before heap_init_late() is called.

The modules will contain the domUs' kernel and initramfs. Therefore Xen
will need to create extra domUs (used by dom0less) before heap_init_late().

This has two consequences on dom0less:
    1) Domains will not be unpaused as soon as they are created but
    once all have been created. However, Xen doesn't guarantee an order
    to unpause, so this is not something one could rely on.

    2) The memory allocated for a domU will not be scrubbed anymore when an
    admin selects bootscrub=on. This is not something we advertised, but if
    this is a concern we can introduce either force scrub for all domUs or
    a per-domain flag in the DT. The behavior for bootscrub=off and
    bootscrub=idle (default) has not changed.

This is part of XSA-372 / CVE-2021-28693.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Tested-by: Stefano Stabellini <sstabellini@kernel.org>

x86/cpuid: Half revert "x86/cpuid: Drop special_features[]"

xen-cpuid does print out the list of special features, and this is helpful to
keep.

Fixes: ba6950fb070 ("x86/cpuid: Drop special_features[]")
Reported-by: Jan Beulich <JBeulich@suse.com>
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

evtchn: type adjustments

First of all avoid "long" when "int" suffices, i.e. in particular when
merely conveying error codes. 32-bit values are slightly cheaper to
deal with on x86, and their processing is at least no more expensive on
Arm. Where possible use evtchn_port_t for port numbers and unsigned int
for other unsigned quantities in adjacent code. In evtchn_set_priority()
eliminate a local variable altogether instead of changing its type.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

evtchn: add helper for port_is_valid() + evtchn_from_port()

The combination is pretty common, so adding a simple local helper seems
worthwhile. Make it const- and type-correct, in turn requiring the
two called functions to also be const-correct (and on this occasion also
make them type-correct).
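
A minimal sketch of such a helper, assuming heavily simplified
stand-ins for the real structures: struct domain here holds a flat
array, and evtchn_lookup() is a hypothetical name chosen for the
example. The point is only the shape, a const domain pointer in and
NULL back for an invalid port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t evtchn_port_t;

struct evtchn {
    unsigned int state;
};

struct domain {
    evtchn_port_t valid_evtchns;   /* ports below this value are valid */
    struct evtchn *evtchn;         /* flat array; real code is more involved */
};

static bool port_is_valid(const struct domain *d, evtchn_port_t port)
{
    return port < d->valid_evtchns;
}

static struct evtchn *evtchn_from_port(const struct domain *d,
                                       evtchn_port_t port)
{
    assert(port_is_valid(d, port));

    return &d->evtchn[port];
}

/* The combined check-and-lookup: NULL signals an invalid port. */
static struct evtchn *evtchn_lookup(const struct domain *d,
                                    evtchn_port_t port)
{
    return port_is_valid(d, port) ? evtchn_from_port(d, port) : NULL;
}
```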

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

evtchn: slightly defer lock acquire where possible

port_is_valid() and evtchn_from_port() are fine to use without holding
any locks. Accordingly acquire the per-domain lock slightly later in
evtchn_close() and evtchn_bind_vcpu(). Especially for the use by the
former (but there are pre-existing uses) add a comment about
port_is_valid()'s guarantees.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Julien Grall <julien@xen.org>

tools/firmware/ovmf: Use OvmfXen platform file if it exists

A platform introduced in EDK II named OvmfXen is now the one to use for
Xen instead of OvmfX64. It comes with PVH support.

Also, the Xen support in OvmfX64 is deprecated,
"deprecation notice: *dynamic* multi-VMM (QEMU vs. Xen) support in OvmfPkg"
https://edk2.groups.io/g/devel/message/75498

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>

tools/libs/guest: fix save and restore of pv domains after 32-bit de-support

After 32-bit PV-guests have been security de-supported when not running
under PV-shim, the hypervisor will no longer be configured to support
those domains per default when not being built as PV-shim.

Unfortunately libxenguest will fail saving or restoring a PV domain
due to this restriction, as it is trying to get the compat MFN list
even for 64 bit guests.

Fix that by obtaining the compat MFN list only for 32-bit PV guests.

Fixes: 1a0f2fe2297d122a08fe ("SUPPORT.md: Un-shimmed 32-bit PV guests are no longer supported")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/cpuid: Drop special_features[]

While the ! annotation is useful to indicate that something special is
happening, an array of bits is not. Drop it, to prevent mistakes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/cpuid: Fix HLE and RTM handling (again)

For reasons which are my fault, but I don't recall why, the
FDP_EXCP_ONLY/NO_FPU_SEL adjustment uses the whole special_features[] array
element, not the two relevant bits.

HLE and RTM were recently added to the list of special features, causing them
to be always set in guest view, irrespective of the toolstack's choice on the
matter.

Rewrite the logic to refer to the features specifically, rather than relying
on the contents of the special_features[] array.

Fixes: 8fe24090d9 ("x86/cpuid: Rework HLE and RTM handling")
Reported-by: Edwin Török <edvin.torok@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

docs: release-technician-checklist: update to leaf tree version pinning

Our releases look to flip-flop between keeping or discarding the date
and title of the referenced qemu-trad commit. I think with the hash
replaced by a tag, the commit's date and title would better also be
purged.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>

xen: credit2: fix per-entity load tracking when continuing running

If we schedule, and the current vCPU continues to run, its statistical
load is not properly updated, resulting in something like this, even if
all the 8 vCPUs are 100% busy:

(XEN) Runqueue 0:
(XEN) [...]
(XEN)   aveload            = 2097152 (~800%)
(XEN) [...]
(XEN)   Domain: 0 w 256 c 0 v 8
(XEN)     1: [0.0] flags=2 cpu=4 credit=9996885 [w=256] load=35 (~0%)
(XEN)     2: [0.1] flags=2 cpu=2 credit=9993725 [w=256] load=796 (~0%)
(XEN)     3: [0.2] flags=2 cpu=1 credit=9995885 [w=256] load=883 (~0%)
(XEN)     4: [0.3] flags=2 cpu=5 credit=9998833 [w=256] load=487 (~0%)
(XEN)     5: [0.4] flags=2 cpu=6 credit=9998942 [w=256] load=1595 (~0%)
(XEN)     6: [0.5] flags=2 cpu=0 credit=9994669 [w=256] load=22 (~0%)
(XEN)     7: [0.6] flags=2 cpu=7 credit=9997706 [w=256] load=0 (~0%)
(XEN)     8: [0.7] flags=2 cpu=3 credit=9992440 [w=256] load=0 (~0%)

As we can see, the average load of the runqueue as a whole is, instead,
computed properly.

This issue would, in theory, potentially affect Credit2 load balancing
logic. In practice, however, the problem only manifests (at least with
these characteristics) when there is only 1 runqueue active in the
cpupool, which also means there is no need to do any load-balancing.

Hence its real impact is pretty much limited to wrong per-vCPU load
percentages, when looking at the output of the 'r' debug-key.

With this patch, the load is updated and displayed correctly:

(XEN) Runqueue 0:
(XEN) [...]
(XEN)   aveload            = 2097152 (~800%)
(XEN) [...]
(XEN) Domain info:
(XEN)   Domain: 0 w 256 c 0 v 8
(XEN)     1: [0.0] flags=2 cpu=4 credit=9995584 [w=256] load=262144 (~100%)
(XEN)     2: [0.1] flags=2 cpu=6 credit=9992992 [w=256] load=262144 (~100%)
(XEN)     3: [0.2] flags=2 cpu=3 credit=9998918 [w=256] load=262118 (~99%)
(XEN)     4: [0.3] flags=2 cpu=5 credit=9996867 [w=256] load=262144 (~100%)
(XEN)     5: [0.4] flags=2 cpu=1 credit=9998912 [w=256] load=262144 (~100%)
(XEN)     6: [0.5] flags=2 cpu=2 credit=9997842 [w=256] load=262144 (~100%)
(XEN)     7: [0.6] flags=2 cpu=7 credit=9994623 [w=256] load=262144 (~100%)
(XEN)     8: [0.7] flags=2 cpu=0 credit=9991815 [w=256] load=262144 (~100%)
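
The percentages in the dumps above can be reproduced with a little
arithmetic. The sketch below assumes, purely from the figures shown
(load=262144 printed as ~100%, aveload=2097152 as ~800%), that a fully
busy vCPU corresponds to 1 << 18; LOAD_SHIFT and load_percent() are
names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption read off the dumps: 262144 == 1 << 18 is shown as ~100%. */
#define LOAD_SHIFT 18

/* Convert a raw load value into a truncated integer percentage. */
static unsigned int load_percent(uint64_t load)
{
    assert(load <= UINT64_MAX / 100);

    return (unsigned int)((load * 100) >> LOAD_SHIFT);
}
```

Under that assumption a per-vCPU load of 1595, as seen before the fix,
rounds down to 0%, while 262118 gives 99%.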

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

credit2: make sure we pick a runnable unit from the runq if there is one

A !runnable unit (temporarily) present in the runq may cause us to
stop scanning the runq itself too early. Of course, we don't run any
non-runnable vCPUs, but we end the scan and we fall back to picking
the idle unit. In other words, this prevents us from finding and
picking the actual unit that we're meant to start running (which might
be further ahead in the runq).

Depending on the vCPU pinning configuration, this may lead to such a
unit being stuck in the runq for a long time, causing malfunctioning
inside the guest.

Fix this by checking runnable/non-runnable status up-front, in the runq
scanning function.
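
The difference between ending the scan and skipping an entry can be
sketched as follows. This is a deliberately reduced model: struct unit
and pick_next() are hypothetical, and the real scan also considers
credit and affinity, which are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct unit {
    int id;
    bool runnable;
};

/*
 * Return the first runnable unit in the runq, or the idle unit if there
 * is none. A non-runnable entry is skipped; it does not end the scan.
 */
static const struct unit *pick_next(const struct unit *runq, size_t n,
                                    const struct unit *idle)
{
    assert(idle != NULL);

    for ( size_t i = 0; i < n; i++ )
    {
        if ( !runq[i].runnable )
            continue;   /* check up-front and keep scanning */

        return &runq[i];
    }

    return idle;
}
```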

Reported-by: Michał Leszczyński <michal.leszczynski@cert.pl>
Reported-by: Dion Kant <g.w.kant@hunenet.nl>
Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@citrix.com>

tools/libs/guest: make some definitions private to libxenguest

There are some definitions which are used in libxenguest only now.
Move them from libxenctrl over to libxenguest.

Remove an unused macro.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>

tools/libs: move xc_core* from libxenctrl to libxenguest

The functionality in xc_core* should be part of libxenguest instead
of libxenctrl. Users are already either in libxenguest, or in xl.
There is one single exception: xc_core_arch_auto_translated_physmap()
is being used by xc_domain_memory_mapping(), which is used by qemu.
So leave the xc_core_arch_auto_translated_physmap() functionality in
libxenctrl.

This will make it easier to merge common functionality of xc_core*
and xg_sr_save*.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>

tools/libs: move xc_resume.c to libxenguest

The guest suspend functionality is already part of libxenguest. Move
the resume functionality from libxenctrl to libxenguest, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>

tools/libs/ctrl: use common p2m mapping code in xc_domain_resume_any()

Instead of open coding the mapping of the p2m list use the already
existing xc_core_arch_map_p2m() call, especially as the current code
does not support guests with the linear p2m map. It should be noted
that this code is needed for colo/remus only.

Switching to xc_core_arch_map_p2m() drops the need to bail out for
bitness of tool stack and guest differing.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Wei Liu <wl@xen.org>

tools/libs/ctrl: fix xc_core_arch_map_p2m() to support linear p2m table

The core of a pv linux guest produced via "xl dump-core" is not usable,
as since kernel 4.14 only the linear p2m table is kept if Xen indicates
it is supporting that. Unfortunately xc_core_arch_map_p2m() is still
supporting the 3-level p2m tree only.

Fix that by copying the functionality of map_p2m() from libxenguest to
libxenctrl.

Additionally the mapped p2m isn't of a fixed length now, so the
interface to the mapping functions needs to be adapted. In order not to
add even more parameters, expand struct domain_info_context and use a
pointer to that as a parameter.

Fixes: dc6d60937121 ("libxc: set flag for support of linear p2m list in domain builder")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>

tools/libs/guest: fix max_pfn setting in map_p2m()

When setting the highest pfn used in the guest, don't subtract 1 from
the value read from the shared_info data. The value read already is
the correct pfn.

Fixes: 91e204d37f449 ("libxc: try to find last used pfn when migrating")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Wei Liu <wl@xen.org>

SUPPORT.md: Un-shimmed 32-bit PV guests are no longer supported

The support status of 32-bit guests doesn't seem particularly useful.

With it changed to fully unsupported outside of PV-shim, adjust the PV32
Kconfig default accordingly.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>

xen/page_alloc: Remove dead code in alloc_domheap_pages()

Since commit 1aac966e24e9 "xen: support RAM at addresses 0 and 4096",
bits_to_zone() will never return 0 and it is expected that we have
minimum 2 zones.

Therefore the check in alloc_domheap_pages() is unnecessary and can
be removed. However, for sanity, it is replaced with an ASSERT().

Also take the opportunity to switch from min_t() to min() as
bits_to_zone() cannot return a negative value. The macro is tweaked
to make it clearer.

This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86/mtrr: remove stale function prototype

Fixes: 1c84d04673 ('VMX: remove the problematic set_uc_mode logic')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

x86/tboot: adjust UUID check

Replace a bogus cast, move the static variable into the only function
using it, and add __initconst. While there, also remove a pointless NULL
check.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>

x86/tboot: include all valid frame table entries in S3 integrity check

The difference of two pdx_to_page() return values is a number of pages,
not the number of bytes covered by the corresponding frame table entries.
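
The distinction is plain C pointer arithmetic, illustrated below with a
hypothetical struct page_info layout: subtracting two element pointers
yields a count of elements, which has to be multiplied by the element
size to obtain the number of bytes covered.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout; only its size matters for the example. */
struct page_info {
    unsigned long words[4];
};

/* Pointer subtraction counts array elements, not bytes. */
static size_t entries_between(const struct page_info *start,
                              const struct page_info *end)
{
    assert(end >= start);

    return (size_t)(end - start);
}

/* Bytes covered by the entries in [start, end). */
static size_t bytes_between(const struct page_info *start,
                            const struct page_info *end)
{
    return entries_between(start, end) * sizeof(*start);
}
```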

Fixes: 3cb68d2b59ab ("tboot: fix S3 issue for Intel Trusted Execution Technology.")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Lukasz Hawrylko <lukasz.hawrylko@linux.intel.com>

common: guard iommu symbols with CONFIG_HAS_PASSTHROUGH

The variables iommu_enabled and iommu_dont_flush_iotlb are defined in
drivers/passthrough/iommu.c and are referenced in common code, which
causes the link to fail when !CONFIG_HAS_PASSTHROUGH.

Guard references to these variables in common code so that xen
builds when !CONFIG_HAS_PASSTHROUGH.
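
One possible way to guard such references is sketched below; it is not
necessarily what the patch does. When CONFIG_HAS_PASSTHROUGH is not
defined, iommu_enabled becomes a compile-time constant, so common code
keeps compiling and nothing is left for the linker to resolve.
pages_needing_iommu_sync() is a hypothetical caller invented for the
example.

```c
#include <assert.h>
#include <stdbool.h>

/* Build with -DCONFIG_HAS_PASSTHROUGH to model a configuration that
 * includes the IOMMU driver. */
#ifdef CONFIG_HAS_PASSTHROUGH
bool iommu_enabled = true;      /* would be defined by the driver */
#else
# define iommu_enabled false    /* no variable, hence no link dependency */
#endif

/* Hypothetical common-code caller that works in both configurations. */
static int pages_needing_iommu_sync(int nr_pages)
{
    assert(nr_pages >= 0);

    if ( !iommu_enabled )
        return 0;

    return nr_pages;
}
```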

Signed-off-by: Connor Davis <connojdavis@gmail.com>
[jb: further massage xen/iommu.h adjustment]
Acked-by: Jan Beulich <jbeulich@suse.com>

libelf: improve PVH elfnote parsing

Pass an hvm boolean parameter to the elf note checking routines, so that
better checking can be done in case libelf is dealing with an hvm
container.

elf_xen_note_check shouldn't return early unless PHYS32_ENTRY is set
and the container is of type HVM, or else the loader and version
checks would be avoided for kernels intended to be booted as PV but
that also have PHYS32_ENTRY set.

Adjust elf_xen_addr_calc_check so that the virtual addresses are
actually physical ones (by setting virt_base and elf_paddr_offset to
zero) when the container is of type HVM, as that container is always
started with paging disabled.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

libelf: don't attempt to parse __xen_guest for PVH

The legacy __xen_guest section doesn't support the PHYS32_ENTRY
elfnote, so it's pointless to attempt to parse the elfnotes from that
section when called from an hvm container.

Pass an hvm boolean parameter to the elf note parsing routine, so that
the respective parsing can be suppressed in case libelf is dealing with
an hvm container.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>

x86: fix build race when generating temporary object files (take 2)

The original commit wasn't quite sufficient: Emptying DEPS is helpful
only when nothing will get added to it subsequently. xen/Rules.mk will,
after including the local Makefile, amend DEPS by dependencies for
objects living in sub-directories though. For the purpose of suppressing
dependencies of the makefiles on the .*.d2 files (and thus to avoid
their re-generation) it is, however, not necessary at all to play with
DEPS. Instead we can override DEPS_INCLUDE (which generally is a late-
expansion variable).

Fixes: 761bb575ce97 ("x86: fix build race when generating temporary object files")
Signed-off-by: Jan Beulich <jbeulich@suse.com>

x86/tsx: Deprecate vpmu=rtm-abort and use tsx=<bool> instead

This reuses the rtm_disable infrastructure, so CPUID derivation works properly
when TSX is disabled in favour of working PCR3.

vpmu= is not a supported feature, and having this functionality under tsx=
centralises all TSX handling.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

x86/tsx: Minor cleanup and improvements

* Introduce cpu_has_arch_caps and replace boot_cpu_has(X86_FEATURE_ARCH_CAPS)
* Read CPUID data into the appropriate boot_cpu_data.x86_capability[]
element, as subsequent changes are going to need more cpu_has_* logic.
* Use the hi/lo MSR helpers, which substantially improves code generation.

No practical change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

x86/cpuid: Rework HLE and RTM handling

The TAA mitigation offered the option to hide the HLE and RTM CPUID bits,
which has caused some migration compatibility problems.

These two bits are special.  Annotate them with ! to emphasise this point.

Hardware Lock Elision (HLE) may or may not be visible in CPUID, but is
disabled in microcode on all CPUs, and has been removed from the architecture.
Do not advertise it to VMs by default.

Restricted Transactional Memory (RTM) may or may not be visible in CPUID, and
may or may not be configured in force-abort mode.  Have tsx_init() note
whether RTM has been configured into force-abort mode, so
guest_common_feature_adjustments() can conditionally hide it from VMs by
default.

The host policy values for HLE/RTM may or may not be set, depending on any
previous running kernel's choice of visibility, and Xen's choice.  TSX is
available on any CPU which enumerates a TSX-hiding mechanism, so instead of
doing a two-step to clobber any hiding, scan CPUID, then set the visibility,
just force visibility of the bits in the first place.

With the HLE/RTM bits now unilaterally visible in the host policy,
xc_cpuid_apply_policy() can construct a more appropriate policy out of thin
air for pre-4.13 VMs with no CPUID data in their migration stream, and
specifically one where HLE/RTM doesn't potentially disappear behind the back
of a running VM.

Fixes: 8c4330818f6 ("x86/spec-ctrl: Mitigate the TSX Asynchronous Abort sidechannel")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

x86: make hypervisor build with gcc11

Gcc 11 looks to make incorrect assumptions about valid ranges that
pointers may be used for addressing when they are derived from e.g. a
plain constant. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100680.

Utilize RELOC_HIDE() to work around the issue, which for x86 manifests
in at least
- mpparse.c:efi_check_config(),
- tboot.c:tboot_probe(),
- tboot.c:tboot_gen_frametable_integrity(),
- x86_emulate.c:x86_emulate() (at -O2 only).
The last case is particularly odd not just because it only triggers at
higher optimization levels, but also because it only affects one of at
least three similar constructs. Various "note" diagnostics claim the
valid index range to be [0, 2⁶³-1].
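
For readers unfamiliar with this kind of workaround, here is a sketch
of a pointer-hiding macro in the spirit of RELOC_HIDE(). It is an
illustration only, not a copy of Xen's definition: HIDE_PTR() and
read_at() are hypothetical names, and the code relies on GNU C
extensions (statement expressions, __typeof__, inline asm). The empty
asm makes the pointer value opaque to the compiler, so it can no longer
reason about which object the result is derived from.

```c
#include <assert.h>
#include <stdint.h>

/* GNU C only: pass the pointer through an empty asm, then add a byte
 * offset and cast back to the original pointer type. */
#define HIDE_PTR(ptr, off) ({                          \
    uintptr_t p_;                                      \
    __asm__ ( "" : "=r" (p_) : "0" (ptr) );            \
    (__typeof__(ptr))(p_ + (off)); })

/* Read element idx via a pointer whose origin is hidden. */
static int read_at(const int *base, unsigned int idx)
{
    const int *p = HIDE_PTR(base, idx * sizeof(*base));

    assert(p != 0);

    return *p;
}
```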

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

firmware/shim: UNSUPPORTED=n

We shouldn't default to include any unsupported code in the shim. Mark
the setting as off, replacing the ARGO specification. This points out
anomalies with the scheduler configuration: Unsupported schedulers
better don't default to Y in release builds (like is already the case
for ARINC653). Without at least the SCHED_NULL adjustments, the shim
would suddenly build with RTDS as its default scheduler.

As a result, the SCHED_NULL setting can also be dropped from defconfig.

Clearly with the shim defaulting to it, SCHED_NULL must be supported at
least there.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

tools/xenstored: Remove unused parameter in check_domains()

The parameter of check_domains() is not used within the function. In fact,
this was a left over of the original implementation as the version merged
doesn't need to know whether we are restoring.

So remove it.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

tools/console: Use const whenever we point to literal strings

Literal strings are not meant to be modified. So we should use const
char * rather than char * when we want to store a pointer to them.

Take the opportunity to remove the cast (char *) in console_init(). It
is unnecessary and will remove the const.
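
A small C sketch of the pattern: pointers to string literals are
declared const char *, and code that needs a modifiable string takes a
copy instead of casting the const away. The struct stat_map table and
both helper functions are hypothetical examples, not code from the
tools.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct stat_map {
    int id;
    const char *text;   /* points at a string literal, hence const */
};

static const struct stat_map stat_names[] = {
    { 0, "idle" },
    { 1, "running" },
};

/* Look up the name for an id; the result must not be modified. */
static const char *stat_text(int id)
{
    for ( size_t i = 0; i < sizeof(stat_names) / sizeof(stat_names[0]); i++ )
        if ( stat_names[i].id == id )
            return stat_names[i].text;

    return "unknown";
}

/* Callers wanting a writable string copy it (truncating if needed). */
static void copy_text(char *dst, size_t len, int id)
{
    assert(dst != NULL && len > 0);

    snprintf(dst, len, "%s", stat_text(id));
}
```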

Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Wei Liu <wl@xen.org>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>

xen/char: console: Use const whenever we point to literal strings

Literal strings are not meant to be modified. So we should use const
char * rather than char * when we want to store a pointer to them.

The array should also not be modified at all and is only used by
xenlog_update_val(). So take the opportunity to add an extra const and
move the definition in the function.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

firmware/shim: drop XEN_CONFIG_EXPERT uses

As of commit d155e4aef35c ("xen: Allow EXPERT mode to be selected from
the menuconfig directly") EXPERT is a regular config option (which the
shim default config also enables).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <rogerpau@citrix.com>

firmware/shim: update linkfarm exclusions

Some intermediate files weren't considered at all at the time. Also
after its introduction, various changes to the build environment have
rendered the exclusion sets stale. For example, we now have some .*.cmd
files in the build tree. Combine all respective patterns into a single
.* one, seeing that we don't have any actual source files matching this
pattern in the tree. Add other patterns as well as individual files.
Also introduce LINK_EXCLUDE_PATHS to deal with entire directories full
of generated headers as well as a few specific files the names of which
are too generic to list under LINK_EXCLUDES.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

x86/guest: fix build when HVM and !PV32

The commit referenced below still wasn't careful enough - with COMPAT we
will have a compat_handle_okay() visible already, which we first need to
get rid of.

Fixes: bd1e7b47bac0 ("x86/shim: fix build when !PV32")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

automation: Add container for riscv64 builds

Add a container for cross-compiling xen to riscv64.
This just includes the cross-compiler and necessary packages for
building xen itself (packages for tools, stubdoms, etc., can be
added later).

Signed-off-by: Connor Davis <connojdavis@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

x86/shadow: fix DO_UNSHADOW()

When adding the HASH_CALLBACKS_CHECK() I failed to properly recognize
the (somewhat unusually formatted) if() around the call to
hash_domain_foreach(). Gcc 11 is absolutely right in pointing out the
apparently misleading indentation. Besides adding the missing braces,
also adjust the two oddly formatted if()-s in the macro.
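
The class of mistake can be shown with a reduced, hypothetical macro;
DO_UNSHADOW_SKETCH() and the two counting functions below are invented
for the example. Without the braces only the first call would be
governed by the if(), and the second would run unconditionally despite
its indentation.

```c
#include <assert.h>

static int check_calls, foreach_calls;

static void callbacks_check(void)
{
    check_calls++;
}

static void hash_foreach(void)
{
    foreach_calls++;
}

/* Both statements belong to the if(), which the braces make explicit. */
#define DO_UNSHADOW_SKETCH(cond) do {   \
    if ( cond )                         \
    {                                   \
        callbacks_check();              \
        hash_foreach();                 \
    }                                   \
} while ( 0 )
```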

Fixes: 90629587e16e ("x86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Tim Deegan <tim@xen.org>

automation: fix dependencies on openSUSE Tumbleweed containers

Fix the build inside our openSUSE Tumbleweed container by
adding libzstd headers. While there, remove the explicit dependency
for python and python3 as the respective -devel packages will pull
them in anyway.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

automation: use DOCKER_CMD for building containers too

Use DOCKER_CMD from the environment (if defined) in the containers'
makefile too, so that, e.g., when doing `export DOCKER_CMD=podman`
podman is used for building the containers too.

Signed-off-by: Dario Faggioli <dfaggioli@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

tools/libs: guest: Fix Arm build after 8fc4916daf2a

Gitlab CI spotted an issue when building the tools for Arm:

xg_dom_arm.c: In function 'meminit':
xg_dom_arm.c:401:50: error: passing argument 3 of 'set_mode' discards 'const' qualifier from pointer target type [-Werror=discarded-qualifiers]
  401 |     rc = set_mode(dom->xch, dom->guest_domid, dom->guest_type);
      |                                               ~~~^~~~~~~~~~~~

This is because the const was not propagated in the Arm code. Fix it
by constifying the 3rd parameter of set_mode().
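
A minimal sketch of the const propagation, assuming a simplified one-argument set_mode(); the struct and the return values are hypothetical and do not mirror the libxenguest code.

```c
#include <string.h>

struct image {
    const char *guest_type; /* may point to a string literal */
};

/*
 * Taking const char * lets callers pass a const-qualified pointer. With
 * a plain char * parameter, the call in image_mode() would discard the
 * qualifier and trip -Werror=discarded-qualifiers.
 */
static int set_mode(const char *guest_type)
{
    if ( !strcmp(guest_type, "xen-3.0-aarch64") )
        return 64;
    if ( !strcmp(guest_type, "xen-3.0-armv7l") )
        return 32;

    return -1;
}

static int image_mode(const struct image *img)
{
    return set_mode(img->guest_type);
}
```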

Fixes: 8fc4916daf2a ("tools/libs: guest: Use const whenever we point to literal strings")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>

tools/xenmon: xenbaked: Mark const the field text in stat_map_t

The field text in stat_map_t will point to string literals. So mark it
as const to allow the compiler to catch any modification of the string.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>

tools/top: The string parameter in set_prompt() and set_delay() should be const

Neither string parameter in set_prompt() and set_delay() is meant to
be modified. In particular, new_prompt can point to a literal string.

So mark the two parameters as const and propagate it.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>

tools/misc: Use const whenever we point to literal strings

Literal strings are not meant to be modified. So we should use const
char * rather than char * when we want to store a pointer to them.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>

tools/libs: stat: Use const whenever we point to literal strings

Literal strings are not meant to be modified. So we should use const
char * rather than char * when we want to store a pointer to them.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>

tools/libs: guest: Use const whenever we point to literal strings

Literal strings are not meant to be modified. So we should use const
char * rather than char * when we want to store a pointer to them.

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Wei Liu <wl@xen.org>

tools/xenstore: simplify xenstored main loop

The main loop of xenstored is rather complicated due to different
handling of socket and ring-page interfaces. Unify that handling by
introducing interface type specific functions can_read() and
can_write().

Take the opportunity to remove the empty list check before calling
write_messages() because the function is already able to cope with an
empty list.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

tools/xenstore: move per connection read and write func hooks into a struct

Put the interface type specific functions into their own structure and let
struct connection contain only a pointer to that new function vector.

Don't even define the socket based functions in case of NO_SOCKETS
(Mini-OS).
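
The function vector idea could look roughly like the sketch below. Only can_read() and can_write() are named in this series; the struct layout, field names and the ring readiness logic are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct connection;

/* Per-interface-type operations, shared by all connections of that type. */
struct interface_funcs {
    bool (*can_read)(struct connection *conn);
    bool (*can_write)(struct connection *conn);
};

struct connection {
    const struct interface_funcs *funcs;
    size_t in_avail;  /* bytes waiting to be read */
    size_t out_space; /* room left for writing */
};

static bool ring_can_read(struct connection *conn)
{
    return conn->in_avail > 0;
}

static bool ring_can_write(struct connection *conn)
{
    return conn->out_space > 0;
}

static const struct interface_funcs ring_funcs = {
    .can_read = ring_can_read,
    .can_write = ring_can_write,
};

/* The main loop no longer needs to know which interface type it handles. */
static bool conn_can_read(struct connection *conn)
{
    return conn->funcs->can_read(conn);
}

static bool conn_can_write(struct connection *conn)
{
    return conn->funcs->can_write(conn);
}
```

A socket flavour would be a second vector of the same type, which can simply be left out of the build when NO_SOCKETS is defined.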

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

tools/xenstore: cleanup Makefile and gitignore

The xenstore Makefile and, related to it, the global .gitignore
file contain some leftovers from ancient times. Remove those.

While at it sort the tools/xenstore/* entries in .gitignore.

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>

xen/arm: kernel: Propagate the error if we fail to decompress the kernel

Currently, we are ignoring any error from perform_gunzip() and replacing
the compressed kernel with the "uncompressed" kernel.

If there is a gzip failure, then it means that the output buffer may
contain garbage. So it can result in various sorts of behavior that may
be difficult to root cause.

In case of failure, free the output buffer and propagate the error.
We also need to adjust the return check for kernel_decompress() as
perform_gunzip() may return a positive value.

Take the opportunity to adjust the code style for the check.
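
A userspace sketch of the error handling described above. fake_gunzip() is a hypothetical stand-in for perform_gunzip() that only checks the gzip magic bytes; buffer sizes and names are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define OUT_SIZE 64

/* Hypothetical decompressor: 0 on success, a positive value on bad input. */
static int fake_gunzip(char *out, const unsigned char *in, size_t len)
{
    if ( len < 2 || in[0] != 0x1f || in[1] != 0x8b )
        return 1;

    memset(out, 0, OUT_SIZE);

    return 0;
}

/* On failure, free the output buffer and hand the error to the caller. */
static int decompress(const unsigned char *in, size_t len, char **result)
{
    char *out = malloc(OUT_SIZE);
    int rc;

    if ( !out )
        return -ENOMEM;

    rc = fake_gunzip(out, in, len);
    if ( rc )
    {
        free(out);
        return rc;
    }

    *result = out;

    return 0;
}
```

Because the error may be positive, a caller has to test for a non-zero return rather than only for a negative one.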

Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>

xen: fix build when !CONFIG_GRANT_TABLE

Move struct grant_table; in grant_table.h above
ifdef CONFIG_GRANT_TABLE. This fixes the following:

/build/xen/include/xen/grant_table.h:84:50: error: 'struct grant_table'
declared inside parameter list will not be visible outside of this
definition or declaration [-Werror]
84 | static inline int mem_sharing_gref_to_gfn(struct grant_table *gt,
|
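
The shape of the fix, sketched with a hypothetical gref_to_gfn() helper rather than the real header contents: the forward declaration sits outside the #ifdef so that both the prototype and the inline stub can name the type.

```c
#include <errno.h>
#include <stddef.h>

/* Visible in both configurations. */
struct grant_table;

#ifdef CONFIG_GRANT_TABLE
int gref_to_gfn(struct grant_table *gt, unsigned int ref,
                unsigned long *gfn);
#else
/* Stub used when grant tables are compiled out. */
static inline int gref_to_gfn(struct grant_table *gt, unsigned int ref,
                              unsigned long *gfn)
{
    (void)gt;
    (void)ref;
    (void)gfn;

    return -EOPNOTSUPP;
}
#endif
```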

Signed-off-by: Connor Davis <connojdavis@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

include/public: add RING_RESPONSE_PROD_OVERFLOW macro

Add a new RING_RESPONSE_PROD_OVERFLOW() macro for being able to
detect an ill-behaved backend tampering with the response producer
index.
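
The kind of check such a macro enables can be sketched as a plain function. This is an illustration under assumed names and an assumed ring size, not the macro's definition from the public headers.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 32u /* assumed ring size for this sketch */

/*
 * True if the producer index read from shared memory claims more
 * unconsumed responses than the ring can hold. Unsigned subtraction
 * keeps the comparison valid across index wrap-around.
 */
static bool rsp_prod_overflow(uint32_t rsp_cons, uint32_t rsp_prod)
{
    return (uint32_t)(rsp_prod - rsp_cons) > RING_ENTRIES;
}
```

A frontend would run such a check on its local copy of the producer index before consuming any responses.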

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Jan Beulich <jbeulich@suse.com>

Argo/XSM: add SILO hooks

In SILO mode restrictions for inter-domain communication should apply
here along the lines of those for evtchn and gnttab.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

x86/shim: fix build when !PV32

In this case compat headers don't get generated (and aren't needed).
The changes made by 527922008bce ("x86: slim down hypercall handling
when !PV32") also weren't quite sufficient for this case.

Try to limit #ifdef-ary by introducing two "fallback" #define-s.

Fixes: d23d792478db ("x86: avoid building COMPAT code when !HVM && !PV32")
Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

x86emul: fix test harness build for gas 2.36

All of a sudden, besides .text and .rodata and alike, an always
present .note.gnu.property section has appeared. This section, when
converting to binary format output, gets placed according to its
linked address, causing the resulting blobs to be about 128Mb in size.
The resulting headers with a C representation of the binary blobs then
are, of course, all a multiple of that size (and take accordingly long
to create). I didn't bother waiting to see what size the final
test_x86_emulator binary then would have had.

See also https://sourceware.org/bugzilla/show_bug.cgi?id=27753.

Rather than figuring out whether gas supports -mx86-used-note=, simply
remove the section while creating *.bin.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

x86/AMD: also determine L3 cache size

For Intel CPUs we record L3 cache size, hence we should also do so for
AMD and alike.

While making these additions, also make sure (throughout the function)
that we don't needlessly overwrite prior values when the new value to be
stored is zero.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

build: centralize / unify asm-offsets generation

Except for an additional prereq, Arm and x86 have the same needs here,
and Arm can also benefit from the recent x86 side improvement. Recurse
into arch/*/ only for a phony include target (doing nothing on Arm),
and handle asm-offsets itself entirely locally to xen/Makefile.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>

Revert "x86/PV32: avoid TLB flushing after mod_l3_entry()" and "x86/PV: restrict TLB flushing after mod_l[234]_entry()"

These reintroduce XSA-286 / CVE-2018-15469, as confirmed by the xsa-286 XTF
test run by OSSTest.

The TLB flushing is for Xen's correctness, not the guest's.

The text in c/s bed7e6cad30 is technically correct, from the guest's point of
view, but clearly false as far as XSA-286 is concerned. That said, it is
edcfce55917 which introduced the regression, which demonstrates that the
reasoning is flawed.

This reverts commit bed7e6cad30ec8db0c9ce9a1676856e9dc4c39da.
This reverts commit edcfce55917bb412f986d7b28358f6ef155b3664.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

xen/arm: gic-v3: Add missing breaks in gicv3_read_apr()

Commit 78e67c99eb3f "arm/gic: Get rid of READ/WRITE_SYSREG32"
mistakenly converted all the cases in gicv3_read_apr() to fall-through.

Rather than re-instating a return per case, add the missing breaks and
keep a single return at the end of the function.
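
The resulting structure is sketched below with a hypothetical register array in place of the real system register reads.

```c
#include <stdint.h>

/* Returns the selected register value, or 0 for an unknown index. */
static uint64_t read_apr(const uint64_t regs[3], unsigned int idx)
{
    uint64_t apr = 0;

    switch ( idx )
    {
    case 0:
        apr = regs[0];
        break; /* without this, case 1 would overwrite apr */
    case 1:
        apr = regs[1];
        break;
    case 2:
        apr = regs[2];
        break;
    default:
        break;
    }

    return apr;
}
```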

Fixes: 78e67c99eb3f ("arm/gic: Get rid of READ/WRITE_SYSREG32")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>

tools: remove unused sysconfig variable XENSTORED_ROOTDIR

The sysconfig variable XENSTORED_ROOTDIR is not used anymore.
It used to point to a directory with tdb files, which is now a tmpfs.

In case the database is not in tmpfs, like on sysv and BSD systems,
xenstored will truncate existing database files during start.

Fixes: 2ef6ace428 ("tools: don't remove tdb data base file before starting xenstored")
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Wei Liu <wl@xen.org>

optee: enable OPTEE_SMC_SEC_CAP_MEMREF_NULL capability

The OP-TEE mediator already has support for NULL memory references. It
was added in patch 0dbed3ad336 ("optee: allow plain TMEM buffers with
NULL address"). But it does not propagate the
OPTEE_SMC_SEC_CAP_MEMREF_NULL capability flag to a guest, so a
well-behaved guest can't use this feature.

Note: the Linux optee driver honors this capability flag when handling
buffers from userspace clients, but ignores it when working with
internal calls. For instance, the __optee_enumerate_devices() function
uses a NULL argument to get a buffer size hint from OP-TEE. This was the
reason why "optee: allow plain TMEM buffers with NULL address" was
introduced in the first place.

This patch adds the mentioned capability to the list of known
capabilities. From Linux's point of view it means that userspace clients
can use this feature, which is confirmed by the OP-TEE test suite:

* regression_1025 Test memref NULL and/or 0 bytes size
o regression_1025.1 Invalid NULL buffer memref registration
  regression_1025.1 OK
o regression_1025.2 Input/Output MEMREF Buffer NULL - Size 0 bytes
  regression_1025.2 OK
o regression_1025.3 Input MEMREF Buffer NULL - Size non 0 bytes
  regression_1025.3 OK
o regression_1025.4 Input MEMREF Buffer NULL over PTA invocation
  regression_1025.4 OK
  regression_1025 OK

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Julien Grall <jgrall@amazon.com>

tools/xenstore: Fix indentation in the header of xenstored_control.c

Commit e867af081d94 "tools/xenstore: save new binary for live update"
seemed to have spuriously changed the indentation of the first line of
the copyright header.

The previous indentation is re-instated so all the lines are indented
the same.

Reported-by: Bjoern Doebel <doebel@amazon.com>
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>

tools/xenstored: Prevent a buffer overflow in dump_state_node_perms()

ASAN reported one issue when Live Updating Xenstored:

=================================================================
==873==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffc194f53e0 at pc 0x555c6b323292 bp 0x7ffc194f5340 sp 0x7ffc194f5338
WRITE of size 1 at 0x7ffc194f53e0 thread T0
    #0 0x555c6b323291 in dump_state_node_perms xen/tools/xenstore/xenstored_core.c:2468
    #1 0x555c6b32746e in dump_state_special_node xen/tools/xenstore/xenstored_domain.c:1257
    #2 0x555c6b32a702 in dump_state_special_nodes xen/tools/xenstore/xenstored_domain.c:1273
    #3 0x555c6b32ddb3 in lu_dump_state xen/tools/xenstore/xenstored_control.c:521
    #4 0x555c6b32e380 in do_lu_start xen/tools/xenstore/xenstored_control.c:660
    #5 0x555c6b31b461 in call_delayed xen/tools/xenstore/xenstored_core.c:278
    #6 0x555c6b32275e in main xen/tools/xenstore/xenstored_core.c:2357
    #7 0x7f95eecf3d09 in __libc_start_main ../csu/libc-start.c:308
    #8 0x555c6b3197e9 in _start (/usr/local/sbin/xenstored+0xc7e9)

Address 0x7ffc194f53e0 is located in stack of thread T0 at offset 80 in frame
    #0 0x555c6b32713e in dump_state_special_node xen/tools/xenstore/xenstored_domain.c:1232

  This frame has 2 object(s):
    [32, 40) 'head' (line 1233)
    [64, 80) 'sn' (line 1234) <== Memory access at offset 80 overflows this variable

This is happening because the callers are passing a pointer to a variable
allocated on the stack. However, the field perms is a dynamic array, so
Xenstored will end up reading outside of the variable.

Rework the code so the permissions are written one by one in the fd.
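
A sketch of the approach, assuming a made-up record layout and writing into a memory buffer where xenstored writes to a file descriptor. The header is a stack variable whose flexible array has no storage, so the permissions are copied from the caller's array one element at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct perm {
    uint8_t access;
    uint8_t flags;
    uint16_t domid;
};

struct node_hdr {
    uint32_t n_perms;
    struct perm perms[]; /* flexible array: no storage in a stack variable */
};

/*
 * Appends the header, then each permission from the separate array,
 * never touching hdr.perms. Returns the number of bytes written, or 0
 * if the output buffer is too small.
 */
static size_t dump_node(uint8_t *buf, size_t len,
                        const struct perm *perms, uint32_t n)
{
    struct node_hdr hdr = { .n_perms = n };
    size_t off = sizeof(hdr);
    uint32_t i;

    if ( len < sizeof(hdr) + (size_t)n * sizeof(*perms) )
        return 0;

    memcpy(buf, &hdr, sizeof(hdr));

    for ( i = 0; i < n; i++ )
    {
        memcpy(buf + off, &perms[i], sizeof(perms[i]));
        off += sizeof(perms[i]);
    }

    return off;
}
```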

Fixes: ed6eebf17d2c ("tools/xenstore: dump the xenstore state for live update")
Signed-off-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>

arm/time,vtimer: Get rid of READ/WRITE_SYSREG32

AArch64 registers are 64bit whereas AArch32 registers
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs
which can correspond to uint64_t or uint32_t.
Even though many AArch64 registers have upper 32bit reserved
it does not mean that they can't be widened in the future.

Modify type of vtimer structure's member: ctl to register_t.

Add macro CNTFRQ_MASK containing mask for timer clock frequency
field of CNTFRQ_EL0 register.

Modify CNTx_CTL_* macros to return unsigned long instead of
unsigned int as ctl is now of type register_t.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>

arm/page: Get rid of READ/WRITE_SYSREG32

AArch64 registers are 64bit whereas AArch32 registers
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs
which can correspond to uint64_t or uint32_t.
Even though many AArch64 registers have upper 32bit reserved
it does not mean that they can't be widened in the future.

Modify accesses to CTR_EL0 to use READ/WRITE_SYSREG.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

xen/arm: Always access SCTLR_EL2 using READ/WRITE_SYSREG()

The Armv8 specification describes the system register as a 64-bit value
on AArch64 and 32-bit value on AArch32 (same as ARMv7).

Unfortunately, Xen is accessing the system registers using
READ/WRITE_SYSREG32() which means the top 32 bits are clobbered.

This is only a latent bug so far because Xen will not yet use the top
32 bits.

There is also no change in behavior because arch/arm/arm64/head.S will
initialize SCTLR_EL2 to a sane value with the top 32 bits zeroed.
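
The clobbering can be modelled in plain C. A variable stands in for the 64-bit system register, and the helper names below are hypothetical lower-case analogues of the Xen macros, not the macros themselves.

```c
#include <stdint.h>

typedef uint64_t reg_t; /* assumption: models register_t on a 64-bit build */

static uint64_t fake_sysreg; /* stands in for a 64-bit system register */

/* 32-bit accessors: reads truncate, writes zero the top half. */
static uint32_t read_sysreg32(void) { return (uint32_t)fake_sysreg; }
static void write_sysreg32(uint32_t v) { fake_sysreg = v; }

/* Full-width accessors preserve all 64 bits. */
static reg_t read_sysreg(void) { return fake_sysreg; }
static void write_sysreg(reg_t v) { fake_sysreg = v; }

/* Read-modify-write of bit 0 through each pair of accessors. */
static void set_bit0_32(void) { write_sysreg32(read_sysreg32() | 1u); }
static void set_bit0(void) { write_sysreg(read_sysreg() | 1u); }
```

With a value that has bit 32 set, set_bit0() keeps that bit while set_bit0_32() silently drops it.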

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>

arm/p2m: Get rid of READ/WRITE_SYSREG32

AArch64 registers are 64bit whereas AArch32 registers
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs
which can correspond to uint64_t or uint32_t.
Even though many AArch64 registers have upper 32bit reserved
it does not mean that they can't be widened in the future.

Modify type of vtcr to register_t.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

arm/gic: Get rid of READ/WRITE_SYSREG32

AArch64 registers are 64bit whereas AArch32 registers
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs
which can correspond to uint64_t or uint32_t.
Even though many AArch64 registers have upper 32bit reserved
it does not mean that they can't be widened in the future.

Modify types of following members of struct gic_v3 to register_t:
-vmcr
-sre_el1
-apr0
-apr1

Add new macro GICC_IAR_INTID_MASK containing the mask
for INTID field of ICC_IAR0/1_EL1 register as only the first 23-bits
of IAR contain the interrupt number. The rest are RES0.
Therefore, take the opportunity to mask the bits [23:31] as
they should not be used for an IRQ number (we don't know how the top bits
will be used).

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>

arm/gic: Remove member hcr of structure gic_v3

... as it is never used even in the patch introducing it.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>

arm: Modify type of actlr to register_t

AArch64 registers are 64bit whereas AArch32 registers
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs
which can correspond to uint64_t or uint32_t.
Even though many AArch64 registers have upper 32bit reserved
it does not mean that they can't be widened in the future.

ACTLR_EL1 system register bits are implementation defined
which means it is possibly a latent bug on current HW as the CPU
implementer may already have decided to use the top 32bit.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

arm/domain: Get rid of READ/WRITE_SYSREG32

AArch64 registers are 64bit whereas AArch32 registers
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs
which can correspond to uint64_t or uint32_t.
Even though many AArch64 registers have upper 32bit reserved
it does not mean that they can't be widened in the future.

Modify type of register cntkctl to register_t.

Thumbee registers are only usable by a 32-bit domain and therefore
we can just store the bottom 32-bit (IOW there is no type change).
In fact, this could technically be restricted to Armv7 HW (the
support was dropped retrospectively in Armv8) but leave it as-is
for now.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

arm64/vfp: Get rid of READ/WRITE_SYSREG32

AArch64 registers are 64bit whereas AArch32 registers
are 32bit or 64bit. MSR/MRS are expecting 64bit values thus
we should get rid of helpers READ/WRITE_SYSREG32
in favour of using READ/WRITE_SYSREG.
We should also use register_t type when reading sysregs
which can correspond to uint64_t or uint32_t.
Even though many AArch64 registers have upper 32bit reserved
it does not mean that they can't be widened in the future.

Modify type of FPCR, FPSR, FPEXC32_EL2 to register_t.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>

tools: add newlines to xenstored WRL_LOG

According to syslog(3) the fmt string does not need a newline.
The mini-os implementation of syslog requires the trailing newline.
Other calls to syslog already include the newline, so add it to WRL_LOG as well.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Reviewed-by: Juergen Gross <jgross@suse.com>

vtpm: Correct timeout units and command duration

Add two patches:
vtpm-microsecond-duration.patch fixes the units for timeouts and command
durations.
vtpm-command-duration.patch increases the timeout Linux uses to allow
commands to succeed.

Linux works around low timeouts, but not low durations. The second
patch allows commands to complete that often time out with the lower
command durations.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Check req_len before unpacking command

vtpm_handle_cmd doesn't ensure there is enough space before unpacking
the req buffer. Add a minimum size check. Called functions will have
to do their own checking if they need more data from the request.

The error case is tricky since abort_egress wants to reply with a
corresponding tag. Just hardcode TPM_TAG_RQU_COMMAND since the vtpm is
sending in malformed commands in the first place.
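
A minimal sketch of such a minimum size check, assuming the TPM 1.2 request header layout of a 2-byte tag, 4-byte size and 4-byte ordinal in big-endian order. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CMD_HDR_LEN 10u /* tag (2) + size (4) + ordinal (4) bytes */

struct cmd_hdr {
    uint16_t tag;
    uint32_t size;
    uint32_t ord;
};

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Returns 0 and fills *hdr, or -1 if the request cannot hold a header. */
static int unpack_hdr(const uint8_t *req, size_t req_len,
                      struct cmd_hdr *hdr)
{
    if ( req_len < CMD_HDR_LEN )
        return -1;

    hdr->tag = (uint16_t)((req[0] << 8) | req[1]);
    hdr->size = be32(req + 2);
    hdr->ord = be32(req + 6);

    return 0;
}
```

Anything beyond the header still has to be length-checked by whichever function consumes it.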

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Fix owner_auth & srk_auth parsing

Argument parsing only matches to before ':' and then the string with
leading ':' is passed to parse_auth_string which fails to parse.  Extend
the length to include the separator in the match.

While here, switch the separator to "=".  The man page documented "="
and the other tpm.* arguments already use "=".  Since it didn't work
before, we don't need to worry about backwards compatibility.
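
The matching idea can be sketched as follows, with a hypothetical match_opt() helper: the compared length covers the separator, so the returned value starts right after it.

```c
#include <string.h>

/*
 * key includes the trailing separator, e.g. "owner_auth=". Returns a
 * pointer to the value after the separator, or NULL if arg is some
 * other option.
 */
static const char *match_opt(const char *arg, const char *key)
{
    size_t n = strlen(key);

    return strncmp(arg, key, n) ? NULL : arg + n;
}
```

Matching only up to the character before the separator would instead hand the value parser a string that still starts with the separator.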

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Remove bogus cast from TPM2_GetRandom

The UINT32 <-> UINT16 casting in TPM2_GetRandom is incorrect. Use a
local UINT16 as needed for the TPM hardware command and assign the
result.
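
A sketch of the local-variable pattern, assuming a hypothetical hw_get_random() that takes a 16-bit in/out byte count. The code being replaced may have looked different; the point is only that a 32-bit variable is not accessed through a 16-bit pointer.

```c
#include <stdint.h>

/* Stand-in for the hardware command: returns at most 32 bytes. */
static void hw_get_random(uint16_t *count)
{
    if ( *count > 32 )
        *count = 32;
}

/* 32-bit interface built on the 16-bit command via a local variable. */
static void get_random(uint32_t *count)
{
    uint16_t n = *count > UINT16_MAX ? UINT16_MAX : (uint16_t)*count;

    hw_get_random(&n); /* not (uint16_t *)count */
    *count = n;
}
```

Passing (uint16_t *)count would update only half of the 32-bit value, and which half depends on endianness.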

Suggested-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Support GetRandom passthrough on TPM 2.0

GetRandom passthrough currently fails when using vtpmmgr with a hardware
TPM 2.0.
vtpmmgr (8): INFO[VTPM]: Passthrough: TPM_GetRandom
vtpm (12): vtpm_cmd.c:120: Error: TPM_GetRandom() failed with error code (30)

When running on TPM 2.0 hardware, vtpmmgr needs to convert the TPM 1.2
TPM_ORD_GetRandom into a TPM2 TPM_CC_GetRandom command. Besides the
differing ordinal, the TPM 1.2 uses 32bit sizes for the request and
response (vs. 16bit for TPM2).

Place the random output directly into the tpmcmd->resp and build the
packet around it. This avoids bouncing through an extra buffer, but the
header has to be written after grabbing the random bytes so we have the
number of bytes to include in the size.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Shutdown more gracefully

vtpmmgr uses the default, weak app_shutdown, which immediately calls the
shutdown hypercall. This short circuits the vtpmmgr clean up logic. We
need to perform the clean up to actually Flush our key out of the tpm.

Setting do_shutdown is one step in that direction, but vtpmmgr will most
likely be waiting in tpmback_req_any. We need to call shutdown_tpmback
to cancel the wait inside tpmback and perform the shutdown.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Flush all transient keys

We're only flushing 2 transients, but there are 3 handles.  Use <= to also
flush the third handle since TRANSIENT_LAST is inclusive.

The number of transient handles/keys is hardware dependent, so this
should query for the limit.  And assignment of handles is assumed to be
sequential from the minimum.  That may not be guaranteed, but seems okay
with my tpm2.
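
The off-by-one can be shown with illustrative handle values (three sequential handles are assumed here; the real limit is hardware dependent, as noted above).

```c
#include <stdint.h>

#define TRANSIENT_FIRST 0x80000000u
#define TRANSIENT_LAST  0x80000002u /* inclusive upper bound */

/* Counts the handles visited when the upper bound is treated as inclusive. */
static unsigned int count_flushed(void)
{
    unsigned int n = 0;
    uint32_t h;

    for ( h = TRANSIENT_FIRST; h <= TRANSIENT_LAST; h++ )
        n++; /* a real implementation would flush handle h here */

    return n;
}
```

With < in place of <= the loop would stop one handle early and visit only two.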

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Flush transient keys on shutdown

Remove our key so it isn't left in the TPM for someone to come along
after vtpmmgr shuts down.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Allow specifying srk_handle for TPM2

Bypass taking ownership of the TPM2 if an srk_handle is specified.

This srk_handle must be usable with Null auth for the time being.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Move vtpmmgr_shutdown

Reposition vtpmmgr_shutdown so it can call flush_tpm2 without a forward
declaration.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

stubdom: newlib: Enable C99 formats for %z

vtpmmgr was changed to print size_t with the %z modifier, but newlib
isn't compiled with %z support. So you get output like:

root seal: zu; sector of 13: zu
root: zu v=zu
itree: 36; sector of 112: zu
group: zu v=zu id=zu md=zu
group seal: zu; 5 in parent: zu; sector of 13: zu
vtpm: zu+zu; sector of 48: zu

Enable the C99 formats in newlib so vtpmmgr prints the numeric values.
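
For reference, this is the C99 behaviour the messages rely on, shown with a hypothetical helper on a libc that supports the z length modifier.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats two size_t values using the C99 z length modifier. */
static int format_root(char *buf, size_t len, size_t size, size_t version)
{
    return snprintf(buf, len, "root: %zu v=%zu", size, version);
}
```

A printf implementation without that support does not consume the arguments as intended, which matches the literal "zu" in the output above.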

Fixes: 9379af08ccc0 ("stubdom: vtpmmgr: Correctly format size_t with %z when printing.")
Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

vtpmmgr: Print error code to aid debugging

tpm_get_error_name returns "Unknown Error Code" when an error string
is not defined. In that case, we should print the Error Code so it can
be looked up offline. tpm_get_error_name returns a const string, so
just have the two callers always print the error code so it is always
available.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

docs: Warn about incomplete vtpmmgr TPM 2.0 support

The vtpmmgr TPM 2.0 support is incomplete. Add a warning about that to
the documentation so others don't have to work through discovering it is
broken.

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>

tools: fix incorrect suggestions for XENCONSOLED_TRACE on BSD

--log does not take a file, it specifies what is supposed to be logged.

Also separate the XENSTORED and XENCONSOLED variables by a newline.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Arm/optee: don't open-code xzalloc_flex_struct()

The current use of xzalloc_bytes() in optee is nearly an open-coded
version of xzalloc_flex_struct(), which was introduced after the driver
was merged.

The main difference is xzalloc_bytes() will also force the allocation to
be SMP_CACHE_BYTES aligned and therefore avoid sharing the cache line.
While sharing the cache line can have an impact on the performance, this
is also true for most of the other users of x*alloc(), x*alloc_array(),
and x*alloc_flex_struct(). So if we want to prevent sharing cache lines,
arranging for this should be done in the allocator itself.

In this case, we don't need stricter alignment than what the allocator
provides. Hence replace the call to xzalloc_bytes() with one of
xzalloc_flex_struct().
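
A userspace analogue of a zeroed flexible-struct allocation, using calloc() and a hypothetical struct. It is a sketch of the idea only and makes no claim about how xzalloc_flex_struct() is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct shm_buf {
    size_t page_cnt;
    uint64_t pages[]; /* flexible array member */
};

/* Zeroed allocation sized for the header plus nr trailing elements. */
static struct shm_buf *alloc_shm_buf(size_t nr)
{
    struct shm_buf *buf;

    /* Reject element counts whose total size would overflow size_t. */
    if ( nr > (SIZE_MAX - offsetof(struct shm_buf, pages)) /
              sizeof(buf->pages[0]) )
        return NULL;

    buf = calloc(1, offsetof(struct shm_buf, pages) +
                    nr * sizeof(buf->pages[0]));
    if ( buf )
        buf->page_cnt = nr;

    return buf;
}
```

The result carries only the allocator's default alignment, which is the behaviour the message above argues is sufficient here.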

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Julien Grall <julien@xen.org>
Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>

x86/vhpet: fix RTC special casing

Restore setting the virtual timer callback private data to NULL if the
timer is not level triggered. This fixes the special casing done in
pt_update_irq so that the RTC interrupt when originating from the HPET
is suspended if the interrupt source is masked.

Note the RTC special casing done in pt_update_irq should only apply to
the RTC interrupt originating from the emulated RTC device (which does
set the callback private data), as in that case the callback itself
will destroy the virtual timer if the interrupt is ignored.

While there also use RTC_IRQ instead of 8 when the HPET is configured
in LegacyReplacement Mode.

Fixes: be07023be115 ("x86/vhpet: add support for level triggered interrupts")
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>