xen/pdx: Standardize region validation wrt pdx compression
Regions must be occasionally validated for pdx compression validity. That
is, whether any of the machine addresses spanning the region have a bit set
in the pdx "hole" (which is expected to always contain zeroes). There are
a few such tests through the code, and they all check for different things.
This patch replaces all such occurrences with a call to a centralized
function that checks a region for validity.
Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Tue, 12 Sep 2023 21:31:43 +0000 (22:31 +0100)]
x86/pv: Fix the determiniation of whether to inject #DB
We long ago fixed the emulator to not inject exceptions behind our back.
Therefore, assert that that a PV event (including interrupts, because that
would be buggy too) isn't pending, rather than skipping the #DB injection if
one is.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Nicola reports that the XSA-438 fix introduced new MISRA violations because of
some incidental tidying it tried to do. The parameter is useless, so resolve
the MISRA regression by removing it.
hap_update_cr3() discards the parameter entirely, while sh_update_cr3() uses
it to distinguish internal and external callers and therefore whether the
paging lock should be taken.
However, we have paging_lock_recursive() for this purpose, which also avoids
the ability for the shadow internal callers to accidentally not hold the lock.
Fixes: fb0ff49fe9f7 ("x86/shadow: defer releasing of PV's top-level shadow reference") Reported-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
George Dunlap [Fri, 30 Jun 2023 11:06:32 +0000 (12:06 +0100)]
credit: Don't steal vcpus which have yielded
On large systems with many vcpus yielding due to spinlock priority
inversion, it's not uncommon for a vcpu to yield its timeslice, only
to be immediately stolen by another pcpu looking for higher-priority
work.
To prevent this:
* Keep the YIELD flag until a vcpu is removed from a runqueue
* When looking for work to steal, skip vcpus which have yielded
NB that this does mean that sometimes a VM is inserted into an empty
runqueue; handle that case.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Reviewed-by: Juergen Gross <jgross@suse.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
George Dunlap [Mon, 18 Sep 2023 15:46:47 +0000 (16:46 +0100)]
credit: Limit load balancing to once per millisecond
The credit scheduler tries as hard as it can to ensure that it always
runs scheduling units with positive credit (PRI_TS_UNDER) before
running those with negative credit (PRI_TS_OVER). If the next
runnable scheduling unit is of priority OVER, it will always run the
load balancer, which will scour the system looking for another
scheduling unit of the UNDER priority.
Unfortunately, as the number of cores on a system has grown, the cost
of the work-stealing algorithm has dramatically increased; a recent
trace on a system with 128 cores showed this taking over 50
microseconds.
Add a parameter, load_balance_ratelimit, to limit the frequency of
load balance operations on a given pcpu. Default this to 1
millisecond.
Invert the load balancing conditional to make it more clear, and line
up more closely with the comment above it.
Overall it might be cleaner to have the last_load_balance checking
happen inside csched_load_balance(), but that would require either
passing both now and spc into the function, or looking them up again;
both of which seemed to be worse than simply checking and setting the
values before calling it.
On a system with a vcpu:pcpu ratio of 2:1, running Windows guests
(which will end up calling YIELD during spinlock contention), this
patch increased performance significantly.
Signed-off-by: George Dunlap <george.dunlap@cloud.com> Reviewed-by: Juergen Gross <jgross@suse.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com>
Testing on a Kaby Lake box with 8 CPUs leads to the serial buffer
being filled halfway during dom0 boot, and thus a non-trivial chunk of
Linux boot messages are dropped.
Increasing the buffer to 32K does fix the issue and Linux boot
messages are no longer dropped. There's no justification either on
why 16K was chosen, and hence bumping to 32K in order to cope with
current systems generating output faster does seem appropriate to have
a better user experience with the provided defaults.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Sat, 16 Sep 2023 04:06:49 +0000 (12:06 +0800)]
xen/arm64: head.S: Fix wrong enable_boot_cpu_mm() code movement
Some addressed comments on enable_boot_cpu_mm() were reintroduced
back during the code movement from arm64/head.S to arm64/mmu/head.S.
We should drop the unreachable code, move the 'mov lr, x5' closer to
'b remove_identity_mapping' so it is clearer that it will return,
and update the in-code comment accordingly.
Fixes: 6734327d76be ("xen/arm64: Split and move MMU-specific head.S to mmu/head.S") Reported-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com> Reviewed-by: Ayan Kumar Halder <ayan.kumar.halder@amd.com>
tools/light: Revoke permissions when a PCI detach for HVM domain
Currently, libxl will grant IOMEM, I/O port and IRQ permissions when
a PCI is attached (see pci_add_dm_done()) for all domain types. However,
the permissions are only revoked for non-HVM domain (see do_pci_remove()).
This means that HVM domains will be left with extra permissions. While
this look bad on the paper, the IRQ permissions should be revoked
when the Device Model call xc_physdev_unmap_pirq() and such domain
cannot directly mapped I/O port and IOMEM regions. Instead, this has to
be done by a Device Model.
The Device Model can only run in dom0 or PV stubdomain (upstream libxl
doesn't have support for HVM/PVH stubdomain).
For PV/PVH stubdomain, the permission are properly revoked, so there is
no security concern.
This leaves dom0. There are two cases:
1) Privileged: Anyone gaining access to the Device Model would already
have large control on the host.
2) Deprivileged: PCI passthrough require PHYSDEV operations which
are not accessible when the Device Model is restricted.
So overall, it is believed that the extra permissions cannot be exploited.
Rework the code so the permissions are all removed for HVM domains.
This needs to happen after the QEMU has detached the device. So
the revocation is now moved to pci_remove_detached().
Also add a comment on top of the error message when the PIRQ cannot
be unbind to explain this could be a spurious error as QEMU may have
already done it.
Signed-off-by: Julien Grall <jgrall@amazon.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
README: Remove old note about the build system's python expectation
Changesets 5852ca485263 (build: fix tools/configure in case only python3
exists) and c8a8645f1efe ("xen/build: Automatically locate a suitable python
interpreter") cause the build to look for python3, python and python2 in that
order.
Remove the outdated note from the README.
Signed-off-by: Javi Merino <javi.merino@cloud.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
tools: Don't use distutils in configure or Makefile
Python distutils is deprecated and is going to be removed in Python
3.12. distutils.sysconfig is available as the sysconfig module in
stdlib since Python 2.7 and Python 3.2, so use that directly.
Update the README to reflect that we now depend on Python 2.7.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Javi Merino <javi.merino@cloud.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
[Regen ./configure] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
tools/python: convert setup.py to use setuptools if available
Python distutils is deprecated and is going to be removed in Python
3.12. Add support for setuptools.
Setuptools in Python 3.11 complains:
SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
Keep using setup.py anyway to build the C extension.
Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Signed-off-by: Javi Merino <javi.merino@cloud.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
automation: Add python3's setuptools to some containers
In preparation of supporting both distutils and setuptools, add the
python3 setuptools module to the containers that have recent python3
installations.
Debian Stretch, Ubuntu trusty (14.04), Ubuntu xenial (16.04) and
Ubuntu bionic (18.04) are kept without setuptools on purpose, to test
installations that don't have it.
Centos 7 in particular is kept with python2 only.
Signed-off-by: Javi Merino <javi.merino@cloud.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Wed, 20 Sep 2023 09:31:42 +0000 (10:31 +0100)]
x86/shadow: defer releasing of PV's top-level shadow reference
sh_set_toplevel_shadow() re-pinning the top-level shadow we may be
running on is not enough (and at the same time unnecessary when the
shadow isn't what we're running on): That shadow becomes eligible for
blowing away (from e.g. shadow_prealloc()) immediately after the
paging lock was dropped. Yet it needs to remain valid until the actual
page table switch occurred.
Propagate up the call chain the shadow entry that needs releasing
eventually, and carry out the release immediately after switching page
tables. Handle update_cr3() failures by switching to idle pagetables.
Note that various further uses of update_cr3() are HVM-only or only act
on paused vCPU-s, in which case sh_set_toplevel_shadow() will not defer
releasing of the reference.
While changing the update_cr3() hook, also convert the "do_locking"
parameter to boolean.
This is CVE-2023-34322 / XSA-438.
Reported-by: Tim Deegan <tim@xen.org> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@cloud.com>
Andrew Cooper [Tue, 19 Sep 2023 10:23:34 +0000 (11:23 +0100)]
x86/entry: Partially revert IST-exit checks
The patch adding check_ist_exit() didn't account for the fact that
reset_stack_and_jump() is not an ABI-preserving boundary. The IST-ness in
%r12 doesn't survive into the next context, and is a stale value C.
There's no straightforward way to reconstruct the IST-exit-ness on the
exit-to-guest path after a context switch. For now, we only need IST-exit on
the return-to-Xen path.
Fixes: 21bdc25b05a0 ("x86/entry: Track the IST-ness of an entry for the exit paths") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Add stub function and symbol definitions required by common code. If the
file that the definition is supposed to be located in doesn't already
exist yet, temporarily place its definition in the new stubs.c
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Simon Gaiser [Tue, 19 Sep 2023 09:02:13 +0000 (11:02 +0200)]
x86/ACPI: Fix logging of MADT entries
The recent change to ignore MADT entries with invalid APIC IDs also
affected logging of MADT entries. That's not desired [1] [2], so restore
the old behavior.
Andrew Cooper [Wed, 30 Aug 2023 19:24:25 +0000 (20:24 +0100)]
x86/spec-ctrl: Mitigate the Zen1 DIV leakage
In the Zen1 microarchitecure, there is one divider in the pipeline which
services uops from both threads. In the case of #DE, the latched result from
the previous DIV to execute will be forwarded speculatively.
This is an interesting covert channel that allows two threads to communicate
without any system calls. In also allows userspace to obtain the result of
the most recent DIV instruction executed (even speculatively) in the core,
which can be from a higher privilege context.
Scrub the result from the divider by executing a non-faulting divide. This
needs performing on the exit-to-guest paths, and ist_exit-to-Xen.
Alternatives in IST context is believed safe now that it's done in NMI
context.
This is XSA-439 / CVE-2023-20588.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Fri, 15 Sep 2023 11:13:51 +0000 (12:13 +0100)]
x86/amd: Introduce is_zen{1,2}_uarch() predicates
We already have 3 cases using STIBP as a Zen1/2 heuristic, and are about to
introduce a 4th. Wrap the heuristic into a pair of predicates rather than
opencoding it, and the explanation of the heuristic, at each usage site.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 13 Sep 2023 12:53:33 +0000 (13:53 +0100)]
x86/spec-ctrl: Issue VERW during IST exit to Xen
There is a corner case where e.g. an NMI hitting an exit-to-guest path after
SPEC_CTRL_EXIT_TO_* would have run the entire NMI handler *after* the VERW
flush to scrub potentially sensitive data from uarch buffers.
In order to compensate, issue VERW when exiting to Xen from an IST entry.
SPEC_CTRL_EXIT_TO_XEN already has two reads of spec_ctrl_flags off the stack,
and we're about to add a third. Load the field into %ebx, and list the
register as clobbered.
%r12 has been arranged to be the ist_exit signal, so add this as an input
dependency and use it to identify when to issue a VERW.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 13 Sep 2023 11:20:12 +0000 (12:20 +0100)]
x86/entry: Track the IST-ness of an entry for the exit paths
Use %r12 to hold an ist_exit boolean. This register is zero elsewhere in the
entry/exit asm, so it only needs setting in the IST path.
As this is subtle and fragile, add check_ist_exit() to be used in debugging
builds to cross-check that the ist_exit boolean matches the entry vector.
Write check_ist_exit() it in C, because it's debug only and the logic more
complicated than I care to maintain in asm.
For now, we only need to use this signal in the exit-to-Xen path, but some
exit-to-guest paths happen in IST context too. Check the correctness in all
exit paths to avoid the logic bit-rotting.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 12 Sep 2023 16:03:16 +0000 (17:03 +0100)]
x86/spec-ctrl: Fold DO_SPEC_CTRL_EXIT_TO_XEN into it's single user
With the SPEC_CTRL_EXIT_TO_XEN{,_IST} confusion fixed, it's now obvious that
there's only a single EXIT_TO_XEN path. Fold DO_SPEC_CTRL_EXIT_TO_XEN into
SPEC_CTRL_EXIT_TO_XEN to simplify further fixes.
When merging labels, switch the name to .L\@_skip_sc_msr as "skip" on its own
is going to be too generic shortly.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 12 Sep 2023 14:06:49 +0000 (15:06 +0100)]
x86/spec-ctrl: Fix confusion between SPEC_CTRL_EXIT_TO_XEN{,_IST}
c/s 3fffaf9c13e9 ("x86/entry: Avoid using alternatives in NMI/#MC paths")
dropped the only user, leaving behind the (incorrect) implication that Xen had
split exit paths.
Delete the unused SPEC_CTRL_EXIT_TO_XEN and rename SPEC_CTRL_EXIT_TO_XEN_IST
to SPEC_CTRL_EXIT_TO_XEN for consistency.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Implement bitops.h, based on Linux's implementation as of commit 5321d1b1afb9a17302c6cec79f0cbf823eb0d3fc. Though it is based off of
Linux's implementation, this code diverges significantly in a number of
ways:
- Bitmap entries changed to 32-bit words to match X86 and Arm on Xen
- PPC32-specific code paths dropped
- Formatting completely re-done to more closely line up with Xen.
Including 4 space indentation.
- Use GCC's __builtin_popcount* for hweight* implementation
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Acked-by: Jan Beulich <jbeulich@suse.com>
OpenBSD 7.3 will unconditionally access HWCR if the TSC is reported as
Invariant, and it will then attempt to also unconditionally access PSTATE0 if
HWCR.TscFreqSel is set (currently the case on Xen).
The motivation for exposing HWCR.TscFreqSel was to avoid warning messages from
Linux. It has been agreed that Linux should be changed instead to not
complaint about missing HWCR.TscFreqSel when running virtualized.
The relation between HWCR.TscFreqSel and PSTATE0 is not clearly written down in
the PPR, but it's natural for OSes to attempt to fetch the P0 frequency if the
TSC is stated to increment at the P0 frequency.
Exposing PSTATEn (PSTATE0 at least) with all zeroes is not a suitable solution
because the PstateEn bit is read-write, and OSes could legitimately attempt to
set PstateEn=1 which Xen couldn't handle.
Furthermore, the TscFreqSel bit is model specific and was never safe to expose
like this in the first place. At a minimum it should have had a toolstack
adjustment to know not to migrate such a VM.
Therefore, simply remove the bit. Note the HWCR itself is an architectural
register, and does need to be accessible by the guest. Since HWCR contains
both architectural and non-architectural bits, going forward care must be taken
to assert the exposed value is correct on newer CPU families.
Reported-by: Solène Rapenne <solene@openbsd.org> Link: https://github.com/QubesOS/qubes-issues/issues/8502 Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 18 Sep 2023 13:06:59 +0000 (15:06 +0200)]
timer: fix NR_CPUS=1 build with gcc13
Gcc13 apparently infers from "if ( old_cpu < new_cpu )" that "new_cpu"
is >= 1, and then (on x86) complains about "per_cpu(timers, new_cpu)"
exceeding __per_cpu_offset[]'s bounds (being an array of 1 in such a
configuration). Make the code conditional upon there being at least 2
CPUs configured (otherwise there simply is nothing to migrate [to]).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@cloud.com>
Michal Orzel [Tue, 12 Sep 2023 10:53:41 +0000 (12:53 +0200)]
xen/arm: Skip Xen specific nodes/properties from hwdom /chosen node
Skip the following Xen specific host device tree nodes/properties
from being included into hardware domain /chosen node:
- xen,static-heap: this property informs Xen about memory regions
reserved exclusively as static heap,
- xen,domain-shared-memory-v1: node with this compatible informs Xen
about static shared memory region for a domain. Xen exposes a different
node (under /reserved-memory with compatible "xen,shared-memory-v1") to
let domain know about the shared region,
- xen,evtchn-v1: node with this compatible informs Xen about static
event channel configuration for a domain. Xen does not expose
information about static event channels to domUs and dom0 case was
overlooked (by default nodes from host dt are copied to dom0 fdt unless
explicitly marked to be skipped), since the author's idea was not to
expose it (refer docs/misc/arm/device-tree/booting.txt, "Static Event
Channel"). Even if we wanted to expose the static event channel
information, the current node is in the wrong format (i.e. contains
phandle to domU node not visible by dom0). Lastly, this feature is
marked as tech-preview and there is no Linux dt binding in place.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Release-acked-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Julien Grall <jgrall@amazon.com>
Implement atomic.h for PPC, based off of the original Xen 3.2
implementation. This implementation depends on some functions that are
not yet defined (notably __cmpxchg), so it can't yet be used.
Signed-off-by: Shawn Anastasio <sanastasio@raptorengineering.com> Acked-by: Jan Beulich <jbeulich@suse.com>
x86/efi: address violations of MISRA C:2012 Rule 7.2
The xen sources contains violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".
Addi the 'U' suffix to integers literals with unsigned type.
x86/mcheck: address violations of MISRA C:2012 Rule 7.2
The xen sources contains violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".
Add the 'U' suffix to integers literals with unsigned type.
For the sake of uniformity, the following change is made:
- add the 'U' suffix to all first macro's arguments
xen/lib: address violations of MISRA C:2012 Rule 7.2
The xen sources contains violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".
Add the 'U' suffix to integers literals with unsigned type.
For the sake of uniformity, the following change is made:
- add the 'U' suffix to switch cases in 'cpuid.c'
Use pdev->sbdf instead of the PCI_SBDF macro in calls to pci_* functions
where appropriate. Move NULL check earlier.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Convert pci_find_*cap* functions and call sites to pci_sbdf_t, and remove some
now unused local variables. Also change to more appropriate types on lines that
are already being modified as a result of the pci_sbdf_t conversion.
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
x86/hvm: address violations of MISRA C:2012 Rule 7.3
The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".
Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.
xen/ioreq: address violations of MISRA C:2012 Rule 7.3
The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".
Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.
Michal Orzel [Thu, 24 Aug 2023 09:06:40 +0000 (11:06 +0200)]
xen/arm: Handle empty grant table region in find_unallocated_memory()
When creating dom0 with grant table support disabled in Xen and no IOMMU,
the following assert is triggered (debug build):
"Assertion 's <= e' failed at common/rangeset.c:189"
This is because find_unallocated_memory() (used to find memory holes
for extended regions) calls rangeset_remove_range() for an empty
grant table region. Fix it by checking if the size of region is not 0.
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
x86/viridian: address violations of MISRA C:2012 Rule 7.2
The xen sources contains violations of MISRA C:2012 Rule 7.2 whose
headline states:
"A 'u' or 'U' suffix shall be applied to all integer constants
that are represented in an unsigned type".
Add the 'U' suffix to integers literals with unsigned type and also to other
literals used in the same contexts or near violations, when their positive
nature is immediately clear. The latter changes are done for the sake of
uniformity.
x86/viridian: address violations of MISRA C:2012 Rule 7.3
The xen sources contain violations of MISRA C:2012 Rule 7.3 whose headline
states:
"The lowercase character 'l' shall not be used in a literal suffix".
Use the "L" suffix instead of the "l" suffix, to avoid potential ambiguity.
If the "u" suffix is used near "L", use the "U" suffix instead, for consistency.
Andrew Cooper [Thu, 31 May 2018 15:16:37 +0000 (16:16 +0100)]
x86: Fix calculation of %dr6/dr7 reserved bits
RTM debugging and BusLock Detect have both introduced conditional behaviour
into the %dr6/7 calculations which Xen's existing logic doesn't account for.
Introduce the CPUID bit for BusLock Detect, so we can get the %dr6 behaviour
correct from the outset.
Implement x86_adj_dr{6,7}_rsvd() fully, and use them in place of the plain
bitmasks.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jinoh Kang <jinoh.kang.kr@gmail.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 29 Aug 2023 11:01:49 +0000 (12:01 +0100)]
x86: Introduce new debug.c for debug register infrastructure
Broken out of the subsequent patch for clarity.
Add stub x86_adj_dr{6,7}_rsvd() functions which will be extended in the
following patch to fix bugs, and adjust debugreg.h to compile with a more
minimal set of includes.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Tue, 29 Aug 2023 10:16:11 +0000 (11:16 +0100)]
x86: Reject bad %dr6/%dr7 values when loading guest state
Right now, bad PV state is silently dropped and zeroed, while bad HVM state is
passed directly to hardware and can trigger VMEntry/VMRUN failures. e.g.
Furthermore, prior to c/s 30f43f4aa81e ("x86: Reorganise and rename debug
register fields in struct vcpu") in Xen 4.11 where v->arch.dr6 was reduced in
width, the toolstack can cause a host crash by loading a bad %dr6 value on
VT-x hardware.
Reject any %dr6/7 values with upper bits set. For PV guests, also audit
%dr0..3 using the same logic as in set_debugreg() so they aren't silently
zeroed later in the function. Leave a comment behind explaing how %dr4/5
handling changed, and why they're ignored now.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 11 Sep 2023 15:30:34 +0000 (17:30 +0200)]
include: make domain_page.h's stubs properly use type-unsafe MFN <-> virt helpers
The first of the commits referenced below didn't go far enough, and the
2nd of them, while trying to close (some of) the gap, wrongly kept using
the potentially type-safe variant. This is getting in the way of new
ports preferably not having any type-unsafe private code (and in
particular not having a need for any overrides in newly introduced
files).
Fixes: 41c48004d1d8 ("xen/mm: Use __virt_to_mfn in map_domain_page instead of virt_to_mfn") Fixes: f46b6197344f ("xen: Convert page_to_mfn and mfn_to_page to use typesafe MFN") Reported-by: Shawn Anastasio <sanastasio@raptorengineering.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
coverage: simplify the logic of choosing the number of gcov counters depending on the gcc version
The current structure of choosing the correct file based on the
compiler version makes us make 33 line files just to define a
constant. The changes after gcc 4.7 are minimal, just the number of
counters.
Fold the changes in gcc_4_9.c, gcc_5.c and gcc_7.c into gcc_4_7.c to
remove a lot of the boilerplate and keep the logic of choosing the
GCOV_COUNTER in gcc_4_7.c.
Signed-off-by: Javi Merino <javi.merino@cloud.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Javi Merino <javi.merino@cloud.com> Acked-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
xen: move arm/include/asm/vm_event.h to asm-generic
asm/vm_event.h is common for ARM and RISC-V so it will be moved to
asm-generic dir.
Original asm/vm_event.h from ARM was updated:
* use SPDX-License-Identifier.
* update comment messages of stubs.
* update #ifdef
* instead of "include <public/domctl.h>" -> "public/vm_event.h"
As vm_event.h was moved to asm-generic then it is needed to create
Makefile in arm/include/asm/ and add generated-y += vm_event.h to
it.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Jan Beulich [Mon, 11 Sep 2023 07:34:12 +0000 (09:34 +0200)]
MAINTAINERS: generalize vm-event/monitor entry
Replace Arm- and x86-specific lines with wildcard ones, thus covering
all architectures. Uniformly permit an extra sub-directory level to be
used, as is already the case for xen/include/.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Henry Wang [Mon, 28 Aug 2023 01:32:16 +0000 (09:32 +0800)]
xen/arm64: Fold setup_fixmap() to create_page_tables()
The original assembly setup_fixmap() is actually doing two seperate
tasks, one is enabling the early UART when earlyprintk on, and the
other is to set up the fixmap (even when earlyprintk is off).
Per discussion in [1], since commit 9d267c049d92 ("xen/arm64: Rework the memory layout"), there is no
chance that the fixmap and the mapping of early UART will clash with
the 1:1 mapping. Therefore the mapping of both the fixmap and the
early UART can be moved to the end of create_pagetables().
Wei Chen [Mon, 28 Aug 2023 01:32:15 +0000 (09:32 +0800)]
xen/arm: Move MMU related definitions from config.h to mmu/layout.h
Xen defines some global configuration macros for Arm in config.h.
However there are some address layout related definitions that are
defined for MMU systems only, and these definitions could not be
used by MPU systems. Adding ifdefs to differentiate the MPU from MMU
layout will result in a messy and hard-to-read/maintain code.
So move all memory layout definitions to a new file, i.e. mmu/layout.h
to avoid spreading "#ifdef" everywhere.
Signed-off-by: Wei Chen <wei.chen@arm.com> Signed-off-by: Penny Zheng <penny.zheng@arm.com> Signed-off-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Mon, 28 Aug 2023 01:32:14 +0000 (09:32 +0800)]
xen/arm64: Split and move MMU-specific head.S to mmu/head.S
The MMU specific code in head.S will not be used on MPU systems.
Instead of introducing more #ifdefs which will bring complexity
to the code, move MMU related code to mmu/head.S and keep common
code in head.S. Two notes while moving:
- As "fail" in original head.S is very simple and this name is too
easy to be conflicted, duplicate it in mmu/head.S instead of
exporting it.
- Use ENTRY() for enable_secondary_cpu_mm, enable_boot_cpu_mm and
setup_fixmap as they will be used externally.
Also move the assembly macros shared by head.S and mmu/head.S to
macros.h.
Note that, only the first 4KB of Xen image will be mapped as
identity (PA == VA). At the moment, Xen guarantees this by having
everything that needs to be used in the identity mapping in
.text.header section of head.S, and the size will be checked by
_idmap_start and _idmap_end at link time if this fits in 4KB.
Since we are introducing a new head.S in this patch, although
we can add .text.header to the new file to guarantee all identity
map code still in the first 4KB. However, the order of these two
files on this 4KB depends on the build toolchains. Hence, introduce
a new section named .text.idmap in the region between _idmap_start
and _idmap_end. And in Xen linker script, we force the .text.idmap
contents to linked after .text.header. This will ensure code of
head.S always be at the top of Xen binary.
Signed-off-by: Henry Wang <Henry.Wang@arm.com> Signed-off-by: Wei Chen <wei.chen@arm.com> Reviewed-by: Julien Grall <jgrall@amazon.com>
Henry Wang [Mon, 28 Aug 2023 01:32:13 +0000 (09:32 +0800)]
xen/arm: Introduce CONFIG_MMU Kconfig option
There are two types of memory system architectures available for
Arm-based systems, namely the Virtual Memory System Architecture (VMSA)
and the Protected Memory System Architecture (PMSA). According to
ARM DDI 0487G.a, A VMSA provides a Memory Management Unit (MMU) that
controls address translation, access permissions, and memory attribute
determination and checking, for memory accesses made by the PE. And
refer to ARM DDI 0600A.c, the PMSA supports a unified memory protection
scheme where an Memory Protection Unit (MPU) manages instruction and
data access. Currently, Xen only supports VMSA.
Introduce a Kconfig option CONFIG_MMU, which is currently default
set to y and unselectable because currently only VMSA is supported.
CONFIG_MMU will be used in follow-up patches.
Suggested-by: Julien Grall <jgrall@amazon.com> Signed-off-by: Henry Wang <Henry.Wang@arm.com> Acked-by: Julien Grall <jgrall@amazon.com>
At the moment, on MMU system, enable_mmu() will return to an
address in the 1:1 mapping, then each path is responsible to
switch to virtual runtime mapping. Then remove_identity_mapping()
is called on the boot CPU to remove all 1:1 mapping.
Since remove_identity_mapping() is not necessary on Non-MMU system,
and we also avoid creating empty function for Non-MMU system, trying
to keep only one codeflow in arm64/head.S, we move path switch and
remove_identity_mapping() in enable_mmu() on MMU system.
As the remove_identity_mapping should only be called for the boot
CPU only, so we introduce enable_boot_cpu_mm() for boot CPU and
enable_secondary_cpu_mm() for secondary CPUs in this patch.
xen/arm: ioreq: add header for 'handle_ioserv' and 'try_fwd_ioserv'
The functions referenced by this patch should have had a compatible
declaration visible prior to their definition. This is achieved by
including the arch-specific header in 'xen/arch/arm/ioreq.c'
Fixes: cb9953d2f2bc ("arm/ioreq: Introduce arch specific bits for IOREQ/DM features") Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Since QEMU's PowerNV support has matured to the point where it is
now suitable for development, drop support for booting on the
paravirtualized pseries machine type and its associated interfaces.
Support for booting on pseries was broken by 74b725a64d80 ('xen/ppc:
Implement initial Radix MMU support'), and since there is little
practical value in continuing to support pseries as a target, just drop
support for it entirely.
automation: Switch ppc64le tests to PowerNV machine type
Run ppc64le tests with the PowerNV machine type (bare metal) instead of
the paravirtualized pseries machine. This requires a more modern version
of QEMU than is present in debian bullseye's repository, so update the
dockerfile to build QEMU from source.
Support for booting on pseries was broken by 74b725a64d80 ('xen/ppc:
Implement initial Radix MMU support') which resulted in CI failures. In
preparation for removing pseries support entirely, switch the CI
infrastructure to the PowerNV machine type.
xen: apply deviation for Rule 8.4 (asm-only definitions)
As stated in 'docs/misra/rules.rst' the functions that are used only by
asm modules do not need to conform to MISRA C:2012 Rule 8.4.
The deviations are carried out with a SAF comment.
Jan Beulich [Thu, 7 Sep 2023 07:22:40 +0000 (09:22 +0200)]
Arm: constrain {,u}int64_aligned_t in public header
For using a GNU extension, it may not be exposed in general, just like
is done on x86 (except that here we need to also work around not all of
the tool stack actually defining __XEN_TOOLS__). External consumers (not
using gcc or a compatible compiler) need to make this type available up
front (just like we expect {,u}int<N>_t to be supplied) - unlike on x86
the type is actually needed outside of tools-only interfaces, because
guest handle definitions use it.
While there also add underscores around "aligned".
Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Michal Orzel [Wed, 6 Sep 2023 10:30:14 +0000 (12:30 +0200)]
xen/arm: Fix printk specifiers and arguments in iomem_remove_cb()
When building Xen for arm32 with CONFIG_DTB_OVERLAY, the following
error is printed:
common/dt-overlay.c: In function ‘iomem_remove_cb’:
././include/xen/config.h:55:24: error: format ‘%llx’ expects argument of type ‘long long unsigned int’, but argument 3 has type ‘long unsigned int’ [-Werror=format=]
Function parameters s and e (denoting MMIO region) are of type unsigned
long and indicate frame numbers and not addresses. This also means that
the arguments passed to printk() are incorrect (using PAGE_ALIGN() or
PAGE_MASK ANDed with a frame number results in unwanted output). Fix it.
Take the opportunity to switch to %pd specifier to print domain id in
a consolidated way.
FFA_RXTX_MAP is currently limited to mapping only one 4k page for each
RX and TX buffer. If a guest tries to map more than one page, an error
is returned. Until this patch, we have been using FFA_RET_NOT_SUPPORTED.
However, that error code is reserved in the FF-A specification to report
that the function is not implemented. Of all the other defined error
codes, the least bad is FFA_RET_INVALID_PARAMETERS, so use that instead.
Michal Orzel [Wed, 6 Sep 2023 12:56:09 +0000 (14:56 +0200)]
tools/xl: Guard main_dt_overlay() with LIBXL_HAVE_DT_OVERLAY
main_dt_overlay() makes a call to libxl_dt_overlay() which is for now
only compiled for Arm. This causes the build failure as reported by
gitlab CI and OSSTEST. Fix it by guarding the function, prototype and
entry in cmd_table[] using LIBXL_HAVE_DT_OVERLAY. This has an advantage
over regular Arm guard so that the code will not need to be modified again
if other architecture gain support for this feature.
Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support") Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
Michal Orzel [Wed, 6 Sep 2023 12:54:48 +0000 (14:54 +0200)]
xen: Change parameter of generic_{fls,ffs}() to unsigned int
When running with SMMUv3 and UBSAN enabled on arm64, there are a lot of
warnings printed related to shifting into sign bit in generic_fls()
as it takes parameter of type int.
Example:
(XEN) UBSAN: Undefined behaviour in ./include/xen/bitops.h:69:11
(XEN) left shift of 134217728 by 4 places cannot be represented in type 'int'
It does not make a lot of sense to ask for the last set bit of a negative
value. We don't have a direct user of this helper and all the wrappers
pass value of type unsigned {int,long}.
Linux did the same as part of commit: 3fc2579e6f16 ("fls: change parameter to unsigned int")
To keep consistency between the helpers, take the opportunity to:
- replace __inline__ with inline,
- modify generic_ffs() to take parameter of type unsigned int as well
(currently no user and the only wrapper generic_ffsl() passes unsigned
long).
Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com>
xen/arm: Implement device tree node addition functionalities
Update sysctl XEN_SYSCTL_dt_overlay to enable support for dtbo nodes addition
using device tree overlay.
xl dt-overlay add file.dtbo:
Each time overlay nodes are added using .dtbo, a new fdt(memcpy of
device_tree_flattened) is created and updated with overlay nodes. This
updated fdt is further unflattened to a dt_host_new. Next, it checks if any
of the overlay nodes already exists in the dt_host. If overlay nodes doesn't
exist then find the overlay nodes in dt_host_new, find the overlay node's
parent in dt_host and add the nodes as child under their parent in the
dt_host. The node is attached as the last node under target parent.
Finally, add IRQs, add device to IOMMUs, set permissions and map MMIO for the
overlay node.
When a node is added using overlay, a new entry is allocated in the
overlay_track to keep the track of memory allocation due to addition of overlay
node. This is helpful for freeing the memory allocated when a device tree node
is removed.
The main purpose of this to address first part of dynamic programming i.e.
making xen aware of new device tree node which means updating the dt_host with
overlay node information. Here we are adding/removing node from dt_host, and
checking/setting IOMMU and IRQ permission but never mapping them to any domain.
Right now, mapping/Un-mapping will happen only when a new domU is
created/destroyed using "xl create".
xen/arm: Implement device tree node removal functionalities
Introduce sysctl XEN_SYSCTL_dt_overlay to remove device-tree nodes added using
device tree overlay.
xl dt-overlay remove file.dtbo:
Removes all the nodes in a given dtbo.
First, removes IRQ permissions and MMIO accesses. Next, it finds the nodes
in dt_host and delete the device node entries from dt_host.
The nodes get removed only if it is not used by any of dom0 or domio.
Also, added overlay_track struct to keep the track of added node through device
tree overlay. overlay_track has dt_host_new which is unflattened form of updated
fdt and name of overlay nodes. When a node is removed, we also free the memory
used by overlay_track for the particular overlay node.
Nested overlay removal is supported in sequential manner only i.e. if
overlay_child nests under overlay_parent, it is assumed that user first removes
overlay_child and then removes overlay_parent.
Also, this is an experimental feature so it is expected from user to make sure
correct device tree overlays are used when adding nodes and making sure devices
are not being used by other domain before removing them from Xen tree.
Partially added/removed i.e. failures while removing the overlay may cause other
failures and might need a system reboot.
arm/asm/setup.h: Update struct map_range_data to add rangeset.
Add rangesets for IRQs and IOMEMs. This was done to accommodate dynamic overlay
node addition/removal operations. With overlay operations, new IRQs and IOMEMs
are added in dt_host and routed. While removing overlay nodes, nodes are removed
from dt_host and their IRQs and IOMEMs routing is also removed. Storing IRQs and
IOMEMs in the rangeset will avoid re-parsing the device tree nodes to get the
IOMEM and IRQ ranges for overlay remove ops.
Dynamic overlay node add/remove will be introduced in follow-up patches.
Dynamic programming ops will modify the dt_host and there might be other
functions which are browsing the dt_host at the same time. To avoid the race
conditions, adding rwlock for browsing the dt_host during runtime. dt_host
writer will be added in the follow-up patch for device tree overlay
functionalities.
Reason behind adding rwlock instead of spinlock:
For now, dynamic programming is the sole modifier of dt_host in Xen during
run time. All other access functions like iommu_release_dt_device() are
just reading the dt_host during run-time. So, there is a need to protect
others from browsing the dt_host while dynamic programming is modifying
it. rwlock is better suitable for this task as spinlock won't be able to
differentiate between read and write access.
asm/smp.h: Fix circular dependency for device_tree.h and rwlock.h
Dynamic programming ops will modify the dt_host and there might be other
function which are browsing the dt_host at the same time. To avoid the race
conditions, we will need to add a rwlock to protect access to the dt_host.
However, adding rwlock in device_tree.h causes following circular dependency:
device_tree.h->rwlock.h->smp.h->asm/smp.h->device_tree.h
To fix this, removed the "#include <xen/device_tree.h> and forward declared
"struct dt_device_node".
Signed-off-by: Vikram Garhwal <vikram.garhwal@amd.com> Reviewed-by: Henry Wang <Henry.Wang@arm.com> Reviewed-by: Michal Orzel <michal.orzel@amd.com> Acked-by: Julien Grall <jgrall@amazon.com>
xen/smmu: Add remove_device callback for smmu_iommu ops
Add remove_device callback for removing the device entry from smmu-master using
following steps:
1. Find if SMMU master exists for the device node.
2. Check if device is currently in use.
3. Remove the SMMU master.
xen/iommu: protect iommu_add_dt_device() with dtdevs_lock
Protect iommu_add_dt_device() with dtdevs_lock to prevent concurrent access
to add/remove/assign/deassign.
With addition of dynamic programming feature(follow-up patches in this series),
this function can be concurrently accessed by dynamic node add/remove using
device tree overlays.