xenbits.xensource.com Git - xen.git/log
xen.git
11 months agox86/intel: move vmce_has_lmce() routine to header
Sergiy Kibrik [Wed, 29 May 2024 07:54:22 +0000 (09:54 +0200)]
x86/intel: move vmce_has_lmce() routine to header

Moving this function out of mce_intel.c will make it possible to disable
build of Intel MCE code later on, because the function gets called from
common x86 code.

Also replace boilerplate code that checks for MCG_LMCE_P flag with
vmce_has_lmce(), which might contribute to readability a bit.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/svm: Rework VMCB_ACCESSORS() to use a plain type name
Andrew Cooper [Tue, 28 May 2024 15:29:11 +0000 (16:29 +0100)]
x86/svm: Rework VMCB_ACCESSORS() to use a plain type name

This avoids having a function call in a typeof() expression.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Release-Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/x86: Address two misc MISRA 17.7 violations
Andrew Cooper [Tue, 21 May 2024 15:22:08 +0000 (16:22 +0100)]
xen/x86: Address two misc MISRA 17.7 violations

Neither text_poke() nor watchdog_setup() has its return value consulted.
Switch them to being void.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/x86: Drop useless non-Kconfig CONFIG_* variables
Andrew Cooper [Tue, 21 May 2024 17:07:09 +0000 (18:07 +0100)]
xen/x86: Drop useless non-Kconfig CONFIG_* variables

These are all either completely unused, or do nothing useful.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/lzo: Implement COPY{4,8} using memcpy()
Andrew Cooper [Tue, 21 May 2024 16:08:32 +0000 (17:08 +0100)]
xen/lzo: Implement COPY{4,8} using memcpy()

This is simpler and easier for both humans and compilers to read.

It also addresses 6 instances of MISRA R5.3 violation (shadowing of the ptr_
local variable inside both {put,get}_unaligned()).

No change, not even in the compiled binary.
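
A minimal sketch of the approach, not the actual lzo source: fixed-size memcpy() calls handle unaligned buffers safely and are normally lowered by the compiler to plain loads and stores. The helper `copy_example()` is hypothetical and exists only to show the macros in use.

```c
#include <assert.h>
#include <string.h>

/* Copy 4 or 8 bytes between possibly unaligned locations. */
#define COPY4(dst, src) memcpy((dst), (src), 4)
#define COPY8(dst, src) memcpy((dst), (src), 8)

/* Hypothetical caller using deliberately odd offsets. */
static void copy_example(unsigned char *out, const unsigned char *in)
{
    assert(out != NULL && in != NULL);
    COPY4(out + 1, in + 3); /* in[3..6]  -> out[1..4]  */
    COPY8(out + 5, in + 7); /* in[7..14] -> out[5..12] */
}
```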

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agox86/traps: address violation of MISRA C Rule 8.4
Nicola Vetrini [Tue, 28 May 2024 06:52:27 +0000 (08:52 +0200)]
x86/traps: address violation of MISRA C Rule 8.4

Rule 8.4 states: "A compatible declaration shall be visible when
an object or function with external linkage is defined".

The function do_general_protection is either used in asm code
or only within this unit, so there is no risk of this getting
out of sync with its definition, but the function must remain
extern.

Therefore, this function is deviated using a comment-based deviation.
No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoCHANGELOG: Mention libxl blktap/tapback support
Jason Andryuk [Tue, 28 May 2024 06:52:15 +0000 (08:52 +0200)]
CHANGELOG: Mention libxl blktap/tapback support

Add entry for backendtype=tap support in libxl.  blktap needs some
changes to work with libxl, which haven't been merged.  They are
available from this PR: https://github.com/xapi-project/blktap/pull/394

Signed-off-by: Jason Andryuk <jason.andryuk@amd.com>
Acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoautomation/eclair_analysis: avoid an ECLAIR warning about escaping
Nicola Vetrini [Mon, 27 May 2024 14:53:17 +0000 (16:53 +0200)]
automation/eclair_analysis: avoid an ECLAIR warning about escaping

The parentheses in this regular expression should be doubly
escaped because they undergo expansion twice.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
[stefano: fix commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agodocs/misra: exclude gdbsx from MISRA compliance
Nicola Vetrini [Mon, 27 May 2024 14:53:16 +0000 (16:53 +0200)]
docs/misra: exclude gdbsx from MISRA compliance

These files are used when debugging Xen, and are not meant to comply
with MISRA rules at the moment.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoautomation/eclair_analysis: add already clean rules to the analysis
Nicola Vetrini [Tue, 21 May 2024 19:34:21 +0000 (21:34 +0200)]
automation/eclair_analysis: add already clean rules to the analysis

Some MISRA C rules already have no violations in Xen, so they can be
set as clean.

Reorder the rules in tagging.ecl according to version ordering
(i.e. sort -V) and split the configuration on multiple lines for
readability.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoautomation/eclair_analysis: set MISRA C Rule 10.2 as clean
Nicola Vetrini [Fri, 17 May 2024 10:27:10 +0000 (12:27 +0200)]
automation/eclair_analysis: set MISRA C Rule 10.2 as clean

This rule has no more violations in the codebase, so it can be
set as clean.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agodocs: Add device tree overlay documentation
Vikram Garhwal [Thu, 23 May 2024 07:40:40 +0000 (15:40 +0800)]
docs: Add device tree overlay documentation

Signed-off-by: Vikram Garhwal <fnu.vikram@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
11 months agotools: Introduce the "xl dt-overlay attach" command
Henry Wang [Thu, 23 May 2024 07:40:39 +0000 (15:40 +0800)]
tools: Introduce the "xl dt-overlay attach" command

With the XEN_DOMCTL_dt_overlay DOMCTL added, users should be able to
attach devices from the provided DT overlay to domains (and, in the
future, detach them). Support this by introducing a new "xl dt-overlay"
command and related documentation, i.e. "xl dt-overlay attach". Slightly
rework the command option parsing logic.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains
Henry Wang [Thu, 23 May 2024 07:40:36 +0000 (15:40 +0800)]
xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains

In order to support dynamic dtbo device assignment to a running
VM, the add/remove of the DT overlay and the attach/detach of the
device from the DT overlay should happen separately. Therefore,
repurpose the existing XEN_SYSCTL_dt_overlay to only add the DT
overlay to the Xen device tree, instead of also assigning the device
to the hardware domain at the same time. Changing the sysctl behavior
and breaking compatibility is acceptable because this feature is
experimental.

Add XEN_DOMCTL_dt_overlay with the operation
XEN_DOMCTL_DT_OVERLAY_ATTACH to assign the device to the domain.

The hypervisor first checks that the DT overlay passed from the
toolstack is valid. Then the device nodes are retrieved from the overlay
tracker based on the DT overlay. Attaching the device is implemented by
mapping the IRQ and IOMMU resources. All devices in the overlay are
assigned to a single domain.

Also take the opportunity to make one coding style fix in sysctl.h.
Introduce DT_OVERLAY_MAX_SIZE and use it to avoid repetitions of
KB(500).

xen,reg is to be used to handle non-1:1 mappings but it is currently
unsupported. For now, return errors for domains that are not 1:1 mapped.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Vikram Garhwal <fnu.vikram@xilinx.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
11 months agoxen/arm/gic: Allow adding interrupt to running VMs
Henry Wang [Thu, 23 May 2024 07:40:35 +0000 (15:40 +0800)]
xen/arm/gic: Allow adding interrupt to running VMs

Currently, adding physical interrupts is only allowed at
domain creation time. For use cases such as dynamic device
tree overlay addition, adding physical IRQs to
running domains should be allowed.

Drop the above-mentioned domain creation check. Since this
would introduce interrupt state desynchronization when the
interrupt is active or pending in the guest, simply reject the
operation in those cases. Do it for both new and old
vGIC implementations.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
11 months agotools/arm: Introduce the "nr_spis" xl config entry
Henry Wang [Thu, 23 May 2024 07:40:34 +0000 (15:40 +0800)]
tools/arm: Introduce the "nr_spis" xl config entry

Currently, the number of SPIs allocated to the domain is only
configurable for Dom0less DomUs. Xen domains are supposed to be
platform agnostic and therefore the number of SPIs for libxl
guests should not be based on the hardware.

Introduce a new xl config entry for Arm to provide a method for
the user to decide the number of SPIs. This helps to avoid
bumping `config->arch.nr_spis` in libxl every time there is a
new platform with increased SPI numbers.

Update the doc and the golang bindings accordingly.

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
11 months agoxen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs
Henry Wang [Thu, 23 May 2024 07:40:33 +0000 (15:40 +0800)]
xen/arm, doc: Add a DT property to specify IOMMU for Dom0less domUs

There are some use cases in which the dom0less domUs need to have
the XEN_DOMCTL_CDF_iommu set at the domain construction time. For
example, the dynamic dtbo feature allows the domain to be assigned
a device that is behind the IOMMU at runtime. For these use cases,
we need to have a way to specify the domain will need the IOMMU
mapping at domain construction time.

Introduce a "passthrough" DT property for Dom0less DomUs following
the same entry as the xl.cfg. Currently only provide two options,
i.e. "enable" and "disable". Set the XEN_DOMCTL_CDF_iommu at domain
construction time based on the property.
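
The mapping from the property string to the creation flag could be sketched as below. This is an assumption-laden illustration: `parse_passthrough()` and `SKETCH_CDF_IOMMU` are hypothetical names, the flag value is a placeholder rather than the real XEN_DOMCTL_CDF_iommu bit, and the accepted strings are spelled exactly as in the text above.

```c
#include <assert.h>
#include <string.h>

/* Placeholder bit; the real flag is defined in Xen's public headers. */
#define SKETCH_CDF_IOMMU (1u << 0)

/* Returns 0 and updates *flags, or -1 for an unrecognised value. */
static int parse_passthrough(const char *val, unsigned int *flags)
{
    assert(val != NULL && flags != NULL);

    if (strcmp(val, "enable") == 0)
        *flags |= SKETCH_CDF_IOMMU;
    else if (strcmp(val, "disable") == 0)
        *flags &= ~SKETCH_CDF_IOMMU;
    else
        return -1;

    return 0;
}
```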

Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
11 months agotools/xl: Correct the help information and exit code of the dt-overlay command
Henry Wang [Thu, 23 May 2024 07:40:32 +0000 (15:40 +0800)]
tools/xl: Correct the help information and exit code of the dt-overlay command

Fix the name mismatch in the xl dt-overlay command: the
command name should be "dt-overlay" instead of "dt_overlay".
Add the missing "," in the cmdtable.

Fix the exit code of the dt-overlay command, use EXIT_FAILURE
instead of ERROR_FAIL.

Fixes: 61765a07e3d8 ("tools/xl: Add new xl command overlay for device tree overlay support")
Suggested-by: Anthony PERARD <anthony@xenproject.org>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agotools/xenalyze: Ignore HVM_EMUL events harder
George Dunlap [Fri, 26 Apr 2024 13:17:33 +0000 (14:17 +0100)]
tools/xenalyze: Ignore HVM_EMUL events harder

To unify certain common sanity checks, checks are done very early in
processing based only on the top-level type.

Unfortunately, when TRC_HVM_EMUL was introduced, it broke some of the
assumptions about how the top-level types worked.  Namely, traces of
this type will show up outside of HVM contexts: in idle domains and in
PV domains.

Make an explicit exception for TRC_HVM_EMUL types in a number of places:

 - Pass the record info pointer to toplevel_assert_check, so that it
   can exclude TRC_HVM_EMUL records from idle and vcpu data_mode
   checks

 - Don't attempt to set the vcpu data_type in hvm_process for
   TRC_HVM_EMUL records.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agox86/hvm/trace: Use a different trace type for AMD processors
George Dunlap [Thu, 25 Apr 2024 12:03:58 +0000 (13:03 +0100)]
x86/hvm/trace: Use a different trace type for AMD processors

A long-standing usability sub-optimality with xenalyze is the
necessity to specify `--svm-mode` when analyzing AMD processors.  This
fundamentally comes about because the same trace event ID is used for
both VMX and SVM, but the contents of the trace must be interpreted
differently.

Instead, allocate separate trace events for VMX and SVM vmexits in
Xen; this will allow all readers to properly interpret the meaning of
the vmexit reason.

In xenalyze, first remove the redundant call to init_hvm_data();
there's no way to get to hvm_vmexit_process() without it being already
initialized by the set_vcpu_type call in hvm_process().

Replace this with set_hvm_exit_reason_data(), and move setting of
hvm->exit_reason_* into that function.

Modify hvm_process and hvm_vmexit_process to handle all four potential
values appropriately.

If SVM entries are encountered, set opt.svm_mode so that other
SVM-specific functionality is triggered.

Remove the `--svm-mode` command-line option, since it's now redundant.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agoxen/arm: Set correct per-cpu cpu_core_mask
Henry Wang [Thu, 21 Mar 2024 03:57:06 +0000 (11:57 +0800)]
xen/arm: Set correct per-cpu cpu_core_mask

In the common sysctl command XEN_SYSCTL_physinfo, the value of
cores_per_socket is calculated based on the cpu_core_mask of CPU0.
Currently on Arm this is a fixed value 1 (can be checked via xl info),
which is not correct. This is because during the Arm CPU online
process at boot time, setup_cpu_sibling_map() only sets the per-cpu
cpu_core_mask for itself.

cores_per_socket refers to the number of cores that belong to the same
socket (NUMA node). Currently Xen on Arm supports neither physical
CPU hotplug nor NUMA, and we assume there is no multithreading. Therefore
cores_per_socket means all possible CPUs detected from the device
tree. Set the per-cpu cpu_core_mask in setup_cpu_sibling_map()
accordingly. Modify the in-code comment, which seems to be outdated. Add
a warning to users if Xen is running on processors with multithread
support.
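
The effect can be sketched with plain bitmasks. Everything here is hypothetical (the 8-CPU limit, the array and the function names); it only illustrates why a mask containing just the CPU itself yields 1, while a mask of all possible CPUs yields the expected count.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8

/* Hypothetical per-CPU core masks, one bit per CPU. */
static uint32_t core_mask[SKETCH_NR_CPUS];

/* Give every possible CPU a core mask covering all possible CPUs. */
static void setup_core_masks(uint32_t possible)
{
    assert(possible & 1u); /* the boot CPU must be present */

    for (unsigned int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        if (possible & (UINT32_C(1) << cpu))
            core_mask[cpu] = possible;
}

/* Derived from CPU0's mask, as described above. */
static unsigned int cores_per_socket(void)
{
    unsigned int n = 0;

    for (uint32_t m = core_mask[0]; m; m &= m - 1)
        n++; /* clear the lowest set bit each iteration */

    return n;
}
```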

Signed-off-by: Henry Wang <Henry.Wang@arm.com>
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Michal Orzel <michal.orzel@amd.com>
11 months agotools/xentrace: Remove xentrace_format
George Dunlap [Fri, 26 Apr 2024 14:18:25 +0000 (15:18 +0100)]
tools/xentrace: Remove xentrace_format

xentrace_format was always of limited utility, since trace records
across pcpus were processed out of order; it was superseded by xenalyze
over a decade ago.

But for several releases, the `formats` file it has depended on for
proper operation has not even been included in `make install` (which
generally means it doesn't get picked up by distros either); yet
nobody has seemed to complain.

Simply remove xentrace_format, and point people to xenalyze instead.

NB that there is no man page for xenalyze, so the "see also" on the
xentrace man page is simply removed for now.

Signed-off-by: George Dunlap <george.dunlap@cloud.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Olaf Hering <olaf@aepfle.de>
11 months agotools: Drop libsystemd as a dependency
Andrew Cooper [Thu, 25 Apr 2024 09:46:40 +0000 (10:46 +0100)]
tools: Drop libsystemd as a dependency

There are no more users, and we want to dissuade people from introducing new
users just for sd_notify() and friends.  Drop the dependency.

We still want the overall --with{,out}-systemd to gate the generation of the
service/unit/mount/etc files.

Rerun autogen.sh, and mark the dependency as removed in the build containers.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
11 months agotools/{c,o}xenstored: Don't link against libsystemd
Andrew Cooper [Thu, 25 Apr 2024 09:26:58 +0000 (10:26 +0100)]
tools/{c,o}xenstored: Don't link against libsystemd

Use the local freestanding wrapper instead.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
11 months agotools: Import stand-alone sd_notify() implementation from systemd
Andrew Cooper [Thu, 16 May 2024 17:59:00 +0000 (18:59 +0100)]
tools: Import stand-alone sd_notify() implementation from systemd

... in order to avoid linking against the whole of libsystemd.

Only minimal changes to the upstream copy, to function as a drop-in
replacement for sd_notify() and as a header-only library.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
11 months agoLICENSES: Add MIT-0 (MIT No Attribution)
Andrew Cooper [Thu, 16 May 2024 17:50:26 +0000 (18:50 +0100)]
LICENSES: Add MIT-0 (MIT No Attribution)

We are about to import code licensed under MIT-0.  It's compatible for us to
use, so identify it as a permitted license.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Christian Lindig <christian.lindig@cloud.com>
11 months agoxen/arm: mem_access: Conditionally compile mem_access.c
Alessandro Zucchelli [Fri, 10 May 2024 12:32:11 +0000 (14:32 +0200)]
xen/arm: mem_access: Conditionally compile mem_access.c

Commit 634cfc8beb ("Make MEM_ACCESS configurable") intended to make
MEM_ACCESS configurable on Arm to reduce the code size when the user
doesn't need it.

However, this didn't cover the arch-specific code. None of the code
in arm/mem_access.c is necessary when MEM_ACCESS=n, so it can be
compiled out. This requires providing stubs for the functions
called by the common code.

Signed-off-by: Alessandro Zucchelli <alessandro.zucchelli@bugseng.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agovpci: add initial support for virtual PCI bus topology
Oleksandr Andrushchenko [Thu, 23 May 2024 08:18:47 +0000 (10:18 +0200)]
vpci: add initial support for virtual PCI bus topology

Assign SBDF to the PCI devices being passed through with bus 0.
The resulting topology is where PCIe devices reside on the bus 0 of the
root complex itself (embedded endpoints).
This implementation is limited to 32 devices which are allowed on
a single PCI bus.

Please note that at the moment only function 0 of a multifunction
device can be passed through.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agovpci/header: emulate PCI_COMMAND register for guests
Oleksandr Andrushchenko [Thu, 23 May 2024 08:18:04 +0000 (10:18 +0200)]
vpci/header: emulate PCI_COMMAND register for guests

Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
guest's (domU) view of this will want to be zero (for now), the host
having set it to 1 should be preserved, or else we'd effectively be
giving the domU control of the bit. Thus, PCI_COMMAND register needs
proper emulation in order to honor host's settings.

According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
Device Control" the reset state of the command register is typically 0,
so when assigning a PCI device use 0 as the initial state for the
guest's (domU) view of the command register.

Here is the full list of command register bits with notes about
PCI/PCIe specification, and how Xen handles the bit. QEMU's behavior is
also documented here since that is our current reference implementation
for PCI passthrough.

PCI_COMMAND_IO (bit 0)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if an I/O BAR is exposed to the guest.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP for now since we
    don't yet support I/O BARs for domUs.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_MEMORY (bit 1)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU sets this bit to 1 in
    hardware if a Memory BAR is exposed to the guest.
  Xen domU/dom0: We handle writes to this bit by mapping/unmapping BAR
    regions.
  Xen domU: For devices assigned to DomUs, memory decoding will be
    disabled at the time of initialization.

PCI_COMMAND_MASTER (bit 2)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_SPECIAL (bit 3)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_INVALIDATE (bit 4)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_VGA_PALETTE (bit 5)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: Pass through writes to hardware.
  Xen domU/dom0: Pass through writes to hardware.

PCI_COMMAND_PARITY (bit 6)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_WAIT (bit 7)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: hardwire to 0
  QEMU: res_mask
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_SERR (bit 8)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_FAST_BACK (bit 9)
  PCIe 6.1: RO, hardwire to 0
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware.
  Xen domU: (rsvdp_mask) We treat this bit as RsvdP.
  Xen dom0: We allow dom0 to control this bit freely.

PCI_COMMAND_INTX_DISABLE (bit 10)
  PCIe 6.1: RW
  PCI LB 3.0: RW
  QEMU: (emu_mask) QEMU provides an emulated view of this bit. Guest
    writes do not propagate to hardware. QEMU checks if INTx was mapped
    for a device. If it is not, then guest can't control
    PCI_COMMAND_INTX_DISABLE bit.
  Xen domU: We prohibit a guest from enabling INTx if MSI(X) is enabled.
  Xen dom0: We allow dom0 to control this bit freely.

Bits 11-15
  PCIe 6.1: RsvdP
  PCI LB 3.0: Reserved
  QEMU: res_mask
  Xen domU: rsvdp_mask
  Xen dom0: We allow dom0 to control these bits freely.
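
The RsvdP handling for domUs listed above can be sketched as a pair of mask operations. This is an illustration under assumptions, not the vPCI code: the function and mask names are hypothetical, and the INTx/MSI(X) rule and the BAR mapping side effects of the memory decoding bit are deliberately left out. The bit values are those of the PCI command register.

```c
#include <assert.h>
#include <stdint.h>

/* Command register bits, values as in the PCI specification. */
#define PCI_COMMAND_IO         0x0001
#define PCI_COMMAND_MEMORY     0x0002
#define PCI_COMMAND_MASTER     0x0004
#define PCI_COMMAND_PARITY     0x0040
#define PCI_COMMAND_WAIT       0x0080
#define PCI_COMMAND_SERR       0x0100
#define PCI_COMMAND_FAST_BACK  0x0200

/* Hypothetical name: bits the list above marks as rsvdp_mask for a domU. */
#define SKETCH_DOMU_RSVDP_MASK                                   \
    (PCI_COMMAND_IO | PCI_COMMAND_PARITY | PCI_COMMAND_WAIT |    \
     PCI_COMMAND_SERR | PCI_COMMAND_FAST_BACK | 0xf800)

static_assert((SKETCH_DOMU_RSVDP_MASK &
               (PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER)) == 0,
              "memory decode and bus master stay guest-writable");

/* Value to program into hardware: the host's setting for RsvdP bits,
 * the guest's request for everything else. */
static uint16_t domu_cmd_to_hw(uint16_t hw_cur, uint16_t guest_val)
{
    return (uint16_t)((hw_cur & SKETCH_DOMU_RSVDP_MASK) |
                      (guest_val & ~SKETCH_DOMU_RSVDP_MASK));
}

/* What the guest reads back: RsvdP bits appear as zero. */
static uint16_t domu_cmd_view(uint16_t guest_val)
{
    return (uint16_t)(guest_val & ~SKETCH_DOMU_RSVDP_MASK);
}
```

With the host having set SERR, a guest write of only the bus master bit leaves SERR set in hardware, while the guest continues to read SERR as zero.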

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agoarm/vpci: honor access size when returning an error
Volodymyr Babchuk [Thu, 23 May 2024 08:17:30 +0000 (10:17 +0200)]
arm/vpci: honor access size when returning an error

A guest can try to read config space using different access sizes: 8,
16, 32 or 64 bits. We need to take this into account when returning
an error to the MMIO handler; otherwise it is possible to
provide more data than requested, e.g. the guest issues an LDRB
instruction to read one byte, but we write 0xFFFFFFFFFFFFFFFF to the
target register.
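
One way to express this, as a sketch with a hypothetical helper name: build the all-ones error value from the access width instead of always returning 64 set bits.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones value truncated to the access width, given in bytes. */
static uint64_t invalid_read_value(unsigned int size_bytes)
{
    assert(size_bytes == 1 || size_bytes == 2 ||
           size_bytes == 4 || size_bytes == 8);

    /* The shift count is 56, 48, 32 or 0, so it is always well defined. */
    return UINT64_MAX >> (64 - 8 * size_bytes);
}
```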

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Signed-off-by: Stewart Hildebrand <stewart.hildebrand@amd.com>
Acked-by: Julien Grall <jgrall@amazon.com>
11 months agox86: detect PIT aliasing on ports other than 0x4[0-3]
Jan Beulich [Thu, 23 May 2024 08:16:52 +0000 (10:16 +0200)]
x86: detect PIT aliasing on ports other than 0x4[0-3]

... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to PIT. Unlike for CMOS/RTC, do detection
pretty early, to avoid disturbing normal operation later on (even if
typically we won't use much of the PIT).

Like for CMOS/RTC a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects (beyond such that PIT
reads have anyway) in case it does not alias the PIT's.

As to the port 0x61 accesses: Unlike other accesses we do, this masks
off the top four bits (in addition to the bottom two ones), following
Intel chipset documentation saying that these (read-only) bits should
only be written with zero.
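
The masking described for port 0x61 can be sketched as a pure function; the actual port I/O is omitted and the helper name is hypothetical. Only bits 2 and 3 of the value read back are kept, and the caller supplies the desired bottom two bits.

```c
#include <assert.h>
#include <stdint.h>

/* Compute the byte to write back to port 0x61: drop the top four
 * (read-only) bits and the bottom two bits of the current value,
 * then merge in the requested bottom two bits. */
static uint8_t port61_write_value(uint8_t cur, uint8_t low_bits)
{
    assert((low_bits & ~0x03) == 0);
    return (uint8_t)((cur & 0x0c) | low_bits);
}
```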

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
11 months agox86/PIT: supply and use #define-s
Jan Beulich [Thu, 23 May 2024 08:16:07 +0000 (10:16 +0200)]
x86/PIT: supply and use #define-s

Help reading of code programming the PIT by introducing constants for
control word, read back and latch commands, as well as status.
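
For illustration, constants of this kind might be laid out as follows. The names are hypothetical and are not the ones the patch introduces; the bit positions follow the 8254 control word and read-back command formats.

```c
#include <assert.h>
#include <stdint.h>

/* Control word: channel in bits 7:6, access in 5:4, mode in 3:1. */
#define PIT_SKETCH_CHANNEL(n)     ((n) << 6)
#define PIT_SKETCH_RW_LATCH       (0 << 4) /* counter latch command */
#define PIT_SKETCH_RW_LSB_MSB     (3 << 4) /* low byte, then high byte */
#define PIT_SKETCH_MODE(n)        ((n) << 1)

/* Read-back command: bits 7:6 set, active-low latch selects. */
#define PIT_SKETCH_READBACK       (3 << 6)
#define PIT_SKETCH_RDB_NO_COUNT   (1 << 5) /* do not latch the count */
#define PIT_SKETCH_RDB_NO_STATUS  (1 << 4) /* do not latch the status */
#define PIT_SKETCH_RDB_CHANNEL(n) (1 << ((n) + 1))

/* Channel 0, low/high byte access, mode 2 is the familiar 0x34. */
static_assert((PIT_SKETCH_CHANNEL(0) | PIT_SKETCH_RW_LSB_MSB |
               PIT_SKETCH_MODE(2)) == 0x34, "control word layout");

/* Build a binary-counting control word for the given channel and mode. */
static uint8_t pit_control_word(unsigned int channel, unsigned int mode)
{
    assert(channel < 3 && mode < 6);
    return (uint8_t)(PIT_SKETCH_CHANNEL(channel) | PIT_SKETCH_RW_LSB_MSB |
                     PIT_SKETCH_MODE(mode));
}
```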

Requested-by: Jason Andryuk <jason.andryuk@amd.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
11 months agoxen/riscv: add required things to current.h
Oleksii Kurochko [Fri, 17 May 2024 13:54:58 +0000 (15:54 +0200)]
xen/riscv: add required things to current.h

Add the minimal required things to be able to build full Xen.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoxen/riscv: introduce atomic.h
Oleksii Kurochko [Fri, 17 May 2024 13:54:55 +0000 (15:54 +0200)]
xen/riscv: introduce atomic.h

Initially the patch was introduced by Bobby, who took the header from
the Linux kernel.

The following changes were done on top of Bobby's changes:
 - atomic##prefix##_*xchg_*(atomic##prefix##_t *v, c_t n) were updated
   to use __*xchg_generic()
 - drop casts in write_atomic() as they are unnecessary
 - drop introduction of WRITE_ONCE() and READ_ONCE().
   Xen provides ACCESS_ONCE()
 - remove zero-length array access in read_atomic()
 - drop defines similar to pattern:
   #define atomic_add_return_relaxed   atomic_add_return_relaxed
 - move functions that are not RISC-V specific to asm-generic/atomics-ops.h
 - drop atomic##prefix##_{cmp}xchg_{release, acquire, relaxed}() as they
   are not used in Xen.
 - update the definition of atomic##prefix##_{cmp}xchg according to
   {cmp}xchg() implementation in Xen.
 - some ATOMIC_OP() macros were updated:
   - drop size argument for ATOMIC_OP which defines atomic##prefix##_xchg()
     and atomic##prefix##_cmpxchg().
   - drop c_op argument for ATOMIC_OPS which defines ATOMIC_OPS(and, and),
     ATOMIC_OPS( or,  or), ATOMIC_OPS(xor, xor), ATOMIC_OPS(add, add, +),
     ATOMIC_OPS(sub, add, -) as c_op is always "+" for them.
   - drop "" from definition of __atomic_{acquire/release}_fence.

The current implementation is the same as 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V could combine acquire and release into the SC
instructions, which could remove a fence instruction and give better
performance. Here is the related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:

 - .aq:   The LR/SC sequence can be given acquire semantics by
          setting the aq bit on the LR instruction.
 - .rl:   The LR/SC sequence can be given release semantics by
          setting the rl bit on the SC instruction.
 - .aqrl: Setting the aq bit on the LR instruction, and setting
          both the aq and the rl bit on the SC instruction makes
          the LR/SC sequence sequentially consistent, meaning that
          it cannot be reordered with earlier or later memory
          operations from the same hart.

 Software should not set the rl bit on an LR instruction unless
 the aq bit is also set, nor should software set the aq bit on an
 SC instruction unless the rl bit is also set. LR.rl and SC.aq
 instructions are not guaranteed to provide any stronger ordering
 than those with both bits clear, but may result in lower
 performance.

Also, the way of transforming ".rl + full barrier" to ".aqrl" was approved
by the author of the RVWMO spec [2].

[1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/
[2] https://lore.kernel.org/linux-riscv/41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com/

Signed-off-by: Bobby Eshleman <bobbyeshleman@gmail.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoxen/riscv: introduce cmpxchg.h
Oleksii Kurochko [Fri, 17 May 2024 13:54:54 +0000 (15:54 +0200)]
xen/riscv: introduce cmpxchg.h

The header was taken from Linux kernel 6.4.0-rc1.

Additionally, the following changes were made:
* add emulation of {cmp}xchg for 1/2 byte types using 32-bit atomic
  access.
* replace tabs with spaces
* replace __* variable with *__
* introduce generic version of xchg_* and cmpxchg_*.
* drop {cmp}xchg{release,relaxed,acquire} as Xen doesn't use them
* drop barriers and use instruction suffixes instead ( .aq, .rl, .aqrl )

The implementation of the 4- and 8-byte cases was updated according to the spec:
```
              ....
Linux Construct         RVWMO AMO Mapping
    ...
atomic <op>             amo<op>.{w|d}.aqrl
Linux Construct         RVWMO LR/SC Mapping
    ...
atomic <op>             loop: lr.{w|d}.aq; <op>; sc.{w|d}.aqrl; bnez loop

Table A.5: Mappings from Linux memory primitives to RISC-V primitives

```

The current implementation is the same as 8e86f0b409a4
("arm64: atomics: fix use of acquire + release for full barrier
semantics") [1].
RISC-V could combine acquire and release into the SC
instructions, which could remove a fence instruction and give better
performance. Here is the related description from RISC-V ISA 10.2
Load-Reserved/Store-Conditional Instructions:

 - .aq:   The LR/SC sequence can be given acquire semantics by
          setting the aq bit on the LR instruction.
 - .rl:   The LR/SC sequence can be given release semantics by
          setting the rl bit on the SC instruction.
 - .aqrl: Setting the aq bit on the LR instruction, and setting
          both the aq and the rl bit on the SC instruction makes
          the LR/SC sequence sequentially consistent, meaning that
          it cannot be reordered with earlier or later memory
          operations from the same hart.

 Software should not set the rl bit on an LR instruction unless
 the aq bit is also set, nor should software set the aq bit on an
 SC instruction unless the rl bit is also set. LR.rl and SC.aq
 instructions are not guaranteed to provide any stronger ordering
 than those with both bits clear, but may result in lower
 performance.

Also, the way of transforming ".rl + full barrier" to ".aqrl" was approved
by the author of the RVWMO spec [2].

[1] https://patchwork.kernel.org/project/linux-arm-kernel/patch/1391516953-14541-1-git-send-email-will.deacon@arm.com/
[2] https://lore.kernel.org/linux-riscv/41e01514-74ca-84f2-f5cc-2645c444fd8e@nvidia.com/

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoxen/x86: Simplify header dependencies in x86/hvm
Alejandro Vallejo [Thu, 23 May 2024 08:07:31 +0000 (10:07 +0200)]
xen/x86: Simplify header dependencies in x86/hvm

Otherwise it's not possible to call functions described in hvm/vlapic.h from the
inline functions of hvm/hvm.h.

This is because a static inline in vlapic.h depends on hvm.h, and pulls it
transitively through vpt.h. The ultimate cause is having hvm.h included in any
of the "v*.h" headers, so break the cycle by moving the guilty inline into hvm.h.

No functional change.

Signed-off-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 months agoiommu/x86: print RMRR/IVMD ranges using full addresses
Roger Pau Monné [Thu, 23 May 2024 08:03:33 +0000 (10:03 +0200)]
iommu/x86: print RMRR/IVMD ranges using full addresses

It's easier to correlate with the physical memory map if the addresses are
fully printed, instead of using frame numbers.
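
To illustrate the difference, the sketch below turns an inclusive range of
frame numbers into full byte addresses. The helper name, the output format
and the 4 KiB page size are assumptions made for this example, not the code
or the message format used by Xen.

```c
#include <inttypes.h>
#include <stdio.h>

#define SKETCH_PAGE_SHIFT 12 /* Assumption: 4 KiB pages. */

/*
 * Hypothetical helper: format the inclusive frame range
 * [first_pfn, last_pfn] as full byte addresses.
 */
static int sketch_format_range(uint64_t first_pfn, uint64_t last_pfn,
                               char *buf, size_t len)
{
    uint64_t start = first_pfn << SKETCH_PAGE_SHIFT;
    /* Last byte of the last frame in the range. */
    uint64_t end = ((last_pfn + 1) << SKETCH_PAGE_SHIFT) - 1;

    return snprintf(buf, len, "[%#" PRIx64 ", %#" PRIx64 "]", start, end);
}
```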

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 months agoxen/livepatch: make .livepatch.funcs read-only for in-tree tests
Roger Pau Monné [Thu, 23 May 2024 08:03:14 +0000 (10:03 +0200)]
xen/livepatch: make .livepatch.funcs read-only for in-tree tests

This matches the flags of the .livepatch.funcs section when generated using
livepatch-build-tools, which only sets the SHF_ALLOC flag.

Also constify the definitions of the livepatch_func variables in the tests
themselves, in order to better match the resulting output.  Note that just
making those variables constant is not enough to force the generated sections
to be read-only.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Ross Lagerwall <ross.lagerwall@citrix.com>
11 months agox86_64/cpu_idle: address violations of MISRA C Rule 20.7
Nicola Vetrini [Tue, 21 May 2024 14:01:17 +0000 (16:01 +0200)]
x86_64/cpu_idle: address violations of MISRA C Rule 20.7

MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.
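
A minimal sketch of the kind of problem the rule guards against, using
hypothetical macros that are not taken from the Xen tree:

```c
/* Hypothetical macros, for illustration only. */

/* Non-compliant: the parameter is expanded without its own parentheses. */
#define SKETCH_DOUBLE_BAD(x)  (x * 2)

/* Compliant with Rule 20.7: the expanded parameter is parenthesised. */
#define SKETCH_DOUBLE_OK(x)   ((x) * 2)

/* With the argument 1 + 2, BAD expands to (1 + 2 * 2). */
static inline int sketch_bad(void)
{
    return SKETCH_DOUBLE_BAD(1 + 2);
}

/* With the argument 1 + 2, OK expands to ((1 + 2) * 2). */
static inline int sketch_ok(void)
{
    return SKETCH_DOUBLE_OK(1 + 2);
}
```

Here operator precedence silently changes the result of the unparenthesised
variant, which is the class of surprise the extra parentheses rule out.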

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86_64/uaccess: address violations of MISRA C Rule 20.7
Nicola Vetrini [Tue, 21 May 2024 14:00:47 +0000 (16:00 +0200)]
x86_64/uaccess: address violations of MISRA C Rule 20.7

MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.

xlat_malloc_init is touched for consistency, despite the construct
being already deviated.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/hvm: address violations of MISRA C Rule 20.7
Nicola Vetrini [Tue, 21 May 2024 14:00:20 +0000 (16:00 +0200)]
x86/hvm: address violations of MISRA C Rule 20.7

MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/vpmu: address violations of MISRA C Rule 20.7
Nicola Vetrini [Tue, 21 May 2024 13:59:50 +0000 (15:59 +0200)]
x86/vpmu: address violations of MISRA C Rule 20.7

MISRA C Rule 20.7 states: "Expressions resulting from the expansion
of macro parameters shall be enclosed in parentheses". Therefore, some
macro definitions should gain additional parentheses to ensure that all
current and future users will be safe with respect to expansions that
can possibly alter the semantics of the passed-in macro parameter.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoxen/common/dt-overlay: Fix lock issue when add/remove the device
Henry Wang [Tue, 21 May 2024 13:59:14 +0000 (15:59 +0200)]
xen/common/dt-overlay: Fix lock issue when add/remove the device

If CONFIG_DEBUG=y, below assertion will be triggered:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) ----[ Xen-4.19-unstable  arm64  debug=y  Not tainted ]----
[...]
(XEN) Xen call trace:
(XEN)    [<00000a0000257418>] iommu_remove_dt_device+0x8c/0xd4 (PC)
(XEN)    [<00000a00002573a0>] iommu_remove_dt_device+0x14/0xd4 (LR)
(XEN)    [<00000a000020797c>] dt-overlay.c#remove_node_resources+0x8c/0x90
(XEN)    [<00000a0000207f14>] dt-overlay.c#remove_nodes+0x524/0x648
(XEN)    [<00000a0000208460>] dt_overlay_sysctl+0x428/0xc68
(XEN)    [<00000a00002707f8>] arch_do_sysctl+0x1c/0x2c
(XEN)    [<00000a0000230b40>] do_sysctl+0x96c/0x9ec
(XEN)    [<00000a0000271e08>] traps.c#do_trap_hypercall+0x1e8/0x288
(XEN)    [<00000a0000273490>] do_trap_guest_sync+0x448/0x63c
(XEN)    [<00000a000025c480>] entry.o#guest_sync_slowpath+0xa8/0xd8
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion 'rw_is_locked(&dt_host_lock)' failed at drivers/passthrough/device_tree.c:146
(XEN) ****************************************

This is because iommu_remove_dt_device() is called without taking the
dt_host_lock. dt_host_lock is meant to ensure that the DT node will not
disappear behind our back. So fix the issue by taking the lock as soon as
getting hold of overlay_node.

A similar issue will be observed when adding the dtbo:
(XEN) Assertion 'system_state < SYS_STATE_active || rw_is_locked(&dt_host_lock)'
failed at xen-source/xen/drivers/passthrough/device_tree.c:192
(XEN) ----[ Xen-4.19-unstable  arm64  debug=y  Not tainted ]----
[...]
(XEN) Xen call trace:
(XEN)    [<00000a00002594f4>] iommu_add_dt_device+0x7c/0x17c (PC)
(XEN)    [<00000a0000259494>] iommu_add_dt_device+0x1c/0x17c (LR)
(XEN)    [<00000a0000267db4>] handle_device+0x68/0x1e8
(XEN)    [<00000a0000208ba8>] dt_overlay_sysctl+0x9d4/0xb84
(XEN)    [<00000a000027342c>] arch_do_sysctl+0x24/0x38
(XEN)    [<00000a0000231ac8>] do_sysctl+0x9ac/0xa34
(XEN)    [<00000a0000274b70>] traps.c#do_trap_hypercall+0x230/0x2dc
(XEN)    [<00000a0000276330>] do_trap_guest_sync+0x478/0x688
(XEN)    [<00000a000025e480>] entry.o#guest_sync_slowpath+0xa8/0xd8

This is because the lock is released too early. So fix the issue by
releasing the lock after handle_device().

Fixes: 7e5c4a8b86f1 ("xen/arm: Implement device tree node removal functionalities")
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
11 months agox86/p2m: Add braces for better code clarity
Petr Beneš [Tue, 21 May 2024 07:16:25 +0000 (09:16 +0200)]
x86/p2m: Add braces for better code clarity

No functional change.

Signed-off-by: Petr Beneš <w1benny@gmail.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agoxen/riscv: introduce vm_event_*() functions
Oleksii Kurochko [Tue, 21 May 2024 07:16:02 +0000 (09:16 +0200)]
xen/riscv: introduce vm_event_*() functions

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
11 months agoxen/riscv: introduce monitor.h
Oleksii Kurochko [Tue, 21 May 2024 07:15:37 +0000 (09:15 +0200)]
xen/riscv: introduce monitor.h

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
11 months agoxen/x86: pretty print interrupt CPU affinity masks
Roger Pau Monné [Tue, 21 May 2024 07:15:03 +0000 (09:15 +0200)]
xen/x86: pretty print interrupt CPU affinity masks

Print the CPU affinity masks as numeric ranges instead of plain hexadecimal
bitfields.
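
The idea can be sketched as below for a single 64-bit mask. The helper is
hypothetical and only illustrates the range notation; it is not the
formatting code Xen uses.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: render the set bits of 'mask' as comma-separated
 * ranges, e.g. 0x10f becomes "0-3,8".  Returns the number of characters
 * written for complete ranges.
 */
static int sketch_format_ranges(uint64_t mask, char *buf, size_t len)
{
    int pos = 0;
    unsigned int bit = 0;

    if ( len )
        buf[0] = '\0';

    while ( bit < 64 )
    {
        unsigned int first, last;
        int n;

        if ( !(mask & (UINT64_C(1) << bit)) )
        {
            bit++;
            continue;
        }

        /* Find the end of this run of consecutive set bits. */
        first = last = bit;
        while ( last + 1 < 64 && (mask & (UINT64_C(1) << (last + 1))) )
            last++;

        if ( first == last )
            n = snprintf(buf + pos, len - pos, "%s%u",
                         pos ? "," : "", first);
        else
            n = snprintf(buf + pos, len - pos, "%s%u-%u",
                         pos ? "," : "", first, last);

        if ( n < 0 || (size_t)n >= len - pos )
            break; /* Output would be truncated: stop here. */

        pos += n;
        bit = last + 1;
    }

    return pos;
}
```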

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
11 months agoxen/trace: Drop old trace API
Andrew Cooper [Mon, 20 Sep 2021 12:40:21 +0000 (13:40 +0100)]
xen/trace: Drop old trace API

With all users updated to the new API, drop the old API.  This includes all of
asm/hvm/trace.h, which allows us to drop some includes.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
11 months agoxen/trace: Removal final {__,}trace_var() users in favour of the new API
Andrew Cooper [Tue, 21 Sep 2021 18:55:47 +0000 (19:55 +0100)]
xen/trace: Removal final {__,}trace_var() users in favour of the new API

The cycles parameter (which gets removed as a consequence) determines whether
trace() or trace_time() is used.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
11 months agoxen: Switch to new TRACE() API
Andrew Cooper [Fri, 17 Sep 2021 23:31:27 +0000 (00:31 +0100)]
xen: Switch to new TRACE() API

(Almost) no functional change.

 * In irq_move_cleanup_interrupt(), use the 'me' local variable rather than
   calling smp_processor_id() again.  This manifests as a minor code
   improvement.
 * In vlapic_update_timer() and lapic_rearm(), introduce a new 'timer_period'
   local variable to simplify the expressions used for both the trace and
   create_periodic_time() calls.

All other differences in the compiled binary are to do with line numbers
changing.

Some conversion notes:
 * HVMTRACE_LONG_[234]D() and TRACE_2_LONG_[234]D() were latently buggy.  They
   blindly discard extra parameters, but luckily no users are impacted.  They
   are also obfuscated wrappers, depending on exactly one or two parameters
   being TRC_PAR_LONG() to compile successfully.
 * HVMTRACE_LONG_1D() behaves unlike its named companions, and takes exactly
   one 64bit parameter which it splits manually.  Its one user,
   vmx_cr_access()'s LMSW path, gets adjusted.
 * TRACE_?D() and TRACE_2_LONG_*() change to TRACE_TIME() as cycles is always
   enabled.
 * HVMTRACE_ND() is opencoded for VMENTRY/VMEXIT records to include cycles.
   These are converted to TRACE_TIME(), with the old modifier parameter
   expressed as an OR at the callsite.  One callsite, svm_vmenter_helper() had
   a nested tb_init_done check, which is dropped.  (The optimiser also spotted
   this, which is why it doesn't manifest as a binary difference.)
 * All uses of *LONG() are either opencoded or swapped to using a struct, to
   avoid MISRA issues.
 * All HVMTRACE_?D() change to TRACE() as cycles is explicitly skipped.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
11 months agoxen/sched: Clean up trace handling
Andrew Cooper [Mon, 20 Sep 2021 13:07:43 +0000 (14:07 +0100)]
xen/sched: Clean up trace handling

There is no need for bitfields anywhere - use more sensible types.  There is
also no need to cast 'd' to (unsigned char *) before passing it to a function
taking void *.  Switch to new trace_time() API.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
11 months agoxen/rt: Clean up trace handling
Andrew Cooper [Fri, 17 Sep 2021 15:28:19 +0000 (16:28 +0100)]
xen/rt: Clean up trace handling

Most uses of bitfields and __packed are unnecessary.  There is also no need to
cast 'd' to (unsigned char *) before passing it to a function taking void *.
Switch to new trace_time() API.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
11 months agoxen/credit2: Clean up trace handling
Andrew Cooper [Wed, 15 Sep 2021 16:01:43 +0000 (17:01 +0100)]
xen/credit2: Clean up trace handling

There is no need for bitfields anywhere - use types with an explicit width
instead.  There is also no need to cast 'd' to (unsigned char *) before
passing it to a function taking void *.  Switch to new trace_time() API.

No functional change.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
11 months agoxen/trace: Introduce new API
Andrew Cooper [Mon, 20 Sep 2021 12:36:12 +0000 (13:36 +0100)]
xen/trace: Introduce new API

trace() and trace_time(), in function form for struct arguments, and macro
form for simple uint32_t list arguments.

This will be used to clean up the mess of macros which exists throughout the
codebase, as well as eventually dropping __trace_var().

There is intentionally no macro to split a 64-bit parameter in the new API,
for MISRA reasons.
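
A rough sketch of the two forms, under the assumption of a trivial recording
backend. All names below (sketch_trace(), SKETCH_TRACE(), sketch_last) are
hypothetical and do not reproduce the real definitions or signatures in
xen/trace.h.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a trace buffer: remembers the last record. */
static struct {
    uint32_t event;
    unsigned int size;
    uint32_t data[7];
} sketch_last;

/* Function form: takes an opaque payload, for example a struct. */
static void sketch_trace(uint32_t event, unsigned int size, const void *data)
{
    if ( size > sizeof(sketch_last.data) )
        size = sizeof(sketch_last.data);

    sketch_last.event = event;
    sketch_last.size = size;
    memcpy(sketch_last.data, data, size);
}

/* Macro form: packs a list of uint32_t values (at least one) into an array. */
#define SKETCH_TRACE(event, ...)                     \
    do {                                             \
        uint32_t d_[] = { __VA_ARGS__ };             \
        sketch_trace((event), sizeof(d_), d_);       \
    } while ( 0 )

/* A caller open-codes the split of a 64-bit value into two 32-bit halves. */
static void sketch_caller(uint32_t event, uint64_t val)
{
    SKETCH_TRACE(event, (uint32_t)val, (uint32_t)(val >> 32));
}
```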

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: George Dunlap <george.dunlap@cloud.com>
11 months agotools/xen-cpuid: Drop old names
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (13:49 +0200)]
tools/xen-cpuid: Drop old names

Not used any more.  Split out of previous patch to aid legibility.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agotools/xen-cpuid: Use automatically generated feature names
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (12:49 +0100)]
tools/xen-cpuid: Use automatically generated feature names

Have gen-cpuid.py write out INIT_FEATURE_VAL_TO_NAME, derived from the same
data source as INIT_FEATURE_NAME_TO_VAL, although both aliases of common_1d
are needed.

In xen-cpuid.c, sanity check at build time that leaf_info[] and
feature_names[] are of sensible length.
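
A build-time length check of this kind could look roughly like the sketch
below. The array contents, the 32-names-per-leaf relation and the macro name
are assumptions for illustration and do not reproduce the actual check in
xen-cpuid.c.

```c
/* Hypothetical tables, for illustration only. */
#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const char *const sketch_leaf_info[] = { "leaf 0", "leaf 1" };

/* Assumption: up to 32 feature names per leaf; index 63 is the last slot. */
static const char *const sketch_feature_names[] = {
    [0]  = "feat-a",
    [63] = "feat-z",
};

/* Fails the build, rather than misbehaving at run time, on a mismatch. */
_Static_assert(SKETCH_ARRAY_SIZE(sketch_feature_names) <=
               SKETCH_ARRAY_SIZE(sketch_leaf_info) * 32,
               "more feature names than leaves can hold");
```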

As dump_leaf() rendered missing names as numbers, always dump leaves even if
we don't have the leaf name.  This conversion was arguably missed in commit
59afdb8a81d6 ("tools/misc: Tweak reserved bit handling for xen-cpuid").

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agotools/xen-cpuid: Rename decodes[] to leaf_info[]
Roger Pau Monné [Thu, 2 May 2024 11:49:22 +0000 (12:49 +0100)]
tools/xen-cpuid: Rename decodes[] to leaf_info[]

Split out of subsequent patch to aid legibility.

No functional change.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agox86/gen-cpuid: Minor cleanup
Andrew Cooper [Fri, 10 May 2024 19:04:51 +0000 (20:04 +0100)]
x86/gen-cpuid: Minor cleanup

Rename INIT_FEATURE_NAMES to INIT_FEATURE_NAME_TO_VAL as we're about to gain an
inverse mapping of the same thing.

Use dict.items() unconditionally.  iteritems() is a marginal perf optimisation
for Python2 only, and simply not worth the effort on a script this small.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agotools/golang: Add missing golang bindings for vlan
Henry Wang [Mon, 20 May 2024 08:21:45 +0000 (16:21 +0800)]
tools/golang: Add missing golang bindings for vlan

It is noticed that commit:
3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
introduces a new "vlan" string field to libxl_device_nic. But the
golang bindings are missing. Add it in this patch.

Fixes: 3bc14e4fa4b9 ("tools/libs/light: Add vlan field to libxl_device_nic")
Signed-off-by: Henry Wang <xin.wang2@amd.com>
Acked-by: George Dunlap <george.dunlap@cloud.com>
11 months agox86/msi: prevent watchdog triggering when dumping MSI state
Roger Pau Monné [Fri, 17 May 2024 13:56:05 +0000 (15:56 +0200)]
x86/msi: prevent watchdog triggering when dumping MSI state

Use the same check that's used in dump_irqs().

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agoinclude/ctype.h: fix MISRA R10.2 violation
Stefano Stabellini [Wed, 15 May 2024 22:52:04 +0000 (15:52 -0700)]
include/ctype.h: fix MISRA R10.2 violation

The value returned by __toupper is used in arithmetic operations causing
MISRA C 10.2 violations. Cast to plain char in the toupper macro. Also
do the same in tolower for consistency.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoxen/arm: Add DT reserve map regions to bootinfo.reserved_mem
Luca Fancellu [Thu, 25 Apr 2024 13:11:18 +0000 (14:11 +0100)]
xen/arm: Add DT reserve map regions to bootinfo.reserved_mem

Currently the code is listing device tree reserve map regions
as reserved memory for Xen, but they are not added into
bootinfo.reserved_mem and they are fetched in multiple places
using the same code sequence, causing duplication. Fix this
by adding them to the bootinfo.reserved_mem at early stage.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
11 months agoxen/arm64: lib: Use the generic xen/linkage.h macros
Edgar E. Iglesias [Sat, 4 May 2024 11:55:14 +0000 (13:55 +0200)]
xen/arm64: lib: Use the generic xen/linkage.h macros

Use the generic xen/linkage.h macros to annotate code symbols.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm64: cache: Use the generic xen/linkage.h macros
Edgar E. Iglesias [Sat, 4 May 2024 11:55:13 +0000 (13:55 +0200)]
xen/arm64: cache: Use the generic xen/linkage.h macros

Use the generic xen/linkage.h macros to annotate code symbols.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm64: mmu/head: Add missing code symbol annotations
Edgar E. Iglesias [Sat, 4 May 2024 11:55:12 +0000 (13:55 +0200)]
xen/arm64: mmu/head: Add missing code symbol annotations

Use the generic xen/linkage.h macros to annotate code symbols
and add missing annotations.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm64: bpi: Add missing code symbol annotations
Edgar E. Iglesias [Sat, 4 May 2024 11:55:11 +0000 (13:55 +0200)]
xen/arm64: bpi: Add missing code symbol annotations

Use the generic xen/linkage.h macros to annotate code symbols
and add missing annotations.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm64: debug: Add missing code symbol annotations
Edgar E. Iglesias [Sat, 4 May 2024 11:55:10 +0000 (13:55 +0200)]
xen/arm64: debug: Add missing code symbol annotations

Use the generic xen/linkage.h macros to annotate code symbols
and add missing annotations.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm64: head: Add missing code symbol annotations
Edgar E. Iglesias [Sat, 4 May 2024 11:55:09 +0000 (13:55 +0200)]
xen/arm64: head: Add missing code symbol annotations

Use the generic xen/linkage.h macros to annotate code symbols
and add missing annotations.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm64: sve: Add missing code symbol annotations
Edgar E. Iglesias [Sat, 4 May 2024 11:55:08 +0000 (13:55 +0200)]
xen/arm64: sve: Add missing code symbol annotations

Use the generic xen/linkage.h macros to annotate code symbols
and add missing annotations.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm64: smc: Add missing code symbol annotations
Edgar E. Iglesias [Sat, 4 May 2024 11:55:07 +0000 (13:55 +0200)]
xen/arm64: smc: Add missing code symbol annotations

Use the generic xen/linkage.h macros to annotate code symbols
and add missing annotations.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoxen/arm64: entry: Add missing code symbol annotations
Edgar E. Iglesias [Sat, 4 May 2024 11:55:06 +0000 (13:55 +0200)]
xen/arm64: entry: Add missing code symbol annotations

Use the generic xen/linkage.h macros to annotate code symbols
and add missing annotations.

Signed-off-by: Edgar E. Iglesias <edgar.iglesias@amd.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agox86/ucode: Further fixes to identify "ucode already up to date"
Andrew Cooper [Thu, 16 May 2024 11:09:39 +0000 (12:09 +0100)]
x86/ucode: Further fixes to identify "ucode already up to date"

When the revision in hardware is newer than anything Xen has to hand,
'microcode_cache' isn't set up.  Then, `xen-ucode` initiates the update
because it doesn't know whether the revisions across the system are symmetric
or not.  This involves the patch getting all the way into the
apply_microcode() hooks before being found to be too old.

This is all a giant mess and needs an overhaul, but in the short term simply
adjust the apply_microcode() to return -EEXIST.

Also, unconditionally print the preexisting microcode revision on boot.  It's
relevant information which is otherwise unavailable if Xen doesn't find new
microcode to use.

Fixes: 648db37a155a ("x86/ucode: Distinguish "ucode already up to date"")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
11 months agox86/p2m: move altp2m-related code to separate file
Sergiy Kibrik [Thu, 16 May 2024 11:36:22 +0000 (13:36 +0200)]
x86/p2m: move altp2m-related code to separate file

Move altp2m code from generic p2m.c file to altp2m.c, so it is kept separately
and can possibly be disabled in the build. We may want to disable it when
building only for a specific platform that doesn't support alternate p2m.

No functional change intended.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/MCE: guard {intel/amd}_mcheck_init() calls
Sergiy Kibrik [Thu, 16 May 2024 11:35:54 +0000 (13:35 +0200)]
x86/MCE: guard {intel/amd}_mcheck_init() calls

Guard calls to CPU-specific mcheck init routines in common MCE code
using new INTEL/AMD config options.

The purpose is to avoid building platform-specific mcheck code, and the calls
to it, when that platform is disabled in config.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/MCE: guard access to Intel/AMD-specific MCA MSRs
Sergiy Kibrik [Thu, 16 May 2024 11:35:34 +0000 (13:35 +0200)]
x86/MCE: guard access to Intel/AMD-specific MCA MSRs

Add build-time checks for newly introduced INTEL/AMD config options when
calling vmce_{intel/amd}_{rdmsr/wrmsr}() routines.
This way platform-specific code can be omitted from the vmce code if that
platform is disabled in config.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agox86/vpmu: separate amd/intel vPMU code
Sergiy Kibrik [Thu, 16 May 2024 11:34:54 +0000 (13:34 +0200)]
x86/vpmu: separate amd/intel vPMU code

Build AMD vPMU when CONFIG_AMD is on, and Intel vPMU when CONFIG_INTEL
is on respectively, allowing for a platform-specific build.

No functional change intended.

Signed-off-by: Sergiy Kibrik <Sergiy_Kibrik@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
11 months agoxen/bitops: put __ffs() into linux compatible header
Oleksii Kurochko [Thu, 16 May 2024 08:08:37 +0000 (10:08 +0200)]
xen/bitops: put __ffs() into linux compatible header

The mentioned macros exist only for Linux compatibility purposes.

The patch defines __ffs() in terms of Xen bitops. It is safe to define it
this way (as ffsl() - 1) because __ffs() was previously defined as
__builtin_ctzl(x), which has undefined behavior when x=0, so it is assumed
that such cases are not encountered in the current code.
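
The relation between the two definitions can be sketched as follows. The
function name is hypothetical, and __builtin_ffsl() (a GCC/Clang builtin) is
used here as a stand-in for Xen's ffsl(), which is an assumption of this
example.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the compatibility macro: index of the least
 * significant set bit, counting from 0.  __builtin_ffsl() counts from 1,
 * hence the "- 1".
 */
static inline unsigned int sketch___ffs(unsigned long x)
{
    /* x == 0 is not a supported input, just as __builtin_ctzl(0) is not. */
    assert(x != 0);

    return (unsigned int)__builtin_ffsl((long)x) - 1;
}
```

For any non-zero input the result matches __builtin_ctzl(x), which is why
the substitution is considered safe.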

To avoid including <xen/linux-compat.h> in Xen library files, __ffs() and
__ffz() were defined locally in find-next-bit.c.

Apart from the __ffs() usage in find-next-bit.c, only one use of __ffs()
remains, in smmu-v3.c. It seems that __ffs() could be changed to ffsl(x)-1 in
this file, but to keep smmu-v3.c close to Linux it was decided just
to define __ffs() in xen/linux-compat.h and include it in smmu-v3.c.

Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Shawn Anastasio <sanastasio@raptorengineering.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Michal Orzel <michal.orzel@amd.com>
Acked-by: Rahul Singh <rahul.singh@arm.com>
11 months agox86: detect PIC aliasing on ports other than 0x[2A][01]
Jan Beulich [Thu, 16 May 2024 08:03:16 +0000 (10:03 +0200)]
x86: detect PIC aliasing on ports other than 0x[2A][01]

... in order to also deny Dom0 access through the alias ports (commonly
observed on Intel chipsets). Without this it is only giving the
impression of denying access to both PICs. Unlike for CMOS/RTC, do
detection very early, to avoid disturbing normal operation later on.

Like for CMOS/RTC a fundamental assumption of the probing is that reads
from the probed alias port won't have side effects in case it does not
alias the respective PIC's one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
11 months agox86: allow to suppress port-alias probing
Jan Beulich [Thu, 16 May 2024 08:02:34 +0000 (10:02 +0200)]
x86: allow to suppress port-alias probing

By default there's already no use for this when we run in shim mode.
Plus there may also be a need to suppress the probing in case of issues
with it. Before introducing further port alias probing, introduce a
command line option allowing to bypass it, default it to on when in shim
mode, and gate RTC/CMOS port alias probing on it.

Requested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Jason Andryuk <jason.andryuk@amd.com>
11 months agoautomation/eclair_analysis: deviate macro count_args_ for MISRA Rule 20.7
Nicola Vetrini [Tue, 23 Apr 2024 15:12:45 +0000 (17:12 +0200)]
automation/eclair_analysis: deviate macro count_args_ for MISRA Rule 20.7

The count_args_ macro violates Rule 20.7, but it can't be made
compliant with Rule 20.7 without breaking its functionality. Since
it's very unlikely for this macro to be misused, it is deviated.

No functional change.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agoautomation/eclair_analysis: fully deviate MISRA C Rules 21.9 and 21.10
Nicola Vetrini [Wed, 15 May 2024 07:51:59 +0000 (09:51 +0200)]
automation/eclair_analysis: fully deviate MISRA C Rules 21.9 and 21.10

These rules are concerned with the use of facilities provided by the
C Standard Library (qsort, bsearch for rule 21.9, and those provided
by <time.h> for rule 21.10).

Xen provides in its source code its own implementation of some of these
functions and macros, therefore a justification is provided for allowing
uses of these functions in the project.

The rules are also marked as clean as a consequence.

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months agox86/mtrr: avoid system wide rendezvous when setting AP MTRRs
Roger Pau Monne [Mon, 13 May 2024 08:59:25 +0000 (10:59 +0200)]
x86/mtrr: avoid system wide rendezvous when setting AP MTRRs

There's no point in forcing a system wide update of the MTRRs on all processors
when there are no changes to be propagated.  On AP startup it's only the AP
that needs to write the system wide MTRR values in order to match the rest of
the already online CPUs.

We have occasionally seen the watchdog trigger during `xen-hptool cpu-online`
in one Intel Cascade Lake box with 448 CPUs due to the re-setting of the MTRRs
on all the CPUs in the system.

While there adjust the comment to clarify why the system-wide resetting of the
MTRR registers is not needed for the purposes of mtrr_ap_init().

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Release-acked-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months agotools/xl: add vlan keyword to vif option
Leigh Brown [Wed, 8 May 2024 21:38:21 +0000 (22:38 +0100)]
tools/xl: add vlan keyword to vif option

Update parse_nic_config() to support a new `vlan' keyword. This
keyword specifies the VLAN configuration to assign to the VIF when
attaching it to the bridge port, on operating systems that support
the capability (e.g. Linux). The vlan keyword will allow one or
more VLANs to be configured on the VIF when adding it to the bridge
port. This will be done by the vif-bridge script and functions.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
11 months agotools/libs/light: Add vlan field to libxl_device_nic
Leigh Brown [Wed, 8 May 2024 21:38:20 +0000 (22:38 +0100)]
tools/libs/light: Add vlan field to libxl_device_nic

Add `vlan' string field to libxl_device_nic, to allow a VLAN
configuration to be specified for the VIF when adding it to the
bridge device.

Update libxl_nic.c to read and write the vlan field from the
xenstore.

This provides the capability for supported operating systems (e.g.
Linux) to perform VLAN filtering on bridge ports.  The Xen
hotplug scripts need to be updated to read this information from
the xenstore and perform the required configuration.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Reviewed-by: Jason Andryuk <jandryuk@gmail.com>
11 months agotools/xentop: Fix cpu% sort order
Leigh Brown [Tue, 14 May 2024 08:13:44 +0000 (09:13 +0100)]
tools/xentop: Fix cpu% sort order

In compare_cpu_pct(), there is a double -> unsigned long long conversion when
calling compare().  In C, this discards the fractional part, resulting in an
out-of-order sorting such as:

        NAME  STATE   CPU(sec) CPU(%)
       xendd --b---       4020    5.7
    icecream --b---       2600    3.8
    Domain-0 -----r       1060    1.5
        neon --b---        827    1.1
      cheese --b---        225    0.7
       pizza --b---        359    0.5
     cassini --b---        490    0.4
     fusilli --b---        159    0.2
         bob --b---        502    0.2
     blender --b---        121    0.2
       bread --b---         69    0.1
    chickpea --b---         67    0.1
      lentil --b---         67    0.1

Introduce compare_dbl() function and update compare_cpu_pct() to call it.
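
The effect of the truncation can be sketched with the two helpers below.
Both are hypothetical reconstructions written for this example; the actual
xentop functions may differ in signature and detail.

```c
/* Hypothetical reconstruction of an integer three-way comparison. */
static int sketch_compare(unsigned long long a, unsigned long long b)
{
    if ( a < b )
        return -1;
    if ( a > b )
        return 1;
    return 0;
}

/* Same three-way comparison on doubles, so the fractional part is kept. */
static int sketch_compare_dbl(double a, double b)
{
    if ( a < b )
        return -1;
    if ( a > b )
        return 1;
    return 0;
}
```

Passing 0.7 and 0.5 through the integer version compares 0 with 0 and
reports them as equal, whereas the double version orders them correctly.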

Fixes: 49839b535b78 ("Add xenstat framework.")
Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months ago tools/hvmloader: Further simplify SMP setup
Andrew Cooper [Thu, 9 May 2024 17:40:11 +0000 (18:40 +0100)]
tools/hvmloader: Further simplify SMP setup

Now that we're using hypercalls to start APs, we can replace the 'ap_cpuid'
global with a regular function parameter.  This requires telling the compiler
that we'd like the parameter in a register rather than on the stack.

While adjusting, rename to cpu_setup().  It's always been used on the BSP,
making the name ap_start() specifically misleading.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Alejandro Vallejo <alejandro.vallejo@cloud.com>
11 months ago x86/cpufreq: Rename cpuid variable/parameters to cpu
Andrew Cooper [Sat, 11 May 2024 18:25:00 +0000 (19:25 +0100)]
x86/cpufreq: Rename cpuid variable/parameters to cpu

Various functions have a parameter or local variable called cpuid, but this
triggers a MISRA R5.3 violation because we also have a function called cpuid()
which wraps the real CPUID instruction.

In all these cases, it's a Xen cpu index, which is far more commonly named
just cpu in our code.

While adjusting these, fix a couple of other issues:

 * cpufreq_cpu_init() is on the end of a hypercall (with in-memory parameters,
   even), making EFAULT the wrong error to use.  Use EOPNOTSUPP instead.

 * check_est_cpu() is wrong to tie EIST to just Intel, and nowhere else using
   EIST makes this restriction.  Just check the feature itself, which is more
   succinctly done after being folded into its single caller.

 * In powernow_cpufreq_update(), replace an opencoded cpu_online().

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 months ago x86: respect mapcache_domain_init() failing
Jan Beulich [Wed, 15 May 2024 13:35:15 +0000 (15:35 +0200)]
x86: respect mapcache_domain_init() failing

The function itself properly handles and hands onwards failure from
create_perdomain_mapping(). Therefore its caller should respect possible
failure, too.

Fixes: 4b28bf6ae90b ("x86: re-introduce map_domain_page() et al")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
11 months ago xen/sched: set all sched_resource data inside locked region for new cpu
Juergen Gross [Wed, 15 May 2024 15:25:39 +0000 (17:25 +0200)]
xen/sched: set all sched_resource data inside locked region for new cpu

When adding a cpu to a scheduler, set all data items of struct
sched_resource inside the locked region, as otherwise a race might
happen (e.g. when trying to access the cpupool of the cpu):

  (XEN) ----[ Xen-4.19.0-1-d  x86_64  debug=y  Tainted:     H  ]----
  (XEN) CPU:    45
  (XEN) RIP:    e008:[<ffff82d040244cbf>] common/sched/credit.c#csched_load_balance+0x41/0x877
  (XEN) RFLAGS: 0000000000010092   CONTEXT: hypervisor
  (XEN) rax: ffff82d040981618   rbx: ffff82d040981618   rcx: 0000000000000000
  (XEN) rdx: 0000003ff68cd000   rsi: 000000000000002d   rdi: ffff83103723d450
  (XEN) rbp: ffff83207caa7d48   rsp: ffff83207caa7b98   r8:  0000000000000000
  (XEN) r9:  ffff831037253cf0   r10: ffff83103767c3f0   r11: 0000000000000009
  (XEN) r12: ffff831037237990   r13: ffff831037237990   r14: ffff831037253720
  (XEN) r15: 0000000000000000   cr0: 000000008005003b   cr4: 0000000000f526e0
  (XEN) cr3: 000000005bc2f000   cr2: 0000000000000010
  (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
  (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
  (XEN) Xen code around <ffff82d040244cbf> (common/sched/credit.c#csched_load_balance+0x41/0x877):
  (XEN)  48 8b 0c 10 48 8b 49 08 <48> 8b 79 10 48 89 bd b8 fe ff ff 49 8b 4e 28 48
  <snip>
  (XEN) Xen call trace:
  (XEN)    [<ffff82d040244cbf>] R common/sched/credit.c#csched_load_balance+0x41/0x877
  (XEN)    [<ffff82d040245a18>] F common/sched/credit.c#csched_schedule+0x36a/0x69f
  (XEN)    [<ffff82d040252644>] F common/sched/core.c#do_schedule+0xe8/0x433
  (XEN)    [<ffff82d0402572dd>] F common/sched/core.c#schedule+0x2e5/0x2f9
  (XEN)    [<ffff82d040232f35>] F common/softirq.c#__do_softirq+0x94/0xbe
  (XEN)    [<ffff82d040232fc8>] F do_softirq+0x13/0x15
  (XEN)    [<ffff82d0403075ef>] F arch/x86/domain.c#idle_loop+0x92/0xe6
  (XEN)
  (XEN) Pagetable walk from 0000000000000010:
  (XEN)  L4[0x000] = 000000103ff61063 ffffffffffffffff
  (XEN)  L3[0x000] = 000000103ff60063 ffffffffffffffff
  (XEN)  L2[0x000] = 0000001033dff063 ffffffffffffffff
  (XEN)  L1[0x000] = 0000000000000000 ffffffffffffffff
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 45:
  (XEN) FATAL PAGE FAULT
  (XEN) [error_code=0000]
  (XEN) Faulting linear address: 0000000000000010
  (XEN) ****************************************

Reported-by: Andrew Cooper <andrew.cooper3@citrix.com>
Fixes: a8c6c623192e ("sched: clarify use cases of schedule_cpu_switch()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months ago xen/console: fix Rule 10.2 violation
Stefano Stabellini [Fri, 10 May 2024 23:37:11 +0000 (16:37 -0700)]
xen/console: fix Rule 10.2 violation

Change opt_conswitch to char to fix a violation of Rule 10.2.

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Julien Grall <jgrall@amazon.com>
11 months ago docs/misra: add R21.6 R21.9 R21.10 R21.14 R21.15 R21.16
Stefano Stabellini [Fri, 26 Apr 2024 21:36:28 +0000 (14:36 -0700)]
docs/misra: add R21.6 R21.9 R21.10 R21.14 R21.15 R21.16

Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months ago x86/io: Don't cast away constness in read{b..q}()
Andrew Cooper [Fri, 10 May 2024 19:23:40 +0000 (20:23 +0100)]
x86/io: Don't cast away constness in read{b..q}()

Addresses various MISRA R11.8 violations.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months ago Revert "evtchn: refuse EVTCHNOP_status for Xen-bound event channels"
Andrew Cooper [Tue, 2 Apr 2024 14:50:19 +0000 (15:50 +0100)]
Revert "evtchn: refuse EVTCHNOP_status for Xen-bound event channels"

The commit makes a claim without justification.

The claim is false; it broke lsevtchn in dom0, a debugging utility which
absolutely does care about all of the domain's event channels.

Whether to return information about a xen-owned evtchn is a matter of policy,
and it's not acceptable to subvert Xen's security subsystem on the decision.

This reverts commit f60ab5337f968e2f10c639ab59db7afb0fe4f7c3.

Fixes: f60ab5337f96 ("evtchn: refuse EVTCHNOP_status for Xen-bound event channels")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Daniel P. Smith <dpsmith@apertussolutions.com>
11 months ago xen: Use -Wuninitialized and -Winit-self
Andrew Cooper [Fri, 10 May 2024 22:56:52 +0000 (23:56 +0100)]
xen: Use -Wuninitialized and -Winit-self

Assigning a variable to itself is an anti-pattern.  It introduces definite UB
in an attempt to silence a warning about possible UB.

As it's definite undefined behaviour, it also mis-compiles in simple cases,
using whatever stale value happened to be in the allocated register.

Clang includes -Wuninitialized within -Wall, but GCC only includes it in
-Wextra, which is not used by Xen at this time.

Furthermore, the specific pattern of assigning a variable to itself in its
declaration is only diagnosed by GCC with -Winit-self.  Clang does diagnose
simple forms of this pattern with a plain -Wuninitialized, but it fails to
diagnose the instances in Xen that GCC manages to find.

GCC, with -Wuninitialized and -Winit-self notices:

  arch/x86/time.c: In function ‘read_pt_and_tsc’:
  arch/x86/time.c:297:14: error: ‘best’ is used uninitialized in this function [-Werror=uninitialized]
    297 |     uint32_t best = best;
        |              ^~~~
  arch/x86/time.c: In function ‘read_pt_and_tmcct’:
  arch/x86/time.c:1022:14: error: ‘best’ is used uninitialized in this function [-Werror=uninitialized]
   1022 |     uint64_t best = best;
        |              ^~~~

Fix these up to start with a value of ~0, which is also more robust in the
case that something goes wrong.
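
The shape of the fix can be sketched as a "track the smallest value" loop.
This is an illustration under assumptions, not the code from arch/x86/time.c:
smallest_sample() and its parameters are hypothetical, and only the variable
name best and the ~0 starting value come from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return the smallest sample, or UINT32_MAX when n == 0.
 * Writing "uint32_t best = best;" instead would read an indeterminate value
 * on the first comparison.
 */
static uint32_t smallest_sample(const uint32_t *samples, size_t n)
{
    uint32_t best = ~0U;  /* all-ones: every real sample compares <= this */

    assert(samples != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
        if (samples[i] < best)
            best = samples[i];

    return best;
}
```

Starting from all-ones also gives a recognisable result when the loop body
never runs, rather than a stale register value.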

Fixes: 23658e823238 ("x86/time: further improve TSC / CPU freq calibration accuracy")
Fixes: 3f3906b462d5 ("x86/APIC: calibrate against platform timer when possible")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
11 months ago xen: Use -Wflex-array-member-not-at-end when available
Andrew Cooper [Sat, 13 Jan 2024 17:40:48 +0000 (17:40 +0000)]
xen: Use -Wflex-array-member-not-at-end when available

This option is new in GCC-14, and maps to MISRA Rule 1.1.  The codebase is
clean to it, and Eclair is blocking.
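
For readers unfamiliar with the pattern being diagnosed, here is a minimal
sketch.  struct msg and msg_alloc() are hypothetical names invented for this
example and do not come from the Xen tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct msg {
    size_t len;
    unsigned char data[];  /* flexible array member: must be the last field */
};

/*
 * The warning is about embedding such a struct anywhere but last, e.g.:
 *     struct bad { struct msg m; int trailer; };
 * where m.data[] would overlap 'trailer'.
 */

/* Allocate a zeroed message with room for 'len' payload bytes. */
static struct msg *msg_alloc(size_t len)
{
    struct msg *m;

    assert(len <= SIZE_MAX - sizeof(*m));  /* guard the size computation */

    m = calloc(1, sizeof(*m) + len);
    if (m)
        m->len = len;

    return m;
}
```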

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
11 months ago automation/eclair_analysis: tag MISRA C Rule 1.1 as clean
Nicola Vetrini [Fri, 10 May 2024 18:03:36 +0000 (20:03 +0200)]
automation/eclair_analysis: tag MISRA C Rule 1.1 as clean

Tag the rule as clean, as there are no more violations in the codebase since
93c27d54dd23 ("xen/arm: Fix MISRA regression on R1.1,
flexible array member not at the end").

Signed-off-by: Nicola Vetrini <nicola.vetrini@bugseng.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
11 months ago libxl: Fix handling XenStore errors in device creation
Demi Marie Obenour [Sat, 27 Apr 2024 02:17:03 +0000 (22:17 -0400)]
libxl: Fix handling XenStore errors in device creation

If xenstored runs out of memory it is possible for it to fail operations
that should succeed.  libxl wasn't robust against this, and could fail
to ensure that the TTY path of a non-initial console was created and
read-only for guests.  This doesn't qualify for an XSA because guests
should not be able to run xenstored out of memory, but it still needs to
be fixed.

Add the missing error checks to ensure that all errors are properly
handled and that at no point can a guest make the TTY path of its
frontend directory writable.

Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
11 months ago x86/hvm: Allow access to registers on the same page as MSI-X table
Marek Marczykowski-Górecki [Fri, 10 May 2024 03:53:22 +0000 (05:53 +0200)]
x86/hvm: Allow access to registers on the same page as MSI-X table

Some devices (notably Intel Wifi 6 AX210 card) keep auxiliary registers
on the same page as MSI-X table. Device model (especially one in
stubdomain) cannot really handle those, as direct writes to that page are
refused (page is on the mmio_ro_ranges list). Instead, extend
msixtbl_mmio_ops to handle such accesses too.

Doing this requires correlating read/write location with guest
MSI-X table address. Since QEMU doesn't map MSI-X table to the guest,
it requires msixtbl_entry->gtable, which is HVM-only. Similar feature
for PV would need to be done separately.

This will also be used to read Pending Bit Array, if it lives on the same
page, so QEMU no longer needs /dev/mem access at all (especially helpful
with lockdown enabled in dom0). If PBA lives on another page, QEMU will
map it to the guest directly.
If PBA lives on the same page, discard writes and log a message.
Technically, writes outside of PBA could be allowed, but at this moment
the precise location of PBA isn't saved, and also no known device abuses
the spec in this way (at least yet).

To access those registers, msixtbl_mmio_ops need the relevant page
mapped. MSI handling already has infrastructure for that, using fixmap,
so try to map first/last page of the MSI-X table (if necessary) and save
their fixmap indexes. Note that msix_get_fixmap() does reference
counting and reuses existing mapping, so just call it directly, even if
the page was mapped before. Also, it uses a specific range of fixmap
indexes which doesn't include 0, so use 0 as default ("not mapped")
value - which simplifies code a bit.

Based on assumption that all MSI-X page accesses are handled by Xen, do
not forward adjacent accesses to other hypothetical ioreq servers, even
if the access wasn't handled for some reason (failure to map pages etc).
Relevant places log a message about that already.

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
11 months ago x86/msi: Extend per-domain/device warning mechanism
Marek Marczykowski-Górecki [Fri, 10 May 2024 03:53:21 +0000 (05:53 +0200)]
x86/msi: Extend per-domain/device warning mechanism

The arch_msix struct had a single "warned" field with the domid for which a
warning was issued. An upcoming patch will need a similar mechanism for a few
more warnings, so change it to save a bit field of issued warnings.
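
One way to picture the mechanism is a per-device record holding a domid plus
one bit per warning kind.  The sketch below is an assumption-laden
illustration: struct warn_state, warn_once() and the WARN_KIND_* names are
hypothetical and are not the fields or helpers used in Xen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical warning kinds, one bit each. */
enum warn_kind {
    WARN_KIND_A = 1u << 0,
    WARN_KIND_B = 1u << 1,
};

struct warn_state {
    uint16_t domid;      /* domain the recorded warnings belong to */
    unsigned int kinds;  /* bit set => that warning was already issued */
};

/* Return true exactly once per (domain, kind) pair; record it as issued. */
static bool warn_once(struct warn_state *s, uint16_t domid, unsigned int kind)
{
    assert(kind != 0);

    if (s->domid != domid) {
        /* Device is now used by another domain: forget old warnings. */
        s->domid = domid;
        s->kinds = 0;
    }

    if (s->kinds & kind)
        return false;

    s->kinds |= kind;
    return true;
}
```

A caller would print its message only when warn_once() returns true, so each
kind of warning is logged once per domain instead of once per access.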

Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
11 months ago libxl: fix population of the online vCPU bitmap for PVH
Roger Pau Monne [Fri, 10 May 2024 12:49:13 +0000 (14:49 +0200)]
libxl: fix population of the online vCPU bitmap for PVH

libxl passes some information to libacpi to create the ACPI table for a PVH
guest, and among that information is a bitmap of which vCPUs are online,
which can be fewer than the maximum number of vCPUs assigned to the domain.

While the population of the bitmap is done correctly for HVM based on the
number of online vCPUs, for PVH the population of the bitmap is done based on
the number of maximum vCPUs allowed.  This leads to all local APIC entries in
the MADT being set as enabled, which contradicts the data in xenstore if vCPUs
differs from maximum vCPUs.

Fix by copying the internal libxl bitmap that's populated based on the vCPUs
parameter.
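
The difference between the two ways of populating the bitmap can be sketched
as follows.  This is not libxl code: the helpers, the fixed MAX_VCPUS size and
the byte-array bitmap layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MAX_VCPUS 128                      /* hypothetical upper bound */
#define MAP_BYTES (MAX_VCPUS / CHAR_BIT)

static void map_set(unsigned char *map, unsigned int bit)
{
    assert(bit < MAX_VCPUS);
    map[bit / CHAR_BIT] |= 1u << (bit % CHAR_BIT);
}

static int map_test(const unsigned char *map, unsigned int bit)
{
    assert(bit < MAX_VCPUS);
    return (map[bit / CHAR_BIT] >> (bit % CHAR_BIT)) & 1;
}

/* Problem pattern: every vCPU up to the maximum is marked online. */
static void fill_from_max(unsigned char *out, unsigned int max_vcpus)
{
    memset(out, 0, MAP_BYTES);
    for (unsigned int i = 0; i < max_vcpus; i++)
        map_set(out, i);
}

/* Fixed pattern: copy the bitmap that already records the online vCPUs. */
static void fill_from_online(unsigned char *out, const unsigned char *online)
{
    memcpy(out, online, MAP_BYTES);
}
```

With 4 maximum vCPUs of which 2 are online, fill_from_max() marks all 4 as
enabled, while fill_from_online() leaves vCPUs 2 and 3 disabled.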

Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
Link: https://gitlab.com/libvirt/libvirt/-/issues/399
Reported-by: Leigh Brown <leigh@solinno.co.uk>
Fixes: 14c0d328da2b ('libxl/acpi: Build ACPI tables for HVMlite guests')
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Tested-by: Leigh Brown <leigh@solinno.co.uk>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>