xenbits.xensource.com Git - xen.git/log
Rahul Singh [Fri, 15 Oct 2021 16:51:42 +0000 (17:51 +0100)]
xen/arm: Enable the existing x86 virtual PCI support for ARM

The existing VPCI support available for x86 is adapted for Arm.
When a device is added to XEN via the hypercall
“PHYSDEVOP_pci_device_add”, a VPCI handler for config space
accesses is added to XEN to emulate the PCI device's config space.

An MMIO trap handler for the PCI ECAM space is registered in XEN
so that when a guest tries to access the PCI config space, XEN
traps the access and emulates the read/write using VPCI and
not the real PCI hardware.

For Dom0less systems scan_pci_devices() would be used to discover the
PCI devices in XEN, and the VPCI handlers will be added during XEN boot.

This patch also includes some small fixes for vpci compilation errors
on arm32 and prevents 64bit accesses on 32bit:
- use %zu instead of %lu in header.c for print
- prevent 64bit accesses in vpci_access_allowed
- ifdef out, using CONFIG_64BIT, the handling of len 8 in
vpci_ecam_{read,write}

TODO: currently vpci_add_handlers is marked as __hwdom_init, but on ARM
vpci_add_handlers can be called after boot from
PHYSDEVOP_pci_device_add. Consider removing __hwdom_init.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
[stefano: add TODO item to commit message]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
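
Note: the access-size rule described in this commit can be sketched as below. This is a minimal illustration, not the code from the patch: the function name and the is_64bit parameter are assumptions made here so the example builds anywhere (the commit itself keys the check off CONFIG_64BIT at compile time).

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether a config space access of `len`
 * bytes at register offset `reg` is acceptable.
 * `is_64bit` stands in for a compile-time CONFIG_64BIT check.
 */
static bool access_allowed_sketch(unsigned int reg, unsigned int len,
                                  bool is_64bit)
{
    /* Only power-of-two sizes up to 8 bytes are considered. */
    if (len != 1 && len != 2 && len != 4 && len != 8)
        return false;

    /* Refuse 64bit accesses on a 32bit build. */
    if (len == 8 && !is_64bit)
        return false;

    /* The access must be aligned to its own size. */
    if (reg & (len - 1))
        return false;

    return true;
}
```

With this shape, an 8-byte access is accepted only when the build is 64bit and the offset is 8-byte aligned.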
Bertrand Marquis [Fri, 15 Oct 2021 16:51:41 +0000 (17:51 +0100)]
xen/vpci: Move ecam access functions to common code

The PCI standard uses the term ECAM, not MCFG, which comes from ACPI[1].
Use ECAM/ecam instead of MCFG in common code and in the new functions
added to common vpci code by this patch.

Move vpci_access_allowed from arch/x86/hvm/io.c to drivers/vpci/vpci.c.

Create vpci_ecam_{read,write} in drivers/vpci/vpci.c that
contains the common code to perform these operations, changed
vpci_mmcfg_{read,write} accordingly to make use of these functions.

The vpci_ecam_{read,write} functions are returning false on error and
true on success. As the x86 code was previously always returning
X86EMUL_OKAY the return code is ignored. A comment has been added in
the code to show that this is intentional.

Those functions will be used in a following patch by the Arm vpci
implementation.

Rename MMCFG_BDF to VPCI_ECAM_BDF and move it to vpci.h.
This macro is only used by functions calling vpci_ecam helpers.

No functional change intended with this patch.

[1] https://wiki.osdev.org/PCI_Express

Suggested-by: Roger Pau Monné <roger.pau@citrix.com>
Signed-off-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
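
Note: for readers unfamiliar with ECAM, the sketch below shows how an offset into the ECAM window maps to bus/device/function/register according to the standard PCIe ECAM layout. The struct and function names are assumptions for illustration and are not the helpers added by this patch.

```c
#include <stdint.h>

/* Hypothetical container for a decoded ECAM offset. */
struct ecam_addr_sketch {
    uint8_t  bus;   /* offset bits 27:20 */
    uint8_t  dev;   /* offset bits 19:15 */
    uint8_t  fn;    /* offset bits 14:12 */
    uint16_t reg;   /* offset bits 11:0  */
};

/* Split an offset within the ECAM window into its components. */
static struct ecam_addr_sketch ecam_decode_sketch(uint64_t offset)
{
    struct ecam_addr_sketch a;

    a.bus = (uint8_t)((offset >> 20) & 0xff);
    a.dev = (uint8_t)((offset >> 15) & 0x1f);
    a.fn  = (uint8_t)((offset >> 12) & 0x7);
    a.reg = (uint16_t)(offset & 0xfff);

    return a;
}
```

Each function therefore owns a 4KB slice of the window, which is why a trap handler can derive the target device purely from the faulting address.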
Jan Beulich [Fri, 15 Oct 2021 11:43:35 +0000 (13:43 +0200)]
x86/shadow: adjust 2-level case of SHADOW_FOREACH_L2E()

Coverity apparently takes issue with the assignment inside an if(), but
then only in two of the cases (sh_destroy_l2_shadow() and
sh_unhook_32b_mappings()). As it's pretty simple to break out of the
outer loop without the need for a local helper variable, adjust the code
that way.

While there, with the other "unused value" reports also in mind, further
drop a dead assignment from SHADOW_FOREACH_L1E().

Coverity-ID: 1492857
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 15 Oct 2021 10:48:31 +0000 (12:48 +0200)]
x86/shadow: adjust some shadow_set_l<N>e() callers

Coverity dislikes sh_page_fault() storing the return value into a local
variable but then never using the value (and oddly enough spots this in
the 2- and 3-level cases, but not in the 4-level one). Instead of adding
yet another cast to void as replacement, take the opportunity and drop a
bunch of such casts at the same time - not using function return values
is a common thing to do. (It of course is an independent question
whether ignoring errors like this is a good idea.)

Coverity-ID: 1492856
Coverity-ID: 1492858
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 15 Oct 2021 10:47:52 +0000 (12:47 +0200)]
AMD/IOMMU: expose errors and warnings unconditionally

Making these dependent upon "iommu=debug" isn't really helpful in the
field. While touching the respective code anyway, also make use of %pp
and %pd.

Requested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 15 Oct 2021 10:47:18 +0000 (12:47 +0200)]
AMD/IOMMU: pull ATS disabling earlier

Disabling should be done in the opposite order of enabling: ATS wants to
be turned off before adjusting the DTE, just like it gets enabled only
after the DTE was suitably prepared. Note that we want ATS to be
disabled as soon as any of the DTEs involved in the handling of a device
(including phantom devices) gets adjusted respectively. For this reason
the "devfn == pdev->devfn" of the original conditional gets dropped.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 15 Oct 2021 10:46:42 +0000 (12:46 +0200)]
AMD/IOMMU: respect AtsDisabled device flag

IVHD entries may specify that ATS is to be blocked for a device or range
of devices. Honor firmware telling us so.

While adding respective checks I noticed that the 2nd conditional in
amd_iommu_setup_domain_device() failed to check the IOMMU's capability.
Add the missing part of the condition there, as no good can come from
enabling ATS on a device when the IOMMU is not capable of dealing with
ATS requests.

For actually using ACPI_IVHD_ATS_DISABLED, make its expansion no longer
exhibit UB.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
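
Note: the remark about the flag's expansion exhibiting UB refers to a general C rule: with a 32-bit int, (1 << 31) shifts into the sign bit of a signed value, which is undefined behavior, while (1u << 31) is well defined. A minimal sketch follows; the macro and function names are made up for illustration and are not the identifiers from the ACPI headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag in bit 31; the unsigned literal keeps the shift defined. */
#define ATS_DISABLED_SKETCH (1u << 31)

/* Return true when the (hypothetical) flags word asks for ATS to be blocked. */
static bool ats_disabled_sketch(uint32_t flags)
{
    return (flags & ATS_DISABLED_SKETCH) != 0;
}
```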
Jan Beulich [Fri, 15 Oct 2021 10:46:05 +0000 (12:46 +0200)]
AMD/IOMMU: check IVMD ranges against host implementation limits

When such ranges can't be represented as 1:1 mappings in page tables,
reject them as presumably bogus. Note that when we detect features late
(because of EFRSup being clear in the ACPI tables), it would be quite a
bit of work to check for (and drop) out of range IVMD ranges, so IOMMU
initialization gets failed in this case instead.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
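
Note: one way to picture the range check is the sketch below, which tests whether a range can be covered by an address space of a given width. The function name and parameters are assumptions for illustration; the real patch derives the limit from the IOMMU's reported capabilities.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: does [start, start + length) fit below
 * 2^addr_bits without wrapping around?
 */
static bool range_fits_sketch(uint64_t start, uint64_t length,
                              unsigned int addr_bits)
{
    uint64_t end;

    if (length == 0 || addr_bits == 0 || addr_bits > 64)
        return false;

    end = start + length - 1;
    if (end < start)            /* the range wrapped past 2^64 */
        return false;

    if (addr_bits == 64)        /* avoid shifting by the full width */
        return true;

    return (end >> addr_bits) == 0;
}
```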
Jan Beulich [Fri, 15 Oct 2021 10:45:16 +0000 (12:45 +0200)]
AMD/IOMMU: improve (extended) feature detection

First of all the documentation is very clear about ACPI table data
superseding raw register data. Use raw register data only if EFRSup is
clear in the ACPI tables (which may still go too far). Additionally if
this flag is clear, the IVRS type 11H table is reserved and hence may
not be recognized.

Furthermore propagate IVRS type 10H data into the feature flags
recorded, as the full extended features field is available in type 11H
only.

Note that this also makes it necessary to stop the bad practice of us
finding a type 11H IVHD entry, but still processing the type 10H one
in detect_iommu_acpi()'s invocation of amd_iommu_detect_one_acpi().

Note also that the features.raw check in amd_iommu_prepare_one() needs
replacing, now that the field can also be populated by different means.
Key IOMMUv2 availability off of IVHD type not being 10H, and then move
it a function layer up, so that it would be set only once all IOMMUs
have been successfully prepared.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Jan Beulich [Fri, 15 Oct 2021 10:44:20 +0000 (12:44 +0200)]
AMD/IOMMU: obtain IVHD type to use earlier

Doing this in amd_iommu_prepare() is too late for it, in particular, to
be used in amd_iommu_detect_one_acpi(), as a subsequent change will want
to do. Moving it immediately ahead of amd_iommu_detect_acpi() is
(luckily) pretty simple, (pretty importantly) without breaking
amd_iommu_prepare()'s logic to prevent multiple processing.

This involves moving table checksumming, as
amd_iommu_get_supported_ivhd_type() ->  get_supported_ivhd_type() will
now be invoked before amd_iommu_detect_acpi()  -> detect_iommu_acpi(). In
the course of doing so stop open-coding acpi_tb_checksum(), seeing that
we have other uses of this originally ACPI-private function elsewhere in
the tree.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
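
Note: ACPI tables are considered valid when all of their bytes sum to zero modulo 256. The sketch below shows that calculation in isolation; the function name is an assumption and this is not the acpi_tb_checksum() implementation from the tree.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum all bytes of a table; a valid ACPI table yields 0. */
static uint8_t table_checksum_sketch(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;

    while (len--)
        sum = (uint8_t)(sum + *buf++);

    return sum;
}
```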
Juergen Gross [Fri, 10 Sep 2021 05:55:16 +0000 (07:55 +0200)]
stubdom: fix build with disabled pv-grub

Today the build will fail if --disable-pv-grub is passed as a parameter
to configure, as the main Makefile will unconditionally try to build a
32-bit pv-grub stubdom.

Fix that by introducing a pv-grub-if-enabled target in
stubdom/Makefile taking care of this situation.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
Oleksandr Tyshchenko [Mon, 6 Sep 2021 13:42:21 +0000 (16:42 +0300)]
xen/arm: optee: Allocate anonymous domheap pages

Allocate anonymous domheap pages as there is no strict need to
account them to a particular domain.

Since XSA-383 "xen/arm: Restrict the amount of memory that dom0less
domU and dom0 can allocate" the dom0 cannot allocate memory outside
of the pre-allocated region. This means if we try to allocate a
non-anonymous page to be accounted to dom0 we will get an
over-allocation issue when assigning that page to the domain.
The anonymous page, in turn, is not assigned to any domain.

CC: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Acked-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Oleksandr Tyshchenko [Thu, 14 Oct 2021 11:40:45 +0000 (14:40 +0300)]
libxl/arm: Add handling of extended regions for DomU

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and
advertised to it via "reg" property under hypervisor node in
the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If extended regions could not be allocated for some reason,
Xen doesn't fail and behaves as usual, so only inserts region 0.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain
  currently.
- The ACPI case is not covered.

***

The algorithm to choose extended regions for non-direct mapped
DomU is simpler in comparison with the algorithm for direct mapped
Dom0. We usually have a lot of unused space above 4GB, and might
have some unused space below 4GB (depends on guest memory size).
Try to allocate separate 2MB-aligned extended regions from the first
(below 4GB) and second (above 4GB) RAM banks, taking into account
the maximum supported guest physical address space size and the amount
of memory assigned to the guest. The minimum size of an extended region
is the same as for Dom0 (64MB).

Please note, we introduce the fdt_property_reg_placeholder helper, whose
purpose is to create N ranges that are zeroed. The interesting fact
is that libfdt already has fdt_property_placeholder(). But this was
introduced only in 2017, so there is a risk that some distros may not
ship the latest libfdt version. This is why we implement our own light
variant for now.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Julien Grall <jgrall@amazon.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
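
Note: the per-bank selection described above (2MB alignment, 64MB minimum) could look roughly like the sketch below. All names are hypothetical, and clamping to the guest physical address space size is left out for brevity, so this is an illustration of the arithmetic rather than the libxl code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MB_SKETCH(x)            ((uint64_t)(x) << 20)
#define EXT_REGION_ALIGN_SKETCH MB_SKETCH(2)
#define EXT_REGION_MIN_SKETCH   MB_SKETCH(64)

struct ext_region_sketch {
    uint64_t base;
    uint64_t size;
};

/*
 * Given the unused part [unused_start, unused_end) of a bank, pick a
 * 2MB-aligned region and accept it only if it is at least 64MB.
 */
static bool pick_ext_region_sketch(uint64_t unused_start, uint64_t unused_end,
                                   struct ext_region_sketch *out)
{
    const uint64_t mask = EXT_REGION_ALIGN_SKETCH - 1;
    uint64_t base, end;

    if (unused_start > UINT64_MAX - mask)   /* rounding up would overflow */
        return false;

    base = (unused_start + mask) & ~mask;   /* round start up   */
    end  = unused_end & ~mask;              /* round end down   */

    if (end <= base || end - base < EXT_REGION_MIN_SKETCH)
        return false;

    out->base = base;
    out->size = end - base;
    return true;
}
```

A bank with less than 64MB of aligned free space simply yields no region, matching the commit's statement that only region 0 is inserted when extended regions cannot be allocated.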
Oleksandr Tyshchenko [Thu, 14 Oct 2021 11:40:44 +0000 (14:40 +0300)]
xen/arm: Introduce gpaddr_bits field to struct xen_domctl_getdomaininfo

We need to pass info about maximum supported guest physical
address space size to the toolstack on Arm in order to properly
calculate the base and size of the extended region (safe range)
for the guest. The extended region is unused address space which
could be safely used by domain for foreign/grant mappings on Arm.
The extended region itself will be handled by the subsequent
patch.

Currently the same guest physical address space size is used
for all guests (p2m_ipa_bits variable on Arm, the x86 equivalent
is hap_paddr_bits).

Add an explicit padding after "gpaddr_bits" field and also
(while at it) after "domain" field.

Also make sure that full structure is cleared in all cases by
moving the clearing into getdomaininfo(). Currently it is only
cleared by the sysctl caller (and only once).

Please note, we do not need to bump XEN_DOMCTL_INTERFACE_VERSION
as a bump has already occurred in this release cycle. But we do
need to bump XEN_SYSCTL_INTERFACE_VERSION as the structure is
re-used in a sysctl.

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
[hypervisor parts]
Reviewed-by: Jan Beulich <jbeulich@suse.com>
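
Note: the combination of explicit padding and clearing the whole structure in the producer can be sketched as follows. The struct layout and names here are invented for illustration and do not reproduce struct xen_domctl_getdomaininfo.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical info structure with explicit padding fields. */
struct dominfo_sketch {
    uint16_t domain;
    uint16_t pad1;          /* explicit padding after "domain" */
    uint32_t flags;
    uint8_t  gpaddr_bits;
    uint8_t  pad2[7];       /* explicit padding after "gpaddr_bits" */
    uint64_t tot_pages;
};

/*
 * Fill the structure. Clearing it here, rather than in each caller,
 * ensures padding never carries stale data to the consumer.
 */
static void getdomaininfo_sketch(struct dominfo_sketch *info,
                                 uint16_t domid, uint8_t gpaddr_bits)
{
    memset(info, 0, sizeof(*info));
    info->domain = domid;
    info->gpaddr_bits = gpaddr_bits;
}
```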
Anthony PERARD [Thu, 14 Oct 2021 10:35:42 +0000 (12:35 +0200)]
build: fix dependencies in arch/x86/boot

Temporarily fix the list of headers that cmdline.c and reloc.c depend
on, until the next time the list is out of sync again.

Also, add the linker script to the list.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Michal Orzel [Thu, 14 Oct 2021 08:47:18 +0000 (10:47 +0200)]
Revert "xen/domctl: Introduce XEN_DOMCTL_CDF_vpci flag"

This reverts commit 2075b410ee8087662c880213c3aca196fb7ade22.

During the discussion [1] that took place after
the patch was merged it was agreed that it should
be reverted to avoid introducing a bad interface.

Furthermore, the patch rejected usage of flag
XEN_DOMCTL_CDF_vpci for x86 which is not true
as it should be set for dom0 PVH.

Due to XEN_DOMCTL_CDF_vpmu being introduced after
XEN_DOMCTL_CDF_vpci, modify its bit position
from 8 to 7.

[1] https://marc.info/?t=163354215300039&r=1&w=2

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Rahul Singh [Wed, 13 Oct 2021 20:28:43 +0000 (13:28 -0700)]
xen/arm: Add linux,pci-domain property for hwdom if not available.

If the property is not present in the device tree node for the host
bridge, XEN, while creating the dtb for hwdom, will create this property
and assign the already allocated segment to the host bridge,
so that XEN and Linux will have the same segment for the host bridges.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Luca Fancellu [Wed, 13 Oct 2021 14:52:02 +0000 (15:52 +0100)]
arm/docs: Clarify legacy DT bindings on UEFI

Since the introduction of UEFI boot for Xen, the legacy
compatible strings were not supported and the stub code
was checking only the presence of “multiboot,module” to
require the Xen UEFI configuration file or not.
The documentation was not updated to specify that behavior.

Add a phrase to docs/misc/arm/device-tree/booting.txt
to clarify it.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Luca Fancellu [Mon, 11 Oct 2021 18:15:28 +0000 (19:15 +0100)]
arm/efi: load dom0 modules from DT using UEFI

Add support to load Dom0 boot modules from
the device tree using the xen,uefi-binary property.

Update documentation about that.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Wed, 13 Oct 2021 15:51:12 +0000 (17:51 +0200)]
build: clean common temporary files from root makefile

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Michal Orzel [Wed, 13 Oct 2021 12:33:52 +0000 (14:33 +0200)]
xen: Expose the PMU to the guests

Add parameter vpmu to xl domain configuration syntax
to enable the access to PMU registers by disabling
the PMU traps (currently only for ARM).

The current status is that the PMU registers are not
virtualized and the physical registers are directly
accessible when this parameter is enabled. There is no
interrupt support and Xen will not save/restore the
register values on context switches.

According to Arm Arm, section D7.1:
"The Performance Monitors Extension is common
to AArch64 operation and AArch32 operation."
That means we have an assurance that if PMU is
present in one exception state, it must also be
present in the other.

Please note that this feature is experimental.

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Signed-off-by: Julien Grall <julien@xen.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
Ian Jackson [Tue, 12 Oct 2021 14:50:28 +0000 (15:50 +0100)]
libxl: CODING_STYLE: Explicitly deprecate #ifdef

We don't use ifdefs in the main code.  Actually document this.

Signed-off-by: Ian Jackson <iwj@xenproject.org>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Michal Orzel [Tue, 12 Oct 2021 08:13:22 +0000 (10:13 +0200)]
xen/arm: Check for PMU platform support

ID_AA64DFR0_EL1/ID_DFR0_EL1 registers provide
information about PMU support. Replace structure
dbg64/dbg32 with a union and fill in all the
register fields according to document:
ARM Architecture Registers (DDI 0595, 2021-06).

Add macros boot_dbg_feature64/boot_dbg_feature32
to check for a debug feature. Add macro
cpu_has_pmu to check for PMU support.
Any value higher than 0 and less than 15 means
that PMU is supported (we do not care about its
version for now).

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
[stefano: add in-code comment]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
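
Note: the "higher than 0 and less than 15" rule applies to the PMUVer field, which sits in bits 11:8 of ID_AA64DFR0_EL1. The sketch below applies that rule to a raw register value; the function names are assumptions and this is not the cpu_has_pmu macro itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Extract the 4-bit PMUVer field (bits 11:8) from a raw ID_AA64DFR0_EL1 value. */
static unsigned int pmu_version_sketch(uint64_t id_aa64dfr0)
{
    return (unsigned int)((id_aa64dfr0 >> 8) & 0xf);
}

/* PMU is treated as present for any version strictly between 0 and 15. */
static bool has_pmu_sketch(uint64_t id_aa64dfr0)
{
    unsigned int ver = pmu_version_sketch(id_aa64dfr0);

    return ver > 0 && ver < 15;
}
```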
Michal Orzel [Tue, 12 Oct 2021 08:13:21 +0000 (10:13 +0200)]
xen+tools: Introduce XEN_SYSCTL_PHYSCAP_vpmu

Introduce flag XEN_SYSCTL_PHYSCAP_vpmu which
indicates whether the platform supports vPMU
functionality. Modify Xen and tools accordingly.

Take the opportunity and fix the XEN_SYSCTL_PHYSCAP_vmtrace
definition in sysctl.h, which wrongly uses (1 << 6)
instead of (1u << 6).

Signed-off-by: Michal Orzel <michal.orzel@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Nick Rosbrook <rosbrookn@ainfosec.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Ian Jackson <iwj@xenproject.org>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Luca Fancellu [Mon, 11 Oct 2021 18:15:27 +0000 (19:15 +0100)]
arm/efi: Use dom0less configuration when using EFI boot

This patch introduces support for dom0less configuration
when using UEFI boot on ARM. It permits the EFI boot to
continue if no dom0 kernel is specified but at least one domU
is found.

Introduce the new property "xen,uefi-binary" for device tree boot
module nodes that are subnodes of "xen,domain" compatible nodes.
The property holds a string containing the file name of the
binary that shall be loaded by the UEFI loader from the filesystem.

Introduce a new call efi_check_dt_boot(...), called during EFI boot,
that checks for modules to be loaded using the device tree.
Architectures that don't support device tree don't have to
provide this function.

Update the EFI documentation about how to start a dom0less
setup using UEFI.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
[stefano: drop inline from efi_check_dt_boot]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Mon, 4 Oct 2021 18:11:45 +0000 (19:11 +0100)]
x86/pv: Split pv_hypercall() in two

The is_pv_32bit_vcpu() conditionals hide four lfences, with two taken on any
individual path through the function.  There is very little code common
between compat and native, and context-dependent conditionals predict very
badly for a period of time after context switch.

Move do_entry_int82() from pv/traps.c into pv/hypercall.c, allowing
_pv_hypercall() to be static and forced inline.  The delta is:

  add/remove: 0/0 grow/shrink: 1/1 up/down: 300/-282 (18)
  Function                                     old     new   delta
  do_entry_int82                                50     350    +300
  pv_hypercall                                 579     297    -282

which is tiny, but the perf implications are large:

  Guest | Naples | Milan  | SKX    | CFL-R  |
  ------+--------+--------+--------+--------+
  pv64  |  17.4% |  15.5% |   2.6% |   4.5% |
  pv32  |   1.9% |  10.9% |   1.4% |   2.5% |

These are percentage improvements in raw TSC deltas for a xen_version
hypercall, with obvious outliers excluded.  Therefore, it is an idealised best
case improvement.

The pv64 path uses `syscall`, while the pv32 path uses `int $0x82` so
necessarily has higher overhead.  Therefore, dropping the lfences is less of
an overall improvement.

I don't know why the Naples pv32 improvement is so small, but I've double
checked the numbers and they're consistent.  There's presumably something
we're doing which is a large overhead in the pipeline.

On the Intel side, both systems are writing to MSR_SPEC_CTRL on
entry/exit (SKX using the retrofitted microcode implementation, CFL-R using
the hardware implementation), while SKX is suffering further from XPTI for
Meltdown protection.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Dmitry Isaykin [Fri, 1 Oct 2021 12:24:16 +0000 (15:24 +0300)]
tools/xl: fix autoballoon regex

This regex is used for auto-balloon mode detection based on Xen command line.

The case of specifying a negative size was handled incorrectly.
From misc/xen-command-line documentation:

    dom0_mem (x86)
    = List of ( min:<sz> | max:<sz> | <sz> )

    If a size is positive, it represents an absolute value.
    If a size is negative, it is subtracted from the total available memory.

Also add support for [tT] granularity suffix.
Also add support for memory fractions (i.e. '50%' or '1G+25%').

Signed-off-by: Dmitry Isaykin <isaikin-dmitry@yandex.ru>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
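
Note: the kind of matching described above can be sketched with POSIX extended regular expressions. The pattern below is written for this note and is an assumption, not the regex from the patch; it accepts negative sizes, the [tT] suffix, and fractions such as '50%' or '1G+25%'.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical building blocks for a dom0_mem list element. */
#define SZ_SKETCH   "-?[0-9]+[bBkKmMgGtT]?"
#define FRAC_SKETCH "[0-9]+%"
#define ELEM_SKETCH "(min:|max:)?(" SZ_SKETCH "(\\+" FRAC_SKETCH ")?|" FRAC_SKETCH ")"
#define DOM0_MEM_SKETCH \
    "(^| )dom0_mem=" ELEM_SKETCH "(," ELEM_SKETCH ")*($| )"

/* Return true if the Xen command line carries a well-formed dom0_mem option. */
static bool cmdline_sets_dom0_mem_sketch(const char *cmdline)
{
    regex_t re;
    int rc;

    if (regcomp(&re, DOM0_MEM_SKETCH, REG_EXTENDED | REG_NOSUB) != 0)
        return false;

    rc = regexec(&re, cmdline, 0, NULL, 0);
    regfree(&re);

    return rc == 0;
}
```

A caller would typically turn auto-ballooning off when this returns true, since an explicit dom0_mem setting indicates the administrator has sized dom0 deliberately.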
Jan Beulich [Tue, 12 Oct 2021 09:57:08 +0000 (11:57 +0200)]
VT-d: Tylersburg isoch DMAR unit with no TLB space

BIOSes, when enabling the dedicated DMAR unit for the sound device,
need to also set a non-zero number of TLB entries in a respective
system management register (VTISOCHCTRL). At least one BIOS is known
to fail to do so, causing the VT-d engine to deadlock when used.

Vaguely based on Linux'es e0fc7e0b4b5e ("intel-iommu: Yet another BIOS
workaround: Isoch DMAR unit with no TLB space").

To limit message string redundancy, fold parts with the IGD quirk logic.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Tue, 12 Oct 2021 09:56:21 +0000 (11:56 +0200)]
VT-d: generalize and correct "iommu=no-igfx" handling

Linux'es supposedly equivalent "intel_iommu=igfx_off" deals with any
graphics devices (not just Intel ones) while at the same time limiting
the effect to IOMMUs covering only graphics devices. Keying the decision
to leave translation disabled for an IOMMU to merely a magic SBDF tuple
was wrong in the first place - systems may very well have non-graphics
devices at 0000:00:02.0 (ordinary root ports commonly live there, for
example). Any use of igd_drhd_address (and hence is_igd_drhd()) needs
further qualification.

Introduce a new "graphics only" field in struct acpi_drhd_unit and set
it according to device scope parsing outcome. Replace the bad use of
is_igd_drhd() in iommu_enable_translation() by use of this new field.

While adding the new field also convert the adjacent include_all one to
"bool".

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Release-Acked-by: Ian Jackson <iwj@xenproject.org>
Jan Beulich [Tue, 12 Oct 2021 09:55:42 +0000 (11:55 +0200)]
x86/PV32: fix physdev_op_compat handling

The conversion of the original code failed to recognize that the 32-bit
compat variant of this (sorry, two different meanings of "compat" here)
needs to continue to invoke the compat handler, not the native one.
Arrange for this by adding yet another #define.

Affected functions (having existed prior to the introduction of the new
hypercall) are PHYSDEVOP_set_iobitmap and PHYSDEVOP_apic_{read,write}.
For all others the operand struct layout doesn't differ.

Fixes: 1252e2823117 ("x86/pv: Export pv_hypercall_table[] rather than working around it in several ways")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 12 Oct 2021 09:54:34 +0000 (11:54 +0200)]
AMD/IOMMU: consider hidden devices when flushing device I/O TLBs

Hidden devices are associated with DomXEN but usable by the
hardware domain. Hence they need flushing as well when all devices are
to have flushes invoked.

While there drop a redundant ATS-enabled check and constify the first
parameter of the involved function.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
Anthony PERARD [Tue, 12 Oct 2021 09:53:47 +0000 (11:53 +0200)]
build: avoid building arm/arm/*/head.o twice

head.o is being built twice, once because it is in $(ALL_OBJS) and a
second time because it is in $(extra-y) and thus it is rebuilt when
building "arch/arm/built_in.o".

Fix this by adding a dependency of "head.o" on the directory
"arch/arm/".

Also, we should avoid building objects that are in subdirectories, so
move the declaration in there. This doesn't change anything as
"arch/arm/built_in.o" depends on "arch/arm/$subarch/built_in.o" which
depends on $(extra-y), so we still need to depend on
"arch/arm/built_in.o".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Anthony PERARD [Tue, 12 Oct 2021 09:50:47 +0000 (11:50 +0200)]
build,arm: move LDFLAGS change to arch.mk

Changes to XEN_LDFLAGS may or may not apply to targets in for example
"common/" depending on whether one runs `make` or `make common/`.

But arch.mk is loaded before doing any build, so changes to LDFLAGS
there mean that the value of XEN_LDFLAGS won't depend on the initial
target.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Julien Grall <jgrall@amazon.com>
Anthony PERARD [Tue, 12 Oct 2021 09:48:46 +0000 (11:48 +0200)]
x86/mm: avoid building multiple .o from a single .c file

This replaces the use of a single .c file for multiple .o files by
creating multiple .c files that include the first one.

There are quite a few issues with trying to build more than one object
file from a single source file: there is a duplication of the make
rules to generate those targets; there is an additional ".file" symbol
added in order to differentiate between the object files; and
tools/symbols has a heuristic to try to pick up the right ".file".

This patch adds new .c source files, which avoids the need to add a
second ".file" symbol and thus the need to deal with those
issues.

Also remove __OBJECT_FILE__ from $(CC) command line as it isn't used
anywhere anymore. And remove the macro "build-intermediate" since the
generic rules for single targets can be used.

And rename the objects in mm/hap/ to remove the extra "level".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Oleksandr Andrushchenko [Fri, 8 Oct 2021 05:55:32 +0000 (08:55 +0300)]
libxl: Only map legacy PCI IRQs if they are supported

Arm's PCI passthrough implementation doesn't support legacy interrupts,
only MSI/MSI-X. This can be the case for other platforms too.
For that reason introduce a new CONFIG_PCI_SUPP_LEGACY_IRQ, add
it to the CFLAGS, and compile the relevant code in the toolstack only if
applicable.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
[stefano: minor change to Makefile]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
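
Note: one common way to keep such a build-time switch out of the main code paths is to confine the preprocessor check to a single predicate, as sketched below. Only CONFIG_PCI_SUPP_LEGACY_IRQ comes from the commit message; both function names are hypothetical and the mapping step is a stand-in, not libxl code.

```c
#include <stdbool.h>

/* Single place where the build-time option is inspected. */
static bool pci_supp_legacy_irq_sketch(void)
{
#ifdef CONFIG_PCI_SUPP_LEGACY_IRQ
    return true;
#else
    return false;
#endif
}

/*
 * Hypothetical mapping step: returns 1 when a legacy IRQ would be
 * mapped and 0 when the step is skipped or the IRQ is invalid.
 */
static int map_legacy_irq_sketch(int irq)
{
    if (!pci_supp_legacy_irq_sketch())
        return 0;               /* platform uses MSI/MSI-X only */

    return irq >= 0 ? 1 : 0;
}
```

Callers then use an ordinary runtime branch, and the compiler can still discard the unused path when the option is not set.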
Oleksandr Andrushchenko [Fri, 8 Oct 2021 05:55:31 +0000 (08:55 +0300)]
libxl: Allow removing PCI devices for all types of domains

The PCI device remove path may now be used by PVH on ARM, so the
assert is no longer valid.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Luca Fancellu [Mon, 11 Oct 2021 07:56:38 +0000 (08:56 +0100)]
arm/efi: Fix null pointer dereference

Fix commit 60649d443dc395243e74d2b3e05594ac0c43cfe3,
which introduced a null pointer dereference when
fdt_node_offset_by_compatible is called with a null "fdt"
argument.

Reported-by: Julien Grall <julien@xen.org>
Fixes: 60649d443d ("arm/efi: Introduce xen,uefi-cfg-load DT property")
Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/arm: Transitional change to build HAS_VPCI on ARM.
Rahul Singh [Mon, 11 Oct 2021 20:16:57 +0000 (13:16 -0700)]
xen/arm: Transitional change to build HAS_VPCI on ARM.

This patch will be reverted once we add VPCI MSI/MSI-X support
on ARM.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agotools/console: use xenforeignmemory to map console ring
Roger Pau Monne [Wed, 22 Sep 2021 08:21:18 +0000 (10:21 +0200)]
tools/console: use xenforeignmemory to map console ring

This patch replaces the usage of xc_map_foreign_range with
xenforeignmemory_map from the stable xenforeignmemory library. Note
that other uses of libxc functions remain, which prevents removing
the dependency.

No functional change intended.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
3 years agodocs: add references to Argo Linux driver sources and information
Christopher Clark [Mon, 11 Oct 2021 08:59:55 +0000 (10:59 +0200)]
docs: add references to Argo Linux driver sources and information

Add a section to the Argo design document to supply guidance on how to
enable Argo in Xen and where to obtain source code and documentation
for Argo device drivers for guest OSes, primarily from OpenXT.

Signed-off-by: Christopher Clark <christopher.w.clark@gmail.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agox86/HVM: fix xsm_op for 32-bit guests
Jan Beulich [Mon, 11 Oct 2021 08:58:44 +0000 (10:58 +0200)]
x86/HVM: fix xsm_op for 32-bit guests

Like for PV, 32-bit guests need to invoke the compat handler, not the
native one.

Fixes: db984809d61b ("hvm: wire up domctl and xsm hypercalls")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agox86/build: suppress EFI-related tool chain checks upon local $(MAKE) recursion
Jan Beulich [Mon, 11 Oct 2021 08:58:17 +0000 (10:58 +0200)]
x86/build: suppress EFI-related tool chain checks upon local $(MAKE) recursion

The xen-syms and xen.efi linking steps are serialized only when the
intermediate note.o file is necessary. Otherwise both may run in
parallel. This in turn means that the compiler / linker invocations to
create efi/check.o / efi/check.efi may also happen twice in parallel.
Obviously it's a bad idea to have multiple producers of the same output
race with one another - every once in a while one may e.g. observe

objdump: efi/check.efi: file format not recognized

We don't need this EFI related checking to occur when producing the
intermediate symbol and relocation table objects, and we have an easy
way of suppressing it: Simply pass in "efi-y=", overriding the
assignments done in the Makefile and thus forcing the tool chain checks
to be bypassed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agoxen/arm: optee: fix possible memory leaks
Volodymyr Babchuk [Thu, 7 Oct 2021 23:25:02 +0000 (23:25 +0000)]
xen/arm: optee: fix possible memory leaks

translate_noncontig() allocates a domheap page for the translated list
before calling allocate_optee_shm_buf(), which can fail for a number
of reasons. After such a failure we need to free the allocated page(s).

Another leak is possible if the same translate_noncontig() function
fails to get a domain page. In this case it should free the allocated
optee_shm_buf prior to exit. This will also free the allocated domheap page.
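
The two error paths can be illustrated with a small standalone sketch.
It is not the mediator code: demo_alloc(), demo_free(), free_shm_buf(),
translate() and struct shm_buf are hypothetical stand-ins, and the
fail_buf / fail_map flags merely simulate the two failure points.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int live_allocs;            /* outstanding allocations (demo only) */

struct shm_buf { void *pages; };   /* the buffer owns the page list */

static void *demo_alloc(size_t sz)
{
    void *p = malloc(sz);

    if ( p )
        live_allocs++;
    return p;
}

static void demo_free(void *p)
{
    if ( p )
    {
        free(p);
        live_allocs--;
    }
}

static void free_shm_buf(struct shm_buf *buf)
{
    assert(buf != NULL);
    demo_free(buf->pages);         /* releasing the buffer releases its pages */
    demo_free(buf);
}

/* Hypothetical translate step with two simulated failure points. */
static struct shm_buf *translate(bool fail_buf, bool fail_map)
{
    void *pages = demo_alloc(64);
    struct shm_buf *buf;

    if ( !pages )
        return NULL;

    buf = fail_buf ? NULL : demo_alloc(sizeof(*buf));
    if ( !buf )
    {
        demo_free(pages);          /* first leak: pages exist, buffer does not */
        return NULL;
    }
    buf->pages = pages;

    if ( fail_map )
    {
        free_shm_buf(buf);         /* second leak: a later step failed */
        return NULL;
    }

    return buf;
}
```

In this sketch the allocation counter returns to zero after either
failure, which is the property the fix is after.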

Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/domain: Call pci_release_devices() when releasing domain resources
Oleksandr Tyshchenko [Fri, 8 Oct 2021 05:55:30 +0000 (08:55 +0300)]
xen/domain: Call pci_release_devices() when releasing domain resources

This is the very same thing that we already do for DT devices. Moreover, x86
already calls pci_release_devices().

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
3 years agoxen/arm: Mark device as PCI while creating one
Oleksandr Andrushchenko [Fri, 8 Oct 2021 22:48:31 +0000 (15:48 -0700)]
xen/arm: Mark device as PCI while creating one

While adding a PCI device mark it as such, so other frameworks
can distinguish it from DT devices.
For that introduce an architecture defined helper which may perform
additional initialization of the newly created PCI device.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
[applicable parts]
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
3 years agoxen/device-tree: Make dt_find_node_by_phandle global
Oleksandr Andrushchenko [Fri, 8 Oct 2021 05:55:28 +0000 (08:55 +0300)]
xen/device-tree: Make dt_find_node_by_phandle global

Make dt_find_node_by_phandle globally visible, so it can be re-used by
other frameworks.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
3 years agoxen/arm: Introduce pci_find_host_bridge_node helper
Oleksandr Andrushchenko [Fri, 8 Oct 2021 22:43:26 +0000 (15:43 -0700)]
xen/arm: Introduce pci_find_host_bridge_node helper

Get host bridge node given a PCI device attached to it.

This helper will be re-used for adding PCI devices by the subsequent
patches.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
3 years agoxen/arm: Add new device type for PCI
Oleksandr Andrushchenko [Fri, 8 Oct 2021 22:38:17 +0000 (15:38 -0700)]
xen/arm: Add new device type for PCI

Add new device type (DEV_PCI) to distinguish PCI devices from platform
DT devices, so some drivers, like IOMMU, can handle PCI devices
differently.

Also add a helper which, when given a struct device, returns the
corresponding struct pci_dev which this device is a part of.

Because of the header cross-dependencies, e.g. we need both
struct pci_dev and struct arch_pci_dev at the same time, this cannot be
done with an inline.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Rahul Singh <rahul.singh@arm.com>
Tested-by: Rahul Singh <rahul.singh@arm.com>
3 years agox86/spec-ctrl: Build with BRANCH_HARDEN lfences by default
Andrew Cooper [Mon, 4 Oct 2021 20:39:03 +0000 (21:39 +0100)]
x86/spec-ctrl: Build with BRANCH_HARDEN lfences by default

Branch Harden is enabled by default at compile and boot time.  Invert the
logic to compile with lfence by default and nop out in the non-default case.

This has several advantages.  It removes 3829 patch points (in the random
build of Xen I have to hand) by default on boot, 70% (!) of the
.altinstr_replacement section.  For builds of Xen with a non-nops capable tool
chain, the code after `spec-ctrl=no-branch-harden` is better because Xen can
write long nops.

Most importantly however, it means the disassembly actually matches what runs
in the common case, with the ability to distinguish the lfences from other
uses of nops.

Finally, make opt_branch_harden local to spec_ctrl.c and __initdata.  It has
never been used externally, even at its introduction in c/s 3860d5534df4
"spec: add l1tf-barrier".

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/arm: Updates for extended regions support
Oleksandr Tyshchenko [Wed, 6 Oct 2021 11:22:26 +0000 (14:22 +0300)]
xen/arm: Updates for extended regions support

This is a follow-up of
"b6fe410 xen/arm: Add handling of extended regions for Dom0"

Add various in-code comments, update Xen hypervisor device tree
bindings text, change the log level for some prints and clarify
format specifier, reuse dt_for_each_range() to avoid open-coding
in find_memory_holes().

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
[stefano: fix typos]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/domctl: Introduce XEN_DOMCTL_CDF_vpci flag
Rahul Singh [Wed, 6 Oct 2021 17:40:33 +0000 (18:40 +0100)]
xen/domctl: Introduce XEN_DOMCTL_CDF_vpci flag

Introduce XEN_DOMCTL_CDF_vpci flag to enable VPCI support in XEN.
Reject the use of this new flag for x86 as VPCI is not supported for
DOMU guests for x86.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
[stefano: drop _XEN_DOMCTL_CDF_vpci]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Christian Lindig <christian.lindig@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/arm: Implement pci access functions
Rahul Singh [Wed, 6 Oct 2021 17:40:32 +0000 (18:40 +0100)]
xen/arm: Implement pci access functions

Implement generic pci access functions to read/write the configuration
space.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/arm: Add support for Xilinx ZynqMP PCI host controller
Oleksandr Andrushchenko [Wed, 6 Oct 2021 17:40:31 +0000 (18:40 +0100)]
xen/arm: Add support for Xilinx ZynqMP PCI host controller

Add support for Xilinx ZynqMP PCI host controller to map the PCI config
space to the XEN memory.

This patch also serves as a reference for how the generic infrastructure
for PCI host-bridge discovery is meant to be used.

Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/arm: PCI host bridge discovery within XEN on ARM
Rahul Singh [Wed, 6 Oct 2021 17:40:30 +0000 (18:40 +0100)]
xen/arm: PCI host bridge discovery within XEN on ARM

XEN during boot will read the PCI device tree node “reg” property
and will map the PCI config space to the XEN memory.

As of now only "pci-host-ecam-generic" compatible boards are supported.

The "linux,pci-domain" device tree property assigns a fixed PCI domain
number to a host bridge, otherwise an unstable (across boots) unique
number will be assigned by Linux. XEN accesses the PCI devices based on
Segment:Bus:Device:Function. A Segment number in XEN is the same as a
domain number in Linux. Segment number and domain number have to be in
sync to access the correct PCI devices.

XEN will read the “linux,pci-domain” property from the device tree node
and configure the host bridge segment number accordingly. If this
property is not available, XEN will allocate a unique segment number
to the host bridge.
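
Why the two numbers must agree can be seen from how a
Segment:Bus:Device:Function address is composed. The sketch below packs
the conventional field widths (16-bit segment, 8-bit bus, 5-bit device,
3-bit function) into one 32-bit value; sbdf_pack() and sbdf_segment()
are hypothetical helpers written for this illustration, not Xen
functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack Segment:Bus:Device:Function into 32 bits. */
static uint32_t sbdf_pack(uint16_t seg, uint8_t bus, uint8_t dev, uint8_t fn)
{
    assert(dev < 32 && fn < 8);    /* 5-bit device, 3-bit function */
    return ((uint32_t)seg << 16) | ((uint32_t)bus << 8) |
           ((uint32_t)dev << 3) | fn;
}

/* Hypothetical helper: extract the segment (Linux: domain) number. */
static uint16_t sbdf_segment(uint32_t sbdf)
{
    return (uint16_t)(sbdf >> 16);
}
```

The same Bus:Device.Function under a different segment yields a
different address, so a mismatch between the Linux domain number and the
XEN segment number would address the wrong host bridge.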

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/arm: Add cmdline boot option "pci-passthrough = <boolean>"
Rahul Singh [Wed, 6 Oct 2021 17:40:29 +0000 (18:40 +0100)]
xen/arm: Add cmdline boot option "pci-passthrough = <boolean>"

Add cmdline boot option "pci-passthrough = <boolean>" to enable or
disable the PCI passthrough support on ARM.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
[stefano: one bool_t/bool correction]
Signed-off-by: Stefano Stabellini <stefano.stabellini@xilinx.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/arm: Add PHYSDEVOP_pci_device_(*add/remove) support for ARM
Rahul Singh [Wed, 6 Oct 2021 17:40:28 +0000 (18:40 +0100)]
xen/arm: Add PHYSDEVOP_pci_device_(*add/remove) support for ARM

The hardware domain is in charge of doing the PCI enumeration and will
discover the PCI devices and then will communicate to XEN via the
hypercall PHYSDEVOP_pci_device_add(..) to add the PCI devices in XEN.

Also implement PHYSDEVOP_pci_device_remove(..) to remove the PCI device.

As most of the code for PHYSDEVOP_pci_device_* is the same between x86
and ARM, move the code to a common file to avoid duplication.

There are other PHYSDEVOP_pci_device_* operations to add PCI devices.
Only PHYSDEVOP_pci_device_remove(..) and PHYSDEVOP_pci_device_add(..)
are implemented for now, as those are the minimum required to
support PCI passthrough on ARM.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild/riscv: tell the build system about riscv64/head.S
Anthony PERARD [Thu, 7 Oct 2021 15:57:10 +0000 (17:57 +0200)]
build/riscv: tell the build system about riscv64/head.S

This allows running `make arch/riscv/riscv64/head.o`.

Example of rune on a fresh copy of the repository:
    make XEN_TARGET_ARCH=riscv64 CROSS_COMPILE=riscv64-linux-gnu- KBUILD_DEFCONFIG=tiny64_defconfig arch/riscv/riscv64/head.o

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Bob Eshleman <bobbyeshleman@gmail.com>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Acked-by: Connor Davis <connojdavis@gmail.com>
3 years agobuild: convert binfile use to if_changed
Anthony PERARD [Thu, 7 Oct 2021 15:55:27 +0000 (17:55 +0200)]
build: convert binfile use to if_changed

This allows detecting command line changes and regenerating
the file in that case.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoVT-d: fix deassign of device with RMRR
Jan Beulich [Fri, 1 Oct 2021 13:05:42 +0000 (15:05 +0200)]
VT-d: fix deassign of device with RMRR

Ignoring a specific error code here was not meant to short circuit
deassign to _just_ the unmapping of RMRRs. This bug was previously
hidden by the bogus (potentially indefinite) looping in
pci_release_devices(), until f591755823a7 ("IOMMU/PCI: don't let domain
cleanup continue when device de-assignment failed") fixed that loop.

This is CVE-2021-28702 / XSA-386.

Fixes: 8b99f4400b69 ("VT-d: fix RMRR related error handling")
Reported-by: Ivan Kardykov <kardykov@tabit.pro>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Tested-by: Ivan Kardykov <kardykov@tabit.pro>
3 years agoxen/arm: Fix dev_is_dt macro definition
Oleksandr Andrushchenko [Mon, 4 Oct 2021 14:11:41 +0000 (17:11 +0300)]
xen/arm: Fix dev_is_dt macro definition

This macro is not currently used, but still has an error in it:
a missing parenthesis. Fix this, so the macro is properly defined.

Fixes: 6c5d3075d97e ("xen/arm: Introduce a generic way to describe device")
Signed-off-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Michal Orzel <michal.orzel@arm.com>
3 years agoxen/arm: Add support for PCI init to initialize the PCI driver.
Rahul Singh [Mon, 4 Oct 2021 11:52:00 +0000 (12:52 +0100)]
xen/arm: Add support for PCI init to initialize the PCI driver.

pci_init(..) will be called during xen startup to initialize and probe
the PCI host-bridge driver.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agoxen/pci: Include asm/pci.h after pci_sbdf_t in xen/pci.h
Rahul Singh [Mon, 4 Oct 2021 11:51:59 +0000 (12:51 +0100)]
xen/pci: Include asm/pci.h after pci_sbdf_t in xen/pci.h

Prototypes declared in asm/pci.h that take argument of type pci_sbdf_t
are included in xen/pci.h before defining pci_sbdf_t.

Include asm/pci.h after pci_sbdf_t in xen/pci.h to fix the issue.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agoxen/pci: gate APEI support on ARM
Rahul Singh [Mon, 4 Oct 2021 11:51:56 +0000 (12:51 +0100)]
xen/pci: gate APEI support on ARM

APEI is not supported on ARM yet, so move the code under the CONFIG_X86
flag to gate it for ARM.

This patch is the preparatory work to enable HAS_PCI on ARM to avoid
compilation error on ARM.

prelink.o: In function `pcie_aer_get_firmware_first':
drivers/passthrough/pci.c:1251: undefined reference to `apei_hest_parse'

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agoxen/arm: Add handling of extended regions for Dom0
Oleksandr Tyshchenko [Wed, 29 Sep 2021 22:52:06 +0000 (01:52 +0300)]
xen/arm: Add handling of extended regions for Dom0

The extended region (safe range) is a region of guest physical
address space which is unused and could be safely used to create
grant/foreign mappings instead of wasting real RAM pages from
the domain memory for establishing these mappings.

The extended regions are chosen at the domain creation time and
advertised to it via "reg" property under hypervisor node in
the guest device-tree. As region 0 is reserved for grant table
space (always present), the indexes for extended regions are 1...N.
If extended regions could not be allocated for some reason,
Xen doesn't fail and behaves as usual, so only inserts region 0.

Please note the following limitations:
- The extended region feature is only supported for 64-bit domain
  currently.
- The ACPI case is not covered.

***

As Dom0 is a direct mapped domain on Arm (i.e. MFN == GFN),
the algorithm to choose extended regions for it is different
from the algorithm for a non-direct mapped DomU.
What is more, extended regions should be chosen differently
depending on whether the IOMMU is enabled or not.

Provide RAM not assigned to Dom0 if the IOMMU is disabled, or memory
holes found in the host device-tree otherwise. Make sure that
extended regions are 2MB-aligned and located within the maximum possible
addressable physical memory range. The minimum size of an extended
region is 64MB. The maximum number of extended regions is 128,
which is an artificial limit to minimize code changes (we reuse
struct meminfo to describe extended regions, so there is an array
field for 128 elements).
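
The alignment and minimum-size rules can be sketched as follows. This is
an illustration only, not the code added by the patch:
ext_region_trim() and the two macros are hypothetical names, and the
sketch assumes the candidate range does not sit at the very top of the
64-bit address space (no overflow handling).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT_REGION_ALIGN    (UINT64_C(2) << 20)   /* 2MB, from the text */
#define EXT_REGION_MIN_SIZE (UINT64_C(64) << 20)  /* 64MB, from the text */

/*
 * Hypothetical helper: trim the candidate range [*start, *end) inwards to
 * 2MB boundaries. Returns true and updates the range if at least 64MB
 * remain, otherwise returns false and leaves the range untouched.
 */
static bool ext_region_trim(uint64_t *start, uint64_t *end)
{
    uint64_t s, e;

    assert(*start <= *end);
    s = (*start + EXT_REGION_ALIGN - 1) & ~(EXT_REGION_ALIGN - 1);
    e = *end & ~(EXT_REGION_ALIGN - 1);

    if ( e <= s || e - s < EXT_REGION_MIN_SIZE )
        return false;

    *start = s;
    *end = e;
    return true;
}
```

A hole or unassigned RAM bank that fails this check would simply be
skipped as a candidate.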

It is worth mentioning that the unallocated memory solution (when the
IOMMU is disabled) will work safely until Dom0 is able to allocate memory
outside of the original range.

Also introduce command line option to be able to globally enable or
disable support for extended regions for Dom0 (enabled by default).

Suggested-by: Julien Grall <jgrall@amazon.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoarm/efi: Introduce xen,uefi-cfg-load DT property
Luca Fancellu [Thu, 30 Sep 2021 14:28:44 +0000 (15:28 +0100)]
arm/efi: Introduce xen,uefi-cfg-load DT property

Introduce the xen,uefi-cfg-load DT property of the /chosen
node for ARM, whose presence decides whether to force
the load of the UEFI Xen configuration file.

The logic is that if any multiboot,module is found in
the DT, then the xen,uefi-cfg-load property is used to see
if the UEFI Xen configuration file is needed.

Modify a comment in efi_arch_use_config_file, removing
the part that states "dom0 required" because it's not
true anymore with this commit.

Signed-off-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agoinclude/public: fix style of usbif.h
Juergen Gross [Fri, 1 Oct 2021 13:11:41 +0000 (15:11 +0200)]
include/public: fix style of usbif.h

usbif.h is violating the Xen hypervisor coding style. Fix that.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoinclude/public: add better interface description to usbif.h
Juergen Gross [Fri, 1 Oct 2021 13:11:28 +0000 (15:11 +0200)]
include/public: add better interface description to usbif.h

The PV USB protocol is poorly described. Add a more detailed
description to the usbif.h header file.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
3 years agoinclude/public: add possible status values to usbif.h
Juergen Gross [Fri, 1 Oct 2021 13:11:03 +0000 (15:11 +0200)]
include/public: add possible status values to usbif.h

The interface definition of PV USB devices is lacking the specification
of possible values of the status field in a response. Those are
negative errno values as used in Linux, so they might differ in other
OS's. Specify them via appropriate defines.

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
3 years agoautomation: Add qemu to debian:stretch container for smoke test
Anthony PERARD [Thu, 30 Sep 2021 16:17:20 +0000 (17:17 +0100)]
automation: Add qemu to debian:stretch container for smoke test

We can add qemu into the container so that there's no need to install
it every time we run a test.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoautomation: switch GitLab x86 smoke test to use PV 64bit binary
Anthony PERARD [Thu, 30 Sep 2021 16:17:19 +0000 (17:17 +0100)]
automation: switch GitLab x86 smoke test to use PV 64bit binary

Xen is now built without CONFIG_PV32 by default and thus test jobs
"qemu-smoke-x86-64-gcc" and "qemu-smoke-x86-64-clang" fail because
they are using XTF's "test-pv32pae-example" which is a hello-world
32bit PV guest.

As we are only checking whether Xen boots or not with a quick smoke test,
just use 64bit tests instead.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoxen/device-tree: Add dt_get_pci_domain_nr helper
Rahul Singh [Tue, 28 Sep 2021 18:18:17 +0000 (19:18 +0100)]
xen/device-tree: Add dt_get_pci_domain_nr helper

Based on Linux commit 41e5c0f81d3e676d671d96a0a1fafb27abfbd9d7

Import the Linux helper of_get_pci_domain_nr. This function will try to
obtain the host bridge domain number by finding a property called
"linux,pci-domain" of the given device node.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agoxen/device-tree: Add dt_property_read_u32_array helper
Rahul Singh [Tue, 28 Sep 2021 18:18:16 +0000 (19:18 +0100)]
xen/device-tree: Add dt_property_read_u32_array helper

Based on Linux commit a67e9472da423ec47a3586920b526ebaedf25fc3

Import the Linux helper of_property_read_u32_array. This function finds
and reads an array of 32 bit integers from a property.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agoxen/device-tree: Add dt_property_read_variable_u32_array helper
Rahul Singh [Tue, 28 Sep 2021 18:18:15 +0000 (19:18 +0100)]
xen/device-tree: Add dt_property_read_variable_u32_array helper

Based on Linux commit a67e9472da423ec47a3586920b526ebaedf25fc3

Import the Linux helper of_property_read_variable_u32_array. This
function finds and reads an array of 32 bit integers from a property,
with bounds on the minimum and maximum array size.
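
The idea of a bounded read can be sketched on a raw byte buffer. Device
tree property values store 32-bit cells in big-endian order;
read_var_u32_array() below is a hypothetical, simplified stand-in that
works on such a buffer instead of a device node, and its -1 error value
is a placeholder rather than the error codes of the real helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified stand-in: 'prop'/'len' is the raw property
 * value. Returns the number of elements copied to 'out', or -1 if the
 * element count is outside [sz_min, sz_max].
 */
static int read_var_u32_array(const uint8_t *prop, size_t len,
                              uint32_t *out, size_t sz_min, size_t sz_max)
{
    size_t i, count = len / sizeof(uint32_t);

    assert(sz_min <= sz_max);
    if ( count < sz_min || count > sz_max )
        return -1;

    for ( i = 0; i < count; i++ )
    {
        const uint8_t *c = prop + i * sizeof(uint32_t);

        /* Assemble each cell from its big-endian bytes. */
        out[i] = ((uint32_t)c[0] << 24) | ((uint32_t)c[1] << 16) |
                 ((uint32_t)c[2] << 8) | c[3];
    }

    return (int)count;
}
```

The caller sizes 'out' for sz_max elements and learns from the return
value how many were actually present.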

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agoxen/arm: pci: Add stubs to allow selecting HAS_PCI
Rahul Singh [Tue, 28 Sep 2021 18:18:11 +0000 (19:18 +0100)]
xen/arm: pci: Add stubs to allow selecting HAS_PCI

In a follow-up we will enable PCI support in Xen on Arm (i.e. select
HAS_PCI).

The generic code expects the arch to implement a few functions:
arch_iommu_use_permitted()
arch_pci_clean_pirqs()

Note that this is not yet sufficient to enable HAS_PCI and will be
addressed in follow-ups.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agoxen/pci: Refactor MSI code that implements MSI functionality within XEN
Rahul Singh [Tue, 28 Sep 2021 18:18:10 +0000 (19:18 +0100)]
xen/pci: Refactor MSI code that implements MSI functionality within XEN

On Arm, the initial plan is to only support GICv3 ITS which doesn't
require us to manage the MSIs because the HW will protect against
spoofing. Move the code under CONFIG_HAS_PCI_MSI flag to gate the code
for ARM.

No functional change intended.

Signed-off-by: Rahul Singh <rahul.singh@arm.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agobuild: add --full to version.sh to guess $(XEN_FULLVERSION)
Anthony PERARD [Thu, 9 Sep 2021 14:33:06 +0000 (15:33 +0100)]
build: add --full to version.sh to guess $(XEN_FULLVERSION)

Running $(MAKE) like that in a $(shell ) while parsing the Makefile
doesn't work reliably. In some cases, make will complain with
"jobserver unavailable: using -j1.  Add '+' to parent make rule.".
Also, it isn't possible to distinguish between the output produced by
the target "xenversion" and `make`'s own output.

Instead of running make, this patch "improves" `version.sh` so that it
tries to guess the output of `make xenversion`.

In order to have version.sh work in more scenarios, it will use
XEN_EXTRAVERSION and XEN_VENDORVERSION from the environment when
present. As for the cases where those two variables are overridden by
make command line arguments, we export them when invoking version.sh
via a new $(XEN_FULLVERSION) macro.

That should hopefully get us to having ./version.sh returning the same
value that `make xenversion` would.

This fixes GitLab CI's build job "debian-unstable-gcc-arm64".

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Reviewed-by: Ian Jackson <iwj@xenproject.org>
3 years agoxen: rework `checkpolicy` detection when using "randconfig"
Anthony PERARD [Wed, 29 Sep 2021 09:58:15 +0000 (11:58 +0200)]
xen: rework `checkpolicy` detection when using "randconfig"

This patch makes it easy to add more overrides which depend on the
environment.

Also, move the check out of Config.mk and into xen/ build system.
Nothing in tools/ is using that information as it's done by
./configure.

We named the new file ".allconfig.tmp" as ".*.tmp" are already ignored
via .gitignore.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
Reviewed-by: Luca Fancellu <luca.fancellu@arm.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/PVH: actually show Dom0's stacks from debug key '0'
Jan Beulich [Wed, 29 Sep 2021 09:57:22 +0000 (11:57 +0200)]
x86/PVH: actually show Dom0's stacks from debug key '0'

show_guest_stack() does nothing for HVM. Introduce a HVM-specific
dumping function, paralleling the 64- and 32-bit PV ones. We don't know
the real stack size, so only dump up to the next page boundary.
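
The "up to the next page boundary" limit amounts to a small calculation
on the stack pointer. The sketch below is illustrative only:
bytes_to_page_end() is a hypothetical helper, and PAGE_SIZE is defined
locally as the 4KiB x86 base page size.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u            /* x86 base page size */

/* Hypothetical helper: bytes from 'sp' to the end of the page holding it. */
static unsigned int bytes_to_page_end(uint64_t sp)
{
    unsigned int left = PAGE_SIZE - (unsigned int)(sp & (PAGE_SIZE - 1));

    assert(left >= 1 && left <= PAGE_SIZE);
    return left;
}
```

A page-aligned stack pointer yields a full page here, while a pointer
near the end of a page yields only the few bytes that remain.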

Rather than adding a vcpu parameter to hvm_copy_from_guest_linear(),
introduce hvm_copy_from_vcpu_linear() which - for now at least - in
return won't need a "pfinfo" parameter.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agox86/HVM: convert hvm_virtual_to_linear_addr() to be remote-capable
Jan Beulich [Wed, 29 Sep 2021 09:56:18 +0000 (11:56 +0200)]
x86/HVM: convert hvm_virtual_to_linear_addr() to be remote-capable

While all present callers want to act on "current", stack dumping for
HVM vCPU-s will require the function to be able to act on a remote vCPU.
To avoid touching all present callers, convert the existing function to
an inline wrapper around the new, extended one.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agopci: fix handling of PCI bridges with subordinate bus number 0xff
Igor Druzhinin [Tue, 28 Sep 2021 14:04:50 +0000 (16:04 +0200)]
pci: fix handling of PCI bridges with subordinate bus number 0xff

Bus number 0xff is valid according to the PCI spec. Using u8 typed sub_bus
and assigning 0xff to it will result in the following loop getting stuck.

    for ( ; sec_bus <= sub_bus; sec_bus++ ) {...}

Just change its type to unsigned int similarly to what is already done in
dmar_scope_add_buses().
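
The wrap-around can be reproduced outside Xen. The sketch below is
illustrative only: walk_buses_u8() and walk_buses_uint() are
hypothetical helpers, and the 'cap' parameter exists purely so that the
8-bit variant terminates in a demonstration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper using 8-bit variables, as in the problematic
 * pattern. Returns the number of loop iterations, bounded by 'cap'.
 */
static unsigned int walk_buses_u8(uint8_t sec_bus, uint8_t sub_bus,
                                  unsigned int cap)
{
    unsigned int visited = 0;

    /*
     * With sub_bus == 0xff the bus comparison is always true: sec_bus
     * wraps from 0xff back to 0 instead of ever exceeding sub_bus.
     */
    for ( ; sec_bus <= sub_bus && visited < cap; sec_bus++ )
        visited++;

    return visited;
}

/* Same loop with unsigned int: sec_bus can reach 0x100 and end the loop. */
static unsigned int walk_buses_uint(unsigned int sec_bus,
                                    unsigned int sub_bus, unsigned int cap)
{
    unsigned int visited = 0;

    assert(sub_bus <= 0xff);       /* bus numbers are 8 bits wide */
    for ( ; sec_bus <= sub_bus && visited < cap; sec_bus++ )
        visited++;

    return visited;
}
```

With a subordinate bus of 0xff the 8-bit variant only stops at the cap,
whereas the unsigned int variant visits each bus exactly once.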

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
3 years agox86/PVH: actually show Dom0's register state from debug key '0'
Jan Beulich [Tue, 28 Sep 2021 14:03:38 +0000 (16:03 +0200)]
x86/PVH: actually show Dom0's register state from debug key '0'

vcpu_show_registers() didn't do anything for HVM so far. Note though
that some extra hackery is needed for VMX - see the code comment.

Note further that the show_guest_stack() invocation is left alone here:
While strictly speaking guest_kernel_mode() should be predicated by a
PV / !HVM check, show_guest_stack() itself will bail immediately for
HVM.

While there and despite not being PVH-specific, take the opportunity and
filter offline vCPU-s: There's not really any register state associated
with them, so avoid spamming the log with useless information while
still leaving an indication of the fact.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agoMerge remote-tracking branch 'origin/staging' into staging
Ian Jackson [Tue, 28 Sep 2021 11:51:00 +0000 (12:51 +0100)]
Merge remote-tracking branch 'origin/staging' into staging

3 years agoConfig.mk: update OVMF to edk2-stable202108
Anthony PERARD [Tue, 31 Aug 2021 12:36:37 +0000 (13:36 +0100)]
Config.mk: update OVMF to edk2-stable202108

Update to the latest stable tag.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
3 years agoxen/arm: optee: Fix arm_smccc_smc's a0 for OPTEE_SMC_DISABLE_SHM_CACHE
Oleksandr Tyshchenko [Mon, 27 Sep 2021 13:54:10 +0000 (16:54 +0300)]
xen/arm: optee: Fix arm_smccc_smc's a0 for OPTEE_SMC_DISABLE_SHM_CACHE

Fix a possible copy-paste error in arm_smccc_smc's first argument (a0)
for OPTEE_SMC_DISABLE_SHM_CACHE case.

This error causes Linux > v5.14-rc5 (b5c10dd04b7418793517e3286cde5c04759a86de
optee: Clear stale cache entries during initialization) to get stuck
repeatedly issuing the OPTEE_SMC_DISABLE_SHM_CACHE call and waiting for
the result to be OPTEE_SMC_RETURN_ENOTAVAIL, which will never happen.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Reviewed-by: Bertrand Marquis <bertrand.marquis@arm.com>
Reviewed-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Fixes: 2e35cdf9b2ca ("xen/arm: optee: add OP-TEE mediator skeleton")
Backport: 4.13+

3 years agotools/libs: fix build of stubdoms
Juergen Gross [Wed, 8 Sep 2021 12:43:03 +0000 (14:43 +0200)]
tools/libs: fix build of stubdoms

If abi-dumper is available, the stubdom builds fail due to a false
dependency on dynamically loadable libraries. Fix that.

Fixes: d7c9f7a7a3959913b4 ("tools/libs: Write out an ABI analysis when abi-dumper is available")
Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
3 years agoConfig: use Mini-OS commit 9f09744aa3e5982 for xen-unstable
Juergen Gross [Wed, 8 Sep 2021 12:52:32 +0000 (14:52 +0200)]
Config: use Mini-OS commit 9f09744aa3e5982 for xen-unstable

Switch the used Mini-OS commit to 9f09744aa3e5982 in xen-unstable.

[ 9f09744aa3e5982 is current mini-os.git#master -iwj. ]

Signed-off-by: Juergen Gross <jgross@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years agotools/libxl: Correctly align the ACPI tables
Kevin Stefanov [Wed, 15 Sep 2021 14:30:00 +0000 (15:30 +0100)]
tools/libxl: Correctly align the ACPI tables

The memory allocator currently calculates alignment in libxl's virtual
address space, rather than guest physical address space. This results
in the FACS being commonly misaligned.

Furthermore, the allocator has several other bugs.

The opencoded align-up calculation is currently susceptible to a bug
that occurs in the corner case that the buffer is already aligned to
begin with. In that case, an align-sized memory hole is introduced.
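
The corner case described above can be illustrated with a minimal sketch.
The libxl code itself is not shown in this log, so both helpers below
(align_up_buggy() and align_up()) are hypothetical and only demonstrate
the class of bug: an open-coded align-up that always steps to the next
boundary, against the conventional form that leaves aligned input alone.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical open-coded variant (not the libxl code): it rounds down
 * and then unconditionally adds one alignment unit, so an address that
 * is already aligned is pushed forward by a full 'align' bytes.
 */
static uint64_t align_up_buggy(uint64_t addr, uint64_t align)
{
    return (addr & ~(align - 1)) + align;
}

/*
 * Conventional form: add (align - 1) first, then mask. An address that
 * is already aligned is returned unchanged.
 */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    /* Only powers of two are supported by the mask trick. */
    assert(align != 0 && (align & (align - 1)) == 0);

    return (addr + align - 1) & ~(align - 1);
}
```

With a 64-byte alignment, the hypothetical buggy helper turns 0x1000 into
0x1040, leaving a 64-byte hole, whereas align_up() returns 0x1000.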

The while loop is dead logic because its effects are entirely and
unconditionally overwritten immediately after it.

Rework the memory allocator to align in guest physical address space
instead of libxl's virtual memory and improve the calculation, drop
errant extra page in allocated buffer for ACPI tables, and give some
of the variables better names/types.

Fixes: 14c0d328da2b ("libxl/acpi: Build ACPI tables for HVMlite guests")
Signed-off-by: Kevin Stefanov <kevin.stefanov@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
3 years agox86: initialize memnodemapsize while faking NUMA node
Wei Chen [Fri, 24 Sep 2021 09:02:20 +0000 (11:02 +0200)]
x86: initialize memnodemapsize while faking NUMA node

When NUMA is turned off or the system lacks NUMA support,
Xen fakes a NUMA node so that the system works as a single
node NUMA system.

In this case the memory node map doesn't need to be allocated
from boot pages; _memnodemap is used directly. But
memnodemapsize hasn't been set, so Xen should assert in phys_to_nid.
Because x86 uses an empty macro "VIRTUAL_BUG_ON" in place of
ASSERT, this bug is not triggered on x86.

Actually, Xen will only use 1 slot of memnodemap in this case.
So we set memnodemap[0] to 0 and memnodemapsize to 1 in this
patch to fix it.
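
A simplified model of the situation, assuming a lookup of the form
"shift the address, bounds-check the index, read the map". The names
mirror those in the message, but the types, the array size, the shift
value and fake_single_node() are assumptions made for this sketch, not
the Xen sources.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t nodeid_t;

/* Static fallback map, used when nothing is allocated from boot pages. */
static nodeid_t _memnodemap[64];
static nodeid_t *memnodemap;
static uint64_t memnodemapsize;
static unsigned int memnode_shift;

/* Look up the node of a physical address; the index must be in range. */
static nodeid_t phys_to_nid(uint64_t addr)
{
    uint64_t idx = addr >> memnode_shift;

    /* With memnodemapsize left at 0 this check would always fail. */
    assert(idx < memnodemapsize);

    return memnodemap[idx];
}

/* Hypothetical setup for the faked single-node case. */
static void fake_single_node(void)
{
    memnode_shift = 63;        /* assumption: every address maps to slot 0 */
    memnodemap = _memnodemap;
    memnodemap[0] = 0;         /* the only slot in use belongs to node 0 */
    memnodemapsize = 1;        /* without this the bounds check is wrong */
}
```

After fake_single_node() every lookup resolves to slot 0, so a size of 1
is sufficient for the bounds check to hold.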

Signed-off-by: Wei Chen <wei.chen@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agocommon: guest_physmap_add_page()'s return value needs checking
Jan Beulich [Fri, 24 Sep 2021 09:00:30 +0000 (11:00 +0200)]
common: guest_physmap_add_page()'s return value needs checking

The function may fail; it is not correct to indicate "success" in this
case up the call stack. Mark the function must-check to prove all
cases have been caught (and no new ones will get introduced).
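
As an illustration of the must-check approach, here is a minimal sketch
using the GCC/Clang warn_unused_result attribute. physmap_add_sketch(),
populate_sketch() and SKETCH_MAX_GFN are hypothetical stand-ins, not the
real guest_physmap_add_page() or its callers.

```c
#include <assert.h>
#include <errno.h>

#ifndef __must_check
#define __must_check __attribute__((warn_unused_result))
#endif

#define SKETCH_MAX_GFN 0x1000UL   /* arbitrary limit for the sketch */

/* Hypothetical stand-in for a mapping function that may fail. */
static int __must_check physmap_add_sketch(unsigned long gfn,
                                           unsigned long mfn)
{
    (void)mfn;

    return gfn >= SKETCH_MAX_GFN ? -EINVAL : 0;
}

/* The caller propagates the error instead of reporting "success". */
static int populate_sketch(unsigned long gfn, unsigned long mfn)
{
    int rc = physmap_add_sketch(gfn, mfn);

    if ( rc )
        return rc;

    return 0;
}
```

With the attribute in place, a bare "physmap_add_sketch(gfn, mfn);"
statement draws a compiler warning, which is what lets the annotation
show that no caller drops the result.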

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Jackson <iwj@xenproject.org>
Acked-by: Stefano Stabellini <sstabellini@kernel.org>
3 years agox86: drop a bogus SHARED_M2P() check from PV Dom0 building code
Jan Beulich [Wed, 22 Sep 2021 14:19:21 +0000 (16:19 +0200)]
x86: drop a bogus SHARED_M2P() check from PV Dom0 building code

If anything, a check covering a wider range of invalid M2P entries ought
to be used (e.g. VALID_M2P()). But since everything is fully under Xen's
control at this stage, simply remove the BUG_ON().

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
3 years agomm: fix broken tainted value in mark_page_free
Penny Zheng [Wed, 22 Sep 2021 14:18:30 +0000 (16:18 +0200)]
mm: fix broken tainted value in mark_page_free

Commit 540a637c3410780b519fc055f432afe271f642f8 defines a new
helper mark_page_free to extract common code, but it accidentally
breaks the local variable "tainted".

This patch fixes it by letting mark_page_free() return a bool indicating
whether the page is offlined, and renames the local variable "tainted" to
"pg_offlined".
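
The shape of that fix can be sketched as follows. The page states, the
structure and both functions are hypothetical simplifications chosen for
this example; only the idea of returning the offlined status to the
caller is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page states for the sketch. */
enum page_state_sketch { PG_INUSE, PG_OFFLINING, PG_FREE, PG_OFFLINED };

struct page_sketch {
    enum page_state_sketch state;
};

/* Helper: free one page and report whether it ended up offlined. */
static bool mark_page_free_sketch(struct page_sketch *pg)
{
    bool pg_offlined = (pg->state == PG_OFFLINING);

    pg->state = pg_offlined ? PG_OFFLINED : PG_FREE;

    return pg_offlined;
}

/* Caller: accumulate the helper's result rather than a stale local. */
static bool free_pages_sketch(struct page_sketch *pg, unsigned int nr)
{
    bool pg_offlined = false;
    unsigned int i;

    for ( i = 0; i < nr; i++ )
        if ( mark_page_free_sketch(&pg[i]) )
            pg_offlined = true;

    return pg_offlined;
}
```

Because the helper returns the status, the caller's variable can no
longer go out of sync with what the helper decided.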

Coverity ID: 1491872
Fixes: 540a637c3410 ("xen: introduce mark_page_free")
Signed-off-by: Penny Zheng <penny.zheng@arm.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/mem_sharing: don't lock parent during fork reset
Tamas K Lengyel [Wed, 22 Sep 2021 14:17:54 +0000 (16:17 +0200)]
x86/mem_sharing: don't lock parent during fork reset

During a fork reset operation the parent domain doesn't need to be obtained
using rcu_lock_live_remote_domain_by_id(); the fork already has the parent
pointer.

Signed-off-by: Tamas K Lengyel <tamas.lengyel@intel.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agoAMD/IOMMU: add "ivmd=" command line option
Jan Beulich [Wed, 22 Sep 2021 14:17:04 +0000 (16:17 +0200)]
AMD/IOMMU: add "ivmd=" command line option

Just like VT-d's "rmrr=" it can be used to cover for firmware omissions.
Since systems surfacing IVMDs seem to be rare, it is also meant to allow
testing of the involved code.

Only the IVMD flavors actually understood by the IVMD parsing logic can
be generated, and for this initial implementation there's also no way to
control the flags field - unity r/w mappings are assumed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map
Jan Beulich [Wed, 22 Sep 2021 14:16:28 +0000 (16:16 +0200)]
AMD/IOMMU: provide function backing XENMEM_reserved_device_memory_map

Just like for VT-d, exclusion / unity map ranges would better be
reflected in e.g. the guest's E820 map. The reporting infrastructure
was put in place still pretty tailored to VT-d's needs; extend
get_reserved_device_memory() to allow vendor specific code to probe
whether a particular (seg,bus,dev,func) tuple would get its data
actually recorded. I admit the de-duplication of entries is quite
limited for now, but considering our trouble to find a system
surfacing _any_ IVMD this is likely not a critical issue for this
initial implementation.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: also insert IVMD ranges into Dom0's page tables
Jan Beulich [Wed, 22 Sep 2021 14:15:29 +0000 (16:15 +0200)]
AMD/IOMMU: also insert IVMD ranges into Dom0's page tables

So far only one region would be taken care of, if it can be placed in
the exclusion range registers of the IOMMU. Take care of further ranges
as well. Seeing that we've been doing fine without this, make both
insertion and removal best effort only.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agoAMD/IOMMU: check / convert IVMD ranges for being / to be reserved
Jan Beulich [Wed, 22 Sep 2021 14:14:19 +0000 (16:14 +0200)]
AMD/IOMMU: check / convert IVMD ranges for being / to be reserved

While the specification doesn't say so, just like for VT-d's RMRRs no
good can come from these ranges being e.g. conventional RAM or entirely
unmarked and hence usable for placing e.g. PCI device BARs. Check
whether they are, and put in some limited effort to convert to reserved.
(More advanced logic can be added if actual problems are found with this
simplistic variant.)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Paul Durrant <paul@xen.org>
3 years agox86/trace: Clean up trace handling
Andrew Cooper [Mon, 20 Sep 2021 13:30:49 +0000 (14:30 +0100)]
x86/trace: Clean up trace handling

Use more appropriate types.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/pv: Move x86/trace.c to x86/pv/trace.c
Andrew Cooper [Mon, 20 Sep 2021 14:02:32 +0000 (15:02 +0100)]
x86/pv: Move x86/trace.c to x86/pv/trace.c

This entire file is pv-only, and not excluded from the build by
CONFIG_TRACEBUFFER.  Move it into the pv/ directory, build it conditionally,
and drop unused includes.

Also move the contents of asm/trace.h to asm/pv/trace.h to avoid the functions
being declared across the entire hypervisor.

One caller in fixup_page_fault() is effectively PV only, but is not subject to
dead code elimination.  Add an additional IS_ENABLED(CONFIG_PV) to keep the
build happy.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/hvm: Remove duplicate calls caused by tracing
Andrew Cooper [Fri, 17 Sep 2021 23:32:12 +0000 (00:32 +0100)]
x86/hvm: Remove duplicate calls caused by tracing

1) vpic_ack_pending_irq() calls vlapic_accept_pic_intr() twice, once in the
   TRACE_2D() instantiation and once "for real".  Make the call only once.

2) vlapic_accept_pic_intr() similarly calls __vlapic_accept_pic_intr() twice,
   although this is more complicated to disentangle.

   v cannot be NULL because it has already been dereferenced in the function,
   causing the ternary expression to always call __vlapic_accept_pic_intr().
   However, the return expression of the function takes care to skip the call
   if this vCPU isn't the PIC target.  As __vlapic_accept_pic_intr() is far
   from trivial, make the TRACE_2D() semantics match the return semantics by
   only calling __vlapic_accept_pic_intr() when the vCPU is the PIC target.

3) hpet_set_timer() duplicates calls to hpet_tick_to_ns().  Pull the logic out
   which simplifies both the TRACE and create_periodic_time() calls.

4) lapic_rearm() makes multiple calls to vlapic_lvtt_period().  Pull it out
   into a local variable.
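
The pattern behind items 1) to 4) can be sketched in isolation.
TRACE_SKETCH(), accept_intr_sketch() and the call counter are
hypothetical and stand in for the real trace macros and vlapic helpers:
a call used as a trace argument is evaluated once for the trace and then
again "for real" unless its result is kept in a local variable.

```c
#include <assert.h>

static unsigned int calls;    /* counts invocations of the helper */
static int last_traced;       /* what the trace "recorded" */

/* Hypothetical stand-in for a non-trivial helper. */
static int accept_intr_sketch(int vcpu)
{
    calls++;

    return vcpu == 0;
}

/* Hypothetical trace macro: records the value of its argument. */
#define TRACE_SKETCH(val) (last_traced = (val))

/* Before: the helper runs once for the trace and once "for real". */
static int ack_twice(int vcpu)
{
    TRACE_SKETCH(accept_intr_sketch(vcpu));

    return accept_intr_sketch(vcpu);
}

/* After: call once, then reuse the result for both purposes. */
static int ack_once(int vcpu)
{
    int accept = accept_intr_sketch(vcpu);

    TRACE_SKETCH(accept);

    return accept;
}
```

ack_twice() invokes the helper twice per call and ack_once() only once,
with the same traced value and the same return value.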

vlapic_accept_pic_intr() is called on every VMEntry, so this is a reduction in
VMEntry complexity across the board.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
3 years agox86/hvm: Reduce stack usage from HVMTRACE_ND()
Andrew Cooper [Wed, 15 Sep 2021 16:04:00 +0000 (17:04 +0100)]
x86/hvm: Reduce stack usage from HVMTRACE_ND()

It is pointless to write all 6 entries and only consume the useful subset.
bloat-o-meter shows quite how obscene the overhead is in vmx_vmexit_handler(),
weighing in at 12% of the function arranging unread zeroes on the stack, and
8% for svm_vmexit_handler().

  add/remove: 0/0 grow/shrink: 0/20 up/down: 0/-1929 (-1929)
  Function                                     old     new   delta
  hvm_msr_write_intercept                     1049    1033     -16
  vmx_enable_intr_window                       238     214     -24
  svm_enable_intr_window                       337     313     -24
  hvmemul_write_xcr                            115      91     -24
  hvmemul_write_cr                             350     326     -24
  hvmemul_read_xcr                             115      91     -24
  hvmemul_read_cr                              146     122     -24
  hvm_mov_to_cr                                438     414     -24
  hvm_mov_from_cr                              253     229     -24
  vmx_intr_assist                             1150    1118     -32
  svm_intr_assist                              459     427     -32
  hvm_rdtsc_intercept                          138     106     -32
  hvm_msr_read_intercept                       898     866     -32
  vmx_vmenter_helper                          1142    1094     -48
  vmx_inject_event                             813     765     -48
  svm_vmenter_helper                           238     187     -51
  hvm_hlt                                      197     146     -51
  svm_inject_event                            1678    1614     -64
  svm_vmexit_handler                          5880    5392    -488
  vmx_vmexit_handler                          7281    6438    -843
  Total: Before=3644277, After=3642348, chg -0.05%

Adjust all users of HVMTRACE_ND(), using TRC_PAR_LONG() where appropriate
instead of opencoding it.

The 0 case needs a little help.  All objects in C must have a unique address
and _d is passed by pointer.  Explicitly permit the optimiser to drop the
array.
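
A rough sketch of the idea, not the actual HVMTRACE_ND() definition:
TRACE_FIXED6(), TRACE_SIZED() and trace_emit_sketch() are hypothetical.
The first macro always builds a six-entry record on the stack, while the
second sizes the array from the arguments actually supplied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t sink[6];     /* where the sketch "emits" records */
static size_t sink_bytes;    /* size of the last emitted record */

/* Hypothetical consumer: copies the record it is handed. */
static void trace_emit_sketch(size_t bytes, const void *data)
{
    sink_bytes = bytes;
    memcpy(sink, data, bytes);
}

/* Before: all six slots are written, even if only 'count' are consumed. */
#define TRACE_FIXED6(count, d1, d2, d3, d4, d5, d6)             \
    do {                                                        \
        uint32_t _d[6] = { d1, d2, d3, d4, d5, d6 };            \
        trace_emit_sketch((count) * sizeof(*_d), _d);           \
    } while ( 0 )

/* After: the array holds exactly the values that were passed in. */
#define TRACE_SIZED(...)                                        \
    do {                                                        \
        uint32_t _d[] = { __VA_ARGS__ };                        \
        trace_emit_sketch(sizeof(_d), _d);                      \
    } while ( 0 )
```

Both forms emit the same number of bytes for a two-value record, but the
second never initialises the four unused slots. The sketch does not cover
the zero-argument case, which is the one singled out above as needing
extra help.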

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>