Julien Grall [Tue, 5 May 2015 15:37:30 +0000 (16:37 +0100)]
xen/events: fifo: Make it running on 64KB granularity
Only use the first 4KB of the page to store the events channel info. It
means that we will waste 60KB every time we allocate page for:
* control block: a page is allocating per CPU
* event array: a page is allocating everytime we need to expand it
I think we can reduce the memory waste for the 2 areas by:
* control block: sharing between multiple vCPUs. Although it will
require some bookkeeping in order to not free the page when the CPU
goes offline and the other CPUs sharing the page still there
* event array: always extend the array event by 64K (i.e 16 4K
chunk). That would require more care when we fail to expand the
event channel.
Julien Grall [Mon, 4 May 2015 14:39:08 +0000 (15:39 +0100)]
xen/balloon: Don't rely on the page granularity is the same for Xen and Linux
For ARM64 guests, Linux is able to support either 64K or 4K page
granularity. Although, the hypercall interface is always based on 4K
page granularity.
With 64K page granularity, a single page will be spread over multiple
Xen frame.
To avoid splitting the page into 4K frame, take advantage of the
extent_order field to directly allocate/free chunk of the Linux page
size.
Note that PVMMU is only used for PV guest (which is x86) and the page
granularity is always 4KB. Some BUILD_BUG_ON has been added to ensure
that because the code has not been modified.
Julien Grall [Mon, 11 May 2015 12:44:21 +0000 (13:44 +0100)]
xen/biomerge: Don't allow biovec's to be merged when Linux is not using 4KB pages
On ARM all dma-capable devices on a same platform may not be protected
by an IOMMU. The DMA requests have to use the BFN (i.e MFN on ARM) in
order to use correctly the device.
While the DOM0 memory is allocated in a 1:1 fashion (PFN == MFN), grant
mapping will screw this contiguous mapping.
When Linux is using 64KB page granularitary, the page may be split
accross multiple non-contiguous MFN (Xen is using 4KB page
granularity). Therefore a DMA request will likely fail.
Checking that a 64KB page is using contiguous MFN is tedious. For
now, always says that biovec are not mergeable.
Prepare the code to support 64KB page granularity. The first
implementation will use a full Linux page per indirect and persistent
grant. When non-persistent grant is used, each page of a bio request
may be split in multiple grant.
Furthermore, the field page of the grant structure is only used to copy
data from persistent grant or indirect grant. Avoid to set it for other
use case as it will have no meaning given the page will be split in
multiple grant.
Provide 2 functions, to setup indirect grant, the other for bio page.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
pfn = page_to_gfn(...) /* Or similar */
gnttab_grant_foreign_access_ref
Replace it by a new helper. Note that when Linux is using a different
page granularity than Xen, the helper only gives access to the first 4KB
grant.
This is useful where drivers are allocating a full Linux page for each
grant.
Also include xen/interface/grant_table.h rather than xen/grant_table.h in
asm/page.h for x86 to fix a compilation issue [1]. Only the former is
useful in order to get the structure definition.
[1] Interdependency between asm/page.h and xen/grant_table.h which result
to page_mfn not being defined when necessary.
Julien Grall [Fri, 19 Jun 2015 16:49:03 +0000 (17:49 +0100)]
xen/grant: Introduce helpers to split a page into grant
Currently, a grant is always based on the Xen page granularity (i.e
4KB). When Linux is using a different page granularity, a single page
will be split between multiple grants.
The new helpers will be in charge of splitting the Linux page into grants
and call a function given by the caller on each grant.
Also provide an helper to count the number of grants within a given
contiguous region.
Note that the x86/include/asm/xen/page.h is now including
xen/interface/grant_table.h rather than xen/grant_table.h. It's
necessary because xen/grant_table.h depends on asm/xen/page.h and will
break the compilation. Furthermore, only definition in
interface/grant_table.h is required.
Julien Grall [Fri, 7 Aug 2015 16:34:41 +0000 (17:34 +0100)]
xen/privcmd: Further s/MFN/GFN/ clean-up
The privcmd code is mixing the usage of GFN and MFN within the same
functions which make the code difficult to understand when you only work
with auto-translated guests.
The privcmd driver is only dealing with GFN so replace all the mention
of MFN into GFN.
The ioctl structure used to map foreign change has been left unchanged
given that the userspace is using it. Nonetheless, add a comment to
explain the expected value within the "mfn" field.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Julien Grall [Fri, 7 Aug 2015 16:34:37 +0000 (17:34 +0100)]
xen: Use correctly the Xen memory terminologies
Based on include/xen/mm.h [1], Linux is mistakenly using MFN when GFN
is meant, I suspect this is because the first support for Xen was for
PV. This resulted in some misimplementation of helpers on ARM and
confused developers about the expected behavior.
For instance, with pfn_to_mfn, we expect to get an MFN based on the name.
Although, if we look at the implementation on x86, it's returning a GFN.
For clarity and avoid new confusion, replace any reference to mfn with
gfn in any helpers used by PV drivers. The x86 code will still keep some
reference of pfn_to_mfn which may be used by all kind of guests
No changes as been made in the hypercall field, even
though they may be invalid, in order to keep the same as the defintion
in xen repo.
Note that page_to_mfn has been renamed to xen_page_to_gfn to avoid a
name to close to the KVM function gfn_to_page.
Take also the opportunity to simplify simple construction such
as pfn_to_mfn(page_to_pfn(page)) into xen_page_to_gfn. More complex clean up
will come in follow-up patches.
Julien Grall [Fri, 7 Aug 2015 16:34:36 +0000 (17:34 +0100)]
arm/xen: implement correctly pfn_to_mfn
After the commit introducing convertion between DMA and guest addresses,
all the callers of pfn_to_mfn are expecting to get a GFN (Guest Frame
Number). On ARM, all the guests are auto-translated so the GFN is equal
to the Linux PFN (Pseudo-physical Frame Number).
The current implementation may return an MFN if the caller is passing a
PFN associated to a mapped foreign grant. In pratice, I haven't seen
the problem on running guest but we should fix it for the sake of
correctness.
Correct the implementation by always returning the pfn passed in parameter.
A follow-up patch will take care to rename pfn_to_mfn to a suitable
name.
Julien Grall [Fri, 7 Aug 2015 16:34:35 +0000 (17:34 +0100)]
xen: Make clear that swiotlb and biomerge are dealing with DMA address
The swiotlb is required when programming a DMA address on ARM when a
device is not protected by an IOMMU.
In this case, the DMA address should always be equal to the machine address.
For DOM0 memory, Xen ensure it by have an identity mapping between the
guest address and host address. However, when mapping a foreign grant
reference, the 1:1 model doesn't work.
For ARM guest, most of the callers of pfn_to_mfn expects to get a GFN
(Guest Frame Number), i.e a PFN (Page Frame Number) from the Linux point
of view given that all ARM guest are auto-translated.
Even though the name pfn_to_mfn is misleading, we need to ensure that
those caller get a GFN and not by mistake a MFN. In pratical, I haven't
seen error related to this but we should fix it for the sake of
correctness.
In order to fix the implementation of pfn_to_mfn on ARM in a follow-up
patch, we have to introduce new helpers to return the DMA from a PFN and
the invert.
On x86, the new helpers will be an alias of pfn_to_mfn and mfn_to_pfn.
The helpers will be used in swiotlb and xen_biovec_phys_mergeable.
This is necessary in the latter because we have to ensure that the
biovec code will not try to merge a biovec using foreign page and
another using Linux memory.
Lastly, the helper mfn_to_local_pfn has been renamed to bfn_to_local_pfn
given that the only usage was in swiotlb.
Bob Liu [Mon, 13 Jul 2015 09:55:24 +0000 (17:55 +0800)]
xen-blkfront: convert to blk-mq APIs
Note: This patch is based on original work of Arianna's internship for
GNOME's Outreach Program for Women.
Only one hardware queue is used now, so there is no significant
performance change
The legacy non-mq code is deleted completely which is the same as other
drivers like virtio, mtip, and nvme.
Also dropped one unnecessary holding of info->io_lock when calling
blk_mq_stop_hw_queues().
Signed-off-by: Arianna Avanzini <avanzini.arianna@gmail.com> Signed-off-by: Bob Liu <bob.liu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Jens Axboe <axboe@fb.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Robert Richter [Wed, 12 Aug 2015 11:31:38 +0000 (13:31 +0200)]
pci, thunder, acpi: Fix ITS initialization for ACPI
Since pci bridges are enable with generic acpi pci code now, the its
was no longer initialized which leads to bad requester IDs for
ITS. Fixing that by moving its initialization to ThunderX bridge
fixup. The setup is now done for both, acpi and devicetree.
This should also fix broken pci root complexes for other vendor's
since the previous ACPI code enabled ThunderX requester IDs on all
systems without any device or vendor check.
It was root caused by Tomasz Nowicki <tn@semihalf.com>.
Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
IORT shows representation of IO topology that will be used by
ARM based systems. It describes how various components are connected
together e.g. which devices are connected to given ITS instance.
This patch implements calls which allow to:
- register/remove ITS as MSI chip
- parse all IORT nodes and form node tree (for easy lookup)
- find ITS (MSI chip) that device is assigned to
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Tomasz Nowicki [Tue, 28 Jul 2015 15:51:27 +0000 (17:51 +0200)]
pci, acpi, dma: Unify coherency checking logic for PCI devices.
ACPI spec5.1 states that the value of _CCA is inherited by all
descendants of bus master devices, root PCI bridge in this case.
So this patch is checking if PCI device's root bridge has coherency
flag set and then mounts DMA ops (in similar way as DT does).
Signed-off-by: Tomasz Nowicki <tn@semihalf.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Tomasz Nowicki [Tue, 24 Mar 2015 03:31:58 +0000 (20:31 -0700)]
ARM64 / ACPI: Point KVM to the virtual timer interrupt when booting with ACPI
With ACPI enabled, kvm_timer_hyp_init can't access any device tree
information. Although registration of the virtual timer interrupt
already happened when architected timers were initialized, we need to
point KVM to the interrupt line used.
Signed-off-by: Alexander Spyridakis <a.spyridakis@virtualopensystems.com> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Tomasz Nowicki [Fri, 12 Dec 2014 08:41:23 +0000 (09:41 +0100)]
arm64/acpi/pci: provide hook for MCFG fixups
Some MCFG tables may be broken or the underlying hardware may not
be fully compliant with the PCIe ECAM mechanism. This patch provides
a mechanism to override the default mmconfig read/write routines
and/or do other MCFG related fixups.
Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Tomasz Nowicki [Fri, 14 Nov 2014 10:01:16 +0000 (11:01 +0100)]
pci, acpi: Share ACPI PCI config space accessors.
MMCFG can be used perfectly for all architectures which support ACPI.
ACPI mandates MMCFG to describe PCI config space ranges which means
we should use MMCONFIG accessors by default.
mmconfig_64.c version is going to be default implementation for arch
agnostic low-level direct PCI config space accessors via MMCONFIG.
However, now it initialize raw_pci_ext_ops pointer which is used in
x86 specific code only. Moreover, mmconfig_32.c is doing the same thing
at the same time.
Move it to mmconfig_shared.c so it becomes common for both and
mmconfig_64.c turns out to be purely arch agnostic.
Tomasz Nowicki [Thu, 13 Nov 2014 14:48:24 +0000 (15:48 +0100)]
x86, acpi, pci: Move PCI config space accessors.
We are going to use mmio_config_{} name convention across all architectures.
Currently it belongs to asm/pci_x86.h header which should be included
only for x86 specific files. From now on, those accessors are in asm/pci.h
header which can be included in non-architecture code much easier.
Tomasz Nowicki [Thu, 13 Nov 2014 10:59:18 +0000 (11:59 +0100)]
x86, acpi, pci: Move arch-agnostic MMCFG code out of arch/x86/ directory
MMCFG table seems to be architecture independent and it makes sense
to share common code across all architectures. The ones that may need
architectural specific actions have default prototype (__weak).
Tomasz Nowicki [Thu, 13 Nov 2014 10:54:53 +0000 (11:54 +0100)]
x86, acpi, pci: Reorder logic of pci_mmconfig_insert() function
This patch is the first step for MMCONFIG refactoring process.
Code that uses pci_mmcfg_lock will be moved to common file and become
accessible for all architectures. pci_mmconfig_insert() cannot be moved
so easily since it is mixing generic mmcfg code with x86 specific logic
inside of mutual exclusive block guarded by pci_mmcfg_lock.
To get rid of that constraint we reorder actions as fallow:
1. mmconfig entry allocation can be done at first, does not need lock
2. insertion to iomem_resource has its own lock, no need to wrap it into mutex
3. insertion to mmconfig list can be done as the final stage in separate
function (candidate for further factoring)
Tomasz Nowicki [Wed, 21 May 2014 13:53:23 +0000 (15:53 +0200)]
GICv3: Refactor gic_of_init() of GICv3 driver to allow for FDT and ACPI initialization.
Isolate hardware abstraction (FDT) code to gic_of_init().
Rest of the logic goes to gic_init_bases() and expects well defined
data to initialize GIC properly. The same solution is used for GICv2 driver.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Tomasz Nowicki [Thu, 8 Jan 2015 11:36:33 +0000 (12:36 +0100)]
arm64, acpi: Implement new "GIC version" field of MADT GIC entry.
There is no need to probe GICv2 and GICv3 sequentially. From now on,
we know GIC version in advance. Note this patch does not break backward
compatibility for machines which are compliant with ACPI spec. 5.1.
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Al Stone [Tue, 11 Nov 2014 00:11:20 +0000 (17:11 -0700)]
clocksource: arm_arch_timer: fix system hang
Arm allows for two possible architectural clock sources. One memory mapped
and the other coprocessor based. If both timers exist, then the driver waits
for both to be probed before registering a clocksource.
Commit c387f07e6205 ("clocksource: arm_arch_timer: Discard unavailable timers
correctly") attempted to fix a hang occurring when one of the two possible
timers had a device node, but was disabled. In that case, the second probe
would never occur and the system would hang without a clocksource being
registered.
Unfortunately, incorrect logic in that commit made things worse such that
a hang would occur unless both timers had a device node and were enabled.
This patch fixes the logic so that we don't wait to probe a second timer
unless it exists and is enabled.
Signed-off-by: Mark Salter <msalter@redhat.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Naresh Bhat [Wed, 23 Oct 2013 11:23:31 +0000 (16:53 +0530)]
mfd: vexpress-sysreg Add ACPI support for probing to driver
Add match table and pointers for ACPI probing into vexpress-sysreg driver.
vexpress-sysreg is self-contained so it gets resources automatically
being platform driver. However, while it is not probed, it still should
provides base address so that other related drivers can take advantage
of it. Make it possible and find resources based on device HID.
Signed-off-by: Naresh Bhat <naresh.bhat@linaro.org> Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Graeme Gregory [Mon, 30 Jun 2014 18:52:02 +0000 (19:52 +0100)]
Juno / net: smsc911x add support for probing from ACPI
This is a standard platform device to resources are converted in the
ACPI core in the same fasion as DT resources. For the other DT
provided information there is _DSD for ACPI.
Andrew Pinski [Sat, 21 Mar 2015 20:08:01 +0000 (13:08 -0700)]
ARM64: Improve copy_page for 128 cache line sizes.
Adding a check for the cache line size is not much overhead.
Special case 128 byte cache line size.
This improves copy_page by 85% on ThunderX compared to the
original implementation.
For LMBench, it improves between 4-10%.
Signed-off-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andrew Pinski [Sat, 21 Mar 2015 01:55:27 +0000 (18:55 -0700)]
ARM64:spinlocks: Fix up for WFE and improve performance slightly.
In the previous patch, I had made a mistake of putting WFE after the delay which
meant if we enable the WFE, we would get the same bad performance as before.
Also use the flags register some more to allow the instructions to be fused together.
Signed-off-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andrew Pinski [Thu, 5 Mar 2015 09:40:11 +0000 (01:40 -0800)]
ARM64:VDSO: Improve __do_get_tspec, don't use udiv
In most other targets (x86/tile for an example),
the division in __do_get_tspec is converted into
a simple loop. The main reason for this is
because the result of this division is going
to be either 0 or 1.
This changes the division to the simple loop
and thus speeding up gettimeofday.
Signed-off-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
irqchip, gicv3-its, numa: Workaround for Cavium ThunderX erratum 23144
This implements a workaround for gicv3-its erratum 23144 applicable
for Cavium's ThunderX multinode systems.
The erratum fixes the hang of ITS SYNC command by avoiding inter node
io and collections/cpu mapping. This fix is only applicable for
Cavium's ThunderX dual-socket platforms.
arm64, numa: adding numa support for arm64 platforms.
Adding numa support for arm64 based platforms.
This patch adds by default the dummy numa node and
maps all memory and cpus to node 0.
using this patch, numa can be simulated on single node arm64 platforms.
arm64, numa: adding numa support for arm64 platforms.
Adding numa support for arm64 based platforms.
This patch adds by default the dummy numa node and
maps all memory and cpus to node 0.
using this patch, numa can be simulated on single node arm64 platforms.
Ard Biesheuvel [Mon, 2 Mar 2015 18:10:07 +0000 (18:10 +0000)]
arm64/efi: adapt to relaxed FDT placement requirements
With the relaxed FDT placement requirements in place, we can change
the allocation strategy used by the stub to put the FDT image higher
up in memory. At the same time, reduce the minimal alignment to 8 bytes,
and impose a 2 MB size limit, as per the new requirements.
Ard Biesheuvel [Sun, 10 May 2015 06:41:31 +0000 (08:41 +0200)]
arm64/efi: ignore DT memreserve entries instead of removing them
Now that the reservation of the FDT image itself is split off, we
can make the DT scanning of memreserves conditional on whether we
booted via UEFI and have its memory map available. This allows us
to drop deletion of these memreserves in the stub. It also fixes
the issue where the /reserved-memory/ node (which offers another
way of reserving memory ranges) was not being ignored under UEFI.
Ard Biesheuvel [Sun, 10 May 2015 08:26:44 +0000 (10:26 +0200)]
arm64/efi: ignore DT memory nodes instead of removing them
There are two problems with the UEFI stub DT memory node removal
routine:
- it deletes nodes as it traverses the tree, which happens to work
but is not supported, as deletion invalidates the node iterator;
- deleting memory nodes entirely may discard annotations in the form
of additional properties on the nodes.
Now that the UEFI initialization has moved to an earlier stage, we can
actually just ignore any memblocks that are installed after we have
processed the UEFI memory map. This way, it is no longer necessary to
remove the nodes, so we can remove that logic from the stub as well.
Ard Biesheuvel [Sun, 10 May 2015 12:03:31 +0000 (14:03 +0200)]
arm64/efi: move EFI init before early FDT processing
The early FDT processing is responsible for enumerating the
DT memory nodes and installing them as memblocks. This should
only be done if we are not booting via EFI, but at this point,
we don't know yet if that is the case or not.
So move the EFI init to before the early FDT processing. This involves
making some changes to the way EFI discovers the locations of the
EFI system table and the memory map, since those values are retrieved
from the FDT as well. Instead the of_scan infrastructure, it now uses
libfdt directly to access the /chosen node.
Ard Biesheuvel [Sun, 10 May 2015 10:09:14 +0000 (12:09 +0200)]
efi: move FDT handling to separate object file
The EFI specific FDT handling is compiled conditionally, and is
logically independent of the rest of efi.o. So move it to a separate
file before making changes to it in subsequent patches.
Acked-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Override the __weak early_init_dt_add_memory_arch() with our own
version. This allows us to relax the imposed restrictions at memory
discovery time, which is needed if we want to defer the assignment
of PHYS_OFFSET and make it independent of where the kernel Image
is placed in physical memory.
So copy the generic original, but only retain the check against
regions whose sizes become zero when clipped to page alignment.
For now, we will remove the range below PHYS_OFFSET explicitly
until we rework that logic in a subsequent patch. Any memory that
we will not be able to map due to insufficient size of the linear
region is also removed.
Eric Auger [Thu, 18 Jun 2015 08:46:07 +0000 (10:46 +0200)]
KVM: arm/arm64: enable MSI routing
Up to now, only irqchip routing entries could be set. This patch
adds the capability to insert MSI routing entries, with or without
device id. Although standard MSI entries can be set, their
injection still is not supported. For ARM64, let's also increase
KVM_MAX_IRQ_ROUTES to 4096: include SPI irqchip flat routes plus
MSI routes. In the future this might be extended.
The new MSI routing entry type also must be managed similarly to
legacy KVM_IRQ_ROUTING_MSI in eventfd irqfd_wakeup and irqfd_update.
Eric Auger [Tue, 23 Jun 2015 14:55:02 +0000 (16:55 +0200)]
KVM: arm/arm64: build a default routing table
Implement a default routing table made of flat irqchip routing
entries (gsi = irqchip.pin) covering the VGIC SPI indexes.
This routing table is overwritten by the first user-space call
to KVM_SET_GSI_ROUTING ioctl.
Eric Auger [Tue, 7 Apr 2015 09:43:29 +0000 (11:43 +0200)]
KVM: arm/arm64: enable irqchip routing
This patch adds compilation and link against irqchip.
On ARM, irqchip routing is not really useful since there is
a single irqchip. However main motivation behind using irqchip
code is to enable MSI routing code. With the support of in-kernel
GICv3 ITS emulation, it now seems to be a MUST HAVE requirement.
Functions previously implemented in vgic.c and substitute
to more complex irqchip implementation are removed:
Eric Auger [Thu, 18 Jun 2015 08:30:19 +0000 (10:30 +0200)]
KVM: irqchip: convey devid to kvm_set_msi
on ARM, a devid field is populated in kvm_msi struct in case the
flag is set to KVM_MSI_VALID_DEVID. Let's populate the corresponding
kvm_kernel_irq_routing_entry devid field and set the msi type to
KVM_IRQ_ROUTING_EXTENDED_MSI.
Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Eric Auger [Thu, 18 Jun 2015 13:26:59 +0000 (15:26 +0200)]
KVM: api: introduce KVM_IRQ_ROUTING_EXTENDED_MSI
On ARM, the MSI msg (address and data) comes along with
out-of-band device ID information. The device ID encodes the
device that writes the MSI msg. Let's convey the device id in
kvm_irq_routing_msi and use a new routing entry type to
indicate the devid is populated.
Andre Przywara [Fri, 10 Jul 2015 14:21:51 +0000 (15:21 +0100)]
KVM: arm64: enable ITS emulation as a virtual MSI controller
If userspace has provided a base address for the ITS register frame,
we enable the bits that advertise LPIs in the GICv3.
When the guest has enabled LPIs and the ITS, we enable the emulation
part by initializing the ITS data structures and trapping on ITS
register frame accesses by the guest.
Also we enable the KVM_SIGNAL_MSI feature to allow userland to inject
MSIs into the guest. Not having enabled the ITS emulation will lead
to a -ENODEV when trying to inject a MSI.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:50 +0000 (15:21 +0100)]
KVM: arm64: implement MSI injection in ITS emulation
When userland wants to inject a MSI into the guest, we have to use
our data structures to find the LPI number and the VCPU to receive
the interrupt.
Use the wrapper functions to iterate the linked lists and find the
proper Interrupt Translation Table Entry. Then set the pending bit
in this ITTE to be later picked up by the LR handling code. Kick
the VCPU which is meant to handle this interrupt.
We provide a VGIC emulation model specific routine for the actual
MSI injection. The wrapper functions return an error for models not
(yet) implementing MSIs (like the GICv2 emulation).
We also provide the handler for the ITS "INT" command, which allows a
guest to trigger an MSI via the ITS command queue.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:49 +0000 (15:21 +0100)]
KVM: arm64: implement ITS command queue command handlers
The connection between a device, an event ID, the LPI number and the
allocated CPU is stored in in-memory tables in a GICv3, but their
format is not specified by the spec. Instead software uses a command
queue in a ring buffer to let the ITS implementation use their own
format.
Implement handlers for the various ITS commands and let them store
the requested relation into our own data structures.
To avoid kmallocs inside the ITS spinlock, we preallocate possibly
needed memory outside of the lock and free that if it turns out to
be not needed (mostly error handling).
Error handling is very basic at this point, as we don't have a good
way of communicating errors to the guest (usually a SError).
The INT command handler is missing at this point, as we gain the
capability of actually injecting MSIs into the guest only later on.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:48 +0000 (15:21 +0100)]
KVM: arm64: sync LPI configuration and pending tables
The LPI configuration and pending tables of the GICv3 LPIs are held
in tables in (guest) memory. To achieve reasonable performance, we
cache this data in our own data structures, so we need to sync those
two views from time to time. This behaviour is well described in the
GICv3 spec and is also exercised by hardware, so the sync points are
well known.
Provide functions that read the guest memory and store the
information from the configuration and pending tables in the kernel.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:47 +0000 (15:21 +0100)]
KVM: arm64: handle pending bit for LPIs in ITS emulation
As the actual LPI number in a guest can be quite high, but is mostly
assigned using a very sparse allocation scheme, bitmaps and arrays
for storing the virtual interrupt status are a waste of memory.
We use our equivalent of the "Interrupt Translation Table Entry"
(ITTE) to hold this extra status information for a virtual LPI.
As the normal VGIC code cannot use it's fancy bitmaps to manage
pending interrupts, we provide a hook in the VGIC code to let the
ITS emulation handle the list register queueing itself.
LPIs are located in a separate number range (>=8192), so
distinguishing them is easy. With LPIs being only edge-triggered, we
get away with a less complex IRQ handling.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:46 +0000 (15:21 +0100)]
KVM: arm64: add data structures to model ITS interrupt translation
The GICv3 Interrupt Translation Service (ITS) uses tables in memory
to allow a sophisticated interrupt routing. It features device tables,
an interrupt table per device and a table connecting "collections" to
actual CPUs (aka. redistributors in the GICv3 lingo).
Since the interrupt numbers for the LPIs are allocated quite sparsely
and the range can be quite huge (8192 LPIs being the minimum), using
bitmaps or arrays for storing information is a waste of memory.
We use linked lists instead, which we iterate linearily. This works
very well with the actual number of LPIs/MSIs in the guest being
quite low. Should the number of LPIs exceed the number where iterating
through lists seems acceptable, we can later revisit this and use more
efficient data structures.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:45 +0000 (15:21 +0100)]
KVM: arm64: implement basic ITS register handlers
Add emulation for some basic MMIO registers used in the ITS emulation.
This includes:
- GITS_{CTLR,TYPER,IIDR}
- ID registers
- GITS_{CBASER,CREADR,CWRITER}
those implement the ITS command buffer handling
Most of the handlers are pretty straight forward, but CWRITER goes
some extra miles to allow fine grained locking. The idea here
is to let only the first instance iterate through the command ring
buffer, CWRITER accesses on other VCPUs meanwhile will be picked up
by that first instance and handled as well. The ITS lock is thus only
hold for very small periods of time and is dropped before the actual
command handler is called.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:44 +0000 (15:21 +0100)]
KVM: arm64: introduce ITS emulation file with stub functions
The ARM GICv3 ITS emulation code goes into a separate file, but
needs to be connected to the GICv3 emulation, of which it is an
option.
Introduce the skeleton with function stubs to be filled later.
Introduce the basic ITS data structure and initialize it, but don't
return any success yet, as we are not yet ready for the show.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>