Eric Auger [Thu, 18 Jun 2015 08:30:19 +0000 (10:30 +0200)]
KVM: irqchip: convey devid to kvm_set_msi
on ARM, a devid field is populated in kvm_msi struct in case the
flag is set to KVM_MSI_VALID_DEVID. Let's populate the corresponding
kvm_kernel_irq_routing_entry devid field and set the msi type to
KVM_IRQ_ROUTING_EXTENDED_MSI.
Signed-off-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Eric Auger [Thu, 18 Jun 2015 13:26:59 +0000 (15:26 +0200)]
KVM: api: introduce KVM_IRQ_ROUTING_EXTENDED_MSI
On ARM, the MSI msg (address and data) comes along with
out-of-band device ID information. The device ID encodes the
device that writes the MSI msg. Let's convey the device id in
kvm_irq_routing_msi and use a new routing entry type to
indicate the devid is populated.
Andre Przywara [Fri, 10 Jul 2015 14:21:51 +0000 (15:21 +0100)]
KVM: arm64: enable ITS emulation as a virtual MSI controller
If userspace has provided a base address for the ITS register frame,
we enable the bits that advertise LPIs in the GICv3.
When the guest has enabled LPIs and the ITS, we enable the emulation
part by initializing the ITS data structures and trapping on ITS
register frame accesses by the guest.
Also we enable the KVM_SIGNAL_MSI feature to allow userland to inject
MSIs into the guest. Not having enabled the ITS emulation will lead
to a -ENODEV when trying to inject a MSI.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:50 +0000 (15:21 +0100)]
KVM: arm64: implement MSI injection in ITS emulation
When userland wants to inject a MSI into the guest, we have to use
our data structures to find the LPI number and the VCPU to receive
the interrupt.
Use the wrapper functions to iterate the linked lists and find the
proper Interrupt Translation Table Entry. Then set the pending bit
in this ITTE to be later picked up by the LR handling code. Kick
the VCPU which is meant to handle this interrupt.
We provide a VGIC emulation model specific routine for the actual
MSI injection. The wrapper functions return an error for models not
(yet) implementing MSIs (like the GICv2 emulation).
We also provide the handler for the ITS "INT" command, which allows a
guest to trigger an MSI via the ITS command queue.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:49 +0000 (15:21 +0100)]
KVM: arm64: implement ITS command queue command handlers
The connection between a device, an event ID, the LPI number and the
allocated CPU is stored in in-memory tables in a GICv3, but their
format is not specified by the spec. Instead software uses a command
queue in a ring buffer to let the ITS implementation use their own
format.
Implement handlers for the various ITS commands and let them store
the requested relation into our own data structures.
To avoid kmallocs inside the ITS spinlock, we preallocate possibly
needed memory outside of the lock and free that if it turns out to
be not needed (mostly error handling).
Error handling is very basic at this point, as we don't have a good
way of communicating errors to the guest (usually a SError).
The INT command handler is missing at this point, as we gain the
capability of actually injecting MSIs into the guest only later on.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:48 +0000 (15:21 +0100)]
KVM: arm64: sync LPI configuration and pending tables
The LPI configuration and pending tables of the GICv3 LPIs are held
in tables in (guest) memory. To achieve reasonable performance, we
cache this data in our own data structures, so we need to sync those
two views from time to time. This behaviour is well described in the
GICv3 spec and is also exercised by hardware, so the sync points are
well known.
Provide functions that read the guest memory and store the
information from the configuration and pending tables in the kernel.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:47 +0000 (15:21 +0100)]
KVM: arm64: handle pending bit for LPIs in ITS emulation
As the actual LPI number in a guest can be quite high, but is mostly
assigned using a very sparse allocation scheme, bitmaps and arrays
for storing the virtual interrupt status are a waste of memory.
We use our equivalent of the "Interrupt Translation Table Entry"
(ITTE) to hold this extra status information for a virtual LPI.
As the normal VGIC code cannot use it's fancy bitmaps to manage
pending interrupts, we provide a hook in the VGIC code to let the
ITS emulation handle the list register queueing itself.
LPIs are located in a separate number range (>=8192), so
distinguishing them is easy. With LPIs being only edge-triggered, we
get away with a less complex IRQ handling.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:46 +0000 (15:21 +0100)]
KVM: arm64: add data structures to model ITS interrupt translation
The GICv3 Interrupt Translation Service (ITS) uses tables in memory
to allow a sophisticated interrupt routing. It features device tables,
an interrupt table per device and a table connecting "collections" to
actual CPUs (aka. redistributors in the GICv3 lingo).
Since the interrupt numbers for the LPIs are allocated quite sparsely
and the range can be quite huge (8192 LPIs being the minimum), using
bitmaps or arrays for storing information is a waste of memory.
We use linked lists instead, which we iterate linearily. This works
very well with the actual number of LPIs/MSIs in the guest being
quite low. Should the number of LPIs exceed the number where iterating
through lists seems acceptable, we can later revisit this and use more
efficient data structures.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:45 +0000 (15:21 +0100)]
KVM: arm64: implement basic ITS register handlers
Add emulation for some basic MMIO registers used in the ITS emulation.
This includes:
- GITS_{CTLR,TYPER,IIDR}
- ID registers
- GITS_{CBASER,CREADR,CWRITER}
those implement the ITS command buffer handling
Most of the handlers are pretty straight forward, but CWRITER goes
some extra miles to allow fine grained locking. The idea here
is to let only the first instance iterate through the command ring
buffer, CWRITER accesses on other VCPUs meanwhile will be picked up
by that first instance and handled as well. The ITS lock is thus only
hold for very small periods of time and is dropped before the actual
command handler is called.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:44 +0000 (15:21 +0100)]
KVM: arm64: introduce ITS emulation file with stub functions
The ARM GICv3 ITS emulation code goes into a separate file, but
needs to be connected to the GICv3 emulation, of which it is an
option.
Introduce the skeleton with function stubs to be filled later.
Introduce the basic ITS data structure and initialize it, but don't
return any success yet, as we are not yet ready for the show.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:43 +0000 (15:21 +0100)]
KVM: arm64: handle ITS related GICv3 redistributor registers
In the GICv3 redistributor there are the PENDBASER and PROPBASER
registers which we did not emulate so far, as they only make sense
when having an ITS. In preparation for that emulate those MMIO
accesses by storing the 64-bit data written into it into a variable
which we later read in the ITS emulation.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:42 +0000 (15:21 +0100)]
KVM: arm64: Introduce new MMIO region for the ITS base address
The ARM GICv3 ITS controller requires a separate register frame to
cover ITS specific registers. Add a new VGIC address type and store
the address in a field in the vgic_dist structure.
Provide a function to check whether userland has provided the address,
so ITS functionality can be guarded by that check.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:41 +0000 (15:21 +0100)]
KVM: arm/arm64: make GIC frame address initialization model specific
Currently we initialize all the possible GIC frame addresses in one
function, without looking at the specific GIC model we instantiate
for the guest.
As this gets confusing when adding another VGIC model later, lets
move these initializations into the respective model's init functions.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:40 +0000 (15:21 +0100)]
KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities
KVM capabilities can be a per-VM property, though ARM/ARM64 currently
does not pass on the VM pointer to the architecture specific
capability handlers.
Add a "struct kvm*" parameter to those function to later allow proper
per-VM capability reporting.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:39 +0000 (15:21 +0100)]
KVM: arm/arm64: add emulation model specific destroy function
Currently we destroy the VGIC emulation in one function that cares for
all emulated models. To be on par with init_model (which is model
specific), lets introduce a per-emulation-model destroy method, too.
Use it for a tiny GICv3 specific code already, later it will be handy
for the ITS emulation.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
KVM: extend struct kvm_msi to hold a 32-bit device ID
The ARM GICv3 ITS MSI controller requires a device ID to be able to
assign the proper interrupt vector. On real hardware, this ID is
sampled from the bus. To be able to emulate an ITS controller, extend
the KVM MSI interface to let userspace provide such a device ID. For
PCI devices, the device ID is simply the 16-bit bus-device-function
triplet, which should be easily available to the userland tool.
Also there is a new KVM capability which advertises whether the
current VM requires a device ID to be set along with the MSI data.
This flag is still reported as not available everywhere, later we will
enable it when ITS emulation is used.
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Reviewed-by: Eric Auger <eric.auger@linaro.org> Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Andre Przywara [Fri, 10 Jul 2015 14:21:37 +0000 (15:21 +0100)]
KVM: arm/arm64: VGIC: don't track used LRs in the distributor
Currently we track which IRQ has been mapped to which VGIC list
register and also have to synchronize both. We used to do this
to hold some extra state (for instance the active bit).
It turns out that this extra state in the LRs is no longer needed and
this extra tracking causes some pain later.
Remove the tracking feature (lr_map and lr_used) and get rid of
quite some code on the way.
On a guest exit we pick up all still pending IRQs from the LRs and put
them back in the distributor. We don't care about active-only IRQs,
so we keep them in the LRs. They will be retired either by our
vgic_process_maintenance() routine or by the GIC hardware in case of
edge triggered interrupts.
In places where we scan LRs we now use our shadow copy of the ELRSR
register directly.
This code change means we lose the "piggy-back" optimization, which
would re-use an active-only LR to inject the pending state on top of
it. Tracing with various workloads shows that this actually occurred
very rarely, the ballpark figure is about once every 10,000 exits
in a disk I/O heavy workload. Also the list registers don't seem to
as scarce as assumed, with all 4 LRs on the popular implementations
used less than once every 100,000 exits.
This has been briefly tested on Midway, Juno and the model (the latter
both with GICv2 and GICv3 guests).
Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
David Daney [Mon, 6 Apr 2015 23:00:29 +0000 (16:00 -0700)]
net/mlx4: Remove improper usage of dma_alloc_coherent().
The dma_alloc_coherent() function returns a virtual address which can
be used for coherent access to the underlying memory. On some
architectures, like arm64, undefined behavior results if this memory is
also accessed via virtual mappings that are not coherent. Because of
their undefined nature, operations like virt_to_page() return garbage
when passed virtual addresses obtained from dma_alloc_coherent(). Any
subsequent mappings via vmap() of the garbage page values are unusable
and result in bad things like bus errors (synchronous aborts in ARM64
speak).
The MLX4 driver contains code that does the equivalent of:
vmap(virt_to_page(dma_alloc_coherent))
This results in an OOPs when the device is opened.
To fix this...
Always use result of dma_alloc_coherent() directly.
Remove 'max_direct' parameter to mlx4_buf_alloc(), as it is unused,
and adjust all callers.
Remove mlx4_en_map_buffer() and mlx4_en_unmap_buffer() as they now do
nothing, and adjust all callers.
Remove 'page_list' element from struct mlx4_buf as it is unused.
Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Sunil Goutham [Sun, 30 Aug 2015 09:29:16 +0000 (12:29 +0300)]
net: thunderx: Support for internal loopback mode
Support for setting VF's corresponding BGX LMAC in internal
loopback mode. This mode can be used for verifying basic HW
functionality such as packet I/O, RX checksum validation,
CQ/RBDR interrupts, stats e.t.c. Useful when DUT has no external
network connectivity.
'loopback' mode can be enabled or disabled via ethtool.
Note: This feature is not supported when no of VFs enabled are
morethan no of physical interfaces i.e active BGX LMACs
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d77a2384988fd397cf4f71417b9d971aa435758d) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Sunil Goutham [Sun, 30 Aug 2015 09:29:15 +0000 (12:29 +0300)]
net: thunderx: Support for upto 96 queues for a VF
This patch adds support for handling multiple qsets assigned to a
single VF. There by increasing no of queues from earlier 8 to max
no of CPUs in the system i.e 48 queues on a single node and 96 on
dual node system. User doesn't have option to assign which Qsets/VFs
to be merged. Upon request from VF, PF assigns next free Qsets as
secondary qsets. To maintain current behavior no of queues is kept
to 8 by default which can be increased via ethtool.
If user wants to unbind NICVF driver from a secondary Qset then it
should be done after tearing down primary VF's interface.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 92dc87697e6a71675a9e9eec04ebecd8cf4837a3) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Sunil Goutham [Sun, 30 Aug 2015 09:29:14 +0000 (12:29 +0300)]
net: thunderx: Rework interrupt handling
Rework interrupt handler to avoid checking IRQ affinity of
CQ interrupts. Now separate handlers are registered for each IRQ
including RBDR. Register interrupt handlers for only those
which are being used. Add nicvf_dump_intr_status() and use it
in irq handlers.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 39ad6eea6c1a01b69abb1102a767697fb9349830) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Sunil Goutham [Sun, 30 Aug 2015 09:29:13 +0000 (12:29 +0300)]
net: thunderx: Support for HW VLAN stripping
This patch configures HW to strip 802.1Q header if found in a
receiving packet. The stripped VLAN ID and TCI information is
passed on to software via CQE_RX. Also sets netdev's 'vlan_features'
so that other HW offload features can be used for tagged packets.
This offload feature can be enabled or disabled via ethtool.
Network stack normally ignores RPS for 802.1Q packets and hence low
throughput. With this offload enabled throughput for tagged packets
will be almost same as normal packets.
Note: This patch doesn't enable HW VLAN insertion for transmit packets.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit aa2e259b474a4f52ecc9f6e0d444547de0aac4b2) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Sunil Goutham [Sun, 30 Aug 2015 09:29:12 +0000 (12:29 +0300)]
net: thunderx: Receive hashing HW offload support
Adding support for receive hashing HW offload by using RSS_ALG
and RSS_TAG fields of CQE_RX descriptor. Also removed dependency
on minimum receive queue count to configure RSS so that hash is
always generated.
This hash is used by RPS logic to distribute flows across multiple
CPUs. Offload can be disabled via ethtool.
Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 38bb5d4f4f988c98035fca003138dd84471432f2) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Sunil Goutham [Sun, 30 Aug 2015 09:29:11 +0000 (12:29 +0300)]
net: thunderx: mailboxes: remove code duplication
Use the nicvf_send_msg_to_pf() function in the mailbox code.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6051cba77c1c768d954cf9e423c44bcb85b9adb8) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Aleksey Makarov [Sun, 30 Aug 2015 09:29:09 +0000 (12:29 +0300)]
net: thunderx: fix MAINTAINERS
The liquidio and thunder drivers have different maintainers.
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 322e5cc5c6c03584ff9362357fc1448b5e442e9e) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Based on code from: Narinder Dhillon <ndhillon@cavium.com>
Tomasz Nowicki <tomasz.nowicki@linaro.org>
Robert Richter <rrichter@cavium.com>
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 46b903a01c053d0c94975ea7a6819618f121d3d6) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Robert Richter [Tue, 11 Aug 2015 00:58:36 +0000 (17:58 -0700)]
net: thunder: Factor out DT specific code in BGX
Separate DT code in preparation for follow-on ACPI integration.
Based on code from: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: David Daney <david.daney@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit de387e1156c76b273529f1803c6bd87b61eac2c5) Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
arm64: gicv3: its: Increase FORCE_MAX_ZONEORDER for Cavium ThunderX
In case of ARCH_THUNDER, there is a need to allocate the GICv3 ITS table
which is bigger than the allowed max order. So we are forcing it only in
case of 4KB page size.
Robert Richter [Mon, 29 Jun 2015 16:25:45 +0000 (18:25 +0200)]
irqchip, gicv3-its: Add HW revision detection and configuration
Some GIC revisions require an individual configuration to esp. add
workarounds for HW bugs. This patch implements generic code to parse
the hw revision provided by an IIDR register value and runs specific
code if hw matches. There are functions that read the IIDR registers
for GICV3 and ITS (GICD_IIDR/GITS_IIDR) and then go through a list of
init functions to be called for specific versions.
A MIDR register value may also be used, this is especially useful for
hw detection from a guest.
The patch is needed to implement workarounds for HW errata in Cavium's
ThunderX GICV3.
v4:
* only enable hw detection for its in its_enable_quirks()
* removed gicv3_check_capabilities()
v3:
* use arm64 errata framework for midr check
v2:
* adding MIDR check
Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Robert Richter [Wed, 27 Aug 2014 14:24:24 +0000 (16:24 +0200)]
irqchip, gicv3-its: Read typer register outside the loop
No need to read the typer register in the loop. Values do not change.
This patch is basically a prerequisite for a follow-on patch that adds
errata code for Cavium ThunderX. It moves the calculation of the
number of id entries to the beginning of the function close to other
setup values that are needed to allocate the its table. Now we have a
central location to modify the setup parameters and the errata code
can be implemented in a single block.
Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Robert Richter [Mon, 29 Jun 2015 16:46:22 +0000 (18:46 +0200)]
irqchip, gicv3: Workaround for Cavium ThunderX erratum 23154
This patch implements Cavium ThunderX erratum 23154.
The gicv3 of ThunderX requires a modified version for reading the IAR
status to ensure data synchronization. Since this is in the fast-path
and called with each interrupt, runtime patching is used using jump
label patching for smallest overhead (no-op). This is the same
technique as used for tracepoints.
v4:
* simplify code to only use cpus_have_cap() in gicv3_enable_quirks()
v3:
* fix erratum to be dependend from midr
* use arm64 errata framework
v2:
* implement code in a single asm() to keep instruction sequence
* added comment to the code that explains the erratum
* apply workaround also if running as guest, thus check MIDR
Signed-off-by: Robert Richter <rrichter@cavium.com> Signed-off-by: Vadim Lomovtsev <Vadim.Lomovtsev@caviumnetworks.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
While making the root blkg unconditional, ec13b1d6f0a0 ("blkcg: always
create the blkcg_gq for the root blkcg") removed the part which clears
q->root_blkg and ->root_rl.blkg during q exit. This leaves the two
pointers dangling after blkg_destroy_all(). blk-throttle exit path
performs blkg traversals and dereferences ->root_blkg and can lead to
the following oops.
While the bug is a straigh-forward use-after-free bug, it is tricky to
reproduce because blkg release is RCU protected and the rest of exit
path usually finishes before RCU grace period.
This patch fixes the bug by updating blkg_destro_all() to clear
q->root_blkg and ->root_rl.blkg.
Tim Gardner [Tue, 4 Aug 2015 17:26:04 +0000 (11:26 -0600)]
workqueue: Make flush_workqueue() available again to non GPL modules
Commit 37b1ef31a568fc02e53587620226e5f3c66454c8 ("workqueue: move
flush_scheduled_work() to workqueue.h") moved the exported non GPL
flush_scheduled_work() from a function to an inline wrapper.
Unfortunately, it directly calls flush_workqueue() which is a GPL function.
This has the effect of changing the licensing requirement for this function
and makes it unavailable to non GPL modules.
Thomas Hellstrom [Wed, 26 Aug 2015 12:49:21 +0000 (05:49 -0700)]
drm/vmwgfx: Allow dropped masters render-node like access on legacy nodes v2
Applications like gnome-shell may try to render after dropping master
privileges. Since the driver should now be safe against this scenario,
allow those applications to use their legacy node like a render node.
v2: Add missing return statement.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Thomas Hellstrom [Thu, 25 Jun 2015 17:47:43 +0000 (10:47 -0700)]
vmwgfx: Rework device initialization
This commit reworks device initialization so that we always enable the
FIFO at driver load, deferring SVGA enable until either first modeset
or fbdev enable.
This should always leave the fifo properly enabled for render- and
control nodes.
In addition,
*) We disable the use of VRAM when SVGA is not enabled.
*) We simplify PM support so that we only throw out resources on hibernate,
not on suspend, since the device keeps its state on suspend.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Jonathon Jongsma [Thu, 20 Aug 2015 19:04:32 +0000 (12:04 -0700)]
drm/qxl: validate monitors config modes
Due to some recent changes in
drm_helper_probe_single_connector_modes_merge_bits(), old custom modes
were not being pruned properly. In current kernels,
drm_mode_validate_basic() is called to sanity-check each mode in the
list. If the sanity-check passes, the mode's status gets set to to
MODE_OK. In older kernels this check was not done, so old custom modes
would still have a status of MODE_UNVERIFIED at this point, and would
therefore be pruned later in the function.
As a result of this new behavior, the list of modes for a device always
includes every custom mode ever configured for the device, with the
largest one listed first. Since desktop environments usually choose the
first preferred mode when a hotplug event is emitted, this had the
result of making it very difficult for the user to reduce the size of
the display.
The qxl driver did implement the mode_valid connector function, but it
was empty. In order to restore the old behavior where old custom modes
are pruned, we implement a proper mode_valid function for the qxl
driver. This function now checks each mode against the last configured
custom mode and the list of standard modes. If the mode doesn't match
any of these, its status is set to MODE_BAD so that it will be pruned as
expected.
Signed-off-by: Jonathon Jongsma <jjongsma at redhat.com> Cc: stable at vger.kernel.org
Hans de Goede [Thu, 23 Jul 2015 15:20:12 +0000 (17:20 +0200)]
nv46: Change mc subdev oclass from nv44 to nv4c
MSI interrupts appear to not work for nv46 based cards. Change the mc
subdev oclass for these cards from nv44 to nv4c, the nv4c mc code is
identical to the nv44 mc code except that it does not use msi
(it does not define a msi_rearm callback).
Eric Sandeen [Wed, 5 Aug 2015 22:13:58 +0000 (15:13 -0700)]
ext4: don't manipulate recovery flag when freezing no-journal fs
At some point along this sequence of changes:
f6e63f9 ext4: fold ext4_nojournal_sops into ext4_sops bb04457 ext4: support freezing ext2 (nojournal) file systems 9ca9238 ext4: Use separate super_operations structure for no_journal filesystems
ext4 started setting needs_recovery on filesystems without journals
when they are unfrozen. This makes no sense, and in fact confuses
blkid to the point where it doesn't recognize the filesystem at all.
(freeze ext2; unfreeze ext2; run blkid; see no output; run dumpe2fs,
see needs_recovery set on fs w/ no journal).
To fix this, don't manipulate the INCOMPAT_RECOVER feature on
filesystems without journals.
Reported-by: Stu Mark <smark@xxxxxxxxx> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
Chris Leech [Tue, 16 Jun 2015 23:07:13 +0000 (16:07 -0700)]
iSCSI: let session recovery_tmo sysfs writes persist across recovery
The iSCSI session recovery_tmo setting is writeable in sysfs, but it's
also set every time a connection is established when parameters are set
from iscsid over netlink. That results in the timeout being reset to
the default value after every recovery.
The DM multipath tools want to use the sysfs interface to lower the
default timeout when there are multiple paths to fail over. It has
caused confusion that we have a writeable sysfs value that seem to keep
resetting itself.
This patch adds an in-kernel flag that gets set once a sysfs write
occurs, and then ignores netlink parameter setting once it's been
modified via the sysfs interface. My thinking here is that the sysfs
interface is much simpler for external tools to influence the session
timeout, but if we're going to allow it to be modified directly we
should ensure that setting is maintained.
Signed-off-by: Chris Leech <cleech@redhat.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <JBottomley@Odin.com>
Hans de Goede [Fri, 24 Jul 2015 09:45:28 +0000 (11:45 +0200)]
ideapad-laptop: Add Lenovo Yoga 3 14 to no_hw_rfkill dmi list
Like some of the other Yoga models the Lenovo Yoga 3 14 does not have a
hw rfkill switch, and trying to read the hw rfkill switch through the
ideapad module causes it to always reported blocking breaking wifi.
This commit adds the Lenovo Yoga 3 14 to the no_hw_rfkill dmi list, fixing
the wifi breakage.
Dave Young [Fri, 18 Sep 2015 15:27:32 +0000 (15:27 +0000)]
kexec/uefi: copy secure_boot flag in boot params across kexec reboot
Kexec reboot in case secure boot being enabled does not keep the secure boot
mode in new kernel, so later one can load unsigned kernel via legacy kexec_load.
In this state, the system is missing the protections provided by secure boot.
Adding a patch to fix this by retain the secure_boot flag in original kernel.
secure_boot flag in boot_params is set in EFI stub, but kexec bypasses the stub.
Fixing this issue by copying secure_boot flag across kexec reboot.
HID: chicony: Add support for Acer Aspire Switch 12
Acer Aspire Switch 12 keyboard Chicony's controller reports too big usage
index on the 1st interface. The patch fixes the report. The work based on
solution from drivers/hid/hid-holtek-mouse.c
When resuming, the bluetooth driver needs to re-request the
firmware. This re-request is happening before usermodehelper
is fully enabled. If the firmware load succeeded previously, the
caching behavior of the firmware code typically negates the
need to call the usermodehelper code again and the request
succeeds. If the firmware was never loaded because it isn't
actually present in the file system, this results in a call
to usermodehelper and a failure warning every resume.
The proper fix is to add a reset_resume functionality to the
btusb driver to be able to handle the resume case. The
work for this is ongoing so in the mean time just silence
the warning since we know it's a problem.
Bugzilla: 1133378
Upstream-status: Working on it. It's a difficult problem :( Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
Input - synaptics: pin 3 touches when the firmware reports 3 fingers
Synaptics PS/2 touchpad can send only 2 touches in a report. They can
detect 4 or 5 and this information is valuable.
In commit 63c4fda (Input: synaptics - allocate 3 slots to keep stability
in image sensors), we allocate 3 slots, but we still continue to report
the 2 available fingers. That means that the client sees 2 used slots while
there is a total of 3 fingers advertised by BTN_TOOL_TRIPLETAP.
For old kernels this is not a problem because max_slots was 2 and libinput/
xorg-synaptics knew how to deal with that. Now that max_slot is 3, the
clients ignore BTN_TOOL_TRIPLETAP and count the actual used slots (so 2).
It then gets confused when receiving the BTN_TOOL_TRIPLETAP and DOUBLETAP
information, and goes wild.
We can pin the 3 slots until we get a total number of fingers below 2.
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1212230 Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com>
xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
There is no need for this at all. Worst it means that if
the guest tries to write to BARs it could lead (on certain
platforms) to PCI SERR errors.
Please note that with af6fc858a35b90e89ea7a7ee58e66628c55c776b
"xen-pciback: limit guest control of command register"
a guest is still allowed to enable those control bits (safely), but
is not allowed to disable them and that therefore a well behaved
frontend which enables things before using them will still
function correctly.
This is done via an write to the configuration register 0x4 which
triggers on the backend side:
command_write
\- pci_enable_device
\- pci_enable_device_flags
\- do_pci_enable_device
\- pcibios_enable_device
\-pci_enable_resourcess
[which enables the PCI_COMMAND_MEMORY|PCI_COMMAND_IO]
However guests (and drivers) which don't do this could cause
problems, including the security issues which XSA-120 sought
to address.
Reported-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Dave Jones [Tue, 24 Jun 2014 12:43:34 +0000 (08:43 -0400)]
watchdog: Disable watchdog on virtual machines.
For various reasons, VMs seem to trigger the soft lockup detector a lot,
in cases where it's just not possible for a lockup to occur.
(Example: https://bugzilla.redhat.com/show_bug.cgi?id=971139)
In some cases it seems that the host just never scheduled the app running
the VM for a very long time (Could be the host was under heavy load).
Just disable the detector on VMs.
Bugzilla: 971139
Upstream-status: Fedora mustard for now
Éric Piel [Thu, 3 Nov 2011 15:22:40 +0000 (16:22 +0100)]
lis3: improve handling of null rate
When obtaining a rate of 0, we would disable the device supposely
because it seems to behave incorectly. It actually only comes from the
fact that the device is off and on lis3dc it's reflected in the rate.
So handle this nicely by just waiting a safe time, and then using the
device as normally.
Bastien Nocera [Thu, 20 May 2010 14:30:31 +0000 (10:30 -0400)]
disable i8042 check on apple mac
As those computers never had any i8042 controllers, and the
current lookup code could potentially lock up/hang/wait for
timeout for long periods of time.
Fixes intermittent hangs on boot on a MacbookAir1,1
Bugzilla: N/A
Upstream-status: http://lkml.indiana.edu/hypermail/linux/kernel/1005.0/00938.html (and pinged on Dec 17, 2013)
Adam Jackson [Wed, 13 Nov 2013 15:17:24 +0000 (10:17 -0500)]
drm/i915: hush check crtc state
This is _by far_ the most common backtrace for i915 on retrace.fp.o, and
it's mostly useless noise. There's not enough context when it's generated
to know if something actually went wrong. Downgrade the message to
KMS debugging so we can still get it if we want it.
Josh Boyer [Thu, 3 Oct 2013 14:14:23 +0000 (10:14 -0400)]
MODSIGN: Support not importing certs from db
If a user tells shim to not use the certs/hashes in the UEFI db variable
for verification purposes, shim will set a UEFI variable called MokIgnoreDB.
Have the uefi import code look for this and not import things from the db
variable.
Josh Boyer [Fri, 26 Oct 2012 16:42:16 +0000 (12:42 -0400)]
MODSIGN: Import certificates from UEFI Secure Boot
Secure Boot stores a list of allowed certificates in the 'db' variable.
This imports those certificates into the system trusted keyring. This
allows for a third party signing certificate to be used in conjunction
with signed modules. By importing the public certificate into the 'db'
variable, a user can allow a module signed with that certificate to
load. The shim UEFI bootloader has a similar certificate list stored
in the 'MokListRT' variable. We import those as well.
In the opposite case, Secure Boot maintains a list of disallowed
certificates in the 'dbx' variable. We load those certificates into
the newly introduced system blacklist keyring and forbid any module
signed with those from loading.
Josh Boyer [Fri, 26 Oct 2012 16:36:24 +0000 (12:36 -0400)]
KEYS: Add a system blacklist keyring
This adds an additional keyring that is used to store certificates that
are blacklisted. This keyring is searched first when loading signed modules
and if the module's certificate is found, it will refuse to load. This is
useful in cases where third party certificates are used for module signing.
Josh Boyer [Fri, 20 Jun 2014 12:53:24 +0000 (08:53 -0400)]
hibernate: Disable in a signed modules environment
There is currently no way to verify the resume image when returning
from hibernate. This might compromise the signed modules trust model,
so until we can work with signed hibernate images we disable it in
a secure modules environment.
Josh Boyer [Wed, 6 Feb 2013 00:25:05 +0000 (19:25 -0500)]
efi: Disable secure boot if shim is in insecure mode
A user can manually tell the shim boot loader to disable validation of
images it loads. When a user does this, it creates a UEFI variable called
MokSBState that does not have the runtime attribute set. Given that the
user explicitly disabled validation, we can honor that and not enable
secure boot mode if that variable is set.
Matthew Garrett [Fri, 9 Aug 2013 22:36:30 +0000 (18:36 -0400)]
Add option to automatically enforce module signatures when in Secure Boot mode
UEFI Secure Boot provides a mechanism for ensuring that the firmware will
only load signed bootloaders and kernels. Certain use cases may also
require that all kernel modules also be signed. Add a configuration option
that enforces this automatically when enabled.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Matthew Garrett [Fri, 8 Feb 2013 19:12:13 +0000 (11:12 -0800)]
x86: Restrict MSR access when module loading is restricted
Writing to MSRs should not be allowed if module loading is restricted,
since it could lead to execution of arbitrary code in kernel mode. Based
on a patch by Kees Cook.
Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Matthew Garrett [Fri, 9 Aug 2013 07:33:56 +0000 (03:33 -0400)]
kexec: Disable at runtime if the kernel enforces module loading restrictions
kexec permits the loading and execution of arbitrary code in ring 0, which
is something that module signing enforcement is meant to prevent. It makes
sense to disable kexec in this situation.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Josh Boyer [Mon, 25 Jun 2012 23:57:30 +0000 (19:57 -0400)]
acpi: Ignore acpi_rsdp kernel parameter when module loading is restricted
This option allows userspace to pass the RSDP address to the kernel, which
makes it possible for a user to circumvent any restrictions imposed on
loading modules. Disable it in that case.
Matthew Garrett [Fri, 9 Mar 2012 14:28:15 +0000 (09:28 -0500)]
Restrict /dev/mem and /dev/kmem when module loading is restricted
Allowing users to write to address space makes it possible for the kernel
to be subverted, avoiding module loading restrictions. Prevent this when
any restrictions have been imposed on loading modules.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Matthew Garrett [Fri, 9 Mar 2012 13:46:50 +0000 (08:46 -0500)]
asus-wmi: Restrict debugfs interface when module loading is restricted
We have no way of validating what all of the Asus WMI methods do on a
given machine, and there's a risk that some will allow hardware state to
be manipulated in such a way that arbitrary code can be executed in the
kernel, circumventing module loading restrictions. Prevent that if any of
these features are enabled.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Matthew Garrett [Fri, 9 Mar 2012 13:39:37 +0000 (08:39 -0500)]
ACPI: Limit access to custom_method
custom_method effectively allows arbitrary access to system memory, making
it possible for an attacker to circumvent restrictions on module loading.
Disable it if any such restrictions have been enabled.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Matthew Garrett [Thu, 8 Mar 2012 15:35:59 +0000 (10:35 -0500)]
x86: Lock down IO port access when module security is enabled
IO port access would permit users to gain access to PCI configuration
registers, which in turn (on a lot of hardware) give access to MMIO register
space. This would potentially permit root to trigger arbitrary DMA, so lock
it down by default.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Matthew Garrett [Thu, 8 Mar 2012 15:10:38 +0000 (10:10 -0500)]
PCI: Lock down BAR access when module security is enabled
Any hardware that can potentially generate DMA has to be locked down from
userspace in order to avoid it being possible for an attacker to modify
kernel code, allowing them to circumvent disabled module loading or module
signing. Default to paranoid - in future we can potentially relax this for
sufficiently IOMMU-isolated devices.
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Matthew Garrett [Fri, 9 Aug 2013 21:58:15 +0000 (17:58 -0400)]
Add secure_modules() call
Provide a single call to allow kernel code to determine whether the system
has been configured to either disable module loading entirely or to load
only modules signed with a trusted key.
Bugzilla: N/A
Upstream-status: Fedora mustard. Replaced by securelevels, but that was nak'd
Signed-off-by: Matthew Garrett <matthew.garrett@nebula.com>
Josh Stone [Fri, 21 Nov 2014 18:40:00 +0000 (10:40 -0800)]
Kbuild: Add an option to enable GCC VTA
Due to recent codegen issues, gcc -fvar-tracking-assignments was
unconditionally disabled in commit 2062afb4f804a ("Fix gcc-4.9.0
miscompilation of load_balance() in scheduler"). However, this reduces
the debuginfo coverage for variable locations, especially in inline
functions. VTA is certainly not perfect either in those cases, but it
is much better than without. With compiler versions that have fixed the
codegen bugs, we would prefer to have the better details for SystemTap,
and surely other debuginfo consumers like perf will benefit as well.
This patch simply makes CONFIG_DEBUG_INFO_VTA an option. I considered
Frank and Linus's discussion of a cc-option-like -fcompare-debug test,
but I'm convinced that a narrow test of an arch-specific codegen issue
is not really useful. GCC has their own regression tests for this, so
I'd suggest GCC_COMPARE_DEBUG=-fvar-tracking-assignments-toggle is more
useful for kernel developers to test confidence.
In fact, I ran into a couple more issues when testing for this patch[1],
although neither of those had any codegen impact.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1140872
With gcc-4.9.2-1.fc22, I can now build v3.18-rc5 with Fedora's i686 and
x86_64 configs, and this is completely clean with GCC_COMPARE_DEBUG.
Cc: Frank Ch. Eigler <fche@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Markus Trippelsdorf <markus@trippelsdorf.de> Cc: Michel Dänzer <michel@daenzer.net> Signed-off-by: Josh Stone <jistone@redhat.com>
Peter Jones [Thu, 25 Sep 2008 20:23:33 +0000 (16:23 -0400)]
input: silence i8042 noise
Don't print an error message just because there's no i8042 chip.
Some systems, such as EFI-based Apple systems, won't necessarily have an
i8042 to initialize. We shouldn't be printing an error message in this
case, since not detecting the chip is the correct behavior.