xen: update the comment about smp_processor_id() in do_softirq().
It's said that "SCHEDULE_SOFTIRQ may move us to another processor",
which I don't find very clear, nor very representative of
what the situation actually is.
We have two possible situations:
- context_switch() is a "terminal function" (i.e., it jumps,
rather than returning normally). This happens on x86;
- context_switch() "just" returns, and another step of the
loop is executed. This happens on ARM.
The real reason why we need to re-sample smp_processor_id() is
that, on ARM, where the function returns, we get back inside
of the loop, with (potentially) a different stack (because
context_switch() changed what's in the stack pointer register).
And in this case, what's in the new stack, at the address of
the local variable 'cpu', may not be consistent.
State this in the comment.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Changes from v1:
* v1 had a patch that was getting rid of the call to smp_processor_id(), but
that is wrong (on ARM). So, it has been replaced with this, which only
adjusts the comment.
--- Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: George Dunlap <George.Dunlap@eu.citrix.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Julien Grall <julien.grall@arm.com> Cc: Tim Deegan <tim@xen.org>
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci: do not expose unneeded functions to the user-space test harness
Some functions in vpci.c (vpci_remove_device and vpci_add_handlers)
are not used by the user-space test harness, so guard them with
__XEN__ in order to avoid exposing them to the user-space test
harness.
Requested-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci/msix: add MSI-X handlers
Add handlers for accesses to the MSI-X message control field in the
PCI configuration space, and traps for accesses to the memory region
that contains the MSI-X table and PBA. These traps detect attempts by
the guest to configure MSI-X interrupts and properly set them up.
Note that accesses to the Table Offset, Table BIR, PBA Offset and PBA
BIR are not trapped by Xen at the moment.
Finally, turn the panic in the Dom0 PVH builder into a warning.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
[IO] Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci: add a priority parameter to the vPCI register initializer
This is needed for MSI-X, since MSI-X will need to be initialized
before parsing the BARs, so that the header BAR handlers are aware of
the MSI-X related holes and make sure they are not mapped in order for
the trap handlers to work properly.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
[ARM] Acked-by: Julien Grall <julien.grall@arm.com>
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci/msi: add MSI handlers
Add handlers for the MSI control, address, data and mask fields in
order to detect accesses to them and set up the interrupts as requested
by the guest.
Note that the pending register is not trapped, and the guest can
freely read/write to it.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
[IO] Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
x86/pt: mask MSI vectors on unbind
When an MSI device with per-vector masking capabilities is detected or
added to Xen, all the vectors are masked when initializing it. This
implies that the first time the interrupt is bound to a domain it's
masked.
This however only applies to the first time the interrupt is bound
because neither the unbind nor the pirq unmap will mask the vector
again. In order to fix this, re-mask the interrupt when unbinding it
from a guest. This makes sure that pairs of bind/unbind will always
get the same masking state.
Note that no issues have been reported regarding this behavior because
QEMU always uses the newly introduced XEN_PT_GFLAGSSHIFT_UNMASKED when
binding interrupts, so it's always unmasked.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Thu, 22 Mar 2018 14:00:00 +0000 (15:00 +0100)]
vpci: add header handlers
Introduce a set of handlers that trap accesses to the PCI BARs and the
command register, in order to snoop BAR sizing and BAR relocation.
The command handler is used to detect changes to the memory space
enable bit (bit 1, i.e. mask 0x2, which controls the response to
memory space accesses), and maps/unmaps the BARs of the device into
the guest p2m. A rangeset is used in order to figure out which memory
to map/unmap. This makes it easier to keep track of the possible
overlaps with other BARs, and will also simplify MSI-X support, where
certain regions of a BAR might be used for the MSI-X table or PBA.
The BAR register handlers are used to detect attempts by the guest to
size or relocate the BARs.
Note that the long running BAR mapping and unmapping operations are
deferred to be performed by hvm_io_pending, so that they can be safely
preempted.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
[IO] Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
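To illustrate why tracking the memory to map as a set of ranges helps with holes such as the MSI-X table, here is a minimal sketch in C. It does not use Xen's rangeset API; struct range and range_subtract() are hypothetical names chosen for this example, and ranges are inclusive page frame numbers.

```c
#include <stdint.h>

/* Hypothetical inclusive range of page frame numbers. */
struct range {
    uint64_t start;
    uint64_t end;
};

/*
 * Remove 'hole' from 'r'. Up to two ranges survive (the part below
 * the hole and the part above it); they are stored in 'out' and
 * their number is returned.
 */
static unsigned int range_subtract(struct range r, struct range hole,
                                   struct range out[2])
{
    unsigned int n = 0;

    /* No overlap at all: the whole range is kept. */
    if (hole.end < r.start || hole.start > r.end) {
        out[n++] = r;
        return n;
    }

    /* Part of the range below the hole, if any. */
    if (hole.start > r.start)
        out[n++] = (struct range){ r.start, hole.start - 1 };

    /* Part of the range above the hole, if any. */
    if (hole.end < r.end)
        out[n++] = (struct range){ hole.end + 1, r.end };

    return n;
}
```

With a BAR covering frames 0x100-0x10f and a hole at frame 0x104, only 0x100-0x103 and 0x105-0x10f would be left to map, so accesses to the hole still trap.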
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
pci: split code to size BARs from pci_add_device
So that it can be called from outside in order to get the size of regular PCI
BARs. This will be required in order to map the BARs from PCI devices into PVH
Dom0 p2m.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
x86/mmcfg: add handlers for the PVH Dom0 MMCFG areas
Introduce a set of handlers for the accesses to the MMCFG areas. Those
areas are set up based on the contents of the hardware MMCFG tables,
and the list of handled MMCFG areas is stored inside of the hvm_domain
struct.
The read/writes are forwarded to the generic vpci handlers once the
address is decoded in order to obtain the device and register the
guest is trying to access.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
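The decoding step mentioned above can be sketched as follows. This is only an illustration: the bit layout is the standard PCIe ECAM one (bus in bits 27:20, device in 19:15, function in 14:12, register in 11:0), while struct ecam_addr and ecam_decode() are hypothetical names, not the ones used in Xen.

```c
#include <stdint.h>

/* Hypothetical container for a decoded config space address. */
struct ecam_addr {
    uint8_t bus;    /* bits 27:20 of the offset */
    uint8_t dev;    /* bits 19:15 */
    uint8_t fn;     /* bits 14:12 */
    uint16_t reg;   /* bits 11:0, byte offset in the 4 KiB config space */
};

/* Decode an offset relative to the start of an MMCFG area. */
static struct ecam_addr ecam_decode(uint64_t offset)
{
    struct ecam_addr a;

    a.bus = (offset >> 20) & 0xff;
    a.dev = (offset >> 15) & 0x1f;
    a.fn  = (offset >> 12) & 0x7;
    a.reg = offset & 0xfff;

    return a;
}
```

A real implementation would also have to add the start bus of the MMCFG area and check the offset against the size of the area; both are left out here.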
Roger Pau Monne [Thu, 22 Mar 2018 13:59:00 +0000 (14:59 +0100)]
vpci: introduce basic handlers to trap accesses to the PCI config space
This functionality is going to reside in vpci.c (and the corresponding
vpci.h header), and should be arch-agnostic. The handlers introduced
in this patch set up the basic functionality required in order to trap
accesses to the PCI config space, and allow decoding the address and
finding the corresponding handler that should handle the access
(although no handlers are implemented).
Note that the traps to the PCI IO port registers (0xcf8/0xcfc) are
set up inside of an x86 HVM file, since that's not shared with other
arches.
A new XEN_X86_EMU_VPCI x86 domain flag is added in order to signal Xen
whether a domain should use the newly introduced vPCI handlers; this
is only enabled for PVH Dom0 at the moment.
A very simple user-space test is also provided, so that the basic
functionality of the vPCI traps can be asserted. This has proven
quite helpful during development, since the logic to handle partial
accesses or accesses that span multiple registers is not
trivial.
The handlers for the registers are added to a linked list that's kept
sorted at all times. Both the read and write handlers support accesses
that span multiple emulated registers and contain gaps not
emulated.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
[IO parts] Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
[ARM] Acked-by: Julien Grall <julien.grall@arm.com>
[Tools] Acked-by: Wei Liu <wei.liu2@citrix.com>
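A minimal sketch of how a list of register handlers can be kept sorted by offset on insertion is given below. The structure and function names (reg_handler, reg_insert) are made up for this example and the actual vpci.c code may differ; rejecting overlapping registers is likewise an assumption of the sketch.

```c
#include <stddef.h>

/* Hypothetical register descriptor: offset and size in config space. */
struct reg_handler {
    unsigned int offset;
    unsigned int size;
    struct reg_handler *next;
};

/*
 * Insert 'r' into the list at '*head', keeping it sorted by offset.
 * Returns 0 on success, -1 if 'r' overlaps an existing register.
 */
static int reg_insert(struct reg_handler **head, struct reg_handler *r)
{
    struct reg_handler **pp = head;

    /* Skip every register that ends at or before the start of 'r'. */
    while (*pp != NULL && (*pp)->offset + (*pp)->size <= r->offset)
        pp = &(*pp)->next;

    /* The next register, if any, must start at or after the end of 'r'. */
    if (*pp != NULL && (*pp)->offset < r->offset + r->size)
        return -1;

    r->next = *pp;
    *pp = r;

    return 0;
}
```

Keeping the list sorted means that an access spanning several registers can be served in a single pass, with the gaps between consecutive entries being the non-emulated parts.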
Jan Beulich [Thu, 22 Mar 2018 17:02:19 +0000 (18:02 +0100)]
x86emul: fix #XM delivery typo
This clearly wasn't meant the way it was originally written.
Reported-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 22 Mar 2018 09:43:54 +0000 (10:43 +0100)]
x86/shadow: fold sh_x86_emulate_{write,cmpxchg}() into their only callers
The functions have a single caller only and are now guest paging type
independent (except for the tracing part), so have no need to exist as
standalone ones, let alone multiple times. Replace the two prior hooks
with just a single one for dealing with tracing.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 22 Mar 2018 09:43:21 +0000 (10:43 +0100)]
x86/shadow: fully move unmap-dest into common code
By adding guest PTE size to shadow emulation context, the work begun by
commit 2c80710a78 ("x86/shadow: compile most write emulation code just
once") can be completed, paving the road for further movement into
common code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 22 Mar 2018 09:42:31 +0000 (10:42 +0100)]
x86/HVM: use x86emul_write_xcr()
... instead of directly calling handle_xsetbv(), to make use of the
additional checking there.
Also don't call hvm_monitor_crX(XCR0, ...) for indexes other than zero
anymore.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 22 Mar 2018 09:41:37 +0000 (10:41 +0100)]
x86/HVM: make use of new read-modify-write emulator hook
..., at least as far as currently possible, i.e. when a mapping can be
obtained.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 22 Mar 2018 09:39:32 +0000 (10:39 +0100)]
x86emul: add read-modify-write hook
In order to correctly emulate read-modify-write insns, especially
LOCKed ones, we should not issue reads and writes separately. Use a
new hook to combine both, and don't uniformly read the memory
destination anymore. Instead, DstMem opcodes without Mov now need to
have done so in their respective case blocks.
Also strip bogus _ prefixes from macro parameters when this only affects
lines which are being changed anyway.
In the test harness, besides some re-ordering to facilitate running a
few tests twice (one without and a second time with the .rmw hook in
place), tighten a few EFLAGS checks and add a test for NOT with memory
operand (in particular to verify EFLAGS don't get altered there).
For now make use of the hook optional for callers; eventually we may
want to consider making this mandatory.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
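To show why issuing the read and the write separately is not good enough for LOCKed instructions, here is a small sketch using C11 atomics, where each operation is a single atomic read-modify-write. This is not the emulator's hook interface; rmw_add() and rmw_not() are hypothetical helpers.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Emulate a LOCKed ADD on a 32-bit memory destination as one atomic
 * operation. Returns the old value, which a caller could use to
 * compute the resulting flags.
 */
static uint32_t rmw_add(_Atomic uint32_t *dst, uint32_t src)
{
    return atomic_fetch_add(dst, src);
}

/*
 * Emulate NOT on a 32-bit memory destination by XORing with all
 * ones. NOT leaves EFLAGS untouched, so only memory is updated.
 */
static uint32_t rmw_not(_Atomic uint32_t *dst)
{
    return atomic_fetch_xor(dst, UINT32_MAX);
}
```

With a separate load and store, another CPU could modify the destination between the two steps and have its update silently overwritten.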
If the ->cmpxchg() hook finds a mismatch, we should deal with this the
same way as when the "manual" comparison reports a mismatch.
This involves reverting bfce0e62c3 ("x86/emul: Drop
X86EMUL_CMPXCHG_FAILED"), albeit with X86EMUL_CMPXCHG_FAILED now
becoming a value distinct from X86EMUL_RETRY.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 22 Mar 2018 09:38:02 +0000 (10:38 +0100)]
x86emul: tell cmpxchg hook whether LOCK is in effect
This is necessary for the hook to correctly perform the operation.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Thu, 22 Mar 2018 09:36:55 +0000 (10:36 +0100)]
x86emul: adjust_bnd() should check XCR0
Experimentally MPX instructions have been confirmed to behave as NOPs
unless both related XCR0 bits are set to 1. By implication branches
then also don't clear BNDn.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
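The condition described above can be sketched like this. XCR0 bits 3 (BNDREGS) and 4 (BNDCSR) are the architectural MPX state bits; the macro and function names are chosen for this example only and are not necessarily the ones used in the emulator.

```c
#include <stdbool.h>
#include <stdint.h>

/* MPX related XCR0 bits (names are local to this sketch). */
#define XCR0_BNDREGS (1ULL << 3)
#define XCR0_BNDCSR  (1ULL << 4)

/* MPX instructions are only active if both XCR0 bits are set. */
static bool mpx_active(uint64_t xcr0)
{
    const uint64_t mask = XCR0_BNDREGS | XCR0_BNDCSR;

    return (xcr0 & mask) == mask;
}
```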
Jan Beulich [Thu, 22 Mar 2018 09:35:50 +0000 (10:35 +0100)]
x86emul: abstract out XCRn accesses
Use hooks, just like done for other special purpose registers.
This includes moving XCR0 checks from hvmemul_get_fpu() to the emulator
itself as well as adding support for XGETBV emulation.
For now fuzzer reads will obtain the real values (minus the fuzzing of
the hook pointer itself).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> [tracing parts]
Doug Goldstein [Thu, 15 Mar 2018 15:54:04 +0000 (10:54 -0500)]
ci: add new bits to MAINTAINERS combine with Travis
Created a new section just called 'CI' since this is adding GitLab CI
and still leaving the old Travis CI files around. This consolidates the
two sections and adds the new files as well as adding another Travis
file that was missing.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Sun, 11 Mar 2018 06:08:50 +0000 (00:08 -0600)]
ci: use GitLab CI to build
Added a GitLab CI config which has a lot more flexibility to allow us to
test a lot more distro configurations than Travis can and even build
test on FreeBSD. This includes a modified copy of scripts/travis-build
that is expected to diverge further over time as we build more than what
Travis is currently building.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Wed, 14 Mar 2018 16:23:31 +0000 (11:23 -0500)]
ci: add Dockerfile for Debian stretch
Added a Dockerfile which captures all the necessary dependencies to
build Xen on a Debian stretch system.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Tue, 13 Mar 2018 02:32:27 +0000 (21:32 -0500)]
ci: add Dockerfile for Debian jessie
Added a Dockerfile which captures all the necessary dependencies to
build Xen on a Debian jessie system.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Mon, 12 Mar 2018 17:45:00 +0000 (12:45 -0500)]
ci: add Dockerfile for Ubuntu 16.04
Added a Dockerfile which captures all the necessary dependencies to
build Xen on an Ubuntu 16.04 system.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Mon, 12 Mar 2018 17:41:33 +0000 (12:41 -0500)]
ci: add Dockerfile for Ubuntu 14.04
Added a Dockerfile which captures all the necessary dependencies to
build Xen on an Ubuntu 14.04 system.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Mon, 12 Mar 2018 17:40:45 +0000 (12:40 -0500)]
ci: add Dockerfile for CentOS 7.2
Added a Dockerfile which captures all the necessary dependencies to
build Xen on a CentOS 7.2 system.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Doug Goldstein [Tue, 13 Mar 2018 03:15:07 +0000 (22:15 -0500)]
ci: add README and makefile for containers
Add a basic README explaining the containers and how people can use them
to test locally if they see an error in CI and want to reproduce it.
Add a makefile to help with building and pushing the
containers to the container registry.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Joe Jin [Wed, 14 Mar 2018 17:14:03 +0000 (10:14 -0700)]
xenbaked.c: Avoid divide by zero issue
In xenbaked.c, dump_stats() computes run_time = time(&end_time) - time(&start_time),
and time() returns the value in seconds. If one cancels xenmon.py immediately
after it has started, run_time can be zero, and xenbaked will then hit a
divide-by-zero fault.
Signed-off-by: Joe Jin <joe.jin@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
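A guard of the kind this fix calls for could look like the sketch below. It is not the actual xenbaked.c change; events_per_second() is a hypothetical helper, and returning 0 for a zero-length run is an assumption made for the example.

```c
#include <time.h>

/*
 * Average number of events per second between 'start' and 'end'.
 * time() has a one second granularity, so a very short run yields
 * run_time == 0; report 0 instead of dividing by zero.
 */
static double events_per_second(unsigned long events, time_t start,
                                time_t end)
{
    time_t run_time = end - start;

    if (run_time <= 0)
        return 0.0;

    return (double)events / (double)run_time;
}
```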
Paul Durrant [Tue, 20 Mar 2018 18:05:25 +0000 (18:05 +0000)]
x86/hvm: add stricter permissions checks to ioreq server control plane
There has always been an intention in the ioreq server API that only the
domain that creates an ioreq server should be able to manipulate it.
However, so far, nothing has enforced this. This means that two domains
with DM_PRIV over a target domain can currently manipulate each other's
ioreq servers.
A previous patch added code to take a reference and store a pointer to the
domain that creates an ioreq server. This patch now adds checks to the
functions that manipulate the ioreq server to make sure they are being
called by the same domain.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
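The kind of check being added can be sketched as follows, assuming the ioreq server stores a pointer to the domain that created it. The structures are cut down to the bare minimum, and the function name and error codes are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Cut-down stand-ins for the real structures. */
struct domain {
    unsigned int domain_id;
};

struct ioreq_server {
    struct domain *emulator;    /* domain that created the server */
};

/*
 * Return 0 if 'caller' is allowed to manipulate 's', i.e. if it is
 * the domain that created it, or a negative error code otherwise.
 */
static int ioreq_server_check_caller(const struct ioreq_server *s,
                                     const struct domain *caller)
{
    if (s == NULL)
        return -ENOENT;

    if (s->emulator != caller)
        return -EPERM;

    return 0;
}
```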
Paul Durrant [Tue, 20 Mar 2018 18:05:24 +0000 (18:05 +0000)]
x86/hvm: re-structure some of the ioreq server look-up loops
This patch is a cosmetic re-structuring of some of the loops which look up
an ioreq server based on target domain and server id.
The restructuring is done separately here to ease review of a subsequent
patch.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Tue, 20 Mar 2018 18:05:23 +0000 (18:05 +0000)]
x86/hvm: take a reference on ioreq server emulating domain
When an ioreq server is created the code currently stores the id
of the emulating domain, but does not take a reference on that domain.
This patch modifies the code to hold a reference for the lifetime of the
ioreq server.
NOTE: ioreq servers are either destroyed explicitly or destroyed implicitly
in context of XEN_DOMCTL_destroydomain.
If the emulating domain is shut down prior to the target then
any domain reference held by an ioreq server will prevent it from
being destroyed. However, if an emulating domain is shut down prior
to its target then it is likely that the target's vcpus will block
fairly quickly waiting for emulation that will never occur, and when
the target domain is destroyed the reference on the zombie emulating
domain will be dropped allowing both to be cleaned up.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Tue, 20 Mar 2018 18:05:22 +0000 (18:05 +0000)]
x86/hvm: stop passing explicit domid to hvm_create_ioreq_server()
Only in the legacy 'default server' case do we pass anything other than
current->domain->domain_id, and in that case we pass the value of
HVM_PARAM_DM_DOMAIN.
The only known user of HVM_PARAM_DM_DOMAIN is qemu-trad (and only when
compiled as a stubdom), which always sets it to DOMID_SELF (ignoring the
return value of xc_set_hvm_param) [1] and never reads it.
This patch:
- Disallows setting HVM_PARAM_DM_DOMAIN to anything other than DOMID_SELF
and removes the call to hvm_set_dm_domain().
- Stops passing a domid to hvm_create_ioreq_server()
- Changes hvm_create_ioreq_server() to always set
current->domain->domain_id as the domid of the emulating domain
- Removes the hvm_set_dm_domain() implementation since it is no longer
needed.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Whether or not a vCPU has a soft-affinity which is
effective, i.e., with the power of actually affecting
the scheduling of the vCPU itself, rarely changes.
Very, very rarely, as compared to how often
we need to check for the same thing (basically, at
every scheduling decision!).
That can be improved by storing in a (per-vCPU) flag
(it's actually a boolean field in struct vcpu) whether
or not, considering what hard-affinity and soft-affinity
look like, soft-affinity should be taken into
account during scheduling decisions.
This saves some cpumask manipulations, which is nice,
considering how frequently they were being done. Note
that we can't get rid of 100% of the cpumask operations
involved in the check, because whether soft-affinity is
effective depends not only on the relationship
between the hard and soft-affinity masks of a vCPU, but
also on the online pCPUs and/or on which pCPUs are part
of the cpupool where the vCPU lives, and that's rather
impractical to store in a per-vCPU flag. Still, the
overhead is reduced to "just" one cpumask_subset() (and
only if the newly introduced flag is 'true')!
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
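The idea can be sketched with plain 64-bit masks standing in for cpumasks. Everything below is an approximation for illustration purposes: the structure, the helpers and the exact conditions are assumptions and not a copy of the Xen code.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per pCPU; enough for up to 64 pCPUs in this sketch. */
typedef uint64_t cpumask;

static bool mask_subset(cpumask a, cpumask b)
{
    return (a & ~b) == 0;       /* every bit of a is also in b */
}

static bool mask_intersects(cpumask a, cpumask b)
{
    return (a & b) != 0;
}

struct vcpu_aff {
    cpumask hard;
    cpumask soft;
    bool soft_aff_effective;    /* cached, updated on affinity change */
};

/* Slow path: run only when the hard or soft affinity changes. */
static void update_soft_aff_flag(struct vcpu_aff *v)
{
    v->soft_aff_effective = !mask_subset(v->hard, v->soft) &&
                            mask_intersects(v->soft, v->hard);
}

/* Hot path: at most one subset check, and only if the flag is set. */
static bool has_soft_affinity(const struct vcpu_aff *v, cpumask pool_cpus)
{
    return v->soft_aff_effective && !mask_subset(pool_cpus, v->soft);
}
```

The part that depends on the cpupool stays in the hot path, since the set of pCPUs in a pool can change independently of any vCPU's affinity.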
Dario Faggioli [Wed, 21 Mar 2018 17:17:46 +0000 (17:17 +0000)]
xen: sched: improve checking soft-affinity
The function has_soft_affinity() determines whether the soft-affinity
of a vcpu will have any effect -- that is, whether the affinity will
have any difference, scheduling-wise, from an empty soft-affinity
mask.
This function takes a custom cpumask as its third parameter for better
flexibility; but that mask is different from the vCPU's hard-affinity
only in one case. Getting rid of that parameter not only simplifies
the function, but enables optimizing the soft affinity check.
It's mostly mechanical, with the exception of
sched_credit.c:_csched_cpu_pick(), which was the one case where we
passed in something other than the existing hard-affinity.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Wed, 21 Mar 2018 17:17:45 +0000 (17:17 +0000)]
xen: sched: optimize exclusive pinning case (Credit1 & 2)
Exclusive pinning of vCPUs is used, sometimes, for
achieving the highest level of determinism, and the
least possible overhead, for the vCPUs in question.
Although static 1:1 pinning is not recommended, for
general use cases, optimizing the tickling code (of
Credit1 and Credit2) is easy and cheap enough, so go
for it.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Wed, 21 Mar 2018 17:17:44 +0000 (17:17 +0000)]
xen: sched: introduce 'adjust_affinity' hook.
For now, just as a way to give a scheduler a "heads up"
about the fact that the affinity changed.
This enables some optimizations, such as pre-computing
and storing (e.g., in flags) facts like a vcpu being
exclusively pinned to a pcpu, or having or not a
soft affinity. I.e., conditions that, despite the fact
that they rarely change, are right now checked very
frequently, even in hot paths.
Note that, as we expect many scheduler specific
implementations of the adjust_affinity hook to do
something with the per-scheduler vCPU private data,
this commit moves the calls to sched_set_affinity()
after that is allocated (in sched_init_vcpu()).
Note also that this, in future, may turn out to be a useful
means for, e.g., having the schedulers vet, ack or nack
the changes themselves.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
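As a sketch of what such a hook makes possible, the example below pre-computes an "exclusively pinned" flag whenever the hard affinity changes. All names (vcpu_sketch, scheduler_sketch, set_affinity_sketch) and the use of a 64-bit mask are assumptions made for the example; they do not mirror the actual interfaces.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vcpu_sketch {
    uint64_t hard_affinity;     /* one bit per pCPU */
    bool exclusively_pinned;    /* cached by the scheduler's hook */
};

struct scheduler_sketch {
    /* Optional hook, called when the affinity of a vCPU changes. */
    void (*adjust_affinity)(struct vcpu_sketch *v, uint64_t new_hard);
};

/* A vCPU is exclusively pinned if exactly one bit is set. */
static void credit_adjust_affinity(struct vcpu_sketch *v, uint64_t new_hard)
{
    v->exclusively_pinned = new_hard != 0 &&
                            (new_hard & (new_hard - 1)) == 0;
}

/* Give the scheduler its "heads up", then update the affinity. */
static void set_affinity_sketch(const struct scheduler_sketch *ops,
                                struct vcpu_sketch *v, uint64_t new_hard)
{
    if (ops->adjust_affinity != NULL)
        ops->adjust_affinity(v, new_hard);

    v->hard_affinity = new_hard;
}
```

Hot paths can then test the cached flag instead of computing the weight of the mask every time.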
Julien Grall [Wed, 21 Mar 2018 03:34:35 +0000 (03:34 +0000)]
xen/arm: gic: Read unconditionally the source from the LRs
Commit 5cb00d1 "ARM: GIC: extend LR read/write functions to cover EOI
and source" extended gic_lr to cover the source. The new field was only
set for SGI interrupts in the read function. However, the write function
writes the field unconditionally for virtual interrupts.
This means that if the caller combines the two functions (e.g. to
update the LR), the source needs to be set to 0 by the caller.
Unfortunately, gic_update_one_lr does not zero the structure before
reading the LRs. This can trigger the assert at random.
Instead of zeroing the structure in gic_update_one_lr, make sure that
the source is written unconditionally on read. This also simplifies
the code by avoiding an if statement in the read path.
Lastly, update the comment in write_lr, which mistakenly
referred to the read LR path.
Dario Faggioli [Thu, 15 Mar 2018 17:51:38 +0000 (18:51 +0100)]
tools: xenpm: continue to support {set, get}-vcpu-migration-delay
Now that it is possible to get and set the migration
delay via the SCHEDOP sysctl, use that in xenpm, instead
of the special purpose libxc interface (which will be
removed in a following commit).
The sysctl, however, requires a cpupool-id argument,
to know which scheduler it is operating on. In
this case, since we don't want to alter xenpm's command
line interface, we always use '0', which means xenpm
will always act on the default cpupool ('Pool-0').
From this commit on, `xenpm {set,get}-vcpu-migration-delay'
commands work again. But that is only for the sake of
backward compatibility, and their use is deprecated, in
favour of 'xl sched-credit -s [-c <poolid>] -m <delay>'.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Thu, 15 Mar 2018 17:51:30 +0000 (18:51 +0100)]
tools: libxl/xl: allow to get/set Credit1's vcpu_migration_delay
Make it possible to get and set a (Credit1) scheduler's
vCPU migration delay via the SCHEDOP sysctl, from both
libxl and xl (no change needed in libxc).
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Dario Faggioli [Thu, 15 Mar 2018 17:51:23 +0000 (18:51 +0100)]
xen: sched/credit1: make vcpu_migration_delay per-cpupool
Right now, vCPU migration delay is controlled by
the vcpu_migration_delay boot parameter. This means
the same value will always be used for every instance
of Credit1, in any cpupool that will be created.
Also, in order to get and set such value, a special
purpose libxc interface is defined, and used by the
xenpm tool. And this is problematic if Xen is built
without Credit1 support.
This commit adds a vcpu_migr_delay field inside
struct csched_private, so that we can get/set the
migration delay independently for each Credit1 instance,
in different cpupools.
Getting and setting now happens via XEN_SYSCTL_SCHEDOP_*,
which is much better suited for this parameter.
The value of the boot time parameter is used for
initializing the vcpu_migr_delay field of the private
structure of all the scheduler instances, when they're
created.
While there, save reading NOW() and doing any s_time_t
operation, when the migration delay of a scheduler is
zero (as it is, by default), in
__csched_vcpu_is_cache_hot().
Finally, note that, from this commit on, using `xenpm
{set,get}-vcpu-migration-delay' will have no effect
any longer. A subsequent commit will re-enable it, for
the sake of backwards-compatibility.
Signed-off-by: Dario Faggioli <dfaggioli@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
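The short-circuit on a zero migration delay can be sketched as below. The stub clock only exists to make the number of clock reads visible; sched_instance, stub_now() and vcpu_is_cache_hot() are hypothetical names, and s_time_t is assumed to be a signed 64-bit nanosecond count.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;       /* nanoseconds, assumed for this sketch */

/* Per-scheduler-instance data, one per cpupool. */
struct sched_instance {
    s_time_t vcpu_migr_delay;   /* 0 means "no migration delay" */
};

/* Stub clock standing in for NOW(); counts how often it is read. */
static s_time_t stub_now_value;
static unsigned int stub_now_reads;

static s_time_t stub_now(void)
{
    stub_now_reads++;
    return stub_now_value;
}

/*
 * A vCPU is considered cache hot if it last ran less than the
 * migration delay ago. With a zero delay, the && short-circuits and
 * the clock is never read.
 */
static bool vcpu_is_cache_hot(const struct sched_instance *prv,
                              s_time_t last_sched_time)
{
    return prv->vcpu_migr_delay != 0 &&
           (stub_now() - last_sched_time) < prv->vcpu_migr_delay;
}
```

Since the delay lives in the per-instance structure, two cpupools can use different values without affecting each other.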
Doug Goldstein [Tue, 20 Mar 2018 10:23:29 +0000 (11:23 +0100)]
fix null sched build with clang and debug=n
The null_dom() static inline is only used when debug=y, so with clang it
results in an error with the default CFLAGS and debug=n. This function
is used in only one place and is a one-line helper, so remove it until we
actually need it.
David E. Box [Tue, 20 Mar 2018 10:21:58 +0000 (11:21 +0100)]
x86/mwait-idle: add Gemini Lake support
Gemini Lake uses the same C-states as Broxton and also uses the
IRTL MSRs to determine maximum C-state latency.
Signed-off-by: David E. Box <david.e.box@linux.intel.com> Acked-by: Len Brown <len.brown@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[Linux commit 1b2e87687d3f951a66900cab6f1583d94099d2f7] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Thu, 15 Mar 2018 20:30:13 +0000 (20:30 +0000)]
ARM: GIC: extend LR read/write functions to cover EOI and source
So far our LR read/write functions do not handle the EOI bit and the
source CPUID bits in an LR, because the current VGIC implementation does
not use them.
Extend the gic_lr data structure to hold these bits of information by
using a union to differentiate the fields used depending on whether the vIRQ
has a corresponding pIRQ.
This allows the new VGIC to use this information.
This is based on the original patch sent by Andre Przywara [1].
Julien Grall [Thu, 15 Mar 2018 20:30:09 +0000 (20:30 +0000)]
xen/arm: vgic: Override the group in lr everytime
At the moment, write_lr assumes the caller will set the group
correctly. However, the group should always be 0 when the guest is using
vGICv2 and 1 for vGICv3. As the caller should not care about the group,
override it directly.
With that change, write_lr now behaves like update_lr for the group.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Andre Przywara <andre.przywara@arm.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org>
Andre Przywara [Thu, 15 Mar 2018 20:30:07 +0000 (20:30 +0000)]
ARM: Implement vcpu_kick()
If we change something in a vCPU that affects its runnability or
otherwise needs the vCPU's attention, we might need to tell the scheduler
about it.
We are using this in one place (vIRQ injection) at the moment, but will
need this at more places soon.
So let's factor out this functionality, using the already existing
vcpu_kick() prototype (used in x86 only so far), to make this available
to the rest of the Xen code.
Also adjust the perfcounter name to reflect the new usage.
Andre Przywara [Thu, 15 Mar 2018 20:30:06 +0000 (20:30 +0000)]
ARM: VGIC: rename gic_event_needs_delivery()
gic_event_needs_delivery() is not named very intuitively, especially
the gic_ prefix is somewhat misleading.
Rename it to vgic_vcpu_pending_irq(), which makes it clear that this
relates to the virtual GIC and is about interrupts.
Also add a VCPU parameter, which makes the code more flexible in the
future. The current VGIC expects this to be the current VCPU, so add
an assert to spot any regressions.
Amit Singh Tomar [Sun, 18 Mar 2018 09:20:26 +0000 (14:50 +0530)]
xen/arm: Fix platform name to xilinx_zynqmp from xgene_storm
Signed-off-by: Amit Singh Tomar <amittomer25@gmail.com> Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com> Acked-by: Julien Grall <julien.grall@arm.com>
Julien Grall [Mon, 12 Mar 2018 15:34:52 +0000 (15:34 +0000)]
xen/arm: p2m: Prevent deadlock when using memaccess
Commit 7d623b358a4 "arm/mem_access: Add long-descriptor based gpt"
assumed the read-write lock can be taken recursively. However, this
assumption is wrong and will lead to deadlock when the lock is
contended.
The read lock is taken recursively in the following case:
1) get_page_from_gva
=> Take the read lock (first read lock)
=> Call p2m_mem_access_check_and_get_page on failure when
memaccess is enabled
2) p2m_mem_access_check_and_get_page
=> If hardware translation failed, fall back to software lookup
=> Call guest_walk_tables
3) guest_walk_tables
=> Will use access_guest_memory_by_ipa to access stage-1 page-table
4) access_guest_memory_by_ipa
=> Because Arm does not have a hardware instruction to do only the
stage-2 page-table translation, this is done in software.
=> Take the read lock (second read lock)
To avoid the nested lock, rework the locking in get_page_from_gva and
p2m_mem_access_check_and_get_page. The latter will now be called without
the p2m lock. The new locking in p2m_mem_access_check_and_get_page will
not cover the translation of the VA to an IPA.
This is fine because we can't promise that the stage-1 page-tables have
not changed behind our back (they are under guest control). Modification in
the stage-2 page-table can now happen, but I can't see any potential
issue here except with the break-before-make sequence used when updating
page-table. gva_to_ipa may fail if the sequence is executed at the same
time on another CPU. In that case we would fall back to the software lookup
path.
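The change in lock nesting can be pictured with a toy model. This is only a
sketch: the lock below is a single-threaded stand-in that records how deeply
read acquisitions nest, and every name in it is hypothetical rather than
taken from the Xen source.

```c
#include <assert.h>

/*
 * Toy stand-in for a read-write lock (NOT the Xen rwlock): it only records
 * how deeply read acquisitions are nested.
 */
struct toy_rwlock {
    unsigned int read_depth;
    unsigned int max_read_depth;
};

static void toy_read_lock(struct toy_rwlock *l)
{
    l->read_depth++;
    if ( l->read_depth > l->max_read_depth )
        l->max_read_depth = l->read_depth;
}

static void toy_read_unlock(struct toy_rwlock *l)
{
    assert(l->read_depth > 0);
    l->read_depth--;
}

/* Stand-in for the software walk, which takes the read lock itself. */
static void software_lookup(struct toy_rwlock *l)
{
    toy_read_lock(l);
    toy_read_unlock(l);
}

/* Old shape: the fallback runs while the caller still holds the lock. */
static void lookup_nested(struct toy_rwlock *l)
{
    toy_read_lock(l);
    software_lookup(l);          /* second read lock is taken here */
    toy_read_unlock(l);
}

/* Reworked shape: the lock is dropped before the fallback is called. */
static void lookup_reworked(struct toy_rwlock *l)
{
    toy_read_lock(l);
    /* ... the hardware translation attempt would go here ... */
    toy_read_unlock(l);
    software_lookup(l);
}
```

In the nested shape the read depth reaches 2. With a real lock that is where
a writer queued between the two acquisitions could block the second one
while the first is still held.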
A recent update to the ARM SMCCC_ARCH_WORKAROUND_1 specification (see [1])
allows firmware to return a non-zero, positive value, to describe that
although the mitigation is implemented at the higher exception level,
the CPU on which the call is made is not affected.
Relax the check on the return value from ARM_WORKAROUND_1 so that we
only error out if the returned value is negative.
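A minimal sketch of the relaxed check, assuming the firmware return value is
available as a signed 32-bit integer. The helper name is hypothetical and
this is not the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the Xen source): decide whether the result
 * of the ARM_WORKAROUND_1 discovery call should be treated as a success.
 *
 * Zero and positive values are both accepted; only negative values are
 * treated as errors.
 */
static bool workaround_1_call_ok(int32_t ret)
{
    return ret >= 0;
}
```

A stricter "ret == 0" test would reject the positive value that the updated
specification allows.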
[1] https://developer.arm.com/support/security-update/downloads
"Firmware interfaces for mitigating CVE-2017-5715 System Software on Arm
Systems"
Julien Grall [Thu, 8 Mar 2018 15:24:04 +0000 (15:24 +0000)]
xen/arm: Restrict when a physical IRQ can be routed/removed from/to a domain
Xen currently allows an interrupt to be routed to, or removed from, a
domain while it is running.
However, we never sync the virtual interrupt state to the physical
interrupt. This could lead to undesirable effects on the vGIC emulation
and potentially the hardware.
One solution would be to sync the interrupt state when routing, but I am
not sure it is worth the effort as you never really know when it is safe to
route/remove the interrupt when a domain is running.
Jan Beulich [Fri, 16 Mar 2018 16:27:36 +0000 (17:27 +0100)]
x86: correct EFLAGS.IF in SYSENTER frame
Commit 9d1d31ad94 ("x86: slightly reduce Meltdown band-aid overhead")
moved the STI past the PUSHF. While this isn't an active problem (as we
force EFLAGS.IF to 1 before exiting to guest context), let's not risk
internal confusion by finding a PV guest frame with interrupts
apparently off.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 8 Mar 2018 19:24:58 +0000 (19:24 +0000)]
xen/mm: Clean up share_xen_page_with_guest() API
The share_xen_page_with_guest() functions are used by common code, and are
implemented the same by each arch. Move the declarations into the common mm.h
rather than duplicating them in each arch/mm.h.
Turn an int readonly into a boolean enum, to retain ro/rw context at the
callsites, but use shorter labels which avoids a large number of split lines.
Implement share_xen_page_with_privileged_guests() as a static inline wrapper
around share_xen_page_with_guest() to avoid having a call into a separate
translation unit whose only purpose is to shuffle function arguments.
No functional change.
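The shape described above, a two-valued enum in place of an int flag plus a
static inline wrapper, might look roughly as follows. All types, names and
the fixed domain are illustrative stand-ins, not the real Xen declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the real structures. */
struct toy_page   { int shared_with; int writable; };
struct toy_domain { int id; };

/* Two-valued enum replacing an "int readonly" parameter. */
enum toy_share_flags { TOY_SHARE_rw, TOY_SHARE_ro };

/* Hypothetical fixed target used by the wrapper below. */
static struct toy_domain toy_privileged_domain = { 0 };

static void toy_share_page_with_guest(struct toy_page *pg,
                                      struct toy_domain *d,
                                      enum toy_share_flags flags)
{
    assert(pg != NULL && d != NULL);
    pg->shared_with = d->id;
    pg->writable = (flags == TOY_SHARE_rw);
}

/* Static inline wrapper: it only supplies the fixed domain argument. */
static inline void toy_share_page_with_privileged_guests(
    struct toy_page *pg, enum toy_share_flags flags)
{
    toy_share_page_with_guest(pg, &toy_privileged_domain, flags);
}
```

A call such as toy_share_page_with_privileged_guests(pg, TOY_SHARE_ro) keeps
the ro/rw intent visible at the callsite, which a bare 0 or 1 would not.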
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 8 Mar 2018 12:39:36 +0000 (12:39 +0000)]
xen/domain: Pass the full domctl_createdomain struct to create_domain()
In future patches, the structure will be extended with further information,
and this is far cleaner than adding extra parameters.
One minor tweak is that the setting of guest_type needs to be deferred until
config is known-good to dereference, but this doesn't result in any changed
behaviour as system domains never used to pass XEN_DOMCTL_CDF_hvm_guest.
Also for completeness, move the setting of d->handle into the tail of
domain_create() where it more logically should live.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Thu, 8 Mar 2018 17:25:29 +0000 (17:25 +0000)]
x86/domain: Optimise the order of actions in arch_domain_create()
The only relevant initialisation for the idle domain is the context switch and
poisoned pointers. Collect these bits together early in the function and exit
when complete (although as a consequence, the e820 and vtsc lock
initialisation are moved forwards). This allows us to remove subsequent
is_idle_domain() checks and unindent most of the logic.
Furthermore, we no longer call these functions for the idle domain:
* mapcache_domain_init() and tsc_set_info() were previously guarded against
the idle domain, and have had their guards turned into ASSERT()s.
* pit_init() is implicitly guarded by has_vpit().
* psr_domain_init() no longer allocates a socket array.
Finally, two changes are introduced for the benefit of the following patch:
* For PV hardware domains, or XEN_X86_EMU_PIT into emflags rather than into
config->emulation_flags, to facilitate config becoming const.
* References to domcr_flags are moved until after the idle early exit, to
facilitate them being unavailable for system domains.
No practical change in behaviour.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Thu, 8 Mar 2018 13:58:41 +0000 (13:58 +0000)]
x86/domain: Remove unused parameters from {hvm,pv}_domain_initialise()
Neither domcr_flags nor config are used on either side. Drop them, making
{hvm,pv}_domain_initialise() symmetric with all the other domain/vcpu
initialise/destroy calls.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Andrew Cooper [Thu, 8 Mar 2018 11:31:47 +0000 (11:31 +0000)]
xen/domain: Drop all DOMCRF_* constants
With DOMCRF_dummy removed, all remaining DOMCRF_* identically match their
DOMCTL counterparts. Avoid having a conversion between two different bit
layouts, and use the DOMCTL_CDF_* constants everywhere.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Thu, 8 Mar 2018 11:03:17 +0000 (11:03 +0000)]
xen/domain: Drop DOMCRF_dummy
At the moment, there is a tight coupling between the domid and the use of
DOMCRF_dummy. Instead of using DOMCRF_dummy, base the one relevant decision
on domid alone.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Julien Grall <julien.grall@arm.com>
Jan Beulich [Thu, 15 Mar 2018 16:01:33 +0000 (17:01 +0100)]
x86emul: place test blobs in executable section
This allows the section contents to be disassembled without going
through any extra hoops, simplifying the analysis of problems in test
and/or emulation code.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com>
Liran Alon [Thu, 15 Mar 2018 15:59:52 +0000 (16:59 +0100)]
x86/vlapic: clear TMR bit upon acceptance of edge-triggered interrupt to IRR
According to Intel SDM section "Interrupt Acceptance for Fixed Interrupts":
"The trigger mode register (TMR) indicates the trigger mode of the
interrupt (see Figure 10-20). Upon acceptance of an interrupt
into the IRR, the corresponding TMR bit is cleared for
edge-triggered interrupts and set for level-triggered interrupts.
If a TMR bit is set when an EOI cycle for its corresponding
interrupt vector is generated, an EOI message is sent to
all I/O APICs."
Before this patch TMR-bit was cleared on LAPIC EOI which is not what
real hardware does. This was also confirmed in KVM upstream commit a0c9a822bf37 ("KVM: dont clear TMR on EOI").
Behavior after this patch is aligned with both Intel SDM and KVM
implementation.
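The quoted SDM behaviour can be modelled with two bitmaps. This is a sketch
under assumptions: the structure and function names are made up for
illustration and do not reflect the Xen vlapic layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_VECTORS 256

/* Toy local APIC state: one bit per vector (not the Xen vlapic layout). */
struct toy_lapic {
    uint32_t irr[TOY_NR_VECTORS / 32];
    uint32_t tmr[TOY_NR_VECTORS / 32];
};

static bool toy_test_bit(const uint32_t *map, unsigned int nr)
{
    assert(nr < TOY_NR_VECTORS);
    return (map[nr / 32] >> (nr % 32)) & 1U;
}

/* Acceptance into the IRR: TMR is set for level, cleared for edge. */
static void toy_accept_irq(struct toy_lapic *l, unsigned int vec, bool level)
{
    uint32_t mask = 1U << (vec % 32);

    assert(vec < TOY_NR_VECTORS);
    if ( level )
        l->tmr[vec / 32] |= mask;
    else
        l->tmr[vec / 32] &= ~mask;
    l->irr[vec / 32] |= mask;
}

/* EOI: TMR only decides whether I/O APICs are told; it is not modified. */
static bool toy_eoi_notifies_ioapics(const struct toy_lapic *l,
                                     unsigned int vec)
{
    return toy_test_bit(l->tmr, vec);
}
```

In this model a vector first accepted as level-triggered and later as
edge-triggered stops notifying the I/O APICs on EOI, because the bit is
cleared at acceptance rather than at EOI.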
Signed-off-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Thu, 15 Mar 2018 11:45:30 +0000 (12:45 +0100)]
x86/VMX: don't risk corrupting host CR4
Instead of "syncing" the live value to what mmu_cr4_features has, make
sure vCPU-s run with the value most recently loaded into %cr4, such that
after the next VM exit we continue to run with the intended value rather
than a possibly stale one.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 15 Mar 2018 11:44:24 +0000 (12:44 +0100)]
x86: ignore guest microcode loading attempts
The respective MSRs are write-only, and hence attempts by guests to
write to these are - as of 1f1d183d49 ("x86/HVM: don't give the wrong
impression of WRMSR succeeding") - no longer ignored. Restore original
behavior for the two affected MSRs.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Anthony PERARD [Wed, 14 Mar 2018 15:00:14 +0000 (15:00 +0000)]
docs: Fix entry for the "usbdev" option
The man page for xl.cfg has the "devtype=hostdev" option, but xl only
understands "type=hostdev". Fix the manual to reflect the actual
implementation.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Wed, 14 Mar 2018 10:48:36 +0000 (10:48 +0000)]
x86/pv: Fix guest crashes following f75b1a5247b "x86/pv: Drop int80_bounce from struct pv_vcpu"
The original init_int80_direct_trap() was in fact buggy; `int $0x80` is not an
exception. This went unnoticed for years because int80_bounce and trap_bounce
were separate structures, but were combined by this change.
Exception handling is different to interrupt handling for PV guests. By
reusing trap_bounce, the following corner case can occur:
* Handle a guest `int $0x80` instruction. Latches TBF_EXCEPTION into
trap_bounce.
* Handle an exception, which emulates to success (such as ptwr support),
which leaves trap_bounce unmodified.
* The exception exit path sees TBF_EXCEPTION set and re-injects the `int
$0x80` a second time.
Drop the TBF_EXCEPTION from the int80 invocation, which matches the equivalent
logic from the syscall/sysenter paths.
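The corner case above can be replayed on a toy model. Everything here is a
simplification for illustration: the flag value, structure and function
names are assumptions, not the actual PV trap handling code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_TBF_EXCEPTION 1U    /* illustrative flag value */

/* Toy version of the shared bounce structure. */
struct toy_trap_bounce {
    unsigned int flags;
    unsigned int vector;
};

/* int $0x80 handling; mark_as_exception selects the old (buggy) behaviour. */
static void toy_latch_int80(struct toy_trap_bounce *tb, bool mark_as_exception)
{
    assert(tb != NULL);
    tb->vector = 0x80;
    tb->flags = mark_as_exception ? TOY_TBF_EXCEPTION : 0;
}

/* An exception that is emulated successfully leaves the bounce untouched. */
static void toy_emulate_exception(struct toy_trap_bounce *tb)
{
    (void)tb;
}

/* The exception exit path injects whatever is still flagged. */
static bool toy_exception_exit_injects(const struct toy_trap_bounce *tb)
{
    return (tb->flags & TOY_TBF_EXCEPTION) != 0;
}
```

With mark_as_exception set, the stale flag survives the emulated exception
and the exit path injects again; with it clear, nothing is re-injected.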
Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Anthony PERARD [Tue, 13 Mar 2018 11:13:18 +0000 (11:13 +0000)]
libxl_qmp: Tell QEMU about live migration or snapshot
Since version 2.10, QEMU will lock the disk images so a second QEMU
instance will not try to open them. This would prevent live migration from
working correctly. A new parameter has been added to the QMP command
"xen-save-devices-state" in QEMU version 2.11 which allows unlocking the
disk image for a live migration, while keeping it locked for a snapshot.
xenalyze.c: In function 'find_symbol':
xenalyze.c:382:36: error: 'snprintf' output may be truncated before the last format character [-Werror=format-truncation=]
snprintf(name, 128, "(%s +%llx)",
^
xenalyze.c:382:5: note: 'snprintf' output between 6 and 144 bytes into a destination of size 128
snprintf(name, 128, "(%s +%llx)",
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lastname, offset);
~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Michael Young [Mon, 12 Mar 2018 18:49:29 +0000 (18:49 +0000)]
make xen ocaml safe-strings compliant
Xen built with ocaml 4.06 gives errors such as
Error: This expression has type bytes but an expression was
expected of type string
as Bytes and safe-strings which were introduced in 4.02 are the
default in 4.06.
This patch which is partly by Richard W.M. Jones of Red Hat
from https://bugzilla.redhat.com/show_bug.cgi?id=1526703
fixes these issues.
Signed-off-by: Michael Young <m.a.young@durham.ac.uk> Reviewed-by: Christian Lindig <christian.lindig@citrix.com>
When building debug, use -Og as the optimization level if it's available,
otherwise retain the use of -O0. -Og has been added by GCC to enable all
optimizations that do not affect debugging while retaining full
debuggability.
Signed-off-by: Doug Goldstein <cardoe@cardoe.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
George Dunlap [Fri, 9 Mar 2018 11:04:18 +0000 (11:04 +0000)]
docs: Remove redundant qemu-xen-security document
All this information is now covered in SUPPORT.md.
Most of the emulated hardware is obvious, but a couple of the items are
worth pointing out specifically.
"xen_disk" is listed under "Blkback"
"...the PCI host bridge and the PIIX3 chipset...": This statement is
redundant -- the PCI host bridge is a part of the piix3 chipset, which
is listed as supported.
xenfb: The "graphics" side of "xenfb" is listed under "PV Framebuffer
(backend)", and the "input" side of "xenfb" (including both keyboard
and mouse) is listed under "PV Keyboard (backend)".
Backing storage image format is listed in the "Blkback" section.
Fix 'stdvga' spelling while we're here.
Signed-off-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:33 +0000 (15:11 +0000)]
ARM: GICv2: fix GICH_V2_LR definitions
The bit definition for the CPUID mask in the GICv2 LR register was
wrong; fortunately the current implementation does not use that bit.
Fix it up (it's starting at bit 10, not bit 9) and clean up some
nearby definitions on the way.
This will be used by the new VGIC shortly.
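For illustration, a field that starts at bit 10 could be defined and accessed
as below. The macro and function names are hypothetical, and the 3-bit field
width is an assumption not stated in this message; only the starting bit
comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Starting bit from the commit text; the width is an assumption. */
#define TOY_LR_CPUID_SHIFT 10
#define TOY_LR_CPUID_WIDTH 3
#define TOY_LR_CPUID_MASK  ((1U << TOY_LR_CPUID_WIDTH) - 1)

/* Replace the CPUID field of an LR value, leaving all other bits alone. */
static uint32_t toy_lr_set_cpuid(uint32_t lr, unsigned int cpuid)
{
    assert(cpuid <= TOY_LR_CPUID_MASK);
    lr &= ~(TOY_LR_CPUID_MASK << TOY_LR_CPUID_SHIFT);
    return lr | ((uint32_t)cpuid << TOY_LR_CPUID_SHIFT);
}

/* Extract the CPUID field from an LR value. */
static unsigned int toy_lr_get_cpuid(uint32_t lr)
{
    return (lr >> TOY_LR_CPUID_SHIFT) & TOY_LR_CPUID_MASK;
}
```

With the shift off by one (9 instead of 10), the field would overlap bit 9,
which belongs to the neighbouring field.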
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:32 +0000 (15:11 +0000)]
ARM: GICv3: poke_irq: make RWP optional
A GICv3 hardware implementation can be implemented in several parts that
communicate with each other (think multi-socket systems).
To make sure that critical settings have arrived at all endpoints, some
bits are tracked using the RWP bit in the GICD_CTLR register, which
signals whether a register write is still in progress.
However this only applies to *some* registers, namely the bits in the
GICD_ICENABLER (disabling interrupts) and some bits in the GICD_CTLR
register (cf. Arm IHI 0069D, 8.9.4: RWP, bit[31]).
But our gicv3_poke_irq() was always polling this bit before returning,
resulting in pointless MMIO reads for many registers.
Add an option to gicv3_poke_irq() to state whether we want to wait for
this bit and use it accordingly to match the spec.
Replace a "1 << " with a "1U << " on the way to fix a potentially
undefined behaviour when the argument evaluates to 31.
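A rough sketch of both points, the optional wait and the unsigned shift,
against a simulated register bank. The structure, the poll counter and all
names are hypothetical; a 32-bit unsigned int is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_REGS 32

/* Simulated distributor: last value written per register, plus a counter
 * standing in for polls of the RWP bit. */
struct toy_gicd {
    uint32_t last_write[TOY_NR_REGS];
    unsigned int rwp_polls;
};

/* Stand-in for polling the RWP bit until the write has propagated. */
static void toy_wait_for_rwp(struct toy_gicd *g)
{
    g->rwp_polls++;
}

static void toy_poke_irq(struct toy_gicd *g, unsigned int irq, bool wait_rwp)
{
    assert(irq < TOY_NR_REGS * 32);

    /*
     * "1 << 31" would shift into the sign bit of a signed int, which is
     * undefined behaviour in C; "1U << 31" is well defined.
     */
    g->last_write[irq / 32] = 1U << (irq % 32);

    /* Only callers touching RWP-tracked registers ask for the wait. */
    if ( wait_rwp )
        toy_wait_for_rwp(g);
}
```

A caller disabling an interrupt would pass true, while a caller writing a
register that RWP does not track would pass false and skip the extra reads.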
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:31 +0000 (15:11 +0000)]
ARM: GICv2: introduce gicv2_poke_irq()
The GICv2 uses bitmaps spanning several MMIO registers for holding some
interrupt state. Similar to GICv3, add a poke helper function to set a bit
for a given irq_desc in one of those bitmaps.
At the moment there is only one use in gic-v2.c, but there will be more
coming soon.
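How an IRQ number maps onto a bitmap that spans several 32-bit registers can
be sketched as follows. The fake MMIO region and the helper names are
assumptions for illustration; only the idea of selecting a register and a
bit from the IRQ number is taken from the text.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MMIO_SIZE 0x1000

/* Fake byte-addressed register space standing in for the distributor. */
static uint8_t toy_mmio[TOY_MMIO_SIZE];

static void toy_writel(uint32_t val, unsigned int offset)
{
    assert(offset % 4 == 0 && offset + 4 <= TOY_MMIO_SIZE);
    memcpy(&toy_mmio[offset], &val, sizeof(val));
}

static uint32_t toy_readl(unsigned int offset)
{
    uint32_t val;

    assert(offset % 4 == 0 && offset + 4 <= TOY_MMIO_SIZE);
    memcpy(&val, &toy_mmio[offset], sizeof(val));
    return val;
}

/*
 * Each 32-bit register covers 32 interrupts, so irq / 32 selects the
 * register (4 bytes apart) and irq % 32 selects the bit within it.
 */
static void toy_poke_irq(unsigned int irq, unsigned int base_offset)
{
    toy_writel(1U << (irq % 32), base_offset + (irq / 32) * 4);
}
```

For example, IRQ 37 with a bitmap based at offset 0x100 lands in the second
register (offset 0x104) as bit 5.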
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:29 +0000 (15:11 +0000)]
ARM: GICv3: rename HYP interface definitions to use ICH_ prefix
On a GICv3 in non-compat mode the hypervisor interface is always
accessed via system registers. Those register names have a "ICH_" prefix
in the manual, to differentiate them from the MMIO registers. Also those
registers are mostly 64-bit (compared to the 32-bit GICv2 registers) and
use different bit assignments.
To make this obvious and to avoid clashes with double definitions using
the same names for actually different bits, let's change all GICv3
hypervisor interface registers to use the "ICH_" prefix from the manual.
This renames the definitions in gic_v3_defs.h and their usage in gic-v3.c
and is needed to allow co-existence of the GICv2 and GICv3 definitions
in the same file.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Acked-by: Julien Grall <julien.grall@arm.com>
Andre Przywara [Fri, 9 Mar 2018 15:11:28 +0000 (15:11 +0000)]
ARM: VGIC: Introduce gic_get_nr_lrs()
So far the number of list registers (LRs) a GIC implements is only
needed in the hardware facing side of the VGIC code (gic-vgic.c).
The new VGIC will need this information in more and multiple places, so
export a function that returns the number.
Signed-off-by: Andre Przywara <andre.przywara@linaro.org> Reviewed-by: Julien Grall <julien.grall@arm.com>