In fact, right now, we recommend keeping runqueues
arranged per-core, so that it is the inter-runqueue load
balancing code that automatically spreads the work in an
SMT friendly way. This means that any other runq
arrangement one may want to use falls short of SMT
scheduling optimizations.
This commit implements SMT awareness --similar to the
one we have in Credit1-- for any possible runq
arrangement. This turned out to be pretty easy to do,
as the logic can live entirely in runq_tickle()
(although, in order to avoid for_each_cpu loops in
that function, we use a new cpumask which indeed needs
to be updated in other places).
In addition to disentangling SMT awareness from load
balancing, this also allows us to support the
sched_smt_power_savings parameter in Credit2 as well.
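As a rough illustration of the kind of mask the message above refers to, the sketch below derives "fully idle core" bits from a per-CPU idle mask. It is a toy model and not the Credit2 code: the 64-bit mask, the name smt_idle_mask() and the assumption that sibling threads have consecutive CPU ids are all invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: return a mask with a bit set for every CPU whose
 * whole core (all sibling threads) is idle. Assumes siblings have
 * consecutive ids and that nr_cpus is a multiple of threads_per_core.
 */
uint64_t smt_idle_mask(uint64_t idle, unsigned nr_cpus,
                       unsigned threads_per_core)
{
    uint64_t result = 0;

    assert(threads_per_core > 0 && nr_cpus <= 64);
    assert(nr_cpus % threads_per_core == 0);

    for (unsigned first = 0; first < nr_cpus; first += threads_per_core) {
        uint64_t core = 0;

        /* Build the mask of all threads belonging to this core. */
        for (unsigned t = 0; t < threads_per_core; t++)
            core |= UINT64_C(1) << (first + t);

        /* Keep the core only if every one of its threads is idle. */
        if ((idle & core) == core)
            result |= core;
    }
    return result;
}
```

With such a mask precomputed, a tickling function could test it directly instead of walking the siblings of each candidate CPU.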
libxl: trigger attach events for devices attached before xl devd startup
When this daemon is started after creating backend device, that device
will not be configured.
Racy situation:
1. driver domain is started
2. frontend domain is started (just after kicking driver domain off)
3. device in frontend domain is connected to the backend (as specified
in frontend domain configuration)
4. xl devd is started in driver domain
End result is that backend device in driver domain is not configured
(like network interface is not enabled), so the device doesn't work.
Fix this by artificially triggering events for devices already present in
xenstore before xl devd is started. Do this only after xenstore watch is
already registered, and only for devices not already initialized (in
XenbusStateInitWait state).
Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Add support for debugging memory allocation statistics to xenstored.
Specifying "-M <file>" on the command line will enable the feature.
Whenever xenstored receives SIGUSR1 it will dump out a full talloc
report to <file>. This helps to find e.g. memory leaks in xenstored.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xenstore: use temporary memory context for firing watches
Use a temporary memory context for memory allocations when firing
watches. This will avoid leaking memory in case of long living
connections and/or xenstore entries.
This requires adding a new parameter to fire_watches() and add_event()
to specify the memory context to use for allocations.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xenstore: add explicit memory context parameter to get_node()
Add a parameter to xenstored get_node() function to explicitly
specify the memory context to be used for allocations. This will make
it easier to avoid memory leaks by using a context which is freed
soon.
This requires adding the temporary context to errno_from_parents() and
ask_parents(), too.
When calling get_node() select a sensible memory context for the new
parameter by preferring a temporary one.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xenstore: add explicit memory context parameter to read_node()
Add a parameter to xenstored read_node() function to explicitly
specify the memory context to be used for allocations. This will make
it easier to avoid memory leaks by using a context which is freed
soon.
When calling read_node() select a sensible memory context for the new
parameter by preferring a temporary one.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xenstore: add explicit memory context parameter to get_parent()
Add a parameter to xenstored get_parent() function to explicitly
specify the memory context to be used for allocations. This will make
it easier to avoid memory leaks by using a context which is freed
soon.
When available use a temporary context when calling get_parent(),
otherwise mimic the old behavior by calling get_parent() with the same
argument for both parameters.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xenstore: call each xenstored command function with temporary context
In order to be able to avoid leaving temporary memory allocated after
processing of a command in xenstored call all command functions with
the temporary "in" context. Each function can then make use of that
temporary context for allocating temporary memory instead of either
leaving that memory allocated until the connection is dropped (or
even until end of xenstored) or freeing the memory itself.
This requires modifying the interfaces of the functions taking only
one argument from the connection by moving the call of onearg() into
the single functions. Other than that no functional change.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
xen: credit2: the private scheduler lock can be an rwlock.
In fact, the data it protects only change either at init-time,
during cpupools manipulation, or when changing domains' weights.
In all other cases (namely, load balancing, reading weights
and status dumping), information is only read.
Therefore, let the lock be a read/write one. This means there
is no full serialization point for the whole scheduler and
for all the pCPUs of the host any longer.
This is particularly good for scalability (especially when doing
load balancing).
Also, update the high level description of the locking discipline,
and take the chance for rewording it a little bit (as well as
for adding a couple of locking related ASSERT()-s).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
more specifically, with: TICKLE_NEW, RUNQ_MAX_WEIGHT,
MIGRATE, LOAD_CHECK, LOAD_BALANCE and PICKED_CPU, and
in both xenalyze and formats (for xentrace_format).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
xen/tools: improve tracing of Credit2 load tracking events
Add the shift used for the precision of the integer
arithmetic to the trace records, and update both xenalyze
and xentrace_format to make use of/print it.
In particular, in xenalyze, we can now show the
load as an (easier to interpret) percentage.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
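To make the role of the shift in the trace records concrete, here is a minimal sketch of how a fixed-point load could be turned into a percentage once the shift is known. The function name and the convention that a load of 1.0 is encoded as 1 << shift are assumptions made for the example; this is not the xenalyze code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical conversion: 'load' is a fixed-point value in which
 * 1.0 (one fully busy pCPU) is encoded as 1 << shift.
 */
uint64_t load_to_percent(uint64_t load, unsigned shift)
{
    assert(shift < 64);  /* the shift must be smaller than the type width */
    return (load * 100) >> shift;
}
```

Without the shift in the record, a consumer would have to guess the precision before it could print anything other than the raw integer.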
The existing load tracking code was hard to understand and
maintain, and not entirely consistent. This is due to a
number of reasons:
- code and comments were not in perfect sync, making it
difficult to figure out what the intent of a particular
choice was (e.g., the choice of 18 for load_window_shift);
- the math, although effective, was not entirely consistent.
In fact, we were doing (if W is the length of the window):
The reason why the formula above sort of worked was because
the number of bits used for the fractional parts of the
values used in fixed point math and the number of bits used
for the length of the window were the same (load_window_shift
was being used for both).
This may look handy, but it introduced a (not especially well
documented) dependency between the length of the window and
the precision of the calculations, which really should be
two independent things. Especially if treating them as such
(like it is done in this patch) does not lead to more
complex maths (same number of multiplications and shifts, and
there is still room for some optimization).
Therefore, in this patch, we:
- split length of the window and precision (and, since there
is already a command line parameter for length of window,
introduce one for precision too),
- align the math with one proper incarnation of exponential
smoothing (at no added cost),
- add comments, about the details of the algorithm and the
math used.
While there fix a couple of style issues as well (pointless
initialization, long lines, comments).
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
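The separation of window length and precision described above can be illustrated with a small sketch. It is one simple weighted-average form in the spirit of exponential smoothing, written for this note, and is not the formula used by the patch; the function name, the parameters and the overflow guard are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical update step. The window length is 1 << window_shift time
 * units; loads are fixed point with precision_shift fractional bits.
 * 'running' is the instantaneous load (number of runnable entities),
 * 'delta' the time elapsed since the previous update.
 */
uint64_t update_avgload(uint64_t avgload, unsigned running, uint64_t delta,
                        unsigned window_shift, unsigned precision_shift)
{
    /* Crude overflow guard; assumes 'running' is a small number. */
    assert(window_shift + precision_shift <= 48);

    const uint64_t window = UINT64_C(1) << window_shift;
    const uint64_t load = (uint64_t)running << precision_shift;

    /* History older than one full window is dropped entirely. */
    if (delta >= window)
        return load;

    /* Weight the new sample by delta/W and the history by (W - delta)/W. */
    return (delta * load + (window - delta) * avgload) >> window_shift;
}
```

Changing window_shift here alters how quickly the average reacts, while changing precision_shift only alters the resolution of the result, which is the independence the message argues for.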
xen: credit2: prevent load balancing to go mad if time goes backwards
This really should not happen, but:
1. it does happen! Some more info here:
http://lists.xen.org/archives/html/xen-devel/2016-06/msg00922.html
2. independently from 1, it makes sense and is easy enough
to have a 'safety catch'.
The reason why this is particularly bad for Credit2 is that
negative values of delta mean out of scale high load (because
of the conversion to unsigned). This, for instance in the
case of runqueue load, results in a runqueue having its load
updated to values of the order of 10000% or so, which in turn
means that the load balancer will migrate everything off from
the pCPUs in the runqueue, and leave them idle until the load
gets back to something sane... which may indeed take a while!
This is not a fix for the problem of time going backwards. In
fact, if that happens a lot, load tracking accuracy is still
compromised, but at least the effect is a lot less bad than
before.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
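The hazard described above, a negative delta turning into a huge unsigned value, can be reproduced in a few lines. The sketch below is illustrative only: the function names are invented, and clamping to zero is just one possible safety catch (the message does not say whether the patch clamps or skips the update).

```c
#include <assert.h>
#include <stdint.h>

/* What goes wrong: a negative difference converted to unsigned. */
uint64_t naive_delta(int64_t now, int64_t last)
{
    assert(now >= 0 && last >= 0);  /* timestamps are non-negative */
    return (uint64_t)(now - last);
}

/* Hypothetical safety catch: treat "time went backwards" as no time. */
uint64_t clamped_delta(int64_t now, int64_t last)
{
    assert(now >= 0 && last >= 0);
    return now < last ? 0 : (uint64_t)(now - last);
}
```

With now = 40 and last = 100, naive_delta() yields a value close to 2^64, which a load calculation would read as an enormous elapsed time.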
In both Credit1 and Credit2, stop considering a pCPU idle,
if the reason why the idle vCPU is being selected, is to
do tasklet work.
Not doing so means that the tickling and load balancing
logic, seeing the pCPU as idle, considers it a candidate
for picking up vCPUs. But the pCPU won't actually pick
up or schedule any vCPU, which would then remain in the
runqueue, which is bad, especially if there were other,
truly idle pCPUs, that could execute it.
The only drawback is that we can't assume that a pCPU is
always marked as idle when being removed from an
instance of the Credit2 scheduler (csched2_deinit_pdata).
In fact, if we are in stop-machine (i.e., during suspend
or shutdown), the pCPUs are running the stopmachine_tasklet
and hence are actually marked as busy. On the other hand,
when removing a pCPU from a Credit2 pool, it will indeed
be idle. The only thing we can do, therefore, is to
remove the BUG_ON() check.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
asm/atomic.h: common prototyping (add xen/atomic.h)
Create a common-side <xen/atomic.h> to establish, among others, prototypes of
atomic functions called from common-code. Done to avoid introducing
inconsistencies between arch-side <asm/atomic.h> headers when we make subtle
changes to one of them. Some arm-side macros had to be turned into inline
functions in the process.
Removed outdated comment ("NB. I've [...]").
Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Julien Grall <julien.grall@arm.com>
Andrew Cooper [Wed, 13 Jul 2016 13:55:48 +0000 (14:55 +0100)]
xen/build: Use C99 booleans
and switch bool_t to being of type _Bool rather than char.
Using bool_t as char causes several subtle problems; first that a bool_t
actually has more than two values, and that (bool_t)0x100 actually has the
value 0 rather than the expected 1, due to truncation.
Making this change reveals two bugs now caught by the compiler.
errata_c6_eoi_workaround() actually makes use of bool_t having more than two
states, while generic_apic_probe() has an integer in the middle of a compound
bool_t assignment (which triggers a [-Werror=parentheses] warning on Debian
Jessie).
Finally, it turns out that ARM is mixing and matching bool_t and bool, despite
their different semantics. This change brings the semantics of bool_t to
match bool, but does not alter the current mix.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Julien Grall <julien.grall@arm.com> Acked-by: Tim Deegan <tim@xen.org>
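The truncation problem mentioned above is easy to demonstrate outside of Xen. In the sketch below, old_bool_t stands in for the previous char-based bool_t; the type and function names are made up for the example, and the result for the char case relies on the usual behaviour of common compilers when narrowing to an 8-bit char.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef char old_bool_t;  /* stand-in for the previous char-based bool_t */

int old_bool_is_true(unsigned int v)
{
    assert(CHAR_BIT == 8);         /* the example assumes 8-bit chars */
    old_bool_t b = (old_bool_t)v;  /* keeps only the low byte */
    return b != 0;
}

int c99_bool_is_true(unsigned int v)
{
    bool b = (bool)v;              /* any non-zero value converts to 1 */
    return b;
}
```

Passing 0x100 to both functions shows the difference: the char version sees a low byte of zero, while the _Bool version sees a non-zero value.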
Quan Xu [Fri, 8 Jul 2016 06:46:15 +0000 (00:46 -0600)]
VT-d: fix Device-TLB flush timeout issue
If Device-TLB flush timed out, we hide the target ATS device
immediately. By hiding the device, we make sure it can't be
assigned to any domain any longer (see device_assigned).
Signed-off-by: Quan Xu <quan.xu@intel.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Quan Xu <quan.xu@intel.com>
Quan Xu [Fri, 8 Jul 2016 06:45:13 +0000 (00:45 -0600)]
IOMMU: add domain crash logic
Add domain crash logic to the generic IOMMU layer to benefit
all platforms.
No spamming of the log can occur. For DomU, we avoid logging any
message for already dying domains. For Dom0, that'll still be more
verbose than we'd really like, but it at least wouldn't outright
flood the console.
Signed-off-by: Quan Xu <quan.xu@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Tested-by: Quan Xu <quan.xu@intel.com>
Quan Xu [Fri, 8 Jul 2016 06:44:23 +0000 (00:44 -0600)]
IOMMU/ATS: use a struct pci_dev * instead of SBDF
Do away with struct pci_ats_dev; integrate the few bits of information
in struct pci_dev (and as a result drop get_ats_device() altogether).
Hook ATS devices onto a linked list off of each IOMMU instead of on a
global one.
Signed-off-by: Quan Xu <quan.xu@intel.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Quan Xu <quan.xu@intel.com>
XSM-Policy: allow source domain access to setpodtarget and getpodtarget for ballooning.
Access to setpodtarget and getpodtarget is required by dom0 to set the balloon
targets for domU. The patch gives source domain (dom0) access to set
this target for domU and resolve the following permission denied error
message during ballooning :
avc: denied { setpodtarget } for domid=0 target=9
scontext=system_u:system_r:dom0_t
tcontext=system_u:system_r:domU_t tclass=domain
Signed-off-by: Anshul Makkar <anshul.makkar@citrix.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Daniel De Graaf [Thu, 14 Jul 2016 14:18:47 +0000 (10:18 -0400)]
xsm: add a default policy to .init.data
This adds a Kconfig option and support for including the XSM policy from
tools/flask/policy in the hypervisor so that the bootloader does not
need to provide a policy to get sane behavior from an XSM-enabled
hypervisor. The policy provided by the bootloader, if present, will
override the built-in policy.
The XSM policy is not moved out of tools because that remains the
primary location for installing and configuring the policy.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Daniel De Graaf [Thu, 14 Jul 2016 14:18:46 +0000 (10:18 -0400)]
xsm: rework policy_buffer globals
This makes the buffers function parameters instead of globals, in
preparation for adding alternate locations for the policy.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
arm: vgic: Split vgic_domain_init() functionality into two functions
Separate the code logic that does the registration of vgic_v3/v2 ops
to a new function domain_vgic_register(). The intention of this
separation is to record the required mmio count in vgic_v3/v2_init()
and pass it to function domain_io_init() in a follow-up patch.
arm/gic-v3: Remove an unused macro MAX_RDIST_COUNT
The macro MAX_RDIST_COUNT is not being used after converting code
to handle number of redistributor dynamically. So remove it from
header file and the two other panic() messages that are not valid
anymore.
xen/arm: vgic: Use dynamic memory allocation for vgic_rdist_region
The number of Redistributor regions allowed for dom0 is hardcoded
to a define MAX_RDIST_COUNT which is 4. Some systems, especially
latest server chips, may have more than 4 redistributors. Either we
have to increase MAX_RDIST_COUNT to a bigger number or allocate
memory based on the number of redistributors that are found in MADT
table. In the worst case scenario, the macro MAX_RDIST_COUNT should
be equal to CONFIG_NR_CPUS in order to support per CPU Redistributors.
Increasing MAX_RDIST_COUNT has a side effect: it blows up the size of
'struct domain' and hits BUILD_BUG_ON() in the domain build code path.
    struct domain *alloc_domain_struct(void)
    {
        struct domain *d;
        BUILD_BUG_ON(sizeof(*d) > PAGE_SIZE);
        d = alloc_xenheap_pages(0, 0);
        if ( d == NULL )
            return NULL;
        ...
This patch uses the second approach to fix the BUILD_BUG_ON() failure.
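The second approach, sizing the region array at run time, can be sketched as follows. The structure and function names are hypothetical and calloc() stands in for whatever allocator the hypervisor uses; the point is only that the containing structure holds a pointer and a count rather than a fixed-size array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct rdist_region {
    uint64_t base;
    uint64_t size;
};

/* Hypothetical container: its size no longer depends on the region count. */
struct vgic_sketch {
    unsigned int nr_regions;
    struct rdist_region *regions;  /* allocated at run time */
};

int vgic_sketch_init(struct vgic_sketch *v, unsigned int nr_regions)
{
    assert(v != NULL);

    v->regions = calloc(nr_regions, sizeof(*v->regions));
    if (v->regions == NULL && nr_regions != 0) {
        v->nr_regions = 0;
        return -1;
    }
    v->nr_regions = nr_regions;
    return 0;
}

void vgic_sketch_free(struct vgic_sketch *v)
{
    free(v->regions);
    v->regions = NULL;
    v->nr_regions = 0;
}
```

Because the array lives outside the structure, raising the number of regions cannot push the structure past a page-size limit.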
arm/gic-v3: Parse per-cpu redistributor entry in GICC subtable
The redistributor address can be specified either as part of GICC or
GICR subtable depending on the power domain. The current driver
doesn't support parsing redistributor entry that is defined in GICC
subtable. The GIC CPU subtable entry holds the associated Redistributor
base address if it is not on always-on power domain.
The per CPU Redistributor size is not defined in ACPI specification.
Set the GICR region size to SZ_256K if the GIC hardware is capable of
Direct Virtual LPI Injection feature, SZ_128K otherwise.
This patch adds necessary code to handle both types of Redistributors
base addresses.
arm/gic-v3: Move GICR subtable parsing into a new function
Add a new function to parse GICR subtable and move the code that
is specific to GICR table to a new function without changing the
function gicv3_acpi_init() behavior.
For ACPI based XEN boot, the GICD region needs to be accessed inside
the function gicv3_acpi_init() in later patch. There is a duplicate
panic() message, one in the DTS probe and second one in the ACPI probe
path. For these two reasons, move the code that validates the GICD base
address and does the region ioremap to a separate function. The
following patch accesses the GICD region inside gicv3_acpi_init() for
finding per CPU Redistributor size.
Julien Grall [Wed, 22 Jun 2016 13:21:03 +0000 (14:21 +0100)]
xen/arm: traps: Data Abort are always unconditional
The HSR encoding for an exception from a data abort does not contain a
condition code (see G6-4264 in ARM DDI 0487A.i) because they are
always unconditional.
Julien Grall [Wed, 22 Jun 2016 13:21:02 +0000 (14:21 +0100)]
xen/arm: traps: Second attempt to correctly use the content of HPFAR_EL2
Commit c051618 "xen/arm: traps: Correctly interpret the content of the
register HPFAR_EL2" attempted to fix the interpretation of HPFAR_EL2.
However, the register contains a 4KB-aligned address. This means that
the reported address is not directly usable to know the faulting IPA.
The offset in the 4KB page can be found by looking at the associated virtual
address (FAR_EL2/HDFAR).
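The combination described above, a 4KB-aligned page address plus the in-page offset taken from the virtual address, can be sketched as below. The function name is hypothetical, and the page address is assumed to have been extracted from the register already; the sketch deliberately says nothing about the register's bit layout.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: 'page_base' is the 4KB-aligned address reported
 * for the fault, 'fault_va' the associated faulting virtual address.
 */
uint64_t faulting_ipa(uint64_t page_base, uint64_t fault_va)
{
    const uint64_t offset_mask = (UINT64_C(1) << 12) - 1;

    assert((page_base & offset_mask) == 0);  /* must be 4KB aligned */

    /* The low 12 bits are the same in the VA and in the IPA. */
    return page_base | (fault_va & offset_mask);
}
```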
xen/arm: p2m: Introduce helpers to insert and remove mapping
More than half of the arguments of INSERT and REMOVE are the same for
each caller. Simplify the callers of apply_p2m_changes by adding new
helpers which will fill common arguments with default values.
xen/arm: dom0_build: Remove dead code in allocate_memory
The code to allocate memory when dom0 does not use direct mapping is
relying on the presence of memory node in the DT.
However, they are not present when booting using UEFI or when using
ACPI.
Rather than fixing the code, remove it because dom0 is always direct
memory mapped and therefore the code is never tested. Also add a
check to avoid disabling direct memory mapped and not implementing
the associated RAM bank allocation.
xen/arm: map_regions_rw_cache: Map the region with p2m->default_access
The parameter 'access' is used by memaccess to restrict temporarily the
permission. This parameter should not be used for other purpose (such
as restricting permanently the permission).
Instead, we should use the default access requested by memaccess. When it
is not enabled, the access will be p2m_access_rwx (i.e no restriction
applied).
The type p2m_mmio_direct will map the region read-write and
non-executable before any further restriction by memaccess. Note that
this is already the resulting permission with the current combination
of the type and the access. So there is no functional change.
Julien Grall [Wed, 22 Jun 2016 11:15:18 +0000 (12:15 +0100)]
xen/arm: Add macros to handle the MIDR
Add new macros to easily get different parts of the register and to
check if a given MIDR matches a CPU model range. The latter will be really
useful to handle errata later.
The macros have been imported from the header
arch/arm64/include/asm/cputype.h in Linux v4.6-rc3.
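For readers unfamiliar with the register, the sketch below shows the kind of field extraction such macros perform, written as plain functions. The names are invented for the example and are not the imported macros; the field positions follow the architectural MIDR layout, and the (variant << 4) | revision encoding used for the range check is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Field accessors following the architectural MIDR layout. */
unsigned midr_revision(uint32_t midr)     { return midr & 0xf; }
unsigned midr_partnum(uint32_t midr)      { return (midr >> 4) & 0xfff; }
unsigned midr_architecture(uint32_t midr) { return (midr >> 16) & 0xf; }
unsigned midr_variant(uint32_t midr)      { return (midr >> 20) & 0xf; }
unsigned midr_implementor(uint32_t midr)  { return (midr >> 24) & 0xff; }

/*
 * Hypothetical range check: 'model' is a MIDR value with the variant and
 * revision fields cleared; rv_min/rv_max are (variant << 4) | revision.
 */
int midr_in_model_range(uint32_t midr, uint32_t model,
                        unsigned rv_min, unsigned rv_max)
{
    const uint32_t model_mask = 0xff0ffff0u;  /* implementor, arch, partnum */
    unsigned rv = (midr_variant(midr) << 4) | midr_revision(midr);

    assert(rv_min <= rv_max);
    return (midr & model_mask) == model && rv >= rv_min && rv <= rv_max;
}
```

An erratum that affects revisions r0p0 to r1p2 of one part could then be matched with midr_in_model_range(midr, model, 0x00, 0x12).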
x86, hvm: document the de facto policy for vCPU ids
PVHVM guests may need to know Xen's idea of vCPU ids they have and the
only way they can figure them out is to use ACPI ids from MADT table.
Document the de facto policy.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tamas K Lengyel [Tue, 12 Jul 2016 18:13:18 +0000 (12:13 -0600)]
vmx/monitor: CPUID events
This patch implements sending notification to a monitor subscriber when an
x86/vmx guest executes the CPUID instruction.
Signed-off-by: Tamas K Lengyel <tamas.lengyel@zentific.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com> Acked-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Wed, 13 Jul 2016 13:13:44 +0000 (14:13 +0100)]
libxl: constify src parameter of libxl_nocpuid.c:libxl_cpuid_policy_list_copy
In 11316d31 ("libxl: constify copy and length calculation functions") I
forgot to take care of libxl_nocpuid.c which also contains an
implementation of libxl_cpuid_policy_list_copy. That broke ARM build.
Fix it by constifying the src parameter.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Fri, 10 Jun 2016 17:48:02 +0000 (18:48 +0100)]
xen/arm: Rework the interface of p2m_cache_flush and use typesafe gfn
p2m_cache_flush expects GFNs as parameters and not MFNs. Rename
the variable to *gfn* and use typesafe to avoid possible misuse.
Also, modify the prototype of the function to describe the range
using the start and the number of GFNs. This avoids having to wonder
whether the end is inclusive or exclusive.
Note that the type of the parameters 'start' is changed from xen_pfn_t
(aka uint64_t) to gfn_t (aka unsigned long). This means that a truncation
will occur for ARM32. It is fine because it will always be encoded on 28
bits maximum (40 bits address).
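The two points made above, a type-safe frame number and a range given as start plus count, can be illustrated with a small sketch. The names gfn_sketch_t, make_gfn(), gfn_value(), addr_to_gfn() and gfn_range_contains() are invented for this example and are not the Xen definitions; 4KB pages are assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Wrapping the integer in a struct stops it mixing with other frame types. */
typedef struct { unsigned long g; } gfn_sketch_t;

gfn_sketch_t make_gfn(unsigned long g)
{
    gfn_sketch_t r = { g };
    return r;
}

unsigned long gfn_value(gfn_sketch_t g) { return g.g; }

/* A 40-bit address shifted by 12 leaves at most 28 bits of frame number. */
gfn_sketch_t addr_to_gfn(uint64_t addr)
{
    assert(addr < (UINT64_C(1) << 40));
    return make_gfn((unsigned long)(addr >> 12));
}

/* Range given as start + count: the end is implicitly exclusive. */
int gfn_range_contains(gfn_sketch_t start, unsigned long nr, gfn_sketch_t g)
{
    return gfn_value(g) >= gfn_value(start) &&
           gfn_value(g) - gfn_value(start) < nr;
}
```

Since the largest frame number for a 40-bit address fits in 28 bits, storing it in a 32-bit unsigned long loses nothing, which is the argument made for ARM32.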
Julien Grall [Tue, 14 Jun 2016 08:31:00 +0000 (09:31 +0100)]
xen/arm: Rework the interface of p2m_lookup and use typesafe gfn and mfn
The prototype and the declaration of p2m_lookup disagree on how the
function should be used. One expects a frame number whilst the other
an address.
Thankfully, everyone is using it with an address today. However, most of
the callers have to convert a guest frame to an address. Modify
the interface to take a guest physical frame in parameter and return
a machine frame.
Whilst modifying the interface, use typesafe gfn and mfn for clarity
and catching possible misusage.
Julien Grall [Tue, 28 Jun 2016 13:37:57 +0000 (14:37 +0100)]
xen: Use a typesafe to define INVALID_GFN
Also take the opportunity to convert arch/x86/debug.c to the typesafe gfn.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Kevin Tian <kevin.tian@intel.com>
Julien Grall [Fri, 24 Jun 2016 14:38:54 +0000 (15:38 +0100)]
xen: Use a typesafe to define INVALID_MFN
Also take the opportunity to convert arch/x86/debug.c to the typesafe
mfn and use proper printf format for MFN/GFN when the code around is
modified.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Kevin Tian <kevin.tian@intel.com>
Julien Grall [Tue, 28 Jun 2016 12:31:32 +0000 (13:31 +0100)]
xen/passthrough: x86: Use INVALID_GFN rather than INVALID_MFN
A variable containing a guest frame should be compared to INVALID_GFN
and not INVALID_MFN.
Signed-off-by: Julien Grall <julien.grall@arm.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
libxl: move DEFINE_DEVICE* macros to libxl_internal.h
In order to be able to have all functions related to a device type in
a single source file move the macros used to generate device type
specific functions to libxl_internal.h. Rename the macros as they are
no longer local to a source file. While at it hide device remove and
device destroy in one macro as those are always used in pairs. Move
usage of the macros to the appropriate source files.
Signed-off-by: Juergen Gross <jgross@suse.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Wed, 8 Jun 2016 14:01:02 +0000 (15:01 +0100)]
libxl: only issue cpu-add call to QEMU for not present CPU
Calculate the final bitmap for CPUs to add to avoid having annoying
error messages complaining those CPUs are already present. Example
message is like (wrapped):
libxl: error: libxl_qmp.c:287:qmp_handle_error_response: received an
error message from QMP server: Unable to add CPU: 0, it already exists
We can also properly handle error from QMP now.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
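The "final bitmap" computation mentioned in the message above amounts to keeping the requested CPUs that are not already present. A minimal sketch, assuming byte-array bitmaps and a made-up function name (this is not the libxl code):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: out = requested AND NOT present, byte by byte.
 * All three bitmaps are 'nbytes' long.
 */
void cpus_to_add(const uint8_t *requested, const uint8_t *present,
                 uint8_t *out, size_t nbytes)
{
    assert(requested != NULL && present != NULL && out != NULL);

    for (size_t i = 0; i < nbytes; i++)
        out[i] = requested[i] & (uint8_t)~present[i];
}
```

Only the bits left set in the result would need a cpu-add request, so CPUs that already exist never trigger the error quoted above.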
Wei Liu [Tue, 7 Jun 2016 09:03:39 +0000 (10:03 +0100)]
libxl: introduce libxl__qmp_query_cpus
It interrogates QEMU for CPUs and updates the bitmap accordingly.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: Anthony PERARD <anthony.perard@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Tue, 28 Jun 2016 16:34:31 +0000 (17:34 +0100)]
xen/arm: io: Protect the handlers with a read-write lock
Currently, accessing the I/O handlers does not require taking a lock
because new handlers are always added at the end of the array. In a
follow-up patch, this array will be sorted to optimize the lookup.
Given that most of the time the I/O handlers will not be modified,
using a spinlock would add contention when multiple vCPUs are accessing
the emulated MMIOs. So use a read-write lock to protect the handlers.
Finally, take the opportunity to re-indent correctly domain_io_init.
Julien Grall [Tue, 28 Jun 2016 15:51:54 +0000 (16:51 +0100)]
xen/arm: gic-v3: No need to sort the Redistributor regions
The sorting was required by the vGIC emulation until commit 9b9d51e98edb8c5c731e2d06dfad3633053d88a4 "xen/arm: vgic-v3:
Correctly retrieve the vCPU associated to a re-distributor".
Furthermore, the code is buggy because both local variables 'l' and 'r'
point to the same region.
So drop the code which sorts the Redistributors array.
Julien Grall [Tue, 14 Jun 2016 11:50:26 +0000 (12:50 +0100)]
xen/arm: map_dev_mmio_region: The iomem permission check should be done on MFN
The helper iomem_access_permitted expects MFNs as parameters and not
GFNs. Thankfully only the hardware domain can call this function and
it will always be with GFNs == MFNs for now.
Also, fix the printf to use the MFN range and not the GFN one.
XSM/policy: Allow the source domain access to settime and setdomainhandle domctls while creating domain.
This patch resolves the following permission denied scenarios while creating
new domU :
avc: denied { setdomainhandle } for domid=0 target=1
scontext=system_u:system_r:dom0_t tcontext=system_u:system_r:domU_t tclass=domain
A dedicated Xen driver domain init service starts "xl devd" in domU. But
currently, it is only supplied in the form of a SysV init script, which
systemd users run through a backward compatibility wrapper automatically
generated by systemd-sysv-generator. This patch adds a (naturally more
lightweight) native systemd unit to be used instead.
The xendriverdomain service is only relevant to domU, but should not run
in dom0. Therefore, the systemd unit uses "ConditionVirtualization=xen",
which evaluates to true in domU and (since systemd version 214, released
on 2014-06-11) to false in dom0. Users or distributors who need to be
compatible with even older systemd versions, but still want to prevent
"xl devd" startup in dom0, could add the following line in [Service]:
ExecStartPre=/bin/sh -c "! grep -q control_d /proc/xen/capabilities"
(Please rerun autogen.sh after applying this patch)
Signed-off-by: Rusty Bird <rustybird@openmailbox.org> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
[ wei: rerun autogen.sh ]
Andrew Cooper [Thu, 30 Jun 2016 16:40:23 +0000 (17:40 +0100)]
tools/xl: Allow callers of `xl info` to select specific information
When scripting, it is much more convenient to use:
[root@fusebot ~]# xl info xen_version
4.8-unstable
than to construct some sed/awk/other to parse:
[root@fusebot ~]# xl info
...
xen_version : 4.8-unstable
...
This works by wrapping all printf() calls in main_info() with maybe_printf(),
which formats its arguments, compares the resulting string to the provided
restriction, and discards it if no match is found.
A restriction like this doesn't make sense in combination with --numa, so is
excluded in that case.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
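The filtering idea described in the message above (format the line, compare it with the restriction, discard it on mismatch) can be sketched as follows. This is an illustration and not the xl implementation: it writes into a caller-supplied buffer instead of stdout so that it is easy to test, and the name maybe_format(), the 256-byte line limit and the exact matching rule (restriction, optional padding spaces, then ": ") are all assumptions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical variant of the idea: with no restriction, copy the whole
 * formatted line to 'out'; with a restriction, copy only the value of the
 * matching "name : value" line, or nothing at all.
 * Returns the number of characters placed in 'out'.
 */
int maybe_format(char *out, size_t outsz, const char *restriction,
                 const char *fmt, ...)
{
    char line[256];
    va_list ap;
    int len;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    va_start(ap, fmt);
    len = vsnprintf(line, sizeof(line), fmt, ap);
    va_end(ap);
    if (len < 0 || (size_t)len >= sizeof(line))
        return 0;                       /* formatting failed or truncated */

    if (restriction == NULL)
        return snprintf(out, outsz, "%s", line);

    size_t n = strlen(restriction);
    if (strncmp(line, restriction, n) != 0)
        return 0;                       /* a different key: discard */

    const char *p = line + n;
    while (*p == ' ')                   /* skip column padding */
        p++;
    if (p[0] != ':' || p[1] != ' ')
        return 0;                       /* restriction was only a prefix */

    return snprintf(out, outsz, "%s", p + 2);
}
```

Wrapping every output call this way keeps the unrestricted output unchanged while letting a single key be selected without any external parsing.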
xen: credit2: when tickling, check idle cpus first
If there are idle pCPUs, it's always better to try to
"ship" the new vCPU there, rather than letting it
preempt a currently busy one.
This commit also adds a cpumask_test_or_cycle() helper
function, to make it easier to code the preference for
the pCPU where the vCPU was running before.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
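The behaviour suggested by the helper's name, "use this CPU if it is in the mask, otherwise cycle to the next one that is", can be sketched on a plain 64-bit mask. This is a toy version written for this note: the name mask_test_or_cycle(), the 64-bit mask and the nr_cpus return value for an empty mask are assumptions, not the Xen cpumask API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical toy version: return 'cpu' if its bit is set in 'mask',
 * otherwise the next set bit after it (wrapping around), or nr_cpus
 * if the mask is empty.
 */
unsigned mask_test_or_cycle(unsigned cpu, uint64_t mask, unsigned nr_cpus)
{
    assert(nr_cpus > 0 && nr_cpus <= 64 && cpu < nr_cpus);

    /* i == 0 tests 'cpu' itself; later iterations cycle through the rest. */
    for (unsigned i = 0; i < nr_cpus; i++) {
        unsigned c = (cpu + i) % nr_cpus;

        if (mask & (UINT64_C(1) << c))
            return c;
    }
    return nr_cpus;
}
```

Calling it with the pCPU where the vCPU last ran and a mask of idle pCPUs expresses the stated preference in one step.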
xen: sched: make the 'tickled' perf counter clearer
In fact, what we have right now, i.e., tickle_idlers_none
and tickle_idlers_some, is not good enough for describing
what really happens in the various tickling functions of
the various schedulers.
Switch to a more descriptive set of counters, such as:
- tickled_no_cpu: for when we don't tickle anyone
- tickled_idle_cpu: for when we tickle one or more
idler
- tickled_busy_cpu: for when we tickle one or more
non-idler
While there, fix style of an "out:" label in sched_rt.c.
xen/arm64: Use the correct TLBs flush instruction to nuke stage-2 TLBs
The function flush_tlb is called to invalidate the TLBs for the current
domain when the stage-2 page tables are modified.
On ARMv8, the instruction "tlbi vmalle1is" (resp. "tlbi vmalle1") will
invalidate stage 1 entries associated to the current VMID (see D4-1811 in
ARM DDI 0487A.j).
Given that an implementation is allowed to cache separately stage 1 and
stage 2 translation (see D4.7.1), the instructions will not remove stage
2 entries when the translation is not combined in a single entry.
This will result in the TLBs holding invalid entries and possibly multiple
entries using the same VA.
Use "tlbi vmalls12e1is" (resp. "tlbi vmalls12e1"), to flush both stage
1 and 2 entries when the domain p2m is changed.
Also modify flush_tlb_local to invalidate stage 1 and 2 for the local
TLBs. Note that this function is used in the instruction abort path
before translating a GVA to an IPA. As far as I understand, this is to avoid a
guest poisoning the DTLB when memaccess is in use. We might be able to
only invalidate stage 1 entries. However, I choose the safest way for now
(i.e invalidating stage 1 and 2 entries). We would need to introduce a
new set of helpers when we will want to restrict it.
Move xen/paging.h #include from hvm/monitor.h to hvm/monitor.c (include strictly
where needed) and also change to asm/paging.h (include strictly what's needed).
Signed-off-by: Corneliu ZUZU <czuzu@bitdefender.com> Acked-by: Tamas K Lengyel <tamas@tklengyel.com> Acked-by: Razvan Cojocaru <rcojocaru@bitdefender.com>