Andrew Cooper [Tue, 22 Apr 2014 12:45:10 +0000 (14:45 +0200)]
x86/EPT: correct double unmap_domain_page() on error path
c/s 3d90d6e6 "x86/EPT: split super pages upon mismatching memory types"
accidentally introduced an error path where the epte domain page would be
unmapped twice if splitting the superpage failed.
Only unmap the page if the loop is to be continued. When breaking from the
loop, the page will be unmapped by the subsequent code.
Coverity-ID: 1203047 Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
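The unmap rule described in this commit can be sketched as follows. This is a minimal illustration, not the actual EPT code: map_page(), unmap_page(), walk() and the mapped_pages counter are hypothetical stand-ins for map_domain_page()/unmap_domain_page() and the table walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for map_domain_page()/unmap_domain_page(). */
static int mapped_pages;

static int *map_page(int *backing)
{
    mapped_pages++;
    return backing;
}

static void unmap_page(int *page)
{
    (void)page;
    assert(mapped_pages > 0);   /* a double unmap would trip this */
    mapped_pages--;
}

/*
 * Walk 'levels' table levels. If the split fails at 'fail_level' (pass -1
 * for "never fails") we break with the page still mapped, and the single
 * unmap after the loop releases it exactly once.
 */
static bool walk(int levels, int fail_level)
{
    int backing = 0;
    int *page = map_page(&backing);
    bool ok = true;

    for ( int level = 0; level < levels; level++ )
    {
        if ( level == fail_level )
        {
            ok = false;
            break;              /* error path: do NOT unmap here */
        }
        if ( level + 1 == levels )
            break;              /* last level: keep the mapping */
        unmap_page(page);       /* only unmap when the loop continues */
        page = map_page(&backing);
    }

    unmap_page(page);           /* common exit path */
    return ok;
}
```

Whether the walk succeeds or fails, mapped_pages returns to zero, which is the property the fix restores.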
Daniel De Graaf [Tue, 22 Apr 2014 10:10:13 +0000 (12:10 +0200)]
allow hardware domain != dom0
This adds a hypervisor command line option "hardware_dom=" which takes a
domain ID. When the domain with this ID is created, it will be used
as the hardware domain.
This is intended to be used when domain 0 is a dedicated stub domain for
domain building, allowing the hardware domain to be de-privileged and
act only as a driver domain.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 22 Apr 2014 10:08:56 +0000 (12:08 +0200)]
x86/HVM: use fixed TSC value when saving or restoring domain
When a domain is saved each VCPU's TSC value needs to be preserved. To get it we
use hvm_get_guest_tsc(). This routine (either itself or via get_s_time() which
it may call) calculates VCPU's TSC based on current host's TSC value (by doing a
rdtscll()). Since this is performed for each VCPU separately we end up with
un-synchronized TSCs.
Similarly, during a restore each VCPU is assigned its TSC based on host's current
tick, causing virtual TSCs to diverge further.
With this, we can easily get into a situation where a guest may see time going
backwards.
Instead of reading new TSC value for each VCPU when saving/restoring it we should
use the same value across all VCPUs.
Reported-by: Philippe Coquard <philippe.coquard@mpsa.com> Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
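A rough sketch of the idea of sampling the host TSC once per domain rather than once per VCPU. The names read_host_tsc(), struct vcpu_sketch and save_all() are invented for illustration and do not correspond to Xen's interfaces.

```c
#include <assert.h>
#include <stdint.h>

#define NR_VCPUS 4

/* Hypothetical host counter that advances on every read, like RDTSC. */
static uint64_t host_tsc = 1000;

static uint64_t read_host_tsc(void)
{
    return host_tsc += 7;
}

struct vcpu_sketch {
    int64_t  tsc_offset;   /* guest TSC = host TSC + offset */
    uint64_t saved_tsc;    /* value written to the save record */
};

/* Sample the host TSC once and derive every vCPU's value from it. */
static void save_all(struct vcpu_sketch *v, unsigned int nr)
{
    uint64_t now = read_host_tsc();   /* single sample for the whole domain */

    assert(nr <= NR_VCPUS);
    for ( unsigned int i = 0; i < nr; i++ )
        v[i].saved_tsc = now + (uint64_t)v[i].tsc_offset;
}
```

Had read_host_tsc() been called inside the loop, VCPUs with identical offsets would have been saved with different TSC values.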
Boris Ostrovsky [Tue, 22 Apr 2014 10:08:06 +0000 (12:08 +0200)]
x86/svm: enable TSC scaling
TSC ratio enabling logic is inverted: we want to use it when we
are running in native tsc mode, i.e. when d->arch.vtsc is zero.
Also, since now svm_set_tsc_offset()'s calculations depend
on vtsc's value, we need to call hvm_funcs.set_tsc_offset() after
vtsc changes in tsc_set_info().
In addition, with TSC ratio enabled, svm_set_tsc_offset() will
need to do rdtsc. With that we may end up having TSCs on guest's
processors out of sync. d->arch.hvm_domain.sync_tsc which is set
by the boot processor can now be used by APs as reference TSC
value instead of host's current TSC.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Boris Ostrovsky [Tue, 22 Apr 2014 10:07:37 +0000 (12:07 +0200)]
x86: use native RDTSC(P) execution when guest and host frequencies are the same
We should be able to continue using native RDTSC(P) execution on
HVM/PVH guests after migration if host and guest frequencies are
equal (this includes the case when the frequencies are made equal
by TSC scaling feature).
This also allows us to revert the main part of commit 4aab59a3 (svm: Do not
intercept RDTSC(P) when TSC scaling is supported by hardware) which
was wrong: while RDTSC intercepts were disabled domain's vtsc could
still be set, leading to inconsistent view of guest's TSC.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 22 Apr 2014 10:05:44 +0000 (12:05 +0200)]
ACPI/ERST: fix signed/unsigned type conflicts
Error checks exist in the respective code path that expect negative
values to indicate errors, yet negative values can't be communicated
through size_t. Thus ssize_t needs to be introduced (also on ARM for
consistency, even if the code in question isn't currently being used
on there).
The bug is theoretical only in so far as all the involved code is
effectively dead. Reflect this by excluding that code from non-debug
builds.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Christoph Egger <chegger@amazon.de>
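To illustrate why a signed return type is needed for such error checks, here is a small sketch. read_record() is a hypothetical function, not part of the ERST code; ssize_t is taken from the POSIX header here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Hypothetical reader: returns the bytes copied, or -errno on failure. */
static ssize_t read_record(char *buf, size_t buflen, size_t record_len)
{
    if ( buf == NULL )
        return -EINVAL;
    if ( record_len > buflen )
        return -ENOSPC;          /* representable only in a signed type */

    for ( size_t i = 0; i < record_len; i++ )
        buf[i] = 0;

    return (ssize_t)record_len;
}
```

A caller can test the result with "< 0". If the same value were carried in a size_t, the error would appear as a very large positive length and that check could never fire.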
Jan Beulich [Tue, 22 Apr 2014 10:04:20 +0000 (12:04 +0200)]
x86/MSI: drop workaround for insecure Dom0 kernels
Considering that
- the workaround is expensive (iterating through the entire P2M space
of a domain),
- the planned elimination of the expensiveness (by propagating the type
change step by step to the individual P2M leaves) wouldn't address
the IOMMU side of things (as for it to obey the changed
permissions the adjustments must be pushed down immediately through
the entire tree)
- the proper solution (PHYSDEVOP_msix_prepare) should by now be
implemented by all security conscious Dom0 kernels
remove the workaround, killing eventual guests that would be known to
become a security risk instead.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Daniel De Graaf [Thu, 17 Apr 2014 08:10:33 +0000 (10:10 +0200)]
implement is_hardware_domain using hardware_domain global
This requires setting the hardware_domain variable earlier in
domain_create so that functions called from it (primarily in
arch_domain_create) observe the correct results when they call
is_hardware_domain.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
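The ordering requirement can be sketched like this. struct domain_sketch, arch_init() and hardware_domid are simplified, hypothetical stand-ins; only the names hardware_domain, is_hardware_domain and domain_create come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct domain_sketch {
    int  domain_id;
    bool saw_hwdom_in_arch_init;   /* what arch init observed */
};

static struct domain_sketch *hardware_domain;  /* the global from the commit */
static int hardware_domid = 0;                 /* assumed selector */

static bool is_hardware_domain(const struct domain_sketch *d)
{
    return d == hardware_domain;
}

/* Stand-in for arch_domain_create(): it consults is_hardware_domain(). */
static void arch_init(struct domain_sketch *d)
{
    d->saw_hwdom_in_arch_init = is_hardware_domain(d);
}

static void domain_create(struct domain_sketch *d, int domid)
{
    assert(d != NULL);
    d->domain_id = domid;
    if ( domid == hardware_domid )
        hardware_domain = d;       /* must be set before arch_init() runs */
    arch_init(d);
}
```

If the assignment to hardware_domain were moved after arch_init(), the hardware domain's own initialisation would see is_hardware_domain() return false.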
pvh dom0: make xsm_map_gmfn_foreign available for x86
In this patch we make xsm_map_gmfn_foreign available for x86 also. This
is used in the next patch "pvh dom0: Add and remove foreign pages" in
function p2m_add_foreign.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
In this patch, a new type p2m_map_foreign is introduced for pages
that the toolstack on an auto-translated dom0 or a control domain maps
from foreign domains that it is creating or supporting during its
run time.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org>
The "A" constraint, while documented up to gcc 4.5 as "The a and d
registers, as a pair (for instructions that return half the result in
one and half in the other)," never really behaved that (natural) way,
but always meant (and is now also documented so) %eax _or_ %edx (%rax
_or_ %rdx on x86-64) unless the operand was wide enough to require both
(i.e. more than 32 bits on ix86 and more than 64 bits on x86-64).
Interestingly something internal to the compiler changed between 4.4
and 4.5 to actually expose the difference - up to gcc 4.4 I was unable
to construct a case where, when only the low half of the result is
actually looked at, the result would be considered to be in %edx/%rdx
(and %eax/%rax would be treated as unmodified by the instruction).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Tested-by: Don Slutz <dslutz@verizon.com>
xen/arm: Pass the timer "clock-frequency" to DOM0 in make_timer_node()
If the DT representing the ARM generic timer mentions a clock-frequency,
propagate it to the DT that is built for DOM0.
This is necessary as a workaround for boards (Odroid-XU) where CNTFRQ is not
set or returns a wrong value.
Ideally CNTFRQ should be set by the boot loader. The bootloader should respect
the ARM ARM (see B.8.1.1):
"The CNTFRQ register is UNKNOWN at reset, and therefore the counter
frequency must written to CNTFRQ as part of the system boot process."
For the Odroid-XU the SPL BL2 code is entered in NS HYP mode which prevents
the execution of the mcr call to set CNTFRQ.
Andrew Cooper [Tue, 15 Apr 2014 18:18:42 +0000 (19:18 +0100)]
tools/libxc: Remove valgrind conditional sections from libxc
The ifdef sections are not enabled at all in tree, and their justification is
out of date now that Xen hypercall support exists upstream in valgrind.
This also removes a commented-out tweak to CFLAGS in the libxc Makefile which
is not being used, and becomes stale given this patch. In the unlikely event
that any developers were using the line, the results can be more easily
achieved by tweaking APPEND_CFLAGS in the environment.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Fri, 11 Apr 2014 15:46:14 +0000 (16:46 +0100)]
tools/libxl: Improvements to libxl-save-helper when using valgrind
Fix two unfree()'d allocations in libxl-save-helper, to get them out of the
way of other legitimate complaints from valgrind.
The first is easy; close the interface to libxc when done with it.
The second can be fixed by removing the complexity of creating the logging
instance. Initialise the global 'logger' in place rather than as an
allocation, which requires changing the indirection of its use in 5 locations.
struct xentoollog_logger_tellparent and function createlogger_tellparent() are
now unused and removed.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Fri, 11 Apr 2014 15:46:13 +0000 (16:46 +0100)]
libxl/save-helper: Code motion of logging functions
... in preparation for a subsequent functional fix
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Thu, 10 Apr 2014 15:26:31 +0000 (16:26 +0100)]
libxl/gentypes.py: don't generate JSON for private type(s)
Private types are only useful inside libxl. They don't have a valid JSON
generation function by default.
Currently there's only one private type, that's libxl_ev_link. Not
skipping this field causes testidl to fail as the code generated for
this type is NULL.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
On arm64, VFP instructions require vfpregs to be 128-byte aligned.
By chance, the field is already correctly aligned. If someone later
adds a new field before it, Xen will receive a data abort as soon as
it saves/restores VFP.
We are safe on arm32 as the only constraint is to be 32-byte aligned.
The Linux issue which this works around was fixed in v3.13 via f52bb722547f
"ARM: mm: Correct virt_to_phys patching for 64 bit physical addresses".
This is the second attempt to revert this. Now that we have fixed
allocate_memory_11 to allocate accessible memory on 32-bit this is safe to do.
This is not quite a straight revert since we need to ensure that for 32-bit
domain 0 we do not allocate dom0's memory above 4GB where the domain cannot
access it without paging (which is disabled at start of day) and LPAE (which
the kernel may not support) enabled.
Ian Campbell [Fri, 4 Apr 2014 12:56:58 +0000 (13:56 +0100)]
xen: arm: probe the kernel to determine the guest type earlier
We need to know if the kernel is 32- or 64- bit capable sooner so that we know
what sort of domain we are building when allocating memory to it (so we can
place appropriate limits when allocating RAM to the guest). At the moment
kernel_prepare() decides this but it needs the memory to have already been
allocated.
Therefore split the probing functionality of kernel_prepare() and call it much
earlier. The remainder of kernel_prepare() deals with where to place the
kernel in RAM which can be deferred until kernel_load() so do so.
Document the input and output of kernel_probe() and _load().
Jan Beulich [Mon, 14 Apr 2014 13:13:33 +0000 (15:13 +0200)]
MAINTAINERS: extend coverage for "THE REST"
As agreed upon in offlist discussion among the committers, make all
committers eligible to approve changes to code not having its
maintainership covered explicitly. (For committers to make changes to
such code, generally an ack from a second committer is going to be
required.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Tim Deegan <tim@xen.org>
Because some of the leaf p2m functions return 0 for failure and
TRUE for success, the real errno is lost. We change some of those
functions to return proper -errno. Also, any code in the immediate
vicinity that is in coding style violation is fixed up.
This patch doesn't change any functionality.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
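The difference between the two return conventions can be sketched as below. The table and both functions are hypothetical and only illustrate the pattern; they are not the p2m code.

```c
#include <assert.h>
#include <errno.h>

#define TABLE_SLOTS 4

static int table[TABLE_SLOTS];

/* Old convention: 1 for success, 0 for failure. The reason is lost. */
static int set_entry_bool(unsigned int slot, int val)
{
    if ( slot >= TABLE_SLOTS || val < 0 )
        return 0;
    table[slot] = val;
    return 1;
}

/* New convention: 0 for success, -errno for failure. */
static int set_entry_errno(unsigned int slot, int val)
{
    if ( slot >= TABLE_SLOTS )
        return -ERANGE;
    if ( val < 0 )
        return -EINVAL;
    table[slot] = val;
    return 0;
}
```

With the second form a caller can pass the error code straight up the call chain instead of inventing one.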
This patch renames set_p2m_entry defined in arch/x86/mm/p2m.c
to p2m_set_entry which makes it consistent with other functions
from that file. It also facilitates changing the function signature
to return appropriate errno for failure cases. This patch doesn't
change any functionality.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org>
This patch renames "public" functions in p2m-pt.c. In addition to
making them more descriptive, it also frees up "p2m_set_entry" name
to be used later. This patch doesn't change any functionality.
Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Acked-by: Tim Deegan <tim@xen.org>
- Remove version from installed package name, to make "upgrades" work
- Add conffiles to manage files in /etc on package install/update/remove
- Added in description that this is a .deb for testing only
Signed-off-by: Fabio Fantoni <fabio.fantoni@m2r.biz> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Fri, 11 Apr 2014 09:27:04 +0000 (11:27 +0200)]
further prefetch cleanup
- commit 630017f4 ("xen: x86 & generic: change to __builtin_prefetch()")
removed the ARCH_HAS_PREFETCH{,W} defines, but left the
ARCH_HAS_SPINLOCK_PREFETCH one in place
- the x86 special casing code has always been dead due to the two
respective CONFIG_* settings not getting defined anywhere
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Apr 2014 09:22:49 +0000 (11:22 +0200)]
libxl: allow dom0 to be destroyed
When dom0 is not the hardware domain, it can be destroyed in the same
way as any other service domain. To avoid accidental use when a domain
is not resolved, destroying domain 0 requires passing -f to xl destroy.
Since the hypervisor already prevents a domain from destroying itself,
this patch is only useful in a disaggregated environment.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Apr 2014 09:20:55 +0000 (11:20 +0200)]
rename dom0 to hardware_domain
This should not change any functionality other than renaming the global
variable. In a few cases (primarily the domain building code), a local
variable or argument named dom0 was created and used instead of the
global hardware_domain to clarify that the domain being used in this
case is actually domain 0.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Christoph Egger <chegger@amazon.de> Acked-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Apr 2014 09:20:08 +0000 (11:20 +0200)]
prevent 0 from being used as a dynamic domid
When the hardware domain is made distinct from dom0, it becomes possible
to shut down and destroy domain 0 while leaving the hypervisor running.
If this happens, prevent this domain ID from being considered for
allocation to a new guest.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Acked-by: Keir Fraser <keir@xen.org>
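One way to picture an allocator that never hands out ID 0 is sketched below. The tiny ID space, in_use[], next_hint and alloc_domid() are assumptions made for the example and do not reflect the hypervisor's actual allocation code.

```c
#include <assert.h>
#include <stdbool.h>

#define FIRST_RESERVED 8          /* tiny ID space, for illustration only */

static bool in_use[FIRST_RESERVED];
static int next_hint = 1;         /* the search never starts at 0 */

/* Returns a free ID in [1, FIRST_RESERVED), or -1 if none is left. */
static int alloc_domid(void)
{
    for ( int tries = 0; tries < FIRST_RESERVED - 1; tries++ )
    {
        int id = next_hint;

        if ( ++next_hint >= FIRST_RESERVED )
            next_hint = 1;        /* wrap to 1, never to 0 */

        if ( !in_use[id] )
        {
            in_use[id] = true;
            return id;
        }
    }
    return -1;
}
```

Even when slot 0 is free (domain 0 has been destroyed), the search wraps around to 1, so a new guest cannot be given ID 0.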
Daniel De Graaf [Fri, 11 Apr 2014 09:19:16 +0000 (11:19 +0200)]
iommu: Move dom0 setup code to __hwdom_init
When the hardware domain is split from domain 0, the initialization code
for the hardware domain cannot be in the __init section, since the
actual domain creation happens after these sections have been discarded.
Create a __hwdom_init section designator to annotate these functions,
and control it using the XSM configuration option for now (since XSM is
required to take advantage of the security benefits of disaggregation).
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Keir Fraser <keir@xen.org>
Daniel De Graaf [Fri, 11 Apr 2014 09:16:52 +0000 (11:16 +0200)]
use domid check in is_hardware_domain
Instead of checking is_privileged to determine if a domain should
control the hardware, check that the domain_id is equal to zero (which
is currently the only domain for which is_privileged is true). This
allows other places where domain_id is checked for zero to be replaced
with is_hardware_domain.
The distinction between is_hardware_domain, is_control_domain, and
domain 0 is based on the following disaggregation model:
Domain 0 bootstraps the system. It may remain to perform requested
builds of domains that need a minimal trust chain (i.e. vTPM domains).
Other than being built by the hypervisor, nothing is special about this
domain - although it may be useful to have is_control_domain() return
true depending on the toolstack it uses to build other domains.
The hardware domain manages devices for PCI pass-through to driver
domains or can act as a driver domain itself, depending on the desired
degree of disaggregation. It is also the domain managing devices that
do not support pass-through: PCI configuration space access, parsing the
hardware ACPI tables and system power or machine check events. This is
the only domain where is_hardware_domain() is true. The return of
is_control_domain() may be false for this domain.
The control domain manages other domains, controls guest launch and
shutdown, and manages resource constraints; is_control_domain() returns
true. The functionality guarded by is_control_domain may in the future
be adapted to use explicit hypercalls, eliminating the special treatment
of this domain. It may be reasonable to have multiple control domains
on a multi-tenant system.
Guest domains and other service or driver domains are all treated
identically by the hypervisor; the security policy may further constrain
administrative actions on or communication between these domains.
Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Keir Fraser <keir@xen.org>
Jan Beulich [Thu, 10 Apr 2014 14:07:17 +0000 (16:07 +0200)]
x86: fix pinned cache attribute handling
- make sure UC- is only used for PAT purposes (MTRRs and hence EPT
don't have this type)
- add order input to "get", and properly handle conflict case (forcing
an EPT page split)
- properly detect (and refuse) overlaps during "set"
- properly use RCU constructs
- support deleting ranges through a special type input to "set"
- set ignore-PAT flag in epte_get_entry_emt() when "get" succeeds
- set "get" output to ~0 (invalid) rather than 0 (UC) on error (the
caller shouldn't be looking at it anyway)
- move struct hvm_mem_pinned_cacheattr_range from header to C file
(used only there)
Note that the code (before and after this change) implies the GFN
ranges passed to the hypercall to be inclusive, which is in contrast
to the sole current user in qemu (all variants). It is not clear to me
at which layer (qemu, libxc, hypervisor) this would best be fixed.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 10 Apr 2014 14:06:09 +0000 (16:06 +0200)]
x86/EPT: IOMMU snoop capability should not affect memory type selection
This capability solely makes a statement on cache coherency guarantees
by the IOMMU. It specifically does not imply any further guarantees
implied by certain memory types (cachability, ordering).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 10 Apr 2014 14:05:12 +0000 (16:05 +0200)]
x86/EPT: split super pages upon mismatching memory types
... between constituent pages. To indicate such, the page order is
being passed down to the vMTRR routines, with a negative return value
(possible only on order-non-zero pages) indicating such collisions.
Some code redundancy reduction is being done to ept_set_entry() along
the way, allowing the new handling to be centralized to a single place
there.
In order to keep ept_set_entry() fast and simple, the actual splitting
is being deferred to the EPT_MISCONFIG VM exit handler.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Thu, 10 Apr 2014 14:01:41 +0000 (16:01 +0200)]
x86/EPT: force re-evaluation of memory type as necessary
The main goal here is to drop the bogus dependency of
epte_get_entry_emt() on d->arch.hvm_domain.params[HVM_PARAM_IDENT_PT].
Any change to state influencing epte_get_entry_emt()'s decision needs
to result in re-calculation. Do this by using the EPT_MISCONFIG VM
exit, storing an invalid memory type into EPT's emt field (leaving the
IOMMU, which doesn't care about memory types, unaffected).
This is being done in a hierarchical manner to keep execution time
down: Initially only the top level directory gets invalidated this way.
Upon access, the involved intermediate page table levels get cleared
back to zero, and the leaf entry gets its field properly set. For 4k
leaves all other entries in the same directory also get processed to
amortize the cost of the extra VM exit (which halved the number of
these VM exits in my testing).
This restoring can result in spurious EPT_MISCONFIG VM exits (since
two vCPU-s may access addresses involving identical page table
structures). Rather than simply returning in such cases (and risking
that such a VM exit results from a real mis-configuration, which
would then result in an endless loop rather than killing the VM), a
per-vCPU flag is being introduced indicating when such a spurious VM
exit might validly happen - if another one occurs right after VM re-
entry, the flag would generally end up being clear, causing the VM
to be killed as before on such VM exits.
Note that putting a reserved memory type value in the EPT structures
isn't formally sanctioned by the specification. Intel isn't willing to
adjust the specification to make this or a similar use of the
EPT_MISCONFIG VM exit formally possible, but they have indicated that
us using this is low risk wrt forward compatibility.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Kevin Tian <kevin.tian@intel.com>
Ian Campbell [Wed, 9 Apr 2014 11:51:16 +0000 (12:51 +0100)]
xen: arm: rework dom0 initrd and dtb placement
This now uses the same decision tree as libxc (which is much easier to test).
The main change is to explicitly handle the placement at 128MB or end of RAM
as two cases, rather than combining with MIN. The effect is the same but the
code is clearer.
Secondly the attempt to place the modules right after the kernel is removed,
since it is redundant with the case where placing them at the end of RAM ends
up abutting the kernel.
Also round the kernel size up to a 2MB boundary.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
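A possible shape for such a decision tree is sketched below. It is an assumption-laden illustration, not the code from this commit: place_modules() and its parameters are hypothetical, the exact conditions are guesses, and the "before the kernel" fallback is included only because a later message in this log mentions that the hypervisor does this as a last resort.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MB(x)          ((uint64_t)(x) << 20)
#define ROUNDUP(x, a)  (((x) + (a) - 1) & ~((a) - 1))

/* Picks a base address for the modules (dtb + initrd); false if none fits. */
static bool place_modules(uint64_t rambase, uint64_t ramsize,
                          uint64_t kernbase, uint64_t kernsize,
                          uint64_t modsize, uint64_t *modbase)
{
    const uint64_t ramend   = rambase + ramsize;
    /* Kernel end rounded up to a 2MB boundary (assumed interpretation). */
    const uint64_t kernend  = ROUNDUP(kernbase + kernsize, MB(2));
    const uint64_t ram128mb = rambase + MB(128);

    assert(kernbase >= rambase);

    if ( kernend > ramend || modsize > ramsize )
        return false;

    if ( ramend >= ram128mb + modsize && kernend <= ram128mb )
        *modbase = ram128mb;               /* case 1: at 128MB */
    else if ( ramend - modsize >= kernend )
        *modbase = ramend - modsize;       /* case 2: at the end of RAM */
    else if ( kernbase - rambase >= modsize )
        *modbase = kernbase - modsize;     /* last resort: before the kernel */
    else
        return false;

    return true;
}
```

Handling the 128MB and end-of-RAM placements as separate branches makes each precondition visible, which a single MIN() expression tends to hide.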
Ian Campbell [Wed, 9 Apr 2014 11:51:14 +0000 (12:51 +0100)]
tools: arm: improve placement of initial modules.
314c9815e2f5 "tools: implement initial ramdisk support for ARM." broke starting
guests with <= 128 MB ram by placing the boot modules (dtb and initrd)
immediately after the kernel in this case, running the risk of them being
overwritten. Instead place the modules at the end of RAM, as the hypervisor
does for dom0.
The hypervisor also falls back to placing things before the kernel as a last
resort before failing, so add that here too.
Tested with the Debian installer initrd and guests of 96MB, 128MB, 256MB and
1GB. All work. Also tested with 64MB, but the installer doesn't run with so
little RAM (our placement of the initrd is correct, though).
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org> Cc: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Mon, 7 Apr 2014 11:07:04 +0000 (12:07 +0100)]
xen: make sure that likely and unlikely convert the expression to a boolean
According to http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
__builtin_expect has the prototype:
long __builtin_expect (long exp, long c)
If sizeof(exp) > sizeof(long) then this will effectively mask off the top bits
of exp, meaning that the if in "if (unlikely(x))" will see the masked version,
which might be false when true was expected. likely() has the same issue.
This is most likely to affect x86_32 and arm32 builds. x86_32 is not
present on 4.3 onwards and a quick grep of current staging shows that all the
existing arm32 uses of both likely and unlikely already pass a boolean. I
noticed this with an as yet unposted patch which did not have this property.
Also the definition of likely might not have had the expected effect for cases
where a true value > 1 might be passed.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Cc: Keir Fraser <keir@xen.org> Cc: Tim Deegan <tim@xen.org>
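The boolean conversion can be sketched with the usual double-negation idiom. The macro bodies below are an assumption about the general shape of such definitions, not a copy of Xen's header; likely_raw and has_high_bits() are names made up for the example. __builtin_expect is a GCC/clang builtin.

```c
#include <assert.h>

/* Unsafe form: the expression is converted to long before being tested,
 * and a true value other than 1 does not match the expected value. */
#define likely_raw(x)  __builtin_expect((x), 1)

/* Safe form: !! collapses any non-zero value to exactly 1 first. */
#define likely(x)      __builtin_expect(!!(x), 1)
#define unlikely(x)    __builtin_expect(!!(x), 0)

/* The tested expression is 64 bits wide with only high bits set; where
 * long is 32 bits, likely_raw() would truncate it to zero. */
static int has_high_bits(unsigned long long v)
{
    if ( unlikely(v & 0xffffffff00000000ULL) )
        return 1;
    return 0;
}
```

With !! in place the result no longer depends on the width of long, and likely(2) carries the same hint as likely(1).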
Ian Campbell [Tue, 8 Apr 2014 15:37:58 +0000 (16:37 +0100)]
build: remove Linux kernel build integration.
We haven't shipped a XenoLinux kernel for more releases than I can remember.
We held onto these because osstest was using them but this is no longer the
case.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
These are a xend-ism. Since Xen 4.1 the recommended way to configure networking
has been to use the distro facilities (e.g.
http://wiki.xen.org/wiki/HostConfiguration/Networking)
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Wed, 9 Apr 2014 08:26:23 +0000 (09:26 +0100)]
docs: remove stray CONFIG_XENDs and configure option from docs.
These were added by 7dbfc2f8b054 "docs: Honour --{en, dis}able-xend when
building docs" between v1 and the (eventually committed) v2 of 9e8672f1c36d
"tools: remove xend and associated python modules" and were missed when
rebasing for v2.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Tested-by: Olaf Hering <olaf@aepfle.de> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Wed, 9 Apr 2014 14:13:25 +0000 (16:13 +0200)]
x86/AMD: feature masking is unavailable on Fam11
Reported-by: Aravind Gopalakrishnan<aravind.gopalakrishnan@amd.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
The root cause is a wrong
'write_unlock(&pcd_tree_rwlocks[firstbyte])' in function
tmem_try_to_evict_pgp().
Nobody will lock &pcd_tree_rwlocks if dedup=0, but the write_unlock() will be
executed anyway. This was introduced by commit 38c433d0c711406778aba1ae183a195da98656f0 ("tmem: add page deduplication with
optional compression or trailing-zero-elimination")
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
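The class of bug can be sketched with a counter standing in for the rwlock. This is only an illustration of pairing the unlock with the lock; the actual change to tmem_try_to_evict_pgp() may look different, and write_lock(), write_unlock() and try_evict() here are simplified, hypothetical versions.

```c
#include <assert.h>
#include <stdbool.h>

static int lock_depth;                 /* stand-in for one rwlock */

static void write_lock(void)
{
    lock_depth++;
}

static void write_unlock(void)
{
    assert(lock_depth > 0);            /* unlocking an unheld lock is a bug */
    lock_depth--;
}

static bool try_evict(bool dedup_enabled)
{
    if ( dedup_enabled )
        write_lock();

    /* ... eviction work would happen here ... */

    if ( dedup_enabled )               /* unlock only what was locked */
        write_unlock();

    return true;
}
```

An unconditional write_unlock() would trip the assertion as soon as try_evict(false) is called.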
Bob Liu [Tue, 28 Jan 2014 04:28:32 +0000 (12:28 +0800)]
tmem: reorg the shared pool allocate path
Reorg the code to make it more readable.
Check the return value of shared_pool_join() and drop an unneeded call to
it. Disable creating a shared & persistent pool at an earlier point.
Note that one might be tempted to delay the creation of the pool even
further in the code. That however would break the behavior of the code
- that is if we ended up creating a shared pool and the
'uuid_lo == -1L && uuid_hi == -1L' logic stands we still need to
create a pool - just not shared type.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Tue, 28 Jan 2014 04:28:31 +0000 (12:28 +0800)]
tmem: cleanup: refactor function tmemc_shared_pool_auth()
Make function tmemc_shared_pool_auth() more readable.
Note that the previous check for free being set the first time
'(free == -1)' in the loop is now removed. That is OK because
when we set free the first time ('free = i;') we follow it
immediately with a break to get out of the loop.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Tue, 28 Jan 2014 04:28:28 +0000 (12:28 +0800)]
tmem: remove unneeded parameters from obj destroy path
Parameters "selective" and "no_rebalance" are meaningless in the obj
destroy path, so this patch removes them. No place uses
no_rebalance=1. In the obj_destroy path we always call it with
no_rebalance=0.
Note that this will now free it only if:
obj->last_client == cli_id
Which is OK - even if we allocate a non-shared pool we set by
default the obj->last_client to TMEM_CLI_ID_NULL so even if
the pool is never used, the pool_flush will take care of removing
those.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Tue, 28 Jan 2014 04:28:30 +0000 (12:28 +0800)]
tmem: fix the return value of tmemc_set_var()
tmemc_set_var() calls tmemc_set_var_one() but ignores its return value;
this patch fixes that.
Also rename tmemc_set_var_one() to __tmemc_set_var().
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
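Propagating a helper's return value can be sketched as follows. set_var(), set_var_one(), ALL_CLIENTS and the weight[] array are hypothetical simplifications and do not mirror the tmem control code.

```c
#include <assert.h>
#include <errno.h>

#define NR_CLIENTS   3
#define ALL_CLIENTS  (-1)     /* hypothetical "apply to everyone" selector */

static int weight[NR_CLIENTS];

/* Per-client helper: 0 on success, -errno on failure. */
static int set_var_one(int cli, int val)
{
    if ( cli < 0 || cli >= NR_CLIENTS || val < 0 )
        return -EINVAL;
    weight[cli] = val;
    return 0;
}

/* Wrapper: returns the helper's result instead of discarding it. */
static int set_var(int cli, int val)
{
    int ret = 0;

    if ( cli != ALL_CLIENTS )
        return set_var_one(cli, val);

    for ( int i = 0; i < NR_CLIENTS && ret == 0; i++ )
        ret = set_var_one(i, val);

    return ret;
}
```

A wrapper that always returned 0 would report success even when the helper rejected the value.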
Bob Liu [Wed, 12 Feb 2014 14:43:24 +0000 (22:43 +0800)]
tmem: cleanup the pgp free path
There are several functions related to pgp free, but their relationships are
not clear enough to follow. This patch cleans this up by removing
pgp_delist() and pgp_free_from_inv_list().
The call trace is simple now:
pgp_delist_free()
> pgp_free()
> __pgp_free()
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Bob Liu [Tue, 28 Jan 2014 04:28:24 +0000 (12:28 +0800)]
tmem: cleanup: remove unneeded parameter from pgp_delist()
The parameter "eph_lock" is only needed for function tmem_evict(). Embed the
delist code into tmem_evict() directly so as to drop the eph_lock parameter. With
this change, the eph list lock can also be released a bit earlier.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: A fix for an assertion of 'client->eph_count >= 0' was rolled in]
Bob Liu [Tue, 28 Jan 2014 04:28:23 +0000 (12:28 +0800)]
tmem: bugfix in obj allocate path
There is a potential bug in the obj allocate path. When parallel callers
allocate an obj and insert it into pool->obj_rb_root, an unexpected
obj might be returned (both callers use the same oid).
One caller may continue to write data to objA, but future obj_find() calls
will always return objB.
The root cause is that the allocate path didn't check the return value of
obj_rb_insert(). This patch fixes it and replaces obj_new() with the better name
obj_alloc().
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
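Checking the insert result can be sketched with a linked list in place of the rb-tree. This shows one possible recovery (free the duplicate and return the existing object) and is not necessarily what the patch does; obj_insert() and the list layout are hypothetical, and the example is single-threaded, so the race is only simulated by allocating the same oid twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct obj {
    unsigned long oid;
    struct obj *next;
};

static struct obj *pool_head;     /* stand-in for pool->obj_rb_root */

static struct obj *obj_find(unsigned long oid)
{
    for ( struct obj *o = pool_head; o != NULL; o = o->next )
        if ( o->oid == oid )
            return o;
    return NULL;
}

/* Returns 1 if inserted, 0 if an object with this oid already exists. */
static int obj_insert(struct obj *o)
{
    if ( obj_find(o->oid) != NULL )
        return 0;
    o->next = pool_head;
    pool_head = o;
    return 1;
}

/* Allocate and insert; on a lost race, drop ours and use the winner's. */
static struct obj *obj_alloc(unsigned long oid)
{
    struct obj *o = malloc(sizeof(*o));

    if ( o == NULL )
        return NULL;
    o->oid = oid;
    o->next = NULL;

    if ( !obj_insert(o) )         /* the return value must be checked */
    {
        free(o);
        return obj_find(oid);
    }
    return o;
}
```

Both callers end up with the same object, so data written through one pointer is what a later obj_find() returns.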
This is done so PVH guests can use PHYSDEVOP_pirq_eoi_gmfn_v{1/2}.
Update users of these fields to reflect that they have been moved and
are now also available to other kinds of guests.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Move auto_unmask ahead of the other two fields, to reduce padding.
Don Slutz [Wed, 9 Apr 2014 10:16:00 +0000 (12:16 +0200)]
xentrace: add TRC_HVM_EMUL
This adds a set of trace events that track the setup of various
emulated devices related to timers in domU.
This set is hpet, pit (i8253, i8254), rtc (MC146818), apic (lapic),
and pic (i8259). The pmtimer is not traced since it does not have a
changeable rate.
Signed-off-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
xen/arm32: __cmpxchg_mb should be marked always_inline
Currently __cmpxchg_mb is only marked inline, so the compiler may decide not
to inline it. In that case, the call to __cmpxchg will be inlined but not
optimised, which results in a link failure because of __bad_cmpxchg.
Caught by clang 3.5.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
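The always_inline requirement above can be illustrated with a size-dispatched helper whose default branch calls a "bad size" function. This is a sketch under assumptions, not the arm32 code: the sketch_* names are hypothetical, the compare-and-swap is not actually atomic, and the stand-in for __bad_cmpxchg is defined (it aborts) so that the example links at any optimisation level, whereas the real pattern relies on leaving it undefined.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Stand-in for __bad_cmpxchg. In the real pattern this function has no
 * definition, so any call that survives optimisation breaks the link.
 * Assumption for this sketch: it is defined here and simply aborts.
 */
static void sketch_bad_cmpxchg(void)
{
    abort();
}

/*
 * Size-dispatched helper. The default branch only disappears when `size`
 * is a compile-time constant at the point of use, which requires every
 * wrapper between the caller and this switch to be inlined as well.
 * Illustration only: this is NOT an atomic operation.
 */
static inline __attribute__((always_inline))
unsigned long sketch_cmpxchg(volatile void *ptr, unsigned long old,
                             unsigned long new_val, int size)
{
    unsigned long prev = 0;

    switch (size) {
    case 1: {
        volatile uint8_t *p = ptr;
        prev = *p;
        if (prev == (uint8_t)old)
            *p = (uint8_t)new_val;
        break;
    }
    case 4: {
        volatile uint32_t *p = ptr;
        prev = *p;
        if (prev == (uint32_t)old)
            *p = (uint32_t)new_val;
        break;
    }
    default:
        sketch_bad_cmpxchg();
        break;
    }
    return prev;
}

/*
 * The barrier wrapper is marked always_inline too. With plain `inline`
 * the compiler may emit it out of line, `size` stops being a constant
 * inside it, and the call in the default branch is kept.
 */
static inline __attribute__((always_inline))
unsigned long sketch_cmpxchg_mb(volatile void *ptr, unsigned long old,
                                unsigned long new_val, int size)
{
    unsigned long prev;

    __sync_synchronize();   /* full barrier (GCC/clang builtin) */
    prev = sketch_cmpxchg(ptr, old, new_val, size);
    __sync_synchronize();
    return prev;
}
```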
Ian Campbell [Fri, 4 Apr 2014 13:28:45 +0000 (14:28 +0100)]
tools: implement initial ramdisk support for ARM.
The ramdisk is passed to the kernel as a property in the chosen node of the
device tree. This is somewhat tricky since in order to place the ramdisk and
dtb in ram we first need to know the size of the dtb. So we initially create a
DTB with placeholders for the ramdisk and finalise the value (which doesn't
change the size) once we know where everything is.
Rename libxl__arch_domain_configure to libxl__arch_domain_init_hw_description to
better reflect its use and to be consistent with the new
libxl__arch_domain_finalise_hw_description.
The common xc_dom_build_image() function did not support explicit placement of
the ramdisk, instead passing 0 to xc_dom_alloc_segment, meaning "pick
somewhere". This change instead passes ramdisk_seg.vstart. If nothing has set
vstart then it will be zero because the entire dom struct is zeroed on
allocation in xc_dom_allocate(). Therefore there is no change to the behaviour
on x86. This is also consistent with how other segments (kernel, dtb) are
handled.
Furthermore if the ramdisk has been explicitly placed then xc_dom_build_image()
assumes that it is not to be decompressed (since that would muck up the sizings
used on placement).
With all that I'm able to boot a domain using the current Debian Jessie armhf
installer initrd and have it complete successfully.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Julien Grall <julien.grall@linaro.org>
[ ijc -- s/itherwise/otherwise and dropped bogus emacs magic change ]
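The placeholder-then-finalise approach described in the ramdisk commit above can be sketched as follows. This is an illustration only, not the libxl code: the blob here is just two fixed-width slots rather than a real device tree, the 64-bit slot width is an assumption, and the sketch_* names are hypothetical. Values are stored big-endian, as device tree property cells are.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 64-bit value big-endian. */
static void sketch_put_be64(uint8_t *dst, uint64_t val)
{
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(val >> (56 - 8 * i));
}

/* Read back a big-endian 64-bit value. */
static uint64_t sketch_get_be64(const uint8_t *src)
{
    uint64_t val = 0;

    for (int i = 0; i < 8; i++)
        val = (val << 8) | src[i];
    return val;
}

/* Hypothetical blob: two fixed-width slots for the ramdisk start and end. */
struct sketch_blob {
    uint8_t start[8];
    uint8_t end[8];
};

/* Step 1: emit placeholders so the final blob size is known up front. */
static size_t sketch_blob_init(struct sketch_blob *b)
{
    memset(b, 0, sizeof(*b));
    return sizeof(*b);
}

/*
 * Step 2: once placement has been decided, overwrite the placeholders in
 * place. The slots are fixed width, so the blob size does not change.
 */
static size_t sketch_blob_finalise(struct sketch_blob *b, uint64_t start,
                                   uint64_t size)
{
    sketch_put_be64(b->start, start);
    sketch_put_be64(b->end, start + size);
    return sizeof(*b);
}
```

Because the size returned by the first step is already final, the blob and the ramdisk can be laid out in RAM before the real addresses are written.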
Bob Liu [Wed, 12 Feb 2014 14:43:19 +0000 (22:43 +0800)]
tmem: refactor function do_tmem_op()
Refactor function do_tmem_op() to make it more readable.
Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fixed up tab vs spaces, also removed dead code and added gulped code]
Andrew Cooper [Tue, 8 Apr 2014 10:39:23 +0000 (12:39 +0200)]
atomic: use static inlines instead of macros
This is some coverity-inspired tidying.
Coverity has some grief analysing the call sites of atomic_read(). This is
believed to be a bug in Coverity itself when expanding the nested macros, but
there is no legitimate reason for it to be a macro in the first place.
This patch changes {,_}atomic_{read,set}() from being macros to being static
inline functions, thus gaining some type safety.
One issue which is not immediately obvious is that the non-atomic variants take
their atomic_t at a different level of indirection to the atomic variants.
This is not suitable for _atomic_set() (when used to initialise an atomic_t)
which is converted to take its parameter as a pointer. One callsite of
_atomic_set() is updated, while the other two callsites are updated to
ATOMIC_INIT().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Keir Fraser <keir@xen.org>
[For the arm bits:] Acked-by: Ian Campbell <ian.campbell@citrix.com>
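The shape of the macro-to-static-inline conversion above, including the point about levels of indirection, might look like the sketch below. The sketch_* names are hypothetical and the volatile accesses merely stand in for whatever each architecture really does; this is not the Xen header.

```c
#include <assert.h>

typedef struct { int counter; } sketch_atomic_t;

#define SKETCH_ATOMIC_INIT(i) { (i) }

/* Atomic variants take a pointer to the atomic object. */
static inline int sketch_atomic_read(const sketch_atomic_t *v)
{
    return *(const volatile int *)&v->counter;
}

static inline void sketch_atomic_set(sketch_atomic_t *v, int i)
{
    *(volatile int *)&v->counter = i;
}

/* The non-atomic read takes its argument by value. */
static inline int sketch_plain_read(sketch_atomic_t v)
{
    return v.counter;
}

/*
 * The non-atomic set cannot take its argument by value: as a function it
 * would only modify a local copy. Hence it takes a pointer.
 */
static inline void sketch_plain_set(sketch_atomic_t *v, int i)
{
    v->counter = i;
}
```

With functions instead of macros, passing an `int *` where a `sketch_atomic_t *` is expected is diagnosed by the compiler rather than silently expanded.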
do_tmem_destroy_pool is checking if pools == NULL. But, pools is a fixed
array.
Clang 3.5 will fail to compile xen/common/tmem.c with the following error:
tmem.c:1848:18: error: comparison of array 'client->pools' equal to a null
pointer is always false [-Werror,-Wtautological-pointer-compare]
if ( client->pools == NULL )
print_special() uses the width argument to select both the output format
and the array element size. So by passing 4 it expects an array of uint32_t,
but an array of uint64_t is passed.
So copy and mask the registers to 32 bits.
Signed-off-by: Don Slutz <dslutz@verizon.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
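The copy-and-mask step described above can be sketched as below. The helper name sketch_mask_to_u32 is hypothetical and the code is not taken from xenctx. Without such a copy, a routine that walks the buffer as uint32_t would see each 64-bit register as two separate 32-bit values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy 64-bit register values into a 32-bit array, keeping the low 32 bits. */
static void sketch_mask_to_u32(uint32_t *dst, const uint64_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint32_t)(src[i] & 0xffffffffULL);
}
```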
Don Slutz [Thu, 3 Apr 2014 19:07:04 +0000 (15:07 -0400)]
xenctx: change is_kernel_text() into kernel_addr().
A new enum has been added to allow the caller to determine if this
kernel address is a text or data address. This is currently not
used, but will be in the next patch.
Add both _end and __bss_stop as kernel_end.
Signed-off-by: Don Slutz <dslutz@verizon.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
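One way a boolean is_kernel_text() could turn into an enum-returning kernel_addr() is sketched below. The enum values, the layout structure and its field names are all hypothetical; the commit does not show the actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification; zero keeps "not a kernel address" falsy. */
typedef enum {
    SKETCH_NOT_KERNEL = 0,
    SKETCH_KERNEL_TEXT,
    SKETCH_KERNEL_DATA
} sketch_kaddr_t;

/* Hypothetical symbol-derived layout: text is [stext, etext), and the
 * kernel as a whole ends at kernel_end. */
struct sketch_layout {
    uint64_t stext;
    uint64_t etext;
    uint64_t kernel_end;
};

/* Classify addr as kernel text, kernel data, or neither. */
static sketch_kaddr_t sketch_kernel_addr(const struct sketch_layout *l,
                                         uint64_t addr)
{
    if (addr >= l->stext && addr < l->etext)
        return SKETCH_KERNEL_TEXT;
    if (addr >= l->etext && addr < l->kernel_end)
        return SKETCH_KERNEL_DATA;
    return SKETCH_NOT_KERNEL;
}
```

Callers that only care whether an address belongs to the kernel can keep testing the result for non-zero.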