Roger Pau Monné [Tue, 10 Nov 2015 11:06:28 +0000 (12:06 +0100)]
x86: allow disabling the emulated PIT
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reported-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Roger Pau Monné [Tue, 10 Nov 2015 11:05:35 +0000 (12:05 +0100)]
x86: allow disabling the emulated IOMMU
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Jan Beulich [Tue, 10 Nov 2015 11:03:08 +0000 (12:03 +0100)]
x86/HVM: always intercept #AC and #DB
Both being benign exceptions, and both being possible to get triggered
by exception delivery, this is required to prevent a guest from locking
up a CPU (resulting from no other VM exits occurring once getting into
such a loop).
The specific scenarios:
1) #AC may be raised during exception delivery if the handler is set to
be a ring-3 one by a 32-bit guest, and the stack is misaligned.
This is CVE-2015-5307 / XSA-156.
Reported-by: Benjamin Serebrin <serebrin@google.com>
2) #DB may be raised during exception delivery when a breakpoint got
placed on a data structure involved in delivering the exception. This
can result in an endless loop when a 64-bit guest uses a non-zero IST
for the vector 1 IDT entry, but even without use of IST the time it
takes until a contributory fault would get raised (results depending
on the handler) may be quite long.
This is CVE-2015-8104 / XSA-156.
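As a rough illustration of the fix, assuming a VMX-style exception bitmap in which bit N requests a VM exit for exception vector N, forcing both intercepts could look like the sketch below. The helper name and the standalone bitmap are hypothetical and are not the actual Xen code; only the vector numbers (1 for #DB, 17 for #AC) are architectural.

```c
#include <stdint.h>

/* Architectural x86 exception vectors. */
#define VEC_DB 1u   /* #DB, debug exception */
#define VEC_AC 17u  /* #AC, alignment check */

/*
 * Hypothetical helper: whatever intercepts the caller asked for, always
 * add #DB and #AC, so that a loop of these exceptions raised during
 * exception delivery still produces VM exits.
 */
static uint32_t force_benign_intercepts(uint32_t requested_bitmap)
{
    return requested_bitmap | (1u << VEC_DB) | (1u << VEC_AC);
}
```

The point of the design is that the guest has no way to clear these two bits, whatever else it configures.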
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Paul Durrant [Fri, 6 Nov 2015 14:17:00 +0000 (15:17 +0100)]
x86/hvm: make sure stdvga cache cannot be re-enabled
As soon as the cache is disabled, it will become out-of-sync with the
VGA device model and since no mechanism exists to acquire current VRAM
state from the device model, re-enabling it leads to stale data
being seen by the guest.
The problem was introduced by commit 3bbaaec0 ("x86/hvm: unify stdvga
mmio intercept with standard mmio intercept") and can be seen by
deliberately crashing a Windows guest; the BSOD output is corrupted.
This patch changes the existing 'cache' boolean in hvm_hw_stdvga into a
tri-state enum and only allows the state to move from 'uninitialized' to
'enabled'. Once the cache state becomes 'disabled' it will remain so for
the lifetime of the VM.
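The one-way state machine described above can be sketched as follows. The enum and function names are hypothetical (the message only says that the boolean becomes a tri-state enum); this is not the code of the patch itself.

```c
#include <stdbool.h>

/* Hypothetical names for the three cache states. */
enum cache_state {
    CACHE_UNINITIALIZED,
    CACHE_ENABLED,
    CACHE_DISABLED,
};

/*
 * Enabling only succeeds from 'uninitialized'.
 * Returns true if the cache is enabled afterwards.
 */
static bool cache_try_enable(enum cache_state *s)
{
    if (*s == CACHE_UNINITIALIZED)
        *s = CACHE_ENABLED;
    return *s == CACHE_ENABLED;
}

/* Disabling is always allowed and is permanent. */
static void cache_disable(enum cache_state *s)
{
    *s = CACHE_DISABLED;
}
```

With a plain boolean, a later "enable" could not be told apart from the first one, which is why a third state is needed.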
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Dario Faggioli [Fri, 6 Nov 2015 14:16:38 +0000 (15:16 +0100)]
sched: fix locking of remove_vcpu() in credit1
In fact, csched_vcpu_remove() (i.e., the credit1
implementation of remove_vcpu()) manipulates runqueues,
so holding the runqueue lock is necessary.
However, the vCPU just can't be on the runqueue, when
the function is called. We can therefore ASSERT() that,
and avoid doing any runqueue manipulations (rather than
adding the runqueue locking around it).
Also, while there, *_lock_irq() (for the private lock) is
enough, there is no need to *_lock_irqsave().
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Andrew Cooper [Wed, 4 Nov 2015 16:47:17 +0000 (17:47 +0100)]
x86: cleanup of early cpuid handling
Use register names for variables, rather than their content for leaf 1.
Reduce the number of cpuid instructions issued. Also drop some trailing
whitespace.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Harmandeep Kaur [Wed, 4 Nov 2015 16:46:46 +0000 (17:46 +0100)]
credit: remove cpu argument to __runq_insert()
__runq_insert() takes two arguments, cpu and svc. However,
the cpu argument is redundant because we can get all the
information we need about cpu from svc.
Ross Lagerwall [Mon, 2 Nov 2015 11:17:38 +0000 (11:17 +0000)]
xenconsoled: Remove unexpected daemonize behavior
Previously, xenconsoled's daemonize function would do nothing if its
parent process is init (as it is under systemd but not sysv init).
This is confusing. Instead, always daemonize when asked to, but use the
"interactive" switch when running from the systemd service.
Because a pidfile is only written when daemonizing, drop the pidfile
parameters from the service file (systemd keeps track of the pids
anyway).
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Wed, 4 Nov 2015 12:03:31 +0000 (13:03 +0100)]
xl: avoid (another) uninitialised use of rc in vcpuset()
Rearrange the case when we check the new number of vCPUs
against the number of host pCPUs not to use rc for internal
error reporting. In fact:
- rc was at risk of being used uninitialised;
- rc should only be used for holding libxl error codes.
Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 4 Nov 2015 11:32:57 +0000 (11:32 +0000)]
xl: initialise rc before using it in vcpuset
In 5b725e56 (xl: improve return and exit codes of vcpu related
functions), the return value of libxl_cpu_bitmap_alloc was not stored in
rc anymore. Yet the subsequent fprintf still used that.
Reinstate the original implementation, that is, to store return value of
libxl_cpu_bitmap_alloc in rc before using rc.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Tue, 3 Nov 2015 17:15:58 +0000 (18:15 +0100)]
x86: query for paddr_bits in early_cpu_detect()
It is __read_mostly, so repeatedly writing to it is suboptimal. As the
MTRRs have already been set up, nothing good will come from its value
changing across CPUs.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Andrew Cooper [Tue, 3 Nov 2015 17:15:15 +0000 (18:15 +0100)]
x86/vmx: replace unqualified ud2 instructions with BUG frames
Using new _ASM_BUGFRAME* internals.
A side effect of complicating the ASM statements is that GCC now chooses to
out-of-line the stub functions, resulting in identical copies being present in
all translation units. As with the stac()/clac() stubs, force them always
inline.
No functional change, other than the failure cases, which now produce a
far more clear error message.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Andrew Cooper [Tue, 3 Nov 2015 17:14:49 +0000 (18:14 +0100)]
x86/bug: break out the internals of BUG_FRAME()
To allow bug frames to be created inside existing asm() statements. In
order to do so, the current bugframe positional parameters are altered
to be named parameters, to avoid interactions with the parameters of the
existing asm() statement.
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 3 Nov 2015 17:11:56 +0000 (18:11 +0100)]
x86/HAP: use %pv printk() format where suitable
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 3 Nov 2015 17:07:20 +0000 (18:07 +0100)]
compat: enforce distinguishable file names in symbol table
To make it possible to tell apart the static symbols in files built a
second time for compat guest support, arrange for their source file
names to be prefixed by a suitable path. We can't do this without
explicit .file directives, since gcc has always been stripping paths
from file names handed to the internally generated .file directive.
However, we can leverage __FILE__ if we make sure the second instance
gets compiled out of other than the very directory the wrapper sits in.
Where suitable, remove the long redundant explicit inclusions of
xen/config.h at once.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 3 Nov 2015 17:05:35 +0000 (18:05 +0100)]
symbols: prefix static symbols with their source file names
This requires adjustments to the tool generating the symbol table and
its as well as nm's invocation.
Note: Not warning about duplicate symbols in the EFI case for now, as
a binutils bug causes misnamed file name entries to appear in EFI
binaries' symbol tables when the file name is longer than 18 chars.
(Not doing so also avoids other duplicates getting printed twice.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:24 +0000 (07:56 +0530)]
xl: improve return and exit codes of parse related functions
Turning parsing related functions exit codes towards using the
EXIT_[SUCCESS|FAILURE] constants, instead of arbitrary numbers
or libxl return codes.
- for main_*: arbitrary -> EXIT_SUCCESS|EXIT_FAILURE.
- for internal functions: arbitrary -> 0/1.
Don't touch parse_config_data() which is big enough to deserve its own patch.
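The convention above can be sketched as follows. Both functions are hypothetical stand-ins, not functions from xl: the internal helper returns 0/1, while the main_*-style entry point returns a process exit status.

```c
#include <stdlib.h>

/* Hypothetical internal helper: returns 0 on success, 1 on failure. */
static int parse_positive(const char *s, long *out)
{
    char *end;
    long v = strtol(s, &end, 10);

    /* Reject empty input, trailing garbage and non-positive values. */
    if (end == s || *end != '\0' || v <= 0)
        return 1;
    *out = v;
    return 0;
}

/* Hypothetical main_*-style entry point: returns an exit status. */
static int main_example(int argc, char **argv)
{
    long n;

    if (argc < 2 || parse_positive(argv[1], &n))
        return EXIT_FAILURE;
    return EXIT_SUCCESS;
}
```

Keeping the two conventions separate means that a libxl error code never leaks out as a process exit status.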
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:23 +0000 (07:56 +0530)]
xl: improve return and exit codes of parse related functions
Turning cpupools related functions exit codes towards using the
EXIT_[SUCCESS|FAILURE] constants, instead of arbitrary numbers
or libxl return codes.
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:22 +0000 (07:56 +0530)]
xl: improve return and exit codes of vcpu related functions
Turning vcpu manipulation functions exit codes toward using the
EXIT_[SUCCESS|FAILURE] constants, instead of arbitrary numbers
or libxl return codes.
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:21 +0000 (07:56 +0530)]
xl: improve return and exit codes of scheduling related functions
Turning scheduling related functions exit codes towards using the
EXIT_[SUCCESS|FAILURE] constants, instead of arbitrary numbers
or libxl return codes.
- for main_*: arbitrary -> EXIT_SUCCESS|EXIT_FAILURE.
- for internal functions: arbitrary -> 0/1.
Harmandeep Kaur [Wed, 28 Oct 2015 02:26:20 +0000 (07:56 +0530)]
xl: convert main() exit codes to EXIT_[SUCCESS|FAILURE]
Turning main() function exit codes towards using the EXIT_[SUCCESS|FAILURE]
constants, instead of arbitrary numbers or libxl return codes.
(Also includes a documentation comment in xl.h stating that the xl process
should always return EXIT_FOO, and that main_* can be treated as main(), i.e.
as returning a process exit status and not a function return value.)
Olaf Hering [Thu, 29 Oct 2015 11:02:54 +0000 (11:02 +0000)]
tools/hotplug: xendomains.service conflicts with libvirt
xendomains will manage guests behind libvirt's back:
- libvirt starts a guest
- that guest can be "managed" by libvirt and xl at the same time
- when xendomains runs on shutdown it will save the guest using xl
libvirt does not know about this
- when xendomains runs on boot it will restore the saved guest using xl
libvirt does not know about this, it will just fail to manage the
restored guest
To prevent xendomains from interfering with libvirt add a Conflicts= to
xendomains.service. It will cause libvirt to be stopped if xendomains is
started manually with 'systemctl start'.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Dario Faggioli [Mon, 2 Nov 2015 14:33:19 +0000 (15:33 +0100)]
credit1: on vCPU wakeup, kick away current only if makes sense
In fact, when waking up a vCPU, __runq_tickle() is called
to allow the new vCPU to run on a pCPU (which one, depends
on the relationship between the priority of the new vCPU,
and the ones of the vCPUs that are already running).
If there is no idle processor on which the new vCPU can
run (e.g., because of pinning/affinity), we try to migrate
away the vCPU that is currently running on the new vCPU's
processor (i.e., the processor on which the vCPU is waking
up).
Now, trying to migrate a vCPU has the effect of pushing it
through a
running --> offline --> runnable
transition, which, in turn, has the following negative
effects:
1) Credit1 counts that as a wakeup, and it BOOSTs the
vCPU, even if it is a CPU-bound one, which wouldn't
normally have deserved boosting. This can prevent
legit IO-bound vCPUs from getting hold of the processor
until such spurious boosting expires, hurting the
performance!
2) since the vCPU fails the vcpu_runnable() test
(within the call to csched_schedule() that follows
the wakeup, as a consequence of tickling) the
scheduling rate-limiting mechanism is also fooled,
i.e., the context switch happens even if less than
the minimum execution amount of time passed.
In particular, 1) has been reported to cause the following
issue:
* VM-IO: 1-vCPU pinned to a pCPU, running netperf
* VM-CPU: 1-vCPU pinned to the same pCPU, running a busy
CPU loop
==> Only VM-I/O: throughput is 806.64 Mbps
==> VM-I/O + VM-CPU: throughput is 166.50 Mbps
This patch solves (for the above scenario) the problem
by checking whether or not it makes sense to try to
migrate away the vCPU currently running on the processor.
In fact, if there aren't idle processors where such a vCPU
can execute, attempting the migration is just futile
(harmful, actually!).
With this patch, in the above configuration, results are:
==> Only VM-I/O: throughput is 807.18 Mbps
==> VM-I/O + VM-CPU: throughput is 731.66 Mbps
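The check this patch adds can be sketched roughly as follows, assuming pCPU sets are represented as plain 64-bit masks. The type and function names are hypothetical and stand in for Xen's cpumask handling and the logic inside __runq_tickle().

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per pCPU; a hypothetical stand-in for Xen's cpumask_t. */
typedef uint64_t cpumask;

/*
 * Decide whether the vCPU currently running on 'cpu' (must be < 64)
 * should be asked to migrate when another vCPU wakes up there.
 * Migrating only makes sense if some idle pCPU other than 'cpu' lies
 * within the running vCPU's affinity.
 */
static bool should_migrate_current(cpumask idle, cpumask cur_affinity,
                                   unsigned int cpu)
{
    cpumask candidates = idle & cur_affinity & ~((cpumask)1 << cpu);

    return candidates != 0;
}
```

In the pinned scenario above the candidate set is empty, so the running vCPU is left alone and no spurious wakeup (and boost) is generated.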
Reported-by: Kun Suo <ksuo@uccs.edu> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com> Tested-by: Kun Suo <ksuo@uccs.edu> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Mon, 2 Nov 2015 14:32:48 +0000 (15:32 +0100)]
x86: make compat_iret() domain crash cases distinguishable
Rather than issuing a (mostly) useless separate message, rely on
domain_crash() providing enough data, and leverage the line number
information it prints.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Mon, 2 Nov 2015 14:28:33 +0000 (15:28 +0100)]
libxlu: avoid linker warnings
Recent ld warns about libxenlight.so's dependency libraries not being
available, which can be easily avoided by not just passing the raw
library name on ld's command line.
In the course of checking how things fit together (I originally
suspected the warning to come from the linking of xl) I also noticed a
stray L in SHLIB_libxenguest, which gets removed at once.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Mon, 2 Nov 2015 14:26:40 +0000 (15:26 +0100)]
drop get_xen_guest_handle()
Its use in the tools (and its apparent abuse in the hypervisor) are
long gone.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 21 Oct 2015 15:18:30 +0000 (16:18 +0100)]
libxl: adjust PoD target by memory fudge, too
PoD guests need to balloon at least as far as required by PoD, or risk
crashing. Currently they don't necessarily know what the right value
is, because our memory accounting is (at the very least) confusing.
Apply the memory limit fudge factor to the in-hypervisor PoD memory
target, too. This will increase the size of the guest's PoD cache by
the fudge factor LIBXL_MAXMEM_CONSTANT (currently 1Mby). This ensures
that even with a slightly-off balloon driver, the guest will be
stable even under memory pressure.
There are two call sites of xc_domain_set_pod_target that need fixing:
The one in libxl_set_memory_target is straightforward.
The one in xc_hvm_build_x86.c:setup_guest is more awkward. Simply
setting the PoD target differently does not work because the various
amounts of memory during domain construction no longer match up.
Instead, we adjust the guest memory target in xenstore (but only for
PoD guests).
This introduces a 1Mby discrepancy between the balloon target of a PoD
guest at boot, and the target set by an apparently-equivalent `xl
mem-set' (or similar) later. This approach is low-risk for a security
fix but we need to fix this up properly in xen.git#staging and
probably also in stable trees.
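The arithmetic behind the adjustment can be sketched as below. The helper and constant names are hypothetical; the fudge is taken to be 1 MiB as stated above and expressed in KiB, and the conversion to 4 KiB pages is an assumption about how the target is handed to the hypervisor.

```c
#include <stdint.h>

/* Assumed: the 1 MiB fudge from the message, expressed in KiB. */
#define MAXMEM_FUDGE_KB 1024u
#define PAGE_SIZE_KB    4u

/* Hypothetical helper: PoD target in 4 KiB pages, fudge included. */
static uint64_t pod_target_pages(uint64_t target_memkb)
{
    return (target_memkb + MAXMEM_FUDGE_KB) / PAGE_SIZE_KB;
}
```

For a 1 GiB target this adds 256 pages to the PoD cache, which is the slack a slightly-off balloon driver is allowed.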
Jan Beulich [Thu, 29 Oct 2015 12:37:19 +0000 (13:37 +0100)]
x86: rate-limit logging in do_xen{oprof,pmu}_op()
Some of the sub-ops are accessible to all guests, and hence should be
rate-limited. In the xenoprof case, just like for XSA-146, include them
only in debug builds. Since the vPMU code is rather new, allow them to
be always present, but downgrade them to (rate limited) guest messages.
This is CVE-2015-7971 / XSA-152.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 29 Oct 2015 12:36:25 +0000 (13:36 +0100)]
x86/PoD: Eager sweep for zeroed pages
Based on the contents of a guest's physical address space,
p2m_pod_emergency_sweep() could degrade into a linear memcmp() from 0 to
max_gfn, which runs non-preemptibly.
As p2m_pod_emergency_sweep() runs behind the scenes in a number of contexts,
making it preemptible is not feasible.
Instead, a different approach is taken. Recently-populated pages are eagerly
checked for reclamation, which amortises the p2m_pod_emergency_sweep()
operation across each p2m_pod_demand_populate() operation.
Note that in the case that a 2M superpage can't be reclaimed as a superpage,
it is shattered if 4K pages of zeros can be reclaimed. This is unfortunate
but matches the previous behaviour, and is required to avoid regressions
(domain crash from PoD exhaustion) with VMs configured close to the limit.
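The amortisation idea can be illustrated with a small ring of recently populated pages, where each new populate operation checks one older entry for zeros. This is only a sketch under assumptions: the ring, its size and all names are hypothetical, and superpage handling is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_WORDS (4096 / sizeof(uint64_t))

/* Returns true if a 4K page contains only zeros. */
static bool page_is_zero(const uint64_t *page)
{
    for (size_t i = 0; i < PAGE_WORDS; i++)
        if (page[i] != 0)
            return false;
    return true;
}

/* Hypothetical history of recently populated pages. */
#define RECENT_SLOTS 8

struct recent_ring {
    const uint64_t *pages[RECENT_SLOTS];
    unsigned int next;
};

/*
 * Record a newly populated page and eagerly check the page that the
 * slot held before. Returns that older page if it is all zeros (a
 * reclaim candidate), or NULL otherwise.
 */
static const uint64_t *record_and_check(struct recent_ring *r,
                                        const uint64_t *newly_populated)
{
    const uint64_t *old = r->pages[r->next];

    r->pages[r->next] = newly_populated;
    r->next = (r->next + 1) % RECENT_SLOTS;

    return (old && page_is_zero(old)) ? old : NULL;
}
```

Each populate operation then costs at most one page comparison, instead of an occasional unbounded scan up to max_gfn.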
This is CVE-2015-7970 / XSA-150.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Thu, 29 Oct 2015 12:35:07 +0000 (13:35 +0100)]
x86: guard against undue super page PTE creation
When optional super page support got added (commit bd1cd81d64 "x86: PV
support for hugepages"), two adjustments were missed: mod_l2_entry()
needs to consider the PSE and RW bits when deciding whether to use the
fast path, and the PSE bit must not be removed from L2_DISALLOW_MASK
unconditionally.
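The fast path condition can be sketched as below. The flag bit positions are architectural for x86 page table entries, but the helper itself is hypothetical and simplified; it is not the actual mod_l2_entry() logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page table entry flag bits. */
#define PTE_PRESENT (1ull << 0)
#define PTE_RW      (1ull << 1)
#define PTE_PSE     (1ull << 7)

/* Physical address field, bits 12..51. */
#define PTE_ADDR_MASK 0x000ffffffffff000ull

/*
 * Hypothetical check: the fast path (skipping revalidation) may only
 * be taken when the new entry maps the same frame and leaves PRESENT,
 * RW and PSE untouched.
 */
static bool l2e_fast_path_ok(uint64_t old_e, uint64_t new_e)
{
    uint64_t must_match = PTE_ADDR_MASK | PTE_PRESENT | PTE_RW | PTE_PSE;

    return ((old_e ^ new_e) & must_match) == 0;
}
```

Without PSE and RW in the comparison, a guest could turn an already validated entry into a writable superpage mapping without it being re-checked.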
This is CVE-2015-7835 / XSA-148.
Reported-by: "栾尚聪(好风)" <shangcong.lsc@alibaba-inc.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org>
Ian Campbell [Thu, 29 Oct 2015 12:34:17 +0000 (13:34 +0100)]
arm: handle races between relinquish_memory and free_domheap_pages
Primarily this means XENMEM_decrease_reservation from a toolstack
domain.
Unlike x86 we have no requirement right now to queue such pages onto
a separate list, if we hit this race then the other code has already
fully accepted responsibility for freeing this page and therefore
there is no more for relinquish_memory to do.
This is CVE-2015-7814 / XSA-147.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Julien Grall [Thu, 29 Oct 2015 12:31:10 +0000 (13:31 +0100)]
arm: Support hypercall_create_continuation for multicall
Multicall for ARM has been supported since commit f0dbdc6 "xen: arm: fully
implement multicall interface.". However, if a hypercall in a multicall
requires preemption, it will crash the host:
Julien Grall [Thu, 29 Oct 2015 11:24:13 +0000 (12:24 +0100)]
sched-rt: avoid to shadow the variable "svc" in rt_dom_cntl
The variable "svc" is declared twice within rt_dom_cntl. However, the
top declaration could be re-used avoiding re-declaring another time the
variable.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Dario Faggioli <dario.faggioli@citrix.com>
Jan Beulich [Tue, 27 Oct 2015 15:34:29 +0000 (16:34 +0100)]
x86/mm: don't call HVM-only function for PV guests
Somehow I managed to drop the HVM dependency from v2 to v3 of what
became commit 5c23c760a8 ("x86/HVM: correct page dirty marking in
hvm_map_guest_frame_rw()"), obviously breaking migration of PV guests.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Julien Grall [Tue, 27 Oct 2015 13:47:01 +0000 (14:47 +0100)]
mm: unmap page for direct mapped domain on decrease reservation
Direct mapped domain needs to retrieve the exact same underlying
physical page when the region is re-populated.
Currently, when the memory reservation for this domain is decreased, the
request is just ignored and the page stays mapped in the P2M. However,
this makes it more difficult to spot issues where the domain accesses the
region before it has mapped a foreign page there.
What we really care about for a direct mapped domain is not giving the
page back to the allocator. That way, the direct mapping can be restored
when the guest memory region is re-populated.
The rest of the process to remove a page can be safely done. This
also ensures us to stay close to the normal domain memory handling.
At the same time, drop the trailing whitespaces around the code
modified.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Tue, 27 Oct 2015 13:46:12 +0000 (14:46 +0100)]
x86/PV: don't zero-map LDT
This effectively reverts the LDT related part of commit cf6d39f819
("x86/PV: properly populate descriptor tables"), which broke demand
paged LDT handling in guests.
Reported-by: David Vrabel <david.vrabel@citrix.com> Diagnosed-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> Tested-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 27 Oct 2015 10:46:35 +0000 (11:46 +0100)]
x86/mm: only a single instance of gw_page_flags[] is needed
None of its elements depends on GUEST_PAGING_LEVELS.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 27 Oct 2015 10:46:05 +0000 (11:46 +0100)]
x86/mm: build map_domain_gfn() just once
It doesn't depend on GUEST_PAGING_LEVELS. Moving the function to p2m.c
at once allows a bogus #define/#include pair to be removed from
hap/nested_ept.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 27 Oct 2015 10:44:52 +0000 (11:44 +0100)]
x86/mm: override stored file names for multiply built sources
To make it possible to tell apart the static symbols therein, use their
object file names instead of their source ones.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 27 Oct 2015 10:44:20 +0000 (11:44 +0100)]
use clear_domain_page() instead of open coding it
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@citrix.com>
Jan Beulich [Tue, 27 Oct 2015 10:42:04 +0000 (11:42 +0100)]
x86/HVM: correct page dirty marking in hvm_map_guest_frame_rw()
Rather than dirtying a page when establishing a (permanent) mapping,
dirty it when the page gets unmapped, or - if still mapped - on the
final iteration of a save operation (or in other cases where the guest
is paused or already shut down). (Transient mappings continue to get
dirtied upon getting mapped, to avoid the overhead of tracking.)
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Julien Grall [Mon, 26 Oct 2015 12:58:35 +0000 (13:58 +0100)]
x86/mm: pod: use the correct memory flags for alloc_domheap_page{,s}
The last parameter of alloc_domheap_page{s,} contains the memory flags and
not the order of the allocation.
Use 0 for the call in p2m_pod_set_cache_target as it was before 1069d63c5ef2510d08b83b2171af660e5bb18c63 "x86/mm/p2m: use defines for
page sizes". Note that PAGE_ORDER_4K is also equal to 0 so the behavior
stays the same.
For the call in p2m_pod_offline_or_broken_replace we want to allocate
the new page on the same numa node as the previous page. So retrieve the
numa node and pass it in the memory flags.
Ian Jackson [Thu, 22 Oct 2015 15:39:12 +0000 (16:39 +0100)]
libxl: Do not call assert() in signal handlers
assert is not async-signal-safe.
In practice the effect of calling assert there is that if the
assertion fails we might get a secondary crash, or other undesirable
behaviour from stdio (which is how assert usually reports failures).
Mention in a comment in libxl__self_pipe_wakeup that it has to be
async-signal-safe.
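A wakeup routine that stays async-signal-safe can be sketched as follows. This is an illustration under assumptions, not the libxl implementation: the function name is hypothetical, only write() is called, errno is preserved for the interrupted code, and failures are reported through the return value rather than assert().

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical async-signal-safe wakeup for a self-pipe.
 * Returns 0 on success or an errno value on failure.
 */
static int self_pipe_wakeup(int fd)
{
    const char byte = 0;
    int saved_errno = errno;
    int rc = 0;

    for (;;) {
        ssize_t r = write(fd, &byte, 1);

        if (r == 1)
            break;
        if (r < 0 && errno == EINTR)
            continue;
        /* A full non-blocking pipe already has a wakeup pending. */
        if (r < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;
        rc = (r < 0) ? errno : EIO;
        break;
    }

    errno = saved_errno;
    return rc;
}
```

The caller outside signal context can then decide how to report a non-zero result, using stdio if it wants to.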
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 12 Oct 2015 15:39:11 +0000 (16:39 +0100)]
xen/arm: Add support of PSCI v1.0 for the host
From Xen's point of view, PSCI v0.2 and PSCI v1.0 are very similar. All
the PSCI calls used within Xen (PSCI_VERSION, CPU_ON, SYSTEM_OFF and
SYSTEM_RESET) behave exactly the same.
Furthermore, based on the spec (5.3.1 DEN0022C), any 1.y version must be
compatible with 1.x when y > x for any functions existing in 1.x.
So check the presence of the new compatible string [1] and allow Xen to
boot on any platform using PSCI 1.x.
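The compatibility rule can be illustrated on the value returned by PSCI_VERSION, which encodes the major version in bits 31:16 and the minor version in bits 15:0. The policy function below is a hypothetical sketch; the patch itself is described as checking the device tree compatible string.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSCI_VERSION_MAJOR(v) ((uint32_t)(v) >> 16)
#define PSCI_VERSION_MINOR(v) ((uint32_t)(v) & 0xffffu)

/* Hypothetical policy: accept PSCI 0.2 and any 1.x firmware. */
static bool psci_version_supported(uint32_t ver)
{
    uint32_t major = PSCI_VERSION_MAJOR(ver);
    uint32_t minor = PSCI_VERSION_MINOR(ver);

    return (major == 0 && minor == 2) || major == 1;
}
```

Accepting every 1.x relies on the backwards compatibility guarantee quoted above for the calls Xen uses.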
We are currently using a per-platform quirk to know if the two 4KB regions of
the GIC CPU interface are each aligned to 64KB. However, it may be
possible to have different layouts on the same platform (depending on the
firmware version).
Rather than having a quirk, it's possible to detect this by reading the GIC
memory. This patch is based on the Linux commit "irqchip/GIC: Add workaround
for aliased GIC400" [1].
Take the opportunity to clean up the GICv2 code which was only
required because of the quirk.
Note that none of the platforms using the gic-hip04 were actually using
the quirk, so the code has been dropped. I will let the maintainers
decide whether it's relevant or not to add proper detection for aliased
GIC for this hardware.
The GICv2 architecture mandates that the two 4kB GIC regions are
contiguous, and on two separate physical pages (so that access to
the second page can be trapped by a hypervisor). This doesn't work
very well when PAGE_SIZE is 64kB.
A relatively common hack^Wway to work around this is to alias each
4kB region over its own 64kB page. Of course in this case, the base
address you want to use is not really the beginning of the region,
but base + 60kB (so that you get a contiguous 8kB region over two
distinct pages).
Normally, this would be described in DT with a new property, but
some HW is already out there, and the firmware makes sure that
it will override whatever you put in the GIC node. Duh. And of course,
said firmware source code is not available, despite being based
on u-boot.
The workaround is to detect the case where the CPU interface size
is set to 128kB, and verify the aliasing by checking that the ID
register for GIC400 (which is the only GIC wired this way so far)
is the same at base and base + 0xF000. In this case, we update
the GIC base address and let it roll.
And if you feel slightly sick by looking at this, rest assured that
I do too...
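The detection described above can be sketched as follows. The register accessor, the helper name and the simulated hardware are hypothetical; the sketch only compares the ID register at the two addresses, whereas a real implementation would presumably also check that the value identifies a GIC400.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_128K       0x20000u
#define GIC_ALIAS_OFF 0xF000u
#define GICC_IIDR     0xFCu   /* CPU interface identification register */

/* Hypothetical accessor: reads a 32-bit register at an address. */
typedef uint32_t (*read32_fn)(uint64_t paddr);

/*
 * Returns the CPU interface base to use. If the region is 128kB and
 * the ID register reads the same at base and base + 0xF000, the 4kB
 * frames are assumed to be aliased over 64kB pages and the base is
 * moved up so that 8kB are contiguous across two distinct pages.
 */
static uint64_t gic_fixup_cpu_base(uint64_t base, uint64_t size,
                                   read32_fn rd)
{
    if (size != SZ_128K)
        return base;
    if (rd(base + GICC_IIDR) != rd(base + GIC_ALIAS_OFF + GICC_IIDR))
        return base;
    return base + GIC_ALIAS_OFF;
}

/*
 * Simulated aliased hardware for illustration: each 4kB frame repeats
 * across its 64kB page, so only the low 12 bits select a register.
 * The ID value returned is illustrative.
 */
static uint32_t fake_aliased_read(uint64_t paddr)
{
    return ((paddr & 0xFFFu) == GICC_IIDR) ? 0x0202043Bu : 0;
}
```

For example, gic_fixup_cpu_base(0x10000, SZ_128K, fake_aliased_read) moves the base to 0x1F000, while an 8kB region is left where it is.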
Reported-by: Julien Grall <julien.grall@citrix.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: Stuart Yoder <stuart.yoder@freescale.com> Cc: Pavel Fedin <p.fedin@samsung.com> Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/1442142873-20213-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Thu, 8 Oct 2015 18:23:52 +0000 (19:23 +0100)]
xen/arm: gic: Check the size of the CPU and vCPU interface retrieved from DT
The size of the CPU interface will be used in a follow-up patch to map the
region in Xen memory.
Based on the GICv2 spec, the CPU interface should be at least 8KB, although
most of the platforms we support incorrectly use the GICv1 size
(i.e. 4KB) in their DT. Only warn and update the size to avoid any
breakage on these platforms.
Furthermore, Xen is relying on the fact that the Virtual CPU interface
is at least 8KB. As in reality the Virtual CPU interface matches the CPU
interface, check that the 2 interfaces have the same size.
For GICv3, vGICv2 is only available for guests. So we only need to check
that the GICV is at least 8KB.
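The "warn and update" behaviour for the CPU interface size can be sketched as below. The helper name and the message text are hypothetical, and the sketch prints to stderr where Xen would use its own logging.

```c
#include <stdint.h>
#include <stdio.h>

#define SZ_8K 0x2000u

/*
 * Hypothetical helper: warn about an undersized CPU interface region
 * taken from the device tree and round it up to 8KB rather than fail.
 */
static uint64_t gicc_fixup_size(uint64_t dt_size)
{
    if (dt_size < SZ_8K) {
        fprintf(stderr,
                "GICC size 0x%llx from DT is too small, using 8KB\n",
                (unsigned long long)dt_size);
        return SZ_8K;
    }
    return dt_size;
}
```

Correcting the size instead of rejecting it keeps platforms with a GICv1-style 4KB entry in their DT booting.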
Julien Grall [Thu, 8 Oct 2015 18:23:51 +0000 (19:23 +0100)]
xen/arm: vgic-v2: Report the correct GICC size to the guest
The GICv2 DT node is usually used by the guest to know the address/size
of the regions (GICD, GICC...) to map into their virtual memory.
While the GICv2 spec requires the size of the GICC to be 8KB, we
correctly do an 8KB stage-2 mapping but erroneously report 256 in the
device tree (based on GUEST_GICC_SIZE).
I bet we didn't see any issue so far because all the registers except
GICC_DIR live in the first 256 bytes of the GICC region and all the
guests I have seen so far are driving the GIC with GICC_CTLR.EIOmode =
0.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- fixed some typos in commit message ]
Wei Liu [Tue, 6 Oct 2015 16:57:26 +0000 (17:57 +0100)]
tools/python: remove broken xl binding
Various people say this binding doesn't compile or doesn't work. Remove
it for the benefit of xl feature development -- so that new features
won't need to worry about making this broken binding happy.
This isn't going to expose any user visible changes because that module
is not built by default.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Zhigang Wang <zhigang.x.wang@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 19 Oct 2015 12:58:00 +0000 (13:58 +0100)]
xen/device-tree: Print the DT path on error in dt_for_each_range
With the current log it is not possible for the user to understand
the error properly:
(XEN) Grant table range: 0x0000007fc00000-0x0000007fc72000
(XEN) DT: no ranges; cannot enumerate
(XEN) Device tree generation failed (-22).
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Could not set up DOM0 guest OS
(XEN) ****************************************
(XEN)
The other error messages within the function already print the DT path.
Do the same here.
Juergen Gross [Thu, 8 Oct 2015 15:23:47 +0000 (17:23 +0200)]
libxc: remove superpages option for pv domains
The pv domain builder currently supports the additional flag
"superpages" to build a pv domain with 2MB pages. This feature isn't
being used by any component other than the python xc bindings.
Remove the flag and its support from the xc bindings and the domain
builder.
Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
He Chen [Mon, 19 Oct 2015 07:31:55 +0000 (15:31 +0800)]
tools & docs: add tools and docs support for Intel CDP
These are the xl/xc changes to support Intel Code/Data Prioritization.
CAT xl commands to set/get CBMs are extended to support CDP.
Add new CDP options with CAT commands in xl interface man page.
Add description of CDP in xl-psr.markdown.
Signed-off-by: He Chen <he.chen@linux.intel.com> Reviewed-by: Chao Peng <chao.p.peng@linux.intel.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Ian Campbell [Fri, 16 Oct 2015 10:33:12 +0000 (11:33 +0100)]
tools: libxl: CODING_STYLE: GC* cannot be used with NOGC
GC* assume an existing gc in scope, which means they can't be passed
NOGC. Instead recommend the use of the underlying functions with NOGC,
noting that this is exceptional.
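
The restriction can be illustrated with a much-simplified sketch. All names here (demo_gc, demo_calloc, DEMO_GCNEW) are hypothetical stand-ins rather than libxl's real definitions, and a NULL pointer is used as the "no gc" marker purely as an assumption of this example. The point is that a convenience macro which implicitly names a variable called gc has no parameter through which a different gc could be passed.

```c
#include <stdlib.h>

/* Hypothetical, simplified garbage-collection context. */
typedef struct demo_gc {
    int tracked;  /* number of allocations registered with this gc */
} demo_gc;

/* Underlying function: the gc is an explicit argument (NULL = untracked). */
void *demo_calloc(demo_gc *gc_opt, size_t nmemb, size_t size)
{
    void *p = calloc(nmemb, size);

    if (p != NULL && gc_opt != NULL)
        gc_opt->tracked++;
    return p;
}

/* Convenience macro: silently uses whatever `gc` is in scope. */
#define DEMO_GCNEW(var) ((var) = demo_calloc(gc, 1, sizeof(*(var))))

/* Allocates one tracked and one untracked int; returns 0 on success. */
int demo_use(demo_gc *gc, int **tracked_out, int **untracked_out)
{
    /* Tracked: the macro picks up the `gc` parameter implicitly. */
    DEMO_GCNEW(*tracked_out);

    /* Untracked: the macro cannot express this, so call the function. */
    *untracked_out = demo_calloc(NULL, 1, sizeof(**untracked_out));

    return (*tracked_out != NULL && *untracked_out != NULL) ? 0 : -1;
}
```

In this sketch the caller remains responsible for freeing both allocations, since the toy gc only counts them.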
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
[ ijc -- refer to libxl__calloc not (nonexistent) libxl__alloc ]
Jan Beulich [Wed, 21 Oct 2015 08:56:31 +0000 (10:56 +0200)]
x86/shadow: drop stray name tags from sh_{guest_get,map}_eff_l1e()
They (as a comment now being removed validly says) depend only on Xen's
number of page table levels, and hence their tags didn't serve any
useful purpose (there could only ever be one instance in a single
binary, even back in the x86-32 days).
Further conditionalize the inclusion of PV-specific hook pointers, at
once making sure that PV guests can't ever get other than 4-level mode
enabled for them.
For consistency reasons shadow_{write,cmpxchg}_guest_entry() also get
moved next to the other PV-only actors, allowing them to become static
just like the $subject ones do.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org>
Jan Beulich [Wed, 21 Oct 2015 08:53:35 +0000 (10:53 +0200)]
x86/HVM: prefix both instances of enable_intr_window()
... to tell them apart by their names even without further context.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Wed, 21 Oct 2015 08:52:28 +0000 (10:52 +0200)]
x86: don't build platform hypercall helpers multiple times
... to eliminate the resulting duplicate symbols. This includes
dropping an odd per-CPU variable left from 32-bit days: Now that we
only care about 64-bit builds, converting the uint64_t that needs
passing into a void pointer is no problem anymore.
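
As a rough illustration of why this is unproblematic on 64-bit builds, the following sketch round-trips a 64-bit value through a void pointer. The helper names are hypothetical and do not come from the patch; the static assertion encodes the assumption that pointers are 64 bits wide.

```c
#include <stdint.h>

/* The round trip is only lossless when pointers are 64 bits wide. */
_Static_assert(sizeof(void *) == sizeof(uint64_t), "64-bit build assumed");

/* Pack a 64-bit value into a pointer-sized argument. */
static inline void *demo_u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}

/* Recover the 64-bit value on the receiving side. */
static inline uint64_t demo_ptr_to_u64(void *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Hypothetical helper taking a generic void * argument, as callbacks do. */
long demo_helper(void *data)
{
    /* Return the low byte of the value carried in the pointer. */
    return (long)(demo_ptr_to_u64(data) & 0xff);
}
```

On a 32-bit build the static assertion would fail, which matches the reason a separate variable was needed in those days: the value did not fit into a pointer.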
Since the COMPAT handling section needs to be re-organized for this
anyway, also adjust a few other shortcomings (like declarations not
being visible at the point of the respective definition, risking both
to get out of sync).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Kai Huang [Wed, 21 Oct 2015 08:49:54 +0000 (10:49 +0200)]
x86/vmx: fix coding style of PML functions
According to Jan's comments, also fix the coding style of for_each_vcpu in
existing PML functions.
Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Kai Huang [Wed, 21 Oct 2015 08:49:16 +0000 (10:49 +0200)]
x86/ept: defer enabling of EPT A/D bit until PML get enabled
The existing PML implementation turns on the EPT A/D bit unconditionally if PML
is supported by hardware. This works, but enabling the EPT A/D bit can be
deferred until PML gets enabled. There's no point in enabling the extra feature
for every domain when we're not meaning to use it (yet).
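
The change in ordering can be sketched with a toy model. The structure and function names below are hypothetical and do not mirror Xen's data structures; the paused check stands in for the assertion on the domain being paused, returning an error instead so the sketch stays testable.

```c
#include <stdbool.h>

/* Hypothetical per-domain state; not Xen's own structures. */
struct demo_domain {
    bool hw_has_pml;      /* hardware supports PML */
    bool paused;          /* domain is currently paused */
    bool pml_enabled;
    bool ept_ad_enabled;  /* EPT accessed/dirty tracking switched on */
};

void demo_domain_init(struct demo_domain *d, bool hw_has_pml)
{
    d->hw_has_pml = hw_has_pml;
    d->paused = false;
    d->pml_enabled = false;
    /* Deferred: previously this would have followed hw_has_pml. */
    d->ept_ad_enabled = false;
}

/* Returns 0 on success, -1 if PML is unsupported or the domain runs. */
int demo_enable_pml(struct demo_domain *d)
{
    if (!d->hw_has_pml || !d->paused)
        return -1;

    /* A/D tracking is only switched on once PML is really wanted. */
    d->ept_ad_enabled = true;
    d->pml_enabled = true;
    return 0;
}
```

Domains that never use PML therefore never pay for A/D tracking in this model.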
Also add an ASSERT that the domain has been paused to ept_flush_pml_buffers, to
make it consistent with ept_enable{disable}_pml.
Sanity live migration and GUI display were tested on a Broadwell machine.
Suggested-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Kai Huang <kai.huang@linux.intel.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Fri, 16 Oct 2015 15:49:51 +0000 (17:49 +0200)]
x86/Centaur: drop __init annotations
Commit 6f8f53cc64 ("x86 cpu: Fix bug: unify cpu_dev attr as
__cpuinitdata") fixed centaur_cpu_dev's annotation without also fixing
the pointers hanging off of it. Even if CPU hotplug support may be
purely theoretical for Centaur, we should still not leave this as is.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>