Andrew Cooper [Sat, 7 Feb 2015 13:54:47 +0000 (13:54 +0000)]
x86/shadow: Change the gating of shadow heuristics
Each of these functions will have their vcpu parameters replaced with domain
parameters because they are part of domain-generic rather than vcpu-specific
codepaths, which means that the use of 'v' will have to change. 'current' can
be used to obtain a vcpu when in an appropriate context.
The 'curr->domain == d' test is less restrictive than 'v == current'. The end
result is still safe, as the code still only runs in the context of the correct
domain, but it is now valid to run in cases where previously 'v' was some other
vcpu in the same domain.
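To make the difference between the two gating tests concrete, here is a minimal C sketch. The structures are heavily simplified stand-ins rather than Xen's real struct vcpu and struct domain, and 'current' is modelled as a plain global named current_vcpu (in Xen it is a per-CPU macro); all names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for Xen's structures. */
struct domain { int domain_id; };
struct vcpu   { int vcpu_id; struct domain *domain; };

/* Stand-in for Xen's 'current' (the vcpu running on this pcpu). */
static struct vcpu *current_vcpu;

/* Old gating: only when the vcpu passed in is the one running right now. */
static bool heuristic_allowed_old(const struct vcpu *v)
{
    return v == current_vcpu;
}

/* New gating: whenever the running vcpu belongs to the domain being changed. */
static bool heuristic_allowed_new(const struct domain *d)
{
    const struct vcpu *curr = current_vcpu;

    return curr != NULL && curr->domain == d;
}
```

With current_vcpu pointing at vcpu 0 of a domain, the old test rejects a request made on behalf of vcpu 1 of that same domain, while the new test accepts it.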
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Julien Grall [Fri, 30 Jan 2015 18:49:18 +0000 (18:49 +0000)]
xen/dt: Extend dt_device_match to possibly store data
Some drivers may want to configure the device differently depending on
the compatible string. For this purpose, add a new field to
dt_device_match to store the data.
Also modify the return type of dt_match_node to return the matching
structure.
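A minimal sketch of the idea, assuming a simplified match table keyed only on the compatible string. The struct name dt_match_entry, the function match_compatible() and the "vendor,uart-*" strings are hypothetical; Xen's real dt_device_match and dt_match_node() differ in detail.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified match entry with the new per-compatible data. */
struct dt_match_entry {
    const char *compatible;
    const void *data;          /* per-compatible driver data */
};

/* Hypothetical per-variant configuration a driver might need. */
struct uart_variant { unsigned int reg_shift; };

static const struct uart_variant variant_a = { 0 };
static const struct uart_variant variant_b = { 2 };

static const struct dt_match_entry uart_ids[] = {
    { "vendor,uart-a", &variant_a },   /* hypothetical compatible strings */
    { "vendor,uart-b", &variant_b },
    { NULL, NULL }                     /* sentinel */
};

/* Return the matching entry (not just "matched or not") so callers can use ->data. */
static const struct dt_match_entry *
match_compatible(const struct dt_match_entry *table, const char *compat)
{
    for (; table->compatible != NULL; table++)
        if (strcmp(table->compatible, compat) == 0)
            return table;
    return NULL;
}
```

Returning the entry instead of a boolean is what lets a driver pick its per-variant configuration in one lookup.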
Julien Grall [Mon, 16 Feb 2015 14:50:55 +0000 (14:50 +0000)]
xen/arm: gic-v3: Update some comments in the code
- Drop the wrong comment about the default stride. It's not always 2 * SZ_64K:
when the re-distributor supports VLPIs (from GICv4), the default
stride is 4 * SZ_64K.
- Explain why SZ_64K * 2
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 16 Feb 2015 14:50:54 +0000 (14:50 +0000)]
xen/arm: vgic: Drop iactive, ipend, pendsgi field
The current VGIC code doesn't support changing the pending and active status
of an IRQ via the (re-)distributor.
Furthermore, not all access sizes were supported correctly, and some
registers were implemented as write-ignore. The latter makes it very
difficult for a kernel developer to find out that we don't support R/W to
those registers.
Make the support consistent:
- read will return 0 (RAZ)
- write will print an error and inject a data abort into the guest
Also, those fields were never set, and fields such as ipend and pendsgi were
doing the same job.
Rather than wasting memory, drop them. They can be re-introduced if needed
when support is added.
All the GICv2 registers are word-accessible. Some of them are also
byte-accessible (see GICD_IPRIORITYR*).
Those registers are incorrectly implemented when they should be RAZ: only
word-sized accesses are currently allowed for them.
To avoid further issues, introduce different labels according to the access
size of the registers:
- read_as_zero_32 and write_ignore_32: Used for registers accessible
via a word.
- read_as_zero: Used when we don't have to check the access size.
The latter is used when the access size has already been checked in the
register emulation and/or when the register offset is reserved/implementation
defined.
Note that only the labels actually used have been introduced.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 16 Feb 2015 14:50:49 +0000 (14:50 +0000)]
xen/arm: vgic-v3: Clarify which distributor is used in the common emulation
The messages in the common emulation don't specify which distributor
(redistributor or distributor) is used. This makes it difficult to find the
correct registers.
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 16 Feb 2015 14:50:48 +0000 (14:50 +0000)]
xen/arm: vgic-v3: Emulate correctly the re-distributor
There is a one-to-one mapping between re-distributors and processors.
Each re-distributor can be accessed by any processor at any time. For
instance, during the initialization of the GIC, the driver will browse
the re-distributors to find the one associated with the current processor
(via GICR_TYPER). So each re-distributor has its own MMIO region.
The current implementation of the vGICv3 emulation assumes that the
re-distributor region is banked. Therefore, a processor won't be able
to access another re-distributor. While this works fine for Linux (a
processor will only access GICR_TYPER to find its associated re-distributor),
we have to emulate the re-distributor correctly in order to boot
other operating systems.
All emulated registers of the re-distributors take a vCPU as a parameter
and take the necessary lock. Therefore concurrent access is already properly
handled. The missing piece is retrieving the right vCPU for the region accessed.
Retrieving the right vCPU could be slow, so it has been divided into 2 paths:
- fast path: The current vCPU is accessing its own re-distributor
- slow path: The current vCPU is accessing another re-distributor
As the processor needs to initialize itself, the former case is very
common. To handle the access quickly, the base address of the
re-distributor is computed and stored per-vCPU during the vCPU initialization.
The latter is less common and more complicated to handle. The re-distributors
can be spread across multiple regions in memory.
During domain creation, Xen will browse those regions to find the first
vCPU handled by each region.
When an access hits the slow path, Xen will:
1) Retrieve the region using the base address of the re-distributor
accessed
2) Find the vCPU ID attached to the redistributor
3) Check the validity of the vCPU. If it's not valid, a data abort
will be injected into the guest
Finally, this patch also correctly supports the GICR_TYPER.LAST bit, which
indicates whether the re-distributor is the last one of the contiguous region.
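The address arithmetic behind the two paths can be sketched as follows. The region and layout structures, and all function names, are hypothetical; the sketch only assumes what the text describes: regions of contiguous re-distributors, a fixed stride, and a first vCPU ID recorded per region at domain creation. Also marking the domain's final vCPU as LAST is an assumption of this sketch, since no re-distributor follows it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one contiguous re-distributor region. */
struct rdist_region {
    uint64_t base;            /* guest physical base address */
    uint64_t size;            /* region size in bytes */
    unsigned int first_cpu;   /* vCPU ID of the first re-distributor here */
};

struct vrdist_layout {
    const struct rdist_region *regions;
    size_t nr_regions;
    uint64_t stride;          /* bytes per re-distributor */
    unsigned int max_vcpus;
};

/* Fast-path helper: computed once per vCPU at initialisation time. */
static uint64_t rdist_base_of(const struct rdist_region *r, uint64_t stride,
                              unsigned int vcpu_id)
{
    return r->base + (uint64_t)(vcpu_id - r->first_cpu) * stride;
}

/*
 * Slow path: map an MMIO address to the vCPU whose re-distributor it hits.
 * Returns -1 if no valid vCPU matches (the caller would inject a data abort).
 */
static int rdist_addr_to_vcpu(const struct vrdist_layout *l, uint64_t addr)
{
    for (size_t i = 0; i < l->nr_regions; i++) {
        const struct rdist_region *r = &l->regions[i];

        if (addr < r->base || addr - r->base >= r->size)
            continue;                                  /* 1) find the region */

        uint64_t id = r->first_cpu + (addr - r->base) / l->stride; /* 2) ID */

        return id < l->max_vcpus ? (int)id : -1;       /* 3) validate */
    }
    return -1;
}

/* GICR_TYPER.Last: no further re-distributor follows in this region. */
static bool rdist_is_last(const struct rdist_region *r, uint64_t stride,
                          unsigned int vcpu_id, unsigned int max_vcpus)
{
    unsigned int last_in_region =
        r->first_cpu + (unsigned int)(r->size / stride) - 1;

    return vcpu_id == last_in_region || vcpu_id == max_vcpus - 1;
}
```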
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 16 Feb 2015 14:50:46 +0000 (14:50 +0000)]
xen/arm: vgic-v3: Set stride during domain initialization
The stride may not be set if the hardware GIC is using the default
layout. This happens on the Foundation model.
On GICv3, the default stride is 2 * 64K. Therefore it's possible to avoid
checking at every redistributor MMIO access whether the stride is set.
Because domU uses a static stride configuration, this only happens for
dom0, so we can move this code into gicv_v3_init. Take the opportunity to move
the stride setting a bit earlier, because the loop to set regions will require
the stride.
Also, use 2 * 64K rather than 128K and explain the reason.
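A minimal sketch of resolving the stride once at domain initialisation, assuming a hypothetical per-domain structure. Each GICv3 re-distributor has two 64K frames (RD_base and SGI_base), which is why the default is written as 2 * SZ_64K; a re-distributor supporting VLPIs uses 4 * SZ_64K and is not modelled here.

```c
#include <stdint.h>

#define SZ_64K 0x10000u

/* Two 64K frames per GICv3 re-distributor: RD_base and SGI_base. */
#define GICV3_DEFAULT_RDIST_STRIDE (2 * SZ_64K)

struct vgic_v3_cfg {            /* hypothetical per-domain state */
    uint32_t rdist_stride;
};

/*
 * Resolve the stride once, at domain initialisation. A host value of 0
 * means no stride was described (default layout), so fall back to the
 * GICv3 default. MMIO handlers can then use rdist_stride unconditionally.
 */
static void vgic_v3_init_stride(struct vgic_v3_cfg *cfg, uint32_t host_stride)
{
    cfg->rdist_stride = host_stride ? host_stride : GICV3_DEFAULT_RDIST_STRIDE;
}
```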
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Some of the registers are accessible via multiple sizes (see GICD_IPRIORITYR*).
Those registers are incorrectly implemented when they should be RAZ: only
word-sized accesses are currently allowed for them.
Section 5.3.1 of the GICv3 spec (PRD03-GENC-010745 24.0) lists
the access sizes supported for each register.
The current vGICv3 driver is not ready for 32-bit guests and will
require some rework. So, for now, only support the access sizes of a system
that does not support AArch32.
To avoid further issues, introduce different labels according to the access
size of the registers:
- read_as_zero_64 and write_ignore_64: Used for registers accessible
via a double-word.
- read_as_zero_32 and write_ignore_32: Used for registers accessible
via a word.
- read_as_zero: Used when we don't have to check the access size.
The latter is used when the access size has already been checked in the
register emulation and/or when the register offset is
reserved/implementation defined.
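The label scheme can be illustrated with a small read handler. The access-size encoding, register offsets and function name are hypothetical, and returning false stands in for injecting a data abort; this is a sketch of the pattern, not the vGICv3 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical access-size encoding (log2 of the size in bytes). */
enum access_size { ACC_BYTE = 0, ACC_HALF = 1, ACC_WORD = 2, ACC_DWORD = 3 };

/* Hypothetical register offsets, for illustration only. */
enum { REG_RAZ_64 = 0x00, REG_RAZ_32 = 0x08, REG_RESERVED = 0x10 };

/*
 * Returns true if the read was emulated; false means "bad access width" or
 * "unknown register", for which the caller would inject a data abort.
 */
static bool emulate_read(uint32_t offset, enum access_size size, uint64_t *r)
{
    switch (offset) {
    case REG_RAZ_64:   goto read_as_zero_64;
    case REG_RAZ_32:   goto read_as_zero_32;
    case REG_RESERVED: goto read_as_zero;      /* any width accepted */
    default:           return false;
    }

read_as_zero_64:
    if (size != ACC_DWORD)
        return false;
    *r = 0;
    return true;

read_as_zero_32:
    if (size != ACC_WORD)
        return false;
    /* fall through: width already validated */
read_as_zero:
    *r = 0;
    return true;
}
```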
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 16 Feb 2015 14:50:42 +0000 (14:50 +0000)]
xen/arm: vgic-v3: Correctly set GICD_TYPER.CPUNumber
On GICv3, the value (CPUNumber + 1) indicates the number of processors that may
be used as interrupt targets when the ARE bit is zero. The maximum is 8
processors.
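A sketch of the encoding, assuming CPUNumber sits in bits [7:5] of GICD_TYPER as in the GIC architecture; the helper name is hypothetical and nr_vcpus is assumed to be at least 1.

```c
#include <stdint.h>

/* CPUNumber occupies bits [7:5] of GICD_TYPER. */
#define GICD_TYPER_CPUNUMBER_SHIFT 5
#define GICD_TYPER_CPUNUMBER_MASK  (0x7u << GICD_TYPER_CPUNUMBER_SHIFT)

/*
 * Encode CPUNumber for a guest with nr_vcpus vCPUs (nr_vcpus >= 1).
 * The field holds (targets - 1), and at most 8 targets can be described.
 */
static uint32_t gicd_typer_cpunumber(unsigned int nr_vcpus)
{
    unsigned int ncpus = nr_vcpus < 8 ? nr_vcpus : 8;

    return ((uint32_t)(ncpus - 1) << GICD_TYPER_CPUNUMBER_SHIFT) &
           GICD_TYPER_CPUNUMBER_MASK;
}
```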
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Mon, 16 Feb 2015 14:50:41 +0000 (14:50 +0000)]
xen/arm: vgic-v3: Correctly set GICD_TYPER.IDbits
Since Linux 3.19, the GICv3 driver uses GICD_TYPER.IDbits to check
the validity of the hardware interrupt number.
The IDbits field of GICD_TYPER indicates the number of
interrupt identifiers (SPIs, PPIs, SGIs, LPIs) supported by the GIC Stream
Protocol Interface.
This field contains the number of interrupt identifier bits minus one.
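A sketch of computing the field, assuming IDbits sits in bits [23:19] of GICD_TYPER. How many interrupt IDs the guest exposes (for example SPIs plus the 32 SGIs/PPIs) is taken as an input here; the helper names are hypothetical.

```c
#include <stdint.h>

/* IDbits occupies bits [23:19] of GICD_TYPER. */
#define GICD_TYPER_IDBITS_SHIFT 19

/* Smallest n such that 2^n >= nr_ids, i.e. the bits needed for IDs 0..nr_ids-1. */
static unsigned int id_bits_needed(uint32_t nr_ids)
{
    unsigned int bits = 0;

    while (bits < 32 && ((uint64_t)1 << bits) < nr_ids)
        bits++;
    return bits;
}

/* Encode IDbits for nr_ids interrupt IDs (nr_ids >= 2): the field is "bits - 1". */
static uint32_t gicd_typer_idbits(uint32_t nr_ids)
{
    return (uint32_t)(id_bits_needed(nr_ids) - 1) << GICD_TYPER_IDBITS_SHIFT;
}
```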
Signed-off-by: Julien Grall <julien.grall@linaro.org> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Thu, 19 Feb 2015 12:43:57 +0000 (12:43 +0000)]
tools/libxl: Do not use remus teardown paths for non-remus guests
It causes suspend to emit a spurious failure
libxl: error: libxl_dom.c:2035:remus_teardown_done: Remus: failed to
teardown device for guest with domid 17, rc -3
for all domains, including those not using remus at all.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 18 Feb 2015 17:01:55 +0000 (17:01 +0000)]
xen: arm64: more useful logging on bad trap.
Dump the register state before panicking so we have some clue where the
issue occurred. Also decode the ESR register a bit to save having to
grab a pen and paper.
ESR_EL2 is a 32-bit register, so use SYSREG_READ32 not ..._READ64, as
we already do correctly in the main trap handler.
While here notice that do_trap_serror is never called and remove it.
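Decoding the ESR amounts to splitting it into its architectural fields: EC in bits [31:26], IL in bit 25 and ISS in bits [24:0]. The sketch below uses hypothetical names and is not Xen's actual bad-trap code.

```c
#include <stdint.h>
#include <stdio.h>

/* Field layout of ESR_ELx: EC [31:26], IL [25], ISS [24:0]. */
struct esr_fields {
    unsigned int ec;    /* exception class */
    unsigned int il;    /* instruction length bit */
    uint32_t iss;       /* instruction specific syndrome */
};

static struct esr_fields decode_esr(uint32_t esr)
{
    struct esr_fields f;

    f.ec  = (esr >> 26) & 0x3f;
    f.il  = (esr >> 25) & 0x1;
    f.iss = esr & 0x01ffffff;
    return f;
}

/* Print the decoded fields, in the spirit of a "bad trap" diagnostic. */
static void print_esr(uint32_t esr)
{
    struct esr_fields f = decode_esr(esr);

    printf("ESR=%#010x EC=%#x IL=%u ISS=%#x\n",
           (unsigned int)esr, f.ec, f.il, (unsigned int)f.iss);
}
```

For example, 0x96000045 decodes to EC 0x25 (a data abort taken without a change of exception level), IL 1 and ISS 0x45.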
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Julien Grall <julien.grall@linaro.org> Tested-by: Jintack Lim <jintack@cs.columbia.edu> Cc: jintack@cs.columbia.edu
[ ijc -- add missing \n to first printk ]
Olaf Hering [Wed, 11 Feb 2015 15:00:44 +0000 (16:00 +0100)]
tools: require at least pixman 0.21.8 for qemu-xen
Avoid a late build failure on openSUSE 11.4, which only has pixman-0.20:
....
[ 211s] ERROR: pixman >= 0.21.8 not present. Your options:
[ 211s] (1) Preferred: Install the pixman devel package (any recent
[ 211s] distro should have packages as Xorg needs pixman too).
[ 211s] (2) Fetch the pixman submodule, using:
[ 211s] git submodule update --init pixman
....
Signed-off-by: Olaf Hering <olaf@aepfle.de> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
In 674ad2b (xenstore: extend the xenstore ring with a 'closing' signal)
two uses of uint32 were added to tools/ocaml/libs/xb/xs_ring_stubs.c .
As of ocaml 4.03.0+dev the uint32 type is no longer supported. This patch
replaces the uses of uint32 with uint32_t .
Signed-off-by: Michael Young <m.a.young@durham.ac.uk> Acked-by: David Scott <dave.scott@citrix.com>
Andrew Cooper [Wed, 18 Feb 2015 16:02:18 +0000 (17:02 +0100)]
x86: adjust rdtsc inline assembly
Currently there are three related rdtsc macros, all of which are lowercase,
are not obviously macros, and assign directly to the variables passed as their
parameters.
This is non-intuitive to program with, being contrary to C semantics for code
that appears to be a regular function call. It also causes Coverity to
conclude that __udelay() has an infinite loop, as all of its loop conditions
are constant.
Two of these macros (rdtsc() and rdtscl()) have only a handful of uses while
the vast majority of code uses the rdtscll() variant. rdtsc() and rdtscll()
are equivalent, while rdtscl() discards the high word.
Replace all 3 macros with a static inline which returns the complete tsc.
Most of this patch is a mechanical change of
- rdtscll($FOO);
+ $FOO = rdtsc();
A diff of the generated assembly confirms that this results in no change at all.
The single use of the old rdtsc() in emulate_privileged_op() is altered to use
the new rdtsc() and the rdmsr_writeback path to set eax/edx appropriately.
The pair of uses of rdtscl() in __udelay() are extended to use full 64-bit
values, which makes the overflow edge condition (and early exit from the loop)
far rarer.
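A sketch of the new shape: a static inline that returns the full 64-bit TSC by value, so call sites read 'start = rdtsc();'. It uses GCC-style inline assembly and only builds the TSC-reading parts on x86; the tsc_combine() and spin_tsc_ticks() helpers are hypothetical additions for illustration.

```c
#include <stdint.h>

/* Combine the EDX:EAX halves written by RDTSC into one 64-bit value. */
static inline uint64_t tsc_combine(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

#if defined(__x86_64__) || defined(__i386__)
/* Returns the complete TSC by value, so call sites read as normal C. */
static inline uint64_t rdtsc(void)
{
    uint32_t low, high;

    __asm__ __volatile__ ("rdtsc" : "=a" (low), "=d" (high));
    return tsc_combine(low, high);
}

/*
 * Busy-wait for 'ticks' TSC cycles. With the full 64-bit counter,
 * wrap-around during the wait is practically unreachable.
 */
static inline void spin_tsc_ticks(uint64_t ticks)
{
    const uint64_t start = rdtsc();

    while (rdtsc() - start < ticks)
        ;
}
#endif
```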
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 18 Feb 2015 15:57:02 +0000 (16:57 +0100)]
domctl: do away with tool stack based retrying
XEN_DOMCTL_destroydomain has so far been special-cased in libxc to
re-invoke the operation when getting back EAGAIN. Quite a few other
domctls have gained continuations, so I see no reason not to use them
here too.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Wed, 18 Feb 2015 15:55:17 +0000 (16:55 +0100)]
introduce and use relaxed cpumask bitops
Using atomic (LOCKed on x86) bitops for certain operations on
cpumask_t is overkill when the variables aren't concurrently accessible
(e.g. local function variables, or due to explicit locking). Introduce
alternatives using non-atomic bitops and use them where appropriate.
Note that this
- adds a volatile qualifier to cpumask_test_and_{clear,set}_cpu()
(should have been there from the beginning, as is the case for
cpumask_{clear,set}_cpu())
- replaces several cpumask_clear()+cpumask_set_cpu(, n) pairs with the
simpler cpumask_copy(, cpumask_of(n)) (or just cpumask_of(n) if we
can do without copying)
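A sketch of the difference between the two flavours, using GCC/Clang __atomic builtins for the atomic case. The mask type and function names are hypothetical, not Xen's cpumask API.

```c
#include <limits.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 128   /* hypothetical */
#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)
#define MASK_LONGS     ((SKETCH_NR_CPUS + BITS_PER_ULONG - 1) / BITS_PER_ULONG)

typedef struct { unsigned long bits[MASK_LONGS]; } sketch_cpumask_t;

/* Atomic set: needed when other CPUs may update the same mask concurrently. */
static inline void mask_set_cpu_atomic(unsigned int cpu, sketch_cpumask_t *m)
{
    __atomic_fetch_or(&m->bits[cpu / BITS_PER_ULONG],
                      1UL << (cpu % BITS_PER_ULONG), __ATOMIC_SEQ_CST);
}

/* Relaxed set: a plain read-modify-write, fine for local or lock-protected masks. */
static inline void mask_set_cpu_relaxed(unsigned int cpu, sketch_cpumask_t *m)
{
    m->bits[cpu / BITS_PER_ULONG] |= 1UL << (cpu % BITS_PER_ULONG);
}

static inline bool mask_test_cpu(unsigned int cpu, const sketch_cpumask_t *m)
{
    return (m->bits[cpu / BITS_PER_ULONG] >> (cpu % BITS_PER_ULONG)) & 1UL;
}
```

On x86 the atomic variant compiles to a LOCKed instruction, while the relaxed one is an ordinary OR; both leave the same bit set.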
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Tue, 17 Feb 2015 13:36:26 +0000 (14:36 +0100)]
xen/Coverity: audit of MISSING_BREAK defects
Coverity uses several heuristics to identify when one case statement
legitimately falls through into the next, and a comment as the final item in a
case statement is one heuristic (the assumption being that it is a
justification for the fallthrough).
Use this to perform an audit of defects and hide the legitimate fallthroughs.
No functional change. All identified fallthroughs are legitimate.
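A small illustration of the convention, with a hypothetical function: each intentional fallthrough ends its case with a comment, which is the heuristic the audit relies on to tell deliberate fallthroughs from missing breaks.

```c
/* Hypothetical: map a width code (0..3) to a byte count by accumulation. */
static unsigned int bytes_for_width(unsigned int width_code)
{
    unsigned int n = 0;

    switch (width_code) {
    case 3:
        n += 4;
        /* fallthrough */
    case 2:
        n += 2;
        /* fallthrough */
    case 1:
        n += 1;
        /* fallthrough */
    case 0:
        n += 1;
        break;
    default:
        n = 0;
        break;
    }
    return n;
}
```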
Elena Ufimsteva [Tue, 17 Feb 2015 13:33:11 +0000 (14:33 +0100)]
x86: dump vNUMA information with debug key 'u'
Signed-off-by: Elena Ufimsteva <ufimtseva@gmail.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Dario Faggioli <dario.faggioli@citrix.com>
Andrew Cooper [Tue, 17 Feb 2015 13:32:37 +0000 (14:32 +0100)]
x86/shadow: introduce 'd' pointers and clean up use of 'v->domain'
All of the introduced domain pointers will eventually be removed, but doing
this mechanical cleanup here allows the subsequent patches which change
function prototypes to be smaller and clearer.
In addition, swap some uses of is_pv_32on64_vcpu(v) for is_pv_32on64_domain(d).
No functional change.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Ian Jackson [Fri, 13 Feb 2015 16:04:34 +0000 (16:04 +0000)]
tools/configure: detect $host_vendor of rumprun, not just rumpxen
This has been renamed by the rumpkernels upstream.
(This patch needs to be backported.)
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Antti Kantee <pooka@iki.fi> CC: Martin Lucina <martin@lucina.net> CC: Ian Campbell <Ian.Campbell@eu.citrix.com>
Don Slutz [Wed, 11 Feb 2015 16:21:14 +0000 (17:21 +0100)]
x86/HVM: do not retry in hvmemul_do_io() if no ioreq server exists for this I/O
This saves a VMENTRY and a VMEXIT since we no longer retry the
ioport read when the backing DM does not handle a given ioreq.
There are 2 cases of "no ioreq server exists for this I/O":
1) No ioreq servers (PVH case)
2) No ioreq servers for this I/O (non PVH case)
The routine hvm_has_dm() used to check only for the empty list, the PVH
case (#1).
By changing from hvm_has_dm() to hvm_select_ioreq_server() both
cases are considered. Doing it this way allows
hvm_send_assist_req() to only have 2 possible return values.
The key part of skipping the retry is to do "rc = X86EMUL_OKAY",
which is what the error path on the call to hvm_has_dm() does in
hvmemul_do_io() (the only caller of hvm_has_dm()).
Since this case is no longer handled in hvm_send_assist_req(), move
the call to hvm_complete_assist_req() into hvmemul_do_io().
As part of this change, do the work of hvm_complete_assist_req() in
the PVH case. Acting more like real hardware looks to be better.
Since hvm_select_ioreq_server() has already been called, switch to
using hvm_send_assist_req_to_ioreq_server().
Since there are no longer any calls to hvm_send_assist_req(), drop
that routine and rename hvm_send_assist_req_to_ioreq_server() to
hvm_send_assist_req.
Since hvm_send_assist_req() is extern, add an ASSERT() on s.
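The resulting control flow can be sketched as below. All types, names and return codes are hypothetical simplifications; in particular, completing an unclaimed read with all ones is this sketch's reading of "acting more like real hardware", and returning the retry code after sending stands in for the vCPU waiting on the emulator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the emulator's types and codes. */
enum { EMUL_OKAY, EMUL_RETRY };
enum io_dir { IO_WRITE, IO_READ };

struct io_req { enum io_dir dir; uint64_t data; };
struct io_server { int id; };

/* Complete an unclaimed request locally: writes are dropped, reads return all ones. */
static void complete_locally(struct io_req *p)
{
    if (p->dir == IO_READ)
        p->data = ~(uint64_t)0;
}

/* Forward to a server the caller has already selected; s must be valid. */
static int send_to_server(struct io_server *s, struct io_req *p)
{
    assert(s != NULL);
    (void)p;                 /* a real implementation would queue the request */
    return EMUL_RETRY;       /* the vCPU waits for the emulator's response */
}

/*
 * 's' is the result of server selection: NULL covers both "no servers at all"
 * and "no server claims this I/O".
 */
static int do_io(struct io_req *p, struct io_server *s)
{
    if (s == NULL) {
        complete_locally(p);
        return EMUL_OKAY;    /* no retry, so no extra VMENTRY/VMEXIT */
    }
    return send_to_server(s, p);
}
```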
Signed-off-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Andrew Cooper [Wed, 11 Feb 2015 16:18:27 +0000 (17:18 +0100)]
x86/nmi: fix shootdown of pcpus running in VMX non-root mode
c/s 7dd3b06ff "vmx: fix handling of NMI VMEXIT" fixed one issue but
inadvertently introduced a regression when it came to the NMI shootdown. The
shootdown code worked by patching vector 2 in each IDT, but the introduced
direct call to do_nmi() bypassed this.
Instead of patching each IDT, take a different approach by updating the
existing dispatch table. This allows for the removal of the remote IDT
patching and the removal of the nmi_crash() entry point.
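A sketch of the idea with function pointers. The table, vector constant and handler names are hypothetical; the point is that when every NMI path, including the VMEXIT one, dispatches through a single table entry, a shootdown only needs to update that entry rather than patch each IDT.

```c
#define NR_VECTORS 32
#define VEC_NMI    2

typedef void (*trap_handler_t)(void);

static unsigned int normal_nmis, crash_nmis;

static void handle_nmi_normally(void)  { normal_nmis++; }
static void handle_nmi_for_crash(void) { crash_nmis++; /* save state, then halt */ }

/* Hypothetical dispatch table consulted by every NMI entry path. */
static trap_handler_t exception_table[NR_VECTORS] = {
    [VEC_NMI] = handle_nmi_normally,
};

/* Both the IDT entry stub and the VMEXIT path funnel through the table, so
 * redirecting the entry also covers a pcpu in VMX non-root mode. */
static void idt_nmi_entry(void)   { exception_table[VEC_NMI](); }
static void vmexit_nmi_path(void) { exception_table[VEC_NMI](); }

/* Shootdown: one table update instead of patching vector 2 in every IDT. */
static void prepare_nmi_shootdown(void)
{
    exception_table[VEC_NMI] = handle_nmi_for_crash;
}
```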
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Paul Durrant [Tue, 10 Feb 2015 12:29:51 +0000 (13:29 +0100)]
x86/hvm: explicitly mark ioreq server pages dirty
...when they are added back into the guest physmap, when an ioreq
server is disabled. If this is not done then the pages are missed
during migration, causing ioreq server creation to fail on the remote end.
This problem only manifests if the ioreq server is non-default because in
the default case the pages are never removed from the guest physmap.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Paul Durrant [Tue, 10 Feb 2015 12:28:40 +0000 (13:28 +0100)]
x86/hvm: wait for at least one ioreq server to be enabled
In the case where a stub domain is providing emulation for an HVM
guest, there is no interlock in the toolstack to make sure that
the stub domain is up and running before the guest is unpaused.
Prior to the introduction of ioreq servers this was not a problem,
since there was only ever one emulator so ioreqs were simply
created anyway and the vcpu remained blocked until the stub domain
started and picked up the ioreq.
Since ioreq servers allow for multiple emulators for a single guest
it's not possible to know a priori which emulator will handle a
particular ioreq, so emulators must attach to a guest before the
guest runs.
This patch works around the lack of interlock in the toolstack for
stub domains by keeping the domain paused until at least one ioreq
server is created and enabled, which in practice means the stub
domain is indeed up and running.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Ian Jackson [Mon, 9 Feb 2015 15:34:40 +0000 (15:34 +0000)]
libxl: More probably detect reentry of destroyed ctx
In libxl_ctx_free:
1. Move the GC_INIT earlier, so that we can:
2. Take the lock around most of the work. This is technically
unnecessary since calling any other libxl entrypoint on a ctx being
passed to libxl_ctx_free risks undefined behaviour. But taking
the lock makes it much more likely that we will spot this.
3. Increment osevent_in_hook by 1000. If we are reentered after
destroy, this will trip some of our entrypoints' asserts. It also
means that if we crash for some other reason relating to reentering
a destroyed ctx, the cause will be more obviously visible by
examining ctx->osevent_in_hook (assuming that the memory previously
used for the ctx hasn't been reused and overwritten).
4. Free the lock again. (pthread_mutex_destroy requires that the
mutex be unlocked.)
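Steps 2 to 4 can be sketched with a stripped-down context. The structure and function names are hypothetical, and unlike the real function this sketch does not release the context's memory, so the poisoned counter can be inspected afterwards.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down context modelled on the steps above. */
struct sketch_ctx {
    pthread_mutex_t lock;
    int osevent_in_hook;     /* nonzero while inside an application hook */
};

static void sketch_ctx_init(struct sketch_ctx *ctx)
{
    pthread_mutex_init(&ctx->lock, NULL);
    ctx->osevent_in_hook = 0;
}

static bool sketch_entry_allowed(const struct sketch_ctx *ctx)
{
    return ctx->osevent_in_hook == 0;
}

/* An entrypoint: trips immediately if called on a destroyed ctx. */
static void sketch_osevent_occurred(struct sketch_ctx *ctx)
{
    assert(sketch_entry_allowed(ctx));
    /* ... normal processing ... */
}

static void sketch_ctx_free(struct sketch_ctx *ctx)
{
    pthread_mutex_lock(&ctx->lock);     /* step 2: hold the lock while tearing down */
    ctx->osevent_in_hook += 1000;       /* step 3: poison value, easy to spot in gdb */
    /* ... release the remaining resources here ... */
    pthread_mutex_unlock(&ctx->lock);   /* step 4: destroy needs an unlocked mutex */
    pthread_mutex_destroy(&ctx->lock);
}
```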
With this patch, I find that an occasional race previously reported
as:
libvirtd: libxl_internal.h:3265: libxl__ctx_unlock: Assertion `!r' failed.
is now reported as:
libvirtd: libxl_event.c:1236: libxl_osevent_occurred_fd: Assertion `!libxl__gc_owner(gc)->osevent_in_hook' failed.
Examining the call trace with gdb shows this:
(gdb) bt
#0 0xb773f424 in __kernel_vsyscall ()
#1 0xb7022871 in raise () from /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
#2 0xb7025d62 in abort () from /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
#3 0xb701b958 in __assert_fail () from /lib/i386-linux-gnu/i686/nosegneg/libc.so.6
#4 0xb6f00390 in libxl_osevent_occurred_fd (ctx=0xb84813a8, for_libxl=0xb84d6848, fd=31, events_ign=0, revents_ign=1) at libxl_event.c:1236
#5 0xb1b70464 in libxlDomainObjFDEventCallback () from /usr/local/lib/libvirt/connection-driver/libvirt_driver_libxl.so
#6 0xb72163b1 in virEventPollDispatchHandles () from /usr/local/lib/libvirt.so.0
#7 0xb7216be5 in virEventPollRunOnce () from /usr/local/lib/libvirt.so.0
#8 0xb7214a7e in virEventRunDefaultImpl () from /usr/local/lib/libvirt.so.0
#9 0xb77c7b98 in virNetServerRun ()
#10 0xb7771c63 in main ()
(gdb) print ctx->osevent_in_hook
$2 = 1000
(gdb)
which IMO demonstrates that libxl_osevent_occurred_fd is being called
on a destroyed ctx.
This is probably a symptom of the bug in libvirt fixed by these
patches:
https://www.redhat.com/archives/libvir-list/2015-February/msg00024.html
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Mon, 9 Feb 2015 15:20:32 +0000 (15:20 +0000)]
libxl: event handling: ao_inprogress does waits while reports outstanding
libxl__ao_inprogress needs to check (like
libxl__ao_complete_check_progress_reports) that there are no
outstanding progress callbacks.
Otherwise it might happen that we would destroy the ao while another
thread has an outstanding callback in its egc report queue. The other
thread would then, in its egc_run_callbacks, touch the destroyed ao.
Instead, when this happens in libxl__ao_inprogress, simply run round
the event loop again. The thread which eventually makes the callback
will spot our poller in the ao, and notify the poller, waking us up.
This fixes an assertion failure race seen with libvirt:
libvirtd: libxl_event.c:1792: libxl__ao_complete_check_progress_reports: Assertion `ao->in_initiator' failed.
or (after "Add an assert to egc_run_callbacks")
libvirtd: libxl_event.c:1338: egc_run_callbacks: Assertion `aop->ao->magic == 0xA0FACE00ul' failed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Mon, 9 Feb 2015 15:18:30 +0000 (15:18 +0000)]
libxl: event handling: Break out ao_work_outstanding
Break out the test in libxl__ao_complete_check_progress_reports, into
ao_work_outstanding, which reports true if either (i) the ao is still
ongoing or (ii) there is a progress report (perhaps on a different
thread's callback queue) which has yet to be reported to the
application.
No functional change.
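A sketch of the predicate, on a hypothetical stripped-down ao structure; given its name, it returns true while either condition holds. The wait helper shows the shape of the libxl__ao_inprogress change, with a hypothetical run_event_loop_once callback.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down asynchronous-operation state. */
struct sketch_ao {
    bool complete;                     /* the operation itself has finished */
    int progress_reports_outstanding;  /* reports queued, possibly on other threads */
};

/* True while the ao must not be destroyed yet. */
static bool sketch_ao_work_outstanding(const struct sketch_ao *ao)
{
    return !ao->complete || ao->progress_reports_outstanding > 0;
}

/* Keep running the event loop until nothing is outstanding. */
static void sketch_ao_wait(struct sketch_ao *ao,
                           void (*run_event_loop_once)(struct sketch_ao *))
{
    while (sketch_ao_work_outstanding(ao))
        run_event_loop_once(ao);
}
```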
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Mon, 9 Feb 2015 15:10:11 +0000 (15:10 +0000)]
libxl: event handling: Add an assert to egc_run_callbacks
Check that the ao is still live when we are about to run some of
its callbacks.
This reveals an existing bug in libxl which is exercised by libvirt,
converting
libvirtd: libxl_event.c:1792: libxl__ao_complete_check_progress_reports: Assertion `ao->in_initiator' failed.
into
libvirtd: libxl_event.c:1338: egc_run_callbacks: Assertion `aop->ao->magic == 0xA0FACE00ul' failed.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Ian Campbell <ian.campbell@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Jim Fehlig <jfehlig@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Thu, 5 Feb 2015 16:28:56 +0000 (16:28 +0000)]
tools: work around collision of -O0 and -D_FORTIFY_SOURCE
Some systems have python-config include -D_FORTIFY_SOURCE in the
CFLAGS. But -D_FORTIFY_SOURCE does not (currently) work with -O0, and
-O0 is enabled in debug builds (since 1166ecf781). As a result, on
those systems, debug builds fail.
Work around this problem as follows:
* In configure, detect -D_FORTIFY_SOURCE in $(python-config --cflags)
* If detected, set the new autoconf substitution and make variable
PY_NOOPT_CFLAGS to -O1.
* In tools/Rules.mk, where we add -O0, also add PY_NOOPT_CFLAGS
(which will override the -O0 with -O1 if required).
Overriding the -O0 is better than disabling Fortify because the
latter might have an adverse security impact. A user who wants to
disable optimisation completely even for Python and also disable
Fortify can set the environment variable
EXTRA_CFLAGS_XEN_TOOLS='-U_FORTIFY_SOURCE -O0'
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Reported-by: Jan Beulich <JBeulich@suse.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> CC: Jan Beulich <JBeulich@suse.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Euan Harris <euan.harris@citrix.com> CC: Wei Liu <wei.liu2@citrix.com> CC: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Tested-by: Don Slutz <dslutz@verizon.com>
Lasse Collin [Thu, 5 Feb 2015 13:01:09 +0000 (14:01 +0100)]
common/xz: add comments for the intentionally missing break statements
Signed-off-by: Lasse Collin <lasse.collin@tukaani.org>
[Linux commit 84d517f3e56f7d0d305c14a701cee8f7372ebe1e] Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Liang Li [Thu, 5 Feb 2015 12:59:48 +0000 (13:59 +0100)]
x86: avoid needless EPT table adjustment and cache flush
When a guest changes its MTRR MSRs, adjusting the EPT table and flushing
the cache are needed only when the guest has an IOMMU device, so checking
need_iommu(d) can minimize the impact on guests. Since a device may be
hot-plugged into a guest, there may be dirty cache lines before
need_iommu(d) becomes true; to address this, force p2m_memory_type_changed
and flush_all when the first device is assigned to the guest.
Signed-off-by: Liang Li <liang.z.li@intel.com> Signed-off-by: Yang Zhang <yang.z.zhang@intel.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Jan Beulich [Wed, 4 Feb 2015 16:07:48 +0000 (16:07 +0000)]
libvchan: address compiler warnings
Both vchan_wr() and stdout_wr() should be defined with a non-empty
argument list (i.e. void). Additionally both of them as well as usage()
should be static to make clear that no other code is referencing them.
Further, statements should follow declarations.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Mike Latimer [Fri, 30 Jan 2015 21:01:00 +0000 (14:01 -0700)]
libxl: Wait for ballooning if free memory is increasing
During domain startup, all required memory ballooning must complete
within a maximum window of 33 seconds (3 retries, 11 seconds of delay).
If not, domain creation is aborted with a 'failed to free memory' error.
In order to accommodate large domains or slower hardware (which require
substantially longer to balloon memory) the free memory process should
continue retrying if the amount of free memory is increasing on each
iteration of the loop.
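A sketch of the retry policy, assuming a hypothetical free-memory callback: each poll that shows no growth consumes a retry, and any growth restores the full budget. The replay source is an illustrative stand-in for querying the hypervisor, and the delay between polls is omitted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback returning the current amount of free memory. */
typedef uint64_t (*free_mem_fn)(void *opaque);

/*
 * Wait until at least 'need' is free. Each poll that shows no growth uses
 * up one retry; any increase restores the full budget, so slow but
 * progressing ballooning is not aborted.
 */
static bool wait_for_free_memory(free_mem_fn get_free, void *opaque,
                                 uint64_t need, int max_retries)
{
    uint64_t prev = get_free(opaque);
    int retries = max_retries;

    while (prev < need) {
        /* a real implementation would sleep between polls here */
        uint64_t now = get_free(opaque);

        if (now > prev)
            retries = max_retries;
        else if (--retries <= 0)
            return false;
        prev = now;
    }
    return true;
}

/* Example source that replays a fixed sequence of readings. */
struct replay { const uint64_t *samples; size_t n, i; };

static uint64_t replay_next(void *opaque)
{
    struct replay *r = opaque;
    uint64_t v = r->samples[r->i];

    if (r->i + 1 < r->n)
        r->i++;
    return v;
}
```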
Signed-off-by: Mike Latimer <mlatimer@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Tue, 3 Feb 2015 13:47:08 +0000 (13:47 +0000)]
rump kernels: use new platform macro
Starting from rump kernel changeset 91d5623 ("Renaming platform macros,
app-tools and autoconf target string"), __RUMPUSER_XEN__ and __RUMPAPP__
are deleted. We are supposed to use __RUMPRUN__ instead.
We still keep __RUMPUSER_XEN__ for now in order to make xen-unstable
pass the osstest push gate. I will remove __RUMPUSER_XEN__ later.
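The transitional check can be sketched as a preprocessor test that accepts either macro; the BUILDING_FOR_RUMPRUN name and helper function are hypothetical.

```c
/*
 * Prefer the new __RUMPRUN__ macro, but keep honouring __RUMPUSER_XEN__
 * until it can be dropped.
 */
#if defined(__RUMPRUN__) || defined(__RUMPUSER_XEN__)
#define BUILDING_FOR_RUMPRUN 1
#else
#define BUILDING_FOR_RUMPRUN 0
#endif

static int building_for_rumprun(void)
{
    return BUILDING_FOR_RUMPRUN;
}
```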
Related discussion:
http://thread.gmane.org/gmane.comp.rumpkernel.user/739
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Jan Beulich [Tue, 3 Feb 2015 10:39:17 +0000 (11:39 +0100)]
x86: provide build time option to support up to 123Tb of memory
As this requires growing struct page_info from 32 to 48 bytes as well
as shrinking the always accessible direct mapped memory range from 5Tb
to 3.5Tb, this isn't being introduced as a general or default enabled
feature.
A side effect of the change to x86's mm.h is that asm/mm.h may no
longer be included directly. Hence in the few places where this was done,
xen/mm.h is being substituted (indirectly in the hvm/mtrr.h case).
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Feb 2015 10:36:39 +0000 (11:36 +0100)]
x86/mm: allow for building without shadow mode support
Considering the complexity of the code, it seems to be a reasonable
thing to allow people to disable that code entirely even outside the
immediate need for this by the next patch.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tim Deegan [Tue, 3 Feb 2015 10:34:12 +0000 (11:34 +0100)]
x86/shadow: tidy up fragmentary page lists in multi-page shadows
Multi-page shadows are linked together using the 'list' field. When
those shadows are in the pinned list, the list fragments are spliced
into the pinned list; otherwise they have no associated list head.
Rework the code that handles these fragments to use the page_list
interface rather than manipulating the fields directly. This makes
the code cleaner, and allows the 'list' field to be either the
compact pdx form or a normal list_entry.
Signed-off-by: Tim Deegan <tim@xen.org> Tested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Introduce sh_terminate_list() and make it use LIST_POISON*.
Move helper array of shadow_size() into common.c.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org>
Boris Ostrovsky [Tue, 3 Feb 2015 10:30:40 +0000 (11:30 +0100)]
x86/VPMU: handle APIC_LVTPC accesses
Don't have the hypervisor update APIC_LVTPC when _it_ thinks the vector
should be updated. Instead, handle guest's APIC_LVTPC accesses and write what
the guest explicitly wanted (but only when VPMU is enabled).
This is an updated version of commit 8097616fbdda, which was reverted by cc3404093c85. Unlike the previous version, we don't update APIC_LVTPC
when VPMU is disabled to avoid interfering with NMI watchdog (which
runs only when VPMU is off).
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Kevin Tian <kevin.tian@intel.com>
Boris Ostrovsky [Tue, 3 Feb 2015 10:30:09 +0000 (11:30 +0100)]
x86/VPMU: disable when NMI watchdog is on
NMI watchdog sets APIC_LVTPC register to generate an NMI when PMU counter
overflow occurs. This may be overwritten by VPMU code later, effectively
turning off the watchdog.
We should disable VPMU when NMI watchdog is running.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Tue, 3 Feb 2015 10:25:47 +0000 (11:25 +0100)]
time: widen wallclock seconds to 64 bits
Linux is in the process of converting its seconds representation to
64 bits, so in order to support it consistently we should follow suit
(which at some point in quite a few years we'd have to do anyway). To
represent this in struct shared_info we leverage a 32-bit hole in
the x86-64 and arm variants of the structure; for x86-32 guests the only
(reasonable) choice we have is to put the extension in struct
arch_shared_info.
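A sketch of splitting and recombining the seconds value, using a hypothetical simplified structure rather than the real shared_info layout (which also carries a version counter so readers can detect torn updates).

```c
#include <stdint.h>

/* Hypothetical, simplified wallclock fields; not the real shared_info layout. */
struct sketch_wallclock {
    uint32_t wc_sec;      /* low 32 bits of seconds (legacy field) */
    uint32_t wc_nsec;
    uint32_t wc_sec_hi;   /* high 32 bits, placed in a former padding hole */
};

static void wallclock_set(struct sketch_wallclock *wc, uint64_t sec, uint32_t nsec)
{
    wc->wc_sec    = (uint32_t)sec;
    wc->wc_sec_hi = (uint32_t)(sec >> 32);
    wc->wc_nsec   = nsec;
}

/* New readers combine both halves; old readers keep seeing wc_sec only. */
static uint64_t wallclock_seconds(const struct sketch_wallclock *wc)
{
    return ((uint64_t)wc->wc_sec_hi << 32) | wc->wc_sec;
}
```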
A note on the conditional suppressing the xen_wc_sec_hi helper macro
definition in the ix86 case for hypervisor and tools: Neither of the
two actually need this, and its presence causes the tools to fail to
build (due to the inclusion of both the x86-64 and x86-32 variants of
the header).
As a secondary change, a pointless initializer in x86's do_platform_op()
is dropped, as is a pointless assignment to that same variable.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Tim Deegan <tim@xen.org> Acked-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Fri, 30 Jan 2015 14:11:14 +0000 (14:11 +0000)]
ocaml/xenctrl: Fix stub_xc_readconsolering()
The Ocaml stub to retrieve the hypervisor console ring had a few problems.
* A single 32k buffer would truncate a large console ring.
* The buffer was static and not under the protection of the Ocaml GC lock so
could be clobbered by concurrent accesses.
* Embedded NUL characters would cause caml_copy_string() (which is strlen()
based) to truncate the buffer.
The function is rewritten from scratch, using the same algorithm as the python
stubs, but uses the protection of the Ocaml GC lock to maintain a static
running total of the ring size, to avoid redundant realloc()ing in future
calls.
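A simplified C analogue of the approach, not the actual stub: it assumes a hypothetical reader callback, re-reads from the start whenever the buffer fills (which may indicate truncation), and keeps a static size hint across calls. It returns a length-delimited buffer, which is what lets embedded NULs survive where a strlen()-based copy would not. In the Ocaml stub the static state is protected by the runtime lock; this sketch assumes a single thread.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reader: copies up to *len bytes of the ring into buf and
 * sets *len to the number of bytes copied. Returns 0 on success.
 */
typedef int (*ring_read_fn)(void *opaque, char *buf, size_t *len);

/* Remembered across calls so later reads start with a big enough buffer. */
static size_t ring_size_hint = 16384;

/* Read the whole ring, doubling the buffer while a read fills it completely. */
static char *read_whole_ring(ring_read_fn rd, void *opaque, size_t *out_len)
{
    size_t cap = ring_size_hint;
    char *buf = NULL;

    for (;;) {
        char *nbuf = realloc(buf, cap);
        size_t len = cap;

        if (nbuf == NULL)
            break;
        buf = nbuf;
        if (rd(opaque, buf, &len) != 0)
            break;
        if (len < cap) {            /* buffer was large enough */
            ring_size_hint = cap;
            *out_len = len;
            return buf;
        }
        cap *= 2;                   /* possibly truncated: grow and re-read */
    }
    free(buf);
    return NULL;
}

/* Example reader over an in-memory byte array. */
struct mem_ring { const char *data; size_t len; };

static int mem_ring_read(void *opaque, char *buf, size_t *len)
{
    const struct mem_ring *r = opaque;
    size_t n = r->len < *len ? r->len : *len;

    memcpy(buf, r->data, n);
    *len = n;
    return 0;
}
```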
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Dave Scott <dave.scott@eu.citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: David Scott <dave.scott@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Andrew Cooper [Wed, 28 Jan 2015 17:55:32 +0000 (17:55 +0000)]
ocaml/xenctrl: Make failwith_xc() thread safe
The static error_str[] buffer is not thread-safe, and 1024 bytes is
unreasonably large. Reduce to 256 bytes (which is still much larger than any
current use), and move it to being a stack variable.
Also, propagate the Noreturn attribute from caml_raise_with_string().
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Dave Scott <Dave.Scott@eu.citrix.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
Andrew Cooper [Wed, 28 Jan 2015 15:52:35 +0000 (15:52 +0000)]
tools/libxc: Don't leave scratch_pfn uninitialised if the domain has no memory
c/s 5b5c40c0d1 "libxc: introduce a per architecture scratch pfn for temporary
grant mapping" accidentally introduced an issue whereby there were two paths out of
xc_core_arch_get_scratch_gpfn() which returned 0, but only one of which
assigned a value to the gpfn parameter.
xc_domain_maximum_gpfn() can validly return 0, at which point gpfn 1 is a
valid scratch page to use.
In addition, widen rc before adding 1 and possibly overflowing.
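A sketch of the corrected pattern, assuming a hypothetical helper that receives the int result of the maximum-gpfn query: every successful return assigns the output, and the addition happens after widening. The sketch_pfn_t type and function name are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

typedef uint64_t sketch_pfn_t;   /* hypothetical stand-in for a pfn type */

/* 'max_gpfn_rc' is the int result of a max-gpfn query (negative on error). */
static int get_scratch_gpfn(int max_gpfn_rc, sketch_pfn_t *gpfn)
{
    if (max_gpfn_rc < 0)
        return max_gpfn_rc;            /* error: report it, output untouched */

    /* Widen before adding 1 so INT_MAX + 1 cannot overflow an int. */
    *gpfn = (sketch_pfn_t)max_gpfn_rc + 1;
    return 0;                          /* every success path assigns *gpfn */
}
```

A maximum gpfn of 0 is valid here and yields scratch gpfn 1.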
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Julien Grall <julien.grall@linaro.org> CC: Jan Beulich <JBeulich@suse.com> CC: Ian Campbell <Ian.Campbell@citrix.com> CC: Ian Jackson <Ian.Jackson@eu.citrix.com> CC: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>