Ian Jackson [Tue, 10 Feb 2015 16:36:46 +0000 (16:36 +0000)]
libxl: events: Permit timeouts to signal ao abort
The callback functions provided by users must take an rc value. This
rc value can be ERROR_TIMEDOUT or ERROR_ABORTED.
Users of xswait are now expected to deal correctly with
ERROR_ABORTED. If they experience this, it hasn't been logged.
And the caller won't log it either since it's not TIMEDOUT.
Luckily this is correct, so we can just change the doc comment.
Currently nothing generates ERROR_ABORTED; in particular the timeouts
cannot in fact signal abort requests.
There should be no publicly visible change except that some error
returns from libxl will change from ERROR_FAIL to ERROR_TIMEDOUT, and
some changes to debugging messages.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v4: ABORTED not CANCELLED.
Ian Jackson [Tue, 10 Feb 2015 16:27:39 +0000 (16:27 +0000)]
libxl: events: Make libxl__async_exec_* pass caller an rc
The internal user of libxl__async_exec_start et al now gets an rc as
well as the process's exit status.
For now this is always either 0 or ERROR_FAIL, but with ao
abort requests this will possibly be ABORTED or TIMEDOUT too.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v4: Improve doc comment as suggested by Ian C.
v2: New patch due to rebase; v1 had changes to device_hotplug_*
scripts instead.
Callback now gets unambiguous information about error situation:
previously, if the only thing that went wrong was that the child died
badly, rc would be FAILED, which was ambiguous; now rc=0.
Add a comment documenting the meaning of the rc and status parameters
to the callback.
Ian Jackson [Tue, 10 Feb 2015 16:13:36 +0000 (16:13 +0000)]
libxl: events: Make timeout and async exec setup take an ao, not a gc
Change the timeout setup functions to take a libxl__ao, not a
libxl__gc. This is going to be needed for ao abort, because timeouts
are going to be a main hook for ao abort requests - so the timeouts
need to be associated with an ao.
This means that timeouts can only occur as part of a long-running
libxl function (but this is of course correct, as libxl shouldn't have
any global timeouts, and indeed all the call sites have an ao).
Also remove the gc parameter from libxl__async_exec_start. It can
just use the gc from the ao supplied in the aes.
All the callers follow the obvious patterns and therefore supply the
ao's gc to libxl__async_exec_start and the timeout setup functions.
There is therefore no functional change in this patch.
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com> CC: Yang Hongyang <yanghy@cn.fujitsu.com> CC: Wen Congyang <wency@cn.fujitsu.com> CC: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Wen Congyang <wency@cn.fujitsu.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: This patch split off from "Permit timeouts to signal cancellation".
Rebased; consequently, deal with libxl__async_exec_start.
CC'd authors of the libxl__async_exec_* functions.
Ian Jackson [Fri, 20 Dec 2013 15:18:59 +0000 (15:18 +0000)]
libxl: New error codes ABORTED etc.
We introduce ERROR_ABORTED now, so that we can write code to handle
it, and decree that functions might return it, even though currently
there is nowhere where this error is generated.
While we're here, provide ERROR_NOTFOUND which will also be used
later, but only as part of the public API.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
---
v4: CANCELLED renamed to ABORTED.
No longer introduce ERROR_NOTIMPLEMENTED.
v2: Rebase means new errors have bigger (more negative) numbers.
Ian Jackson [Thu, 25 Jun 2015 15:34:10 +0000 (16:34 +0100)]
libxl: Change some log messages to say `abandoning' rather than `aborting'
We are going to introduce application-requested aborts of (ao)
operations, but these suspend failures are something different.
Reword to avoid confusion.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ian Jackson [Fri, 20 Dec 2013 12:49:53 +0000 (12:49 +0000)]
libxl: suspend: Return correct error from callbacks
If a suspend callback fails, it has a libxl error code in its hand.
However we must return to libxc the values that libxc expects. So we
stash the libxl error code in dss->rc and fish it out again after
libxc returns from the suspend call.
While we're here, abolish the now-redundant `ok' variable in
remus_devices_postsuspend_cb.
The overall functional change is that libxl_domain_save now completes
with the correct error code as determined when the underlying failure
happened. (Usually this is, still, ERROR_FAIL.)
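A minimal sketch of the stash-and-recover pattern described above, under stated assumptions: the demo_* names and the numeric error values are illustrative stand-ins rather than libxl's real definitions, and keeping the first error seen is a choice made here for the example.

```c
#include <assert.h>

/* Illustrative stand-ins; not libxl's real definitions. */
enum { DEMO_ERROR_FAIL = -3, DEMO_ERROR_TIMEDOUT = -15 };

struct demo_suspend_state {
    int rc;   /* first libxl-style error seen; 0 while all is well */
};

/* Result in the form the lower layer expects: 1 = ok, 0 = failed. */
static int demo_suspend_callback_done(struct demo_suspend_state *dss, int rc)
{
    if (rc && !dss->rc)
        dss->rc = rc;       /* stash the precise error for later */
    return rc == 0;         /* reduce to a boolean at the last moment */
}

/* Once the lower layer has returned, fish the stashed error out again. */
static int demo_save_finish(const struct demo_suspend_state *dss,
                            int lower_layer_failed)
{
    if (dss->rc)
        return dss->rc;
    return lower_layer_failed ? DEMO_ERROR_FAIL : 0;
}
```

With this shape the operation completes with the error recorded at the point of failure, and only falls back to a generic failure code when nothing more specific was stashed.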
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Add cleanup in remus_devices_postsuspend_cb.
Ian Jackson [Fri, 20 Dec 2013 12:43:17 +0000 (12:43 +0000)]
libxl: suspend: common suspend callbacks take rc
Change the following functions to take a libxl error code rather than
a boolean "ok" value, and translate that value to the boolean expected
by libxc at the last moment:
  domain_suspend_callback_common_done        }  dss->callback_common_done
  remus_domain_suspend_callback_common_done  }
  domain_suspend_common_done
Also, abolish domain_suspend_common_failed as
domain_suspend_common_done can easily do its job and the call sites
now have to supply the right rc value anyway.
In domain_suspend_common_guest_suspended, change "ret" to "rc"
as it contains a libxl error code.
There is no functional change in this patch: the proper rc value now
propagates further, but is still eventually smashed to a boolean.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
---
v2: Fix a leftover comment referring to domain_suspend_common_failed
Ian Jackson [Fri, 20 Dec 2013 12:34:09 +0000 (12:34 +0000)]
libxl: suspend: switch_logdirty_done takes rc
switch_logdirty_done used to take the value to pass to
libxl__xc_domain_saverestore_async_callback_done (ie, the return value
from the callback). (This was mistakenly described as "ok" in the
prototype, but in the definition it is "broke" and all the call sites
passed 0 for success or -1 for error.)
Instead, make it take a libxl error code (rc). Convert this to the
suspend callback value at the end.
No functional change in this patch.
Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Jackson [Wed, 8 Apr 2015 11:22:38 +0000 (12:22 +0100)]
libxl: ao internal API docs: Mention synchronous ao completion
This doc comment about ao lifecycle failed to mention the option of
completing the ao during the initiator function. (Indeed, the most
obvious reading would forbid it.)
Restructure the comment, describe this situation, and generally
improve the wording.
Also, fix a grammar problem (missing word `a').
Reported-by: Koushik Chakravarty <koushik.chakravarty@citrix.com> Signed-off-by: Ian Jackson <Ian.Jackson@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
David Vrabel [Fri, 26 Jun 2015 15:35:27 +0000 (17:35 +0200)]
x86,arm: remove asm/spinlock.h from all architectures
Now that all architectures use a common ticket lock implementation for
spinlocks, remove the architecture specific byte lock implementations.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Acked-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
David Vrabel [Fri, 26 Jun 2015 15:33:34 +0000 (17:33 +0200)]
use ticket locks for spin locks
Replace the byte locks with ticket locks. Ticket locks are: a) fair;
and b) perform better when contended since they spin without an atomic
operation.
The lock is split into two ticket values: head and tail. A locker
acquires a ticket by (atomically) increasing tail and using the
previous tail value. A CPU holds the lock if its ticket == head. The
lock is released by increasing head.
spin_lock_irq() and spin_lock_irqsave() now spin with irqs disabled
(previously, they would spin with irqs enabled if possible). This is
required to prevent deadlocks when the irq handler tries to take the
same lock with a higher ticket.
Architectures need only provide arch_fetch_and_add() and two barriers:
arch_lock_acquire_barrier() and arch_lock_release_barrier().
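As an illustration only, the head/tail scheme described above can be sketched with C11 atomics. This is not the Xen implementation: the ticket_lock_* names are hypothetical, and C11 atomic_fetch_add_explicit() with acquire/release ordering stands in for arch_fetch_and_add() and the two arch barriers. Interrupt handling is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical ticket lock; names are illustrative, not Xen's. */
typedef struct {
    _Atomic uint16_t head;  /* ticket currently being served */
    _Atomic uint16_t tail;  /* next ticket to hand out */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->head, 0);
    atomic_init(&l->tail, 0);
}

static void ticket_lock(ticket_lock_t *l)
{
    /* Take a ticket: atomically bump tail and keep the previous value. */
    uint16_t ticket = atomic_fetch_add_explicit(&l->tail, 1,
                                                memory_order_relaxed);

    /* Spin with plain loads (no read-modify-write) until we are served. */
    while (atomic_load_explicit(&l->head, memory_order_acquire) != ticket)
        ;
}

static void ticket_unlock(ticket_lock_t *l)
{
    /* Serve the next ticket; release ordering publishes our writes. */
    atomic_fetch_add_explicit(&l->head, 1, memory_order_release);
}
```

Fairness follows from tickets being served strictly in the order they were taken; both counters wrap together, so overflow of the 16-bit values is harmless as long as fewer than 65536 lockers are queued.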
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Tim Deegan <tim@xen.org> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Ian Campbell [Tue, 23 Jun 2015 14:58:32 +0000 (15:58 +0100)]
tools: libxl: Take the userdata lock around maxmem changes
There is an issue in libxl_set_memory_target whereby the target and
the max mem can get out of sync; this is because the call to
xc_domain_setmaxmem is not tied in any way to the xenstore transaction
which controls updates to the xenstore side of things.
Consider a domain with 1M of RAM (==target and maxmem for the sake of
argument) and two simultaneous calls to libxl_set_memory_target, both
with relative=0 and enforce=1, one with target=3 and the other with
target=5.
target=5 call                   target=3 call
transaction start
                                transaction start
write target=5 to xenstore
                                write target=3 to xenstore
setmaxmem(5)
                                setmaxmem(3)
In reality the target=3 case will then retry and eventually (hopefully)
succeed with target=maxmem=3, however the bad state will persist for
some window which is undesirable. (On failure other than EAGAIN all
bets are off anyway, but in that case we will likely stick in the bad
state until someone else sets the memory.)
To fix this we slightly abuse the userdata lock which is used to
protect updates to the domain's json configuration. Abused because
maxmem is not actually stored in there, but is kept by Xen. However
the lock protects some semantically similar things and is convenient
to use here too.
libxl_domain_setmaxmem also takes the lock, since it reads
memory/target from xenstore before calling xc_domain_setmaxmem there
is a small (but perhaps not very interesting) race there too.
There is one more use of xc_domain_setmaxmem in libxl__build_pre.
However taking a lock around this would be tricky since the xenstore
parts are not done until libxl__build_post. I think this one could be
argued to be OK since the domid is not "public" yet, that is it has
not been returned to the application yet (as the result of the create
operation). Toolstacks which go round fiddling with random domid's
which they find lying on the floor should be taught to do better.
Add a doc note that taking the userdata lock requires the CTX_LOCK to
be held.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Andrew Cooper [Thu, 25 Jun 2015 12:57:31 +0000 (14:57 +0200)]
x86/setup: initialise CR4 before creating idle_vcpu[0]
PV vcpu initialisation has CR4 seeded from mmu_cr4_features. Adjust the order of
basic CR4 setup and creation of the idle domain, such that idle_vcpu[0] is not
wildly different from the other idle vcpus.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Andrew Cooper [Thu, 25 Jun 2015 12:57:04 +0000 (14:57 +0200)]
x86/traps: avoid using current too early on boot
Early on boot, current has the sentinel value 0xfffff000. Blindly using it in
show_registers() causes a nested failure, and no useful information is printed
from an early crash.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Roger Pau Monne [Fri, 19 Jun 2015 08:58:25 +0000 (10:58 +0200)]
configure: check for argp
argp is only present in the GNU C library, so add a specific check for it in
configure. Also check if -largp is needed for linking against it.
Please run autoconf after applying.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
[ ijc -- ran autogen.sh ]
Roger Pau Monne [Fri, 19 Jun 2015 08:58:24 +0000 (10:58 +0200)]
xen{trace/analyze}: don't use 64bit versions of libc functions
This is neither needed nor encouraged. Configure already checks
_FILE_OFFSET_BITS and appends it when needed, so that the right functions
are used. Also remove the usage of loff_t and O_LARGEFILE for the same
reason.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Cc: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Wei Liu [Wed, 24 Jun 2015 10:10:09 +0000 (11:10 +0100)]
NetBSDRump: provide evtchn.h and privcmd.h
Xen's build system has a target for rump kernel called NetBSDRump. We
want to build libxc against rump kernel, so we need to copy NetBSD's
evtchn.h and privcmd.h to NetBSDRump. These copies are not very likely to
diverge from NetBSD's copies, but we don't preclude such a possibility.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
A new memory model that allows QEMU to bump memory behind libxl's back
was merged a few months ago. We didn't fully understand the
repercussions back then. Now it breaks migration and has become a blocker
for the 4.6 release.
It's better to restore the original behaviour at this stage of the
release cycle; that would put us in a position no worse than before, so
the release is unblocked.
The said function is still racy after reverting these two patches.
Making domain memory state consistent requires a bit more work. Separate
patch(es) will be sent out to deal with that problem.
Fix up conflicts with f5b43e95 (libxl: fix "xl mem-set" regression from 0c029c4da2).
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Julien Grall [Fri, 19 Jun 2015 12:41:29 +0000 (13:41 +0100)]
xen/arm: Propagate clock-frequency to DOMU if present in the DT timer node
When the property "clock-frequency" is present in the DT timer node, it
means that the bootloader/firmware didn't correctly configure the
CNTFRQ/CNTFRQ_EL0 on each processor.
The best solution would be to fix the offending firmware/bootloader,
although it may not always be possible to modify and re-flash it.
As it's not possible to trap the register CNTFRQ/CNTFRQ_EL0, we have
to extend xen_arch_domainconfig to provide the timer frequency to the
toolstack when the property "clock-frequency" is present in the host DT
timer node. Then, a property "clock-frequency" will be created in the guest
DT timer node if the value is not 0.
We could have set the property in the guest DT regardless of whether the
property is present in the host DT. However, we still want to let the guest
use CNTFRQ in the normal case. After all, the property "clock-frequency"
is just a workaround for buggy firmware.
Also add a stub for fdt_property_u32 which is not present in libfdt <
1.4.0 used by distributions such as Debian Wheezy.
Signed-off-by: Julien Grall <julien.grall@citrix.com> Tested-by: Chris Brand <chris.brand@broadcom.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- ran autogen.sh ]
Tim Deegan [Tue, 23 Jun 2015 10:30:26 +0000 (11:30 +0100)]
Change kdd to 'Odd Fixes'.
kdd's knowledge of Windows kernel internals is several releases out of
date now. However the underlying implementation of the serial protocol
is still sound. I have heard that some people are using it, and I'm happy
to answer questions/bug reports, so don't deprecate it just yet.
Tim Deegan [Tue, 23 Jun 2015 10:47:26 +0000 (11:47 +0100)]
New maintainer for x86/mm
George has a long record of contributions to the x86 memory management
and p2m code. He will be taking over as the primary maintainer of
x86/mm; I will still help out with the shadow pagetable code.
Signed-off-by: Tim Deegan <tim@xen.org> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Paul Durrant [Wed, 24 Jun 2015 15:53:36 +0000 (17:53 +0200)]
x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops
The struct just contains three methods and no data, so the name
hvm_mmio_ops more accurately reflects its content. A subsequent patch
introduces a new structure which more accurately warrants the name
hvm_mmio_handler so doing the rename in this purely cosmetic patch avoids
conflating functional and cosmetic changes in a single patch.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Wed, 24 Jun 2015 08:37:50 +0000 (10:37 +0200)]
x86/mm: use is_..._vcpu() instead of open coding it
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: George Dunlap <george.dunlap@eu.citrix.com>
Paul Durrant [Tue, 23 Jun 2015 16:07:49 +0000 (18:07 +0200)]
x86/hvm: remove hvm_io_pending() check in hvmemul_do_io()
The check is done at the wrong point (since it is irrelevant if the
I/O is to be handled by the hypervisor) and its functionality can be
covered by returning X86EMUL_UNHANDLEABLE from hvm_send_assist_req()
instead.
This patch also removes the domain_crash() call from
hvm_send_assist_req(). Returning X86EMUL_UNHANDLEABLE allows the
higher layers of emulation to decide what to do instead.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Paul Durrant [Tue, 23 Jun 2015 16:07:03 +0000 (18:07 +0200)]
x86/hvm: simplify hvmemul_do_io()
Currently hvmemul_do_io() handles paging for I/O to/from a guest address
inline. This causes every exit point to have to execute:
    if ( ram_page )
        put_page(ram_page);
This patch introduces wrapper hvmemul_do_io_addr() and
hvmemul_do_io_buffer() functions. The latter is used for I/O to/from a Xen
buffer and thus the complexity of paging can be restricted only to the
former, making the common hvmemul_do_io() function less convoluted.
This patch also tightens up some types and introduces pio/mmio wrappers
for the above functions with comments to document their semantics.
Signed-off-by: Paul Durrant <paul.durrant@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Mon, 22 Jun 2015 15:53:21 +0000 (17:53 +0200)]
x86/vLAPIC: adjust types in internal read/write handling
- use 32-bit types where possible (produces slightly better code)
- drop (now) unnecessary casts
- avoid indirection where not needed
- avoid duplicate log messages in vlapic_write()
- minor other cleanup
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Don Slutz [Mon, 22 Jun 2015 09:40:28 +0000 (11:40 +0200)]
gdbsx_guestmemio: allow it to check domain
gdbsx_guest_mem_io() does not get d passed, it expects to handle the
domain lookup itself. Specifically, the caller of
XEN_DOMCTL_gdbsx_guestmemio is expected to use DOMID_IDLE to interact
with the hypervisor, rather than a domain, which doesn't interact well
with the domain rcu lock.
Signed-off-by: Don Slutz <dslutz@verizon.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
David Vrabel [Mon, 22 Jun 2015 09:39:46 +0000 (11:39 +0200)]
evtchn: pad struct evtchn to 64 bytes
The number of struct evtchn in a page must be a power of two. Under
some workloads performance is improved slightly by padding struct
evtchn to 64 bytes (a typical cache line size), thus putting fewer
per-channel locks into each cache line.
This does not decrease the number of struct evtchn's per-page.
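The arithmetic behind that last claim can be sketched as follows. This is a hedged illustration: the helper name is hypothetical, the 4096-byte page size is an assumption, and the 40-byte unpadded size is a made-up figure chosen only to show the effect of rounding down to a power of two.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u   /* assumed page size */

/* Largest power-of-two count of `size`-byte objects that fits in a page. */
static unsigned int demo_pow2_per_page(size_t size)
{
    unsigned int n = 1;

    assert(size > 0 && size <= DEMO_PAGE_SIZE);
    while ((size_t)n * 2 * size <= DEMO_PAGE_SIZE)
        n *= 2;
    return n;
}
```

Under these assumptions any size above 32 and up to 64 bytes yields 64 objects per page, so rounding the structure up to a full 64 bytes costs nothing in density while giving each object its own cache line.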
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com>
David Vrabel [Mon, 22 Jun 2015 09:39:03 +0000 (11:39 +0200)]
evtchn: use a per-event channel lock for sending events
When sending an event, use a new per-event channel lock to safely
validate the event channel state.
This new lock must be held when changing event channel state. Note
that the event channel lock must also be held when changing state from
ECS_FREE or it will race with a concurrent get_free_port() call.
To avoid having to take the remote event channel locks when sending to
an interdomain event channel, the local and remote channel locks are
both held when binding or closing an interdomain event channel.
This significantly increases the number of events that can be sent
from multiple VCPUs. But struct evtchn increases in size, reducing
the number that fit into a single page to 64 (instead of 128).
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
David Vrabel [Mon, 22 Jun 2015 09:38:01 +0000 (11:38 +0200)]
evtchn: defer freeing struct evtchn's until evtchn_destroy_final()
notify_via_xen_event_channel() and free_xen_event_channel() had to
check if the domain was dying because they may be called while the
domain is being destroyed and the struct evtchn's are being freed.
By deferring the freeing of the struct evtchn's until all references
to the domain are dropped, these functions can rely on the channel
state being present and valid.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
David Vrabel [Mon, 22 Jun 2015 09:36:17 +0000 (11:36 +0200)]
evtchn: clear xen_consumer when clearing state
Freeing a xen event channel would clear xen_consumer before clearing
the channel state, leaving a window where the channel is in a funny
state (still bound but no consumer).
Move the clear of xen_consumer into free_evtchn() where the state is
also cleared.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Ditch the pointless evtchn_close() wrapper around __evtchn_close()
(renaming the latter) as well as some bogus casts of function results
to void.
Jan Beulich [Mon, 22 Jun 2015 09:34:57 +0000 (11:34 +0200)]
x86/HVM: EOI handling function adjustments
The vector parameters are more usefully u8 right away. This is
particularly important for the vioapic_update_EOI() invocation from
vioapic_write() (which luckily is only a latent issue, as
VIOAPIC_VERSION_ID is still hard coded to 0x11 right now). But it at
once allows simplifying VMX's EXIT_REASON_EOI_INDUCED handling (the
kind of pointless helper function should have been static anyway; not
being used for anything else, it gets removed altogether).
Plus vlapic_handle_EOI() (now renamed for that purpose) can be used as
the tail of vlapic_EOI_set() instead of duplicating that code.
Finally replace a stray current->domain use in vlapic_handle_EOI().
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Malcolm Crossley [Fri, 19 Jun 2015 09:01:24 +0000 (11:01 +0200)]
gnttab: use per-VCPU maptrack free lists
Performance analysis of aggregate network throughput with many VMs
shows that performance is significantly limited by contention on the
maptrack lock when obtaining/releasing maptrack handles from the free
list.
Instead of a single free list use a per-VCPU list. This avoids any
contention when obtaining a handle. Handles must be released back to
their original list and since this may occur on a different VCPU there
is some contention on the destination VCPU's free list tail pointer
(but this is much better than a per-domain lock).
Increase the default maximum number of maptrack frames by 4 times
because: a) struct grant_mapping is now 16 bytes (instead of 8); and
b) a guest may not evenly distribute all the grant map operations
across the VCPUs (meaning some VCPUs need more maptrack entries than
others).
Signed-off-by: Malcolm Crossley <malcolm.crossley@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich [Fri, 19 Jun 2015 08:59:53 +0000 (10:59 +0200)]
x86/MSI: track host and guest masking separately
In particular we want to avoid losing track of our own intention to
have an entry masked. Physical unmasking now happens only when both
host and guest requested so.
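The rule above amounts to keeping two independent mask bits and deriving the physical state from both. A minimal sketch, with a hypothetical structure and helper name that are not taken from the Xen sources:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-entry state; field names are illustrative only. */
struct demo_msi_entry {
    bool host_masked;    /* the hypervisor's own intention */
    bool guest_masked;   /* what the guest last requested */
};

/* The entry stays physically masked unless both sides asked for unmasking. */
static bool demo_physically_masked(const struct demo_msi_entry *e)
{
    return e->host_masked || e->guest_masked;
}
```

Because the guest's request never overwrites the host's bit, a guest unmask cannot undo a mask that the host still wants in place.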
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Fri, 19 Jun 2015 08:58:45 +0000 (10:58 +0200)]
x86/MSI-X: cleanup
- __pci_enable_msix() now checks that an MSI-X capability was actually
found
- pass "pos" to msix_capability_init() as both callers already know it
(and hence there's no need to re-obtain it)
- call __pci_disable_msi{,x}() directly instead of via
pci_disable_msi() from __pci_enable_msi{x,}() state validation paths
- use msix_control_reg() instead of open coding it
- log message adjustments
- coding style corrections
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 18 Jun 2015 14:44:15 +0000 (16:44 +0200)]
x86/HVM: avoid pointer wraparound in bufioreq handling
The number of slots per page being 511 (i.e. not a power of two) means
that the (32-bit) read and write indexes going beyond 2^32 will likely
disturb operation. Extend I/O req server creation so the caller can
indicate that it is using suitable atomic accesses where needed (not
all accesses to the two pointers really need to be atomic), allowing
the hypervisor to atomically canonicalize both pointers when both have
gone through at least one cycle.
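To see why a slot count of 511 is a problem, consider free-running 32-bit indexes reduced modulo 511. Since 2^32 is not a multiple of 511, the slot sequence jumps when an index overflows. The sketch below is illustrative only: the demo_* names are hypothetical, and the canonicalization shown is a plain, non-atomic version of the idea, whereas the commit describes doing it atomically.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SLOTS 511u   /* slots per page, as stated above */

/* Slot used for a free-running 32-bit index. */
static uint32_t demo_slot(uint32_t index)
{
    return index % DEMO_SLOTS;
}

/* Slot that should follow `index` if the ring is walked without gaps. */
static uint32_t demo_expected_next_slot(uint32_t index)
{
    return (demo_slot(index) + 1) % DEMO_SLOTS;
}

/*
 * Once both pointers have completed at least one cycle, pull both back by
 * one full cycle. The slots they map to are unchanged, but the values stay
 * small and never reach the 2^32 overflow point. Not atomic in this sketch.
 */
static void demo_canonicalize(uint32_t *read, uint32_t *write)
{
    if (*read >= DEMO_SLOTS && *write >= DEMO_SLOTS) {
        *read -= DEMO_SLOTS;
        *write -= DEMO_SLOTS;
    }
}
```

At the overflow point the index 2^32 - 1 maps to slot 31, and the next index (0) maps to slot 0 rather than the expected slot 32, which is the discontinuity the commit guards against.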
Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Ian Campbell <ian.campbell@citrix.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 18 Jun 2015 14:42:56 +0000 (16:42 +0200)]
x86/HAP: prefer is_..._domain() over is_..._vcpu()
In hvm_hap_nested_page_fault() latch the current domain alongside the
current vCPU into a local variable, making use of it where possible
also beyond what the title says.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
Jan Beulich [Thu, 18 Jun 2015 13:07:10 +0000 (15:07 +0200)]
x86: synchronize PCI config space access decoding
Both PV and HVM logic have similar but not similar enough code here.
Synchronize the two so that
- in the HVM case we don't unconditionally try to access extended
config space
- in the PV case we pass a correct range to the XSM hook
- in the PV case we don't needlessly deny access when the operation
isn't really on PCI config space
All this along with sharing the macros HVM already had here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>
There's no need for two exit paths each using rcu_unlock_domain() on
its own here.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
David Vrabel [Thu, 18 Jun 2015 12:53:23 +0000 (14:53 +0200)]
evtchn: simplify port_is_valid()
By keeping a count of the number of currently valid event channels,
port_is_valid() can be simplified.
d->valid_evtchns is only increased (while holding d->event_lock), so
port_is_valid() may be safely called without taking the lock (this
will be useful later).
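A minimal sketch of the simplified check, assuming a monotonically growing counter. The structure and function names are hypothetical, and C11 atomics stand in for whatever ordering primitives the hypervisor actually uses:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct domain. */
struct demo_domain {
    _Atomic unsigned int valid_evtchns;  /* only ever grows */
};

/* Lock-free validity check: a port is valid once the count has passed it. */
static bool demo_port_is_valid(struct demo_domain *d, unsigned int port)
{
    return port < atomic_load_explicit(&d->valid_evtchns,
                                       memory_order_acquire);
}

/* Writers are assumed to hold the domain's event lock while growing. */
static void demo_grow_valid_ports(struct demo_domain *d, unsigned int extra)
{
    atomic_fetch_add_explicit(&d->valid_evtchns, extra,
                              memory_order_release);
}
```

Because the count never decreases, a reader that observes a port as valid can never later find it invalid, which is what makes the unlocked read safe.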
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Juergen Gross [Thu, 18 Jun 2015 12:52:32 +0000 (14:52 +0200)]
pvusb: don't rely on linux kernel macros for the interface
The interface description of pvUSB lacks some access macros as using
linux kernel macros is assumed to work well. This solution is rather
unfriendly for pvusb implementations being outside the linux kernel.
Additionally things will break quite unpleasantly in case the linux
kernel implementation is changed.
To avoid these problems define dedicated macros for accessing bitfields of
the interface and for values of several structure members.
While working on the file add some more comments, especially for the
xenstore interface.
Wei Liu [Wed, 17 Jun 2015 19:39:49 +0000 (20:39 +0100)]
oxenstored: fix del_watches and del_transactions
The statement to reset nb_watches should be in del_watches, not
del_transactions.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Cc: David Scott <dave.scott@citrix.com> Acked-by: David Scott <dave.scott@citrix.com>
[ ijc -- fix syntax error by adding a ";" to the previous line in the
new location and removing from the previous line in the old ]
Wei Liu [Wed, 17 Jun 2015 11:08:38 +0000 (12:08 +0100)]
libxl: refactor toolstack save restore code
This patch does the following things:
1. Document v1 format.
2. Factor out function to handle QEMU restore data and function to
handle v1 blob for restore path.
3. Refactor save function to generate different blobs in the order
specified in format specification.
4. Change functions to use "goto out" idiom.
No functional changes introduced.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Roger Pau Monne [Thu, 11 Jun 2015 16:05:20 +0000 (18:05 +0200)]
libxc: fix xc_dom_load_elf_symtab
xc_dom_load_elf_symtab was incorrectly trying to perform the same
calculations already done in elf_parse_bsdsyms when load == 0 is used.
Instead of trying to repeat the calculations, just trust what
elf_parse_bsdsyms has already accounted for.
This also simplifies the code by allowing the non-load case to return
earlier.
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Ross Lagerwall [Mon, 15 Jun 2015 10:12:07 +0000 (11:12 +0100)]
tools/libxc: Batch memory allocations for PV guests
The current code for allocating memory for PV guests batches the
hypercalls to allocate memory by allocating 1024*1024 extents of order 0
at a time. To make this faster, first try allocating extents of order 9
(2 MiB) before falling back to order 0 allocation if the order 9
allocation fails.
On my test machine this reduced the time to start a 128 GiB PV guest by
about 60 seconds.
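The try-large-then-fall-back strategy can be sketched roughly as below. Everything here is hypothetical: the allocator callback, the toy pool and all demo_* names are invented for illustration, a 4 KiB base page is assumed, and the batching of hypercalls is left out.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SUPERPAGE_ORDER 9                        /* 2^9 x 4 KiB = 2 MiB */
#define DEMO_SUPERPAGE_PAGES (1ul << DEMO_SUPERPAGE_ORDER)

/* Hypothetical allocator: returns how many extents of `order` it provided. */
typedef unsigned long (*demo_alloc_fn)(unsigned int order,
                                       unsigned long count, void *ctx);

/* Populate `pages` 4 KiB pages; returns how many were actually populated. */
static unsigned long demo_populate(unsigned long pages, demo_alloc_fn alloc,
                                   void *ctx)
{
    unsigned long done = 0;
    unsigned long want = pages / DEMO_SUPERPAGE_PAGES;

    /* First try order 9 extents for as much of the range as possible. */
    if (want)
        done += alloc(DEMO_SUPERPAGE_ORDER, want, ctx) * DEMO_SUPERPAGE_PAGES;

    /* Fall back to order 0 for whatever is still missing. */
    if (done < pages)
        done += alloc(0, pages - done, ctx);

    return done;
}

/* Toy pool: a limited number of 2 MiB extents, unlimited 4 KiB pages. */
struct demo_pool {
    unsigned long superpages_left;
    unsigned long small_pages_used;
};

static unsigned long demo_pool_alloc(unsigned int order, unsigned long count,
                                     void *ctx)
{
    struct demo_pool *p = ctx;
    unsigned long n;

    if (order == 0) {
        p->small_pages_used += count;
        return count;
    }
    n = count < p->superpages_left ? count : p->superpages_left;
    p->superpages_left -= n;
    return n;
}
```

The saving comes from each successful order 9 extent replacing 512 order 0 extents, so far fewer extents have to be processed when large contiguous memory is available.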
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com>
Wei Liu [Thu, 4 Jun 2015 10:23:01 +0000 (11:23 +0100)]
libxc: unify handling of vNUMA layout
This patch does the following:
1. Use local variables for dummy vNUMA layout in PV case.
2. Avoid leaking dummy layout back to caller in PV case.
3. Use local variables to reference vNUMA layout (whether it is dummy
or provided by caller) for both PV and HVM.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Wei Liu [Wed, 3 Jun 2015 10:44:50 +0000 (11:44 +0100)]
libxl: clean up qemu-save and qemu-resume files
These files are leaked when using qemu-trad stubdom. They are
intermediate files created by libxc. Unfortunately they don't fit well
in our userdata scheme. Clean them up after we destroy all userdata,
we're sure they are not useful anymore at that point.
Signed-off-by: Wei Liu <wei.liu2@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:44 +0000 (16:30 +0000)]
xenalyze: remove argp_program_version
Since xenalyze is now upstream, it is Open Source and part of the given
release.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:43 +0000 (16:30 +0000)]
xenalyze: remove trailing whitespaces
Result of "sed -i 's@[[:blank:]]\+$@@' tools/xentrace/xenalyze.c"
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:41 +0000 (16:30 +0000)]
xenalyze: handle TRC_TRACE_WRAP_BUFFER
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:40 +0000 (16:30 +0000)]
xenalyze: include odd mmio states in default output
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:39 +0000 (16:30 +0000)]
xenalyze: print newline after unknown hvm events
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: Wei Liu <wei.liu2@citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
Olaf Hering [Thu, 11 Jun 2015 16:30:38 +0000 (16:30 +0000)]
xenalyze: add to tools/xentrace/
This merges xenalyze.hg, changeset 150:24308507be1d,
into tools/xentrace/xenalyze.c to have the tool and
public/trace.h in one place.
Adjust code to use public/trace.h instead of private trace.h.
Signed-off-by: Olaf Hering <olaf@aepfle.de> Acked-by: Wei Liu <wei.liu2@citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Cc: Ian Jackson <ian.jackson@eu.citrix.com> Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com>
[ ijc -- wrap $(BIN) install in a check in case it is empty (which it
is on !x86); avoid BIN += since it results in BIN = ' ' on
!x86 ]
Jan Beulich [Tue, 16 Jun 2015 10:29:18 +0000 (12:29 +0200)]
gnttab: make struct grant_mapping private
This documents that no entity outside of gnttab.c actually accesses
objects of that type, which is particularly important with the now more
fine grained locking in place.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:28:11 +0000 (12:28 +0200)]
gnttab: fix/adjust gnttab_transfer()
- don't update shared entry's frame number for translated domains (as
MFNs shouldn't be exposed to such guests)
- for v1 grant table format, force copying of the page also when the
intended MFN doesn't fit in 32 bits (and the domain isn't translated)
- fix an apparent off-by-one error (it's unclear to me why commit 5cc77f9098 ("32-on-64: Fix domain address-size clamping, implement")
uses BITS_PER_LONG-1 here, while using BITS_PER_LONG in the two other
invocations of domain_clamp_alloc_bitsize())
- adjust comments accompanying the shared entry's frame field
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
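
The first two bullets of the gnttab_transfer() change above can be read as a pair of small predicates. The following is only a sketch under stated assumptions: the toy_* names are hypothetical and do not exist in Xen, and the only fact relied on is that the v1 grant entry stores its frame number in a 32-bit field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a frame number fit the 32-bit frame field
 * of a v1 grant entry? */
static bool toy_mfn_fits_v1(uint64_t mfn)
{
    return mfn <= UINT32_MAX;
}

/* Hypothetical helper: must the page be copied to a lower frame first?
 * Only for v1 tables, only for non-translated domains, and only when the
 * intended MFN is too wide for the v1 field. */
static bool toy_must_copy_page(uint64_t mfn, unsigned int gt_version,
                               bool translated)
{
    assert(gt_version == 1 || gt_version == 2);
    return gt_version == 1 && !translated && !toy_mfn_fits_v1(mfn);
}

/* Hypothetical helper: may the MFN be written into the shared entry?
 * Translated domains should not see MFNs at all. */
static bool toy_may_publish_frame(bool translated)
{
    return !translated;
}
```

With these helpers, toy_must_copy_page(0x100000000, 1, false) is true, while the same MFN with a v2 table or a translated domain yields false.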
Jan Beulich [Tue, 16 Jun 2015 10:26:03 +0000 (12:26 +0200)]
gnttab: simplify shared entry v1 vs v2 handling
In a number of places both v1 and v2 pointers are being obtained when
none or just one suffices. Additionally in __acquire_grant_for_copy()
the flow of if/else-if can be slightly improved by re-ordering.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
Jan Beulich [Tue, 16 Jun 2015 10:25:35 +0000 (12:25 +0200)]
gnttab: limit mapcount() looping
The function doesn't need to return counts in the first place; all its
callers are after is whether at least one entry of a certain kind
exists. With that there's no point in that loop continuing once the
looked-for condition was found to be met by one entry. Rename the
function to match the changed behavior.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
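
The idea behind the mapcount() change above can be illustrated with a minimal sketch. The struct and function names here are hypothetical stand-ins, not the ones used in gnttab.c; the point is only the difference between counting every match and stopping at the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a mapping-tracking entry. */
struct map_entry {
    unsigned long frame;   /* frame the mapping refers to */
    bool writable;         /* true for a read-write mapping */
};

/* Counting variant: always walks the whole array. */
static size_t count_writable(const struct map_entry *e, size_t n,
                             unsigned long frame)
{
    size_t count = 0;

    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (e[i].frame == frame && e[i].writable)
            count++;
    return count;
}

/* Existence variant: stops as soon as one matching entry is found. */
static bool has_writable(const struct map_entry *e, size_t n,
                         unsigned long frame)
{
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (e[i].frame == frame && e[i].writable)
            return true;
    return false;
}
```

A caller that only tests `count_writable(...) != 0` can switch to has_writable() without any change in behavior, and the loop ends early whenever a match exists.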
Jan Beulich [Tue, 16 Jun 2015 10:24:49 +0000 (12:24 +0200)]
gnttab: eliminate several explicit version checks
By having nr_grant_entries() return zero when the grant table version
is still unset we can reduce the number of error paths and at the same
time fix grant_map_exists() running into the ASSERT() (removed here) when
called for a page owned by a domain that does not have its grant table
set up yet.
Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> Acked-by: Ian Campbell <ian.campbell@citrix.com>
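
As a rough illustration of why returning zero for an unset version removes error paths, consider the sketch below. It uses a hypothetical toy_grant_table, not Xen's struct grant_table, and the per-frame entry counts are assumptions based on 4 KiB frames with 8-byte v1 and 16-byte v2 entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified grant table; not Xen's struct grant_table. */
struct toy_grant_table {
    unsigned int version;    /* 0 = not set up yet, otherwise 1 or 2 */
    unsigned int nr_frames;  /* frames currently backing the table */
};

#define TOY_ENTRIES_PER_FRAME_V1 512u  /* assumed: 4096 / 8-byte entry */
#define TOY_ENTRIES_PER_FRAME_V2 256u  /* assumed: 4096 / 16-byte entry */

/* Returns 0 while the version is unset, so callers need no separate
 * "is the table set up?" check. */
static unsigned int toy_nr_grant_entries(const struct toy_grant_table *gt)
{
    assert(gt != NULL);
    switch (gt->version) {
    case 1:
        return gt->nr_frames * TOY_ENTRIES_PER_FRAME_V1;
    case 2:
        return gt->nr_frames * TOY_ENTRIES_PER_FRAME_V2;
    default:
        return 0;
    }
}

/* A single bounds check now also rejects every reference into a table
 * whose version has not been set. */
static bool toy_ref_is_valid(const struct toy_grant_table *gt,
                             unsigned int ref)
{
    return ref < toy_nr_grant_entries(gt);
}
```

Because no reference is below zero entries, toy_ref_is_valid() fails for any ref on an unset table, which is the effect the commit message describes for the real bounds checks.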
David Vrabel [Mon, 15 Jun 2015 11:25:20 +0000 (13:25 +0200)]
gnttab: make the grant table lock a read-write lock
In combination with the per-active entry locks, the grant table lock
can be made a read-write lock since in the majority of cases only the
read lock is required. The grant table read lock protects against
changes to the table version or size (which are done with the write
lock held).
The write lock is also required when two active entries must be
acquired.
The double lock is still required when updating IOMMU page tables.
With the lock contention being only on the maptrack lock (unless IOMMU
updates are required), performance and scalability are improved.
Based on a patch originally by Matt Wilson <msw@amazon.com>.
Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>