Jake Wires [Tue, 22 Feb 2011 01:30:06 +0000 (17:30 -0800)]
CA-51988/XC-3264: Don't (re-)issue requests while paused.
Picks missing quiesced-state related bits only from XC blktap.git 32923215. Issue was then duplicated through 595:8651e424a229
(vbd_recheck_state), hence the second hunk.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
From 3292321524cc5797bd0e01b9d36e71ed4a54cbbf Mon Sep 17 00:00:00 2001
Date: Tue, 10 Aug 2010 16:41:38 -0700
* don't issue requests while paused
* don't timeout requests while paused
* ensure progress on resume
Signed-off-by: Jake Wires <Jake.Wires@citrix.com>
---
drivers/tapdisk-vbd.c | 24 +++++++++++++++++++++++-
1 files changed, 23 insertions(+), 1 deletions(-)
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: llcache - local leaf caching drivers.
Add PR-1053 compliant local leaf caching support, in toplevel filter
drivers. Since data paths are very different (local or shared storage
writes, in the non-persistent vs. mirrored or shared storage write in
the persistent case), this adds two new driver types:
- llp: Local Leaf, Persistent
- lle: Local Leaf, Non-persistent ('ephemeral')
Both work by driving an aggregated vhd image, internally.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: Loop cache writes as vreqs, not treqs.
Looping treqs on the toplevel image is prone to error and doesn't
retry. It cannot be properly scheduled, therefore may starve normal
VBD taps. Queue vreqs instead.
Related changes:
* The 'phase' request state is now gone, DFA state now driven by
callback separation.
* Failing req alloc now stalls the queue (-EBUSY) instead of dropping
a cache entry.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: Shorten names.
- Align with later llcache (local leaf) driver types.
- s/local_cache/lcache/, matching public names.
- s/lreq/req/, aligning with other drivers.
- s/get/alloc/, s/put/free/, lacking refcounts.
- shorten some field names, unlikely to cause clashing.
- prefix TD_, where appropriate.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: Fix tapdisk-stream.
Was dropped by the vreq patch, because the VBD change broke
compilation. Essentially acts as a tap, but somewhat improperly
(now), because it emulates a blkif ring to do so.
Fixed with a major rewrite. The tool got somewhat out of fashion, but
is a good API exercise:
* New code now queues vreqs instead of fake blkif reqs.
* Won't get rid of the loopback event fd yet.
The latter is because new requests are only processed per iteration,
after select fired. As a consequence, vreq completion cannot just
queue new requests, because they won't get run before the next
external event fires. This will affect VBD I/O queued by filter
drivers, too.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: Move the blktap ring out of tapdisk-vbd.
About to potentially put multiple users on top of a single VBD.
Motivation:
- Enable more than one external tap, such as blktap + blkif/gntdev.
- Let filter drivers issue true VBD I/O, cleanly.
So make the local blktap device ring only one of them
(tapdisk-blktap).
Usage:
- Entry is tapdisk_vbd_queue_request. Callers are fully responsible
for vreq memory preallocation and some vreq init.
- Exit is through a completion callback. Callbacks come in
bursts. Final callback gets annotated, so kicking external pads
stays efficient.
This changes a couple details about VBD-requests:
- Break out blkif-style segment formats. Replaced with a struct
td_iovec, essentially in uio.h's struct iovec spirit: a base+len
vector, but in sectors, not bytes.
Segment merges are up to the caller. This reverts cset 3a9bcd90c.
- Treqs now link back to const vreqs. Presently only to get treqs
'name'd for debugging, because ring 'ids' don't make sense anymore
internally.
- Moves related typedefs into tapdisk.h
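To illustrate the base+len-in-sectors idea, here is a minimal sketch that converts a sector-based vector into a byte-based struct iovec. The struct layout, the sketch_ names and the 512-byte sector size are assumptions made for illustration; the actual td_iovec definition in tapdisk.h may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SKETCH_SECTOR_SHIFT 9  /* assumption: 512-byte sectors */

/* Hypothetical layout, not copied from tapdisk.h. */
struct sketch_iovec {
    void        *base;  /* start of the buffer */
    unsigned int secs;  /* length in sectors, not bytes */
};

/* Convert a sector-based vector into a byte-based struct iovec. */
static struct iovec sketch_iovec_to_bytes(const struct sketch_iovec *v)
{
    struct iovec out;

    assert(v != NULL);
    out.iov_base = v->base;
    out.iov_len  = (size_t)v->secs << SKETCH_SECTOR_SHIFT;
    return out;
}
```

Keeping lengths in sectors means a caller cannot express a sub-sector transfer by accident; the conversion to bytes happens only at the I/O boundary.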
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1129: Add x-chain: pseudo target type.
Adds a new cli driver type, syntaxed 'x-chain:/path'. Doesn't target a
driver, but opens <path> and parses a list of "<type>:<images>
<flags>\n" entries.
Drops most of the present vbd_open code, replaced with a couple
primitives in tapdisk_image:
- tapdisk_image_open - single image.
- tapdisk_image_open_parents -- append parent chain as per get_parent_id.
- tapdisk_image_open_chain -- the normal vbd entry.
- tapdisk_image_validate_chain -- ditto.
The x-chain target will create the head of the vbd image chain
precisely as described in its input file, followed by a final call to
tapdisk_image_open_parents and validation.
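For illustration only, one line of such an input file could be split as sketched below. The struct, the function name and the buffer sizes are hypothetical, the flags field is kept as opaque text because its encoding is not described here, and paths containing whitespace are not handled. This is not the parser used by tapdisk.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one "<type>:<image> <flags>" entry. */
struct sketch_chain_entry {
    char type[16];
    char path[256];
    char flags[32];  /* opaque text; encoding is an open assumption */
};

/* Split one line into its fields. Returns 0 on success, -1 if the
 * line is malformed. The flags field is treated as optional. */
static int sketch_chain_parse(const char *line, struct sketch_chain_entry *e)
{
    int n;

    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof(*e));
    n = sscanf(line, "%15[^:]:%255s %31s", e->type, e->path, e->flags);
    return (n >= 2) ? 0 : -1;
}
```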
Apart from sharing tapdev parent nodes as presently done with the
local caching drivers.
VHD-index mode currently dropped. Other legacy tap-ctl args should
keep working.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1248: Shorten the tapdisk_vbd_open_vdi control path.
We used to have to pipe all additional driver activations through
tapdisk_vbd_open_vdi, back then mainly for blktap1 compat. This can be
done in the control callback now. Reduces the amount of flagging
involved, and simplifies passing additional parameters to drivers.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
- td-rated: Standalone bridge process, listening to bandwidth
requests, typically from valve:/ instances. Includes a plugin
interface for various rate limiting algorithms.
Algorithms (yet slightly experimental):
- "Token Bucket". A classic, with some trivial modifications to
promote batching.
- "Meminfo". Watching /proc/meminfo for pagecache congestion.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1129: Enforce masked state.
Masking timeout events only skips timeval calculation, but doesn't
prevent them from firing anyway once they expired. Enforce masked state right
around the callback. For timeouts, this means the event will keep
ticking at the interval given.
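A minimal sketch of that behavior, assuming a simplified event struct: the timer is rearmed whenever it expires, and the mask is checked directly before the callback. All names here are hypothetical and do not come from the tapdisk scheduler.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_event {
    int    masked;    /* nonzero: callback must not run */
    double interval;  /* timeout period, in seconds */
    double deadline;  /* next expiry, in seconds */
    void (*cb)(void *);
    void  *arg;
};

/* Example callback: counts invocations through an int pointer. */
static void sketch_count_cb(void *arg)
{
    ++*(int *)arg;
}

/* Check one timeout event at time `now`. Returns 1 if the callback ran. */
static int sketch_event_tick(struct sketch_event *ev, double now)
{
    assert(ev != NULL && ev->interval > 0);
    if (now < ev->deadline)
        return 0;

    /* The timer keeps ticking at its interval, masked or not. */
    ev->deadline = now + ev->interval;

    /* The mask is enforced right around the callback. */
    if (ev->masked)
        return 0;
    ev->cb(ev->arg);
    return 1;
}
```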
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:44 +0000 (01:37 -0800)]
Remove td_image.private.
Not needed on the datapath because of the vreq->vbd map. (Meaning that
ultimately driver and image could even be merged (again), if vbd
image lists ever get out of fashion.)
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 18 Jan 2011 19:53:23 +0000 (11:53 -0800)]
EA-1001: Build VBD ring macros from kernel, not xen headers.
Includes linux/blktap.h, not xen/io/blkif.h. Make therefore wants a
KBUILD argument to find the kernel headers.
Fallout:
* Presently comes with loads of ugly blkif_* typedefs and defines to
reduce noise under drivers/. Will go away with later VBD patches.
* BLKTAP_RING_SIZE isn't constant anymore (due to _SC_PAGE_SIZE).
Substituted with a MAX_REQUESTS defined to a fixed 32U. More
flexible queue size dimensions are clearly a field for future work.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 30 Sep 2010 21:01:47 +0000 (14:01 -0700)]
CA-44322: Restrict I/O request merging on filesystems.
Ensure that every single iocb can be issued with only the memory
reserves held in kernel space. Main resources prone to congestion are
bio structs.
For I/O contiguous in physical storage, such as bare LUN mappings, a
single bio will hold up to 256 pages. To accommodate block mappings on
file systems, we reserve more than 1 bio, but cannot submit iocbs of
arbitrary length without risking a stall once the reserve is
exhausted.
Limits the iocb size on ext. Assumes 4k blocks for now.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 30 Sep 2010 21:01:45 +0000 (14:01 -0700)]
CA-44322: Add an I/O-submit thread.
Slightly annoying to add threads to core blktap code, but necessary to
avoid potential starvation when dom0 gets under memory
pressure. Blktap can guarantee io_submit makes progress by keeping
memory reserves, but not enough to guarantee that it's
non-blocking. To refill the reserves, we want completion of in-flight
I/O in the main event loop.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 30 Sep 2010 21:01:40 +0000 (14:01 -0700)]
CA-46079: Recover the image storage type.
Used to be a message parameter passed in at open time, then down
through the VBD and images up to the driver. Replaced by stat() and
statfs().
The vbd->storage isn't really applicable since cross-SR VHD chains
became more popular, so removed. We keep the driver->storage, but only
for verbosity. Drivers with type-dependent code calling
tapdisk_storage_type() during td_open() are encouraged to store the
result here.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 30 Sep 2010 21:01:35 +0000 (14:01 -0700)]
CA-46079: Remove the image reopen hack.
Used to reopen the image chain on the first request, thereby detecting
guest activation after migration. Obsolete since tapdisk is
spawned/resumed after VM stop/copy now.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 9 Sep 2010 09:05:17 +0000 (02:05 -0700)]
CA-44974: Make tap_ctl_close idempotent.
Avoid potential freelist/conn vector corruption due to
double-frees. Upgrade the WARN_ON() to a panic(), the present drain
loop doesn't want to be asked after disconnect.
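The idempotency idea can be sketched as follows: a second close on the same connection becomes a no-op instead of returning the slot to the free pool twice. The struct, the function name and the free counter are hypothetical stand-ins for the real connection state.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical connection slot. */
struct sketch_conn {
    int fd;      /* -1 once closed */
    int in_use;  /* nonzero while the slot is allocated */
};

/* Safe to call more than once: only the first call releases the slot. */
static void sketch_conn_close(struct sketch_conn *c, int *nfree)
{
    assert(c != NULL && nfree != NULL);
    if (!c->in_use)
        return;  /* already closed; leave the free count alone */

    if (c->fd >= 0)
        close(c->fd);
    c->fd     = -1;
    c->in_use = 0;
    ++*nfree;    /* slot goes back to the pool exactly once */
}
```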
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 7 Sep 2010 02:42:26 +0000 (19:42 -0700)]
CA-44675: Fix parent cache corruption due to I/O crosstalk.
Previous patch requeued completing ring I/O buffers.
An interesting question is why this succeeds without a proper tapdisk
crash. The only sane explanation I can come up with is that the common
path manages to queue AIOs before our response hits the kernel so
unmap goes after GUP page translation. Which sounds not too
improbable, the target leaf vhd bitmap was likely still hot at this
point.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 17 Aug 2010 08:56:28 +0000 (01:56 -0700)]
blktap: Write tapdisk-control response data asynchronously and fully buffered.
Tapdisk should never block on control connections. We preallocate
connection state for a number of clients, including buffer space for
the response (presently 4k/conn). Then stream back response data
asynchronously, driven by the event loop.
Daniel Stodden [Fri, 6 Aug 2010 10:53:33 +0000 (03:53 -0700)]
CA-43084: Remove blocking opportunities from tapdisk-control.
Certainly should not syslog(), but use tlog instead. While we are at
it, preallocate the connection structs, too. Fixing a memleak by
509:abadd2f7ca77 (control op cancellation). Now uses tlog_syslog. The
"received"/"sending" noise should be avoided, but the logging
aids debugging in the meantime.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Andrei Lifchits [Tue, 20 Jul 2010 16:20:24 +0000 (17:20 +0100)]
CP-1732, CP-1733, CP-1734: add local caching support (alpha quality). This includes:
- read caching into the leaf
- mirror write mode
- failover to the secondary image on ENOSPC
- possibility of snapshotting of empty images (for cache setup)
Daniel Stodden [Wed, 14 Jul 2010 00:47:39 +0000 (17:47 -0700)]
blktap2: Redo tap-ctl-list.
Consolidate all outputs to tap_list_t on list_heads, removing the
overcomplicated vectors. Includes a change to the tap_ctl_list
signature, accordingly. Simplifies the old join3 code, now
inlined. Remove the obsolete tap_list id. Introduces list_move and
list_splice_tail macros to list.h.
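For readers unfamiliar with these helpers, the sketch below shows what list_move and list_splice_tail conventionally do on a circular doubly-linked list. It is modeled on the Linux kernel's list.h semantics and written as inline functions; it is an assumption that the blktap list.h versions behave the same way.

```c
#include <assert.h>

struct list_head { struct list_head *next, *prev; };

static inline void list_init(struct list_head *h) { h->next = h->prev = h; }
static inline int list_empty(const struct list_head *h) { return h->next == h; }

/* Insert `e` before `h`, i.e. at the tail of the list headed by `h`. */
static inline void list_add_tail(struct list_head *e, struct list_head *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

static inline void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Unlink `e` from its current list and insert it right after `h`. */
static inline void list_move(struct list_head *e, struct list_head *h)
{
    list_del(e);
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

/* Append all entries of `src` to the tail of `dst`.
 * `src` itself is left stale and must be reinitialized before reuse. */
static inline void list_splice_tail(struct list_head *src,
                                    struct list_head *dst)
{
    assert(src != dst);
    if (list_empty(src))
        return;
    src->next->prev = dst->prev;
    dst->prev->next = src->next;
    src->prev->next = dst;
    dst->prev = src->prev;
}
```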
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 29 Jun 2010 20:03:09 +0000 (13:03 -0700)]
All: Fix RPM install path definitions.
Help make libraries install to the correct location without Makefile
divergence from OSS org. Removes private definitions of LIBDIR from the
Makefiles. LIBDIR is globally defined by the toplevel make at xen.org,
and now by the RPM build accordingly.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 29 Jun 2010 20:03:09 +0000 (13:03 -0700)]
blktap2: Update tap-ctl timeouts and blocking behavior.
Adds a -t <timeout> switch to the destroy, pause and close operations,
which are known to block. Changes the library call interfaces to use
timeval structs, not <int> secs, which are then passed around
internally. On Linux, the effect should be that we track the total
timeout across calls, not a maximum for individual operations.
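The "total timeout across calls" effect amounts to a single time budget that every blocking step draws down. On Linux, select() updates its timeout argument to the time not slept, which yields this behavior when the same timeval is passed on; the sketch below does the same bookkeeping explicitly. The helper name is hypothetical and not part of the tap-ctl library.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Subtract `spent` from the remaining budget `left`, clamping at zero.
 * Returns 1 if time remains, 0 if the total timeout has expired. */
static int sketch_timeout_consume(struct timeval *left,
                                  const struct timeval *spent)
{
    long long l_us, s_us;

    assert(left != NULL && spent != NULL);
    l_us = (long long)left->tv_sec * 1000000 + left->tv_usec;
    s_us = (long long)spent->tv_sec * 1000000 + spent->tv_usec;
    l_us = (l_us > s_us) ? l_us - s_us : 0;

    left->tv_sec  = (time_t)(l_us / 1000000);
    left->tv_usec = (suseconds_t)(l_us % 1000000);
    return l_us > 0;
}
```

A caller would pass the same budget to each successive operation and give up as soon as the helper reports that nothing is left.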
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 29 Jun 2010 20:03:09 +0000 (13:03 -0700)]
blktap2: Synchronize device removal when removing the vbd.
Employ a new ring ioctl, BLKTAP2_IOCTL_REMOVE_DEVICE, which succeeds
only once the bdev is closed. Iterating events then safely drains the
queue. If not implemented, fall back to the previous behavior, which
may fail requests.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>