Partial revert of 569:9f795e737459 (EA-1001: Build VBD ring macros
from kernel, not xen headers). Drops the KBUILD dependency by
resorting to a private copy of the Linux headers.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Wed, 13 Jul 2011 20:51:37 +0000 (13:51 -0700)]
VHD: Improve misleading result code from failing vhd_parent_locator_get.
Presently this always returns -EINVAL, because the final
vhd_parent_locator_read fails. With this fix, it returns -EINVAL only
when there are no candidates at all, and otherwise the result of the
last vhd_find_parent call, such as -ENOENT.
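A minimal sketch of the corrected result-code logic, assuming a simplified interface: the integer locator array, `parent_locator_get_sketch`, `find_parent_fn` and the demo `find_always_enoent` are hypothetical stand-ins, not the actual libvhd signatures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for vhd_find_parent(): 0 on success, -errno on failure. */
typedef int (*find_parent_fn)(int locator);

static int parent_locator_get_sketch(const int *locs, int n, find_parent_fn find)
{
	int err = -EINVAL; /* reported only if there are no candidates at all */

	for (int i = 0; i < n; i++) {
		err = find(locs[i]);
		if (!err)
			return 0; /* found a usable parent */
	}

	return err; /* otherwise: the result of the last lookup, e.g. -ENOENT */
}

/* Demo lookup that never finds the parent. */
static int find_always_enoent(int locator)
{
	(void)locator;
	return -ENOENT;
}
```

With no candidates the caller still sees -EINVAL, but a failed search now surfaces the more informative error from the final lookup.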
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 28 Jun 2011 01:08:57 +0000 (18:08 -0700)]
CA-61156: control: Anticipate tap-ctl-spawn racing a bugtool killall -USR1.
There's a race between tapdisk's sigaction() init and xen-bugtool
sending debug signals under RT stress. If tapdisk gets killed by something as
innocuous as USR1, just retry the fork().
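The retry policy can be sketched as a fork-and-wait loop that treats death by SIGUSR1 as "lost the race, try again". This is a simplification under stated assumptions: `spawn_retry_usr1`, `child_fn` and `demo_child` are hypothetical, and the real tap-ctl spawn path (which execs and daemonizes tapdisk) is more involved.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Body run in the forked child; its return value becomes the exit status. */
typedef int (*child_fn)(int attempt);

/*
 * Fork and wait, retrying if the child was killed by SIGUSR1 (e.g. a debug
 * signal arriving before the child installed its handlers). Returns the
 * child's exit status, or a negative errno value on failure.
 */
static int spawn_retry_usr1(child_fn fn, int max_tries, int *attempts)
{
	*attempts = 0;
	for (int i = 0; i < max_tries; i++) {
		int status;
		pid_t pid = fork();

		if (pid < 0)
			return -errno;
		if (pid == 0)
			_exit(fn(i));

		while (waitpid(pid, &status, 0) < 0)
			if (errno != EINTR)
				return -errno;

		*attempts = i + 1;
		if (WIFSIGNALED(status) && WTERMSIG(status) == SIGUSR1)
			continue; /* lost the race against a debug signal: retry */

		return WIFEXITED(status) ? WEXITSTATUS(status) : -ECHILD;
	}
	return -EAGAIN;
}

/* Demo child: the first attempt dies of USR1, as if hit before sigaction(). */
static int demo_child(int attempt)
{
	if (attempt == 0) {
		signal(SIGUSR1, SIG_DFL);
		raise(SIGUSR1);
	}
	return 7;
}
```

Bounding the loop with `max_tries` keeps a persistent signal storm from turning into an endless respawn.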
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Mike McClurg [Tue, 7 Jun 2011 18:35:30 +0000 (19:35 +0100)]
Make blktap portable to Ubuntu
Most of these changes are required by gcc 4.5. The exception is the change
to Config.mk, which inserts some make variables that we otherwise expect
the spec file to provide. The remaining changes add missing headers,
initialise a variable and provide a mode when opening a file.
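We assume "providing a mode when opening a file" refers to the usual open(2) pitfall: with O_CREAT, open() reads a third `mode` argument, and glibc builds with `_FORTIFY_SOURCE` (enabled by default on Ubuntu's gcc) can reject calls that omit it. A minimal sketch; `append_line` is a hypothetical helper, not blktap code.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int append_line(const char *path, const char *line)
{
	/* O_CREAT needs the third 'mode' argument: without it the new file's
	 * permissions come from whatever value happens to be passed. */
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd < 0)
		return -1;

	size_t len = strlen(line);
	ssize_t n = write(fd, line, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}
```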
Signed-off-by: Mike McClurg <mike.mcclurg@citrix.com>
Daniel Stodden [Mon, 23 May 2011 02:23:46 +0000 (19:23 -0700)]
MAR-125: Add driver log rate limits.
Allocates one td-loglimit instance per driver instance. The present burst
size is 16 messages over an interval of 90 seconds. It is to be shared by
both the driver code and the VBD.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Mon, 23 May 2011 02:23:45 +0000 (19:23 -0700)]
MAR-125: Support log rate limiting.
Much like Linux's ratelimit: allow message bursts of a given size,
counting messages as they pass. Messages are dropped once the burst size
has been exceeded within a given interval; the next interval resets the count.
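The burst/interval scheme above can be sketched as follows. The struct and function names are hypothetical and only mirror the described behaviour, not the actual td-loglimit API.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical burst/interval limiter in the spirit of Linux's ratelimit. */
struct loglimit_sketch {
	unsigned burst;    /* messages allowed per interval */
	time_t   interval; /* interval length, seconds */
	time_t   start;    /* start of the current interval */
	unsigned count;    /* messages seen in the current interval */
};

static void loglimit_init(struct loglimit_sketch *rl, unsigned burst,
			  time_t interval, time_t now)
{
	rl->burst = burst;
	rl->interval = interval;
	rl->start = now;
	rl->count = 0;
}

/* Returns true if a message at time 'now' may be logged. */
static bool loglimit_pass(struct loglimit_sketch *rl, time_t now)
{
	if (now - rl->start >= rl->interval) {
		rl->start = now; /* next interval resets the count */
		rl->count = 0;
	}
	if (rl->count >= rl->burst)
		return false;    /* burst exceeded: drop */
	rl->count++;
	return true;
}
```

The per-driver limits from MAR-125 would correspond to `loglimit_init(&rl, 16, 90, time(NULL))` in this sketch.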
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Sat, 16 Apr 2011 00:56:11 +0000 (17:56 -0700)]
XOP-38: (lcache) Detect out-of-space conditions before write() does.
Test free space in the caching SR before attempting to store our
reads. VHD block allocation writes on Ext3 have the nasty property of
blocking excessively after running out of space. We therefore
enable/disable ourselves at 1 s granularity, querying free space
through statfs() beforehand.
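A Linux-only sketch of the once-per-second free-space gate. The `min_free` low-water mark and all names are hypothetical; the commit does not state which threshold lcache actually uses.

```c
#include <stdint.h>
#include <sys/vfs.h>
#include <time.h>

struct lcache_space_sketch {
	time_t   checked;  /* second of the last statfs() call; init to -1 */
	int      enabled;  /* cached verdict for that second */
	uint64_t min_free; /* hypothetical low-water mark, in bytes */
};

/* Returns nonzero if caching may proceed; consults statfs() at most once per second. */
static int lcache_space_ok(struct lcache_space_sketch *s, const char *path, time_t now)
{
	struct statfs st;

	if (now == s->checked)
		return s->enabled;
	s->checked = now;

	if (statfs(path, &st) < 0) {
		s->enabled = 0; /* can't tell: play safe, stop caching */
		return 0;
	}
	s->enabled = (uint64_t)st.f_bavail * (uint64_t)st.f_bsize >= s->min_free;
	return s->enabled;
}
```

Caching the verdict per second keeps statfs() off the per-request path while still reacting quickly when the SR fills up.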
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Jake Wires [Tue, 22 Feb 2011 01:30:06 +0000 (17:30 -0800)]
CA-51988/XC-3264: Don't (re-)issue requests while paused.
Picks only the missing quiesced-state bits from XC blktap.git 32923215. The issue was
later duplicated by 595:8651e424a229 (vbd_recheck_state), hence the second hunk.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
From 3292321524cc5797bd0e01b9d36e71ed4a54cbbf Mon Sep 17 00:00:00 2001
Date: Tue, 10 Aug 2010 16:41:38 -0700
* don't issue requests while paused
* don't timeout requests while paused
* ensure progress on resume
Signed-off-by: Jake Wires <Jake.Wires@citrix.com>
---
drivers/tapdisk-vbd.c | 24 +++++++++++++++++++++++-
1 files changed, 23 insertions(+), 1 deletions(-)
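The three rules from the quoted patch can be illustrated on a heavily reduced, hypothetical VBD state. None of these names come from tapdisk-vbd.c; the sketch only shows where each pause check sits.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced VBD state: just enough to show the three rules. */
struct vbd_sketch {
	bool paused;
	int  pending;  /* queued, not yet issued */
	int  issued;   /* handed to the image chain */
	int  timeouts; /* requests failed by the timeout check */
};

static void vbd_issue_requests(struct vbd_sketch *v)
{
	if (v->paused)
		return; /* don't issue requests while paused */
	v->issued += v->pending;
	v->pending = 0;
}

static void vbd_check_timeouts(struct vbd_sketch *v, bool expired)
{
	if (v->paused)
		return; /* don't time out requests while paused */
	if (expired && v->issued > 0)
		v->timeouts++;
}

static void vbd_resume(struct vbd_sketch *v)
{
	v->paused = false;
	vbd_issue_requests(v); /* ensure progress: kick whatever queued up */
}
```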
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: llcache - local leaf caching drivers.
Add PR-1053 compliant local leaf caching support, in toplevel filter
drivers. Since the data paths differ substantially (local or shared storage
writes in the non-persistent case vs. mirrored or shared storage writes
in the persistent case), this adds two new driver types:
- llp: Local Leaf, Persistent
- lle: Local Leaf, Non-persistent ('ephemeral')
Internally, both work by driving an aggregated VHD image.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: Loop cache writes as vreqs, not treqs.
Looping treqs on the toplevel image is error-prone and doesn't
retry. It cannot be properly scheduled, and therefore may starve normal
VBD taps. Queue vreqs instead.
Related changes:
* The 'phase' request state is now gone; DFA state is now driven by
callback separation.
* A failing request allocation now stalls the queue (-EBUSY) instead of
dropping a cache entry.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: Shorten names.
- Align with later llcache (local leaf) driver types.
- s/local_cache/lcache/, matching public names.
- s/lreq/req/, aligning with other drivers.
- s/get/alloc/, s/put/free/, since there are no refcounts.
- shorten some field names where unlikely to cause clashes.
- prefix TD_, where appropriate.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: Fix tapdisk-stream.
It was dropped by the vreq patch, because the VBD change broke its
compilation. It essentially acts as a tap, but (now) a somewhat improper
one, because it emulates a blkif ring to do so.
Fixed with a major rewrite. The tool has gone somewhat out of fashion, but
is a good API exercise:
* New code now queues vreqs instead of fake blkif reqs.
* Won't get rid of the loopback event fd yet.
The latter is because new requests are only processed once per iteration,
after select() has fired. As a consequence, vreq completion cannot just
queue new requests, because they won't get run before the next
external event fires. This will affect VBD I/O queued by filter
drivers, too.
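The commit speaks of a "loopback event fd"; the sketch below assumes the classic self-pipe pattern for it. A completion that queues new work writes a byte, so the next select() returns immediately instead of waiting for an external event. All names are hypothetical.

```c
#include <fcntl.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical self-pipe used to wake a select() loop from inside a callback. */
struct loop_kick {
	int rfd, wfd;
};

static int loop_kick_init(struct loop_kick *k)
{
	int p[2];

	if (pipe(p) < 0)
		return -1;
	for (int i = 0; i < 2; i++)
		fcntl(p[i], F_SETFL, fcntl(p[i], F_GETFL) | O_NONBLOCK);
	k->rfd = p[0];
	k->wfd = p[1];
	return 0;
}

/* Called after queueing new work, e.g. from a vreq completion. */
static void loop_kick(struct loop_kick *k)
{
	char c = 1;
	ssize_t n = write(k->wfd, &c, 1); /* EAGAIN just means a kick is already pending */
	(void)n;
}

/* Main loop side: consume all pending kicks before running queued work. */
static void loop_drain(struct loop_kick *k)
{
	char buf[64];
	while (read(k->rfd, buf, sizeof(buf)) > 0)
		;
}

/* Non-blocking poll: would select() report the loop as runnable? */
static int loop_pending(const struct loop_kick *k)
{
	fd_set rfds;
	struct timeval tv = { 0, 0 };

	FD_ZERO(&rfds);
	FD_SET(k->rfd, &rfds);
	return select(k->rfd + 1, &rfds, NULL, NULL, &tv) == 1;
}
```

Making both ends non-blocking means a burst of completions can never stall the writer, and the reader can drain everything in one pass.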
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1053: Move the blktap ring out of tapdisk-vbd.
About to potentially put multiple users on top of a single VBD.
Motivation:
- Enable more than one external tap, such as blktap + blkif/gntdev.
- Let filter drivers issue true VBD I/O, cleanly.
So make the local blktap device ring only one of them
(tapdisk-blktap).
Usage:
- Entry is tapdisk_vbd_queue_request. Callers are fully responsible
for vreq memory preallocation and some vreq init.
- Exit is through a completion callback. Callbacks come in
bursts. The final callback gets annotated, so kicking external taps
stays efficient.
This changes a couple details about VBD-requests:
- Break out blkif-style segment formats. Replaced with a struct
td_iovec, essentially in uio.h's struct iovec spirit: a base+len
vector, but in sectors, not bytes.
Segment merges are up to the caller. This reverts cset 3a9bcd90c.
- Treqs now link back to const vreqs. Presently this only serves to get
treqs 'name'd for debugging, because ring 'ids' no longer make sense
internally.
- Moves related typedefs into tapdisk.h
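Since segment merges are now the caller's job, here is a sketch of a caller-side merge over base+len vectors counted in sectors. The struct layout, the 512-byte sector size and `td_iovec_merge` are assumptions for illustration; the real struct td_iovec may use different field names.

```c
#include <stddef.h>

#define SKETCH_SECTOR_SIZE 512

/* Hypothetical td_iovec shape: a base pointer plus a length in sectors. */
struct td_iovec_sketch {
	void        *base;
	unsigned int secs;
};

/* Caller-side merge of memory-adjacent vectors, in place; returns the new count. */
static int td_iovec_merge(struct td_iovec_sketch *iov, int cnt)
{
	int out = 0;

	if (cnt <= 0)
		return 0;

	for (int i = 1; i < cnt; i++) {
		char *end = (char *)iov[out].base +
			    (size_t)iov[out].secs * SKETCH_SECTOR_SIZE;

		if (end == (char *)iov[i].base)
			iov[out].secs += iov[i].secs; /* extends the current vector */
		else
			iov[++out] = iov[i];          /* starts a new one */
	}
	return out + 1;
}
```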
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1129: Add x-chain: pseudo target type.
Adds a new CLI driver type, with the syntax 'x-chain:/path'. It doesn't
target a driver, but opens <path> and parses a list of "<type>:<images>
<flags>\n" entries.
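One x-chain line could be parsed roughly as follows, assuming a single whitespace-free flags token per line. The struct, field sizes, `xchain_parse_line` and the example flag "ro" are hypothetical; the commit does not specify the flag vocabulary.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed form of one x-chain line; field sizes are arbitrary. */
struct xchain_entry {
	char type[16];  /* driver type, e.g. "vhd" */
	char path[256]; /* image path */
	char flags[64]; /* flags token, kept unparsed */
};

/* Parse "<type>:<image> <flags>"; returns 0 on success, -1 otherwise. */
static int xchain_parse_line(const char *line, struct xchain_entry *e)
{
	memset(e, 0, sizeof(*e));
	if (sscanf(line, "%15[^:]:%255s %63s", e->type, e->path, e->flags) != 3)
		return -1;
	return 0;
}
```

Only the first ':' splits type from path, so image paths may themselves contain colons.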
Drops most of the present vbd_open code, replaced with a couple
primitives in tapdisk_image:
- tapdisk_image_open -- single image.
- tapdisk_image_open_parents -- append parent chain as per get_parent_id.
- tapdisk_image_open_chain -- the normal vbd entry.
- tapdisk_image_validate_chain -- ditto.
The x-chain target will create the head of the vbd image chain
precisely as described in its input file, followed by a final call to
tapdisk_image_open_parents and validation.
Apart from sharing tapdev parent nodes as presently done with the
local caching drivers.
VHD-index mode is currently dropped. Other legacy tap-ctl args should
keep working.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1248: Shorten the tapdisk_vbd_open_vdi control path.
We used to have to pipe all additional driver activations through
tapdisk_vbd_open_vdi, back then mainly for blktap1 compat. This can be
done in the control callback now. Reduces the amount of flagging
involved, and simplifies passing additional parameters to drivers.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
- td-rated: Standalone bridge process, listening for bandwidth
requests, typically from valve:/ instances. Includes a plugin
interface for various rate limiting algorithms.
Algorithms (still somewhat experimental):
- "Token Bucket". A classic, with some trivial modifications to
promote batching.
- "Meminfo". Watching /proc/meminfo for pagecache congestion.
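For reference, a classic token bucket in its plain form is sketched below. The batching modifications mentioned for td-rated are not described, so they are deliberately left out, and all names here are hypothetical.

```c
#include <stdbool.h>

/* Classic token bucket; td-rated's batching tweaks are not modeled. */
struct token_bucket {
	double rate;   /* tokens (e.g. bytes) added per second */
	double burst;  /* bucket capacity */
	double tokens; /* currently available */
	double last;   /* time of the last refill, seconds */
};

static void tb_init(struct token_bucket *tb, double rate, double burst, double now)
{
	tb->rate = rate;
	tb->burst = burst;
	tb->tokens = burst; /* start full */
	tb->last = now;
}

/* Grant 'amount' if enough tokens have accumulated by time 'now'. */
static bool tb_consume(struct token_bucket *tb, double amount, double now)
{
	tb->tokens += (now - tb->last) * tb->rate;
	if (tb->tokens > tb->burst)
		tb->tokens = tb->burst;
	tb->last = now;

	if (tb->tokens < amount)
		return false; /* caller must wait and retry */
	tb->tokens -= amount;
	return true;
}
```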
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:45 +0000 (01:37 -0800)]
PR-1129: Enforce masked state.
Masking timeout events only skips the timeval calculation, but doesn't
prevent them from firing anyway once expired. Enforce the masked state right
around the callback. For timeouts, this means the event keeps
ticking at the given interval.
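A minimal sketch of "enforce masked state right around the callback": the deadline advances on every expiry, masked or not, while only the callback invocation is suppressed. The event struct and `event_dispatch` are hypothetical, not the tapdisk scheduler's types.

```c
/* Hypothetical periodic timeout event with a mask flag. */
struct timeout_event {
	int    masked;
	double deadline; /* next expiry, seconds */
	double interval; /* period, seconds */
	void (*cb)(void *arg);
	void  *arg;
};

/* Run one dispatch pass at time 'now'. */
static void event_dispatch(struct timeout_event *ev, double now)
{
	if (now < ev->deadline)
		return;
	ev->deadline += ev->interval; /* keeps ticking whether masked or not */
	if (ev->masked)
		return;                   /* masked state enforced right at the callback */
	ev->cb(ev->arg);
}

/* Demo callback: count invocations. */
static void count_cb(void *arg)
{
	++*(int *)arg;
}
```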
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 15 Feb 2011 09:37:44 +0000 (01:37 -0800)]
Remove td_image.private.
Not needed on the datapath because of the vreq->vbd map. (Meaning that
ultimately driver and image could even be merged (again), if VBD
image lists ever go out of fashion.)
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 18 Jan 2011 19:53:23 +0000 (11:53 -0800)]
EA-1001: Build VBD ring macros from kernel, not xen headers.
Includes linux/blktap.h, not xen/io/blkif.h. Make therefore wants a
KBUILD argument to find the kernel headers.
Fallout:
* Presently comes with loads of ugly blkif_t typedefs and defines to
reduce noise under drivers/. These will go away with later VBD patches.
* BLKTAP_RING_SIZE isn't constant anymore (due to _SC_PAGE_SIZE).
It is substituted with MAX_REQUESTS, defined to a fixed 32U. More
flexible queue size dimensions are clearly a field for future work.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 30 Sep 2010 21:01:47 +0000 (14:01 -0700)]
CA-44322: Restrict I/O request merging on filesystems.
Ensure that every single iocb can be issued with only the memory
reserves held in kernel space. The main resource prone to congestion is
bio structs.
For I/O contiguous in physical storage, such as bare LUN mappings, a
single bio will hold up to 256 pages. To accommodate block mappings on
file systems, we reserve more than one bio, but cannot submit iocbs of
arbitrary length without risking a stall once the reserve is
exhausted.
Limits the iocb size on ext file systems. Assumes 4k blocks for now.
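With 4 KiB pages, one bio of 256 pages covers 256 × 4096 B = 1,048,576 B (1 MiB) of contiguous I/O. The sketch below splits a request into iocb-sized pieces. The actual per-iocb cap chosen for ext is not stated in the commit, so `SKETCH_IOCB_MAX` (one bio's worth) is a hypothetical placeholder, as are the other names.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumes 4k pages/blocks, as the patch does */
#define SKETCH_BIO_PAGES 256u  /* pages one bio can hold */
/* Hypothetical cap: one bio's worth of data per iocb. */
#define SKETCH_IOCB_MAX  (SKETCH_BIO_PAGES * SKETCH_PAGE_SIZE)

typedef void (*submit_chunk_fn)(uint64_t off, size_t len, void *arg);

/* Split [off, off+len) into pieces of at most 'max' bytes (max > 0); returns the count. */
static int submit_split(uint64_t off, size_t len, size_t max,
			submit_chunk_fn fn, void *arg)
{
	int n = 0;

	while (len > 0) {
		size_t chunk = len < max ? len : max;

		if (fn)
			fn(off, chunk, arg);
		off += chunk;
		len -= chunk;
		n++;
	}
	return n;
}
```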
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 30 Sep 2010 21:01:45 +0000 (14:01 -0700)]
CA-44322: Add an I/O-submit thread.
Slightly annoying to add threads to core blktap code, but necessary to
avoid potential starvation when dom0 gets under memory
pressure. Blktap can guarantee io_submit makes progress by keeping
memory reserves, but not enough to guarantee that it's
non-blocking. To refill the reserves, we want in-flight I/O to keep
completing in the main event loop.
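The hand-off can be sketched with a mutex/condvar queue: the event loop only enqueues, while a dedicated thread performs the potentially blocking submission. The queue layout and names are hypothetical, and a counter stands in for the actual io_submit() call.

```c
#include <pthread.h>

#define SUBMIT_QLEN 64

/* Hypothetical hand-off queue between the event loop and a submit thread. */
struct submit_queue {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int items[SUBMIT_QLEN]; /* stand-ins for iocb pointers */
	int head, tail;         /* tail - head = items pending */
	int stop;               /* set by submit_stop() */
	int submitted;          /* items passed to the stub submit */
};

static void *submit_thread(void *arg)
{
	struct submit_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->head == q->tail && !q->stop)
			pthread_cond_wait(&q->cond, &q->lock);
		if (q->head == q->tail)
			break; /* stopped and drained */

		int item = q->items[q->head++ % SUBMIT_QLEN];
		pthread_mutex_unlock(&q->lock);
		(void)item; /* io_submit() would go here: it may block, but only this thread */
		pthread_mutex_lock(&q->lock);
		q->submitted++;
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Event-loop side: does not wait for the submission itself. */
static int submit_enqueue(struct submit_queue *q, int item)
{
	int err = -1;

	pthread_mutex_lock(&q->lock);
	if (q->tail - q->head < SUBMIT_QLEN) {
		q->items[q->tail++ % SUBMIT_QLEN] = item;
		pthread_cond_signal(&q->cond);
		err = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return err;
}

static void submit_stop(struct submit_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->stop = 1;
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
}
```

The submit thread drops the lock around the submission, so a blocked submit never keeps the event loop from queueing more work or reaping completions.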
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 30 Sep 2010 21:01:40 +0000 (14:01 -0700)]
CA-46079: Recover the image storage type.
Used to be a message parameter passed in at open time, then passed down
through the VBD and images all the way to the driver. Replaced by stat()
and statfs().
vbd->storage isn't really applicable since cross-SR VHD chains
became more popular, so it was removed. We keep driver->storage, but only
for verbosity. Drivers with type-dependent code that call
tapdisk_storage_type() during td_open() are encouraged to store the
result here.
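A Linux-only sketch of recovering a storage type from stat() and statfs(). The enum and `storage_type_sketch` are hypothetical and need not match tapdisk_storage_type(); the magic numbers are the standard ext2/3/4 (0xEF53) and NFS (0x6969) superblock magics.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/vfs.h>

/* Hypothetical classification; the real tapdisk_storage_type() may differ. */
enum storage_sketch {
	STORAGE_SKETCH_UNKNOWN,
	STORAGE_SKETCH_BLOCK, /* raw block device, e.g. an LVM volume */
	STORAGE_SKETCH_EXT,
	STORAGE_SKETCH_NFS,
};

#define SKETCH_EXT_MAGIC 0xEF53 /* ext2/3/4 superblock magic */
#define SKETCH_NFS_MAGIC 0x6969 /* NFS superblock magic */

/* Returns an enum storage_sketch value, or -errno. */
static int storage_type_sketch(const char *path)
{
	struct stat st;
	struct statfs fs;

	if (stat(path, &st) < 0)
		return -errno;
	if (S_ISBLK(st.st_mode))
		return STORAGE_SKETCH_BLOCK;

	if (statfs(path, &fs) < 0)
		return -errno;
	switch (fs.f_type) {
	case SKETCH_EXT_MAGIC:
		return STORAGE_SKETCH_EXT;
	case SKETCH_NFS_MAGIC:
		return STORAGE_SKETCH_NFS;
	default:
		return STORAGE_SKETCH_UNKNOWN;
	}
}
```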
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 30 Sep 2010 21:01:35 +0000 (14:01 -0700)]
CA-46079: Remove the image reopen hack.
Used to reopen the image chain on the first request, thereby detecting
guest activation after migration. Obsolete since tapdisk is
spawned/resumed after VM stop/copy now.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Thu, 9 Sep 2010 09:05:17 +0000 (02:05 -0700)]
CA-44974: Make tap_ctl_close idempotent.
Avoid potential freelist/conn vector corruption due to
double-frees. Upgrade the WARN_ON() to a panic(), since the present drain
loop must not be invoked after disconnect.
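The usual way to make a close idempotent is to invalidate the handle before releasing it, so a second call becomes a no-op instead of a double free. A minimal sketch with a hypothetical connection struct, not the actual tap-ctl types:

```c
#include <unistd.h>

/* Hypothetical connection handle; fd < 0 means already closed. */
struct tap_conn_sketch {
	int fd;
};

/* Safe to call any number of times: only the first call releases the fd. */
static int tap_conn_close(struct tap_conn_sketch *c)
{
	int fd = c->fd;

	if (fd < 0)
		return 0; /* already closed: nothing to free twice */
	c->fd = -1;       /* invalidate before releasing */
	return close(fd);
}
```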
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Daniel Stodden [Tue, 7 Sep 2010 02:42:26 +0000 (19:42 -0700)]
CA-44675: Fix parent cache corruption due to I/O crosstalk.
The previous patch requeued completing ring I/O buffers.
An interesting question is why this succeeded without a proper tapdisk
crash. The only sane explanation I can come up with is that the common
path manages to queue AIOs before our response hits the kernel, so the
unmap happens after GUP page translation. That does not sound too
improbable, since the target leaf VHD bitmap was likely still hot at this
point.
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>