ia64/xen-unstable

changeset 19736:9f4c5734e4aa

blktap2: README updates

As promised, this brings the long out-of-sync documentation up to
date, and adds some getting started information about tapdisk driver
development - I get the occasional email on this latter subject.

Signed-off-by: Dutch Meyer <dmeyer@cs.ubc.ca>
author Keir Fraser <keir.fraser@citrix.com>
date Fri Jun 05 09:31:23 2009 +0100 (2009-06-05)
parents 21d1fcb0be41
children 6eff3fe96aff
files tools/blktap2/README
line diff
     1.1 --- a/tools/blktap2/README	Fri Jun 05 09:30:36 2009 +0100
     1.2 +++ b/tools/blktap2/README	Fri Jun 05 09:31:23 2009 +0100
     1.3 @@ -1,19 +1,21 @@
     1.4 -Blktap Userspace Tools + Library
     1.5 +Blktap2 Userspace Tools + Library
     1.6  ================================
     1.7  
     1.8 +Dutch Meyer
     1.9 +4th June 2009
    1.10 +
    1.11  Andrew Warfield and Julian Chesterfield
    1.12  16th June 2006
    1.13  
    1.14 -{firstname.lastname}@cl.cam.ac.uk
    1.15  
    1.16 -The blktap userspace toolkit provides a user-level disk I/O
    1.17 -interface. The blktap mechanism involves a kernel driver that acts
    1.18 +The blktap2 userspace toolkit provides a user-level disk I/O
    1.19 +interface. The blktap2 mechanism involves a kernel driver that acts
    1.20  similarly to the existing Xen/Linux blkback driver, and a set of
    1.21 -associated user-level libraries.  Using these tools, blktap allows
    1.22 +associated user-level libraries.  Using these tools, blktap2 allows
    1.23  virtual block devices presented to VMs to be implemented in userspace
    1.24  and to be backed by raw partitions, files, network, etc.
    1.25  
    1.26 -The key benefit of blktap is that it makes it easy and fast to write
    1.27 +The key benefit of blktap2 is that it makes it easy and fast to write
    1.28  arbitrary block backends, and that these user-level backends actually
    1.29  perform very well.  Specifically:
    1.30  
    1.31 @@ -38,7 +40,7 @@ perform very well.  Specifically:
    1.32  
    1.33  How it works (in one paragraph):
    1.34  
    1.35 -Working in conjunction with the kernel blktap driver, all disk I/O
    1.36 +Working in conjunction with the kernel blktap2 driver, all disk I/O
    1.37  requests from VMs are passed to the userspace daemon (using a shared
    1.38  memory interface) through a character device. Each active disk is
    1.39  mapped to an individual device node, allowing per-disk processes to
    1.40 @@ -49,74 +51,271 @@ asynchronous request dispatch achieved w
    1.41  code.  We provide a simple, asynchronous virtual disk interface that
    1.42  makes it quite easy to add new disk implementations.
    1.43  
    1.44 -As of June 2006 the current supported disk formats are:
    1.45 +As of June 2009 the current supported disk formats are:
    1.46  
    1.47   - Raw Images (both on partitions and in image files)
    1.48 - - File-backed Qcow disks
    1.49 - - Standalone sparse Qcow disks
    1.50 - - Fast shareable RAM disk between VMs (requires some form of cluster-based 
    1.51 -   filesystem support e.g. OCFS2 in the guest kernel)
    1.52 - - Some VMDK images - your mileage may vary
    1.53 + - Fast sharable RAM disk between VMs (requires some form of 
    1.54 +   cluster-based filesystem support e.g. OCFS2 in the guest kernel)
    1.55 + - VHD, including snapshots and sparse images
    1.56 + - Qcow, including snapshots and sparse images
    1.57  
    1.58 -Raw and QCow images have asynchronous backends and so should perform
    1.59 -fairly well.  VMDK is based directly on the qemu vmdk driver, which is
    1.60 -synchronous (a.k.a. slow).
    1.61  
    1.62  Build and Installation Instructions
    1.63  ===================================
    1.64  
    1.65 -Make to configure the blktap backend driver in your dom0 kernel.  It
    1.66 -will cooperate fine with the existing backend driver, so you can
    1.67 -experiment with tap disks without breaking existing VM configs.
    1.68 +Make sure to configure the blktap2 backend driver in your dom0 kernel.  It
    1.69 +will inter-operate with the existing backend and frontend drivers.  It
    1.70 +will also cohabitate with the original blktap driver.  However, some
    1.71 +formats (currently aio and qcow) will default to their blktap2
    1.72 +versions when specified in a vm configuration file.
    1.73  
    1.74 -To build the tools separately, "make && make install" in 
    1.75 -tools/blktap.
    1.76 +To build the tools separately, "make && make install" in
    1.77 +tools/blktap2.
    1.78  
    1.79  
    1.80  Using the Tools
    1.81  ===============
    1.82  
    1.83 -Prepare the image for booting. For qcow files use the qcow utilities
    1.84 -installed earlier. e.g. qcow-create generates a blank standalone image
    1.85 -or a file-backed CoW image. img2qcow takes an existing image or
    1.86 -partition and creates a sparse, standalone qcow-based file.
    1.87 +Preparing an image for boot:
    1.88  
    1.89  The userspace disk agent is configured to start automatically via xend
    1.90 -(alternatively you can start it manually => 'blktapctrl')
    1.91 -
    1.92 -Customise the VM config file to use the 'tap' handler, followed by the
    1.93 -driver type. e.g. for a raw image such as a file or partition:
    1.94  
    1.95 -disk = ['tap:aio:<FILENAME>,sda1,w']
    1.96 +Customize the VM config file to use the 'tap:tapdisk' handler,
    1.97 +followed by the driver type. e.g. for a raw image such as a file or
    1.98 +partition:
    1.99  
   1.100 -e.g. for a qcow image:
   1.101 +disk = ['tap:tapdisk:aio:<FILENAME>,sda1,w']
   1.102  
   1.103 -disk = ['tap:qcow:<FILENAME>,sda1,w']
   1.104 +Alternatively, the vhd-util tool (installed with make install, or in
   1.105 +/blktap2/vhd) can be used to build sparse copy-on-write vhd images.
   1.106 +
   1.107 +For example, to build a sparse image -
   1.108 +  vhd-util create -n MyVHDFile -s 1024
   1.109 +
   1.110 +This creates a sparse 1GB file named "MyVHDFile" that can be mounted
   1.111 +and populated with data.
   1.112 +
   1.113 +One can also base the image on a raw file -
   1.114 +  vhd-util snapshot -n MyVHDFile -p SomeRawFile -m
   1.115 +
   1.116 +This creates a sparse VHD file named "MyVHDFile" using "SomeRawFile"
   1.117 +as a parent image.  Copy-on-write semantics ensure that writes will be
   1.118 +stored in "MyVHDFile" while reads will be directed to the most
   1.119 +recently written version of the data, either in "MyVHDFile" or
   1.120 +"SomeRawFile" as is appropriate.  Other options exist as well, consult
   1.121 +the vhd-util application for the complete set of VHD tools.
   1.122 +
   1.123 +VHD files can be mounted automatically in a guest similarly to the
   1.124 +above AIO example simply by specifying the vhd driver.
   1.125 +
   1.126 +disk = ['tap:tapdisk:vhd:<VHD FILENAME>,sda1,w']
   1.127  
   1.128  
   1.129 -Mounting images in Dom0 using the blktap driver
   1.130 +Snapshots:
   1.131 +
   1.132 +Pausing a guest will also plug the corresponding IO queue for blktap2
   1.133 +devices and stop blktap2 drivers.  This can be used to implement a
   1.134 +safe live snapshot of qcow and vhd disks.  An example script "xmsnap"
   1.135 +is shown in the tools/blktap2/drivers directory.  This script will
   1.136 +perform a live snapshot of a qcow disk.  VHD files can use the
   1.137 +"vhd-util snapshot" tool discussed above.  If this snapshot command is
   1.138 +applied to a raw file mounted with tap:tapdisk:AIO, include the -m
   1.139 +flag and the driver will be reloaded as VHD.  If applied to an already
   1.140 +mounted VHD file, omit the -m flag.
   1.141 +
   1.142 +
   1.143 +Mounting images in Dom0 using the blktap2 driver
   1.144  ===============================================
   1.145  Tap (and blkback) disks are also mountable in Dom0 without requiring an
   1.146 -active VM to attach. You will need to build a xenlinux Dom0 kernel that
   1.147 -includes the blkfront driver (e.g. the default 'make world' or 
   1.148 -'make kernels' build. Simply use the xm command-line tool to activate
   1.149 -the backend disks, and blkfront will generate a virtual block device that
   1.150 -can be accessed in the same way as a loop device or partition:
   1.151 +active VM to attach. 
   1.152  
   1.153 -e.g. for a raw image file <FILENAME> that would normally be mounted using
   1.154 -the loopback driver (such as 'mount -o loop <FILENAME> /mnt/disk'), do the
   1.155 -following:
   1.156 +The syntax is -
   1.157 +  tapdisk2 -n <type>:<full path to file>
   1.158  
   1.159 -xm block-attach 0 tap:aio:<FILENAME> /dev/xvda1 w 0
   1.160 -mount /dev/xvda1 /mnt/disk        <--- don't use loop driver
   1.161 +For example -
   1.162 +  tapdisk2  -n aio:/home/images/rawFile.img
   1.163  
   1.164 -In this way, you can use any of the userspace device-type drivers built
   1.165 -with the blktap userspace toolkit to open and mount disks such as qcow
   1.166 -or vmdk images:
   1.167 +When successful the location of the new device will be provided by
   1.168 +tapdisk2 to stdout and tapdisk2 will terminate.  From that point
   1.169 +forward control of the device is provided through sysfs in the
   1.170 +directory-
   1.171  
   1.172 -xm block-attach 0 tap:qcow:<FILENAME> /dev/xvda1 w 0
   1.173 -mount /dev/xvda1 /mnt/disk
   1.174 +  /sys/class/blktap2/blktap#/
   1.175 +
   1.176 +Where # is a blktap2 device number present in the path that tapdisk2
   1.177 +printed before terminating.  The sysfs interface is largely intuitive,
   1.178 +for example, to remove tap device 0 one would-
   1.179 +  
   1.180 +  echo 1 > /sys/class/blktap2/blktap0/remove
   1.181 +
   1.182 +Similarly, a pause control is available, which can be used to plug
   1.183 +the request queue of a live running guest.
   1.184 +
   1.185 +Previous versions of blktap mounted devices in dom0 by using blkfront
   1.186 +in dom0 and the xm block-attach command.  This approach is still
   1.187 +available, though slightly more cumbersome.
   1.188 +
   1.189 +
   1.190 +Tapdisk Development
   1.191 +===============================================
   1.192 +
   1.193 +People regularly ask how to develop their own tapdisk drivers, and
   1.194 +while it has not yet been well documented, the process is relatively
   1.195 +easy.  Here I will provide a brief overview.  The best reference, of
   1.196 +course, comes from the existing drivers.  Specifically,
   1.197 +blktap2/drivers/block-ram.c and blktap2/drivers/block-aio.c provide
   1.198 +the clearest examples of simple drivers.
   1.199 + 
   1.200 +
   1.201 +Setup:
   1.202 +
   1.203 +First you need to register your new driver with blktap. This is done
   1.204 +in disktypes.h.  There are five things that you must do.  To
   1.205 +demonstrate, I will create a disk called "mynewdisk", you can name
   1.206 +yours freely.
   1.207 +
   1.208 +1) Forward declare an instance of struct tap_disk.
   1.209 +
   1.210 +e.g. -  
   1.211 +  extern struct tap_disk tapdisk_mynewdisk;
   1.212 +
   1.213 +2) Claim one of the unused disk type numbers, take care to observe the
   1.214 +MAX_DISK_TYPES macro, increasing the number if necessary.
   1.215 +
   1.216 +e.g. -
   1.217 +  #define DISK_TYPE_MYNEWDISK         10
   1.218 +
   1.219 +3) Create an instance of disk_info_t.  The bulk of this file contains examples of these.
   1.220 +
   1.221 +e.g. -
   1.222 +  static disk_info_t mynewdisk_disk = {
   1.223 +          DISK_TYPE_MYNEWDISK,
   1.224 +          "My New Disk (mynewdisk)",
   1.225 +          "mynewdisk",
   1.226 +          0,
   1.227 +  #ifdef TAPDISK
   1.228 +          &tapdisk_mynewdisk,
   1.229 +  #endif
   1.230 +  };
   1.231 +
   1.232 +A few words about what these mean.  The first field must be the disk
   1.233 +type number you claimed in step (2).  The second field is a string
   1.234 +describing your disk, and may contain any relevant info.  The third
   1.235 +field is the name of your disk as will be used by the tapdisk2 utility
   1.236 +and xend (for example tapdisk2 -n mynewdisk:/path/to/disk.image, or in
   1.237 +your xm create config file).  The fourth is binary and determines
   1.238 +whether you will have one instance of your driver, or many.  Here, a 1
   1.239 +means that your driver is a singleton and will coordinate access to
   1.240 +any number of tap devices.  0 is more common, meaning that you will
   1.241 +have one driver for each device that is created.  The final field
   1.242 +should contain a reference to the struct tap_disk you created in step
   1.243 +(1).
   1.244 +
   1.245 +4) Add a reference to your disk info structure (from step (3)) to the
   1.246 +dtypes array.  Take care here - you need to place it in the position
   1.247 +corresponding to the device type number you claimed in step (2).  So
   1.248 +we would place &mynewdisk_disk in dtypes[10].  Look at the other
   1.249 +devices in this array and pad with "&null_disk," as necessary.
   1.250 +
   1.251 +5) Modify the xend python scripts.  You need to add your disk name to
   1.252 +the list of disks that xend recognizes.
   1.253 +
   1.254 +edit:
   1.255 +  tools/python/xen/xend/server/BlktapController.py
   1.256 +
   1.257 +And add your disk to the "blktap_disk_types" array near the top of
   1.258 +your file.  Use the same name you specified in the third field of step
   1.259 +(3).  The order of this list is not important.
   1.260 +
   1.261 +
   1.262 +Now your driver is ready to be written.  Create a block-mynewdisk.c in
   1.263 +tools/blktap2/drivers and add it to the Makefile.
   1.264 +
   1.265 +
   1.266 +Development:
   1.267 +
   1.268 +Copying block-aio.c and block-ram.c would be a good place to start.
   1.269 +Read those files as you go through this, I will be assisting by
   1.270 +commenting on a few useful functions and structures.
   1.271 +
   1.272 +struct tap_disk:
   1.273 +
   1.274 +Remember the forward declaration in step (1) of the setup phase above?
   1.275 +Now is the time to make that structure a reality.  This structure
   1.276 +contains a list of function pointers for all the routines that will be
   1.277 +asked of your driver.  Currently the required functions are open,
   1.278 +close, read, write, get_parent_id, validate_parent, and debug.
   1.279 +
   1.280 +e.g. -
   1.281 +  struct tap_disk tapdisk_mynewdisk = {
   1.282 +          .disk_type          = "tapdisk_mynewdisk",
   1.283 +          .flags              = 0,
   1.284 +          .private_data_size  = sizeof(struct tdmynewdisk_state),
   1.285 +          .td_open            = tdmynewdisk_open,
   1.286 +                 ....
   1.287 +
   1.288 +The private_data_size field is used to provide a structure to store
   1.289 +the state of your device.  It is very likely that you will want
   1.290 +something here, but you are free to design whatever structure you
   1.291 +want.  Blktap will allocate this space for you, you just need to tell
   1.292 +it how much space you want.
   1.293 +
   1.294 +
   1.295 +tdmynewdisk_open:
   1.296 +
   1.297 +This is the open routine.  The first argument is a structure
   1.298 +representing your driver.  Two fields in this structure are
   1.299 +interesting. 
   1.300 +
   1.301 +driver->data will contain a block of memory of the size you requested
   1.302 +in the .private_data_size field of your struct tap_disk (above).
   1.303 +
   1.304 +driver->info contains a structure that details information about your
   1.305 +disk.  You need to fill this out.  By convention this is done with a
   1.306 +_get_image_info() function.  Assign a size (the total number of
   1.307 +sectors), sector_size (the size of each sector in bytes), and set
   1.308 +driver->info->info to 0.
   1.309 +
   1.310 +The second parameter contains the name that was specified in the
   1.311 +creation of your device, either through xend, or on the command line
   1.312 +with tapdisk2.  Usually this specifies a file that you will open in
   1.313 +this routine.  The final parameter, flags, contains one of a number of
   1.314 +flags specified in tapdisk.h that may change the way you treat the
   1.315 +disk.
   1.316 +
   1.317 +
   1.318 +_queue_read/write:
   1.319 +
   1.320 +These are your read and write operations.  What you do here will
   1.321 +depend on your disk, but you should do exactly one of- 
   1.322 +
   1.323 +1) Call td_complete_request with either an error or a success code.
   1.324 +
   1.325 +2) Call td_forward_request, which will forward the request to the next
   1.326 +driver in the stack.
   1.327 +
   1.328 +3) Queue the request for asynchronous processing with
   1.329 +td_prep_read/write.  In doing so, you will also register a callback
   1.330 +for request completion.  When the request completes you must do one of
   1.331 +options (1) or (2) above.  Finally, call td_queue_tiocb to submit the
   1.332 +request to a wait queue.
   1.333 +
   1.334 +The above functions are defined in tapdisk-interface.c.  If you don't
   1.335 +use them as specified you will run into problems as your driver will
   1.336 +fail to inform blktap of the state of requests that have been
   1.337 +submitted.  Blktap keeps track of all requests and does not like losing track.
   1.338 +
   1.339 +
   1.340 +_close, _get_parent_id, _validate_parent:
   1.341 +
   1.342 +These last few tend to be very routine.  _close is called when the
   1.343 +device is closed, and also when it is paused (in this case, open will
   1.344 +also be called later).  The other functions are used in stacking
   1.345 +drivers.  Most often drivers will return TD_NO_PARENT and -EINVAL,
   1.346 +respectively.
   1.347  
   1.348  
   1.349  
   1.350 - 
   1.351 +
   1.352 +
   1.353 +
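
The README's "_queue_read/write" section says that every request handed to a
driver must be resolved exactly once. The following is a minimal,
self-contained sketch of that rule for a RAM-backed disk, loosely in the
spirit of block-ram.c. It does not use the real blktap2 headers: every
mock_* name, the request layout, and the fixed disk geometry are
hypothetical stand-ins chosen for illustration, with mock_complete_request
playing the role that td_complete_request has in a real driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MOCK_SECTOR_SIZE 512u   /* bytes per sector (assumed geometry) */
#define MOCK_SECTORS     8u     /* total sectors on the mock disk */

/* Hypothetical stand-in for a blktap2 request; not the real td_request_t. */
struct mock_request {
        uint64_t  sector;       /* first sector touched by the request */
        int       nr_sectors;   /* number of sectors to transfer */
        char     *buf;          /* caller-supplied data buffer */
        int       completions;  /* times the request has been completed */
        int       status;       /* 0 on success, negative errno on error */
};

/* Hypothetical private state, playing the role of driver->data. */
struct mock_ram_state {
        char image[MOCK_SECTORS * MOCK_SECTOR_SIZE];
};

/* Stand-in for td_complete_request: a request may be completed only once. */
static void mock_complete_request(struct mock_request *req, int status)
{
        assert(req->completions == 0);
        req->completions++;
        req->status = status;
}

/* Returns nonzero if the request falls outside the mock disk. */
static int mock_out_of_range(const struct mock_request *req)
{
        return req->nr_sectors <= 0 ||
               req->sector >= MOCK_SECTORS ||
               (uint64_t)req->nr_sectors > MOCK_SECTORS - req->sector;
}

/* Read path: every exit completes the request, with success or an error. */
static void mock_queue_read(struct mock_ram_state *s, struct mock_request *req)
{
        if (mock_out_of_range(req)) {
                mock_complete_request(req, -EINVAL);
                return;
        }
        memcpy(req->buf, s->image + req->sector * MOCK_SECTOR_SIZE,
               (size_t)req->nr_sectors * MOCK_SECTOR_SIZE);
        mock_complete_request(req, 0);
}

/* Write path: mirrors the read path, copying into the RAM image. */
static void mock_queue_write(struct mock_ram_state *s, struct mock_request *req)
{
        if (mock_out_of_range(req)) {
                mock_complete_request(req, -EINVAL);
                return;
        }
        memcpy(s->image + req->sector * MOCK_SECTOR_SIZE, req->buf,
               (size_t)req->nr_sectors * MOCK_SECTOR_SIZE);
        mock_complete_request(req, 0);
}
```

A caller would zero a struct mock_ram_state, fill in a struct mock_request
with completions set to 0, and pass both to mock_queue_write or
mock_queue_read; afterwards completions is 1 and status holds the result.
A real driver would instead go through the functions the README names
(td_complete_request, td_forward_request, or the asynchronous
td_prep_read/write path), whose signatures are best taken from
tapdisk-interface.c and the existing drivers.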