ia64/xen-unstable

changeset 16534:671ef298d491

xenstore: document the xenstore protocol

The attached patch adds a new text file docs/misc/xenstore.txt which
describes the actual protocol implemented by xenstored. This was
reverse-engineered from the actual code in tools/xenstore.

I didn't bother making any automatic arrangements to ensure that the
implemented and documented protocols are kept in step (for example,
automatic code generation, etc.) The protocol is rather messy
unfortunately and unsuitable for an xdr approach, and in any case is
not likely to change very quickly.

Also in this patch are a couple of comments for xenstored_core.c which
help clarify the behaviour of some payload parsing helper functions.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Wed Dec 05 11:08:07 2007 +0000 (2007-12-05)
parents b6fb8b4dc261
children c67d024fdd2d
files docs/misc/xenstore.txt tools/xenstore/xenstored_core.c
line diff
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/docs/misc/xenstore.txt	Wed Dec 05 11:08:07 2007 +0000
     1.3 @@ -0,0 +1,287 @@
     1.4 +Xenstore protocol specification
     1.5 +-------------------------------
     1.6 +
     1.7 +Xenstore implements a database which maps filename-like pathnames
     1.8 +(also known as `keys') to values.  Clients may read and write values,
     1.9 +watch for changes, and set permissions to allow or deny access.  There
    1.10 +is a rudimentary transaction system.
    1.11 +
    1.12 +While xenstore and most tools and APIs are capable of dealing with
    1.13 +arbitrary binary data as values, this should generally be avoided.
    1.14 +Data should generally be human-readable for ease of management and
    1.15 +debugging; xenstore is not a high-performance facility and should be
    1.16 +used only for small amounts of control plane data.  Therefore xenstore
    1.17 +values should normally be 7-bit ASCII text strings containing bytes
    1.18 +0x20..0x7f only, and should not contain a trailing nul byte.  (The
    1.19 +APIs used for accessing xenstore generally add a nul when reading, for
    1.20 +the caller's convenience.)
    1.21 +
    1.22 +A separate specification will detail the keys and values which are
    1.23 +used in the Xen system and what their meanings are.  (Sadly that
    1.24 +specification currently exists only in multiple out-of-date versions.)
    1.25 +
    1.26 +
    1.27 +Paths are /-separated and start with a /, just as Unix filenames.
    1.28 +
    1.29 +We can speak of two paths being <child> and <parent>, which is the
    1.30 +case if they're identical, or if <parent> is /, or if <parent>/ is an
    1.31 +initial substring of <child>.  (This includes <path> being a child of
    1.32 +itself.)
    1.33 +
    1.34 +If a particular path exists, all of its parents do too.  Every
    1.35 +existing path maps to a possibly empty value, and may also have zero
    1.36 +or more immediate children.  There is thus no particular distinction
    1.37 +between directories and leaf nodes.  However, it is conventional not
    1.38 +to store nonempty values at nodes which also have children.
    1.39 +
    1.40 +The permitted character set for paths is ASCII alphanumerics plus
    1.41 +the four punctuation characters -/_@ (hyphen slash underscore atsign).
    1.42 +@ should be avoided except to specify special watches (see below).
    1.43 +Doubled slashes and trailing slashes (except to specify the root) are
    1.44 +forbidden.  The empty path is also forbidden.
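
The path rules above can be sketched as a syntax check plus the child/parent test.  This is an illustration in Python, not xenstored's own code: the names is_valid_path and is_child are hypothetical, only absolute paths are handled, and any length limits the real daemon may apply are ignored.

```python
import re

# Permitted characters per the text above: ASCII alphanumerics plus - / _ @
_PATH_CHARS = re.compile(r"[A-Za-z0-9/_@-]+")

def is_valid_path(path: str) -> bool:
    """Check an absolute path against the syntax rules described above."""
    if not path.startswith("/"):          # also rejects the empty path
        return False
    if _PATH_CHARS.fullmatch(path) is None:
        return False
    if "//" in path:                      # doubled slashes are forbidden
        return False
    if path != "/" and path.endswith("/"):  # trailing slash only for the root
        return False
    return True

def is_child(child: str, parent: str) -> bool:
    """True if <child> is a child of <parent>; a path is a child of itself."""
    return child == parent or parent == "/" or child.startswith(parent + "/")
```

Note that is_child compares against <parent> followed by a slash, so /ab is not treated as a child of /a.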
    1.45 +
    1.46 +
    1.47 +Communication with xenstore is via either sockets, or event channel
    1.48 +and shared memory, as specified in io/xs_wire.h: each message in
    1.49 +either direction is a header formatted as a struct xsd_sockmsg
    1.50 +followed by xsd_sockmsg.len bytes of payload.
    1.51 +
    1.52 +The payload syntax varies according to the type field.  Generally
    1.53 +requests each generate a reply with an identical type, req_id and
    1.54 +tx_id.  However, if an error occurs, a reply will be returned with
    1.55 +type ERROR, and only req_id and tx_id copied from the request.
    1.56 +
    1.57 +A caller who sends several requests may receive the replies in any
    1.58 +order and must use req_id (and tx_id, if applicable) to match up
    1.59 +replies to requests.  (The current implementation always replies to
    1.60 +requests in the order received but this should not be relied on.)
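
A minimal sketch of the framing just described, in Python.  The text names the header fields type, req_id, tx_id and len; the layout assumed here (four 32-bit unsigned integers in that order, host byte order) and the helper names are assumptions that should be checked against io/xs_wire.h.  The type value 99 in the usage note is an arbitrary placeholder, not a real message type.

```python
import struct

# ASSUMPTION: struct xsd_sockmsg is type, req_id, tx_id, len, each a
# 32-bit unsigned integer in host byte order.  Verify against io/xs_wire.h.
HEADER = struct.Struct("=IIII")

def pack_message(msg_type: int, req_id: int, tx_id: int, payload: bytes) -> bytes:
    """Build one message: the header followed by len bytes of payload."""
    return HEADER.pack(msg_type, req_id, tx_id, len(payload)) + payload

def unpack_message(data: bytes):
    """Split one message off the front of data.

    Returns ((type, req_id, tx_id, payload), remaining_bytes), or None if
    data does not yet hold a complete message.
    """
    if len(data) < HEADER.size:
        return None
    msg_type, req_id, tx_id, length = HEADER.unpack_from(data)
    end = HEADER.size + length
    if len(data) < end:
        return None
    return (msg_type, req_id, tx_id, data[HEADER.size:end]), data[end:]
```

For example, pack_message(99, 7, 0, b"/a/b\x00") yields a 16-byte header followed by the 5-byte payload.  A client would keep a table of outstanding req_id values and look each reply up in it, since replies may arrive in any order.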
    1.61 +
    1.62 +
    1.63 +---------- Xenstore protocol details - introduction ----------
    1.64 +
    1.65 +The payload syntax and semantics of the requests and replies are
    1.66 +described below.  In the payload syntax specifications we use the
    1.67 +following notations:
    1.68 +
    1.69 + |		A nul (zero) byte.
    1.70 + <foo>		A string guaranteed not to contain any nul bytes.
    1.71 + <foo|>		Binary data (which may contain zero or more nul bytes)
    1.72 + <foo>|*	Zero or more strings each followed by a trailing nul
    1.73 + <foo>|+	One or more strings each followed by a trailing nul
    1.74 + ?		Reserved value (may not contain nuls)
    1.75 + ??		Reserved value (may contain nuls)
    1.76 +
    1.77 +Except as otherwise noted, reserved values are believed to be sent as
    1.78 +empty strings by all current clients.  Clients should not send
    1.79 +nonempty strings for reserved values; those parts of the protocol may
    1.80 +be used for extension in the future.
    1.81 +
    1.82 +
    1.83 +Error replies are as follows:
    1.84 +
    1.85 +ERROR						E<something>|
    1.86 +	Where E<something> is the name of an errno value
    1.87 +	listed in io/xs_wire.h.  Note that the string name
    1.88 +	is transmitted, not a numeric value.
    1.89 +
    1.90 +
    1.91 +Where no reply payload format is specified below, success responses
    1.92 +have the following payload:
    1.93 +						OK|
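
As an illustration of the notation, the following Python sketch decodes a `<foo>|*' payload and turns an ERROR reply into an exception.  The function and class names are hypothetical, and because the numeric value of the ERROR type is not given in this text, the caller passes a flag saying whether the reply's type was ERROR.

```python
def split_strings(payload: bytes) -> list:
    """Decode a <foo>|* payload: zero or more nul-terminated strings."""
    if payload == b"":
        return []
    if not payload.endswith(b"\x00"):
        raise ValueError("payload is not nul-terminated")
    return payload[:-1].split(b"\x00")

class XenstoreError(Exception):
    """Carries the errno name (for example "EAGAIN") from an ERROR reply."""

def check_reply(is_error: bool, payload: bytes) -> bytes:
    """Raise if the reply is an ERROR; otherwise return the payload as is."""
    if is_error:
        (name,) = split_strings(payload)  # ERROR payload is E<something>|
        raise XenstoreError(name.decode("ascii"))
    return payload
```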
    1.94 +
    1.95 +Values commonly included in payloads include:
    1.96 +
    1.97 +    <path>
    1.98 +	Specifies a path in the hierarchical key structure.
    1.99 +	If <path> starts with a / it simply represents that path.
   1.100 +
   1.101 +	<path> is allowed not to start with /, in which case the
   1.102 +	caller must be a domain (rather than connected via a socket)
   1.103 +	and the path is taken to be relative to /local/domain/<domid>
   1.104 +	(eg, `x/y' sent by domain 3 would mean `/local/domain/3/x/y').
   1.105 +
   1.106 +    <domid>
   1.107 +	Integer domid, represented as decimal number 0..65535.
   1.108 +	Parsing errors and values out of range generally go
   1.109 +	undetected.  The special DOMID_... values (see xen.h) are
   1.110 +	represented as integers; unless otherwise specified it
   1.111 +	is an error not to specify a real domain id.
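
The relative-path rule can be sketched as follows.  This is a Python illustration only; resolve_path is a hypothetical helper, and the special @<wspecial> watch paths described later are not handled.

```python
def resolve_path(path: str, domid=None) -> str:
    """Turn a <path> from a request into an absolute path.

    domid is the id of the calling domain, or None for a socket client.
    """
    if path.startswith("/"):
        return path
    if domid is None:
        raise ValueError("relative paths are only allowed for domain callers")
    return "/local/domain/%d/%s" % (domid, path)
```

With this helper, resolve_path("x/y", 3) gives "/local/domain/3/x/y", matching the example in the text.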
   1.112 +
   1.113 +
   1.114 +
   1.115 +The following are the actual type values, including the request and
   1.116 +reply payloads as applicable:
   1.117 +
   1.118 +
   1.119 +---------- Database read, write and permissions operations ----------
   1.120 +
   1.121 +READ			<path>|			<value|>
   1.122 +WRITE			<path>|<value|>
   1.123 +	Store and read the octet string <value> at <path>.
   1.124 +	WRITE creates any missing parent paths, with empty values.
   1.125 +
   1.126 +MKDIR			<path>|
   1.127 +	Ensures that the <path> exists, if necessary by creating
   1.128 +	it and any missing parents with empty values.  If <path>
   1.129 +	or any parent already exists, its value is left unchanged.
   1.130 +
   1.131 +RM			<path>|
   1.132 +	Ensures that the <path> does not exist, by deleting
   1.133 +	it and all of its children.  It is not an error if <path> does
   1.134 +	not exist, but it _is_ an error if <path>'s immediate parent
   1.135 +	does not exist either.
   1.136 +
   1.137 +DIRECTORY		<path>|			<child-leaf-name>|*
   1.138 +	Gives a list of the immediate children of <path>, as only the
   1.139 +	leafnames.  The resulting children are each named
   1.140 +	<path>/<child-leaf-name>.
   1.141 +
   1.142 +GET_PERMS	 	<path>|			<perm-as-string>|+
   1.143 +SET_PERMS		<path>|<perm-as-string>|+?
   1.144 +	<perm-as-string> is one of the following
   1.145 +		w<domid>	write only
   1.146 +		r<domid>	read only
   1.147 +		b<domid>	both read and write
   1.148 +		n<domid>	no access
   1.149 +	See http://wiki.xensource.com/xenwiki/XenBus section
   1.150 +	`Permissions' for details of the permissions system.
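
A small Python sketch of the <perm-as-string> syntax and the SET_PERMS request payload.  The helper names are hypothetical, and the validation is deliberately stricter than xenstored, which (as noted above) generally lets domid parsing errors go undetected.  The trailing reserved value is sent as an empty string.

```python
MODES = {
    "w": "write only",
    "r": "read only",
    "b": "both read and write",
    "n": "no access",
}

def parse_perm(perm: str):
    """Split a <perm-as-string> such as "r5" into (mode letter, domid)."""
    letter, digits = perm[:1], perm[1:]
    if letter not in MODES:
        raise ValueError("unknown permission letter: %r" % letter)
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("domid must be a decimal number: %r" % digits)
    domid = int(digits)
    if domid > 65535:
        raise ValueError("domid out of range: %d" % domid)
    return letter, domid

def encode_set_perms(path: str, perms: list) -> bytes:
    """Build a SET_PERMS payload: <path>|<perm-as-string>|+ (reserved part empty)."""
    if not perms:
        raise ValueError("at least one permission string is required")
    for perm in perms:
        parse_perm(perm)  # validate before sending
    return b"".join(s.encode("ascii") + b"\x00" for s in [path, *perms])
```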
   1.151 +
   1.152 +---------- Watches ----------
   1.153 +
   1.154 +WATCH			<wpath>|<token>|?
   1.155 +	Adds a watch.
   1.156 +
   1.157 +	When a <path> is modified (including path creation, removal,
   1.158 +	contents change or permissions change) this generates an event
   1.159 +	on the changed <path>.  Changes made in transactions cause an
   1.160 +	event only if and when committed.  Each occurring event is
   1.161 +	matched against all the watches currently set up, and each
   1.162 +	matching watch results in a WATCH_EVENT message (see below).
   1.163 +
   1.164 +	The event's path matches the watch's <wpath> if it is a child
   1.165 +	of <wpath>.
   1.166 +
   1.167 +	<wpath> can be a <path> to watch or @<wspecial>.  In the
   1.168 +	latter case <wspecial> may have any syntax but it matches
   1.169 +	(according to the rules above) only the following special
   1.170 +	events which are invented by xenstored:
   1.171 +	    @introduceDomain	occurs on INTRODUCE
   1.172 +	    @releaseDomain 	occurs on any domain crash or
   1.173 +				shutdown, and also on RELEASE
   1.174 +				and domain destruction
   1.175 +
   1.176 +	When a watch is first set up it is triggered once straight
   1.177 +	away, with <path> equal to <wpath>.  Watches may be triggered
   1.178 +	spuriously.  The tx_id in a WATCH request is ignored.
   1.179 +
   1.180 +WATCH_EVENT					<epath>|<token>|
   1.181 +	Unsolicited `reply' generated for matching modification events
   1.182 +	as described above.  req_id and tx_id are both 0.
   1.183 +
   1.184 +	<epath> is the event's path, ie the actual path that was
   1.185 +	modified; however if the event was the recursive removal of a
   1.186 +	parent of <wpath>, <epath> is just
   1.187 +	<wpath> (rather than the actual path which was removed).  So
   1.188 +	<epath> is a child of <wpath>, regardless.
   1.189 +
   1.190 +	Iff <wpath> for the watch was specified as a relative pathname,
   1.191 +	the <epath> path will also be relative (with the same base,
   1.192 +	obviously).
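
The matching rule and the choice of <epath> can be sketched in Python as below.  This is an illustrative model under stated assumptions, not the daemon's code: watch_event_path is a hypothetical name, paths are taken as already absolute (or as @<wspecial> names), and the `removed' flag stands for a recursive removal event.

```python
def is_child(child: str, parent: str) -> bool:
    """True if <child> is a child of <parent>; a path is a child of itself."""
    return child == parent or parent == "/" or child.startswith(parent + "/")

def watch_event_path(wpath: str, changed: str, removed: bool = False):
    """Return the <epath> to report for a watch on wpath, or None if no match.

    changed is the path the event occurred on.
    """
    if is_child(changed, wpath):
        return changed
    # Recursive removal of a parent of wpath also removes wpath itself,
    # and is reported as wpath rather than the path actually removed.
    if removed and is_child(wpath, changed):
        return wpath
    return None
```

So a watch on /a/b sees a write to /a/b/c as /a/b/c, and sees `RM /a' as an event on /a/b.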
   1.193 +
   1.194 +UNWATCH			<wpath>|<token>|?
   1.195 +
   1.196 +---------- Transactions ----------
   1.197 +
   1.198 +TRANSACTION_START	??			<transid>|
   1.199 +	<transid> is an opaque uint32_t allocated by xenstored
   1.200 +	represented as unsigned decimal.  After this, the transaction may
   1.201 +	be referenced by using <transid> (as 32-bit binary) in the
   1.202 +	tx_id request header field.  When a transaction is started the whole
   1.203 +	db is copied; reads and writes happen on the copy.
   1.204 +	It is not legal to send non-0 tx_id in TRANSACTION_START.
   1.205 +	Currently xenstored has the bug that after 2^32 transactions
   1.206 +	it will allocate the transid 0 for an actual transaction.
   1.207 +
   1.208 +	Clients using the provided xs.c bindings will send a single
   1.209 +	nul byte for the argument payload.  We recommend that future
   1.210 +	clients continue to do the same; any future extension will not
   1.211 +	use that syntax.
   1.212 +
   1.213 +TRANSACTION_END		T|
   1.214 +TRANSACTION_END		F|
   1.215 +	tx_id must refer to an existing transaction.  After this
   1.216 +	request the tx_id is no longer valid and may be reused by
   1.217 +	xenstore.  If F, the transaction is discarded.  If T,
   1.218 +	it is committed: if there were any other intervening writes
   1.219 +	then our END gets EAGAIN.
   1.220 +
   1.221 +	The plan is that in the future only intervening `conflicting'
   1.222 +	writes cause EAGAIN, meaning only writes or other commits
   1.223 +	which changed paths which were read or written in the
   1.224 +	transaction at hand.
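
Because a commit can fail with EAGAIN, callers normally wrap a transaction in a retry loop.  The Python sketch below shows one way to do that.  The `client' object and its transaction_start / transaction_end methods are hypothetical stand-ins for whatever binding is in use; transaction_end is assumed to return False when the reply was ERROR EAGAIN.

```python
def run_transaction(client, body, max_attempts: int = 10):
    """Run body(tx_id) inside a transaction, retrying the whole body on EAGAIN.

    client is a HYPOTHETICAL object providing:
      transaction_start() -> tx_id
      transaction_end(tx_id, commit) -> True on success, False on EAGAIN
    """
    for _ in range(max_attempts):
        tx_id = client.transaction_start()
        try:
            result = body(tx_id)
        except Exception:
            # Discard the transaction (TRANSACTION_END F|) and re-raise.
            client.transaction_end(tx_id, commit=False)
            raise
        if client.transaction_end(tx_id, commit=True):
            return result
        # EAGAIN: something else was written meanwhile; start again.
    raise RuntimeError("transaction still failing with EAGAIN")
```

The body must be safe to run more than once, since every retry repeats all of its reads and writes against a fresh copy of the database.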
   1.225 +
   1.226 +---------- Domain management and xenstored communications ----------
   1.227 +
   1.228 +INTRODUCE		<domid>|<mfn>|<evtchn>|?
   1.229 +	Notifies xenstored to communicate with this domain.
   1.230 +
   1.231 +	INTRODUCE is currently only used by xend (during domain
   1.232 +	startup and various forms of restore and resume), and
   1.233 +	xenstored prevents its use other than by dom0.
   1.234 +
   1.235 +	<domid> must be a real domain id (not 0 and not a special
   1.236 +	DOMID_... value).  <mfn> must be a machine page in that domain
   1.237 +	represented in signed decimal (!).  <evtchn> must be an
   1.238 +	unbound event channel in <domid> (likewise in
   1.239 +	decimal), on which xenstored will call bind_interdomain.
   1.240 +	Violations of these rules may result in undefined behaviour;
   1.241 +	for example passing a high-bit-set 32-bit mfn as an unsigned
   1.242 +	decimal will attempt to use 0x7fffffff instead (!).
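
To make the signed-decimal quirk concrete, here is a Python sketch that formats a 32-bit mfn the way the text says INTRODUCE expects.  It assumes the 32-bit case used in the example above; the helper names are hypothetical, and the reserved trailing value is sent empty.

```python
def mfn_to_signed_decimal(mfn: int) -> str:
    """Format a 32-bit mfn as signed decimal (high bit set gives a negative number)."""
    if not 0 <= mfn <= 0xFFFFFFFF:
        raise ValueError("mfn does not fit in 32 bits")
    if mfn >= 0x80000000:
        mfn -= 1 << 32  # reinterpret as a signed 32-bit value
    return str(mfn)

def encode_introduce(domid: int, mfn: int, evtchn: int) -> bytes:
    """Build an INTRODUCE payload: <domid>|<mfn>|<evtchn>| (reserved part empty)."""
    fields = [str(domid), mfn_to_signed_decimal(mfn), str(evtchn)]
    return b"".join(f.encode("ascii") + b"\x00" for f in fields)
```

An mfn of 0x80000000 is therefore sent as "-2147483648" rather than "2147483648".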
   1.243 +
   1.244 +RELEASE			<domid>|
   1.245 +	Manually requests that xenstored disconnect from the domain.
   1.246 +	The event channel is unbound at the xenstored end and the page
   1.247 +	unmapped.  If the domain is still running it won't be able to
   1.248 +	communicate with xenstored.  NB that xenstored will in any
   1.249 +	case detect domain destruction and disconnect by itself.
   1.250 +	xenstored prevents the use of RELEASE other than by dom0.
   1.251 +
   1.252 +GET_DOMAIN_PATH		<domid>|		<path>|
   1.253 +	Returns the domain's base path, as is used for relative
   1.254 +	transactions: ie, /local/domain/<domid> (with <domid>
   1.255 +	normalised).  The answer will be useless unless <domid> is a
   1.256 +	real domain id.
   1.257 +
   1.258 +IS_DOMAIN_INTRODUCED	<domid>|		T| or F|
   1.259 +	Returns T if xenstored is in communication with the domain:
   1.260 +	ie, if INTRODUCE for the domain has not yet been followed by
   1.261 +	domain destruction or explicit RELEASE.
   1.262 +
   1.263 +RESUME			<domid>|
   1.264 +
   1.265 +	Arranges that @releaseDomain events will once more be
   1.266 +	generated when the domain becomes shut down.  This might have
   1.267 +	to be used if a domain were to be shut down (generating one
   1.268 +	@releaseDomain) and then subsequently restarted, since the
   1.269 +	state-sensitive algorithm in xenstored will not otherwise send
   1.270 +	further watch event notifications if the domain were to be
   1.271 +	shut down again.
   1.272 +
   1.273 +	It is not clear whether this is possible since one would
   1.274 +	normally expect a domain not to be restarted after being shut
   1.275 +	down without being destroyed in the meantime.  There are
   1.276 +	currently no users of this request in xen-unstable.
   1.277 +
   1.278 +	xenstored prevents the use of RESUME other than by dom0.
   1.279 +
   1.280 +---------- Miscellaneous ----------
   1.281 +
   1.282 +DEBUG			print|<string>|??	    sends <string> to debug log
   1.283 +DEBUG			print|<thing-with-no-nul>   EINVAL
   1.284 +DEBUG			check|??		    checks xenstored innards
   1.285 +DEBUG			<anything-else|>	    no-op (future extension)
   1.286 +
   1.287 +	These requests should not generally be used and may be
   1.288 +	withdrawn in the future.
   1.289 +
   1.290 +
     2.1 --- a/tools/xenstore/xenstored_core.c	Wed Dec 05 11:07:12 2007 +0000
     2.2 +++ b/tools/xenstore/xenstored_core.c	Wed Dec 05 11:08:07 2007 +0000
     2.3 @@ -563,7 +563,9 @@ static struct buffered_data *new_buffer(
     2.4  	return data;
     2.5  }
     2.6  
     2.7 -/* Return length of string (including nul) at this offset. */
     2.8 +/* Return length of string (including nul) at this offset.
     2.9 + * If there is no nul, returns 0 for failure.
    2.10 + */
    2.11  static unsigned int get_string(const struct buffered_data *data,
    2.12  			       unsigned int offset)
    2.13  {
    2.14 @@ -579,7 +581,12 @@ static unsigned int get_string(const str
    2.15  	return nul - (data->buffer + offset) + 1;
    2.16  }
    2.17  
    2.18 -/* Break input into vectors, return the number, fill in up to num of them. */
    2.19 +/* Break input into vectors, return the number, fill in up to num of them.
    2.20 + * Always returns the actual number of nuls in the input.  Stores the
    2.21 + * positions of the starts of the nul-terminated strings in vec.
    2.22 + * Callers who use this and then rely only on vec[] will
    2.23 + * ignore any data after the final nul.
    2.24 + */
    2.25  unsigned int get_strings(struct buffered_data *data,
    2.26  			 char *vec[], unsigned int num)
    2.27  {
    2.28 @@ -668,7 +675,9 @@ bool is_valid_nodename(const char *node)
    2.29  	return valid_chars(node);
    2.30  }
    2.31  
    2.32 -/* We expect one arg in the input: return NULL otherwise. */
    2.33 +/* We expect one arg in the input: return NULL otherwise.
    2.34 + * The payload must contain exactly one nul, at the end.
    2.35 + */
    2.36  static const char *onearg(struct buffered_data *in)
    2.37  {
    2.38  	if (!in->used || get_string(in, 0) != in->used)
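
The behaviour documented by the new comments can be modelled in a few lines of Python.  This is a model of the described semantics only, not a translation of xenstored_core.c: a bytes object stands in for struct buffered_data, and offsets stand in for the pointers stored in vec[].

```python
def get_string(buf: bytes, offset: int) -> int:
    """Length of the string at offset including its nul, or 0 if there is no nul."""
    nul = buf.find(b"\x00", offset)
    return 0 if nul < 0 else nul - offset + 1

def get_strings(buf: bytes, num: int):
    """Return (count, starts).

    count is always the number of nuls in buf; starts holds the start
    offsets of at most the first num nul-terminated strings.  Data after
    the final nul is not represented in starts.
    """
    starts, count, offset = [], 0, 0
    while True:
        length = get_string(buf, offset)
        if length == 0:
            break
        if count < num:
            starts.append(offset)
        count += 1
        offset += length
    return count, starts

def onearg(buf: bytes):
    """Return the single argument, or None unless buf has exactly one nul, at the end."""
    if not buf or get_string(buf, 0) != len(buf):
        return None
    return buf[:-1]
```

For example, get_strings(b"a\x00b\x00tail", 8) reports two strings and silently drops `tail', which is the case the new comment warns callers about.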