annotate Documentation/filesystems/directory-locking @ 524:7f8b544237bf

netfront: Allow netfront in domain 0.

This is useful if your physical network device is in a utility domain.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Tue Apr 15 15:18:58 2008 +0100 (2008-04-15)
parents 831230e53067
rev   line source
ian@0 1 Locking scheme used for directory operations is based on two
ian@0 2 kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem).
ian@0 3
ian@0 4 For our purposes all operations fall in 5 classes:
ian@0 5
ian@0 6 1) read access. Locking rules: caller locks directory we are accessing.
ian@0 7
ian@0 8 2) object creation. Locking rules: same as above.
ian@0 9
ian@0 10 3) object removal. Locking rules: caller locks parent, finds victim,
ian@0 11 locks victim and calls the method.
ian@0 12
ian@0 13 4) rename() that is _not_ cross-directory. Locking rules: caller locks
ian@0 14 the parent, finds source and target, if target already exists - locks it
ian@0 15 and then calls the method.
ian@0 16
ian@0 17 5) link creation. Locking rules:
ian@0 18 * lock parent
ian@0 19 * check that source is not a directory
ian@0 20 * lock source
ian@0 21 * call the method.
ian@0 22
ian@0 23 6) cross-directory rename. The trickiest in the whole bunch. Locking
ian@0 24 rules:
ian@0 25 * lock the filesystem
ian@0 26 * lock parents in "ancestors first" order.
ian@0 27 * find source and target.
ian@0 28 * if old parent is equal to or is a descendent of target
ian@0 29 fail with -ENOTEMPTY
ian@0 30 * if new parent is equal to or is a descendent of source
ian@0 31 fail with -ELOOP
ian@0 32 * if target exists - lock it.
ian@0 33 * call the method.
ian@0 34
ian@0 35
ian@0 36 The rules above obviously guarantee that all directories that are going to be
ian@0 37 read, modified or removed by method will be locked by caller.
ian@0 38
ian@0 39
ian@0 40 If no directory is its own ancestor, the scheme above is deadlock-free.
ian@0 41 Proof:
ian@0 42
ian@0 43 First of all, at any moment we have a partial ordering of the
ian@0 44 objects - A < B iff A is an ancestor of B.
ian@0 45
ian@0 46 That ordering can change. However, the following is true:
ian@0 47
ian@0 48 (1) if object removal or non-cross-directory rename holds lock on A and
ian@0 49 attempts to acquire lock on B, A will remain the parent of B until we
ian@0 50 acquire the lock on B. (Proof: only cross-directory rename can change
ian@0 51 the parent of object and it would have to lock the parent).
ian@0 52
ian@0 53 (2) if cross-directory rename holds the lock on filesystem, order will not
ian@0 54 change until rename acquires all locks. (Proof: other cross-directory
ian@0 55 renames will be blocked on filesystem lock and we don't start changing
ian@0 56 the order until we had acquired all locks).
ian@0 57
ian@0 58 (3) any operation holds at most one lock on non-directory object and
ian@0 59 that lock is acquired after all other locks. (Proof: see descriptions
ian@0 60 of operations).
ian@0 61
ian@0 62 Now consider the minimal deadlock. Each process is blocked on
ian@0 63 attempt to acquire some lock and already holds at least one lock. Let's
ian@0 64 consider the set of contended locks. First of all, filesystem lock is
ian@0 65 not contended, since any process blocked on it is not holding any locks.
ian@0 66 Thus all processes are blocked on ->i_sem.
ian@0 67
ian@0 68 Non-directory objects are not contended due to (3). Thus link
ian@0 69 creation can't be a part of deadlock - it can't be blocked on source
ian@0 70 and it means that it doesn't hold any locks.
ian@0 71
ian@0 72 Any contended object is either held by cross-directory rename or
ian@0 73 has a child that is also contended. Indeed, suppose that it is held by
ian@0 74 operation other than cross-directory rename. Then the lock this operation
ian@0 75 is blocked on belongs to child of that object due to (1).
ian@0 76
ian@0 77 It means that one of the operations is cross-directory rename.
ian@0 78 Otherwise the set of contended objects would be infinite - each of them
ian@0 79 would have a contended child and we had assumed that no object is its
ian@0 80 own descendent. Moreover, there is exactly one cross-directory rename
ian@0 81 (see above).
ian@0 82
ian@0 83 Consider the object blocking the cross-directory rename. One
ian@0 84 of its descendents is locked by cross-directory rename (otherwise we
ian@0 85 would again have an infinite set of of contended objects). But that
ian@0 86 means that cross-directory rename is taking locks out of order. Due
ian@0 87 to (2) the order hadn't changed since we had acquired filesystem lock.
ian@0 88 But locking rules for cross-directory rename guarantee that we do not
ian@0 89 try to acquire lock on descendent before the lock on ancestor.
ian@0 90 Contradiction. I.e. deadlock is impossible. Q.E.D.
ian@0 91
ian@0 92
ian@0 93 These operations are guaranteed to avoid loop creation. Indeed,
ian@0 94 the only operation that could introduce loops is cross-directory rename.
ian@0 95 Since the only new (parent, child) pair added by rename() is (new parent,
ian@0 96 source), such loop would have to contain these objects and the rest of it
ian@0 97 would have to exist before rename(). I.e. at the moment of loop creation
ian@0 98 rename() responsible for that would be holding filesystem lock and new parent
ian@0 99 would have to be equal to or a descendent of source. But that means that
ian@0 100 new parent had been equal to or a descendent of source since the moment when
ian@0 101 we had acquired filesystem lock and rename() would fail with -ELOOP in that
ian@0 102 case.
ian@0 103
ian@0 104 While this locking scheme works for arbitrary DAGs, it relies on
ian@0 105 ability to check that directory is a descendent of another object. Current
ian@0 106 implementation assumes that directory graph is a tree. This assumption is
ian@0 107 also preserved by all operations (cross-directory rename on a tree that would
ian@0 108 not introduce a cycle will leave it a tree and link() fails for directories).
ian@0 109
ian@0 110 Notice that "directory" in the above == "anything that might have
ian@0 111 children", so if we are going to introduce hybrid objects we will need
ian@0 112 either to make sure that link(2) doesn't work for them or to make changes
ian@0 113 in is_subdir() that would make it work even in presence of such beasts.