annotate Documentation/pi-futex.txt @ 524:7f8b544237bf

netfront: Allow netfront in domain 0.

This is useful if your physical network device is in a utility domain.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Tue Apr 15 15:18:58 2008 +0100 (2008-04-15)
parents 831230e53067
rev   line source
ian@0 1 Lightweight PI-futexes
ian@0 2 ----------------------
ian@0 3
ian@0 4 We are calling them lightweight for 3 reasons:
ian@0 5
ian@0 6 - in the user-space fastpath a PI-enabled futex involves no kernel work
ian@0 7 (or any other PI complexity) at all. No registration, no extra kernel
ian@0 8 calls - just pure fast atomic ops in userspace.
ian@0 9
ian@0 10 - even in the slowpath, the system call and scheduling pattern is very
ian@0 11 similar to normal futexes.
ian@0 12
ian@0 13 - the in-kernel PI implementation is streamlined around the mutex
ian@0 14 abstraction, with strict rules that keep the implementation
ian@0 15 relatively simple: only a single owner may own a lock (i.e. no
ian@0 16 read-write lock support), only the owner may unlock a lock, no
ian@0 17 recursive locking, etc.
ian@0 18
ian@0 19 Priority Inheritance - why?
ian@0 20 ---------------------------
ian@0 21
ian@0 22 The short reply: user-space PI helps achieving/improving determinism for
ian@0 23 user-space applications. In the best-case, it can help achieve
ian@0 24 determinism and well-bound latencies. Even in the worst-case, PI will
ian@0 25 improve the statistical distribution of locking related application
ian@0 26 delays.
ian@0 27
ian@0 28 The longer reply:
ian@0 29 -----------------
ian@0 30
ian@0 31 Firstly, sharing locks between multiple tasks is a common programming
ian@0 32 technique that often cannot be replaced with lockless algorithms. As we
ian@0 33 can see it in the kernel [which is a quite complex program in itself],
ian@0 34 lockless structures are rather the exception than the norm - the current
ian@0 35 ratio of lockless vs. locky code for shared data structures is somewhere
ian@0 36 between 1:10 and 1:100. Lockless is hard, and the complexity of lockless
ian@0 37 algorithms often endangers to ability to do robust reviews of said code.
ian@0 38 I.e. critical RT apps often choose lock structures to protect critical
ian@0 39 data structures, instead of lockless algorithms. Furthermore, there are
ian@0 40 cases (like shared hardware, or other resource limits) where lockless
ian@0 41 access is mathematically impossible.
ian@0 42
ian@0 43 Media players (such as Jack) are an example of reasonable application
ian@0 44 design with multiple tasks (with multiple priority levels) sharing
ian@0 45 short-held locks: for example, a highprio audio playback thread is
ian@0 46 combined with medium-prio construct-audio-data threads and low-prio
ian@0 47 display-colory-stuff threads. Add video and decoding to the mix and
ian@0 48 we've got even more priority levels.
ian@0 49
ian@0 50 So once we accept that synchronization objects (locks) are an
ian@0 51 unavoidable fact of life, and once we accept that multi-task userspace
ian@0 52 apps have a very fair expectation of being able to use locks, we've got
ian@0 53 to think about how to offer the option of a deterministic locking
ian@0 54 implementation to user-space.
ian@0 55
ian@0 56 Most of the technical counter-arguments against doing priority
ian@0 57 inheritance only apply to kernel-space locks. But user-space locks are
ian@0 58 different, there we cannot disable interrupts or make the task
ian@0 59 non-preemptible in a critical section, so the 'use spinlocks' argument
ian@0 60 does not apply (user-space spinlocks have the same priority inversion
ian@0 61 problems as other user-space locking constructs). Fact is, pretty much
ian@0 62 the only technique that currently enables good determinism for userspace
ian@0 63 locks (such as futex-based pthread mutexes) is priority inheritance:
ian@0 64
ian@0 65 Currently (without PI), if a high-prio and a low-prio task shares a lock
ian@0 66 [this is a quite common scenario for most non-trivial RT applications],
ian@0 67 even if all critical sections are coded carefully to be deterministic
ian@0 68 (i.e. all critical sections are short in duration and only execute a
ian@0 69 limited number of instructions), the kernel cannot guarantee any
ian@0 70 deterministic execution of the high-prio task: any medium-priority task
ian@0 71 could preempt the low-prio task while it holds the shared lock and
ian@0 72 executes the critical section, and could delay it indefinitely.
ian@0 73
ian@0 74 Implementation:
ian@0 75 ---------------
ian@0 76
ian@0 77 As mentioned before, the userspace fastpath of PI-enabled pthread
ian@0 78 mutexes involves no kernel work at all - they behave quite similarly to
ian@0 79 normal futex-based locks: a 0 value means unlocked, and a value==TID
ian@0 80 means locked. (This is the same method as used by list-based robust
ian@0 81 futexes.) Userspace uses atomic ops to lock/unlock these mutexes without
ian@0 82 entering the kernel.
ian@0 83
ian@0 84 To handle the slowpath, we have added two new futex ops:
ian@0 85
ian@0 86 FUTEX_LOCK_PI
ian@0 88
ian@0 89 If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to
ian@0 90 TID fails], then FUTEX_LOCK_PI is called. The kernel does all the
ian@0 91 remaining work: if there is no futex-queue attached to the futex address
ian@0 92 yet then the code looks up the task that owns the futex [it has put its
ian@0 93 own TID into the futex value], and attaches a 'PI state' structure to
ian@0 94 the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware,
ian@0 95 kernel-based synchronization object. The 'other' task is made the owner
ian@0 96 of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the
ian@0 97 futex value. Then this task tries to lock the rt-mutex, on which it
ian@0 98 blocks. Once it returns, it has the mutex acquired, and it sets the
ian@0 99 futex value to its own TID and returns. Userspace has no other work to
ian@0 100 perform - it now owns the lock, and futex value contains
ian@0 102
ian@0 103 If the unlock side fastpath succeeds, [i.e. userspace manages to do a
ian@0 104 TID -> 0 atomic transition of the futex value], then no kernel work is
ian@0 105 triggered.
ian@0 106
ian@0 107 If the unlock fastpath fails (because the FUTEX_WAITERS bit is set),
ian@0 108 then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the
ian@0 109 behalf of userspace - and it also unlocks the attached
ian@0 110 pi_state->rt_mutex and thus wakes up any potential waiters.
ian@0 111
ian@0 112 Note that under this approach, contrary to previous PI-futex approaches,
ian@0 113 there is no prior 'registration' of a PI-futex. [which is not quite
ian@0 114 possible anyway, due to existing ABI properties of pthread mutexes.]
ian@0 115
ian@0 116 Also, under this scheme, 'robustness' and 'PI' are two orthogonal
ian@0 117 properties of futexes, and all four combinations are possible: futex,
ian@0 118 robust-futex, PI-futex, robust+PI-futex.
ian@0 119
ian@0 120 More details about priority inheritance can be found in
ian@0 121 Documentation/rtmutex.txt.