ia64/linux-2.6.18-xen.hg

view Documentation/mutex-design.txt @ 854:950b9eb27661

usbback: fix urb interval value for interrupt urbs.

Signed-off-by: Noboru Iwamatsu <n_iwamatsu@jp.fujitsu.com>
author Keir Fraser <keir.fraser@citrix.com>
date Mon Apr 06 13:51:20 2009 +0100 (2009-04-06)
parents 831230e53067
children
line source
1 Generic Mutex Subsystem
3 started by Ingo Molnar <mingo@redhat.com>
5 "Why on earth do we need a new mutex subsystem, and what's wrong
6 with semaphores?"
8 firstly, there's nothing wrong with semaphores. But if the simpler
9 mutex semantics are sufficient for your code, then there are a couple
10 of advantages of mutexes:
12 - 'struct mutex' is smaller on most architectures: .e.g on x86,
13 'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
14 A smaller structure size means less RAM footprint, and better
15 CPU-cache utilization.
17 - tighter code. On x86 i get the following .text sizes when
18 switching all mutex-alike semaphores in the kernel to the mutex
19 subsystem:
21 text data bss dec hex filename
22 3280380 868188 396860 4545428 455b94 vmlinux-semaphore
23 3255329 865296 396732 4517357 44eded vmlinux-mutex
25 that's 25051 bytes of code saved, or a 0.76% win - off the hottest
26 codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%)
27 Smaller code means better icache footprint, which is one of the
28 major optimization goals in the Linux kernel currently.
30 - the mutex subsystem is slightly faster and has better scalability for
31 contended workloads. On an 8-way x86 system, running a mutex-based
32 kernel and testing creat+unlink+close (of separate, per-task files)
33 in /tmp with 16 parallel tasks, the average number of ops/sec is:
35 Semaphores: Mutexes:
37 $ ./test-mutex V 16 10 $ ./test-mutex V 16 10
38 8 CPUs, running 16 tasks. 8 CPUs, running 16 tasks.
39 checking VFS performance. checking VFS performance.
40 avg loops/sec: 34713 avg loops/sec: 84153
41 CPU utilization: 63% CPU utilization: 22%
43 i.e. in this workload, the mutex based kernel was 2.4 times faster
44 than the semaphore based kernel, _and_ it also had 2.8 times less CPU
45 utilization. (In terms of 'ops per CPU cycle', the semaphore kernel
46 performed 551 ops/sec per 1% of CPU time used, while the mutex kernel
47 performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times
48 more efficient.)
50 the scalability difference is visible even on a 2-way P4 HT box:
52 Semaphores: Mutexes:
54 $ ./test-mutex V 16 10 $ ./test-mutex V 16 10
55 4 CPUs, running 16 tasks. 8 CPUs, running 16 tasks.
56 checking VFS performance. checking VFS performance.
57 avg loops/sec: 127659 avg loops/sec: 181082
58 CPU utilization: 100% CPU utilization: 34%
60 (the straight performance advantage of mutexes is 41%, the per-cycle
61 efficiency of mutexes is 4.1 times better.)
63 - there are no fastpath tradeoffs, the mutex fastpath is just as tight
64 as the semaphore fastpath. On x86, the locking fastpath is 2
65 instructions:
67 c0377ccb <mutex_lock>:
68 c0377ccb: f0 ff 08 lock decl (%eax)
69 c0377cce: 78 0e js c0377cde <.text.lock.mutex>
70 c0377cd0: c3 ret
72 the unlocking fastpath is equally tight:
74 c0377cd1 <mutex_unlock>:
75 c0377cd1: f0 ff 00 lock incl (%eax)
76 c0377cd4: 7e 0f jle c0377ce5 <.text.lock.mutex+0x7>
77 c0377cd6: c3 ret
79 - 'struct mutex' semantics are well-defined and are enforced if
80 CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have
81 virtually no debugging code or instrumentation. The mutex subsystem
82 checks and enforces the following rules:
84 * - only one task can hold the mutex at a time
85 * - only the owner can unlock the mutex
86 * - multiple unlocks are not permitted
87 * - recursive locking is not permitted
88 * - a mutex object must be initialized via the API
89 * - a mutex object must not be initialized via memset or copying
90 * - task may not exit with mutex held
91 * - memory areas where held locks reside must not be freed
92 * - held mutexes must not be reinitialized
93 * - mutexes may not be used in irq contexts
95 furthermore, there are also convenience features in the debugging
96 code:
98 * - uses symbolic names of mutexes, whenever they are printed in debug output
99 * - point-of-acquire tracking, symbolic lookup of function names
100 * - list of all locks held in the system, printout of them
101 * - owner tracking
102 * - detects self-recursing locks and prints out all relevant info
103 * - detects multi-task circular deadlocks and prints out all affected
104 * locks and tasks (and only those tasks)
106 Disadvantages
107 -------------
109 The stricter mutex API means you cannot use mutexes the same way you
110 can use semaphores: e.g. they cannot be used from an interrupt context,
111 nor can they be unlocked from a different context that which acquired
112 it. [ I'm not aware of any other (e.g. performance) disadvantages from
113 using mutexes at the moment, please let me know if you find any. ]
115 Implementation of mutexes
116 -------------------------
118 'struct mutex' is the new mutex type, defined in include/linux/mutex.h
119 and implemented in kernel/mutex.c. It is a counter-based mutex with a
120 spinlock and a wait-list. The counter has 3 states: 1 for "unlocked",
121 0 for "locked" and negative numbers (usually -1) for "locked, potential
122 waiters queued".
124 the APIs of 'struct mutex' have been streamlined:
126 DEFINE_MUTEX(name);
128 mutex_init(mutex);
130 void mutex_lock(struct mutex *lock);
131 int mutex_lock_interruptible(struct mutex *lock);
132 int mutex_trylock(struct mutex *lock);
133 void mutex_unlock(struct mutex *lock);
134 int mutex_is_locked(struct mutex *lock);