view docs/HOWTOs/Sched-HOWTO @ 2006:bd310c8b4b5c

bitkeeper revision 1.1108.43.1 (410a5973b_ww-XNociMt5BotV87vBQ)

author mwilli2@equilibrium.research.intel-research.net
date Fri Jul 30 14:21:39 2004 +0000 (2004-07-30)
parents 1d1e0a1795b8
Xen Scheduler HOWTO
===================

by Mark Williamson
(c) 2004 Intel Research Cambridge


Introduction
------------

Xen offers a choice of CPU schedulers. All available schedulers are
included in Xen at compile time and the administrator may select a
particular scheduler using a boot-time parameter to Xen. It is
expected that administrators will choose the scheduler most
appropriate to their application and configure the machine to boot
with that scheduler.

Note: the default scheduler is the Borrowed Virtual Time (BVT)
scheduler which was also used in previous releases of Xen. No
configuration changes are required to keep using this scheduler.

This file provides a brief description of the CPU schedulers available
in Xen, what they are useful for and the parameters that are used to
configure them. This information is necessarily fairly technical at
the moment. The recommended way to fully understand the scheduling
algorithms is to read the relevant research papers.

The interface to the schedulers is currently "raw" and performs no
sanity checking. Administrators should be careful when setting the
parameters, since a mistake can hang individual domains or the entire
system. In particular, double-check parameters for sanity and make
sure that DOM0 will get enough CPU time to remain usable. Note that
xc_dom_control.py takes time values in nanoseconds.

Future tools will implement friendlier control interfaces.

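Because time values are given in nanoseconds, it is easy to be wrong by
a factor of a thousand when typing them by hand. The following is a
minimal sketch of a conversion helper that could be used when preparing
values. The names to_ns and NS_PER_UNIT are hypothetical and are not
part of the Xen tools.

```python
# Hypothetical helper (not part of Xen): convert a time value to the
# nanosecond integers that the scheduler parameters are expressed in.
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_ns(value, unit):
    """Return `value` (given in `unit`) as a whole number of nanoseconds."""
    if unit not in NS_PER_UNIT:
        raise ValueError("unknown unit: %r" % (unit,))
    if value < 0:
        raise ValueError("time values must not be negative")
    return round(value * NS_PER_UNIT[unit])


if __name__ == "__main__":
    print(to_ns(5, "ms"))  # 5000000
```
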
Borrowed Virtual Time (BVT)
---------------------------

All releases of Xen have featured the BVT scheduler, which is used to
provide proportional fair shares of the CPU based on weights assigned
to domains. BVT is "work conserving" - the CPU will never be left
idle if there are runnable tasks.

BVT uses "virtual time" to make decisions on which domain should be
scheduled on the processor. Each time a scheduling decision is
required, BVT evaluates the "Effective Virtual Time" of all domains
and then schedules the domain with the least EVT. Domains are allowed
to "borrow" virtual time by "time warping", which reduces their EVT by
a certain amount, so that they may be scheduled sooner. In order to
maintain long term fairness, there are limits on when a domain can
time warp and for how long. [ For more details read the SOSP'99 paper
by Duda and Cheriton ]

In the Xen implementation, domains time warp when they unblock, so
that domain wakeup latencies are reduced.

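The selection rule described above can be illustrated with a small
model. This is a simplified sketch and not Xen's scheduler code: the
Domain class, its field names and the example values are assumptions
made for illustration. A warping domain has its warp amount subtracted
from its virtual time, and the domain with the least EVT runs next.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    # Hypothetical model of a domain; field names are illustrative only.
    name: str
    avt: int               # virtual time accumulated so far
    warp: int = 0          # how far the domain may warp backwards
    warping: bool = False  # True while the domain is time warping


def effective_virtual_time(dom):
    """EVT is the virtual time, reduced by the warp while warping."""
    return dom.avt - dom.warp if dom.warping else dom.avt


def pick_next(domains):
    """Return the runnable domain with the least EVT."""
    return min(domains, key=effective_virtual_time)


if __name__ == "__main__":
    doms = [
        Domain("dom0", avt=100),
        Domain("dom1", avt=120, warp=50, warping=True),  # just unblocked
        Domain("dom2", avt=110),
    ]
    print(pick_next(doms).name)
```

In this example dom1 has the largest virtual time, but because it is
warping its EVT is 70, so it is chosen ahead of dom0 (EVT 100).
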
The BVT algorithm uses the following per-domain parameters (set using
xc_dom_control.py cpu_bvtset):

* mcuadv - the MCU (Minimum Charging Unit) advance determines the
           proportional share of the CPU that a domain receives. It
           is set inversely proportional to a domain's sharing weight.
* warp   - the amount of "virtual time" the domain is allowed to warp
           backwards
* warpl  - the warp limit is the maximum time a domain can run warped for
* warpu  - the unwarp requirement is the minimum time a domain must
           run unwarped for before it can warp again

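Since mcuadv is inversely proportional to the sharing weight, a domain
with twice the weight of another needs half the mcuadv. The sketch
below derives mcuadv values from weights and shows the long-run shares
they imply when every domain is always runnable. The function names and
the scale constant are assumptions chosen for illustration; they are
not values used by Xen.

```python
from fractions import Fraction


def mcuadv_from_weights(weights, scale=1000):
    """Map each domain's weight to an mcuadv of scale / weight.

    `scale` is an arbitrary illustrative constant; pick one that the
    weights divide evenly so that no precision is lost.
    """
    return {name: scale // weight for name, weight in weights.items()}


def expected_shares(mcuadv):
    """Long-run CPU share of each domain: proportional to 1 / mcuadv."""
    inverse = {name: Fraction(1, adv) for name, adv in mcuadv.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


if __name__ == "__main__":
    adv = mcuadv_from_weights({"dom0": 1, "dom1": 2, "dom2": 1})
    for name, share in expected_shares(adv).items():
        print(name, adv[name], share)
```

With weights of 1, 2 and 1 the domains get mcuadv values of 1000, 500
and 1000, which correspond to shares of 1/4, 1/2 and 1/4 of the CPU.
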
BVT also has the following global parameter (set using
xc_dom_control.py cpu_bvtslice):

* ctx_allow - the context switch allowance is similar to the "quantum"
              in traditional schedulers. It is the minimum time that
              a scheduled domain will be allowed to run before being
              pre-empted. This prevents thrashing of the CPU.

BVT can now be selected by passing the 'sched=bvt' argument to Xen at
boot-time and is the default scheduler if no 'sched' argument is
supplied.

Atropos
-------

Atropos is a scheduler originally developed for the Nemesis multimedia
operating system. Atropos can be used to reserve absolute shares of
the CPU. It also includes some features to improve the efficiency of
domains that block for I/O and to allow spare CPU time to be shared
out.

The Atropos algorithm has the following parameters for each domain
(set using xc_dom_control.py cpu_atropos_set):

* slice    - The length of time per period that a domain is guaranteed.
* period   - The period over which a domain is guaranteed to receive
             its slice of CPU time.
* latency  - The latency hint is used to control how soon after
             waking up a domain should be scheduled.
* xtratime - This is a true (1) / false (0) flag that specifies whether
             a domain should be allowed a share of the system slack time.

Every domain has an associated period and slice. The domain should
receive 'slice' nanoseconds every 'period' nanoseconds. This allows
the administrator to configure both the absolute share of the CPU a
domain receives and the frequency with which it is scheduled. When
domains unblock, their period is reduced to the value of the latency
hint (the slice is scaled accordingly so that they still get the same
proportion of the CPU). For each subsequent period, the slice and
period times are doubled until they reach their original values.

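The behaviour after an unblock can be traced with a short calculation.
The following sketch is a simplified model and not the Atropos
implementation: the function name is hypothetical, and capping the
final step at the original period (when doubling would overshoot it) is
an assumption.

```python
def unblock_schedule(slice_ns, period_ns, latency_ns):
    """List the (slice, period) pairs used after a domain unblocks.

    The period starts at the latency hint and doubles each time until
    it is back at its configured value. The slice is scaled so that
    slice / period stays the same throughout.
    """
    if latency_ns <= 0 or period_ns <= 0:
        raise ValueError("latency and period must be positive")
    schedule = []
    current_period = min(latency_ns, period_ns)
    while current_period < period_ns:
        current_slice = slice_ns * current_period // period_ns
        schedule.append((current_slice, current_period))
        # Assumption: never grow beyond the configured period.
        current_period = min(current_period * 2, period_ns)
    schedule.append((slice_ns, period_ns))
    return schedule


if __name__ == "__main__":
    # 20ms slice every 80ms, with a 10ms latency hint (all in ns).
    for s, p in unblock_schedule(20_000_000, 80_000_000, 10_000_000):
        print(s, p)
```

For a domain with a 20ms slice, an 80ms period and a 10ms latency hint,
this gives 2.5ms/10ms, then 5ms/20ms, then 10ms/40ms, and finally the
configured 20ms/80ms. The domain's share of the CPU is 25% at every
step.
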
Atropos is selected by adding 'sched=atropos' to Xen's boot-time
arguments.

Note: don't overcommit the CPU when using Atropos (i.e. don't reserve
more CPU than is available - the utilisation should be kept to
slightly less than 100% in order to ensure predictable behaviour).

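One way to check a planned set of reservations before applying them is
to add up slice / period for every domain. The sketch below does this
for a single CPU. The function names are hypothetical, and the 95%
limit is only an assumed reading of "slightly less than 100%"; choose a
margin that suits the system.

```python
from fractions import Fraction


def total_utilisation(reservations):
    """Sum slice / period over all domains.

    `reservations` maps a domain name to a (slice_ns, period_ns) pair.
    """
    return sum(Fraction(s, p) for s, p in reservations.values())


def is_overcommitted(reservations, limit=Fraction(95, 100)):
    """True if the reserved CPU exceeds `limit` (an assumed margin)."""
    return total_utilisation(reservations) > limit


if __name__ == "__main__":
    plan = {
        "dom0": (10_000_000, 100_000_000),  # 10%
        "dom1": (30_000_000, 100_000_000),  # 30%
        "dom2": (25_000_000, 50_000_000),   # 50%
    }
    print(float(total_utilisation(plan)), is_overcommitted(plan))
```

The plan above reserves 90% of the CPU and passes the check. Adding a
further domain with a 20% reservation would take the total to 110%,
which is overcommitted.
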
Round-Robin
-----------

The Round-Robin scheduler is provided as a simple example of Xen's
internal scheduler API. For production systems, one of the other
schedulers should be used, since they are more flexible and more
efficient.

The Round-Robin scheduler has one global parameter (set using
xc_dom_control.py cpu_rrobin_slice):

* rr_slice - The time for which each domain runs before the next
             scheduling decision is made.

The Round-Robin scheduler can be selected by adding 'sched=rrobin' to
Xen's boot-time arguments.