annotate Documentation/sched-domains.txt @ 912:dd42cdb0ab89

[IA64] Build blktap2 driver by default in x86 builds.

add CONFIG_XEN_BLKDEV_TAP2=y to buildconfigs/linux-defconfig_xen_ia64.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
author Isaku Yamahata <yamahata@valinux.co.jp>
date Mon Jun 29 12:09:16 2009 +0900 (2009-06-29)
parents 831230e53067
rev   line source
ian@0 1 Each CPU has a "base" scheduling domain (struct sched_domain). These are
ian@0 2 accessed via cpu_sched_domain(i) and this_sched_domain() macros. The domain
ian@0 3 hierarchy is built from these base domains via the ->parent pointer. ->parent
ian@0 4 MUST be NULL terminated, and domain structures should be per-CPU as they
ian@0 5 are locklessly updated.
ian@0 6
ian@0 7 Each scheduling domain spans a number of CPUs (stored in the ->span field).
ian@0 8 A domain's span MUST be a superset of its child's span (this restriction could
ian@0 9 be relaxed if the need arises), and a base domain for CPU i MUST span at least
ian@0 10 i. The top domain for each CPU will generally span all CPUs in the system,
ian@0 11 although strictly it doesn't have to. If it does not, some CPUs may never be
ian@0 12 given tasks to run unless the CPUs allowed mask is explicitly set. A sched
ian@0 13 domain's span means "balance process load among these
ian@0 14 CPUs".
ian@0 15
ian@0 16 Each scheduling domain must have one or more CPU groups (struct sched_group)
ian@0 17 which are organised as a circular one way linked list from the ->groups
ian@0 18 pointer. The union of cpumasks of these groups MUST be the same as the
ian@0 19 domain's span. The intersection of cpumasks from any two of these groups
ian@0 20 MUST be the empty set. The group pointed to by the ->groups pointer MUST
ian@0 21 contain the CPU to which the domain belongs. Groups may be shared among
ian@0 22 CPUs as they contain read only data after they have been set up.
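
The span and group rules above can be sketched as a small consistency check.
This is a minimal illustration, not the kernel's code: struct toy_group,
struct toy_domain and toy_domain_ok() are hypothetical names, and a cpumask
is reduced to a single unsigned long (one bit per CPU).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for sched_group/sched_domain; a cpumask is one word. */
struct toy_group {
	unsigned long cpumask;
	struct toy_group *next;        /* circular, one-way list */
};

struct toy_domain {
	struct toy_domain *parent;     /* NULL at the top of the hierarchy */
	unsigned long span;
	struct toy_group *groups;      /* first group must contain the owning CPU */
};

/* Walk the ->parent chain from cpu's base domain and test each stated rule. */
bool toy_domain_ok(const struct toy_domain *sd, int cpu)
{
	unsigned long child_span = 1UL << cpu; /* base domain must span at least cpu */

	for (; sd; sd = sd->parent) {
		const struct toy_group *g = sd->groups;
		unsigned long seen = 0;

		if ((sd->span & child_span) != child_span)
			return false;  /* span is not a superset of the child's span */
		if (!g || !(g->cpumask & (1UL << cpu)))
			return false;  /* ->groups must point at cpu's own group */
		do {
			if (g->cpumask & seen)
				return false;  /* two groups intersect */
			seen |= g->cpumask;
			g = g->next;
		} while (g && g != sd->groups);
		if (!g)
			return false;  /* list is not circular */
		if (seen != sd->span)
			return false;  /* union of groups differs from the span */
		child_span = sd->span;
	}
	return true;
}
```

The sketch assumes the group list eventually returns to its head or ends in
NULL; it does not guard against other malformed lists.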
ian@0 23
ian@0 24 Balancing within a sched domain occurs between groups. That is, each group
ian@0 25 is treated as one entity. The load of a group is defined as the sum of the
ian@0 26 load of each of its member CPUs, and only when the load of a group becomes
ian@0 27 out of balance are tasks moved between groups.
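
As a rough sketch of the definition above, a group's load is the sum of the
loads of the CPUs in its cpumask. The names toy_group_load() and
TOY_LOAD_CPUS are hypothetical, the cpumask is simplified to one unsigned
long, and the per-CPU load values are assumed to be supplied by the caller.

```c
#define TOY_LOAD_CPUS 8   /* hypothetical CPU count for this sketch */

/* Sum the per-CPU loads of every CPU whose bit is set in cpumask. */
unsigned long toy_group_load(unsigned long cpumask,
			     const unsigned long cpu_load[TOY_LOAD_CPUS])
{
	unsigned long sum = 0;

	for (int cpu = 0; cpu < TOY_LOAD_CPUS; cpu++)
		if (cpumask & (1UL << cpu))
			sum += cpu_load[cpu];
	return sum;
}
```

Balancing then compares these per-group sums rather than individual CPU
loads.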
ian@0 28
ian@0 29 In kernel/sched.c, rebalance_tick is run periodically on each CPU. This
ian@0 30 function takes its CPU's base sched domain and checks to see if it has reached
ian@0 31 its rebalance interval. If so, then it will run load_balance on that domain.
ian@0 32 rebalance_tick then checks the parent sched_domain (if it exists), and the
ian@0 33 parent of the parent and so forth.
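
The walk described above might look roughly like the following. This is a
simplified sketch, not the code in kernel/sched.c: struct toy_sd and its
fields (last_balance, interval, balance_count) are hypothetical, and
incrementing a counter stands in for the call to load_balance.

```c
/* Hypothetical, stripped-down domain: only what the walk needs. */
struct toy_sd {
	struct toy_sd *parent;        /* NULL terminates the hierarchy */
	unsigned long last_balance;   /* time of the last balance attempt */
	unsigned long interval;       /* rebalance interval, in ticks */
	int balance_count;            /* stands in for calling load_balance */
};

/* Visit the base domain, then each parent, balancing those that are due. */
void toy_rebalance_tick(struct toy_sd *base, unsigned long now)
{
	for (struct toy_sd *sd = base; sd; sd = sd->parent) {
		if (now - sd->last_balance >= sd->interval) {
			sd->balance_count++;      /* would be load_balance(sd) */
			sd->last_balance = now;
		}
	}
}
```

With a base interval of 2 ticks and a parent interval of 8, eight ticks
balance the base domain four times and the parent once.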
ian@0 34
ian@0 35 *** Implementing sched domains ***
ian@0 36 The "base" domain will "span" the first level of the hierarchy. In the case
ian@0 37 of SMT, you'll span all siblings of the physical CPU, with each group being
ian@0 38 a single virtual CPU.
ian@0 39
ian@0 40 In SMP, the parent of the base domain will span all physical CPUs in the
ian@0 41 node, with each group being a single physical CPU. Then with NUMA, the parent
ian@0 42 of the SMP domain will span the entire machine, with each group having the
ian@0 43 cpumask of a node. Alternatively, you could do multi-level NUMA; Opteron, for
ian@0 44 example, might have just one domain covering its one NUMA level.
ian@0 45
ian@0 46 The implementor should read comments in include/linux/sched.h:
ian@0 47 struct sched_domain fields, SD_FLAG_*, SD_*_INIT to get an idea of
ian@0 48 the specifics and what to tune.
ian@0 49
ian@0 50 For SMT, the architecture must define CONFIG_SCHED_SMT and provide a
ian@0 51 cpumask_t cpu_sibling_map[NR_CPUS], where cpu_sibling_map[i] is the mask of
ian@0 52 all "i"'s siblings as well as "i" itself.
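
To illustrate what the sibling map contains, here is a toy construction. It
assumes the architecture already knows which physical core each CPU belongs
to; the core_id array, TOY_SMT_CPUS and toy_build_sibling_map() are
hypothetical, and the cpumask_t entries are simplified to unsigned long.

```c
#define TOY_SMT_CPUS 4   /* hypothetical stand-in for NR_CPUS */

/*
 * Fill sibling_map[i] with the mask of every CPU sharing i's physical core,
 * including i itself.
 */
void toy_build_sibling_map(const int core_id[TOY_SMT_CPUS],
			   unsigned long sibling_map[TOY_SMT_CPUS])
{
	for (int i = 0; i < TOY_SMT_CPUS; i++) {
		sibling_map[i] = 0;
		for (int j = 0; j < TOY_SMT_CPUS; j++)
			if (core_id[i] == core_id[j])
				sibling_map[i] |= 1UL << j;
	}
}
```

For two cores with two threads each (core_id = {0, 0, 1, 1}), CPUs 0 and 1
both get mask 0x3 and CPUs 2 and 3 both get mask 0xc.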
ian@0 53
ian@0 54 Architectures may override the default SD_*_INIT flags
ian@0 55 while using the generic domain builder in kernel/sched.c if they wish to
ian@0 56 retain the traditional SMT->SMP->NUMA topology (or some subset of that). This
ian@0 57 can be done by #define'ing ARCH_HAS_SCHED_TUNE.
ian@0 58
ian@0 59 Alternatively, the architecture may completely override the generic domain
ian@0 60 builder by #define'ing ARCH_HAS_SCHED_DOMAIN, and exporting its own
ian@0 61 arch_init_sched_domains function. This function will attach domains to all
ian@0 62 CPUs using cpu_attach_domain.
ian@0 63
ian@0 64 Implementors should change the line
ian@0 65 #undef SCHED_DOMAIN_DEBUG
ian@0 66 to
ian@0 67 #define SCHED_DOMAIN_DEBUG
ian@0 68 in kernel/sched.c as this enables an error checking parse of the sched domains
ian@0 69 which should catch most possible errors (described above). It also prints out
ian@0 70 the domain structure in a visual format.