ia64/linux-2.6.18-xen.hg

view Documentation/prio_tree.txt @ 897:329ea0ccb344

balloon: try harder to balloon up under memory pressure.

Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also if we partially succeed in increasing the reservation
(i.e. receive less pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Fri Jun 05 14:01:20 2009 +0100 (2009-06-05)
parents 831230e53067
children
line source
1 The prio_tree.c code indexes vmas using 3 different indexes:
2 * heap_index = vm_pgoff + vm_size_in_pages : end_vm_pgoff
3 * radix_index = vm_pgoff : start_vm_pgoff
4 * size_index = vm_size_in_pages
6 A regular radix-priority-search-tree indexes vmas using only heap_index and
7 radix_index. The conditions for indexing are:
8 * ->heap_index >= ->left->heap_index &&
9 ->heap_index >= ->right->heap_index
10 * if (->heap_index == ->left->heap_index)
11 then ->radix_index < ->left->radix_index;
12 * if (->heap_index == ->right->heap_index)
13 then ->radix_index < ->right->radix_index;
14 * nodes are hashed to left or right subtree using radix_index
15 similar to a pure binary radix tree.
17 A regular radix-priority-search-tree helps to store and query
18 intervals (vmas). However, a regular radix-priority-search-tree is only
19 suitable for storing vmas with different radix indices (vm_pgoff).
21 Therefore, the prio_tree.c extends the regular radix-priority-search-tree
22 to handle many vmas with the same vm_pgoff. Such vmas are handled in
23 2 different ways: 1) All vmas with the same radix _and_ heap indices are
24 linked using vm_set.list, 2) if there are many vmas with the same radix
25 index, but different heap indices and if the regular radix-priority-search
26 tree cannot index them all, we build an overflow-sub-tree that indexes such
27 vmas using heap and size indices instead of heap and radix indices. For
28 example, in the figure below some vmas with vm_pgoff = 0 (zero) are
29 indexed by regular radix-priority-search-tree whereas others are pushed
30 into an overflow-subtree. Note that all vmas in an overflow-sub-tree have
31 the same vm_pgoff (radix_index) and if necessary we build different
32 overflow-sub-trees to handle each possible radix_index. For example,
33 in figure we have 3 overflow-sub-trees corresponding to radix indices
34 0, 2, and 4.
36 In the final tree the first few (prio_tree_root->index_bits) levels
37 are indexed using heap and radix indices whereas the overflow-sub-trees below
38 those levels (i.e. levels prio_tree_root->index_bits + 1 and higher) are
39 indexed using heap and size indices. In overflow-sub-trees the size_index
40 is used for hashing the nodes to appropriate places.
42 Now, an example prio_tree:
44 vmas are represented [radix_index, size_index, heap_index]
45 i.e., [start_vm_pgoff, vm_size_in_pages, end_vm_pgoff]
47 level prio_tree_root->index_bits = 3
48 -----
49 _
50 0 [0,7,7] |
51 / \ |
52 ------------------ ------------ | Regular
53 / \ | radix priority
54 1 [1,6,7] [4,3,7] | search tree
55 / \ / \ |
56 ------- ----- ------ ----- | heap-and-radix
57 / \ / \ | indexed
58 2 [0,6,6] [2,5,7] [5,2,7] [6,1,7] |
59 / \ / \ / \ / \ |
60 3 [0,5,5] [1,5,6] [2,4,6] [3,4,7] [4,2,6] [5,1,6] [6,0,6] [7,0,7] |
61 / / / _
62 / / / _
63 4 [0,4,4] [2,3,5] [4,1,5] |
64 / / / |
65 5 [0,3,3] [2,2,4] [4,0,4] | Overflow-sub-trees
66 / / |
67 6 [0,2,2] [2,1,3] | heap-and-size
68 / / | indexed
69 7 [0,1,1] [2,0,2] |
70 / |
71 8 [0,0,0] |
72 _
74 Note that we use prio_tree_root->index_bits to optimize the height
75 of the heap-and-radix indexed tree. Since prio_tree_root->index_bits is
76 set according to the maximum end_vm_pgoff mapped, we are sure that all
77 bits (in vm_pgoff) above prio_tree_root->index_bits are 0 (zero). Therefore,
78 we only use the first prio_tree_root->index_bits as radix_index.
79 Whenever index_bits is increased in prio_tree_expand, we shuffle the tree
80 to make sure that the first prio_tree_root->index_bits levels of the tree
81 is indexed properly using heap and radix indices.
83 We do not optimize the height of overflow-sub-trees using index_bits.
84 The reason is: there can be many such overflow-sub-trees and all of
85 them have to be suffled whenever the index_bits increases. This may involve
86 walking the whole prio_tree in prio_tree_insert->prio_tree_expand code
87 path which is not desirable. Hence, we do not optimize the height of the
88 heap-and-size indexed overflow-sub-trees using prio_tree->index_bits.
89 Instead the overflow sub-trees are indexed using full BITS_PER_LONG bits
90 of size_index. This may lead to skewed sub-trees because most of the
91 higher significant bits of the size_index are likely to be be 0 (zero). In
92 the example above, all 3 overflow-sub-trees are skewed. This may marginally
93 affect the performance. However, processes rarely map many vmas with the
94 same start_vm_pgoff but different end_vm_pgoffs. Therefore, we normally
95 do not require overflow-sub-trees to index all vmas.
97 From the above discussion it is clear that the maximum height of
98 a prio_tree can be prio_tree_root->index_bits + BITS_PER_LONG.
99 However, in most of the common cases we do not need overflow-sub-trees,
100 so the tree height in the common cases will be prio_tree_root->index_bits.
102 It is fair to mention here that the prio_tree_root->index_bits
103 is increased on demand, however, the index_bits is not decreased when
104 vmas are removed from the prio_tree. That's tricky to do. Hence, it's
105 left as a home work problem.