view Documentation/block/deadline-iosched.txt @ 897:329ea0ccb344

balloon: try harder to balloon up under memory pressure.

Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also if we partially succeed in increasing the reservation
(i.e. receive fewer pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Fri Jun 05 14:01:20 2009 +0100 (2009-06-05)
parents 831230e53067
Deadline IO scheduler tunables
==============================

This little file attempts to document how the deadline io scheduler works.
In particular, it will clarify the meaning of the exposed tunables that may be
of interest to power users.

Each io queue has a set of io scheduler tunables associated with it. These
tunables control how the io scheduler works. You can find these entries
in:

/sys/block/<device>/queue/iosched

assuming that you have sysfs mounted on /sys. If you don't have sysfs mounted,
you can do so by typing:

# mount none /sys -t sysfs
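As a rough illustration of reading these entries from a program, the sketch
below collects every file in an iosched directory into a dictionary. The
function name read_tunables and the device name "sda" are assumptions made for
the example; only the sysfs path layout comes from the text above.

```python
import os


def read_tunables(iosched_dir):
    """Return {tunable name: contents} for each regular file in iosched_dir."""
    tunables = {}
    for name in sorted(os.listdir(iosched_dir)):
        path = os.path.join(iosched_dir, name)
        if os.path.isfile(path):
            with open(path) as f:
                tunables[name] = f.read().strip()
    return tunables


if __name__ == "__main__":
    # "sda" is a placeholder device name (an assumption); substitute your own.
    directory = "/sys/block/sda/queue/iosched"
    if os.path.isdir(directory):
        for name, value in read_tunables(directory).items():
            print(f"{name} = {value}")
    else:
        print(f"{directory} not found (is sysfs mounted, is the device name right?)")
```

Which files appear depends on the io scheduler that is active for the device.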
********************************************************************************


read_expire	(in ms)
-----------

The goal of the deadline io scheduler is to attempt to guarantee a start
service time for a request. As we focus mainly on read latencies, this is
tunable. When a read request first enters the io scheduler, it is assigned
a deadline that is the current time + the read_expire value in units of
milliseconds.
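The deadline arithmetic can be sketched as follows. This is a simplified
model, not the kernel code: it works directly in milliseconds, the function
names are hypothetical, and treating a request as expired once the current
time reaches its deadline is an assumption about the boundary case.

```python
def assign_deadline(now_ms, read_expire_ms):
    """Deadline for a read request that enters the scheduler at now_ms."""
    return now_ms + read_expire_ms


def has_expired(deadline_ms, now_ms):
    """True once the current time has reached the request's deadline."""
    return now_ms >= deadline_ms


if __name__ == "__main__":
    # 500 is an illustrative read_expire value, not a statement about defaults.
    deadline = assign_deadline(now_ms=1000, read_expire_ms=500)
    print(deadline)                      # 1500
    print(has_expired(deadline, 1200))   # False
    print(has_expired(deadline, 1500))   # True
```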


write_expire	(in ms)
------------

Similar to read_expire mentioned above, but for writes.


fifo_batch
----------

When a read request's deadline expires, we must move some requests from
the sorted io scheduler list to the block device dispatch queue. fifo_batch
controls how many requests we move, based on the cost of each request. A
request is classified as either a seek or a stream. The io scheduler knows
the last request that was serviced by the drive (or will be serviced right
before this one). See seek_cost and stream_unit.


writes_starved	(number of dispatches)
--------------

When we have to move requests from the io scheduler queue to the block
device dispatch queue, we always give a preference to reads. However, we
don't want to starve writes indefinitely either. So writes_starved controls
how many times we give preference to reads over writes. When that has been
done writes_starved number of times, we dispatch some writes based on the
same criteria as reads.
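The read preference described above can be modelled with a small counter. The
sketch below is an illustration under assumptions, not the scheduler's code:
the function name pick_direction and the choice to count a read as "starving"
writes only while writes are actually pending are both hypothetical.

```python
def pick_direction(have_reads, have_writes, starved, writes_starved):
    """Return (direction, new_starved) for the next dispatch.

    starved counts how many times reads have been preferred over
    pending writes since the last write dispatch.
    """
    if have_reads:
        if have_writes and starved >= writes_starved:
            return "write", 0          # writes have waited long enough
        if have_writes:
            starved += 1               # a read was preferred over a waiting write
        return "read", starved
    if have_writes:
        return "write", 0
    return None, starved               # nothing to dispatch


if __name__ == "__main__":
    starved = 0
    order = []
    for _ in range(6):
        direction, starved = pick_direction(True, True, starved, writes_starved=2)
        order.append(direction)
    print(order)  # ['read', 'read', 'write', 'read', 'read', 'write']
```

With writes_starved set to 2 and both kinds of request always pending, reads
are preferred twice and then one round of writes is dispatched.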


front_merges	(bool)
------------

Sometimes it happens that a request enters the io scheduler that is contiguous
with a request that is already on the queue. Either it fits in the back of that
request, or it fits at the front. That is called either a back merge candidate
or a front merge candidate. Due to the way files are typically laid out,
back merges are much more common than front merges. For some workloads, you
may even know that it is a waste of time to spend any time attempting to
front merge requests. Setting front_merges to 0 disables this functionality.
Front merges may still occur due to the cached last_merge hint, but since
that comes at basically 0 cost we leave that on. We simply disable the
rbtree front sector lookup when the io scheduler merge function is called.
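To make the two kinds of merge candidate concrete, here is a minimal sketch
that classifies a new request against one queued request by sector range. The
function name and its parameters are hypothetical, and the sketch deliberately
ignores the cached last_merge hint mentioned above, so with front_merges
disabled it never reports a front merge.

```python
def merge_candidate(queued_start, queued_sectors, new_start, new_sectors,
                    front_merges=True):
    """Classify a new request relative to a queued one.

    Returns "back" if the new request starts where the queued one ends,
    "front" if it ends where the queued one starts (and front merges are
    enabled), otherwise None.
    """
    queued_end = queued_start + queued_sectors
    new_end = new_start + new_sectors
    if new_start == queued_end:
        return "back"
    if front_merges and new_end == queued_start:
        return "front"
    return None


if __name__ == "__main__":
    # Queued request covers sectors 100..107 in each case.
    print(merge_candidate(100, 8, 108, 8))                     # back
    print(merge_candidate(100, 8, 92, 8))                      # front
    print(merge_candidate(100, 8, 92, 8, front_merges=False))  # None
```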


Nov 11 2002, Jens Axboe <axboe@suse.de>