view Documentation/vm/numa @ 897:329ea0ccb344

balloon: try harder to balloon up under memory pressure.

Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However, it is possible that ballooning has in fact failed due to
memory pressure in the host, and it is therefore desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up, and therefore there is temporary memory
pressure while things stabilise. You would not expect a well-behaved
toolstack to ask a domain to balloon to more than its allocation, nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also, if we partially succeed in increasing the reservation
(i.e. receive fewer pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Fri Jun 05 14:01:20 2009 +0100 (2009-06-05)
parents 831230e53067
Started Nov 1999 by Kanoj Sarcar <kanoj@sgi.com>

The intent of this file is to have an up-to-date, running commentary
from different people about NUMA specific code in the Linux vm.

What is NUMA? It is an architecture where the memory access times
for different regions of memory from a given processor vary
according to the "distance" of the memory region from the processor.
Each region of memory to which access times are the same from any
cpu is called a node. On such architectures, it is beneficial if
the kernel tries to minimize inter-node communication. Schemes
for this range from replicating kernel text and read-only data
across nodes, to trying to house the data structures that key
components of the kernel need in memory on the local node.

Currently, all the NUMA support is to provide efficient handling
of widely discontiguous physical memory, so architectures which
are not NUMA but can have huge holes in the physical address space
can use the same code. All this code is bracketed by CONFIG_DISCONTIGMEM.

The initial port includes NUMAizing the bootmem allocator code by
encapsulating all the pieces of information into a bootmem_data_t
structure. Node-specific calls have been added to the allocator.
In theory, any platform which uses the bootmem allocator should
be able to put the bootmem and mem_map data structures anywhere
it deems best.

Each node's page allocation data structures have also been encapsulated
into a pg_data_t. The bootmem_data_t is just one part of this. To
make the code look uniform between NUMA and regular UMA platforms,
UMA platforms have a statically allocated pg_data_t too (contig_page_data).
For the sake of uniformity, the function num_online_nodes() is also defined
for all platforms. As we run benchmarks, we might decide to NUMAize
more variables like low_on_memory, nr_free_pages etc. into the pg_data_t.

The NUMA-aware page allocation code currently tries to allocate pages
from different nodes in a round-robin manner. This will be changed to
do a concentric-circle search, starting from the current node, once the
NUMA port achieves more maturity. The call alloc_pages_node has been
added so that drivers can make the call without worrying about whether
they are running on a NUMA or UMA platform.