view Documentation/arm/Interrupts @ 897:329ea0ccb344

2.5.2-rmk5
----------

This is the first kernel that contains a major shake up of some of the
major architecture-specific subsystems.

Firstly, it contains some pretty major changes to the way we handle the
MMU TLB.  Each MMU TLB variant is now handled completely separately -
we have TLB v3, TLB v4 (without write buffer), TLB v4 (with write buffer),
and finally TLB v4 (with write buffer, with I TLB invalidate entry).
There is more assembly code inside each of these functions, mainly to
allow more flexible TLB handling for the future.

Secondly, the IRQ subsystem.

The 2.5 kernels will have major changes to the way IRQs are handled.
Unfortunately, this means that any machine type that touches the
irq_desc[] array will break, which is basically every machine type
that we currently have.

Let's take an example.  On the Assabet with Neponset, we have:

                  GPIO25                 IRR:2
        SA1100 ------------> Neponset -----------> SA1111
                                         IRR:1
                                      -----------> USAR
                                         IRR:0
                                      -----------> SMC9196

The way stuff currently works, all SA1111 interrupts are mutually
exclusive of each other - if you're processing one interrupt from the
SA1111 and another comes in, you have to wait for that interrupt to
finish processing before you can service the new interrupt.  Eg, an
IDE PIO-based interrupt on the SA1111 excludes all other SA1111 and
SMC9196 interrupts until it has finished transferring its multi-sector
data, which can be a long time.  Note also that since we loop in the
SA1111 IRQ handler, SA1111 IRQs can hold off SMC9196 IRQs indefinitely.


The new approach brings several new ideas...

We introduce the concept of a "parent" and a "child".  For example,
to the Neponset handler, the "parent" is GPIO25, and the "children"
are SA1111, SMC9196 and USAR.

We also bring the idea of an IRQ "chip" (mainly to reduce the size of
the irqdesc array).  This doesn't have to be a real "IC"; indeed the
SA11x0 IRQs are handled by two separate "chip" structures, one for
GPIO0-10, and another for all the rest.  It is just a container for
the various operations (maybe this'll change to a better name).
This structure has the following operations:

struct irqchip {
	/*
	 * Acknowledge the IRQ.
	 * If this is a level-based IRQ, then it is expected to mask the IRQ
	 * as well.
	 */
	void (*ack)(unsigned int irq);
	/*
	 * Mask the IRQ in hardware.
	 */
	void (*mask)(unsigned int irq);
	/*
	 * Unmask the IRQ in hardware.
	 */
	void (*unmask)(unsigned int irq);
	/*
	 * Re-run the IRQ
	 */
	void (*rerun)(unsigned int irq);
	/*
	 * Set the type of the IRQ.
	 */
	int (*type)(unsigned int irq, unsigned int type);
};

ack       - required.  May be the same function as mask for IRQs
            handled by do_level_IRQ.
mask      - required.
unmask    - required.
rerun     - optional.  Not required if you're using do_level_IRQ for all
            IRQs that use this 'irqchip'.  Generally expected to re-trigger
            the hardware IRQ if possible.  If not, may call the handler
            directly.
type      - optional.  If you don't support changing the type of an IRQ,
            it should be null so people can detect if they are unable to
            set the IRQ type.
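
As a minimal sketch of how a machine might fill in these operations, assume
an imaginary interrupt controller with one enable register and one pending
register.  Both registers are simulated as plain variables so the example
compiles on its own, and every demo_* name is hypothetical rather than taken
from a real platform.  The structure is a local copy of the one shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Local copy of the structure described above, so the sketch compiles alone. */
struct irqchip {
	void (*ack)(unsigned int irq);
	void (*mask)(unsigned int irq);
	void (*unmask)(unsigned int irq);
	void (*rerun)(unsigned int irq);
	int  (*type)(unsigned int irq, unsigned int type);
};

#define DEMO_NR_IRQS 8u

/* Hypothetical controller registers, simulated as variables. */
static unsigned int demo_enable_reg;	/* bit set = IRQ unmasked */
static unsigned int demo_pending_reg;	/* bit set = IRQ latched  */

static void demo_mask(unsigned int irq)
{
	assert(irq < DEMO_NR_IRQS);
	demo_enable_reg &= ~(1u << irq);
}

static void demo_unmask(unsigned int irq)
{
	assert(irq < DEMO_NR_IRQS);
	demo_enable_reg |= 1u << irq;
}

/* Level-style ack: clear the latch and mask the source as well. */
static void demo_mask_ack(unsigned int irq)
{
	assert(irq < DEMO_NR_IRQS);
	demo_pending_reg &= ~(1u << irq);
	demo_mask(irq);
}

/*
 * rerun is left out because only level handling is assumed here, and
 * type stays NULL because this imaginary hardware cannot change the
 * trigger type.
 */
static struct irqchip demo_chip = {
	.ack	= demo_mask_ack,
	.mask	= demo_mask,
	.unmask	= demo_unmask,
	.rerun	= NULL,
	.type	= NULL,
};
```

Leaving type as NULL is what lets a caller detect that the trigger type
cannot be changed, as described in the list above.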

For each IRQ, we keep the following information:

        - "disable" depth (number of disable_irq()s without enable_irq()s)
        - flags indicating what we can do with this IRQ (valid, probe,
          noautoenable) as before
        - status of the IRQ (probing, enable, etc)
        - chip
        - per-IRQ handler
        - irqaction structure list

The handler can be one of the 3 standard handlers - "level", "edge" and
"simple", or your own specific handler if you need to do something special.

The "level" handler is what we currently have - it's pretty simple.
"edge" knows about the brokenness of such IRQ implementations - that you
need to leave the hardware IRQ enabled while processing it, and queue
further IRQ events should the IRQ happen again while processing.  The
"simple" handler is very basic, and does not perform any hardware
manipulation, nor state tracking.  This is useful for things like the
SMC9196 and USAR above.
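
The queueing behaviour of the "edge" handler can be sketched as follows.
This is an illustration of the idea only, not the kernel's implementation:
the state structure, the demo_* names and the demo_nested_edges hook (which
fakes a new edge arriving while the action is still running) are all
assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_irq_state {
	bool running;	/* the action is currently executing      */
	bool pending;	/* an edge arrived while we were running  */
	int  handled;	/* how many times the action has been run */
};

/* Hypothetical test hook: number of edges to inject mid-handler. */
static int demo_nested_edges;

static void demo_edge_irq(struct demo_irq_state *st);

static void demo_action(struct demo_irq_state *st)
{
	st->handled++;
	if (demo_nested_edges > 0) {
		demo_nested_edges--;
		/* The hardware IRQ stays enabled, so a new edge re-enters. */
		demo_edge_irq(st);
	}
}

static void demo_edge_irq(struct demo_irq_state *st)
{
	assert(st != NULL);

	if (st->running) {
		/* Already being serviced: just remember the event. */
		st->pending = true;
		return;
	}

	st->running = true;
	do {
		st->pending = false;
		demo_action(st);
	} while (st->pending);	/* replay anything queued meanwhile */
	st->running = false;
}
```

Each edge that arrives during processing costs one extra pass through the
action, so no event is lost even though the handler never nests.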

So, what's changed?

1. Machine implementations must not write to the irqdesc array.

2. New functions to manipulate the irqdesc array.  The first 4 are expected
   to be useful only to machine specific code.  The last is recommended to
   only be used by machine specific code, but may be used in drivers if
   absolutely necessary.

	set_irq_chip(irq,chip)

		Set the mask/unmask methods for handling this IRQ

	set_irq_handler(irq,handler)

		Set the handler for this IRQ (level, edge, simple)

	set_irq_chained_handler(irq,handler)

		Set a "chained" handler for this IRQ - automatically
		enables this IRQ (eg, Neponset and SA1111 handlers).

	set_irq_flags(irq,flags)

		Set the valid/probe/noautoenable flags.

	set_irq_type(irq,type)

		Set the active IRQ edge(s)/level.  This replaces the
		SA1111 INTPOL manipulation, and the set_GPIO_IRQ_edge()
		function.  Type should be one of the following:

		#define IRQT_NOEDGE	(0)
		#define IRQT_LOW	(__IRQT_LOWLVL)
		#define IRQT_HIGH	(__IRQT_HIGHLVL)
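
A machine's IRQ setup using these functions might look roughly like the
sketch below.  To keep it self-contained, the five set_irq_*() functions are
replaced by local stand-ins that merely record their arguments in a demo
table; they are not the kernel implementations.  The handler signature is
simplified, and the IRQ numbers, DEMO_* flag and type values and all demo_*
names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_IRQS	4u
#define DEMO_IRQF_VALID	(1u << 0)	/* assumed values, illustration only */
#define DEMO_IRQF_PROBE	(1u << 1)
#define DEMO_IRQT_HIGH	(1u << 3)	/* stand-in for IRQT_HIGH */

struct irqchip { int placeholder; };	/* contents irrelevant here */
typedef void (*demo_handler_t)(unsigned int irq);	/* simplified */

struct demo_irqdesc {
	struct irqchip	*chip;
	demo_handler_t	 handle;
	unsigned int	 flags;
	unsigned int	 type;
	int		 chained;
};

static struct demo_irqdesc demo_desc[DEMO_NR_IRQS];

/* Stand-ins for the functions listed above: they only record state. */
static void set_irq_chip(unsigned int irq, struct irqchip *chip)
{
	assert(irq < DEMO_NR_IRQS);
	demo_desc[irq].chip = chip;
}

static void set_irq_handler(unsigned int irq, demo_handler_t handler)
{
	assert(irq < DEMO_NR_IRQS);
	demo_desc[irq].handle = handler;
	demo_desc[irq].chained = 0;
}

static void set_irq_chained_handler(unsigned int irq, demo_handler_t handler)
{
	assert(irq < DEMO_NR_IRQS);
	demo_desc[irq].handle = handler;
	demo_desc[irq].chained = 1;	/* the real one also enables the IRQ */
}

static void set_irq_flags(unsigned int irq, unsigned int flags)
{
	assert(irq < DEMO_NR_IRQS);
	demo_desc[irq].flags = flags;
}

static int set_irq_type(unsigned int irq, unsigned int type)
{
	assert(irq < DEMO_NR_IRQS);
	demo_desc[irq].type = type;
	return 0;
}

static void demo_level_handler(unsigned int irq) { (void)irq; }
static void demo_demux_handler(unsigned int irq) { (void)irq; }
static struct irqchip demo_chip;

/* Hypothetical machine init: IRQ 0 cascades, IRQs 1..3 are ordinary. */
static void demo_init_irq(void)
{
	unsigned int irq;

	for (irq = 0; irq < DEMO_NR_IRQS; irq++) {
		set_irq_chip(irq, &demo_chip);
		set_irq_handler(irq, demo_level_handler);
		set_irq_flags(irq, DEMO_IRQF_VALID | DEMO_IRQF_PROBE);
	}

	/* IRQ 0 is the parent line of a cascaded controller. */
	set_irq_type(0, DEMO_IRQT_HIGH);
	set_irq_chained_handler(0, demo_demux_handler);
}
```

The point of the sketch is the call order: the machine code describes each
IRQ through the helper functions and never touches the descriptor table
directly, in line with item 1 above.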

3. set_GPIO_IRQ_edge() is obsolete, and should be replaced by set_irq_type.

4. Direct access to SA1111 INTPOL is deprecated.  Use set_irq_type instead.

5. A handler is expected to perform any necessary acknowledgement of the
   parent IRQ via the correct chip specific function.  For instance, if
   the SA1111 is directly connected to a SA1110 GPIO, then you should
   acknowledge the SA1110 IRQ each time you re-read the SA1111 IRQ status.

6. For any child which doesn't have its own IRQ enable/disable controls
   (eg, SMC9196), the handler must mask or acknowledge the parent IRQ
   while the child handler is called, and the child handler should be the
   "simple" handler (not "edge" nor "level").  After the handler completes,
   the parent IRQ should be unmasked, and the status of all children must
   be re-checked for pending events.  (see the Neponset IRQ handler for
   details).

7. fixup_irq() is gone, as is include/asm-arm/arch-*/irq.h
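
The rules in items 5 and 6 can be sketched as a small demultiplexing
handler.  This is not the Neponset handler itself: the status register,
the parent mask/ack helpers and all demo_* names and DEMO_* bit values are
simulated stand-ins, assumed purely so the example is self-contained.

```c
#include <assert.h>

#define DEMO_IRR_ETH	(1u << 0)	/* hypothetical child status bits */
#define DEMO_IRR_USAR	(1u << 1)

static unsigned int demo_irr;		/* simulated child status register */
static int demo_parent_masked;
static int demo_parent_acks;
static int demo_eth_runs, demo_usar_runs;

static void demo_parent_mask(void)   { demo_parent_masked = 1; }
static void demo_parent_unmask(void) { demo_parent_masked = 0; }
static void demo_parent_ack(void)    { demo_parent_acks++; }

/* "simple" child handlers: they do no hardware masking of their own. */
static void demo_eth_handler(void)
{
	assert(demo_parent_masked);	/* item 6: parent is held off */
	demo_eth_runs++;
	demo_irr &= ~DEMO_IRR_ETH;	/* device stops asserting */
}

static void demo_usar_handler(void)
{
	assert(demo_parent_masked);
	demo_usar_runs++;
	demo_irr &= ~DEMO_IRR_USAR;
}

static void demo_demux_handler(void)
{
	unsigned int status;

	for (;;) {
		/* Item 5: ack the parent each time the status is re-read. */
		demo_parent_ack();
		status = demo_irr;
		if (!status)
			break;

		/* Item 6: keep the parent masked while children run. */
		demo_parent_mask();
		if (status & DEMO_IRR_ETH)
			demo_eth_handler();
		if (status & DEMO_IRR_USAR)
			demo_usar_handler();
		demo_parent_unmask();
		/* Loop round to re-check all children for pending events. */
	}
}
```

With both status bits set on entry, each child handler runs once and the
loop exits on the second read of the status register, once nothing is left
pending.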

Please note that this will not solve all problems - some of them are
hardware based.  Mixing level-based and edge-based IRQs on the same
parent signal (eg neponset) is one such area where a software based
solution can't provide the full answer to low IRQ latency.