ia64/linux-2.6.18-xen.hg

view Documentation/IO-mapping.txt @ 897:329ea0ccb344

balloon: try harder to balloon up under memory pressure.

Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also if we partially succeed in increasing the reservation
(i.e. receive less pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Fri Jun 05 14:01:20 2009 +0100 (2009-06-05)
parents 831230e53067

[ NOTE: The virt_to_bus() and bus_to_virt() functions have been
	superseded by the functionality provided by the PCI DMA
	interface (see Documentation/DMA-mapping.txt).  They continue
	to be documented below for historical purposes, but new code
	must not use them. --davidm 00/12/12 ]

[ This is a mail message in response to a query on IO mapping, thus the
  strange format for a "document" ]

The AHA-1542 is a bus-master device, and your patch makes the driver give the
controller the physical address of the buffers, which is correct on x86
(because all bus master devices see the physical memory mappings directly).

However, on many setups, there are actually _three_ different ways of looking
at memory addresses, and in this case we actually want the third, the
so-called "bus address".

Essentially, the three ways of addressing memory are (this is "real memory",
that is, normal RAM--see later about other details):

 - CPU untranslated.  This is the "physical" address.  Physical address
   0 is what the CPU sees when it drives zeroes on the memory bus.

 - CPU translated address. This is the "virtual" address, and is
   completely internal to the CPU itself with the CPU doing the appropriate
   translations into "CPU untranslated".

 - bus address. This is the address of memory as seen by OTHER devices,
   not the CPU. Now, in theory there could be many different bus
   addresses, with each device seeing memory in some device-specific way, but
   happily most hardware designers aren't actually actively trying to make
   things any more complex than necessary, so you can assume that all
   external hardware sees the memory the same way.

Now, on normal PCs the bus address is exactly the same as the physical
address, and things are very simple indeed. However, they are that simple
because the memory and the devices share the same address space, and that is
not necessarily true on other PCI/ISA setups.

Now, just as an example, on the PReP (PowerPC Reference Platform), the
CPU sees a memory map something like this (this is from memory):

	0-2 GB		"real memory"
	2 GB-3 GB	"system IO" (inb/outb and similar accesses on x86)
	3 GB-4 GB	"IO memory" (shared memory over the IO bus)

Now, that looks simple enough. However, when you look at the same thing from
the viewpoint of the devices, you have the reverse, and the physical memory
address 0 actually shows up as address 2 GB for any IO master.

So when the CPU wants any bus master to write to physical memory 0, it
has to give the master address 0x80000000 as the memory address.

So, for example, depending on how the kernel is actually mapped on the
PPC, you can end up with a setup like this:

 physical address:	0
 virtual address:	0xC0000000
 bus address:		0x80000000

where all the addresses actually point to the same thing.  It's just seen
through different translations.

Similarly, on the Alpha, the normal translation is

 physical address:	0
 virtual address:	0xfffffc0000000000
 bus address:		0x40000000

(but there are also Alphas where the physical address and the bus address
are the same).

Anyway, to look up all these translations, you do

	#include <asm/io.h>

	phys_addr = virt_to_phys(virt_addr);
	virt_addr = phys_to_virt(phys_addr);
	 bus_addr = virt_to_bus(virt_addr);
	virt_addr = bus_to_virt(bus_addr);

Now, when do you need these?

You want the _virtual_ address when you are actually going to access that
pointer from the kernel. So you can have something like this:

	/*
	 * this is the hardware "mailbox" we use to communicate with
	 * the controller. The controller sees this directly.
	 */
	struct mailbox {
		__u32 status;
		__u32 bufstart;
		__u32 buflen;
		..
	} mbox;

		unsigned char * retbuffer;

		/* get the address from the controller */
		retbuffer = bus_to_virt(mbox.bufstart);
		switch (retbuffer[0]) {
			case STATUS_OK:
				...

On the other hand, you want the bus address when you have a buffer that
you want to give to the controller:

	/* ask the controller to read the sense status into "sense_buffer" */
	mbox.bufstart = virt_to_bus(&sense_buffer);
	mbox.buflen = sizeof(sense_buffer);
	mbox.status = 0;
	notify_controller(&mbox);

And you generally _never_ want to use the physical address, because you can't
use that from the CPU (the CPU only uses translated virtual addresses), and
you can't use it from the bus master.

So why do we care about the physical address at all? We do need the physical
address in some cases; it's just not needed very often in normal code.  The
physical address is needed if you use memory mappings, for example, because
the "remap_pfn_range()" mm function wants the physical address of the memory
to be remapped as measured in units of pages, a.k.a. the pfn (the memory
management layer doesn't know about devices outside the CPU, so it
shouldn't need to know about "bus addresses" etc).

NOTE NOTE NOTE! The above is only one part of the whole equation. The above
only talks about "real memory", that is, CPU memory (RAM).

There is a completely different type of memory too, and that's the "shared
memory" on the PCI or ISA bus. That's generally not RAM (although in the case
of a video graphics card it can be normal DRAM that is just used for a frame
buffer), but can be things like a packet buffer in a network card etc.

This memory is called "PCI memory" or "shared memory" or "IO memory" or
whatever, and there is only one way to access it: the readb/writeb and
related functions. You should never take the address of such memory, because
there is really nothing you can do with such an address: it's not
conceptually in the same memory space as "real memory" at all, so you cannot
just dereference a pointer. (Sadly, on x86 it _is_ in the same memory space,
so on x86 it actually works to just dereference a pointer, but it's not
portable).
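
One way to picture the "accessors only" rule is a handle type that cannot be
dereferenced at all. The sketch below is a user-space model, not how the
kernel implements readb/writeb: the device window is simulated with an
ordinary array, the handle is a plain cookie rather than a pointer, and every
model_* name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_WINDOW_SIZE 256

/* Simulated device window standing in for memory on the IO bus. */
static volatile uint8_t model_window[MODEL_WINDOW_SIZE];

/* A cookie, not a pointer: there is nothing here the caller can dereference. */
typedef struct {
	size_t offset;
} model_iomem_t;

/* Hand out a cookie for a location in the window (caller keeps it in range). */
model_iomem_t model_ioremap(size_t offset)
{
	model_iomem_t handle = { offset };
	return handle;
}

/* Move a cookie forward by delta bytes, like pointer arithmetic on a mapping. */
model_iomem_t model_io_offset(model_iomem_t handle, size_t delta)
{
	handle.offset += delta;
	return handle;
}

/* The only ways to touch the window: explicit byte accessors. */
uint8_t model_readb(model_iomem_t handle)
{
	return model_window[handle.offset];
}

void model_writeb(uint8_t value, model_iomem_t handle)
{
	model_window[handle.offset] = value;
}
```

Because all access funnels through the two accessors, a port to a platform
where bus memory needs special instructions would only have to change those
two functions, which is the portability argument made above.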

For such memory, you can do things like

 - reading:
	/*
	 * read first 32 bits from ISA memory at 0xC0000, aka
	 * C000:0000 in DOS terms
	 */
	unsigned int signature = isa_readl(0xC0000);

 - remapping and writing:
	/*
	 * remap framebuffer PCI memory area at 0xFC000000,
	 * size 1MB, so that we can access it: We can directly
	 * access only the 640k-1MB area, so anything else
	 * has to be remapped.
	 */
	void __iomem *baseptr = ioremap(0xFC000000, 1024*1024);

	/* write an 'A' at offset 10 of the area */
	writeb('A',baseptr+10);

	/* unmap when we unload the driver */
	iounmap(baseptr);

 - copying and clearing:
	/* get the 6-byte Ethernet address at ISA address E000:0040 */
	memcpy_fromio(kernel_buffer, 0xE0040, 6);
	/* write a packet to the driver */
	memcpy_toio(0xE1000, skb->data, skb->len);
	/* clear the frame buffer */
	memset_io(0xA0000, 0, 0x10000);
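
The comments above mix linear ISA addresses (0xC0000) with DOS-style
segment:offset notation (C000:0000, E000:0040). The conversion is the usual
real-mode rule, segment times 16 plus offset, sketched here in user-space C.
The function name is made up for illustration.

```c
#include <stdint.h>

/* Real-mode segment:offset -> linear address (segment * 16 + offset). */
uint32_t model_isa_linear(uint16_t segment, uint16_t offset)
{
	return ((uint32_t)segment << 4) + offset;
}
```

So C000:0000 is 0xC0000 and E000:0040 is 0xE0040, the values passed to the
accessors in the examples above.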

OK, that just about covers the basics of accessing IO portably.  Questions?
Comments? You may think that all the above is overly complex, but one day you
might find yourself with a 500 MHz Alpha in front of you, and then you'll be
happy that your driver works ;)

Note that kernel versions 2.0.x (and earlier) mistakenly called the
ioremap() function "vremap()".  ioremap() is the proper name, but I
didn't think straight when I wrote it originally.  People who have to
support both can do something like:

	/* support old naming silliness */
	#if LINUX_VERSION_CODE < 0x020100
	#define ioremap vremap
	#define iounmap vfree
	#endif

at the top of their source files, and then they can use the right names
even on 2.0.x systems.
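
The constant 0x020100 in that check packs a version number into one integer:
major in bits 16 and up, minor in bits 8-15, patch level in the low byte, so
0x020100 is 2.1.0. The user-space sketch below repeats that arithmetic under a
different name so it can be compiled outside the kernel; MODEL_KERNEL_VERSION
and model_needs_vremap are illustrative names, not kernel symbols.

```c
/* Pack major.minor.patch the way the 0x020100 constant above is laid out. */
#define MODEL_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Nonzero when the given version code predates 2.1.0 (the vremap() era). */
int model_needs_vremap(unsigned long version_code)
{
	return version_code < MODEL_KERNEL_VERSION(2, 1, 0);
}
```

Every 2.0.x version code falls below 0x020100, so the compatibility defines
would apply there and nowhere later.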

And the above sounds worse than it really is.  Most real drivers really
don't do anything all that complex (or rather: the complexity is not so
much in the actual IO accesses as in error handling and timeouts etc).
It's generally not hard to fix drivers, and in many cases the code
actually looks better afterwards:

	unsigned long signature = *(unsigned int *) 0xC0000;
		vs
	unsigned long signature = readl(0xC0000);

I think the second version actually is more readable, no?

		Linus