view Documentation/networking/phy.txt @ 897:329ea0ccb344

balloon: try harder to balloon up under memory pressure.

Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also if we partially succeed in increasing the reservation
(i.e. receive less pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Fri Jun 05 14:01:20 2009 +0100 (2009-06-05)
parents 831230e53067
line source
2 -------
3 PHY Abstraction Layer
4 (Updated 2005-07-21)
6 Purpose
8 Most network devices consist of set of registers which provide an interface
9 to a MAC layer, which communicates with the physical connection through a
10 PHY. The PHY concerns itself with negotiating link parameters with the link
11 partner on the other side of the network connection (typically, an ethernet
12 cable), and provides a register interface to allow drivers to determine what
13 settings were chosen, and to configure what settings are allowed.
15 While these devices are distinct from the network devices, and conform to a
16 standard layout for the registers, it has been common practice to integrate
17 the PHY management code with the network driver. This has resulted in large
18 amounts of redundant code. Also, on embedded systems with multiple (and
19 sometimes quite different) ethernet controllers connected to the same
20 management bus, it is difficult to ensure safe use of the bus.
22 Since the PHYs are devices, and the management busses through which they are
23 accessed are, in fact, busses, the PHY Abstraction Layer treats them as such.
24 In doing so, it has these goals:
26 1) Increase code-reuse
27 2) Increase overall code-maintainability
28 3) Speed development time for new network drivers, and for new systems
30 Basically, this layer is meant to provide an interface to PHY devices which
31 allows network driver writers to write as little code as possible, while
32 still providing a full feature set.
34 The MDIO bus
36 Most network devices are connected to a PHY by means of a management bus.
37 Different devices use different busses (though some share common interfaces).
38 In order to take advantage of the PAL, each bus interface needs to be
39 registered as a distinct device.
41 1) read and write functions must be implemented. Their prototypes are:
43 int write(struct mii_bus *bus, int mii_id, int regnum, u16 value);
44 int read(struct mii_bus *bus, int mii_id, int regnum);
46 mii_id is the address on the bus for the PHY, and regnum is the register
47 number. These functions are guaranteed not to be called from interrupt
48 time, so it is safe for them to block, waiting for an interrupt to signal
49 the operation is complete
51 2) A reset function is necessary. This is used to return the bus to an
52 initialized state.
54 3) A probe function is needed. This function should set up anything the bus
55 driver needs, setup the mii_bus structure, and register with the PAL using
56 mdiobus_register. Similarly, there's a remove function to undo all of
57 that (use mdiobus_unregister).
59 4) Like any driver, the device_driver structure must be configured, and init
60 exit functions are used to register the driver.
62 5) The bus must also be declared somewhere as a device, and registered.
64 As an example for how one driver implemented an mdio bus driver, see
65 drivers/net/gianfar_mii.c and arch/ppc/syslib/mpc85xx_devices.c
67 Connecting to a PHY
69 Sometime during startup, the network driver needs to establish a connection
70 between the PHY device, and the network device. At this time, the PHY's bus
71 and drivers need to all have been loaded, so it is ready for the connection.
72 At this point, there are several ways to connect to the PHY:
74 1) The PAL handles everything, and only calls the network driver when
75 the link state changes, so it can react.
77 2) The PAL handles everything except interrupts (usually because the
78 controller has the interrupt registers).
80 3) The PAL handles everything, but checks in with the driver every second,
81 allowing the network driver to react first to any changes before the PAL
82 does.
84 4) The PAL serves only as a library of functions, with the network device
85 manually calling functions to update status, and configure the PHY
88 Letting the PHY Abstraction Layer do Everything
90 If you choose option 1 (The hope is that every driver can, but to still be
91 useful to drivers that can't), connecting to the PHY is simple:
93 First, you need a function to react to changes in the link state. This
94 function follows this protocol:
96 static void adjust_link(struct net_device *dev);
98 Next, you need to know the device name of the PHY connected to this device.
99 The name will look something like, "phy0:0", where the first number is the
100 bus id, and the second is the PHY's address on that bus.
102 Now, to connect, just call this function:
104 phydev = phy_connect(dev, phy_name, &adjust_link, flags);
106 phydev is a pointer to the phy_device structure which represents the PHY. If
107 phy_connect is successful, it will return the pointer. dev, here, is the
108 pointer to your net_device. Once done, this function will have started the
109 PHY's software state machine, and registered for the PHY's interrupt, if it
110 has one. The phydev structure will be populated with information about the
111 current state, though the PHY will not yet be truly operational at this
112 point.
114 flags is a u32 which can optionally contain phy-specific flags.
115 This is useful if the system has put hardware restrictions on
116 the PHY/controller, of which the PHY needs to be aware.
118 Now just make sure that phydev->supported and phydev->advertising have any
119 values pruned from them which don't make sense for your controller (a 10/100
120 controller may be connected to a gigabit capable PHY, so you would need to
121 mask off SUPPORTED_1000baseT*). See include/linux/ethtool.h for definitions
122 for these bitfields. Note that you should not SET any bits, or the PHY may
123 get put into an unsupported state.
125 Lastly, once the controller is ready to handle network traffic, you call
126 phy_start(phydev). This tells the PAL that you are ready, and configures the
127 PHY to connect to the network. If you want to handle your own interrupts,
128 just set phydev->irq to PHY_IGNORE_INTERRUPT before you call phy_start.
129 Similarly, if you don't want to use interrupts, set phydev->irq to PHY_POLL.
131 When you want to disconnect from the network (even if just briefly), you call
132 phy_stop(phydev).
134 Keeping Close Tabs on the PAL
136 It is possible that the PAL's built-in state machine needs a little help to
137 keep your network device and the PHY properly in sync. If so, you can
138 register a helper function when connecting to the PHY, which will be called
139 every second before the state machine reacts to any changes. To do this, you
140 need to manually call phy_attach() and phy_prepare_link(), and then call
141 phy_start_machine() with the second argument set to point to your special
142 handler.
144 Currently there are no examples of how to use this functionality, and testing
145 on it has been limited because the author does not have any drivers which use
146 it (they all use option 1). So Caveat Emptor.
148 Doing it all yourself
150 There's a remote chance that the PAL's built-in state machine cannot track
151 the complex interactions between the PHY and your network device. If this is
152 so, you can simply call phy_attach(), and not call phy_start_machine or
153 phy_prepare_link(). This will mean that phydev->state is entirely yours to
154 handle (phy_start and phy_stop toggle between some of the states, so you
155 might need to avoid them).
157 An effort has been made to make sure that useful functionality can be
158 accessed without the state-machine running, and most of these functions are
159 descended from functions which did not interact with a complex state-machine.
160 However, again, no effort has been made so far to test running without the
161 state machine, so tryer beware.
163 Here is a brief rundown of the functions:
165 int phy_read(struct phy_device *phydev, u16 regnum);
166 int phy_write(struct phy_device *phydev, u16 regnum, u16 val);
168 Simple read/write primitives. They invoke the bus's read/write function
169 pointers.
171 void phy_print_status(struct phy_device *phydev);
173 A convenience function to print out the PHY status neatly.
175 int phy_clear_interrupt(struct phy_device *phydev);
176 int phy_config_interrupt(struct phy_device *phydev, u32 interrupts);
178 Clear the PHY's interrupt, and configure which ones are allowed,
179 respectively. Currently only supports all on, or all off.
181 int phy_enable_interrupts(struct phy_device *phydev);
182 int phy_disable_interrupts(struct phy_device *phydev);
184 Functions which enable/disable PHY interrupts, clearing them
185 before and after, respectively.
187 int phy_start_interrupts(struct phy_device *phydev);
188 int phy_stop_interrupts(struct phy_device *phydev);
190 Requests the IRQ for the PHY interrupts, then enables them for
191 start, or disables then frees them for stop.
193 struct phy_device * phy_attach(struct net_device *dev, const char *phy_id,
194 u32 flags);
196 Attaches a network device to a particular PHY, binding the PHY to a generic
197 driver if none was found during bus initialization. Passes in
198 any phy-specific flags as needed.
200 int phy_start_aneg(struct phy_device *phydev);
202 Using variables inside the phydev structure, either configures advertising
203 and resets autonegotiation, or disables autonegotiation, and configures
204 forced settings.
206 static inline int phy_read_status(struct phy_device *phydev);
208 Fills the phydev structure with up-to-date information about the current
209 settings in the PHY.
211 void phy_sanitize_settings(struct phy_device *phydev)
213 Resolves differences between currently desired settings, and
214 supported settings for the given PHY device. Does not make
215 the changes in the hardware, though.
217 int phy_ethtool_sset(struct phy_device *phydev, struct ethtool_cmd *cmd);
218 int phy_ethtool_gset(struct phy_device *phydev, struct ethtool_cmd *cmd);
220 Ethtool convenience functions.
222 int phy_mii_ioctl(struct phy_device *phydev,
223 struct mii_ioctl_data *mii_data, int cmd);
225 The MII ioctl. Note that this function will completely screw up the state
226 machine if you write registers like BMCR, BMSR, ADVERTISE, etc. Best to
227 use this only to write registers which are not standard, and don't set off
228 a renegotiation.
231 PHY Device Drivers
233 With the PHY Abstraction Layer, adding support for new PHYs is
234 quite easy. In some cases, no work is required at all! However,
235 many PHYs require a little hand-holding to get up-and-running.
237 Generic PHY driver
239 If the desired PHY doesn't have any errata, quirks, or special
240 features you want to support, then it may be best to not add
241 support, and let the PHY Abstraction Layer's Generic PHY Driver
242 do all of the work.
244 Writing a PHY driver
246 If you do need to write a PHY driver, the first thing to do is
247 make sure it can be matched with an appropriate PHY device.
248 This is done during bus initialization by reading the device's
249 UID (stored in registers 2 and 3), then comparing it to each
250 driver's phy_id field by ANDing it with each driver's
251 phy_id_mask field. Also, it needs a name. Here's an example:
253 static struct phy_driver dm9161_driver = {
254 .phy_id = 0x0181b880,
255 .name = "Davicom DM9161E",
256 .phy_id_mask = 0x0ffffff0,
257 ...
258 }
260 Next, you need to specify what features (speed, duplex, autoneg,
261 etc) your PHY device and driver support. Most PHYs support
262 PHY_BASIC_FEATURES, but you can look in include/mii.h for other
263 features.
265 Each driver consists of a number of function pointers:
267 config_init: configures PHY into a sane state after a reset.
268 For instance, a Davicom PHY requires descrambling disabled.
269 probe: Does any setup needed by the driver
270 suspend/resume: power management
271 config_aneg: Changes the speed/duplex/negotiation settings
272 read_status: Reads the current speed/duplex/negotiation settings
273 ack_interrupt: Clear a pending interrupt
274 config_intr: Enable or disable interrupts
275 remove: Does any driver take-down
277 Of these, only config_aneg and read_status are required to be
278 assigned by the driver code. The rest are optional. Also, it is
279 preferred to use the generic phy driver's versions of these two
280 functions if at all possible: genphy_read_status and
281 genphy_config_aneg. If this is not possible, it is likely that
282 you only need to perform some actions before and after invoking
283 these functions, and so your functions will wrap the generic
284 ones.
286 Feel free to look at the Marvell, Cicada, and Davicom drivers in
287 drivers/net/phy/ for examples (the lxt and qsemi drivers have
288 not been tested as of this writing)