view Documentation/networking/operstates.txt @ 897:329ea0ccb344

balloon: try harder to balloon up under memory pressure.

Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also if we partially succeed in increasing the reservation
(i.e. receive less pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Fri Jun 05 14:01:20 2009 +0100 (2009-06-05)
parents 831230e53067
1. Introduction

Linux distinguishes between administrative and operational state of an
interface. Administrative state is the result of "ip link set dev
<dev> up or down" and reflects whether the administrator wants to use
the device for traffic.

However, an interface is not usable just because the admin enabled it:
Ethernet needs to be plugged into the switch and, depending on a site's
networking policy and configuration, an 802.1X authentication may have
to be performed before user data can be transferred. Operational state
shows the ability of an interface to transmit this user data.

Because of 802.1X, userspace must be able to influence operational
state. To accommodate this, operational state is split into two parts:
two flags that can be set by the driver only, and an RFC2863 compatible
state that is derived from these flags and a policy, and that can be
changed from userspace under certain rules.


2. Querying from userspace

Both admin and operational state can be queried via the netlink
operation RTM_GETLINK. It is also possible to subscribe to RTMGRP_LINK
to be notified of updates. This is important when setting state from
userspace.

These values contain interface state:

ifinfomsg::if_flags & IFF_UP:
    Interface is admin up
ifinfomsg::if_flags & IFF_RUNNING:
    Interface is in RFC2863 operational state UP or UNKNOWN. This is for
    backward compatibility; routing daemons and DHCP clients can use
    this flag to determine whether they should use the interface.
ifinfomsg::if_flags & IFF_LOWER_UP:
    Driver has signaled netif_carrier_on()
ifinfomsg::if_flags & IFF_DORMANT:
    Driver has signaled netif_dormant_on()

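The four flag tests above can be sketched as a small decoder. This is only an illustration: the EX_ constants are local names whose values are assumed to match IFF_UP, IFF_RUNNING, IFF_LOWER_UP and IFF_DORMANT in <linux/if.h>, and describe_if_flags() is a hypothetical helper, not a kernel or libc function.

```c
#include <stdio.h>

/* Local copies of the flag bits discussed above. The values are assumed
 * to match <linux/if.h>; the EX_ names exist only in this sketch so that
 * it builds without kernel headers. */
#define EX_IFF_UP        0x1u
#define EX_IFF_RUNNING   0x40u
#define EX_IFF_LOWER_UP  0x10000u
#define EX_IFF_DORMANT   0x20000u

/* Hypothetical helper: write a one-line summary of the four state bits
 * found in an ifinfomsg::if_flags word into buf. Returns what
 * snprintf() returns. */
static int describe_if_flags(unsigned int flags, char *buf, size_t len)
{
    return snprintf(buf, len,
                    "admin=%s running=%s lower_up=%s dormant=%s",
                    (flags & EX_IFF_UP)       ? "up"  : "down",
                    (flags & EX_IFF_RUNNING)  ? "yes" : "no",
                    (flags & EX_IFF_LOWER_UP) ? "yes" : "no",
                    (flags & EX_IFF_DORMANT)  ? "yes" : "no");
}
```

A caller would pass the if_flags value taken from an RTM_GETLINK reply or an RTMGRP_LINK notification.
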
These interface flags can also be queried without netlink using the
SIOCGIFFLAGS ioctl.

TLV IFLA_OPERSTATE

contains RFC2863 state of the interface in numeric representation:

IF_OPER_UNKNOWN (0):
    Interface is in unknown state; neither driver nor userspace has set
    operational state. Interface must be considered for user data as
    setting operational state has not been implemented in every driver.
IF_OPER_NOTPRESENT (1):
    Unused in current kernel (notpresent interfaces normally disappear),
    just a numerical placeholder.
IF_OPER_DOWN (2):
    Interface is unable to transfer data on L1, e.g. ethernet is not
    plugged or interface is ADMIN down.
IF_OPER_LOWERLAYERDOWN (3):
    Interfaces stacked on an interface that is IF_OPER_DOWN show this
    state (e.g. VLAN).
IF_OPER_TESTING (4):
    Unused in current kernel.
IF_OPER_DORMANT (5):
    Interface is L1 up, but waiting for an external event, e.g. for a
    protocol to establish (802.1X).
IF_OPER_UP (6):
    Interface is operational up and can be used.

This TLV can also be queried via sysfs.

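As an illustration of the numeric representation, the sketch below maps an operstate value to a printable name and applies the IFF_RUNNING rule (UP or UNKNOWN means usable). The numbering 0 to 6 is assumed to match the IF_OPER_* constants in <linux/if.h>; the lowercase names and both function names are choices made for this example only.

```c
#include <stddef.h>

/* Assumed numbering of the RFC2863 states, mirroring IF_OPER_* in
 * <linux/if.h>. The EX_ names are local to this sketch. */
enum ex_operstate {
    EX_OPER_UNKNOWN        = 0,
    EX_OPER_NOTPRESENT     = 1,
    EX_OPER_DOWN           = 2,
    EX_OPER_LOWERLAYERDOWN = 3,
    EX_OPER_TESTING        = 4,
    EX_OPER_DORMANT        = 5,
    EX_OPER_UP             = 6
};

/* Return a printable name for a numeric operstate, or NULL if the
 * value is outside the range 0..6. */
static const char *ex_operstate_name(unsigned int state)
{
    static const char *const names[] = {
        "unknown", "notpresent", "down", "lowerlayerdown",
        "testing", "dormant", "up"
    };

    if (state >= sizeof(names) / sizeof(names[0]))
        return NULL;
    return names[state];
}

/* An interface should be considered for user data when its state is
 * UP or UNKNOWN, which is the condition IFF_RUNNING reflects. */
static int ex_operstate_usable(unsigned int state)
{
    return state == EX_OPER_UP || state == EX_OPER_UNKNOWN;
}
```
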
TLV IFLA_LINKMODE

contains link policy. This is needed for userspace interaction
described below.

This TLV can also be queried via sysfs.


3. Kernel driver API

Kernel drivers have access to two flags that map to IFF_LOWER_UP and
IFF_DORMANT. These flags can be set from anywhere, even from interrupt
context. It is guaranteed that only the driver has write access;
however, if different layers of the driver manipulate the same flag,
the driver has to provide the synchronisation needed.

__LINK_STATE_NOCARRIER, maps to !IFF_LOWER_UP:

The driver uses netif_carrier_on() to clear and netif_carrier_off() to
set this flag. On netif_carrier_off(), the scheduler stops sending
packets. The name 'carrier' and the inversion are historical; think of
it as lower layer.

netif_carrier_ok() can be used to query that bit.

__LINK_STATE_DORMANT, maps to IFF_DORMANT:

Set by the driver to express that the device cannot yet be used
because some driver controlled protocol establishment has to
complete. Corresponding functions are netif_dormant_on() to set the
flag, netif_dormant_off() to clear it and netif_dormant() to query.

On device allocation, the networking core sets the flags equivalent to
netif_carrier_ok() and !netif_dormant().


Whenever the driver CHANGES one of these flags, a workqueue event is
scheduled to translate the flag combination to IFLA_OPERSTATE as
follows:

!netif_carrier_ok():
    IF_OPER_LOWERLAYERDOWN if the interface is stacked, IF_OPER_DOWN
    otherwise. Kernel can recognise stacked interfaces because their
    ifindex != iflink.

netif_carrier_ok() && netif_dormant():
    IF_OPER_DORMANT

netif_carrier_ok() && !netif_dormant():
    IF_OPER_UP if userspace interaction is disabled. Otherwise
    IF_OPER_DORMANT with the possibility for userspace to initiate the
    IF_OPER_UP transition afterwards.

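The translation rules above can be modelled in userspace as a pure function. This is a sketch for reasoning about the mapping, not the kernel's implementation: struct ex_link and ex_derive_operstate() are hypothetical, the numeric states are assumed to match IF_OPER_*, carrier plus dormant is assumed to yield IF_OPER_DORMANT, and link_mode == 1 is assumed to mean that userspace interaction is enabled (IFLA_LINKMODE set to 1).

```c
/* Assumed numeric values of the states used by the mapping. */
enum {
    EX_OPER_DOWN           = 2,
    EX_OPER_LOWERLAYERDOWN = 3,
    EX_OPER_DORMANT        = 5,
    EX_OPER_UP             = 6
};

/* Hypothetical userspace model of the inputs the kernel looks at. */
struct ex_link {
    int carrier_ok;  /* models netif_carrier_ok() */
    int dormant;     /* models netif_dormant() */
    int ifindex;     /* index of this interface */
    int iflink;      /* index of the lower interface, if stacked */
    int link_mode;   /* models IFLA_LINKMODE: 0 = default, 1 = dormant */
};

/* Derive the operational state from the driver flags and link policy. */
static int ex_derive_operstate(const struct ex_link *l)
{
    if (!l->carrier_ok)
        /* Stacked interfaces are recognised by ifindex != iflink. */
        return l->ifindex != l->iflink ? EX_OPER_LOWERLAYERDOWN
                                       : EX_OPER_DOWN;
    if (l->dormant)
        return EX_OPER_DORMANT;
    /* Carrier is up and the driver is not dormant: the policy decides
     * whether userspace must confirm before the link counts as up. */
    return l->link_mode == 1 ? EX_OPER_DORMANT : EX_OPER_UP;
}
```

With link_mode set to 1 the function never returns EX_OPER_UP on its own, which models the case where only userspace can complete the transition.
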

4. Setting from userspace

Applications have to use the netlink interface to influence the
RFC2863 operational state of an interface. Setting IFLA_LINKMODE to 1
via RTM_SETLINK instructs the kernel that an interface should go to
IF_OPER_DORMANT instead of IF_OPER_UP when the combination
netif_carrier_ok() && !netif_dormant() is set by the
driver. Afterwards, the userspace application can set IFLA_OPERSTATE
to IF_OPER_DORMANT or IF_OPER_UP as long as the driver does not set
netif_carrier_off() or netif_dormant_on(). Changes made by userspace
are multicast on the netlink group RTMGRP_LINK.

So basically an 802.1X supplicant interacts with the kernel like this:

-subscribe to RTMGRP_LINK
-set IFLA_LINKMODE to 1 via RTM_SETLINK
-query RTM_GETLINK once to get initial state
-if initial flags are not (IFF_LOWER_UP && !IFF_DORMANT), wait until
 netlink multicast signals this state
-do 802.1X, possibly abort if flags go down again
-send RTM_SETLINK to set operstate to IF_OPER_UP if authentication
 succeeds, IF_OPER_DORMANT otherwise
-see how operstate and IFF_RUNNING are echoed via netlink multicast
-set interface back to IF_OPER_DORMANT if 802.1X reauthentication
 fails
-restart if kernel changes IFF_LOWER_UP or IFF_DORMANT flag

If the supplicant goes down, bring back IFLA_LINKMODE to 0 and
IFLA_OPERSTATE to a sane value.

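The RTM_SETLINK requests in the steps above might be assembled as in the following sketch. It only builds the message in memory and does not send it. struct ex_setlink_req and ex_build_setlink() are hypothetical names; the sketch assumes the Linux UAPI headers are available and that both IFLA_LINKMODE and IFLA_OPERSTATE carry a single-byte payload.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Request layout: netlink header, ifinfomsg, then room for exactly one
 * attribute with a 1-byte payload. */
struct ex_setlink_req {
    struct nlmsghdr  nh;
    struct ifinfomsg ifi;
    char             attrs[RTA_SPACE(sizeof(unsigned char))];
};

/* Fill req with an RTM_SETLINK message for interface ifindex that sets
 * one u8 attribute, e.g. IFLA_LINKMODE or IFLA_OPERSTATE. */
static void ex_build_setlink(struct ex_setlink_req *req, int ifindex,
                             unsigned short attr_type, unsigned char value)
{
    struct rtattr *rta;

    memset(req, 0, sizeof(*req));
    req->nh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nh.nlmsg_type  = RTM_SETLINK;
    req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req->ifi.ifi_family = AF_UNSPEC;
    req->ifi.ifi_index  = ifindex;

    /* The attribute starts at the aligned end of the fixed part. */
    rta = (struct rtattr *)((char *)req + NLMSG_ALIGN(req->nh.nlmsg_len));
    rta->rta_type = attr_type;
    rta->rta_len  = RTA_LENGTH(sizeof(value));
    memcpy(RTA_DATA(rta), &value, sizeof(value));

    /* Account for the attribute, including its padding. */
    req->nh.nlmsg_len = NLMSG_ALIGN(req->nh.nlmsg_len)
                      + RTA_ALIGN(rta->rta_len);
}
```

A supplicant would typically build one such request with IFLA_LINKMODE and the value 1 at startup, and later another with IFLA_OPERSTATE and the numeric value of IF_OPER_UP (6) or IF_OPER_DORMANT (5), sending each over an AF_NETLINK socket opened for NETLINK_ROUTE. Changing link attributes normally requires CAP_NET_ADMIN.
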
A routing daemon or DHCP client just needs to check IFF_RUNNING, or
wait for operstate to become IF_OPER_UP or IF_OPER_UNKNOWN, before
considering the interface or requesting a DHCP address.


For technical questions and/or comments please e-mail Stefan Rompf
(stefan at loplof.de).