annotate Documentation/pci.txt @ 897:329ea0ccb344

balloon: try harder to balloon up under memory pressure.

Currently if the balloon driver is unable to increase the guest's
reservation it assumes the failure was due to reaching its full
allocation, gives up on the ballooning operation and records the limit
it reached as the "hard limit". The driver will not try again until
the target is set again (even to the same value).

However it is possible that ballooning has in fact failed due to
memory pressure in the host and therefore it is desirable to keep
attempting to reach the target in case memory becomes available. The
most likely scenario is that some guests are ballooning down while
others are ballooning up and therefore there is temporary memory
pressure while things stabilise. You would not expect a well behaved
toolstack to ask a domain to balloon to more than its allocation nor
would you expect it to deliberately over-commit memory by setting
balloon targets which exceed the total host memory.

This patch drops the concept of a hard limit and causes the balloon
driver to retry increasing the reservation on a timer in the same
manner as when decreasing the reservation.

Also if we partially succeed in increasing the reservation
(i.e. receive fewer pages than we asked for) then we may as well keep
those pages rather than returning them to Xen.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Fri Jun 05 14:01:20 2009 +0100 (2009-06-05)
parents 831230e53067
rev   line source
ian@0 1 How To Write Linux PCI Drivers
ian@0 2
ian@0 3 by Martin Mares <mj@ucw.cz> on 07-Feb-2000
ian@0 4
ian@0 5 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ian@0 6 The world of PCI is vast and it's full of (mostly unpleasant) surprises.
ian@0 7 Different PCI devices have different requirements and different bugs --
ian@0 8 because of this, the PCI support layer in the Linux kernel is not as trivial
ian@0 9 as one would wish. This short pamphlet tries to help all potential driver
ian@0 10 authors find their way through the deep forests of PCI handling.
ian@0 11
ian@0 12
ian@0 13 0. Structure of PCI drivers
ian@0 14 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
ian@0 15 There exist two kinds of PCI drivers: new-style ones (which leave most of
ian@0 16 probing for devices to the PCI layer and support online insertion and removal
ian@0 17 of devices [thus supporting PCI, hot-pluggable PCI and CardBus in a single
ian@0 18 driver]) and old-style ones which just do all the probing themselves. Unless
ian@0 19 you have a very good reason to do so, please don't use the old way of probing
ian@0 20 in any new code. After the driver finds the devices it wishes to operate
ian@0 21 on (either the old or the new way), it needs to perform the following steps:
ian@0 22
ian@0 23 Enable the device
ian@0 24 Access device configuration space
ian@0 25 Discover resources (addresses and IRQ numbers) provided by the device
ian@0 26 Allocate these resources
ian@0 27 Communicate with the device
ian@0 28 Disable the device
ian@0 29
ian@0 30 Most of these topics are covered by the following sections; for the rest,
ian@0 31 look at <linux/pci.h>, which is hopefully well commented.
ian@0 32
ian@0 33 If the PCI subsystem is not configured (CONFIG_PCI is not set), most of
ian@0 34 the functions described below are defined as inline functions either completely
ian@0 35 empty or just returning an appropriate error code to avoid lots of ifdefs
ian@0 36 in the drivers.
ian@0 37
ian@0 38
ian@0 39 1. New-style drivers
ian@0 40 ~~~~~~~~~~~~~~~~~~~~
ian@0 41 The new-style drivers just call pci_register_driver during their initialization
ian@0 42 with a pointer to a structure describing the driver (struct pci_driver) which
ian@0 43 contains:
ian@0 44
ian@0 45 name Name of the driver
ian@0 46 id_table Pointer to table of device IDs the driver is
ian@0 47 interested in. Most drivers should export this
ian@0 48 table using MODULE_DEVICE_TABLE(pci,...).
ian@0 49 probe Pointer to a probing function which gets called (during
ian@0 50 execution of pci_register_driver for already existing
ian@0 51 devices or later if a new device gets inserted) for all
ian@0 52 PCI devices which match the ID table and are not handled
ian@0 53 by the other drivers yet. This function gets passed a
ian@0 54 pointer to the pci_dev structure representing the device
ian@0 55 and also which entry in the ID table did the device
ian@0 56 match. It returns zero when the driver has accepted the
ian@0 57 device or an error code (negative number) otherwise.
ian@0 58 This function always gets called from process context,
ian@0 59 so it can sleep.
ian@0 60 remove Pointer to a function which gets called whenever a
ian@0 61 device being handled by this driver is removed (either
ian@0 62 during deregistration of the driver or when it's
ian@0 63 manually pulled out of a hot-pluggable slot). This
ian@0 64 function always gets called from process context, so it
ian@0 65 can sleep.
ian@0 66 save_state Save a device's state before it is suspended.
ian@0 67 suspend Put device into low power state.
ian@0 68 resume Wake device from low power state.
ian@0 69 enable_wake Enable device to generate wake events from a low power
ian@0 70 state.
ian@0 71
ian@0 72 (Please see Documentation/power/pci.txt for descriptions
ian@0 73 of PCI Power Management and the related functions)
ian@0 74
ian@0 75 The ID table is an array of struct pci_device_id ending with an all-zero entry.
ian@0 76 Each entry consists of:
ian@0 77
ian@0 78 vendor, device   Vendor and device ID to match (or PCI_ANY_ID)
ian@0 79 subvendor,       Subsystem vendor and device ID to match (or PCI_ANY_ID)
ian@0 80 subdevice
ian@0 81 class,           Device class to match. The class_mask tells which bits
ian@0 82 class_mask       of the class are honored during the comparison.
ian@0 83 driver_data      Data private to the driver.
ian@0 84
ian@0 85 Most drivers don't need to use the driver_data field. Best practice
ian@0 86 for use of driver_data is to use it as an index into a static list of
ian@0 87 equivalent device types, not to use it as a pointer.
ian@0 88
ian@0 89 Have a table entry {PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID}
ian@0 90 to have probe() called for every PCI device known to the system.
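
The matching rules described above can be sketched in plain userspace C.
This is only a simplified model, not the kernel's own code: the
model_* names and types are hypothetical, and the class field is called
dev_class here purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_ANY_ID 0xffffffffu  /* wildcard value described above */

/* Userspace stand-in for struct pci_device_id (hypothetical model type). */
struct model_pci_id {
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
	uint32_t dev_class, class_mask;
	unsigned long driver_data;
};

/* Userspace stand-in for the identifying fields of a device. */
struct model_pci_dev {
	uint32_t vendor, device;
	uint32_t subvendor, subdevice;
	uint32_t dev_class;
};

/* A table field matches if it is the wildcard or equals the device's value. */
static int model_field_ok(uint32_t want, uint32_t have)
{
	return want == PCI_ANY_ID || want == have;
}

/*
 * Return the first table entry matching dev, or NULL.  The table ends
 * with an all-zero entry.  Only the class bits selected by class_mask
 * take part in the comparison.
 */
const struct model_pci_id *model_match(const struct model_pci_id *ids,
				       const struct model_pci_dev *dev)
{
	for (; ids->vendor || ids->subvendor || ids->class_mask; ids++) {
		if (model_field_ok(ids->vendor, dev->vendor) &&
		    model_field_ok(ids->device, dev->device) &&
		    model_field_ok(ids->subvendor, dev->subvendor) &&
		    model_field_ok(ids->subdevice, dev->subdevice) &&
		    !((ids->dev_class ^ dev->dev_class) & ids->class_mask))
			return ids;
	}
	return NULL;
}
```

With class and class_mask left at zero, an entry of four PCI_ANY_ID
values matches every device in this model, which corresponds to the
catch-all entry mentioned above.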
ian@0 91
ian@0 92 New PCI IDs may be added to a device driver at runtime by writing
ian@0 93 to the file /sys/bus/pci/drivers/{driver}/new_id. When added, the
ian@0 94 driver will probe for all devices it can support.
ian@0 95
ian@0 96 echo "vendor device subvendor subdevice class class_mask driver_data" > \
ian@0 97 /sys/bus/pci/drivers/{driver}/new_id
ian@0 98 where all fields are passed in as hexadecimal values (no leading 0x).
ian@0 99 Users need pass only as many fields as necessary; vendor, device,
ian@0 100 subvendor, and subdevice fields default to PCI_ANY_ID (FFFFFFFF),
ian@0 101 class and class_mask fields default to 0, and driver_data defaults to
ian@0 102 0UL. Device drivers must initialize use_driver_data in the dynids struct
ian@0 103 in their pci_driver struct prior to calling pci_register_driver in order
ian@0 104 for the driver_data field to get passed to the driver. Otherwise, only a
ian@0 105 0 is passed in that field.
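
As a rough illustration of the field defaults described above, the
following userspace sketch parses a new_id-style string. It is an
assumption about how such parsing could look, not the kernel's own
implementation; model_new_id and model_parse_new_id are hypothetical
names.

```c
#include <stdio.h>

#define PCI_ANY_ID 0xffffffffu

/* Hypothetical container for the seven fields written to new_id. */
struct model_new_id {
	unsigned int vendor, device, subvendor, subdevice;
	unsigned int dev_class, class_mask;
	unsigned long driver_data;
};

/*
 * Parse up to seven whitespace-separated hexadecimal fields.  Fields
 * that are not supplied keep the defaults given in the text.  Returns
 * the number of fields parsed, or -1 if not even a vendor was given.
 */
int model_parse_new_id(const char *buf, struct model_new_id *id)
{
	int fields;

	id->vendor = id->device = PCI_ANY_ID;
	id->subvendor = id->subdevice = PCI_ANY_ID;
	id->dev_class = id->class_mask = 0;
	id->driver_data = 0UL;

	fields = sscanf(buf, "%x %x %x %x %x %x %lx",
			&id->vendor, &id->device,
			&id->subvendor, &id->subdevice,
			&id->dev_class, &id->class_mask,
			&id->driver_data);

	return fields < 1 ? -1 : fields;
}
```

For example, parsing the string "8086 100e" fills in vendor and device
and leaves subvendor and subdevice as PCI_ANY_ID.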
ian@0 106
ian@0 107 When the driver exits, it just calls pci_unregister_driver() and the PCI layer
ian@0 108 automatically calls the remove hook for all devices handled by the driver.
ian@0 109
ian@0 110 Please mark the initialization and cleanup functions where appropriate
ian@0 111 (the corresponding macros are defined in <linux/init.h>):
ian@0 112
ian@0 113 __init Initialization code. Thrown away after the driver
ian@0 114 initializes.
ian@0 115 __exit Exit code. Ignored for non-modular drivers.
ian@0 116 __devinit Device initialization code. Identical to __init if
ian@0 117 the kernel is not compiled with CONFIG_HOTPLUG, normal
ian@0 118 function otherwise.
ian@0 119 __devexit The same for __exit.
ian@0 120
ian@0 121 Tips:
ian@0 122 The module_init()/module_exit() functions (and all initialization
ian@0 123 functions called only from these) should be marked __init/__exit.
ian@0 124 The struct pci_driver shouldn't be marked with any of these tags.
ian@0 125 The ID table array should be marked __devinitdata.
ian@0 126 The probe() and remove() functions (and all initialization
ian@0 127 functions called only from these) should be marked __devinit/__devexit.
ian@0 128 If you are sure the driver is not a hotplug driver then use only
ian@0 129 __init/__exit and __initdata/__exitdata.
ian@0 130
ian@0 131 Pointers to functions marked as __devexit must be created using
ian@0 132 __devexit_p(function_name). That will generate the function
ian@0 133 name or NULL if the __devexit function will be discarded.
ian@0 134
ian@0 135
ian@0 136 2. How to find PCI devices manually (the old style)
ian@0 137 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ian@0 138 PCI drivers not using the pci_register_driver() interface search
ian@0 139 for PCI devices manually using the following constructs:
ian@0 140
ian@0 141 Searching by vendor and device ID:
ian@0 142
ian@0 143 struct pci_dev *dev = NULL;
ian@0 144 while ((dev = pci_get_device(VENDOR_ID, DEVICE_ID, dev)) != NULL)
ian@0 145         configure_device(dev);
ian@0 146
ian@0 147 Searching by class ID (iterate in a similar way):
ian@0 148
ian@0 149 pci_get_class(CLASS_ID, dev)
ian@0 150
ian@0 151 Searching by both vendor/device and subsystem vendor/device ID:
ian@0 152
ian@0 153 pci_get_subsys(VENDOR_ID, DEVICE_ID, SUBSYS_VENDOR_ID, SUBSYS_DEVICE_ID, dev)
ian@0 154
ian@0 155 You can use the constant PCI_ANY_ID as a wildcard replacement for
ian@0 156 VENDOR_ID or DEVICE_ID. This allows searching for any device from a
ian@0 157 specific vendor, for example.
ian@0 158
ian@0 159 These functions are hotplug-safe. They increment the reference count on
ian@0 160 the pci_dev that they return. You must eventually (possibly at module unload)
ian@0 161 decrement the reference count on these devices by calling pci_dev_put().
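
The reference counting behaviour of these iterators can be modelled in
userspace as follows. This is a sketch under the assumption that the
iterator drops the reference on the device passed in and takes one on
the device it returns; the model_* names and the static device array
are hypothetical and stand in for the real PCI device list.

```c
#include <stddef.h>

#define PCI_ANY_ID 0xffffffffu

/* Hypothetical device record with an explicit reference count. */
struct model_dev {
	unsigned int vendor, device;
	int refcount;
};

/* Hypothetical bus contents used only by this model. */
static struct model_dev model_bus[] = {
	{0x8086, 0x100e, 0},
	{0x10ec, 0x8139, 0},
	{0x8086, 0x2922, 0},
};
#define MODEL_BUS_LEN (sizeof(model_bus) / sizeof(model_bus[0]))

/* Drop one reference; a NULL pointer is ignored. */
void model_dev_put(struct model_dev *dev)
{
	if (dev)
		dev->refcount--;
}

/*
 * Find the next matching device after 'from' (or the first one if
 * 'from' is NULL).  The returned device gains a reference and the
 * reference held on 'from' is released.
 */
struct model_dev *model_get_device(unsigned int vendor, unsigned int device,
				   struct model_dev *from)
{
	size_t i = from ? (size_t)(from - model_bus) + 1 : 0;
	struct model_dev *found = NULL;

	for (; i < MODEL_BUS_LEN; i++) {
		struct model_dev *d = &model_bus[i];

		if ((vendor == PCI_ANY_ID || vendor == d->vendor) &&
		    (device == PCI_ANY_ID || device == d->device)) {
			d->refcount++;
			found = d;
			break;
		}
	}
	model_dev_put(from);
	return found;
}
```

A loop that runs to completion leaves no references behind in this
model, while a caller that stops early (or keeps the returned pointer)
still holds one reference and has to release it with the put function.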
ian@0 162
ian@0 163
ian@0 164 3. Enabling and disabling devices
ian@0 165 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ian@0 166 Before you do anything with the device you've found, you need to enable
ian@0 167 it by calling pci_enable_device() which enables I/O and memory regions of
ian@0 168 the device, allocates an IRQ if necessary, assigns missing resources if
ian@0 169 needed and wakes up the device if it was in suspended state. Please note
ian@0 170 that this function can fail.
ian@0 171
ian@0 172 If you want to use the device in bus mastering mode, call pci_set_master()
ian@0 173 which enables the bus master bit in PCI_COMMAND register and also fixes
ian@0 174 the latency timer value if it's set to something bogus by the BIOS.
ian@0 175
ian@0 176 If you want to use the PCI Memory-Write-Invalidate transaction,
ian@0 177 call pci_set_mwi(). This enables the PCI_COMMAND bit for Mem-Wr-Inval
ian@0 178 and also ensures that the cache line size register is set correctly.
ian@0 179 Make sure to check the return value of pci_set_mwi(), because not all
ian@0 180 architectures support Memory-Write-Invalidate.
ian@0 181
ian@0 182 If your driver decides to stop using the device (e.g., there was an
ian@0 183 error while setting it up or the driver module is being unloaded), it
ian@0 184 should call pci_disable_device() to deallocate any IRQ resources, disable
ian@0 185 PCI bus-mastering, etc. You should not do anything with the device after
ian@0 186 calling pci_disable_device().
ian@0 187
ian@0 188 4. How to access PCI config space
ian@0 189 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ian@0 190 You can use pci_(read|write)_config_(byte|word|dword) to access the config
ian@0 191 space of a device represented by struct pci_dev *. All these functions return 0
ian@0 192 when successful or an error code (PCIBIOS_...) which can be translated to a text
ian@0 193 string by pcibios_strerror. Most drivers expect that accesses to valid PCI
ian@0 194 devices don't fail.
ian@0 195
ian@0 196 If you don't have a struct pci_dev available, you can call
ian@0 197 pci_bus_(read|write)_config_(byte|word|dword) to access a given device
ian@0 198 and function on that bus.
ian@0 199
ian@0 200 If you access fields in the standard portion of the config header, please
ian@0 201 use symbolic names of locations and bits declared in <linux/pci.h>.
ian@0 202
ian@0 203 If you need to access Extended PCI Capability registers, just call
ian@0 204 pci_find_capability() for the particular capability and it will find the
ian@0 205 corresponding register block for you.
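
To show what such a capability lookup involves, here is a userspace
sketch that walks the standard capability list in a 256-byte copy of a
device's config space. It is a simplified model and not the kernel
function itself: model_find_capability is a hypothetical name, while
the register offsets and capability IDs follow the standard PCI config
header layout.

```c
#include <stdint.h>

#define PCI_STATUS           0x06  /* 16-bit status register */
#define PCI_STATUS_CAP_LIST  0x10  /* device has a capability list */
#define PCI_CAPABILITY_LIST  0x34  /* offset of first capability */
#define PCI_CAP_LIST_ID      0     /* capability ID byte */
#define PCI_CAP_LIST_NEXT    1     /* offset of next capability */
#define PCI_CAP_ID_PM        0x01  /* Power Management */
#define PCI_CAP_ID_MSI       0x05  /* Message Signalled Interrupts */

/*
 * Return the config space offset of capability 'cap', or 0 if the
 * device does not have it.  'cfg' is a snapshot of the first 256
 * bytes of config space.
 */
int model_find_capability(const uint8_t cfg[256], uint8_t cap)
{
	int ttl = 48;  /* bound the walk in case the list loops */
	uint8_t pos;
	uint16_t status = (uint16_t)(cfg[PCI_STATUS] |
				     (cfg[PCI_STATUS + 1] << 8));

	if (!(status & PCI_STATUS_CAP_LIST))
		return 0;

	pos = cfg[PCI_CAPABILITY_LIST];
	while (ttl-- && pos >= 0x40) {
		pos &= 0xfc;  /* entries are dword aligned */
		if (cfg[pos + PCI_CAP_LIST_ID] == 0xff)
			break;
		if (cfg[pos + PCI_CAP_LIST_ID] == cap)
			return pos;
		pos = cfg[pos + PCI_CAP_LIST_NEXT];
	}
	return 0;
}
```

A driver would not normally open-code this walk; the sketch only shows
why a single lookup call is more convenient than reading the list by
hand.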
ian@0 206
ian@0 207
ian@0 208 5. Addresses and interrupts
ian@0 209 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
ian@0 210 Memory and port addresses and interrupt numbers should NOT be read from the
ian@0 211 config space. You should use the values in the pci_dev structure as they might
ian@0 212 have been remapped by the kernel.
ian@0 213
ian@0 214 See Documentation/IO-mapping.txt for how to access device memory.
ian@0 215
ian@0 216 The device driver needs to call pci_request_region() to make sure
ian@0 217 no other device is already using the same resource. The driver is expected
ian@0 218 to determine MMIO and IO Port resource availability _before_ calling
ian@0 219 pci_enable_device(). Conversely, drivers should call pci_release_region()
ian@0 220 _after_ calling pci_disable_device(). The idea is to prevent two devices
ian@0 221 colliding on the same address range.
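
The collision check behind region requests can be illustrated with a
small userspace model. This is a sketch of the idea only, assuming a
fixed-size table of claimed ranges; the model_* names are hypothetical
and the real kernel resource tree is more elaborate.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_REGIONS 16

/* Hypothetical record of one claimed, inclusive address range. */
struct model_region {
	unsigned long start, end;
};

static struct model_region model_claimed[MODEL_MAX_REGIONS];
static size_t model_nclaimed;

/*
 * Claim [start, start + len - 1].  Fails with -EBUSY if the range
 * overlaps one that is already claimed.
 */
int model_request_region(unsigned long start, unsigned long len)
{
	unsigned long end = start + len - 1;
	size_t i;

	if (len == 0 || end < start)
		return -EINVAL;

	for (i = 0; i < model_nclaimed; i++) {
		if (start <= model_claimed[i].end &&
		    model_claimed[i].start <= end)
			return -EBUSY;
	}
	if (model_nclaimed == MODEL_MAX_REGIONS)
		return -ENOMEM;

	model_claimed[model_nclaimed].start = start;
	model_claimed[model_nclaimed].end = end;
	model_nclaimed++;
	return 0;
}

/* Release a range previously claimed with exactly these bounds. */
void model_release_region(unsigned long start, unsigned long len)
{
	size_t i;

	for (i = 0; i < model_nclaimed; i++) {
		if (model_claimed[i].start == start &&
		    model_claimed[i].end == start + len - 1) {
			model_claimed[i] = model_claimed[--model_nclaimed];
			return;
		}
	}
}
```

A second request that overlaps an existing claim is refused until the
first owner releases its range, which is the property the paragraph
above relies on.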
ian@0 222
ian@0 223 Generic flavors of pci_request_region() are request_mem_region()
ian@0 224 (for MMIO ranges) and request_region() (for IO Port ranges).
ian@0 225 Use these for address resources that are not described by "normal" PCI
ian@0 226 interfaces (e.g. BAR).
ian@0 227
ian@0 228 All interrupt handlers should be registered with IRQF_SHARED and use the devid
ian@0 229 to map IRQs to devices (remember that all PCI interrupts are shared).
ian@0 230
ian@0 231
ian@0 232 6. Other interesting functions
ian@0 233 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ian@0 234 pci_find_slot()         Find pci_dev corresponding to given bus and
ian@0 235                         slot numbers.
ian@0 236 pci_set_power_state()   Set PCI Power Management state (0=D0 ... 3=D3)
ian@0 237 pci_find_capability()   Find specified capability in device's capability
ian@0 238                         list.
ian@0 239 pci_module_init()       Inline helper function for ensuring correct
ian@0 240                         pci_driver initialization and error handling.
ian@0 241 pci_resource_start()    Returns bus start address for a given PCI region
ian@0 242 pci_resource_end()      Returns bus end address for a given PCI region
ian@0 243 pci_resource_len()      Returns the byte length of a PCI region
ian@0 244 pci_set_drvdata()       Set private driver data pointer for a pci_dev
ian@0 245 pci_get_drvdata()       Return private driver data pointer for a pci_dev
ian@0 246 pci_set_mwi()           Enable Memory-Write-Invalidate transactions.
ian@0 247 pci_clear_mwi()         Disable Memory-Write-Invalidate transactions.
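
The relationship between the start, end and length of a region can be
sketched as below. This is a userspace model under the assumption that
the end address is inclusive and that an unassigned region has both
start and end set to zero; model_resource and model_resource_len are
hypothetical names.

```c
/* Hypothetical stand-in for one PCI region (inclusive end address). */
struct model_resource {
	unsigned long start, end;
};

/*
 * Byte length of a region.  A region whose start and end are both
 * zero is treated as unassigned and has length 0.
 */
unsigned long model_resource_len(const struct model_resource *r)
{
	if (r->start == 0 && r->end == r->start)
		return 0;
	return r->end - r->start + 1;
}
```

Because the end address is inclusive, a region from 0xfebc0000 to
0xfebdffff is 0x20000 bytes long in this model.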
ian@0 248
ian@0 249
ian@0 250 7. Miscellaneous hints
ian@0 251 ~~~~~~~~~~~~~~~~~~~~~~
ian@0 252 When displaying PCI slot names to the user (for example when a driver wants
ian@0 253 to tell the user which card it has found), please use pci_name(pci_dev)
ian@0 254 for this purpose.
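
For readers who want to recognise the resulting strings, the sketch
below formats a slot name in the usual domain:bus:slot.function style.
It is a userspace illustration and not a replacement for pci_name();
model_pci_name is a hypothetical helper, and the slot and function
numbers are decoded from the combined devfn value.

```c
#include <stdio.h>

/* Slot is the upper five bits of devfn, function the lower three. */
#define PCI_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn) ((devfn) & 0x07)

/*
 * Write a name such as "0000:00:1f.2" into buf.  Returns the value
 * returned by snprintf().
 */
int model_pci_name(char *buf, size_t len, unsigned int domain,
		   unsigned int bus, unsigned int devfn)
{
	return snprintf(buf, len, "%04x:%02x:%02x.%d",
			domain, bus, PCI_SLOT(devfn), (int)PCI_FUNC(devfn));
}
```

In a driver the string should still come from pci_name(), so that the
driver does not depend on how the PCI layer numbers devices.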
ian@0 255
ian@0 256 Always refer to the PCI devices by a pointer to the pci_dev structure.
ian@0 257 All PCI layer functions use this identification and it's the only
ian@0 258 reasonable one. Don't use bus/slot/function numbers except for very
ian@0 259 special purposes -- on systems with multiple primary buses their semantics
ian@0 260 can be pretty complex.
ian@0 261
ian@0 262 If you're going to use PCI bus mastering DMA, take a look at
ian@0 263 Documentation/DMA-mapping.txt.
ian@0 264
ian@0 265 Don't try to turn on Fast Back to Back writes in your driver. All devices
ian@0 266 on the bus need to be capable of doing it, so this is something which needs
ian@0 267 to be handled by platform and generic code, not individual drivers.
ian@0 268
ian@0 269
ian@0 270 8. Vendor and device identifications
ian@0 271 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ian@0 272 For the future, let's avoid adding device ids to include/linux/pci_ids.h.
ian@0 273
ian@0 274 Use PCI_VENDOR_ID_xxx constants for vendors, and a hex constant for device ids.
ian@0 275
ian@0 276 Rationale: PCI_VENDOR_ID_xxx constants are re-used, but device ids are not.
ian@0 277 Further, device ids are arbitrary hex numbers, normally used only in a
ian@0 278 single location, the pci_device_id table.
ian@0 279
ian@0 280 9. Obsolete functions
ian@0 281 ~~~~~~~~~~~~~~~~~~~~~
ian@0 282 There are several functions which you might come across when trying to
ian@0 283 port an old driver to the new PCI interface. They are no longer present
ian@0 284 in the kernel as they aren't compatible with hotplug, PCI domains, or
ian@0 285 sane locking.
ian@0 286
ian@0 287 pci_find_device() Superseded by pci_get_device()
ian@0 288 pci_find_subsys() Superseded by pci_get_subsys()
ian@0 289 pci_find_slot() Superseded by pci_get_slot()