The Linux IPMI Driver
---------------------
Corey Minyard
<minyard@mvista.com>
<minyard@acm.org>

The Intelligent Platform Management Interface, or IPMI, is a
standard for controlling intelligent devices that monitor a system.
It provides for dynamic discovery of sensors in the system and the
ability to monitor the sensors and be informed when the sensor's
values change or go outside certain boundaries. It also has a
standardized database for field-replaceable units (FRUs) and a watchdog
timer.

To use this, you need an interface to an IPMI controller in your
system (called a Baseboard Management Controller, or BMC) and
management software that can use the IPMI system.

This document describes how to use the IPMI driver for Linux. If you
are not familiar with IPMI itself, see the web site at
http://www.intel.com/design/servers/ipmi/index.htm. IPMI is a big
subject and I can't cover it all here!

Configuration
-------------

The Linux IPMI driver is modular, which means you have to pick several
things to have it work right depending on your hardware. Most of
these are available in the 'Character Devices' menu, then the IPMI
menu.

No matter what, you must pick 'IPMI top-level message handler' to use
IPMI. What you do beyond that depends on your needs and hardware.

The message handler does not provide any user-level interfaces.
Kernel code (like the watchdog) can still use it. If you need access
from userland through a device driver, select 'Device interface for
IPMI'.

The driver interface depends on your hardware. If your system
properly provides the SMBIOS info for IPMI, the driver will detect it
and just work. If you have a board with a standard interface (these
will generally be either "KCS", "SMIC", or "BT"; consult your hardware
manual), choose the 'IPMI SI handler' option. A driver also exists
for direct I2C access to the IPMI management controller. Some boards
support this, but it is unknown if it will work on every board. For
this, choose 'IPMI SMBus handler', but be ready to do some figuring
to see if it will work on your system if the SMBIOS/ACPI information
is wrong or not present. It is fairly safe to have both of these
enabled and let the drivers auto-detect what is present.

You should generally enable ACPI on your system, as systems with IPMI
can have ACPI tables describing them.

If you have a standard interface and the board manufacturer has done
their job correctly, the IPMI controller should be automatically
detected (via ACPI or SMBIOS tables) and should just work. Sadly,
many boards do not have this information. The driver attempts
standard defaults, but they may not work. If you fall into this
situation, you need to read the section below named 'The SI Driver' or
'The SMBus Driver' on how to hand-configure your system.

IPMI defines a standard watchdog timer. You can enable this with the
'IPMI Watchdog Timer' config option. If you compile the driver into
the kernel, then via a kernel command-line option you can have the
watchdog timer start as soon as it initializes. It also has a lot
of other options; see the 'Watchdog' section below for more details.
Note that you can also have the watchdog continue to run if it is
closed (by default it is disabled on close). Go into the 'Watchdog
Cards' menu, enable 'Watchdog Timer Support', and enable the option
'Disable watchdog shutdown on close'.

IPMI systems can often be powered off using IPMI commands. Select
'IPMI Poweroff' to do this. The driver will auto-detect if the system
can be powered off by IPMI. It is safe to enable this even if your
system doesn't support this option. This works on ATCA systems, the
Radisys CPI1 card, and any IPMI system that supports standard chassis
management commands.

If you want the driver to put an event into the event log on a panic,
enable the 'Generate a panic event to all BMCs on a panic' option. If
you want the whole panic string put into the event log using OEM
events, enable the 'Generate OEM events containing the panic string'
option.

Basic Design
------------

The Linux IPMI driver is designed to be very modular and flexible; you
only need to take the pieces you need and you can use it in many
different ways. Because of that, it's broken into many chunks of
code. These chunks (by module name) are:

ipmi_msghandler - This is the central piece of software for the IPMI
system. It handles all messages, message timing, and responses. The
IPMI users tie into this, and the IPMI physical interfaces (called
System Management Interfaces, or SMIs) also tie in here. This
provides the kernelland interface for IPMI, but does not provide an
interface for use by application processes.

ipmi_devintf - This provides a userland IOCTL interface for the IPMI
driver. Each open file for this device ties in to the message handler
as an IPMI user.

ipmi_si - A driver for various system interfaces. This supports KCS,
SMIC, and BT interfaces. Unless you have an SMBus interface or your
own custom interface, you probably need to use this.

ipmi_smb - A driver for accessing BMCs on the SMBus. It uses the
I2C kernel driver's SMBus interfaces to send and receive IPMI messages
over the SMBus.

ipmi_watchdog - IPMI requires systems to have a very capable watchdog
timer. This driver implements the standard Linux watchdog timer
interface on top of the IPMI message handler.

ipmi_poweroff - Some systems support the ability to be turned off via
IPMI commands.

These are all individually selectable via configuration options.

Note that the KCS-only interface has been removed. The af_ipmi driver
is no longer supported and has been removed because it was impossible
to do 32-bit emulation on 64-bit kernels with it.

Much documentation for the interface is in the include files. The
IPMI include files are:

net/af_ipmi.h - Contains the socket interface.

linux/ipmi.h - Contains the user interface and IOCTL interface for IPMI.

linux/ipmi_smi.h - Contains the interface for system management interfaces
(things that interface to IPMI controllers) to use.

linux/ipmi_msgdefs.h - General definitions for base IPMI messaging.


Addressing
----------

The IPMI addressing works much like IP addresses; you have an overlay
to handle the different address types. The overlay is:

  struct ipmi_addr
  {
        int   addr_type;
        short channel;
        char  data[IPMI_MAX_ADDR_SIZE];
  };

The addr_type determines what the address really is. The driver
currently understands two different types of addresses.

"System Interface" addresses are defined as:

  struct ipmi_system_interface_addr
  {
        int   addr_type;
        short channel;
  };

and the type is IPMI_SYSTEM_INTERFACE_ADDR_TYPE. This is used for talking
straight to the BMC on the current card. The channel must be

Messages that are destined to go out on the IPMB bus use the
IPMI_IPMB_ADDR_TYPE address type. The format is

  struct ipmi_ipmb_addr
  {
        int           addr_type;
        short         channel;
        unsigned char slave_addr;
        unsigned char lun;
  };

The "channel" here is generally zero, but some devices support more
than one channel. It corresponds to the channel as defined in the IPMI
spec.
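
As a rough sketch of how the overlay is meant to be used, the code
below fills a generic address with an IPMB address and reads it back.
The ex_ structures are local copies of the layouts shown above, and
the EX_ constants are stand-ins chosen for this example; real code
should take the types and values from linux/ipmi.h instead.

```c
#include <assert.h>
#include <string.h>

/* Stand-in values for this sketch only; use linux/ipmi.h in real code. */
#define EX_IPMI_MAX_ADDR_SIZE  32
#define EX_IPMI_IPMB_ADDR_TYPE 0x01

/* Local copy of the generic overlay described in the text. */
struct ex_ipmi_addr {
	int   addr_type;
	short channel;
	char  data[EX_IPMI_MAX_ADDR_SIZE];
};

/* Local copy of the IPMB address layout described in the text. */
struct ex_ipmi_ipmb_addr {
	int           addr_type;
	short         channel;
	unsigned char slave_addr;
	unsigned char lun;
};

/* Store an IPMB address in the generic overlay. */
void ex_set_ipmb_addr(struct ex_ipmi_addr *out, short channel,
		      unsigned char slave_addr, unsigned char lun)
{
	struct ex_ipmi_ipmb_addr ipmb;

	/* The specific address must fit inside the overlay. */
	assert(sizeof(ipmb) <= sizeof(*out));

	memset(out, 0, sizeof(*out));
	memset(&ipmb, 0, sizeof(ipmb));
	ipmb.addr_type  = EX_IPMI_IPMB_ADDR_TYPE;
	ipmb.channel    = channel;
	ipmb.slave_addr = slave_addr;
	ipmb.lun        = lun;
	memcpy(out, &ipmb, sizeof(ipmb));
}

/* Recover the IPMB view of an overlay whose addr_type says it is IPMB. */
struct ex_ipmi_ipmb_addr ex_get_ipmb_addr(const struct ex_ipmi_addr *in)
{
	struct ex_ipmi_ipmb_addr ipmb;

	assert(in->addr_type == EX_IPMI_IPMB_ADDR_TYPE);
	memcpy(&ipmb, in, sizeof(ipmb));
	return ipmb;
}
```

Both layouts start with addr_type and channel, so code that only has
the generic structure can look at addr_type first and then decide how
to interpret the rest.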


Messages
--------

Messages are defined as:

  struct ipmi_msg
  {
        unsigned char netfn;
        unsigned char lun;
        unsigned char cmd;
        unsigned char *data;
        int           data_len;
  };

The driver takes care of adding/stripping the header information. The
data portion is just the data to be sent (do NOT put addressing info
here) or the response. Note that the completion code of a response is
the first item in "data"; it is not stripped out because that is how
all the messages are defined in the spec (and thus makes counting the
offsets a little easier :-).

When using the IOCTL interface from userland, you must provide a block
of data for "data", fill it, and set data_len to the length of the
block of data, even when receiving messages. Otherwise the driver
will have no place to put the message.
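
To make the rules above concrete, here is a small sketch that sets up
a request with no payload, prepares a message for receiving, and pulls
the completion code out of a response. The ex_ structure is a local
copy of the layout shown above and the helper names are made up for
this example. The netfn and command values (application request 0x06,
Get Device ID 0x01) come from the IPMI spec.

```c
#include <assert.h>
#include <stddef.h>

#define EX_NETFN_APP_REQUEST 0x06  /* application netfn, request form */
#define EX_CMD_GET_DEVICE_ID 0x01

/* Local copy of the message layout described in the text. */
struct ex_ipmi_msg {
	unsigned char  netfn;
	unsigned char  lun;
	unsigned char  cmd;
	unsigned char *data;
	int            data_len;
};

/* A Get Device ID request carries no data, only netfn and cmd. */
void ex_init_get_device_id(struct ex_ipmi_msg *msg)
{
	msg->netfn    = EX_NETFN_APP_REQUEST;
	msg->lun      = 0;
	msg->cmd      = EX_CMD_GET_DEVICE_ID;
	msg->data     = NULL;
	msg->data_len = 0;
}

/* When receiving, the caller must supply the buffer and its size. */
void ex_init_recv(struct ex_ipmi_msg *msg, unsigned char *buf, int buf_len)
{
	assert(buf != NULL && buf_len > 0);
	msg->netfn    = 0;
	msg->lun      = 0;
	msg->cmd      = 0;
	msg->data     = buf;
	msg->data_len = buf_len;
}

/* The completion code is data[0]; returns -1 if there is no data. */
int ex_completion_code(const struct ex_ipmi_msg *rsp)
{
	if (rsp->data == NULL || rsp->data_len < 1)
		return -1;
	return rsp->data[0];
}
```

A completion code of zero means the command completed normally, so a
caller would typically check ex_completion_code() before looking at
data[1] onward.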

Messages coming up from the message handler in kernelland will come in
as:

  struct ipmi_recv_msg
  {
        struct list_head link;

        /* The type of message as defined in the "Receive Types"
           defines above. */
        int         recv_type;

        ipmi_user_t      *user;
        struct ipmi_addr addr;
        long             msgid;
        struct ipmi_msg  msg;

        /* Call this when done with the message. It will presumably free
           the message and do any other necessary cleanup. */
        void (*done)(struct ipmi_recv_msg *msg);

        /* Place-holder for the data, don't make any assumptions about
           the size or existence of this, since it may change. */
        unsigned char   msg_data[IPMI_MAX_MSG_LENGTH];
  };

You should look at the receive type and handle the message
appropriately.


The Upper Layer Interface (Message Handler)
-------------------------------------------

The upper layer of the interface provides the users with a consistent
view of the IPMI interfaces. It allows multiple SMI interfaces to be
addressed (because some boards actually have multiple BMCs on them)
and the user should not have to care what type of SMI is below them.


Creating the User

To use the message handler, you must first create a user using
ipmi_create_user. The interface number specifies which SMI you want
to connect to, and you must supply callback functions to be called
when data comes in. The callback functions can run at interrupt level,
so be careful using the callbacks. This also allows you to pass in a
piece of data, the handler_data, that will be passed back to you on
all calls.

Once you are done, call ipmi_destroy_user() to get rid of the user.

From userland, opening the device automatically creates a user, and
closing the device automatically destroys the user.


Messaging

To send a message from kernel-land, the ipmi_request() call does
pretty much all message handling. Most of the parameters are
self-explanatory. However, it takes a "msgid" parameter. This is NOT
the sequence number of messages. It is simply a long value that is
passed back when the response for the message is returned. You may
use it for anything you like.

Responses come back in the function pointed to by the ipmi_recv_hndl
field of the "handler" that you passed in to ipmi_create_user().
Remember again, these may be running at interrupt level. Remember to
look at the receive type, too.

From userland, you fill out an ipmi_req_t structure and use the
IPMICTL_SEND_COMMAND ioctl. For incoming stuff, you can use select()
or poll() to wait for messages to come in. However, you cannot use
read() to get them; you must call the IPMICTL_RECEIVE_MSG ioctl with
the ipmi_recv_t structure to actually get the message. Remember that
you must supply a pointer to a block of data in the msg.data field, and
you must fill in the msg.data_len field with the size of the data.
This gives the receiver a place to actually put the message.

If the message cannot fit into the data you provide, you will get an
EMSGSIZE error and the driver will leave the data in the receive
queue. If you want to get it and have it truncate the message, us

When you send a command (which is defined by the lowest-order bit of
the netfn per the IPMI spec) on the IPMB bus, the driver will
automatically assign the sequence number to the command and save the
command. If the response is not received in the IPMI-specified 5
seconds, it will generate a response automatically saying the command
timed out. If an unsolicited response comes in (if it was after 5
seconds, for instance), that response will be ignored.
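
The "lowest-order bit of the netfn" rule can be written down in a few
lines. In the IPMI spec an even netfn is a request (a command) and the
matching response uses the next odd value. The helper names below are
invented for this sketch and are not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Even netfn values are requests (commands), odd ones are responses. */
bool ex_netfn_is_command(unsigned char netfn)
{
	return (netfn & 1) == 0;
}

/* The response to a command carries the command's netfn with bit 0 set. */
unsigned char ex_response_netfn(unsigned char netfn)
{
	assert(ex_netfn_is_command(netfn));
	return netfn | 1;
}
```

So an application request sent with netfn 0x06 is answered with netfn
0x07, which is how a response can be told apart from an incoming
command.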

In kernelland, after you receive a message and are done with it, you
MUST call ipmi_free_recv_msg() on it, or you will leak messages. Note
that you should NEVER mess with the "done" field of a message; that is
required to properly clean up the message.

Note that when sending, there is an ipmi_request_supply_msgs() call
that lets you supply the smi and receive message. This is useful for
pieces of code that need to work even if the system is out of buffers
(the watchdog timer uses this, for instance). You supply your own
buffer and own free routines. This is not recommended for normal use,
though, since it is tricky to manage your own buffers.


Events and Incoming Commands

The driver takes care of polling for IPMI events and receiving
commands (commands are messages that are not responses; they are
commands that other things on the IPMB bus have sent you). To receive
these, you must register for them; they will not automatically be sent
to you.

To receive events, you must call ipmi_set_gets_events() and set the
"val" to non-zero. Any events that have been received by the driver
since startup will immediately be delivered to the first user that
registers for events. After that, if multiple users are registered
for events, they will all receive all events that come in.

For receiving commands, you have to individually register commands you
want to receive. Call ipmi_register_for_cmd() and supply the netfn
and command name for each command you want to receive. Only one user
may be registered for each netfn/cmd, but different users may register
for different commands.

From userland, equivalent IOCTLs are provided to do these functions.


The Lower Layer (SMI) Interface
-------------------------------

As mentioned before, multiple SMI interfaces may be registered to the
message handler. Each of these is assigned an interface number when
it registers with the message handler. They are generally assigned
in the order they register, although if an SMI unregisters and then
another one registers, all bets are off.

The ipmi_smi.h file defines the interface for management interfaces;
see that for more details.


The SI Driver
-------------

The SI driver allows up to 4 KCS or SMIC interfaces to be configured
in the system. By default, the driver scans the ACPI tables for
interfaces, and if it doesn't find any, it will attempt to register
one KCS interface at the spec-specified I/O port 0xca2 without
interrupts. You can change this at module load time (for a module)
with:

  modprobe ipmi_si.o type=<type1>,<type2>....
       ports=<port1>,<port2>... addrs=<addr1>,<addr2>...
       irqs=<irq1>,<irq2>... trydefaults=[0|1]
       regspacings=<sp1>,<sp2>,... regsizes=<size1>,<size2>,...
       regshifts=<shift1>,<shift2>,...
       slave_addrs=<addr1>,<addr2>,...

Each of these except trydefaults is a list: the first item is for the
first interface, the second item for the second interface, etc.

The type may be either "kcs", "smic", or "bt". If you leave it blank, it
defaults to "kcs".

If you specify addrs as non-zero for an interface, the driver will
use the memory address given as the address of the device. This
overrides ports.

If you specify ports as non-zero for an interface, the driver will
use the I/O port given as the device address.

If you specify irqs as non-zero for an interface, the driver will
attempt to use the given interrupt for the device.

trydefaults sets whether the standard IPMI interface at 0xca2 and
any interfaces specified by ACPI are tried. By default, the driver
tries them; set this value to zero to turn this off.

The next three parameters have to do with register layout. The
registers used by the interfaces may not appear at successive
locations and they may not be in 8-bit registers. These parameters
allow the layout of the data in the registers to be more precisely
specified.

The regspacings parameter gives the number of bytes between successive
register start addresses. For instance, if the regspacing is set to 4
and the start address is 0xca2, then the address for the second
register would be 0xca6. This defaults to 1.

The regsizes parameter gives the size of a register, in bytes. The
data used by IPMI is 8 bits wide, but it may be inside a larger
register. This parameter allows the read and write type to be
specified. It may be 1, 2, 4, or 8. The default is 1.

Since the register size may be larger than 8 bits, the IPMI data may not
be in the lower 8 bits. The regshifts parameter gives the amount to shift
the data to get to the actual IPMI data.
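
The register layout parameters boil down to a bit of arithmetic. The
sketch below shows one way to read them; it is my interpretation of
the description above, not a copy of the driver's code, and the
function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Address of register number `index`, counting from zero. */
unsigned long ex_reg_addr(unsigned long base, unsigned int index,
			  unsigned int regspacing)
{
	assert(regspacing >= 1);
	return base + (unsigned long)index * regspacing;
}

/* Pull the 8-bit IPMI value out of a raw (possibly wider) register read. */
uint8_t ex_reg_data(uint64_t raw, unsigned int regshift)
{
	assert(regshift < 64);
	return (uint8_t)((raw >> regshift) & 0xff);
}
```

With a base of 0xca2 and a regspacing of 4, ex_reg_addr() puts the
second register (index 1) at 0xca6, matching the example in the text.
If a 32-bit register holds the IPMI byte in its top 8 bits, a regshift
of 24 recovers it.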

The slave_addrs parameter specifies the IPMI address of the local BMC.
This is usually 0x20 and the driver defaults to that, but in case it's
not, it can be specified when the driver starts up.

When compiled into the kernel, the addresses can be specified on the
kernel command line as:

  ipmi_si.type=<type1>,<type2>...
       ipmi_si.ports=<port1>,<port2>... ipmi_si.addrs=<addr1>,<addr2>...
       ipmi_si.irqs=<irq1>,<irq2>... ipmi_si.trydefaults=[0|1]
       ipmi_si.regspacings=<sp1>,<sp2>,...
       ipmi_si.regsizes=<size1>,<size2>,...
       ipmi_si.regshifts=<shift1>,<shift2>,...
       ipmi_si.slave_addrs=<addr1>,<addr2>,...

It works the same as the module parameters of the same names.

By default, the driver will attempt to detect any device specified by
ACPI, and if none of those then a KCS device at the spec-specified
0xca2. If you want to turn this off, set the "trydefaults" option to
false.

If you have high-res timers compiled into the kernel, the driver will
use them to provide much better performance. Note that if you do not
have high-res timers enabled in the kernel and you don't have
interrupts enabled, the driver will run VERY slowly. Don't blame me,
these interfaces suck.


The SMBus Driver
----------------

The SMBus driver allows up to 4 SMBus devices to be configured in the
system. By default, the driver will register any SMBus interfaces it finds
in the I2C address range of 0x20 to 0x4f on any adapter. You can change this
at module load time (for a module) with:

  modprobe ipmi_smb.o
        addr=<adapter1>,<i2caddr1>[,<adapter2>,<i2caddr2>[,...]]
        dbg=<flags1>,<flags2>...
        [defaultprobe=1] [dbg_probe=1]

The addresses are specified in pairs: the first is the adapter ID and the
second is the I2C address on that adapter.

The debug flags are bit flags for each BMC found. They are:
IPMI messages: 1, driver state: 2, timing: 4, I2C probe: 8
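
Since the debug flags are bit flags, several can be combined by adding
(or OR-ing) their values. The sketch below just names the four bits
listed above and shows how a combined value is tested. The enum and
function names are invented for this example; only the numeric values
come from the list above.

```c
#include <assert.h>

/* The four debug bits listed above; the names are made up here. */
enum ex_smb_dbg {
	EX_DBG_MSG       = 1,	/* IPMI messages */
	EX_DBG_STATE     = 2,	/* driver state */
	EX_DBG_TIMING    = 4,	/* timing */
	EX_DBG_I2C_PROBE = 8	/* I2C probe */
};

/* Non-zero if the given debug bit is set in a dbg= value. */
int ex_dbg_enabled(unsigned int flags, enum ex_smb_dbg which)
{
	assert(which != 0);
	return (flags & (unsigned int)which) != 0;
}
```

For instance, a value of 5 turns on IPMI message and timing debugging
for that BMC and leaves the other two off.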

Setting defaultprobe to zero disables the default probing of SMBus
interfaces at address range 0x20 to 0x4f. This means that only the
BMCs specified on the addr line will be detected.

Setting dbg_probe to 1 will enable debugging of the probing and
detection process for BMCs on the SMBusses.

Discovering the IPMI compliant BMC on the SMBus can cause devices
on the I2C bus to fail. The SMBus driver writes a "Get Device ID" IPMI
message as a block write to the I2C bus and waits for a response.
This action can be detrimental to some I2C devices. It is highly recommended
that the known I2C address be given to the SMBus driver in the addr
parameter. The default address range will not be used when an addr
parameter is provided.

When compiled into the kernel, the addresses can be specified on the
kernel command line as:

  ipmi_smb.addr=<adapter1>,<i2caddr1>[,<adapter2>,<i2caddr2>[,...]]
        ipmi_smb.dbg=<flags1>,<flags2>...
        ipmi_smb.defaultprobe=0 ipmi_smb.dbg_probe=1

These are the same options as on the module command line.

Note that you might need some I2C changes if CONFIG_IPMI_PANIC_EVENT
is enabled along with this, so the I2C driver knows to run to
completion during sending a panic event.


Other Pieces
------------

Watchdog
--------

A watchdog timer is provided that implements the Linux-standard
watchdog timer interface. It has several module parameters that can be
used to control it:

  modprobe ipmi_watchdog timeout=<t> pretimeout=<t> action=<action type>
      preaction=<preaction type> preop=<preop type> start_now=x
      nowayout=x

The timeout is the number of seconds to the action, and the pretimeout
is the number of seconds before the reset that the pre-timeout panic will
occur (if pretimeout is zero, then pretimeout will not be enabled). Note
that the pretimeout is the time before the final timeout. So if the
timeout is 50 seconds and the pretimeout is 10 seconds, then the pretimeout
will occur in 40 seconds (10 seconds before the timeout).
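
The relationship between the two values is simple subtraction. The
helper below is only an illustration of the arithmetic described
above (its name and the -1 return convention are made up); it is not
driver code.

```c
#include <assert.h>

/*
 * Seconds after the timer is started at which the pretimeout fires,
 * or -1 if the pretimeout is disabled (pretimeout == 0).
 */
int ex_pretimeout_fires_at(int timeout, int pretimeout)
{
	assert(timeout > 0);
	assert(pretimeout >= 0 && pretimeout < timeout);

	if (pretimeout == 0)
		return -1;
	return timeout - pretimeout;
}
```

With timeout=50 and pretimeout=10 this gives 40, the same numbers as
in the example above.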

The action may be "reset", "power_cycle", or "power_off", and
specifies what to do when the timer times out. It defaults to
"reset".

The preaction may be "pre_smi" for an indication through the SMI
interface, "pre_int" for an indication through the SMI with an
interrupt, and "pre_nmi" for an NMI on a preaction. This is how
the driver is informed of the pretimeout.

The preop may be set to "preop_none" for no operation on a pretimeout,
"preop_panic" to set the preoperation to panic, or "preop_give_data"
to provide data to read from the watchdog device when the pretimeout
occurs. A "pre_nmi" setting CANNOT be used with "preop_give_data"
because you can't do data operations from an NMI.

When preop is set to "preop_give_data", one byte comes ready to read
on the device when the pretimeout occurs. Select and fasync work on
the device, as well.

If start_now is set to 1, the watchdog timer will start running as
soon as the driver is loaded.

If nowayout is set to 1, the watchdog timer will not stop when the
watchdog device is closed. The default value of nowayout is true
if the CONFIG_WATCHDOG_NOWAYOUT option is enabled, or false if not.

When compiled into the kernel, the kernel command line is available
for configuring the watchdog:

  ipmi_watchdog.timeout=<t> ipmi_watchdog.pretimeout=<t>
        ipmi_watchdog.action=<action type>
        ipmi_watchdog.preaction=<preaction type>
        ipmi_watchdog.preop=<preop type>
        ipmi_watchdog.start_now=x
        ipmi_watchdog.nowayout=x

The options are the same as the module parameter options.

The watchdog will panic and start a 120 second reset timeout if it
gets a pre-action. During a panic or a reboot, the watchdog will
start a 120 second timer if it is running to make sure the reboot
occurs.

Note that if you use the NMI preaction for the watchdog, you MUST
NOT use nmi watchdog mode 1. If you use the NMI watchdog, you
must use mode 2.

Once you open the watchdog timer, you must write a 'V' character to the
device to close it, or the timer will not stop. This is a new semantic
for the driver, but makes it consistent with the rest of the watchdog
drivers in Linux.
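
A minimal userland sketch of the 'V' rule follows. The function name
is made up for this example, and it takes an already-open file
descriptor so that it makes no assumption about the device path
(typically /dev/watchdog on Linux, but check your system).

```c
#include <assert.h>
#include <unistd.h>

/*
 * Write the 'V' character and then close the watchdog device.
 * Returns 0 on success, -1 if the write or the close failed.
 * The descriptor is closed in either case.
 */
int ex_watchdog_release(int fd)
{
	int rc = 0;

	assert(fd >= 0);

	if (write(fd, "V", 1) != 1)
		rc = -1;
	if (close(fd) != 0)
		rc = -1;
	return rc;
}
```

If the write fails, the timer should be assumed to still be running
after the close. Keep in mind that with nowayout set to 1 the timer
does not stop on close, as described above.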


Panic Timeouts
--------------

The OpenIPMI driver supports the ability to put semi-custom and custom
events in the system event log if a panic occurs. If you enable the
'Generate a panic event to all BMCs on a panic' option, you will get
one event on a panic in a standard IPMI event format. If you enable
the 'Generate OEM events containing the panic string' option, you will
also get a bunch of OEM events holding the panic string.


The field settings of the events are:
* Generator ID: 0x21 (kernel)
* EvM Rev: 0x03 (this event is formatted in IPMI 1.0 format)
* Sensor Type: 0x20 (OS critical stop sensor)
* Sensor #: The first byte of the panic string (0 if no panic string)
* Event Dir | Event Type: 0x6f (Assertion, sensor-specific event info)
* Event Data 1: 0xa1 (Runtime stop in OEM bytes 2 and 3)
* Event data 2: second byte of panic string
* Event data 3: third byte of panic string
See the IPMI spec for the details of the event layout. This event is
always sent to the local management controller. It will handle routing
the message to the right place.

Other OEM events have the following format:
Record ID (bytes 0-1): Set by the SEL.
Record type (byte 2): 0xf0 (OEM non-timestamped)
byte 3: The slave address of the card saving the panic
byte 4: A sequence number (starting at zero)
The rest of the bytes (11 bytes) are the panic string. If the panic string
is longer than 11 bytes, multiple messages will be sent with increasing
sequence numbers.
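
The OEM record layout above can be sketched as follows. This only
illustrates how a panic string would be split into 16-byte records
with 11 bytes of string each; it is not the driver's code. The names
are made up, the record ID bytes are left at zero since the SEL sets
them, and padding a short final record with zero bytes is an
assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

#define EX_OEM_RECORD_LEN  16	/* bytes 0-4 header + 11 bytes of string */
#define EX_OEM_PAYLOAD_LEN 11

/*
 * Split `panic` into OEM records.  Returns the number of records
 * written, which is at most max_records.
 */
int ex_build_panic_records(const char *panic, unsigned char slave_addr,
			   unsigned char out[][EX_OEM_RECORD_LEN],
			   int max_records)
{
	size_t left = strlen(panic);
	int seq = 0;

	assert(max_records >= 0);

	while (left > 0 && seq < max_records) {
		size_t n = left < (size_t)EX_OEM_PAYLOAD_LEN
			 ? left : (size_t)EX_OEM_PAYLOAD_LEN;
		unsigned char *rec = out[seq];

		memset(rec, 0, EX_OEM_RECORD_LEN); /* record ID left as 0 */
		rec[2] = 0xf0;			   /* OEM non-timestamped */
		rec[3] = slave_addr;		   /* card saving the panic */
		rec[4] = (unsigned char)seq;	   /* sequence number */
		memcpy(&rec[5], panic, n);	   /* next chunk of string */

		panic += n;
		left -= n;
		seq++;
	}
	return seq;
}
```

An 18-byte panic string therefore produces two records: sequence 0
with the first 11 bytes and sequence 1 with the remaining 7.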

Because you cannot send OEM events using the standard interface, this
function will attempt to find an SEL and add the events there. It
will first query the capabilities of the local management controller.
If it has an SEL, then they will be stored in the SEL of the local
management controller. If not, and the local management controller is
an event generator, the event receiver from the local management
controller will be queried and the events sent to the SEL on that
device. Otherwise, the events go nowhere since there is nowhere to
send them.


Poweroff
--------

If the poweroff capability is selected, the IPMI driver will install
a shutdown function into the standard poweroff function pointer. This
is in the ipmi_poweroff module. When the system requests a powerdown,
it will send the proper IPMI commands to do this. This is supported on
several platforms.

There is a module parameter named "poweroff_powercycle" that may
either be zero (do a power down) or non-zero (do a power cycle: power
the system off, then power it on in a few seconds). Setting
ipmi_poweroff.poweroff_powercycle=x will do the same thing on the kernel
command line. The parameter is also available via the proc filesystem
in /proc/sys/dev/ipmi/poweroff_powercycle. Note that if the system
does not support power cycling, it will always do the power off.

Note that if you have ACPI enabled, the system will prefer using ACPI to
power off.