ia64/xen-unstable

changeset 18866:12c0acf08caf

New document on error handling in Xen.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Thu Dec 04 12:35:22 2008 +0000 (2008-12-04)
parents cb289056b523
children a71c610cc9e6
files docs/misc/xen-error-handling.txt
     1.1 --- /dev/null	Thu Jan 01 00:00:00 1970 +0000
     1.2 +++ b/docs/misc/xen-error-handling.txt	Thu Dec 04 12:35:22 2008 +0000
     1.3 @@ -0,0 +1,81 @@
     1.4 +Error handling in Xen
     1.5 +---------------------
     1.6 +
     1.7 +1. domain_crash()
     1.8 +-----------------
     1.9 +Crashes the specified domain due to buggy or unsupported behaviour of the
    1.10 +guest. This should not be used where the hypervisor itself is in
    1.11 +error, even if the scope of that error affects only a single
    1.12 +domain. BUG() is a more appropriate failure method for hypervisor
    1.13 +bugs. To repeat: domain_crash() is the correct response for erroneous
    1.14 +or unsupported *guest* behaviour!
    1.15 +
    1.16 +Note that this should be used in most cases in preference to
    1.17 +domain_crash_synchronous(): domain_crash() returns to the caller,
    1.18 +allowing the crash to be deferred for the currently executing VCPU
    1.19 +until certain resources (notably, spinlocks) have been released.
    1.20 +
    1.21 +Example usages:
    1.22 + * Unrecoverable guest kernel stack overflows
    1.23 + * Unsupported corners of HVM device models
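The deferral described in section 1 can be sketched in plain C. This is only an illustration under stated assumptions: struct sketch_domain, sketch_domain_crash() and sketch_handle_guest_request() are hypothetical stand-ins, not Xen's real definitions, and the "lock" is a simple counter. The point it shows is that the crash request returns to the caller, so the lock can be released before the crash takes effect.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins: these are NOT Xen's real definitions. */
struct sketch_domain {
    int  id;
    bool crash_pending;  /* set by the crash request, acted on later */
    int  lock_depth;     /* models a held spinlock */
};

/* Models the deferred style: record the request and return to the caller. */
static void sketch_domain_crash(struct sketch_domain *d)
{
    d->crash_pending = true;
    fprintf(stderr, "domain %d: crash requested\n", d->id);
}

/*
 * Hypothetical guest-request handler. Returns 0 on success, -1 if the
 * guest supplied an unsupported value and has been marked for crashing.
 */
static int sketch_handle_guest_request(struct sketch_domain *d,
                                       unsigned int reg)
{
    int rc = 0;

    d->lock_depth++;              /* stands in for taking a spinlock */
    if (reg > 3) {                /* erroneous *guest* behaviour */
        sketch_domain_crash(d);   /* returns: the lock is still held here */
        rc = -1;
    }
    d->lock_depth--;              /* released before the crash takes effect */
    return rc;
}
```

A caller would check the return value and unwind normally. A synchronous variant could not be used at the marked point without leaving the lock held.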
    1.24 +
    1.25 +2. BUG()
    1.26 +--------
    1.27 +Crashes the host system with an informative file/line error message
    1.28 +and a backtrace. Use this to check consistency assumptions within the
    1.29 +hypervisor.
    1.30 +
    1.31 +Be careful not to use BUG() (or BUG_ON(), or ASSERT()) for failures
    1.32 +*outside* the hypervisor software -- in particular, guest bugs (where
    1.33 +domain_crash() is more appropriate) or non-critical BIOS or hardware
    1.34 +errors (where retry or feature disable are more appropriate).
    1.35 +
    1.36 +Example usage: In arch/x86/hvm/i8254.c an I/O port handler includes
    1.37 +the check BUG_ON(bytes != 1). We choose this extreme reaction to the
    1.38 +unexpected error case because, although it could be handled by failing
    1.39 +the I/O access or crashing the domain, it is indicative of an
    1.40 +unexpected inconsistency in the hypervisor itself (since the I/O
    1.41 +handler was only registered for single-byte accesses).
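The reasoning behind the i8254 example can be sketched as follows. This is not the code from arch/x86/hvm/i8254.c: SKETCH_BUG_ON(), sketch_regs and sketch_port_read() are hypothetical names, and the macro merely imitates the file/line report with abort(). The handler is assumed to be registered for single-byte accesses only, so any other size points at an inconsistency in the dispatching code rather than at the guest.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for BUG_ON(): report file/line, then stop everything. */
#define SKETCH_BUG_ON(cond)                                         \
    do {                                                            \
        if (cond) {                                                 \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);  \
            abort();                                                \
        }                                                           \
    } while (0)

/* Hypothetical device state: four one-byte registers. */
static uint8_t sketch_regs[4] = { 0x10, 0x20, 0x30, 0x40 };

/*
 * Hypothetical read handler, registered (by assumption) only for
 * single-byte accesses. Any other size means the dispatch code is
 * inconsistent, which no guest action should be able to cause.
 */
static uint32_t sketch_port_read(unsigned int port, unsigned int bytes)
{
    SKETCH_BUG_ON(bytes != 1);
    return sketch_regs[port & 3];
}
```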
    1.42 +
    1.43 +
    1.44 +3. BUG_ON()
    1.45 +-----------
    1.46 +BUG_ON(...) is merely a convenient short form for "if (...) BUG()". It
    1.47 +is most commonly used as an 'always on' alternative to ASSERT().
    1.48 +
    1.49 +
    1.50 +4. ASSERT()
    1.51 +-----------
    1.52 +Similar to BUG_ON(), except that it is only enabled for debug builds
    1.53 +of the hypervisor. Typically ASSERT() is used only where the (usually
    1.54 +small) overheads of an always-on debug check might be considered
    1.55 +excessive. A good example might be within inner loops of time-critical
    1.56 +functions, or where an assertion is extreme paranoia (considered
    1.57 +*particularly* unlikely ever to fail).
    1.58 +
    1.59 +In general, if in doubt, use BUG_ON() in preference to ASSERT().
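The difference between the always-on and debug-only checks of sections 3 and 4 can be illustrated with a small sketch. The SKETCH_ macros and the SKETCH_DEBUG switch are assumptions made for this example and are not Xen's actual definitions. The cheap once-per-call check stays always on, while the per-iteration check inside the loop is compiled out unless the debug switch is defined.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-ins for BUG(), BUG_ON() and ASSERT(). */
#define SKETCH_BUG()                                            \
    do {                                                        \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);  \
        abort();                                                \
    } while (0)

/* Short form for "if (...) BUG()". */
#define SKETCH_BUG_ON(p) do { if (p) SKETCH_BUG(); } while (0)

/* Only active when the (assumed) debug switch is defined. */
#ifdef SKETCH_DEBUG
#define SKETCH_ASSERT(p) SKETCH_BUG_ON(!(p))
#else
#define SKETCH_ASSERT(p) ((void)0)
#endif

/* Sums n values that are each expected to fit in one byte. */
static unsigned long sketch_sum(const unsigned int *v, size_t n)
{
    unsigned long total = 0;

    SKETCH_BUG_ON(v == NULL);        /* once per call: always on */
    for (size_t i = 0; i < n; i++) {
        SKETCH_ASSERT(v[i] <= 255);  /* inner loop: debug builds only */
        total += v[i];
    }
    return total;
}
```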
    1.60 +
    1.61 +
    1.62 +5. panic()
    1.63 +----------
    1.64 +Like BUG() and ASSERT(), this will crash and reboot the host
    1.65 +system. However, it does so after printing only an error message, with
    1.66 +no extra diagnostic information such as a backtrace. panic() is
    1.67 +generally used where an unsupported system configuration is detected,
    1.68 +particularly during boot, and where extra diagnostic information about
    1.69 +CPU context would not be useful. It may also be used before exception
    1.70 +handling is enabled during Xen bootstrap (on x86, BUG() and ASSERT()
    1.71 +depend on Xen's exception-handling capabilities).
    1.72 +
    1.73 +Example usage: Most commonly for out-of-memory errors during
    1.74 +bootstrap. The failure is unexpected since a host should always have
    1.75 +enough memory to boot Xen, but if the failure does occur then the
    1.76 +context of the failed memory allocation itself is not very
    1.77 +interesting.
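The out-of-memory case can be sketched with a toy boot-time allocator. Everything here is hypothetical: sketch_panic() imitates a message-only failure with exit(), and sketch_boot_alloc() hands out bytes from a fixed static arena with no alignment handling. It only shows where a message-only failure fits: the caller's context adds nothing, so a plain message is enough.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for panic(): message only, no backtrace, then stop. */
static void sketch_panic(const char *msg)
{
    fprintf(stderr, "panic: %s\n", msg);
    exit(EXIT_FAILURE);
}

/* Hypothetical boot-time arena; the size is arbitrary. */
static unsigned char sketch_arena[4096];
static size_t sketch_arena_used;

/*
 * Hypothetical bump allocator for bootstrap use. It never returns
 * NULL: running out of memory this early is treated as fatal.
 * Alignment is deliberately ignored to keep the sketch short.
 */
static void *sketch_boot_alloc(size_t bytes)
{
    void *p;

    if (bytes > sizeof(sketch_arena) - sketch_arena_used)
        sketch_panic("out of memory during bootstrap");

    p = &sketch_arena[sketch_arena_used];
    sketch_arena_used += bytes;
    return p;
}
```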
    1.78 +
    1.79 +
    1.80 +6. Feature disable
    1.81 +------------------
    1.82 +Disabling a feature is a possible alternative to crashing the hypervisor
    1.83 +on a boot-time error. It is particularly appropriate when parsing
    1.84 +non-critical BIOS tables and detecting extended hardware features.
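A minimal sketch of the feature-disable approach, assuming a firmware table whose bytes must sum to zero modulo 256 (the rule ACPI uses for its tables). The names sketch_feature_enabled, sketch_checksum_ok() and sketch_probe_feature() are hypothetical. A bad or missing table switches the optional feature off and lets boot continue.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag consulted by the rest of the system. */
static bool sketch_feature_enabled = true;

/* Assumed validity rule: all table bytes sum to zero modulo 256. */
static bool sketch_checksum_ok(const uint8_t *t, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + t[i]);
    return sum == 0;
}

/*
 * Hypothetical probe for an optional feature. A missing or corrupt
 * table is not fatal: log it, disable the feature, and carry on.
 */
static void sketch_probe_feature(const uint8_t *table, size_t len)
{
    if (table == NULL || len == 0 || !sketch_checksum_ok(table, len)) {
        fprintf(stderr, "sketch: bad or missing table, feature disabled\n");
        sketch_feature_enabled = false;
        return;
    }
    sketch_feature_enabled = true;
}
```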