ia64/linux-2.6.18-xen.hg

annotate Documentation/BUG-HUNTING @ 524:7f8b544237bf

netfront: Allow netfront in domain 0.

This is useful if your physical network device is in a utility domain.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
author Keir Fraser <keir.fraser@citrix.com>
date Tue Apr 15 15:18:58 2008 +0100 (2008-04-15)
Table of contents
=================

Last updated: 20 December 2005

Contents
========

- Introduction
- Devices not appearing
- Finding patch that caused a bug
-- Finding using git-bisect
-- Finding it the old way
- Fixing the bug

Introduction
============

Always try the latest kernel from kernel.org and build it from source. If you
are not confident doing that, please report the bug to your distribution
vendor instead of to a kernel developer.

Finding bugs is not always easy, but have a go anyway. If you can't find it,
don't give up: report as much as you have found to the relevant maintainer.
See MAINTAINERS to find who maintains the subsystem you have been working on.

Before you submit a bug report, read REPORTING-BUGS.

Devices not appearing
=====================

This is often caused by udev. Check that first before blaming the kernel.

Finding patch that caused a bug
===============================

Finding using git-bisect
------------------------

Using the provided tools with git makes finding bugs easy, provided the bug
is reproducible.

Steps to do it:
- start using git for the kernel source
- read the man page for git-bisect
- have fun
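
git-bisect automates the same halving procedure that "Finding it the old way"
below performs by hand with kernel version numbers. The C sketch that follows
shows only that core idea on a linear range of revisions; it is not how git
implements bisection (git also copes with merges and with commits that cannot
be tested). The has_bug() predicate is hypothetical: it pretends the bug first
appeared in 1.3.41 and stands in for "build, boot and try to reproduce".

```c
#include <stdbool.h>

/* Hypothetical stand-in for "build this kernel, boot it, try the bug".
 * Here we simply pretend the bug first appeared in 1.3.41. */
static bool has_bug(int minor)
{
    return minor >= 41;
}

/* Find the first bad revision between `good` (known to work) and `bad`
 * (known to fail), counting how many kernels had to be built.
 * Only the minor number of 1.3.x is used, to keep the sketch short. */
static int first_bad(int good, int bad, int *builds)
{
    *builds = 0;
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;  /* a kernel in the middle */

        ++*builds;
        if (has_bug(mid))
            bad = mid;   /* broken: the bug came in at or before mid */
        else
            good = mid;  /* works: the bug came in after mid */
    }
    return bad;          /* good and bad are now adjacent */
}
```

With this made-up predicate, first_bad(28, 69, &n) returns 41 after n == 5
builds, rather than building every release between 1.3.28 and 1.3.69.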

Finding it the old way
----------------------

[Sat Mar 2 10:32:33 PST 1996 KERNEL_BUG-HOWTO lm@sgi.com (Larry McVoy)]

This is how to track down a bug if you know nothing about kernel hacking.
It's a brute-force approach, but it works pretty well.

You need:

        . A reproducible bug - it has to happen predictably (sorry)
        . All the kernel tar files from a revision that worked to the
          revision that doesn't

You will then do:

        . Rebuild a revision that you believe works, install it, and verify
          that it really works.
        . Do a binary search over the kernels to figure out which one
          introduced the bug. For example, suppose 1.3.28 didn't have the
          bug, but you know that 1.3.69 does. Pick a kernel in the middle,
          like 1.3.50, then build and test it; if it works, pick the
          midpoint between .50 and .69, else the midpoint between .28
          and .50.
        . You'll narrow it down to the kernel that introduced the bug. You
          can probably do better than this, but it gets tricky.

        . Narrow it down to a subdirectory

          - Copy the kernel that works into "test". Let's say that 3.62
            works, but 3.63 doesn't. Run diff -r on those two kernels and
            come up with a list of directories that changed. For each of
            those directories:

                Copy the non-working directory next to the working
                directory as "dir.63". Then, one directory at a time, move
                the working directory aside as "dir.62" and put the
                non-working one in its place:

                        mv dir dir.62
                        mv dir.63 dir
                        find dir -name '*.[oa]' -print | xargs rm -f

                Then rebuild and retest (the find removes stale .o and .a
                files so that the directory really gets rebuilt). Assuming
                that all related changes were contained in the
                subdirectory, this should isolate the change to a
                directory. If the bug does not show up, swap the two
                directories back before trying the next one.

                Problems: changes in header files may have occurred; I've
                found in my case that they were self-explanatory - you may
                or may not want to give up when that happens.

        . Narrow it down to a file

          - You can apply the same technique to each file in the directory,
            hoping that the changes in that file are self-contained.

        . Narrow it down to a routine

          - You can take the old file and the new file and manually create
            a merged file that has

                #ifdef VER62
                routine()
                {
                        ...
                }
                #else
                routine()
                {
                        ...
                }
                #endif

            Then walk through that file one routine at a time, wrapping
            each pair like this:

                #define VER62
                /* both routines here */
                #undef VER62

            Recompile, retest, and move the #define/#undef pair until you
            find the routine that makes the difference.
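
As an illustration of the merged-file trick, here is a small self-contained C
fragment. The routine checksum() and the off-by-one "regression" in its 3.63
version are made up for the example; only the VER62 #define/#ifdef/#undef
arrangement comes from the steps above. With the #define in place the 3.62
version is compiled; deleting that line selects the 3.63 version.

```c
/* Merged file: both versions of one routine, selected by VER62.
 * checksum() is a made-up example, not real kernel code. */

#define VER62            /* delete this line to build the 3.63 version */

#ifdef VER62
/* 3.62 version (known good): sums every byte */
static int checksum(const unsigned char *buf, int len)
{
    int sum = 0;

    for (int i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
#else
/* 3.63 version (suspect): off-by-one, the last byte is skipped */
static int checksum(const unsigned char *buf, int len)
{
    int sum = 0;

    for (int i = 0; i < len - 1; i++)
        sum += buf[i];
    return sum;
}
#endif

#undef VER62             /* routines below this point get the 3.63 code */
```

If the bug disappears while checksum() is wrapped this way, the change in that
routine is a likely culprit; if not, move the #define/#undef pair on to the
next routine.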

Finally, take all the information you have (kernel revisions, bug
description, and the extent to which you have narrowed it down) and pass
it on to whoever you believe is the maintainer of that section. A post to
linux.dev.kernel isn't such a bad idea if you've done some work to narrow
it down.

If you get it down to a routine, you'll probably get a fix in 24 hours.

My apologies to Linus and the other kernel hackers for describing this
brute-force approach; it's hardly what a kernel hacker would do. However,
it does work, and it lets non-hackers help fix bugs. And it is cool
because Linux snapshots let you do this - something that you can't do
with vendor-supplied releases.

Fixing the bug
==============

Nobody is going to tell you how to fix bugs. Seriously. You need to work it
out. But below are some hints on how to use the tools.

To debug a kernel, use objdump and look up the hex offset from the crash
output to find the matching line of code or assembler. Without debug
symbols you will only see the assembler code for the routine shown; if your
kernel has debug symbols, the C code will be shown as well. (Debug symbols
can be enabled in the "Kernel hacking" menu of menuconfig, via
CONFIG_DEBUG_INFO.) For example:

    objdump -r -S -l --disassemble net/dccp/ipv4.o

NB: you need to be at the top level of the kernel tree for this to pick up
your C files.
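
The hex offset comes from a location string of the form
function+offset/length, such as ip_queue_xmit+0x14/0x4c0 in Dave Miller's
example: the faulting instruction is 0x14 bytes into ip_queue_xmit, which is
0x4c0 bytes long. A small C sketch of splitting such a string, assuming
exactly that format (struct oops_loc and parse_oops_loc() are hypothetical
names):

```c
#include <stdio.h>

struct oops_loc {
    char func[64];         /* function name, e.g. "ip_queue_xmit" */
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long size;    /* total length of the function in bytes */
};

/* Parse "func+0xOFF/0xLEN". Returns 1 on success, 0 otherwise.
 * %lx accepts an optional 0x prefix, so "0x14" parses as 0x14. */
static int parse_oops_loc(const char *s, struct oops_loc *out)
{
    return sscanf(s, "%63[^+]+%lx/%lx",
                  out->func, &out->offset, &out->size) == 3;
}
```

With the result, look for the instruction at the start address of
ip_queue_xmit plus 0x14 in the objdump listing; the length can serve as a
rough check that your object file contains the same build of the function.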

If you don't have access to the code, you can also debug some crash dumps
directly, for example crash dump output as shown by Dave Miller:

>    EIP is at ip_queue_xmit+0x14/0x4c0
>     ...
>    Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
>    00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
>    <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85
>
>    Put the bytes into a "foo.s" file like this:
>
>           .text
>           .globl foo
>    foo:
>           .byte  .... /* bytes from Code: part of OOPS dump */
>
>    Compile it with "gcc -c -o foo.o foo.s" then look at the output of
>    "objdump --disassemble foo.o".
>
>    Output:
>
>    ip_queue_xmit:
>        push       %ebp
>        push       %edi
>        push       %esi
>        push       %ebx
>        sub        $0xbc, %esp
>        mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
>        mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
>        mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt
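
Copying the Code: bytes into foo.s by hand is error-prone. The C sketch below
(code_to_byte_line() is a hypothetical helper, not a kernel or binutils tool)
turns the bytes into one .byte line. It skips a leading "Code:" tag and drops
the angle brackets around the marked byte, which in i386 oops output is the
byte at EIP, i.e. the start of the faulting instruction, so note that
position yourself.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = (char)tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Convert the hex bytes of an oops "Code:" dump into one ".byte" line
 * for foo.s. Anything that is not a pair of hex digits (spaces,
 * newlines, '<', '>') is skipped. Returns the number of bytes written,
 * or -1 if `out` is too small. */
static int code_to_byte_line(const char *code, char *out, size_t outlen)
{
    const char *p = strstr(code, "Code:");
    size_t used;
    int count = 0;
    int n;

    p = p ? p + strlen("Code:") : code;   /* "C", "d", "e" are hex digits */

    n = snprintf(out, outlen, "\t.byte ");
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    used = (size_t)n;

    while (*p) {
        int hi = hex_nibble(p[0]);
        int lo = hi >= 0 ? hex_nibble(p[1]) : -1;

        if (hi < 0 || lo < 0) {
            p++;
            continue;
        }
        n = snprintf(out + used, outlen - used, "%s0x%02x",
                     count ? "," : "", hi * 16 + lo);
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
        count++;
        p += 2;
    }
    return count;
}
```

Pass all lines of the Code: dump as one string. The dump does not
necessarily start on an instruction boundary, so the disassembly of bytes
before the marked one may be misaligned. On a 64-bit host you may need
"gcc -m32 -c -o foo.o foo.s" so that objdump disassembles in 32-bit mode
to match an i386 oops.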

Another very useful option in the Kernel Hacking section of menuconfig is
Debug memory allocations. It helps you spot data that is used before it has
been initialised, and similar bugs. To see the values this option writes
into memory, look at mm/slab.c and search for POISON_INUSE. With this
option enabled, an Oops will often show the poisoned data instead of zero,
which is what you would usually see otherwise.
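
The effect of poisoning can be imitated in user space. The sketch below is an
analogy only, not the kernel's slab code: debug_alloc(), struct conn and
conn_create() are made up, and the 0x5a fill value is assumed to match
POISON_INUSE in mm/slab.c of this era (check your own tree). Because every
fresh allocation is filled with 0x5a, a field that the code forgets to
initialise reads back as 0x5a5a5a5a rather than an innocent-looking zero.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to match POISON_INUSE in mm/slab.c; verify in your tree. */
#define DEMO_POISON_INUSE 0x5a

/* Toy debug allocator: fill new memory with the poison byte so that
 * anything read before being written stands out in a dump. */
static void *debug_alloc(size_t size)
{
    void *p = malloc(size);

    if (p)
        memset(p, DEMO_POISON_INUSE, size);
    return p;
}

struct conn {
    int state;
    uint32_t flags;      /* bug: conn_create() never sets this */
};

static struct conn *conn_create(void)
{
    struct conn *c = debug_alloc(sizeof(*c));

    if (c)
        c->state = 1;    /* flags is forgotten and keeps the poison */
    return c;
}
```

Seeing a repeated 0x5a pattern in a register or field in an Oops is therefore
a strong hint that something was used before it was initialised.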

Once you have worked out a fix, please submit it upstream. After all, open
source is about sharing what you do, and don't you want to be recognised for
your genius?

Please do read Documentation/SubmittingPatches first, though, to help your
code get accepted.