Bug 6180

Summary: XFS oopses on my box sometimes
Product: File System
Reporter: Avuton Olrich (avuton)
Component: XFS
Assignee: XFS Guru (xfs-masters)
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
Priority: P2
Hardware: i386
OS: Linux
Kernel Version: v2.6.16-rc5+
Subsystem:
Regression: ---
Bisected commit-id:
Attachments: .config

Description Avuton Olrich 2006-03-07 02:18:27 UTC
Most recent kernel where this bug did not occur:??  
Distribution:Gentoo  
Hardware Environment:x86  
Software Environment:  
Problem Description:  
I'm not sure when this started, but it has been happening for (at least)
the last week; I regularly pull git. I just haven't had the time
to record this oops output until now, and I had to use the
hand-and-paper method. I did try to verify it as well as possible. Also,
I ran fsck to fix errors on this disk in the last week, but it appears
the problem still exists.
  
It does happen often, but I haven't happened to have netconsole on  
when it happens. Also S3 does work on this laptop and I use it often,  
although I'm not sure that's any help.  
  
I'm not sure when I changed to XFS (within the last two months) and  
I'm also not sure when this appeared to start happening :/ This  
problem does corrupt whatever I was last working on (usually source  
files, it makes them into binaries, at least until I fsck)  
 
I also have no way to just 'reproduce' this. After staying on this box for a 
while sometimes it will happen. 
  
Linux micromachine 2.6.16-rc5 #16 PREEMPT Sun Mar 5 13:13:58 PST 2006  
i686 Transmeta(tm) Crusoe(tm) Processor TM5800 GNU/Linux  
  
kernel: 2.6.16-rc5 git: >master      ea088b8d481fcff001f7e628c44daf39a229d9fc  
  
Unable to handle kernel NULL pointer dereference at virtual address: 00000000  
print eip:  
c022b3c1  
*pde = 00000000  
Oops: 0000 [#1]  
PREEMPT  
Modules linked in: snd_seq snd_seq_device snd_ali5154 snd_ac97_codec  
snd_ac97_bus snd_pcm snd_timer snd snd_page_alloc ext2 vfat  
CPU: 0  
EIP: 0060:[<c022b3c1>] Not tainted VLI  
EFLAGS: 00010246 (2.6.16-rc5 #15)  
EIP is at xfs_read+0x121/0x2e0  
eax: 00000000 ebx: 00001000 ecx: d0a30768 edx: cd2660b8  
esi: fffffffb edi: 00000000 ebp: 00000001 esp: d89d5e68  
ds:  007b es: 0076 ss: 0068  
Process xdm (pid 29670, threadinfo=d89d5000 task=d6cc7a90)  
Stack: <0>db6c1000 c1578990 c145f220 c01457bd b7fdb000 d89d5f64  
d89d5ed0 d89d5ef8 d0a30788 cd26b0b8 00000000 00000000 c6258f40 cd26b5d4  
d0a30768 c154c400 00000000 c0416ba0 00000000 d89d5ef8 cd26b5d4  
c0227ff6 00000001 d89d5f40  
  
Call Trace:  
[<c01457bd>] unmap_region+0xad/0x110  
[<c0227ff6>] linvfs_aio_read+0x76/0xa0  
[<c0154858>] do_sync_read+0xb8/0x100  
[<c012b7e0>] autoremove_wake_function+0x0/0x50  
[<c01552a1>] vfs_read+0xa1/0x160  
[<c01547a0>] do_sync_read+0x0/0x100  
[<c0155721>] sys_read+0x41/0x70  
[<c0102ea3>] sysenter_past_esp+0x54/0x75  
  
Code: 00 89 d1 83 e0 10 09 c1 75 c2 8b 44 24 2c 85 c0 0f 85 bc 01 00  
00 8b 44 24 38 ba 02 00 00 00 e8 26 2e fd ff 8b 54 24 24 8b 42 04 <f6>  
00 04 0f 85 13 01 00 00 8b 44 24 5c 89 e9 8b 54 24 18 89 04  
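
For anyone reading the hand-copied oops above: the "EIP is at" line gives the faulting location as name+offset/size, and the byte in angle brackets on the Code: line is where EIP pointed. A minimal Python sketch for pulling those values out follows. The helper names parse_symbol and faulting_bytes are made up for this illustration and are not part of any kernel tool. The bytes from the marker onward (f6 00 04) look like a byte test through %eax, which would fit eax being 00000000 in the register dump, though that reading is only as reliable as the hand transcription.

```python
import re

# Strings copied from the oops in this report.
EIP_SYMBOL = "xfs_read+0x121/0x2e0"
CODE_LINE = (
    "00 89 d1 83 e0 10 09 c1 75 c2 8b 44 24 2c 85 c0 0f 85 bc 01 00 "
    "00 8b 44 24 38 ba 02 00 00 00 e8 26 2e fd ff 8b 54 24 24 8b 42 04 <f6> "
    "00 04 0f 85 13 01 00 00 8b 44 24 5c 89 e9 8b 54 24 18 89 04"
)


def parse_symbol(text):
    """Split 'name+0xOFFSET/0xSIZE' into (name, offset, size) as integers."""
    m = re.fullmatch(r"(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", text)
    if m is None:
        raise ValueError(f"not a symbol+offset/size string: {text!r}")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def faulting_bytes(code_line):
    """Return (bytes_before_eip, bytes_from_eip_onward) for an oops Code: dump.

    The byte wrapped in <..> is the first byte of the faulting instruction.
    """
    tokens = code_line.split()
    for index, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            tail = [token[1:-1]] + tokens[index + 1:]
            return index, bytes(int(t, 16) for t in tail)
    raise ValueError("no <..> marker found in code line")


if __name__ == "__main__":
    name, offset, size = parse_symbol(EIP_SYMBOL)
    print(f"{name}: fault at byte {offset} of a {size}-byte function")
    before, tail = faulting_bytes(CODE_LINE)
    print(f"{before} bytes precede EIP; faulting bytes begin {tail[:3].hex(' ')}")
```

Disassembling a locally built vmlinux at the same offset is still the real check; this only extracts the numbers needed to find the spot.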
  
Steps to reproduce:
Comment 1 Avuton Olrich 2006-03-07 02:19:28 UTC
Created attachment 7526 [details]
.config
Comment 2 Nathan Scott 2006-03-13 22:56:06 UTC
| I ran fsck to fix errors on this disk in the last week,
| but it appears the problem still exists.

Could you unmount and run xfs_repair please, and capture the output.
(fsck is a no-op on XFS).

I spent some time last week on this, just didn't get a message out to
you - my disassembly of the code around the point of your panic gave
me no real clues - I think (not 100% sure though...) it's an xfs_inode
NULL pointer deref, but my disassembly didn't have instructions quite
lining up with yours (probably different gcc versions).  I'm taking a
stab in the dark that you may have a corrupt inode nlink field ondisk
for a particular inode, but that's just a guess at this stage based on
the oops.

Can you try to figure out more exactly when this started?  (Is there a
kernel version you've reverted back to where it doesn't occur?)  If
there is anything unusual in the xfs_repair output, pls add it here.
Also, is it always the xdm process (from your trace) that triggers
this or does it vary?  And is your stack trace EIP always in xfs_read
as in your hand-copied trace here?

thanks!
Comment 3 Avuton Olrich 2006-03-14 00:17:36 UTC
|Could you unmount and run xfs_repair please, and capture the output. 
|(fsck is a no-op on XFS). 
 
My apologies, I actually already ran xfs_repair. Since I've had it crash since
then, I will run it again and send you the output within the next 24h.
 
|I spent some time last week on this, just didn't get a message out to 
|you - my disassembly of the code around the point of your panic gave 
|me no real clues - I think (not 100% sure though...) it's an xfs_inode 
|NULL pointer deref, but my disassembly didn't have instructions quite 
|lining up with yours (probably different gcc versions).  I'm taking a 
|stab in the dark that you may have a corrupt inode nlink field ondisk 
|for a particular inode, but that's just a guess at this stage based on 
|the oops. 
 
I was probably running gcc-4.0.2 
 
|Can you try to figure out more exactly when this started?  (Is there a 
|kernel version you've reverted back to where it doesn't occur?)  If 
 
I'm sorry, I can't say exactly where it started; I believe it has been happening
since some point after 2.6.15 until now. Every time it happened I would update
from git and recompile, hoping not to see the issue again. I did that probably
3 or 4 times over 2 weeks.
 
|there is anything unusual in the xfs_repair output, pls add it here. 
 
I'll post the full output. 
 
|Also, is it always the xdm process (from your trace) that triggers 
|this or does it vary?   
 
It definitely varies. I've had it happen at different times. 
 
|And is your stack trace EIP always in xfs_read 
|as in your hand-copied trace here? 
 
I'm sorry, I only looked at the older traces long enough to see XFS being the 
cause and I had to reboot to get more work done :/. The good news is it hasn't 
happened in the last 5 days. I will get the xfs_repair output to you asap, 
thanks for looking at this. 
Comment 4 Avuton Olrich 2006-03-15 05:51:40 UTC
Granted it took 6 days, but the crash did reoccur. Unfortunately my netconsole 
didn't catch it so, here I give you the xfs_repair results: 
 
# xfs_repair -vvv &> xfs_results.out 
 
Phase 1 - find and verify superblock... 
Phase 2 - using internal log 
        - zero log... 
zero_log: head block 26403 tail block 26403 
        - scan filesystem freespace and inode maps... 
        - found root inode chunk 
Phase 3 - for each AG... 
        - scan and clear agi unlinked lists... 
error following ag 10 unlinked list 
        - process known inodes and perform inode discovery... 
        - agno = 0 
        - agno = 1 
        - agno = 2 
        - agno = 3 
        - agno = 4 
        - agno = 5 
        - agno = 6 
        - agno = 7 
        - agno = 8 
        - agno = 9 
        - agno = 10 
        - agno = 11 
        - agno = 12 
        - agno = 13 
        - agno = 14 
        - agno = 15 
        - process newly discovered inodes... 
Phase 4 - check for duplicate blocks... 
        - setting up duplicate extent list... 
        - clear lost+found (if it exists) ... 
        - clearing existing "lost+found" inode 
        - marking entry "lost+found" to be deleted 
        - check for inodes claiming duplicate blocks... 
        - agno = 0 
        - agno = 1 
        - agno = 2 
        - agno = 3 
        - agno = 4 
        - agno = 5 
        - agno = 6 
        - agno = 7 
        - agno = 8 
        - agno = 9 
        - agno = 10 
        - agno = 11 
        - agno = 12 
        - agno = 13 
        - agno = 14 
        - agno = 15 
Phase 5 - rebuild AG headers and trees... 
        - reset superblock... 
Phase 6 - check inode connectivity... 
        - resetting contents of realtime bitmap and summary inodes 
        - ensuring existence of lost+found directory 
        - traversing filesystem starting at / ...  
rebuilding directory inode 128 
        - traversal finished ...  
        - traversing all unattached subtrees ...  
        - traversals finished ...  
        - moving disconnected inodes to lost+found ...  
disconnected inode 101622, moving to lost+found 
disconnected dir inode 102373, moving to lost+found 
disconnected inode 940819, moving to lost+found 
disconnected dir inode 16777652, moving to lost+found 
disconnected dir inode 16777659, moving to lost+found 
disconnected dir inode 16884299, moving to lost+found 
disconnected dir inode 33573578, moving to lost+found 
disconnected dir inode 33578133, moving to lost+found 
disconnected dir inode 33705470, moving to lost+found 
disconnected inode 33712740, moving to lost+found 
disconnected dir inode 34072960, moving to lost+found 
disconnected dir inode 50331837, moving to lost+found 
disconnected dir inode 50346747, moving to lost+found 
disconnected inode 50410891, moving to lost+found 
disconnected inode 50423887, moving to lost+found 
disconnected dir inode 50458665, moving to lost+found 
disconnected dir inode 50550768, moving to lost+found 
disconnected inode 50925062, moving to lost+found 
disconnected inode 52335040, moving to lost+found 
disconnected inode 52335041, moving to lost+found 
disconnected inode 52335042, moving to lost+found 
disconnected inode 52335043, moving to lost+found 
disconnected inode 52418163, moving to lost+found 
disconnected inode 52418166, moving to lost+found 
disconnected inode 52438903, moving to lost+found 
disconnected inode 52438904, moving to lost+found 
disconnected inode 52438905, moving to lost+found 
disconnected inode 52438906, moving to lost+found 
disconnected inode 52438907, moving to lost+found 
disconnected inode 52438908, moving to lost+found 
disconnected dir inode 67109002, moving to lost+found 
disconnected dir inode 67109030, moving to lost+found 
disconnected inode 67164119, moving to lost+found 
disconnected dir inode 67175491, moving to lost+found 
disconnected dir inode 67191117, moving to lost+found 
disconnected dir inode 67206707, moving to lost+found 
disconnected dir inode 83886246, moving to lost+found 
disconnected inode 83950331, moving to lost+found 
disconnected dir inode 83963818, moving to lost+found 
disconnected inode 84151441, moving to lost+found 
disconnected dir inode 84424758, moving to lost+found 
disconnected dir inode 84438505, moving to lost+found 
disconnected dir inode 84569913, moving to lost+found 
disconnected dir inode 100663431, moving to lost+found 
disconnected dir inode 100837292, moving to lost+found 
disconnected inode 102206716, moving to lost+found 
disconnected inode 102206719, moving to lost+found 
disconnected inode 102206734, moving to lost+found 
disconnected inode 102206807, moving to lost+found 
disconnected inode 102206811, moving to lost+found 
disconnected inode 102207428, moving to lost+found 
disconnected dir inode 102379461, moving to lost+found 
disconnected inode 117472299, moving to lost+found 
disconnected inode 117473359, moving to lost+found 
disconnected inode 117485360, moving to lost+found 
disconnected inode 117485365, moving to lost+found 
disconnected inode 117485366, moving to lost+found 
disconnected inode 117485367, moving to lost+found 
disconnected inode 117485368, moving to lost+found 
disconnected inode 117485369, moving to lost+found 
disconnected inode 117485370, moving to lost+found 
disconnected inode 117485371, moving to lost+found 
disconnected inode 117485372, moving to lost+found 
disconnected inode 117485373, moving to lost+found 
disconnected dir inode 117492066, moving to lost+found 
disconnected dir inode 117527679, moving to lost+found 
disconnected dir inode 117533817, moving to lost+found 
disconnected inode 117544364, moving to lost+found 
disconnected inode 117548115, moving to lost+found 
disconnected dir inode 117647435, moving to lost+found 
disconnected dir inode 134300742, moving to lost+found 
disconnected dir inode 134301107, moving to lost+found 
disconnected dir inode 150995538, moving to lost+found 
disconnected dir inode 151010925, moving to lost+found 
disconnected dir inode 151070725, moving to lost+found 
disconnected dir inode 151476838, moving to lost+found 
disconnected dir inode 167772304, moving to lost+found 
disconnected dir inode 167772437, moving to lost+found 
disconnected inode 167772677, moving to lost+found 
disconnected inode 167824533, moving to lost+found 
disconnected dir inode 167828338, moving to lost+found 
disconnected dir inode 167832595, moving to lost+found 
disconnected inode 167836641, moving to lost+found 
disconnected dir inode 184549558, moving to lost+found 
disconnected inode 184550730, moving to lost+found 
disconnected inode 184599459, moving to lost+found 
disconnected inode 185581966, moving to lost+found 
disconnected dir inode 201326722, moving to lost+found 
disconnected dir inode 201326779, moving to lost+found 
disconnected dir inode 201731800, moving to lost+found 
disconnected dir inode 218103941, moving to lost+found 
disconnected inode 218119733, moving to lost+found 
disconnected dir inode 218129489, moving to lost+found 
disconnected inode 218244421, moving to lost+found 
disconnected inode 234890099, moving to lost+found 
disconnected inode 234890102, moving to lost+found 
disconnected inode 234941238, moving to lost+found 
disconnected dir inode 251658371, moving to lost+found 
disconnected dir inode 251658416, moving to lost+found 
disconnected dir inode 251695387, moving to lost+found 
disconnected inode 251894389, moving to lost+found 
disconnected inode 251894392, moving to lost+found 
disconnected inode 251894394, moving to lost+found 
disconnected inode 251894395, moving to lost+found 
disconnected inode 251894397, moving to lost+found 
disconnected inode 251894398, moving to lost+found 
disconnected inode 251894399, moving to lost+found 
disconnected inode 251897728, moving to lost+found 
disconnected inode 251897731, moving to lost+found 
disconnected inode 251897732, moving to lost+found 
disconnected inode 251897733, moving to lost+found 
disconnected inode 251897734, moving to lost+found 
disconnected inode 252658848, moving to lost+found 
Phase 7 - verify and correct link counts... 
resetting inode 102373 nlinks from 0 to 2 
resetting inode 16777652 nlinks from 0 to 2 
resetting inode 16777659 nlinks from 0 to 2 
resetting inode 16884299 nlinks from 0 to 2 
resetting inode 33573578 nlinks from 0 to 2 
resetting inode 33578133 nlinks from 0 to 2 
resetting inode 33705470 nlinks from 0 to 2 
resetting inode 34072960 nlinks from 0 to 2 
resetting inode 50331837 nlinks from 0 to 2 
resetting inode 50346747 nlinks from 0 to 2 
resetting inode 50458665 nlinks from 0 to 2 
resetting inode 50550768 nlinks from 0 to 2 
resetting inode 67109002 nlinks from 0 to 2 
resetting inode 67109030 nlinks from 0 to 2 
resetting inode 67175491 nlinks from 0 to 2 
resetting inode 67191117 nlinks from 0 to 2 
resetting inode 67206707 nlinks from 0 to 2 
resetting inode 83886246 nlinks from 0 to 2 
resetting inode 83963818 nlinks from 0 to 2 
resetting inode 84424758 nlinks from 0 to 2 
resetting inode 84438505 nlinks from 0 to 2 
resetting inode 100663431 nlinks from 0 to 2 
resetting inode 100837292 nlinks from 0 to 2 
resetting inode 102379461 nlinks from 0 to 2 
resetting inode 117492066 nlinks from 0 to 2 
resetting inode 117527679 nlinks from 0 to 2 
resetting inode 117533817 nlinks from 0 to 2 
resetting inode 117647435 nlinks from 0 to 2 
resetting inode 134300742 nlinks from 0 to 2 
resetting inode 134301107 nlinks from 0 to 2 
resetting inode 150995538 nlinks from 0 to 2 
resetting inode 151010925 nlinks from 0 to 2 
resetting inode 151070725 nlinks from 0 to 2 
resetting inode 151476838 nlinks from 0 to 2 
resetting inode 167772304 nlinks from 0 to 2 
resetting inode 167772437 nlinks from 0 to 2 
resetting inode 167828338 nlinks from 0 to 2 
resetting inode 167832595 nlinks from 0 to 2 
resetting inode 184549558 nlinks from 0 to 2 
resetting inode 201326722 nlinks from 0 to 2 
resetting inode 201326779 nlinks from 0 to 2 
resetting inode 218103941 nlinks from 0 to 2 
resetting inode 218129489 nlinks from 0 to 2 
resetting inode 251658371 nlinks from 0 to 2 
resetting inode 251658416 nlinks from 0 to 2 
resetting inode 251695387 nlinks from 0 to 2 
done 
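
The Phase 6 and Phase 7 lines above are regular enough to tally mechanically, which makes it easier to compare one xfs_repair run against the next. The following is a minimal sketch, assuming the log lines look exactly like the ones pasted in this comment; the regular expressions were written against this log and not against any documented xfs_repair output format, and summarize_repair_log is a hypothetical helper name.

```python
import re

# Patterns matched against the lines as they appear in the log in this report.
DISCONNECTED = re.compile(r"^disconnected (dir )?inode (\d+), moving to lost\+found")
NLINK_RESET = re.compile(r"^resetting inode (\d+) nlinks from (\d+) to (\d+)")


def summarize_repair_log(lines):
    """Return (file_inodes, dir_inodes, nlink_resets) found in the log lines.

    nlink_resets maps inode number -> (old_nlink, new_nlink).
    """
    file_inodes, dir_inodes, nlink_resets = [], [], {}
    for raw in lines:
        line = raw.strip()
        m = DISCONNECTED.match(line)
        if m:
            target = dir_inodes if m.group(1) else file_inodes
            target.append(int(m.group(2)))
            continue
        m = NLINK_RESET.match(line)
        if m:
            nlink_resets[int(m.group(1))] = (int(m.group(2)), int(m.group(3)))
    return file_inodes, dir_inodes, nlink_resets


# A short excerpt of the output in this report, used as sample input.
SAMPLE = """\
disconnected inode 101622, moving to lost+found
disconnected dir inode 102373, moving to lost+found
disconnected inode 940819, moving to lost+found
disconnected dir inode 16777652, moving to lost+found
Phase 7 - verify and correct link counts...
resetting inode 102373 nlinks from 0 to 2
resetting inode 16777652 nlinks from 0 to 2
done
""".splitlines()

if __name__ == "__main__":
    files, dirs, resets = summarize_repair_log(SAMPLE)
    print(f"{len(files)} files and {len(dirs)} directories moved to lost+found")
    print(f"{len(resets)} link counts reset")
```

Feeding the saved xfs_results.out to the same function (for example via open("xfs_results.out")) would give totals for the full run.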
 
Comment 5 Nathan Scott 2006-03-15 22:14:32 UTC
Well, will you look at that?

| I'm taking a
| stab in the dark that you may have a corrupt inode nlink field ondisk
| for a particular inode, but that's just a guess at this stage based on
| the oops.

Spot on - so, that's why you're getting a panic anyway, somehow one of
those inodes with nlink==0 is visible through the directory hierarchy
and someone is accessing it; someone else takes it away at the same
time, and boom.

The real root of the problem though is how did the nlink field get that
way... this would probably have happened at some point well before your
panic, so we've got no real clues to go on unfortunately. :(

Hohum, so, I'm back at trying to get you to narrow down what you do to
reproduce it... but I've got no good ideas on how you might do that.

Only other data points here - we can say this is quite unlikely to be
a recent regression (lotsa people seem to be asking..), since I can't
think of anything that's changed recently in XFS that would affect the
inode link count (within XFS anyway, perhaps some VFS change is making
it more likely to occur, but...).  No one else seems to be hitting this
though, which makes me wonder if there's something a bit unusual about
your workload / filesystem accesses that's tickling this ... anything
you can think of?

Oh, and the "error following ag 10 unlinked list" repair message is an
odd one too, I need to go think about what might cause that a bit, cos
it will explain some of your unlinked inodes.

I'd be interested in seeing if it happens again now it's repaired, and
if so, whether it happens after a crash or unclean shutdown (i.e. no
unmount), and if so, whether that same xfs_repair message gets dumped.

cheers.
Comment 6 Avuton Olrich 2006-03-16 00:35:27 UTC
|Spot on - so, that's why you're getting a panic anyway, somehow one of 
|those inodes with nlink==0 is visible through the directory hierarchy 
|and someone is accessing it; someone else takes it away at the same 
|time, and boom. 
 
|The real root of the problem though is how did the nlink field get that 
|way... this would probably have happened at some point well before your 
|panic, so we've got no real clues to go on unfortunately. :( 
 
To be quite honest I don't think I'm really doing anything unusual, unless
suspend and resume are unusual. I did xfs_repair this mount before and it did
happen again afterwards. I'm fairly sure it started after 2.6.15; of course I
could be mistaken. One thing I can say is there's nothing that really
_reproduces_ it afaict. I have had it happen during an emerge, and have had it
happen just after coming back to the computer to use it in the middle of an
X session with nothing else happening.

Simply put, I can name most of the normal stuff that runs on this computer:
Apache2, xdm, fvwm2, konqueror, kate (editor), jasspa microemacs, emerge.
Those are the most used programs. I have had it happen during high cpu usage
and had it happen a lot during no cpu usage.
  
I don't suppose you have anything to take out the metadata, so you can research  
it? reiser4 has a program like this iirc.  
  
My netconsole isn't hooked up correctly atm, so I'm going to reconfigure that,  
rebuild a newer git and see if I can get you a newer dump. This one will be  
with GCC-4.1, if that'll help you any. If you can think of anything I can do to  
help please let me know.  
 
If you want I can also try going back to 2.6.15 run it for a week see what 
happens. 
 
Any ideas of where to go from here? 
 
Comment 7 Nathan Scott 2006-03-16 23:04:34 UTC
Rather than going back to an earlier kernel, could you try a build with
PREEMPT disabled and let me know if that makes a difference.

One other question - do you run the filesystem out of space very often?
(do these problems happen near/after running out of space for example?)

| I don't suppose you have anything to take out the metadata, so you can
| research it? reiser4 has a program like this iirc.  

That won't help here - I know what the problem with the metadata ondisk
is, the issue is now figuring out how it got into that state.

cheers.
Comment 8 Avuton Olrich 2006-03-16 23:57:28 UTC
|Rather than going back to an earlier kernel, could you try a build with 
|PREEMPT disabled and let me know if that makes a difference. 
 
I will do that, though as I said, without a good week to test I can't really be 
sure it's not going to happen again. 
 
|One other question - do you run the filesystem out of space very often? 
|(do these problems happen near/after running out of space for example?) 
 
Actually quite the opposite. Max disk usage is probably always about 20%, so 
that's definitely not the issue here. 
Comment 9 Avuton Olrich 2006-03-19 01:58:40 UTC
Turned preempt off and it has crashed again; it took 2 days but it happened.
It seems to always corrupt my git directory of phpMp. It usually (but not
always) crashes on me when I'm physically on the computer, editing php in
jasspa-microemacs, in X/fvwm2, going back and forth between that and my
konqueror browser, with which I also use apache2 heavily, all on localhost and
not connected to the net. I'm not sure any of that is the problem, of course. I
really have no idea what else I can give you to help you find the issue, unless
you want me to revert to an earlier kernel version.
Comment 10 Nathan Scott 2006-03-19 14:04:47 UTC
| unless you want me to revert to an earlier kernel version. 

That sounds like the best bet at this stage - that will at least give
us more confidence as to whether this is a regression from .15 or not.

Could you send me an "ls -ali" of that directory that always seems to be
affected too please?

thanks.
Comment 11 Avuton Olrich 2006-03-20 12:20:05 UTC
|Could you send me an "ls -ali" of that directory that always seems to be 
|affected too please? 
 
sbh@micromachine ~/public_html/phpMp $ ls -ali 
total 220 
168673823 drwxr-xr-x 5 sbh    users  4096 Mar 19 18:24 . 
 83931886 drwxr-xr-x 6 sbh    users   125 Mar 19 16:06 .. 
184753491 drwxr-xr-x 7 sbh    users   123 Mar 18 18:08 .git 
168673824 -rw-r--r-- 1 sbh    users 17992 Mar 18 18:08 COPYING 
168673825 -rw-r--r-- 1 sbh    users  8031 Mar 18 18:08 ChangeLog 
168673826 -rw-r--r-- 1 sbh    users  1276 Mar 18 18:08 INSTALL 
168673827 -rw-r--r-- 1 sbh    users  2151 Mar 18 18:08 README 
168673828 -rw-r--r-- 1 sbh    users   603 Mar 18 18:08 TODO 
 84169490 drwxr-xr-x 2 apache users    66 Mar 19 14:43 cache 
168673795 -rw-r--r-- 1 sbh    users 10851 Mar 19 18:24 config.php 
168673793 -rw-r--r-- 1 sbh    users 10835 Mar 19 18:19 config.php~ 
101256705 drwxr-xr-x 2 sbh    users    25 Mar 18 18:08 contrib 
168673792 -rw-r--r-- 1 sbh    users 19500 Mar 19 18:24 features.php 
168673794 -rw-r--r-- 1 sbh    users 19503 Mar 19 18:24 features.php~ 
168673831 -rw-r--r-- 1 sbh    users  4643 Mar 18 18:08 index.inc 
168673832 -rw-r--r-- 1 sbh    users  9275 Mar 18 18:08 index.php 
168673833 -rw-r--r-- 1 sbh    users 22625 Mar 18 18:08 main.php 
168673834 -rw-r--r-- 1 sbh    users  3638 Mar 18 18:08 mpd-favicon.ico 
168673835 -rw-r--r-- 1 sbh    users 11919 Mar 18 18:08 mtable.inc 
168673836 -rw-r--r-- 1 sbh    users 27178 Mar 18 18:08 playlist.php 
168673837 -rw-r--r-- 1 sbh    users  1626 Mar 18 18:08 sort.php 
168673838 -rw-r--r-- 1 sbh    users  6207 Mar 18 18:08 theme.php 
168673839 -rw-r--r-- 1 sbh    users   832 Mar 18 18:08 transparent.gif 
168673840 -rw-r--r-- 1 sbh    users  5176 Mar 18 18:08 xml-parse.php 
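
XFS inode numbers carry the allocation group (AG) number in their high bits. The superblock values that fix the exact bit split are not shown anywhere in this report, so the shift of 24 used below is an assumption, chosen only because the inode numbers in the xfs_repair output cluster just above multiples of 2^24 (16777652, 33573578, 50331837, ...). Under that assumption, a rough sketch of mapping the listing above to AGs looks like this; ag_of and ASSUMED_AG_SHIFT are hypothetical names for this illustration.

```python
# ASSUMPTION: 24 is inferred from how the inode numbers in this report cluster.
# The real shift comes from the filesystem's superblock geometry, which the
# report does not include.
ASSUMED_AG_SHIFT = 24


def ag_of(ino, shift=ASSUMED_AG_SHIFT):
    """Return the allocation group an inode number would fall in, given the shift."""
    return ino >> shift


# Inode numbers copied from the "ls -ali" listing in this comment.
LISTING = {
    ".": 168673823,
    "..": 83931886,
    ".git": 184753491,
    "cache": 84169490,
    "contrib": 101256705,
    "config.php": 168673795,
}

if __name__ == "__main__":
    for name, ino in LISTING.items():
        print(f"{name:12s} inode {ino:>10d} -> AG {ag_of(ino)}")
```

With a shift of 24, the phpMp directory itself (inode 168673823) would sit in AG 10, the same AG named in the earlier "error following ag 10 unlinked list" message. Whether the two are actually related is not established by anything in this thread, and the mapping is only as good as the assumed shift.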
 
Comment 12 Avuton Olrich 2006-03-24 10:53:43 UTC
Hrm, on 2.6.15.6 for 4 days so far and no signs of wanting to crash. 
Comment 13 Nathan Scott 2006-03-26 13:39:48 UTC
Thanks Avuton,

I think the next step we'll need to take to understand this is to get it
nailed down to a small set of changes - I understand git bisect is the tool
of choice for doing this kind of fault isolation.  I know that will take a
while, given the time-between-failures, but there are not many other options
at this stage.  :|  I'd really like to get to the bottom of this though, so if
you could do that we'd really appreciate it.

thanks.
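
To put a rough number on "that will take a while": bisection halves the candidate range each round, so n candidate commits need about log2(n) build-and-test rounds. The sketch below is only arithmetic. The figure of 5000 commits is a made-up placeholder, since the thread does not say how many commits lie between the good and bad kernels, and the 7-day soak per round is taken loosely from the "good week to test" mentioned earlier in the thread. The function names are hypothetical.

```python
def bisect_rounds(n_candidates):
    """Worst-case number of test rounds to single out one commit among n.

    Equivalent to ceil(log2(n)), computed with integers to avoid float error.
    """
    if n_candidates < 1:
        raise ValueError("need at least one candidate commit")
    return (n_candidates - 1).bit_length()


def bisect_days(n_candidates, days_per_round):
    """Total soak time if every round needs days_per_round of normal use."""
    return bisect_rounds(n_candidates) * days_per_round


if __name__ == "__main__":
    commits = 5000        # hypothetical placeholder, not taken from the thread
    soak_days = 7         # loosely based on "a good week to test"
    print(f"{bisect_rounds(commits)} rounds, about {bisect_days(commits, soak_days)} days")
```

With these placeholder numbers the estimate is 13 rounds, or roughly three months of testing, and a round that happens not to crash within the soak period can still mislabel a bad kernel as good.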
Comment 14 Avuton Olrich 2006-03-26 13:43:48 UTC
OK, I will look into doing this, and as you said it will take a while, but I 
will continue until I get something conclusive. 
Comment 15 Nathan Scott 2006-03-26 13:53:01 UTC
Thanks!
Comment 16 Avuton Olrich 2006-04-10 06:52:06 UTC
OK, I'm not really sure when it got fixed, but I updated to 2.6.17-rc1 and 
things seem fine (after a week). Seems fixed, will reopen if the problem 
presents itself again. Thanks for the help.
Comment 17 Martin Steigerwald 2006-04-12 07:27:39 UTC
I possibly have a related problem:

http://bugzilla.kernel.org/show_bug.cgi?id=6380