Distribution: Fedora Core 4 Hardware Environment: i686 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.66GHz stepp 9 Problem Description: System hangs after some disk io. Processes continue as usual until they hang inside some system call. Then, processes become uninterruptible. See details at https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162027. There are no oopses nor kernel panics - but the system locks up more and more and has to be powered off. Steps to reproduce: No concrete steps, but reproducible when doing a lot of parallel disk io. In my case, running a disk crawler (beagle) and doing java development in parallel was enough, but also reproducible with other situations. -- from https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162027 ---- Once in the system state that this PR is about, I got stack dumps using Ctrl+ScrollLock. The set of uninterruptible user processes after killing X and any user processes was: PID TTY STAT TIME COMMAND 4141 ? D 0:00 id -gn 4132 ? D 0:00 /bin/sh -c /home/tei/bin/childProcesses.sh 4011 4012 ? D 0:04 /opt/jdks/jdk1.5.0_03/bin/java -Xms128m -Xmx256m /opt/eclipse/installs/I20050627-gtk/eclipse/startup.jar [...] 2798 ? Zsl 0:15 [nautilus] <defunct> The matching stack traces follow below (will attach full log). Each of them is waiting inside some reiserfs function. My system uses a mixture of reiserfs and ext3 partitions: /dev/hdc3 on / type ext3 (rw,user_xattr) /dev/hdc1 on /boot type ext3 (rw) /dev/hdc5 on /home type reiserfs (rw,user_xattr) /dev/hdc6 on /opt type reiserfs (rw,user_xattr) -------- stack dump obtained by pressing Ctrl+ScrollLock (excerpt) --------- id D 0000003D 2408 4141 4185 (NOTLB) f5b50d0c 00200082 0009a980 0000003d 000001fe 001de5fc d5a8d0cc e4ad8000 f6737c30 00000000 c5061640 000f4476 00000000 e4ad8000 c5061640 000f4476 00000002 e4ad8000 00000000 00000000 f8a27000 f5b50000 f8a27098 000a0fe0 Call Trace: [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs] [<c011be01>] default_wake_function+0x0/0xc [<f8cd835e>] do_journal_begin_r+0x11f/0x2f0 [reiserfs] [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs] [<f8cc64f2>] reiserfs_dirty_inode+0x40/0x72 [reiserfs] [<c01f84c8>] selinux_inode_follow_link+0x58/0x64 [<c01abf08>] __mark_inode_dirty+0x28/0x2e8 [<c01281d4>] current_fs_time+0x4e/0x69 [<c01a0444>] update_atime+0x45/0x72 [<c018e287>] __link_path_walk+0x2ce/0x12db [<c015391d>] filemap_nopage+0x2c9/0x379 [<c018f2d7>] link_path_walk+0x43/0xdb [<c017a935>] get_unused_fd+0x79/0x1d2 [<c018d4f7>] getname+0x87/0xc5 [<c018f582>] path_lookup+0x79/0x152 [<c018fd2d>] open_namei+0x90/0x61b [<c018f582>] path_lookup+0x79/0x152 [<c017a6dd>] filp_open+0x27/0x46 [<c017a935>] get_unused_fd+0x79/0x1d2 [<c018d4f7>] getname+0x87/0xc5 [<c017ad8f>] sys_open+0x35/0xad [<c0103a51>] syscall_call+0x7/0xb sh D 0000003D 2828 4132 1 4389 4012 (NOTLB) f5ae4d0c 00200082 00090001 0000003d 000001fe 001de5fc dc4f68b4 dbb94000 c03cb944 f5ae4d84 653b3780 000f4472 001e8480 dbb94000 653b3780 000f4472 653b3780 000f4472 01406f40 00000000 f8a27000 f5ae4000 f8a27098 000a0fe0 Call Trace: [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs] [<c011be01>] default_wake_function+0x0/0xc [<f8cd835e>] do_journal_begin_r+0x11f/0x2f0 [reiserfs] [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs] [<f8cc64f2>] reiserfs_dirty_inode+0x40/0x72 [reiserfs] [<c01f84c8>] selinux_inode_follow_link+0x58/0x64 [<c01abf08>] __mark_inode_dirty+0x28/0x2e8 [<c01281d4>] current_fs_time+0x4e/0x69 [<c01a0444>] update_atime+0x45/0x72 [<c018e287>] __link_path_walk+0x2ce/0x12db [<c015391d>] filemap_nopage+0x2c9/0x379 [<c018f2d7>] link_path_walk+0x43/0xdb [<c017a935>] get_unused_fd+0x79/0x1d2 [<c018d4f7>] getname+0x87/0xc5 [<c018f582>] path_lookup+0x79/0x152 [<c018fd2d>] open_namei+0x90/0x61b [<c018f582>] path_lookup+0x79/0x152 [<c017a6dd>] filp_open+0x27/0x46 [<c017a935>] get_unused_fd+0x79/0x1d2 [<c018d4f7>] getname+0x87/0xc5 [<c017ad8f>] sys_open+0x35/0xad [<c0103a51>] syscall_call+0x7/0xb java D 0000000B 1176 4012 1 4132 3764 (NOTLB) f5b8ac40 00200082 f5b8ace0 0000000b 00000000 00000000 f5b8ad0c f5920400 00000001 00000002 d913e480 000f4428 00000000 e4831aa0 d913e480 000f4428 00200086 00000000 92ce65c0 00000000 f5920288 ddd3d050 e4831aa0 c038df47 Call Trace: [<c0373b1f>] rwsem_down_read_failed+0xaf/0x2b8 [<f8ccec7c>] pathrelse+0x26/0x3b [reiserfs] [<f8cbbeea>] init_inode+0x348/0x4c8 [reiserfs] [<f8cde685>] .text.lock.xattr+0xd5/0x250 [reiserfs] [<c01f6546>] inode_doinit_with_dentry+0x410/0x66c [<c019ae29>] d_splice_alias+0x13c/0x2c8 [<f8cbc754>] reiserfs_iget+0x52/0x86 [reiserfs] [<f8cbc5d0>] reiserfs_init_locked_inode+0x0/0xe [reiserfs] [<f8cb7d2f>] reiserfs_lookup+0xd9/0x108 [reiserfs] [<c018db67>] real_lookup+0xb8/0xd7 [<c018df9f>] do_lookup+0x7c/0x96 [<c018e8e3>] __link_path_walk+0x92a/0x12db [<c018f2d7>] link_path_walk+0x43/0xdb [<c018f582>] path_lookup+0x79/0x152 [<c018f7a1>] __user_walk+0x21/0x31 [<c018841e>] vfs_stat+0x14/0x3a [<c01889ad>] sys_stat64+0xf/0x28 [<c0214f56>] copy_to_user+0x42/0x60 [<c01277ed>] sys_gettimeofday+0x22/0x58 [<c0103a51>] syscall_call+0x7/0xb nautilus X B7EEA708 1580 2798 1 2830 2417 (L-TLB) f4213f54 00200046 c011e0b4 b7eea708 00000001 c01240ff 00000000 00000000 00000000 c1b29980 9095c4c0 000f44ba 00000000 f3ab6550 9095c4c0 000f44ba 9095c4c0 000f44ba cbda0e80 00000002 f3ab6550 f7d5bde0 00000000 00000100 Call Trace: [<c011e0b4>] mm_release+0x7c/0x83 [<c01240ff>] exit_notify+0x33/0x9e6 [<c0124d91>] do_exit+0x2df/0x500 [<c03719e5>] schedule+0x55/0x7b3 [<c0145338>] do_futex+0x75/0x7c [<c012507c>] do_group_exit+0x9e/0x1f5 [<c0103a51>] syscall_call+0x7/0xb nautilus D 00000000 1264 2830 1 2831 2798 (NOTLB) f09cdd98 00200082 001e2243 00000000 00000000 002cffff 000106ec 00000000 00000000 00000001 d5ae2440 000f4428 0001b207 f09c9550 d5ae2440 000f4428 d5ae2440 000f4428 16694e00 00000000 f5920288 dc97cb90 f09c9550 c038df47 Call Trace: [<c0373b1f>] rwsem_down_read_failed+0xaf/0x2b8 [<f8cbd408>] reiserfs_new_inode+0x5b0/0x738 [reiserfs] [<f8cde685>] .text.lock.xattr+0xd5/0x250 [reiserfs] [<c01f6546>] inode_doinit_with_dentry+0x410/0x66c [<c019a4f5>] d_instantiate+0xb0/0x1c2 [<f8cb84bc>] reiserfs_create+0x1db/0x1f4 [reiserfs] [<c018f9e9>] vfs_create+0xd9/0x139 [<c0190203>] open_namei+0x566/0x61b [<c017a6dd>] filp_open+0x27/0x46 [<c018d4f7>] getname+0x87/0xc5 [<c017ad8f>] sys_open+0x35/0xad [<c0103a51>] syscall_call+0x7/0xb nautilus D 00000468 1280 2831 1 3527 2830 (NOTLB) f09cfc80 00200082 00200246 00000468 00000030 f5920000 0000f529 00000034 00000000 f8ccf785 d913e480 000f4428 00000000 f09c9000 d913e480 000f4428 00000001 00000000 b1251f00 00000000 f8a27000 f09cf000 f8a27098 f5920400 Call Trace: [<f8ccf785>] search_by_key+0x70f/0xf32 [reiserfs] [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs] [<c011be01>] default_wake_function+0x0/0xc [<f8cd8fbe>] check_journal_end+0x13d/0x281 [reiserfs] [<f8cd9685>] do_journal_end+0xef/0x9cf [reiserfs] [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs] [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs] [<f8cb9d6e>] reiserfs_delete_inode+0x1e/0xe5 [reiserfs] [<f8cd8292>] do_journal_begin_r+0x53/0x2f0 [reiserfs] [<c0197aa6>] dput+0x12c/0x6c1 [<c019d49e>] destroy_inode+0x3f/0x4e [<f8cd84cb>] do_journal_begin_r+0x28c/0x2f0 [reiserfs] [<f8cdcd05>] __xattr_readdir+0x27c/0x3d2 [reiserfs] [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs] [<f8cb8a0a>] reiserfs_rmdir+0x38/0x2a3 [reiserfs] [<c0371cad>] schedule+0x31d/0x7b3 [<c0197aa6>] dput+0x12c/0x6c1 [<c01909e1>] dentry_unhash+0x1df/0x31d [<c018d696>] permission+0x49/0xb5 [<c0190d0f>] vfs_rmdir+0x1f0/0x209 [<f8cdd7b5>] reiserfs_delete_xattrs+0x157/0x1ec [reiserfs] [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs] [<f8cb9d9d>] reiserfs_delete_inode+0x4d/0xe5 [reiserfs] [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs] [<c019fdaf>] generic_delete_inode+0xb0/0x340 [<c0197aa6>] dput+0x12c/0x6c1 [<c01a03a5>] iput+0x35/0x68 [<c0190f88>] vfs_unlink+0x196/0x1fd [<c01a03cc>] iput+0x5c/0x68 [<c01910cb>] sys_unlink+0xdc/0x119 [<c0103a51>] syscall_call+0x7/0xb
Created attachment 5278 [details] trace.txt Full dump obtained when the said four processes were hung using Ctrl+ScrollLock.
Tom, you said on bugzilla.redhat.com that you can reproduce the problem in 30 seconds. Would you please describe how to do that? Can you also try to reproduce the problem on stock kernel - 2.6.15 for instance?
It's been a while (with ext3)... I will give a recent kernel a try soon and report back.
Please reopen this bug if you are able to reproduce it with kernel 2.6.16.