Bug 4850

Summary: system hangs in queue_log_writer (no oops)
Product: File System Reporter: Tom Eicher (teicher-kernel)
Component: ReiserFSAssignee: ReiseFS developers team (reiserfs-devel)
Status: REJECTED INSUFFICIENT_DATA    
Severity: high CC: bunk
Priority: P2    
Hardware: i386   
OS: Linux   
Kernel Version: kernel-2.6.11-1.1369_FC4 Subsystem:
Regression: --- Bisected commit-id:
Attachments: trace.txt

Description Tom Eicher 2005-07-05 12:57:27 UTC
Distribution: Fedora Core 4
Hardware Environment: i686 GenuineIntel Intel(R) Pentium(R) 4 CPU 2.66GHz stepp 9
Problem Description:
System hangs after some disk io. Processes continue as usual until they hang
inside some system call. Then, processes become uninterruptible. See details at
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162027.

There are no oopses nor kernel panics - but the system locks up more and more
and has to be powered off.

Steps to reproduce:
No concrete steps, but reproducible when doing a lot of parallel disk io. In my
case, running a disk crawler (beagle) and doing java development in parallel was
enough, but also reproducible with other situations.

-- from https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=162027 ----

Once in the system state that this PR is about, I got stack dumps using
Ctrl+ScrollLock. The set of uninterruptible user processes after killing X and
any user processes was:

  PID TTY      STAT   TIME COMMAND
 4141 ?        D      0:00 id -gn
 4132 ?        D      0:00 /bin/sh -c /home/tei/bin/childProcesses.sh 4011
 4012 ?        D      0:04 /opt/jdks/jdk1.5.0_03/bin/java -Xms128m -Xmx256m
/opt/eclipse/installs/I20050627-gtk/eclipse/startup.jar [...]
 2798 ?        Zsl    0:15 [nautilus] <defunct>

The matching stack traces follow below (will attach full log). Each of them is
waiting inside some reiserfs function. My system uses a mixture of reiserfs and
ext3 partitions:

/dev/hdc3 on / type ext3 (rw,user_xattr)
/dev/hdc1 on /boot type ext3 (rw)
/dev/hdc5 on /home type reiserfs (rw,user_xattr)
/dev/hdc6 on /opt type reiserfs (rw,user_xattr)


-------- stack dump obtained by pressing Ctrl+ScrollLock (excerpt) ---------

id            D 0000003D  2408  4141   4185                     (NOTLB)
f5b50d0c 00200082 0009a980 0000003d 000001fe 001de5fc d5a8d0cc e4ad8000 
       f6737c30 00000000 c5061640 000f4476 00000000 e4ad8000 c5061640 000f4476 
       00000002 e4ad8000 00000000 00000000 f8a27000 f5b50000 f8a27098 000a0fe0 
Call Trace:
 [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs]
 [<c011be01>] default_wake_function+0x0/0xc
 [<f8cd835e>] do_journal_begin_r+0x11f/0x2f0 [reiserfs]
 [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs]
 [<f8cc64f2>] reiserfs_dirty_inode+0x40/0x72 [reiserfs]
 [<c01f84c8>] selinux_inode_follow_link+0x58/0x64
 [<c01abf08>] __mark_inode_dirty+0x28/0x2e8
 [<c01281d4>] current_fs_time+0x4e/0x69
 [<c01a0444>] update_atime+0x45/0x72
 [<c018e287>] __link_path_walk+0x2ce/0x12db
 [<c015391d>] filemap_nopage+0x2c9/0x379
 [<c018f2d7>] link_path_walk+0x43/0xdb
 [<c017a935>] get_unused_fd+0x79/0x1d2
 [<c018d4f7>] getname+0x87/0xc5
 [<c018f582>] path_lookup+0x79/0x152
 [<c018fd2d>] open_namei+0x90/0x61b
 [<c018f582>] path_lookup+0x79/0x152
 [<c017a6dd>] filp_open+0x27/0x46
 [<c017a935>] get_unused_fd+0x79/0x1d2
 [<c018d4f7>] getname+0x87/0xc5
 [<c017ad8f>] sys_open+0x35/0xad
 [<c0103a51>] syscall_call+0x7/0xb

sh            D 0000003D  2828  4132      1          4389  4012 (NOTLB)
f5ae4d0c 00200082 00090001 0000003d 000001fe 001de5fc dc4f68b4 dbb94000 
       c03cb944 f5ae4d84 653b3780 000f4472 001e8480 dbb94000 653b3780 000f4472 
       653b3780 000f4472 01406f40 00000000 f8a27000 f5ae4000 f8a27098 000a0fe0 
Call Trace:
 [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs]
 [<c011be01>] default_wake_function+0x0/0xc
 [<f8cd835e>] do_journal_begin_r+0x11f/0x2f0 [reiserfs]
 [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs]
 [<f8cc64f2>] reiserfs_dirty_inode+0x40/0x72 [reiserfs]
 [<c01f84c8>] selinux_inode_follow_link+0x58/0x64
 [<c01abf08>] __mark_inode_dirty+0x28/0x2e8
 [<c01281d4>] current_fs_time+0x4e/0x69
 [<c01a0444>] update_atime+0x45/0x72
 [<c018e287>] __link_path_walk+0x2ce/0x12db
 [<c015391d>] filemap_nopage+0x2c9/0x379
 [<c018f2d7>] link_path_walk+0x43/0xdb
 [<c017a935>] get_unused_fd+0x79/0x1d2
 [<c018d4f7>] getname+0x87/0xc5
 [<c018f582>] path_lookup+0x79/0x152
 [<c018fd2d>] open_namei+0x90/0x61b
 [<c018f582>] path_lookup+0x79/0x152
 [<c017a6dd>] filp_open+0x27/0x46
 [<c017a935>] get_unused_fd+0x79/0x1d2
 [<c018d4f7>] getname+0x87/0xc5
 [<c017ad8f>] sys_open+0x35/0xad
 [<c0103a51>] syscall_call+0x7/0xb

java          D 0000000B  1176  4012      1          4132  3764 (NOTLB)
f5b8ac40 00200082 f5b8ace0 0000000b 00000000 00000000 f5b8ad0c f5920400 
       00000001 00000002 d913e480 000f4428 00000000 e4831aa0 d913e480 000f4428 
       00200086 00000000 92ce65c0 00000000 f5920288 ddd3d050 e4831aa0 c038df47 
Call Trace:
 [<c0373b1f>] rwsem_down_read_failed+0xaf/0x2b8
 [<f8ccec7c>] pathrelse+0x26/0x3b [reiserfs]
 [<f8cbbeea>] init_inode+0x348/0x4c8 [reiserfs]
 [<f8cde685>] .text.lock.xattr+0xd5/0x250 [reiserfs]
 [<c01f6546>] inode_doinit_with_dentry+0x410/0x66c
 [<c019ae29>] d_splice_alias+0x13c/0x2c8
 [<f8cbc754>] reiserfs_iget+0x52/0x86 [reiserfs]
 [<f8cbc5d0>] reiserfs_init_locked_inode+0x0/0xe [reiserfs]
 [<f8cb7d2f>] reiserfs_lookup+0xd9/0x108 [reiserfs]
 [<c018db67>] real_lookup+0xb8/0xd7
 [<c018df9f>] do_lookup+0x7c/0x96
 [<c018e8e3>] __link_path_walk+0x92a/0x12db
 [<c018f2d7>] link_path_walk+0x43/0xdb
 [<c018f582>] path_lookup+0x79/0x152
 [<c018f7a1>] __user_walk+0x21/0x31
 [<c018841e>] vfs_stat+0x14/0x3a
 [<c01889ad>] sys_stat64+0xf/0x28
 [<c0214f56>] copy_to_user+0x42/0x60
 [<c01277ed>] sys_gettimeofday+0x22/0x58
 [<c0103a51>] syscall_call+0x7/0xb

nautilus      X B7EEA708  1580  2798      1          2830  2417 (L-TLB)
f4213f54 00200046 c011e0b4 b7eea708 00000001 c01240ff 00000000 00000000 
       00000000 c1b29980 9095c4c0 000f44ba 00000000 f3ab6550 9095c4c0 000f44ba 
       9095c4c0 000f44ba cbda0e80 00000002 f3ab6550 f7d5bde0 00000000 00000100 
Call Trace:
 [<c011e0b4>] mm_release+0x7c/0x83
 [<c01240ff>] exit_notify+0x33/0x9e6
 [<c0124d91>] do_exit+0x2df/0x500
 [<c03719e5>] schedule+0x55/0x7b3
 [<c0145338>] do_futex+0x75/0x7c
 [<c012507c>] do_group_exit+0x9e/0x1f5
 [<c0103a51>] syscall_call+0x7/0xb
nautilus      D 00000000  1264  2830      1          2831  2798 (NOTLB)
f09cdd98 00200082 001e2243 00000000 00000000 002cffff 000106ec 00000000 
       00000000 00000001 d5ae2440 000f4428 0001b207 f09c9550 d5ae2440 000f4428 
       d5ae2440 000f4428 16694e00 00000000 f5920288 dc97cb90 f09c9550 c038df47 
Call Trace:
 [<c0373b1f>] rwsem_down_read_failed+0xaf/0x2b8
 [<f8cbd408>] reiserfs_new_inode+0x5b0/0x738 [reiserfs]
 [<f8cde685>] .text.lock.xattr+0xd5/0x250 [reiserfs]
 [<c01f6546>] inode_doinit_with_dentry+0x410/0x66c
 [<c019a4f5>] d_instantiate+0xb0/0x1c2
 [<f8cb84bc>] reiserfs_create+0x1db/0x1f4 [reiserfs]
 [<c018f9e9>] vfs_create+0xd9/0x139
 [<c0190203>] open_namei+0x566/0x61b
 [<c017a6dd>] filp_open+0x27/0x46
 [<c018d4f7>] getname+0x87/0xc5
 [<c017ad8f>] sys_open+0x35/0xad
 [<c0103a51>] syscall_call+0x7/0xb
nautilus      D 00000468  1280  2831      1          3527  2830 (NOTLB)
f09cfc80 00200082 00200246 00000468 00000030 f5920000 0000f529 00000034 
       00000000 f8ccf785 d913e480 000f4428 00000000 f09c9000 d913e480 000f4428 
       00000001 00000000 b1251f00 00000000 f8a27000 f09cf000 f8a27098 f5920400 
Call Trace:
 [<f8ccf785>] search_by_key+0x70f/0xf32 [reiserfs]
 [<f8cd8174>] queue_log_writer+0x54/0x6d [reiserfs]
 [<c011be01>] default_wake_function+0x0/0xc
 [<f8cd8fbe>] check_journal_end+0x13d/0x281 [reiserfs]
 [<f8cd9685>] do_journal_end+0xef/0x9cf [reiserfs]
 [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs]
 [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs]
 [<f8cb9d6e>] reiserfs_delete_inode+0x1e/0xe5 [reiserfs]
 [<f8cd8292>] do_journal_begin_r+0x53/0x2f0 [reiserfs]
 [<c0197aa6>] dput+0x12c/0x6c1
 [<c019d49e>] destroy_inode+0x3f/0x4e
 [<f8cd84cb>] do_journal_begin_r+0x28c/0x2f0 [reiserfs]
 [<f8cdcd05>] __xattr_readdir+0x27c/0x3d2 [reiserfs]
 [<f8cd86e7>] journal_begin+0x6c/0xff [reiserfs]
 [<f8cb8a0a>] reiserfs_rmdir+0x38/0x2a3 [reiserfs]
 [<c0371cad>] schedule+0x31d/0x7b3
 [<c0197aa6>] dput+0x12c/0x6c1
 [<c01909e1>] dentry_unhash+0x1df/0x31d
 [<c018d696>] permission+0x49/0xb5
 [<c0190d0f>] vfs_rmdir+0x1f0/0x209
 [<f8cdd7b5>] reiserfs_delete_xattrs+0x157/0x1ec [reiserfs]
 [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs]
 [<f8cb9d9d>] reiserfs_delete_inode+0x4d/0xe5 [reiserfs]
 [<f8cb9d50>] reiserfs_delete_inode+0x0/0xe5 [reiserfs]
 [<c019fdaf>] generic_delete_inode+0xb0/0x340
 [<c0197aa6>] dput+0x12c/0x6c1
 [<c01a03a5>] iput+0x35/0x68
 [<c0190f88>] vfs_unlink+0x196/0x1fd
 [<c01a03cc>] iput+0x5c/0x68
 [<c01910cb>] sys_unlink+0xdc/0x119
 [<c0103a51>] syscall_call+0x7/0xb
Comment 1 Tom Eicher 2005-07-05 12:58:20 UTC
Created attachment 5278 [details]
trace.txt

Full dump obtained when the said four processes were hung using
Ctrl+ScrollLock.
Comment 2 Vladimir V. Saveliev 2006-01-19 09:58:51 UTC
Tom, you said on bugzilla.redhat.com that you can reproduce the problem in 30
seconds. Would you please describe how to do that?
Can you also try to reproduce the problem on stock kernel - 2.6.15 for instance?
Comment 3 Tom Eicher 2006-01-19 23:43:31 UTC
It's been a while (with ext3)... I will give a recent kernel a try soon and
report back.
Comment 4 Adrian Bunk 2006-04-22 10:26:05 UTC
Please reopen this bug if you are able to reproduce it with kernel 2.6.16.