Kernel Bug Tracker – Bug 27132
flush-btrfs gets into an infinite loop
Last modified: 2011-03-30 20:03:02 UTC
Created attachment 44422
dmesg with stack traces
Valgrind dumping core regularly triggers an infinite loop in flush-btrfs. The symptoms are as follows: Valgrind starts dumping core and hangs. Afterwards, the contents of the directory Valgrind was dumping core to can no longer be read, and after a minute or two flush-btrfs starts spinning in an endless loop.
I have attached the dmesg output with stack traces of the active processes. The Valgrind process is memcheck-x86; the process trying to list the directory containing the core dump is mc. These stack traces were captured after Valgrind hung but before flush-btrfs entered the infinite loop. I have kept the stack traces of other programs as well, because many of them (notably mysqld) reference btrfs.
The problem reproduces regularly in my setup (two hangs in the last two days), so if you would like me to build the kernel with options that produce more useful debug output, you probably won't have to wait long for new information.
I have also attached my .config.
The last kernel version that worked well for me was 2.6.35; I have skipped 2.6.36.
Created attachment 44432
Created attachment 44772
Another dmesg with stack traces
I have now hit the same problem while ar was running. The process writing to the file system was llvm-ar, and the flusher that entered the infinite loop was flush-btrfs-2.
The stack traces were captured while flush-btrfs-2 was in the infinite loop.
The workload from the last report (the llvm-ar case) ran without errors under 2.6.36 but failed repeatedly under 2.6.37, so this is a regression from 2.6.36.
I've recently run into another bug: kmail started spinning in an infinite loop and could not be killed with 'kill -9'. The stack traces suggested that btrfs was probably the cause. Kmail's stack trace was:
kmail R running 0 24427 1 0x00000004
0000007b c1050000 00000000 ffffffce c1045e8d 00000060 00000246 00000001
00000246 00008a29 f749e420 e04edb64 e8237d54 c105a84b e8237d70 c105a9ff
f5e92488 f749e420 f5e92484 00008a29 00008a29 e8237d84 c105abcb 089c0000
[<c1050000>] ? note_interrupt+0xc5/0x151
[<c1045e8d>] ? lock_release+0x13d/0x144
[<c105a84b>] ? rcu_read_unlock+0x17/0x1e
[<c105a9ff>] ? find_get_page+0x5d/0x67
[<c105abcb>] ? find_lock_page+0x13/0x3a
[<c105b069>] ? find_or_create_page+0x23/0x6e
[<c13417dc>] ? _raw_spin_unlock+0x33/0x3f
[<c1144818>] ? prepare_pages.clone.7+0xed/0x2eb
[<c112b252>] ? btrfs_delalloc_reserve_metadata+0x14b/0x17f
[<c11450ba>] ? btrfs_file_aio_write+0x48f/0x7e5
[<c105a7a0>] ? file_read_actor+0x6a/0xc4
[<c105a6e0>] ? file_accessed+0x14/0x16
[<c10835a2>] ? do_sync_write+0x9b/0xd5
[<c10844bc>] ? rcu_read_lock+0x0/0x3a
[<c1043207>] ? arch_local_irq_save+0x8/0xb
[<c1083507>] ? do_sync_write+0x0/0xd5
[<c1083bcd>] ? vfs_write+0x7e/0xab
[<c1083d3d>] ? sys_write+0x3d/0x5e
[<c1341c5d>] ? syscall_call+0x7/0xb
[<c1340000>] ? io_schedule+0x26/0x2c
Could bug #27842 be related to this one?
Ran into this bug once again, this time with 18.104.22.168.
Is this still a problem on 2.6.38.y ?
I don't know. The last time I tried btrfs was with 2.6.38-rc6, and it failed badly: kmail hung at startup, configure scripts hung, and so on, all accompanied by btrfs complaints in dmesg.
I have since switched to ext4, so I can no longer confirm or reproduce the error.
OK, I'm closing this as unreproducible. If anyone has hit the same issue and can still reproduce it, please shout. (Martin?)