Bug 27132 - flush-btrfs gets into an infinite loop
Summary: flush-btrfs gets into an infinite loop
Status: CLOSED UNREPRODUCIBLE
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 blocking
Assignee: fs_btrfs@kernel-bugs.osdl.org
URL:
Keywords:
Depends on:
Blocks: 21782
Reported: 2011-01-20 11:51 UTC by Artem Anisimov
Modified: 2011-03-30 20:03 UTC
CC List: 2 users

See Also:
Kernel Version: 2.6.37
Subsystem:
Regression: Yes
Bisected commit-id:


Attachments
dmesg with stack traces (240.04 KB, text/plain)
2011-01-20 11:51 UTC, Artem Anisimov
My .config (53.96 KB, text/plain)
2011-01-20 11:52 UTC, Artem Anisimov
another dmesg with stack traces (239.74 KB, text/plain)
2011-01-22 13:20 UTC, Artem Anisimov

Description Artem Anisimov 2011-01-20 11:51:28 UTC
Created attachment 44422
dmesg with stack traces

Valgrind's core dumping regularly triggers an infinite loop in flush-btrfs. The symptoms are as follows: valgrind starts dumping core and hangs; afterwards it is not possible to read the contents of the directory that valgrind was dumping core to; after a minute or two, flush-btrfs starts spinning in an endless loop.

I have attached the output of dmesg with stack traces of active processes. Valgrind's process is memcheck-x86; the process trying to list the contents of the directory containing the core dump is mc. These stack traces were captured after valgrind hung but before flush-btrfs got into the infinite loop. I've left in the stack traces of other programs because many of them (notably mysqld) have references to btrfs in their stack traces.

In my setup the problem reproduces regularly: in the last two days I have had two hangs. If you ask me to compile the kernel with options that make it produce more useful debug output, you will probably not have to wait long before I can provide new information.

I have also attached my .config.

The last kernel version that worked well for me was 2.6.35; I have skipped 2.6.36.
Comment 1 Artem Anisimov 2011-01-20 11:52:36 UTC
Created attachment 44432
My .config
Comment 2 Artem Anisimov 2011-01-22 13:20:46 UTC
Created attachment 44772
another dmesg with stack traces
Comment 3 Artem Anisimov 2011-01-22 13:22:17 UTC
Now I've hit the same problem while ar was running. The process that was writing to the file system is llvm-ar; the flusher that got into an infinite loop is flush-btrfs-2.

The stack traces were captured while flush-btrfs-2 was in the infinite loop.
Comment 4 Artem Anisimov 2011-01-23 08:45:07 UTC
The workload from the last report (the one concerning llvm-ar) ran without errors under 2.6.36 but failed repeatedly under 2.6.37. Hence, this is a regression from 2.6.36.
Comment 5 Artem Anisimov 2011-02-03 10:57:40 UTC
I've recently run into another bug: kmail started spinning in an infinite loop and could not be killed with 'kill -9'. Displaying stack traces showed that this was probably caused by btrfs. Kmail's stack trace was:

kmail         R running      0 24427      1 0x00000004
 0000007b c1050000 00000000 ffffffce c1045e8d 00000060 00000246 00000001
 00000246 00008a29 f749e420 e04edb64 e8237d54 c105a84b e8237d70 c105a9ff
 f5e92488 f749e420 f5e92484 00008a29 00008a29 e8237d84 c105abcb 089c0000
Call Trace:
 [<c1050000>] ? note_interrupt+0xc5/0x151
 [<c1045e8d>] ? lock_release+0x13d/0x144
 [<c105a84b>] ? rcu_read_unlock+0x17/0x1e
 [<c105a9ff>] ? find_get_page+0x5d/0x67
 [<c105abcb>] ? find_lock_page+0x13/0x3a
 [<c105b069>] ? find_or_create_page+0x23/0x6e
 [<c13417dc>] ? _raw_spin_unlock+0x33/0x3f
 [<c1144818>] ? prepare_pages.clone.7+0xed/0x2eb
 [<c112b252>] ? btrfs_delalloc_reserve_metadata+0x14b/0x17f
 [<c11450ba>] ? btrfs_file_aio_write+0x48f/0x7e5
 [<c105a7a0>] ? file_read_actor+0x6a/0xc4
 [<c105a6e0>] ? file_accessed+0x14/0x16
 [<c10835a2>] ? do_sync_write+0x9b/0xd5
 [<c10844bc>] ? rcu_read_lock+0x0/0x3a
 [<c1043207>] ? arch_local_irq_save+0x8/0xb
 [<c1083507>] ? do_sync_write+0x0/0xd5
 [<c1083bcd>] ? vfs_write+0x7e/0xab
 [<c1083d3d>] ? sys_write+0x3d/0x5e
 [<c1341c5d>] ? syscall_call+0x7/0xb
 [<c1340000>] ? io_schedule+0x26/0x2c
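
When triaging a trace like the one above, it can help to pull out the frames that name btrfs mechanically instead of by eye. The following is a minimal sketch, assuming the trace has been saved as plain text in the format shown here; `parse_frames`, `btrfs_frames`, and the regular expression are illustrative names invented for this example and are not part of any kernel tooling. The leading `?` is recorded separately because it marks a frame the kernel's stack dumper could not confirm as part of the call chain.

```python
import re

# Matches one frame of a kernel call trace in the format shown above, e.g.
#  [<c11450ba>] ? btrfs_file_aio_write+0x48f/0x7e5
# (illustrative pattern; other kernel versions print frames differently)
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(text):
    """Return (symbol, offset, unreliable) tuples in trace order."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (
                    m.group("symbol"),
                    int(m.group("offset"), 16),
                    bool(m.group("unreliable")),
                )
            )
    return frames


def btrfs_frames(frames):
    """Keep only frames whose symbol name mentions btrfs (a heuristic)."""
    return [f for f in frames if "btrfs" in f[0]]


# A few lines copied from the kmail trace in this comment.
SAMPLE = """\
 [<c1144818>] ? prepare_pages.clone.7+0xed/0x2eb
 [<c112b252>] ? btrfs_delalloc_reserve_metadata+0x14b/0x17f
 [<c11450ba>] ? btrfs_file_aio_write+0x48f/0x7e5
 [<c1083bcd>] ? vfs_write+0x7e/0xab
"""

frames = parse_frames(SAMPLE)
for symbol, offset, unreliable in btrfs_frames(frames):
    print(f"{symbol}+{offset:#x}")
```

Filtering on the symbol name is only a heuristic: static helpers that lack the `btrfs_` prefix would be missed, so the full trace is still worth reading.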
Comment 6 Martin Steigerwald 2011-02-03 11:45:53 UTC
Could bug #27842 be related to this one?
Comment 7 Artem Anisimov 2011-02-23 11:38:45 UTC
Once again ran into this bug. This time with 2.6.37.1.
Comment 8 Florian Mickler 2011-03-29 21:05:25 UTC
Is this still a problem on 2.6.38.y ?
Comment 9 Artem Anisimov 2011-03-30 04:56:06 UTC
Don't know. The last time I tried btrfs was with 2.6.38-rc6, and at that time it failed miserably (kmail would hang at startup, configure scripts would hang, etc.; all of this was accompanied by complaints about btrfs in dmesg).

Now I have switched to ext4, so I can neither confirm the error any more, nor reproduce it.
Comment 10 Florian Mickler 2011-03-30 20:02:41 UTC
OK, I'm closing this as unreproducible. If anyone has had the same issue and can still reproduce it, please shout. (Martin?)

Thanks,
Flo
