Bug 82251 - INFO: task btrfs-transacti:168 blocked for more than 120 seconds.
Summary: INFO: task btrfs-transacti:168 blocked for more than 120 seconds.
Status: RESOLVED WILL_NOT_FIX
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs (show other bugs)
Hardware: x86-64 Linux
: P1 high
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-08-12 15:06 UTC by beta992
Modified: 2015-02-24 01:09 UTC (History)
4 users (show)

See Also:
Kernel Version: 3.15.9
Tree: Mainline
Regression: No


Attachments
Kernel log including sysrq-w output (267.08 KB, text/plain)
2014-12-09 23:48 UTC, Andrew Wade
Details

Description beta992 2014-08-12 15:06:43 UTC
After a lot of activity (caused by NZBGet), the program crashes:
[91924.486480] INFO: task kworker/u4:1:17342 blocked for more than 120 seconds.
[91924.486557]       Not tainted 3.15.9-2-ck #1
[91924.486593] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this                                                                                                                                message.
[91924.486649] kworker/u4:1    D 0000000000000001     0 17342      2 0x00000000
[91924.486670] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1)
[91924.486677]  ffff88000bc2f9c8 0000000000000046 ffff88000bc2f8d8 ffffffff00000                                                                                                                               000
[91924.486699]  ffff880212740000 ffff880212740000 ffff88000bc2ffd8 0000000000014                                                                                                                               680
[91924.486709]  0000000000014680 ffff880212740000 ffff88000bc2ffd8 0000000000000                                                                                                                               002
[91924.486718] Call Trace:
[91924.486732]  [<ffffffff8111f0a0>] ? filemap_fdatawait+0x30/0x30
[91924.486741]  [<ffffffff814d059b>] ? preempt_schedule+0x3b/0x58
[91924.486751]  [<ffffffff8101cc81>] ? read_tsc+0x9/0x20
[91924.486760]  [<ffffffff8101cc81>] ? read_tsc+0x9/0x20
[91924.486769]  [<ffffffff8111f0a0>] ? filemap_fdatawait+0x30/0x30
[91924.486778]  [<ffffffff814d0781>] io_schedule+0x91/0xf0
[91924.486787]  [<ffffffff8111f0ae>] sleep_on_page+0xe/0x18
[91924.486795]  [<ffffffff814d0b78>] __wait_on_bit_lock+0x48/0xa8
[91924.486805]  [<ffffffff8111f1f8>] __lock_page+0x78/0x90
[91924.486815]  [<ffffffff8109a340>] ? autoremove_wake_function+0x38/0x38
[91924.486862]  [<ffffffffa00923b8>] extent_write_cache_pages.isra.31.constprop.                                                                                                                               48+0x248/0x380 [btrfs]
[91924.486909]  [<ffffffffa00937eb>] extent_writepages+0x5b/0x80 [btrfs]
[91924.486952]  [<ffffffffa0076620>] ? __btrfs_submit_bio_start_direct_io+0x38/0                                                                                                                               x38 [btrfs]
[91924.486995]  [<ffffffffa00753a0>] btrfs_writepages+0x28/0x30 [btrfs]
[91924.487005]  [<ffffffff8112b75e>] do_writepages+0x1e/0x30
[91924.487015]  [<ffffffff811b1200>] __writeback_single_inode+0x40/0x290
[91924.487024]  [<ffffffff811b2682>] writeback_sb_inodes+0x262/0x420
[91924.487035]  [<ffffffff811b28df>] __writeback_inodes_wb+0x9f/0xd0
[91924.487044]  [<ffffffff811b2b3b>] wb_writeback+0x22b/0x348
[91924.487054]  [<ffffffff811b30f8>] bdi_writeback_workfn+0x350/0x4c8
[91924.487066]  [<ffffffff810828aa>] process_one_work+0x16a/0x460
[91924.487075]  [<ffffffff810832f1>] worker_thread+0x131/0x3d0
[91924.487085]  [<ffffffff810831c0>] ? manage_workers.isra.26+0x2c0/0x2c0
[91924.487093]  [<ffffffff8108938a>] kthread+0xea/0x100
[91924.487103]  [<ffffffff810892a0>] ? kthread_create_on_node+0x1a0/0x1a0
[91924.487117]  [<ffffffff814dbf3c>] ret_from_fork+0x7c/0xb0
[91924.487126]  [<ffffffff810892a0>] ? kthread_create_on_node+0x1a0/0x1a0

Full log: http://pastebin.com/4FWpsgF2

I tried everything (at least I think):
- Switched to another kernel
- Removed btrfs mount flags (now running with: noatime,compress=zlib,ssd,discard)
- Reporting to the Arch Linux forums
- Updating kernel
..

I think this is a btrfs bug.

Let me know if you need any more info. :)

Thanks!
Comment 1 Andrew Wade 2014-12-09 23:48:37 UTC
Created attachment 160291 [details]
Kernel log including sysrq-w output

I am able to reproduce the freeze within a few hours by attempting copies from a damaged data dvd. Kernel log attached.

Kernel is a mainline 3.18.0 kernel compiled from source.
Comment 2 Andrew Wade 2014-12-23 02:12:17 UTC
Appears to be fixed: I cannot reproduce with 3.19.0-rc1.

Thanks!
Comment 3 Andrew Wade 2015-01-07 23:54:37 UTC
Bad news, I just got the freeze again with 3.19.0-rc3:

Jan  6 22:19:18 pluto kernel: [ 7320.100031] INFO: task btrfs-transacti:673 blocked for more than 120 seconds.
Jan  6 22:19:18 pluto kernel: [ 7320.100039]       Tainted: G            E  3.19.0-rc3-ajw #83
Jan  6 22:19:18 pluto kernel: [ 7320.100041] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  6 22:19:18 pluto kernel: [ 7320.100045] btrfs-transacti D ffff8800d9293b38     0   673      2 0x00000000
Jan  6 22:19:18 pluto kernel: [ 7320.100052]  ffff8800d9293b38 ffff8800d8908000 00000000000141c0 ffff8800d9293fd8
Jan  6 22:19:18 pluto kernel: [ 7320.100058]  00000000000141c0 ffff88011abfb1c0 ffff8800d8908000 ffff8800d9293be0
Jan  6 22:19:18 pluto kernel: [ 7320.100063]  ffff88011fc94ac0 ffff8800d9293be0 ffff88011ffc80e8 0000000000000002
Jan  6 22:19:18 pluto kernel: [ 7320.100069] Call Trace:
Jan  6 22:19:18 pluto kernel: [ 7320.100081]  [<ffffffff81778060>] ? bit_wait+0x50/0x50
Jan  6 22:19:18 pluto kernel: [ 7320.100086]  [<ffffffff81777830>] io_schedule+0xa0/0x130
Jan  6 22:19:18 pluto kernel: [ 7320.100091]  [<ffffffff8177808c>] bit_wait_io+0x2c/0x50
Jan  6 22:19:18 pluto kernel: [ 7320.100095]  [<ffffffff81777cc5>] __wait_on_bit+0x65/0x90
Jan  6 22:19:18 pluto kernel: [ 7320.100102]  [<ffffffff8116c13d>] ? find_get_pages_tag+0xcd/0x170
Jan  6 22:19:18 pluto kernel: [ 7320.100108]  [<ffffffff8116b297>] wait_on_page_bit+0xc7/0xd0
Jan  6 22:19:18 pluto kernel: [ 7320.100113]  [<ffffffff810b14d0>] ? autoremove_wake_function+0x40/0x40
Jan  6 22:19:18 pluto kernel: [ 7320.100118]  [<ffffffff8116b399>] filemap_fdatawait_range+0xf9/0x190
Jan  6 22:19:18 pluto kernel: [ 7320.100158]  [<ffffffffa0241769>] btrfs_wait_marked_extents+0xe9/0x180 [btrfs]
Jan  6 22:19:18 pluto kernel: [ 7320.100186]  [<ffffffffa02b8b5f>] btrfs_write_and_wait_transaction.isra.21+0x56/0x7b [btrfs]
Jan  6 22:19:18 pluto kernel: [ 7320.100212]  [<ffffffffa02422d3>] btrfs_commit_transaction+0x7e3/0xa30 [btrfs]
Jan  6 22:19:18 pluto kernel: [ 7320.100238]  [<ffffffffa023d885>] transaction_kthread+0x1b5/0x240 [btrfs]
Jan  6 22:19:18 pluto kernel: [ 7320.100263]  [<ffffffffa023d6d0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs]
Jan  6 22:19:18 pluto kernel: [ 7320.100269]  [<ffffffff8108f722>] kthread+0xd2/0xf0
Jan  6 22:19:18 pluto kernel: [ 7320.100274]  [<ffffffff8108f650>] ? kthread_create_on_node+0x180/0x180
Jan  6 22:19:18 pluto kernel: [ 7320.100280]  [<ffffffff8177b9fc>] ret_from_fork+0x7c/0xb0
Jan  6 22:19:18 pluto kernel: [ 7320.100285]  [<ffffffff8108f650>] ? kthread_create_on_node+0x180/0x180
Comment 4 David Sterba 2015-01-30 14:20:20 UTC
Dec  9 18:05:47 pluto kernel: [80426.568428] sr 3:0:0:0: [sr0]  
Dec  9 18:05:47 pluto kernel: [80426.568433] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec  9 18:05:47 pluto kernel: [80426.568435] sr 3:0:0:0: [sr0]  
Dec  9 18:05:47 pluto kernel: [80426.568437] Sense Key : Medium Error [current] 
Dec  9 18:05:47 pluto kernel: [80426.568440] sr 3:0:0:0: [sr0]  
Dec  9 18:05:47 pluto kernel: [80426.568442] Add. Sense: L-EC uncorrectable error
Dec  9 18:05:47 pluto kernel: [80426.568444] sr 3:0:0:0: [sr0] CDB: 
Dec  9 18:05:47 pluto kernel: [80426.568446] Read(10): 28 00 00 07 2c 66 00 00 40 00
Dec  9 18:05:47 pluto kernel: [80426.568453] blk_update_request: I/O error, dev sr0, sector 1880472

The device is slow to respond so the timeout warnings can be expected. They're informative and may happen on a system under heavy load or in the situation you observe.
Comment 5 Andrew Wade 2015-02-24 01:09:36 UTC
Thanks for looking at this. It looks like the timeouts were due to hardware issues. One of my hard drives had a read error during a scrub a week ago and when I ran smartctl there were hundreds of reallocated sectors. My guess is the vibration from the DVD drive was triggering issues with the hdd. It's replaced now.

Note You need to log in before you can comment on or make changes to this bug.