After a lot of activity (caused by NZBGet), the program crashes: [91924.486480] INFO: task kworker/u4:1:17342 blocked for more than 120 seconds. [91924.486557] Not tainted 3.15.9-2-ck #1 [91924.486593] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [91924.486649] kworker/u4:1 D 0000000000000001 0 17342 2 0x00000000 [91924.486670] Workqueue: writeback bdi_writeback_workfn (flush-btrfs-1) [91924.486677] ffff88000bc2f9c8 0000000000000046 ffff88000bc2f8d8 ffffffff00000 000 [91924.486699] ffff880212740000 ffff880212740000 ffff88000bc2ffd8 0000000000014 680 [91924.486709] 0000000000014680 ffff880212740000 ffff88000bc2ffd8 0000000000000 002 [91924.486718] Call Trace: [91924.486732] [<ffffffff8111f0a0>] ? filemap_fdatawait+0x30/0x30 [91924.486741] [<ffffffff814d059b>] ? preempt_schedule+0x3b/0x58 [91924.486751] [<ffffffff8101cc81>] ? read_tsc+0x9/0x20 [91924.486760] [<ffffffff8101cc81>] ? read_tsc+0x9/0x20 [91924.486769] [<ffffffff8111f0a0>] ? filemap_fdatawait+0x30/0x30 [91924.486778] [<ffffffff814d0781>] io_schedule+0x91/0xf0 [91924.486787] [<ffffffff8111f0ae>] sleep_on_page+0xe/0x18 [91924.486795] [<ffffffff814d0b78>] __wait_on_bit_lock+0x48/0xa8 [91924.486805] [<ffffffff8111f1f8>] __lock_page+0x78/0x90 [91924.486815] [<ffffffff8109a340>] ? autoremove_wake_function+0x38/0x38 [91924.486862] [<ffffffffa00923b8>] extent_write_cache_pages.isra.31.constprop. 48+0x248/0x380 [btrfs] [91924.486909] [<ffffffffa00937eb>] extent_writepages+0x5b/0x80 [btrfs] [91924.486952] [<ffffffffa0076620>] ? __btrfs_submit_bio_start_direct_io+0x38/0 x38 [btrfs] [91924.486995] [<ffffffffa00753a0>] btrfs_writepages+0x28/0x30 [btrfs] [91924.487005] [<ffffffff8112b75e>] do_writepages+0x1e/0x30 [91924.487015] [<ffffffff811b1200>] __writeback_single_inode+0x40/0x290 [91924.487024] [<ffffffff811b2682>] writeback_sb_inodes+0x262/0x420 [91924.487035] [<ffffffff811b28df>] __writeback_inodes_wb+0x9f/0xd0 [91924.487044] [<ffffffff811b2b3b>] wb_writeback+0x22b/0x348 [91924.487054] [<ffffffff811b30f8>] bdi_writeback_workfn+0x350/0x4c8 [91924.487066] [<ffffffff810828aa>] process_one_work+0x16a/0x460 [91924.487075] [<ffffffff810832f1>] worker_thread+0x131/0x3d0 [91924.487085] [<ffffffff810831c0>] ? manage_workers.isra.26+0x2c0/0x2c0 [91924.487093] [<ffffffff8108938a>] kthread+0xea/0x100 [91924.487103] [<ffffffff810892a0>] ? kthread_create_on_node+0x1a0/0x1a0 [91924.487117] [<ffffffff814dbf3c>] ret_from_fork+0x7c/0xb0 [91924.487126] [<ffffffff810892a0>] ? kthread_create_on_node+0x1a0/0x1a0 Full log: http://pastebin.com/4FWpsgF2 I tried everything (at least I think): - Switched to another kernel - Removed btrfs mount flags (now running with: noatime,compress=zlib,ssd,discard) - Reporting to the Arch Linux forums - Updating kernel .. I think this is a btrfs bug. Let me know if you need any more info. :) Thanks!
Created attachment 160291 [details] Kernel log including sysrq-w output I am able to reproduce the freeze within a few hours by attempting copies from a damaged data dvd. Kernel log attached. Kernel is a mainline 3.18.0 kernel compiled from source.
Appears to be fixed: I cannot reproduce with 3.19.0-rc1. Thanks!
Bad news, I just got the freeze again with 3.19.0-rc3: Jan 6 22:19:18 pluto kernel: [ 7320.100031] INFO: task btrfs-transacti:673 blocked for more than 120 seconds. Jan 6 22:19:18 pluto kernel: [ 7320.100039] Tainted: G E 3.19.0-rc3-ajw #83 Jan 6 22:19:18 pluto kernel: [ 7320.100041] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jan 6 22:19:18 pluto kernel: [ 7320.100045] btrfs-transacti D ffff8800d9293b38 0 673 2 0x00000000 Jan 6 22:19:18 pluto kernel: [ 7320.100052] ffff8800d9293b38 ffff8800d8908000 00000000000141c0 ffff8800d9293fd8 Jan 6 22:19:18 pluto kernel: [ 7320.100058] 00000000000141c0 ffff88011abfb1c0 ffff8800d8908000 ffff8800d9293be0 Jan 6 22:19:18 pluto kernel: [ 7320.100063] ffff88011fc94ac0 ffff8800d9293be0 ffff88011ffc80e8 0000000000000002 Jan 6 22:19:18 pluto kernel: [ 7320.100069] Call Trace: Jan 6 22:19:18 pluto kernel: [ 7320.100081] [<ffffffff81778060>] ? bit_wait+0x50/0x50 Jan 6 22:19:18 pluto kernel: [ 7320.100086] [<ffffffff81777830>] io_schedule+0xa0/0x130 Jan 6 22:19:18 pluto kernel: [ 7320.100091] [<ffffffff8177808c>] bit_wait_io+0x2c/0x50 Jan 6 22:19:18 pluto kernel: [ 7320.100095] [<ffffffff81777cc5>] __wait_on_bit+0x65/0x90 Jan 6 22:19:18 pluto kernel: [ 7320.100102] [<ffffffff8116c13d>] ? find_get_pages_tag+0xcd/0x170 Jan 6 22:19:18 pluto kernel: [ 7320.100108] [<ffffffff8116b297>] wait_on_page_bit+0xc7/0xd0 Jan 6 22:19:18 pluto kernel: [ 7320.100113] [<ffffffff810b14d0>] ? autoremove_wake_function+0x40/0x40 Jan 6 22:19:18 pluto kernel: [ 7320.100118] [<ffffffff8116b399>] filemap_fdatawait_range+0xf9/0x190 Jan 6 22:19:18 pluto kernel: [ 7320.100158] [<ffffffffa0241769>] btrfs_wait_marked_extents+0xe9/0x180 [btrfs] Jan 6 22:19:18 pluto kernel: [ 7320.100186] [<ffffffffa02b8b5f>] btrfs_write_and_wait_transaction.isra.21+0x56/0x7b [btrfs] Jan 6 22:19:18 pluto kernel: [ 7320.100212] [<ffffffffa02422d3>] btrfs_commit_transaction+0x7e3/0xa30 [btrfs] Jan 6 22:19:18 pluto kernel: [ 7320.100238] [<ffffffffa023d885>] transaction_kthread+0x1b5/0x240 [btrfs] Jan 6 22:19:18 pluto kernel: [ 7320.100263] [<ffffffffa023d6d0>] ? btrfs_cleanup_transaction+0x570/0x570 [btrfs] Jan 6 22:19:18 pluto kernel: [ 7320.100269] [<ffffffff8108f722>] kthread+0xd2/0xf0 Jan 6 22:19:18 pluto kernel: [ 7320.100274] [<ffffffff8108f650>] ? kthread_create_on_node+0x180/0x180 Jan 6 22:19:18 pluto kernel: [ 7320.100280] [<ffffffff8177b9fc>] ret_from_fork+0x7c/0xb0 Jan 6 22:19:18 pluto kernel: [ 7320.100285] [<ffffffff8108f650>] ? kthread_create_on_node+0x180/0x180
Dec 9 18:05:47 pluto kernel: [80426.568428] sr 3:0:0:0: [sr0] Dec 9 18:05:47 pluto kernel: [80426.568433] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE Dec 9 18:05:47 pluto kernel: [80426.568435] sr 3:0:0:0: [sr0] Dec 9 18:05:47 pluto kernel: [80426.568437] Sense Key : Medium Error [current] Dec 9 18:05:47 pluto kernel: [80426.568440] sr 3:0:0:0: [sr0] Dec 9 18:05:47 pluto kernel: [80426.568442] Add. Sense: L-EC uncorrectable error Dec 9 18:05:47 pluto kernel: [80426.568444] sr 3:0:0:0: [sr0] CDB: Dec 9 18:05:47 pluto kernel: [80426.568446] Read(10): 28 00 00 07 2c 66 00 00 40 00 Dec 9 18:05:47 pluto kernel: [80426.568453] blk_update_request: I/O error, dev sr0, sector 1880472 The device is slow to respond so the timeout warnings can be expected. They're informative and may happen on a system under heavy load or in the situation you observe.
Thanks for looking at this. It looks like the timeouts were due to hardware issues. One of my hard drives had a read error during a scrub a week ago and when I ran smartctl there were hundreds of reallocated sectors. My guess is the vibration from the DVD drive was triggering issues with the hdd. It's replaced now.