Bug 76421
Summary: | Hang on deletion | ||
---|---|---|---|
Product: | File System | Reporter: | Joshua McKinney (bugzilla) |
Component: | btrfs | Assignee: | Josef Bacik (josef) |
Status: | RESOLVED CODE_FIX | ||
Severity: | normal | CC: | alan, boatingbum23, cedricde, dsterba, evetsso, joe, lukas.karas, m+kernel-bugs, martin, net147, ray.maslinski, stijn+bugs, szg00000, thonaldo |
Priority: | P1 | ||
Hardware: | x86-64 | ||
OS: | Linux | ||
Kernel Version: | 3.14.4-1-ARCH | Subsystem: | |
Regression: | No | Bisected commit-id: |
Description
Joshua McKinney
2014-05-18 16:23:54 UTC
I have been having a similar problem for the last week. I can create files until the disks are filled and they are fine and mounted R/W until I delete the files, at which point they get mounted Read Only. I am running a Raid 10 with a pair of 1TB and 3TB disks. Attached is my dmesg log. [ 1022.660808] WARNING: CPU: 1 PID: 2225 at fs/btrfs/super.c:257 __btrfs_abort_transaction+0x50/0x110 [btrfs]() [ 1022.660814] Modules linked in: fuse btrfs kvm_amd kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel radeon mousedev raid6_pq aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode psmouse pcspkr serio_raw k10temp xor snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi i2c_piix4 snd_hda_intel r8169 ttm mii snd_hda_codec drm_kms_helper tpm_tis snd_hwdep tpm snd_pcm drm snd_timer snd hwmon soundcore i2c_algo_bit i2c_core shpchp acpi_cpufreq processor fglrx(PO) evdev mac_hid amd_iommu_v2 button ext4 crc16 mbcache jbd2 sd_mod crc_t10dif crct10dif_common hid_generic usbhid hid atkbd libps2 sata_via ehci_pci ohci_pci ohci_hcd xhci_hcd ehci_hcd usbcore usb_common i8042 serio ahci libahci libata scsi_mod [ 1022.660902] CPU: 1 PID: 2225 Comm: btrfs Tainted: P W O 3.14.4-1-ARCH #1 [ 1022.660968] [<ffffffffa0fe2040>] __btrfs_abort_transaction+0x50/0x110 [btrfs] [ 1022.660985] [<ffffffffa0ff4aa6>] __btrfs_free_extent+0x136/0xa90 [btrfs] [ 1022.661006] [<ffffffffa1058d79>] ? btrfs_merge_delayed_refs+0x449/0x470 [btrfs] [ 1022.661024] [<ffffffffa1057b62>] ? find_ref_head.constprop.4+0x62/0x90 [btrfs] [ 1022.661040] [<ffffffffa0ffa046>] __btrfs_run_delayed_refs+0x916/0x1270 [btrfs] [ 1022.661062] [<ffffffffa102aefb>] ? release_extent_buffer+0x2b/0xd0 [btrfs] [ 1022.661080] [<ffffffffa0ffe8eb>] btrfs_run_delayed_refs.part.52+0x7b/0x260 [btrfs] [ 1022.661097] [<ffffffffa0ffeae7>] btrfs_run_delayed_refs+0x17/0x20 [btrfs] [ 1022.661116] [<ffffffffa10101a3>] __btrfs_end_transaction+0x243/0x380 [btrfs] [ 1022.661135] [<ffffffffa1010374>] btrfs_defrag_root+0x74/0xe0 [btrfs] [ 1022.661155] [<ffffffffa104457f>] btrfs_ioctl_defrag+0x13f/0x180 [btrfs] [ 1022.661174] [<ffffffffa1046617>] btrfs_ioctl+0x1a27/0x2a80 [btrfs] [ 1022.661246] BTRFS: error (device sda) in __btrfs_free_extent:5756: errno=-5 IO failure [ 1022.661256] BTRFS: error (device sda) in btrfs_run_delayed_refs:2714: errno=-5 IO failure I am also affected by the original issue. Again, it is triggered when SABnzbd deletes, but my setup is much simpler: A single 250GB SSD with btrfs, no RAID. The machine is running ArchLinux and the problem started with 3.14 (IIRC) and still occurs in 3.15.1. To be more precise: There is a folder with ~140 files inside (each ~50 MB in size). Upon deleting this folder (or the files therein), the error message below appears. The folder still exists, but it is empty (as it should be) and the program hangs until I restart the machine. The log looks as follows: [17874.289727] INFO: task SABnzbd.py:682 blocked for more than 120 seconds. [17874.289731] Tainted: G O 3.15.1-1-ARCH #1 [17874.289731] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [17874.289733] SABnzbd.py D 0000000000000001 0 682 1 0x00000000 [17874.289735] ffff88020faa7ce0 0000000000000082 ffff88020f91d180 0000000000014700 [17874.289737] ffff88020faa7fd8 0000000000014700 ffff88020f91d180 00000000ffffffef [17874.289739] ffff88008b2f2de0 ffffffffa08316bc ffff88020faa7c48 ffffffffa0830c73 [17874.289740] Call Trace: [17874.289759] [<ffffffffa08316bc>] ? __set_extent_bit+0x45c/0x550 [btrfs] [17874.289766] [<ffffffffa0830c73>] ? free_extent_state+0x43/0xc0 [btrfs] [17874.289771] [<ffffffffa08316bc>] ? __set_extent_bit+0x45c/0x550 [btrfs] [17874.289775] [<ffffffff81509a19>] schedule+0x29/0x70 [17874.289781] [<ffffffffa083261d>] lock_extent_bits+0x19d/0x200 [btrfs] [17874.289783] [<ffffffff810b1bd0>] ? __wake_up_sync+0x20/0x20 [17874.289791] [<ffffffffa08200f9>] btrfs_evict_inode+0x139/0x520 [btrfs] [17874.289794] [<ffffffff811cf754>] evict+0xb4/0x1a0 [17874.289796] [<ffffffff811cffe5>] iput+0xf5/0x1a0 [17874.289798] [<ffffffff811c4165>] do_unlinkat+0x1b5/0x300 [17874.289801] [<ffffffff81174d2c>] ? vm_munmap+0x4c/0x60 [17874.289803] [<ffffffff811c5226>] SyS_unlink+0x16/0x20 [17874.289805] [<ffffffff815159a9>] system_call_fastpath+0x16/0x1b I just got the same problem while trying to "rm -rf" a large directory tree. I am running kernel 3.15.3-1-ARCH. Here's the trace: Jul 9 20:07:42 nova kernel: sed D 0000000000000002 0 23893 1 0x00000004 Jul 9 20:07:42 nova kernel: ffff8800036a3ba0 0000000000000086 ffff880110df32f0 0000000000014700 Jul 9 20:07:42 nova kernel: ffff8800036a3fd8 0000000000014700 ffff880110df32f0 00000000ffffffef Jul 9 20:07:42 nova kernel: ffff88006bc76c00 ffffffffa02666fc ffff8800036a3b08 ffffffffa0265cb3 Jul 9 20:07:42 nova kernel: Call Trace: Jul 9 20:07:42 nova kernel: [<ffffffffa02666fc>] ? __set_extent_bit+0x45c/0x55 0 [btrfs] Jul 9 20:07:42 nova kernel: [<ffffffffa0265cb3>] ? free_extent_state+0x43/0xc0 [btrfs] Jul 9 20:07:42 nova kernel: [<ffffffffa02666fc>] ? __set_extent_bit+0x45c/0x55 0 [btrfs] Jul 9 20:07:42 nova kernel: [<ffffffff81509ef9>] schedule+0x29/0x70 Jul 9 20:07:42 nova kernel: [<ffffffffa026765d>] lock_extent_bits+0x19d/0x200 [btrfs] Jul 9 20:07:42 nova kernel: [<ffffffff810b1c10>] ? __wake_up_sync+0x20/0x20 Jul 9 20:07:42 nova kernel: [<ffffffffa0255149>] btrfs_evict_inode+0x139/0x520 [btrfs] Jul 9 20:07:42 nova kernel: [<ffffffff811cf974>] evict+0xb4/0x1a0 Jul 9 20:07:42 nova kernel: [<ffffffff811d0205>] iput+0xf5/0x1a0 Jul 9 20:07:42 nova kernel: [<ffffffff811cb8b8>] __dentry_kill+0x1a8/0x200 Jul 9 20:07:42 nova kernel: [<ffffffff811cb9d8>] dput+0xc8/0x1c0 Jul 9 20:07:42 nova kernel: [<ffffffff811c5c09>] SyS_renameat2+0x389/0x540 Jul 9 20:07:42 nova kernel: [<ffffffff81199769>] ? kmem_cache_free+0x199/0x1d0 Jul 9 20:07:42 nova kernel: [<ffffffff81174edc>] ? vm_munmap+0x4c/0x60 Jul 9 20:07:42 nova kernel: [<ffffffff811c5dfe>] SyS_rename+0x1e/0x20 Jul 9 20:07:42 nova kernel: [<ffffffff81515ee9>] system_call_fastpath+0x16/0x1 b Hope this is of help. Please advise if I can provide further info. I just got this again, doing a similar operation (a build of Chromium from source). Again, I could not remove the "src" dir (hung while attempting). I am now at kernel 3.18.2-2-ARCH. So the problem persists. I am also using a very simple filesystem (plain btrfs on SSD drive, no RAID). Anything we can do to help debug? Jan 27 13:32:36 nova kernel: sed D 0000000000000000 0 6344 1 0x00000004 Jan 27 13:32:36 nova kernel: ffff88017ab83b68 0000000000000086 ffff88022f21c670 0000000000013640 Jan 27 13:32:36 nova kernel: ffff88017ab83fd8 0000000000013640 ffff880076ca3250 ffff88022f21c670 Jan 27 13:32:36 nova kernel: ffff88017ab83ab8 ffffffffa02143ca 0000000000000fff 00000000ffffffef Jan 27 13:32:36 nova kernel: Call Trace: Jan 27 13:32:36 nova kernel: [<ffffffffa02143ca>] ? free_extent_state+0x4a/0xd0 [btrfs] Jan 27 13:32:36 nova kernel: [<ffffffffa0214e0f>] ? __set_extent_bit+0x46f/0x560 [btrfs] Jan 27 13:32:36 nova kernel: [<ffffffff81550b59>] schedule+0x29/0x70 Jan 27 13:32:36 nova kernel: [<ffffffffa0215cbb>] lock_extent_bits+0x1ab/0x220 [btrfs] Jan 27 13:32:36 nova kernel: [<ffffffff810b2e30>] ? __wake_up_sync+0x20/0x20 Jan 27 13:32:36 nova kernel: [<ffffffffa0203152>] btrfs_evict_inode+0x162/0x550 [btrfs] Jan 27 13:32:36 nova kernel: [<ffffffff811eab68>] evict+0xb8/0x1b0 Jan 27 13:32:36 nova kernel: [<ffffffff811eb405>] iput+0xf5/0x1a0 Jan 27 13:32:36 nova kernel: [<ffffffff811e69f8>] __dentry_kill+0x188/0x200 Jan 27 13:32:36 nova kernel: [<ffffffff811e6b38>] dput+0xc8/0x1c0 Jan 27 13:32:36 nova kernel: [<ffffffff811e0b29>] SyS_renameat2+0x429/0x5f0 Jan 27 13:32:36 nova kernel: [<ffffffff8118c117>] ? vma_rb_erase+0x147/0x260 Jan 27 13:32:36 nova kernel: [<ffffffff811e0d2e>] SyS_rename+0x1e/0x20 Jan 27 13:32:36 nova kernel: [<ffffffff81554ca9>] system_call_fastpath+0x12/0x17 Ran into this on a BTRFS RAID6 on 6x2TB LUKS encrypted drives. Kernel 3.19.0. The filesystem was created on a 3.19.0-rc6 kernel, with btrfs-progs-3.18.2, -O skinny-metadata,no-holes. [261745.159447] INFO: task python2.7:4956 blocked for more than 120 seconds. [261745.159452] Not tainted 3.19.0-gentoo #1 [261745.159453] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [261745.159455] python2.7 D ffff8800df48bcc8 0 4956 1 0x00000000 [261745.159460] ffff8800df48bcc8 ffff8800df48bcb8 0000000000009000 ffff88020b5a3120 [261745.159464] ffff8800df48bfd8 ffff88014ca14088 0000000000029fff ffff8800120a0c60 [261745.159467] ffff88014ca14088 ffff8800df48bc28 ffffffffa024bf45 ffff8800df48bc38 [261745.159470] Call Trace: [261745.159510] [<ffffffffa024bf45>] ? free_extent_state.part.34+0x25/0x40 [btrfs] [261745.159528] [<ffffffffa024c3d9>] ? free_extent_state+0x19/0x20 [btrfs] [261745.159545] [<ffffffffa024cc8e>] ? __set_extent_bit+0x3ae/0x510 [btrfs] [261745.159550] [<ffffffff814adcf4>] schedule+0x24/0x70 [261745.159566] [<ffffffffa024da8a>] lock_extent_bits+0x18a/0x1e0 [btrfs] [261745.159571] [<ffffffff8107ed20>] ? wait_woken+0x80/0x80 [261745.159589] [<ffffffffa023c90f>] btrfs_evict_inode+0x11f/0x4c0 [btrfs] [261745.159598] [<ffffffff8116458e>] evict+0xae/0x170 [261745.159602] [<ffffffff81164d7b>] iput+0xeb/0x180 [261745.159605] [<ffffffff8115a97b>] do_unlinkat+0x1bb/0x2e0 [261745.159608] [<ffffffff8115b231>] SyS_unlink+0x11/0x20 [261745.159612] [<ffffffff814b11d6>] system_call_fastpath+0x16/0x1b Hit this (or something similar) on Gentoo, 3.19.1, 2 separate machines. 100% triggerable when running emerge gentoo-sources. (installing the kernel source package). No RAID. [ 2162.243789] INFO: task emerge:19900 blocked for more than 120 seconds. [ 2162.243799] Not tainted 3.19.1-gentoo-ARCH #2 [ 2162.243802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 2162.243806] emerge D ffff8800153e7cb8 0 19900 1403 0x00000000 [ 2162.243814] ffff8800153e7cb8 ffff88000c1862c0 0000000000013e80 ffff8800153e7fd8 [ 2162.243820] 0000000000013e80 ffff88000c1809e0 ffff88000c1862c0 ffff8800206731b8 [ 2162.243826] ffffffff8129949c ffff8800153e7c18 ffffffff81298a9a 000000000000ffff [ 2162.243832] Call Trace: [ 2162.243848] [<ffffffff8129949c>] ? __set_extent_bit+0x44c/0x540 [ 2162.243855] [<ffffffff81298a9a>] ? free_extent_state+0x4a/0xd0 [ 2162.243862] [<ffffffff8129949c>] ? __set_extent_bit+0x44c/0x540 [ 2162.243870] [<ffffffff8160fac9>] schedule+0x29/0x70 [ 2162.243877] [<ffffffff8129a2dd>] lock_extent_bits+0x18d/0x1e0 [ 2162.243885] [<ffffffff810aef60>] ? wait_woken+0x90/0x90 [ 2162.243891] [<ffffffff81288052>] btrfs_evict_inode+0x152/0x520 [ 2162.243898] [<ffffffff811ddfc8>] evict+0xb8/0x1b0 [ 2162.243903] [<ffffffff811de855>] iput+0xf5/0x1a0 [ 2162.243910] [<ffffffff811d346a>] do_unlinkat+0x1ba/0x330 [ 2162.243917] [<ffffffff811e2a84>] ? mntput+0x24/0x40 [ 2162.243922] [<ffffffff811ccb82>] ? path_put+0x22/0x30 [ 2162.243927] [<ffffffff811c2033>] ? SyS_fchmodat+0x53/0xb0 [ 2162.243933] [<ffffffff811d3eb6>] SyS_unlink+0x16/0x20 [ 2162.243939] [<ffffffff81613729>] system_call_fastpath+0x12/0x17 Kernel 4.0.0-rc6 is still affected by this bug. It occurs once per week on my machine. I use SSD (C400-MTFDDAK256M) disk formated with btrfs for root fs and home directory (with ecryptfs)... [118637.160102] INFO: task java:1655 blocked for more than 120 seconds. [118637.160105] Tainted: G W 4.0.0-rc6 #1 [118637.160106] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [118637.160107] java D ffff880118837c28 0 1655 6245 0x00000004 [118637.160110] ffff880118837c28 ffff8802146149b0 ffff8801bd05e240 0000001018837c98 [118637.160112] ffff880118837fd8 00000000ffffffef ffff880118837fd8 ffff88001022262c [118637.160114] ffff880010222610 0000000000001000 ffff8801bcb36290 ffff880118837c48 [118637.160115] Call Trace: [118637.160121] [<ffffffff81712e87>] schedule+0x37/0x90 [118637.160124] [<ffffffff813051e2>] lock_extent_bits+0x202/0x240 [118637.160127] [<ffffffff8109ae10>] ? wait_woken+0x90/0x90 [118637.160129] [<ffffffff812f2b4f>] btrfs_evict_inode+0x14f/0x500 [118637.160131] [<ffffffff811b87c3>] evict+0xb3/0x1a0 [118637.160133] [<ffffffff811b914d>] iput+0x18d/0x200 [118637.160150] [<ffffffffa03710f1>] ecryptfs_evict_inode+0x31/0x40 [ecryptfs] [118637.160151] [<ffffffff811b87c3>] evict+0xb3/0x1a0 [118637.160152] [<ffffffff811b914d>] iput+0x18d/0x200 [118637.160154] [<ffffffff811ad929>] do_unlinkat+0x159/0x370 [118637.160157] [<ffffffff810117e5>] ? syscall_trace_enter_phase1+0x105/0x170 [118637.160160] [<ffffffff81002d25>] ? do_notify_resume+0x95/0xa0 [118637.160161] [<ffffffff811aed46>] SyS_unlink+0x16/0x20 [118637.160163] [<ffffffff81716ff2>] system_call_fastpath+0x12/0x17 Filipe just posted a patch that looks like it ought to solve this: http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg43561.html Just happened with 4.1.4. Again while installing a new version of kernel sources. [165395.120530] INFO: task emerge:7472 blocked for more than 120 seconds. [165395.120537] Not tainted 4.1.4-gentoo-ARCH #1 [165395.120539] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [165395.120542] emerge D ffff8800063cfc78 0 7472 6348 0x00000000 [165395.120549] ffff8800063cfc78 ffff88003b783cc0 ffff880035988a20 00000000016a0085 [165395.120554] ffff8800063d0000 ffff880007eb1b0c ffff880007eb1af0 0000000000001000 [165395.120558] ffff8800395bba60 ffff8800063cfc98 ffffffff81632577 ffff880007eb1b0c [165395.120563] Call Trace: [165395.120573] [<ffffffff81632577>] schedule+0x37/0x90 [165395.120578] [<ffffffff812a5dbd>] lock_extent_bits+0x18d/0x1e0 [165395.120584] [<ffffffff810b3e90>] ? wait_woken+0x90/0x90 [165395.120588] [<ffffffff81293772>] btrfs_evict_inode+0x152/0x580 [165395.120593] [<ffffffff811e5b78>] evict+0xb8/0x1b0 [165395.120596] [<ffffffff811e64db>] iput+0x1bb/0x220 [165395.120600] [<ffffffff811db023>] do_unlinkat+0x203/0x320 [165395.120605] [<ffffffff811eaa94>] ? mntput+0x24/0x40 [165395.120610] [<ffffffff811d4372>] ? path_put+0x22/0x30 [165395.120615] [<ffffffff811c9893>] ? SyS_fchmodat+0x53/0xb0 [165395.120619] [<ffffffff811db926>] SyS_unlink+0x16/0x20 [165395.120623] [<ffffffff816371ae>] system_call_fastpath+0x12/0x71 The patch I mentioned above was merged into 4.2-rc1, and so far on 4.2 (Arch Linux) I'm unable to reproduce this. In earlier kernels I was able to reproduce this easily using sabnzbd, but so far so good. Hi, sorry, somehow I got the idea that it got merged as a bugfix... Installed 4.2, emerged a couple of linux sources, no issues so far. Thanks. |