Bug 53041 - kernel bug at fs/btrfs/extent_io.c:1955 after removing a faulty disk
Summary: kernel bug at fs/btrfs/extent_io.c:1955 after removing a faulty disk
Status: RESOLVED OBSOLETE
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 normal
Assignee: Josef Bacik
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-25 18:14 UTC by Andrew McNabb
Modified: 2022-09-30 14:51 UTC
CC List: 5 users

See Also:
Kernel Version: 3.7.2
Subsystem:
Regression: No
Bisected commit-id:


Description Andrew McNabb 2013-01-25 18:14:27 UTC
The following kernel bug occurred after I removed a faulty disk that was part of a btrfs filesystem. I'm using the Fedora 3.7.2-204.fc18.x86_64 kernel.

kernel BUG at fs/btrfs/extent_io.c:1955!
invalid opcode: 0000 [#1] SMP 
Modules linked in: iptable_raw xt_CT nf_conntrack bridge stp llc f71882fg hwmon_vid snd_hda_codec_realtek snd_hda_intel snd_hda_codec vhost_net tun macvtap macvlan kvm_amd snd_hwdep edac_core kvm snd_seq snd_seq_device snd_pcm snd_page_alloc snd_timer snd edac_mce_amd soundcore sp5100_tco i2c_piix4 k10temp serio_raw ppdev shpchp microcode parport_pc parport nfsd auth_rpcgss nfs_acl lockd uinput binfmt_misc radeon btrfs i2c_algo_bit drm_kms_helper ata_generic pata_acpi ttm libcrc32c zlib_deflate drm r8169 pata_atiixp i2c_core mii wmi sunrpc
CPU 2 
Pid: 2348, comm: btrfs-endio-4 Not tainted 3.7.2-204.fc18.x86_64 #1 MICRO-STAR INTERNATIONAL CO.,LTD MS-7596/880GM-E43 (MS-7596)
RIP: 0010:[<ffffffffa0142a69>]  [<ffffffffa0142a69>] repair_io_failure+0x1c9/0x200 [btrfs]
RSP: 0018:ffff88031335dc98  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000028bc680 RCX: 0000000000000000
RDX: ffff88040345f720 RSI: 00000005a8459000 RDI: ffff880405c46088
RBP: ffff88031335dd08 R08: 0000000000000002 R09: ffff88040345f720
R10: ffff880405c46088 R11: 0000000026860000 R12: 00000005a8459000
R13: 0000000000001000 R14: ffffea000b29a580 R15: ffff8802cacf2600
FS:  00007f9d840e7780(0000) GS:ffff88041fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000003684e797d0 CR3: 00000003e7c92000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process btrfs-endio-4 (pid: 2348, threadinfo ffff88031335c000, task ffff8803e7a7c560)
Stack:
 0000000000000000 ffff880408aee110 0000000000007000 ffff88040345f720
 ffff880200000000 ffff8802d5740000 ffff88031335dcc8 ffff88031335dcc8
 ffff88031335dd08 ffffea000b29a580 0000000000000000 ffff8802cacf2c90
Call Trace:
 [<ffffffffa01433ba>] end_bio_extent_readpage+0x91a/0xa40 [btrfs]
 [<ffffffff811c8abd>] bio_endio+0x1d/0x30
 [<ffffffffa0120f11>] end_workqueue_fn+0x41/0x50 [btrfs]
 [<ffffffffa01514d6>] worker_loop+0x136/0x580 [btrfs]
 [<ffffffffa01513a0>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
 [<ffffffff81081c80>] kthread+0xc0/0xd0
 [<ffffffff81010000>] ? ftrace_raw_event_xen_mmu_flush_tlb_others+0x50/0xe0
 [<ffffffff81081bc0>] ? kthread_create_on_node+0x120/0x120
 [<ffffffff8163de2c>] ret_from_fork+0x7c/0xb0
 [<ffffffff81081bc0>] ? kthread_create_on_node+0x120/0x120
Code: 40 00 4c 89 ff e8 a8 74 08 e1 31 f6 48 89 df e8 ae dd 00 00 b8 fb ff ff ff eb b6 0f 1f 80 00 00 00 00 b8 fb ff ff ff eb a8 0f 0b <0f> 0b 49 8b 46 08 48 8b 8b 88 00 00 00 4d 89 e0 48 8b 55 90 48 
RIP  [<ffffffffa0142a69>] repair_io_failure+0x1c9/0x200 [btrfs]
 RSP <ffff88031335dc98>
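The "invalid opcode" line above is how an x86 BUG() shows up: the kernel executes `ud2` (bytes `0f 0b`), and the byte in angle brackets in the `Code:` line is the one at RIP. Below is a minimal Python sketch for checking this and for splitting the `func+0xOFF/0xSIZE` symbol from the RIP line. The helper names are hypothetical and not part of any kernel or btrfs tool.

```python
import re

# Hypothetical helpers for reading an x86-64 kernel oops by hand.
# Matches symbols such as "repair_io_failure+0x1c9/0x200".
SYMBOL_RE = re.compile(r"(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")

UD2 = [0x0F, 0x0B]  # x86 encoding of the ud2 instruction emitted by BUG()


def parse_symbol(line):
    """Return (function, offset, size) from the first func+0xOFF/0xSIZE in line."""
    m = SYMBOL_RE.search(line)
    if m is None:
        raise ValueError("no func+0xOFF/0xSIZE symbol in line")
    return m.group(1), int(m.group(2), 16), int(m.group(3), 16)


def bytes_at_rip(code_line, count=2):
    """Return `count` bytes starting at the <..>-marked byte of a Code: line."""
    tokens = code_line.split(":", 1)[1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            marked = [tok[1:-1]] + tokens[i + 1:i + count]
            return [int(t, 16) for t in marked]
    raise ValueError("no <..> marker in Code: line")


def is_bug_trap(code_line):
    """True if the instruction at RIP is ud2, i.e. the oops came from BUG()."""
    return bytes_at_rip(code_line) == UD2


rip = "RIP  [<ffffffffa0142a69>] repair_io_failure+0x1c9/0x200 [btrfs]"
code = "Code: b8 fb ff ff ff eb a8 0f 0b <0f> 0b 49 8b 46 08"  # excerpt of the Code: line
print(parse_symbol(rip))   # ('repair_io_failure', 457, 512)
print(is_bug_trap(code))   # True
```

The offset/size pair says the trap is 0x1c9 bytes into a 0x200-byte function, which is what tools such as `gdb` or `addr2line` need (together with the matching debug build) to map back to the source line.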
Comment 1 Andrew McNabb 2013-01-25 18:28:27 UTC
After the kernel bug, the filesystem is in a weird state. I tried adding a replacement partition with "btrfs dev add /dev/sdf1 /aml", but the command hangs. Some (but not all) reads and writes to the filesystem also seem to hang, even though the filesystem is set up for RAID1 (so all of the data should be available even with one disk missing). The kernel log keeps filling up with errors about the missing disk, even though the disk was removed some 20 or 30 minutes ago:

[ 2515.084551] btrfs: bdev /dev/sdc1 errs: wr 162, rd 0, flush 1369, corrupt 0, gen 0
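Each "bdev ... errs:" message reports the running per-device error counters that btrfs keeps (write, read, flush, checksum corruption, generation mismatch; the same counters `btrfs device stats` shows). Below is a minimal sketch for parsing such a line and checking which counters grew between two samples, for example to confirm the kernel is still issuing I/O to the removed disk. The function names are hypothetical, and the regex assumes exactly the message format shown above.

```python
import re

# Assumes the exact "bdev <dev> errs: wr N, rd N, flush N, corrupt N, gen N" format.
ERRS_RE = re.compile(
    r"bdev (\S+) errs: wr (\d+), rd (\d+), flush (\d+), corrupt (\d+), gen (\d+)"
)
FIELDS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_dev_errs(line):
    """Return (device, {counter: value}) for one btrfs 'bdev ... errs:' message."""
    m = ERRS_RE.search(line)
    if m is None:
        raise ValueError("not a btrfs 'bdev ... errs:' message")
    return m.group(1), dict(zip(FIELDS, map(int, m.groups()[1:])))


def grown(before, after):
    """Names of counters whose value increased between two samples."""
    return [name for name in FIELDS if after[name] > before[name]]


dev, errs = parse_dev_errs(
    "[ 2515.084551] btrfs: bdev /dev/sdc1 errs: wr 162, rd 0, flush 1369, corrupt 0, gen 0"
)
print(dev, errs)  # /dev/sdc1 {'wr': 162, 'rd': 0, 'flush': 1369, 'corrupt': 0, 'gen': 0}
```

Parsing two such lines taken a few minutes apart and passing both dicts to `grown()` would show whether `wr` and `flush` are still climbing for a disk that is no longer attached.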

I tried doing another "btrfs dev delete /dev/sdc1 /aml" to make it clear that /dev/sdc1 is really gone, but this command hangs, and the error messages continue to fill the kernel logs. When I run "btrfs fi show aml", it gives:

Label: 'aml'  uuid: aa1d71ba-57b0-4591-a4b0-f0130c05aa40
	Total devices 4 FS bytes used 62.89GB
	devid    4 size 3.64TB used 36.00GB path /dev/sde1
	devid    3 size 3.64TB used 37.00GB path /dev/sdd1
	devid    1 size 3.64TB used 32.03GB path /dev/sdb1
	*** Some devices missing
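The output above lists 3 devids while reporting 4 total devices, which is what "*** Some devices missing" refers to. Below is a minimal sketch that extracts both numbers from "btrfs fi show" text. Guessing which devid is missing assumes devids were assigned 1..N with no earlier removals, which btrfs does not guarantee. The helper names are hypothetical.

```python
import re


def parse_fi_show(text):
    """Return (total_devices, sorted devids) from one filesystem's `btrfs fi show` block."""
    m = re.search(r"Total devices (\d+)", text)
    if m is None:
        raise ValueError("no 'Total devices' line")
    devids = sorted(int(d) for d in re.findall(r"devid\s+(\d+)", text))
    return int(m.group(1)), devids


def guess_missing_devids(total, devids):
    """ASSUMPTION: devids were numbered 1..total with no gaps from earlier removals."""
    return sorted(set(range(1, total + 1)) - set(devids))


show = """Label: 'aml'  uuid: aa1d71ba-57b0-4591-a4b0-f0130c05aa40
\tTotal devices 4 FS bytes used 62.89GB
\tdevid    4 size 3.64TB used 36.00GB path /dev/sde1
\tdevid    3 size 3.64TB used 37.00GB path /dev/sdd1
\tdevid    1 size 3.64TB used 32.03GB path /dev/sdb1
\t*** Some devices missing"""

total, devids = parse_fi_show(show)
print(total - len(devids), guess_missing_devids(total, devids))  # 1 [2]
```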

In short, there's a lot of hanging happening, where I would expect the filesystem to just move on without the missing disk. Is there any other information I could provide that would be helpful to add to this bug report?
Comment 2 Josef Bacik 2013-04-30 17:10:49 UTC
Note to self: try to reproduce this; I think we still have this problem.
Comment 3 Martin Hierholzer 2013-07-18 10:38:06 UTC
I can confirm this bug is still present in Linux 3.10.1 (dmesg log below; the kernel is tainted because of a license issue with some ACPI driver). After running into data corruption beyond repair (I assume it was caused by a faulty piece of hardware other than the disks; the repair tool crashes, and I have sent a mail to the mailing list about it), I decided to remove one of the disks, build a fresh file system on it, and copy all data from the old file system there. Every time the rsync process comes across a broken file, I get many of the following error messages, and then the kernel BUG is raised. After that the file system is basically unusable: the only way out is a forced reboot, because shutdown hangs while unmounting the broken file system.

btrfs: corrupt leaf, slot offset bad: block=13000034267136,root=1, slot=3
btrfs no csum found for inode 15235311 start 0
btrfs: corrupt leaf, slot offset bad: block=13000034267136,root=1, slot=3
btrfs no csum found for inode 15235311 start 0
btrfs csum failed ino 15235311 extent 1183252373504 csum 3063016663 wanted 0 mirror 0
btrfs: corrupt leaf, slot offset bad: block=13000034267136,root=1, slot=3
btrfs no csum found for inode 15235311 start 0
btrfs: corrupt leaf, slot offset bad: block=13000034267136,root=1, slot=3
btrfs no csum found for inode 15235311 start 0
btrfs: corrupt leaf, slot offset bad: block=13000034267136,root=1, slot=3
btrfs no csum found for inode 15235311 start 0
btrfs: corrupt leaf, slot offset bad: block=13000034267136,root=1, slot=3
btrfs no csum found for inode 15235311 start 0
[... more messages like this appear ...]
------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:2311!
invalid opcode: 0000 [#1] SMP 
Modules linked in: nfsd nfs_acl exportfs auth_rpcgss oid_registry xt_tcpudp iptable_filter ip_tables x_tables autofs4 hidp rfcomm bluetooth rfkill lockd sunrpc be2iscsi iscsi_boot_sysfs ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp libiscsi scsi_transport_iscsi binfmt_misc btrfs raid6_pq zlib_deflate xor loop dm_multipath scsi_dh video sbs sbshc hed battery acpi_ipmi ipmi_msghandler ac ipv6 parport_pc lp parport sg snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm snd_timer iTCO_wdt iTCO_vendor_support i2c_i801 snd i2c_core r8169 tpm_tis asus_atk0110 hwmon intel_agp mii lpc_ich intel_gtt pata_jmicron soundcore mfd_core tpm pata_acpi agpgart button tpm_bios rtc_cmos snd_page_alloc ata_generic pcspkr serio_raw ehci_pci acpi_cpufreq mperf shpchp pci_hotplug dm_snapshot dm_zero dm_mirror dm_region_hash dm_log dm_mod usb_storage uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
CPU: 3 PID: 26888 Comm: btrfs-endio-2 Tainted: P             3.10.1 #1
Hardware name: System manufacturer System Product Name/P7H55-M PRO, BIOS 1604    07/22/2010
task: ffff8804094481c0 ti: ffff88000ba14000 task.ti: ffff88000ba14000
RIP: 0010:[<ffffffffa03228dc>]  [<ffffffffa03228dc>] end_bio_extent_readpage+0x7ec/0x7f0 [btrfs]
RSP: 0018:ffff88000ba15da0  EFLAGS: 00010297
RAX: 0000000000000001 RBX: 0000000000002000 RCX: ffff88040b75c9c0
RDX: 0000000000000008 RSI: 0000000000002000 RDI: ffff88000b45e040
RBP: ffff8803e99e0f18 R08: 0000000000000002 R09: 00000b7df2630000
R10: ffff88007d1b5000 R11: ffff88000b243800 R12: ffffea00001e1a58
R13: ffff880009133a90 R14: ffff88000b45e1d0 R15: ffff880050abc180
FS:  0000000000000000(0000) GS:ffff88041fc60000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffffff600400 CR3: 000000000159f000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff88000b353d10 ffff88000ba15fd8 ffff88000b45e040 0000000000002fff
 ffff8803e99e0f08 ffff8803e99e0e90 0000000000000001 fffffffb00000001
 ffff880000000002 ffff88000b45e040 ffff88000b45e000 ffff880009133a90
Call Trace:
 [<ffffffffa0331529>] ? worker_loop+0x139/0x4e0 [btrfs]
 [<ffffffff8106371b>] ? idle_balance+0xdb/0x130
 [<ffffffff8141d743>] ? __schedule+0x263/0x680
 [<ffffffffa03313f0>] ? btrfs_queue_worker+0x300/0x300 [btrfs]
 [<ffffffff8104f0df>] ? kthread+0xaf/0xc0
 [<ffffffff8104f030>] ? kthread_create_on_node+0x110/0x110
 [<ffffffff8141f2ac>] ? ret_from_fork+0x7c/0xb0
 [<ffffffff8104f030>] ? kthread_create_on_node+0x110/0x110
Code: 37 a0 48 0f 44 f0 31 c0 e8 9f 5c 0f e1 e9 b6 f9 ff ff f0 41 ff 86 68 fe ff ff 4c 89 ff e8 cd 63 db e0 e9 1f f9 ff ff 0f 0b 0f 0b <0f> 0b 0f 0b 41 57 49 89 ff 41 56 41 55 41 89 d5 41 54 49 89 f4 
RIP  [<ffffffffa03228dc>] end_bio_extent_readpage+0x7ec/0x7f0 [btrfs]
 RSP <ffff88000ba15da0>
---[ end trace 0257f98a704c913f ]---
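The dmesg excerpt in this comment is dominated by two repeating messages. Below is a minimal sketch that collapses repeated lines into counts (relying on `collections.Counter` keeping first-seen order, Python 3.7+) and pulls the numeric fields out of the "csum failed" line. That "wanted 0" follows "no csum found" messages for the same inode is consistent with the expected checksum being absent rather than merely different, but this is an interpretation, not something confirmed in this report. The helper names are hypothetical.

```python
import re
from collections import Counter

# Matches the format seen above: "csum failed ino N extent N csum N wanted N mirror N".
CSUM_RE = re.compile(
    r"csum failed ino (\d+) extent (\d+) csum (\d+) wanted (\d+) mirror (\d+)"
)


def collapse(lines):
    """Return [(line, count)] in first-seen order (Counter preserves insertion order)."""
    return list(Counter(lines).items())


def parse_csum_failed(line):
    """Return the numeric fields of a 'csum failed' message as a dict."""
    m = CSUM_RE.search(line)
    if m is None:
        raise ValueError("not a 'csum failed' message")
    keys = ("ino", "extent", "csum", "wanted", "mirror")
    return dict(zip(keys, map(int, m.groups())))


log = [
    "btrfs: corrupt leaf, slot offset bad: block=13000034267136,root=1, slot=3",
    "btrfs no csum found for inode 15235311 start 0",
    "btrfs: corrupt leaf, slot offset bad: block=13000034267136,root=1, slot=3",
    "btrfs no csum found for inode 15235311 start 0",
    "btrfs csum failed ino 15235311 extent 1183252373504 csum 3063016663 wanted 0 mirror 0",
]
for line, n in collapse(log):
    print(n, line)
print(parse_csum_failed(log[-1])["wanted"])  # 0
```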
Comment 4 David Sterba 2022-09-30 14:51:42 UTC
This is a semi-automated Bugzilla cleanup; the report is against an old kernel version. If the problem still happens, please open a new bug. Thanks.
