Bug 196251 - kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent_io.c:2328 when replacing disk
Status: NEW
Alias: None
Product: File System
Classification: Unclassified
Component: btrfs
Hardware: All Linux
Importance: P1 normal
Assignee: Josef Bacik
 
Reported: 2017-07-03 13:54 UTC by yoasif
Modified: 2017-07-07 20:27 UTC

See Also:
Kernel Version: 4.12 rc7
Tree: Mainline
Regression: No


Attachments
dmesg (103.74 KB, text/plain)
2017-07-03 13:54 UTC, yoasif
dmesg 4.12 (non RC) (107.70 KB, text/plain)
2017-07-03 15:48 UTC, yoasif

Description yoasif 2017-07-03 13:54:59 UTC
Created attachment 257309
dmesg

I'm replacing a btrfs device that was set up as a partition with one that is a whole disk. 

$ uname -a
Linux ubuntu-server 4.12.0-041200rc7-generic #201706252231 SMP Mon Jun 26 02:33:00 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ btrfs --version
btrfs-progs v4.4

$ sudo btrfs fi show
[sudo] password for asif: 
Label: none  uuid: 48ed8a66-731d-499b-829e-dd07dd7260cc
	Total devices 14 FS bytes used 39.80TiB
	devid    0 size 7.28TiB used 7.07TiB path /dev/sdk
	devid    4 size 5.46TiB used 5.46TiB path /dev/sdd
	devid    5 size 5.46TiB used 5.43TiB path /dev/sde
	devid    7 size 5.46TiB used 5.43TiB path /dev/sdg
	devid    8 size 5.46TiB used 5.45TiB path /dev/sdi
	devid    9 size 5.46TiB used 5.45TiB path /dev/sdc
	devid   10 size 5.46TiB used 5.43TiB path /dev/sdf
	devid   11 size 7.28TiB used 7.25TiB path /dev/sdh
	devid   12 size 5.46TiB used 5.44TiB path /dev/sdn
	devid   14 size 7.28TiB used 7.25TiB path /dev/sdm
	devid   15 size 7.28TiB used 7.25TiB path /dev/sdo
	devid   17 size 5.46TiB used 5.43TiB path /dev/sdb
	devid   18 size 7.28TiB used 7.25TiB path /dev/sdl
	devid   19 size 7.28TiB used 7.07TiB path /dev/sdj2


$ sudo btrfs fi df /media/camino/
Data, RAID1: total=39.75TiB, used=39.75TiB
System, RAID1: total=43.12MiB, used=6.69MiB
Metadata, RAID1: total=54.00GiB, used=51.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
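
For triage, the device being written by the replace can be picked out of the "btrfs fi show" listing above mechanically. The following is a minimal Python sketch, not part of btrfs-progs. It assumes that a running replace lists its target as devid 0, which would match /dev/sdk here (the whole disk taking over from the /dev/sdj2 partition). The helper names parse_devices and replace_target are hypothetical.

```python
import re

# One "devid" row of `btrfs fi show` output, in the format pasted in this report.
DEVID_RE = re.compile(
    r"devid\s+(?P<devid>\d+)\s+size\s+(?P<size>\S+)"
    r"\s+used\s+(?P<used>\S+)\s+path\s+(?P<path>\S+)"
)

# Assumption: btrfs lists the target of an in-progress replace as devid 0.
REPLACE_TARGET_DEVID = 0


def parse_devices(text):
    """Return one dict per devid row found in `btrfs fi show` output."""
    return [
        {
            "devid": int(m["devid"]),
            "size": m["size"],
            "used": m["used"],
            "path": m["path"],
        }
        for m in DEVID_RE.finditer(text)
    ]


def replace_target(devices):
    """Return the device listed as devid 0, or None if there is none."""
    for dev in devices:
        if dev["devid"] == REPLACE_TARGET_DEVID:
            return dev
    return None


# Three rows copied from the listing in this report.
sample = """
    devid    0 size 7.28TiB used 7.07TiB path /dev/sdk
    devid   18 size 7.28TiB used 7.25TiB path /dev/sdl
    devid   19 size 7.28TiB used 7.07TiB path /dev/sdj2
"""

devices = parse_devices(sample)
target = replace_target(devices)
print(target["path"])  # /dev/sdk
```

Sizes are kept as strings because the listing prints rounded, human-readable values; comparing them numerically would need the raw byte counts.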

The replace is getting stuck at the 53% mark with dmesg errors as follows:

[  584.185198] ------------[ cut here ]------------
[  584.185199] kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent_io.c:2328!
[  584.185248] invalid opcode: 0000 [#1] SMP
[  584.185271] Modules linked in: zram binfmt_misc dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cmdlinepart pcbc intel_spi_platform intel_spi spi_nor mtd ipmi_ssif aesni_intel aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf joydev input_leds mei_me mei intel_pch_thermal lpc_ich shpchp ie31200_edac mac_hid tpm_crb ipmi_si ipmi_devintf ipmi_msghandler acpi_pad kvm_intel kvm irqbypass autofs4 btrfs xor raid6_pq igb ast ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops hid_generic e1000e dca uas ahci usbhid mpt3sas ptp libahci drm usb_storage hid i2c_algo_bit raid_class pps_core scsi_transport_sas video
[  584.185631] CPU: 6 PID: 4663 Comm: kworker/u16:1 Not tainted 4.12.0-041200rc7-generic #201706252231
[  584.185679] Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 3.0 04/24/2015
[  584.185754] Workqueue: btrfs-endio btrfs_endio_helper [btrfs]
[  584.185787] task: ffff9317333e9680 task.stack: ffffae8f8ddc8000
[  584.185847] RIP: 0010:btrfs_check_repairable+0xf2/0x100 [btrfs]
[  584.185879] RSP: 0018:ffffae8f8ddcbce8 EFLAGS: 00010282
[  584.185909] RAX: 0000000000000001 RBX: ffff93172ab7b200 RCX: 0000000000000001
[  584.185948] RDX: 0000000000000003 RSI: 0000000000000000 RDI: ffff93171f9ace60
[  584.185986] RBP: ffffae8f8ddcbd08 R08: 00010f5dc7780000 R09: 00010f5e07780000
[  584.186025] R10: 0000000000000000 R11: 00000000fffffffb R12: 00000000f4492780
[  584.186063] R13: ffff93171f9a0000 R14: ffff9316f4492f68 R15: 0000000000000000
[  584.186102] FS:  0000000000000000(0000) GS:ffff93175fd80000(0000) knlGS:0000000000000000
[  584.186145] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  584.186160] BTRFS warning (device sdb): csum failed root 5 ino 319406 off 4265641668608 csum 0x2c6808e7 expected csum 0x00000000 mirror 0

More in log file attached.
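
When comparing this trace against the second dmesg, it may help to pull the BUG location and faulting function out of each log automatically. A rough Python sketch follows. The regular expressions are written against the two line formats visible in the excerpt above ("kernel BUG at ..." and "RIP: ..."), and the function name summarize_oops is hypothetical.

```python
import re

# "kernel BUG at <file>:<line>!" as printed in the excerpt above.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")

# "RIP: 0010:<function>+0x<offset>/0x<size> [<module>]"; the module part is optional.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<func>\w+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?: \[(?P<module>\w+)\])?"
)


def summarize_oops(log_text):
    """Collect the BUG source location and RIP symbol from a dmesg excerpt."""
    summary = {}
    bug = BUG_RE.search(log_text)
    if bug:
        summary["file"] = bug["file"]
        summary["line"] = int(bug["line"])
    rip = RIP_RE.search(log_text)
    if rip:
        summary["function"] = rip["func"]
        summary["offset"] = int(rip["off"], 16)   # bytes into the function
        summary["size"] = int(rip["size"], 16)    # total function size in bytes
        summary["module"] = rip["module"]
    return summary


# Two lines copied from the trace in this report.
excerpt = """\
[  584.185199] kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent_io.c:2328!
[  584.185847] RIP: 0010:btrfs_check_repairable+0xf2/0x100 [btrfs]
"""

info = summarize_oops(excerpt)
print(info["function"], info["line"])  # btrfs_check_repairable 2328
```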
Comment 1 yoasif 2017-07-03 15:46:59 UTC
Upgraded to 4.12 (instead of the rc) and the problem continues:

# uname -a
Linux ubuntu-server 4.12.0-041200rc7-generic #201706252231 SMP Mon Jun 26 02:33:00 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

$ btrfs --version
btrfs-progs v4.4

# btrfs fi show
Label: none  uuid: 48ed8a66-731d-499b-829e-dd07dd7260cc
	Total devices 14 FS bytes used 39.80TiB
	devid    0 size 7.28TiB used 7.07TiB path /dev/sdk
	devid    4 size 5.46TiB used 5.46TiB path /dev/sdb
	devid    5 size 5.46TiB used 5.43TiB path /dev/sdc
	devid    7 size 5.46TiB used 5.43TiB path /dev/sde
	devid    8 size 5.46TiB used 5.45TiB path /dev/sdf
	devid    9 size 5.46TiB used 5.45TiB path /dev/sda
	devid   10 size 5.46TiB used 5.43TiB path /dev/sdd
	devid   11 size 7.28TiB used 7.25TiB path /dev/sdg
	devid   12 size 5.46TiB used 5.44TiB path /dev/sdn
	devid   14 size 7.28TiB used 7.25TiB path /dev/sdm
	devid   15 size 7.28TiB used 7.25TiB path /dev/sdo
	devid   17 size 5.46TiB used 5.43TiB path /dev/sdj
	devid   18 size 7.28TiB used 7.25TiB path /dev/sdl
	devid   19 size 7.28TiB used 7.07TiB path /dev/sdh2


# btrfs fi df /media/camino/
Data, RAID1: total=39.75TiB, used=39.75TiB
System, RAID1: total=43.12MiB, used=6.69MiB
Metadata, RAID1: total=54.00GiB, used=51.55GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

New dmesg also attached.
Comment 2 yoasif 2017-07-03 15:48:19 UTC
Created attachment 257317
dmesg 4.12 (non RC)
Comment 3 William Koh 2017-07-07 20:27:55 UTC
Just looking at the new dmesg for v4.12, I'm seeing something really odd. 

[  419.531863] BTRFS warning (device sdb): csum failed root 5 ino 319406 off 298371032416256 csum 0x3722e609 expected csum 0x00000000 mirror -173339200

That's right before the bug was triggered, and the mirror number looks wrong to me: it is a large negative value. In v4.12-rc7 the mirror numbers looked fine, so something must have changed between those two versions. Fixing that probably won't fix this bug, since rc7 hit the same BUG, but I wanted to point it out in case it causes something else later on that could be prevented.
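
To check whether other warnings in the attached logs carry the same kind of mirror value, something like the Python sketch below could be run over the dmesg text. It is only an illustration: the regular expression follows the "csum failed" line format quoted in this report, and the acceptable range (0 up to max_copies, defaulting to 2 for the RAID1 profile shown by "btrfs fi df") is an assumption rather than a rule taken from the kernel source. The names parse_csum_warning and suspicious_mirror are hypothetical.

```python
import re

# Matches the "csum failed" warning format quoted in this report.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>\w+)\): csum failed root (?P<root>-?\d+) "
    r"ino (?P<ino>\d+) off (?P<off>\d+) csum (?P<csum>0x[0-9a-fA-F]+) "
    r"expected csum (?P<expected>0x[0-9a-fA-F]+) mirror (?P<mirror>-?\d+)"
)


def parse_csum_warning(line):
    """Return the fields of a csum warning as a dict, or None if no match."""
    m = CSUM_RE.search(line)
    if m is None:
        return None
    return {
        "dev": m["dev"],
        "root": int(m["root"]),
        "ino": int(m["ino"]),
        "off": int(m["off"]),
        "csum": int(m["csum"], 16),
        "expected": int(m["expected"], 16),
        "mirror": int(m["mirror"]),
    }


def suspicious_mirror(record, max_copies=2):
    """Flag mirror numbers outside 0..max_copies (the range is an assumption)."""
    return not (0 <= record["mirror"] <= max_copies)


# The rc7 and 4.12 warnings, copied from this report.
rc7_line = (
    "[  584.186160] BTRFS warning (device sdb): csum failed root 5 ino 319406 "
    "off 4265641668608 csum 0x2c6808e7 expected csum 0x00000000 mirror 0"
)
v412_line = (
    "[  419.531863] BTRFS warning (device sdb): csum failed root 5 ino 319406 "
    "off 298371032416256 csum 0x3722e609 expected csum 0x00000000 "
    "mirror -173339200"
)

for line in (rc7_line, v412_line):
    rec = parse_csum_warning(line)
    print(rec["mirror"], suspicious_mirror(rec))
```

With these two lines the rc7 warning passes the check and the 4.12 warning is flagged, which matches the observation above.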
