I keep hitting the bug detailed in the backtrace below. I am running Fedora 26 on an Hp Microserver Gen 8. I have a 2 disk RAID 1 array set up with mdadm comprising two 2TB disks. I set up the raid array as follows: mdadm -R --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1 I am hitting the bug when attempting to write random data to the md device by doing the following: cryptsetup open --type plain /dev/md0 container --key-file /dev/random dd bs=1M if=/dev/zero of=/dev/mapper/container status=progress At some point during the dd invocation, after a few GB has been written, I see the following bug: [ 464.808442] ------------[ cut here ]------------ [ 464.808445] kernel BUG at block/blk-core.c:2054! [ 464.808506] invalid opcode: 0000 [#1] SMP [ 464.808561] Modules linked in: dm_crypt ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support gpio_ich ipmi_ssif intel_uncore intel_rapl_perf raid1 ipmi_si ipmi_devintf hpwdt hpilo ipmi_msghandler acpi_power_meter pcc_cpufreq lpc_ich tpm_tis tpm_tis_core shpchp ie31200_edac tpm i2c_algo_bit [ 464.808724] drm_kms_helper ttm crc32c_intel drm uas serio_raw usb_storage tg3 ptp pps_core [ 464.808790] CPU: 0 PID: 493 Comm: md0_resync Not tainted 4.12.5-300.fc26.x86_64 #1 [ 464.808848] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015 [ 464.808906] task: ffff8a68eb41c980 task.stack: ffffaf2881e8c000 [ 464.808968] RIP: 0010:generic_make_request+0x2c4/0x2d0 [ 464.809025] RSP: 0000:ffffaf2881e8fbe0 EFLAGS: 00010286 [ 464.809241] RAX: ffff8a68eb41c980 RBX: ffff8a6836195a00 RCX: 0000000000000000 [ 464.809460] RDX: 0000000000000402 RSI: 0000000000000001 RDI: ffff8a6836abedb8 [ 464.809681] RBP: ffffaf2881e8fc30 R08: 0000000000000010 R09: 0000000000001000 [ 464.809900] R10: ffffaf2881e8fc48 R11: 00000000ffffffff R12: 0000000000000008 [ 464.810120] R13: ffff8a683601f600 R14: 0000000000000080 R15: ffff8a68eb8d8000 [ 464.810341] FS: 0000000000000000(0000) GS:ffff8a690a600000(0000) knlGS:0000000000000000 [ 464.810721] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 464.810938] CR2: 00007fb59a851000 CR3: 000000005d01c000 CR4: 00000000001406f0 [ 464.811161] Call Trace: [ 464.811381] ? bitmap_start_sync+0x56/0x100 [ 464.811601] raid1_sync_request+0xadf/0xb90 [raid1] [ 464.811820] ? raid1_sync_request+0xadf/0xb90 [raid1] [ 464.812039] ? is_mddev_idle+0xa6/0x10a [ 464.812256] md_do_sync+0x8cd/0xee0 [ 464.812475] ? finish_wait+0x80/0x80 [ 464.812693] md_thread+0x125/0x170 [ 464.812910] ? md_thread+0x125/0x170 [ 464.813128] kthread+0x125/0x140 [ 464.813346] ? find_pers+0x70/0x70 [ 464.813560] ? kthread_park+0x60/0x60 [ 464.813778] ? do_syscall_64+0x67/0x140 [ 464.814001] ret_from_fork+0x25/0x30 [ 464.814219] Code: bc 24 68 07 00 00 f0 49 83 ac 24 68 07 00 00 01 0f 85 85 fe ff ff 41 ff 94 24 78 07 00 00 e9 78 fe ff ff 4c 89 28 e9 ae fd ff ff <0f> 0b e8 95 6e cc ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 [ 464.814812] RIP: generic_make_request+0x2c4/0x2d0 RSP: ffffaf2881e8fbe0 [ 464.815058] ---[ end trace 2b032b91acd7b852 ]--- [ 464.820281] ------------[ cut here ]------------ [ 464.820511] WARNING: CPU: 0 PID: 493 at kernel/exit.c:785 do_exit+0x51/0xb50 [ 464.820734] Modules linked in: dm_crypt ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables sunrpc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel intel_cstate iTCO_wdt iTCO_vendor_support gpio_ich ipmi_ssif intel_uncore intel_rapl_perf raid1 ipmi_si ipmi_devintf hpwdt hpilo ipmi_msghandler acpi_power_meter pcc_cpufreq lpc_ich tpm_tis tpm_tis_core shpchp ie31200_edac tpm i2c_algo_bit [ 464.822702] drm_kms_helper ttm crc32c_intel drm uas serio_raw usb_storage tg3 ptp pps_core [ 464.823095] CPU: 0 PID: 493 Comm: md0_resync Tainted: G D 4.12.5-300.fc26.x86_64 #1 [ 464.823487] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015 [ 464.823710] task: ffff8a68eb41c980 task.stack: ffffaf2881e8c000 [ 464.823933] RIP: 0010:do_exit+0x51/0xb50 [ 464.824151] RSP: 0000:ffffaf2881e8fed8 EFLAGS: 00010202 [ 464.824373] RAX: ffffaf2881e8fd80 RBX: ffff8a68eb41c980 RCX: 0000000000000000 [ 464.824598] RDX: ffff8a68df4a2400 RSI: 0000000000000000 RDI: ffffffff94ef8d60 [ 464.824821] RBP: ffffaf2881e8ff48 R08: 00000000000e9000 R09: 00000000000e8025 [ 464.825043] R10: 0000000000000008 R11: 0000000000000000 R12: 000000000000000b [ 464.825266] R13: ffffaf2881e8fb38 R14: 0000000000000000 R15: ffffaf2881e8fb38 [ 464.825493] FS: 0000000000000000(0000) GS:ffff8a690a600000(0000) knlGS:0000000000000000 [ 464.825880] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 464.826100] CR2: 00007fb59a851000 CR3: 000000005d01c000 CR4: 00000000001406f0 [ 464.826322] Call Trace: [ 464.826546] ? kthread+0x125/0x140 [ 464.826767] rewind_stack_do_exit+0x17/0x20 [ 464.826988] Code: 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 a7 b8 06 00 48 8b 83 60 0b 00 00 48 85 c0 74 0e 48 8b 10 48 39 d0 0f 84 60 04 00 00 <0f> ff 65 8b 05 86 4f f6 6b 25 00 ff 1f 00 89 45 9c 0f 85 28 08 [ 464.827576] ---[ end trace 2b032b91acd7b853 ]---
Fixed by: commit 0c9d5b127f695818c2c5a3868c1f28ca2969e905 Author: NeilBrown <neilb@suse.com> Date: Thu Apr 6 12:06:37 2017 +1000 md/raid1: avoid reusing a resync bio after error handling.
(In reply to Jens Axboe from comment #1) > Fixed by: > > commit 0c9d5b127f695818c2c5a3868c1f28ca2969e905 > Author: NeilBrown <neilb@suse.com> > Date: Thu Apr 6 12:06:37 2017 +1000 > > md/raid1: avoid reusing a resync bio after error handling. Unfortunately that patch is already present in 4.12, and so is already in the Fedora kernel.
Sounds like raid1 has another issue of submitting a bio that isn't initialized properly.
can you check if below patch fixes it. It's in latest 4.12 stable commit 022e510fcbda79183fd2cdc01abb01b4be80d03f Author: Ming Lei <ming.lei@redhat.com> Date: Fri Jul 14 16:14:42 2017 +0800 md: remove 'idx' from 'struct resync_pages'
Same deal with 4.12.8 which nicludes that patch. /proc/version: Linux version 4.12.8-300.fc26.x86_64 (mockbuild@bkernel02.phx2.fedoraproject.org) (gcc version 7.1.1 20170622 (Red Hat 7.1.1-3) (GCC) ) #1 SMP Thu Aug 17 15:30:20 UTC 2017 [ 261.081131] kernel BUG at block/blk-core.c:2054! [ 261.081194] invalid opcode: 0000 [#1] SMP [ 261.081249] Modules linked in: dm_crypt ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul gpio_ich crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf ipmi_ssif raid1 lpc_ich hpilo hpwdt ipmi_si ipmi_devintf pcc_cpufreq ipmi_msghandler acpi_power_meter tpm_tis tpm_tis_core shpchp tpm ie31200_edac i2c_algo_bit [ 261.081412] drm_kms_helper crc32c_intel tg3 ttm serio_raw uas usb_storage drm ptp pps_core [ 261.081478] CPU: 0 PID: 471 Comm: md0_resync Not tainted 4.12.8-300.fc26.x86_64 #1 [ 261.081538] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015 [ 261.081598] task: ffff90a1ec0e0000 task.stack: ffffb26b01eac000 [ 261.081660] RIP: 0010:generic_make_request+0x2c4/0x2d0 [ 261.081717] RSP: 0018:ffffb26b01eafbe0 EFLAGS: 00010286 [ 261.081936] RAX: ffff90a1ec0e0000 RBX: ffff90a2000c0b00 RCX: 0000000000000000 [ 261.082157] RDX: 0000000000000402 RSI: 0000000000000001 RDI: ffff90a2019ac5b8 [ 261.082378] RBP: ffffb26b01eafc30 R08: 0000000000000010 R09: 0000000000001000 [ 261.082599] R10: ffffb26b01eafc48 R11: 00000000ffffffff R12: 0000000000000008 [ 261.082820] R13: ffff90a131672000 R14: 0000000000000080 R15: ffff90a1a9c95c80 [ 261.083043] FS: 0000000000000000(0000) GS:ffff90a206200000(0000) knlGS:0000000000000000 [ 261.083427] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.083649] CR2: 0000000d3c59e0d8 CR3: 00000000e221c000 CR4: 00000000001406f0 [ 261.083872] Call Trace: [ 261.084095] ? bitmap_start_sync+0x56/0x100 [ 261.084317] raid1_sync_request+0xadf/0xb90 [raid1] [ 261.084540] ? raid1_sync_request+0xadf/0xb90 [raid1] [ 261.084764] ? is_mddev_idle+0xa6/0x10a [ 261.084986] md_do_sync+0x8cd/0xee0 [ 261.085210] ? finish_wait+0x80/0x80 [ 261.085435] md_thread+0x125/0x170 [ 261.085667] ? md_thread+0x125/0x170 [ 261.085899] kthread+0x125/0x140 [ 261.086128] ? find_pers+0x70/0x70 [ 261.086349] ? kthread_park+0x60/0x60 [ 261.086573] ret_from_fork+0x25/0x30 [ 261.086794] Code: bc 24 68 07 00 00 f0 49 83 ac 24 68 07 00 00 01 0f 85 85 fe ff ff 41 ff 94 24 78 07 00 00 e9 78 fe ff ff 4c 89 28 e9 ae fd ff ff <0f> 0b e8 35 6c cc ff 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 [ 261.087389] RIP: generic_make_request+0x2c4/0x2d0 RSP: ffffb26b01eafbe0 [ 261.087631] ---[ end trace f20eba28dd100ea0 ]--- [ 261.094734] ------------[ cut here ]------------ [ 261.094962] WARNING: CPU: 0 PID: 471 at kernel/exit.c:785 do_exit+0x51/0xb50 [ 261.095186] Modules linked in: dm_crypt ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl sunrpc x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul gpio_ich crc32_pclmul iTCO_wdt iTCO_vendor_support ghash_clmulni_intel intel_cstate intel_uncore intel_rapl_perf ipmi_ssif raid1 lpc_ich hpilo hpwdt ipmi_si ipmi_devintf pcc_cpufreq ipmi_msghandler acpi_power_meter tpm_tis tpm_tis_core shpchp tpm ie31200_edac i2c_algo_bit [ 261.097158] drm_kms_helper crc32c_intel tg3 ttm serio_raw uas usb_storage drm ptp pps_core [ 261.097564] CPU: 0 PID: 471 Comm: md0_resync Tainted: G D 4.12.8-300.fc26.x86_64 #1 [ 261.097970] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 11/02/2015 [ 261.098201] task: ffff90a1ec0e0000 task.stack: ffffb26b01eac000 [ 261.098436] RIP: 0010:do_exit+0x51/0xb50 [ 261.098662] RSP: 0018:ffffb26b01eafed8 EFLAGS: 00010202 [ 261.098898] RAX: ffffb26b01eafd80 RBX: ffff90a1ec0e0000 RCX: 0000000000000000 [ 261.099127] RDX: ffff90a1b5ec2400 RSI: 0000000000000000 RDI: ffffffffb3ef8d20 [ 261.099355] RBP: ffffb26b01eaff48 R08: 00000000000e9000 R09: 00000000000e8025 [ 261.099584] R10: 0000000000000008 R11: 0000000000000000 R12: 000000000000000b [ 261.099820] R13: ffffb26b01eafb38 R14: 0000000000000000 R15: ffffb26b01eafb38 [ 261.100051] FS: 0000000000000000(0000) GS:ffff90a206200000(0000) knlGS:0000000000000000 [ 261.100450] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.100675] CR2: 0000000d3c59e0d8 CR3: 00000000e221c000 CR4: 00000000001406f0 [ 261.100915] Call Trace: [ 261.101150] ? kthread+0x125/0x140 [ 261.101377] rewind_stack_do_exit+0x17/0x20 [ 261.101608] Code: 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 d7 b8 06 00 48 8b 83 60 0b 00 00 48 85 c0 74 0e 48 8b 10 48 39 d0 0f 84 60 04 00 00 <0f> ff 65 8b 05 86 4f f6 4c 25 00 ff 1f 00 89 45 9c 0f 85 28 08 [ 261.102229] ---[ end trace f20eba28dd100ea1 ]---
Created attachment 258087 [details] debug patch Can you try this patch?
Created attachment 258089 [details] debug patch alright, we need set the bio private
Thanks for the patch. Unfortunately I have to go to bed (it's already gone midnight), and then will be travelling for a few days, but I'll try the patch when I get back.
Is that patch against 4.12.8, or latest upstream?
For upstream. Looks I still didn't set the correctly bio private. I'll cook a new patch.
I'll merge this patch to upstream. I think it should fix your issue: https://marc.info/?l=linux-raid&m=150362889201103&w=2
OK, I applied that against the Fedora 4.12.9 kernel and have ran the dd command without the BUG appearing - so far I have written 50GB of data. Previously this would have triggered the BUG after a few GB reproducibly, so it looks like this has fixed things. I'll leave it running overnight to truly confirm. Thanks for your help!
ps. I notice this isn't scheduled for the 4.2.10 stable release - is that an oversight, or was the patch just too late?
I run a little late to push the patches to upstream. Will push this to 4.14 and this will be picked up by stable.