mkfs.xfs on lv mirrior which include SSD will lead to NULL pointer How reproducible: 100% Steps to Reproduce: 1) 3 SSD: sd[n-p] #vgcreate helter_skelter /dev/sdl1 /dev/sd[n-p]1 #lvcreate -ay --type mirror -m 2 -n syncd_multiple_legs_3legs_1 -L 500M helter_skelter /dev/sdl1:0-2400 /dev/sdn1:0-2400 /dev/sdo1:0-2400 /dev/sdp1:0-150 #mkfs.xfs /dev/mapper/helter_skelter-syncd_multiple_legs_3legs_1 [ 344.300595] BUG: unable to handle kernel NULL pointer dereference at (null) [ 344.309351] IP: write_callback+0x21/0x100 [dm_mirror] [ 344.314988] PGD 0 [ 344.314989] P4D 0 [ 344.317232] [ 344.321134] Oops: 0000 [#1] SMP [ 344.324640] Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_inted [ 344.388414] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.12.0-rc5 #2 [ 344.395409] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 344.403759] task: ffffa0bc79f15a00 task.stack: ffffb8a3400e0000 [ 344.410367] RIP: 0010:write_callback+0x21/0x100 [dm_mirror] [ 344.416586] RSP: 0018:ffffa0cbffc03c70 EFLAGS: 00010282 [ 344.422419] RAX: 0000000000000000 RBX: ffffa0cbb7d84058 RCX: 0000000000002242 [ 344.430372] RDX: 0000000000002241 RSI: ffffa0cbb7d84058 RDI: ffffa0cbb7ed3680 [ 344.438334] RBP: ffffa0cbffc03c98 R08: 000000000001f040 R09: ffffffff899addd7 [ 344.446298] R10: ffffa0cbffc1f040 R11: ffffe7f580dfb4c0 R12: ffffa0cbb7d84058 [ 344.454260] R13: ffffffffc03a4a30 R14: ffffa0cbbb350000 R15: 000000000000315c [ 344.462224] FS: 0000000000000000(0000) GS:ffffa0cbffc00000(0000) knlGS:0000000000000000 [ 344.471253] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 344.477663] CR2: 0000000000000000 CR3: 0000000d27a09000 CR4: 00000000001406e0 [ 344.485628] Call Trace: [ 344.488358] <IRQ> [ 344.490604] ? recovery_complete+0x160/0x160 [dm_mirror] [ 344.496538] dec_count+0x40/0x50 [dm_mod] [ 344.501016] endio+0x41/0x60 [dm_mod] [ 344.505112] bio_endio+0x83/0x100 [ 344.508805] dec_pending+0x134/0x240 [dm_mod] [ 344.513674] ? mempool_free+0x2b/0x80 [ 344.517763] clone_endio+0x5b/0x120 [dm_mod] [ 344.522528] bio_endio+0x83/0x100 [ 344.526230] blk_update_request+0x8c/0x330 [ 344.530805] scsi_end_request+0x34/0x1d0 [ 344.535181] scsi_io_completion+0x13b/0x620 [ 344.539850] scsi_finish_command+0xd9/0x120 [ 344.544516] scsi_softirq_done+0x12a/0x150 [ 344.549093] blk_done_softirq+0x9e/0xd0 [ 344.553376] __do_softirq+0xc9/0x269 [ 344.557369] irq_exit+0xd9/0xf0 [ 344.560871] smp_call_function_single_interrupt+0x33/0x40 [ 344.566904] call_function_single_interrupt+0x93/0xa0 [ 344.572546] RIP: 0010:cpuidle_enter_state+0xe4/0x270 [ 344.578083] RSP: 0018:ffffb8a3400e3e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04 [ 344.586531] RAX: ffffa0cbffc1bc00 RBX: ffffd8a33fc00300 RCX: 000000000000001f [ 344.594492] RDX: 0000000000000000 RSI: ffffa0cbffc19418 RDI: 0000000000000000 [ 344.602453] RBP: ffffb8a3400e3ea0 R08: 0000000000000001 R09: cccccccccccccccd [ 344.610414] R10: 0000000000002bc2 R11: 0000000000000018 R12: 0000000000000004 [ 344.618375] R13: 0000000000000001 R14: ffffd8a33fc00300 R15: 0000005029e9e504 [ 344.626337] </IRQ> [ 344.628677] ? cpuidle_enter_state+0xc0/0x270 [ 344.633538] cpuidle_enter+0x17/0x20 [ 344.637535] call_cpuidle+0x23/0x40 [ 344.641427] do_idle+0x172/0x1e0 [ 344.645029] cpu_startup_entry+0x71/0x80 [ 344.649414] start_secondary+0x154/0x190 [ 344.653797] secondary_startup_64+0x9f/0x9f [ 344.658462] Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 f3 48 83 ec 08 48 8b 06 48 85 ff 48 89 7d d8 <4c> 8b 20 48 c7 06 00 [ 344.679558] RIP: write_callback+0x21/0x100 [dm_mirror] RSP: ffffa0cbffc03c70 [ 344.687421] CR2: 0000000000000000 [ 344.691132] ---[ end trace 6a9619dc0b776113 ]--- [ 344.767346] Kernel panic - not syncing: Fatal exception in interrupt [ 344.774490] Kernel Offset: 0x8800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 344.856848] ---[ end Kernel panic - not syncing: Fatal exception in interrupt [ 344.864835] sched: Unexpected reschedule of offline CPU#0! [ 344.870963] ------------[ cut here ]------------ [ 344.876119] WARNING: CPU: 1 PID: 0 at arch/x86/kernel/smp.c:128 native_smp_send_reschedule+0x3c/0x40 [ 344.886309] Modules linked in: intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_inted [ 344.950057] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G D 4.12.0-rc5 #2 [ 344.958406] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014 [ 344.966754] task: ffffa0bc79f15a00 task.stack: ffffb8a3400e0000 [ 344.973360] RIP: 0010:native_smp_send_reschedule+0x3c/0x40 [ 344.979482] RSP: 0018:ffffa0cbffc03768 EFLAGS: 00010046 [ 344.985312] RAX: 000000000000002e RBX: 0000000000000000 RCX: 0000000000000000 [ 344.993273] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 0000000000000300 [ 345.001233] RBP: ffffa0cbffc03768 R08: 00000000fffffffe R09: 00000000000005f4 [ 345.009195] R10: 0000000000000005 R11: 00000000000005f3 R12: 0000000000000001 [ 345.017157] R13: 000000010000ae9e R14: ffffa0bc79f15a00 R15: ffffa0cbffc147a8 [ 345.025119] FS: 0000000000000000(0000) GS:ffffa0cbffc00000(0000) knlGS:0000000000000000 [ 345.034146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 345.040555] CR2: 0000000000000000 CR3: 0000000d27a09000 CR4: 00000000001406e0 [ 345.048518] Call Trace: [ 345.051243] <IRQ> [ 345.053492] trigger_load_balance+0x10e/0x1f0 [ 345.058358] scheduler_tick+0xa8/0xe0 [ 345.062448] ? tick_sched_do_timer+0x70/0x70 [ 345.067216] update_process_times+0x47/0x60 [ 345.071884] tick_sched_handle.isra.18+0x25/0x60 [ 345.077035] tick_sched_timer+0x3d/0x70 [ 345.081314] __hrtimer_run_queues+0xe5/0x230 [ 345.086077] hrtimer_interrupt+0xa8/0x1a0 [ 345.090554] local_apic_timer_interrupt+0x35/0x60 [ 345.095801] smp_apic_timer_interrupt+0x38/0x50 [ 345.100859] apic_timer_interrupt+0x93/0xa0 [ 345.105528] RIP: 0010:panic+0x1f8/0x239 [ 345.109795] RSP: 0018:ffffa0cbffc039c0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff10 [ 345.118241] RAX: 0000000000000041 RBX: 0000000000000000 RCX: 0000000000000006 [ 345.126201] RDX: 0000000000000000 RSI: 0000000000000096 RDI: ffffa0cbffc0e0a0 [ 345.134163] RBP: ffffa0cbffc03a30 R08: 00000000fffffffe R09: 00000000000005f3 [ 345.142122] R10: 0000000000000005 R11: 00000000000005f2 R12: ffffffff8a236e11 [ 345.150083] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000046 [ 345.158056] oops_end+0xc6/0xd0 [ 345.161566] no_context+0x1a8/0x400 [ 345.165457] __bad_area_nosemaphore+0xee/0x1d0 [ 345.170413] bad_area_nosemaphore+0x14/0x20 [ 345.175081] __do_page_fault+0x89/0x4a0 [ 345.179361] ? __wake_up_common+0x55/0x90 [ 345.183827] do_page_fault+0x30/0x80 [ 345.187815] ? recovery_complete+0x160/0x160 [dm_mirror] [ 345.193746] page_fault+0x28/0x30 [ 345.197444] RIP: 0010:write_callback+0x21/0x100 [dm_mirror] [ 345.203661] RSP: 0018:ffffa0cbffc03c70 EFLAGS: 00010282 [ 345.209489] RAX: 0000000000000000 RBX: ffffa0cbb7d84058 RCX: 0000000000002242 [ 345.217450] RDX: 0000000000002241 RSI: ffffa0cbb7d84058 RDI: ffffa0cbb7ed3680 [ 345.225412] RBP: ffffa0cbffc03c98 R08: 000000000001f040 R09: ffffffff899addd7 [ 345.233372] R10: ffffa0cbffc1f040 R11: ffffe7f580dfb4c0 R12: ffffa0cbb7d84058 [ 345.241333] R13: ffffffffc03a4a30 R14: ffffa0cbbb350000 R15: 000000000000315c [ 345.249299] ? recovery_complete+0x160/0x160 [dm_mirror] [ 345.255227] ? mempool_free_slab+0x17/0x20 [ 345.259788] ? recovery_complete+0x160/0x160 [dm_mirror] [ 345.265938] dec_count+0x40/0x50 [dm_mod] [ 345.270412] endio+0x41/0x60 [dm_mod] [ 345.274498] bio_endio+0x83/0x100 [ 345.278194] dec_pending+0x134/0x240 [dm_mod] [ 345.283053] ? mempool_free+0x2b/0x80 [ 345.287140] clone_endio+0x5b/0x120 [dm_mod] [ 345.291902] bio_endio+0x83/0x100 [ 345.295596] blk_update_request+0x8c/0x330 [ 345.300164] scsi_end_request+0x34/0x1d0 [ 345.304538] scsi_io_completion+0x13b/0x620 [ 345.309202] scsi_finish_command+0xd9/0x120 [ 345.313867] scsi_softirq_done+0x12a/0x150 [ 345.318435] blk_done_softirq+0x9e/0xd0 [ 345.322713] __do_softirq+0xc9/0x269 [ 345.326700] irq_exit+0xd9/0xf0 [ 345.330201] smp_call_function_single_interrupt+0x33/0x40 [ 345.336224] call_function_single_interrupt+0x93/0xa0 [ 345.341857] RIP: 0010:cpuidle_enter_state+0xe4/0x270 [ 345.347394] RSP: 0018:ffffb8a3400e3e68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff04 [ 345.355838] RAX: ffffa0cbffc1bc00 RBX: ffffd8a33fc00300 RCX: 000000000000001f [ 345.363799] RDX: 0000000000000000 RSI: ffffa0cbffc19418 RDI: 0000000000000000 [ 345.371758] RBP: ffffb8a3400e3ea0 R08: 0000000000000001 R09: cccccccccccccccd [ 345.379719] R10: 0000000000002bc2 R11: 0000000000000018 R12: 0000000000000004 [ 345.387680] R13: 0000000000000001 R14: ffffd8a33fc00300 R15: 0000005029e9e504 [ 345.395638] </IRQ> [ 345.397976] ? cpuidle_enter_state+0xc0/0x270 [ 345.402834] cpuidle_enter+0x17/0x20 [ 345.406820] call_cpuidle+0x23/0x40 [ 345.410709] do_idle+0x172/0x1e0 [ 345.414310] cpu_startup_entry+0x71/0x80 [ 345.418685] start_secondary+0x154/0x190 [ 345.423058] secondary_startup_64+0x9f/0x9f [ 345.427721] Code: dd 00 0f 92 c0 84 c0 74 14 48 8b 05 df 48 a9 00 be fd 00 00 00 ff 90 a0 00 00 00 5d c3 89 fe 48 c7 c7 f8 ca 22 8a e8 c3 41 15 00 <0f> ff 5d c3 0f 1f 44 [ 345.448790] ---[ end trace 6a9619dc0b776114 ]--- The SSD I used: [ 6.783423] scsi 0:0:12:0: Direct-Access ATA INTEL SSDSC2BB48 DL13 PQ: 0 ANSI: 6 [ 6.807242] scsi 0:0:13:0: Direct-Access ATA INTEL SSDSC2BB48 DL13 PQ: 0 ANSI: 6 [ 6.831367] scsi 0:0:14:0: Direct-Access ATA INTEL SSDSC2BB48 DL13 PQ: 0 ANSI: 6 [ 6.855349] scsi 0:0:15:0: Direct-Access ATA INTEL SSDSC2BB48 DL13 PQ: 0 ANSI: 6 [ 6.968282] sd 0:0:12:0: [sdm] 937703088 512-byte logical blocks: (480 GB/447 GiB) [ 6.968283] sd 0:0:12:0: [sdm] 4096-byte physical blocks [ 6.969092] sd 0:0:12:0: [sdm] Write Protect is off [ 6.969093] sd 0:0:12:0: [sdm] Mode Sense: 9b 00 10 08 [ 6.969095] sd 0:0:13:0: [sdn] 937703088 512-byte logical blocks: (480 GB/447 GiB) [ 6.969097] sd 0:0:13:0: [sdn] 4096-byte physical blocks [ 6.969351] sd 0:0:14:0: [sdo] 937703088 512-byte logical blocks: (480 GB/447 GiB) [ 6.969352] sd 0:0:14:0: [sdo] 4096-byte physical blocks [ 6.969554] sd 0:0:15:0: [sdp] 937703088 512-byte logical blocks: (480 GB/447 GiB) [ 6.969556] sd 0:0:15:0: [sdp] 4096-byte physical blocks [ 6.969655] sd 0:0:12:0: [sdm] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 6.969800] sd 0:0:13:0: [sdn] Write Protect is off [ 6.969801] sd 0:0:13:0: [sdn] Mode Sense: 9b 00 10 08 [ 6.969990] sd 0:0:14:0: [sdo] Write Protect is off [ 6.969992] sd 0:0:14:0: [sdo] Mode Sense: 9b 00 10 08 [ 6.970281] sd 0:0:15:0: [sdp] Write Protect is off [ 6.970283] sd 0:0:15:0: [sdp] Mode Sense: 9b 00 10 08 [ 6.970437] sd 0:0:13:0: [sdn] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 6.970608] sd 0:0:14:0: [sdo] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 6.970722] sd 0:0:15:0: [sdp] Write cache: enabled, read cache: enabled, supports DPO and FUA [ 7.005461] sd 0:0:12:0: [sdm] Attached SCSI disk [ 7.007395] sd 0:0:13:0: [sdn] Attached SCSI disk [ 7.007534] sd 0:0:14:0: [sdo] Attached SCSI disk [ 7.007683] sd 0:0:15:0: [sdp] Attached SCSI disk [ 20.349557] sd 0:0:12:0: Attached scsi generic sg12 type 0 [ 20.355741] sd 0:0:13:0: Attached scsi generic sg13 type 0 [ 20.361946] sd 0:0:14:0: Attached scsi generic sg14 type 0 [ 20.368183] sd 0:0:15:0: Attached scsi generic sg15 type 0
This was reported via RedHat QE on 6/14/2017 so I was aware of this. I'll work on a fix and will report back once I it.
(In reply to Mike Snitzer from comment #1) > This was reported via RedHat QE on 6/14/2017 so I was aware of this. > > I'll work on a fix and will report back once I it. Heh, Zhang Yi... you already reported this. Not sure why we need a kernel.org bug too. Please don't do this in the future! If you'd like to raise awareness, or ping developers if you feel we've been unresponsive, post to dm-devel@redhat.com
(In reply to Mike Snitzer from comment #2) > (In reply to Mike Snitzer from comment #1) > > This was reported via RedHat QE on 6/14/2017 so I was aware of this. > > > > I'll work on a fix and will report back once I it. > > Heh, Zhang Yi... you already reported this. Not sure why we need a > kernel.org bug too. > Hi Mike I want to track this issue with upstream. > Please don't do this in the future! If you'd like to raise awareness, or > ping developers if you feel we've been unresponsive, post to > dm-devel@redhat.com OK, I will post to dm-devel@ next time, thanks for reminder. Yi
I reproduced on my testbed. It is definitely an issue with how dm-mirror is processing discards (via dm-io). If I stop having mkfs.xfs issue discards (using mkfs.xfs -K) then I don't see the crash. If I let mkfs.xfs issue discards it crashes as reported. Given it is so easy to reproduce I expect to have a fix by end of day.
Fixed, this will be included in 4.12: https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-4.12/dm&id=feb7695fe9fb83084aa29de0094774f4c9d4c9fc