Bug 116021 - NULL pointer dereference in rdev_clear_badblocks
Summary: NULL pointer dereference in rdev_clear_badblocks
Status: NEW
Alias: None
Product: IO/Storage
Classification: Unclassified
Component: MD
Hardware: x86-64 Linux
Importance: P1 high
Assignee: io_md
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-04-07 18:59 UTC by Matthew L. Martin
Modified: 2019-02-12 00:19 UTC
CC List: 5 users

See Also:
Kernel Version: 4.1.8, 4.6
Subsystem:
Regression: No
Bisected commit-id:


Attachments
Context around NULL pointer dereference (43.73 KB, application/octet-stream)
2016-04-07 18:59 UTC, Matthew L. Martin
messages extract after reproducing in 4.6 (9.14 KB, application/octet-stream)
2016-05-27 19:04 UTC, Matthew L. Martin
fix (489 bytes, patch)
2016-05-31 22:35 UTC, Shaohua Li
dmesg output from both controllers (16.49 KB, application/octet-stream)
2016-06-14 16:01 UTC, Matthew L. Martin
second dmesg output (16.26 KB, application/octet-stream)
2016-06-14 16:01 UTC, Matthew L. Martin
add rcu protection (956 bytes, patch)
2019-01-09 09:06 UTC, Guoqing Jiang
add WQ_MEM_RECLAIM flag for firmware_event_thread (765 bytes, patch)
2019-01-09 09:13 UTC, Guoqing Jiang

Description Matthew L. Martin 2016-04-07 18:59:52 UTC
Created attachment 212061 [details]
Context around NULL pointer dereference

Testing an all-SSD RAID-10 device by pulling one SSD caused the oops below. The bracketing log context is attached.


Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: IP: [<ffffffff814cb5fd>] rdev_clear_badblocks+0xd/0x40
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: PGD 201c1a2067 PUD 14f9fa3067 PMD 0
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: Oops: 0000 [#1] SMP
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: Modules linked in: xt_nat(E) raid10(E) ext4(E) jbd2(E) mbcache(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) br_netfilter(E) nf_nat(E) nf_conntrack(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) ipt_REJECT(E) nf_reject_ipv4(E) iptable_filter(E) ip_tables(E) dm_service_time(E) mpt2sas(E) mptctl(E) mptbase(E) iTCO_wdt(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) coretemp(E) kvm_intel(E) kvm(E) crc32c_intel(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) cryptd(E) microcode(E) pcspkr(E) sb_edac(E) lpc_ich(E) edac_core(E) bonding(E) mfd_core(E) ses(E) enclosure(E) ipmi_devintf(E) acpi_cpufreq(E)
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: shpchp(E) wmi(E) dm_multipath(E) dm_mod(E) xfs(E) libcrc32c(E) exportfs(E) sd_mod(E) ahci(E) mpt3sas(E) i40e(E) raid_class(E) libahci(E) configfs(E) scsi_transport_sas(E) megaraid_sas(E) ipv6(E) autofs4(E)
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: CPU: 13 PID: 26099 Comm: md1_raid10 Tainted: G            E   4.1.8-3.clearsky.el7.centos.x86_64 #1
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: Hardware name:    /0CNCJW, BIOS 1.5.4 10/002/2015
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: task: ffff880fd738e0d0 ti: ffff880072324000 task.ti: ffff880072324000
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: RIP: 0010:[<ffffffff814cb5fd>]  [<ffffffff814cb5fd>] rdev_clear_badblocks+0xd/0x40
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: RSP: 0018:ffff880072327d08  EFLAGS: 00010246
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: RAX: f000000000000001 RBX: ffff880fd7b41a00 RCX: 0000000000000000
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: RBP: ffff880072327d08 R08: ffff88201ed53b38 R09: ffff88201ed53a00
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: R10: 0000000000000000 R11: ffff882033196258 R12: ffff88201e3971f8
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: R13: 0000000000000000 R14: ffff88201e3971f8 R15: 0000000000000000
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: FS:  0000000000000000(0000) GS:ffff88207f980000(0000) knlGS:0000000000000000
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: CR2: 0000000000000058 CR3: 0000001477c70000 CR4: 00000000001406e0
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: Stack:
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: ffff880072327e38 ffffffffa3c62503 ffff880fd7b41aa8 ffff88207f98d5c0
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: ffff880fd7b41a00 0000000000000286 ffff880072327d68 ffff88201e3971c0
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: ffff880072327db8 0000000000000286 ffff880072327db8 ffff880072327db8
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: Call Trace:
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffffa3c62503>] raid10d+0x4e3/0x1580 [raid10]
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff810af912>] ? del_timer_sync+0x52/0x60
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff81093ce4>] ? prepare_to_wait_event+0x84/0x100
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff814cbe16>] md_thread+0x136/0x140
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff81093ba0>] ? wait_woken+0x90/0x90
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff814cbce0>] ? find_pers+0x80/0x80
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff8106fc8b>] kthread+0xdb/0x100
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff8106fbb0>] ? kthread_create_on_node+0x170/0x170
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff815f3692>] ret_from_fork+0x42/0x70
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff8106fbb0>] ? kthread_create_on_node+0x170/0x170
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: Code: c1 e0 09 4c 09 f8 48 85 c9 b9 00 00 00 00 48 0f 49 d1 48 09 d0 48 89 06 e9 d1 fe ff ff 90 0f 1f 44 00 00 55 85 c9 48 89 e5 75 13 <48> 03 77 58 48 81 c7 38 01 00 00 e8 83 fd ff ff 5d c3 90 48 03
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: RIP  [<ffffffff814cb5fd>] rdev_clear_badblocks+0xd/0x40
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: RSP <ffff880072327d08>
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: CR2: 0000000000000058
Apr  7 14:00:35 38-hw-rig2-L3-2 kernel: ---[ end trace 1fdc17991b09f242 ]---
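For reference, the faulting address 0x58 together with RDI == 0 in the registers above points at rdev_clear_badblocks() being entered with a NULL rdev. A rough paraphrase of the 4.1-era helper (reconstructed from drivers/md/md.c and the Code: bytes above; treat the exact field offsets as an assumption):

int rdev_clear_badblocks(struct md_rdev *rdev, sector_t s, int sectors,
			 int is_new)
{
	if (is_new)
		s += rdev->new_data_offset;
	else
		s += rdev->data_offset;	/* NULL rdev faults here (data_offset at 0x58) */
	return md_clear_badblocks(&rdev->badblocks, s, sectors);
}

The helper itself assumes a valid rdev; the question is which raid10.c caller passes NULL.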
Comment 1 Shaohua Li 2016-04-08 01:02:53 UTC
Any chance you can try the latest kernel?
Comment 2 Matthew L. Martin 2016-04-08 14:52:02 UTC
Which one, 4.4 or 4.5? I can do either.

This is for a product, so we are looking for a fix in a long-term support kernel.
Comment 3 Shaohua Li 2016-04-08 16:43:53 UTC
I'd prefer 4.5; I'd like to check whether the upstream kernel has the issue. If we find the fix, backporting it to the kernel you use will be easy. I tried to reproduce the issue here (e.g., directly removing a PCI controller and the disk), but haven't succeeded yet, so please describe how you run the test.
Comment 4 Matthew L. Martin 2016-04-08 17:54:45 UTC
We have 12 SSDs in a dual-ported SAS JBOD. We enable MPIO on all of the drives. We make 10 partitions on each drive, so we have 120 partitions. We then take the matching partitions on each drive and create an MD RAID-10 device with two mirror sides. Ten of the partitions are for data and two are spares.

The test was:

With I/O running to the JBOD, physically remove one data drive. The expected result was that one of the spare drives would be selected and made active, and its 10 partitions would be re-silvered.

We will attempt to reproduce with our original configuration (4.1.8 kernel). If we can reproduce this readily we will upgrade to 4.5 and try again. We may be able to report tomorrow.
Comment 5 Shaohua Li 2016-04-08 19:55:23 UTC
Great. There are 4 places that call rdev_clear_badblocks(); could you add BUG_ON(!rdev) at each of them and report which one triggers the bug?
It may be related to the spares; I'll keep trying to reproduce it here too.
Comment 6 Matthew L. Martin 2016-05-27 19:03:16 UTC
We were finally able to put all of the ingredients into the same pot. Using Linux 4.6 with raid10.c patched as requested, we were able to reproduce this. It happens every time we try it. I have attached the extract from /var/log/messages related to the crash.
Comment 7 Matthew L. Martin 2016-05-27 19:04:47 UTC
Created attachment 217951 [details]
messages extract after reproducing in 4.6

This is the kernel spew that came out when we reproduced the crash.
Comment 8 Shaohua Li 2016-05-31 22:35:45 UTC
Created attachment 218501 [details]
fix

Can you try this and see whether it fixes the issue?
Comment 9 Matthew L. Martin 2016-06-14 16:01:03 UTC
Created attachment 220031 [details]
dmesg output from both controllers


We are trying to reproduce this issue with our original kernel (without your suggested change). We haven't succeeded, but we did get a kernel warning on each controller during our last attempt. I will attach the dmesg output from each controller.
Comment 10 Matthew L. Martin 2016-06-14 16:01:36 UTC
Created attachment 220041 [details]
second dmesg output
Comment 11 dchen 2018-12-19 00:02:37 UTC
Matthew Martin and I hadn't been able to reproduce this issue until recently.

We're now using a 4.9.89 kernel, and when I add a BUG_ON() in raid10.c here:

@@ -2655,6 +2656,7 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
 				continue;
 
 			if (!r10_bio->devs[m].repl_bio->bi_error) {
+				BUG_ON(!rdev);
 				rdev_clear_badblocks(
 					rdev,
 					r10_bio->devs[m].addr,

When I test with this change (by running "echo 1 > /sys/block/XXX/device/delete" while doing I/O), the kernel hits that BUG_ON().
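For context on why rdev can be NULL at that call site: in the recovery path of handle_write_completed(), rdev is loaded from conf->mirrors[dev].replacement, and only r10_bio->devs[m].repl_bio is checked before rdev_clear_badblocks() is called. A simplified paraphrase of the 4.9-era loop body (the real code in drivers/md/raid10.c has more branches; error handling elided here):

	rdev = conf->mirrors[dev].replacement;	/* can be NULL ...               */
	if (r10_bio->devs[m].repl_bio == NULL)	/* ... even when repl_bio is set */
		continue;

	if (!r10_bio->devs[m].repl_bio->bi_error) {
		BUG_ON(!rdev);			/* the instrumentation that fires */
		rdev_clear_badblocks(rdev,	/* a NULL rdev oopses in here */
				     r10_bio->devs[m].addr,
				     r10_bio->sectors, 0);
	}

So a replacement slot can still have a repl_bio while conf->mirrors[dev].replacement has already gone away (for example after the device delete above).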
Comment 12 Guoqing Jiang 2019-01-09 09:06:04 UTC
Created attachment 280357 [details]
add rcu protection

This patch is based on the 5.0 kernel, so please adapt it to your kernel version.
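The change itself lives in the attachment and is not inlined in this comment. As an illustration of the technique named in the attachment title (not of the attachment's actual diff), the rcu_dereference() pattern used elsewhere in drivers/md/raid10.c for accessing mirrors[].rdev / mirrors[].replacement looks roughly like this:

	rcu_read_lock();
	rdev = rcu_dereference(conf->mirrors[dev].replacement);
	if (rdev && !test_bit(Faulty, &rdev->flags)) {
		atomic_inc(&rdev->nr_pending);	/* pin the rdev before leaving the RCU section */
		rcu_read_unlock();
		/* ... use rdev, then rdev_dec_pending(rdev, conf->mddev) ... */
	} else {
		rcu_read_unlock();
	}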
Comment 13 Guoqing Jiang 2019-01-09 09:08:01 UTC
(In reply to gary from comment #12)
> Created attachment 280357 [details]
> add rcu protection
> 
> This patch is based on the 5.0 kernel, so please adapt it to your kernel
> version.

Could you try this change? I am not sure whether it will fix your problem.
Comment 14 Guoqing Jiang 2019-01-09 09:13:53 UTC
Created attachment 280359 [details]
add WQ_MEM_RECLAIM flag for firmware_event_thread
Comment 15 Guoqing Jiang 2019-01-09 09:15:50 UTC
(In reply to Matthew L. Martin from comment #10)
> Created attachment 220041 [details]
> second dmesg output

Maybe the complaint about "WQ_MEM_RECLAIM fw_event_mpt3sas0:_firmware_event_work [mpt3sas] is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu" in the dmesg logs can be suppressed by attachment 280359 [details].
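For readers unfamiliar with that warning: the workqueue code warns when work running on a WQ_MEM_RECLAIM workqueue flushes work on a workqueue created without WQ_MEM_RECLAIM, because the non-reclaim queue has no rescuer thread and can stall under memory pressure. The flag is set when the workqueue is allocated; a generic sketch only (not the attachment's diff, and the workqueue name here is made up):

	struct workqueue_struct *wq;

	wq = alloc_ordered_workqueue("fw_event_example", WQ_MEM_RECLAIM);
	if (!wq)
		return -ENOMEM;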
Comment 16 Guoqing Jiang 2019-01-10 07:21:35 UTC
(In reply to Guoqing from comment #15)
> (In reply to Matthew L. Martin from comment #10)
> > Created attachment 220041 [details]
> > second dmesg output
> 
> Maybe the complaint about "WQ_MEM_RECLAIM
> fw_event_mpt3sas0:_firmware_event_work [mpt3sas] is flushing !WQ_MEM_RECLAIM
> events:lru_add_drain_per_cpu" in the dmesg logs can be suppressed by
> attachment 280359 [details].

Hmm, that suggestion was wrong; commit f3a932baa7f65072434f1c04c02c8a4d2746fcfc ("mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all") is the right fix, as I understand it.
Comment 17 dchen 2019-02-12 00:19:25 UTC
I haven't had time to look at this in a while; I'll try the change in Comment 12 when I get a chance but I don't know when that will be.

FWIW, it seems like maybe this change:

commit 3f5af7cc105f9bf5e6995db27e20c0f83ad5abcb
Author: Yufen Yu <yuyufen@huawei.com>
Date:   Tue Feb 6 17:39:15 2018 +0800

    md raid10: fix NULL deference in handle_write_completed()
    
    [ Upstream commit 01a69cab01c184d3786af09e9339311123d63d22 ]
    
    In the case of 'recover', an r10bio with R10BIO_WriteError &
    R10BIO_IsRecover will be progressed by handle_write_completed().
    This function traverses all r10bio->devs[copies].
    If devs[m].repl_bio != NULL, it thinks conf->mirrors[dev].replacement
    is also not NULL. However, this is not always true.


... prevents the panic we've been seeing. I still have to double-check, though, so please take it with a grain of salt.
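Based only on that commit message (a non-NULL devs[m].repl_bio does not guarantee a non-NULL conf->mirrors[dev].replacement), the guard it implies would look roughly like the sketch below; this is the idea, not a quote of the upstream diff, which may check a different condition:

	rdev = conf->mirrors[dev].replacement;
	if (r10_bio->devs[m].repl_bio == NULL || rdev == NULL)
		continue;	/* skip this copy instead of dereferencing a NULL rdev */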
