Created attachment 212061 [details]
Context around NULL pointer dereference

Testing an all-SSD RAID-10 device by pulling one SSD caused the Oops. Attached is the bracketing context.

Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: IP: [<ffffffff814cb5fd>] rdev_clear_badblocks+0xd/0x40
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: PGD 201c1a2067 PUD 14f9fa3067 PMD 0
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: Oops: 0000 [#1] SMP
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: Modules linked in: xt_nat(E) raid10(E) ext4(E) jbd2(E) mbcache(E) xt_conntrack(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) iptable_nat(E) nf_conntrack_ipv4(E) nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) br_netfilter(E) nf_nat(E) nf_conntrack(E) bridge(E) stp(E) llc(E) dm_thin_pool(E) dm_persistent_data(E) dm_bio_prison(E) dm_bufio(E) ipt_REJECT(E) nf_reject_ipv4(E) iptable_filter(E) ip_tables(E) dm_service_time(E) mpt2sas(E) mptctl(E) mptbase(E) iTCO_wdt(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) coretemp(E) kvm_intel(E) kvm(E) crc32c_intel(E) ghash_clmulni_intel(E) aesni_intel(E) aes_x86_64(E) glue_helper(E) lrw(E) gf128mul(E) ablk_helper(E) cryptd(E) microcode(E) pcspkr(E) sb_edac(E) lpc_ich(E) edac_core(E) bonding(E) mfd_core(E) ses(E) enclosure(E) ipmi_devintf(E) acpi_cpufreq(E)
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: shpchp(E) wmi(E) dm_multipath(E) dm_mod(E) xfs(E) libcrc32c(E) exportfs(E) sd_mod(E) ahci(E) mpt3sas(E) i40e(E) raid_class(E) libahci(E) configfs(E) scsi_transport_sas(E) megaraid_sas(E) ipv6(E) autofs4(E)
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: CPU: 13 PID: 26099 Comm: md1_raid10 Tainted: G E 4.1.8-3.clearsky.el7.centos.x86_64 #1
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: Hardware name: /0CNCJW, BIOS 1.5.4 10/002/2015
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: task: ffff880fd738e0d0 ti: ffff880072324000 task.ti: ffff880072324000
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: RIP: 0010:[<ffffffff814cb5fd>] [<ffffffff814cb5fd>] rdev_clear_badblocks+0xd/0x40
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: RSP: 0018:ffff880072327d08 EFLAGS: 00010246
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: RAX: f000000000000001 RBX: ffff880fd7b41a00 RCX: 0000000000000000
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: RDX: 0000000000000008 RSI: 0000000000000000 RDI: 0000000000000000
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: RBP: ffff880072327d08 R08: ffff88201ed53b38 R09: ffff88201ed53a00
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: R10: 0000000000000000 R11: ffff882033196258 R12: ffff88201e3971f8
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: R13: 0000000000000000 R14: ffff88201e3971f8 R15: 0000000000000000
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: FS: 0000000000000000(0000) GS:ffff88207f980000(0000) knlGS:0000000000000000
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: CR2: 0000000000000058 CR3: 0000001477c70000 CR4: 00000000001406e0
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: Stack:
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: ffff880072327e38 ffffffffa3c62503 ffff880fd7b41aa8 ffff88207f98d5c0
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: ffff880fd7b41a00 0000000000000286 ffff880072327d68 ffff88201e3971c0
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: ffff880072327db8 0000000000000286 ffff880072327db8 ffff880072327db8
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: Call Trace:
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffffa3c62503>] raid10d+0x4e3/0x1580 [raid10]
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff810af912>] ? del_timer_sync+0x52/0x60
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff81093ce4>] ? prepare_to_wait_event+0x84/0x100
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff814cbe16>] md_thread+0x136/0x140
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff81093ba0>] ? wait_woken+0x90/0x90
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff814cbce0>] ? find_pers+0x80/0x80
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff8106fc8b>] kthread+0xdb/0x100
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff8106fbb0>] ? kthread_create_on_node+0x170/0x170
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff815f3692>] ret_from_fork+0x42/0x70
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: [<ffffffff8106fbb0>] ? kthread_create_on_node+0x170/0x170
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: Code: c1 e0 09 4c 09 f8 48 85 c9 b9 00 00 00 00 48 0f 49 d1 48 09 d0 48 89 06 e9 d1 fe ff ff 90 0f 1f 44 00 00 55 85 c9 48 89 e5 75 13 <48> 03 77 58 48 81 c7 38 01 00 00 e8 83 fd ff ff 5d c3 90 48 03
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: RIP [<ffffffff814cb5fd>] rdev_clear_badblocks+0xd/0x40
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: RSP <ffff880072327d08>
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: CR2: 0000000000000058
Apr 7 14:00:35 38-hw-rig2-L3-2 kernel: ---[ end trace 1fdc17991b09f242 ]---
Any chance you can try the latest kernel?
Which one, 4.4 or 4.5? I can do either. This is for a product, so we are looking for a fix in a long-term support kernel.
I'd prefer 4.5. I'd like to check whether the upstream kernel has the issue; if we find the fix, backporting it to the kernel you use is easy. I tried to reproduce the issue here (e.g., directly removing a PCI controller and the disk) but didn't succeed yet, so please describe how you run the test.
We have 12 SSDs in a dual-ported SAS JBOD. We enable MPIO on all of the drives. We make 10 partitions on each drive, so we have 120 partitions. We then take the matching partitions on each drive and create an MD RAID-10 device with two mirror sides; 10 of the partitions are for data and two are for spares.

The test was: with I/O running to the JBOD, physically remove one data drive. The expected result was that one of the spare drives would be selected and made active, and 10 partitions would be re-silvered.

We will attempt to reproduce with our original configuration (4.1.8 kernel). If we can reproduce this readily, we will upgrade to 4.5 and try again. We may be able to report tomorrow.
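For concreteness, the array layout described above can be sketched as a script. This is only an illustrative dry run: the device names (/dev/sda through /dev/sdl; in the real setup these would be dm-multipath devices, since MPIO is enabled) and the exact mdadm options are assumptions, and the commands are printed rather than executed.

```shell
# Dry-run sketch of the test configuration: 12 drives x 10 partitions,
# one RAID-10 array per matching partition set (10 data + 2 spares).
# Device names are placeholders, not the actual test hardware.
gen_md_cmds() {
    for p in $(seq 1 10); do
        # collect the matching partition from all 12 drives
        parts=""
        for d in a b c d e f g h i j k l; do
            parts="$parts /dev/sd${d}${p}"
        done
        # print (not run) the create command for this array
        echo "mdadm --create /dev/md${p} --level=10 --raid-devices=10 --spare-devices=2$parts"
    done
}
gen_md_cmds
```

Running this prints the ten mdadm invocations so the intended topology can be reviewed before touching real hardware.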
Great. There are 4 places calling rdev_clear_badblocks(); could you add BUG_ON(!rdev) at each of them and report which one triggers the bug? Maybe it is spare-related. I'll try here too.
We were finally able to put all of the ingredients into the same pot. Using Linux 4.6 with raid10.c patched as requested, we were able to reproduce this; it happens every time we try it. I have attached the extract from /var/log/messages related to the crash.
Created attachment 217951 [details]
messages extract after reproducing in 4.6

This is the kernel spew that came out when we reproduced the crash.
Created attachment 218501 [details]
fix

Can you check whether this fixes it?
Created attachment 220031 [details]
dmesg output from both controllers

We are trying to reproduce this issue with our original kernel (without your suggested change). We haven't succeeded, but we did get a kernel warning on each controller during our last attempt. I will attach the dmesg output from each controller.
Created attachment 220041 [details]
second dmesg output
Matthew Martin and I hadn't been able to reproduce this issue until recently. We're now using a 4.9.89 kernel, and I add a BUG_ON() in raid10.c here:

@@ -2655,6 +2656,7 @@ static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
 				continue;
 			if (!r10_bio->devs[m].repl_bio->bi_error) {
+				BUG_ON(!rdev);
 				rdev_clear_badblocks(
 					rdev,
 					r10_bio->devs[m].addr,

When I test with this change (using "echo 1 > /sys/block/XXX/device/delete" while doing I/O), the kernel asserts at that BUG_ON().
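The reproduction step above can be scripted. This sketch prints the commands rather than running them, since deleting a member disk is destructive; the device names (sdx, md1) and the dd I/O load are placeholders, not the actual test setup.

```shell
# Sketch of the reproduction loop: run I/O against the md array while
# hot-removing one member disk out from under md via sysfs.
# DEV and MD are placeholders -- substitute a real member disk and array.
DEV=${DEV:-sdx}
MD=${MD:-md1}

repro_cmds() {
    # background I/O against the array (dd stands in for the real workload)
    echo "dd if=/dev/zero of=/dev/$MD bs=1M oflag=direct &"
    # hot-remove the member disk, as in the comment above
    echo "echo 1 > /sys/block/$DEV/device/delete"
    # watch the array degrade and (in the expected case) the spare activate
    echo "cat /proc/mdstat"
}
repro_cmds
```

With the BUG_ON() patch applied, the assertion fires shortly after the sysfs delete while writes are in flight.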
Created attachment 280357 [details]
add rcu protection

This is based on the 5.0 kernel, so please make the related changes for your version.
(In reply to gary from comment #12)
> Created attachment 280357 [details]
> add rcu protection
>
> This is based on 5.0 kernel, so pls make related changes to your version

Could you try this change? Though I am not sure whether it can fix your problem or not.
Created attachment 280359 [details]
add WQ_MEM_RECLAIM flag for firmware_event_thread
(In reply to Matthew L. Martin from comment #10)
> Created attachment 220041 [details]
> second dmesg output

Maybe the complaint about "WQ_MEM_RECLAIM fw_event_mpt3sas0:_firmware_event_work [mpt3sas] is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu" in the dmesgs can be suppressed by attachment 280359 [details].
(In reply to Guoqing from comment #15)
> (In reply to Matthew L. Martin from comment #10)
> > Created attachment 220041 [details]
> > second dmesg output
>
> Maybe the complain about "WQ_MEM_RECLAIM
> fw_event_mpt3sas0:_firmware_event_work [mpt3sas] is flushing !WQ_MEM_RECLAIM
> events:lru_add_drain_per_cpu" in dmesgs can be suppressed by attachment
> 280359 [details].

Hmm, that was wrong; commit f3a932baa7f65072434f1c04c02c8a4d2746fcfc ("mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all") is the right fix, from my understanding.
I haven't had time to look at this in a while; I'll try the change in comment 12 when I get a chance, but I don't know when that will be.

FWIW, it seems like maybe this change:

commit 3f5af7cc105f9bf5e6995db27e20c0f83ad5abcb
Author: Yufen Yu <yuyufen@huawei.com>
Date:   Tue Feb 6 17:39:15 2018 +0800

    md raid10: fix NULL deference in handle_write_completed()

    [ Upstream commit 01a69cab01c184d3786af09e9339311123d63d22 ]

    In the case of 'recover', an r10bio with R10BIO_WriteError &
    R10BIO_IsRecover will be progressed by handle_write_completed().
    This function traverses all r10bio->devs[copies]. If devs[m].repl_bio
    != NULL, it thinks conf->mirrors[dev].replacement is also not NULL.
    However, this is not always true.
    ...

prevents the panic we've been seeing. I have to double-check, though, so please take it with a grain of salt.