Bug 94381

Summary: NULL pointer deref in raid5_free
Product: IO/Storage Reporter: Richard W.M. Jones (rjones)
Component: MDAssignee: Neil Brown (neilb)
Status: RESOLVED CODE_FIX    
Severity: normal CC: neilb
Priority: P1    
Hardware: All   
OS: Linux   
Kernel Version: linux from git Subsystem:
Regression: No Bisected commit-id:
Attachments: tarball containing compressed faulty RAID images
Full log of boot failure
patch

Description Richard W.M. Jones 2015-03-06 08:53:59 UTC
Created attachment 169521 [details]
tarball containing compressed faulty RAID images

I have a set of faulty RAID disk images (see attachment) which when
they are added to a guest and the guest is booted, cause the following
back trace when `mdadm -As --auto=yes --run' is run (at boot).

[    2.633617] BUG: unable to handle kernel NULL pointer dereference at 00000000000005f8
[    2.634535] IP: [<ffffffffa0251d55>] free_conf+0x15/0x130 [raid456]
[    2.634535] PGD 1b9a4067 PUD 1bd34067 PMD 0 
[    2.634535] Oops: 0000 [#1] SMP 
[    2.634535] Modules linked in: raid456 async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq iosf_mbi kvm_intel kvm snd_pcsp snd_pcm snd_timer ghash_clmulni_intel snd soundcore ata_generic i2c_piix4 serio_raw pata_acpi libcrc32c crc8 crc_itu_t crc_ccitt virtio_pci virtio_mmio virtio_balloon virtio_scsi sym53c8xx scsi_transport_spi megaraid_sas megaraid_mbox megaraid_mm megaraid ideapad_laptop rfkill sparse_keymap virtio_net virtio_console virtio_rng virtio_blk virtio_ring virtio crc32 crct10dif_pclmul crc32c_intel crc32_pclmul
[    2.634535] CPU: 0 PID: 140 Comm: mdadm Not tainted 4.0.0-0.rc1.git1.1.fc23.x86_64 #1
[    2.634535] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.0-20150221_201410- 04/01/2014
[    2.634535] task: ffff88001bbf0000 ti: ffff88001a104000 task.ti: ffff88001a104000
[    2.634535] RIP: 0010:[<ffffffffa0251d55>]  [<ffffffffa0251d55>] free_conf+0x15/0x130 [raid456]
[    2.634535] RSP: 0018:ffff88001a107c48  EFLAGS: 00010296
[    2.634535] RAX: 0000000000000000 RBX: ffff88001df84800 RCX: 0000000000000000
[    2.634535] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[    2.634535] RBP: ffff88001a107c68 R08: 0000000000000001 R09: 0000000000000000
[    2.634535] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[    2.634535] R13: ffff88001df84800 R14: ffff88001df84818 R15: ffffffffa0262300
[    2.634535] FS:  00007f39168fc700(0000) GS:ffff88001ec00000(0000) knlGS:0000000000000000
[    2.634535] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.634535] CR2: 00000000000005f8 CR3: 000000001a2b6000 CR4: 00000000000407f0
[    2.634535] Stack:
[    2.634535]  ffff88001a107c58 ffff88001df84800 ffff88001df84818 ffff88001df84800
[    2.634535]  ffff88001a107c88 ffffffffa0251e89 ffff88001df84818 00000000fffffffb
[    2.634535]  ffff88001a107d38 ffffffff816ac0a1 0000000000000246 ffff88001df84a70
[    2.634535] Call Trace:
[    2.634535]  [<ffffffffa0251e89>] raid5_free+0x19/0x30 [raid456]
[    2.634535]  [<ffffffff816ac0a1>] md_run+0x901/0xa30
[    2.634535]  [<ffffffff816ad6ca>] ? md_ioctl+0x50a/0x1d20
[    2.634535]  [<ffffffff816ad6ca>] ? md_ioctl+0x50a/0x1d20
[    2.634535]  [<ffffffff816ac1e4>] do_md_run+0x14/0xa0
[    2.634535]  [<ffffffff816ae24e>] md_ioctl+0x108e/0x1d20
[    2.634535]  [<ffffffff810e7d55>] ? sched_clock_local+0x25/0x90
[    2.634535]  [<ffffffff8129ce6f>] ? mntput_no_expire+0x6f/0x360
[    2.634535]  [<ffffffff810e7f98>] ? sched_clock_cpu+0x98/0xd0
[    2.634535]  [<ffffffff814013fe>] blkdev_ioctl+0x1ce/0x850
[    2.634535]  [<ffffffff8129ce87>] ? mntput_no_expire+0x87/0x360
[    2.634535]  [<ffffffff8129ce05>] ? mntput_no_expire+0x5/0x360
[    2.634535]  [<ffffffff812ba993>] block_ioctl+0x43/0x50
[    2.634535]  [<ffffffff8128cf08>] do_vfs_ioctl+0x2e8/0x530
[    2.634535]  [<ffffffff811271e5>] ? rcu_read_lock_held+0x65/0x70
[    2.634535]  [<ffffffff81299bde>] ? __fget_light+0xbe/0xe0
[    2.634535]  [<ffffffff8128d1d1>] SyS_ioctl+0x81/0xa0
[    2.634535]  [<ffffffff81881b29>] system_call_fastpath+0x12/0x17
[    2.634535] Code: 1f 80 00 00 00 00 48 c7 c0 ea ff ff ff eb bb e8 52 94 e5 e0 66 90 66 66 66 66 90 55 48 89 e5 41 55 41 54 53 49 89 fc 48 83 ec 08 <48> 8b bf f8 05 00 00 48 85 ff 74 11 48 8b 7f 18 e8 66 e5 ff e0 
[    2.634535] RIP  [<ffffffffa0251d55>] free_conf+0x15/0x130 [raid456]
[    2.634535]  RSP <ffff88001a107c48>
[    2.634535] CR2: 00000000000005f8
[    2.776821] ---[ end trace 06a77318965f6cfb ]---

I bisected this, and the first problematic commit where we get
a NULL pointer deref (with not quite the same backtrace as above) is:

commit 5aa61f427e4979be733e4847b9199ff9cc48a47e
Author: NeilBrown <neilb@suse.de>
Date:   Mon Dec 15 12:56:57 2014 +1100

    md: split detach operation out from ->stop.
    
    Each md personality has a 'stop' operation which does two
    things:
     1/ it finalizes some aspects of the array to ensure nothing
        is accessing the ->private data
     2/ it frees the ->private data.
    
    All the steps in '1' can apply to all arrays and so can be
    performed in common code.
    
    This is useful as in the case where we change the personality which
    manages an array (in level_store()), it would be helpful to do
    step 1 early, and step 2 later.
    
    So split the 'step 1' functionality out into a new mddev_detach().
    
    Signed-off-by: NeilBrown <neilb@suse.de>

After the next 3 commits are added on top (afa0f557cb, db721d32b,
36d091f47), then we get the exact backtrace above, and that backtrace
is also when I see with the current upstream linux from git.

Before 5aa61f42 there was no null pointer deref.  Instead Linux prints the
following error:

[    2.283800] md: pers->run() failed ...
mdadm: failed to RUN_ARRAY /dev/md/r5t3: Input/output error
mdadm: Not enough devices to start the array while not clean - consider --force.
Comment 1 Richard W.M. Jones 2015-03-06 09:18:35 UTC
FWIW, same backtrace with git://neil.brown.name/md for-next branch.
Comment 2 Richard W.M. Jones 2015-03-06 10:20:20 UTC
This patch is one possible way to fix it.  I have tested it
and it fixes the bug for me.

Other possibilities include changing free_conf so that it can be
called with a NULL pointer, or changing raid5_free so it doesn't
call free_conf when conf==NULL.

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index dab908b..06e93ba 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6307,8 +6307,6 @@ static int run(struct mddev *mddev)
 abort:
        md_unregister_thread(&mddev->thread);
        print_raid5_conf(conf);
-       free_conf(conf);
-       mddev->private = NULL;
        printk(KERN_ALERT "md/raid:%s: failed to run raid set.\n", mdname(mddev));
        return -EIO;
 }
Comment 3 Richard W.M. Jones 2015-03-06 11:29:52 UTC
Created attachment 169531 [details]
Full log of boot failure

Just FYI here is the full goot log of the failure, with the
current upstream linux from git.
Comment 4 Neil Brown 2015-03-13 00:52:34 UTC
Created attachment 170551 [details]
patch

Thanks for the report.
I think this patch is the best fix.
Could you please test and confirm that it works for you?
Comment 5 Richard W.M. Jones 2015-03-13 09:07:15 UTC
Yes the patch in comment 4 also fixes it.  You can add:

  Tested-by: Richard W.M. Jones <rjones@redhat.com>

FWIW with the patch, the error message is:

[    3.034287] md/raid:md123: not clean -- starting background reconstruction
[    3.038205] md/raid:md123: device sdc3 operational as raid disk 0
[    3.043332] md/raid:md123: allocated 0kB
[    3.044813] md/raid:md123: cannot start dirty degraded array.
[    3.048098] md/raid:md123: failed to run raid set.
[    3.049906] md: pers->run() failed ...
mdadm: failed to RUN_ARRAY /dev/md/r5t3: Input/output error
mdadm: Not enough devices to start the array while not clean - consider --force.

and there is no oops.
Comment 6 Richard W.M. Jones 2015-03-25 13:22:42 UTC
I'm closing this because the patch has gone into Linus's tree now
(commit 0c35bd4723e4a39ba2da4c13a22cb97986ee10c8).