Bug 104921

Summary: BUG: unable to handle kernel NULL pointer dereference at (null)
Product: IO/Storage
Reporter: Curtis Lee Bolin (CurtisLeeBolin)
Component: MD
Assignee: io_md
Status: RESOLVED PATCH_ALREADY_AVAILABLE
Severity: normal
CC: neilb
Priority: P1
Hardware: x86-64
OS: Linux
Kernel Version: Linux version 4.1.6-1-ARCH (builduser@tobias) (gcc version 5.2.0 (GCC) ) #1 SMP PREEMPT Mon Aug 17 08:52:28 CEST 2015
Subsystem:
Regression: No
Bisected commit-id:
Attachments: journal of error
Commit: 49895bcc7e56
Commit: 2d5b569b665e

Description Curtis Lee Bolin 2015-09-24 00:36:59 UTC
Created attachment 188201
journal of error

Sep 23 18:44:12 sv kernel: BUG: unable to handle kernel NULL pointer dereference at           (null)
Sep 23 18:44:12 sv kernel: IP: [<ffffffffa07d8b91>] get_free_stripe+0x31/0xf0 [raid456]
Comment 1 Neil Brown 2015-09-24 01:51:56 UTC
This is almost certainly fixed by 

Commit: 49895bcc7e56 ("md/raid5: don't let shrink_slab shrink too far.")

I had hoped this would be in 4.1-stable by now, but it isn't.  I have just sent it to stable@vger.kernel.org, so it should be in 4.1.9.

If you are going to apply it, it would be good to apply

Commit: 2d5b569b665e ("md/raid5: avoid races when changing cache size.")

first.  It has a simple, easy-to-resolve conflict.
Comment 2 Curtis Lee Bolin 2015-09-24 05:28:36 UTC
Created attachment 188251
Commit: 49895bcc7e56
Comment 3 Curtis Lee Bolin 2015-09-24 05:29:07 UTC
Created attachment 188261
Commit: 2d5b569b665e
Comment 4 Curtis Lee Bolin 2015-09-24 05:37:32 UTC
Neil Brown, thank you for your fast reply.  I was getting worried because every time the QEMU VM taxed the MD RAID5 array hard, my server would lock up.

I use Arch Linux, so I created patches from your commits, added them to the PKGBUILD, rebuilt the linux package, installed it, and rebooted.  I have now tried very hard for over 2 hours to get it to lock up again and could not.  I can confirm this solved my problem.
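
For anyone repeating this, here is a minimal sketch of the patch-export step only. It assumes a local git clone of the mainline kernel tree that contains both commits. The helper names and the `patches` output directory are hypothetical, and this is not necessarily how the attached patches were produced. The commit order follows comment 1: 2d5b569b665e first, then 49895bcc7e56.

```python
import subprocess

# Order recommended in comment 1: apply 2d5b569b665e before 49895bcc7e56.
COMMITS = ["2d5b569b665e", "49895bcc7e56"]


def format_patch_command(commit, out_dir="patches"):
    """Build the argv that exports one commit as a patch file.

    `git format-patch -1 <commit>` writes exactly that commit;
    `-o` selects the output directory (hypothetical name here).
    """
    return ["git", "format-patch", "-1", "-o", out_dir, commit]


def export_patches(repo, commits=COMMITS, out_dir="patches"):
    """Run git format-patch for each commit inside the kernel clone `repo`."""
    for commit in commits:
        subprocess.run(format_patch_command(commit, out_dir), cwd=repo, check=True)
```

Calling `export_patches("/path/to/linux")` (path is a placeholder) would write one patch file per commit into `patches/` inside the clone. Comment 1 mentions a simple conflict, so an exported patch may need a small manual adjustment before it applies to a 4.1 tree. Adding the files to the PKGBUILD is a separate step not shown here.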

I attached the patches just to confirm which ones I tried.

Thanks again, Neil.
Comment 5 Neil Brown 2015-09-24 05:51:08 UTC
Yes, those are the correct patches.
Thanks for the confirmation that they fix the problem.