The kernel oopsed on me after I added a bitmap to a RAID1 md device with:

  mdadm --grow /dev/md2 --bitmap=internal

This happened to me during normal operation -- twice, because I hoped the first time it was just bad luck. Then I booted with init=/bin/bash, i.e. nothing but the bare kernel, built a RAID1 on two spare partitions, added a bitmap as described above and it oopsed as soon as I tried to write something to the device.

This was with a Debian distribution kernel, linux-image-3.2.0-1-amd64 versions 3.2.4-1 and 3.2.6-1; see Debian bug 661558:

  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=661558

I've now reproduced it with a vanilla 3.2.6 kernel, I'm attaching the kernel messages captured with netconsole. The commands used are as follows:

  mdadm --zero-superblock /dev/disk/by-id/ata-XXX-part1
  mdadm --zero-superblock /dev/disk/by-id/ata-YYY-part3
  mdadm --create /dev/md3 --metadata=0.90 --assume-clean -l1 -n2 \
        /dev/disk/by-id/ata-XXX-part1 \
        /dev/disk/by-id/ata-YYY-part3
  mdadm --grow /dev/md3 --bitmap=internal
  dd if=/dev/zero of=/dev/md3 bs=1M count=1

mdadm is Debian version 3.2.3-2.
Created attachment 72641: kernel messages
Created attachment 72642: kernel config
Thanks for the report. I believe this is fixed by: http://neil.brown.name/git?p=md;a=commitdiff;h=37b8fb4a7443ad1d83a977f4b1720b5617447fed which I have queued to send to Linus as soon as 3.3 is out. It will then be added to recent stable kernels. (maybe I should just submit it now .. but I thought 3.3 was imminent).
I applied the patch to a vanilla 3.2.6 and tried again, but got the same crash. I'm sorry I didn't capture the kernel messages as I was in a hurry, but can do that if you think it might be useful.
Yes please - and double check that you are really running the new kernel as I have a high degree of confidence that the patch fixes that problem.
Well, it appears that I did indeed boot the wrong kernel. After I applied your patch and rebuilt the kernel, "make deb-pkg" added a + to the version number (I suppose it's meant to signal that the sources aren't pristine) and to apt-get, a trailing + means "install the package with this name WITHOUT the +" so it installed the unpatched kernel again. :-/

I've now rebuilt and installed the right kernel and I can confirm that your patch fixes the crash I've seen.

However, you might want to look at the trace I'm attaching, because at 157.8 seconds there's another kernel BUG in drivers/md/md.c, probably unrelated to the bitmap thing but still firmly in your territory. :)

After creating the array, I ran "dd if=/dev/zero of=/dev/md3 bs=1M count=1" as usual and it survived. I ran dd again without count=1 to stress test the thing a bit more, oblivious to the fact that job control is disabled in a shell started with init=/bin/bash, so after a minute or so when I was satisfied that it was working fine I just hit Ctrl+Alt+Del and I got the kernel BUG you'll find in the trace.
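
For anyone else testing a patched build: the mix-up above can be caught before rerunning the reproducer by comparing the running kernel's release string against the one you expect. This is only a minimal sketch. The function name check_kernel is hypothetical, and the release strings you pass in (for example one with the trailing + that "make deb-pkg" added) are assumptions you need to adapt to your own build.

```shell
#!/bin/sh
# Hypothetical helper, not part of mdadm or the kernel tree.
# Succeeds only if the kernel release matches the expected string exactly.
#   $1 = expected release string (assumed value, e.g. one ending in "+")
#   $2 = actual release string; defaults to the output of `uname -r`
check_kernel() {
    expected=$1
    actual=${2:-$(uname -r)}
    if [ "$actual" = "$expected" ]; then
        echo "OK: running $actual"
    else
        echo "MISMATCH: running $actual, expected $expected" >&2
        return 1
    fi
}
```

Calling check_kernel with the expected release string as its only argument checks the live kernel. The comparison is deliberately exact rather than a prefix match, so a release that differs only by a trailing + counts as a mismatch.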
Created attachment 72687: kernel trace with a different BUG
Thanks. That second one is fixed by: http://neil.brown.name/git?p=md;a=commitdiff;h=c744a65c1e2d59acc54333ce80a5b0702a98010b already sent to Linus, but he doesn't seem to have pulled it yet.