|Summary:||Boot Oops+hang in 2.6.25-rc and 2.6.25-final kernels|
|Product:||IO/Storage||Reporter:||Nicolas Mailhot (Nicolas.Mailhot)|
oops screen capture
dmesg on the same kernel after a non-oopsing boot
source code and disassembly of failing function
Another screen capture, this time on a vanilla kernel
patch to fix oops.
Description Nicolas Mailhot 2008-04-19 06:51:26 UTC
Latest working kernel version: 2.6.24-rc5.mm1 has been confirmed to work fine
Earliest failing kernel version: N/A unfortunately. I don't reboot often enough to notice it
Distribution: Fedora Devel
Hardware Environment: CK804 + AMD X2
Software Environment: early udev boot
Problem Description: See attached picture. As the kernel scrolls very fast at this point it took me weeks to get a correct screen capture
Steps to reproduce: Boot. Will almost always result in hang. shift+page-up repeatedly at boot time reduces hang probability
See also https://bugzilla.redhat.com/show_bug.cgi?id=441765
Comment 1 Nicolas Mailhot 2008-04-19 06:52:37 UTC
Created attachment 15814 [details] oops screen capture
Comment 2 Nicolas Mailhot 2008-04-19 06:53:42 UTC
Created attachment 15815 [details] dmesg on the same kernel after a non-oopsing boot
Comment 4 Roland Kletzing 2008-04-20 15:50:17 UTC
So this happens with the Fedora kernel, which is a distro-specific kernel that may contain several patches. Does the same happen with a vanilla kernel? Can you check whether vanilla 2.6.25 makes a difference?
Comment 5 Nicolas Mailhot 2008-04-21 02:29:15 UTC
I probably won't have access to this particular system before the end of the week, sorry. Already spent an awful lot of time just to get a good Oops capture
Comment 6 Roland Kletzing 2008-04-21 16:12:55 UTC
Thank you for your time. Nevertheless, additional input would be much appreciated. I'm not sure whether you can get a "vanilla kernel rpm" for Fedora to save the compile effort and time; there is one for SUSE. Does this happen with 32 or 64 bit?
Comment 7 Nicolas Mailhot 2008-04-21 23:52:56 UTC
Comment 8 Chuck Ebbert 2008-04-22 08:18:45 UTC
Created attachment 15841 [details] source code and disassembly of failing function

mddev is used once after being stored here:

2087 mddev_t *mddev = rdev->mddev;

Later on rdev->mddev is used but it is no longer equal to mddev -- something has changed it. We then try to unlock using a bad address.
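The hazard described in this comment can be illustrated with a small user-space sketch. Everything in it is hypothetical: the `demo_` types and functions are stand-ins invented for illustration, not the real definitions from drivers/md/md.c, and this is not the patch attached later in this report. It only shows why unlocking through a pointer that was read once into a local stays safe even when the handler clears `rdev->mddev` in the meantime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumption, not md.c). */
struct demo_mddev { int locked; };
struct demo_rdev  { struct demo_mddev *mddev; };

static void demo_lock(struct demo_mddev *m)
{
    m->locked = 1;
}

static void demo_unlock(struct demo_mddev *m)
{
    assert(m->locked);          /* unlocking requires that the lock is held */
    m->locked = 0;
}

/* Simulates a store handler that detaches the device from its array. */
static void demo_store_remove(struct demo_rdev *rdev)
{
    rdev->mddev = NULL;
}

/* Returns 0 on success, -1 if the device is not attached to an array. */
static int demo_attr_store(struct demo_rdev *rdev)
{
    struct demo_mddev *mddev = rdev->mddev;  /* read the pointer once */

    if (mddev == NULL)
        return -1;

    demo_lock(mddev);
    demo_store_remove(rdev);    /* rdev->mddev is NULL from here on */
    demo_unlock(mddev);         /* cached pointer; rdev->mddev would be NULL */
    return 0;
}
```

Calling `demo_attr_store()` on an attached device returns 0 and leaves the array unlocked; calling it a second time returns -1 because the device has already been detached. Re-reading `rdev->mddev` for the unlock step would instead dereference NULL in this sketch.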
Comment 9 Nicolas Mailhot 2008-04-24 14:45:35 UTC
Created attachment 15902 [details] Another screen capture, this time on a vanilla kernel

It seems a vanilla kernel such as http://koji.fedoraproject.org/koji/taskinfo?taskID=581601 fails the same way
Comment 10 Chuck Ebbert 2008-04-26 17:50:29 UTC
The oops is on line 2099 in drivers/md/md.c:

2099 mddev_unlock(rdev->mddev);

rdev is NULL but it was a valid address upon entry to the function.
Comment 11 Roland Kletzing 2008-04-27 14:08:25 UTC
So this oops is in the md/raid code? Nicolas, are you using software RAID or LVM volumes?
Comment 12 Nicolas Mailhot 2008-04-27 14:23:44 UTC
I'm using lvm over md

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1 sdb1
      2096384 blocks [2/2] [UU]

md1 : active raid1 sda3 sdb3
      288856640 blocks [2/2] [UU]

unused devices: <none>

# /sbin/pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               VolGroup00
  PV Size               275,48 GB / not usable 38,56 MB
  Allocatable           yes
  PE Size (KByte)       65536
  Total PE              4407
  Free PE               2211
  Allocated PE          2196
  PV UUID               5vhc8L-w0Jt-cTIo-Hswk-NtbK-eWuP-d3q6J1
Comment 13 Neil Brown 2008-04-27 21:59:54 UTC
Created attachment 15938 [details] patch to fix oops.

This patch will probably fix the problem. I'll submit it for the -stable series.
Comment 14 Roland Kletzing 2008-04-28 13:35:42 UTC
If you like, you can try the patch and report the result here. If not, you can wait until it appears upstream and then try the vanilla kernel from the Fedora project.
Comment 15 Alan 2008-09-23 04:18:11 UTC