Latest working kernel version: 2.6.24-rc5.mm1 has been confirmed to work fine
Earliest failing kernel version: N/A unfortunately. I don't reboot often enough to notice it
Distribution: Fedora Devel
Hardware Environment: CK804 + AMD X2
Software Environment: early udev boot
Problem Description: See attached picture. As the kernel scrolls very fast at this point it took me weeks to get a correct screen capture
Steps to reproduce: Boot. Will almost always result in hang. shift+page-up repeatedly at boot time reduces hang probability
See also https://bugzilla.redhat.com/show_bug.cgi?id=441765
Created attachment 15814 [details] oops screen capture
Created attachment 15815 [details] dmesg on the same kernel after a non-oopsing boot
Created attachment 15816 [details] system lspci
So this happens with the Fedora kernel, which is a distro-specific kernel that may contain several patches. Does the same happen with a vanilla kernel? Can you try whether vanilla 2.6.25 makes a difference?
I probably won't have access to this particular system before the end of the week, sorry. Already spent an awful lot of time just to get a good Oops capture
Thank you for your time. Nevertheless, additional input would be very much appreciated. I'm not sure whether a "vanilla kernel rpm" is available for Fedora, which would save you the compile effort/time; there is one for SUSE. Does this happen with a 32-bit or a 64-bit kernel?
64bit kernel
Created attachment 15841 [details] source code and disassembly of failing function
mddev is used once after being stored here:
2087 mddev_t *mddev = rdev->mddev;
Later on rdev->mddev is used but it is no longer equal to mddev -- something has changed it. We then try to unlock using a bad address.
Created attachment 15902 [details] Another screen capture, this time on a vanilla kernel
It seems a vanilla kernel such as http://koji.fedoraproject.org/koji/taskinfo?taskID=581601 fails the same way
The oops is on line 2099 in drivers/md/md.c:
2099 mddev_unlock(rdev->mddev);
rdev is NULL but it was a valid address upon entry to the function.
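To make the failure pattern described above concrete, here is a minimal userspace sketch. It is not the md.c code and not the patch from this report; every toy_* name is a hypothetical stand-in, and the lock is just a flag. It assumes a store handler that clears rdev->mddev while the lock is held, and contrasts re-reading rdev->mddev at unlock time with unlocking through a pointer saved on entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's mddev_t and mdk_rdev_t. */
struct toy_mddev {
    int locked;                 /* 1 while the pretend lock is held */
};

struct toy_rdev {
    struct toy_mddev *mddev;    /* back-pointer to the array; may be cleared */
};

static void toy_lock(struct toy_mddev *m)   { m->locked = 1; }
static void toy_unlock(struct toy_mddev *m) { m->locked = 0; }

/* Assumed handler: detaches the device, leaving rdev->mddev NULL. */
static void toy_store_remove(struct toy_rdev *rdev)
{
    rdev->mddev = NULL;
}

/*
 * Fragile pattern: rdev->mddev is read again after the handler ran.
 * Returns -1 at the point where kernel code, which has no such check,
 * would pass a bad pointer to the unlock routine.
 */
static int toy_attr_store_fragile(struct toy_rdev *rdev,
                                  void (*store)(struct toy_rdev *))
{
    toy_lock(rdev->mddev);
    store(rdev);
    if (rdev->mddev == NULL)
        return -1;              /* lock is still held, pointer is gone */
    toy_unlock(rdev->mddev);
    return 0;
}

/* Defensive pattern: unlock through the pointer saved before the handler. */
static int toy_attr_store_cached(struct toy_rdev *rdev,
                                 void (*store)(struct toy_rdev *))
{
    struct toy_mddev *mddev = rdev->mddev;

    assert(mddev != NULL);
    toy_lock(mddev);
    store(rdev);                /* may clear rdev->mddev */
    toy_unlock(mddev);          /* still valid: uses the saved copy */
    return 0;
}
```

With toy_store_remove as the handler, the fragile variant returns -1 and leaves the lock held, while the cached variant unlocks cleanly even though rdev->mddev is NULL afterwards. Whether the real fix takes this approach is not stated in this thread.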
So this oops is in the md/raid code? Nicolas, are you using software RAID or LVM volumes?
I'm using lvm over md
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1[0] sdb1[1]
      2096384 blocks [2/2] [UU]
md1 : active raid1 sda3[0] sdb3[1]
      288856640 blocks [2/2] [UU]
unused devices: <none>
# /sbin/pvdisplay
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               VolGroup00
  PV Size               275,48 GB / not usable 38,56 MB
  Allocatable           yes
  PE Size (KByte)       65536
  Total PE              4407
  Free PE               2211
  Allocated PE          2196
  PV UUID               5vhc8L-w0Jt-cTIo-Hswk-NtbK-eWuP-d3q6J1
Created attachment 15938 [details] patch to fix oops.
This patch will probably fix the problem. I'll submit it for the -stable series.
If you like, you may try the patch and report the result here. If not, you can wait until it appears upstream and then try the vanilla kernel from the Fedora project.
Verified applied