Latest working kernel version: 2.6.24-rc5-mm1 (confirmed to work fine)
Earliest failing kernel version: N/A, unfortunately. I don't reboot often enough to have noticed when it started failing.
Distribution: Fedora Devel
Hardware Environment: CK804 + AMD X2
Software Environment: early udev boot
See the attached picture. Since the kernel output scrolls very fast at this point, it took me weeks to get a usable screen capture.
Steps to reproduce:
Boot. This will almost always result in a hang. Pressing Shift+Page-Up repeatedly during boot reduces the probability of hanging.
See also https://bugzilla.redhat.com/show_bug.cgi?id=441765
Created attachment 15814 [details]
oops screen capture
Created attachment 15815 [details]
dmesg on the same kernel after a non-oopsing boot
Created attachment 15816 [details]
So: this happens with the Fedora kernel, which is a distro-specific kernel that may contain several patches. Does the same happen with a vanilla kernel?
Can you try whether vanilla 2.6.25 makes a difference?
I probably won't have access to this particular system before the end of the week, sorry. I've already spent an awful lot of time just getting a good oops capture.
Thank you for your time.
Nevertheless, additional input would be much appreciated.
I'm not sure whether you can get a "vanilla kernel RPM" for Fedora, which would save you compile effort/time; for SUSE such a package exists.
Does this happen with a 32-bit or a 64-bit kernel?
Created attachment 15841 [details]
source code and disassembly of failing function
mddev is used once after being stored here:
2087 mddev_t *mddev = rdev->mddev;
Later on rdev->mddev is used but it is no longer equal to mddev -- something has changed it. We then try to unlock using a bad address.
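To make the race concrete, here is roughly the shape of the failing function (a trimmed sketch reconstructed from 2.6.25-era drivers/md/md.c, not a verbatim copy of the source):

static ssize_t
rdev_attr_store(struct kobject *kobj, struct attribute *attr,
                const char *page, size_t length)
{
        struct rdev_sysfs_entry *entry =
                container_of(attr, struct rdev_sysfs_entry, attr);
        mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj);
        mddev_t *mddev = rdev->mddev;   /* mddev saved once, at line 2087 */
        ssize_t rv;

        if (!entry->store)
                return -EIO;
        rv = mddev ? mddev_lock(mddev) : -EBUSY;
        if (!rv) {
                /* entry->store() can detach the device from the array,
                 * which changes rdev->mddev while the lock is held ... */
                rv = entry->store(rdev, page, length);
                /* ... so re-reading rdev->mddev for the unlock picks up
                 * the changed (bad) pointer and oopses */
                mddev_unlock(rdev->mddev);
        }
        return rv;
}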
Created attachment 15902 [details]
Another screen capture, this time on a vanilla kernel
It seems a vanilla kernel fails the same way.
The oops is on line 2099 in drivers/md/md.c:
rdev->mddev is NULL, but it was a valid address upon entry to the function.
So this oops is in the md/raid code?
Nicolas, are you using software RAID or LVM volumes?
I'm using LVM over md:
# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid1 sda1 sdb1
2096384 blocks [2/2] [UU]
md1 : active raid1 sda3 sdb3
288856640 blocks [2/2] [UU]
unused devices: <none>
--- Physical volume ---
PV Name /dev/md1
VG Name VolGroup00
PV Size 275,48 GB / not usable 38,56 MB
PE Size (KByte) 65536
Total PE 4407
Free PE 2211
Allocated PE 2196
PV UUID 5vhc8L-w0Jt-cTIo-Hswk-NtbK-eWuP-d3q6J1
Created attachment 15938 [details]
patch to fix oops.
This patch will probably fix the problem.
I'll submit it for the -stable series.
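For anyone who wants to understand what they would be testing: per the analysis above, the essence of such a fix is to unlock through the mddev pointer saved on entry rather than re-reading rdev->mddev, which entry->store() may have changed. A minimal sketch of that shape (an assumption about the patch, not its literal text):

        rv = mddev ? mddev_lock(mddev) : -EBUSY;
        if (!rv) {
                if (rdev->mddev == NULL)
                        rv = -EBUSY;    /* device was detached under us */
                else
                        rv = entry->store(rdev, page, length);
                mddev_unlock(mddev);    /* saved pointer, not rdev->mddev */
        }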
If you like, you may try the patch and report the result here.
If not, you can wait until it appears upstream and then try the vanilla kernel from the Fedora project.