Bug 10484
Summary: | Boot Oops+hang in 2.6.25-rc and 2.6.25-final kernels | | |
---|---|---|---|
Product: | IO/Storage | Reporter: | Nicolas Mailhot (Nicolas.Mailhot) |
Component: | MD | Assignee: | io_md |
Status: | CLOSED CODE_FIX | | |
Severity: | normal | CC: | devzero |
Priority: | P1 | | |
Hardware: | All | | |
OS: | Linux | | |
Kernel Version: | 2.6.25 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Bug Depends on: | | | |
Bug Blocks: | 9832 | | |
Attachments:
- oops screen capture
- dmesg on the same kernel after a non-oopsing boot
- system lspci
- source code and disassembly of failing function
- Another screen capture, this time on a vanilla kernel
- patch to fix oops
Description
Nicolas Mailhot
2008-04-19 06:51:26 UTC
Created attachment 15814 [details]
oops screen capture
Created attachment 15815 [details]
dmesg on the same kernel after a non-oopsing boot
Created attachment 15816 [details]
system lspci
So - if this happens with the Fedora kernel, which is a distro-specific kernel that may contain several patches - does the same happen with a vanilla kernel? Can you try whether vanilla 2.6.25 makes a difference?

I probably won't have access to this particular system before the end of the week, sorry. I already spent an awful lot of time just getting a good Oops capture.

Thank you for your time. Nevertheless, additional input would be very much appreciated. I am not sure whether you can get a "vanilla kernel rpm" for Fedora, which would save you compile effort/time - for SUSE there is one. Does this happen with 32 or 64 bit?

64-bit kernel.

Created attachment 15841 [details]
source code and disassembly of failing function
mddev is used once after being stored here:
    2087	mddev_t *mddev = rdev->mddev;
Later on rdev->mddev is used but it is no longer equal to mddev -- something has changed it. We then try to unlock using a bad address.
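For illustration, here is a minimal sketch of the failing pattern; the function name rdev_attr_store and the surrounding body are assumptions inferred from the quoted line numbers, not a verbatim copy of the 2.6.25 source:

```c
/*
 * Simplified sketch of the failing pattern (assumed shape of the
 * function around lines 2087-2099 of drivers/md/md.c in 2.6.25).
 */
static ssize_t
rdev_attr_store(struct kobject *kobj, struct attribute *attr,
		const char *page, size_t length)
{
	struct rdev_sysfs_entry *entry =
		container_of(attr, struct rdev_sysfs_entry, store);
	mdk_rdev_t *rdev = container_of(kobj, mdk_rdev_t, kobj);
	mddev_t *mddev = rdev->mddev;	/* 2087: pointer saved on entry */
	ssize_t rv;

	rv = mddev ? mddev_lock(mddev) : -EBUSY;
	if (!rv) {
		/* The store handler may detach the rdev from its array
		 * and clear rdev->mddev while the lock is held. */
		rv = entry->store(rdev, page, length);
		/* 2099: re-reads rdev->mddev, which may now be stale or NULL */
		mddev_unlock(rdev->mddev);
	}
	return rv;
}
```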
Created attachment 15902 [details]
Another screen capture, this time on a vanilla kernel

It seems a vanilla kernel such as http://koji.fedoraproject.org/koji/taskinfo?taskID=581601 fails the same way.

The oops is on line 2099 in drivers/md/md.c:

    2099	mddev_unlock(rdev->mddev);

rdev is NULL but it was a valid address upon entry to the function.

So this oops is in md/raid code? Nicolas, are you using software RAID or LVM volumes?

I'm using LVM over md:

    # cat /proc/mdstat
    Personalities : [raid1] [raid6] [raid5] [raid4]
    md0 : active raid1 sda1[0] sdb1[1]
          2096384 blocks [2/2] [UU]
    md1 : active raid1 sda3[0] sdb3[1]
          288856640 blocks [2/2] [UU]
    unused devices: <none>

    # /sbin/pvdisplay
      --- Physical volume ---
      PV Name               /dev/md1
      VG Name               VolGroup00
      PV Size               275,48 GB / not usable 38,56 MB
      Allocatable           yes
      PE Size (KByte)       65536
      Total PE              4407
      Free PE               2211
      Allocated PE          2196
      PV UUID               5vhc8L-w0Jt-cTIo-Hswk-NtbK-eWuP-d3q6J1

Created attachment 15938 [details]
patch to fix oops.
This patch will probably fix the problem.
I'll submit it for the -stable series.
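The attached patch (attachment 15938) is not reproduced in this report. As a hedged sketch only, assuming the rdev_attr_store shape shown earlier, a fix of the kind described would unlock via the mddev pointer saved on entry instead of re-reading rdev->mddev:

```c
	/*
	 * Illustrative sketch only - the actual attachment 15938 may differ.
	 * Unlock with the mddev pointer saved at function entry; it stays
	 * valid even if entry->store() detached the rdev and cleared
	 * rdev->mddev.
	 */
	rv = mddev ? mddev_lock(mddev) : -EBUSY;
	if (!rv) {
		rv = entry->store(rdev, page, length);
		mddev_unlock(mddev);	/* was: mddev_unlock(rdev->mddev) */
	}
	return rv;
```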
If you like, you may try the patch and report the result here... If not, you may wait until it appears upstream and then try the vanilla kernel from the Fedora project.

Verified applied