Most recent kernel where this bug did not occur: 2.6.15
Distribution: Fedora Core 4 and 5 (also appears on Gentoo with 2.6.16)
Hardware Environment: i686, Adaptec 2100S SCSI RAID card
Software Environment: Fedora Core 4 with 2.6.16 errata kernel, or FC5 default
Problem Description: The 2.6.16 kernel appears to be unstable when used with a RAID array supported by the i2o driver, such as the Adaptec 2100S.
Steps to reproduce:
1) Install an OS using the 2.6.15 kernel or below.
2) Upgrade to 2.6.16 and watch the kernel oops
OR
1) Attempt to install an OS using the 2.6.15 kernel and watch the kernel oops
I have reported this in detail in the Red Hat bugzilla entry below, but after further investigation (and help from others) I believe it to be an upstream (i.e. here) bug, so I've raised it here as well.
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189570
There is considerable debug and traceback info available at bugzilla.redhat.com.
OK, I see from the RH report that Markus is working on getting the appropriate hardware. But 2.6.15 used to work. Unfortunately we put a lot of changes into that driver between 2.6.15 and 2.6.16. I was unable to locate an oops trace in that RH report. Maybe I missed it. Do we have one?
Thanks for the feedback. You're quite right that 2.6.15 used to (and indeed still does) work; the problem definitely surfaced in 2.6.16. Unfortunately I don't think I have a full kernel oops. I think the only way I can get one is by using serial port logging, and I don't have another machine that has a serial port. I think I've taken a few pictures of the screen at the critical point, but I won't be able to get those uploaded until the weekend. If there's another way to log that kind of information, I'm all ears.
Digital photos work well. You can email it to me if you like and I'll attach it to the bugzilla report.
That's fine. I'm working off-site at the moment, but as soon as I get back at the weekend I'll upload the images I took last time.
Kernel oops, and some very useful testing now present on: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189570 Hope that helps.
Created attachment 8271
Bugfix for 2.6.16
Changes:
- Fixed memory corruption caused by accessing memory after free
- Fixed locking of struct i2o_exec_wait in Executive-OSM
- Removed LCT Notify in i2o_exec_probe(), which caused memory to be freed during first enumeration
- Added missing locking in i2o_exec_lct_notify()
- Removed put_device() of I2O controller in i2o_iop_remove(), which caused the controller structure to be freed too early
- Fixed size of mempool in i2o_iop_alloc()
- Fixed access to memory after free in i2o_msg_get()
+ list_add(&wait->list, &i2o_exec_wait_list);
I'm a newbie, so please forgive me if this is obvious. Shouldn't that be under a spin_lock_irqsave()? If one context adds an entry to the list while another is deleting an entry from it at the same time, couldn't that trigger the BUG() in list_del()?
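To make the concern concrete, here is a minimal userspace sketch, not the driver code. It assumes a pthread mutex as a stand-in for spin_lock_irqsave(), and the names wait_entry, wait_list_add(), wait_list_del() and wait_list_count() are hypothetical. It only shows the general pattern of taking the same lock around every add and delete on a shared list, so the neighbour pointers cannot be changed halfway through an update.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace stand-in for a circular, doubly linked list head (assumption). */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical wait entry; the real struct i2o_exec_wait has other fields. */
struct wait_entry {
    int id;
    struct list_node list;
};

/* Empty list: the head points at itself in both directions. */
static struct list_node wait_list = { &wait_list, &wait_list };

/* The mutex plays the role the spinlock would play in the kernel. */
static pthread_mutex_t wait_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert at the head of the list while holding the lock. */
void wait_list_add(struct wait_entry *w)
{
    pthread_mutex_lock(&wait_list_lock);
    w->list.next = wait_list.next;
    w->list.prev = &wait_list;
    wait_list.next->prev = &w->list;
    wait_list.next = &w->list;
    pthread_mutex_unlock(&wait_list_lock);
}

/* Unlink an entry while holding the same lock. */
void wait_list_del(struct wait_entry *w)
{
    pthread_mutex_lock(&wait_list_lock);
    /* Sanity check: both neighbours must still point at this entry. */
    assert(w->list.prev->next == &w->list);
    assert(w->list.next->prev == &w->list);
    w->list.prev->next = w->list.next;
    w->list.next->prev = w->list.prev;
    w->list.next = NULL;
    w->list.prev = NULL;
    pthread_mutex_unlock(&wait_list_lock);
}

/* Count entries; traversal also needs the lock. */
int wait_list_count(void)
{
    int n = 0;
    pthread_mutex_lock(&wait_list_lock);
    for (struct list_node *p = wait_list.next; p != &wait_list; p = p->next)
        n++;
    pthread_mutex_unlock(&wait_list_lock);
    return n;
}
```

Without the lock, an add and a delete running concurrently could each read a neighbour pointer that the other is in the middle of rewriting, which is the kind of inconsistency the sanity check in wait_list_del() would trip over.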
Hello, which BUG() do you mean? Hmmm, at the moment I don't see a problem, but I have probably overlooked something. Best regards, Markus Lidel
Hardware: Adaptec SCSI RAID 2015S
Kernel: 2.6.16 without new patches from Markus Lidel
Result: full filesystem crash
[root@luggage ~]# uname -a
Linux luggage.darkglobe.int 2.6.16.20-withi2opatchi2opatch #1 Wed Jun 7 22:56:06 BST 2006 i686 athlon i386 GNU/Linux
[root@luggage ~]#
Works for me! I'm stress testing it now, not expecting any problems but fingers crossed.
Created attachment 8280
Using bonnie++ on a system running the newly patched kernel
[dave@luggage ~]$ uname -a
Linux luggage.darkglobe.int 2.6.16.20-withi2opatchi2opatch #1 Wed Jun 7 22:56:06 BST 2006 i686 athlon i386 GNU/Linux
[dave@luggage ~]$ /usr/sbin/bonnie++ -s 4096 -r 1024 -n 5 -x 10 | tee bonnierun.log
Creates the following log...
P.S. The short answer is that everything appears to be stable. Please submit into the mainstream kernel asap, many thanks to all involved!
Another downstream bug at http://bugs.gentoo.org/show_bug.cgi?id=136088 (nothing interesting to add at this time)
The patch from this bug was merged into Linus' tree (and will therefore be in 2.6.17).