Most recent kernel where this bug did not occur: 2.6.15
Distribution: Fedora Core 4 and 5 (also appears on Gentoo with 2.6.16)
Hardware Environment: i686, Adaptec 2100S SCSI RAID card
Software Environment: Fedora Core 4 with 2.6.16 errata kernel, or FC5 default
The 2.6.16 kernel appears to be unstable when used with a RAID array supported
by the i2o driver such as the Adaptec 2100S.
Steps to reproduce:
1) Install an OS using the 2.6.15 kernel or below.
2) Upgrade to 2.6.16 and watch the kernel oops
Alternatively:
1) Attempt to install an OS using the 2.6.16 kernel and watch the kernel oops
I have documented this thoroughly in the downstream bug report (see below), but
after further investigation (and help from others) I believe it to be an
upstream (i.e. here) bug, so I've raised it here as well.
There is considerable debug and traceback info available at bugzilla.redhat.com.
OK, I see from the RH report that Markus is working on getting the appropriate
But 2.6.15 used to work. Unfortunately we put a lot of changes into
that driver between 2.6.15 and 2.6.16.
I was unable to locate an oops trace in that RH report. Maybe I missed
it. Do we have one?
Thanks for the feedback.
You're quite right that 2.6.15 used to (and indeed still does) work; the problem
definitely surfaced in 2.6.16.
Unfortunately I don't think I have a full kernel oops. I think the only way I
can get one is by using serial port logging, and I don't have another machine
that has a serial port.
I think I've taken a few pictures of the screen at the critical point, but I
won't even be able to get those uploaded until the weekend.
If there's another way to log that kind of information I'm all ears.
Digital photos work well. You can email it to me if you like
and I'll attach it to the bugzilla report.
That's fine. I'm working off-site at the moment, but as soon as I get back at
the weekend I'll upload the images I took last time.
Kernel oops, and some very useful testing now present on:
Hope that helps.
Created attachment 8271
Bugfix for 2.6.16
- Fixes memory corruption caused by accessing memory after free
- Fixed locking of struct i2o_exec_wait in Executive-OSM
- Removed LCT Notify in i2o_exec_probe() which caused freeing memory during
- Added missing locking in i2o_exec_lct_notify()
- Removed put_device() of I2O controller in i2o_iop_remove() which caused the
controller structure to be freed too early
- Fixed size of mempool in i2o_iop_alloc()
- Fixed access to memory after free in i2o_msg_get()
+ list_add(&wait->list, &i2o_exec_wait_list);
I'm a newbie, so please forgive me if this is obvious. Shouldn't that be under
a spin_lock_irqsave()? If you add something to the list while something else
is deleting something from the list at the same time, couldn't that trigger the
BUG() in list_del()?
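To make the concern concrete, here is a minimal userspace sketch of the pattern being asked about: every add and remove on a shared list happens under one lock, so neither can observe the list half-updated. This is not the code from the attached patch. A pthread mutex stands in for the kernel's spin_lock_irqsave()/spin_unlock_irqrestore(), a hand-rolled singly linked list stands in for list_add()/list_del(), and the names wait_entry, wait_list_add, wait_list_remove and wait_list_length are hypothetical.

```c
/* Userspace illustration only; all names below are assumptions, not driver code. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct wait_entry {
    int tid;                    /* identifier used to find the entry again */
    struct wait_entry *next;
};

static struct wait_entry *wait_list_head = NULL;
/* Stand-in for the spinlock that would protect the list in kernel code. */
static pthread_mutex_t wait_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert at the head of the list while holding the lock. */
void wait_list_add(struct wait_entry *w)
{
    assert(w != NULL);
    pthread_mutex_lock(&wait_list_lock);
    w->next = wait_list_head;
    wait_list_head = w;
    pthread_mutex_unlock(&wait_list_lock);
}

/* Unlink and return the entry with the given tid, or NULL if absent. */
struct wait_entry *wait_list_remove(int tid)
{
    struct wait_entry *found = NULL;
    struct wait_entry **pp;

    pthread_mutex_lock(&wait_list_lock);
    for (pp = &wait_list_head; *pp != NULL; pp = &(*pp)->next) {
        if ((*pp)->tid == tid) {
            found = *pp;
            *pp = found->next;  /* unlink under the same lock as the add */
            found->next = NULL;
            break;
        }
    }
    pthread_mutex_unlock(&wait_list_lock);
    return found;
}

/* Count entries; also takes the lock so it never walks a half-updated list. */
size_t wait_list_length(void)
{
    size_t n = 0;
    struct wait_entry *p;

    pthread_mutex_lock(&wait_list_lock);
    for (p = wait_list_head; p != NULL; p = p->next)
        n++;
    pthread_mutex_unlock(&wait_list_lock);
    return n;
}
```

In kernel code the equivalent would wrap list_add() and list_del() in the same spinlock, using the irqsave variants if the list is also touched from interrupt context. Whether the i2o code path in question needs that depends on who else can reach the list at that point.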
which BUG() do you mean?
Hmmm, at the moment I don't see a problem, but I have probably overlooked something.
Hardware: Adaptec SCSI RAID 2015S
Kernel: 2.6.16 without new patches from Markus Lidel
Result: complete filesystem crash
[root@luggage ~]# uname -a
Linux luggage.darkglobe.int 220.127.116.11-withi2opatchi2opatch #1 Wed Jun 7 22:56:06
BST 2006 i686 athlon i386 GNU/Linux
Works for me!
I'm stress testing it now, not expecting any problems but fingers crossed.
Created attachment 8280
Using bonnie++ on a system running the newly patched kernel
[dave@luggage ~]$ uname -a
Linux luggage.darkglobe.int 18.104.22.168-withi2opatchi2opatch #1 Wed Jun 7
22:56:06 BST 2006 i686 athlon i386 GNU/Linux
[dave@luggage ~]$ /usr/sbin/bonnie++ -s 4096 -r 1024 -n 5 -x 10 | tee
Creates the following log...
P.S. The short answer is that everything appears to be stable.
Please submit into the mainstream kernel asap, many thanks to all involved!
Another downstream bug at http://bugs.gentoo.org/show_bug.cgi?id=136088
(nothing interesting to add at this time)
The patch from this bug was included into Linus' tree (and will therefore be in