Bug 6561
Summary: | 2.6.16 kernels unstable with Adaptec 2100 SCSI RAID | ||
---|---|---|---|
Product: | Drivers | Reporter: | Dave R (meherenow) |
Component: | I2O | Assignee: | Alan (alan) |
Status: | CLOSED CODE_FIX | ||
Severity: | high | CC: | akpm, bunk, kernel, Markus.Lidel |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.16 | Subsystem: | |
Regression: | --- | Bisected commit-id: | |
Attachments: |
Bugfix for 2.6.16
Using bonnie++ on a system running the newly patched kernel |
Description
Dave R
2006-05-15 14:32:44 UTC
OK, I see from the RH report that Markus is working on getting the appropriate hardware. But 2.6.15 used to work. Unfortunately we put a lot of changes into that driver between 2.6.15 and 2.6.16. I was unable to locate an oops trace in that RH report. Maybe I missed it. Do we have one?

Thanks for the feedback. You're quite right that 2.6.15 used to (and indeed still does) work; the problem definitely surfaced in 2.6.16. Unfortunately I don't think I have a full kernel oops. I think the only way I can get one is by using serial port logging, and I don't have another machine that has a serial port. I think I've taken a few pictures of the screen at the critical point, but I won't be able to get those uploaded until the weekend. If there's another way to log that kind of information I'm all ears.

Digital photos work well. You can email them to me if you like and I'll attach them to the bugzilla report.

That's fine. I'm working off-site at the moment, but as soon as I get back at the weekend I'll upload the images I took last time.

Kernel oops, and some very useful testing, now present on: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=189570 Hope that helps.

Created attachment 8271
Bugfix for 2.6.16
Changes:
- Fixed memory corruption caused by accessing memory after free
- Fixed locking of struct i2o_exec_wait in Executive-OSM
- Removed LCT Notify in i2o_exec_probe(), which caused memory to be freed during the first enumeration
- Added missing locking in i2o_exec_lct_notify()
- Removed put_device() of the I2O controller in i2o_iop_remove(), which caused the controller structure to be freed too early
- Fixed size of mempool in i2o_iop_alloc()
- Fixed access to memory after free in i2o_msg_get()
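
The locking items in the change list concern a wait list that more than one context can touch at once. The following is only a user-space sketch of that general pattern, not the patch itself: `struct wait_entry`, `wait_add()`, `wait_remove()` and `wait_count()` are hypothetical names, and a pthread mutex stands in for the kernel's `spin_lock_irqsave()`/`spin_unlock_irqrestore()` pair. The point it illustrates is that every insert and every unlink happens under the same lock, and an entry is only freed by its owner after it has been unlinked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a wait entry; NOT the real struct i2o_exec_wait. */
struct wait_entry {
    int tid;                          /* illustrative key used for lookup */
    struct wait_entry *prev, *next;   /* circular doubly linked list links */
};

/* Empty circular list head, plus the one lock that guards all list access. */
static struct wait_entry wait_list = { 0, &wait_list, &wait_list };
static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER;

/* Insert at the head; all four pointer updates happen with the lock held. */
static void wait_add(struct wait_entry *w)
{
    assert(w != NULL);
    pthread_mutex_lock(&wait_lock);
    w->next = wait_list.next;
    w->prev = &wait_list;
    wait_list.next->prev = w;
    wait_list.next = w;
    pthread_mutex_unlock(&wait_lock);
}

/*
 * Find an entry by tid and unlink it under the lock.
 * Returns the unlinked entry, or NULL if none matched. The caller may
 * release the entry only after this returns, so no other list user can
 * still reach it through the list.
 */
static struct wait_entry *wait_remove(int tid)
{
    struct wait_entry *w, *found = NULL;

    pthread_mutex_lock(&wait_lock);
    for (w = wait_list.next; w != &wait_list; w = w->next) {
        if (w->tid == tid) {
            w->prev->next = w->next;
            w->next->prev = w->prev;
            w->prev = w->next = NULL;
            found = w;
            break;
        }
    }
    pthread_mutex_unlock(&wait_lock);
    return found;
}

/* Count entries; the walk also needs the lock to see a consistent list. */
static int wait_count(void)
{
    struct wait_entry *w;
    int n = 0;

    pthread_mutex_lock(&wait_lock);
    for (w = wait_list.next; w != &wait_list; w = w->next)
        n++;
    pthread_mutex_unlock(&wait_lock);
    return n;
}
```

In kernel code the same shape would normally use `struct list_head` with `list_add()`/`list_del()` and a spinlock; whether interrupts must also be disabled depends on which contexts touch the list, which this sketch does not model.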
+ list_add(&wait->list, &i2o_exec_wait_list);

I'm a newbie, so please forgive me if this is obvious. Shouldn't that be under a spin_lock_irqsave()? Say you add something to the list while something else is deleting something from the list at the same time: couldn't that trigger the BUG() in list_del()?

Hello, which BUG() do you mean? Hmmm, at the moment I don't see a problem, but I have probably overlooked something. Best regards, Markus Lidel

Hardware: Adaptec SCSI RAID 2015S
Kernel: 2.6.16 without new patches from Markus Lidel
Result: full filesystem crash

[root@luggage ~]# uname -a
Linux luggage.darkglobe.int 2.6.16.20-withi2opatchi2opatch #1 Wed Jun 7 22:56:06 BST 2006 i686 athlon i386 GNU/Linux
[root@luggage ~]#

Works for me! I'm stress testing it now, not expecting any problems but fingers crossed.

Created attachment 8280
Using bonnie++ on a system running the newly patched kernel
[dave@luggage ~]$ uname -a
Linux luggage.darkglobe.int 2.6.16.20-withi2opatchi2opatch #1 Wed Jun 7 22:56:06 BST 2006 i686 athlon i386 GNU/Linux
[dave@luggage ~]$ /usr/sbin/bonnie++ -s 4096 -r 1024 -n 5 -x 10 | tee bonnierun.log
Creates the following log...
P.S. The short answer is that everything appears to be stable.
Please submit into the mainstream kernel asap, many thanks to all involved!
Another downstream bug at http://bugs.gentoo.org/show_bug.cgi?id=136088 (nothing interesting to add at this time).

The patch from this bug was included in Linus' tree (and will therefore be in 2.6.17).