Bug 7845
Summary: | access to large RAID arrays on Adaptec 2400A RAID controller with dpt_i2o module causes system hang | ||
---|---|---|---|
Product: | SCSI Drivers | Reporter: | Robert B (robert.boeck) |
Component: | Other | Assignee: | scsi_drivers-other |
Status: | REJECTED UNREPRODUCIBLE | ||
Severity: | high | CC: | aacraid, bischof, bunk, jejb, oscar, protasnb, stepheng+kernel |
Priority: | P2 | ||
Hardware: | i386 | ||
OS: | Linux | ||
Kernel Version: | 2.6.8, 2.6.18 | Subsystem: | |
Regression: | --- | Bisected commit-id: |
Description
Robert B
2007-01-17 12:21:35 UTC
Robert, if you still have this problem, it might be helpful to get the trace with alt-sysrq-t at the time of the hang and attach it to the report. Have you tried a recent kernel lately? Thanks.

Natalie, I solved the problem by using a 3Ware RAID controller for the 500 GB array. Since the machine is in production now, I cannot do any more tests, sorry.

Thanks for the update. I guess this bug can be closed for now until someone runs into the same problem...

I can confirm the problems described by Robert with a system running up-to-date Debian Etch stable on an Adaptec 2400A with 2x 320 GB Seagate HDDs as RAID-1. It's easy to force the server to hang when copying files of 500 MB and up to the disk. Sometimes it even hangs with much smaller files. The problem seems to have some strange relationship to particular applications. Copying files from a USB HDD to the RAID seems to work fine. Using tar will crash the system most of the time. Samba seems to be a constant source of crashes too. System backup and restore with Acronis TrueImage 9.1 (which uses Linux under the hood) always worked flawlessly.

Stefen, can you provide information on such crashes? Are those oopses? Or, if your system hangs, maybe you can get the alt-sysrq-t trace and attach it here. If this is a hard hang and you can reproduce it at will, then you can start top or "vmstat 3" on some VT and have it running while escalating your load and getting the system to hang.

Please make sure the drives being used are RAID compatible. Desktop class drives that perform their own error recovery and bad block remapping will clash with any RAID controller's own recovery actions. Western Digital JD drives, I believe (I may be mistaken), are such an example.

Mark, thanks for bringing this topic to the table. I am aware of this fact. There is a special RAID Edition of WD drives which differs from the consumer version in exactly this single point, the error handling.
I am almost 100% sure that at the time the 2400A was released to the market there were no special enterprise grade IDE HDDs and no special RAID versions. Is there a drive compatibility list available for the controller? I couldn't find one at Adaptec.com.

The 2.4 kernels have a default command timeout of 30 to 90 seconds depending on release (the distributions Adaptec tested with err on the high side); the 2.6 kernels have a default 30 second timeout. This may also be a factor, so you may wish to extend the timeout. The dpt_i2o driver, however, circumvents the timeout and introduces its own extended timeout of 300 seconds (trust the controller). At least that is the case for the 64 bit capable dpt_i2o driver I hold upstream (available upon request), but some variants in kernel.org did not override the timeout. I am not sure whether extending the timeout will help, given the problems shutting down; that problem points to issues with the hardware (?)

At the time the 2400A was released to the market, all drives were in effect RAID edition, until manufacturers started introducing error recovery and bad block remapping features into later consumer version drives. Just because they are consumer version drives does not make them incompatible. The issue arises when the drive's error recovery sets up an interference pattern with the RAID card's recovery, or when the drive's error recovery blocks commands from completing within a reasonable period of time (ten seconds, I believe, after which the adapter starts introducing its own error recovery actions).

I contacted Adaptec Technical Support about any compatibility issues and got the following response: 'We have had reports drive recognition issues with WD drives generally at boot and recommend the drives are jumpered for the factory default "cable select" setting.
Information concerning Western Digital drive specifications for Enterprise and Desktop class drives can be found on their knowledge base at: http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_alp.php article numbers 1277 and 1397. Western Digital reports there are no firmware updates for any EIDE hard disk manufactured after 3/25/03 in article number 1348. We do not maintain a list of tested drives, however we will provide any information concerning known compatibility issues/reports of problems on our ASK knowledge base on our website. We only have the article concerning the cable select jumper setting at this time.'
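
For anyone who wants to try the timeout suggestion made earlier in this thread, here is a minimal sketch of how the per-device SCSI command timeout could be inspected and raised through sysfs on a 2.6 kernel. It assumes the array shows up as a SCSI disk with a `device/timeout` attribute under `/sys/block/<device>`. The device name `sda`, the helper names, and the `sysfs_root` parameter are illustrative assumptions, not part of this report. As noted above, some dpt_i2o variants apply their own 300 second timeout, so this setting may have no effect with that driver.

```python
from pathlib import Path


def timeout_path(device: str, sysfs_root: str = "/sys") -> Path:
    """Return the sysfs file holding the SCSI command timeout (in seconds)."""
    return Path(sysfs_root) / "block" / device / "device" / "timeout"


def read_timeout(device: str, sysfs_root: str = "/sys") -> int:
    """Read the current command timeout for a block device, in seconds."""
    return int(timeout_path(device, sysfs_root).read_text().strip())


def extend_timeout(device: str, seconds: int, sysfs_root: str = "/sys") -> int:
    """Raise the timeout to `seconds` if it is currently lower.

    Returns the timeout in effect afterwards. The value is never lowered.
    Writing to the real /sys tree requires root.
    """
    current = read_timeout(device, sysfs_root)
    if seconds > current:
        timeout_path(device, sysfs_root).write_text(f"{seconds}\n")
        return seconds
    return current
```

On a real system, something like `extend_timeout("sda", 300)` run as root would be the equivalent of writing 300 into `/sys/block/sda/device/timeout`; the change does not persist across reboots.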