Bug 31212 - aacraid is generally unstable with newer kernels
Status: RESOLVED UNREPRODUCIBLE
Alias: None
Product: SCSI Drivers
Classification: Unclassified
Component: AACRAID
Hardware: All Linux
Importance: P1 normal
Assignee: scsi_drivers-aacraid
Reported: 2011-03-16 17:43 UTC by Ryan Underwood
Modified: 2013-04-28 08:45 UTC
CC List: 2 users

Kernel Version: 2.6.32, 2.6.37.3
Regression: No


Attachments
dmesg (44.22 KB, text/plain), 2011-03-16 17:43 UTC, Ryan Underwood
lspci -vvv (7.66 KB, text/plain), 2011-03-16 17:44 UTC, Ryan Underwood
Server a22 kernel panic (478.86 KB, image/jpeg), 2012-12-25 09:31 UTC, Dmitry Korzhevin
Server a3 kernel panic (156.90 KB, image/jpeg), 2012-12-25 09:32 UTC, Dmitry Korzhevin
Server a20 kernel panic (111.89 KB, image/png), 2012-12-25 09:33 UTC, Dmitry Korzhevin

Description Ryan Underwood 2011-03-16 17:43:54 UTC
Created attachment 50962
dmesg

I have an HP DL380 with an Adaptec 2120S that, many moons ago, never exhibited any problems.  Since Debian lenny and squeeze, I have had frequent "Host adapter abort request" messages followed by a RAID firmware panic under heavy disk I/O.  In 2.6.32 it seemed impossible to recover from these: the kernel would offline the RAID and disable the system.  In 2.6.37.3 it seems better at recovering after a delay, but I am afraid something is wrong at a low level that may corrupt data:

[572820.064043] aacraid: Host adapter abort request (2,0,0,0)
[572820.149996] aacraid: Host adapter abort request (2,0,0,0)
[572820.235786] aacraid: Host adapter abort request (2,0,0,0)
[572820.321554] aacraid: Host adapter abort request (2,0,0,0)
[572820.407418] aacraid: Host adapter reset request. SCSI hang ?
[572820.496444] AAC: Host adapter BLINK LED 0x3
[572820.567966] AAC0: adapter kernel panic'd 3.
[595047.992045] aacraid: Host adapter abort request (2,0,0,0)
[595048.078463] aacraid: Host adapter abort request (2,0,0,0)
[595048.164959] aacraid: Host adapter abort request (2,0,0,0)
[595048.250844] aacraid: Host adapter abort request (2,0,0,0)
[595048.336629] aacraid: Host adapter abort request (2,0,0,0)
[595048.336637] aacraid: Host adapter abort request (2,0,0,0)
[595048.336644] aacraid: Host adapter abort request (2,0,0,0)
[595048.336650] aacraid: Host adapter abort request (2,0,0,0)
[595048.336657] aacraid: Host adapter abort request (2,0,0,0)
[595048.336663] aacraid: Host adapter abort request (2,0,0,0)
[595048.336773] aacraid: Host adapter reset request. SCSI hang ?
[595048.336782] AAC: Host adapter BLINK LED 0x3
[595048.336916] AAC0: adapter kernel panic'd 3.
[596524.064044] aacraid: Host adapter abort request (2,0,0,0)
[596524.146368] aacraid: Host adapter abort request (2,0,0,0)
[596524.228197] aacraid: Host adapter abort request (2,0,0,0)
[596524.309288] aacraid: Host adapter abort request (2,0,0,0)
[596524.389967] aacraid: Host adapter reset request. SCSI hang ?
[596524.473168] AAC: Host adapter BLINK LED 0x3
[596524.538332] AAC0: adapter kernel panic'd 3.
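
For anyone comparing their own logs against the ones above, here is a minimal sketch that groups these messages into "episodes" (a run of abort requests, optionally followed by a reset request and a firmware panic). It assumes the dmesg line format shown in this report; the names `Episode`, `find_episodes` and `SAMPLE` are made up for illustration and are not part of aacraid or any existing tool.

```python
import re
from dataclasses import dataclass

# Matches a dmesg line such as "[572820.064043] aacraid: Host adapter ..."
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


@dataclass
class Episode:
    """One run of abort requests, optionally ending in a reset and panic."""
    start: float                  # timestamp of the first abort request
    aborts: int = 0
    reset_requested: bool = False
    firmware_panic: bool = False


def find_episodes(lines):
    """Group aacraid messages into episodes.

    Assumption: a new episode begins at the first abort request in the log,
    or at the first abort request seen after a reset request.
    """
    episodes = []
    current = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a timestamped dmesg line
        ts, msg = float(m.group(1)), m.group(2)
        if "Host adapter abort request" in msg:
            if current is None or current.reset_requested:
                current = Episode(start=ts)
                episodes.append(current)
            current.aborts += 1
        elif "Host adapter reset request" in msg and current is not None:
            current.reset_requested = True
        elif "adapter kernel panic" in msg and current is not None:
            current.firmware_panic = True
    return episodes


# Shortened excerpt of the log in this report, used only as sample input.
SAMPLE = """\
[572820.064043] aacraid: Host adapter abort request (2,0,0,0)
[572820.149996] aacraid: Host adapter abort request (2,0,0,0)
[572820.407418] aacraid: Host adapter reset request. SCSI hang ?
[572820.496444] AAC: Host adapter BLINK LED 0x3
[572820.567966] AAC0: adapter kernel panic'd 3.
[595047.992045] aacraid: Host adapter abort request (2,0,0,0)
[595048.336773] aacraid: Host adapter reset request. SCSI hang ?
"""

if __name__ == "__main__":
    for ep in find_episodes(SAMPLE.splitlines()):
        print(ep)
```

Feeding it the full output of `dmesg` instead of `SAMPLE` should show how often the aborts escalate to a reset or a firmware panic, and how far apart the episodes are.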
Comment 1 Ryan Underwood 2011-03-16 17:44:18 UTC
Created attachment 50972
lspci -vvv
Comment 2 Ryan Underwood 2011-03-16 17:45:17 UTC
Also, I have tried controller firmware 4.2-0[8205] as well as 8208, on two different controllers; both exhibit the abort/panic problem.
Comment 5 Ryan Underwood 2011-03-17 17:02:44 UTC
Today 2.6.37.3 hung, these were its last words on the serial console:

[661903.992040] aacraid: Host adapter abort request (2,0,0,0)
[661904.064211] aacraid: Host adapter abort request (2,0,0,0)
[661904.136081] aacraid: Host adapter reset request. SCSI hang ?
Comment 6 Ryan Underwood 2011-04-06 23:56:32 UTC
I changed to an older card with firmware 8205 and haven't seen a hang in almost 3 weeks now.  Perhaps a weird firmware problem?  The changelogs don't indicate that anything significant changed in the newer revs.
Comment 7 Ryan Underwood 2012-05-29 22:59:47 UTC
Just to provide more info, the older card did not solve the problem and there are still sporadic hangs.  I am trying to gather more information but it is difficult to get to.
Comment 8 Ryan Underwood 2012-11-04 16:05:49 UTC
I have found that there are at least two different versions of the ASR-2120S.  The older version shipped with v5xxx and v6xxx firmware and has a RAID processor that looks like a PPGA Intel Celeron.  The newer version is labeled RoHS on the product label, comes with v8208 firmware (the latest), and has a "shiny" BGA-looking RAID processor.  This latter version has been working since I last reported without a single crash or other problem.

I think there may be hardware defects arising in the older versions, perhaps due to age or due to problems solved in later versions.  When the older version of the card crashes Linux, the card remains upset in other ways (will not POST, will not bring drives online, etc.), so I am reluctant to blame this on Linux and am therefore closing this bug.  Anyone having problems with the ASR-2120S should seek the later RoHS hardware revision, which can be identified visually by the RoHS indication on the product sticker and the shiny-surfaced RAID processor.
Comment 9 Dmitry Korzhevin 2012-12-25 09:31:45 UTC
Created attachment 89571
Server a22 kernel panic
Comment 10 Dmitry Korzhevin 2012-12-25 09:32:41 UTC
Created attachment 89581
Server a3 kernel panic
Comment 11 Dmitry Korzhevin 2012-12-25 09:33:43 UTC
Created attachment 89591
Server a20 kernel panic
Comment 12 Dmitry Korzhevin 2012-12-25 09:44:47 UTC
Hi, our company has the same problems with aacraid.

We are using Debian 6.0.6 x86_64 with the proxmox-ve kernel:

Linux 2.6.32-17-pve #1 SMP Wed Nov 28 07:15:55 CET 2012 x86_64 GNU/Linux

modinfo aacraid:

filename:       /lib/modules/2.6.32-17-pve/kernel/drivers/scsi/aacraid/aacraid.ko
version:        1.1-7[28000]-ms
license:        GPL
description:    Dell PERC2, 2/Si, 3/Si, 3/Di, Adaptec Advanced Raid Products, HP NetRAID-4M, IBM ServeRAID & ICP SCSI driver
author:         Red Hat Inc and Adaptec
srcversion:     DAE1E62971BE24D8A2CE61F
vermagic:       2.6.32-17-pve SMP mod_unload modversions

Also, I will attach lspci info.
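
To compare driver builds across several affected servers, the `modinfo` output above can be turned into a dictionary. This is only a rough sketch, assuming the `key: value` layout shown in this comment; `parse_modinfo` and `SAMPLE_MODINFO` are hypothetical names, not an existing utility.

```python
def parse_modinfo(text):
    """Parse `modinfo`-style "key: value" lines into a dict of lists.

    Each key maps to a list because some keys can appear on more than
    one line; lines without a colon are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        info.setdefault(key.strip(), []).append(value.strip())
    return info


# Excerpt of the modinfo output quoted in this comment.
SAMPLE_MODINFO = """\
version:        1.1-7[28000]-ms
license:        GPL
author:         Red Hat Inc and Adaptec
vermagic:       2.6.32-17-pve SMP mod_unload modversions
"""

if __name__ == "__main__":
    info = parse_modinfo(SAMPLE_MODINFO)
    print(info["version"], info["vermagic"])
```

Collecting the `version` and `vermagic` values from each host makes it easier to tell whether all the panicking servers run the same driver build.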
Comment 13 Sid 2013-04-28 08:45:57 UTC
Hi everyone,

I have an Adaptec 6405E hardware card that has been driving me crazy with these errors:

[661903.992040] aacraid: Host adapter abort request (2,0,0,0)
[661904.064211] aacraid: Host adapter abort request (2,0,0,0)
[661904.136081] aacraid: Host adapter reset request. SCSI hang ?

This issue shows up when you use a newer kernel. I have found a fix.

Go to http://www.adaptec.com/fr-fr/ and upgrade the firmware of your card.

With my 6405E I got these errors (abort request, SCSI hang) on every OS.
After the upgrade yesterday (18668 to 19076 and 19109), it works very well now.

So upgrade the firmware of your card; it could fix this error.

I hope this helps.
